
TEI CPU inference fails with Intel MKL errors on AMD processors when running Qwen3 embedding models #636

Open
@randomm


Environment:

  • AMD CPU (t3a.2xlarge on AWS)
  • TEI image: cpu-sha-bedb2e5
  • Model: Qwen/Qwen3-Embedding-0.6B

Error:

Intel MKL ERROR: Parameter 8 was incorrect on entry to SGEMM
Intel MKL ERROR: Parameter 13 was incorrect on entry to SGEMM

What we've tried:

  • Set LD_PRELOAD=/usr/local/libfakeintel.so (fake Intel library is present)
  • Set MKL_DEBUG_CPU_TYPE=5 (force AVX2)
  • Reduced batch sizes and concurrency
  • Tried the ONNX version instead, but it has different compatibility issues

Root Cause:
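The SGEMM errors indicate that invalid matrix dimensions or strides are being passed to Intel MKL's BLAS routines. In the standard BLAS SGEMM argument order, parameter 8 is LDA (the leading dimension of A) and parameter 13 is LDC (the leading dimension of C). This appears to be a bug in the Candle backend's Qwen3 CPU implementation when compiled with Intel MKL, rather than only an AMD CPU detection issue.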
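For reference, the sketch below mirrors the argument validation in the Netlib reference SGEMM (column-major BLAS). It shows which conditions produce "Parameter 8" (LDA too small) and "Parameter 13" (LDC too small). `sgemm_invalid_param` is a hypothetical helper written for illustration. The assumption is that MKL applies equivalent checks, but its exact validation may differ. The zero-size case at the end is also only a hypothesis about what could trigger this: BLAS requires every leading dimension to be at least 1 even when a matrix dimension is 0.

```python
# Sketch of the Netlib reference SGEMM argument checks (column-major BLAS).
# Assumption: Intel MKL validates equivalently; its actual checks may differ.

def sgemm_invalid_param(transa: str, transb: str, m: int, n: int, k: int,
                        lda: int, ldb: int, ldc: int) -> int:
    """Return the 1-based index of the first invalid SGEMM argument, or 0.

    Argument order: TRANSA(1), TRANSB(2), M(3), N(4), K(5), ALPHA(6),
    A(7), LDA(8), B(9), LDB(10), BETA(11), C(12), LDC(13).
    """
    ta, tb = transa.upper(), transb.upper()
    if ta not in ("N", "T", "C"):
        return 1
    if tb not in ("N", "T", "C"):
        return 2
    if m < 0:
        return 3
    if n < 0:
        return 4
    if k < 0:
        return 5
    # A is stored as M x K when not transposed, K x M otherwise.
    nrowa = m if ta == "N" else k
    # B is stored as K x N when not transposed, N x K otherwise.
    nrowb = k if tb == "N" else n
    if lda < max(1, nrowa):
        return 8   # reported as "Parameter 8 was incorrect"
    if ldb < max(1, nrowb):
        return 10
    if ldc < max(1, m):
        return 13  # reported as "Parameter 13 was incorrect"
    return 0


if __name__ == "__main__":
    # C (4x3) = A (4x8) @ B (8x3), contiguous column-major: valid.
    print(sgemm_invalid_param("N", "N", 4, 3, 8, lda=4, ldb=8, ldc=4))  # 0
    # LDA smaller than the number of rows of A.
    print(sgemm_invalid_param("N", "N", 4, 3, 8, lda=2, ldb=8, ldc=4))  # 8
    # LDC smaller than M.
    print(sgemm_invalid_param("N", "N", 4, 3, 8, lda=4, ldb=8, ldc=2))  # 13
    # Hypothetical: an empty dimension (M = 0) passed with LDA = 0 still fails.
    print(sgemm_invalid_param("N", "N", 0, 3, 8, lda=0, ldb=8, ldc=1))  # 8
```

Reporting the offending call's M, N, K, LDA, LDB and LDC values, for example by logging them on the Rust side before the MKL call, would show whether a strided or empty tensor is reaching MKL.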

Potential Solutions:

  1. Use a TEI build compiled with OpenBLAS instead of Intel MKL
  2. Fix the matrix dimension bug in the Candle backend code
  3. Use a working custom image without MKL dependencies
