TEI CPU inference fails with Intel MKL errors on AMD processors when running Qwen3 embedding models

**Environment:**
- AMD CPU (t3a.2xlarge on AWS)
- TEI image: cpu-sha-bedb2e5
- Model: Qwen/Qwen3-Embedding-0.6B

**Error:**
```
Intel MKL ERROR: Parameter 8 was incorrect on entry to SGEMM
Intel MKL ERROR: Parameter 13 was incorrect on entry to SGEMM
```

**What we've tried:**
- Set `LD_PRELOAD=/usr/local/libfakeintel.so` (fake Intel library is present)
- Set `MKL_DEBUG_CPU_TYPE=5` (force AVX2)
- Reduced batch sizes and concurrency
- Tried the ONNX version of the model instead, which has its own, different compatibility issues
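
For reproduction, this is roughly how the environment overrides above can be passed to the container. It is a minimal sketch, not the exact command we ran: the helper `tei_docker_command` is hypothetical, the `ghcr.io/huggingface/text-embeddings-inference` registry path is an assumption (only the `cpu-sha-bedb2e5` tag comes from this report), and the `8080:80` port mapping is illustrative.

```python
import shlex


def tei_docker_command(image, model_id, env, port=8080):
    """Build a `docker run` argv that forwards the given env overrides (hypothetical helper)."""
    cmd = ["docker", "run", "--rm", "-p", f"{port}:80"]
    for key, value in env.items():
        # Each override becomes a separate `-e KEY=VALUE` pair.
        cmd += ["-e", f"{key}={value}"]
    cmd += [image, "--model-id", model_id]
    return cmd


# The two overrides listed under "What we've tried".
overrides = {
    "LD_PRELOAD": "/usr/local/libfakeintel.so",
    "MKL_DEBUG_CPU_TYPE": "5",
}

cmd = tei_docker_command(
    # Registry path is an assumption; the tag is the one from this report.
    "ghcr.io/huggingface/text-embeddings-inference:cpu-sha-bedb2e5",
    "Qwen/Qwen3-Embedding-0.6B",
    overrides,
)

# Print a copy-pasteable shell command.
print(shlex.join(cmd))
```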

**Root Cause:**
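In the standard BLAS `SGEMM` argument order, parameters 8 and 13 are the leading dimensions `lda` and `ldc`, so the errors indicate that invalid matrix dimensions or strides are being passed to the Intel MKL BLAS routines. Because these are argument-validation failures, and the AMD workarounds above did not change the behavior, this appears to be a bug in the Candle backend's Qwen3 CPU implementation when compiled with Intel MKL, not just an AMD CPU detection issue.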
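
To make the parameter numbers concrete, here is a small sketch of the argument checks performed by the reference (Netlib) BLAS `SGEMM`, assuming MKL numbers its parameters the same way in the Fortran-style interface. The function name `sgemm_first_bad_param` and the example shapes are made up for illustration; they are not taken from the Candle code.

```python
def sgemm_first_bad_param(transa, transb, m, n, k, lda, ldb, ldc):
    """Return the 1-based index of the first invalid SGEMM argument, or 0 if all pass.

    Mirrors the checks in the reference BLAS SGEMM, whose argument order is:
    TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC
    """
    ta, tb = transa.upper(), transb.upper()
    # Column-major storage: the row count of A and B depends on the transpose flags.
    nrowa = m if ta == "N" else k
    nrowb = k if tb == "N" else n

    if ta not in ("N", "T", "C"):
        return 1
    if tb not in ("N", "T", "C"):
        return 2
    if m < 0:
        return 3
    if n < 0:
        return 4
    if k < 0:
        return 5
    if lda < max(1, nrowa):
        return 8   # "Parameter 8 was incorrect": lda too small
    if ldb < max(1, nrowb):
        return 10
    if ldc < max(1, m):
        return 13  # "Parameter 13 was incorrect": ldc too small
    return 0


# Hypothetical shapes: a zero leading dimension trips the same checks as in the log.
bad_lda = sgemm_first_bad_param("N", "N", 4, 8, 16, lda=0, ldb=16, ldc=4)
bad_ldc = sgemm_first_bad_param("N", "N", 4, 8, 16, lda=4, ldb=16, ldc=0)
all_ok = sgemm_first_bad_param("N", "N", 4, 8, 16, lda=4, ldb=16, ldc=4)
print(bad_lda, bad_ldc, all_ok)
```

Under these rules a leading dimension of zero, or one smaller than the row count of the matrix, is enough to trigger either message. That would be consistent with a zero-sized or mis-strided tensor reaching the matmul, though we have not confirmed that this is what happens in the Qwen3 path.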

**Potential Solutions:**
1. Use a TEI build compiled with OpenBLAS instead of Intel MKL
2. Fix the matrix dimension bug in the Candle backend code
3. Use a working custom image without MKL dependencies



TEI CPU inference fails with Intel MKL errors on AMD processors when running Qwen3 embedding models #636
