ggml : get rid of BLAS and all its variants

This is a big one.

The only reason we use BLAS is that we don't have an efficient implementation of `matrix x matrix` multiplication. Naively doing parallel dot products is not optimal. We need to implement some of the fundamental GEMM optimizations, such as block tiling, and we need to do it in a compact way that reuses the existing dot product code and supports all quantization types.
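To make the idea concrete, here is a minimal sketch of block tiling layered on top of a dot product routine. It is not ggml code: `vec_dot_f32`, `mul_mat_tiled`, and the tile sizes are hypothetical names and values chosen for illustration, and the sketch assumes plain `float` data with the second matrix stored so that each output column corresponds to a contiguous row. A quantized variant would presumably swap in the matching dot product kernel.

```c
#include <stddef.h>

/* Hypothetical stand-in for an existing per-type dot product kernel. */
static float vec_dot_f32(size_t n, const float *x, const float *y) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/*
 * Computes C[i*n + j] = dot(row i of A, row j of B).
 * A is m x k and B is n x k, both row-major, so every dot product
 * runs over two contiguous rows of length k.
 *
 * The output is walked in TILE_M x TILE_N blocks: within a block the
 * same few rows of A and B are reused many times, so they are more
 * likely to stay in cache than with a plain row-by-row sweep.
 * The tile sizes are assumptions, not tuned values.
 */
static void mul_mat_tiled(size_t m, size_t n, size_t k,
                          const float *A, const float *B, float *C) {
    enum { TILE_M = 16, TILE_N = 16 };

    for (size_t i0 = 0; i0 < m; i0 += TILE_M) {
        const size_t i1 = (i0 + TILE_M < m) ? i0 + TILE_M : m;
        for (size_t j0 = 0; j0 < n; j0 += TILE_N) {
            const size_t j1 = (j0 + TILE_N < n) ? j0 + TILE_N : n;
            /* All dot products for one output tile. */
            for (size_t i = i0; i < i1; i++) {
                for (size_t j = j0; j < j1; j++) {
                    C[i * n + j] = vec_dot_f32(k, A + i * k, B + j * k);
                }
            }
        }
    }
}
```

Because the inner kernel is just a dot product over two rows, the tiling loop itself does not need to know about the data type. This sketch only blocks the output; it does not tile along `k` or use register blocking, both of which a full GEMM implementation would likely need.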

More comments on this:

- https://github.com/ggerganov/llama.cpp/issues/1867#issuecomment-1595702365
- https://github.com/ggerganov/llama.cpp/pull/1935#issuecomment-1597140738
- https://github.com/ggerganov/ggml/issues/293#issuecomment-1607387005
