Issue with embedding generation #14454
-
I wrote a very simple example using a model that I am sure is suitable for generating embeddings. From the source code, I see that the warning is emitted when `memory` is not set:

```cpp
int llama_context::decode(const llama_batch & batch_inp) {
    GGML_ASSERT((!batch_inp.token && batch_inp.embd) || (batch_inp.token && !batch_inp.embd)); // NOLINT

    if (!memory) {
        LLAMA_LOG_DEBUG("%s: cannot decode batches with this context (calling encode() instead)\n", __func__);
        return encode(batch_inp);
    }
    // ...
}
```

How can I ensure that `memory` is initialized correctly?
-
Most embedding models don't have a memory (a.k.a. a KV cache). This is not an error - just a warning telling you that you can simply use `llama_encode()` instead of `llama_decode()`.
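
For reference, here is a minimal sketch of getting embeddings through `llama_encode()` with a memory-less embedding model. The model path is a placeholder, and some function names (e.g. `llama_model_load_from_file`, `llama_init_from_model`) come from recent versions of the llama.cpp C API, so they may differ on older versions:

```cpp
#include "llama.h"

#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    llama_backend_init();

    // Placeholder path - substitute your own embedding model.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("embedding-model.gguf", mparams);

    // Request embedding output and a pooled, per-sequence embedding.
    llama_context_params cparams = llama_context_default_params();
    cparams.embeddings   = true;
    cparams.pooling_type = LLAMA_POOLING_TYPE_MEAN;
    llama_context * ctx = llama_init_from_model(model, cparams);

    // Tokenize the input text.
    const char * text = "hello world";
    const llama_vocab * vocab = llama_model_get_vocab(model);
    std::vector<llama_token> tokens(llama_n_ctx(ctx));
    const int n_tokens = llama_tokenize(vocab, text, (int) strlen(text),
                                        tokens.data(), (int) tokens.size(),
                                        /*add_special=*/true, /*parse_special=*/false);

    // Build a single-sequence batch covering the whole input.
    llama_batch batch = llama_batch_init(n_tokens, 0, 1);
    for (int i = 0; i < n_tokens; ++i) {
        batch.token   [i]    = tokens[i];
        batch.pos     [i]    = i;
        batch.n_seq_id[i]    = 1;
        batch.seq_id  [i][0] = 0;
        batch.logits  [i]    = true;
    }
    batch.n_tokens = n_tokens;

    // The model has no KV cache, so call llama_encode() directly
    // instead of llama_decode().
    if (llama_encode(ctx, batch) != 0) {
        fprintf(stderr, "llama_encode() failed\n");
        return 1;
    }

    // Retrieve the pooled embedding for sequence 0.
    const int     n_embd = llama_model_n_embd(model);
    const float * embd   = llama_get_embeddings_seq(ctx, 0);
    printf("embedding[0] = %f (n_embd = %d)\n", embd[0], n_embd);

    llama_batch_free(batch);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_fini();
    return 0;
}
```

Note that calling `llama_decode()` would also work here: as the snippet from `llama_context::decode()` above shows, when there is no memory the library redirects the batch to `encode()` and only logs a debug message.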
-
@ggerganov can I use […]