llama: Attempt to add ModernBert #14014

Open · huydt84 wants to merge 20 commits into master

Conversation

huydt84 (Contributor) commented Jun 4, 2025

I don't know whether my implementation is correct or not

github-actions bot added the python (python script changes) label Jun 4, 2025
huydt84 marked this pull request as draft Jun 4, 2025 15:27
huydt84 marked this pull request as ready for review Jun 4, 2025 15:36
huydt84 (Contributor, Author) commented Jun 4, 2025

hparams.set_swa_pattern doesn't work properly with ModernBert.

huydt84 marked this pull request as draft Jun 4, 2025 15:40
huydt84 (Contributor, Author) commented Jun 4, 2025

The embedding results seem random and very low. Something is wrong here.

huydt84 marked this pull request as ready for review Jun 4, 2025 16:21
CISC (Collaborator) left a comment

Delete the files you added in models, we don't need them; just make sure test-tokenizer-0 succeeds with the GGUF.

huydt84 requested a review from CISC Jun 4, 2025 22:55
inpL = build_norm(inpL, model.tok_norm, nullptr, LLM_NORM, -1);
cb(inpL, "inp_norm", -1);

auto * inp_attn = build_attn_inp_kv_unified_iswa();
ggerganov (Member) commented Jun 5, 2025

This should probably become:

Suggested change:
-    auto * inp_attn = build_attn_inp_kv_unified_iswa();
+    auto * inp_attn = build_attn_inp_no_cache_iswa();

And add the corresponding mask logic in llama-graph. Special attention should be taken about how the SWA works for this model - i.e. is it symmetric or not:

# non-symmetric
token i attends to [i - n_swa, i]

# symmetric:
token i attends to [i - n_swa/2, i + n_swa/2]
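A minimal sketch of the mask predicate this distinction implies (the helper name, the symmetric flag, and reading the window from a plain n_swa value are illustrative assumptions, not llama.cpp API):

    #include <cstdint>
    #include <cstdlib>

    // Returns true if the query token at position p1 may attend to the key/value token
    // at position p0, given a sliding window of size n_swa (0 = window disabled).
    static bool swa_allowed(int32_t p0, int32_t p1, int32_t n_swa, bool symmetric) {
        if (n_swa <= 0) {
            return true; // no window
        }
        if (symmetric) {
            // token p1 attends to [p1 - n_swa/2, p1 + n_swa/2]
            return std::abs(p1 - p0) <= n_swa/2;
        }
        // non-symmetric: token p1 attends to [p1 - n_swa, p1]
        return p0 <= p1 && p1 - p0 <= n_swa;
    }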

ggerganov (Member) left a comment

You have to add the new arch here:

llama.cpp/src/llama-model.cpp, lines 13195-13203 (5a8ae30):

    switch (arch) {
        case LLM_ARCH_BERT:
        case LLM_ARCH_JINA_BERT_V2:
        case LLM_ARCH_NOMIC_BERT:
        case LLM_ARCH_NOMIC_BERT_MOE:
        case LLM_ARCH_WAVTOKENIZER_DEC:
            {
                res = nullptr;
            } break;

To avoid creating a memory module (a.k.a. KV cache) for these models.
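A sketch of the requested change, assuming the new architecture enum introduced by this PR is named LLM_ARCH_MODERN_BERT (the actual name may differ):

    switch (arch) {
        case LLM_ARCH_BERT:
        case LLM_ARCH_JINA_BERT_V2:
        case LLM_ARCH_NOMIC_BERT:
        case LLM_ARCH_NOMIC_BERT_MOE:
        case LLM_ARCH_MODERN_BERT:      // assumed enum name for ModernBert
        case LLM_ARCH_WAVTOKENIZER_DEC:
            {
                // embedding-only models do not get a memory module (KV cache)
                res = nullptr;
            } break;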

huydt84 requested a review from ggerganov Jun 5, 2025 13:55
CISC (Collaborator) commented Jun 5, 2025

So, since the vocab is BPE you need to add modern-bert vocab handling in a few places:

tokenizer_pre == "roberta-bpe") {

Set the correct attribute on the [MASK] token, similarly to this:

llama.cpp/src/llama-vocab.cpp, lines 2097-2105 (9f47fa5):

    if (false
        || _contains_any(tokenizer_pre, {"jina-v2-de", "jina-v2-es", "jina-v2-code"})
        || _contains_any(general_arch, {"nomic-bert-moe"})
    ) {
        if (token_to_id.count("<mask>") == 0) {
            LLAMA_LOG_WARN("%s: Mask token is missing in vocab, please reconvert model!\n", __func__);
        } else {
            _set_token_attr("<mask>", LLAMA_TOKEN_ATTR_LSTRIP, true);
        }
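A sketch of what the two additions could look like (the pre-tokenizer id "modern-bert" and the "[MASK]" literal are assumptions about how this PR names things, not confirmed values):

    // 1) recognize the new pre-tokenizer alongside the existing roberta-bpe check:
    //         tokenizer_pre == "roberta-bpe" ||
    //         tokenizer_pre == "modern-bert") {

    // 2) extend the mask-token attribute handling:
    if (false
        || _contains_any(tokenizer_pre, {"jina-v2-de", "jina-v2-es", "jina-v2-code", "modern-bert"})
        || _contains_any(general_arch, {"nomic-bert-moe"})
    ) {
        if (token_to_id.count("[MASK]") == 0) {
            LLAMA_LOG_WARN("%s: Mask token is missing in vocab, please reconvert model!\n", __func__);
        } else {
            _set_token_attr("[MASK]", LLAMA_TOKEN_ATTR_LSTRIP, true);
        }
    }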

CISC (Collaborator) commented Jun 5, 2025

> The embedding results seem random and very low. Something is wrong here.

Yep, I also noticed the same with jina-reranker-v2, most likely the same issue, will investigate.

CISC (Collaborator) commented Jun 6, 2025

> So, since the vocab is BPE you need to add modern-bert vocab handling in a few places:

@huydt84 Don't forget this-^ it's important.

CISC (Collaborator) commented Jun 6, 2025

> The embedding results seem random and very low. Something is wrong here.
>
> Yep, I also noticed the same with jina-reranker-v2, most likely the same issue, will investigate.

Will dig into this tonight/this weekend...

huydt84 (Contributor, Author) commented Jun 6, 2025

> So, since the vocab is BPE you need to add modern-bert vocab handling in a few places:
>
> @huydt84 Don't forget this-^ it's important.

Thank you! I have just added it

CISC (Collaborator) commented Jun 6, 2025

> @huydt84 Don't forget this-^ it's important.
>
> Thank you! I have just added it

The tokenizer_pre check is the most important one, please add that too. :)

ggerganov (Member) left a comment

Need to add a new llama_swa_type enum value:

LLAMA_SWA_TYPE_SYMMETRIC  = 3,
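For context, a sketch of where this would land, assuming the existing values in llama-hparams.h on master are NONE/STANDARD/CHUNKED (worth double-checking):

    enum llama_swa_type {
        LLAMA_SWA_TYPE_NONE      = 0,
        LLAMA_SWA_TYPE_STANDARD  = 1,
        LLAMA_SWA_TYPE_CHUNKED   = 2,
        LLAMA_SWA_TYPE_SYMMETRIC = 3, // proposed: window extends on both sides of the query token
    };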

inpL = build_norm(inpL, model.tok_norm, nullptr, LLM_NORM, -1);
cb(inpL, "inp_norm", -1);

auto * inp_attn = build_attn_inp_no_cache_iswa();
Review comment (Member):

Since this is not an actual iSWA (interleaved SWA) model, we should simply use build_attn_inp_no_cache().
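That is, the call site quoted above would presumably become:

    auto * inp_attn = build_attn_inp_no_cache();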

@@ -241,6 +249,7 @@ class llm_graph_input_attn_no_cache : public llm_graph_input_i {

     const llama_hparams & hparams;
     const llama_cparams & cparams;
+    const int n_swa; // Sliding window attention size (0 = disabled)
Review comment (Member):

This is already available from the hparams - no need to duplicate it here.

Comment on lines 281 to 284:

    // Check if we're using sliding window attention
    if (n_swa > 0) {
        const int64_t n_tokens     = ubatch->n_tokens;
        const int64_t n_seq_tokens = ubatch->n_seq_tokens;
Review comment (Member):

This branch is actually non-causal attention + sliding window. So merge it with the existing implementation below.
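A rough sketch of what the merged non-causal mask construction could look like (variable names follow the quoted snippet; reading the window from hparams.n_swa and treating it as symmetric are assumptions about how this PR ends up wiring it):

    for (int64_t i1 = 0; i1 < n_tokens; ++i1) {        // query token
        for (int64_t i0 = 0; i0 < n_tokens; ++i0) {    // key/value token
            float f = -INFINITY;

            const bool same_seq  = ubatch->seq_id[i0][0] == ubatch->seq_id[i1][0];
            const bool in_window = hparams.n_swa == 0 ||
                std::abs(ubatch->pos[i1] - ubatch->pos[i0]) <= (int32_t) hparams.n_swa/2;

            if (same_seq && in_window) {
                f = 0.0f; // allow attention
            }

            data[i1*n_tokens + i0] = f; // data points at the kq_mask buffer
        }
    }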

CISC (Collaborator) commented Jun 7, 2025

> The embedding results seem random and very low. Something is wrong here.
>
> Yep, I also noticed the same with jina-reranker-v2, most likely the same issue, will investigate.
>
> Will dig into this tonight/this weekend...

OK, the issue with jina-reranker-v2 was just that you have to apply a sigmoid and normalize; guess that sigmoid option could be useful, @ggerganov?

That doesn't explain the issue with modernbert, unfortunately (though I did try it for fun with Alibaba-NLP/gte-reranker-modernbert-base... it seems to give reversed scores).
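For reference, one plausible reading of that post-processing, as a standalone sketch (not code from this PR; "normalize" is interpreted here as scaling the sigmoided scores to sum to 1):

    #include <cmath>
    #include <vector>

    // Apply a sigmoid to raw reranker logits, then normalize the scores so they sum to 1.
    static std::vector<float> sigmoid_normalize(const std::vector<float> & logits) {
        std::vector<float> scores(logits.size());
        float sum = 0.0f;
        for (size_t i = 0; i < logits.size(); ++i) {
            scores[i] = 1.0f / (1.0f + std::exp(-logits[i]));
            sum += scores[i];
        }
        for (auto & s : scores) {
            s /= sum > 0.0f ? sum : 1.0f;
        }
        return scores;
    }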

huydt84 (Contributor, Author) commented Jun 8, 2025

@CISC cc: @ggerganov

I tried the embedding with various models, but the outputs barely change across those attempts. Maybe the parameter loading or the inference graph has a problem somewhere. Can you check that part?
This is the model implementation in Hugging Face: https://github.com/huggingface/transformers/blob/v4.52.3/src/transformers/models/modernbert/modeling_modernbert.py

CISC (Collaborator) commented Jun 8, 2025

So, I just noticed at least part of the problem:

llama.cpp/src/llama-graph.cpp, lines 1567-1571 (3ac6753):

    if (cls != nullptr && cls_b != nullptr) {
        // classification head
        // https://github.com/huggingface/transformers/blob/5af7d41e49bbfc8319f462eb45253dcb3863dfb7/src/transformers/models/roberta/modeling_roberta.py#L1566
        cur = ggml_add(ctx0, ggml_mul_mat(ctx0, cls, inp), cls_b);
        cur = ggml_tanh(ctx0, cur);

We have cls, but not cls_b, so this has to be modified to handle that...
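A sketch of one way to handle the missing bias (not necessarily the fix this PR lands on): require only cls and apply cls_b when it exists.

    if (cls != nullptr) {
        // classification head; bias is optional (ModernBert provides cls without cls_b)
        cur = ggml_mul_mat(ctx0, cls, inp);
        if (cls_b != nullptr) {
            cur = ggml_add(ctx0, cur, cls_b);
        }
        cur = ggml_tanh(ctx0, cur);
    }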

Comment on lines +6217 to +6240:

    // feed-forward network
    ggml_tensor * ffn_up = build_lora_mm(model.layers[il].ffn_up, cur);
    cb(ffn_up, "ffn_up", il);

    int64_t split_point = ffn_up->ne[0] / 2;
    ggml_tensor * output_ffn_up = ggml_cont(ctx0, ggml_view_2d(
        ctx0, ffn_up, split_point,
        ffn_up->ne[1], ffn_up->nb[1], 0
    ));
    ggml_tensor * output_ffn_gate = ggml_cont(ctx0, ggml_view_2d(
        ctx0, ffn_up, split_point,
        ffn_up->ne[1], ffn_up->nb[1],
        split_point * ggml_element_size(ffn_up)
    ));

    // Apply activation function
    output_ffn_up = ggml_gelu(ctx0, output_ffn_up);

    // Element-wise multiplication
    ggml_tensor * gated = ggml_mul(ctx0, output_ffn_up, output_ffn_gate);
    cb(gated, "ffn_gated", il);

    // Final projection
    cur = build_lora_mm(model.layers[il].ffn_down, gated);
Review comment (Collaborator):

This should be merged into build_ffn as LLM_FFN_GEGLU.
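For reference, the gated GELU above boils down to something like the following case inside build_ffn once an LLM_FFN_GEGLU op type exists (a sketch; the case name comes from the comment, the surrounding build_ffn plumbing is assumed):

    case LLM_FFN_GEGLU:
        {
            // cur holds the fused up+gate projection: split it in half along dim 0,
            // apply GELU to the first half and gate it with the second half
            const int64_t split_point = cur->ne[0] / 2;

            ggml_tensor * x0 = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, split_point, cur->ne[1], cur->nb[1], 0));
            ggml_tensor * x1 = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, split_point, cur->ne[1], cur->nb[1],
                                                            split_point * ggml_element_size(cur)));

            x0  = ggml_gelu(ctx0, x0);
            cur = ggml_mul(ctx0, x0, x1);
            cb(cur, "ffn_geglu", il);
        } break;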

Review comment (Collaborator):

Probably worth making a separate PR for visibility.
