
Conversion of a LlamaForCausalLM does not support use_cache and _name_or_path #112


Open · JEM-Mosig opened this issue Apr 15, 2025 · 2 comments

Comments

@JEM-Mosig
Contributor

When I run

import transformers
import penzai.models.transformer.variants.llama
hf_model = transformers.LlamaForCausalLM.from_pretrained("Unbabel/TowerInstruct-7B-v0.2")
pz_model = penzai.models.transformer.variants.llama.llama_from_huggingface_model(hf_model)

the second line fails with

ValueError: Conversion of a LlamaForCausalLM does not support these configuration attributes: {'use_cache': False, '_name_or_path': 'Unbabel/TowerInstruct-7B-v0.2'}

I'm using transformers.__version__ == '4.51.2' and penzai.__version__ == '0.2.5'. While use_cache is documented in the transformers configuration docs, _name_or_path does not seem to be documented.

@danieldjohnson
Collaborator

There's a check for unrecognized configuration arguments in llama_from_huggingface_model because it is otherwise pretty difficult to make sure that the converted model has the same behavior as the original one. But this might be a false positive:

  • use_cache seems like it controls whether the HF model returns cached keys and values, but this isn't relevant for the Penzai model since Penzai handles the KV cache differently.
  • _name_or_path seems like it's probably metadata that doesn't get used when the model runs (see the quick config check below).
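
Both can be inspected directly on the loaded config; to_diff_dict() lists everything that differs from the LlamaConfig defaults (not necessarily the exact set this check compares against, but the same information):

print(hf_model.config.use_cache)       # False for this checkpoint, per the error above
print(hf_model.config._name_or_path)   # 'Unbabel/TowerInstruct-7B-v0.2'
print(hf_model.config.to_diff_dict())  # only the attributes that differ from the defaults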

If you're feeling adventurous, you could try adding these two attributes to the list of ignored config attributes in llama_from_huggingface_model

# Ignored by conversion:
"max_position_embeddings",
"torch_dtype",
"architectures",
"bos_token_id",
"eos_token_id",
"_attn_implementation_autoset",
"head_dim",
and see if the resulting Penzai model produces the same outputs as the original hf_model. If so, I'd be happy to include that change in the next Penzai release.
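
Concretely, something along these lines could be used for the comparison (untested sketch; the Penzai call signature, the token_positions side input, and the axis names "batch"/"seq"/"vocabulary" are my assumptions here, so adjust them to whatever your penzai version actually expects):

import numpy as np
import torch
from penzai import pz

token_ids = np.array([[1, 15043, 3186]], dtype=np.int32)  # placeholder token ids

# Hugging Face logits from the reference model.
with torch.no_grad():
    hf_logits = hf_model(torch.tensor(token_ids, dtype=torch.long)).logits.float().numpy()

# Assumed Penzai call: a named token array plus explicit token positions as a side input.
pz_out = pz_model(
    pz.nx.wrap(token_ids).tag("batch", "seq"),
    token_positions=pz.nx.arange("seq", token_ids.shape[1]),
)
pz_logits = np.asarray(pz_out.unwrap("batch", "seq", "vocabulary"))

# Tolerances depend on the dtypes each model actually runs in.
np.testing.assert_allclose(pz_logits, hf_logits, atol=1e-3, rtol=1e-3)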

@JEM-Mosig
Contributor Author

It seems to work when I ignore both, but I don't know yet what happens when use_cache = True.
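
One way I can probably check that is on the HF side alone, by confirming that the flag only changes whether past_key_values are returned and not the logits (sketch with placeholder token ids):

import torch

ids = torch.tensor([[1, 15043, 3186]])
with torch.no_grad():
    out_cached = hf_model(ids, use_cache=True)
    out_plain = hf_model(ids, use_cache=False)

# If use_cache only controls whether the key/value cache is returned,
# the logits themselves should be identical either way.
torch.testing.assert_close(out_cached.logits, out_plain.logits)
print(out_cached.past_key_values is not None)  # expect True
print(out_plain.past_key_values is None)       # expect True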
