
Conversion of a LlamaForCausalLM does not support use_cache and _name_or_path #112


Open · JEM-Mosig opened this issue Apr 15, 2025 · 2 comments

Comments

@JEM-Mosig
Contributor

When I run

import transformers
import penzai.models.transformer.variants.llama
hf_model = transformers.LlamaForCausalLM.from_pretrained("Unbabel/TowerInstruct-7B-v0.2")
pz_model = penzai.models.transformer.variants.llama.llama_from_huggingface_model(hf_model)

the second line fails with

ValueError: Conversion of a LlamaForCausalLM does not support these configuration attributes: {'use_cache': False, '_name_or_path': 'Unbabel/TowerInstruct-7B-v0.2'}

I'm using transformers.__version__ == '4.51.2' and penzai.__version__ == '0.2.5'. While use_cache is documented in the transformers configuration docs, _name_or_path does not seem to be documented.

@danieldjohnson
Collaborator

There's a check for unrecognized configuration arguments in llama_from_huggingface_model because it is otherwise pretty difficult to make sure that the converted model has the same behavior as the original one. But this might be a false positive:

  • use_cache seems like it controls whether the HF model returns cached keys and values, but this isn't relevant for the Penzai model since Penzai handles the KV cache differently.
  • _name_or_path seems like it's probably metadata that doesn't get used when the model runs (see the quick config check below).
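
Both can be inspected directly on the loaded config; to_diff_dict() lists everything that differs from the LlamaConfig defaults (not necessarily the exact set this check compares against, but the same information):

print(hf_model.config.use_cache)       # False for this checkpoint, per the error above
print(hf_model.config._name_or_path)   # 'Unbabel/TowerInstruct-7B-v0.2'
print(hf_model.config.to_diff_dict())  # only the attributes that differ from the defaults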

If you're feeling adventurous, you could try adding these two attributes to the list of ignored config attributes in llama_from_huggingface_model

# Ignored by conversion:
"max_position_embeddings",
"torch_dtype",
"architectures",
"bos_token_id",
"eos_token_id",
"_attn_implementation_autoset",
"head_dim",
and see if the resulting Penzai model produces the same outputs as the original hf_model. If so, I'd be happy to include that change in the next Penzai release.
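
Concretely, something along these lines could be used for the comparison (untested sketch; the Penzai call signature, the token_positions side input, and the axis names "batch"/"seq"/"vocabulary" are my assumptions here, so adjust them to whatever your penzai version actually expects):

import numpy as np
import torch
from penzai import pz

token_ids = np.array([[1, 15043, 3186]], dtype=np.int32)  # placeholder token ids

# Hugging Face logits from the reference model.
with torch.no_grad():
    hf_logits = hf_model(torch.tensor(token_ids, dtype=torch.long)).logits.float().numpy()

# Assumed Penzai call: a named token array plus explicit token positions as a side input.
pz_out = pz_model(
    pz.nx.wrap(token_ids).tag("batch", "seq"),
    token_positions=pz.nx.arange("seq", token_ids.shape[1]),
)
pz_logits = np.asarray(pz_out.unwrap("batch", "seq", "vocabulary"))

# Tolerances depend on the dtypes each model actually runs in.
np.testing.assert_allclose(pz_logits, hf_logits, atol=1e-3, rtol=1e-3)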

@JEM-Mosig
Contributor Author

It seems to work when I ignore both, but I don't know yet what happens when use_cache = True.
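
One way I can probably check that is on the HF side alone, by confirming that the flag only changes whether past_key_values are returned and not the logits (sketch with placeholder token ids):

import torch

ids = torch.tensor([[1, 15043, 3186]])
with torch.no_grad():
    out_cached = hf_model(ids, use_cache=True)
    out_plain = hf_model(ids, use_cache=False)

# If use_cache only controls whether the key/value cache is returned,
# the logits themselves should be identical either way.
torch.testing.assert_close(out_cached.logits, out_plain.logits)
print(out_cached.past_key_values is not None)  # expect True
print(out_plain.past_key_values is None)       # expect True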
