merged RL model performed badly #347
I encountered the same problem. Although merging the rl model and the hf model makes the responses seem normal in form, they are irrelevant and there is a significant degradation compared to the web demo.
After switching from `LlamaTokenizer` to `LlamaTokenizerFast`, I think the problem is solved.
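For reference, a minimal sketch of that switch, assuming `hf_pth` is a placeholder for the local path of the converted LLaMA checkpoint (not a path from the original comments):

```python
from transformers import LlamaTokenizerFast

hf_pth = "path/to/llama-7b-hf"  # placeholder path, substitute your own

# use the fast (Rust-based) tokenizer instead of the slow LlamaTokenizer
tokenizer = LlamaTokenizerFast.from_pretrained(hf_pth)
```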
Hi @BIGPPWONG, thanks for your kind help, but that way didn't work for me; I don't know if I missed something important. Previously, the output I got looked like this:

```
Question: Cuda asserted error.
Answer: hos Pen turno accuracy Section mystery kitchen₇ Castro Castro accuracy accuracy accuracy accuracy accuracy
accuracy accuracy accuracy accuracy accuracy accuracy accuracy accuracy accuracy accuracy accuracy accuracy
...
```
In order to dig into your advice, I did some tests:

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoTokenizer, LlamaForCausalLM, LlamaTokenizerFast, LlamaTokenizer

hf_pth = "path/to/llama-7b-hf"  # placeholder: local path of the converted checkpoint

# load the three tokenizer variants from the same checkpoint
AutoT = AutoTokenizer.from_pretrained(hf_pth)
LlamaT = LlamaTokenizer.from_pretrained(hf_pth)
LlamaFastT = LlamaTokenizerFast.from_pretrained(hf_pth)

prompt = "hello, world!"
```

`AutoT(prompt, return_tensors="pt")` gives:

```
{'input_ids': tensor([[ 1, 22172, 29892, 3186, 29991]], device='cuda:0'),
 'token_type_ids': tensor([[0, 0, 0, 0, 0]], device='cuda:0'),
 'attention_mask': tensor([[1, 1, 1, 1, 1]], device='cuda:0')}
```

`LlamaT(prompt, return_tensors="pt")` gives:

```
{'input_ids': tensor([[ 1, 22172, 29892, 3186, 29991]], device='cuda:0'),
 'attention_mask': tensor([[1, 1, 1, 1, 1]], device='cuda:0')}
```

`LlamaFastT(prompt, return_tensors="pt")` gives:

```
{'input_ids': tensor([[ 1, 22172, 29892, 3186, 29991]], device='cuda:0'),
 'token_type_ids': tensor([[0, 0, 0, 0, 0]], device='cuda:0'),
 'attention_mask': tensor([[1, 1, 1, 1, 1]], device='cuda:0')}
```

From the tests above, we can draw a rough conclusion: all three tokenizers produce the same `input_ids`; the only difference is that `LlamaTokenizer` does not return `token_type_ids`. In addition, I rechecked the difference between them. @BIGPPWONG, could you please tell me if I have misunderstood something? Thanks! PS: I found the tokenizer was updated a few days ago (22402); is that update relevant to my issue?
I did the following to make StackLLaMA perform normally.
FYI, I merged the hf model with the se model and the rl model step by step.
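For illustration, a minimal sketch of what such a step-by-step merge can look like. This is not the exact procedure from the comment above; the local path is a placeholder, and the adapter IDs are the `trl-lib` checkpoints mentioned in the issue description below:

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

hf_model_pth = "path/to/llama-7b-hf"        # placeholder: converted base (hf) model
se_peft_id = "trl-lib/llama-7b-se-peft"     # supervised fine-tuned (se) adapter
rl_peft_id = "trl-lib/llama-7b-se-rl-peft"  # PPO (rl) adapter

# step 1: fold the se adapter into the base model
model = LlamaForCausalLM.from_pretrained(hf_model_pth, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, se_peft_id)
model = model.merge_and_unload()  # plain LlamaForCausalLM with se weights merged in

# step 2: fold the rl adapter into the se-merged model
model = PeftModel.from_pretrained(model, rl_peft_id)
model = model.merge_and_unload()  # final merged rl model

model.save_pretrained("llama-7b-se-rl-merged")
```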
Hi @BIGPPWONG @vpegasus
Hi @younesbelkada @BIGPPWONG, many thanks to both of you! Your kind and detailed replies encouraged me to find out where I went wrong. First of all, I copied the procedures @BIGPPWONG shared (mainly reinstalling transformers, 4.29 --> 4.28.1), but that didn't work for me either. As there was no more room for me to make trouble, I finally found that the culprit was my careless use of the functions below. As shown in my first description of this issue, I used a script like:

```python
config = PeftConfig.from_pretrained(peft_model_id)
model = LlamaForCausalLM.from_pretrained(hf_model_pth)
model = PeftModel.from_pretrained(model, peft_model_id)
model = model.merge_and_unload()  # this line troubled me
model = model.base_model
```

This script may make trouble. Now I reuse merge_peft_adapter.py obediently, and the outputs are no longer garbage. Below is an example of the output:

```
Question: PyTorch: How to get the shape of a Tensor as a list of int
Answer: You can use `torch.tensor(shape).to_numpy()` or `torch.from_numpy(np.array([1,2])).shape`
```

Thanks again @BIGPPWONG @younesbelkada
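A likely explanation for why that last line hurts (my reading of the transformers API, not something stated in this thread): `merge_and_unload()` already returns a plain `LlamaForCausalLM`, and on that class the `base_model` property resolves to the `LlamaModel` backbone, i.e. the model without the LM head, which is not something you want to run generation against. A quick check, with a placeholder path:

```python
from transformers import LlamaForCausalLM

merged = LlamaForCausalLM.from_pretrained("llama-7b-se-rl-merged")  # placeholder path

print(type(merged).__name__)             # LlamaForCausalLM -- has the LM head, safe to generate from
print(type(merged.base_model).__name__)  # LlamaModel -- backbone only, no LM head
```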
Awesome! This is great to hear!
Ok, thanks again!
Hi @younesbelkada, I used the code below for inference with the trained model. The responses tend to be mostly short, whereas in stack-llama's rl_training.py I can see ppo_trainer.generate being used together with a length sampler for text generation. Could you please suggest how to use these additional params? respond_to_batch seems to be limited.
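Not an authoritative answer, but one way to mimic rl_training.py's sampled-length generation at plain inference time is to skip respond_to_batch and call model.generate with a max_new_tokens drawn from trl's LengthSampler. A minimal sketch, assuming the merged model was saved to the placeholder path llama-7b-se-rl-merged and that the length bounds (32, 128) are arbitrary choices:

```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM
from trl.core import LengthSampler

model_path = "llama-7b-se-rl-merged"  # placeholder: path of the merged rl model

tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default

model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")

# sample an output length instead of using a fixed short one
output_length_sampler = LengthSampler(32, 128)

prompt = "Question: How do I read a CSV file in Python?\n\nAnswer: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    top_k=0,
    top_p=1.0,
    max_new_tokens=output_length_sampler(),
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```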
Hi @younesbelkada, I downloaded your trained model for some testing. I want to get the final rl model for inference via the following steps:

Step 1: Download Meta's llama model and convert it to the Transformers format by script, naming it the `hf model`.
Step 2: Download the `se model` and the `rl model` from https://huggingface.co/trl-lib/llama-7b-se-peft and https://huggingface.co/trl-lib/llama-7b-se-rl-peft, respectively.
Step 3: Merge the `hf model` and the `se model` via the following script:
Step 4: Merge the `merged_se model` and the `rl model` to get the final rl model via the following script:

However, the response of the `merged rl model` via the following script is very bad. But when only merging the `rl model` and the `hf model`, the response seems normal. Please help me figure out which step I did wrong. Thanks.