The issue of reward_model output length

I am training a reward model based on Llama, but the reward outputs for the j and k samples (`rewards_j` and `rewards_k`) have different sizes. As a result the loss cannot be computed, and training fails with a size-mismatch error:
```
 File "examples/stack_llama/scripts/reward_modeling_sutpc.py", line 300, in <module>
    trainer.train(script_args.resume_from_checkpoint)
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 2699, in training_step
    loss = self.compute_loss(model, inputs)
  File "examples/stack_llama/scripts/reward_modeling_sutpc.py", line 284, in compute_loss
    loss = -nn.functional.logsigmoid(rewards_j - rewards_k).mean()
RuntimeError: The size of tensor a (445) must match the size of tensor b (281) at non-singleton dimension 1
```

I used `reward_modeling.py` with only a slight modification: `AutoTokenizer` was replaced with `LlamaTokenizer`.
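
The sizes in the error (445 and 281) look like sequence lengths, which would suggest the model is returning one score per token instead of one score per sequence. That is a guess from the traceback, not a confirmed diagnosis. Under that assumption, the sketch below reduces per-token scores to a single reward per sequence before applying the loss from line 284. The helper names `last_token_reward` and `pairwise_loss` are hypothetical and are not part of the script, and the tensors are random placeholders, not real model outputs.

```python
import torch
import torch.nn.functional as F


def last_token_reward(per_token_scores, attention_mask):
    """Reduce (batch, seq_len) per-token scores to one scalar per sequence.

    Takes the score at the last non-padding position. Assumes right padding
    and at least one real token per row.
    """
    last_index = attention_mask.long().sum(dim=1) - 1  # (batch,)
    return per_token_scores.gather(1, last_index.unsqueeze(1)).squeeze(1)


def pairwise_loss(rewards_j, rewards_k):
    """Pairwise ranking loss from the traceback; both inputs must share a shape."""
    return -F.logsigmoid(rewards_j - rewards_k).mean()


# Placeholder per-token scores using the two lengths from the error message.
torch.manual_seed(0)
scores_j = torch.randn(1, 445)
scores_k = torch.randn(1, 281)
mask_j = torch.ones(1, 445, dtype=torch.long)
mask_k = torch.ones(1, 281, dtype=torch.long)

# Subtracting scores_j - scores_k directly would raise the RuntimeError above.
rewards_j = last_token_reward(scores_j, mask_j)  # shape (1,)
rewards_k = last_token_reward(scores_k, mask_k)  # shape (1,)
loss = pairwise_loss(rewards_j, rewards_k)       # scalar tensor
```

If the model already has a sequence-classification head, its logits should have shape `(batch, 1)` and no pooling step like this would be needed, so it is worth printing `rewards_j.shape` and `rewards_k.shape` inside `compute_loss` first.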

The issue of reward_model output length #376
