Gradient checkpointing with DDP in a loop #10479
Replies: 6 comments 4 replies
-
Dear @shivammehta007, I also got this error. Has it been solved?
-
This does not appear to be a Lightning issue, but rather DistributedDataParallel from torch.distributed not supporting gradient checkpointing.
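For context, the failure usually comes from the extra forward/backward passes that activation checkpointing introduces, which confuse DDP's gradient-ready bookkeeping. Below is a minimal sketch (not from this thread) of the two knobs that commonly make the combination work: `use_reentrant=False` on the checkpoint call and `static_graph=True` on DDP. The names `ToyBlock` and `run` are made up, and an initialized process group (e.g. via `torchrun`) is assumed.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint


class ToyBlock(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.layer = nn.Linear(dim, dim)

    def forward(self, x):
        # Non-reentrant checkpointing recomputes activations during backward
        # and tends to interact better with DDP's autograd hooks than the
        # default (reentrant) variant.
        return checkpoint(self.layer, x, use_reentrant=False)


def run(local_rank: int):
    # Assumes torch.distributed.init_process_group() has already been called.
    model = ToyBlock().to(local_rank)
    # static_graph=True tells DDP that the set of used parameters and the
    # autograd graph do not change across iterations, which lets it tolerate
    # the recomputation that checkpointing introduces.
    ddp_model = DDP(model, device_ids=[local_rank], static_graph=True)
    out = ddp_model(torch.randn(8, 32, device=local_rank))
    out.sum().backward()
```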
-
I have solved this problem now. The cause is that the model has parameters that were not used in producing the loss. Apply the following two settings and you will see the names of the unused parameters.
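The exact settings from this comment are not preserved in the thread. A common pair for surfacing unused-parameter names with Lightning and DDP is sketched below, assuming a recent Lightning release where `DDPStrategy` is importable from `lightning.pytorch.strategies`.

```python
import os

import lightning.pytorch as pl
from lightning.pytorch.strategies import DDPStrategy

# 1) Ask torch.distributed to report which parameters did not receive
#    gradients; their names show up in the warning/error output.
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"

# 2) Let DDP tolerate parameters that are unused in a given forward pass.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy=DDPStrategy(find_unused_parameters=True),
)
```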
-
Hi @kuixu, we can find the parameter names, but how do we go about fixing it? Do we need to remove those parameters, or something else? How does finding them resolve the issue?
-
Hi, I'm still having this issue. I'm working with Fabric and getting the error "Expected to mark a variable ready only once". What is the workaround for this?
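One possible workaround with Fabric is sketched below, assuming that Fabric's `DDPStrategy` forwards extra keyword arguments such as `static_graph` to `DistributedDataParallel`; the model and optimizer are placeholders. Switching the checkpoint calls in the model to `use_reentrant=False` is another option worth trying.

```python
import torch
from lightning.fabric import Fabric
from lightning.fabric.strategies import DDPStrategy

fabric = Fabric(
    accelerator="gpu",
    devices=2,
    # Extra kwargs on DDPStrategy are passed through to DistributedDataParallel,
    # so static_graph=True reaches the underlying DDP wrapper.
    strategy=DDPStrategy(static_graph=True),
)
fabric.launch()

# Placeholder model/optimizer; replace with the checkpointed model in question.
model = torch.nn.Linear(32, 32)
optimizer = torch.optim.Adam(model.parameters())
model, optimizer = fabric.setup(model, optimizer)
```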
-
Hey, any update on this? I'm really keen to get DDP and gradient checkpointing to work together. Otherwise, do we have to use FSDP?
-
Since my method is an autoregressive algorithm, it builds a huge gradient tape, so I am trying to do something like this.
It works fine on a single GPU, but with DDP it throws this error.
I am running it with
Any workaround for this?
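The original code snippet, error message, and launch command are not preserved above. As a rough illustration of the setup being described, here is a minimal, made-up example of checkpointing inside an autoregressive loop; under DDP, the repeated recomputation through the same parameters is what typically triggers "Expected to mark a variable ready only once" with the default (reentrant) checkpointing.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class AutoregressiveToy(nn.Module):
    # Illustrative module, not the model from this discussion.
    def __init__(self, dim: int = 64, n_steps: int = 100):
        super().__init__()
        self.cell = nn.Linear(dim, dim)
        self.n_steps = n_steps

    def forward(self, x):
        # Each step is checkpointed so only its inputs are stored, keeping the
        # "gradient tape" small; activations are recomputed during backward.
        for _ in range(self.n_steps):
            x = checkpoint(self.cell, x, use_reentrant=False)
        return x


model = AutoregressiveToy()
loss = model(torch.randn(8, 64)).sum()
loss.backward()
```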