Logging in Multi-GPU and new on_validation_step function #20362
Unanswered
42elenz asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 0 comments
Hello everyone!
I am overall a bit unsure about my Lightning functions since I am a beginner, and I would love some feedback on whether I did things right. I was initially working with an old version, and then the change to the new on_validation_step / on_training_step hooks broke my logging code.
I am also wondering why I have to specify "ddp_find_unused_parameters_true"; I don't see why there should be any unused parameters.
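To illustrate why I am confused: as far as I understand, DDP only needs find_unused_parameters=True when some parameters receive no gradient in a given step, for example in a hypothetical model like this (the class and all names here are made up for illustration, not my actual model):

```python
import torch
from torch import nn


class TwoHeadModel(nn.Module):
    """Hypothetical model: only one head is used per forward pass,
    so the other head's parameters get no gradient in that step."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.head_a = nn.Linear(8, 2)
        self.head_b = nn.Linear(8, 2)  # unused whenever use_a=True

    def forward(self, x, use_a=True):
        h = self.backbone(x)
        # Because only one head participates in the graph, plain DDP
        # would wait forever for the other head's gradients unless
        # find_unused_parameters=True is set; in Lightning that is
        # spelled strategy="ddp_find_unused_parameters_true".
        return self.head_a(h) if use_a else self.head_b(h)
```

So my question is whether something in my model actually follows such a pattern, or whether the flag is hiding a bug.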
So this is my Trainer:
```python
trainer = Trainer(
    max_epochs=args.training.epochs,
    logger=wandb_logger_pt,
    # log_every_n_steps=args.logging.log_every_n_steps,
    callbacks=[checkpoint_callback],
    accelerator="gpu",
    precision=16,
    strategy="ddp_find_unused_parameters_true",
)
```
Next, this is my model.
I actually first wanted to use validation_step_end to append to my self.validation_step_outputs, but those hooks were never called, so I had to do it in on_train_epoch_end and on_validation_epoch_end. Is this correct, or will it blow up at some point? Probably I just didn't have enough steps for it to be called.
My big problem here is that I want to save the found IDs to a DataFrame and write it to disk. How do I do this when training runs in multiple processes? Can I merge the results across processes? How can I, for example, run the save only on the main (rank-zero) process?
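What I have tried so far is based on self.all_gather and trainer.is_global_zero, which both exist in Lightning as far as I can tell. The helper name save_ids_rank_zero, the found_ids list, and the CSV path are just my own placeholders; I call this from on_validation_epoch_end:

```python
import pandas as pd
import torch


def save_ids_rank_zero(module, found_ids, path="found_ids.csv"):
    """Hypothetical helper, called from on_validation_epoch_end.

    Under DDP, module.all_gather stacks the per-rank tensors into a
    (world_size, n) tensor, so every process sees all IDs; only the
    global-zero process then writes the file, avoiding clashes.
    """
    ids = torch.as_tensor(found_ids, device=module.device)
    gathered = module.all_gather(ids)
    if module.trainer.is_global_zero:  # write on the main process only
        df = pd.DataFrame({"id": gathered.flatten().cpu().tolist()})
        df.to_csv(path, index=False)
```

Is this the recommended pattern, or is there a better built-in way?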
Is my logging overall correct? I have some parameters that are specific to my task, which is why I compute them at the end of each epoch.
For my downstream task I am using a mix of self-made metrics (balanced accuracy) and PL metrics.
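For reference, my self-made balanced accuracy is roughly the following pure-Python sketch (the mean of per-class recalls, so every class counts equally regardless of its size); if I understand correctly, torchmetrics' MulticlassAccuracy with average="macro" should give the same number:

```python
from collections import defaultdict


def balanced_accuracy(preds, targets):
    """Mean of per-class recalls: each class contributes equally,
    regardless of how many samples it has."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, t in zip(preds, targets):
        total[t] += 1
        if p == t:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)
```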
Does this look OK?
I hope you can give me some feedback :)
So, to summarize:

- Why do I need strategy="ddp_find_unused_parameters_true" at all?
- Is collecting outputs myself in on_train_epoch_end / on_validation_epoch_end the right replacement for the old step-end hooks?
- How do I merge and save my DataFrame of found IDs when training on multiple GPUs?