
Single Run DDP Example Script #9


Open: LeoRoccoBreedt wants to merge 86 commits into main



@LeoRoccoBreedt (Contributor) commented Apr 10, 2025

Description

  • Added a script that demonstrates how to create a Neptune Scale run object on rank zero only when using DDP, so that all processes log to a single run.
  • Tests for the DDP examples are commented out in the legacy Neptune examples. Since our current test setup has no GPU, it might be a good idea to do the same here in the meantime.
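The rank-zero pattern described above can be sketched as follows. This is a minimal illustration, not the script from this PR. It assumes the script is launched with `torchrun`, which sets the `RANK` environment variable for each worker process. `make_run` is a hypothetical zero-argument factory; in practice it might construct a `neptune_scale.Run`.

```python
import os
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def global_rank() -> int:
    """Return this process's global rank.

    torchrun sets RANK for every worker; default to 0 so the
    script also works as a plain single-process run.
    """
    return int(os.environ.get("RANK", "0"))


def create_run_on_rank_zero(make_run: Callable[[], T]) -> Optional[T]:
    """Create the logging run only on rank 0; other ranks get None.

    `make_run` is a hypothetical factory (e.g. one that builds a
    neptune_scale.Run). Because only one process creates a run,
    all metrics end up in a single run instead of one per GPU.
    """
    if global_rank() == 0:
        return make_run()
    # Non-zero ranks never call the factory, so they open no connection.
    return None
```

Every later logging call is then guarded with `if run is not None:`. After `torch.distributed.init_process_group()` has been called, `torch.distributed.get_rank()` is an equivalent way to obtain the rank.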

Related to: <ClickUp/JIRA task name>

Any expected test failures?

  • RuntimeError: use_libuv was requested but PyTorch was build without libuv support
  • torch does not appear to support Python 3.13 yet on some operating systems
  • Distributed training on Windows and macOS may be problematic
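For the first failure above, one possible workaround on PyTorch builds without libuv support (reported on Windows) is to disable the libuv-based TCPStore before the process group is initialized. This sketch assumes PyTorch honours the `USE_LIBUV` environment variable, as described in PyTorch's libuv TCPStore notes. `disable_libuv_on_windows` is a hypothetical helper name and is not part of this PR.

```python
import os
import sys


def disable_libuv_on_windows() -> bool:
    """Ask torch.distributed not to use libuv for its TCPStore on Windows.

    Must run before torch.distributed.init_process_group(). Uses
    setdefault so that an explicit USE_LIBUV chosen by the user is kept.
    Returns True if the environment now disables libuv.
    """
    if sys.platform == "win32":
        os.environ.setdefault("USE_LIBUV", "0")
    return os.environ.get("USE_LIBUV") == "0"
```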

Add a [X] to relevant checklist items

❔ This change

  • adds a new feature
  • fixes breaking code
  • is cosmetic (refactoring/reformatting)

✔️ Pre-merge checklist

  • Refactored code (Sourcery)
  • Tested code locally
  • pre-commit installed and run before pushing changes
  • Added code to GitHub tests (notebooks, scripts)
  • Updated GitHub README
  • Updated the projects overview page on Notion

🧪 Test Configuration

  • OS: Windows
  • Python version: 3.12
  • Neptune version: 0.11.3
  • Affected libraries with version: neptune-scale, torch

…calculate gradient norms for batch (step) rather than epoch
@LeoRoccoBreedt added the enhancement (New feature or request) label Apr 10, 2025
@LeoRoccoBreedt marked this pull request as ready for review April 10, 2025 13:57
@LeoRoccoBreedt requested a review from a team as a code owner April 10, 2025 13:57
@SiddhantSadangi requested a review from Copilot April 14, 2025 13:24

Copilot AI left a comment


Copilot reviewed 4 out of 6 changed files in this pull request and generated 1 comment.

Files not reviewed (2)
  • how-to-guides/ddp-training/scripts/requirements.txt: Language not supported
  • how-to-guides/ddp-training/scripts/run_examples.sh: Language not supported
Comments suppressed due to low confidence (2)

how-to-guides/ddp-training/scripts/train_ddp_single_run.py:18

  • The function name 'create_dataloader_minst' appears to have a typo. Consider renaming it to 'create_dataloader_mnist' for consistency with the MNIST dataset.
def create_dataloader_minst(

how-to-guides/ddp-training/scripts/train_ddp_single_run.py:244

  • The tag 'Torch-MINST' seems to be a typo. Consider updating it to 'Torch-MNIST' to correctly reference the MNIST dataset.
run.add_tags(tags=["Torch-MINST", "ddp", "single-node", params["optimizer"]])

Comment on lines +42 to +43
| DDP training scripts | [![docs]]() | | [![github]](how-to-guides/ddp-training/scripts/) | |


Copilot AI Apr 14, 2025


The docs link for 'DDP training scripts' is empty. Please provide a valid URL or remove the placeholder link to prevent confusion.

Suggested change
| DDP training scripts | [![docs]]() | | [![github]](how-to-guides/ddp-training/scripts/) | |
| DDP training scripts | | | [![github]](how-to-guides/ddp-training/scripts/) | |


Member


@LeoRoccoBreedt - can you make the suggested changes, including the hidden ones? They are valid.

Labels
enhancement New feature or request

2 participants