Create gradient_norm_tracking_in_pytorch.ipynb #16


Open: wants to merge 2 commits into main

Conversation


@kz29 kz29 commented May 21, 2025

This is the code for the new article: How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models

Description

Include a summary of the changes and the related issue.

Related to: <ClickUp/JIRA task name>

Any expected test failures?


Add a [X] to relevant checklist items

❔ This change

  • adds a new feature
  • fixes breaking code
  • is cosmetic (refactoring/reformatting)

✔️ Pre-merge checklist

  • Refactored code (sourcery)
  • Tested code locally
  • Precommit installed and run before pushing changes
  • Added code to GitHub tests (notebooks, scripts)
  • Updated GitHub README
  • Updated the projects overview page on Notion

🧪 Test Configuration

  • OS:
  • Python version:
  • Neptune version:
  • Affected libraries with version:

Summary by Sourcery

Add a new Jupyter notebook demonstrating how to monitor and log gradient norms in a PyTorch training pipeline for foundation models, using a BERT MRPC example and Neptune Scale integration.

New Features:

  • Introduce a PyTorch notebook for tracking per-parameter gradient norms and loss metrics during BERT MRPC fine-tuning with Neptune Scale
  • Include dataset loading, tokenization, data loader preparation, and model setup for end-to-end training and logging

@kz29 kz29 requested review from a team as code owners May 21, 2025 06:52
sourcery-ai bot (Contributor) commented May 21, 2025

Reviewer's Guide

This PR adds a new Jupyter notebook that demonstrates how to train a BERT sequence-classification model in PyTorch while tracking per-parameter gradient norms and training loss in Neptune Scale.

File-Level Changes

All changes are in gradient_norm_tracking_in_pytorch.ipynb:

1. Data loading and tokenization pipeline
  • Imported and loaded the GLUE MRPC dataset
  • Defined a tokenize_function and mapped it over the dataset
  • Set dataset format to PyTorch tensors
  • Created shuffled, batched DataLoaders for train and validation
2. Model and optimizer setup
  • Imported and instantiated BertForSequenceClassification
  • Moved model to CUDA
  • Configured AdamW optimizer with a 5e-5 learning rate
3. Neptune Scale experiment integration
  • Imported Run from neptune_scale
  • Initialized Run with project, experiment name, and unique run_id
  • Logged hyperparameters via run.log_configs
  • Added tags and closed the run at the end
4. Training loop with gradient-norm logging
  • Defined log_gradient_norms(model, step) to compute and log per-parameter grad norms
  • Implemented the epoch and batch loop with zero_grad, loss.backward, and optimizer.step
  • Called log_gradient_norms and run.log_metrics for loss at each step

sourcery-ai bot (Contributor) left a comment:


Hey @kz29 - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • Avoid hardcoding credentials in code
Here's what I looked at during the review
  • 🟡 General issues: 5 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

```python
    return tokenizer(examples['sentence1'], examples['sentence2'],
                     truncation=True, padding="longest", return_tensors="pt")

tokenized_datasets = dataset.map(tokenize_function, batched=True)
```

suggestion: Remove duplicate dataset.map call

The second map and set_format calls are redundant—removing them avoids confusion and unnecessary compute.

```python
for epoch in range(10):
    for step, batch in enumerate(train_dataloader):
        inputs = {k: v.to('cuda') for k, v in batch.items() if k in tokenizer.model_input_names}
        labels = batch['labels'].to('cuda')
```

issue (bug_risk): Use the correct label key from the collator output

DataCollatorWithPadding outputs 'label', not 'labels'. Use batch['label'] or adjust the collator to emit 'labels'.
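As a minimal sketch of one way to reconcile the key mismatch (the helper name normalize_label_key is hypothetical; the usual Hugging Face datasets alternative is dataset.rename_column("label", "labels") before building the DataLoader):

```python
def normalize_label_key(batch):
    # DataCollatorWithPadding emits the key 'label', while
    # BertForSequenceClassification expects 'labels' for loss computation,
    # so rename the key when it is present.
    if "label" in batch and "labels" not in batch:
        batch["labels"] = batch.pop("label")
    return batch

batch = normalize_label_key({"input_ids": [[101, 102]], "label": [1]})
```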

Comment on lines 56 to 61
```python
# Step 5. Define the Gradient Norm Logging Function
def log_gradient_norms(model, step):
    for name, param in model.named_parameters():
        if param.grad is not None:
            grad_norm = param.grad.norm().item()
            run.log_metrics({"gradients/" + name: grad_norm}, step=step)
```

suggestion (performance): Batch metric logging to reduce overhead

Accumulate all parameter norms in a dict and call run.log_metrics once per step to avoid per-parameter requests.

Suggested change:

```diff
 # Step 5. Define the Gradient Norm Logging Function
 def log_gradient_norms(model, step):
+    grad_norms = {}
     for name, param in model.named_parameters():
         if param.grad is not None:
-            grad_norm = param.grad.norm().item()
-            run.log_metrics({"gradients/" + name: grad_norm}, step=step)
+            grad_norms[f"gradients/{name}"] = param.grad.norm().item()
+    run.log_metrics(grad_norms, step=step)
```

Comment on lines 14 to 16
```python
def tokenize_function(examples):
    return tokenizer(examples['sentence1'], examples['sentence2'],
                     truncation=True, padding="longest", return_tensors="pt")
```

suggestion (performance): Drop return_tensors='pt' in the map function

Let DataCollatorWithPadding handle tensor conversion; omitting return_tensors='pt' reduces memory usage during mapping.

Suggested change:

```diff
 def tokenize_function(examples):
     return tokenizer(examples['sentence1'], examples['sentence2'],
-                     truncation=True, padding="longest", return_tensors="pt")
+                     truncation=True, padding="longest")
```


```python
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
# Move model to CUDA
model.to('cuda')
```

suggestion (bug_risk): Use a device variable and availability check

Define a device variable, e.g.:

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
```

to prevent errors when no GPU is present.

Suggested implementation:

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
model.to(device)
```

  1. Ensure `import torch` is present in one of the top cells. If it’s missing, add it.
  2. (Optional) You can remove the comment `# Move model to CUDA` if it’s now outdated.

@@ -0,0 +1,281 @@
Reviewer (Contributor) commented:

Do you mean to include a step 1? The current notebook starts at step 2. Importing dependencies is more of a step 0, something you'd have to do anyway, like config setup. You can renumber the steps from 1 starting at the load-and-preprocess section.

kz29 (Author) replied:

I updated the code! Thank you for pointing it out!

Comment on lines 13 to 14
"/home/klea.ziu/.conda/envs/OCM/lib/python3.9/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.1\n",
" warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n"
Reviewer (Contributor) commented:

Please run the pre-commit to ensure all outputs in notebooks are removed before committing.

  1. Install pre-commit
  2. Run `pre-commit run --all-files`
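As a sketch, a .pre-commit-config.yaml entry using the community nbstripout hook can clear notebook outputs automatically on every commit (the rev shown is a placeholder; pin whatever version the repo standardizes on):

```yaml
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.7.1  # placeholder; pin the version your repo uses
    hooks:
      - id: nbstripout  # strips outputs from .ipynb files before commit
```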

" api_token=\"YOUR_API_TOKEN\",# replace with your Neptune API token\n",
" project=\"your_workspace/your_project\", # replace with your workspace and project name\n",
" experiment_name=\"gradient_tracking\",\n",
" run_id=f\"gradient-{custom_id}\",\n",
Reviewer (Contributor) commented:

run_id is now optional - just install the latest version of neptune-scale. I'd personally only use the experiment_name.

Comment on lines 181 to 182
" api_token=\"YOUR_API_TOKEN\",# replace with your Neptune API token\n",
" project=\"your_workspace/your_project\", # replace with your workspace and project name\n",
Reviewer (Contributor) commented:

This is fine for the blog, but keep the formatting consistently in all capitals so readers know they must replace the values for api_token and project.

Comment on lines 211 to 215
"def log_gradient_norms(model, step):\n",
" for name, param in model.named_parameters():\n",
" if param.grad is not None:\n",
" grad_norm = param.grad.norm().item()\n",
" run.log_metrics({\"gradients/\" + name: grad_norm}, step=step)\n"
Reviewer (Contributor) commented:

I like this - it's how I'm doing it in some examples. I'd recommend, though, refactoring it following one of two suggestions:

  1. Pass the run object as a parameter to the log_gradient_norms() function for consistency, or
  2. Only extract the gradient norm values, then pass them to the log_metrics() method in the training loop as a dictionary together with the other metrics you're logging. This lets Neptune log all metrics at one time, which is cleaner and follows best practices.
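The second suggestion can be sketched framework-free; collect_gradient_norms and its (name, grad) input are hypothetical stand-ins for iterating model.named_parameters(), and in the notebook the resulting dict would be merged with the loss and passed to run.log_metrics once per step:

```python
import math

def collect_gradient_norms(named_grads):
    # Hypothetical stand-in: named_grads yields (name, gradient-vector-or-None)
    # pairs, mimicking (name, param.grad) from model.named_parameters().
    # Returns a flat dict of L2 norms, ready to merge with other metrics.
    return {
        f"gradients/{name}": math.sqrt(sum(g * g for g in grad))
        for name, grad in named_grads
        if grad is not None  # parameters without gradients are skipped
    }

norms = collect_gradient_norms([("encoder.weight", [3.0, 4.0]), ("frozen.bias", None)])
metrics = {"loss": 0.25, **norms}  # then a single run.log_metrics(metrics, step=step) per step
```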

```python
train_dataset = tokenized_datasets['train'].shuffle(seed=42).select(range(1000))  # Sample for demonstration
eval_dataset = tokenized_datasets['validation'].shuffle(seed=42).select(range(408))
```
Reviewer (Contributor) commented:

The evaluation dataset is not being used. Add it here if you're going to use it later, but consider removing it if not.

" run.log_metrics({\"loss\": loss.item()}, step=step + epoch * len(train_dataloader))\n",
"\n",
"# Add tags and close the run\n",
"run.add_tags([\"gradient_tracking\", \"pytorch\", \"transformers\"])\n",
Reviewer (Contributor) commented:

Consider adding this line to step 3 where you're initializing the run. Keep the training loop focused on the training and logging the metrics.

kz29 (Author) replied:

Thank you!

@@ -0,0 +1 @@

Reviewer (Contributor) commented:

Will you be using the README file? If not, you can remove it.
