Create gradient_norm_tracking_in_pytorch.ipynb #16


Open: wants to merge 2 commits into main

Conversation


@kz29 kz29 commented May 21, 2025

This is the code for the new article: How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models

Description

Include a summary of the changes and the related issue.

Related to: <ClickUp/JIRA task name>

Any expected test failures?


Add a [X] to relevant checklist items

❔ This change

  • adds a new feature
  • fixes breaking code
  • is cosmetic (refactoring/reformatting)

✔️ Pre-merge checklist

  • Refactored code (sourcery)
  • Tested code locally
  • Precommit installed and run before pushing changes
  • Added code to GitHub tests (notebooks, scripts)
  • Updated GitHub README
  • Updated the projects overview page on Notion

🧪 Test Configuration

  • OS:
  • Python version:
  • Neptune version:
  • Affected libraries with version:

Summary by Sourcery

Add a new Jupyter notebook demonstrating how to monitor and log gradient norms in a PyTorch training pipeline for foundation models, using a BERT MRPC example and Neptune Scale integration.

New Features:

  • Introduce a PyTorch notebook for tracking per-parameter gradient norms and loss metrics during BERT MRPC fine-tuning with Neptune Scale
  • Include dataset loading, tokenization, data loader preparation, and model setup for end-to-end training and logging

@kz29 kz29 requested review from a team as code owners May 21, 2025 06:52
sourcery-ai bot (Contributor) commented May 21, 2025

Reviewer's Guide

This PR adds a new Jupyter notebook that demonstrates how to train a BERT sequence-classification model in PyTorch while tracking per-parameter gradient norms and training loss in Neptune Scale.

File-Level Changes

All changes are in gradient_norm_tracking_in_pytorch.ipynb:

1. Data loading and tokenization pipeline
  • Imported and loaded the GLUE MRPC dataset
  • Defined a tokenize_function and mapped it over the dataset
  • Set dataset format to PyTorch tensors
  • Created shuffled, batched DataLoaders for train and validation
2. Model and optimizer setup
  • Imported and instantiated BertForSequenceClassification
  • Moved model to CUDA
  • Configured AdamW optimizer with a 5e-5 learning rate
3. Neptune Scale experiment integration
  • Imported Run from neptune_scale
  • Initialized Run with project, experiment name, and unique run_id
  • Logged hyperparameters via run.log_configs
  • Added tags and closed the run at the end
4. Training loop with gradient-norm logging
  • Defined log_gradient_norms(model, step) to compute and log per-parameter grad norms
  • Implemented the epoch and batch loop with zero_grad, loss.backward, and optimizer.step
  • Called log_gradient_norms and run.log_metrics for loss at each step

sourcery-ai bot (Contributor) left a comment:


Hey @kz29 - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • Avoid hardcoding credentials in code
Here's what I looked at during the review
  • 🟡 General issues: 5 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

```python
    return tokenizer(examples['sentence1'], examples['sentence2'],
                     truncation=True, padding="longest", return_tensors="pt")

tokenized_datasets = dataset.map(tokenize_function, batched=True)
```

suggestion: Remove duplicate dataset.map call

The second map and set_format calls are redundant—removing them avoids confusion and unnecessary compute.

```python
for epoch in range(10):
    for step, batch in enumerate(train_dataloader):
        inputs = {k: v.to('cuda') for k, v in batch.items() if k in tokenizer.model_input_names}
        labels = batch['labels'].to('cuda')
```

issue (bug_risk): Use the correct label key from the collator output

DataCollatorWithPadding outputs 'label', not 'labels'. Use batch['label'] or adjust the collator to emit 'labels'.
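As a minimal sketch of one way to reconcile the key mismatch (the helper name normalize_label_key is hypothetical; the usual Hugging Face datasets alternative is dataset.rename_column("label", "labels") before building the DataLoader):

```python
def normalize_label_key(batch):
    # DataCollatorWithPadding emits the key 'label', while
    # BertForSequenceClassification expects 'labels' for loss computation,
    # so rename the key when it is present.
    if "label" in batch and "labels" not in batch:
        batch["labels"] = batch.pop("label")
    return batch

batch = normalize_label_key({"input_ids": [[101, 102]], "label": [1]})
```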

Comment on lines 56 to 61
```python
# Step 5. Define the Gradient Norm Logging Function
def log_gradient_norms(model, step):
    for name, param in model.named_parameters():
        if param.grad is not None:
            grad_norm = param.grad.norm().item()
            run.log_metrics({"gradients/" + name: grad_norm}, step=step)
```

suggestion (performance): Batch metric logging to reduce overhead

Accumulate all parameter norms in a dict and call run.log_metrics once per step to avoid per-parameter requests.

Suggested change:

```diff
 # Step 5. Define the Gradient Norm Logging Function
 def log_gradient_norms(model, step):
+    grad_norms = {}
     for name, param in model.named_parameters():
         if param.grad is not None:
-            grad_norm = param.grad.norm().item()
-            run.log_metrics({"gradients/" + name: grad_norm}, step=step)
+            grad_norms[f"gradients/{name}"] = param.grad.norm().item()
+    run.log_metrics(grad_norms, step=step)
```

Comment on lines 14 to 16
```python
def tokenize_function(examples):
    return tokenizer(examples['sentence1'], examples['sentence2'],
                     truncation=True, padding="longest", return_tensors="pt")
```

suggestion (performance): Drop return_tensors='pt' in the map function

Let DataCollatorWithPadding handle tensor conversion; omitting return_tensors='pt' reduces memory usage during mapping.

Suggested change:

```diff
 def tokenize_function(examples):
     return tokenizer(examples['sentence1'], examples['sentence2'],
-                     truncation=True, padding="longest", return_tensors="pt")
+                     truncation=True, padding="longest")
```


```python
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
# Move model to CUDA
model.to('cuda')
```

suggestion (bug_risk): Use a device variable and availability check

Define a device variable, e.g.:

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
```

to prevent errors when no GPU is present.

Suggested implementation:

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
model.to(device)
```

  1. Ensure `import torch` is present in one of the top cells. If it’s missing, add it.
  2. (Optional) You can remove the comment `# Move model to CUDA` if it’s now outdated.

@@ -0,0 +1,281 @@
Reviewer (Contributor) commented:

Do you mean to include a step 1? The current notebook starts at step 2. Importing dependencies is more of a step 0, something you'd have to do anyway, like config setup. You can renumber the steps from 1 starting at the load-and-preprocess section.

kz29 (Author) replied:

I updated the code! Thank you for pointing it out!

Comment on lines 13 to 14
"/home/klea.ziu/.conda/envs/OCM/lib/python3.9/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.1\n",
" warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n"
Reviewer (Contributor) commented:

Please run the pre-commit to ensure all outputs in notebooks are removed before committing.

  1. Install pre-commit
  2. Run `pre-commit run --all-files`
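As a sketch, a .pre-commit-config.yaml entry using the community nbstripout hook can clear notebook outputs automatically on every commit (the rev shown is a placeholder; pin whatever version the repo standardizes on):

```yaml
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.7.1  # placeholder; pin the version your repo uses
    hooks:
      - id: nbstripout  # strips outputs from .ipynb files before commit
```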

" api_token=\"YOUR_API_TOKEN\",# replace with your Neptune API token\n",
" project=\"your_workspace/your_project\", # replace with your workspace and project name\n",
" experiment_name=\"gradient_tracking\",\n",
" run_id=f\"gradient-{custom_id}\",\n",
Reviewer (Contributor) commented:

run_id is now optional - just install the latest version of neptune-scale. I'd personally only use the experiment_name.

Comment on lines 181 to 182
" api_token=\"YOUR_API_TOKEN\",# replace with your Neptune API token\n",
" project=\"your_workspace/your_project\", # replace with your workspace and project name\n",
Reviewer (Contributor) commented:

This is fine for the blog, but keep the formatting consistently in all capitals so readers know they must replace the values for api_token and project.

Comment on lines 211 to 215
"def log_gradient_norms(model, step):\n",
" for name, param in model.named_parameters():\n",
" if param.grad is not None:\n",
" grad_norm = param.grad.norm().item()\n",
" run.log_metrics({\"gradients/\" + name: grad_norm}, step=step)\n"
Reviewer (Contributor) commented:

I like this - it's how I'm doing it in some examples. I'd recommend, though, refactoring it following one of two suggestions:

  1. Pass the run object as a parameter to the log_gradient_norms() function for consistency, or
  2. Only extract the gradient norm values, then pass them to the log_metrics() method in the training loop as a dictionary together with the other metrics you're logging. This lets Neptune log all metrics at one time, which is cleaner and follows best practices.
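The second suggestion can be sketched framework-free; collect_gradient_norms and its (name, grad) input are hypothetical stand-ins for iterating model.named_parameters(), and in the notebook the resulting dict would be merged with the loss and passed to run.log_metrics once per step:

```python
import math

def collect_gradient_norms(named_grads):
    # Hypothetical stand-in: named_grads yields (name, gradient-vector-or-None)
    # pairs, mimicking (name, param.grad) from model.named_parameters().
    # Returns a flat dict of L2 norms, ready to merge with other metrics.
    return {
        f"gradients/{name}": math.sqrt(sum(g * g for g in grad))
        for name, grad in named_grads
        if grad is not None  # parameters without gradients are skipped
    }

norms = collect_gradient_norms([("encoder.weight", [3.0, 4.0]), ("frozen.bias", None)])
metrics = {"loss": 0.25, **norms}  # then a single run.log_metrics(metrics, step=step) per step
```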

```python
train_dataset = tokenized_datasets['train'].shuffle(seed=42).select(range(1000))  # Sample for demonstration
eval_dataset = tokenized_datasets['validation'].shuffle(seed=42).select(range(408))
```
Reviewer (Contributor) commented:

The evaluation dataset is not being used. Add it here if you're going to use it later, but consider removing it if not.

" run.log_metrics({\"loss\": loss.item()}, step=step + epoch * len(train_dataloader))\n",
"\n",
"# Add tags and close the run\n",
"run.add_tags([\"gradient_tracking\", \"pytorch\", \"transformers\"])\n",
Reviewer (Contributor) commented:

Consider adding this line to step 3 where you're initializing the run. Keep the training loop focused on the training and logging the metrics.

kz29 (Author) replied:

Thank you!

@@ -0,0 +1 @@

Reviewer (Contributor) commented:

Will you be using the README file? If not, you can remove it.
