Create gradient_norm_tracking_in_pytorch.ipynb #16
Conversation
Reviewer's Guide
This PR adds a new Jupyter notebook that demonstrates how to train a BERT sequence-classification model in PyTorch while tracking per-parameter gradient norms and training loss in Neptune Scale.
Hey @kz29 - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- Avoid hardcoding credentials in code (link)
Here's what I looked at during the review
- 🟡 General issues: 5 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
```python
    return tokenizer(examples['sentence1'], examples['sentence2'],
                     truncation=True, padding="longest", return_tensors="pt")

tokenized_datasets = dataset.map(tokenize_function, batched=True)
```
suggestion: Remove duplicate dataset.map call
The second map and set_format calls are redundant—removing them avoids confusion and unnecessary compute.
```python
for epoch in range(10):
    for step, batch in enumerate(train_dataloader):
        inputs = {k: v.to('cuda') for k, v in batch.items() if k in tokenizer.model_input_names}
        labels = batch['labels'].to('cuda')
```
issue (bug_risk): Use the correct label key from the collator output
DataCollatorWithPadding outputs 'label', not 'labels'. Use batch['label'] or adjust the collator to emit 'labels'.
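A defensive variant sidesteps the mismatch regardless of which key the collator emits. This is an illustrative sketch with plain dicts standing in for tensor batches; the `get_labels` helper is hypothetical, not part of the notebook:

```python
# Hypothetical helper: accept either key the collator may emit.
def get_labels(batch):
    if "labels" in batch:
        return batch["labels"]
    if "label" in batch:
        return batch["label"]
    raise KeyError("batch contains neither 'label' nor 'labels'")

# Plain dicts stand in for collated tensor batches:
print(get_labels({"label": [1, 0]}))   # → [1, 0]
print(get_labels({"labels": [0, 1]}))  # → [0, 1]
```

In the training loop this would replace `batch['labels']` with `get_labels(batch)` before the `.to('cuda')` call.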
```python
# Step 5. Define the Gradient Norm Logging Function
def log_gradient_norms(model, step):
    for name, param in model.named_parameters():
        if param.grad is not None:
            grad_norm = param.grad.norm().item()
            run.log_metrics({"gradients/" + name: grad_norm}, step=step)
```
suggestion (performance): Batch metric logging to reduce overhead
Accumulate all parameter norms in a dict and call run.log_metrics once per step to avoid per-parameter requests.
Current:
```python
# Step 5. Define the Gradient Norm Logging Function
def log_gradient_norms(model, step):
    for name, param in model.named_parameters():
        if param.grad is not None:
            grad_norm = param.grad.norm().item()
            run.log_metrics({"gradients/" + name: grad_norm}, step=step)
```
Suggested:
```python
# Step 5. Define the Gradient Norm Logging Function
def log_gradient_norms(model, step):
    grad_norms = {}
    for name, param in model.named_parameters():
        if param.grad is not None:
            grad_norms[f"gradients/{name}"] = param.grad.norm().item()
    run.log_metrics(grad_norms, step=step)
```
...gnose_and_Solve_Gradient_Issues_in_Foundation_Models/gradient_norm_tracking_in_pytorch.ipynb (outdated; resolved)
```python
def tokenize_function(examples):
    return tokenizer(examples['sentence1'], examples['sentence2'],
                     truncation=True, padding="longest", return_tensors="pt")
```
suggestion (performance): Drop `return_tensors="pt"` in the map function. Let DataCollatorWithPadding handle tensor conversion; omitting `return_tensors="pt"` reduces memory usage during mapping.
Current:
```python
def tokenize_function(examples):
    return tokenizer(examples['sentence1'], examples['sentence2'],
                     truncation=True, padding="longest", return_tensors="pt")
```
Suggested:
```python
def tokenize_function(examples):
    return tokenizer(examples['sentence1'], examples['sentence2'],
                     truncation=True, padding="longest")
```
```python
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
# Move model to CUDA
model.to('cuda')
```
suggestion (bug_risk): Use a device variable and availability check. Define a device variable, e.g.:
```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
```
to prevent errors when no GPU is present.
Suggested implementation:
```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
model.to(device)
```
- Ensure `import torch` is present in one of the top cells. If it's missing, add it.
- (Optional) You can remove the comment `# Move model to CUDA` if it's now outdated.
Did you mean to include a step 1? The notebook currently starts at step 2, although importing dependencies is more like a step 0, something you'd have to do anyway, like a config setup. Consider renumbering the steps from 1 starting at the load-and-preprocess section.
I updated the code! Thank you for pointing it out!
```
/home/klea.ziu/.conda/envs/OCM/lib/python3.9/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.1
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
```
Please run pre-commit to ensure all outputs in notebooks are removed before committing:
- Install pre-commit
- Run `pre-commit run --all-files`
```python
    api_token="YOUR_API_TOKEN",  # replace with your Neptune API token
    project="your_workspace/your_project",  # replace with your workspace and project name
    experiment_name="gradient_tracking",
    run_id=f"gradient-{custom_id}",
```
`run_id` is now optional - just install the latest version of neptune-scale. I'd personally only use the `experiment_name`.
```python
    api_token="YOUR_API_TOKEN",  # replace with your Neptune API token
    project="your_workspace/your_project",  # replace with your workspace and project name
```
This is fine for the blog, but keep the formatting consistent in all capitals to let readers know they must replace these values for `api_token` and `project`.
```python
def log_gradient_norms(model, step):
    for name, param in model.named_parameters():
        if param.grad is not None:
            grad_norm = param.grad.norm().item()
            run.log_metrics({"gradients/" + name: grad_norm}, step=step)
```
I like this - it is how I'm doing it for some examples. I'd recommend, though, refactoring it to one of two suggestions:
1. Pass the `run` object as a parameter to the `log_gradient_norms()` function for consistency, or
2. Only extract the gradient norm values and then pass them to the `log_metrics()` method in the training loop as a dictionary with the other metrics you're logging. This allows Neptune to log all metrics at one time, making it cleaner and following best practices.
], | ||
"source": [ | ||
"train_dataset = tokenized_datasets['train'].shuffle(seed=42).select(range(1000)) # Sample for demonstration\n", | ||
"eval_dataset = tokenized_datasets['validation'].shuffle(seed=42).select(range(408))\n", |
The evaluation dataset is not being used. Add it here if you're going to use it later, but consider removing it if not.
```python
        run.log_metrics({"loss": loss.item()}, step=step + epoch * len(train_dataloader))

# Add tags and close the run
run.add_tags(["gradient_tracking", "pytorch", "transformers"])
```
Consider adding this line to step 3 where you're initializing the run. Keep the training loop focused on the training and logging the metrics.
Thank you!
Will you be using the readme file? If not, you can remove it.
This is the code for the new article: How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models
Description
Include a summary of the changes and the related issue.
Related to: <ClickUp/JIRA task name>
Any expected test failures?
Add a `[X]` to relevant checklist items.
❔ This change
✔️ Pre-merge checklist
🧪 Test Configuration
Summary by Sourcery
Add a new Jupyter notebook demonstrating how to monitor and log gradient norms in a PyTorch training pipeline for foundation models, using a BERT MRPC example and Neptune Scale integration.
New Features: