This repository was archived by the owner on Nov 22, 2022. It is now read-only.

Commit 876b47e

Kartikay Khandelwal authored and facebook-github-bot committed
Division by Zero bug in MLM Metric Reporter
Summary: While using fp16 during MLM training, a bug in metric reporting causes a division-by-zero failure when computing the training speed (an example is below). This change prevents the failure by adding a small value to the denominator.

Example: f136667996

Reviewed By: chenyangyu1988

Differential Revision: D17293910

fbshipit-source-id: 880213cbbe9f88d63126fc487e44a705c1ed459a
1 parent 9eb8e43 commit 876b47e

1 file changed, +1 -1 lines changed

pytext/metric_reporters/language_model_metric_reporter.py

Lines changed: 1 addition & 1 deletion
@@ -190,7 +190,7 @@ def add_batch_stats(
         if not n_batches % 1000:
             tps = self.realtime_meters["tps"].avg
             print(
-                f"Tokens/s: {total_tokens / (now - self.time):.0f}, "
+                f"Tokens/s: {total_tokens / ((now - self.time) + 0.000001):.0f}, "
                 f"batch ppl: {math.exp(loss.item()):.2f}, "
                 f"agg ppl: {math.exp(self.aggregate_loss / float(self.total_num_tokens)):.2f}, "
                 f"number of batches: {n_batches}, "
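
The changed line pads the elapsed time with a small constant so the tokens-per-second division cannot hit an exact zero. A minimal standalone sketch of that idea follows. The helper name `tokens_per_second` and its parameters are hypothetical and do not appear in PyText; only the `0.000001` padding is taken from the diff.

```python
# Hypothetical helper illustrating the epsilon guard from the diff above.
# The function name and signature are assumptions, not part of PyText.

EPSILON = 0.000001  # same constant the commit adds to the denominator


def tokens_per_second(total_tokens: int, start: float, now: float,
                      eps: float = EPSILON) -> float:
    """Return throughput, padding the elapsed time so it is never exactly zero."""
    elapsed = now - start
    # Without `eps`, elapsed == 0.0 would raise ZeroDivisionError.
    return total_tokens / (elapsed + eps)


if __name__ == "__main__":
    # Identical timestamps: the unguarded division would fail here.
    print(f"Tokens/s: {tokens_per_second(1000, 5.0, 5.0):.0f}")
    # Normal case: the padding changes the result only negligibly.
    print(f"Tokens/s: {tokens_per_second(1000, 0.0, 2.0):.0f}")
```

When the two timestamps are equal, the reported rate is a very large number rather than a meaningful speed, so this kind of guard avoids the crash but does not make the first reading accurate.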
