
Commit b362c31

ArmenAg authored and facebook-github-bot committed
Fix bug in copy_unk (#964)
Summary: Pull Request resolved: #964

When the copy_unk flag is set to true, any unk token produced in the output of the Seq2Seq model is replaced by the utterance token that was mapped to unk. This is an easy way to get gains, since outputs containing unk are always wrong.

Looking at the old code for copying the unk token, we see that TorchScript optimizes out the actual search for the unk token in the utterance: {F207887831}

This diff updates the code to produce the correct TorchScript graph: {F207888470}

Reviewed By: arbabu123

Differential Revision: D17213086

fbshipit-source-id: ebbfc52dcd703939316b15250110271336ef131d
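For illustration, here is a minimal runnable sketch of the copy_unk idea (not the PyText implementation; the SimpleVocab class and the example tokens are hypothetical): when the model emits the unk index, the utterance token that was mapped to unk is returned in its place.

    from typing import List, Optional

    # Hypothetical sketch of the copy_unk behavior; not the PyText API.
    class SimpleVocab:
        def __init__(self, words: List[str], unk_idx: int = 0):
            self.vocab = words
            self.unk_idx = unk_idx

        def lookup_word(self, idx: int, possible_unk_token: Optional[str] = None) -> str:
            # Mirrors the fixed predicate in the diff below: an in-range idx
            # that equals unk_idx must still fall through to the unk branch.
            if idx < len(self.vocab) and idx != self.unk_idx:
                return self.vocab[idx]
            else:
                return (
                    possible_unk_token
                    if possible_unk_token is not None
                    else self.vocab[self.unk_idx]
                )

    vocab = SimpleVocab(["<unk>", "play", "music"])
    # The model emitted the unk index (0); "radiohead" is the utterance
    # token that was mapped to unk, so it is copied into the output.
    print(vocab.lookup_word(0, possible_unk_token="radiohead"))  # radiohead
    print(vocab.lookup_word(2))                                  # music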
1 parent 9fc6aa4 commit b362c31

File tree

1 file changed: +1 −1 lines changed


pytext/utils/torch.py

Lines changed: 1 addition & 1 deletion
@@ -123,7 +123,7 @@ def lookup_words_1d(
 
     @torch.jit.script_method
     def lookup_word(self, idx: int, possible_unk_token: Optional[str] = None):
-        if idx < len(self.vocab):
+        if idx < len(self.vocab) and idx != self.unk_idx:
             return self.vocab[idx]
         else:
             return (
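To see why the one-line change matters, compare the two guard predicates evaluated on the unk index itself (a sketch; unk_idx = 0 is an assumed value, where the real code reads it from self.unk_idx):

    # Sketch: why the old guard made the unk-copy branch unreachable.
    vocab = ["<unk>", "play", "music"]
    unk_idx = 0

    idx = unk_idx  # the model emitted unk

    # Old predicate: unk_idx is always a valid index into vocab, so this
    # is True even when idx == unk_idx; the copy branch is dead code,
    # which is why TorchScript optimized the utterance search out of
    # the graph.
    old_guard = idx < len(vocab)                     # True

    # New predicate: explicitly excludes unk_idx, keeping the
    # possible_unk_token branch reachable in the scripted graph.
    new_guard = idx < len(vocab) and idx != unk_idx  # False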
