Commit 1a36efb

Reset the start_char and end_char on single Word objects if the Token object has start_char and end_char.
This accommodates MWT Tokens that were detected by the tokenizer but not expanded by the MWT model, which can happen with typos such as it"s (#1436).
1 parent 081d1dc commit 1a36efb
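The effect of dropping the `len(token.words) > 1` check can be illustrated with a minimal sketch. The `Token` and `Word` dataclasses below are hypothetical stand-ins for stanza's classes that model only the fields the condition reads. `old_guard` and `new_guard` are illustrative names for the condition before and after this commit, not stanza functions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for stanza's Word and Token; only the fields the
# condition in this commit reads are modeled.
@dataclass
class Word:
    text: str


@dataclass
class Token:
    text: str
    start_char: Optional[int]
    end_char: Optional[int]
    words: List[Word] = field(default_factory=list)


def old_guard(token: Token) -> bool:
    # Before the commit: only tokens with more than one Word were realigned.
    return (len(token.words) > 1
            and token.start_char is not None
            and token.end_char is not None
            and "".join(w.text for w in token.words) == token.text)


def new_guard(token: Token) -> bool:
    # After the commit: the word-count check is gone, so a token holding a
    # single Word also qualifies as long as its offsets are known.
    return (token.start_char is not None
            and token.end_char is not None
            and "".join(w.text for w in token.words) == token.text)


# A token the tokenizer flagged as MWT but the MWT model left unexpanded,
# e.g. the typo it"s, ends up with exactly one Word.
typo = Token(text='it"s', start_char=10, end_char=14, words=[Word('it"s')])
print(old_guard(typo), new_guard(typo))  # False True
```

Tokens that were actually expanded (for example "don't" split into "do" and "n't") pass both guards, so the change only widens the set of tokens whose words get offsets.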


1 file changed: +1 -1 lines changed

stanza/models/common/doc.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -396,7 +396,7 @@ def set_mwt_expansions(self, expansions,
         word.sent = sentence
         word.parent = token
         sentence.words.append(word)
-    if len(token.words) > 1 and token.start_char is not None and token.end_char is not None and "".join(word.text for word in token.words) == token.text:
+    if token.start_char is not None and token.end_char is not None and "".join(word.text for word in token.words) == token.text:
         start_char = token.start_char
         for word in token.words:
             end_char = start_char + len(word.text)
```
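A sketch of how the character offsets might be distributed across a token's words once the condition passes. The diff view stops at `end_char = start_char + len(word.text)`, so writing the offsets back onto each Word and advancing `start_char` are assumptions here. `reset_word_offsets` and the dataclasses are hypothetical and are not stanza's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for stanza's Word and Token.
@dataclass
class Word:
    text: str
    start_char: Optional[int] = None
    end_char: Optional[int] = None


@dataclass
class Token:
    text: str
    start_char: Optional[int]
    end_char: Optional[int]
    words: List[Word] = field(default_factory=list)


def reset_word_offsets(token: Token) -> None:
    """Assign contiguous character offsets to each Word inside its Token."""
    # Same condition as the "+" line of the diff (no word-count check).
    if (token.start_char is not None and token.end_char is not None
            and "".join(w.text for w in token.words) == token.text):
        start_char = token.start_char
        for word in token.words:
            end_char = start_char + len(word.text)
            word.start_char = start_char  # assumed: not visible in the diff
            word.end_char = end_char      # assumed: not visible in the diff
            start_char = end_char         # assumed: words are contiguous


expanded = Token("don't", 4, 9, [Word("do"), Word("n't")])
reset_word_offsets(expanded)
print([(w.text, w.start_char, w.end_char) for w in expanded.words])
# [('do', 4, 6), ("n't", 6, 9)]
```

Because the concatenated word texts must equal the token text, the last word's `end_char` lands on the token's own `end_char`. An unexpanded typo token such as it"s simply inherits the token's full span.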
