Unknown words can still result in punct tag at end of sentence #1000

Open
AngledLuffa opened this issue Apr 8, 2022 · 2 comments

@AngledLuffa
Collaborator

For example, if a sentence ends with thicc and has no sentence-final punctuation, thicc is labeled PUNCT.
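A minimal sketch of how this symptom could be detected automatically, assuming the tagger output has already been converted to a list of (text, UPOS) pairs per sentence (with Stanza, these could be read from each word's `text` and `upos` attributes). The helper names `is_punctuation` and `spurious_final_punct` and the sample tag sequences are hypothetical, written for illustration rather than taken from real tagger output:

```python
import unicodedata


def is_punctuation(text):
    """Return True if every character of text is in a Unicode punctuation category."""
    return bool(text) and all(
        unicodedata.category(ch).startswith("P") for ch in text
    )


def spurious_final_punct(tagged_sentence):
    """Return the final (text, upos) pair if it is tagged PUNCT but is not
    actually punctuation; otherwise return None."""
    if not tagged_sentence:
        return None
    text, upos = tagged_sentence[-1]
    if upos == "PUNCT" and not is_punctuation(text):
        return (text, upos)
    return None


# Hypothetical tagger output illustrating the bug described in this issue.
buggy = [
    ("Jennifer", "PROPN"), ("'s", "PART"), ("antennae", "NOUN"),
    ("are", "AUX"), ("hella", "ADV"), ("thicc", "PUNCT"),
]
# Hypothetical output with the final word tagged as an adjective.
fixed = buggy[:-1] + [("thicc", "ADJ")]

print(spurious_final_punct(buggy))
print(spurious_final_punct(fixed))
```

Running a check like this over sentences that lack final punctuation would give a rough count of how often the problem occurs for a given model.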

@AngledLuffa added the bug label Apr 8, 2022
@AngledLuffa
Collaborator Author

I don't know if this is fixed universally, but the updated tagger does a better job of labeling thicc as an adjective in a sentence such as "Jennifer's antennae are hella thicc". Sadly, it still labels hella as INTJ in some contexts, such as "Dat ass hella thicc", even though it is clearly an ADV there. Perhaps we need to add more uses of hella. For that matter, Dat is also mistagged as INTJ instead of DET.

@AngledLuffa
Collaborator Author

The sentence end punctuation is a lot better as a result of this PR:

#1303

It is much less of an issue on EN and PR now. I will retrain the other models when the new UD release comes out, unless there are other specific languages which need fixes.
