ChineseHeadFinder: dictionary key 'INTJ' repeated with different values #1370

Closed
tanloong opened this issue Jul 6, 2023 · 3 comments
Comments

@tanloong
Contributor

tanloong commented Jul 6, 2023

In ChineseHeadFinder.java, the key "INTJ" is duplicated with different values, at line 57 and line 101.

Is this duplication a bug or intended behavior? Sorry for the inconvenience if it is intended.

@AngledLuffa
Contributor

Clearly a bug, as it is clobbering the old entry, which was

    nonTerminalInfo.put("INTJ", new String[][]{{right, "INTJ", "IJ", "SP"}});

The new entry makes it left headed (except for punct). Do you have any insight into which is better?
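To illustrate the clobbering, here is a minimal sketch of how a second `put` with the same key silently replaces the first value in a `java.util.Map`. The `LEFT` and `RIGHT` constants and the class name are hypothetical stand-ins, and the second rule is a placeholder: only its direction reflects what is described above, not the actual contents of the later entry in ChineseHeadFinder.java.

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateKeyDemo {
    // Hypothetical stand-ins for the direction constants used by the head finder.
    static final String LEFT = "left";
    static final String RIGHT = "right";

    /** Returns the direction that survives after two puts with the same key. */
    static String survivingDirection() {
        Map<String, String[][]> nonTerminalInfo = new HashMap<>();

        // First entry, as quoted in the thread.
        nonTerminalInfo.put("INTJ", new String[][]{{RIGHT, "INTJ", "IJ", "SP"}});

        // Placeholder for the later entry; only the direction is taken from the thread.
        String[][] previous = nonTerminalInfo.put("INTJ", new String[][]{{LEFT}});

        // Map.put returns the value it replaced, so a non-null result
        // means an earlier entry was overwritten.
        if (previous == null) {
            throw new IllegalStateException("expected an earlier INTJ entry");
        }
        return nonTerminalInfo.get("INTJ")[0][0];
    }

    public static void main(String[] args) {
        System.out.println(survivingDirection());
    }
}
```

Checking the return value of `put` (or using `putIfAbsent`) is one way such duplicates could be caught while the rule table is built.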

In CTB 5.1, all INTJ nodes are for single words, such as

(INTJ (IJ 唉呀))

except for this, which would appear to be a mistake based on the bracketing of the punctuation:

(IP 
  (INTJ (PU 「) (IJ 嘿咻))
  (PU !) 
  (PU 」)
  ...

I don't have CTB 9 lying around, but I will ask the people in charge of such things to put it on our cluster.
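As a rough way to compare the two rules on the CTB 5.1 shapes above, the sketch below uses a deliberately simplified model of head selection. It is not the library's implementation: `oldRuleHead` assumes the right-headed rule picks the rightmost child labelled INTJ, IJ, or SP, and `newRuleHead` assumes the left-headed rule picks the leftmost child that is not PU. Both method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HeadDirectionDemo {
    // Categories from the original right-headed INTJ rule quoted in the thread.
    static final Set<String> OLD_CATEGORIES =
            new HashSet<>(Arrays.asList("INTJ", "IJ", "SP"));

    /** Simplified right-headed rule: rightmost child whose label is listed. */
    static int oldRuleHead(List<String> childLabels) {
        for (int i = childLabels.size() - 1; i >= 0; i--) {
            if (OLD_CATEGORIES.contains(childLabels.get(i))) {
                return i;
            }
        }
        return -1; // no listed category found
    }

    /** Simplified left-headed rule: leftmost child that is not punctuation. */
    static int newRuleHead(List<String> childLabels) {
        for (int i = 0; i < childLabels.size(); i++) {
            if (!"PU".equals(childLabels.get(i))) {
                return i;
            }
        }
        return -1; // only punctuation children
    }

    public static void main(String[] args) {
        List<String> singleWord = Arrays.asList("IJ");        // (INTJ (IJ ...))
        List<String> withQuote = Arrays.asList("PU", "IJ");   // (INTJ (PU ...) (IJ ...))
        System.out.println(oldRuleHead(singleWord) + " " + newRuleHead(singleWord));
        System.out.println(oldRuleHead(withQuote) + " " + newRuleHead(withQuote));
    }
}
```

Under these assumptions both rules select the IJ child for the two node shapes shown, so any difference between them would only surface on INTJ nodes with more than one non-punctuation child.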

@tanloong
Contributor Author

tanloong commented Jul 6, 2023

Thanks for the quick response!

I must admit that I have no prior knowledge of CTB, and I don't have CTB 9 either, so I am unable to determine which value is better😔.

@AngledLuffa
Contributor

AngledLuffa commented Jul 7, 2023 via email
