|
| 1 | +# UnicodeDB |
| 2 | + |
| 3 | +This library brings the Unicode database to Nim. The main goals are |
| 4 | +O(1) access for every API and a small footprint. |
| 5 | + |
| 6 | +## Usage |
| 7 | + |
| 8 | +Properties: |
| 9 | +```nim |
| 10 | +import unicode |
| 11 | +import unicodedb |
| 12 | +
|
| 13 | +echo category("A".runeAt(0)) # 'L'etter, 'u'ppercase |
| 14 | +# "Lu" |
| 15 | +
|
| 16 | +echo bidirectional(Rune(0x0660)) # 'A'rabic, 'N'umber |
| 17 | +# "AN" |
| 18 | +
|
| 19 | +echo combining(Rune(0x860)) |
| 20 | +# 0 |
| 21 | +
|
| 22 | +echo((quickCheck(Rune(0x0374)) and NfMasks.NfcQcNo.ord) != 0) |
| 23 | +# true |
| 24 | +``` |
| 25 | +[docs](https://nitely.github.io/nim-unicodedb/properties.html) |
| 26 | + |
| 27 | +Names: |
| 28 | +```nim |
| 29 | +import unicode |
| 30 | +import unicodedb |
| 31 | +
|
| 32 | +echo lookupStrict("LEFT CURLY BRACKET") # '{' |
| 33 | +# Rune(0x007B) |
| 34 | +
|
| 35 | +echo name("/".runeAt(0)) |
| 36 | +# "SOLIDUS" |
| 37 | +``` |
| 38 | +[docs](https://nitely.github.io/nim-unicodedb/names.html) |
| 39 | + |
| 40 | +Compositions: |
| 41 | +```nim |
| 42 | +import unicode |
| 43 | +import unicodedb |
| 44 | +
|
| 45 | +echo composition(Rune(108), Rune(803)) |
| 46 | +# Rune(7735) |
| 47 | +``` |
| 48 | +[docs](https://nitely.github.io/nim-unicodedb/compositions.html) |
| 49 | + |
| 50 | +Decompositions: |
| 51 | +```nim |
| 52 | +import unicode |
| 53 | +import unicodedb |
| 54 | +
|
| 55 | +echo decomposition(Rune(0x0F9D)) |
| 56 | +# @[Rune(0x0F9C), Rune(0x0FB7)] |
| 57 | +``` |
| 58 | +[docs](https://nitely.github.io/nim-unicodedb/decompositions.html) |
| 59 | + |
| 60 | +## Related libraries |
| 61 | + |
| 62 | +* [nim-graphemes](https://github.com/nitely/nim-graphemes) |
| 63 | + |
| 64 | +## Storage |
| 65 | + |
| 66 | +Storage is based on *multi-stage tables* and |
| 67 | +*minimal perfect hashing* data structures. |
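
To illustrate the multi-stage table idea, here is a minimal two-stage sketch. It is not the library's actual layout: the block size, the tiny 16-entry "code space", and the names `buildStages` and `stagedLookup` are all assumptions made up for this example. The point is that identical blocks are stored once, and a lookup stays O(1) (two array indexings).

```nim
# Two-stage table sketch. All names and data here are hypothetical;
# the real tables cover the full Unicode range with larger blocks.
const
  blockSize = 4   # code points per block (assumed; real tables use bigger blocks)
  maxCp = 15      # tiny illustrative code space: 0..15

# A sparse property array with repeated blocks.
const rawProps: array[maxCp + 1, uint8] = [
  0'u8, 0, 0, 0,
  1, 1, 2, 2,
  0, 0, 0, 0,
  1, 1, 2, 2]

proc buildStages(props: openArray[uint8]): tuple[stage1: seq[int], stage2: seq[uint8]] =
  ## Splits `props` into fixed-size blocks and stores each distinct block once.
  ## `stage1` maps a block number to the index of its block inside `stage2`.
  var blocks: seq[seq[uint8]] = @[]
  var i = 0
  while i < props.len:
    var blk = newSeq[uint8](blockSize)
    for j in 0 ..< blockSize:
      blk[j] = props[i + j]
    var idx = blocks.find(blk)
    if idx == -1:
      # First time this block is seen: append it to stage2.
      idx = blocks.len
      blocks.add(blk)
      result.stage2.add(blk)
    result.stage1.add(idx)
    inc(i, blockSize)

proc stagedLookup(stage1: seq[int], stage2: seq[uint8], cp: int): uint8 =
  ## Constant-time lookup: one index into stage1, one into stage2.
  let blockIdx = stage1[cp div blockSize]
  stage2[blockIdx * blockSize + cp mod blockSize]

let (stage1, stage2) = buildStages(rawProps)
echo stagedLookup(stage1, stage2, 6)
```

In this toy data the 16 raw entries collapse into 8 stored values plus 4 block indexes. The saving grows with real Unicode data, where long runs of code points share the same properties.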
| 68 | + |
| 69 | +## Sizes |
| 70 | + |
| 71 | +These are the current collection sizes: |
| 72 | + |
| 73 | +* properties is 45KB. Used by `properties(1)`, `category(1)`, |
| 74 | + `bidirectional(1)`, `combining(1)` and `quickCheck(1)` |
| 75 | +* compositions is 24KB. Used by `composition(1)` |
| 76 | +* decompositions is 149KB. Used by `decomposition(1)` |
| 77 | + and `canonicalDecomposition` |
| 78 | +* names is 795KB. Used by `name(1)` and `lookup(1)` |
| 79 | +* names (lookup) is 301KB. Used by `lookup(1)` |
| 80 | + |
| 81 | +## Missing APIs |
| 82 | + |
| 83 | +New APIs will be added from time to time. If you need |
| 84 | +something that's missing, please open an issue or PR |
| 85 | +and describe the use case. |
| 86 | + |
| 87 | +## Tests |
| 88 | + |
| 89 | +Initial tests were run against a dump of Python's |
| 90 | +`unicodedata` module to ensure correctness. |
| 91 | +The related libraries also have their own custom tests |
| 92 | +(some of the test data is provided by the Unicode Consortium). |
| 93 | + |
| 94 | +## Contributing |
| 95 | + |
| 96 | +I plan to work on most of the missing *related |
| 97 | +libraries* (case folding, etc.). If you would |
| 98 | +like to work on one of those, please let me |
| 99 | +know and I'll add it to the list. If you find |
| 100 | +that required database data is missing, either open an |
| 101 | +issue or a PR. |
| 102 | + |
| 103 | +## LICENSE |
| 104 | + |
| 105 | +MIT |