Commit 625b0d1

committed
initial
0 parents  commit 625b0d1

39 files changed, +534578 -0 lines changed

.gitignore

+6
@@ -0,0 +1,6 @@
nimcache/
tests/tests
gen/compositions
gen/decompositions
gen/names
gen/properties

CHANGELOG.md

+4
@@ -0,0 +1,4 @@
v0.1.0
==================

* Initial release

LICENSE

+21
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2017 Esteban Castro Borsani

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

+105
@@ -0,0 +1,105 @@
# UnicodeDB

This library aims to bring the Unicode Character Database to Nim. The main goals are
O(1) access for every API and a small footprint.

## Usage

Properties:
```nim
import unicode
import unicodedb

echo category("A".runeAt(0)) # 'L'etter, 'u'ppercase
# "Lu"

echo bidirectional(Rune(0x0660)) # 'A'rabic, 'N'umber
# "AN"

echo combining(Rune(0x860))
# 0

echo((quickCheck(Rune(0x0374)) and NfMasks.NfcQcNo.ord) != 0)
# true
```
[docs](https://nitely.github.io/nim-unicodedb/properties.html)

Names:
```nim
import unicode
import unicodedb

echo lookupStrict("LEFT CURLY BRACKET") # '{'
# Rune(0x007B)

echo name("/".runeAt(0))
# "SOLIDUS"
```
[docs](https://nitely.github.io/nim-unicodedb/names.html)

Compositions:
```nim
import unicode
import unicodedb

echo composition(Rune(108), Rune(803))
# Rune(7735)
```
[docs](https://nitely.github.io/nim-unicodedb/compositions.html)

Decompositions:
```nim
import unicode
import unicodedb

echo decomposition(Rune(0x0F9D))
# @[Rune(0x0F9C), Rune(0x0FB7)]
```
[docs](https://nitely.github.io/nim-unicodedb/decompositions.html)

## Related libraries

* [nim-graphemes](https://github.com/nitely/nim-graphemes)

## Storage

Storage is based on *multi-stage tables* and
*minimal perfect hashing* data structures.
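
To illustrate the first of these techniques, here is a minimal two-stage table sketch. It is not the library's generated code: the names `TwoStage`, `build` and `getValue`, the block size of 128, and the sample values are all assumptions made for this example. The idea is that most blocks of code points share identical values, so each distinct block is stored once and a small index maps block numbers to it.

```nim
# Two-stage table sketch (illustrative only; names and sizes are hypothetical).
const
  blockSize = 128      # assumed block size
  maxCP = 0x10FFFF     # highest Unicode code point

type
  TwoStage = object
    stage1: seq[int]         # block number -> index into stage2
    stage2: seq[seq[uint8]]  # deduplicated blocks of per-code-point values

proc build(values: openArray[uint8]): TwoStage =
  ## Split `values` into fixed-size blocks and store each distinct block once.
  result.stage1 = @[]
  result.stage2 = @[]
  var i = 0
  while i < values.len:
    var blk = newSeq[uint8](blockSize)
    for j in 0 ..< blockSize:
      if i + j < values.len:
        blk[j] = values[i + j]
    var idx = result.stage2.find(blk)  # linear search is fine for a sketch
    if idx < 0:
      result.stage2.add(blk)
      idx = result.stage2.high
    result.stage1.add(idx)
    inc(i, blockSize)

proc getValue(t: TwoStage, cp: int): uint8 =
  ## Two array reads for any code point, so access is O(1).
  t.stage2[t.stage1[cp div blockSize]][cp mod blockSize]

# Made-up property values: everything is 0 except two code points.
var values = newSeq[uint8](maxCP + 1)
values[0x41] = 1'u8
values[0x0660] = 2'u8

let table = build(values)
echo table.getValue(0x41)
echo table.stage1.len, " blocks, ", table.stage2.len, " unique"
```

In this toy data set almost every block is all zeros, so only a handful of blocks need to be stored. Real property data is less uniform, but the same sharing is what keeps the tables small.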

## Sizes

These are the current collection sizes:

* properties is 45 KB. Used by `properties(1)`, `getCategory(1)`,
  `getBidirectional(1)`, `getCombining(1)` and `getQc(1)`
* compositions is 24 KB. Used by `composition(1)`
* decompositions is 149 KB. Used by `decomposition(1)`
  and `canonicalDecomposition`
* names is 795 KB. Used by `name(1)` and `lookup(1)`
* names (lookup) is 301 KB. Used by `lookup(1)`

## Missing APIs

New APIs will be added from time to time. If you need
something that's missing, please open an issue or PR
(and please mention the use case).

## Tests

Initial tests are run against a dump of Python's
`unicodedata` module to ensure correctness.
The related libraries also have their own custom tests
(some of the test data is provided by the Unicode Consortium).

## Contributing

I plan to work on most of the missing *related
libraries* (case folding, etc.). If you would
like to work on one of those, please let me
know and I'll add it to the list. If you find
that required database data is missing, either open an
issue or a PR.

## LICENSE

MIT
