Commit 8733592

Add support: o200k_base tokenizer.
1 parent eb9c1de commit 8733592

10 files changed: +200109 -5 lines changed

scripts/download_assets.sh

+1
@@ -9,6 +9,7 @@ https://openaipublic.blob.core.windows.net/gpt-2/encodings/main/encoder.json
 https://openaipublic.blob.core.windows.net/encodings/r50k_base.tiktoken
 https://openaipublic.blob.core.windows.net/encodings/p50k_base.tiktoken
 https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken
+https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken
 EOF
 )

tiktoken-rs/README.md

+1
@@ -105,6 +105,7 @@ println!("max_tokens: {}", max_tokens);

 | Encoding name | OpenAI models |
 | ----------------------- | ------------------------------------------------------------------------- |
+| `o200k_base` | GPT-4o models. |
 | `cl100k_base` | ChatGPT models, `text-embedding-ada-002` |
 | `p50k_base` | Code models, `text-davinci-002`, `text-davinci-003` |
 | `p50k_edit` | Use for edit models like `text-davinci-edit-001`, `code-davinci-edit-001` |
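
The README table in this hunk maps encoding names to model families. As a rough illustration of how that mapping could be consulted in code, here is a minimal sketch using only the standard library. The function `encoding_name_for_model` is hypothetical and is not part of tiktoken-rs, and treating every model name that starts with `gpt-4o` as a "GPT-4o model" is an assumption made for this example.

```rust
/// Hypothetical helper (not part of tiktoken-rs): returns the encoding name
/// that the README table associates with a model name, or `None` when the
/// table does not list the model.
fn encoding_name_for_model(model: &str) -> Option<&'static str> {
    match model {
        // Edit models listed in the `p50k_edit` row.
        "text-davinci-edit-001" | "code-davinci-edit-001" => Some("p50k_edit"),
        // Models named explicitly in the `cl100k_base` and `p50k_base` rows.
        "text-embedding-ada-002" => Some("cl100k_base"),
        "text-davinci-002" | "text-davinci-003" => Some("p50k_base"),
        // Assumption: any name with the `gpt-4o` prefix is a GPT-4o model
        // and therefore uses the newly added `o200k_base` encoding.
        m if m.starts_with("gpt-4o") => Some("o200k_base"),
        // Anything else is not covered by this sketch.
        _ => None,
    }
}
```

Calling `encoding_name_for_model("gpt-4o")` yields `Some("o200k_base")`, while a name the table does not cover yields `None`, which leaves the fallback choice to the caller.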
