tiktoken-rs
Ready-made tokenizer library for working with GPT and tiktoken
Usage
- Install this library locally with cargo:

    cargo add tiktoken-rs
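Then, in your Rust code, call the API:

    use tiktoken_rs::p50k_base;

    // Load the p50k_base encoding, then count the tokens in a sample string.
    let bpe = p50k_base().unwrap();
    let tokens = bpe.encode_with_special_tokens("This is a test         with a lot of spaces");
    println!("Token count: {}", tokens.len());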
tiktoken supports the following encodings used by OpenAI models:
| Encoding name | OpenAI models |
|---|---|
| cl100k_base | ChatGPT models, text-embedding-ada-002 |
| p50k_base | Code models, text-davinci-002, text-davinci-003 |
| p50k_edit | Use for edit models like text-davinci-edit-001, code-davinci-edit-001 |
| r50k_base (or gpt2) | GPT-3 models like davinci |
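
To make the table easier to use from code, here is a minimal sketch of a lookup from model name to encoding name. The helper `encoding_for_model` is hypothetical and written only for illustration; it is not presented as part of the tiktoken-rs API. It covers only the models the table names explicitly, so the broader "ChatGPT models" and "Code models" categories are left out.

```rust
/// Returns the encoding name that the table above lists for `model`,
/// or `None` when the table does not name that model explicitly.
/// NOTE: hypothetical helper for illustration, not a tiktoken-rs function.
fn encoding_for_model(model: &str) -> Option<&'static str> {
    match model {
        "text-embedding-ada-002" => Some("cl100k_base"),
        "text-davinci-002" | "text-davinci-003" => Some("p50k_base"),
        "text-davinci-edit-001" | "code-davinci-edit-001" => Some("p50k_edit"),
        "davinci" => Some("r50k_base"),
        _ => None,
    }
}

fn main() {
    let models = ["text-davinci-003", "code-davinci-edit-001", "davinci", "unknown-model"];
    for model in models.iter() {
        match encoding_for_model(model) {
            Some(encoding) => println!("{} -> {}", model, encoding),
            None => println!("{} -> not listed in the table", model),
        }
    }
}
```

The returned name can then be used to pick the matching constructor, such as `p50k_base` from the usage snippet above.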
See the examples in the repo for use cases. For more context on the different tokenizers, see the OpenAI Cookbook.
Encountered any bugs?
If you encounter any bugs or have any suggestions for improvements, please open an issue on the repository.
Acknowledgements
- Thanks @spolu for the original code and .tiktoken files.
License
This project is licensed under the MIT License.