tiktoken-rs 0.2.1

Library for encoding and decoding with the tiktoken library in Rust

tiktoken-rs

Ready-made tokenizer library for working with GPT and tiktoken

Usage

  1. Add this library to your project with cargo:
cargo add tiktoken-rs

Then, in your Rust code, call the API:

use tiktoken_rs::tiktoken::p50k_base;

fn main() {
    // Load the p50k_base encoding, then count the tokens in a sample string.
    let bpe = p50k_base().unwrap();
    let tokens = bpe.encode_with_special_tokens("This is an example");
    println!("Token count: {}", tokens.len());
}

tiktoken supports four encodings used by OpenAI models:

Encoding name        OpenAI models
cl100k_base          ChatGPT models, text-embedding-ada-002
p50k_base            Code models, text-davinci-002, text-davinci-003
p50k_edit            Edit models like text-davinci-edit-001, code-davinci-edit-001
r50k_base (or gpt2)  GPT-3 models like davinci
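
To pick an encoding programmatically, the table above can be turned into a small lookup. The following is only a sketch: `encoding_name_for_model` is a hypothetical helper written for this README, not a function provided by tiktoken-rs. It covers only the model names spelled out in the table, since the "ChatGPT models" and "Code models" rows do not name specific models.

```rust
/// Hypothetical helper (not part of tiktoken-rs): returns the encoding name
/// listed in the table above for an exact OpenAI model name.
fn encoding_name_for_model(model: &str) -> Option<&'static str> {
    match model {
        "text-embedding-ada-002" => Some("cl100k_base"),
        "text-davinci-002" | "text-davinci-003" => Some("p50k_base"),
        "text-davinci-edit-001" | "code-davinci-edit-001" => Some("p50k_edit"),
        "davinci" => Some("r50k_base"),
        // Models not named explicitly in the table are left unresolved.
        _ => None,
    }
}

fn main() {
    let models = [
        "text-davinci-003",
        "code-davinci-edit-001",
        "davinci",
        "some-other-model",
    ];
    for model in models.iter() {
        match encoding_name_for_model(model) {
            Some(name) => println!("{} -> {}", model, name),
            None => println!("{} -> not listed in the table", model),
        }
    }
}
```

The returned name would then tell you which constructor to call, for example `p50k_base()` as in the usage snippet; returning `None` for unlisted models keeps the caller responsible for choosing a fallback.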

See the examples in the repo for use cases. For more context on the different tokenizers, see the OpenAI Cookbook.

Encountered any bugs?

If you encounter any bugs or have any suggestions for improvements, please open an issue on the repository.

Acknowledgements

  • Thanks to @spolu for the original code and the .tiktoken files.

License

This project is licensed under the MIT License.