rust-text-experiments

A small experimental Rust project exploring tokenization and training a basic neural network on token embeddings. This project demonstrates how to preprocess text data, convert it into embeddings, and train a simple model to encode and decode vocabulary tokens.

Features

A corpus of increasing difficulty (level_0, level_1, etc.).
An encoder-decoder that builds vector representations of words (the vectors carry no semantic meaning for now).
Three experimental models. Currently I'm iterating on the attention-based model.
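
To make the preprocessing step concrete, here is a minimal sketch of how text could be split into tokens, mapped to vocabulary indices, and given an initial embedding table. It is not taken from this repository: the names tokenize, Vocabulary, and init_embeddings, the lowercase-plus-punctuation splitting rule, and the seeded initialization are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical tokenizer: lowercases words and keeps each punctuation
/// mark as its own token. Whitespace is dropped.
fn tokenize(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        if ch.is_alphanumeric() {
            current.extend(ch.to_lowercase());
        } else {
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            if !ch.is_whitespace() {
                tokens.push(ch.to_string());
            }
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Hypothetical vocabulary: each distinct token gets an index,
/// assigned in first-seen order.
struct Vocabulary {
    index_of: HashMap<String, usize>,
    tokens: Vec<String>,
}

impl Vocabulary {
    fn from_tokens(tokens: &[String]) -> Self {
        let mut index_of = HashMap::new();
        let mut unique = Vec::new();
        for token in tokens {
            if !index_of.contains_key(token) {
                index_of.insert(token.clone(), unique.len());
                unique.push(token.clone());
            }
        }
        Vocabulary { index_of, tokens: unique }
    }

    /// Maps tokens to indices; unknown tokens yield None.
    fn encode(&self, tokens: &[String]) -> Vec<Option<usize>> {
        tokens.iter().map(|t| self.index_of.get(t).copied()).collect()
    }
}

/// Hypothetical embedding table: one row of `dim` values per token,
/// filled from a small seeded generator so runs are reproducible.
/// Values lie in [-0.5, 0.5) and carry no meaning before training.
fn init_embeddings(vocab_size: usize, dim: usize, seed: u64) -> Vec<Vec<f32>> {
    let mut state = seed;
    let mut table = Vec::with_capacity(vocab_size);
    for _ in 0..vocab_size {
        let mut row = Vec::with_capacity(dim);
        for _ in 0..dim {
            // Linear congruential step; the top 24 bits become a fraction.
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            row.push((state >> 40) as f32 / (1u64 << 24) as f32 - 0.5);
        }
        table.push(row);
    }
    table
}
```

With this sketch, tokenizing "The cat sat on the mat." yields seven tokens and a vocabulary of six entries, since "the" appears twice. An encoder-decoder would then be trained to map each row of the table back to its token index.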

First model: classic neural network

See doc here

Second model: using LSTM

See doc here

Current model: Attention

See doc here
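
For readers unfamiliar with the term, the following is a generic sketch of scaled dot-product attention over plain vectors. It only illustrates the technique; it is not this project's model code, and the function names softmax, dot, and attention are hypothetical.

```rust
/// Numerically stable softmax: subtracts the maximum before exponentiating.
fn softmax(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Dot product of two equally sized vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scaled dot-product attention. For each query, the weights are
/// softmax(q . k / sqrt(d)) over all keys, and the output is the
/// weighted sum of the value vectors.
/// Assumes `keys` and `values` are non-empty and have the same length.
fn attention(
    queries: &[Vec<f32>],
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
) -> Vec<Vec<f32>> {
    let scale = (keys[0].len() as f32).sqrt();
    let dim = values[0].len();
    queries
        .iter()
        .map(|q| {
            let scores: Vec<f32> = keys.iter().map(|k| dot(q, k) / scale).collect();
            let weights = softmax(&scores);
            let mut out = vec![0.0f32; dim];
            for (w, v) in weights.iter().zip(values) {
                for (o, x) in out.iter_mut().zip(v) {
                    *o += w * x;
                }
            }
            out
        })
        .collect()
}
```

When all keys are identical, every key receives the same weight, so each output is the plain average of the value vectors. A trained model would learn projections that produce the queries, keys, and values from the token embeddings.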

Running

cargo run --release pretrain_encoder_decoder
cargo run --release train
cargo run --release self_test
cargo run --release run The cat sat on
# the mat.

License

This project is licensed under the MIT License. See the LICENSE file for details.
