0xDevansh/lstm

Motivation

In this project I want to try out different recurrent architectures: RNN, LSTM (with and without peephole connections), GRU and, if time and skill permit, attention.

I'll build character-level language models and compare their performance. For fun, I'll use Agatha Christie novels in the public domain as my dataset.

I'll try to implement these from scratch, using PyTorch's tensors only for autograd.
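To make "from scratch" concrete, here's a minimal sketch of an LSTM cell built directly on torch tensors, leaving the gradients to autograd (the class name and initialization are illustrative, not the exact code in this repo):

```python
import torch

class LSTMCellScratch(torch.nn.Module):
    """Toy LSTM cell using raw tensor ops; autograd handles backprop."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # One weight matrix covering all four gates: input, forget, cell, output
        self.W = torch.nn.Parameter(torch.randn(input_dim + hidden_dim, 4 * hidden_dim) * 0.01)
        self.b = torch.nn.Parameter(torch.zeros(4 * hidden_dim))

    def forward(self, x, state):
        h, c = state                                     # previous hidden and cell states
        z = torch.cat([x, h], dim=-1) @ self.W + self.b
        i, f, g, o = z.chunk(4, dim=-1)                  # split into the four gates
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c_next = f * c + i * g                           # gated cell-state update
        h_next = o * torch.tanh(c_next)                  # new hidden state
        return h_next, c_next
```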

What I did

The models I've trained so far:

Hastings

This was a character-level LSTM trained on The Mysterious Affair at Styles, with a hidden dimension of 512 and an embedding dimension of 64. The training function is trian_character_model in train.py, and the model is defined in TextModel.py.
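For context, a rough sketch of what the character-level data preparation looks like (the file path and variable names are assumptions; the real logic is in train.py):

```python
import torch

text = open("data/styles.txt").read()        # path to the novel's text is an assumption

# Character-level "tokenization": every distinct character gets an integer id
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

# Next-character prediction: the target at each position is the following character,
# trained with cross-entropy over the character vocabulary
seq_len = 128
x, y = data[:seq_len], data[1:seq_len + 1]
```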

Japp

This was another LSTM, this time using OpenAI's tiktoken library to tokenize the text. Training was very unstable, which I believe is because the tokenizer and the model were trained on different datasets.
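For reference, this is roughly how tiktoken is used (the choice of the gpt2 encoding here is an assumption about what Japp actually used):

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")   # pretrained BPE vocabulary of roughly 50k tokens
ids = enc.encode("Hercule Poirot examined the room.")
print(ids)                            # list of integer token ids
print(enc.decode(ids))                # round-trips back to the original string
print(enc.n_vocab)                    # vocabulary size the model's layers must cover
```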

Poirot

This is what I'm currently working on: a GRU model together with my own BPE tokenizer (a toy BPE sketch follows the list below). Why should this be better than tiktoken?

  1. tiktoken's vocabulary contains a lot of code-related tokens I don't need.
  2. Its ~50k-token vocabulary inflates the embedding and output layers, making the model bigger and training slower, even though most of those tokens never appear in my data.
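As a sketch of the BPE idea (not the tokenizer in this repo; names and details are illustrative): start from individual characters and repeatedly merge the most frequent adjacent pair until the desired number of merges is reached.

```python
from collections import Counter

def train_bpe(text, num_merges):
    """Toy byte-pair encoding: merge the most frequent adjacent pair each round."""
    tokens = list(text)                       # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]   # most frequent adjacent pair
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):                # replace every occurrence of the pair
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges, tokens
```

Training a tokenizer like this on the same Agatha Christie text the model sees should keep the vocabulary small and avoid the tokenizer/model mismatch that made Japp unstable.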
