This project trains a character-level neural network to generate new names from a list of existing ones. The network learns to predict the next character in a sequence; sampling from those predictions one character at a time produces new, name-like words.
- Character Embedding: Converts characters into 10-dimensional embedding vectors.
- Multi-Layer Perceptron (MLP): Includes several hidden layers with batch normalization and activation functions (Tanh).
- Custom Dataset: Built from the names.txt dataset, where each name is tokenized into character sequences.
- Dynamic Name Generation: Generates names by sampling from the predicted character probabilities.
- Model Visualization: Plots the activation distributions of Tanh layers and tracks the gradients of model parameters.
The dataset (names.txt) consists of names, one per line. Each name is processed and extended with a '.' to signify the end of the name.
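For illustration, the preprocessing can look like the following minimal sketch. The `build_dataset` helper and the `block_size` context length are assumptions for this example, not necessarily the script's exact code:

```python
import torch

words = open('names.txt', 'r').read().splitlines()

# Vocabulary: 26 lowercase letters plus '.' as the end-of-name token (27 characters).
chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0
itos = {i: s for s, i in stoi.items()}

def build_dataset(words, block_size=3):
    """Turn each name into (context, next-character) training pairs."""
    X, Y = [], []
    for w in words:
        context = [0] * block_size          # pad the start with '.' tokens
        for ch in w + '.':                  # '.' marks the end of the name
            ix = stoi[ch]
            X.append(context)
            Y.append(ix)
            context = context[1:] + [ix]    # slide the context window forward
    return torch.tensor(X), torch.tensor(Y)
```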
- Embedding Layer: A learnable embedding of size `n_embd = 10` for each character.
- Hidden Layers: 5 hidden layers with 100 neurons each, followed by batch normalization and Tanh activation.
- Output Layer: A softmax layer that predicts the next character from the vocabulary of 27 characters (including '.').
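Put together, an equivalent model can be expressed with standard `torch.nn` layers as in the sketch below. The actual script may use hand-rolled layer classes instead, and `block_size = 3` is an assumed context length:

```python
import torch.nn as nn

vocab_size, n_embd, n_hidden, block_size = 27, 10, 100, 3  # block_size is assumed

# One hidden block = Linear -> BatchNorm -> Tanh, repeated 5 times.
model = nn.Sequential(
    nn.Embedding(vocab_size, n_embd),
    nn.Flatten(),                                   # (B, block_size * n_embd)
    nn.Linear(block_size * n_embd, n_hidden), nn.BatchNorm1d(n_hidden), nn.Tanh(),
    nn.Linear(n_hidden, n_hidden), nn.BatchNorm1d(n_hidden), nn.Tanh(),
    nn.Linear(n_hidden, n_hidden), nn.BatchNorm1d(n_hidden), nn.Tanh(),
    nn.Linear(n_hidden, n_hidden), nn.BatchNorm1d(n_hidden), nn.Tanh(),
    nn.Linear(n_hidden, n_hidden), nn.BatchNorm1d(n_hidden), nn.Tanh(),
    nn.Linear(n_hidden, vocab_size),  # logits; softmax is applied inside cross-entropy
)
```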
- Learning Rate: The learning rate is decayed from 0.1 to 0.01 after 150,000 steps.
- Loss Function: Cross-entropy loss is used to measure how well the network predicts the next character.
- Optimizer: No off-the-shelf optimizer is used; after backpropagation, the parameters are updated manually with plain gradient descent.
- Steps: Training runs for a maximum of 200,000 steps with a batch size of 32.
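A minimal training loop matching these settings might look like this. It assumes the `model` from the architecture sketch above and the `Xtr`/`Ytr` training tensors from the split described below:

```python
import torch
import torch.nn.functional as F

max_steps, batch_size = 200_000, 32

for step in range(max_steps):
    # Sample a random minibatch of (context, target) pairs.
    ix = torch.randint(0, Xtr.shape[0], (batch_size,))
    logits = model(Xtr[ix])
    loss = F.cross_entropy(logits, Ytr[ix])   # cross-entropy on next-character logits

    # Manual parameter update: zero the grads, backprop, then a plain SGD step.
    for p in model.parameters():
        p.grad = None
    loss.backward()
    lr = 0.1 if step < 150_000 else 0.01      # decay the learning rate after 150k steps
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
```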
The dataset is split into training (80%), validation (10%), and test (10%) sets:
- Training set: 182,625 samples
- Validation set: 22,655 samples
- Test set: 22,866 samples
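With the `build_dataset` helper from the preprocessing sketch, a split along these proportions can be done per name (the shuffle seed here is an assumption):

```python
import random

random.seed(42)                            # assumed seed; the script may use another
random.shuffle(words)
n1, n2 = int(0.8 * len(words)), int(0.9 * len(words))

Xtr,  Ytr  = build_dataset(words[:n1])     # 80% of names -> training pairs
Xdev, Ydev = build_dataset(words[n1:n2])   # next 10% -> validation pairs
Xte,  Yte  = build_dataset(words[n2:])     # final 10% -> test pairs
```

Note that the sample counts above are (context, target) pairs after expansion, so they are only approximately 80/10/10 of the pair total: names are split 80/10/10, and each name contributes as many pairs as it has characters plus one.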
After training, the model is evaluated on the training, validation, and test sets, achieving the following losses:
- Training Loss: 2.00
- Validation Loss: 2.08
- Test Loss: 2.08
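Evaluating a whole split is a single forward pass; the one subtlety is that batch normalization must use its running statistics (eval mode) rather than per-batch statistics. A sketch, reusing the tensors from the split above:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def split_loss(name, X, Y):
    model.eval()                              # BatchNorm uses running statistics
    loss = F.cross_entropy(model(X), Y)
    model.train()
    print(f'{name} loss: {loss.item():.2f}')

split_loss('train', Xtr, Ytr)
split_loss('val',   Xdev, Ydev)
split_loss('test',  Xte,  Yte)
```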
Here are some sample names generated by the model:
- montaymyah.
- madhayla.
- ejdra.
- shivaelle.
- arliegh.
- xaviona.
- halisa.
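Names like these come from sampling the softmax distribution one character at a time until the end token `.` appears. A minimal sketch, reusing `model`, `itos`, and the assumed `block_size` from the sketches above:

```python
import torch
import torch.nn.functional as F

model.eval()
block_size = 3                                  # must match the training context length

for _ in range(5):
    out, context = [], [0] * block_size         # start from all-'.' padding
    while True:
        logits = model(torch.tensor([context]))
        probs = F.softmax(logits, dim=1)
        ix = torch.multinomial(probs, num_samples=1).item()  # sample, don't argmax
        context = context[1:] + [ix]
        if ix == 0:                             # '.' marks the end of the name
            break
        out.append(itos[ix])
    print(''.join(out) + '.')
```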
- Clone the repository:

  ```
  git clone https://github.com/suhass434/MakeMore.git
  ```

- Install the dependencies:

  ```
  pip install torch matplotlib
  ```

- Place your dataset (`names.txt`) in the root directory.

- To train and generate names, run the script:

  ```
  python name_generator.py
  ```
The model provides two types of visualizations:
- Activation Distribution: The activation distribution of each Tanh layer can be plotted to visualize how the neurons are activated throughout the network.
- Gradient Update Ratios: Tracks the gradient update ratios of the model's parameters throughout training to check that learning stays stable.
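With an `nn.Sequential` model like the sketch above, the Tanh activation histograms can be captured with forward hooks. This is one illustrative approach, not necessarily the script's; it assumes `model` and `Xtr` from the earlier sketches:

```python
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

# Capture each Tanh layer's output via forward hooks.
acts = {}
def make_hook(name):
    def hook(module, inputs, output):
        acts[name] = output.detach()
    return hook

for i, layer in enumerate(model):
    if isinstance(layer, nn.Tanh):
        layer.register_forward_hook(make_hook(f'tanh_{i}'))

model.eval()
model(Xtr[:1000])                              # one forward pass fills `acts`

plt.figure(figsize=(10, 4))
for name, a in acts.items():
    hy, hx = torch.histogram(a.flatten(), density=True)
    sat = (a.abs() > 0.97).float().mean()      # fraction of saturated units
    plt.plot(hx[:-1].tolist(), hy.tolist(), label=f'{name} ({sat:.0%} saturated)')
plt.legend()
plt.title('Tanh activation distributions')
plt.show()
```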