This is my personal logbook for machine learning and AI experiments. As I build, train, and evaluate different neural network architectures, I document my findings, challenges, and breakthroughs here. This repository serves as both my learning journal and a showcase of my growing expertise in AI development.
I'm particularly interested in language models and exploring how to make them more efficient while maintaining performance. My experiments span various architectures and training approaches, with a focus on what can be accomplished with consumer-grade hardware.
- Paper: "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?"
- Goal: Replicate the paper's approach using only my RTX 3090
- Current Status: Reading the paper and planning an implementation strategy
- Technical Setup:
  - Hardware: RTX 3090 (24GB VRAM)
  - Framework: PyTorch
- Key Questions:
  - How small can I make a coherent LLM with my hardware constraints? (a first-guess sketch follows below)
  - What optimizations will be necessary for efficient training?
  - How will performance compare to the original paper's results?
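To ground the model-size question, here's a minimal sketch of the decoder-only architecture I plan to start from. The hyperparameters (vocab_size=8192, d_model=256, 8 heads, 8 layers, 512-token context) are my own placeholder guesses for a TinyStories-scale model, not values from the paper:

```python
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    """Small decoder-only transformer; dimensions are placeholders."""

    def __init__(self, vocab_size=8192, d_model=256, n_heads=8,
                 n_layers=8, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        # idx: (batch, seq_len) token ids
        seq_len = idx.shape[1]
        pos = torch.arange(seq_len, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: -inf above the diagonal hides future tokens.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=idx.device),
            diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))  # (batch, seq_len, vocab_size) logits

model = TinyGPT()
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```

At this size the model comes out to roughly 11M parameters, so the fp32 weights are only ~43MB; most of the 24GB budget will go to activations, gradients, and optimizer state rather than the weights themselves.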
- Exploring quantization techniques (see the sketch after this list)
- Implementing attention mechanism variations
- Experimenting with different tokenization strategies
- Testing alternative training datasets
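For the quantization item above, my first pass will likely be PyTorch's built-in dynamic quantization, which swaps Linear weights to int8 and quantizes activations on the fly (it currently targets CPU inference). A minimal sketch using a stand-in module rather than a trained model:

```python
import os
import torch
from torch.ao.quantization import quantize_dynamic

# Stand-in for a trained network; the same call applies to any nn.Module
# whose Linear layers should be quantized.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 256),
).eval()

# Replace Linear layers with int8 counterparts; activations are quantized
# dynamically at inference time.
quantized = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

def checkpoint_mb(m: torch.nn.Module) -> float:
    """Approximate on-disk size of a model's weights, in MB."""
    torch.save(m.state_dict(), "tmp.pt")
    size = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return size

print(f"fp32: {checkpoint_mb(model):.2f} MB -> int8: {checkpoint_mb(quantized):.2f} MB")
```

Int8 weights should cut the Linear layers' footprint by roughly 4x; whether output quality holds up at this model scale is exactly what I want to measure.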
I'm always open to collaboration, feedback, or discussions about ML/AI. Feel free to reach out if you have any questions about my experiments or want to collaborate on a project.
This repository is actively maintained as I continue my AI learning journey.