A collection of neural network architectures implemented in various frameworks, with an initial focus on MLX, Apple's machine learning framework for efficient computation on Mac hardware.
This repository contains implementations of various neural network architectures for study and experimentation. The initial models use MLX because it provides efficient computation without requiring high-end GPUs, making these models accessible to users with Apple Silicon Macs. Future implementations may target other frameworks.
Location: networks/resnet/
A general-purpose convolutional neural network architecture for image classification, built around skip connections that ease the training of deeper networks.
- Implements both the standard ResNet and ResNet-50 with bottleneck blocks (sketched below)
- Includes modern design variants (ResNet-B, ResNet-C, ResNet-D) as shown in variants.png
Paper: Deep Residual Learning for Image Recognition (He et al., 2015)
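The skip connection is easiest to see in code. Below is a minimal, hypothetical sketch of a bottleneck block in MLX; the layer sizes, names, and shortcut projection are illustrative assumptions, not the repository's actual implementation:

```python
import mlx.core as mx
import mlx.nn as nn

class Bottleneck(nn.Module):
    """1x1 -> 3x3 -> 1x1 bottleneck with an identity (or projected) skip."""

    def __init__(self, in_channels: int, channels: int, stride: int = 1):
        super().__init__()
        out_channels = channels * 4  # ResNet-50 expands channels 4x
        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=1)
        self.bn1 = nn.BatchNorm(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3,
                               stride=stride, padding=1)
        self.bn2 = nn.BatchNorm(channels)
        self.conv3 = nn.Conv2d(channels, out_channels, kernel_size=1)
        self.bn3 = nn.BatchNorm(out_channels)
        # Project the shortcut when the shape changes, as in the "B" variant.
        self.proj = None
        if stride != 1 or in_channels != out_channels:
            self.proj = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride),
                nn.BatchNorm(out_channels),
            )

    def __call__(self, x: mx.array) -> mx.array:
        shortcut = self.proj(x) if self.proj is not None else x
        y = nn.relu(self.bn1(self.conv1(x)))
        y = nn.relu(self.bn2(self.conv2(y)))
        y = self.bn3(self.conv3(y))
        return nn.relu(y + shortcut)  # the skip connection
```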
Location: networks/autoencoders/super_resolution/
Models that upscale low-resolution images to a higher resolution using deep learning.
- Uses encoder-decoder architecture with residual blocks
- Implements the pixel shuffle technique for efficient upscaling (sketched below)
- Supports perceptual loss using VGG features
- Trained on the DIV2K dataset
- Example results can be found in result.png
Paper: Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network (Shi et al., 2016)
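As a rough illustration of the sub-pixel upscaling step, here is a hedged MLX sketch of pixel shuffle, assuming the NHWC layout MLX convolutions use; `UpscaleHead` and its parameters are hypothetical names for illustration only:

```python
import mlx.core as mx
import mlx.nn as nn

def pixel_shuffle(x: mx.array, r: int) -> mx.array:
    """Rearrange (N, H, W, C*r*r) -> (N, H*r, W*r, C)."""
    n, h, w, c = x.shape
    c_out = c // (r * r)
    x = x.reshape(n, h, w, r, r, c_out)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # interleave the two r factors spatially
    return x.reshape(n, h * r, w * r, c_out)

class UpscaleHead(nn.Module):
    """Conv that expands channels by r^2, followed by a pixel shuffle."""

    def __init__(self, channels: int, r: int = 2):
        super().__init__()
        self.r = r
        self.conv = nn.Conv2d(channels, channels * r * r, kernel_size=3, padding=1)

    def __call__(self, x: mx.array) -> mx.array:
        return pixel_shuffle(self.conv(x), self.r)
```

The convolution does its work in low-resolution space; the shuffle merely rearranges channels into pixels, which is what makes the approach efficient.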
Location: networks/transformers/autoregressive/
Text generation model using a decoder-only transformer architecture, similar to GPT-style models.
- Includes causal attention mechanism
- Implements positional encoding, multi-head attention, and feed-forward networks
- Uses efficient key-value caching for faster inference
- Features RMSNorm instead of LayerNorm
- Supports temperature-based sampling for text generation (see the sketch below)
Paper: Attention Is All You Need (Vaswani et al., 2017)
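To make the decoding loop concrete, here is a minimal sketch of causal masking and temperature-based sampling in MLX. The function names are illustrative, not the repository's API (MLX also ships an `nn.MultiHeadAttention.create_additive_causal_mask` helper for the mask):

```python
import mlx.core as mx

def causal_attention(q: mx.array, k: mx.array, v: mx.array) -> mx.array:
    """Scaled dot-product attention over (batch, heads, seq_len, head_dim)."""
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(0, 1, 3, 2)) * scale
    # Additive mask: -inf above the diagonal blocks attention to the future.
    seq_len = q.shape[2]
    mask = mx.triu(mx.full((seq_len, seq_len), float("-inf")), k=1)
    return mx.softmax(scores + mask, axis=-1) @ v

def sample_next_token(logits: mx.array, temperature: float = 1.0) -> mx.array:
    """Temperature-based sampling; lower values sharpen the distribution."""
    if temperature == 0.0:
        return mx.argmax(logits, axis=-1)  # greedy decoding
    return mx.random.categorical(logits * (1.0 / temperature))
```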
Location: networks/vgg16/
Classic CNN architecture for image classification, also used as a feature extractor for perceptual loss in other models.
- Standard implementation with 13 convolutional layers and 3 fully connected layers
- Includes dropout for regularization
- Provides feature extraction at different network depths (illustrated below)
- Includes a weight conversion utility
Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition (Simonyan & Zisserman, 2014)
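To show what feature extraction at different depths looks like, here is a hedged MLX sketch with a toy two-stage stack standing in for the full VGG16; the module and stage names are assumptions for illustration:

```python
import mlx.core as mx
import mlx.nn as nn

class TinyVGGFeatures(nn.Module):
    """Toy VGG-style stack that returns activations from each stage."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.stage2 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
        )

    def __call__(self, x: mx.array) -> list[mx.array]:
        f1 = self.stage1(x)              # shallow features: edges, textures
        f2 = self.stage2(self.pool(f1))  # deeper, more semantic features
        return [f1, f2]
```

Returning activations from several depths is what lets other models compare images in feature space rather than pixel space.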
Location: networks/ddpm/
Generative model based on denoising diffusion processes for high-quality image synthesis.
- U-Net architecture with time conditioning for noise prediction
- Sinusoidal time embeddings with learned transformations
- FiLM (Feature-wise Linear Modulation) for time-conditional generation (sketched below, together with the time embeddings)
- ResNet-style blocks with RMSNorm for stable training
- Skip connections between encoder and decoder paths
- Supports variable timesteps for flexible sampling schedules
Paper: Denoising Diffusion Probabilistic Models (Ho et al., 2020)
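A short sketch of the time-conditioning pieces, sinusoidal embeddings plus FiLM, written in MLX; the dimensions and module names are illustrative assumptions, not the repository's actual code:

```python
import math
import mlx.core as mx
import mlx.nn as nn

def timestep_embedding(t: mx.array, dim: int) -> mx.array:
    """Map integer timesteps (N,) to sinusoidal features (N, dim)."""
    half = dim // 2
    freqs = mx.exp(-math.log(10000.0) * mx.arange(half).astype(mx.float32) / half)
    args = t.astype(mx.float32)[:, None] * freqs[None, :]
    return mx.concatenate([mx.sin(args), mx.cos(args)], axis=-1)

class FiLM(nn.Module):
    """Predict a per-channel scale and shift from the time embedding."""

    def __init__(self, emb_dim: int, channels: int):
        super().__init__()
        self.proj = nn.Linear(emb_dim, 2 * channels)

    def __call__(self, x: mx.array, emb: mx.array) -> mx.array:
        scale, shift = mx.split(self.proj(emb), 2, axis=-1)
        # Broadcast over the spatial dims of an NHWC feature map.
        return x * (1 + scale[:, None, None, :]) + shift[:, None, None, :]
```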
Location: networks/utils/
Helper functions and components used across different models.
- Perceptual loss implementation using pre-trained feature extractors
- Allows weighting of different feature layers
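A minimal sketch of how such a weighted perceptual loss can be assembled, assuming a feature extractor that returns a list of activations (as in the VGG sketch above); the weights shown are placeholders:

```python
import mlx.core as mx

def perceptual_loss(extractor, pred: mx.array, target: mx.array,
                    weights=(1.0, 0.5)) -> mx.array:
    """Weighted L2 distance between feature maps at several depths."""
    loss = mx.array(0.0)
    for w, f_pred, f_tgt in zip(weights, extractor(pred), extractor(target)):
        loss = loss + w * mx.mean((f_pred - f_tgt) ** 2)
    return loss
```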
Requirements:
- Python 3.11+
- Framework-specific dependencies (currently MLX)
Planned:
- Additional model architectures
- Conversion tools between frameworks