Minimal implementation of the Contextual Recurrent Spectral Dual-Memory (CRSD) prototype.
Clone the repo and run a tiny experiment:
```bash
git clone https://github.com/navaneet625/CRSD_model.git
cd CRSD_model
python train.py --config experiments/exp_crsd_tiny.yaml
```
🧩 Modular Architecture
Clean directory structure (`models/`, `data/`, `utils/`, `scripts/`), facilitating reproducible experiments.
🔡 Tokenization System
Supports character, word, and subword modes using custom tokenizers:
- Byte-level
- Word
- SentencePiece
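Byte-level mode, in its simplest form, maps text straight to UTF-8 byte ids. An illustrative snippet (not the repo's tokenizer API):

```python
# Byte-level tokenization at its simplest: UTF-8 bytes as token ids.
text = "CRSD"
ids = list(text.encode("utf-8"))       # [67, 82, 83, 68]
decoded = bytes(ids).decode("utf-8")   # "CRSD"
```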
⚙️ Dataset Pipeline
Robust dataloader handles large-scale text corpora (e.g., `enwik8_10M.txt`).
Supports automatic data splitting, padding, and batching.
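For example, variable-length sequences can be padded into a fixed batch like this (illustrative only; a PAD id of 0 is an assumption, not necessarily the repo's choice):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two sequences of different lengths, padded to a rectangular batch.
seqs = [torch.tensor([5, 7, 2]), torch.tensor([9, 4])]
batch = pad_sequence(seqs, batch_first=True, padding_value=0)
# tensor([[5, 7, 2],
#         [9, 4, 0]])
```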
📈 Training Loop
Complete training & evaluation pipeline:
- Gradient scaling
- Checkpointing
- Metrics: Loss, Accuracy, Bits-Per-Character, Perplexity
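For orientation, one mixed-precision training step with gradient scaling might look like the following sketch (placeholder model/optimizer, not the repo's exact loop; it also shows how bits-per-character and perplexity follow from the cross-entropy loss in nats):

```python
import math
import torch
import torch.nn.functional as F

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, x, y):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda"):
        logits = model(x)                                  # (B, T, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    scaler.scale(loss).backward()   # scale loss to avoid fp16 grad underflow
    scaler.step(optimizer)          # unscales grads, then steps the optimizer
    scaler.update()
    bpc = loss.item() / math.log(2)  # nats -> bits-per-character
    ppl = math.exp(loss.item())      # perplexity = exp(mean NLL)
    return loss.item(), bpc, ppl
```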
🧠 Research Focus
Designed for rapid experimentation with modern State Space Model (SSM) families:
- S4, S4D, S5, Mamba, LinOSS, Samba
- RetNet-inspired decoders
🔬 Debug-Friendly
Inspection tools for tracing tokenization, data flow, and model structure:
- Detects silent data corruption (e.g., PAD flooding)
- Detailed print/debug modes
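A PAD-flooding check can be as simple as measuring the PAD fraction per batch (illustrative only; `PAD_ID = 0` is an assumption):

```python
import torch

PAD_ID = 0  # assumed pad token id
batch = torch.tensor([[5, 7, 2], [9, 4, PAD_ID]])
pad_frac = (batch == PAD_ID).float().mean().item()
if pad_frac > 0.5:
    print(f"warning: {pad_frac:.1%} of tokens are PAD (possible silent corruption)")
```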
Hebbian Associative Memory
A differentiable associative memory storing feature correlations in a dynamic matrix `H` of shape `(B, d_k, d_v)`.
Update rule:
```
H <- gamma * H + eta * outer(k, v)
```
Where:
- `gamma` is a learnable decay parameter
- `eta` is a learnable learning rate
- `outer(k, v)` denotes the outer product between key `k` and value `v`
Features:
- Cosine-normalized recall for stability
- Learnable decay (`gamma`) and learning rate (`eta`)
- Optional top-k selective recall
- Detach-safe memory (can disable gradient flow)
- Mixed-precision / AMP-friendly
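A minimal sketch of the update and cosine-normalized recall, assuming PyTorch; the class name, parameterization, and shapes below are illustrative, not the repo's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HebbianMemorySketch(nn.Module):
    """Illustrative Hebbian associative memory (not the repo's exact module)."""

    def __init__(self):
        super().__init__()
        # Unconstrained parameters squashed into (0, 1): learnable decay
        # gamma and learning rate eta.
        self.gamma_raw = nn.Parameter(torch.tensor(2.0))   # sigmoid -> ~0.88
        self.eta_raw = nn.Parameter(torch.tensor(-2.0))    # sigmoid -> ~0.12

    def update(self, H, k, v, detach=False):
        # H: (B, d_k, d_v), k: (B, d_k), v: (B, d_v)
        if detach:            # "detach-safe": optionally stop gradients
            H = H.detach()    # flowing through old memory states
        gamma = torch.sigmoid(self.gamma_raw)
        eta = torch.sigmoid(self.eta_raw)
        # H <- gamma * H + eta * outer(k, v)
        return gamma * H + eta * torch.einsum("bi,bj->bij", k, v)

    def recall(self, H, q):
        # Cosine-normalize the query before reading, for stability.
        q = F.normalize(q, dim=-1)
        return torch.einsum("bi,bij->bj", q, H)
```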
Episodic Memory Buffer
A prioritized, slot-based episodic memory. It retains important (key, value) pairs and replaces the least important slot when full.
Features:
- Vectorized priority writing (replace lowest-importance slot)
- Cosine similarity recall with temperature scaling
- Optional top-k or window-limited recall
- Stable normalization (with epsilon for numerical safety)
- Fully differentiable
- Efficient for large batch or slot counts (AMP compatible)
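A minimal sketch of the write and recall paths, assuming PyTorch; the slot layout and function names are illustrative:

```python
import torch
import torch.nn.functional as F

def episodic_write(keys, values, priorities, k, v, p):
    # keys: (B, S, d_k), values: (B, S, d_v), priorities: (B, S)
    # k: (B, d_k), v: (B, d_v), p: (B,)
    # Vectorized replacement of the lowest-priority slot per batch element,
    # using torch.where so the write stays differentiable.
    idx = priorities.argmin(dim=1)                       # (B,)
    mask = F.one_hot(idx, priorities.size(1)).bool()     # (B, S)
    keys = torch.where(mask.unsqueeze(-1), k.unsqueeze(1), keys)
    values = torch.where(mask.unsqueeze(-1), v.unsqueeze(1), values)
    priorities = torch.where(mask, p.unsqueeze(1), priorities)
    return keys, values, priorities

def episodic_recall(keys, values, q, temperature=0.5, eps=1e-6):
    # Temperature-scaled cosine similarity between the query and stored
    # keys, softmaxed over slots, then used to mix the stored values.
    q_n = F.normalize(q, dim=-1, eps=eps)                # (B, d_k)
    k_n = F.normalize(keys, dim=-1, eps=eps)             # (B, S, d_k)
    sims = torch.einsum("bd,bsd->bs", q_n, k_n) / temperature
    return torch.einsum("bs,bsd->bd", sims.softmax(dim=-1), values)
```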
CRSD = Contextual Recurrent Spectral Dual-Memory
A hybrid neural sequence model combining:

- **Recurrent Reservoir Dynamics**
  Continuous-time, parameterized updates mix the current input, the previous hidden state, and an internal reservoir for fine-grained temporal modeling (efficient local recurrence).
- **Spectral Dual Transform**
  Each hidden state passes through LayerNorm → FFT → iFFT → Linear, enabling efficient, global mixing in the frequency domain (see the sketch after this list):
  - O(d log d) complexity
  - Gradient- and energy-preserving (unitary)
- **Dual-Memory Retrieval System**
  Two complementary memory modules:
  - Hebbian associative memory for long-term key–value correlations: $$ H \leftarrow \gamma H + \eta (k \otimes v) $$
  - Episodic buffer for recent/priority-based experience recall
  - Both are queried and adaptively merged via a learnable gate for unified recall (sketched below)
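A minimal sketch of the spectral path in PyTorch. Note that an FFT immediately followed by an iFFT is the identity map, so the learnable per-frequency filter between them here is an assumption about where the frequency-domain processing happens, not something the description above states:

```python
import torch
import torch.nn as nn

class SpectralMixSketch(nn.Module):
    """Illustrative spectral path: LayerNorm -> FFT -> [filter] -> iFFT -> Linear."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # Assumed learnable complex gain per frequency, stored as (real, imag);
        # initialized to the identity filter (1 + 0j).
        filt = torch.zeros(d_model, 2)
        filt[:, 0] = 1.0
        self.filt = nn.Parameter(filt)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (B, d_model) hidden state
        x = self.norm(h)
        spec = torch.fft.fft(x, dim=-1)              # O(d log d) global mixing
        spec = spec * torch.view_as_complex(self.filt)
        x = torch.fft.ifft(spec, dim=-1).real        # back to the time domain
        return self.proj(x)
```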
Tri-domain Integration:
- Time-domain recurrence (short-range)
- Frequency-domain transform (global context)
- Memory-domain retrieval (persistent knowledge)
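The learnable gate that merges the two memory recalls might look like this sketch, under the assumption that the gate conditions on the current hidden state and both read-outs (not confirmed by the repo):

```python
import torch
import torch.nn as nn

class MemoryGateSketch(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(3 * d_model, d_model)

    def forward(self, h, m_hebb, m_epi):
        # h, m_hebb, m_epi: (B, d_model)
        g = torch.sigmoid(self.gate(torch.cat([h, m_hebb, m_epi], dim=-1)))
        return g * m_hebb + (1.0 - g) * m_epi   # unified recall
```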
CRSD merges transformer-level expressivity with RNN-like efficiency and stability, making it suitable for language modeling, continual learning, and dynamic sequence reasoning.
```
CRSD_model/
│
├── models/       # Model architectures (CRSDCell, memory modules, etc.)
├── data/         # Data loaders, preprocessing, tokenization
├── utils/        # Helper functions, training utilities
├── scripts/      # Experiment scripts and entrypoints
├── experiments/  # Experiment configs (YAML)
├── train.py      # Training/evaluation loop
└── README.md
```
References
- Original S4 (Structured State Space Models)
- Recurrent Memory Transformers
- Frequency Domain Sequence Models
- Mamba: Linear-time SSMs
- RetNet: Retentive Networks
For questions, suggestions, or collaborations:
- GitHub: navaneet625
- Issues & discussions: GitHub Issues
This project is licensed under the MIT License. See the LICENSE file for details.