CRSD is a minimal, research-oriented sequence modeling framework built from scratch to explore state-space models (SSMs) and sequence-to-sequence architectures in PyTorch. It is designed to be fully reproducible, interpretable, and extensible, suitable both for learning and for building experimental variants such as nonlinear SSMs and gated decoders.

CRSD Minimal Repository

Minimal implementation of the Contextual Recurrent Spectral Dual-Memory (CRSD) prototype.


🚀 Demo: Quick Start

Clone the repo and run a tiny experiment:

git clone https://github.com/navaneet625/CRSD_model.git
cd CRSD_model
python train.py --config experiments/exp_crsd_tiny.yaml

🔍 Core Features

  • 🧩 Modular Architecture
    Clean directory structure (models/, data/, utils/, scripts/), facilitating reproducible experiments.

  • 🔡 Tokenization System
    Supports character, word, and subword modes using custom tokenizers:

    • Byte-level
    • Word
    • SentencePiece
  • ⚙️ Dataset Pipeline
    Robust dataloader handles large-scale text corpora (e.g., enwik8_10M.txt).
    Supports automatic data splitting, padding, and batching.

  • 📈 Training Loop
    Complete training & evaluation pipeline:

    • Gradient scaling
    • Checkpointing
    • Metrics: Loss, Accuracy, Bits-Per-Character, Perplexity
  • 🧠 Research Focus
    Designed for rapid experimentation with modern State Space Model (SSM) families:

    • S4, S4D, S5, Mamba, LinOSS, Samba
    • RetNet-inspired decoders
  • 🔬 Debug-Friendly
    Inspection tools for tracing tokenization, data flow, and model structure:

    • Detects silent data corruption (e.g., PAD flooding); a minimal check is sketched below
    • Detailed print/debug modes
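
For instance, a batch-level PAD check, together with the loss-to-BPC/perplexity conversions behind the metrics listed above, could look like the following sketch (pad_id, the 20% threshold, and the function names are illustrative assumptions, not values taken from the repo):

import math
import torch

def check_pad_flooding(batch_ids: torch.Tensor, pad_id: int, max_frac: float = 0.2):
    """Flag batches dominated by PAD tokens (a common silent-corruption symptom)."""
    frac = (batch_ids == pad_id).float().mean().item()
    if frac > max_frac:
        print(f"[debug] suspicious batch: {frac:.1%} PAD tokens (> {max_frac:.0%})")
    return frac

def metrics_from_loss(nll_nats: float):
    """Convert mean cross-entropy (nats/token) to bits-per-character and perplexity."""
    return {"bpc": nll_nats / math.log(2), "ppl": math.exp(nll_nats)}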

🏛️ Dual Memory Subsystems

HebbianMemory

A differentiable associative memory storing feature correlations in a dynamic matrix H of shape (B, d_k, d_v).

Update rule:

H <- gamma * H + eta * outer(k, v)

Where:

  • gamma is a learnable decay parameter
  • eta is a learnable learning rate
  • outer(k, v) denotes the outer product between key k and value v

Features:

  • Cosine-normalized recall for stability
  • Learnable decay (gamma) and learning rate (eta)
  • Optional top-k selective recall
  • Detach-safe memory (can disable gradient flow)
  • Mixed-precision / AMP-friendly
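
A minimal sketch of this update rule and recall path, assuming gamma and eta are squashed into (0, 1) with a sigmoid (the repo's exact parameterization may differ):

import torch
import torch.nn as nn
import torch.nn.functional as F

class HebbianMemory(nn.Module):
    """Sketch of H <- gamma * H + eta * outer(k, v) with cosine-normalized recall."""

    def __init__(self):
        super().__init__()
        # Learnable decay (gamma) and learning rate (eta); squashing them into
        # (0, 1) with a sigmoid is our assumption, not confirmed by the repo.
        self._gamma = nn.Parameter(torch.tensor(2.0))
        self._eta = nn.Parameter(torch.tensor(-2.0))

    def update(self, H, k, v, detach: bool = False):
        # H: (B, d_k, d_v); k: (B, d_k); v: (B, d_v)
        if detach:  # optionally cut gradient flow through the stored memory
            H = H.detach()
        outer = torch.einsum("bi,bj->bij", k, v)
        return torch.sigmoid(self._gamma) * H + torch.sigmoid(self._eta) * outer

    def recall(self, H, q):
        # Normalize the query so read-out magnitude stays bounded (cosine-style recall).
        q = F.normalize(q, dim=-1)
        return torch.einsum("bi,bij->bj", q, H)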

EpisodicBuffer

A prioritized, slot-based episodic memory. It retains important (key, value) pairs and replaces the least important slot when full.

Features:

  • Vectorized priority writing (replace lowest-importance slot)
  • Cosine similarity recall with temperature scaling
  • Optional top-k or window-limited recall
  • Stable normalization (with epsilon for numerical safety)
  • Fully differentiable
  • Efficient for large batch or slot counts (AMP compatible)
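
A minimal sketch of the priority write and temperature-scaled recall, assuming one new pair per step and an externally supplied importance score (both illustrative choices):

import torch
import torch.nn.functional as F

def episodic_write(keys, vals, prio, k, v, importance):
    """Replace each batch element's lowest-priority slot with the new (k, v) pair."""
    # keys: (B, S, d_k); vals: (B, S, d_v); prio: (B, S); k: (B, d_k); v: (B, d_v)
    slot = prio.argmin(dim=1)                           # index of least-important slot
    b = torch.arange(keys.shape[0], device=keys.device)
    keys, vals, prio = keys.clone(), vals.clone(), prio.clone()  # stay autograd-safe
    keys[b, slot], vals[b, slot], prio[b, slot] = k, v, importance
    return keys, vals, prio

def episodic_recall(keys, vals, q, temperature: float = 0.1):
    """Cosine-similarity read over all slots with temperature scaling."""
    sim = F.cosine_similarity(keys, q.unsqueeze(1), dim=-1)   # (B, S)
    w = torch.softmax(sim / temperature, dim=-1)
    return torch.einsum("bs,bsd->bd", w, vals)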

🧠 Model Summary: Contextual Recurrent Spectral Dual-Memory (CRSD)

CRSD = Contextual Recurrent Spectral Dual-Memory

A hybrid neural sequence model combining:

  1. Recurrent Reservoir Dynamics
    Continuous-time, parameterized updates mixing current input, previous hidden state, and an internal reservoir for fine-grained temporal modeling (efficient local recurrence).

  2. Spectral Dual Transform
    Each hidden state passes through LayerNorm → FFT → iFFT → Linear, enabling efficient, global mixing in the frequency domain (a minimal sketch follows this list):

    • O(d log d) complexity
    • Gradient- and energy-preserving (unitary)
  3. Dual-Memory Retrieval System
    Two complementary memory modules:

    • Hebbian associative memory for long-term key–value correlations:
      $$ H \leftarrow \gamma H + \eta (k \otimes v) $$
    • Episodic buffer for recent/priority-based experience recall
    • Both are queried and adaptively merged via a learnable gate for unified recall (a gate sketch appears at the end of this section)
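
A minimal sketch of the spectral step from item 2, assuming a learnable per-frequency filter supplies the mixing between FFT and iFFT (the README does not specify this detail):

import torch
import torch.nn as nn

class SpectralDualTransform(nn.Module):
    """Sketch of the LayerNorm -> FFT -> iFFT -> Linear step described above."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # Hypothetical learnable per-frequency complex filter (real + imaginary
        # parts) applied between FFT and iFFT; how the repo parameterizes the
        # frequency-domain mixing is an assumption on our part.
        self.filt = nn.Parameter(torch.randn(d_model // 2 + 1, 2) * 0.02)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (B, d_model) hidden state
        x = self.norm(h)
        X = torch.fft.rfft(x, dim=-1, norm="ortho")     # O(d log d), norm-preserving
        X = X * torch.view_as_complex(self.filt)        # assumed frequency-domain mixing
        x = torch.fft.irfft(X, n=h.shape[-1], dim=-1, norm="ortho")
        return self.out(x)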

Tri-domain Integration:

  • Time-domain recurrence (short-range)
  • Frequency-domain transform (global context)
  • Memory-domain retrieval (persistent knowledge)

CRSD merges transformer-level expressivity with RNN-like efficiency and stability, making it suitable for language modeling, continual learning, and dynamic sequence reasoning.
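
And a minimal sketch of the learnable gate from item 3 that merges the two memory read-outs (the concatenate-then-sigmoid parameterization is our assumption):

import torch
import torch.nn as nn

class MemoryGate(nn.Module):
    """Sketch of the learnable gate that blends Hebbian and episodic recall."""

    def __init__(self, d_model: int):
        super().__init__()
        # Conditioning the gate on the hidden state plus both read-outs is our
        # assumption; the repo may parameterize the merge differently.
        self.proj = nn.Linear(3 * d_model, d_model)

    def forward(self, h, m_hebb, m_epi):
        # h, m_hebb, m_epi: (B, d_model)
        g = torch.sigmoid(self.proj(torch.cat([h, m_hebb, m_epi], dim=-1)))
        return g * m_hebb + (1.0 - g) * m_epi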


📁 Directory Structure

CRSD_model/
│
├── models/        # Model architectures (CRSDCell, memory modules, etc.)
├── data/          # Data loaders, preprocessing, tokenization
├── utils/         # Helper functions, training utilities
├── scripts/       # Experiment scripts and entrypoints
├── experiments/   # Experiment configs (YAML)
├── train.py       # Training/evaluation loop
└── README.md

Contact

For questions, suggestions, or collaborations:


📄 License

This project is licensed under the MIT License. See the LICENSE file for details.
