A novel neural network framework combining sparsity, ternary weights, and toroidal topology to reduce hallucinations in language models.
- Ternary Weights {-1, 0, +1} - 10-20x model compression via BitNet-style quantization
- Toroidal Topology - Tonnetz-based attention bias inspired by musical pitch space
- Sparse Activations - ~2% active units following SDR principles
- Temporal Context - Decaying memory for sequence coherence
- 67-80% Hallucination Reduction - Empirically validated on Mistral-7B and Phi-2
This is a research prototype implementing the SMDR architecture:
| Component | Status |
|---|---|
| Forward pass (inference) | Complete |
| Toroidal topology | Complete |
| Ternary weights | Complete |
| Sparse attention | Complete |
| Temporal context | Complete |
| Training (backprop with STE) | Complete |
| Evaluation metrics | Complete |
| Model serialization | Complete |
Training uses the Straight-Through Estimator (STE) for ternary weight gradients, allowing end-to-end backpropagation through quantization.
# Build
cargo build --release
# Quick evaluation - compare topology vs no-topology
cargo run --release --example quick_eval
# Full benchmark with hyperparameter search
cargo run --release --example benchmark -- --compare
# Visualize the Tonnetz topology and attention patterns
cargo run --release --bin smdr-visualize
# Run inference with random weights (demonstrates architecture)
cargo run --release --bin smdr-infer -- --interactive
# Training example
cargo run --release --example simple_trainThe validated use case for SMDR is adding toroidal attention bias to existing transformer models:
# Python pseudo-code for applying SMDR to HuggingFace models
def apply_tonnetz_bias(attention_scores, seq_len, radius=2.0, alpha=1.0):
"""Add Tonnetz topology bias to attention scores"""
bias = compute_tonnetz_bias_matrix(seq_len, radius, alpha)
return attention_scores + bias
# Apply only to final 1/3 of layers (layer_late strategy)
for i, layer in enumerate(model.layers):
if i >= len(model.layers) * 2 // 3:
layer.attention = wrap_with_tonnetz(layer.attention)This approach achieved 67-80% hallucination reduction on Mistral-7B-Instruct.
Input Tokens
│
▼
┌─────────────────────────────────┐
│ Embedding (Ternary Weights) │
└─────────────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ SMDR Layer 0 │ ◄── No topology
│ ┌─────────────────────────┐ │ (learn patterns)
│ │ Toroidal Attention │ │
│ │ + Sparse Softmax │ │
│ └─────────────────────────┘ │
│ ┌─────────────────────────┐ │
│ │ FFN (Ternary + GELU) │ │
│ └─────────────────────────┘ │
└─────────────────────────────────┘
│
▼
... (layers 1 to n-1)
│
▼
┌─────────────────────────────────┐
│ SMDR Layer n (final) │ ◄── With topology
│ ┌─────────────────────────┐ │ (constrain outputs)
│ │ Toroidal Attention │ │
│ │ + Tonnetz Bias │────┼──► Suppresses distant
│ │ + Sparse Softmax │ │ attention on torus
│ └─────────────────────────┘ │
└─────────────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ Output Projection (Ternary) │
└─────────────────────────────────┘
│
▼
Logits
The Tonnetz is a 2D torus representing musical pitch relationships:
┌───────────────────────────────────┐
│ Perfect Fifths │
│ (+7 semitones) │
│ ◄─────────────────────► │
┌───┼───┬───┬───┬───┬───┬───┬───┬───┐ │
│ │ C │ G │ D │ A │ E │ B │F# │C# │...│ wraps
│ ├───┼───┼───┼───┼───┼───┼───┼───┤ │
M │ │ E │ B │F# │C# │G# │D# │A# │ F │...│
a │ ├───┼───┼───┼───┼───┼───┼───┼───┤ │
j │ │G# │D# │A# │ F │ C │ G │ D │ A │...│
o │ ├───┼───┼───┼───┼───┼───┼───┼───┤ │
r │ │ C │ G │ D │ A │ E │ B │F# │C# │...│
└───┴───┴───┴───┴───┴───┴───┴───┴───┘ │
3rds │ wraps around │
└───────────────────────────────────┘
Attention is biased by toroidal distance - tokens "close" on the torus attend more strongly.
use smdr::{SMDRConfig, SMDRModel};
// Predefined sizes
let small = SMDRConfig::small(); // ~20M params
let medium = SMDRConfig::medium(); // ~100M params
let large = SMDRConfig::large(); // ~350M params
// Custom configuration
let config = SMDRConfig {
d_model: 512,
n_heads: 8,
n_layers: 6,
d_ff: 2048,
vocab_size: 32000,
tonnetz_radius: 2.0, // Local attention radius
tonnetz_alpha: 1.0, // Decay strength
weight_sparsity: 0.3, // 30% non-zero weights
..Default::default()
};
let model = SMDRModel::new(config);SMDR supports end-to-end training with ternary weights using the Straight-Through Estimator (STE):
use smdr::{TrainableConfig, SMDRTrainer};
// Configure trainable model
let config = TrainableConfig {
d_model: 64,
n_heads: 4,
n_layers: 2,
d_ff: 256,
vocab_size: 128,
max_seq_len: 256,
ternary_threshold: 0.3,
dropout: 0.0,
tonnetz_radius: 2.0,
tonnetz_alpha: 1.0, // Enable topology (0.0 to disable)
};
// Create trainer with Adam optimizer + warmup
let mut trainer = SMDRTrainer::new(config, 1e-3, warmup_steps, total_steps);
// Training loop
for epoch in 0..epochs {
for (inputs, targets) in batches {
trainer.train_step(&inputs, &targets);
}
println!("Loss: {:.4}", trainer.avg_loss());
}Key features:
- Gradients flow through ternary quantization via STE
- Adam optimizer with configurable learning rate
- Cosine warmup scheduler
- Topology applied only to final 1/3 of layers ("layer_late" strategy)
Built-in evaluation metrics for model comparison:
use smdr::{Evaluator, EvalResults, train_test_split};
// Split data
let (train, test) = train_test_split(sequences, 0.2, 42);
// Evaluate model
let evaluator = Evaluator::new(test);
let results = evaluator.evaluate(&mut model);
results.print();
// Output:
// Loss: 2.3456
// Perplexity: 10.42
// Top-1 Accuracy: 25.00%
// Top-5 Accuracy: 65.00%Metrics:
- Perplexity - e^(loss), measures model uncertainty (lower is better)
- Top-1 Accuracy - Correct prediction rate
- Top-5 Accuracy - Target in top 5 predictions
Quick evaluation on Shakespeare text (topology vs no-topology):
| Metric | No Topology | With Topology | Improvement |
|---|---|---|---|
| Perplexity | 10.73 | 10.38 | 3.3% |
| Top-5 Accuracy | 63.2% | 67.2% | +4.0% |
Run your own comparison: cargo run --release --example quick_eval
Save and load models in multiple formats:
use smdr::{ModelWeights, ModelFormat};
// Save model
let weights = ModelWeights::from_model(&model);
weights.save("model.safetensors")?; // Auto-detects format
// Or specify format explicitly
weights.save_format("model.msgpack", ModelFormat::MessagePack)?;
// Load model
let loaded = ModelWeights::load("model.safetensors")?;
let model = loaded.to_model();Supported formats:
| Format | Extension | Description |
|---|---|---|
| Safetensors | .safetensors |
HuggingFace standard, memory-mapped |
| MessagePack | .msgpack |
Compact binary, cross-language |
| JSON | .json |
Human-readable, debugging |
| Bincode | .bin |
Rust native, fast |
Compare format efficiency:
use smdr::compare_formats;
compare_formats(&weights, "/tmp");
// Output: file sizes and load times for each formatSMDR emerged from experiments applying toroidal topology constraints to attention mechanisms in transformer models. Key findings:
| Model | Hallucination Reduction |
|---|---|
| Mistral-7B-Instruct | 67-80% |
| Phi-2 | 48% |
| TinyLlama | No effect |
| Gemma-2B | No effect |
The "layer_late" strategy (applying topology only to final 1/3 of layers) proved optimal, achieving:
- 72% ± 6% average hallucination reduction
- Minimal accuracy loss on factual queries
- No additional inference cost
SMDR unifies three orthogonal dimensions of neural network optimization:
- Sparsity - SDR-inspired ~2% activation density
- Quantization - BitNet-style ternary weights
- Topology - Tonnetz-based attention geometry
The combination creates a "semantic manifold" where related concepts cluster naturally, reducing hallucination through geometric constraint rather than additional training.
SMDR dramatically reduces the carbon footprint of AI inference:
┌────────────────────────────────────────────────────────────────┐
│ CARBON EFFICIENCY STACK │
├────────────────────────────────────────────────────────────────┤
│ Rust (vs Python) │ 10-40x more efficient │
│ Ternary weights (vs FP32)│ 10-20x less memory & compute │
│ Toroidal topology │ 2-5x less attention overhead │
│ Sparse activations (2%) │ 20-50x fewer operations │
├────────────────────────────────────────────────────────────────┤
│ COMBINED │ 100-500x energy reduction │
│ GPU REQUIRED │ NO - CPU-native operations │
└────────────────────────────────────────────────────────────────┘
| GPU Feature | SMDR Reality |
|---|---|
| Tensor cores (FP16 MAC) | No multiplications needed |
| HBM bandwidth | Weights fit in CPU cache |
| Dense parallelism | Sparse, irregular patterns |
| 300-700W TDP | 15-65W CPU sufficient |
| System | Energy/Inference | CO₂/year (1M/day) |
|---|---|---|
| Cloud GPU API | ~2000 J | ~365 tons |
| Local RTX 4090 | ~225 J | ~100 tons |
| SMDR (CPU) | ~1.5 J | ~0.3 tons |
We don't just reduce hallucinations - we reduce carbon emissions.
| Working Group | SMDR Contribution |
|---|---|
| WG 1 (Foundations) | Efficient on-chain inference |
| WG 3 (Smart Contracts) | Lightweight oracle computations |
| JWG 4 (Security) | Topology reduces hallucination attacks |
| AHG 4 (Carbon Markets) | Low-carbon AI verification |
UN SDG alignment: 7 (Clean Energy), 9 (Innovation), 13 (Climate Action)
See: ISO/TC 307
MIT
@software{smdr2026,
title = {SMDR: Sparse Meta Distributed Representations},
author = {Cormier, Sylvain},
year = {2026},
url = {https://github.com/sylvaincormier/smdr}
}