Why Geometry Prevents Hallucination
Sylvain Cormier | Paraxiom Research | January 2026
Residual geometry determines whether reasoning is stable. We show that transformer latent dynamics, operating on unconstrained vector spaces, lack the conserved quantities necessary for bounded inference, and we establish a hierarchy of sufficient conditions that restore them:
mHC (Birkhoff) ⊂ ERLHS (Hamiltonian) ⊂ Karmonic (Toroidal + Spectral)
The practical consequence—reduced drift, and thereby reduced hallucination—follows from the geometry when these conditions are satisfied.
We argue that hallucination is not a training data problem, an alignment failure, or an inherent limitation of autoregressive generation. Hallucination is a geometry problem: unconstrained latent dynamics permit arbitrary drift through latent space.
| Level | Adds | Solves |
|---|---|---|
| mHC (Birkhoff polytope) | Bounded mixing | Training stability |
| ERLHS (Hamiltonian) | Conserved flow | Inference coherence |
| Karmonic (Toroidal + Spectral) | Spectral gap | Noise suppression |
Modes that align with the manifold's eigenstructure persist under repeated composition. Non-resonant modes decay as e^(-λt).
Epistemic boundary: Spectral alignment filters, stabilizes, and selects. It does not alone guarantee semantic correctness. A resonant mode may be stably wrong.
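To make the decay claim concrete: under a discrete diffusion step x ← (I − εL)x on a graph with Laplacian L, the component of x along an eigenvector with eigenvalue λ shrinks by a factor (1 − ελ) per step, roughly e^(−ελt) over t steps, while the λ = 0 (resonant) mode is preserved exactly. A minimal numpy sketch, not taken from the experiment; the 8×8 torus and step size ε = 0.1 are illustrative:

```python
import numpy as np

def torus_laplacian(n):
    """Combinatorial Laplacian of an n x n torus grid (4-neighbour)."""
    N = n * n
    A = np.zeros((N, N))
    for i in range(n):
        for j in range(n):
            u = i * n + j
            for di, dj in [(1, 0), (0, 1)]:
                v = ((i + di) % n) * n + (j + dj) % n
                A[u, v] = A[v, u] = 1.0
    return np.diag(A.sum(axis=1)) - A

L = torus_laplacian(8)
lam, V = np.linalg.eigh(L)            # lam[0] = 0 is the constant (resonant) mode

eps, steps = 0.1, 50
x = np.random.randn(L.shape[0])
for _ in range(steps):
    x = x - eps * (L @ x)             # one diffusion / smoothing step

coeffs = V.T @ x                      # remaining amplitude per eigenmode
print("constant-mode amplitude :", abs(coeffs[0]))    # unchanged
print("highest-mode amplitude  :", abs(coeffs[-1]))   # shrunk by ~(1 - eps*lam[-1])**steps
```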
- Model: 2-layer transformer, d_model=64, 4 attention heads
- Task: Next-token prediction on sequences with controlled semantic drift (Tonnetz-adjacent valid continuations)
- Conditions: Baseline, mHC, Toroidal, Random (negative control; see the mask-matching sketch below)
- Hardware: CPU only (~4 minutes total)
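The random condition is only a meaningful negative control if it matches the toroidal mask's sparsity while discarding its topology. One way such matched masks could be built; this is an illustrative sketch, the actual construction lives in experiments/tonnetz_validation.py and may differ:

```python
import numpy as np

def toroidal_mask(n, r=1):
    """Allow attention between positions whose wrap-around L1 distance on an
    n x n torus is at most r (plain 4-neighbour torus here, for brevity)."""
    coords = np.array([(i, j) for i in range(n) for j in range(n)])
    d = np.abs(coords[:, None, :] - coords[None, :, :])
    d = np.minimum(d, n - d).sum(-1)
    return d <= r

def random_mask(n, n_edges, seed=0):
    """Negative control: same number of allowed off-diagonal pairs, no topology."""
    N = n * n
    rng = np.random.default_rng(seed)
    mask = np.eye(N, dtype=bool)                       # keep self-attention
    pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
    for k in rng.choice(len(pairs), size=n_edges, replace=False):
        mask[pairs[k]] = True
    return mask

toro = toroidal_mask(8, r=1)
rand = random_mask(8, n_edges=int(toro.sum()) - 64)    # match off-diagonal count
print("toroidal density:", toro.mean(), "| random density:", rand.mean())
```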
============================================================
RESULTS SUMMARY
============================================================
Condition | Final Drift | Final Coh.Var | Grad Norm
------------------------------------------------------------
baseline | 0.0100 | 35.76 | 0.27
mhc | 0.0133 | 1010.54 | 1.60
toroidal | 0.0060 | 41.93 | 0.22
random | 0.1673 | 113.88 | 0.78
| Metric | Winner | Interpretation |
|---|---|---|
| Drift Rate | Toroidal (0.006) | 40% lower than baseline, 96% lower than random |
| Grad Norm | Toroidal (0.22) | Most stable training |
| Coherence Var | Baseline (35.8) | Lowest raw variance; mHC exploded to 1010 |
Random graph masking (same sparsity, no topological structure) has drift rate 0.167 vs toroidal's 0.006.
That's a 28x difference.
This comparison shows:
- It's not "any constraint" that works
- It's specifically topological structure
- Sparsity alone is insufficient; geometry is necessary
- Toroidal constraint reduces long-range semantic jumps under a topology-aligned task
- mHC increases drift slightly despite being more "regularized" — confirms that constraint ≠ structure
- Gradient stability improves under local topological constraints but degrades under global doubly-stochastic coupling
- Baseline minimizes raw hidden-state variance but does not prevent semantic drift; toroidal attention trades a small increase in variance for a substantial reduction in drift
The catastrophic coherence variance under mHC (1010 vs ~40) suggests that doubly-stochastic constraints without spectral or geometric locality introduce global coupling instabilities.
Note: Absolute values are task- and scale-dependent; we report relative trends across conditions.
topological-coherence/
├── cormier_topological_coherence_2026.pdf # Paper (15 pages)
├── cormier_topological_coherence_2026.tex # LaTeX source
├── docs/
│ ├── UNIFIED_THEORY.md # Cross-domain unified theory
│ └── diagrams/ # SVG diagrams
├── experiments/
│ ├── tonnetz_validation.py # Minimal validation experiment
│ └── venv/ # Python environment (not tracked)
├── src/topological_coherence/ # PyPI package source
├── README.md # This file
└── LICENSE # Apache 2.0
- Python 3.8+
- ~500MB disk space for PyTorch
cd experiments
python3 -m venv venv
source venv/bin/activate
pip install torch numpy
python tonnetz_validation.py
Expected runtime: ~4 minutes on CPU (no GPU required)
The experiment trains 4 models (baseline, mHC, toroidal, random) and reports the following metrics (sketched in code after this list):
- Drift rate (lower = better semantic coherence)
- Coherence variance (hidden state stability)
- Gradient norm (training stability)
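These three quantities are simple statistics of hidden states and gradients. A plausible set of definitions is sketched below; the names and exact formulas are illustrative, and the authoritative versions are in experiments/tonnetz_validation.py:

```python
import torch

def drift_rate(hidden):
    """hidden: (seq_len, d_model). Mean step-to-step displacement of the
    hidden state, normalised by the mean hidden-state norm."""
    steps = hidden[1:] - hidden[:-1]
    return (steps.norm(dim=-1).mean() / hidden.norm(dim=-1).mean()).item()

def coherence_variance(hidden):
    """Variance of hidden-state norms across the sequence."""
    return hidden.norm(dim=-1).var().item()

def grad_norm(model):
    """Global L2 norm of all parameter gradients after loss.backward()."""
    sq = sum((p.grad ** 2).sum() for p in model.parameters() if p.grad is not None)
    return torch.sqrt(sq).item()
```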
The Tonnetz is a 2D torus where:
- Horizontal edges connect by perfect fifths
- Vertical edges connect by major thirds
- Diagonal edges connect by minor thirds
We use it as a constructive existence proof of a low-genus manifold with constant spectral gap—not as a claim about semantic universals.
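For concreteness, the Tonnetz neighbourhood can be generated from a wrap-around grid in which one axis steps by a perfect fifth (7 semitones) and the other by a major third (4 semitones), so the (+1, −1) diagonal steps by a minor third (3 semitones). A small sketch; the 12×12 grid size and note naming are illustrative:

```python
def tonnetz_neighbors(i, j, n):
    """Neighbours of node (i, j) on an n x n toroidal Tonnetz grid:
    +/-1 along i = perfect fifth, +/-1 along j = major third,
    the (+1,-1)/(-1,+1) diagonal = minor third (7 - 4 = 3 semitones)."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
    return [((i + di) % n, (j + dj) % n) for di, dj in steps]

def pitch_class(i, j):
    """Pitch class reached by i fifths and j major thirds up from C."""
    return (7 * i + 4 * j) % 12

# Example: the six neighbours of C at (0, 0) are G, F, E, Ab, Eb, A
names = "C C# D Eb E F F# G Ab A Bb B".split()
print([names[pitch_class(i, j)] for i, j in tonnetz_neighbors(0, 0, 12)])
```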
For a d-dimensional torus T^d_N:
λ₁ = 2 - 2cos(2π/N) = Θ(1)
for fixed side length N, independent of total nodes N^d.
Important caveat: This holds for fixed torus side length N. Scaling N reintroduces gap decay as O(1/N²).
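The closed form can be checked numerically: the Laplacian of T^d_N is the Kronecker sum of d cycle Laplacians, so its smallest nonzero eigenvalue equals the cycle gap 2 − 2cos(2π/N). A quick numpy check for d = 2, N = 8, with values chosen only for illustration:

```python
import numpy as np

def cycle_laplacian(N):
    """Laplacian of the N-node cycle graph C_N."""
    A = np.zeros((N, N))
    for i in range(N):
        A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

N = 8
Lc = cycle_laplacian(N)
I = np.eye(N)
# Laplacian of the 2D torus T^2_N is the Kronecker sum of two cycle Laplacians;
# its spectrum is all pairwise sums of cycle eigenvalues.
Lt = np.kron(Lc, I) + np.kron(I, Lc)
gap = np.sort(np.linalg.eigvalsh(Lt))[1]          # smallest nonzero eigenvalue
print(gap, 2 - 2 * np.cos(2 * np.pi / N))         # both ~0.5858 for N = 8
```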
Standard transformer components (LayerNorm, softmax temperature, multi-head averaging) provide some implicit spectral filtering. However, none impose topological constraints—they operate pointwise or via soft weighting, not via manifold structure. They smooth without providing a conserved quantity or spectral gap guarantee.
The distinction is between ad-hoc regularization (which helps) and geometric constraint (which bounds).
@misc{cormier2026topological,
author = {Cormier, Sylvain},
title = {Topological Constraints for Coherent Language Models: Why Geometry Prevents Hallucination},
year = {2026},
publisher = {Zenodo},
url = {https://github.com/Paraxiom/topological-coherence}
}

| Paper | Topic | Link |
|---|---|---|
| Unified Theory | Conservative composition across ML, blockchain, consensus | docs/UNIFIED_THEORY.md |
| ERLHS | Hamiltonian framework for coherence-preserving ML | DOI: 10.5281/zenodo.17928909 |
| Karmonic Mesh | Spectral consensus on toroidal manifolds | DOI: 10.5281/zenodo.17928991 |
| mHC | Manifold-Constrained Hyper-Connections | arXiv:2512.24880 |
| Graph Signal Processing | Spectral methods on graphs | Shuman et al., 2013 |
M_Tonnetz(i, j) = 1                           if d_Tonnetz(i, j) ≤ r
M_Tonnetz(i, j) = exp(-α · d_Tonnetz(i, j))   otherwise
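In code, assuming a precomputed matrix of pairwise Tonnetz distances, one way to apply this mask is as an additive log-bias on the attention logits; the radius r and decay α below are placeholders:

```python
import numpy as np

def tonnetz_attention_bias(dist, r=1.0, alpha=1.0):
    """Soft mask M(i, j): 1 inside radius r, exponentially decaying with
    Tonnetz distance outside it; returned as log M to add pre-softmax."""
    M = np.where(dist <= r, 1.0, np.exp(-alpha * dist))
    return np.log(M)

# Toy usage with a 4-node distance matrix (illustrative only)
dist = np.array([[0, 1, 2, 3],
                 [1, 0, 1, 2],
                 [2, 1, 0, 1],
                 [3, 2, 1, 0]], dtype=float)
print(np.round(np.exp(tonnetz_attention_bias(dist)), 3))
```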
φ_θ(e) = ( σ(W₁e) mod 1, σ(W₂e) mod 1 )
L_topo = E[(a,b)~co-occur][d_T(φ(a), φ(b))] - λ · E[(a,c)~random][d_T(φ(a), φ(c))]
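A minimal PyTorch sketch of both formulas, using a wrap-around distance on the unit torus; the linear maps W₁, W₂, the weight λ, and the pair-sampling scheme are placeholders, not the repository's implementation:

```python
import torch
import torch.nn as nn

class ToroidalEmbed(nn.Module):
    """phi_theta(e) = (sigma(W1 e) mod 1, sigma(W2 e) mod 1):
    maps token embeddings onto the unit 2-torus [0, 1)^2."""
    def __init__(self, d_model):
        super().__init__()
        self.W1 = nn.Linear(d_model, 1)
        self.W2 = nn.Linear(d_model, 1)

    def forward(self, e):
        # sigmoid already lies in (0, 1); the mod mirrors the formula
        return torch.cat([torch.sigmoid(self.W1(e)) % 1.0,
                          torch.sigmoid(self.W2(e)) % 1.0], dim=-1)

def torus_distance(u, v):
    """Wrap-around L2 distance on the unit torus."""
    diff = torch.abs(u - v)
    diff = torch.minimum(diff, 1.0 - diff)
    return diff.norm(dim=-1)

def l_topo(phi_a, phi_b, phi_c, lam=1.0):
    """Pull co-occurring pairs (a, b) together; push random pairs (a, c) apart."""
    return torus_distance(phi_a, phi_b).mean() - lam * torus_distance(phi_a, phi_c).mean()
```

Minimising L_topo pulls co-occurring tokens close on the torus and pushes random pairs apart, which is what makes the Tonnetz-distance mask above meaningful for learned embeddings.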
- Embedding complexity: Mapping tokens to Tonnetz positions requires learning or heuristics
- Recall-coherence tradeoff: Suppressing long-range attention may hurt tasks requiring non-local retrieval
- Task dependence: Optimal radius r and decay rate α are task-dependent
- Scale: Results shown on toy model; validation at scale is future work
- Scale to larger models (7B+ parameters)
- Evaluate on standard benchmarks (TruthfulQA, HaluEval)
- Compare with other geometric constraints (hyperbolic, spherical)
- Develop efficient Tonnetz embedding algorithms
- Investigate task-dependent optimal topology
Apache 2.0
- Author: Sylvain Cormier
- Email: sylvain@paraxiom.org
- Organization: Paraxiom Research
"Geometric constraints provide one principled path to coherent artificial intelligence—not the only path, but a formally grounded one."