A fractal-geometric approach to quantifying the epistemological complexity of text as perceived by language models.
This repository contains the implementation for computing correlation dimension, a measure that bridges local and global properties of text generation in autoregressive language models. Unlike perplexity, correlation dimension captures long-range structural complexity and self-similarity patterns, revealing insights into model behavior, hallucination tendencies, and various forms of text degeneration.
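In brief, assuming the standard Grassberger-Procaccia formulation (the exact distance over next-token distributions follows the paper), the correlation integral over a trajectory of points $x_1, \dots, x_N$ is

$$
C(\varepsilon) = \frac{2}{N(N-1)} \sum_{i<j} \Theta\big(\varepsilon - d(x_i, x_j)\big),
$$

and the correlation dimension $\nu$ is the slope of $\log C(\varepsilon)$ versus $\log \varepsilon$ in the scaling region, i.e. $C(\varepsilon) \propto \varepsilon^{\nu}$. Here each point $x_t$ is the model's next-token probability distribution at position $t$ of the text.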
- NeurIPS 2025
  - Full Paper (arXiv)
  - Conference Page
- Physical Review Research (2024)
  - arXiv | APS Journal
- Efficient computation from next-token log-probability vectors (see the sketch after this list)
- Robust to model quantization (down to 4-bit precision)
- Applicable across autoregressive architectures (Transformer, Mamba, etc.)
- Real-time inference integration
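A minimal sketch of how such a computation could look, assuming a Fisher-Rao-style distance between next-token distributions and a log-log slope fit of the correlation integral. This is not the released implementation; the function names (`pairwise_fisher_rao`, `correlation_dimension`) and the choice of ε grid are illustrative only.

```python
import numpy as np

def pairwise_fisher_rao(logprobs: np.ndarray) -> np.ndarray:
    """Pairwise Fisher-Rao distances between rows of a (T, V) log-probability matrix."""
    sqrt_p = np.exp(0.5 * logprobs)             # square roots of the probabilities
    bc = np.clip(sqrt_p @ sqrt_p.T, 0.0, 1.0)   # Bhattacharyya coefficients
    return 2.0 * np.arccos(bc)                  # geodesic distance on the probability simplex

def correlation_integral(dists: np.ndarray, eps: np.ndarray) -> np.ndarray:
    """Fraction of point pairs within distance eps, i.e. C(eps)."""
    pair_d = dists[np.triu_indices_from(dists, k=1)]
    return np.array([(pair_d <= e).mean() for e in eps])

def correlation_dimension(logprobs: np.ndarray, n_eps: int = 30) -> float:
    """Estimate the correlation dimension as the log-log slope of C(eps) vs. eps."""
    dists = pairwise_fisher_rao(logprobs)
    pair_d = dists[np.triu_indices_from(dists, k=1)]
    pair_d = pair_d[pair_d > 0]
    eps = np.geomspace(pair_d.min(), pair_d.max(), n_eps)
    c = correlation_integral(dists, eps)
    mask = c > 0
    slope, _ = np.polyfit(np.log(eps[mask]), np.log(c[mask]), 1)
    return slope
```

With `logprobs` of shape `(T, V)` (one next-token log-probability vector per token position), `correlation_dimension(logprobs)` returns the fitted slope as the dimension estimate.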
The following figure shows correlation integral curves for various pre-trained language models on the "Newton's Philosophy" article from the Stanford Encyclopedia of Philosophy, compared to i.i.d. Gaussian noise:
Models shown: GPT2-1.5B, Mamba-2.8B, Pythia-12B, Falcon3-10B, OpenLlama-13B, Yi1.5-34B
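For illustration, a comparison of this kind can be sketched as follows; this is not the paper's exact experimental protocol, and the noise dimensionality, Euclidean baseline distance, and ε grid are assumptions. `model_dists` is a pairwise-distance matrix such as the one produced by `pairwise_fisher_rao` above.

```python
import numpy as np
import matplotlib.pyplot as plt

def correlation_curve(dists: np.ndarray, n_eps: int = 50):
    """Return (eps, C(eps)) from a symmetric pairwise-distance matrix."""
    pair_d = dists[np.triu_indices_from(dists, k=1)]
    eps = np.geomspace(pair_d[pair_d > 0].min(), pair_d.max(), n_eps)
    return eps, np.array([(pair_d <= e).mean() for e in eps])

def plot_vs_gaussian(model_dists: np.ndarray, noise_dim: int = 64, seed: int = 0) -> None:
    """Log-log plot of the model's curve against an i.i.d. Gaussian reference."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((model_dists.shape[0], noise_dim))
    noise_dists = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    for name, d in [("language model", model_dists), ("i.i.d. Gaussian", noise_dists)]:
        eps, c = correlation_curve(d)
        plt.loglog(eps, c, label=name)
    plt.xlabel(r"$\varepsilon$")
    plt.ylabel(r"$C(\varepsilon)$")
    plt.legend()
    plt.show()
```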
Code release coming soon.
