kduxin/corrdim

Correlation Dimension of Autoregressive LLMs

Paper · NeurIPS · Homepage

A fractal-geometric approach to quantifying the epistemological complexity of text as perceived by language models.

This repository contains the implementation for computing correlation dimension, a measure that bridges local and global properties of text generation in autoregressive language models. Unlike perplexity, correlation dimension captures long-range structural complexity and self-similarity patterns, revealing insights into model behavior, hallucination tendencies, and various forms of text degeneration.

Quick Links

πŸ“š Publications

πŸ”— Resources

Features

  • Efficient computation using next-token log-probability vectors
  • Robust to model quantization (stable down to 4-bit precision)
  • Applicable across autoregressive architectures (Transformer, Mamba, etc.)
  • Real-time inference integration
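Since the official code is not yet released, here is a minimal, generic Grassberger–Procaccia-style sketch of the quantity involved. It is not the paper's implementation: it assumes the text has already been mapped to a sequence of vectors (e.g., next-token log-probability vectors) and estimates the correlation dimension as the log-log slope of the correlation integral over a chosen scaling range.

```python
import numpy as np

def correlation_integral(points, radii):
    """C(r): fraction of distinct point pairs within Euclidean distance r."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    pair_dists = dists[np.triu_indices(len(points), k=1)]
    return np.array([(pair_dists < r).mean() for r in radii])

def correlation_dimension(points, r_min, r_max, n_radii=16):
    """Slope of log C(r) vs. log r over the scaling range [r_min, r_max]."""
    radii = np.geomspace(r_min, r_max, n_radii)
    C = correlation_integral(points, radii)
    mask = C > 0  # drop radii with no pairs before taking logs
    slope, _ = np.polyfit(np.log(radii[mask]), np.log(C[mask]), deg=1)
    return slope
```

As a sanity check, i.i.d. Gaussian points in 2-D should yield an estimate near 2 at small radii; the choice of scaling range `[r_min, r_max]` is the usual practical subtlety.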

Example: Correlation Integral Curves

The following figure shows correlation integral curves for various pre-trained language models on the "Newton's Philosophy" article from the Stanford Encyclopedia of Philosophy, compared to i.i.d. Gaussian noise:

Correlation Integral Curves

Models shown: GPT2-1.5B, Mamba-2.8B, Pythia-12B, Falcon3-10B, OpenLlama-13B, Yi1.5-34B
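The qualitative contrast in the figure can be reproduced with purely synthetic data: i.i.d. Gaussian noise produces a correlation-integral slope near the ambient dimension, while points lying on a low-dimensional structure produce a much smaller slope. The sketch below is illustrative only (the curve construction is invented for the demo and is unrelated to any language model):

```python
import numpy as np

def corr_integral(X, radii):
    """Fraction of distinct point pairs closer than each radius r."""
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    pair_dists = dists[np.triu_indices(len(X), k=1)]
    return np.array([(pair_dists < r).mean() for r in radii])

rng = np.random.default_rng(0)
noise = rng.standard_normal((1000, 5))   # i.i.d. Gaussian noise in 5-D
t = rng.uniform(0.0, 2 * np.pi, 1000)    # a 1-D closed curve embedded in 5-D
curve = np.stack([np.cos(t), np.sin(t),
                  np.cos(2 * t), np.sin(2 * t), np.cos(3 * t)], axis=1)

radii = np.geomspace(0.4, 1.6, 10)
slopes = {}
for name, X in [("noise", noise), ("curve", curve)]:
    C = corr_integral(X, radii)
    slopes[name] = np.polyfit(np.log(radii), np.log(C), deg=1)[0]
print(slopes)  # the noise slope is several times the curve slope
```

On log-log axes the two correlation-integral curves have visibly different slopes, which is the gap the figure above highlights between model-generated trajectories and i.i.d. Gaussian noise.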


Code release coming soon. πŸš€
