
LOCI: A Benchmark for Synthetic Concept Induction During Evaluation

LOCI (Learning Ongoingly through Concept Induction) is a contamination-resistant benchmark measuring in-context concept formation in language models. It was developed for the Kaggle "Measuring Progress Toward AGI" competition (Learning track).

What LOCI Tests

Each episode presents a model with six labeled examples from a synthetic world and asks it to classify eight unlabeled queries. A hidden category is defined by an exact symbolic rule, but the model never sees the rule or the real attribute names — all categorical attributes and values are replaced with episode-local nonce tokens. The only usable information is the relational structure inside the episode.

This design ensures that pretraining recall is useless: the model must induce the concept from local evidence.
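As a rough illustration of this design, the sketch below generates an episode with episode-local nonce tokens and a hidden rule. All names and rule shapes here are hypothetical simplifications; the reference generators live in data_generation/.

```python
import random
import string

def nonce(rng, length=5):
    # Episode-local nonce token, e.g. "qzfmt" -- fresh every episode,
    # so pretraining knowledge about real categories cannot help.
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def make_episode(seed=0, n_support=6, n_query=8, n_attrs=3, n_values=3):
    rng = random.Random(seed)
    # Nonce names for attributes and their possible values.
    attrs = [nonce(rng) for _ in range(n_attrs)]
    values = {a: [nonce(rng) for _ in range(n_values)] for a in attrs}
    # Hidden rule: here, a simple conjunction over two attribute-value
    # pairs (the benchmark also uses disjunctions and other rule types).
    a1, a2 = rng.sample(attrs, 2)
    def rule(item):
        return item[a1] == values[a1][0] and item[a2] == values[a2][0]

    def sample_item():
        return {a: rng.choice(values[a]) for a in attrs}

    support = [(it, rule(it)) for it in (sample_item() for _ in range(n_support))]
    queries = [sample_item() for _ in range(n_query)]
    return support, queries, rule
```

The model only ever sees the labeled support items and the unlabeled queries; the rule itself stays hidden.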

Three Tasks

Task                        What It Measures
Core Acquisition            Single-turn concept induction from 6 support examples
Hard-Split Generalization   Low-ambiguity subset with zero support-consistent competitor rules
Delayed Retention           Same concepts, but with 3 distractor turns inserted before queries

Key Findings

  • Disjunctions are structurally hardest for every model and for a deterministic structured baseline — models systematically collapse OR rules into conjunction-like hypotheses
  • Current frontier models reach up to 83.1% query accuracy but only 53.8% exact-episode rate on core acquisition
  • A deterministic hypothesis-testing baseline outperforms all six evaluated models
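A hypothesis-testing baseline of the kind mentioned above can be sketched as follows: enumerate a fixed space of candidate rules over the attribute-value pairs seen in the support set, keep only those consistent with all labeled examples, and classify each query by majority vote among the survivors. This is an illustrative reconstruction, not the repository's exact reference implementation.

```python
from itertools import combinations

def candidate_rules(support):
    # Hypothesis space: single literals, plus pairwise ANDs and ORs
    # over attribute-value pairs observed in the support examples.
    items = [item for item, _ in support]
    literals = sorted({(a, v) for item in items for a, v in item.items()})
    rules = [("lit", ((a, v),)) for a, v in literals]
    for p, q in combinations(literals, 2):
        rules.append(("and", (p, q)))
        rules.append(("or", (p, q)))
    return rules

def applies(rule, item):
    kind, lits = rule
    hits = [item.get(a) == v for a, v in lits]
    return all(hits) if kind in ("lit", "and") else any(hits)

def predict(support, query):
    # Keep hypotheses consistent with every labeled support example,
    # then classify the query by majority vote among the survivors.
    consistent = [r for r in candidate_rules(support)
                  if all(applies(r, it) == lbl for it, lbl in support)]
    if not consistent:
        return False  # fall back when nothing in the space fits
    votes = sum(applies(r, query) for r in consistent)
    return votes * 2 >= len(consistent)
```

Because the rule space is enumerated explicitly, disjunctions are represented on equal footing with conjunctions, yet the paper's finding is that OR rules remain the hardest category even for this baseline.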

Repository Structure

LOCI_kaggle_tasks_v1_1/     # Benchmark task code and data
  data/                     # Evaluation splits (public_dev + private_test per task)
  data_generation/          # Reference generation scripts
  notebooks/                # Kaggle Benchmarks task notebooks
paper/v6/                   # Research paper (LaTeX source, figures, tables, backing data)
writeup/                    # Kaggle competition writeup

Evaluated Models

Gemini 2.5 Flash, Claude Sonnet 4.6, DeepSeek-R1, Claude Haiku 4.5, DeepSeek V3.2, Gemma 3 27B

Running the Benchmark

The benchmark runs on the Kaggle Benchmarks platform. See LOCI_kaggle_tasks_v1_1/README.md for setup instructions.

License

CC0 (as required by competition rules). See LICENSE.

Citation

If you use LOCI in your research, please cite:

@misc{bahrani2026loci,
  title={LOCI: A Benchmark for Synthetic Concept Induction During Evaluation},
  author={Bahrani, Homam},
  year={2026}
}
