LOCI (Learning Ongoingly through Concept Induction) is a contamination-resistant benchmark measuring in-context concept formation in language models. It was developed for the Kaggle "Measuring Progress Toward AGI" competition (Learning track).
Each episode presents a model with six labeled examples from a synthetic world and asks it to classify eight unlabeled queries. A hidden category is defined by an exact symbolic rule, but the model never sees the rule or the real attribute names — all categorical attributes and values are replaced with episode-local nonce tokens. The only usable information is the relational structure inside the episode.
This design ensures that pretraining recall is useless: the model must induce the concept from local evidence.
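To make the setup concrete, here is a minimal sketch of what a LOCI-style episode looks like. All names and the generation scheme are illustrative assumptions, not the benchmark's actual schema: attribute names and values are episode-local nonce strings, and a hidden single-attribute equality rule assigns labels.

```python
import random
import string

def nonce(rng):
    """An episode-local nonce token, e.g. 'zirk' (illustrative scheme)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(4))

def make_episode(rng=None):
    """Sketch of one episode: nonce attributes/values plus a hidden rule.
    Field names ('support', 'queries') and sizes mirror the README, but
    the real generator lives in data_generation/."""
    rng = rng or random.Random(0)
    attrs = [nonce(rng) for _ in range(3)]                # nonce attribute names
    values = {a: [nonce(rng) for _ in range(3)] for a in attrs}
    target_attr = attrs[0]
    target_val = values[target_attr][0]                   # hidden rule: attr == value

    def sample(n):
        """Draw n random objects and label them with the hidden rule."""
        items = []
        for _ in range(n):
            obj = {a: rng.choice(values[a]) for a in attrs}
            items.append((obj, obj[target_attr] == target_val))
        return items

    return {"support": sample(6), "queries": sample(8)}
```

Because every token is freshly sampled per episode, a model cannot look up any of these strings in its pretraining data; only the within-episode relational structure carries signal.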
| Task | What It Measures |
|---|---|
| Core Acquisition | Single-turn concept induction from 6 support examples |
| Hard-Split Generalization | Low-ambiguity subset with zero support-consistent competitor rules |
| Delayed Retention | Same concepts, but with 3 distractor turns inserted before queries |
- Disjunctions are structurally the hardest rule type for every model and for a deterministic structured baseline; models systematically collapse OR rules into conjunction-like hypotheses
- Current frontier models reach up to 83.1% query accuracy but only 53.8% exact-episode rate on core acquisition
- A deterministic hypothesis-testing baseline outperforms all six evaluated models
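The deterministic baseline can be sketched as hypothesis testing over a candidate rule space. This is an illustrative simplification, not the paper's exact algorithm: it only enumerates single-attribute equality rules, keeps those consistent with every support example, and answers queries with the first surviving rule.

```python
def hypothesis_baseline(support, queries):
    """Illustrative hypothesis-testing baseline (simplified assumption:
    the rule space is single-attribute equality tests only).
    support: list of (obj_dict, bool_label); queries: list of obj_dict."""
    attrs = list(support[0][0].keys())
    vals = {a: sorted({obj[a] for obj, _ in support}) for a in attrs}
    # Candidate rules: "attribute a has value v".
    candidates = [(a, v) for a in attrs for v in vals[a]]
    # Keep rules that reproduce every support label exactly.
    consistent = [
        (a, v) for a, v in candidates
        if all((obj[a] == v) == label for obj, label in support)
    ]
    if not consistent:
        return [False] * len(queries)   # fall back when no rule survives
    a, v = consistent[0]
    return [obj[a] == v for obj in queries]
```

Under this framing, the Hard Split is the subset where exactly one candidate rule survives the support set, so a correct enumerator cannot be misled by competitor hypotheses.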
```
LOCI_kaggle_tasks_v1_1/   # Benchmark task code and data
data/                     # Evaluation splits (public_dev + private_test per task)
data_generation/          # Reference generation scripts
notebooks/                # Kaggle Benchmarks task notebooks
paper/v6/                 # Research paper (LaTeX source, figures, tables, backing data)
writeup/                  # Kaggle competition writeup
```
Gemini 2.5 Flash, Claude Sonnet 4.6, DeepSeek-R1, Claude Haiku 4.5, DeepSeek V3.2, Gemma 3 27B
The benchmark runs on the Kaggle Benchmarks platform. See LOCI_kaggle_tasks_v1_1/README.md for setup instructions.
CC0 (as required by competition rules). See LICENSE.
If you use LOCI in your research, please cite:
```bibtex
@misc{bahrani2026loci,
  title  = {LOCI: A Benchmark for Synthetic Concept Induction During Evaluation},
  author = {Bahrani, Homam},
  year   = {2026}
}
```