Depth vs Recursion: Outperforming Transformers in Jigsaw Reconstruction

Official implementation of the ICLR 2026 Workshop ''Recursive Self-Improvement'' paper ''Depth vs Recursion: Outperforming Transformers in Jigsaw Reconstruction''.

TL;DR: Tiny Recursive Models (TRM) solve jigsaw puzzles that vanilla Transformers cannot, using the same ~0.7M parameter budget — by iteratively refining a latent "thought" vector instead of adding more layers.



t = 0	t = 1	t = 2	t = 3

Overview

We benchmark Tiny Recursive Models (TRM) against standard Encoder-Only Transformers (EOT) on jigsaw puzzle reconstruction across grid sizes from 2×2 to 6×6. Key findings:

Both architectures perform comparably on simple grids (≤3×3, ~95% accuracy)
EOT performance collapses on complex grids; TRM maintains 94.15% accuracy on 5×5
Increasing EOT depth does not recover performance — deeper EOTs fail to converge on 5×5 and 6×6
TRM exhibits "abrupt learning" phase transitions, delayed predictably as puzzle complexity grows

Repository Structure

.
├── dataset.py        # ImagePuzzle dataset logic
├── model.py          # Model definitions
├── puzzle.py         # Fisher–Yates permutation encoding and decoding
├── utils.py          # Factory functions and training step logic
├── main.py           # Training entry point
└── requirements.txt

Setup

1. Install dependencies

pip install -r requirements.txt

2. Configure dataset path

Create a .env file in the project root:

DATASETS=/path/to/your/datasets

Or set the DATASETS variable at utils.py file.

The code expects the COCO dataset at:

$DATASETS/COCO/2017/
├── train/
└── test/

Training

python -m main

Runs are saved to run/<name>/ with the following structure:

run/trm-P5-T16-D128-F512-H8-L2-s8-t2-n3-coco/
├── config.json
├── train/
│   ├── 00.pt
│   └── ...
└── test/
    ├── 00.pt
    └── ...

Each checkpoint contains logits, labels, and (for training checkpoints) model and optimizer state dicts.

Key hyperparameters

Parameter	Description
`puzzle_size`	Grid size (e.g. `5` → 5×5 puzzle)
`tile_size`	Patch size in pixels (default: `16`)
`model_dim`	Transformer hidden dimension
`layer_num`	Number of Transformer layers
`s`	Number of macro-steps of deep-supervision (TRM only)
`t`	Number of outer "thinking" iterations per macro-step (TRM only)
`n`	Number of inner "thinking" iterations per outer loop (TRM only)

Results

Patch-level accuracy across grid sizes

Model	2×2	3×3	4×4	5×5	6×6
EOT 2	96.18	81.49	56.54	44.92	32.27
EOT 4	96.38	92.95	75.93	55.34	47.25
EOT 6	96.34	94.45	80.40	57.14	52.74
EOT 8	96.08	94.60	62.76	57.61	52.95
EOT 12	96.14	94.74	62.82	11.94	9.09
TRM 8	94.46	96.11	94.88	89.43	56.78
TRM 12	94.10	95.47	95.32	94.15	76.14

The Index of EOT shows the number of layers, while TRM means number of macro-steps.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Depth vs Recursion: Outperforming Transformers in Jigsaw Reconstruction

Overview

Repository Structure

Setup

1. Install dependencies

2. Configure dataset path

Training

Key hyperparameters

Results

Patch-level accuracy across grid sizes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
imgs		imgs
LICENSE		LICENSE
README.md		README.md
dataset.py		dataset.py
main.py		main.py
model.py		model.py
puzzle.py		puzzle.py
requirements.txt		requirements.txt
utils.py		utils.py

Folders and files

Latest commit

History

Repository files navigation

Depth vs Recursion: Outperforming Transformers in Jigsaw Reconstruction

Overview

Repository Structure

Setup

1. Install dependencies

2. Configure dataset path

Training

Key hyperparameters

Results

Patch-level accuracy across grid sizes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages