Official implementation of the ICLR 2026 Workshop "Recursive Self-Improvement" paper "Depth vs Recursion: Outperforming Transformers in Jigsaw Reconstruction".
TL;DR: Tiny Recursive Models (TRM) solve jigsaw puzzles that vanilla Transformers cannot, using the same ~0.7M parameter budget — by iteratively refining a latent "thought" vector instead of adding more layers.
*(Figure: puzzle reconstructions at successive thinking steps t = 0, t = 1, t = 2, t = 3.)*
We benchmark Tiny Recursive Models (TRM) against standard Encoder-Only Transformers (EOT) on jigsaw puzzle reconstruction across grid sizes from 2×2 to 6×6. Key findings:
- Both architectures perform comparably on simple grids (≤3×3, ~95% accuracy)
- EOT performance collapses on complex grids; TRM maintains 94.15% accuracy on 5×5
- Increasing EOT depth does not recover performance — deeper EOTs fail to converge on 5×5 and 6×6
- TRM exhibits "abrupt learning" phase transitions, delayed predictably as puzzle complexity grows
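The iterative refinement described in the TL;DR can be sketched as a nested loop. This is an illustrative sketch, not the repository's API: `f` (latent update) and `g` (answer update) stand in for the tiny shared network, and the loop structure follows the `s`/`t`/`n` parameters described below.

```python
def trm_infer(x, y, z, f, g, s=8, t=2, n=3):
    """Sketch of TRM's nested inference loop (names are illustrative).

    f: updates the latent "thought" z from (x, y, z)
    g: refines the answer y from (y, z)
    s: macro-steps (each receives deep supervision during training)
    t: outer "thinking" iterations per macro-step
    n: inner latent-refinement iterations per outer iteration
    """
    for _ in range(s):           # macro-steps (deep supervision)
        for _ in range(t):       # outer "thinking" iterations
            for _ in range(n):   # inner latent updates
                z = f(x, y, z)
            y = g(y, z)          # refine the answer from the latent
    return y
```

With the defaults above, the latent is updated `s * t * n = 48` times while the answer is refined `s * t = 16` times, all with the same small network rather than additional layers.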
```
.
├── dataset.py        # ImagePuzzle dataset logic
├── model.py          # Model definitions
├── puzzle.py         # Fisher–Yates permutation encoding and decoding
├── utils.py          # Factory functions and training-step logic
├── main.py           # Training entry point
└── requirements.txt
```
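`puzzle.py` handles Fisher–Yates permutation encoding and decoding; its exact interface isn't shown here, but a common scheme is to represent a tile permutation by the sequence of swap indices a Fisher–Yates shuffle would use to produce it. A minimal sketch under that assumption:

```python
def encode(p):
    """Encode permutation p as Fisher-Yates swap indices code[1..n-1],
    where step i swaps positions i and code[i] (0 <= code[i] <= i)."""
    n = len(p)
    a = list(range(n))
    pos = list(range(n))          # pos[v] = current index of value v in a
    code = [0] * n
    for i in range(n - 1, 0, -1):
        j = pos[p[i]]             # where the value destined for slot i sits
        code[i] = j
        a[i], a[j] = a[j], a[i]
        pos[a[i]], pos[a[j]] = i, j
    return code

def decode(code):
    """Replay the recorded swaps on the identity to recover the permutation."""
    n = len(code)
    a = list(range(n))
    for i in range(n - 1, 0, -1):
        j = code[i]
        a[i], a[j] = a[j], a[i]
    return a
```

Encoding and decoding are inverse operations, so `decode(encode(p)) == p` for any permutation `p`.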
```
pip install -r requirements.txt
```

Create a `.env` file in the project root:
```
DATASETS=/path/to/your/datasets
```

Alternatively, set the `DATASETS` variable directly in `utils.py`.
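How `utils.py` reads the variable isn't shown; one dependency-free way to load simple `KEY=VALUE` pairs from a `.env` file (a hypothetical helper, not the repository's code) is:

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Existing environment variables take precedence (setdefault), so an
    exported DATASETS is not overwritten by the file.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Libraries such as `python-dotenv` provide the same behavior via `load_dotenv()`.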
The code expects the COCO dataset at:

```
$DATASETS/COCO/2017/
├── train/
└── test/
```
```
python -m main
```

Runs are saved to `run/<name>/` with the following structure:
```
run/trm-P5-T16-D128-F512-H8-L2-s8-t2-n3-coco/
├── config.json
├── train/
│   ├── 00.pt
│   └── ...
└── test/
    ├── 00.pt
    └── ...
```
Each checkpoint contains logits, labels, and (for training checkpoints) model and optimizer state dicts.
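Since each checkpoint stores logits and labels, per-tile accuracy can be recomputed offline. A hedged sketch: the checkpoints are PyTorch files (loaded with `torch.load`), and the key names and shapes assumed here may differ from the actual files.

```python
import numpy as np

def checkpoint_accuracy(logits, labels):
    """Per-tile accuracy from a checkpoint's logits/labels.

    Assumes logits of shape (batch, tiles, classes) and integer labels of
    shape (batch, tiles). In practice these would come from something like
    ckpt = torch.load("run/<name>/test/00.pt"), with array names depending
    on the checkpoint format.
    """
    preds = np.asarray(logits).argmax(axis=-1)
    return float((preds == np.asarray(labels)).mean())
```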
| Parameter | Description |
|---|---|
| `puzzle_size` | Grid size (e.g. 5 → 5×5 puzzle) |
| `tile_size` | Patch size in pixels (default: 16) |
| `model_dim` | Transformer hidden dimension |
| `layer_num` | Number of Transformer layers |
| `s` | Number of macro-steps of deep supervision (TRM only) |
| `t` | Number of outer "thinking" iterations per macro-step (TRM only) |
| `n` | Number of inner "thinking" iterations per outer loop (TRM only) |
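These parameters also appear in the run directory name shown above (`trm-P5-T16-D128-F512-H8-L2-s8-t2-n3-coco`). A sketch of how that pattern could be composed; the `F` (feed-forward dimension) and `H` (attention heads) fields are inferred from the example name rather than documented in the table:

```python
def run_name(puzzle_size, tile_size, model_dim, ffn_dim, head_num,
             layer_num, s, t, n, dataset="coco", model="trm"):
    """Compose a run directory name matching the observed pattern.

    ffn_dim (F) and head_num (H) are assumed meanings inferred from the
    example run name, not confirmed by the parameter table.
    """
    return (f"{model}-P{puzzle_size}-T{tile_size}-D{model_dim}-F{ffn_dim}"
            f"-H{head_num}-L{layer_num}-s{s}-t{t}-n{n}-{dataset}")
```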
For EOT variants, the numeric index denotes the number of layers; for TRM variants, it denotes the number of macro-steps.








