Implementation of RRT-LoRA (Relaxed Recursive Transformers with layer-wise LoRA, a method from Google DeepMind) on TinyLlama.

regular transformer, with separate parameters $\Phi^l$ at each of the $L$ layers:

$$h^l = f(h^{l-1}; \Phi^l)$$
recursive version with $L$ layers and $B$ blocks, tying each layer to one of $L/B$ shared parameter sets:

$$h^l = f(h^{l-1}; \Phi^\prime_{((l-1) \bmod L/B) + 1})$$
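To make the tying concrete, here is a minimal sketch of the index mapping (the helper name and example sizes are illustrative, not part of this repo's API):

```python
def shared_index(l: int, L: int, B: int) -> int:
    """1-based index of the shared parameter set that layer l reuses."""
    layers_per_block = L // B  # number of unique (shared) layers
    return (l - 1) % layers_per_block + 1

# e.g. L = 22 layers (TinyLlama) tied into B = 2 blocks:
print([shared_index(l, L=22, B=2) for l in range(1, 23)])
# [1, 2, ..., 11, 1, 2, ..., 11] -- the 11 shared layers are looped twice
```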
the rrt relaxes the strict tying with a low-rank, layer-specific correction. for each weight matrix at each layer:

$$W^l \approx W^\prime + B^l A^l$$

- $W^\prime$ is learned shared weights
- $B^l A^l$ is position-specific LoRA (initialized via SVD)
compute residuals between original and tied weights for each position:

$$R^l = W^l - W^\prime_{((l-1) \bmod L/B) + 1}$$

get initial LoRA weights via truncated SVD:

$$U_r^l, \Sigma_r^l, V_r^l = \text{TruncatedSVD}(R^l; r)$$

$$B^l = U_r^l \Sigma_r^l, \qquad A^l = (V_r^l)^T$$
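In PyTorch this initialization looks roughly like the following (a minimal sketch for 2-D weight matrices; the function name is hypothetical):

```python
import torch

def lora_init_from_residual(W_l: torch.Tensor, W_shared: torch.Tensor, r: int):
    """SVD-initialize LoRA factors so B @ A is the best rank-r approximation
    of the residual R^l = W^l - W' (hypothetical helper, not the repo's API)."""
    R = W_l - W_shared
    U, S, Vh = torch.linalg.svd(R, full_matrices=False)
    B = U[:, :r] * S[:r]  # B^l = U_r Sigma_r, shape (d_out, r)
    A = Vh[:r, :]         # A^l = (V_r)^T,     shape (r, d_in)
    return B, A
```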
during training:

- forward: $h = W^\prime x + B^l A^l x$ (see the sketch below)
- backward: update BOTH $W^\prime$ AND $B^l, A^l$
- $W^\prime$ learns the optimal shared representation
- $B^l, A^l$ learn position-specific adjustments

so the final learned mapping approximates the original per-layer weights:

$$W^l \approx W^\prime + B^l A^l$$
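A hedged sketch of one such layer (class and parameter names are illustrative, not necessarily the repo's):

```python
import torch
import torch.nn as nn

class RelaxedTiedLinear(nn.Module):
    """Shared base weight W' plus a position-specific LoRA correction B^l A^l.
    Both the shared and the LoRA parameters receive gradients."""

    def __init__(self, W_shared: nn.Parameter, B0: torch.Tensor, A0: torch.Tensor):
        super().__init__()
        self.W_shared = W_shared   # the same Parameter object is registered in
                                   # every tied layer, so it is genuinely shared
        self.B = nn.Parameter(B0)  # SVD-initialized, position-specific
        self.A = nn.Parameter(A0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W'x + B^l (A^l x)
        return x @ self.W_shared.T + (x @ self.A.T) @ self.B.T
```

Because `W_shared` is one `nn.Parameter` reused across tied positions, its gradient accumulates contributions from every layer that shares it, which is exactly the "update BOTH" behavior above.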
The method is from:

```bibtex
@inproceedings{Bae2025Relaxed,
  title={Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA},
  author={Sangmin Bae and Adam Fisch and Hrayr Harutyunyan and Ziwei Ji and Seungyeon Kim and Tal Schuster},
  booktitle={International Conference on Learning Representations},
  year={2025}
}
```

If you use this implementation in your research, please cite it as follows:
```bibtex
@software{avram_rrt_lora_2024,
  author = {Avram Djordjevic},
  title = {rrt-lora: An Implementation of Relaxed Recursive Transformers},
  year = {2024},
  url = {https://github.com/avramdj/rrt-lora}
}
```