A simple, clean, readable, and shape-annotated implementation of Attention Is All You Need in PyTorch. A sample ONNX file can be found in assets/transformer.onnx for visualization purposes.
It was tested on synthetic data; try to use the attention plots to figure out the transformation used to create the data!
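For context, an ONNX file like the one in assets/ can be produced with `torch.onnx.export`. The snippet below is a minimal sketch using `torch.nn.Transformer` as a stand-in model; the model, its constructor arguments, the dummy input sizes, and the output path are illustrative assumptions, not this repository's exact API or export script.

```python
import torch
import torch.nn as nn

# Stand-in model: nn.Transformer is used here only to demonstrate the export call;
# the file in assets/transformer.onnx was presumably produced from this repo's own model.
model = nn.Transformer(d_model=64, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, batch_first=True)
model.eval()

# Dummy batch_first inputs: (batch, seq_len, d_model)
src = torch.randn(1, 10, 64)
tgt = torch.randn(1, 10, 64)

torch.onnx.export(
    model,
    (src, tgt),
    "transformer.onnx",          # illustrative output path
    input_names=["src", "tgt"],
    output_names=["output"],
)
```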
- Positional embeddings are not included, similar to `nn.Transformer`, but you can find an implementation in `usage.ipynb` (see the sketch after this list).
- The parallel `MultiHeadAttention` outperforms the for-loop implementation significantly, as expected (a simplified sketch follows the list).
- Assumes `batch_first=True` input by default; this can't be changed.
- Uses `einsum` for the attention computation rather than `bmm` for readability, which might impact performance.
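Since positional embeddings are left out (mirroring `nn.Transformer`), here is a minimal sketch of the sinusoidal encoding from the paper. The class name and the assumption of `batch_first` inputs with an even `d_model` are illustrative choices, not necessarily the exact code in `usage.ipynb`.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Fixed sinusoidal positional encoding (Vaswani et al., 2017).

    Assumes batch_first inputs of shape (batch, seq_len, d_model) and an even d_model.
    """

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)                  # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)                   # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)                   # odd dimensions
        self.register_buffer("pe", pe)                                 # (max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> add the encoding for the first seq_len positions
        return x + self.pe[: x.size(1)]
```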
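To illustrate the parallel, `einsum`-based attention described above, the following is a simplified, shape-annotated sketch: all heads are projected in one batched linear layer and the attention scores are computed with `torch.einsum` instead of `bmm`. It illustrates the idea rather than reproducing this repository's exact `MultiHeadAttention` (no masking or dropout, for example).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelMultiHeadAttention(nn.Module):
    """All heads computed at once; attention scores via einsum for readability.

    Shapes follow batch_first=True: inputs are (batch, seq_len, d_model).
    """

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv_proj = nn.Linear(d_model, 3 * d_model)   # one projection for Q, K, V
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        # (b, s, 3, h, d) -> three tensors of shape (b, h, s, d)
        qkv = self.qkv_proj(x).view(b, s, 3, self.num_heads, self.d_head)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)

        # Scaled dot-product scores: (b, h, s_query, s_key)
        scores = torch.einsum("bhqd,bhkd->bhqk", q, k) / self.d_head ** 0.5
        weights = F.softmax(scores, dim=-1)

        # Weighted sum of values: (b, h, s, d) -> (b, s, h * d)
        out = torch.einsum("bhqk,bhkd->bhqd", weights, v)
        out = out.permute(0, 2, 1, 3).reshape(b, s, -1)
        return self.out_proj(out)
```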