CoTA: Compression Transformer Autoencoder

CoTA introduces a novel approach to representing sequential data. Instead of compressing a sequence into a single vector, CoTA uses a variable number of embedding vectors, which preserves more detail and retains the order of the sequence more faithfully.


[Figure: typical encoder (left) vs. CoTA encoder (right)]

Unlike typical transformer encoders, the CoTA encoder selects which vectors to retain after applying the attention mechanism. The model is trained to accurately reconstruct the sequence while minimizing the number of embeddings.
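The README does not spell out the loss function. Below is a minimal sketch of one plausible objective, assuming a per-token reconstruction loss plus a penalty on the expected fraction of kept vectors. It treats the sigmoid of each vector's element at index 0 as a keep probability so that the penalty stays differentiable. The names `cota_loss` and `lam` and the sigmoid relaxation are hypothetical and are not taken from the repository.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cota_loss(token_nll: List[float], gate_logits: List[float], lam: float = 0.1) -> float:
    """Hypothetical objective: mean reconstruction NLL + lam * expected kept fraction.

    token_nll:   per-token negative log-likelihood produced by the decoder.
    gate_logits: element 0 of each encoder output vector after attention.
    lam:         assumed trade-off weight between reconstruction and compression.
    """
    reconstruction = sum(token_nll) / len(token_nll)
    # Soft count of kept vectors (assumption): sigmoid(gate) acts as a keep probability.
    expected_kept = sum(sigmoid(g) for g in gate_logits) / len(gate_logits)
    return reconstruction + lam * expected_kept


print(cota_loss([1.0, 3.0], [0.0, 0.0], lam=0.5))
```

Raising `lam` pushes the encoder toward keeping fewer vectors at the cost of reconstruction quality.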

CoTA can be integrated into any Transformer encoder with little effort and introduces no additional parameters. Instead, it uses the element at index 0 of each vector produced by the attention mechanism to decide whether to keep or eliminate that vector. Elimination can be implemented either with an attention mask or by removing the vectors from the tensor directly.
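The two elimination strategies can be sketched as follows, using plain Python lists in place of tensors. The decision rule (keep a vector when its element at index 0 is greater than a threshold of 0) and the helper names `keep_mask`, `drop_vectors`, `additive_attention_mask`, and `masked_softmax` are assumptions for illustration; the repository may use a different threshold or formulation.

```python
import math
from typing import List

Vector = List[float]


def keep_mask(hidden: List[Vector], threshold: float = 0.0) -> List[bool]:
    # Assumed rule: keep a vector when its element at index 0 exceeds the threshold.
    return [vec[0] > threshold for vec in hidden]


def drop_vectors(hidden: List[Vector], mask: List[bool]) -> List[Vector]:
    # Option 1: physically remove eliminated vectors, shortening the sequence.
    return [vec for vec, keep in zip(hidden, mask) if keep]


def additive_attention_mask(mask: List[bool]) -> List[float]:
    # Option 2: keep the sequence length and block eliminated positions in attention.
    return [0.0 if keep else -math.inf for keep in mask]


def masked_softmax(scores: List[float], additive_mask: List[float]) -> List[float]:
    # Softmax over attention scores; masked positions receive exactly zero weight.
    # Assumes at least one vector is kept, otherwise every entry is -inf.
    shifted = [s + m for s, m in zip(scores, additive_mask)]
    top = max(shifted)
    exps = [math.exp(s - top) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


hidden = [[0.7, 1.0], [-0.2, 2.0], [1.5, 3.0]]
mask = keep_mask(hidden)                      # [True, False, True]
print(drop_vectors(hidden, mask))             # [[0.7, 1.0], [1.5, 3.0]]
print(masked_softmax([1.0, 2.0, 1.0], additive_attention_mask(mask)))  # [0.5, 0.0, 0.5]
```

Both options give downstream attention the same view of the kept vectors: removal saves compute, while masking keeps tensor shapes fixed, which is often simpler when batching sequences of different lengths.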


[Figure: left, the decoder uses the embedding to infer the masked token; right, the decoder has enough context to infer the masked token without the embedding]

CoTA was originally trained for next-token prediction. Several research groups studying transformer autoencoders have reported the problem of an overly powerful decoder: the decoder can infer the token from its own context without relying on the embedding. To mitigate this, we restricted the decoder's visibility to only 5 tokens during training, so that it has to rely on the embedding to make its predictions.
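One way to limit the decoder's visibility is a local causal attention mask. The sketch below assumes that "5 tokens" means each decoder position sees itself and the 4 preceding tokens; the function name `local_causal_mask` and the `window` parameter are hypothetical, and the actual training code may count the window differently or apply it only to self-attention while leaving access to the CoTA embeddings unrestricted.

```python
from typing import List


def local_causal_mask(seq_len: int, window: int = 5) -> List[List[bool]]:
    """mask[i][j] is True when decoder position i may attend to position j.

    Position i sees only itself and the previous window - 1 tokens, so it
    cannot look at future tokens or at anything older than the window.
    """
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


mask = local_causal_mask(8, window=5)
for row in mask:
    print("".join("x" if visible else "." for visible in row))
```

With a short window, most of the sequence is hidden from the decoder's self-attention, so information about distant tokens has to come from the embedding instead.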
