CTRL: A Conditional Transformer Language Model for Controllable Generation [code] (Sept 2019)
Transformer language model conditioned on control codes (context text) that steer the style, content, and task of the generated text.
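
A minimal sketch of control-code conditioning via the Hugging Face `transformers` implementation of CTRL (assumed installed; the checkpoint name "ctrl" and the "Wikipedia" control code are illustrative, see the paper/model card for the full list):

```python
# Conditional generation: the control code is simply prepended to the prompt
# and conditions the style/domain of everything generated after it.
import torch
from transformers import CTRLLMHeadModel, CTRLTokenizer

tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLLMHeadModel.from_pretrained("ctrl")

prompt = "Wikipedia The transformer architecture"        # control code + prompt text
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    output_ids = model.generate(input_ids, max_length=80, repetition_penalty=1.2)
print(tokenizer.decode(output_ids[0]))
```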

Extreme Language Model Compression with Optimal Subwords and Shared Projections (Sept 2019)
A tiny BERT: teacher-student training (knowledge distillation) to shrink the BERT model.
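
A schematic of the teacher-student (knowledge distillation) recipe this entry refers to; the temperature, mixing weight, and toy tensors are illustrative, not the paper's exact projection-sharing setup:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft targets from the teacher with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                               # rescale gradients after temperature softening
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: batch of 4 examples, vocabulary of 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```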

Large Memory Layers with Product Keys [code] (Jul 2019)
A key-value memory layer that can scale to very large sizes while keeping exact search on the key space.
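A toy sketch of how a product-key lookup stays exact while only scoring sub-keys; all dimensions and the top-k value below are illustrative:

```python
# The key space has N*N entries, but only 2*N sub-key scores plus k*k candidate
# sums are ever computed, and the result is still the exact top-k over all keys.
import torch

d, N, k = 8, 64, 4                       # query dim, sub-keys per half, top-k
sub_keys_1 = torch.randn(N, d // 2)      # first half-key codebook
sub_keys_2 = torch.randn(N, d // 2)      # second half-key codebook
values = torch.randn(N * N, 16)          # one value slot per product key

q = torch.randn(d)
q1, q2 = q[: d // 2], q[d // 2:]

s1, i1 = (sub_keys_1 @ q1).topk(k)       # best half-scores on each side
s2, i2 = (sub_keys_2 @ q2).topk(k)

cand = s1[:, None] + s2[None, :]         # scores of the k*k candidate product keys
best = cand.flatten().topk(k).indices
rows, cols = best // k, best % k
flat_idx = i1[rows] * N + i2[cols]       # indices into the full N*N key space

weights = torch.softmax(cand.flatten()[best], dim=0)
output = weights @ values[flat_idx]      # sparse weighted sum of k retrieved values
```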

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context [code] (Jun 2019)
Transformer architecture that adds segment-level recurrence to overcome the fixed context-length limitation: hidden states from the previous segment are cached and reused as extra context for the current one.
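
A toy, single-head sketch of segment-level recurrence (shapes are illustrative, and it omits the relative positional encodings the paper also introduces):

```python
import torch
import torch.nn.functional as F

def attend_with_memory(h_curr, memory, w_q, w_k, w_v):
    """Attention over [cached memory ; current segment]; queries come only
    from the current segment."""
    context = torch.cat([memory, h_curr], dim=0)          # (mem_len + seg_len, d)
    q = h_curr @ w_q
    k, v = context @ w_k, context @ w_v
    attn = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v

d, seg_len, mem_len = 16, 8, 8
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
memory = torch.zeros(mem_len, d)                          # empty memory at the start

for segment in torch.randn(3, seg_len, d):                # stream of segments
    out = attend_with_memory(segment, memory, w_q, w_k, w_v)
    memory = out.detach()[-mem_len:]                      # reuse states, no backprop through them
```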

Language Models are Unsupervised Multitask Learners [code] (Feb 2019)
OpenAI's GPT-2 language model for text generation.

A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks [code] (Nov 2018)
Trains a single network on several semantic tasks, where initial layers are supervised by simpler tasks and deeper layers by harder ones, hence the hierarchical structure.
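
A toy sketch of attaching task heads at different encoder depths; the layer types, depths, and label counts are illustrative rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class HierarchicalTagger(nn.Module):
    def __init__(self, d=64, n_low_labels=10, n_high_labels=5):
        super().__init__()
        self.lower = nn.LSTM(d, d, batch_first=True)     # shared low-level encoder
        self.upper = nn.LSTM(d, d, batch_first=True)     # deeper encoder on top
        self.low_head = nn.Linear(d, n_low_labels)       # "easy" task reads shallow states
        self.high_head = nn.Linear(d, n_high_labels)     # "hard" task reads deep states

    def forward(self, x):
        h_low, _ = self.lower(x)
        h_high, _ = self.upper(h_low)
        return self.low_head(h_low), self.high_head(h_high)

model = HierarchicalTagger()
x = torch.randn(2, 12, 64)                               # (batch, tokens, features)
low_logits, high_logits = model(x)                       # each task gets its own loss
```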

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [code] (Oct 2018)
Method of pre-training language representations, meaning that we train a general-purpose "language understanding" model on a large text corpus, and then use that model for downstream NLP tasks that we care about.
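
A schematic of BERT's masked-language-model pre-training objective; the 15% masking rate and 80/10/10 replacement rule follow the paper, the vocabulary size and `[MASK]` id match the released uncased checkpoint, and the encoder itself is stubbed out:

```python
import torch

VOCAB, MASK_ID, MLM_PROB = 30522, 103, 0.15   # WordPiece vocab size / [MASK] token id

def mask_tokens(input_ids):
    labels = input_ids.clone()
    masked = torch.bernoulli(torch.full(input_ids.shape, MLM_PROB)).bool()
    labels[~masked] = -100                    # loss is computed only on masked positions

    # 80% -> [MASK], 10% -> random token, 10% -> left unchanged
    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace] = MASK_ID
    random_tok = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~replace
    input_ids[random_tok] = torch.randint(VOCAB, input_ids.shape)[random_tok]
    return input_ids, labels

ids = torch.randint(VOCAB, (2, 16))           # a toy batch of token ids
corrupted, labels = mask_tokens(ids)
# A Transformer encoder would now predict `labels` from `corrupted`,
# with cross-entropy ignoring positions set to -100.
```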

Attention Is All You Need (Dec 2017)
Translation of sequences based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
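
A minimal scaled dot-product attention function, the core operation of the paper (single head, no masking, dropout, or multi-head projections):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d_k); scores are scaled by sqrt(d_k) to keep the
    # softmax well-behaved when d_k is large.
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(10, 64)               # toy self-attention over 10 tokens
out = scaled_dot_product_attention(q, k, v)   # (10, 64)
```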

Plan, Attend, Generate: Planning for Sequence-to-Sequence Models (Nov 2017)
Sequence-to-sequence model that plans ahead when computing its alignments between input and output sequences, maintaining a matrix of proposed future alignments and a commitment vector that decides whether to follow or revise the plan.
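
A heavily simplified sketch of that planning loop: a k-step alignment plan plus a commitment signal that either follows the plan or recomputes it. Everything below is random/toy rather than learned end-to-end as in the paper:

```python
import torch
import torch.nn.functional as F

src_len, k, steps = 7, 5, 10
plan = F.softmax(torch.randn(k, src_len), dim=-1)   # proposed alignments for the next k steps

for t in range(steps):
    commit_prob = torch.rand(())                    # in the paper, predicted from the decoder state
    if commit_prob < 0.5:                           # revise: recompute the whole plan
        plan = F.softmax(torch.randn(k, src_len), dim=-1)
    alignment = plan[0]                             # attention weights used at this step
    # ... combine `alignment` with encoder states to emit the next token ...
    plan = torch.roll(plan, shifts=-1, dims=0)      # otherwise commit: advance the plan by one step
```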

A Deep Reinforced Model for Abstractive Summarization (May 2017)
Model with a novel intra-attention that attends over the input and continuously generated output separately, and a new training method that combines standard supervised word prediction and reinforcement learning.
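
A schematic of the mixed objective, blending teacher-forced cross-entropy with a self-critical policy-gradient term whose reward would be a summary metric such as ROUGE; the numbers below are toy values and `gamma` is just an example setting:

```python
import torch

def mixed_loss(sum_logp_sample, reward_sample, reward_greedy, ml_loss, gamma=0.9984):
    # Self-critical baseline: sampled summaries that beat the greedy decode's
    # reward get their log-probability pushed up, the others pushed down.
    rl_loss = (reward_greedy - reward_sample) * sum_logp_sample
    return gamma * rl_loss + (1.0 - gamma) * ml_loss

loss = mixed_loss(
    sum_logp_sample=torch.tensor(-35.2),   # log-probability of a sampled summary
    reward_sample=torch.tensor(0.31),      # e.g. ROUGE score of the sample
    reward_greedy=torch.tensor(0.28),      # e.g. ROUGE score of the greedy baseline
    ml_loss=torch.tensor(2.4),             # teacher-forced cross-entropy
)
```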
