Language Models with Transformers

## short summary
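This study takes transformer-based architectures (BERT, GPT), adds LSTM layers, and runs an automatic architecture search. The resulting models improve perplexity by about 12 points over the state of the art among LSTM-based language models.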
<img width="567" alt="スクリーンショット 2019-04-29 17 45 22" src="https://user-images.githubusercontent.com/17867677/56885154-b24cbe00-6aa6-11e9-8a58-83286b95df2d.png">

#### Adding an LSTM
Language modeling needs strong contextual information to predict the next word, but a transformer captures this only loosely.
→ Add an LSTM layer (AddLSTM)

#### Fine-tuning
Datasets such as WikiText and Penn Treebank are not very large, so updating every transformer layer leads to overfitting.
→ Randomly select layers and freeze them (FixSubset)
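
As a minimal sketch of the FixSubset idea, the snippet below picks a random subset of layers to freeze and returns a per-layer "trainable" mask. The function name `fix_subset`, the parameters `num_layers` and `num_frozen`, and the choice of 12 layers with 4 frozen are all hypothetical illustrations, not values taken from the paper.

```python
import random


def fix_subset(num_layers, num_frozen, rng=None):
    """Return a per-layer mask: True = still trained, False = frozen.

    Hypothetical helper: the layers to freeze are chosen uniformly at random.
    """
    rng = rng or random.Random()
    frozen = set(rng.sample(range(num_layers), num_frozen))
    return [i not in frozen for i in range(num_layers)]


# Assumed example: a 12-layer model with 4 randomly frozen layers.
mask = fix_subset(num_layers=12, num_frozen=4, rng=random.Random(0))
print(mask)
```

In a real training setup, the mask would be used to switch off gradient updates for the frozen layers (in PyTorch, for example, by setting `requires_grad = False` on their parameters).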

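#### CAS (Coordinate Architecture Search)
Starting from the base transformer model, apply AddLinear, AddLSTM, and FixSubset at random until AddLinear is drawn. (AddLinear adds the final layer.)
→ This generates one candidate architecture.
The candidates generated this way are trained and compared, and the best one is taken as the optimal architecture.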
<img width="307" alt="スクリーンショット 2019-04-29 18 36 51" src="https://user-images.githubusercontent.com/17867677/56888219-71f13e00-6aae-11e9-878d-6c56a5afe938.png">
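
The sampling procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: operations are drawn uniformly, a `max_steps` cap is added for safety, and `toy_score` is a placeholder for the real evaluation, which would train each candidate and return a score such as validation perplexity.

```python
import random

OPS = ["AddLinear", "AddLSTM", "FixSubset"]


def sample_architecture(rng, max_steps=20):
    """Draw operations at random until AddLinear (the final layer) is drawn."""
    steps = []
    for _ in range(max_steps - 1):
        op = rng.choice(OPS)  # uniform sampling is an assumption
        steps.append(op)
        if op == "AddLinear":
            return steps
    # Safety cap (an assumption): force the final layer if it was never drawn.
    steps.append("AddLinear")
    return steps


def search(evaluate, num_candidates, seed=0):
    """Generate candidates and keep the one with the lowest score."""
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(num_candidates)]
    return min(candidates, key=evaluate)


def toy_score(steps):
    """Placeholder score; a real run would train the candidate and evaluate it."""
    return len(steps)


best = search(toy_score, num_candidates=10)
print(best)
```

Each candidate is a list of operation names that always ends with `AddLinear`. Applying the operations to an actual BERT or GPT model is left out of this sketch.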

### Results
Substantially better results than existing LSTM-based language models.
<img width="518" alt="スクリーンショット 2019-04-29 18 40 49" src="https://user-images.githubusercontent.com/17867677/56888774-ca750b00-6aaf-11e9-9d37-9e7aebc0bb0f.png">
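
The results are reported as perplexity, where lower is better. As a reminder of how the metric is defined, here is a small sketch that computes perplexity as the exponential of the average negative log-likelihood per token. The function name `perplexity` and the input probabilities are made up for illustration.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability of the observed tokens)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# Made-up example: a model that assigns probability 0.25 to every observed token.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
print(ppl)  # approximately 4
```

A perplexity of about 4 here means the model is, on average, as uncertain as if it were choosing uniformly among four tokens.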

Results comparable to GPT-2, despite requiring less training data.
<img width="519" alt="スクリーンショット 2019-04-29 18 54 13" src="https://user-images.githubusercontent.com/17867677/56888963-41120880-6ab0-11e9-94f3-2b40f43c63df.png">


## author
Chenguang Wang, Mu Li, Alexander J. Smola
Amazon Web Services
{chgwang, mli, smola}@amazon.com

## URL
https://arxiv.org/abs/1904.09408

## year
2019
