SEQ3 : Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression

## short summary
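Proposes the sequence-to-sequence-to-sequence autoencoder (SEQ3), a pair of chained seq2seq models that treats words as discrete latent variables, and applies it to unsupervised abstractive sentence compression.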

<img width="374" alt="スクリーンショット 2019-05-04 0 35 48" src="https://user-images.githubusercontent.com/17867677/57150131-67190f00-6e08-11e9-9b9a-fbe9bdba1dc6.png">

### architecture
<img width="321" alt="スクリーンショット 2019-05-04 1 23 08" src="https://user-images.githubusercontent.com/17867677/57151240-5b7b1780-6e0b-11e9-9d2c-3c22c9af8afe.png">

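First, the Compressor takes x as input and generates a summary y; the Reconstructor then restores x from y.
Obtaining y normally requires sampling from a categorical distribution, which is a non-differentiable operation. Gumbel-softmax is therefore used to approximate the sampling.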
<img width="346" alt="スクリーンショット 2019-05-04 1 09 25" src="https://user-images.githubusercontent.com/17867677/57150445-48ffde80-6e09-11e9-8ad5-edd5ddaf0a88.png">
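However, the result is only a weighted sum of several word embeddings, so it does not correspond one-to-one to an actual word. The straight-through estimator (https://arxiv.org/pdf/1308.3432.pdf) is therefore used: in the forward pass, e is discretized by taking the argmax, while in the backward pass the gradient is computed through the Gumbel-softmax. The forward and backward passes are inconsistent under this scheme, but it works well in practice.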
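The forward/backward split described above can be sketched in plain Python. This is only an illustration under assumptions, not the authors' implementation: the function names, the toy embeddings, and the hand-written gradient are all hypothetical, and a real model would rely on an autodiff framework instead.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau, rng):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / tau)."""
    noise = [-math.log(-math.log(rng.uniform(1e-9, 1.0 - 1e-9))) for _ in logits]
    return softmax([(l + g) / tau for l, g in zip(logits, noise)])


def straight_through_step(logits, embeddings, upstream_grad, tau, rng):
    """Forward with the argmax word, backward through the soft sample.

    upstream_grad is dLoss/d(embedding) coming back from the Reconstructor
    (hypothetical name). Returns (chosen index, forward embedding, dLoss/dlogits).
    """
    soft = gumbel_softmax(logits, tau, rng)

    # Forward: discretize, so the next stage sees one real word embedding.
    k = max(range(len(soft)), key=soft.__getitem__)
    hard_embedding = list(embeddings[k])

    # Backward: act as if the output had been sum_i soft[i] * embeddings[i].
    # dLoss/dsoft[i] = <upstream_grad, embeddings[i]>
    d_soft = [sum(g * e for g, e in zip(upstream_grad, emb)) for emb in embeddings]

    # Softmax Jacobian with temperature:
    # dsoft[i]/dlogit[j] = soft[i] * (delta_ij - soft[j]) / tau
    dot = sum(s * d for s, d in zip(soft, d_soft))
    d_logits = [s * (d - dot) / tau for s, d in zip(soft, d_soft)]
    return k, hard_embedding, d_logits


# Toy usage: 3-word vocabulary with 2-dimensional embeddings (made-up values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_k, toy_emb, toy_grad = straight_through_step(
    [2.0, 0.5, 0.1], toy_embeddings, [0.3, -0.2], tau=0.5, rng=random.Random(0)
)
```

The forward value `toy_emb` is always exactly one row of the embedding table, while `toy_grad` is generally non-zero for every vocabulary entry, which is the mismatch between the two passes that the note mentions.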

### loss
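The first two are general SEQ3 losses; the last two are specific to summarization.

- reconstruction loss
  - Measures how well the original sentence can be reproduced.
- LM prior loss
  - An LSTM-based language model is used so that the summary y is readable as a sentence. The loss is the KL divergence between the probability distribution output by the language model and the one output by the Compressor.
- topic loss
  - So that the input x and the summary y share the same topic, the loss is based on the cosine similarity between the tf-idf-weighted embeddings of x and y.
- length penalty
  - For outputs beyond the target output length M of y, a loss is taken against the EOS token.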
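The three auxiliary losses in the list above can be sketched in plain Python as follows. This is a rough illustration under assumptions rather than the paper's exact formulation: the KL direction (Compressor relative to the language model), the use of `1 - cosine similarity` as the quantity to minimize, the summed cross-entropy form of the length penalty, and every function name here are assumptions made for the example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def lm_prior_loss(compressor_dist, lm_dist):
    """Assumed direction: KL(Compressor || language model) at one time step."""
    return kl_divergence(compressor_dist, lm_dist)


def weighted_average(vectors, weights):
    """Weighted mean of equal-length vectors (weights play the role of tf-idf scores)."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total for d in range(dim)]


def cosine_distance(u, v):
    """1 - cosine similarity, so that identical directions give a loss of 0."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def topic_loss(x_embs, x_weights, y_embs, y_weights):
    """Distance between the weighted average embeddings of input x and summary y."""
    return cosine_distance(
        weighted_average(x_embs, x_weights), weighted_average(y_embs, y_weights)
    )


def length_penalty(step_distributions, eos_id, target_len, eps=1e-12):
    """Cross-entropy against EOS, summed over every step past target_len."""
    late_steps = step_distributions[target_len:]
    return sum(-math.log(dist[eos_id] + eps) for dist in late_steps)
```

For example, `length_penalty([[0.1, 0.9], [0.5, 0.5]], eos_id=1, target_len=1)` only penalizes the second step, where EOS receives probability 0.5.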

### results
<img width="824" alt="スクリーンショット 2019-05-04 1 42 35" src="https://user-images.githubusercontent.com/17867677/57152268-eb21c580-6e0d-11e9-9061-4d699435d73c.png">
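In evaluation on the Gigaword sentence compression dataset, SEQ3 outperformed other unsupervised methods.
In the output examples, many summaries begin with the same words as the input. Each output word depends on the words before it, so an early mistake makes everything after it wrong; this likely led the Compressor to learn to copy the beginning of the input.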

## author
Christos Baziotis (1, 2), Ion Androutsopoulos (2), Ioannis Konstas (3), Alexandros Potamianos (1)

1. School of ECE, National Technical University of Athens, Athens, Greece
2. Department of Informatics, Athens University of Economics and Business, Athens, Greece
3. Interaction Lab, School of Math. and Comp. Sciences, Heriot-Watt University, Edinburgh, UK

cbaziotis@mail.ntua.gr, ion@aueb.gr, i.konstas@hw.ac.uk, potam@central.ntua.gr

## URL
https://arxiv.org/pdf/1904.03651.pdf

## year
NAACL 2019
