Word RNN for Sentence Completion

A PyTorch implementation of a word-level recurrent neural network (RNN) for sentence completion. The code is based on the Word-level language modeling RNN example, and the importance sampling module is taken from PyTorch Large-Scale Language Model.

Requirements

torchvision >= 0.2.0
torch >= 0.3.0.post4
numpy >= 1.13.3
pandas >= 0.21.0
nltk >= 3.2.5
tqdm >= 4.19.5
Cython >= 0.27.3

pip3 install -r requirements.txt

Setup

Build the Log_Uniform sampler by following the instructions at Link.
Download the punkt package for nltk.
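One way to fetch the punkt data is from Python, sketched below. `nltk.data.find` and `nltk.download` are part of nltk; the helper name `ensure_punkt` is hypothetical and is not part of this repository.

```python
def ensure_punkt():
    """Download nltk's punkt tokenizer data unless it is already installed."""
    import nltk  # imported lazily, so defining the helper does not require nltk

    try:
        # Raises LookupError when the resource is missing from nltk's data path.
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        nltk.download("punkt")
```

Calling `ensure_punkt()` once before training should be enough, since the data is then cached in nltk's data directory.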

Datasets

Microsoft Research (MSR) Sentence Completion Challenge - The training and test datasets can be downloaded from Link. Store the downloaded test data in ./data/completion/.
Scholastic Aptitude Test (SAT) sentence completion questions - The collected questions are provided in ./data/completion/SAT_set_filled.csv.
Nineteenth century novels (19C novels) - Extract the preprocessed files from ./data/prepro/guten.tgz.
One Billion Word Benchmark (1B word) - Available from Link.
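The extraction step for the 19C novels archive can be done with any tar tool. A minimal Python sketch follows; the helper name `extract_archive` is hypothetical, and unpacking into ./data/prepro is an assumption, since the README does not state where the extracted files should go.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a gzip-compressed tar archive into dest_dir and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


# Assumed destination; adjust if the training script expects another layout.
# extract_archive("./data/prepro/guten.tgz", "./data/prepro")
```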

Run

Training

python3 train.py --cuda --save_dir mynet

The default arguments are set for training on the 19C novels. The settings for training on the 1B word benchmark are listed in the following table.

Argument	19C novels	1B word
corpus	guten	gbw
emsize	200	500
nhid	600	2000
outsize	400	500
lr	0.5	1.0
decay_after	5	1
decay_rate	0.5	0.8
batch_size	20	100
nsampled	-1	8192
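As a rough illustration, the 1B word column of the table can be turned into a training command. This sketch assumes that every argument in the table is passed as a `--<name>` flag in the same style as `--save_dir`; the README does not confirm the exact flag spelling, and the helper `build_train_command` is hypothetical.

```python
import shlex

# Values copied from the "1B word" column of the table above.
GBW_SETTINGS = {
    "corpus": "gbw",
    "emsize": 500,
    "nhid": 2000,
    "outsize": 500,
    "lr": 1.0,
    "decay_after": 1,
    "decay_rate": 0.8,
    "batch_size": 100,
    "nsampled": 8192,
}


def build_train_command(settings, save_dir="mynet", cuda=True):
    """Return a train.py argument list, assuming each setting maps to a --<name> flag."""
    cmd = ["python3", "train.py"]
    if cuda:
        cmd.append("--cuda")
    cmd += ["--save_dir", save_dir]
    for name, value in settings.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Print the command as a single shell-quoted string.
print(shlex.join(build_train_command(GBW_SETTINGS)))
```

Building the command from a dictionary keeps the table and the invocation in one place, which makes it harder to forget one of the nine overrides.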

Sentence completion

python3 sent_complt.py --cuda --save_dir mynet

Results

corpus	bidirec	MSR accuracy	SAT accuracy
guten	False	69.4 (0.8)*	29.6 (1.5)*
guten	True	72.3 (1.1)*	33.3 (2.0)*
gbw	False	63.2	66.5
gbw	True	64.1	69.1

*The mean accuracy of five networks trained with different random initializations is shown with the standard deviation in parentheses.
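The "mean (standard deviation)" format used for the starred entries can be reproduced as sketched below. The input values are placeholders for illustration only, not results from this repository, and the use of the sample standard deviation (rather than the population one) is an assumption; the README does not say which was used.

```python
import statistics


def summarize(accuracies):
    """Format a list of accuracies as 'mean (std)' with one decimal place."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample standard deviation (assumption)
    return f"{mean:.1f} ({std:.1f})"


# Placeholder accuracies from five hypothetical runs.
example_runs = [70.0, 71.0, 72.0, 73.0, 74.0]
print(summarize(example_runs))  # prints "72.0 (1.6)"
```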
