Code and dataset for the paper "Mining fine-grained opinions on closed captions of YouTube videos with an attention-RNN". If you use this code or dataset, please cite the paper.
1. Clone this repository: `cd ~; git clone https://github.com/epochx/opinatt`
2. Install the required libraries:
   - Install Theano and TensorFlow: `pip install Theano tensorflow-gpu`
   - Download and install Senna: http://ronan.collobert.com/senna/
   - Download and install CoreNLP 3.6: http://stanfordnlp.github.io/CoreNLP/history.html
3. Go to the opinatt directory (`cd ~/opinatt`) and download the required data there:
   - Download the GoogleNews embeddings from https://code.google.com/archive/p/word2vec/
   - Download the WikiDeps embeddings from https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
   - Download and unzip the SemEval 2014 V2 Train data from http://alt.qcri.org/semeval2014/task4/index.php?id=data-and-tools
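Word embeddings such as WikiDeps are commonly distributed as plain-text word2vec files, one `word v1 v2 …` line per word (the GoogleNews vectors use the binary variant instead). As a rough sketch only — not code from this repository — a minimal loader for the text format might look like this, demonstrated on toy in-memory data:

```python
import io

def load_text_embeddings(fileobj):
    """Parse word2vec-style text embeddings: one 'word v1 v2 ...' line per word."""
    vectors = {}
    for line in fileobj:
        parts = line.rstrip().split()
        if len(parts) < 2:
            continue  # skip blank lines and the optional "count dim" header line
        word, values = parts[0], [float(v) for v in parts[1:]]
        vectors[word] = values
    return vectors

# Toy data standing in for a real embeddings file
toy = io.StringIO("screen 0.1 0.2 0.3\nbattery -0.4 0.0 0.9\n")
emb = load_text_embeddings(toy)
print(len(emb), len(emb["screen"]))  # -> 2 3
```

For the binary GoogleNews file, a library loader (e.g. gensim's `KeyedVectors.load_word2vec_format` with `binary=True`) is the usual route.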
4. Create the working environment:
   - Create the data folder structure: `prepare_data.sh path_where_to_create_data_folder`
   - Modify `opinatt/enlp/settings.py` accordingly
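The actual variable names in `opinatt/enlp/settings.py` depend on the repository version, so the sketch below is purely illustrative (every name in it is hypothetical): the kind of edit this step asks for is pointing path settings at the folders and tools set up above.

```python
# Illustrative settings sketch only -- the real names in
# opinatt/enlp/settings.py may differ. Point each path at the
# folders/tools prepared in the earlier steps.
import os

DATA_ROOT = os.path.expanduser("~/opinatt_data")    # hypothetical: folder created by prepare_data.sh
SENNA_PATH = os.path.expanduser("~/senna")          # hypothetical: where Senna was unpacked
CORENLP_PATH = os.path.expanduser("~/corenlp-3.6")  # hypothetical: where CoreNLP 3.6 lives
EMBEDDINGS_PATH = os.path.join(DATA_ROOT, "embeddings")
</imports>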
5. Preprocess the corpora by running `python process_corpus.py`
6. Generate the training data in JSON format:
   - For aspect extraction, run `python generate_json.py --json_path path/to/json/file --train LaptopsTrain --test LaptopsTest --embeddings SennaEmbeddings`
   - For aspect extraction and sentiment classification using collapsed tags, run `python generate_json.py --json_path path/to/json/file --train LaptopsTrain --embeddings SennaEmbeddings --sentiment`
   - For joint aspect extraction and sentiment classification, run `python generate_json.py --json_path path/to/json/file --train LaptopsTrain --embeddings SennaEmbeddings --joint`
   - When generating the training data for the Youtubean dataset, or when including sentiment labels for the SemEval corpora, make sure not to pass the `--test` flag, as there are no test sets for those settings; the script will generate the corresponding splits from the training portion in each case. Add the `--strict` flag to only use sentences with a single sentiment.
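The collapsed-tags setting above can be pictured with a small sketch. Note this is an illustration, not the repository's code, and the exact tag vocabulary used by `generate_json.py` is an assumption: a collapsed scheme fuses the aspect BIO tag and the sentiment into a single label per token, whereas the joint setting keeps aspect and sentiment as two separate prediction targets.

```python
def collapse_tags(bio_tags, sentiments):
    """Fuse aspect BIO tags with per-token sentiment into single labels,
    e.g. ('B', 'positive') -> 'B-POS'. Tag names here are illustrative."""
    short = {"positive": "POS", "negative": "NEG", "neutral": "NEU"}
    collapsed = []
    for tag, sent in zip(bio_tags, sentiments):
        collapsed.append("O" if tag == "O" else f"{tag}-{short[sent]}")
    return collapsed

tokens = ["The", "screen", "is", "great"]
bio = ["O", "B", "O", "O"]
sents = ["neutral", "positive", "neutral", "neutral"]
print(collapse_tags(bio, sents))  # -> ['O', 'B-POS', 'O', 'O']
```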
7. Train the models:
   - For the baseline: `python run_baseline.py --json_path path/to/json/file --results_path path/to/results/folder`
   - For the attention-RNN: `python run.py --json_path path/to/json/file --results_path path/to/results/folder`
For more details on how to use the provided scripts, use the `--help` option.
## Contact
Feel free to email emarrese@weblab.t.u-tokyo.ac.jp for any pertinent questions/bugs regarding the code.