There are three folders:
- data: contains the dataset in CSV format and the code for splitting the data into training, validation, and test sets.
- NCE: contains pretraining modules, LSTM modules, and LSTM feature extraction.
- classification: contains the Highway Net.
--- Step-by-step instructions to run DEEP SE (LSTM + Highway) ---
- put the CSV files in /data
- run the command "python run_script.py" in /data to split the data and prepare the dictionary
- run the command "python exp_lstm2v_pre.py" in /NCE for pretraining (this step takes a very long time; it can be skipped because the pretrained model is already provided)
- run the command "python exp_script.py" in /classification to run DEEP SE
The result is written to /classification/log/[dataset]_lstm_highway_dim[dim]_reginphid_prefixed_lm_poolmean.txt, e.g. appceleratorstudio_lstm_highway_dim10_reginphid_prefixed_lm_poolmean.txt
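The steps above can be chained in a small driver script. The following is only a sketch, assuming the script is placed in the repository root next to data/, NCE/, and classification/, and that "python" on the PATH is the interpreter the project expects. The names `STEPS`, `build_steps`, and `run_pipeline` are hypothetical and do not exist in the repository.

```python
import subprocess

# Assumption: folder names are relative to the repository root,
# and each command is the one listed in the steps above.
STEPS = [
    ("data", ["python", "run_script.py"]),
    ("NCE", ["python", "exp_lstm2v_pre.py"]),
    ("classification", ["python", "exp_script.py"]),
]


def build_steps(skip_pretrain=True):
    """Return the (folder, command) pairs to run, in order."""
    if skip_pretrain:
        # Pretraining is slow and optional, so it is skipped by default.
        return [step for step in STEPS if step[0] != "NCE"]
    return list(STEPS)


def run_pipeline(skip_pretrain=True):
    """Run each step inside its own folder, stopping at the first failure."""
    for folder, command in build_steps(skip_pretrain):
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(command, cwd=folder)
```

Calling `run_pipeline()` runs the data preparation and classification steps; `run_pipeline(skip_pretrain=False)` also runs the pretraining step.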
To run the tool on your own data, change the following variables to match your file names:
| Folder | File | Variable | Note |
|---|---|---|---|
| data | run_script | databaseDict | Pairs of dataset filename and its pretrain filename |
| data | run_script | dataPres | Pretrain data file name |
| NCE | exp_lstm2v_pre | dataPres | Pretrain data file name |
| classification | exp_script | databaseDict | Pairs of dataset filename and its pretrain filename |
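As an illustration of the table above, the two variables might look like the sketch below. The exact shape these variables take in the scripts is not stated here, so this assumes `databaseDict` is a plain dictionary from dataset file name to pretrain file name and `dataPres` is a list of pretrain file names. The file names "myproject", "otherproject", and "myrepository" are placeholders, not files shipped with the code.

```python
# Hypothetical file names: replace them with your own CSV base names.
databaseDict = {
    "myproject": "myrepository",     # dataset file -> pretrain file
    "otherproject": "myrepository",  # several datasets may share one pretrain file
}

# Distinct pretrain file names, in a stable order.
dataPres = sorted(set(databaseDict.values()))
```

Deriving `dataPres` from `databaseDict` keeps the two in agreement, which matters because the same names are expected in run_script, exp_lstm2v_pre, and exp_script.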
List of code dependencies and install commands:
- beautifulsoup4: pip install beautifulsoup4
- MySQLdb: pip install MySQL-python
- json: part of the Python standard library, no install needed
- numpy: pip install numpy
- pandas: pip install pandas
- cPickle: part of the Python 2 standard library, no install needed
- SciPy: pip install scipy
- scikit-learn: pip install scikit-learn
- Theano, keras
--to be added--
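Before running the scripts, it can help to check which dependencies are importable. The sketch below is not part of the repository; `REQUIRED` and `missing_modules` are hypothetical names. The list uses import names, which differ from the pip package names for beautifulsoup4 (`bs4`) and scikit-learn (`sklearn`). cPickle is left out because it exists only in the Python 2 standard library.

```python
import importlib

# Import names of the dependencies listed above.
REQUIRED = ["bs4", "MySQLdb", "json", "numpy", "pandas",
            "scipy", "sklearn", "theano", "keras"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED)` returns the modules that still need to be installed; an empty list means every listed dependency was imported successfully.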
list of hyperparameters for configuration --to be added--
- add a proper story point report (issue key, des, estimated SP)
- check all files, write a run script, remove password from data preprocessing, add classifier files