Enhancing text to graph representation by edge weighting and filtering through word association measures
- Polarity
- WEBKB
- R8
- 20 Newsgroups
We implemented auxiliary scripts in the scripts/ directory. To use these scripts, copy them to the project root directory (this directory). Example:
cp scripts/feature_runner.sh .The implementation was constructed using Python 3.6.13
There is a optional step, to use a virtual environment before install the dependencies:
Install the virtualenv package:
pip install virtualenvCreate a new environment:
virtualenv venvActivate the new environment:
source venv/bin/activateThe dependencies are listed in the requirements.txt file. To install, run:
pip install -r requirements.txtTo run the proposed approach:
./feature_runner.shTo run the baseline:
./baseline_feature_runner.shmprof run --output <time_mem_output_file.dat> --interval 60 feature_generator.py --dataset <dataset> --strategy <weight_strategy> --window 12 --emb_dim 100 --cut_percent <cut_p>Where:
<time_mem_output_file> : Output file to register time x memory values.
<dataset> : Input dataset: polarity, r8 or webkb
<weight_strategy> : Weight strategy: pmi, pmi_all,llr, llr_all, chi_square, chi_square_all (for proprosed weight approaches) or no_weight (for baseline)
<cut_p> : Cut p: 5, 10, 20, 30, 50, 70, 90 (for the proposed approach) or 0 (for baseline)
To run the proposed approach:
./cnn_runner.shTo run the baseline:
./baseline_cnn_runner.shpython3 cnn_main.py --dataset <dataset> --cut_percent <cut_p> --strategy <weight_strategy> --window 12 --emb_dim 100Where:
<dataset> : Input dataset: polarity, r8 or webkb
<weight_strategy> : Weight strategy: pmi, pmi_all,llr, llr_all, chi_square, chi_square_all (for proprosed weight approaches) or no_weight (for baseline)
<cut_p> : Cut p: 5, 10, 20, 30, 50, 70, 90 (for the proposed approach) or 0 (for baseline)
The results_main.py script file implements the evaluation component. This script calculates - for a dataset and cut p - the 10-fold mean f1 score for all weight strategies and the baseline and also runs the Wilcoxon test comparing each weight strategy with the baseline. These results are printed in the console running the script (for example, the Linux terminal), and also, the results are written to output files in the plots/next_level/<cut p>/<dataset> directory, in txt files name as f1_polarity_12.txt
For example, running the results_main.py script as:
python results_main.py --dataset r8 --emb_dim 100 --window 12 --cut_percent 50will generate the file: plots/next_level/0.05/polarity/f1_polarity_12.txt, containing:
no_weight,77.12544679853906,0.050221219618395326
chi_square,78.55941729972031,0.050606510004414684,p=0.275390625
chi_square_all,79.69882465466792,0.028751551738617296,p=0.10546875
llr,79.39449196564627,0.024537426341991925,p=0.193359375
llr_all,79.65264759486207,0.04178546879605646,p=0.16015625
pmi,79.42541259746658,0.03398688281606517,p=0.16015625
pmi_all,78.89614764450108,0.046800758344722936,p=0.275390625Where:
The first line contains the mean f1 score and the standard deviation for the baseline (no_weight). The other lines contain for each weight strategy the mean f1 score, the standard deviation, and the p-value for the Wilcoxon test compared to the baseline.
The results_main.py runs for a single estimation on a combination of a dataset and cut p. To run the evaluation for all experiments, run:
./plot_f1.sh