This repository hosts the code base for the experiments reported in the article "An Improved Text Classification Modelling Approach to Identify Security Messages in Heterogeneous Projects", accepted at the Software Quality Journal.
Details of code setup and experiments
- Extract data.zip
- Unzip the file
- Edit the experiment.prop file in the folder “fsecextplugin” (i.e. Step 2. below)
- cd fsecextplugin
- run the command: java -jar fsecextplugin.jar -config experiment.prop
The first step is to set up a configuration file. A default configuration file ships with the zipped files and can be used as a starting point. Training a classification model is straightforward once the parameters for the algorithms are set. The parameters in the “experiment.prop” file are described below.
DATA_PATH=data
HEADER=true
SEPARATOR=;
TRAIN_SIZE=0.90
specify how many times training should be repeated. The best model is selected for the validation stage
NUM_EXP=1
CLASS_INDEX=1
specify the ratios (separated by comma) for sampling minority class - SBR : NSBR (SBR is always 1) => 1:0.5, 1:1, 1:1.5, 1:2
CLASS_BALANCE_RATIOS=0,0.5,1,2
algorithms to use for training a model. LR-Logistic Regression,NB-Naive Bayes,KNN-K Nearest Neighbor,SVM-Support Vector Machine,RF-Random Forest
ALGORITHMS=NB,KNN,LR,SVM,RF
FEATURES=TFIDFHigh,Threat,Control,TC,CA,TA,TCA,TCAI
INCLUDE_SEC_FEATURES=yes
name of the folder (separated by comma) containing the train and test csv files. Note: csv files must be named as folder_train.csv (e.g. apache_train.csv) and folder_test.csv (apache_test.csv)
TRAIN_FOLDER_NAMES=derby,wicket,ambari,camel
name of the folders (separated by comma) containing other projects' csv test files for validation. Note: csv files must be named as folder_test.csv (e.g. derby_test.csv)
VALIDATION_FOLDER_NAMES=derby,wicket,ambari,camel
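As a rough illustration of how these settings fit together, the sketch below reads a few of the keys with `java.util.Properties` and derives the CSV file names from the naming rule stated above. It is only a sketch under assumptions: the class `ExperimentConfigSketch` and its helper methods are hypothetical and not part of this repository, and the plugin itself may parse the file differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the repository.
class ExperimentConfigSketch {

    // Parses properties-style text (one KEY=value per line).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Splits a comma-separated value into trimmed, non-empty items.
    static List<String> list(Properties props, String key) {
        List<String> items = new ArrayList<>();
        for (String item : props.getProperty(key, "").split(",")) {
            if (!item.trim().isEmpty()) {
                items.add(item.trim());
            }
        }
        return items;
    }

    // Applies the naming rule: folder_train.csv and folder_test.csv.
    static String csvName(String folder, boolean train) {
        return folder + (train ? "_train.csv" : "_test.csv");
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "DATA_PATH=data",
                "SEPARATOR=;",
                "TRAIN_SIZE=0.90",
                "ALGORITHMS=NB,KNN,LR,SVM,RF",
                "TRAIN_FOLDER_NAMES=derby,wicket,ambari,camel");
        Properties props = parse(text);

        double trainSize = Double.parseDouble(props.getProperty("TRAIN_SIZE"));
        System.out.println("Train fraction: " + trainSize);
        System.out.println("Algorithms: " + list(props, "ALGORITHMS"));

        // List the CSV file names expected for each training folder.
        for (String folder : list(props, "TRAIN_FOLDER_NAMES")) {
            System.out.println(folder + ": " + csvName(folder, true)
                    + ", " + csvName(folder, false));
        }
    }
}
```

Running a check like this before a long experiment can catch a misnamed folder or CSV file early.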
- File -> Import -> Existing Maven project into workspace
- Navigate to the "fsecext" folder and select it
- Configure the experiment.prop file as above (it uses the same prop file as the command line)
- In the main method, uncomment the following lines:
- //String config = "./experiment.prop";
- //args = new String[2];
- //args[0] = "-config";
- //args[1] = config;
- Right-click and run
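For reference, once the four lines are uncommented the main method builds the same arguments that the command line passes with `-config experiment.prop`. The sketch below shows only that effect; the class `MainArgsSketch` and the method `ideArgs` are hypothetical names used for illustration, and the real main method contains more than this.

```java
// Hypothetical illustration, not the repository's main class.
class MainArgsSketch {

    // Builds the arguments that the uncommented lines assign to args.
    static String[] ideArgs() {
        String config = "./experiment.prop";
        String[] args = new String[2];
        args[0] = "-config";
        args[1] = config;
        return args;
    }

    public static void main(String[] args) {
        // When run from the IDE, the hard-coded arguments replace the (empty) ones.
        args = ideArgs();
        System.out.println(String.join(" ", args));
    }
}
```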
- Results are located in the algorithm folder for each project. e.g. Random forest results will be located in "RF" folder
- Classes for the statistics are located in "no.tosin.oyetoyan.experiment.statistics"
- Open StatisticalTests.java
- Generate the statistic data
- Test scripts are located in the "analysis" subdirectory, in each project's folder, e.g. for ambari: analysis/ambari/scripts/stats.r
- Check the PluginModel.java located in the package no.tosin.oyetoyan.experiment
Use cases for integration
- Bug repositories
- Commit repositories
- Project document repositories
- etc.