Autoweka: evaluation results too good to be true #99

@rbvidal

Hi,

I am doing text classification with Auto-WEKA and seem to have the same problem as described in #50.

The classifier classified 100% of the instances correctly. That must be a mistake, as the whole dataset has only 156 texts divided into 5 categories.

Here is the Auto-WEKA output:

Auto-WEKA result:
best classifier: weka.classifiers.meta.AdaBoostM1
arguments: [-P, 82, -I, 45, -Q, -S, 1, -W, weka.classifiers.trees.RandomForest, --, -I, 38, -K, 0, -depth, 14]
attribute search: null
attribute search arguments: []
attribute evaluation: null
attribute evaluation arguments: []
metric: errorRate
estimated errorRate: 0.0
training time on evaluation dataset: 0.83 seconds

You can use the chosen classifier in your own code as follows:

Classifier classifier = AbstractClassifier.forName("weka.classifiers.meta.AdaBoostM1", new String[]{"-P", "82", "-I", "45", "-Q", "-S", "1", "-W", "weka.classifiers.trees.RandomForest", "--", "-I", "38", "-K", "0", "-depth", "14"});
classifier.buildClassifier(instances);

Correctly Classified Instances 156 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 156

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
 72  0  0  0  0 |  a = 1900-tallet_–_Nyrealisme_og_modernisme
  0 55  0  0  0 |  b = 1855-1900_–_Realisme_og_naturalisme
  0  0  9  0  0 |  c = 1840-1860_–_Nasjonalromantikk
  0  0  0 12  0 |  d = 1890-årene_–_Nyromantikk
  0  0  0  0  8 |  e = 1700-tallet_–_Opplysningstid_og_klassisime

=== Detailed Accuracy By Class ===

             TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
             1,000    0,000    1,000      1,000    1,000      1,000    1,000     1,000     1900-tallet_–_Nyrealisme_og_modernisme
             1,000    0,000    1,000      1,000    1,000      1,000    1,000     1,000     1855-1900_–_Realisme_og_naturalisme
             1,000    0,000    1,000      1,000    1,000      1,000    1,000     1,000     1840-1860_–_Nasjonalromantikk
             1,000    0,000    1,000      1,000    1,000      1,000    1,000     1,000     1890-årene_–_Nyromantikk
             1,000    0,000    1,000      1,000    1,000      1,000    1,000     1,000     1700-tallet_–_Opplysningstid_og_klassisime

Weighted Avg. 1,000 0,000 1,000 1,000 1,000 1,000 1,000 1,000

------- 2 BEST CONFIGURATIONS -------

These are the 2 best configurations, as ranked by SMAC
Please note that this list only contains configurations evaluated on at least 10 folds,
If you need more configurations, please consider running Auto-WEKA for a longer time.

Configuration #1:
SMAC Score: 0.23874999999999996
Argument String:
-_0__wekaclassifiersmetaadaboostm1_00_p_HIDDEN 1 -_0__wekaclassifiersmetaadaboostm1_02_2_INT_P 82 -_0__wekaclassifiersmetaadaboostm1_03_INT_I 45 -_0__wekaclassifiersmetaadaboostm1_04_Q REMOVED -_0__wekaclassifiersmetaadaboostm1_05_S 1 -_1_W weka.classifiers.trees.RandomForest -_1_W_0_DASHDASH REMOVED -_1_W_1__wekaclassifierstreesrandomforest_00_INT_I 38 -_1_W_1__wekaclassifierstreesrandomforest_01_features_HIDDEN 0 -_1_W_1__wekaclassifierstreesrandomforest_02_1_INT_K 0 -_1_W_1__wekaclassifierstreesrandomforest_04_depth_HIDDEN 1 -_1_W_1__wekaclassifierstreesrandomforest_06_2_INT_depth 14 -attributesearch NONE -attributetime 180.0 -targetclass weka.classifiers.meta.AdaBoostM1

Configuration #2:
SMAC Score: 0.24416666666666664
Argument String:
-_0__wekaclassifiersfunctionssimplelogistic_00_S REMOVED -_0__wekaclassifiersfunctionssimplelogistic_01_W_HIDDEN 0 -_0__wekaclassifiersfunctionssimplelogistic_02_1_W 0 -_0__wekaclassifiersfunctionssimplelogistic_04_A REMOVED -attributesearch NONE -attributetime 180.0 -targetclass weka.classifiers.functions.SimpleLogistic

----END OF CONFIGURATION RANKING----

Temporary run directories:
/var/folders/42/bq9r97tx5fg78j5d1k_td0pr0000gn/T/autoweka4462961014404761952/
/var/folders/42/bq9r97tx5fg78j5d1k_td0pr0000gn/T/autoweka6011276688207838292/
/var/folders/42/bq9r97tx5fg78j5d1k_td0pr0000gn/T/autoweka16926395897855063430/
/var/folders/42/bq9r97tx5fg78j5d1k_td0pr0000gn/T/autoweka16685207695737687558/

For better performance, try giving Auto-WEKA more time.
Tried 368 configurations; to get good results reliably you may need to allow for trying thousands of configurations.


Are these results reliable?
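As a sanity check on the figures above, here is a minimal sketch in plain Java (no Weka dependency; the class and method names are made up for illustration) that recomputes the accuracy and the kappa statistic from the confusion matrix as posted, along with the accuracy of always guessing the largest class. It only shows that the summary is internally consistent with the matrix; it cannot tell whether the evaluation was done on held-out data.

```java
import java.util.Locale;

/** Re-derives summary figures from a confusion matrix (rows = actual, columns = predicted). */
public class ConfusionMatrixStats {

    /** Fraction of all instances that lie on the diagonal. */
    public static double accuracy(int[][] m) {
        long correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return (double) correct / total;
    }

    /**
     * Cohen's kappa: (observed - expected agreement) / (1 - expected agreement).
     * Returns NaN if expected agreement is exactly 1 (only one class present).
     */
    public static double kappa(int[][] m) {
        int k = m.length;
        long total = 0;
        long[] rowSum = new long[k];
        long[] colSum = new long[k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
                total += m[i][j];
            }
        }
        double expected = 0.0;
        for (int i = 0; i < k; i++) {
            expected += (double) rowSum[i] * colSum[i];
        }
        expected /= (double) total * total;
        return (accuracy(m) - expected) / (1.0 - expected);
    }

    /** Accuracy obtained by always predicting the most frequent actual class. */
    public static double majorityBaseline(int[][] m) {
        long total = 0, largest = 0;
        for (int[] row : m) {
            long rowSum = 0;
            for (int v : row) rowSum += v;
            total += rowSum;
            largest = Math.max(largest, rowSum);
        }
        return (double) largest / total;
    }

    public static void main(String[] args) {
        // Confusion matrix copied from the Auto-WEKA output above.
        int[][] reported = {
            {72, 0, 0, 0, 0},
            {0, 55, 0, 0, 0},
            {0, 0, 9, 0, 0},
            {0, 0, 0, 12, 0},
            {0, 0, 0, 0, 8}
        };
        System.out.printf(Locale.ROOT,
            "accuracy = %.3f, kappa = %.3f, majority baseline = %.3f%n",
            accuracy(reported), kappa(reported), majorityBaseline(reported));
    }
}
```

With 72 of the 156 texts in the largest class, a constant guess would already reach roughly 46% accuracy, so the interesting question is how far a properly held-out estimate sits between that baseline and the 100% reported here.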
