
Evaluation

Marek Sosnicki edited this page Nov 17, 2020 · 2 revisions

Application evaluation

Automated Search Helper was tested on a real-life example: the authors tried to find all articles related to "Mutation Testing in C++". This section presents the detailed results of downloading those articles with Automated Search Helper.

General description

The goal of the search was to find all papers related to Mutation Testing in C++ across the Digital Libraries (DLs) supported by Automated Search Helper. The search was done on 16.08.2020; current results may differ, but the links are probably still valid. On each DL a search was run to find all articles related to Mutation Testing. In most cases the search query was:

"mutation testing" or "mutation analysis" or "mutant analysis"
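The query links listed in the per-DL results are this query in URL-encoded form. As a minimal sketch (the helper name `build_search_url` is hypothetical and not part of Automated Search Helper), the Springer link from the results can be rebuilt with Python's standard library:

```python
from urllib.parse import urlencode

# The search query used for most DLs, as stated above.
QUERY = '"mutation testing" or "mutation analysis" or "mutant analysis"'


def build_search_url(base_url, params):
    """Hypothetical helper: append URL-encoded query parameters to a base URL."""
    # urlencode quotes each value with quote_plus: '"' -> %22, ' ' -> '+'.
    return f"{base_url}?{urlencode(params)}"


# Parameter names are taken from the Springer query link in the results.
springer_url = build_search_url(
    "https://link.springer.com/search",
    {
        "query": QUERY,
        "facet-discipline": '"Computer Science"',
        "facet-language": '"En"',
    },
)
print(springer_url)
```

The same helper could be pointed at other DLs by changing the base URL and parameter names; those differ per library and would have to be read off each DL's own search page.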

Where possible, the search was limited to Computer Science related studies. The tool was run simultaneously for all DLs, which means that if the same article appeared in multiple DLs, it may have been downloaded from only one of them. The order of download was randomized.
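This page does not describe how Automated Search Helper recognizes that two entries are the same article. The sketch below is therefore only an illustration, assuming entries are matched by DOI when one is present and by a normalized title otherwise. The functions `entry_key` and `deduplicate` and the sample entries are hypothetical placeholders, not data from the evaluation.

```python
import re


def entry_key(entry):
    """Assumed matching rule: DOI if available, else a normalized title."""
    doi = entry.get("doi")
    if doi:
        return "doi:" + doi.strip().lower()
    # Lowercase the title and collapse punctuation/whitespace runs.
    title = re.sub(r"[^a-z0-9]+", " ", entry["title"].lower()).strip()
    return "title:" + title


def deduplicate(entries):
    """Keep the first entry seen for each key; later duplicates are dropped."""
    seen = {}
    for entry in entries:
        seen.setdefault(entry_key(entry), entry)
    return list(seen.values())


# Made-up placeholder entries, for illustration only.
entries = [
    {"title": "An Example Paper", "doi": "10.0000/example.1", "source": "ACM"},
    {"title": "An example paper", "doi": "10.0000/EXAMPLE.1", "source": "IEEE"},
    {"title": "Another Example Paper", "doi": None, "source": "Springer"},
]
unique = deduplicate(entries)
print(len(unique), [e["source"] for e in unique])
```

Because the first occurrence wins, which DL an article is attributed to depends on processing order. This is consistent with the note above that a shared article may be downloaded from only one DL.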

This evaluation shows how effective Automated Search Helper is at downloading the full text of articles and compares how it works for different DLs.

Results for each digital library

This section presents the search results for each DL. Every DL uses the same data format, presented as a list of records.

ACM

  • query link: https://dl.acm.org/action/doSearch?AllField=%22mutation+testing%22+or+%22mutation+analysis%22+or+%22mutatnt+analysis%22
  • no of articles from search query: 1007
  • no of entries in input files: 1055
  • additional comments on DL search query:
    • Some articles have two versions, e.g., a book chapter and a conference paper; they appear once in the search results but twice in the generated bib files.
  • no of entries in Automated Search Helper: 977
  • no of entries per status:
    • No access : 12
    • Fail to read : 0
    • Text downloaded from PDF : 929
    • Text downloaded from HTML : 23
    • Ignored (Indexes, 'Proceedings' etc.) : 13
  • additional comments:
    • The number of articles in Automated Search Helper is lower than the number of inputs because of duplicates across the searches.
    • ACM never provides an HTML representation, but some papers are published in both ACM and IEEE, where an HTML representation is often present.

IEEE

Springer

  • query link: https://link.springer.com/search?query=%22mutation+testing%22+or+%22mutation+analysis%22+or+%22mutant+analysis%22&facet-discipline=%22Computer+Science%22&facet-language=%22En%22
  • no of articles from search query: 958
  • no of entries in input files: 958
  • no of entries in Automated Search Helper: 957
  • no of entries per status:
    • No access : 96
    • Fail to read : 3
    • Text downloaded from PDF : 685
    • Text downloaded from HTML : 173
    • Ignored (Indexes, 'Proceedings' etc.) : 0
  • additional comments:
    • One reference was not processed by Automated Search Helper because it was of the Book type, and such entries are ignored: if a book is present in the search results, its separate chapters are present too, so there is no need to duplicate the input.
    • Access to many articles was forbidden for the authors; most of them were related to medicine. Abstracts were downloaded for all of them.
    • 3 articles were too big: Automated Search Helper has a hard limit (50 pages) on reading PDFs because of the long processing time.

Science Direct

  • query link: https://www.sciencedirect.com/search?qs=mutation%20testing&show=25&tak=%22mutation%20testing%22
  • no of articles from search query: 925
  • no of entries in input files: 925
  • additional comments on DL search query:
    • The search was limited to "mutation testing" because there was no way to limit the search to the Computer Science area, and using the whole query would have returned thousands of irrelevant results.
  • no of entries in Automated Search Helper: 923
  • no of entries per status:
    • No access : 226
    • Fail to read : 0
    • Text downloaded from PDF : 318
    • Text downloaded from HTML : 377
    • Ignored (Indexes, 'Proceedings' etc.) : 2
  • additional comments:
    • The application did not have access to many of the articles, mostly because they were not related to Computer Science.
    • A few articles from the search were duplicated; they were from the biology area.
    • Access to many articles was forbidden for the authors; most of them were related to biology and chemistry. Abstracts were downloaded for all of them.

Wiley

Scopus

Combined results

  • total no of inputs from queries: 5666
  • no of entries in application: 4358
  • Detailed status of articles:
    • No access : 400
    • Fail to read : 303
    • Text downloaded from PDF : 2161
    • Text downloaded from HTML : 1394
    • Ignored (Indexes, 'Proceedings' etc.) : 100

The combined results for Automated Search Helper entries are not sums because of duplicates among publishers.
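To compare the DLs on one scale, the status counts reported above can be turned into a full-text download rate. The sketch below uses an assumed metric that this page does not define: (PDF + HTML downloads) divided by all entries in the application, including ignored ones. The counts are copied from the sections above. The name `full_text_rate` is hypothetical.

```python
# Status counts copied from the per-DL and combined results on this page.
RESULTS = {
    "ACM": {"No access": 12, "Fail to read": 0, "PDF": 929, "HTML": 23, "Ignored": 13},
    "Springer": {"No access": 96, "Fail to read": 3, "PDF": 685, "HTML": 173, "Ignored": 0},
    "Science Direct": {"No access": 226, "Fail to read": 0, "PDF": 318, "HTML": 377, "Ignored": 2},
    "Combined": {"No access": 400, "Fail to read": 303, "PDF": 2161, "HTML": 1394, "Ignored": 100},
}

# "no of entries in Automated Search Helper" for each record above.
ENTRIES = {"ACM": 977, "Springer": 957, "Science Direct": 923, "Combined": 4358}


def full_text_rate(statuses):
    """Assumed metric: share of entries whose text came from PDF or HTML."""
    total = sum(statuses.values())
    return (statuses["PDF"] + statuses["HTML"]) / total


for name, statuses in RESULTS.items():
    # Sanity check: the per-status counts add up to the reported entry count.
    assert sum(statuses.values()) == ENTRIES[name]
    print(f"{name}: {full_text_rate(statuses):.1%} full text")
```

Excluding the ignored entries from the denominator would be an equally reasonable choice and would raise the rates slightly. The combined row also covers the DLs whose sections are empty on this page.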
