Evaluation
Automated Search Helper was tested on a real-life example: the authors tried to find all articles related to "Mutation Testing in C++". This section presents the detailed results of downloading the articles with Automated Search Helper.
The goal of the search was to find all papers related to Mutation Testing in C++ across the Digital Libraries (DLs) supported by Automated Search Helper. The search was done on 16.08.2020; current results may differ, but the links are probably still valid. On each DL, a search was run to find all articles that relate to mutation testing; in most cases the search query was:
"mutation testing" or "mutation analysis" or "mutant analysis"
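The query links listed in the per-DL results carry this phrase in percent-encoded form. As a minimal sketch of that encoding step (the helper name `encode_query` is an illustrative assumption, not part of Automated Search Helper), the standard library can produce the same encoded string that appears in several of the links:

```python
from urllib.parse import quote_plus

# The search phrase quoted in the text above.
QUERY = '"mutation testing" or "mutation analysis" or "mutant analysis"'


def encode_query(query: str) -> str:
    """Percent-encode a search phrase for use as a URL query value.

    Hypothetical helper for illustration only: double quotes become %22
    and spaces become '+'.
    """
    return quote_plus(query)


encoded = encode_query(QUERY)
print(encoded)
```

Each DL wraps the encoded phrase in its own parameter name and syntax, so the full URLs still differ from library to library.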
Where possible, the search was limited to Computer Science related studies. The tool was run simultaneously for all DLs, which means that if the same article appeared in multiple DLs, it may have been downloaded by only one of them. The order of download was randomized.
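The source does not say how Automated Search Helper recognizes that two entries from different DLs are the same article. The sketch below shows one common approach, assuming entries are matched on a normalized title. The function names, the dictionary layout, and the sample titles are all hypothetical.

```python
import re


def normalize_title(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace.

    Assumed matching key; the real tool may use a DOI or another field.
    """
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(entries):
    """Keep the first entry seen for each normalized title."""
    seen = {}
    for entry in entries:
        key = normalize_title(entry["title"])
        # setdefault leaves an existing entry in place, so the first DL wins.
        seen.setdefault(key, entry)
    return list(seen.values())


# Made-up sample records, not taken from the evaluation data.
entries = [
    {"title": "An Example Paper on Mutation Testing", "source": "DL A"},
    {"title": "An example paper on mutation testing.", "source": "DL B"},
    {"title": "Another Example Paper", "source": "DL B"},
]
unique = deduplicate(entries)
print(len(unique), [e["source"] for e in unique])
```

Because the download order was randomized, which DL "wins" for a shared article would vary between runs under this scheme.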
This evaluation shows how effective Automated Search Helper is at downloading the full text of articles and compares how it works for different DLs.
This section presents the results of the search for each DL. Each DL uses the same data format, presented as a list of records.
- query link: https://dl.acm.org/action/doSearch?AllField=%22mutation+testing%22+or+%22mutation+analysis%22+or+%22mutatnt+analysis%22
- no of articles from search query: 1007
- no of entries in input files: 1055
- additional comments on DL search query:
- Some articles have two versions (e.g., book chapter and conference paper); they appear once in the search query but twice in the generated bibfiles.
- no of entries in Automated Search Helper: 977
- no of entries per status:
- No access : 12
- Fail to read : 0
- Text downloaded from PDF : 929
- Text downloaded from HTML : 23
- Ignored (Indexes, 'Proceedings' etc.) : 13
- additional comments:
- The number of articles in Automated Search Helper is lower than the number of inputs because of duplicates across the search.
- ACM never provides an HTML representation, but some papers are published in both ACM and IEEE, where an HTML representation is often present.
- query link: https://ieeexplore.ieee.org/search/searchresult.jsp?action=search&newsearch=true&matchBoolean=true&queryText=(((%22All%20Metadata%22:%22mutation%20testing%22)%20OR%20%22All%20Metadata%22:%22mutation%20analysis%22)%20OR%20%22All%20Metadata%22:%22mutant%20analysis%22)
- no of articles from search query: 780
- no of entries in input files: 780
- no of entries in Automated Search Helper: 770
- no of entries per status:
- No access : 0
- Fail to read : 2
- Text downloaded from PDF : 125
- Text downloaded from HTML : 641
- Ignored (Indexes, 'Proceedings' etc.) : 4
- additional comments:
- The 10 missing articles are duplicates with generic names such as [Front Cover]; they are not really papers.
- Two articles could not be read because of an unusual format.
- Most articles have an HTML representation, which considerably speeds up the application and allows much better accuracy in obtaining the full text.
- query link: https://link.springer.com/search?query=%22mutation+testing%22+or+%22mutation+analysis%22+or+%22mutant+analysis%22&facet-discipline=%22Computer+Science%22&facet-language=%22En%22
- no of articles from search query: 958
- no of entries in input files: 958
- no of entries in Automated Search Helper: 957
- no of entries per status:
- No access : 96
- Fail to read : 3
- Text downloaded from PDF : 685
- Text downloaded from HTML : 173
- Ignored (Indexes, 'Proceedings' etc.) : 0
- additional comments:
- One reference was not processed by Automated Search Helper because it was of the Book type. Such entries are ignored: if a book is present in the search results, its separate chapters are present too, so there is no need to duplicate the input.
- The authors had no access to many articles, most of them related to medicine; abstracts were downloaded for all of them.
- 3 articles were too big: Automated Search Helper has a hard limit (50 pages) on reading PDFs because of the long processing time.
- query link: https://www.sciencedirect.com/search?qs=mutation%20testing&show=25&tak=%22mutation%20testing%22
- no of articles from search query: 925
- no of entries in input files: 925
- additional comments on DL search query:
- The search was limited to "mutation testing" because there was no way to restrict it to the Computer Science area, and the whole query would have returned thousands of irrelevant results.
- no of entries in Automated Search Helper: 923
- no of entries per status:
- No access : 226
- Fail to read : 0
- Text downloaded from PDF : 318
- Text downloaded from HTML : 377
- Ignored (Indexes, 'Proceedings' etc.) : 2
- additional comments:
- The application did not have access to many of the articles, because most of them were not related to Computer Science.
- A few articles from the search were duplicated; they were from the biology area.
- The authors had no access to many articles, most of them related to biology and chemistry; abstracts were downloaded for all of them.
- query links:
- https://onlinelibrary.wiley.com/action/doSearch?AllField=%22mutation+testing%22&content=articlesChapters&target=default&startPage=&ConceptID=68
- https://onlinelibrary.wiley.com/action/doSearch?AllField=%22mutant+analysis%22&startPage=&ConceptID=68
- https://onlinelibrary.wiley.com/action/doSearch?AllField=%22mutation+analysis%22&startPage=&ConceptID=68
- no of articles from search query: 200 + 14 + 184 = 398
- no of entries in input files: 378
- additional comments on DL search query:
- Three separate queries were used because some results were missing from a joined query.
- A few articles were not present in the input files even though they were selected for download by the script.
- no of entries in Automated Search Helper: 313
- no of entries per status:
- No access : 41
- Fail to read : 1
- Text downloaded from PDF : 101
- Text downloaded from HTML : 167
- Ignored (Indexes, 'Proceedings' etc.) : 3
- additional comments:
- Duplicates were removed by Automated Search Helper, so running three queries instead of one combined query did not produce any overhead.
- One file could not be read because the PDF was too big. Automated Search Helper has a hard limit (50 pages) on reading PDFs because of the long processing time.
- query link: https://www.scopus.com/results/results.uri?sort=plf-f&src=s&sid=e22fea41c1479e0293b7314f4f3ec7f3&sot=a&sdt=a&cluster=scosubjabbr%2c%22COMP%22%2ct&sl=80&s=TITLE-ABS-KEY+%28+%22mutation+testing%22+OR+%22mutation+analysis%22+OR+%22mutant+analysis%22+%29&origin=searchadvanced&editSaveSearch=&txGid=f473d1ab6d4abd1449a5cfe16b817cf7
- no of articles from search query: 1551
- no of entries in input files: 1551
- no of entries in Automated Search Helper: 1545
- no of entries per status:
- No access : 62
- Fail to read : 299
- Text downloaded from PDF : 429
- Text downloaded from HTML : 675
- Ignored (Indexes, 'Proceedings' etc.) : 80
- additional comments:
- There were some duplicates in the input files; most of them were ignored anyway (e.g., lists of proceedings from conferences).
- Many articles could not be read because some publishers indexed by Scopus are not supported. Scopus never provides the full text; to obtain it, the application must visit the publisher's website, and only the publishers listed in the previous sections were supported.
- Many references were ignored. In addition to separate articles, Scopus also indexes lists of articles (named e.g., Proceedings); such lists are ignored because they do not hold any text.
- total no of inputs from queries: 5666
- no of entries in application: 4358
- Detailed status of articles:
- No access : 400
- Fail to read : 303
- Text downloaded from PDF : 2161
- Text downloaded from HTML : 1394
- Ignored (Indexes, 'Proceedings' etc.) : 100
The combined results for Automated Search Helper entries are not sums because of duplicates among publishers.
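To compare the DLs, it can help to turn the status counts into a full-text share per library. The sketch below is one way to do that, using the counts reported above and keying each DL by the hostname of its query link. The `Status` type and its property names are illustrative assumptions, and the share is computed over the sum of the status counts rather than over the reported entry totals.

```python
from typing import NamedTuple


class Status(NamedTuple):
    """Status counts for one DL (hypothetical container type)."""
    no_access: int
    fail_to_read: int
    pdf: int
    html: int
    ignored: int

    @property
    def total(self) -> int:
        # Sum of all five status counts.
        return sum(self)

    @property
    def full_text(self) -> int:
        # Entries whose text was downloaded from PDF or HTML.
        return self.pdf + self.html

    @property
    def full_text_share(self) -> float:
        return self.full_text / self.total


# Counts copied from the per-DL lists above.
PER_DL = {
    "dl.acm.org": Status(12, 0, 929, 23, 13),
    "ieeexplore.ieee.org": Status(0, 2, 125, 641, 4),
    "link.springer.com": Status(96, 3, 685, 173, 0),
    "www.sciencedirect.com": Status(226, 0, 318, 377, 2),
    "onlinelibrary.wiley.com": Status(41, 1, 101, 167, 3),
    "www.scopus.com": Status(62, 299, 429, 675, 80),
}

# Combined counts from the final list above.
COMBINED = Status(400, 303, 2161, 1394, 100)

for host, status in PER_DL.items():
    print(f"{host}: {status.full_text_share:.1%} full text")

# The per-DL full-text counts add up to more than the combined count,
# which is consistent with duplicates across publishers.
per_dl_full_text = sum(s.full_text for s in PER_DL.values())
print(per_dl_full_text, COMBINED.full_text)
```

The difference between the two printed numbers gives a rough idea of how many full-text entries overlapped between DLs.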