Repo for UNSW Capstone Project P106 - Evaluation and Finetuning of Phenotype Concept Recognition Tools
The goal of this project is to evaluate the performance of state-of-the-art (SOTA) Phenotype Concept Recognition tools, such as PhenoTagger, against our newly created HPO Phenotype Gold Corpus.
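As a rough illustration of what such an evaluation can look like, the sketch below computes exact-match precision, recall and F1 between gold and predicted annotations. It is not the project's actual evaluation script: the `Annotation` tuple, the `precision_recall_f1` function and the example spans are hypothetical, and it assumes a prediction counts as correct only when document, span offsets and HPO ID all match the gold annotation.

```python
from typing import NamedTuple, Set, Tuple


class Annotation(NamedTuple):
    """Hypothetical record for one phenotype mention (not a format used by the repo)."""
    doc_id: str
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends (exclusive)
    hpo_id: str  # HPO term assigned to the span


def precision_recall_f1(
    gold: Set[Annotation], predicted: Set[Annotation]
) -> Tuple[float, float, float]:
    """Exact-match scores: a prediction is a true positive only if it equals a gold annotation."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only; offsets and documents are made up.
gold = {
    Annotation("doc1", 0, 12, "HP:0001250"),
    Annotation("doc1", 20, 35, "HP:0001263"),
}
predicted = {
    Annotation("doc1", 0, 12, "HP:0001250"),   # matches gold exactly
    Annotation("doc1", 40, 48, "HP:0000252"),  # not in gold
}

print(precision_recall_f1(gold, predicted))  # (0.5, 0.5, 0.5)
```

Exact matching is the strictest choice; a more lenient evaluation might also accept overlapping spans or match on HPO ID per document only.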
The Gold Corpus was created with INCEpTION. Text spans representing phenotypes were highlighted and labelled with a corresponding HPO term. See the Annotation Guidelines for details.
The annotations were exported in BioC and UIMA CAS JSON formats. See here for more on BioC and UIMA.
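For readers who want to inspect the export programmatically, here is a minimal sketch of reading annotations from a BioC file with the Python standard library. It assumes the export is BioC XML (BioC also has a JSON serialisation) and follows the generic BioC element layout (`document`, `passage`, `annotation`, `infon`, `location`). The `SAMPLE` string and the `read_bioc_annotations` helper are hypothetical, and the infon key that holds the HPO term in the real export may differ from the `type` key shown here.

```python
import xml.etree.ElementTree as ET

# Made-up BioC XML fragment for illustration; not taken from the Gold Corpus.
SAMPLE = """<collection>
  <source>example</source>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>The patient had seizures.</text>
      <annotation id="1">
        <infon key="type">HP:0001250</infon>
        <location offset="16" length="8"/>
        <text>seizures</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_bioc_annotations(xml_text):
    """Return (doc_id, start, end, text, infons) for every annotation in a BioC XML string."""
    root = ET.fromstring(xml_text)
    rows = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for annotation in document.iter("annotation"):
            location = annotation.find("location")
            start = int(location.get("offset"))
            end = start + int(location.get("length"))
            # Keep every infon so no assumption is made about which key holds the HPO ID.
            infons = {i.get("key"): i.text for i in annotation.findall("infon")}
            rows.append((doc_id, start, end, annotation.findtext("text"), infons))
    return rows


for row in read_bioc_annotations(SAMPLE):
    print(row)
```

To read an exported file instead of the sample string, pass the file's contents to the same function.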
To recreate the Gold Corpus in INCEpTION, import the file Inception_Export.zip into INCEpTION. See here for project import instructions in the INCEpTION User Guide. Do not unzip the file before importing, and tick the option 'create missing users' when importing.
Once the project is imported, log in as 'guest' with the password 'UNSWCOMP9900' and use the 'Curation' view to review the annotations. The Curation view shows two panels with annotations, which can be confusing; see Curation in the INCEpTION User Guide for details.