CoNECo: a Corpus for Named Entity recognition and normalization of protein Complexes

This repository contains scripts to support different steps of the analyses performed in the CoNECo paper.

There is a Zenodo project associated with this repository.

Annotation documentation is available through Zenodo and this page: https://katnastou.github.io/annodoc-CoNECo/

Corpus statistics

This directory contains three scripts that replicate the calculation of corpus statistics described in the Results and Discussion section of the manuscript. Only the shell script needs to be invoked:

./corpus_stats/run.sh

Word counts for the documents are computed with BERT basic tokenization, using the implementation found here.
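To illustrate what counting words with BERT basic tokenization involves, here is a minimal sketch that splits on whitespace and then separates every punctuation character into its own token. It is a simplified approximation written for this README, not the implementation the scripts use: the function names are hypothetical, and lowercasing, accent stripping, and CJK character handling are omitted.

```python
import unicodedata
from typing import List


def is_punctuation(ch: str) -> bool:
    """Return True for ASCII symbols and Unicode punctuation characters."""
    cp = ord(ch)
    # All non-alphanumeric printable ASCII characters count as punctuation.
    if 33 <= cp <= 47 or 58 <= cp <= 64 or 91 <= cp <= 96 or 123 <= cp <= 126:
        return True
    return unicodedata.category(ch).startswith("P")


def basic_tokenize(text: str) -> List[str]:
    """Split on whitespace, then emit each punctuation character as a token."""
    tokens: List[str] = []
    for chunk in text.split():
        current = ""
        for ch in chunk:
            if is_punctuation(ch):
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)
            else:
                current += ch
        if current:
            tokens.append(current)
    return tokens


def count_words(text: str) -> int:
    """Hypothetical helper: number of basic tokens in a document."""
    return len(basic_tokenize(text))


if __name__ == "__main__":
    print(basic_tokenize("The AP-1 complex binds DNA."))
    print(count_words("The AP-1 complex binds DNA."))
```

Note that hyphenated names such as "AP-1" become three tokens under this scheme, so counts are higher than a plain whitespace split would give.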

CoNECo corpus

This directory contains the documents in BRAT and CoNLL formats.
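For readers unfamiliar with BRAT standoff files, the following sketch reads text-bound annotations (lines whose ID starts with "T") from the contents of an .ann file. The entity type "Complex" and the sample sentence are illustrative assumptions, not taken from the corpus, and the `Entity` and `parse_textbound` names are hypothetical. Other annotation lines, such as normalizations, are skipped here.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    ann_id: str   # e.g. "T1"
    label: str    # entity type
    start: int    # character offset of the first character
    end: int      # character offset one past the last character
    text: str     # annotated surface string


def parse_textbound(ann_text: str) -> List[Entity]:
    """Parse text-bound annotation lines of the form 'ID<TAB>TYPE START END<TAB>TEXT'."""
    entities: List[Entity] = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, normalizations, notes, and blank lines
        ann_id, type_and_span, text = line.split("\t", 2)
        label, span = type_and_span.split(" ", 1)
        # Discontinuous spans look like "s1 e1;s2 e2"; keep the overall extent.
        offsets = [int(x) for frag in span.split(";") for x in frag.split()]
        entities.append(Entity(ann_id, label, offsets[0], offsets[-1], text))
    return entities


if __name__ == "__main__":
    document = "The AP-1 complex binds DNA."
    annotations = "T1\tComplex 4 16\tAP-1 complex\n"  # illustrative example line
    for entity in parse_textbound(annotations):
        # The offsets index directly into the document text.
        print(entity, document[entity.start:entity.end])
```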

Error analysis

For the error analysis, the evaluation script evalso.py is used to detect false positives and false negatives in each document of the test set. A shell script is provided that runs it on the entire Jensenlab-tagged CoNECo test set, using the CoNECo annotated test set as the gold standard:

./error_analysis/jensenlab-tagger/run.sh

Similarly, for the Transformer-based tagger, you should run:

./error_analysis/transformer-tagger/run.sh
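The idea of detecting false positives and false negatives per document can be sketched as a set comparison between gold and predicted annotations. This is only an illustration under an exact-match assumption (same offsets and same type); evalso.py may apply different matching criteria, and the `error_sets` name and the sample spans are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A span is (start offset, end offset, entity type).
Span = Tuple[int, int, str]


def error_sets(gold: Set[Span], predicted: Set[Span]) -> Dict[str, List[Span]]:
    """Compare predicted spans against gold spans using exact matching."""
    return {
        "true_positives": sorted(gold & predicted),
        "false_positives": sorted(predicted - gold),   # predicted but not in gold
        "false_negatives": sorted(gold - predicted),   # in gold but not predicted
    }


if __name__ == "__main__":
    # Illustrative spans for a single document, not taken from the corpus.
    gold = {(4, 16, "Complex"), (30, 40, "Complex")}
    predicted = {(4, 16, "Complex"), (50, 55, "Complex")}
    for kind, spans in error_sets(gold, predicted).items():
        print(kind, spans)
```

Running the comparison document by document, rather than over the pooled test set, keeps each error traceable to the text it came from.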

Large-scale tagging for Jensenlab tagger

For large-scale tagging, the tagger needs to be set up first. Instructions on how to set it up can be found here. Then execute the shell script to obtain the results, which are also available on Zenodo.

./large-scale-jensenlab-tagger/run.sh

CoNECo transformer NER

Please refer to the original repo for instructions on training an NER model on CoNECo.

CoNECo transformer tagger

Please refer to the original repo for instructions on performing a large-scale run with the model trained on CoNECo.
