@leoramme

Hi!

I recently tried to reproduce the NER results, but setting up the environment was really hard. The pinned requirements no longer work: any version of numpy > 1.20 returns this error when fine-tuning the model for NER, and the pinned pandas version isn't compatible with the other packages. scikit-learn is also a better choice than sklearn in the requirements, since sklearn is only a dummy package that installs the latest version of scikit-learn.
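As a rough illustration of the sklearn point, here is a minimal sketch that scans the text of a requirements file and reports any line naming the dummy sklearn package. The function name find_sklearn_entries and the sample contents are hypothetical and only for illustration; this is not part of the PR itself.

```python
import re


def find_sklearn_entries(requirements_text):
    """Return the requirement lines that name the dummy `sklearn` package.

    Hypothetical helper, not part of the BioBERT repo.
    """
    flagged = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # The package name is everything before the first version
        # specifier, extras bracket, environment marker, or space.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        if name.lower() == "sklearn":
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    # Illustrative contents only, not the real requirements.txt.
    sample = "numpy\nsklearn\nscikit-learn\n"
    for entry in find_sklearn_entries(sample):
        print("replace with scikit-learn:", entry)
```

Any flagged line can then be replaced with an explicit scikit-learn entry.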

Because of that, I updated the requirements and created a Dockerfile. Unfortunately, because the model weights are hosted on Google Drive, I couldn't automate their download inside the Dockerfile.

With both of these modifications, the NER results can be easily reproduced:

  1. Pull the official tensorflow-gpu image with docker pull tensorflow/tensorflow:1.15.5-gpu-py3-jupyter
  2. Download and extract BioBERT-Base v1.1 (+ PubMed 1M) inside the biobert repo.

The directory structure should look like this:

biobert/
├── biobert_v1.1_pubmed
│   ├── bert_config.json
│   ├── model.ckpt-1000000.data-00000-of-00001
│   ├── model.ckpt-1000000.index
│   ├── model.ckpt-1000000.meta
│   └── vocab.txt
├── biocodes
│   ├── [...]
├── create_pretraining_data.py
├── Dockerfile
├── download.sh
├── extract_features.py
├── figs
│   └── biobert_overview.png
├── __init__.py
├── LICENSE
├── modeling.py
├── modeling_test.py
├── optimization.py
├── optimization_test.py
├── README.md
├── requirements.txt
├── run_classifier.py
├── run_ner.py
├── run_pretraining.py
├── run_qa.py
├── run_re.py
├── sample_text.txt
├── tf_metrics.py
├── tokenization.py
└── tokenization_test.py
  3. To build the image, run docker build -t biobert .
  4. To start the image in interactive mode, run docker run --gpus all -it biobert /bin/bash (remove --gpus all if you want to use your CPU instead of GPU)
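Since the weights have to be downloaded by hand, it may help to confirm that they were extracted to the right place before building the image. The following is a minimal sketch, assuming the directory layout shown above; the helper name missing_weight_files is hypothetical and not part of the repo.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = [
    "bert_config.json",
    "model.ckpt-1000000.data-00000-of-00001",
    "model.ckpt-1000000.index",
    "model.ckpt-1000000.meta",
    "vocab.txt",
]


def missing_weight_files(repo_root, weights_dir="biobert_v1.1_pubmed"):
    """Return the expected weight files that are absent under repo_root.

    Hypothetical helper: an empty list means the layout matches the listing.
    """
    base = Path(repo_root) / weights_dir
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # Run from the root of the biobert checkout.
    missing = missing_weight_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected weight files are present.")
```

Run from the root of the biobert checkout, it lists any expected file that is not yet in place.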

In interactive mode, you can use run_ner.py and biocodes/ner_detokenize.py without problems. I figured this might be useful if someone else wants to reproduce the results or develop something on top of BioBERT.
