- Repo: https://github.com/cgmhaicenter/exBERT
- Paper: https://www.aclweb.org/anthology/2020.findings-emnlp.129/
- PyTorch 1.8.1 ✅
- Huggingface Trainer ✅
- AutoModel, AutoTokenizer ✅ (see the loading sketch after this list)
- DeepSpeed pretraining with `run_mlm.py` ✅
- GPU ✅ (TPU test in progress)
- Fine-tuning available (https://github.com/Beomi/KcBERT-finetune, in progress)
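Since the checklist above claims Auto class support, loading should also work without the exBERT-specific classes. A minimal sketch, assuming this fork registers exBERT with the Auto mappings (the checkpoint name is the one used further below):

```python
from transformers import AutoModel, AutoTokenizer

# Assumption: the exbert-transformers fork registers exBERT
# under the AutoModel / AutoTokenizer mappings.
tokenizer = AutoTokenizer.from_pretrained("beomi/exKcBERT-paws")
model = AutoModel.from_pretrained("beomi/exKcBERT-paws")
```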
## Pretrain

- Need to clone this repo.

```bash
git clone https://github.com/Beomi/exbert-transformers
cd exbert-transformers
pip install -e ".[dev]" && pip install datasets
cd examples/pytorch/language-modeling/
./exbert_pretrain.sh
```
## Install exbert-transformers

- No need to git clone the repo!

```bash
pip install git+https://github.com/Beomi/exbert-transformers
```
## Load

```python
from transformers import exBertModel, exBertTokenizer

model = exBertModel.from_pretrained(...)
tokenizer = exBertTokenizer.from_pretrained(...)
```
## Trained on PAWS

```python
from transformers import exBertModel, exBertTokenizer

model = exBertModel.from_pretrained('beomi/exKcBERT-paws')
tokenizer = exBertTokenizer.from_pretrained('beomi/exKcBERT-paws')
```
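Continuing from the load above, a minimal inference sketch, assuming `exBertModel` follows the standard `BertModel` forward API. The Korean sentence pair is an arbitrary example; PAWS-style paraphrase classification would additionally need a classification head on top of these hidden states:

```python
import torch

# Encode a sentence pair the way BERT-style models expect.
inputs = tokenizer(
    "오늘 날씨가 정말 좋네요.",
    "오늘은 날씨가 아주 좋습니다.",
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

# Contextual embeddings: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```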
Note) The `base_model` of the fine-tuned model config should be `""` (blank).

If you want to change the base BERT model or add more vocab to exBERT, add or update the vocab in `examples/pytorch/language-modeling/exbert/vocab.txt`, and update `vocab_size` and `base_model` in `examples/pytorch/language-modeling/exbert/config.json`, as in the sketch below.
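A minimal sketch of that edit, assuming `config.json` is plain JSON and that `vocab_size` should match the number of entries in `vocab.txt`; the `base_model` value below is only an example:

```python
import json

vocab_path = "examples/pytorch/language-modeling/exbert/vocab.txt"
config_path = "examples/pytorch/language-modeling/exbert/config.json"

# Keep vocab_size in sync with the (possibly extended) vocab file.
with open(vocab_path, encoding="utf-8") as f:
    vocab_size = sum(1 for _ in f)

with open(config_path, encoding="utf-8") as f:
    config = json.load(f)

config["vocab_size"] = vocab_size
config["base_model"] = "beomi/kcbert-base"  # example value; point this at your base BERT

with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False, indent=2)
```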
Terminal results on GitHub Gist: https://gist.github.com/Beomi/1aa650f75c8e9b3dd467038004244ed2