exBERT on Transformers 🤗

Original exBERT

Updated for Transformers 🤗

  • PyTorch 1.8.1 ✅
  • Huggingface Trainer ✅
  • AutoModel, AutoTokenizer ✅
  • DeepSpeed Pretrain with run_mlm.py ✅
  • GPU ✅ (TPU test in progress)
  • Fine-tuning available (https://github.com/Beomi/KcBERT-finetune, in progress)

How to use

Pretrain exBERT

  • Clone this repo first:
git clone https://github.com/Beomi/exbert-transformers
cd exbert-transformers
pip install -e ".[dev]" && pip install datasets
cd examples/pytorch/language-modeling/
./exbert_pretrain.sh

Finetune

Install exbert-transformers

  • No need to clone the repo!
pip install git+https://github.com/Beomi/exbert-transformers

Load

from transformers import exBertModel, exBertTokenizer

model = exBertModel.from_pretrained(...)
tokenizer = exBertTokenizer.from_pretrained(...)

Trained on PAWS

from transformers import exBertModel, exBertTokenizer

model = exBertModel.from_pretrained('beomi/exKcBERT-paws')
tokenizer = exBertTokenizer.from_pretrained('beomi/exKcBERT-paws')

Note: the base_model field in the fine-tuned model's config should be "" (blank).
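For example, a fine-tuned checkpoint's config.json might start like this (a minimal sketch; only the blank base_model is prescribed above, the other fields and values are illustrative assumptions):

```json
{
  "model_type": "exbert",
  "base_model": "",
  "vocab_size": 36000
}
```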

Vocab update

To change the base BERT model or add more vocabulary to exBERT, add or update tokens in examples/pytorch/language-modeling/exbert/vocab.txt, then update vocab_size and base_model in examples/pytorch/language-modeling/exbert/config.json to match.
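The two files must stay in sync: vocab_size in config.json has to equal the number of lines in vocab.txt. A minimal sketch of that update (the add_vocab helper and its paths are assumptions for illustration, not part of this repo):

```python
import json
from pathlib import Path

# Repo layout described above; adjust if your checkout differs.
EXBERT_DIR = Path("examples/pytorch/language-modeling/exbert")

def add_vocab(new_tokens, exbert_dir=EXBERT_DIR):
    """Append new tokens to vocab.txt and keep config.json's vocab_size in sync."""
    vocab_path = exbert_dir / "vocab.txt"
    config_path = exbert_dir / "config.json"

    existing = vocab_path.read_text(encoding="utf-8").splitlines()
    seen = set(existing)
    # Preserve existing token order; skip duplicates.
    merged = existing + [t for t in new_tokens if t not in seen]
    vocab_path.write_text("\n".join(merged) + "\n", encoding="utf-8")

    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["vocab_size"] = len(merged)  # must match the vocab.txt line count
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return len(merged)
```

If you also swap the base BERT model, remember to update base_model in the same config.json by hand.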

Appendix

Sample Train result example

Terminal results on GitHub Gist: https://gist.github.com/Beomi/1aa650f75c8e9b3dd467038004244ed2
