Problem with the training and generation scripts for CMLM #16

@JasmineChen123

Description

Hi, thank you for releasing the code!
I have a question about the provided bash scripts for training and inference.

The training script for CMLM+DSLP is:
python3 train.py data-bin/wmt14.en-de_kd --source-lang en --target-lang de --save-dir checkpoints --eval-tokenized-bleu \
    --keep-interval-updates 5 --save-interval-updates 500 --validate-interval-updates 500 --maximize-best-checkpoint-metric \
    --eval-bleu-remove-bpe --eval-bleu-print-samples --best-checkpoint-metric bleu --log-format simple --log-interval 100 \
    --eval-bleu --eval-bleu-detok space --keep-last-epochs 5 --keep-best-checkpoints 5 --fixed-validation-seed 7 --ddp-backend=no_c10d \
    --share-all-embeddings --decoder-learned-pos --encoder-learned-pos --optimizer adam --adam-betas "(0.9,0.98)" --lr 0.0005 \
    --lr-scheduler inverse_sqrt --stop-min-lr 1e-09 --warmup-updates 10000 --warmup-init-lr 1e-07 --apply-bert-init --weight-decay 0.01 \
    --fp16 --clip-norm 2.0 --max-update 300000 --task translation_lev --criterion nat_loss --arch glat_sd --noise full_mask \
    --concat-yhat --concat-dropout 0.0 --label-smoothing 0.1 \
    --activation-fn gelu --dropout 0.1 --max-tokens 8192 \
    --length-loss-factor 0.1 --pred-length-offset

The "--arch glat_sd" is weird. Is it "cmlm_sd" or "cmlm_transformer"?

Another question: could you please share the generation script for CMLM with more than one decoding iteration, i.e. with "--iter-decode-max-iter" set to 5 or 10? I find that the BLEU with 5 or 10 iterations is much worse than with a single iteration.
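
For reference, here is the kind of generation command I have been running. It is a sketch adapted from upstream fairseq's non-autoregressive translation example, not from this repo's scripts, so the generate.py entry point, data-bin path, checkpoint path, and batch size are my assumptions:

# Assumes this fairseq fork ships a generate.py alongside train.py;
# upstream fairseq exposes the same flags via fairseq-generate.
python3 generate.py data-bin/wmt14.en-de_kd \
    --gen-subset test --task translation_lev \
    --path checkpoints/checkpoint_best.pt \
    --iter-decode-max-iter 5 \
    --iter-decode-eos-penalty 0 \
    --beam 1 --remove-bpe --print-step \
    --batch-size 400

With --iter-decode-max-iter set to 5 or 10 here, BLEU drops well below the single-iteration result, so I suspect I am missing a flag that your generation script sets.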
