Problem with the training and generation scripts for CMLM #16

@JasmineChen123

Description

Hi, thank you for releasing the code!
I have a question about the provided bash scripts for training and inference.

The training script for CMLM+DSLP is:
python3 train.py data-bin/wmt14.en-de_kd --source-lang en --target-lang de --save-dir checkpoints --eval-tokenized-bleu \
    --keep-interval-updates 5 --save-interval-updates 500 --validate-interval-updates 500 --maximize-best-checkpoint-metric \
    --eval-bleu-remove-bpe --eval-bleu-print-samples --best-checkpoint-metric bleu --log-format simple --log-interval 100 \
    --eval-bleu --eval-bleu-detok space --keep-last-epochs 5 --keep-best-checkpoints 5 --fixed-validation-seed 7 --ddp-backend=no_c10d \
    --share-all-embeddings --decoder-learned-pos --encoder-learned-pos --optimizer adam --adam-betas "(0.9,0.98)" --lr 0.0005 \
    --lr-scheduler inverse_sqrt --stop-min-lr 1e-09 --warmup-updates 10000 --warmup-init-lr 1e-07 --apply-bert-init --weight-decay 0.01 \
    --fp16 --clip-norm 2.0 --max-update 300000 --task translation_lev --criterion nat_loss --arch glat_sd --noise full_mask \
    --concat-yhat --concat-dropout 0.0 --label-smoothing 0.1 \
    --activation-fn gelu --dropout 0.1 --max-tokens 8192 \
    --length-loss-factor 0.1 --pred-length-offset

The "--arch glat_sd" is weird. Is it "cmlm_sd" or "cmlm_transformer"?

Another question: could you please share the generation script for CMLM with more than one decoding iteration, i.e. with "--iter-decode-max-iter" set to 5 or 10? I find that the BLEU with 5 or 10 iterations is much worse than with a single iteration.
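
For reference, here is the kind of generation command I have been running. It is a sketch adapted from upstream fairseq's non-autoregressive translation example, not from this repo's scripts, so the generate.py entry point, data-bin path, checkpoint path, and batch size are my assumptions:

# Assumes this fairseq fork ships a generate.py alongside train.py;
# upstream fairseq exposes the same flags via fairseq-generate.
python3 generate.py data-bin/wmt14.en-de_kd \
    --gen-subset test --task translation_lev \
    --path checkpoints/checkpoint_best.pt \
    --iter-decode-max-iter 5 \
    --iter-decode-eos-penalty 0 \
    --beam 1 --remove-bpe --print-step \
    --batch-size 400

With --iter-decode-max-iter set to 5 or 10 here, BLEU drops well below the single-iteration result, so I suspect I am missing a flag that your generation script sets.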
