[Note] PERT: Pre-training BERT with Permuted Language Model
Can we use a pre-training task other than MLM?
- https://arxiv.org/abs/2203.06906
- Proposed: Permuted Language Model (PerLM)
- Input: a proportion of the input tokens is shuffled (permuted)
- Target: predict the original position of each shuffled token (see the sketch after this list)
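A minimal Python sketch of the PerLM data construction described above, assuming a 15% permute ratio and a simple whole-token shuffle; the function and parameter names (`build_perlm_example`, `permute_ratio`) are illustrative assumptions, not from the official PERT release, and the exact labeling scheme is one plausible reading of "predict the position of the original token".

```python
import random

def build_perlm_example(tokens, permute_ratio=0.15, seed=None):
    """Shuffle a proportion of token positions; the targets map each
    shuffled position back to the original position of the token that
    now sits there. Hyperparameters here are assumptions, not the
    paper's exact settings."""
    rng = random.Random(seed)
    n = len(tokens)
    k = max(2, round(n * permute_ratio))   # need >= 2 positions to permute
    chosen = rng.sample(range(n), k)       # positions selected for shuffling
    shuffled = chosen[:]
    while shuffled == chosen:              # reject the identity permutation
        rng.shuffle(shuffled)
    permuted = list(tokens)
    for src, dst in zip(chosen, shuffled):
        permuted[dst] = tokens[src]        # token from src now sits at dst
    # Target at each disturbed position: that token's original position.
    labels = {dst: src for src, dst in zip(chosen, shuffled)}
    return permuted, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
permuted, labels = build_perlm_example(tokens, seed=0)
print(permuted)  # shuffled input the model would see
print(labels)    # {current position: original position} targets
```

In practice the model would score positions with a classification head over the sequence length (or a pointer-style head) and train with cross-entropy against these position labels.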