[Note] Improving CTC-based speech recognition via knowledge transferring from pre-trained language models
https://arxiv.org/abs/2203.03582
Motivation
- CTC-based models generally perform worse than AED models and require the assistance of an external LM.
- CTC's conditional independence assumption between output tokens
- Hard to utilize contextual information
Proposed
- Transfer the knowledge of pre-trained language models (BERT, GPT-2) to a CTC-based ASR model. No inference slowdown: only the CTC branch is used for decoding.
- Two methods:
  - Representation learning: use CIF or PDS (from LASO) to align the length of the acoustic representations with the target token sequence, then use BERT's token representations as an auxiliary learning target (rough sketch after this list).
  - Joint classification learning: combine GPT-2 for text modeling with a hybrid CTC/attention architecture.
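
A minimal PyTorch sketch of the representation-learning idea, pieced together from the notes above rather than the authors' code: a toy CIF routine compresses T encoder frames into L token-level vectors, which are regressed onto (placeholder) BERT embeddings via an MSE term added to the CTC loss. All tensor sizes, the 0.5 mixing weight, and the random "BERT" targets are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def cif(encoder_out, alpha, num_tokens, beta=1.0):
    """Toy Continuous Integrate-and-Fire: accumulate per-frame weights
    alpha and emit one integrated vector each time the running sum
    crosses the threshold beta.

    encoder_out: (T, D) acoustic encoder frames
    alpha:       (T,)  non-negative firing weights
    num_tokens:  L; alpha is rescaled so it fires L times during training
    """
    T, D = encoder_out.shape
    alpha = alpha * (num_tokens * beta / alpha.sum())  # scale to fire L times
    fired, acc_w, acc_v = [], torch.zeros(()), torch.zeros(D)
    for t in range(T):
        w = alpha[t]
        while acc_w + w >= beta:            # boundary crossed: fire a token
            take = beta - acc_w
            fired.append(acc_v + take * encoder_out[t])
            w = w - take
            acc_w, acc_v = torch.zeros(()), torch.zeros(D)
        acc_w = acc_w + w
        acc_v = acc_v + w * encoder_out[t]
    if len(fired) < num_tokens:             # flush float-rounding residue
        fired.append(acc_v)
    return torch.stack(fired[:num_tokens])  # (L, D) token-level vectors


# Toy sizes: frames, hidden dim, transcript length, vocab size.
T, D, L, V = 50, 256, 7, 30
encoder_out = torch.randn(T, D, requires_grad=True)
alpha = torch.rand(T) + 1e-4            # in practice from a small weight predictor
bert_emb = torch.randn(L, D)            # stand-in for frozen BERT token outputs
targets = torch.randint(1, V, (L,))     # transcript token ids (0 = CTC blank)

# Knowledge-transfer term: pull CIF outputs toward BERT representations.
token_repr = cif(encoder_out, alpha, L)            # (L, D)
transfer_loss = F.mse_loss(token_repr, bert_emb)

# Frame-level CTC branch; this is the only branch used at inference time,
# which is why decoding speed is unchanged.
ctc_head = nn.Linear(D, V)
log_probs = ctc_head(encoder_out).log_softmax(-1).unsqueeze(1)  # (T, 1, V)
ctc_loss = F.ctc_loss(log_probs, targets.unsqueeze(0),
                      torch.tensor([T]), torch.tensor([L]))

loss = ctc_loss + 0.5 * transfer_loss   # 0.5 is an assumed mixing weight
loss.backward()
```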