[Note] Conformer: Convolution-augmented Transformer for Speech Recognition
- state-of-the-art results on LibriSpeech: 2.1%/4.3% WER on test/test-other without a language model, 1.9%/3.9% with an external LM
- a novel way to combine CNNs with Transformers
- models both local (convolution) and global (self-attention) dependencies
- Conformer
- each block stacks 4 modules: half-step FFN → MHSA → convolution module → half-step FFN (see the sketch below)
- Macaron-style half-step residual FFNs: each FFN's output is added to the residual with a 1/2 weight
- ablations show placing the convolution module after MHSA is more effective than placing it before
- Swish activation led to faster convergence
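
A minimal PyTorch sketch of one Conformer block, not the authors' code. The hyperparameters (`d_model=256`, `n_heads=4`, `kernel_size=31`) are illustrative, and the paper's relative positional multi-head attention (Transformer-XL style) is simplified here to vanilla `nn.MultiheadAttention`; the module order and half-step residuals follow the note above.

```python
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    """Position-wise FFN with Swish (SiLU) activation."""
    def __init__(self, d_model: int, expansion: int = 4, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, d_model * expansion),
            nn.SiLU(),                              # Swish activation
            nn.Dropout(dropout),
            nn.Linear(d_model * expansion, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)


class ConvModule(nn.Module):
    """Pointwise conv -> GLU -> depthwise conv -> BatchNorm -> Swish -> pointwise conv."""
    def __init__(self, d_model: int, kernel_size: int = 31, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.pw1 = nn.Conv1d(d_model, 2 * d_model, kernel_size=1)
        self.glu = nn.GLU(dim=1)                    # halves channels back to d_model
        self.dw = nn.Conv1d(d_model, d_model, kernel_size,
                            padding=kernel_size // 2, groups=d_model)
        self.bn = nn.BatchNorm1d(d_model)
        self.act = nn.SiLU()
        self.pw2 = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                           # x: (batch, time, d_model)
        y = self.norm(x).transpose(1, 2)            # -> (batch, d_model, time)
        y = self.glu(self.pw1(y))
        y = self.act(self.bn(self.dw(y)))
        y = self.drop(self.pw2(y))
        return y.transpose(1, 2)                    # -> (batch, time, d_model)


class ConformerBlock(nn.Module):
    """0.5*FFN -> MHSA -> Conv -> 0.5*FFN, each wrapped in a residual connection."""
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.ffn1 = FeedForward(d_model)
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = ConvModule(d_model)
        self.ffn2 = FeedForward(d_model)
        self.final_norm = nn.LayerNorm(d_model)

    def forward(self, x):                           # x: (batch, time, d_model)
        x = x + 0.5 * self.ffn1(x)                  # Macaron-style half-step residual
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        x = x + self.conv(x)                        # conv module placed after MHSA
        x = x + 0.5 * self.ffn2(x)                  # second half-step FFN
        return self.final_norm(x)


x = torch.randn(2, 100, 256)                        # (batch, frames, features)
print(ConformerBlock()(x).shape)                    # torch.Size([2, 100, 256])
```

The two half-step FFNs sandwiching attention and convolution come from Macaron-Net, which the paper found to outperform a single full-step FFN.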