[Note] Conformer: Convolution-augmented Transformer for Speech Recognition

  • SOTA performance on LibriSpeech (2.1%/4.3% WER on test/test-other without a language model; 1.9%/3.9% with an external LM)
  • A novel way to combine CNN + Transformer
    • to model both local (CNN) and global (self-attention) dependencies
  • Conformer
    • 4 modules per block: 0.5 FFN + MHSA + Conv + 0.5 FFN, with a final LayerNorm (see the sketch below)
    • Macaron-style half-step residual FFNs: each FFN's output is scaled by ½ in its residual connection
    • ablations showed that placing the convolution module after MHSA is more effective
    • Swish activation led to faster convergence
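
To make the block ordering concrete, here is a minimal PyTorch sketch of one Conformer block. It is an illustrative simplification, not the paper's exact implementation: dropout and the relative positional encoding in MHSA are omitted, and the dimensions, head count, and kernel size are placeholder defaults.

```python
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    """Pre-norm FFN with Swish activation (nn.SiLU is PyTorch's Swish)."""
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * expansion),
            nn.SiLU(),                                # Swish activation
            nn.Linear(dim * expansion, dim),
        )

    def forward(self, x):
        return self.net(x)


class ConvModule(nn.Module):
    """Convolution module: pointwise -> GLU -> depthwise -> BN -> Swish -> pointwise."""
    def __init__(self, dim: int, kernel_size: int = 31):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.pointwise1 = nn.Conv1d(dim, 2 * dim, 1)
        self.glu = nn.GLU(dim=1)                      # halves channels back to dim
        self.depthwise = nn.Conv1d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=dim)
        self.bn = nn.BatchNorm1d(dim)
        self.act = nn.SiLU()
        self.pointwise2 = nn.Conv1d(dim, dim, 1)

    def forward(self, x):                             # x: (batch, time, dim)
        y = self.norm(x).transpose(1, 2)              # (batch, dim, time) for Conv1d
        y = self.glu(self.pointwise1(y))
        y = self.act(self.bn(self.depthwise(y)))
        return self.pointwise2(y).transpose(1, 2)


class ConformerBlock(nn.Module):
    """0.5*FFN -> MHSA -> Conv -> 0.5*FFN -> LayerNorm, each with a residual."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.ffn1 = FeedForward(dim)
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv = ConvModule(dim)
        self.ffn2 = FeedForward(dim)
        self.final_norm = nn.LayerNorm(dim)

    def forward(self, x):                             # x: (batch, time, dim)
        x = x + 0.5 * self.ffn1(x)                    # macaron half-step residual
        y = self.attn_norm(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]
        x = x + self.conv(x)                          # conv placed after MHSA
        x = x + 0.5 * self.ffn2(x)                    # second half-step residual
        return self.final_norm(x)


# quick shape check
block = ConformerBlock(dim=256, heads=4)
print(block(torch.randn(2, 100, 256)).shape)          # torch.Size([2, 100, 256])
```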
