Month: April 2022

[Note] Conformer: Convolution-augmented Transformer for Speech Recognition

https://arxiv.org/abs/2005.08100

  • SOTA performance on Librispeech
  • A novel way to combine CNN + Transformer
    • to model both local (CNN) and global (self-attention) dependencies
  • Conformer
    • 4 modules: 0.5 FFN + MHSA + CNN + 0.5 FFN
    • Macaron-style half-step residual FFNs sandwich the block
    • placing the CNN module after MHSA is more effective
    • the Swish activation led to faster convergence
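The block structure above (0.5 FFN + MHSA + CNN + 0.5 FFN) can be sketched in PyTorch as below. This is a minimal illustration, not the paper's exact configuration: the model dimension, head count, kernel size, and the omission of relative positional encoding are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class ConformerBlock(nn.Module):
    """Minimal sketch of one Conformer block:
    x + 0.5*FFN -> MHSA -> Conv module -> x + 0.5*FFN -> LayerNorm.
    Hyperparameters here are illustrative, not the paper's."""

    def __init__(self, d_model=144, n_heads=4, conv_kernel=31, ffn_mult=4):
        super().__init__()
        self.ffn1 = self._ffn(d_model, ffn_mult)
        self.norm_mhsa = nn.LayerNorm(d_model)
        self.mhsa = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_conv = nn.LayerNorm(d_model)
        # Conv module: pointwise -> GLU -> depthwise -> BatchNorm -> Swish -> pointwise
        self.conv = nn.Sequential(
            nn.Conv1d(d_model, 2 * d_model, 1),
            nn.GLU(dim=1),
            nn.Conv1d(d_model, d_model, conv_kernel,
                      padding=conv_kernel // 2, groups=d_model),
            nn.BatchNorm1d(d_model),
            nn.SiLU(),  # SiLU == Swish
            nn.Conv1d(d_model, d_model, 1),
        )
        self.ffn2 = self._ffn(d_model, ffn_mult)
        self.norm_out = nn.LayerNorm(d_model)

    @staticmethod
    def _ffn(d_model, mult):
        return nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, mult * d_model),
            nn.SiLU(),  # Swish activation
            nn.Linear(mult * d_model, d_model),
        )

    def forward(self, x):  # x: (batch, time, d_model)
        x = x + 0.5 * self.ffn1(x)             # Macaron half-step residual
        h = self.norm_mhsa(x)
        x = x + self.mhsa(h, h, h)[0]          # global context
        h = self.norm_conv(x).transpose(1, 2)  # (B, C, T) for Conv1d
        x = x + self.conv(h).transpose(1, 2)   # local context, after MHSA
        x = x + 0.5 * self.ffn2(x)             # second half-step FFN
        return self.norm_out(x)
```

The half-step (0.5×) residual scaling on both FFNs is the Macaron-style design the note refers to; note the convolution module sits after self-attention, which the paper found more effective than the reverse order.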


[Note] InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

https://arxiv.org/abs/2204.00174

  • A novel training method for CTC-based ASR that conditions on augmented intermediate representations
    • an extension of self-conditioned CTC
  • Method: noisy conditioning
    • feature space: mask along the time or feature axis
    • token space: insert, delete, or substitute tokens in the “condition”
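The token-space variant can be sketched as below. This is a hypothetical illustration of insert/delete/substitute noise on the intermediate prediction used as the self-condition; the function name, probabilities, and vocabulary size are my assumptions, not values from the paper.

```python
import random

def augment_condition(tokens, p_sub=0.1, p_del=0.1, p_ins=0.1,
                      vocab_size=100, seed=None):
    """Hypothetical sketch of InterAug-style token-space noise:
    corrupt the intermediate token sequence before it is fed back
    as the conditioning input. Probabilities are illustrative."""
    rng = random.Random(seed)
    out = []
    for t in tokens:
        if rng.random() < p_del:           # delete: drop this token
            continue
        if rng.random() < p_sub:           # substitute: random vocab token
            t = rng.randrange(vocab_size)
        out.append(t)
        if rng.random() < p_ins:           # insert: random token after
            out.append(rng.randrange(vocab_size))
    return out

# e.g. augment_condition([5, 12, 7], seed=0) yields a corrupted
# sequence that the model must still decode correctly from.
```

Training against these corrupted conditions should make the self-conditioning robust to the noisy intermediate predictions seen at inference; the feature-space variant instead masks spans of the intermediate representation along the time or feature axis.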
