[Note] Multi-sequence Intermediate Conditioning for CTC-based ASR

https://arxiv.org/abs/2204.00175

  • An extension of intermediate-CTC
    • Inspired by HC-CTC
    • Alternates between syllables (e.g. pinyin) and characters as the targets of the intermediate layers
  • Corpora: CSJ, AISHELL-1 (languages written with ideograms)
  • Conformer-CTC
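
A minimal sketch of the alternating idea above, in plain Python. The spacing of the intermediate layers (every 3rd layer here), the function names `alternate_targets` and `forward_with_conditioning`, and the `heads` callables are all assumptions for illustration and are not taken from the paper. The conditioning step follows the general self-conditioned pattern (project to the vocabulary, then add a projection of that prediction back to the hidden state), which is assumed here rather than confirmed by this note.

```python
from typing import Callable, Dict, List, Tuple


def alternate_targets(num_layers: int, every: int, targets: List[str]) -> Dict[int, str]:
    """Assign a target type to each intermediate layer, cycling through `targets`.

    Layers are 1-based. The final layer is excluded because it carries the
    main CTC loss. `every` is a hypothetical spacing, not a value from the paper.
    """
    inter_layers = list(range(every, num_layers, every))
    return {layer: targets[i % len(targets)] for i, layer in enumerate(inter_layers)}


def forward_with_conditioning(
    x: float,
    layers: List[Callable[[float], float]],
    heads: Dict[str, Tuple[Callable[[float], float], Callable[[float], float]]],
    schedule: Dict[int, str],
) -> Tuple[float, List[Tuple[int, str, float]]]:
    """Run the encoder stack and condition on intermediate predictions.

    `heads[target]` is a pair (to_vocab, from_vocab): the first produces the
    intermediate prediction, the second maps it back to the hidden space.
    Scalars stand in for tensors to keep the sketch dependency-free.
    """
    intermediate = []
    for idx, layer in enumerate(layers, start=1):
        x = layer(x)
        target = schedule.get(idx)
        if target is not None:
            to_vocab, from_vocab = heads[target]
            posterior = to_vocab(x)  # intermediate prediction (gets its own CTC loss)
            intermediate.append((idx, target, posterior))
            x = x + from_vocab(posterior)  # condition the later layers on it
    return x, intermediate


# 18 layers as in the note; syllable and character targets alternate.
schedule = alternate_targets(num_layers=18, every=3, targets=["syllable", "character"])
```

With these assumed settings, layers 3, 9, and 15 predict syllables and layers 6 and 12 predict characters, so each character prediction is conditioned on an earlier syllable prediction and vice versa.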

Experiment

  • AISHELL-1
    • 4231 characters
    • 404 pinyin syllables
  • CSJ
    • 2753 characters
    • 256 syllables

Conformer

  • 18 layers
  • 256 dim
  • kernel size = 15, attention heads = 4
  • FFN: 2048(AISHELL-1), 1024(CSJ)
  • learning rate factor: 5
  • CTC best-path (greedy) decoding, no LM
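
For reference, best-path decoding can be sketched as follows: take the most likely token per frame, collapse consecutive repeats, then drop blanks. The function name `ctc_best_path` and the choice of index 0 for the blank token are assumptions for this sketch.

```python
from typing import List, Sequence


def ctc_best_path(frame_scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Greedy CTC decoding over per-frame scores (probabilities or log-probs)."""
    # 1. Most likely token index for each frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]

    # 2. Collapse consecutive repeats, 3. remove blanks.
    decoded: List[int] = []
    prev = None
    for tok in best:
        if tok != prev and tok != blank:
            decoded.append(tok)
        prev = tok
    return decoded


# Toy example with 3 tokens (0 = blank); the per-frame argmax is 0,1,1,0,1,2,2,0.
frames = [
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
    [0.9, 0.05, 0.05],
]
print(ctc_best_path(frames))  # [1, 1, 2]
```

The blank between the two runs of token 1 is what lets the output keep a repeated token.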

Result

  • Multi-task training with auxiliary syllable-level prediction is effective
  • Alternate conditioning can capture the mutual dependency between syllables and characters
  • The results for the parallel variant show that one linear layer is insufficient to simultaneously transform the shared encoder output into multi-level targets
  • CSJ-eval3 (out-of-domain test set) shows that the auxiliary prediction can improve robustness