https://arxiv.org/abs/2204.00174
- A novel training method for CTC-based ASR using augmented intermediate representations for conditioning
- An extension of self-conditioned CTC
- Methods: noisy conditioning
- feature space: mask along the time or feature axis
- token space: insert, delete, or substitute tokens in the “condition” (the intermediate prediction used for conditioning).
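The token-space operations listed above can be sketched roughly as follows. This is only an illustration, not the paper's implementation: the function names, the probabilities `p_ins`, `p_del`, `p_sub`, and the choice of uniformly random replacement tokens are all assumptions. The sketch also does not try to keep the sequence length fixed, which a real frame-level conditioning setup would likely need to handle.

```python
import random


def random_other(tok, vocab_size, rng):
    """Draw a token id uniformly from the vocabulary, excluding `tok`."""
    cand = rng.randrange(vocab_size - 1)
    return cand + 1 if cand >= tok else cand


def add_token_noise(tokens, vocab_size, p_ins=0.0, p_del=0.0, p_sub=0.0, rng=None):
    """Apply insertion, deletion, and substitution noise to a token id list.

    Hypothetical helper: the probabilities and the sampling scheme are
    assumptions made for illustration, not values taken from the paper.
    """
    rng = rng or random.Random()
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_del:
            pass  # deletion: drop this token
        elif r < p_del + p_sub:
            # substitution: replace with a different random token
            noisy.append(random_other(tok, vocab_size, rng))
        else:
            noisy.append(tok)  # keep the token unchanged
        if rng.random() < p_ins:
            # insertion: add a random token after the current position
            noisy.append(rng.randrange(vocab_size))
    return noisy


# Example: perturb a frame-level prediction where id 0 is assumed to be blank.
noisy = add_token_noise([0, 3, 3, 0, 5, 0], vocab_size=8, p_sub=0.2,
                        rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the noise reproducible, which makes it easier to compare the three operations in isolation.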

Results

- Feature masking seems ineffective
- Token substitution performs the best
- Substitutions often swapped blank tokens <-> non-blank tokens, so they act as a combination of insertion and deletion errors.


- Masking latent features causes excessive loss of information
- The proposed method can stably obtain the effect of augmentation.
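
The note on blank <-> non-blank substitutions can be illustrated with a small sketch. It assumes the standard CTC collapse rule (merge repeated tokens, then drop blanks) and a blank id of 0; the example frames are made up and are not from the paper.

```python
from itertools import groupby

BLANK = 0  # assumed blank id


def ctc_collapse(frames, blank=BLANK):
    """Merge consecutive repeats, then remove blanks (standard CTC collapse)."""
    return [tok for tok, _ in groupby(frames) if tok != blank]


frames = [0, 3, 3, 0, 5, 0]          # collapses to [3, 5]
blank_to_token = [0, 3, 3, 7, 5, 0]  # frame 3: blank replaced by token 7
token_to_blank = [0, 3, 3, 0, 0, 0]  # frame 4: token 5 replaced by blank

print(ctc_collapse(frames))          # [3, 5]
print(ctc_collapse(blank_to_token))  # [3, 7, 5] -> looks like an insertion
print(ctc_collapse(token_to_blank))  # [3]       -> looks like a deletion
```

So a single frame-level substitution can show up as either an insertion or a deletion in the collapsed sequence, depending on its direction.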