[Note] Understanding the Role of Self Attention for Efficient Speech Recognition

* Self-attention plays two roles in the success of Transformer-based ASR
* The attention maps in the self-attention modules can be categorized into two groups: "phonetic" (vertical) and "linguistic" (diagonal); a toy classification sketch follows this list
* Phonetic: lower layers, extracting phonologically meaningful global context
* Linguistic: higher layers, attending to local context
* -> the phonetic variance is standardized in lower…
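As a toy illustration of the vertical/diagonal distinction (this is not the paper's metric; the statistics, function names, and thresholds here are all made up for demonstration), one could score an attention map by how far its mass sits from the diagonal and how concentrated its column mass is:

```python
# Illustrative sketch only: classify an attention map as "diagonal"
# (linguistic: local context) or "vertical" (phonetic: a few globally
# attended frames). Thresholds are arbitrary, not from the paper.
import numpy as np

def attention_stats(attn: np.ndarray):
    """attn: (T, T) matrix, each row a distribution over key positions."""
    T = attn.shape[0]
    idx = np.arange(T)
    dist = np.abs(idx[:, None] - idx[None, :])        # |query - key| offset
    mean_offset = float((attn * dist).sum() / T)      # small => diagonal
    col_mass = attn.sum(axis=0) / T                   # sums to 1 overall
    col_entropy = float(-(col_mass * np.log(col_mass + 1e-9)).sum())
    return mean_offset, col_entropy                   # low entropy => vertical

def classify(attn, offset_thresh=2.0, entropy_thresh=2.5):
    mean_offset, col_entropy = attention_stats(attn)
    if mean_offset < offset_thresh:
        return "diagonal (linguistic)"
    if col_entropy < entropy_thresh:
        return "vertical (phonetic)"
    return "mixed"

# Toy checks: a near-identity map reads as diagonal; a map where every
# query attends to the same two frames reads as vertical.
T = 50
diagonal = 0.95 * np.eye(T) + np.full((T, T), 0.05 / T)
vertical = np.zeros((T, T)); vertical[:, :2] = 0.5
print(classify(diagonal))   # diagonal (linguistic)
print(classify(vertical))   # vertical (phonetic)
```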

[Note] wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

* https://arxiv.org/abs/2006.11477
* Self-supervised speech representation learning
* Contrastive loss: masked continuous speech input -> quantized targets (sketched below)
* Quantization module: Gumbel softmax over latent-representation codebooks
* wav2vec 2.0 Large with 10 min of labeled data: 5.2/8.6 WER on LibriSpeech test-clean/test-other
* Fairseq implementation
* Well explained: https://neurosys.com/wav2vec-2-0-framework
* Feature Encoder (CNN) converts raw audio…
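A hedged PyTorch sketch of the two pieces named above: a Gumbel-softmax quantizer that picks codebook entries with a straight-through estimator, and the contrastive loss where the context vector at each masked step must identify the true quantized latent among sampled distractors. Dimensions, hyperparameters, and distractor sampling are simplified relative to the fairseq implementation, and the codebook-diversity loss is omitted.

```python
# Simplified sketch of wav2vec 2.0's quantizer + contrastive objective.
# Unlike fairseq's version, distractors here may collide with the
# positive (the real code masks those out).
import torch
import torch.nn.functional as F

class GumbelQuantizer(torch.nn.Module):
    """Straight-through Gumbel-softmax choice of one code per group."""
    def __init__(self, dim=256, num_codes=320, groups=2):
        super().__init__()
        self.groups = groups
        self.proj = torch.nn.Linear(dim, groups * num_codes)
        self.codebook = torch.nn.Parameter(
            torch.randn(groups, num_codes, dim // groups))

    def forward(self, z, tau=2.0):
        B, T, D = z.shape
        logits = self.proj(z).view(B, T, self.groups, -1)
        onehot = F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
        q = torch.einsum('btgn,gnd->btgd', onehot, self.codebook)
        return q.reshape(B, T, D)

def contrastive_loss(context, quantized, num_distractors=100, kappa=0.1):
    """context, quantized: (B, T, D), restricted to masked time steps."""
    B, T, D = context.shape
    # Distractors: quantized latents sampled from other masked steps.
    idx = torch.randint(0, T, (B, T, num_distractors))
    distractors = torch.gather(
        quantized.unsqueeze(1).expand(B, T, T, D), 2,
        idx.unsqueeze(-1).expand(-1, -1, -1, D))
    # Candidate 0 is always the true quantized target.
    cands = torch.cat([quantized.unsqueeze(2), distractors], dim=2)
    c = F.normalize(context.unsqueeze(2), dim=-1)
    q = F.normalize(cands, dim=-1)
    logits = (c * q).sum(-1) / kappa              # cosine sim / temperature
    targets = torch.zeros(B, T, dtype=torch.long) # true target at index 0
    return F.cross_entropy(logits.view(B * T, -1), targets.view(-1))
```

In the paper this objective is paired with a diversity penalty that encourages uniform codebook usage; for downstream ASR, pre-training is followed by fine-tuning with a CTC head.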

[Note] Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

https://arxiv.org/abs/2203.03582

Motivation
* CTC-based models are typically weaker than AED models and require the assistance of an external LM
* Conditional independence assumption
* Hard to utilize contextualized information

Proposed
* Transfer the knowledge of pretrained language models (BERT, GPT-2) to the CTC-based ASR model (see the sketch below). No inference speed reduction, only use…
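A minimal sketch of the general training-time recipe (assumed; the paper's exact losses differ, and `asr_encoder` dimensions, `proj`, `auxiliary_kt_loss`, and `lambda_kt` are hypothetical names): a frozen BERT embeds the reference transcript, and an auxiliary loss pulls a pooled ASR-encoder representation toward that embedding. The BERT branch exists only during training, which matches the "no inference speed reduction" point above.

```python
# Hypothetical sketch of LM-to-CTC knowledge transfer, not the paper's
# exact method: regress a pooled acoustic representation onto a frozen
# BERT [CLS] embedding of the ground-truth transcript.
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

bert = BertModel.from_pretrained("bert-base-uncased").eval()
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
for p in bert.parameters():   # the teacher LM stays frozen
    p.requires_grad_(False)

# Assumed acoustic-encoder width of 256, projected up to BERT's dim.
proj = torch.nn.Linear(256, bert.config.hidden_size)

def auxiliary_kt_loss(encoder_out, transcripts):
    """encoder_out: (B, T, 256) encoder states; transcripts: list[str]."""
    # Student: mean-pool over time, project into the teacher's space.
    student = proj(encoder_out.mean(dim=1))                  # (B, H)
    with torch.no_grad():
        batch = tokenizer(transcripts, return_tensors="pt", padding=True)
        teacher = bert(**batch).last_hidden_state[:, 0]      # [CLS], (B, H)
    # Cosine-distance regression toward the frozen teacher embedding.
    return 1.0 - F.cosine_similarity(student, teacher, dim=-1).mean()

# Training: total_loss = ctc_loss + lambda_kt * auxiliary_kt_loss(enc_out, refs)
# Inference: run only the CTC branch; BERT and proj are discarded.
```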