Research

[Note] Conformer: Convolution-augmented Transformer for Speech Recognition

  • SOTA performance on LibriSpeech
  • A novel way to combine CNNs and Transformers
    • models both local (CNN) and global (self-attention) dependencies
  • Conformer block (a sketch follows this list)
    • 4 modules: 0.5 FFN + MHSA + Conv + 0.5 FFN
    • Macaron-style half-step residual FFNs
    • placing the convolution module after MHSA is more effective
    • the Swish activation led to faster convergence
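
A minimal sketch of one Conformer block in PyTorch, not the paper's official implementation; the hyperparameters (dim=256, 4 heads, kernel 31) are illustrative, and relative positional encoding and dropout are omitted:

```python
import torch.nn as nn

class ConformerBlock(nn.Module):
    """Half-step FFN -> MHSA -> Conv module -> half-step FFN -> LayerNorm."""
    def __init__(self, dim=256, heads=4, kernel=31, ffn_mult=4):
        super().__init__()
        self.ffn1 = self._ffn(dim, ffn_mult)
        self.norm_attn = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_conv = nn.LayerNorm(dim)
        self.conv = nn.Sequential(  # pointwise -> GLU -> depthwise -> Swish -> pointwise
            nn.Conv1d(dim, 2 * dim, 1),
            nn.GLU(dim=1),
            nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim),
            nn.BatchNorm1d(dim),
            nn.SiLU(),              # SiLU is the Swish activation
            nn.Conv1d(dim, dim, 1),
        )
        self.ffn2 = self._ffn(dim, ffn_mult)
        self.norm_out = nn.LayerNorm(dim)

    @staticmethod
    def _ffn(dim, mult):
        return nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, mult * dim),
                             nn.SiLU(), nn.Linear(mult * dim, dim))

    def forward(self, x):                      # x: (batch, time, dim)
        x = x + 0.5 * self.ffn1(x)             # Macaron-style half-step residual
        a = self.norm_attn(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        c = self.norm_conv(x).transpose(1, 2)  # (batch, dim, time) for Conv1d
        x = x + self.conv(c).transpose(1, 2)
        x = x + 0.5 * self.ffn2(x)             # second half-step residual
        return self.norm_out(x)
```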

[Note] InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

https://arxiv.org/abs/2204.00174

  • A novel training method for CTC-based ASR that conditions on augmented intermediate predictions
    • an extension of Self-conditioned CTC
  • Method: noisy conditioning (a sketch follows this list)
    • feature space: mask time steps or feature dimensions
    • token space: insert, delete, or substitute tokens in the conditioning sequence
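
A toy sketch of the token-space augmentation, assuming a plain token list; the probabilities and the function name are illustrative, and the paper applies these perturbations to the intermediate predictions used for self-conditioning:

```python
import random

def augment_condition(tokens, vocab_size, p_del=0.1, p_sub=0.1, p_ins=0.1):
    """Randomly delete, substitute, or insert tokens in the conditioning sequence."""
    out = []
    for tok in tokens:
        r = random.random()
        if r < p_del:
            continue                                  # delete: drop the token
        out.append(random.randrange(vocab_size) if r < p_del + p_sub else tok)
        if random.random() < p_ins:
            out.append(random.randrange(vocab_size))  # insert a random token
    return out
```

Feature-space noise can be sketched analogously with SpecAugment-style time/feature masks on the intermediate representation.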

[Note] Understanding the role of self-attention for efficient speech recognition

  • Self-attention plays two roles in the success of Transformer-based ASR
    • The attention maps in self-attention modules fall into two groups:
      • “phonetic” (vertical) and “linguistic” (diagonal)
    • Phonetic: lower layers; extract phonologically meaningful features from global context
    • Linguistic: upper layers; attend to local context
    • In short, phonetic variance is standardized in the lower self-attention layers so that the upper layers can identify local linguistic features (a sketch of the two map types follows this list)
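
A toy heuristic for telling the two map shapes apart, not the diagnostics the paper actually uses; the band width and top-frame fraction are arbitrary choices for illustration:

```python
import numpy as np

def attention_character(attn, band_width=5):
    """Score how 'linguistic' (diagonal) vs 'phonetic' (vertical) a
    (T, T) row-stochastic attention map looks."""
    T = attn.shape[0]
    idx = np.arange(T)
    band = np.abs(idx[:, None] - idx[None, :]) <= band_width
    diagonal = attn[band].sum() / T        # mass near i == j (local context)
    col = attn.sum(axis=0) / T             # average weight on each key frame
    vertical = np.sort(col)[::-1][: max(1, T // 20)].sum()  # mass on a few frames
    return {"linguistic_diagonal": float(diagonal),
            "phonetic_vertical": float(vertical)}
```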

[Note] Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition

https://arxiv.org/abs/2202.03218

  • Self-supervised learning models (wav2vec 2.0, HuBERT) have achieved great success
    • Fine-tuning the pretrained model is computationally expensive and does not scale well: $O(10^8)$ parameters per task.

Contributions:

  • Apply adapters to the wav2vec 2.0 model to reduce the number of parameters required for downstream tasks (a sketch follows this list).
    • Adapters are small trainable modules that can be inserted into the layers of a frozen pre-trained network for a particular task.
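
A minimal sketch of a bottleneck adapter in PyTorch, assuming illustrative sizes (wav2vec 2.0-base width 768, bottleneck 64); the paper's exact placement and dimensions may differ:

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, added residually inside
    a frozen Transformer layer; only these weights are trained per task."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start near the identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))
```

At these sizes each adapter adds on the order of $10^5$ trainable parameters, versus the $O(10^8)$ of fully fine-tuning the frozen encoder.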

Facebook Hate Speech Detection

Wherever there are people, there will be hate speech. As the world's largest social platform, Facebook has long hired review teams to inspect content manually, and in recent years it has begun introducing AI systems to assist with detection, with the BERT family of models, an exciting development in NLP, playing a key role.

This article was co-written by 黃偉愷 and Ke-Han Lu as the final report for the course "Business Value of Artificial Intelligence and Big Data" (人工智慧與大數據之商業價值). We surveyed Facebook's recent developments in hate speech detection along two main directions:

How to Read a Paper

Before formal classes started this semester, the professor shared some articles on how to read papers. For a student just entering research, reading the literature is essential; the right method saves a great deal of effort and keeps you from becoming a graduate student drowning in a sea of papers.

Below is my attempt at translating S. Keshav's "How to Read a Paper". Its three-step approach, a quick scan, grasping the key points, then actually reproducing the results, lets you quickly filter which papers are worth reading, which to set aside, and how to read the parts that matter.
