deep-learning

[Note] Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

https://arxiv.org/abs/2203.03582

Motivation

  • CTC-based models generally underperform attention-based encoder-decoder (AED) models and require the assistance of an external LM.
    • Conditional independence assumption
    • Hard to utilize contextual information

Proposed

  • Transfer the knowledge of pre-trained language models (BERT, GPT-2) to a CTC-based ASR model. Inference speed is not reduced, because only the CTC branch is used for decoding.
    • Two methods:
      • Representation learning: use CIF or PDS (LASO) to align the number of acoustic representations with the number of text tokens.
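
To make the length-alignment step more concrete, here is a minimal pure-Python sketch of the integrate-and-fire idea behind CIF. It is an illustration under stated assumptions, not the paper's implementation: the function name `cif`, the list-based inputs, and the firing threshold of 1.0 are choices made for this example, and the per-frame weights are assumed to lie in [0, 1] (in a real model they would be predicted by the network).

```python
def cif(frames, weights, threshold=1.0):
    """Integrate frame vectors until the accumulated weight reaches
    `threshold`, then fire one token-level vector.

    frames:  list of T vectors (lists of floats), e.g. encoder outputs
    weights: list of T floats, each assumed to be in [0, 1]
    Returns a list of fired vectors (roughly one per output token).
    """
    dim = len(frames[0])
    fired = []
    acc = 0.0                 # weight accumulated since the last fire
    state = [0.0] * dim       # weighted sum of frames since the last fire
    for h, a in zip(frames, weights):
        if acc + a < threshold:
            # Not enough weight yet: keep integrating.
            acc += a
            state = [s + a * x for s, x in zip(state, h)]
        else:
            # Use just enough of this frame's weight to reach the threshold.
            need = threshold - acc
            fired.append([s + need * x for s, x in zip(state, h)])
            # The leftover weight starts the next integration.
            rest = a - need
            acc = rest
            state = [rest * x for x in h]
    return fired


# Four frames with weight 0.5 each integrate into two token-level vectors.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
weights = [0.5, 0.5, 0.5, 0.5]
tokens = cif(frames, weights)
```

The number of fired vectors follows the sum of the weights, so the sequence of acoustic frames is reduced to a token-length sequence that can be compared against language-model representations.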


Tomofun Dog Sound Recognition Challenge: Preliminary-Round Data Processing and Model (Top 10%)

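I made it through the preliminary round to the finals thanks to my teammates carrying the team, and in the finals I was mainly responsible for the MLOps part. I am splitting this into two posts: one describing my approach in the preliminary round, and one on how we handled the extra hurdle in the finals, which was incremental training on AWS.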

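Contribution of this experiment: without using any extra dataset or pre-trained model, and applying only augmentation and pseudo-labeling to the data provided by the organizers, a ResNet18 achieved a decent baseline result (Top 10%).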

Code: kehanlu/Tomofun-DogSound-recognition
