[Note] PERT: Pre-training BERT with permuted language model
Can we use a pre-training task other than MLM?
* https://arxiv.org/abs/2203.06906
* Proposed: Permuted Language Model (PerLM)
  * Input: permute a proportion of the input text
  * Target: position of the original token (see the sketch at the end of this note)

Pre-training LM tasks
* Masked LM
  * Whole word masking (wwm):
    * alleviates the "input information leaking" issue
  * Mask consecutive N-grams
    * e.g.…
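A minimal Python sketch of how a PerLM training pair could be built, assuming the objective is: shuffle a small fraction of token positions and, for each disturbed position, predict where its original token now sits in the permuted input. The function name `make_perlm_example`, the `permute_ratio` value, and the exact target encoding are illustrative assumptions, not the paper's implementation.

```python
import random

def make_perlm_example(tokens, permute_ratio=0.15, seed=None):
    """Toy PerLM data construction (illustrative, not the paper's code)."""
    rng = random.Random(seed)
    n = len(tokens)
    if n < 2:
        return list(tokens), {}                  # nothing to permute
    k = min(n, max(2, int(n * permute_ratio)))   # how many positions to disturb
    chosen = sorted(rng.sample(range(n), k))     # positions that get permuted
    shuffled = chosen[:]
    while shuffled == chosen:                    # make sure the order actually changes
        rng.shuffle(shuffled)

    permuted = list(tokens)
    for src, dst in zip(chosen, shuffled):
        permuted[dst] = tokens[src]              # token from position src is placed at dst

    # Target: for each disturbed position, the index in the permuted sequence
    # where its original token now sits ("position of the original token").
    targets = {src: dst for src, dst in zip(chosen, shuffled)}
    return permuted, targets

tokens = "we like to play basketball after school".split()
permuted, targets = make_perlm_example(tokens, seed=0)
print(permuted)   # input sequence with a few tokens swapped around
print(targets)    # original position -> where that token ended up
```

In a real setup the prediction head would output a distribution over sequence positions for each disturbed slot, rather than returning a dictionary; the dictionary here just stands in for those labels.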