Improved Topic Modeling Techniques for Speech Recognition

陳冠宇; Kuan-Yu Chen

dc.contributor	陳柏琳	zh_TW
dc.contributor	Berlin Chen	en_US
dc.contributor.author	陳冠宇	zh_TW
dc.contributor.author	Kuan-Yu Chen	en_US
dc.date.accessioned	2019-09-05T11:29:58Z
dc.date.available	2010-08-25
dc.date.available	2019-09-05T11:29:58Z
dc.date.issued	2010
dc.description.abstract	本論文探討自然語言中詞與詞之間在各種不同條件下的共同出現關係，並推導出許多不同的語言模型來描述之，進而運用於中文大詞彙連續語音辨識。當我們想要探索語言中兩個詞彼此間的共同出現關係(Co-occurrence Relationships)，傳統的做法是由整個訓練語料中統計這兩個詞在一個固定長度的移動窗(Fixed-size Moving Window)內的共同出現頻數(Frequency)，據此以估測出兩個詞之間的聯合機率分布。有別於僅從整個訓練語料中的共同出現頻數來推測任兩個詞之間的關係，本論文嘗試分析兩個詞在不同條件下共同出現的情形，進而推導出多種描述詞與詞關係的語言模型以及其估測方式；像是在不同的主題、文件或文件群的情況下，它們是否皆經常共同出現。本論文的實驗語料收錄自台灣的中文廣播新聞，由一系列的大詞彙連續語音辨識實驗結果顯示，我們所提出的各式語言模型皆可以明顯地提昇基礎語音辨識系統的效能。	zh_TW
dc.description.abstract	This thesis investigates word-word co-occurrence relationships in natural language and derives a variety of language models from them for Mandarin large vocabulary continuous speech recognition (LVCSR). The conventional way to measure the co-occurrence relationship between a pair of words is to count how often the two words fall within a fixed-size window that moves across the entire training corpus, and to estimate their joint probability from those counts. Rather than relying only on corpus-wide counts, this study analyzes how a pair of words co-occurs under different conditions, such as within particular topics, documents, or document clusters, and derives several language models, together with their estimation methods, to characterize these relationships. All experiments are conducted on a Mandarin broadcast news corpus compiled in Taiwan, and the results of a series of LVCSR experiments show that the proposed language models clearly improve the performance of the baseline speech recognition system.	en_US
dc.description.sponsorship	資訊工程學系	zh_TW
dc.identifier	GN0696470203
dc.identifier.uri	http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN0696470203%22.&%22.id.&
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106718
dc.language	Chinese
dc.subject	中文大詞彙連續語音辨識	zh_TW
dc.subject	共同出現關係	zh_TW
dc.subject	語言模型	zh_TW
dc.subject	large vocabulary continuous speech recognition	en_US
dc.subject	co-occurrence relationships	en_US
dc.subject	language model	en_US
dc.title	主題模型於語音辨識使用之改進	zh_TW
dc.title	Improved Topic Modeling Techniques for Speech Recognition	en_US
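
The abstract describes the conventional baseline as counting how often two words fall within a fixed-size moving window and estimating their joint probability from those counts. The following is a minimal sketch of that baseline only, not the thesis's own models. The function names, the window size, and the toy token list are assumptions made for illustration, and the normalisation shown is a plain maximum-likelihood estimate without smoothing.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=3):
    """Count unordered word pairs that fall within `window` tokens of each other."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            counts[tuple(sorted((left, right)))] += 1
    return counts


def joint_probabilities(counts):
    """Normalise pair counts into an unsmoothed joint distribution."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Toy corpus (hypothetical); a real system would use segmented news text.
tokens = ["speech", "recognition", "language", "model", "speech", "model"]
counts = cooccurrence_counts(tokens, window=3)
probs = joint_probabilities(counts)
```

Conditioning the same counts on a topic, document, or document cluster, as the abstract proposes, would amount to keeping a separate counter per condition; how those are combined is not specified in this record.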

Collections

Theses and Dissertations
