語者確認使用不同語句嵌入函數之比較研究; A Comparative Study of Utterance-Embedding Generation Functions for Speaker Verification

李宗勳; Lee, Tsung-Hsun

dc.contributor	陳柏琳	zh_TW
dc.contributor	Chen, Berlin	en_US
dc.contributor.author	李宗勳	zh_TW
dc.contributor.author	Lee, Tsung-Hsun	en_US
dc.date.accessioned	2022-06-08T02:43:22Z
dc.date.available	2026-09-06
dc.date.available	2022-06-08T02:43:22Z
dc.date.issued	2021
dc.description.abstract	語者語句的嵌入函數利用神經網路將語句映射到一個空間，在該空間中，距離反映出語者之間的相似度。這種度量學習最早被提出應用在人臉辨識，最近幾年被拿來應用在語者確認，也推動了近幾年語者確認任務的發展。但語者確認在訓練集中已見過的語者與未知語者之間，正確率仍有明顯差異。在未知語者的狀況下，很適合使用小樣本學習。在實際環境中，語者確認系統需要識別短語句的語者，但訓練時的語者話語都相對較長，而近年的語者確認模型在短語句的語者確認中表現不佳。在這裡我們使用了原型網路損失、三元組損失和最先進的小樣本學習方法來優化語者嵌入模型。資料集使用了VoxCeleb1和VoxCeleb2，前者資料集的語者數量為1,221，後者資料集的語者數量為5,994。實驗結果顯示，語者嵌入模型在我們提出的損失函數下有較好的表現。	zh_TW
dc.description.abstract	A speaker embedding model uses a neural network to map utterances into a space in which distance reflects the similarity between speakers. This kind of metric learning was first proposed for face recognition; in recent years it has been applied to speaker verification, where it has driven much of the recent progress on the task. However, there is still a significant accuracy gap between speakers seen in the training set and unseen speakers, and few-shot learning is well suited to the unseen-speaker case. In real-world environments, a speaker verification system must recognize the speaker of short utterances, whereas the utterances used in training are relatively long, and recent speaker verification models perform poorly on short utterances. Here we use prototypical network loss, triplet loss, and state-of-the-art few-shot learning to optimize the speaker embedding model. We use the VoxCeleb1 and VoxCeleb2 datasets; the former contains 1,221 speakers and the latter 5,994 speakers. The experimental results show that the speaker embedding model performs well with our proposed loss functions.	en_US
dc.description.sponsorship	資訊工程學系	zh_TW
dc.identifier	60547069S-40158
dc.identifier.uri	https://etds.lib.ntnu.edu.tw/thesis/detail/96e7d8e662d508f17f746e65281a588c/
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/117281
dc.language	中文 (Chinese)
dc.subject	語者確認	zh_TW
dc.subject	語音辨識	zh_TW
dc.subject	小樣本學習	zh_TW
dc.subject	Speaker verification	en_US
dc.subject	Speech recognition	en_US
dc.subject	Few-shot learning	en_US
dc.title	語者確認使用不同語句嵌入函數之比較研究	zh_TW
dc.title	A Comparative Study of Utterance-Embedding Generation Functions for Speaker Verification	en_US
dc.type	學術論文 (academic thesis)
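The abstract names two metric-learning objectives, prototypical network loss and triplet loss, without giving their formulas. The following is a minimal pure-Python sketch of their commonly used forms, assuming squared Euclidean distance, a triplet margin of 0.2, and speaker prototypes computed as the mean of each speaker's support embeddings. These choices, and all function and speaker names, are illustrative assumptions, not the settings used in the thesis. The "state-of-the-art few-shot learning" method is not identified in this record, so it is not sketched.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.2) -> float:
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive. margin=0.2 is an assumption."""
    return max(0.0, sq_euclidean(anchor, positive)
               - sq_euclidean(anchor, negative) + margin)


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Prototype of each speaker = mean of that speaker's support embeddings."""
    protos: Dict[str, List[float]] = {}
    for spk, embs in support.items():
        dim = len(embs[0])
        protos[spk] = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
    return protos


def prototypical_loss(query: Vector, true_speaker: str,
                      support: Dict[str, List[Vector]]) -> float:
    """Negative log-probability of the true speaker under a softmax over
    negative squared distances from the query to every prototype."""
    protos = prototypes(support)
    logits = {spk: -sq_euclidean(query, p) for spk, p in protos.items()}
    m = max(logits.values())  # subtract the max logit for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_norm - logits[true_speaker]


if __name__ == "__main__":
    # Hypothetical 2-D embeddings for two hypothetical speakers.
    support = {"spk_a": [[1.0, 0.0], [0.8, 0.2]],
               "spk_b": [[0.0, 1.0], [0.2, 0.8]]}
    print(prototypical_loss([0.9, 0.1], "spk_a", support))
    print(triplet_loss([0.9, 0.1], [1.0, 0.0], [0.0, 1.0]))
```

A lower prototypical loss means the query embedding lies closer to its own speaker's prototype than to the other prototypes; the triplet loss drops to zero once the negative is farther from the anchor than the positive by at least the margin.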

Collections

學位論文 (Theses and Dissertations)