A Comparative Study of Utterance-Embedding Generation Functions for Speaker Verification (語者確認使用不同語句嵌入函數之比較研究)

dc.contributor 陳柏琳 zh_TW
dc.contributor Chen, Berlin en_US
dc.contributor.author 李宗勳 zh_TW
dc.contributor.author Lee, Tsung-Hsun en_US
dc.date.accessioned 2022-06-08T02:43:22Z
dc.date.available 2026-09-06
dc.date.available 2022-06-08T02:43:22Z
dc.date.issued 2021
dc.description.abstract 語者語句的嵌入函數利用了神經網路將語句映射到一個空間,在該空間中,距離反映出語者之間的相似度,這種度量學習最早被提出應用在人臉辨識。最近幾年被拿來應用在語者確認,這也推動近幾年語者確認任務的發展。但在語者確認中,訓練集內的語者與未知語者之間仍有明顯的正確率差異。在未知語者的狀況下,很適合使用小樣本學習。在實際環境中,語者確認系統需要識別短語句的語者,但在訓練時的語者話語都是相對較長的。然而近年的語者確認模型在短語句的語者確認中表現不佳。在這裡我們使用了原型網路損失、三元組損失和最先進的小樣本學習來優化嵌入語者模型。資料集使用了VoxCeleb1和VoxCeleb2,前者資料集的語者數量有1,221,後者資料集的語者數量有5,994。實驗的結果顯示,嵌入語者模型在我們提出的損失函數有較好的表現。 zh_TW
dc.description.abstract A speaker embedding model uses a neural network to map utterances into a space in which distance reflects the similarity between speakers. This form of metric learning was first proposed for face recognition. In recent years it has been applied to speaker verification, which has driven progress on that task. However, there is still a significant gap in accuracy between speakers seen in the training set and unseen speakers. Few-shot learning is well suited to the unseen-speaker case. In real-world environments, a speaker verification system must recognize speakers from short utterances, whereas the utterances used in training are relatively long, and recent speaker verification models perform poorly on short utterances. Here we use prototypical network loss, triplet loss, and state-of-the-art few-shot learning to optimize the speaker embedding model. We use the VoxCeleb1 and VoxCeleb2 datasets, which contain 1,221 and 5,994 speakers, respectively. The experimental results show that the speaker embedding model performs better with our proposed loss function. en_US
dc.description.sponsorship 資訊工程學系 (Department of Computer Science and Information Engineering) zh_TW
dc.identifier 60547069S-40158
dc.identifier.uri https://etds.lib.ntnu.edu.tw/thesis/detail/96e7d8e662d508f17f746e65281a588c/
dc.identifier.uri http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/117281
dc.language Chinese
dc.subject 語者確認 zh_TW
dc.subject 語音辨識 zh_TW
dc.subject 小樣本學習 zh_TW
dc.subject Speaker verification en_US
dc.subject Speech recognition en_US
dc.subject Few-shot learning en_US
dc.title 語者確認使用不同語句嵌入函數之比較研究 zh_TW
dc.title A Comparative Study of Utterance-Embedding Generation Functions for Speaker Verification en_US
dc.type Academic thesis
