This paper proposes a method for text-independent speaker recognition that combines deep neural networks with phonetic knowledge. Supervised factor analysis modeling with a deep neural network (DNN) is currently an active research topic in text-independent speaker recognition; building on it, this work investigates how different phonemes affect recognition performance. First, the DNN output nodes are grouped into classes according to phonetic knowledge. During speaker modeling, a separate posterior factor (i-vector) is extracted for each class, and these are concatenated into a single high-dimensional i-vector used for speaker recognition. On the core test task of NIST SRE 2012, the proposed algorithm improves performance to varying degrees over both unsupervised total variability factor analysis and DNN-based factor analysis. Overall, it surpasses the best currently known system performance.
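The per-class extraction and concatenation steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two class labels ("vowel", "consonant"), and the toy data are hypothetical, and a posterior-weighted feature mean stands in for a real i-vector extractor, which would additionally need a trained total variability matrix for each class.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def per_class_vectors(
    features: Sequence[Vector],    # T frames, each with D acoustic dimensions
    posteriors: Sequence[Vector],  # T frames, each with K DNN output-node posteriors
    node_to_class: Sequence[str],  # assumed phonetic class label for each of the K nodes
) -> Dict[str, Vector]:
    """Accumulate zeroth- and first-order statistics per phonetic class.

    Returns one vector per class. Here that vector is just the
    posterior-weighted mean of the features, a simplified stand-in
    for a class-specific i-vector.
    """
    dim = len(features[0])
    zeroth: Dict[str, float] = defaultdict(float)
    first: Dict[str, Vector] = defaultdict(lambda: [0.0] * dim)

    for x, gamma in zip(features, posteriors):
        for node, p in enumerate(gamma):
            c = node_to_class[node]
            zeroth[c] += p              # soft frame count for the class
            acc = first[c]
            for d in range(dim):
                acc[d] += p * x[d]      # posterior-weighted feature sum

    return {
        c: [v / zeroth[c] if zeroth[c] > 0 else 0.0 for v in first[c]]
        for c in zeroth
    }


def splice(vectors: Dict[str, Vector], class_order: Sequence[str]) -> Vector:
    """Concatenate the per-class vectors in a fixed class order."""
    out: Vector = []
    for c in class_order:
        out.extend(vectors[c])
    return out


if __name__ == "__main__":
    # Toy input: 2 frames, 2 feature dimensions, 3 DNN output nodes.
    feats = [[1.0, 2.0], [3.0, 4.0]]
    post = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
    classes = ["vowel", "vowel", "consonant"]  # hypothetical node grouping
    print(splice(per_class_vectors(feats, post, classes), ["vowel", "consonant"]))
```

The fixed `class_order` matters: every utterance has to place each class in the same slice of the concatenated vector, otherwise vectors from different utterances are not comparable.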