CN106448684A - Deep-belief-network-characteristic-vector-based channel-robust voiceprint recognition system - Google Patents
- Publication number
- CN106448684A CN106448684A CN201611006202.2A CN201611006202A CN106448684A CN 106448684 A CN106448684 A CN 106448684A CN 201611006202 A CN201611006202 A CN 201611006202A CN 106448684 A CN106448684 A CN 106448684A
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/20—Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
Abstract
The invention belongs to the field of speech signal processing and machine learning and relates to a channel-robust voiceprint recognition system based on deep belief network (DBN) feature vectors. The system consists of a speech collection and preprocessing module, a raw spectral feature extraction module, a deep belief network training module, a speaker voiceprint feature vector extraction module, a speaker acoustic model generation module, and a speaker identity verification module. A deep belief network is trained in a supervised manner on speech data collected over different channels together with the corresponding speaker ID numbers, and a discriminant ratio is proposed for selecting the DBN hidden-layer output with the best class discrimination, from which the speaker voiceprint feature vector is constructed; this feature vector is channel-robust. Compared with a traditional i-vector-based speaker verification system, the proposed system achieves higher voiceprint recognition accuracy under channel mismatch.
Description
Technical Field
The invention relates to a channel-robust voiceprint recognition system based on deep belief network feature vectors, and belongs to the technical field of human-computer voice interaction.
Background Art
Voiceprint recognition is a form of biometric verification that uses speech to verify a speaker's identity, that is, to confirm whether a given utterance was spoken by a designated person. The technology is convenient and secure, and has broad application prospects in banking, social security, public security, smart homes, mobile payment, and other fields. In practice, however, traditional voiceprint recognition systems face the problem of channel mismatch: when different mobile devices are used for speaker enrollment and testing, system performance degrades and recognition accuracy drops. To address channel mismatch in mobile-device environments, the present invention proposes a channel-robust voiceprint recognition system based on deep belief network feature vectors.
The present invention uses a deep belief network (DBN) to extract speaker features. Many existing voiceprint recognition systems still use features borrowed from speech recognition, such as MFCC and PLP features. The dominant information in these low-level acoustic features is the phonetic content, and the speaker information they carry is easily corrupted by text, channel, and noise information. Such features do not characterize the speaker well, and under channel mismatch the recognition performance of the system degrades, which limits the application of voiceprint recognition technology. Channel mismatch means that the channels used to collect speech during training and testing differ. To address this problem, Kenny proposed Joint Factor Analysis (JFA), which opened up a new direction for voiceprint recognition research under channel mismatch. Its main idea is to divide the space of speaker Gaussian-mean supervectors into three components: the eigenchannel space, the eigenvoice space, and the diagonal residual space; channel mismatch is counteracted by removing the component of the speaker mean supervector that lies in the eigenchannel space.
However, when the training data are unbalanced across channels, JFA shows clear deficiencies. Dehak later proposed the i-vector technique, motivated by the observation that the channel factors estimated by JFA contain not only channel effects but also speaker information. The i-vector method replaces the two separate spaces with a single total variability space, which captures both inter-speaker and inter-channel variability.
Voiceprint recognition systems based on the i-vector technique reflect speaker characteristics well and are among the mainstream voiceprint recognition technologies, but their performance under channel mismatch is mediocre. Deep learning, an emerging machine learning technology of recent years, has achieved remarkable results on a variety of pattern recognition tasks. A common application of deep neural networks is feature extraction: compared with traditional hand-crafted features, the features extracted by deep neural networks better represent high-level abstract information. The Deep Belief Network (DBN), proposed by Geoffrey Hinton in 2006, is a generative model; by training the weights between neurons, the whole network can be made to generate the training data with maximum probability. A DBN is formed by stacking multiple Restricted Boltzmann Machines (RBMs) and is mainly effective for modeling one-dimensional data such as speech.
Inspired by the successful application of deep belief networks to speech recognition, the present invention trains a DBN in a supervised manner on a large amount of speech data from different channels with the corresponding speaker ID numbers, and uses the trained DBN to extract speaker speech features. To measure the discriminability of the outputs of the different hidden layers of the network, a discriminant ratio is proposed to select the most discriminative output for constructing a channel-robust speaker feature vector. Experiments on three Chinese speech databases verify that, compared with the traditional i-vector system, the proposed channel-robust voiceprint recognition system based on deep belief network feature vectors is more robust to channel variation.
Summary of the Invention
Based on the analysis of the prior art above, the object of the present invention is Chinese-language, mobile-device-oriented voiceprint recognition: to construct a deep-learning-based, channel-robust voiceprint recognition system for mobile devices in practical applications. The system uses a deep belief network (DBN) to extract speaker speech features and proposes a discriminant ratio Rp that measures the discriminability of the outputs of the different hidden layers of the network and selects the most discriminative feature, thereby improving the channel robustness of the voiceprint recognition system. The system comprises the following modules:
A speech collection and preprocessing module, which collects the speaker's speech signal and preprocesses it;
A raw spectral feature extraction module, which extracts raw spectral features (MFCCs) from the preprocessed speech;
A deep belief network training module, which trains a channel-robust feature vector extractor in a supervised manner;
A speaker voiceprint feature vector extraction module, which uses the trained deep belief network to extract channel-robust speaker voiceprint feature vectors;
A speaker acoustic model generation module, which builds an acoustic model of the speaker from the extracted voiceprint feature vectors;
A speaker identity verification module, which scores the acoustic model of the test speaker against that of the enrolled speaker to determine the test speaker's identity.
Further, the speech collection and preprocessing module performs preprocessing such as amplification, gain control, filtering, and sampling on the collected speech signal.
Further, the raw spectral feature extraction module performs framing, pre-emphasis, windowing, and fast Fourier transform on the preprocessed speech, and finally extracts Mel-frequency cepstral coefficients (MFCCs).
Further, the deep belief network training module takes MFCC features extracted from a large corpus recorded over different channels as input and the corresponding speaker ID numbers as output, trains the deep belief network in a supervised manner, and saves the parameters of each layer of the trained network.
Further, the speaker voiceprint feature vector extraction module treats the deep belief network as a feature vector extractor. With MFCCs as the input of the network, the hidden-layer outputs can be regarded as high-level representations (deep features) of the original MFCC features, and these feature vectors are channel-robust.
Further, a method is proposed for measuring the discriminability of the deep features extracted by different hidden layers of the neural network. The discriminant ratio Rp = det(Sbp)/det(Swp) is defined as the measure of deep-feature discriminability, where Sbp is the between-class scatter matrix of the training data and Swp is the within-class scatter matrix of the training data, defined as follows:
Sbp = Σ_{m=1}^{M} (Gpm − Gp)(Gpm − Gp)^T,  Swp = Σ_{m=1}^{M} Σ_{j=1}^{cm} (fp(smj) − Gpm)(fp(smj) − Gpm)^T,
where smj is the MFCC feature, fp(·) is the mapping of the deep belief network from the MFCC input to the output of the p-th hidden layer, Gpm is the class mean vector of the training data, and Gp is the mean vector of all the training data, expressed mathematically as:
Gpm = (1/cm) Σ_{j=1}^{cm} fp(smj),  Gp = (Σ_{m=1}^{M} Σ_{j=1}^{cm} fp(smj)) / (Σ_{m=1}^{M} cm).
A large between-class distance and a small within-class distance favor the distinguishability of the extracted feature vectors. Therefore, the hidden-layer feature vector with the largest discriminant ratio Rp is the most discriminative; that is, the output of the hidden layer satisfying k = argmax_p Rp is taken as the best deep feature. Using the speaker's k-th-layer deep features fk(smj) of the deep belief network, the feature vector kth-DBN-vector is obtained, defined as
kth-DBN-vector(m) = (1/cm) Σ_{j=1}^{cm} fk(smj),
where m is the speaker ID number, cm is the number of MFCC frames extracted from the utterance, and Np is the dimension of the p-th hidden layer of the deep belief network.
Further, the speaker acoustic modeling module uses the speaker's kth-DBN-vector feature vectors to perform probabilistic linear discriminant analysis (PLDA) modeling, and saves the PLDA model parameters.
Further, the speaker identity verification module first extracts the kth-DBN-vectors of the enrolled speaker and the test speaker using the trained deep belief network. A log-likelihood-ratio score s is then obtained from the trained PLDA model, and finally the score is compared with a given threshold s0: if s ≥ s0, the test speaker is accepted as the enrolled speaker; otherwise, rejected.
The beneficial effects of the present invention are as follows. With the popularization of mobile devices, users perform voiceprint-based identity verification across different devices, which introduces channel mismatch between enrollment speech and test speech, and traditional i-vector-based voiceprint recognition systems perform only moderately under such mismatch. As a deep network, the deep belief network has strong learning ability and is widely applied in fields such as speech recognition. By training the deep belief network on a large amount of speech data from different channels, the trained network can extract speaker features that are robust to the channel, thereby reducing the impact of channel mismatch. A DBN-based voiceprint recognition system is therefore robust to channel mismatch and can be deployed across devices and platforms, providing users with convenient voiceprint recognition services on different mobile devices while maintaining verification accuracy.
Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of the channel-robust voiceprint recognition system based on deep belief network feature vectors according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the deep belief network (DBN) according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are intended only to illustrate and explain the present invention, not to limit it.
The channel-robust voiceprint recognition system based on deep belief network feature vectors according to the present invention trains the deep belief network in a supervised manner on a large corpus recorded over different channels with the corresponding speaker ID numbers. The feature vectors extracted by the trained deep belief network are therefore robust to the channel, which improves the accuracy of the voiceprint recognition system under channel mismatch. The specific steps are as follows, with reference to the structural diagram of the system in Fig. 1:
S01: Speech collection and preprocessing module;
Speech data are first acquired, and the speech signal undergoes preprocessing such as amplification, gain control, filtering, and sampling.
S02: Raw spectral feature extraction module;
The preprocessed speech is framed, pre-emphasized, windowed, and transformed with the fast Fourier transform, and finally Mel-frequency cepstral coefficients (MFCCs) are extracted.
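The front-end steps of S02 can be sketched in NumPy as follows. This is a minimal sketch, not the patent's implementation: the frame length, frame shift, FFT size, filterbank size, number of cepstral coefficients, and pre-emphasis coefficient 0.97 are all assumed common defaults rather than values specified by the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, frame_shift=160,
         n_fft=512, n_mels=26, n_ceps=13):
    """Pre-emphasize, frame, window, FFT, mel-filter, log, DCT-II."""
    # Pre-emphasis (coefficient 0.97 is an assumed common default)
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Framing: one row of indices per frame
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    idx = (np.arange(frame_len)[None, :] +
           frame_shift * np.arange(n_frames)[:, None])
    frames = emphasized[idx]
    # Hamming window applied per frame
    frames = frames * np.hamming(frame_len)
    # Power spectrum via the FFT (rfft returns n_fft//2 + 1 bins)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank between 0 Hz and sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II of the log filterbank energies gives the cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_energy @ dct.T   # shape: (n_frames, n_ceps)
```

Each row of the returned matrix is one MFCC frame smj of dimension L = n_ceps, the form consumed by the DBN training module below.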
S03: Deep belief network training module;
Suppose the MFCC features extracted from each utterance of a speaker are denoted smj ∈ R^L (j = 1, 2, …, cm), where m (1 ≤ m ≤ M) is the speaker ID number, L is the length of each MFCC frame, and cm is the number of frames. With the MFCCs as input and the corresponding speaker ID numbers as output, the training data {(smj, m): j = 1, 2, …, cm, m = 1, 2, …, M} are used to train the deep belief network in a supervised manner, and the parameters of each layer of the trained network are saved. The structure of the deep belief network is shown in Fig. 2.
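The supervised part of S03 can be illustrated with the following minimal sketch. The greedy RBM-by-RBM pretraining that precedes fine-tuning in a real DBN is omitted, and the synthetic "MFCC" data, single hidden layer, and plain gradient descent are illustrative assumptions; the sketch only shows the arrangement of frames smj as inputs and speaker IDs m as softmax targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for {(s_mj, m)}: 13-dim "MFCC" frames for M = 3 speakers
M, frames_per_spk, L = 3, 50, 13
X = np.vstack([rng.normal(loc=m, size=(frames_per_spk, L)) for m in range(M)])
y = np.repeat(np.arange(M), frames_per_spk)          # speaker ID labels

# One hidden layer only; a real DBN stacks several RBM-pretrained layers
H = 32
W1 = rng.normal(scale=0.1, size=(L, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, M)); b2 = np.zeros(M)

def forward(X):
    h = np.tanh(X @ W1 + b1)                         # hidden-layer output f_1(s)
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, p / p.sum(axis=1, keepdims=True)

losses, lr = [], 0.1
for step in range(200):                              # full-batch gradient descent
    h, p = forward(X)
    losses.append(-np.log(p[np.arange(len(y)), y] + 1e-12).mean())
    g = p.copy(); g[np.arange(len(y)), y] -= 1.0; g /= len(y)  # softmax CE grad
    gW2 = h.T @ g; gb2 = g.sum(0)
    gh = (g @ W2.T) * (1 - h ** 2)                   # backprop through tanh
    gW1 = X.T @ gh; gb1 = gh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1
```

After training, the weights of each layer are what the patent's training module saves; only the hidden-layer activations (not the softmax output) are used downstream as deep features.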
S04: Speaker voiceprint feature vector extraction module;
The deep belief network is treated as a feature vector extractor, with MFCCs as its input; the hidden-layer outputs of the network can be regarded as high-level representations (feature vectors) of the original MFCC features. Define the function fp(·) as the mapping of the deep belief network from its input to the output of the p-th hidden layer; the deep features {fp(smj), p = 1, 2, …, P} are then obtained. To measure the discriminability of the feature vectors extracted by different hidden layers, the discriminant ratio Rp = det(Sbp)/det(Swp) is defined as the measure of deep-feature discriminability, where Sbp is the between-class scatter matrix of the training data and Swp is the within-class scatter matrix, defined as follows:
Sbp = Σ_{m=1}^{M} (Gpm − Gp)(Gpm − Gp)^T,  Swp = Σ_{m=1}^{M} Σ_{j=1}^{cm} (fp(smj) − Gpm)(fp(smj) − Gpm)^T,
where smj is the MFCC feature, fp(·) is the mapping of the deep belief network from the MFCC input to the output of the p-th hidden layer, Gpm is the class mean vector of the training data, and Gp is the mean vector of all the training data:
Gpm = (1/cm) Σ_{j=1}^{cm} fp(smj),  Gp = (Σ_{m=1}^{M} Σ_{j=1}^{cm} fp(smj)) / (Σ_{m=1}^{M} cm).
A large between-class distance and a small within-class distance favor the distinguishability of the extracted feature vectors. Therefore, the hidden-layer feature vector with the largest discriminant ratio Rp is the most discriminative; that is, the output of the hidden layer satisfying k = argmax_p Rp is taken as the best deep feature.
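The scatter matrices and the discriminant ratio Rp of S04 can be computed directly from one layer's frame-level outputs, as in this sketch. It assumes the common unweighted form of the between-class scatter (one term per speaker class); the function name and the data layout are illustrative.

```python
import numpy as np

def discriminant_ratio(feats_by_speaker):
    """R_p = det(S_bp) / det(S_wp) for one hidden layer's outputs.

    feats_by_speaker: list of (c_m, N_p) arrays, one per speaker m,
    holding the frame-level deep features f_p(s_mj)."""
    all_feats = np.vstack(feats_by_speaker)
    G_p = all_feats.mean(axis=0)                     # global mean vector
    N = all_feats.shape[1]
    S_b = np.zeros((N, N))
    S_w = np.zeros((N, N))
    for F in feats_by_speaker:
        G_pm = F.mean(axis=0)                        # class mean vector
        d = (G_pm - G_p)[:, None]
        S_b += d @ d.T                               # between-class scatter
        C = F - G_pm
        S_w += C.T @ C                               # within-class scatter
    return np.linalg.det(S_b) / np.linalg.det(S_w)
```

Evaluating this ratio once per hidden layer and taking the argmax over p realizes the k = argmax_p Rp selection described above; well-separated speaker clusters yield a larger ratio than overlapping ones.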
S05: Speaker acoustic model generation module;
Using the speaker's k-th-layer deep features fk(smj) of the deep belief network, the feature vector kth-DBN-vector is obtained, defined as
kth-DBN-vector(m) = (1/cm) Σ_{j=1}^{cm} fk(smj),
where m is the speaker ID number, cm is the number of MFCC frames extracted from the utterance, and Np is the dimension of the p-th hidden layer of the deep belief network. Finally, probabilistic linear discriminant analysis (PLDA) modeling is performed with the kth-DBN-vector feature vectors, and the PLDA model parameters are saved.
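Assuming the kth-DBN-vector is the average of the selected layer's outputs over the cm frames of an utterance (an assumption consistent with the dimensions given above, since it maps cm frame vectors to one vector of dimension Nk), the pooling step of S05 reduces to:

```python
import numpy as np

def kth_dbn_vector(hidden_outputs):
    """Pool frame-level k-th hidden-layer outputs f_k(s_mj), j = 1..c_m,
    into one utterance-level kth-DBN-vector of dimension N_k
    (frame average assumed)."""
    H = np.asarray(hidden_outputs)   # shape (c_m, N_k)
    return H.mean(axis=0)            # shape (N_k,)
```

One such vector per utterance is then fed to PLDA training; the per-frame variability is discarded by the average, which is part of what makes the representation stable across channels.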
S06: Speaker identity verification module;
The specific steps are: (1) collect and preprocess the enrolled speaker's speech, extract the raw spectral MFCC features, and use the trained deep belief network to extract the enrollee's kth-DBN-vector; (2) collect and preprocess the test speaker's speech, extract the raw spectral MFCC features, and use the trained deep belief network to extract the test speaker's kth-DBN-vector; (3) using the enrollee's and the test speaker's kth-DBN-vectors, obtain the log-likelihood-ratio score s from the trained PLDA model, and finally compare the score with a given threshold s0: if s ≥ s0, the test speaker is accepted as the enrollee; otherwise, rejected.
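The accept/reject decision of S06 can be sketched as below. The patent scores the two vectors with a trained PLDA log-likelihood ratio; since PLDA training is beyond a short sketch, cosine similarity between the enrollment and test kth-DBN-vectors is substituted as the score s here, with the threshold comparison s ≥ s0 as described.

```python
import numpy as np

def verify(enroll_vec, test_vec, threshold):
    """Accept the test speaker as the enrollee iff score s >= s_0.

    Cosine similarity stands in for the PLDA log-likelihood ratio."""
    s = (enroll_vec @ test_vec /
         (np.linalg.norm(enroll_vec) * np.linalg.norm(test_vec)))
    return s >= threshold, s

# Usage: similar vectors are accepted, orthogonal ones rejected
accept, score = verify(np.array([1.0, 0.0]), np.array([0.9, 0.1]), 0.7)
```

Note that the threshold s0 sets the operating point: raising it lowers the false-acceptance rate at the cost of more false rejections, which is exactly the trade-off summarized by the EER and minDCF metrics used in the experiments below.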
Table 1. Details of the selected databases
Table 2. Database allocation
Table 3. Experimental parameter settings
In the experiments, the experimental databases were selected first; all are Chinese corpora, and their details are given in Table 1. The MTDSR2015 database was recorded by the Modern Signal and Data Processing Laboratory of Peking University, the THCHS-30 database was recorded by Tsinghua University, and the King-ASR-L-018 database was released by SpeechOcean (Beijing Haitian Ruisheng).
The allocation of these databases in the experiments is shown in Table 2: the bkg data are used to train the universal background model (UBM), the total variability matrix T, and the PLDA model; the bkg data together with the Part I data of dev are used to train the deep belief network; the Part II data of dev are used for enrollment; and the eva data are used for testing.
The experimental parameters are then set as shown in Table 3. The proposed channel-robust voiceprint recognition system based on deep belief network feature vectors is denoted DBN-vector, the baseline algorithm is i-vector, and the performance metrics are the equal error rate (EER) and the minimum detection cost function (minDCF).
Final experimental results and analysis:
Using the bkg data and the Part I data of dev, a deep belief network is trained. By analyzing the outputs of each hidden layer of the network, the discriminant ratio of each hidden layer is obtained, as shown in Table 4. Table 4 shows that the fourth hidden layer of the deep belief network has the largest discriminant ratio, indicating that its deep features f4(smj) are the most discriminative; f4(smj) is therefore selected as the best deep feature.
Table 4. Discriminant ratios of the different hidden layers of the deep belief network
Table 5. Performance comparison of the i-vector system and the 4th-DBN-vector system under different channel mismatch conditions
Considering system performance under channel mismatch, Table 5 gives the performance of the proposed system and of the i-vector system under different channel mismatch conditions, where a denotes the HUAWEI Mate 7, b the XM4, c the Samsung Note 3, and d the iPhone 5C. Taking a-b as an example, a-b means that the HUAWEI Mate 7 channel is used for speech collection during enrollment and the XM4 channel during testing. According to Table 4, the feature vector of the fourth hidden layer, 4th-DBN-vector, is selected. Table 5 shows that under every channel mismatch condition, the channel-robust voiceprint recognition system based on deep belief network feature vectors (4th-DBN-vector) is far better than the traditional i-vector system in terms of both EER and minDCF: the EER of the 4th-DBN-vector system is below 0.9% and its minDCF below 0.8 in all conditions, showing that under channel mismatch the proposed system identifies speakers more accurately than the i-vector system.
The above is merely a preferred embodiment of the present invention and does not limit the present invention in any form; any modification or equivalent variation of the above embodiment made according to the technical essence of the present invention without departing from the content of its technical solution still falls within the scope of the technical solution of the present invention.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611006202.2A CN106448684A (en) | 2016-11-16 | 2016-11-16 | Deep-belief-network-characteristic-vector-based channel-robust voiceprint recognition system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106448684A true CN106448684A (en) | 2017-02-22 |
Family
ID=58207211
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611006202.2A Pending CN106448684A (en) | 2016-11-16 | 2016-11-16 | Deep-belief-network-characteristic-vector-based channel-robust voiceprint recognition system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106448684A (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107146601A (en) * | 2017-04-07 | 2017-09-08 | 南京邮电大学 | A back-end i‑vector enhancement method for speaker recognition systems |
CN107195077A (en) * | 2017-07-19 | 2017-09-22 | 浙江联运环境工程股份有限公司 | Intelligent bottle recycling machine |
CN107240397A (en) * | 2017-08-14 | 2017-10-10 | 广东工业大学 | Smart lock based on voiceprint recognition and speech recognition method and system thereof |
CN107274906A (en) * | 2017-06-28 | 2017-10-20 | 百度在线网络技术(北京)有限公司 | Voice information processing method, device, terminal and storage medium |
CN107451967A (en) * | 2017-07-25 | 2017-12-08 | 北京大学深圳研究生院 | Single-image dehazing method based on deep learning |
CN107481736A (en) * | 2017-08-14 | 2017-12-15 | 广东工业大学 | A voiceprint identity authentication device and its authentication optimization method and system |
CN107527620A (en) * | 2017-07-25 | 2017-12-29 | 平安科技(深圳)有限公司 | Electronic device, identity verification method, and computer-readable storage medium |
CN107886957A (en) * | 2017-11-17 | 2018-04-06 | 广州势必可赢网络科技有限公司 | Voice wake-up method and device combined with voiceprint recognition |
CN108074575A (en) * | 2017-12-14 | 2018-05-25 | 广州势必可赢网络科技有限公司 | Identity verification method and device based on recurrent neural network |
CN108089099A (en) * | 2017-12-18 | 2018-05-29 | 广东电网有限责任公司佛山供电局 | Deep-belief-network-based fault diagnosis method for power distribution networks |
CN108257592A (en) * | 2018-01-11 | 2018-07-06 | 广州势必可赢网络科技有限公司 | Human voice segmentation method and system based on long-term and short-term memory model |
CN109034246A (en) * | 2018-07-27 | 2018-12-18 | 中国矿业大学(北京) | Method and system for determining the saturation state of a roadbed |
CN109859742A (en) * | 2019-01-08 | 2019-06-07 | 国家计算机网络与信息安全管理中心 | A speaker segmentation clustering method and device |
WO2019154107A1 (en) * | 2018-02-12 | 2019-08-15 | 阿里巴巴集团控股有限公司 | Voiceprint recognition method and device based on memorability bottleneck feature |
WO2019214047A1 (en) * | 2018-05-08 | 2019-11-14 | 平安科技(深圳)有限公司 | Method and apparatus for establishing voice print model, computer device, and storage medium |
CN110555370A (en) * | 2019-07-16 | 2019-12-10 | 西北工业大学 | channel effect inhibition method based on PLDA factor analysis method in underwater target recognition |
CN110600012A (en) * | 2019-08-02 | 2019-12-20 | 特斯联(北京)科技有限公司 | Fuzzy speech semantic recognition method and system for artificial intelligence learning |
WO2020035015A1 (en) * | 2018-08-16 | 2020-02-20 | Huawei Technologies Co., Ltd. | Systems and methods for selecting training objects |
CN111312283A (en) * | 2020-02-24 | 2020-06-19 | 中国工商银行股份有限公司 | Cross-channel voiceprint processing method and device |
CN111402899A (en) * | 2020-03-25 | 2020-07-10 | 中国工商银行股份有限公司 | Cross-channel voiceprint identification method and device |
CN111524524A (en) * | 2020-04-28 | 2020-08-11 | 平安科技(深圳)有限公司 | Voiceprint recognition method, voiceprint recognition device, voiceprint recognition equipment and storage medium |
CN112967726A (en) * | 2021-02-01 | 2021-06-15 | 上海海事大学 | Short-utterance speaker verification method with a deep neural network model based on t-distribution probabilistic linear discriminant analysis |
CN113611328A (en) * | 2021-06-30 | 2021-11-05 | 公安部第一研究所 | Voiceprint recognition voice evaluation method and device |
CN113763967A (en) * | 2021-08-17 | 2021-12-07 | 珠海格力电器股份有限公司 | A method, device, server and system for binding an APP to a smart home appliance |
CN114093368A (en) * | 2020-07-07 | 2022-02-25 | 华为技术有限公司 | Cross-device voiceprint registration method, electronic device and storage medium |
CN115240641A (en) * | 2022-07-26 | 2022-10-25 | 合肥讯飞数码科技有限公司 | A language identification method, device, storage medium and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2003222487A1 (en) * | 2003-04-21 | 2004-11-19 | Hee-Suk Jeong | Channel mis-match compensation apparatus and method for robust speaker verification system |
US20070233483A1 (en) * | 2006-04-03 | 2007-10-04 | Voice. Trust Ag | Speaker authentication in digital communication networks |
CN102129859A (en) * | 2010-01-18 | 2011-07-20 | 盛乐信息技术(上海)有限公司 | Voiceprint authentication system and method for rapid channel compensation |
CN105845141A (en) * | 2016-03-23 | 2016-08-10 | 广州势必可赢网络科技有限公司 | Speaker confirmation model, speaker confirmation method and speaker confirmation device based on channel robustness |
2016
- 2016-11-16: Application CN201611006202.2A filed in China; published as CN106448684A; status: Pending
Non-Patent Citations (1)
Title |
---|
D.S. WANG et al.: "A Robust DBN-vector based Speaker Verification System under Channel Mismatch Conditions", 2016 IEEE International Conference on Digital Signal Processing * |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107146601A (en) * | 2017-04-07 | 2017-09-08 | 南京邮电大学 | A back-end i‑vector enhancement method for speaker recognition systems |
CN107146601B (en) * | 2017-04-07 | 2020-07-24 | 南京邮电大学 | Rear-end i-vector enhancement method for speaker recognition system |
CN107274906A (en) * | 2017-06-28 | 2017-10-20 | 百度在线网络技术(北京)有限公司 | Voice information processing method, device, terminal and storage medium |
CN107195077A (en) * | 2017-07-19 | 2017-09-22 | 浙江联运环境工程股份有限公司 | Intelligent bottle recycling machine |
JP2019531492A (en) * | 2017-07-25 | 2019-10-31 | 平安科技(深圳)有限公司 Ping An Technology (Shenzhen) Co., Ltd. | Electronic device, identity authentication method, system, and computer-readable storage medium |
CN107451967A (en) * | 2017-07-25 | 2017-12-08 | 北京大学深圳研究生院 | Single-image dehazing method based on deep learning |
CN107527620A (en) * | 2017-07-25 | 2017-12-29 | 平安科技(深圳)有限公司 | Electronic device, identity verification method, and computer-readable storage medium |
US11068571B2 (en) | 2017-07-25 | 2021-07-20 | Ping An Technology (Shenzhen) Co., Ltd. | Electronic device, method and system of identity verification and computer readable storage medium |
CN107527620B (en) * | 2017-07-25 | 2019-03-26 | 平安科技(深圳)有限公司 | Electronic device, identity verification method, and computer-readable storage medium |
CN107451967B (en) * | 2017-07-25 | 2020-06-26 | 北京大学深圳研究生院 | A single image dehazing method based on deep learning |
CN107481736A (en) * | 2017-08-14 | 2017-12-15 | 广东工业大学 | A voiceprint identity authentication device and its authentication optimization method and system |
CN107240397A (en) * | 2017-08-14 | 2017-10-10 | 广东工业大学 | Smart lock based on voiceprint recognition and speech recognition method and system thereof |
CN107886957A (en) * | 2017-11-17 | 2018-04-06 | 广州势必可赢网络科技有限公司 | Voice wake-up method and device combined with voiceprint recognition |
CN108074575A (en) * | 2017-12-14 | 2018-05-25 | 广州势必可赢网络科技有限公司 | Identity verification method and device based on recurrent neural network |
CN108089099A (en) * | 2017-12-18 | 2018-05-29 | 广东电网有限责任公司佛山供电局 | Deep-belief-network-based fault diagnosis method for power distribution networks |
CN108257592A (en) * | 2018-01-11 | 2018-07-06 | 广州势必可赢网络科技有限公司 | Human voice segmentation method and system based on long-term and short-term memory model |
WO2019154107A1 (en) * | 2018-02-12 | 2019-08-15 | 阿里巴巴集团控股有限公司 | Voiceprint recognition method and device based on memorability bottleneck feature |
WO2019214047A1 (en) * | 2018-05-08 | 2019-11-14 | 平安科技(深圳)有限公司 | Method and apparatus for establishing voice print model, computer device, and storage medium |
CN109034246A (en) * | 2018-07-27 | 2018-12-18 | 中国矿业大学(北京) | Method and system for determining the saturation state of a roadbed |
US11615342B2 (en) | 2018-08-16 | 2023-03-28 | Huawei Technologies Co., Ltd. | Systems and methods for generating amplifier gain models using active learning |
WO2020035015A1 (en) * | 2018-08-16 | 2020-02-20 | Huawei Technologies Co., Ltd. | Systems and methods for selecting training objects |
CN109859742A (en) * | 2019-01-08 | 2019-06-07 | 国家计算机网络与信息安全管理中心 | A speaker segmentation clustering method and device |
CN109859742B (en) * | 2019-01-08 | 2021-04-09 | 国家计算机网络与信息安全管理中心 | Speaker segmentation clustering method and device |
CN110555370A (en) * | 2019-07-16 | 2019-12-10 | 西北工业大学 | channel effect inhibition method based on PLDA factor analysis method in underwater target recognition |
CN110555370B (en) * | 2019-07-16 | 2023-03-31 | 西北工业大学 | Channel effect inhibition method based on PLDA factor analysis method in underwater target recognition |
CN110600012A (en) * | 2019-08-02 | 2019-12-20 | 特斯联(北京)科技有限公司 | Fuzzy speech semantic recognition method and system for artificial intelligence learning |
CN111312283A (en) * | 2020-02-24 | 2020-06-19 | 中国工商银行股份有限公司 | Cross-channel voiceprint processing method and device |
CN111402899A (en) * | 2020-03-25 | 2020-07-10 | 中国工商银行股份有限公司 | Cross-channel voiceprint identification method and device |
CN111402899B (en) * | 2020-03-25 | 2023-10-13 | 中国工商银行股份有限公司 | Cross-channel voiceprint recognition method and device |
CN111524524B (en) * | 2020-04-28 | 2021-10-22 | 平安科技(深圳)有限公司 | Voiceprint recognition method, voiceprint recognition device, voiceprint recognition equipment and storage medium |
WO2021217979A1 (en) * | 2020-04-28 | 2021-11-04 | 平安科技(深圳)有限公司 | Voiceprint recognition method and apparatus, and device and storage medium |
US12002473B2 (en) * | 2020-04-28 | 2024-06-04 | Ping An Technology (Shenzhen) Co., Ltd. | Voiceprint recognition method, apparatus and device, and storage medium |
US20220254349A1 (en) * | 2020-04-28 | 2022-08-11 | Ping An Technology (Shenzhen) Co., Ltd. | Voiceprint recognition method, apparatus and device, and storage medium |
CN111524524A (en) * | 2020-04-28 | 2020-08-11 | 平安科技(深圳)有限公司 | Voiceprint recognition method, voiceprint recognition device, voiceprint recognition equipment and storage medium |
CN114093368A (en) * | 2020-07-07 | 2022-02-25 | 华为技术有限公司 | Cross-device voiceprint registration method, electronic device and storage medium |
CN112967726A (en) * | 2021-02-01 | 2021-06-15 | 上海海事大学 | Short-utterance speaker verification method with a deep neural network model based on t-distribution probabilistic linear discriminant analysis |
CN113611328A (en) * | 2021-06-30 | 2021-11-05 | 公安部第一研究所 | Voiceprint recognition voice evaluation method and device |
CN113763967A (en) * | 2021-08-17 | 2021-12-07 | 珠海格力电器股份有限公司 | A method, device, server and system for binding an APP to a smart home appliance |
CN115240641A (en) * | 2022-07-26 | 2022-10-25 | 合肥讯飞数码科技有限公司 | A language identification method, device, storage medium and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106448684A (en) | Deep-belief-network-characteristic-vector-based channel-robust voiceprint recognition system | |
CN104732978B (en) | Text-related speaker recognition method based on joint deep learning | |
CN106847292B (en) | Voiceprint recognition method and device | |
TWI527023B (en) | A voiceprint recognition method and apparatus | |
CN113823293B (en) | Speaker recognition method and system based on voice enhancement | |
CN109524014A (en) | A voiceprint recognition analysis method based on deep convolutional neural network | |
CN110289003A (en) | A voiceprint recognition method, model training method and server | |
CN108986824B (en) | Playback voice detection method | |
CN108231067A (en) | Acoustic scene recognition method based on convolutional neural networks and random forest classification | |
CN107610707A (en) | Voiceprint recognition method and device | |
CN110265035B (en) | Speaker recognition method based on deep learning | |
CN101923855A (en) | Text-independent Voiceprint Recognition System | |
CN101540170B (en) | A voiceprint recognition method based on bionic pattern recognition | |
CN103456302B (en) | Emotional speaker recognition method based on emotion-dependent GMM model weight synthesis | |
CN111785262B (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
Wang et al. | A network model of speaker identification with new feature extraction methods and asymmetric BLSTM | |
CN110364168A (en) | Environment-aware voiceprint recognition method and system | |
CN109036468A (en) | Speech emotion recognition method based on deep belief network and kernel nonlinear PSVM | |
CN112992155A (en) | Far-field voice speaker recognition method and device based on residual error neural network | |
Zheng et al. | MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios | |
Biagetti et al. | Speaker identification with short sequences of speech frames | |
CN111508524A (en) | Method and system for identifying voice source device | |
CN102496366B (en) | Speaker identification method irrelevant with text | |
CN111091836A (en) | Intelligent voiceprint recognition method based on big data | |
Rouniyar et al. | Channel response based multi-feature audio splicing forgery detection and localization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 2017-02-22