CN107068167A - Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures - Google Patents
Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures
- Publication number
- CN107068167A CN107068167A CN201710146957.0A CN201710146957A CN107068167A CN 107068167 A CN107068167 A CN 107068167A CN 201710146957 A CN201710146957 A CN 201710146957A CN 107068167 A CN107068167 A CN 107068167A
- Authority
- CN
- China
- Prior art keywords
- network
- neural network
- speaker
- layer
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Public Health (AREA)
- General Health & Medical Sciences (AREA)
- Epidemiology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a speaker cold-symptom recognition method that fuses multiple end-to-end neural network structures, comprising the following steps: S1. Construct and train end-to-end neural network A, whose input is the raw speech signal and whose recognition network combines a convolutional neural network with a long short-term memory (LSTM) network; S2. Construct and train end-to-end neural network B, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with an LSTM network; S3. Construct and train end-to-end neural network C, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with a fully connected network; S4. Construct and train end-to-end neural network D, whose input is the MFCC/CQCC features of the speech and whose recognition network is an LSTM network; S5. Fuse the four trained end-to-end neural networks to recognize the speaker's cold symptoms.
Description
Technical Field
The present invention relates to the field of voiceprint recognition, and more specifically to a speaker cold-symptom recognition method that fuses multiple end-to-end neural network structures.
Background Art
Speaker recognition, also known as voiceprint recognition, uses pattern recognition techniques to identify speakers automatically. Current speaker recognition technology performs well under laboratory conditions, but in practice the speech to be recognized is affected by environmental noise and by the speaker's state of health, which reduces the robustness of existing systems. Existing speaker recognition methods are mainly aimed at determining a speaker's identity; to date there is no recognition method targeting a speaker's cold symptoms.
In speech technology research, researchers seek features that characterize the target class, i.e., properties that clearly distinguish the target speech from normal speech. Speech feature extraction captures the speaker's speech and vocal-tract characteristics. The mainstream feature parameters, such as MFCC, LPCC, and CQCC, are each single features that carry insufficient information about a speaker's cold symptoms, which limits recognition accuracy. Classifying the target speech also requires a large amount of domain knowledge. Among recognition algorithms, methods based on vocal-tract and speech-production models appeared early but, owing to model complexity, never achieved good practical results, whereas model-matching techniques such as dynamic time warping, hidden Markov models, and vector quantization began to deliver good recognition performance. Studying feature extraction and pattern classification separately is the usual approach in recognition research, but this classic framework suffers from feature-model mismatch, difficult training, and features that are hard to find.
In recent years, with the development of deep learning, deep neural networks have shown great power in image and speech recognition, and a series of network architectures have been proposed, such as autoencoders, convolutional neural networks, and recurrent neural networks. Many researchers have found that learning from speech with neural networks yields hidden structural features that describe the speech better. End-to-end recognition handles feature learning and feature recognition jointly with as little prior knowledge as possible and achieves good recognition results.
Summary of the Invention
To solve the problems caused by separating feature extraction from pattern classification in prior-art recognition technology, namely feature-model mismatch, difficult training, and features that are hard to find, the present invention provides a speaker cold-symptom recognition method that fuses multiple end-to-end neural network structures. By unifying feature learning and pattern classification, the method makes the entire speaker cold-symptom recognition process simpler and faster, and it has broad application prospects.
To achieve the above objective, the following technical scheme is adopted:
A speaker cold-symptom recognition method fusing multiple end-to-end neural network structures comprises the following steps:
S1. Construct and train end-to-end neural network A, whose input is the raw speech signal and whose recognition network combines a convolutional neural network with a long short-term memory (LSTM) network;
S2. Construct and train end-to-end neural network B, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with an LSTM network;
S3. Construct and train end-to-end neural network C, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with a fully connected network;
S4. Construct and train end-to-end neural network D, whose input is the MFCC/CQCC features of the speech and whose recognition network is an LSTM network;
S5. Fuse the four trained end-to-end neural networks to recognize the speaker's cold symptoms.
Preferably, the convolutional neural network of end-to-end network A comprises 8 modules, each consisting of a one-dimensional convolutional layer, a ReLU activation layer, and a one-dimensional max-pooling layer; the convolution kernel size is 32, and the pooling kernel size and pooling stride are both 2.
Preferably, the convolutional neural network of end-to-end network B comprises 6 modules, each consisting of a two-dimensional convolutional layer, a ReLU activation layer, and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 kernel, the second a 5*5 kernel, and the remaining four layers 3*3 kernels; all max-pooling layers use a 3*3 pooling kernel with a stride of 2.
Preferably, the convolutional neural network of end-to-end network C comprises 6 modules, each consisting of a two-dimensional convolutional layer, a ReLU activation layer, and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 kernel, the second a 5*5 kernel, and the remaining four layers 3*3 kernels; all max-pooling layers use a 3*3 pooling kernel with a stride of 2.
Compared with the prior art, the beneficial effects of the present invention are:
Existing recognition technologies study feature extraction and pattern classification separately, which leads to feature-model mismatch, difficult training, and features that are hard to find. The method provided by the present invention unifies feature learning and pattern classification by fusing four different end-to-end neural networks, making the entire speaker cold-symptom recognition process simpler and faster, with broad application prospects.
Description of the Drawings
Figure 1 is a schematic diagram of the specific implementation of the method.
Figure 2 is a flow chart of extracting Mel-frequency cepstral coefficients (MFCC) from speech.
Figure 3 is a flow chart of extracting constant-Q cepstral coefficients (CQCC) from speech.
Figure 4 is a schematic diagram of end-to-end neural network A.
Figure 5 is a schematic diagram of end-to-end neural network B.
Figure 6 is a schematic diagram of end-to-end neural network C.
Figure 7 is a schematic diagram of end-to-end neural network D.
Detailed Description
The accompanying drawings are for illustration only and shall not be construed as limiting this patent.
The present invention is further described below with reference to the drawings and embodiments.
Example 1
Figure 1 shows the specific implementation process of the method provided by the present invention. As shown in Figure 1, the speaker cold-symptom recognition method fusing multiple end-to-end neural network structures comprises the following steps:
S1. Construct and train end-to-end neural network A, whose input is the raw speech signal and whose recognition network combines a convolutional neural network with a long short-term memory (LSTM) network;
S2. Construct and train end-to-end neural network B, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with an LSTM network;
S3. Construct and train end-to-end neural network C, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with a fully connected network;
S4. Construct and train end-to-end neural network D, whose input is the MFCC/CQCC features of the speech and whose recognition network is an LSTM network, as shown in Figure 7;
S5. Fuse the four trained end-to-end neural networks to recognize the speaker's cold symptoms.
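The text does not spell out the fusion rule used in step S5. One common and simple choice, sketched below as an assumption rather than the patent's actual method, is to average (or weight-average) the posterior probabilities produced by the four networks and take the most probable class.

```python
import numpy as np

def fuse_predictions(posteriors, weights=None):
    """Hypothetical score-level fusion: weighted average of per-network
    posterior probabilities. posteriors has shape (n_networks, n_classes)."""
    posteriors = np.asarray(posteriors, dtype=float)
    if weights is None:
        # Equal weights by default; weights could instead be tuned on a
        # validation set (not specified in the source text).
        weights = np.full(len(posteriors), 1.0 / len(posteriors))
    fused = weights @ posteriors          # fused posterior per class
    return fused, int(np.argmax(fused))   # fused scores and decided class

# Example: four networks each scoring P(cold) vs P(healthy)
p = [[0.7, 0.3], [0.6, 0.4], [0.55, 0.45], [0.8, 0.2]]
fused, label = fuse_predictions(p)
print(fused, label)
```

Here the equal-weight average of the "cold" posteriors is 0.6625, so class 0 ("cold") is chosen.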
As shown in Figures 2 and 3, the MFCC features in step S4 are obtained by pre-emphasizing the speech, windowing it into frames, applying the fast Fourier transform, computing the energy spectral density, filtering with a Mel-scale triangular filter bank, taking the logarithm, and applying the discrete cosine transform. The CQCC features are obtained by applying the constant-Q transform to the speech, computing the energy spectral density, taking the logarithm, and applying the discrete cosine transform.
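The MFCC chain just described can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration of the standard pipeline; the frame length, hop, FFT size, filter count, and number of coefficients are assumed values, since the text does not specify them.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_filters=26, n_coeffs=13, nfft=512):
    """Sketch of the MFCC pipeline: pre-emphasis, Hamming-windowed framing,
    FFT energy spectrum, Mel triangular filter bank, log, DCT."""
    # Pre-emphasis
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Framing with a Hamming window
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Energy spectrum via FFT
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # Mel-scale triangular filter bank
    def hz_to_mel(hz): return 2595.0 * np.log10(1.0 + hz / 700.0)
    def mel_to_hz(mel): return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    # Log filter-bank energies, then DCT -> cepstral coefficients
    feat = np.log(power @ fbank.T + 1e-10)
    return dct(feat, type=2, axis=1, norm='ortho')[:, :n_coeffs]
```

For network D, such frame-level coefficient vectors would form the input sequence to the LSTM.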
In a specific implementation, as shown in Figure 4, the convolutional neural network of end-to-end network A comprises 8 modules, each consisting of a one-dimensional convolutional layer, a ReLU activation layer, and a one-dimensional max-pooling layer; the convolution kernel size is 32, and the pooling kernel size and pooling stride are both 2.
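To illustrate how one such module transforms a signal, the following single-channel NumPy sketch applies a length-32 'valid' convolution, a ReLU, and max pooling with kernel size 2 and stride 2, then stacks eight modules. Real convolutional layers have many learned channels and possibly padding, neither of which the text specifies; this only demonstrates the shape arithmetic and the nonlinearity.

```python
import numpy as np

def conv1d_module(x, kernel):
    """One assumed module of network A: 1-D 'valid' convolution with a
    length-32 kernel, ReLU activation, then max pooling (size 2, stride 2)."""
    conv = np.convolve(x, kernel[::-1], mode='valid')  # 1-D convolution
    relu = np.maximum(conv, 0.0)                       # ReLU activation
    trimmed = relu[: len(relu) // 2 * 2]               # drop an odd tail sample
    return trimmed.reshape(-1, 2).max(axis=1)          # max pool, stride 2

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)   # e.g. one second of 16 kHz speech
for _ in range(8):               # eight stacked modules, random kernels
    x = conv1d_module(x, rng.standard_normal(32))
print(len(x))
```

Starting from 16000 samples, each module subtracts 31 (valid convolution) and halves the length (pooling), leaving 31 values after eight modules.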
In a specific implementation, as shown in Figure 5, the convolutional neural network of end-to-end network B comprises 6 modules, each consisting of a two-dimensional convolutional layer, a ReLU activation layer, and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 kernel, the second a 5*5 kernel, and the remaining four layers 3*3 kernels; all max-pooling layers use a 3*3 pooling kernel with a stride of 2.
In a specific implementation, as shown in Figure 6, the convolutional neural network of end-to-end network C comprises 6 modules, each consisting of a two-dimensional convolutional layer, a ReLU activation layer, and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 kernel, the second a 5*5 kernel, and the remaining four layers 3*3 kernels; all max-pooling layers use a 3*3 pooling kernel with a stride of 2.
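Because the pooling parameters of networks B and C are fixed (3*3 kernel, stride 2) and six modules are stacked, the spectrogram's feature-map size can be traced with simple arithmetic. The sketch below assumes 'same'-padded convolutions, so only the pooling layers shrink the map, and a hypothetical 227*227 input spectrogram; neither assumption is stated in the text.

```python
def pooled_size(n, pool=3, stride=2):
    """Output size along one axis after max pooling with a 3x3 kernel and
    stride 2 ('valid' pooling assumed, which the text does not specify)."""
    return (n - pool) // stride + 1

def network_bc_map_size(h, w):
    """Feature-map size after the six conv+ReLU+pool modules of networks
    B and C, with 'same'-padded convolutions assumed."""
    for _ in range(6):
        h, w = pooled_size(h), pooled_size(w)
    return h, w

print(network_bc_map_size(227, 227))
```

Under these assumptions the map shrinks 227 -> 113 -> 56 -> 27 -> 13 -> 6 -> 2 along each axis, which is the kind of compact representation the LSTM (network B) or fully connected layers (network C) would then consume.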
The above embodiments are merely examples intended to illustrate the present invention clearly and do not limit its implementation. On the basis of the above description, those of ordinary skill in the art can make changes or variations in other forms; it is neither necessary nor possible to exhaust all implementations here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (4)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710146957.0A CN107068167A (en) | 2017-03-13 | 2017-03-13 | Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures |
| PCT/CN2018/076272 WO2018166316A1 (en) | 2017-03-13 | 2018-02-11 | Speaker's flu symptoms recognition method fused with multiple end-to-end neural network structures |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710146957.0A CN107068167A (en) | 2017-03-13 | 2017-03-13 | Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN107068167A true CN107068167A (en) | 2017-08-18 |
Family
ID=59621946
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710146957.0A Pending CN107068167A (en) | 2017-03-13 | 2017-03-13 | Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN107068167A (en) |
| WO (1) | WO2018166316A1 (en) |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108053841A (en) * | 2017-10-23 | 2018-05-18 | 平安科技(深圳)有限公司 | The method and application server of disease forecasting are carried out using voice |
| WO2018166316A1 (en) * | 2017-03-13 | 2018-09-20 | 佛山市顺德区中山大学研究院 | Speaker's flu symptoms recognition method fused with multiple end-to-end neural network structures |
| CN108899051A (en) * | 2018-06-26 | 2018-11-27 | 北京大学深圳研究生院 | A kind of speech emotion recognition model and recognition methods based on union feature expression |
| CN109086892A (en) * | 2018-06-15 | 2018-12-25 | 中山大学 | It is a kind of based on the visual problem inference pattern and system that typically rely on tree |
| CN109192226A (en) * | 2018-06-26 | 2019-01-11 | 深圳大学 | A kind of signal processing method and device |
| CN109256118A (en) * | 2018-10-22 | 2019-01-22 | 江苏师范大学 | End-to-end Chinese dialects identifying system and method based on production auditory model |
| CN109282837A (en) * | 2018-10-24 | 2019-01-29 | 福州大学 | Demodulation method of fiber Bragg grating interleaved spectrum based on LSTM network |
| CN109960910A (en) * | 2017-12-14 | 2019-07-02 | 广东欧珀移动通信有限公司 | Voice processing method, device, storage medium and terminal device |
| CN111028859A (en) * | 2019-12-15 | 2020-04-17 | 中北大学 | A hybrid neural network vehicle recognition method based on audio feature fusion |
| CN116110437A (en) * | 2023-04-14 | 2023-05-12 | 天津大学 | Pathological voice quality evaluation method based on fusion of voice characteristics and speaker characteristics |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| ES2993990T3 (en) | 2017-03-03 | 2025-01-15 | Pindrop Security Inc | Method and apparatus for detecting spoofing conditions |
| US12488072B2 (en) | 2020-04-15 | 2025-12-02 | Pindrop Security, Inc. | Passive and continuous multi-speaker voice biometrics |
| CN114299987A (en) * | 2021-12-08 | 2022-04-08 | 中国科学技术大学 | Training method of event analysis model, event analysis method and device thereof |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5214743A (en) * | 1989-10-25 | 1993-05-25 | Hitachi, Ltd. | Information processing apparatus |
| CN105139864A (en) * | 2015-08-17 | 2015-12-09 | 北京天诚盛业科技有限公司 | Voice recognition method and voice recognition device |
| CN106328122A (en) * | 2016-08-19 | 2017-01-11 | 深圳市唯特视科技有限公司 | Voice identification method using long-short term memory model recurrent neural network |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107068167A (en) * | 2017-03-13 | 2017-08-18 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures |
- 2017-03-13: CN application CN201710146957.0A filed; published as CN107068167A (status: active, pending)
- 2018-02-11: PCT application PCT/CN2018/076272 filed; published as WO2018166316A1 (status: not active, ceased)
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5214743A (en) * | 1989-10-25 | 1993-05-25 | Hitachi, Ltd. | Information processing apparatus |
| CN105139864A (en) * | 2015-08-17 | 2015-12-09 | 北京天诚盛业科技有限公司 | Voice recognition method and voice recognition device |
| CN106328122A (en) * | 2016-08-19 | 2017-01-11 | 深圳市唯特视科技有限公司 | Voice identification method using long-short term memory model recurrent neural network |
Non-Patent Citations (2)
| Title |
|---|
| Tara N. Sainath et al.: "Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks", 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) * |
| Du Mengxu: "Research on Feature Extraction and Recognition of the Voices of Cold Patients", China Master's Theses Full-text Database, Information Science and Technology * |
Cited By (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018166316A1 (en) * | 2017-03-13 | 2018-09-20 | 佛山市顺德区中山大学研究院 | Speaker's flu symptoms recognition method fused with multiple end-to-end neural network structures |
| CN108053841A (en) * | 2017-10-23 | 2018-05-18 | 平安科技(深圳)有限公司 | The method and application server of disease forecasting are carried out using voice |
| CN109960910A (en) * | 2017-12-14 | 2019-07-02 | 广东欧珀移动通信有限公司 | Voice processing method, device, storage medium and terminal device |
| CN109960910B (en) * | 2017-12-14 | 2021-06-08 | Oppo广东移动通信有限公司 | Voice processing method, device, storage medium and terminal device |
| CN109086892B (en) * | 2018-06-15 | 2022-02-18 | 中山大学 | General dependency tree-based visual problem reasoning model and system |
| CN109086892A (en) * | 2018-06-15 | 2018-12-25 | 中山大学 | It is a kind of based on the visual problem inference pattern and system that typically rely on tree |
| CN108899051B (en) * | 2018-06-26 | 2020-06-16 | 北京大学深圳研究生院 | A speech emotion recognition model and recognition method based on joint feature representation |
| CN109192226A (en) * | 2018-06-26 | 2019-01-11 | 深圳大学 | A kind of signal processing method and device |
| CN108899051A (en) * | 2018-06-26 | 2018-11-27 | 北京大学深圳研究生院 | A kind of speech emotion recognition model and recognition methods based on union feature expression |
| CN109256118A (en) * | 2018-10-22 | 2019-01-22 | 江苏师范大学 | End-to-end Chinese dialects identifying system and method based on production auditory model |
| CN109256118B (en) * | 2018-10-22 | 2021-06-25 | 江苏师范大学 | End-to-end Chinese dialect recognition system and method based on generative auditory model |
| CN109282837A (en) * | 2018-10-24 | 2019-01-29 | 福州大学 | Demodulation method of fiber Bragg grating interleaved spectrum based on LSTM network |
| CN111028859A (en) * | 2019-12-15 | 2020-04-17 | 中北大学 | A hybrid neural network vehicle recognition method based on audio feature fusion |
| CN116110437A (en) * | 2023-04-14 | 2023-05-12 | 天津大学 | Pathological voice quality evaluation method based on fusion of voice characteristics and speaker characteristics |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2018166316A1 (en) | 2018-09-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN107068167A (en) | Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures | |
| CN109243467B (en) | Sound-groove model construction method, method for recognizing sound-groove and system | |
| CN109036465B (en) | Speech emotion recognition method | |
| CN110459225B (en) | Speaker recognition system based on CNN fusion characteristics | |
| CN112509564A (en) | End-to-end voice recognition method based on connection time sequence classification and self-attention mechanism | |
| CN108766419A (en) | A kind of abnormal speech detection method based on deep learning | |
| Bhattacharjee | A comparative study of LPCC and MFCC features for the recognition of Assamese phonemes | |
| CN111048097B (en) | Twin network voiceprint recognition method based on 3D convolution | |
| CN105096955B (en) | A kind of speaker's method for quickly identifying and system based on model growth cluster | |
| CN107039036B (en) | High-quality speaker recognition method based on automatic coding depth confidence network | |
| CN103117060A (en) | Modeling approach and modeling system of acoustic model used in speech recognition | |
| CN107731233A (en) | A kind of method for recognizing sound-groove based on RNN | |
| CN108986798B (en) | Processing method, device and the equipment of voice data | |
| CN115101076B (en) | Speaker clustering method based on multi-scale channel separation convolution feature extraction | |
| CN111785262B (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
| CN113571095B (en) | Speech emotion recognition method and system based on nested deep neural network | |
| CN109346084A (en) | Speaker recognition method based on deep stack autoencoder network | |
| Zheng et al. | MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios | |
| Wang et al. | A network model of speaker identification with new feature extraction methods and asymmetric BLSTM | |
| CN116469395A (en) | A speaker recognition method based on Fca-Res2Net fusion self-attention | |
| CN113763965A (en) | Speaker identification method with multiple attention characteristics fused | |
| Fasounaki et al. | CNN-based Text-independent automatic speaker identification using short utterances | |
| CN108877812B (en) | A voiceprint recognition method, device and storage medium | |
| Sukhwal et al. | Comparative study of different classifiers based speaker recognition system using modified MFCC for noisy environment | |
| CN109346104A (en) | A Dimensionality Reduction Method for Audio Features Based on Spectral Clustering |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |
| RJ01 | Rejection of invention patent application after publication | | |
Application publication date: 20170818 |