
CN104008751A - Speaker recognition method based on BP neural network - Google Patents


Info

Publication number
CN104008751A
CN104008751A
Authority
CN
China
Prior art keywords
neural network
speech
voice
training
particle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410270239.0A
Other languages
Chinese (zh)
Inventor
周婷婷
李燕萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201410270239.0A priority Critical patent/CN104008751A/en
Publication of CN104008751A publication Critical patent/CN104008751A/en
Pending legal-status Critical Current


Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention provides a speaker recognition method based on a BP neural network, comprising a speech training phase and a speech recognition phase. In the training phase, the speaker's speech is first processed to obtain a preprocessed speech signal; features are extracted from this signal with the MFCC speech parameter extraction method; a PSO-BP neural network is then trained on the features, and the trained models are used to build and optimize a PSO-BP neural network model base. In the recognition phase, features are extracted from the speech to be identified by the same method and fed into the BP neural network; the output is computed with the PSO-BP algorithm and compared one by one with the expected identities in the database, and the identity with the smallest recognition error is taken as the final recognition result.

Description

A speaker recognition method based on a BP neural network
Technical field
The present invention relates to speaker recognition technology, and in particular to a speaker recognition method based on a BP neural network.
Background technology
Speaker recognition (SR), also called voice-based identity recognition, is the technology of automatically confirming a speaker's identity by analyzing and processing the speaker's speech signal. It combines knowledge from physiology, phonetics, digital signal processing, pattern recognition, artificial intelligence, and other disciplines; with its unique convenience, economy, and accuracy it plays an important role in related fields and has a broad market background. The basic principle of speaker recognition is to use each speaker's speech to build a model describing that speaker's characteristics, which serves as a template of the speaker's speech feature parameters; test speech signals are then compared against these templates to determine the speaker's identity.
A speaker's personal characteristics are reflected to some extent in the speaker's vocal tract, and vocal-tract features can identify a speaker well. The main vocal-tract features are: (1) Mel-frequency cepstral coefficients (MFCC), cepstral parameters extracted on the Mel frequency scale, based on the critical-band effect of the auditory system. MFCC makes relatively full use of the perceptual characteristics of the human ear, is fairly robust, and is widely used. (2) Linear prediction cepstral coefficients (LPCC): Wiener first proposed the term "linear prediction" in 1947, and Itakura et al. first applied linear prediction to speech analysis and synthesis in 1967. LPCC was the earliest cepstral parameter applied to speech recognition. Its main advantages are that it thoroughly removes the excitation information of the speech production process and mainly reflects the vocal-tract response; it is cheap to compute, describes vowels well, and often needs only a few dozen cepstral coefficients to describe the formant characteristics of speech, so it has found good application in speaker recognition.
In speech technology research and applications there are three kinds of recognition algorithms for speech signals: methods based on vocal-tract models and speech knowledge, template matching, and artificial neural networks. Although research based on vocal-tract models and speech knowledge started early, its complexity has prevented good practical results so far. Template-matching methods include dynamic time warping (DTW), hidden Markov model (HMM) theory, and vector quantization (VQ); these algorithms resist interference poorly in noisy environments and cannot achieve good recognition. Artificial neural networks offer adaptivity, robustness, fault tolerance, and learning ability; their powerful classification and input-output mapping capabilities are very attractive in speech recognition.
A back-propagation (BP) network is a multi-layer feed-forward network trained with the error back-propagation algorithm; it offers massively parallel processing, distributed information storage, good self-organizing and self-learning ability, and a simple, easily implemented principle. But it also has inherent defects: it easily falls into local minima, converges slowly, and generalizes weakly. A genetic algorithm, by contrast, is a global optimization method that can quickly search the whole solution space without being trapped in local optima; because it computes in a distributed fashion it can speed up practical solving, and the optimized network achieves stronger prediction accuracy than a traditional BP neural network and a smaller mean squared prediction error.
Summary of the invention
The object of the present invention is to provide a speaker recognition method based on a BP neural network that overcomes the defects of the prior art described above.
The object of the present invention is achieved by the following technical solution: a speaker recognition method based on a BP neural network, divided into two steps, a speech training phase and a speech recognition phase. (1) In the speech training phase, the speaker's speech is first processed to obtain the speaker's speech signal and then a preprocessed speech signal. Features are extracted from the preprocessed signal with the MFCC speech parameter extraction method to obtain the speaker's feature parameters; a PSO-BP neural network is then used for model training, and the trained models build and optimize the PSO-BP neural network model base. (2) In the speech recognition phase, speech features are extracted from the speech to be identified by the same method as in the training phase. The feature parameters are fed into the BP neural network, and the network weights stored for each speaker in the model base are loaded in turn; the output is computed with the PSO-BP algorithm and compared one by one with the expected identities in the database, and the identity with the smallest recognition error is taken as the final recognition result.
The beneficial effects of the invention are as follows: the present invention combines MFCC with a BP neural network, so the disclosed method identifies speakers more effectively. Taking a standard back-propagation (BP) neural network as the reference, the BP network is optimized with a particle swarm algorithm to reduce misjudgments caused by abnormal sounds; compared with a traditional BP neural network it achieves stronger prediction accuracy and a smaller mean squared prediction error, and it has broad application prospects.
Brief description of the drawings
Fig. 1 is a schematic diagram of the speech recognition process of the present invention.
Fig. 2 is a schematic diagram of MFCC speech parameter extraction in the present invention.
Fig. 3 is a schematic diagram of the PSO-BP algorithm flow in the present invention.
Fig. 4 is a schematic diagram of the PSO-BP neural network of the present invention.
Embodiment
The present invention is described in detail below with reference to the drawings and specific embodiments.
According to the speaker recognition method based on a BP neural network shown in Figs. 1 to 4, the method is divided into two steps, a speech training phase and a speech recognition phase. In the speech training phase, the speaker's speech is first processed to obtain the speaker's speech signal and then a preprocessed speech signal. Speech signal preprocessing comprises four parts: pre-emphasis, end-point detection, framing, and windowing.
1. pre-emphasis
Because the high-frequency end of the speech signal falls off rapidly, the spectral content of the speech signal at higher frequencies is smaller, so pre-emphasis is applied. Its purpose is to boost the useful high-frequency part of the spectrum and flatten it, keeping the spectrum comparable from low to high frequencies across the whole band, so that the spectrum can be computed with the same signal-to-noise ratio for spectral analysis or vocal-tract parameter analysis. The transfer function of the pre-emphasis filter is H(z) = 1 − μz⁻¹, where μ is the pre-emphasis factor, taken as 1 or a value slightly smaller than 1; typically μ = 0.95.
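As an illustration (not part of the patent text), the pre-emphasis filter y(n) = x(n) − μx(n−1) can be sketched in Python with NumPy; the function name and defaults are assumptions:

```python
import numpy as np

def pre_emphasis(x, mu=0.95):
    """Apply H(z) = 1 - mu*z^-1, i.e. y[n] = x[n] - mu*x[n-1]."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= mu * x[:-1]   # first sample passes through unchanged
    return y
```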
2. end-point detection
The purpose of end-point detection is to determine the start and end of speech within a segment of signal that contains speech. It not only reduces the processing time to a minimum but also excludes the noise of silent segments, giving the recognition system good recognition performance.
End-point detection is mostly based on time-domain features of the speech signal. Two time-domain features are used here, short-time energy and short-time zero-crossing rate, detected against set thresholds for each. The short-time energy is defined as E_n = Σ_{m=0}^{N−1} [x(m)w(n−m)]²; letting h(n) = w²(n), this becomes E_n = Σ_{m=0}^{N−1} x(m)²·h(n−m). The short-time average magnitude of the speech signal is M_n = Σ_{m=0}^{N−1} |x(m)|w(n−m).
Both E_n and M_n reflect signal strength. The short-time average zero-crossing rate of the speech signal x(n) is defined as
Z_n = Σ_{m=−∞}^{∞} |sgn[x(m)] − sgn[x(m−1)]|·w(n−m), where sgn[x(m)] = 1 for x(m) ≥ 0 and −1 for x(m) < 0.
Here w(n) is a window function that plays the same role as in the short-time energy; it is usually taken as
w(n) = 1/(2N) for 0 ≤ n ≤ N−1, and 0 otherwise.
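The two time-domain features and threshold test above can be sketched per frame as follows; this is an illustrative NumPy version, and `is_speech` is a simplified stand-in for the patent's threshold-based detection:

```python
import numpy as np

def short_time_energy(frame):
    """E_n: sum of squared samples over the frame (rectangular window)."""
    frame = np.asarray(frame, dtype=float)
    return float(np.sum(frame ** 2))

def short_time_zcr(frame):
    """Z_n with w(n) = 1/(2N): average number of sign changes per sample."""
    frame = np.asarray(frame, dtype=float)
    s = np.where(frame >= 0, 1.0, -1.0)   # sgn[x] = 1 if x >= 0, else -1
    return float(np.sum(np.abs(np.diff(s))) / (2 * len(frame)))

def is_speech(frame, energy_thresh, zcr_thresh):
    """Keep a frame as speech when either feature exceeds its threshold."""
    return short_time_energy(frame) > energy_thresh or short_time_zcr(frame) > zcr_thresh
```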
3. Framing
Speech of a certain length is divided into many frames for analysis, so that the analysis methods for stationary processes can be applied. The present invention therefore divides the speech signal into short segments one by one, each called a frame, with a frame length of roughly 10-30 ms. To make the transition between frames smooth and keep continuity, overlapping segmentation is used: the tail of each frame overlaps the head of the next frame.
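The overlapping segmentation can be sketched as follows (illustrative; `frame_len` and `hop` are sample counts, with the hop smaller than the frame length so consecutive frames overlap):

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames; each frame starts hop samples after the last."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop   # only complete frames are kept
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
```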
4. windowing
To reduce the truncation effect on a speech frame, lower the gradient at its two ends, and let the ends transition smoothly to zero rather than change sharply, the speech frame is multiplied by a window function. If the frame signal is x(n), the window function is w(n), and the number of samples per frame is N, the windowed signal y(n) is
y(n) = x(n)w(n), 0 ≤ n ≤ N−1.
The window function adopted by the present invention is the Hamming window, whose expression is
w(n) = 0.54 − 0.46 cos[2πn/(N−1)] for 0 ≤ n ≤ N−1, and 0 otherwise.
Multiplying the waveform by a Hamming window compresses the parts of the waveform near its two ends, which is equivalent to shortening the analysis interval by about 40%; the frequency resolution drops by about 40% accordingly. Even so, in the spectral analysis of clearly periodic voiced sounds, multiplying by a suitable window function suppresses the effect of the varying phase relationship between the pitch period and the analysis segment, so a stable spectrum can be obtained.
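The Hamming window above and the product y(n) = x(n)w(n) can be sketched as follows (illustrative helper names):

```python
import numpy as np

def hamming_window(N):
    """w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) for 0 <= n <= N-1."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def window_frame(frame):
    """Multiply a speech frame by the Hamming window of the same length."""
    frame = np.asarray(frame, dtype=float)
    return frame * hamming_window(len(frame))
```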
5. Speech denoising
The speech signal should be purified as much as possible before transmission; this is crucial for improving voice communication quality. The present invention uses the wavelet transform to denoise the signal, with a good purification effect.
Suppose the noisy speech signal is f(t) = s(t) + n(t), where s(t) is the clean speech signal and n(t) is white Gaussian noise with variance σ².
Applying the discrete wavelet transform to this formula gives
w_{j,k}(f) = ∫ f(t) ψ*_{j,k}(t) dt, j = 0, 1, 2, …, N; k = 0, 1, …, N,
where ψ_{j,k}(t) = 2^{j/2} ψ(2^j t − k).
w_{j,k}(f) is the wavelet coefficient, written cd_{j,k}. The noise-polluted speech signal is first given a discrete wavelet transform to obtain the noisy wavelet coefficients; the coefficients are then thresholded with a set threshold λ: coefficients below λ are treated as caused by noise, and only the significant coefficients exceeding λ are used to reconstruct the speech signal.
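The patent does not name a particular wavelet, so as a minimal illustration the thresholding scheme is shown with a single-level Haar transform; a practical implementation would use a wavelet library and several decomposition levels:

```python
import numpy as np

def haar_denoise(x, lam):
    """One-level Haar DWT, hard-threshold the detail coefficients at lam, invert."""
    x = np.asarray(x, dtype=float)          # length must be even
    a = (x[0::2] + x[1::2]) / np.sqrt(2)    # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)    # detail coefficients cd_{j,k}
    d = np.where(np.abs(d) > lam, d, 0.0)   # coefficients below lam are treated as noise
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)          # inverse Haar transform
    y[1::2] = (a - d) / np.sqrt(2)
    return y
```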
Features are extracted from the preprocessed speech signal with the MFCC speech parameter extraction method to obtain the speaker's feature parameters. The MFCC speech parameter extraction method is as follows:
1. The amplitude of the preprocessed speech signal X(n, ω_k) is weighted by the frequency responses of a Mel-scale filter bank. The center frequencies of the Mel-scale filter bank are evenly spaced on the Mel frequency scale; the two base points of each triangular filter lie at the centers of the adjacent filters, so the center frequencies and bandwidths of these filters roughly match the auditory critical-band filter bank. In this system the number of Mel-scale filters is 28.
2. This step computes the energy after Mel-scale filter weighting. Let V_l(ω) denote the frequency response of the l-th filter. The output energy of the l-th Mel-scale filter for the speech frame at time n, E_mel(n, l), is computed over the non-zero region between the lowest and highest frequencies L_l and U_l of each filter.
A normalization term in the formula scales each filter according to its bandwidth, so that an input with a flat spectrum produces equal output energy from every filter.
3. According to E_mel(n, l), the logarithm of the filter-bank outputs is taken and a discrete cosine transform (DCT) is applied, giving the Mel cepstral coefficients of the speech frame at time n, computed as
C_mel[n, m] = (1/R) Σ_{l=0}^{R−1} log{E_mel(n, l)} cos(2πlm/R).
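Given a precomputed Mel filter bank (R filters by spectrum bins), the log-energy and DCT step can be sketched as follows; building the triangular filter bank itself is omitted, and the names are illustrative:

```python
import numpy as np

def mel_cepstrum(power_spectrum, fbank, n_ceps):
    """C_mel[m] = (1/R) * sum_l log(E_mel(l)) * cos(2*pi*l*m/R)."""
    e = fbank @ np.asarray(power_spectrum, dtype=float)  # E_mel(n, l) for one frame
    R = len(e)
    l = np.arange(R)
    return np.array([np.sum(np.log(e) * np.cos(2 * np.pi * l * m / R)) / R
                     for m in range(n_ceps)])
```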
A PSO-BP neural network is then used for model training; the trained models build and optimize the PSO-BP neural network model base. The method for building and optimizing the model base is as follows:
Step 1: Initialization
Initialize the BP network structure, including the numbers of neurons in the input, hidden, and output layers, the learning rate, and the inputs and outputs of the training samples.
Initialize the particle swarm, including the swarm size N, the position and velocity vectors of each particle, each particle's individual extremum and the global optimum, the iteration error precision, the constant coefficients c1 and c2, the maximum inertia weight w_max, the minimum inertia weight w_min, the maximum velocity Vmax, and the maximum number of iterations.
Step 2: Iterative update
1. Update the velocity of each particle, and check whether the updated velocity exceeds the maximum velocity Vmax; if it does, set it to Vmax; otherwise leave it unchanged.
2. Update the position of each particle.
3. Compute the fitness value of each particle.
4. Compute the global minimum fitness of the swarm, fg = min{f1, f2, …, fN}. If the current iteration count has reached the maximum number of iterations, or fg is below the required network training error, stop iterating and go to Step 3; otherwise, update each particle's individual extremum Pi and the global extremum position Pg, and return to sub-step 1 of the iterative update to continue updating particle velocities and positions.
Step 3: Output the network weights and thresholds determined by the position of the global extremum Pg; the algorithm ends.
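Steps 1 to 3 can be sketched as a generic particle swarm over a weight vector. Here `fitness` stands in for the BP network's training error on the sample set, and every parameter default is an illustrative choice, not a value from the patent:

```python
import numpy as np

def pso(fitness, dim, n_particles=20, iters=200, c1=2.0, c2=2.0,
        w_max=0.9, w_min=0.4, v_max=1.0, tol=1e-6, seed=0):
    """Return the best position (network weight vector) and its fitness fg."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))       # particle positions
    v = np.zeros((n_particles, dim))                 # particle velocities
    pbest = x.copy()                                 # individual extrema Pi
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()             # global extremum Pg
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters      # decreasing inertia weight
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        v = np.clip(v, -v_max, v_max)                # cap velocity at Vmax
        x = x + v
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
        if pbest_f.min() < tol:                      # fg meets the accuracy requirement
            break
    return g, float(pbest_f.min())
```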
4. Speech recognition phase
In the speech recognition phase, speech features are extracted from the speech to be identified by the same method as in the training phase. The feature parameters are fed into the BP neural network, and the network weights stored for each speaker in the model base are loaded in turn; the output is computed with the PSO-BP algorithm and compared one by one with the expected identities in the database, and the identity with the smallest recognition error is taken as the final recognition result.
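The comparison against the model base can be sketched as follows; `forward` stands in for the trained BP network's forward pass with a given speaker's stored weights, and the target output and names are illustrative:

```python
import numpy as np

def identify(features, model_bank, forward, target=1.0):
    """Score the features with each speaker's stored weights; the identity with
    the smallest recognition error is the final result."""
    errors = {spk: float(np.mean((forward(w, features) - target) ** 2))
              for spk, w in model_bank.items()}
    return min(errors, key=errors.get)
```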
The foregoing is only a representative embodiment of the present invention and does not limit the present invention in any way; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (4)

1. A speaker recognition method based on a BP neural network, divided into two steps, a speech training phase and a speech recognition phase, characterized in that the speech training phase comprises: first processing the speaker's speech to obtain the speaker's speech signal and a preprocessed speech signal, the speech signal preprocessing comprising pre-emphasis, end-point detection, framing, and windowing.
2. The speaker recognition method based on a BP neural network according to claim 1, characterized in that the MFCC speech parameter extraction method extracts features from the preprocessed speech signal to obtain the speaker's feature parameters; the MFCC speech parameter extraction method is as follows:
(1) the amplitude of the preprocessed speech signal X(n, ω_k) is weighted by the frequency responses of a Mel-scale filter bank; the center frequencies of the Mel-scale filter bank are evenly spaced on the Mel frequency scale, the two base points of each triangular filter lie at the centers of the adjacent filters, and the center frequencies and bandwidths of these filters roughly match the auditory critical-band filter bank; in the system the number of Mel-scale filters is 28;
(2) this step computes the energy after Mel-scale filter weighting, with V_l(ω) denoting the frequency response of the l-th filter; the output energy of the l-th Mel-scale filter for the speech frame at time n, E_mel(n, l), is computed over the non-zero region between the lowest and highest frequencies L_l and U_l of each filter;
a normalization term in the formula scales each filter according to its bandwidth, so that an input with a flat spectrum produces equal output energy from every filter;
(3) according to E_mel(n, l), the logarithm of the filter-bank outputs is taken and a discrete cosine transform (DCT) is applied, giving the Mel cepstral coefficients of the speech frame at time n, computed as
C_mel[n, m] = (1/R) Σ_{l=0}^{R−1} log{E_mel(n, l)} cos(2πlm/R).
3. The speaker recognition method based on a BP neural network according to claim 2, characterized in that the PSO-BP neural network performs model training and the trained models build and optimize the PSO-BP neural network model base, as follows:
Step 1: Initialization
Initialize the BP network structure, including the numbers of neurons in the input, hidden, and output layers, the learning rate, and the inputs and outputs of the training samples;
Initialize the particle swarm, including the swarm size N, the position and velocity vectors of each particle, each particle's individual extremum and the global optimum, the iteration error precision, the constant coefficients c1 and c2, the maximum inertia weight w_max, the minimum inertia weight w_min, the maximum velocity Vmax, and the maximum number of iterations;
Step 2: Iterative update
(1) update the velocity of each particle, and check whether the updated velocity exceeds the maximum velocity Vmax; if it does, set it to Vmax; otherwise leave it unchanged;
(2) update the position of each particle;
(3) compute the fitness value of each particle;
(4) compute the global minimum fitness of the swarm, fg = min{f1, f2, …, fN}; if the current iteration count has reached the maximum number of iterations, or fg is below the required network training error, stop iterating and go to Step 3; otherwise, update each particle's individual extremum Pi and the global extremum position Pg, and return to sub-step (1) to continue updating particle velocities and positions;
Step 3: Output the network weights and thresholds determined by the position of the global extremum Pg; the algorithm ends.
4. The speaker recognition method based on a BP neural network according to claim 1, characterized in that in the speech recognition phase, speech features are extracted from the speech to be identified by the same method as in the training phase; the feature parameters are fed into the BP neural network, and the network weights stored for each speaker in the model base are loaded in turn; the output is computed with the PSO-BP algorithm and compared one by one with the expected identities in the database, and the identity with the smallest recognition error is taken as the final recognition result.
CN201410270239.0A 2014-06-18 2014-06-18 Speaker recognition method based on BP neural network Pending CN104008751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410270239.0A CN104008751A (en) 2014-06-18 2014-06-18 Speaker recognition method based on BP neural network


Publications (1)

Publication Number Publication Date
CN104008751A true CN104008751A (en) 2014-08-27

Family

ID=51369378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410270239.0A Pending CN104008751A (en) 2014-06-18 2014-06-18 Speaker recognition method based on BP neural network

Country Status (1)

Country Link
CN (1) CN104008751A (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104569035A (en) * 2015-02-04 2015-04-29 神华集团有限责任公司 Method for acquiring critical property parameters of coal liquefaction oil
CN104732978A (en) * 2015-03-12 2015-06-24 上海交通大学 A Text-Dependent Speaker Recognition Method Based on Joint Deep Learning
CN105323700A (en) * 2015-12-02 2016-02-10 逢甲大学 Manufacturing Method of Customized In-Ear Headphones
CN106157953A (en) * 2015-04-16 2016-11-23 科大讯飞股份有限公司 continuous speech recognition method and system
CN106328126A (en) * 2016-10-20 2017-01-11 北京云知声信息技术有限公司 Far-field speech recognition processing method and device
CN106448680A (en) * 2016-03-01 2017-02-22 常熟苏大低碳应用技术研究院有限公司 Missing data feature (MDF) speaker identification method using perception auditory scene analysis (PASA)
CN106601240A (en) * 2015-10-16 2017-04-26 三星电子株式会社 Apparatus and method for normalizing input data of acoustic model and speech recognition apparatus
CN106611598A (en) * 2016-12-28 2017-05-03 上海智臻智能网络科技股份有限公司 VAD dynamic parameter adjusting method and device
CN106952649A (en) * 2017-05-14 2017-07-14 北京工业大学 Speaker Recognition Method Based on Convolutional Neural Network and Spectrogram
CN107240397A (en) * 2017-08-14 2017-10-10 广东工业大学 A kind of smart lock and its audio recognition method and system based on Application on Voiceprint Recognition
CN107527620A (en) * 2017-07-25 2017-12-29 平安科技(深圳)有限公司 Electronic installation, the method for authentication and computer-readable recording medium
CN107808659A (en) * 2017-12-02 2018-03-16 宫文峰 Intelligent sound signal type recognition system device
CN108140386A (en) * 2016-07-15 2018-06-08 谷歌有限责任公司 Speaker verification
WO2018107810A1 (en) * 2016-12-15 2018-06-21 平安科技(深圳)有限公司 Voiceprint recognition method and apparatus, and electronic device and medium
CN108417217A (en) * 2018-01-11 2018-08-17 苏州思必驰信息科技有限公司 Speaker recognition network model training method, speaker recognition method and system
CN108590244A (en) * 2018-07-12 2018-09-28 吉林工程技术师范学院 A kind of books post house for reading journalism object
CN108847244A (en) * 2018-08-22 2018-11-20 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Voiceprint recognition method and system based on MFCC and improved BP neural network
CN108847245A (en) * 2018-08-06 2018-11-20 北京海天瑞声科技股份有限公司 Speech detection method and device
CN108899032A (en) * 2018-06-06 2018-11-27 平安科技(深圳)有限公司 Method for recognizing sound-groove, device, computer equipment and storage medium
CN108899037A (en) * 2018-07-05 2018-11-27 平安科技(深圳)有限公司 Animal vocal print feature extracting method, device and electronic equipment
CN109036385A (en) * 2018-10-19 2018-12-18 北京旋极信息技术股份有限公司 A kind of voice instruction recognition method, device and computer storage medium
CN109119085A (en) * 2018-08-24 2019-01-01 深圳竹云科技有限公司 A kind of relevant audio recognition method of asymmetric text based on wavelet analysis and super vector
CN109394472A (en) * 2018-09-19 2019-03-01 宁波杰曼智能科技有限公司 A kind of healing robot motion intention recognition methods based on neural network classifier
CN110232372A (en) * 2019-06-26 2019-09-13 电子科技大学成都学院 Gait recognition method based on particle group optimizing BP neural network
CN110914899A (en) * 2017-07-19 2020-03-24 日本电信电话株式会社 Mask calculation device, cluster weight learning device, mask calculation neural network learning device, mask calculation method, cluster weight learning method, and mask calculation neural network learning method
CN111259750A (en) * 2020-01-10 2020-06-09 西北工业大学 Underwater sound target identification method for optimizing BP neural network based on genetic algorithm
CN111341327A (en) * 2020-02-28 2020-06-26 广州国音智能科技有限公司 Speaker voice recognition method, device and equipment based on particle swarm optimization
CN111524520A (en) * 2020-04-22 2020-08-11 星际(重庆)智能装备技术研究院有限公司 Voiceprint recognition method based on error reverse propagation neural network
CN112053680A (en) * 2020-09-11 2020-12-08 中航华东光电(上海)有限公司 Voice air conditioner control device suitable for blind person
US10984795B2 (en) 2018-04-12 2021-04-20 Samsung Electronics Co., Ltd. Electronic apparatus and operation method thereof
CN113053398A (en) * 2021-03-11 2021-06-29 东风汽车集团股份有限公司 Speaker recognition system and method based on MFCC (Mel frequency cepstrum coefficient) and BP (Back propagation) neural network
CN113611291A (en) * 2020-08-12 2021-11-05 广东电网有限责任公司 Speech recognition algorithm for electric power major

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104569035A (en) * 2015-02-04 2015-04-29 神华集团有限责任公司 Method for acquiring critical property parameters of coal liquefaction oil
CN104732978B (en) * 2015-03-12 2018-05-08 上海交通大学 Text-related speaker recognition method based on joint deep learning
CN104732978A (en) * 2015-03-12 2015-06-24 上海交通大学 A Text-Dependent Speaker Recognition Method Based on Joint Deep Learning
CN106157953B (en) * 2015-04-16 2020-02-07 科大讯飞股份有限公司 Continuous speech recognition method and system
CN106157953A (en) * 2015-04-16 2016-11-23 科大讯飞股份有限公司 continuous speech recognition method and system
CN106601240A (en) * 2015-10-16 2017-04-26 三星电子株式会社 Apparatus and method for normalizing input data of acoustic model and speech recognition apparatus
CN106601240B (en) * 2015-10-16 2021-10-01 三星电子株式会社 Apparatus and method for normalizing input data of an acoustic model and speech recognition apparatus
CN105323700A (en) * 2015-12-02 2016-02-10 逢甲大学 Manufacturing Method of Customized In-Ear Headphones
CN106448680A (en) * 2016-03-01 2017-02-22 常熟苏大低碳应用技术研究院有限公司 Missing data feature (MDF) speaker identification method using perception auditory scene analysis (PASA)
CN108140386B (en) * 2016-07-15 2021-11-23 谷歌有限责任公司 Speaker verification
CN108140386A (en) * 2016-07-15 2018-06-08 谷歌有限责任公司 Speaker verification
CN106328126A (en) * 2016-10-20 2017-01-11 北京云知声信息技术有限公司 Far-field speech recognition processing method and device
WO2018107810A1 (en) * 2016-12-15 2018-06-21 平安科技(深圳)有限公司 Voiceprint recognition method and apparatus, and electronic device and medium
CN106611598B (en) * 2016-12-28 2019-08-02 上海智臻智能网络科技股份有限公司 A kind of VAD dynamic parameter adjustment method and device
CN106611598A (en) * 2016-12-28 2017-05-03 上海智臻智能网络科技股份有限公司 VAD dynamic parameter adjusting method and device
CN106952649A (en) * 2017-05-14 2017-07-14 北京工业大学 Speaker Recognition Method Based on Convolutional Neural Network and Spectrogram
CN110914899B (en) * 2017-07-19 2023-10-24 日本电信电话株式会社 Mask calculation device, cluster weight learning device, mask calculation neural network learning device, mask calculation method, cluster weight learning method, and mask calculation neural network learning method
CN110914899A (en) * 2017-07-19 2020-03-24 日本电信电话株式会社 Mask calculation device, cluster weight learning device, mask calculation neural network learning device, mask calculation method, cluster weight learning method, and mask calculation neural network learning method
CN107527620A (en) * 2017-07-25 2017-12-29 平安科技(深圳)有限公司 Electronic device, authentication method and computer-readable storage medium
CN107527620B (en) * 2017-07-25 2019-03-26 平安科技(深圳)有限公司 Electronic device, authentication method and computer-readable storage medium
CN107240397A (en) * 2017-08-14 2017-10-10 广东工业大学 Smart lock based on voiceprint recognition and speech recognition method and system thereof
CN107808659A (en) * 2017-12-02 2018-03-16 宫文峰 Intelligent sound signal type recognition system device
CN108417217A (en) * 2018-01-11 2018-08-17 苏州思必驰信息科技有限公司 Speaker recognition network model training method, speaker recognition method and system
US10984795B2 (en) 2018-04-12 2021-04-20 Samsung Electronics Co., Ltd. Electronic apparatus and operation method thereof
CN108899032A (en) * 2018-06-06 2018-11-27 平安科技(深圳)有限公司 Voiceprint recognition method, device, computer equipment and storage medium
CN108899037A (en) * 2018-07-05 2018-11-27 平安科技(深圳)有限公司 Animal vocal print feature extracting method, device and electronic equipment
CN108899037B (en) * 2018-07-05 2024-01-26 平安科技(深圳)有限公司 Animal voiceprint feature extraction method and device and electronic equipment
CN108590244A (en) * 2018-07-12 2018-09-28 吉林工程技术师范学院 Book post for reading news publications
CN108590244B (en) * 2018-07-12 2024-02-09 吉林工程技术师范学院 Book post for reading news publications
CN108847245A (en) * 2018-08-06 2018-11-20 北京海天瑞声科技股份有限公司 Speech detection method and device
CN108847244A (en) * 2018-08-22 2018-11-20 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Voiceprint recognition method and system based on MFCC and improved BP neural network
CN109119085A (en) * 2018-08-24 2019-01-01 深圳竹云科技有限公司 Asymmetric text-dependent speech recognition method based on wavelet analysis and supervectors
CN109394472A (en) * 2018-09-19 2019-03-01 宁波杰曼智能科技有限公司 Rehabilitation robot motion intention recognition method based on neural network classifier
CN109036385A (en) * 2018-10-19 2018-12-18 北京旋极信息技术股份有限公司 Voice instruction recognition method, device and computer storage medium
CN110232372A (en) * 2019-06-26 2019-09-13 电子科技大学成都学院 Gait recognition method based on particle group optimizing BP neural network
CN111259750A (en) * 2020-01-10 2020-06-09 西北工业大学 Underwater sound target identification method for optimizing BP neural network based on genetic algorithm
CN111341327A (en) * 2020-02-28 2020-06-26 广州国音智能科技有限公司 Speaker voice recognition method, device and equipment based on particle swarm optimization
CN111524520A (en) * 2020-04-22 2020-08-11 星际(重庆)智能装备技术研究院有限公司 Voiceprint recognition method based on error reverse propagation neural network
CN113611291A (en) * 2020-08-12 2021-11-05 广东电网有限责任公司 Speech recognition algorithm for the electric power domain
CN112053680A (en) * 2020-09-11 2020-12-08 中航华东光电(上海)有限公司 Voice-controlled air conditioner control device suitable for blind persons
CN113053398A (en) * 2021-03-11 2021-06-29 东风汽车集团股份有限公司 Speaker recognition system and method based on MFCC (Mel frequency cepstrum coefficient) and BP (Back propagation) neural network
CN113053398B (en) * 2021-03-11 2022-09-27 东风汽车集团股份有限公司 Speaker recognition system and method based on MFCC (Mel frequency cepstrum coefficient) and BP (Back propagation) neural network

Similar Documents

Publication Publication Date Title
CN104008751A (en) Speaker recognition method based on BP neural network
CN107146601B (en) Rear-end i-vector enhancement method for speaker recognition system
US20200074997A1 (en) Method and system for detecting voice activity in noisy conditions
Chang et al. Robust CNN-based speech recognition with Gabor filter kernels.
Cai et al. Sensor network for the monitoring of ecosystem: Bird species recognition
CN109192200B (en) Speech recognition method
WO2019023877A1 (en) Specific sound recognition method and device, and storage medium
CN103236260A (en) Voice recognition system
CN102800316A (en) Optimal codebook design method for voiceprint recognition system based on neural network
CN102568476B (en) Voice conversion method based on self-organizing feature map network cluster and radial basis network
CN110942766A (en) Audio event detection method, system, mobile terminal and storage medium
CN110853656A (en) Audio Tampering Recognition Algorithm Based on Improved Neural Network
CN110211594A (en) Speaker recognition method based on twin network model and KNN algorithm
Mistry et al. Overview: Speech recognition technology, mel-frequency cepstral coefficients (mfcc), artificial neural network (ann)
CN102237083A (en) Portable interpretation system based on WinCE platform and language recognition method thereof
Yusnita et al. Automatic gender recognition using linear prediction coefficients and artificial neural network on speech signal
CN107424625A (en) Multi-center voice activity detection method based on support vector machine framework
Nawas et al. Speaker recognition using random forest
CN118016106A (en) Emotional health analysis and support system for the elderly
CN113628639A (en) Voice emotion recognition method based on multi-head attention mechanism
CN206781702U (en) Speech recognition automotive anti-theft system based on quantum neural network
Song et al. Research on scattering transform of urban sound events detection based on self-attention mechanism
Zeng et al. Multi-feature fusion speech emotion recognition based on SVM
Wei et al. Improvements on self-adaptive voice activity detector for telephone data
Sankavi et al. Deep learning based automatic noisy speech classification for enhanced speech analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
DD01 Delivery of document by public notice

Addressee: Zhou Tingting

Document name: Notification of Passing Preliminary Examination of the Application for Invention

DD01 Delivery of document by public notice

Addressee: Zhou Tingting

Document name: Notification of Publication of the Application for Invention

DD01 Delivery of document by public notice

Addressee: Zhou Tingting

Document name: Notification before Expiration of Time Limit for Request of Examination as to Substance

DD01 Delivery of document by public notice

Addressee: Zhou Tingting

Document name: Notification that Application Deemed to be Withdrawn

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140827
