CN111326169A - A method and device for evaluating voice quality - Google Patents
- Publication number
- CN111326169A (application number CN201811544623.XA)
- Authority
- CN
- China
- Prior art keywords
- voice
- evaluated
- speech
- signal
- quality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/69—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for evaluating synthetic or decoded voice signals
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
- Monitoring And Testing Of Exchanges (AREA)
Abstract
The invention discloses a method and device for evaluating voice quality. A voice signal to be evaluated is acquired and compared with the stored voice signals. When the two differ substantially, the built-in voice quality evaluation model is updated to obtain a new voice quality evaluation model, which is then used to evaluate the voice signal to be evaluated. By continuously learning from voice signals, the voice quality evaluation model is continuously updated, which improves the accuracy of voice evaluation.
Description
Technical Field
The present invention relates to the field of communication technology, and in particular to a method and device for evaluating voice quality.
Background
Internet-based voice service has become one of the major network services and a focus for service providers. Voice quality is an important factor in evaluating the quality of a communication network, so an effective voice quality evaluation method is indispensable.
At present, voice quality is usually evaluated with a fixed voice quality evaluation model, as follows: characteristic parameters of the voice signal are extracted, a voice quality evaluation model is trained on the extracted parameters, and the model is used to evaluate the voice signal. A model trained in this way suits scenarios in which the voice environment changes little, and the model is fixed and cannot be changed. In scenarios where the voice environment changes noticeably, evaluating with such a model may yield relatively low accuracy.
Summary of the Invention
The purpose of the present invention is to provide a method and device for evaluating voice quality, so as to improve the accuracy of voice evaluation.
This purpose is achieved through the following technical solutions:
In a first aspect, the present invention provides a method for evaluating voice quality, including:
acquiring a voice signal to be evaluated, and determining identification information of the voice signal to be evaluated;
if it is determined that the identification information of the voice signal to be evaluated differs from the identification information of the stored voice signals, taking the voice signal to be evaluated as a new voice signal, and, when the number of new voice signals is greater than a first preset threshold, updating a first voice quality evaluation model to obtain a second voice quality evaluation model;
wherein the stored voice signals are voice signals acquired before the voice signal to be evaluated; and
evaluating the voice signal to be evaluated by using the second voice quality evaluation model.
Optionally, updating the first voice quality evaluation model to obtain the second voice quality evaluation model includes:
acquiring characteristic parameters of the new voice signals;
training on the characteristic parameters with a decision tree algorithm to update the first voice quality evaluation model and obtain the second voice quality evaluation model;
wherein the characteristic parameters include at least one of the following: signal-to-noise ratio, background noise, noise level, asymmetric disturbance value of the average voice signal spectrum, high-frequency flatness analysis, spectral level range, spectral level standard deviation, relative noise floor, skewness coefficient of the linear prediction coefficients, cepstral skewness coefficient, voiced sound, average cross-section of the back cavity, vocal tract amplitude variation, and speech level.
Optionally, after acquiring the voice signal to be evaluated, the method further includes:
performing at least one of the following preprocessing operations on the voice signal to be evaluated: voice data validity detection, voice data normalization, and missing-value interpolation fitting.
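The three preprocessing operations named above can be sketched as follows. The source does not specify how each operation is carried out, so everything here is an assumption made for illustration: missing samples are represented as `None`, validity means "non-empty with only finite known values", gaps are filled by linear interpolation, and normalization is peak normalization to [-1, 1]. The function names are hypothetical.

```python
import math


def is_valid(samples):
    """Validity check (assumed rule): non-empty, at least one known
    sample, and every known sample is a finite number."""
    known = [s for s in samples if s is not None]
    return bool(known) and all(math.isfinite(s) for s in known)


def fill_missing(samples):
    """Fill None entries by linear interpolation between the nearest
    known neighbours; gaps at the edges copy the nearest known value."""
    known = [i for i, s in enumerate(samples) if s is not None]
    out = list(samples)
    for i, s in enumerate(samples):
        if s is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = samples[right]
        elif right is None:
            out[i] = samples[left]
        else:
            t = (i - left) / (right - left)
            out[i] = samples[left] + t * (samples[right] - samples[left])
    return out


def normalize(samples):
    """Peak-normalize to [-1, 1]; an all-zero signal is returned as is."""
    peak = max(abs(s) for s in samples)
    return list(samples) if peak == 0 else [s / peak for s in samples]


def preprocess(samples):
    """Validity detection, then gap filling, then normalization."""
    if not is_valid(samples):
        raise ValueError("invalid voice data")
    return normalize(fill_missing(samples))


print(preprocess([0.0, None, 2.0, -4.0]))
```

Running the check first keeps the later steps simple: once the data is known to contain at least one finite sample, interpolation always has a neighbour to draw from.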
Optionally, after acquiring the voice signal to be evaluated, the method further includes:
evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated;
classifying the voice quality of the voice signal to be evaluated to obtain voice quality at different interval levels, wherein the different interval levels represent different categories of voice quality.
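As an illustration of classifying a quality score into interval levels, the sketch below maps a numeric score to a named level. The source gives neither the score scale nor the interval boundaries; the 1 to 5 scale, the boundaries, and the level names are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical interval boundaries and level names (not from the source).
LEVEL_BOUNDS = [2.0, 3.0, 4.0]
LEVEL_NAMES = ["poor", "fair", "good", "excellent"]


def quality_level(score):
    """Return the interval level whose range contains the score.
    A score equal to a boundary falls into the higher level."""
    return LEVEL_NAMES[bisect_right(LEVEL_BOUNDS, score)]


for s in (1.5, 2.0, 3.7, 4.5):
    print(s, quality_level(s))
```

Two signals can then be compared either by level (same or different interval) or by the raw score difference within a level.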
Optionally, the identification information of the voice signal to be evaluated differing from the identification information of the stored voice signals includes:
the voice quality of the voice signal to be evaluated differing from the voice quality of the stored voice signals, and/or the characteristic parameters of the voice signal to be evaluated differing from the characteristic parameters of the stored voice signals.
Optionally, the voice quality of the voice signal to be evaluated differing from the voice quality of the stored voice signals includes:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal being at the same interval level, and the difference between the two voice qualities being less than a second preset threshold; or
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal being at different interval levels.
In a second aspect, the present invention provides a device for evaluating voice quality, including:
an acquisition unit, configured to acquire a voice signal to be evaluated;
a determining unit, configured to determine identification information of the voice signal to be evaluated, and, when it is determined that the voice quality of the voice signal to be evaluated differs from the voice quality of the stored voice signals, to take the voice signal to be evaluated as a new voice signal;
an updating unit, configured to update a first voice quality evaluation model to obtain a second voice quality evaluation model when the number of new voice signals is greater than a first preset threshold;
wherein the stored voice signals are voice signals acquired before the voice signal to be evaluated; and
an evaluation unit, configured to evaluate the voice signal to be evaluated by using the second voice quality evaluation model.
Optionally, the updating unit is specifically configured to update the first voice quality evaluation model to obtain the second voice quality evaluation model as follows:
acquiring characteristic parameters of the new voice signals;
training on the characteristic parameters with a decision tree algorithm to update the first voice quality evaluation model and obtain the second voice quality evaluation model;
wherein the characteristic parameters include at least one of the following: signal-to-noise ratio, background noise, noise level, asymmetric disturbance value of the average voice signal spectrum, high-frequency flatness analysis, spectral level range, spectral level standard deviation, relative noise floor, skewness coefficient of the linear prediction coefficients, cepstral skewness coefficient, voiced sound, average cross-section of the back cavity, vocal tract amplitude variation, and speech level.
Optionally, the device further includes a processing unit, configured to:
perform at least one of the following preprocessing operations on the voice signal to be evaluated: voice data validity detection, voice data normalization, and missing-value interpolation fitting.
Optionally, the evaluation unit is further configured to:
evaluate the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated.
The processing unit is further configured to:
classify the voice quality of the voice signal to be evaluated to obtain voice quality at different interval levels, wherein the different interval levels represent different categories of voice quality.
Optionally, the identification information of the voice signal to be evaluated differing from the identification information of the stored voice signals includes:
the voice quality of the voice signal to be evaluated differing from the voice quality of the stored voice signals, and/or the characteristic parameters of the voice signal to be evaluated differing from the characteristic parameters of the stored voice signals.
Optionally, the voice quality of the voice signal to be evaluated differing from the voice quality of the stored voice signals includes:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal being at the same interval level, and the difference between the two voice qualities being less than a second preset threshold; or
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal being at different interval levels.
In a third aspect, the present invention further provides a device for evaluating voice quality, including:
a memory, configured to store program instructions; and
a processor, configured to call the program instructions stored in the memory and execute the method of the first aspect according to the obtained program.
In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions that, when run on a computer, cause the computer to execute the method of the first aspect.
The present invention provides a method and device for evaluating voice quality. A voice signal to be evaluated is acquired and compared with the stored voice signals. When the two differ substantially, the built-in voice quality evaluation model is updated to obtain a new voice quality evaluation model, which is then used to evaluate the voice signal to be evaluated. By continuously learning from voice signals, the voice quality evaluation model is continuously updated, which improves the accuracy of voice evaluation.
Brief Description of the Drawings
FIG. 1 is a flowchart of a method for evaluating voice quality provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of decision tree training and classification provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of another decision tree training provided by an embodiment of the present application;
FIG. 4 is a flowchart of a method for updating a voice quality evaluation model provided by an embodiment of the present application;
FIG. 5 is a flowchart of another method for evaluating voice quality provided by an embodiment of the present application;
FIG. 6 is a structural block diagram of a device for evaluating voice quality provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of another device for evaluating voice quality provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
At present, the common approach to voice quality evaluation is as follows: characteristic parameters are extracted from the voice signal, or other characteristic parameters related to voice quality are obtained, such as network delay, packet loss, and jitter; the characteristic parameters are then modeled and analyzed to obtain an objective voice quality evaluation.
A fixed algorithm is usually used to model a fixed evaluation scenario, for example the Perceptual Evaluation of Speech Quality (PESQ) algorithm for narrowband voice signals and the Perceptual Objective Listening Quality Assessment (POLQA) algorithm for super-wideband voice evaluation. The voice quality evaluation models built with these algorithms are trained linear regression models with specific mapping methods. The objective voice quality evaluation is finally mapped to the quality actually perceived by listeners to obtain a voice quality score.
This existing approach suits scenarios in which the voice environment changes little, because the parameters used to train the model are limited. In a scenario where the voice environment changes considerably, such as on a train, many parameters affect voice quality, and they may not be limited to those used to train the fixed model. Using a fixed voice quality evaluation model in such a scenario may therefore yield relatively low evaluation accuracy.
In view of this, the embodiments of the present application provide a method and device for evaluating voice quality. Voice signals are acquired continuously, the built-in evaluation model is continuously updated based on the input voice signals, and the input voice signals are evaluated to output a voice quality score, which improves the accuracy of voice quality evaluation.
It should be understood that terms such as "first" and "second" used below serve only to distinguish the items described. They do not indicate or imply relative importance or order.
The embodiments of the present application are not constrained by environmental factors and apply to various evaluation environments, including both rapidly changing and relatively stable ones.
The application scenarios of the embodiments of the present application include, but are not limited to, traditional second-generation (2G) and third-generation (3G) mobile communication calls, fourth-generation (4G) mobile communication calls, and mixed 2G/3G/4G scenarios.
FIG. 1 is a flowchart of a method for evaluating voice quality provided by an embodiment of the present application. Referring to FIG. 1, the method includes:
S101: Acquire a voice signal to be evaluated, and determine identification information of the voice signal to be evaluated.
S102: If it is determined that the identification information of the voice signal to be evaluated differs from that of the stored voice signals, take the voice signal to be evaluated as a new voice signal.
It can be understood that a "new voice signal" in the embodiments of the present application is a voice signal to be evaluated that differs substantially from the voice signals received before it. Such a signal can be marked as a new voice signal.
Specifically, the voice quality of the voice signal to be evaluated differs from that of the stored voice signals, that is, the difference between the two is relatively large.
The voice signals already stored in the built-in voice quality evaluation model are voice signals acquired before the voice signal to be evaluated.
In the embodiments of the present application, surrounding voice signals are acquired continuously. Therefore, for a voice signal to be evaluated, the voice signals acquired before it can serve as its reference signals.
It should be noted that voice quality may be expressed by, for example but not limited to, the voice quality evaluation score or the voice quality level of the voice signal.
S103: When the number of new voice signals is greater than a first preset threshold, update the first voice quality evaluation model to obtain a second voice quality evaluation model.
For ease of description, the embodiments of the present application refer to the built-in voice quality evaluation model as the "first voice quality evaluation model", and to the model obtained by updating the built-in model as the "second voice quality evaluation model".
Specifically, when the voice signal to be evaluated differs substantially from the old voice signals, it can be taken as a new sample. When the number of new samples reaches a preset threshold, for example the first preset threshold, the built-in voice quality evaluation model is updated to obtain the second voice quality evaluation model.
It should be noted that in this application "new voice signal" and "new sample" are sometimes used interchangeably, as are "old voice signal" and "stored voice signal". Those skilled in the art should understand that each pair has the same meaning.
S104: Evaluate the voice signal to be evaluated by using the second voice quality evaluation model.
Specifically, when the voice signal to be evaluated is a new sample, the updated second voice quality evaluation model can be used to evaluate it, so as to obtain an accurate voice quality evaluation result.
In the embodiments of the present application, voice data is acquired continuously as the external voice environment changes, and the continuously updated data set keeps the voice evaluation model accurate, which improves the precision of the model.
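Steps S101 to S104 can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `AdaptiveEvaluator`, the helpers `far_from_all` and `mean_model`, and the use of a single float to stand in for a signal's identification information are all hypothetical.

```python
class AdaptiveEvaluator:
    """Keeps a model, counts new samples, and swaps in an updated
    model once the count exceeds the first preset threshold."""

    def __init__(self, model, retrain, is_new, threshold):
        self.model = model          # first (built-in) model: sample -> score
        self.retrain = retrain      # callable: list of new samples -> model
        self.is_new = is_new        # callable: (sample, stored) -> bool
        self.threshold = threshold  # first preset threshold
        self.stored = []            # signals acquired earlier
        self.new_samples = []       # signals judged "new" since last update

    def evaluate(self, sample):
        # S102: compare against the stored signals.
        if self.is_new(sample, self.stored):
            self.new_samples.append(sample)
            # S103: update the model when enough new samples accumulate.
            if len(self.new_samples) > self.threshold:
                self.model = self.retrain(self.new_samples)
                self.new_samples = []
        self.stored.append(sample)
        # S104: evaluate with the current (possibly updated) model.
        return self.model(sample)


def far_from_all(sample, stored, gap=1.0):
    """Hypothetical novelty rule: differs from every stored value by more than gap."""
    return all(abs(sample - s) > gap for s in stored)


def mean_model(samples):
    """Hypothetical retraining: the new model returns the mean of the new samples."""
    mean = sum(samples) / len(samples)
    return lambda sample: mean


ev = AdaptiveEvaluator(lambda s: 3.0, mean_model, far_from_all, threshold=2)
print([ev.evaluate(x) for x in (1.0, 5.0, 9.0, 9.2)])
```

Passing the novelty rule and the retraining routine in as callables keeps the loop independent of how "different" is defined and of which learning algorithm performs the update.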
In one possible implementation, updating the first voice quality evaluation model to obtain the second voice quality evaluation model may include:
acquiring characteristic parameters of the new voice signals, and training on the characteristic parameters with a decision tree algorithm to update the first voice quality evaluation model and obtain the second voice quality evaluation model.
Specifically, characteristic parameters can be extracted from a certain number of new voice signals (a number greater than the first preset threshold), or other characteristic parameters related to voice quality can be obtained, and a new voice quality evaluation model is then trained on the characteristic parameters.
It can be understood that the other characteristic parameters related to voice quality include, but are not limited to, network delay, packet loss, and jitter.
The method of training a model on characteristic parameters is similar to existing solutions and is not described in detail here.
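The source names a decision tree algorithm but does not say which variant. As one possible reading, the sketch below trains a small CART-style regression tree in plain Python, choosing each split to minimize the summed squared error of the two children. The feature layout (signal-to-noise ratio in dB, noise level), the training values, and the function names are assumptions for illustration, and the tree is trained from scratch rather than incrementally updated.

```python
def _sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def build_tree(rows, targets, max_depth=3, min_samples=2):
    """rows: feature vectors; targets: quality scores for those rows."""
    mean = sum(targets) / len(targets)
    if max_depth == 0 or len(rows) < min_samples or _sse(targets) == 0.0:
        return {"leaf": mean}
    best = None
    for f in range(len(rows[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so that neither child is ever empty.
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, thr)
    if best is None:  # all rows identical: nothing to split on
        return {"leaf": mean}
    _, f, thr = best
    li = [i for i, r in enumerate(rows) if r[f] <= thr]
    ri = [i for i, r in enumerate(rows) if r[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree([rows[i] for i in li], [targets[i] for i in li],
                           max_depth - 1, min_samples),
        "right": build_tree([rows[i] for i in ri], [targets[i] for i in ri],
                            max_depth - 1, min_samples),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean score."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]


# Hypothetical training data: [SNR in dB, noise level] -> quality score.
rows = [[30.0, 0.1], [28.0, 0.2], [10.0, 0.8], [8.0, 0.9]]
scores = [4.5, 4.3, 2.0, 1.8]
tree = build_tree(rows, scores)
print(predict(tree, [29.0, 0.15]), predict(tree, [9.0, 0.85]))
```

A production system would more likely rely on an established library implementation; the hand-written version is used here only to keep the example self-contained.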
另一种可能的实施方式中,本申请实施例中可将新的语音信号与旧的语音信号进行融合,然后提取新的语音信号与旧的语音信号的特征参数,或获取与语音质量相关的其他特征参数,最后根据特征参数训练得到新的语音质量评价模型。In another possible implementation, in this embodiment of the present application, the new voice signal and the old voice signal can be fused, and then the characteristic parameters of the new voice signal and the old voice signal can be extracted, or the characteristic parameters related to the voice quality can be obtained. Other characteristic parameters, and finally a new speech quality evaluation model is obtained by training according to the characteristic parameters.
需要说明的是,旧的语音信号即为在待评价的语音信号之前接收到的语音信号。It should be noted that the old voice signal is the voice signal received before the voice signal to be evaluated.
Specifically, the characteristic parameters in this embodiment include at least one of the following: signal-to-noise ratio, background noise, noise level, asymmetric disturbance value of the average speech signal spectrum, high-frequency flatness analysis, spectral level range, spectral level standard deviation, relative noise floor, skewness coefficient of the linear prediction coefficients, cepstral skewness coefficient, voicing, average cross-section of the rear cavity, vocal tract amplitude variation, and speech level.
Because there are many voice quality evaluation parameters, those with relatively high weights are usually selected as characteristic parameters during training.
In this embodiment, the voice quality evaluation parameters may further include: average speech signal disturbance value, global background noise, speech interruption time, sudden level drops, silence length, pitch period, robotization, correlation between the rear and middle cavities, correlation of consecutive frames, average power of consecutive frames, energy sum of repeated frames, number of repeated frames, number of frames with unnatural beeps, average sample energy of unnatural beeps, proportion of samples with unnatural beeps, absolute value of the cepstral standard deviation, cepstral kurtosis coefficient, kurtosis coefficient of the linear prediction coefficients, absolute value of the skewness coefficient of the linear prediction coefficients, fixed noise weighting, spectral clarity, average energy level of background noise samples, average energy of background noise samples, signal-to-noise ratio of multiplicative noise, total energy of unnatural silence frames, and so on.
Further, after the speech signal to be evaluated is acquired, the method further includes:
performing at least one of the following preprocessing operations on the speech signal to be evaluated: speech data validity detection, speech data normalization, and missing-value interpolation fitting.
Specifically, raw speech signals contain a large amount of incomplete, inconsistent, and abnormal data, which seriously reduces the efficiency of later modeling and may even bias the model results. The values of the data themselves also affect the model results, so the raw speech signals can first be cleaned. This usually involves handling missing data, outliers, redundancy, and scaling.
Data processing methods mainly include, but are not limited to, data validity detection, data normalization, and missing-value interpolation fitting.
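The three preprocessing operations named above can be sketched as follows. This is only an illustrative sketch: the function names, the `max_missing_ratio` validity rule, linear interpolation, and min-max scaling are all assumptions, since the description does not specify how each operation is carried out.

```python
from typing import List, Optional


def is_valid(samples: List[Optional[float]], max_missing_ratio: float = 0.5) -> bool:
    """Validity check (assumed rule): reject empty input or too many missing values."""
    if not samples:
        return False
    missing = sum(1 for s in samples if s is None)
    return missing / len(samples) <= max_missing_ratio


def interpolate_missing(samples: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between the nearest known neighbours."""
    known = [i for i, s in enumerate(samples) if s is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(samples)
    for i, s in enumerate(samples):
        if s is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # leading gap: copy the first known value
            filled[i] = samples[right]
        elif right is None:         # trailing gap: copy the last known value
            filled[i] = samples[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = samples[left] + w * (samples[right] - samples[left])
    return filled


def normalize(samples: List[float]) -> List[float]:
    """Min-max normalization to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0.0 for _ in samples]
    return [(s - lo) / (hi - lo) for s in samples]


def preprocess(samples: List[Optional[float]]) -> Optional[List[float]]:
    """Return cleaned data, or None when the input fails the validity check."""
    if not is_valid(samples):
        return None
    return normalize(interpolate_missing(samples))
```

For example, `preprocess([0.0, None, 4.0, 2.0])` fills the gap with the midpoint of its neighbours before scaling, while an input that is mostly missing is rejected.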
Further, after the speech signal to be evaluated is acquired, the method further includes:
evaluating the speech signal to be evaluated according to the first voice quality evaluation model to obtain its voice quality, and classifying that voice quality to obtain voice quality at different interval levels.
The voice quality at different interval levels characterizes different categories of voice quality.
There are several choices for the classification algorithm in the voice quality evaluation model; for example, the GBDT (Gradient Boosting Decision Tree) algorithm may be used.
Specifically, in this embodiment a decision tree algorithm may be used to classify the quality of speech signals, as shown in FIG. 2.
In FIG. 2, the feature identifiers (1), (2), and so on identify the characteristic parameters of the speech signal. A decision tree algorithm can be regarded as a predictive model and can also be understood as a classification tree. In this application, a decision tree can be used to map the classification of voice quality.
The classification of voice quality is mapped through decision trees, and the trees may be iterated multiple times to form a progressively boosted ensemble of trees that optimizes the mapping performance, as in FIG. 3. In FIG. 3, each learner predicts a score for the speech signal, which yields the predicted voice quality.
In FIG. 3, θ denotes a weight and φ denotes the mapping function of each learner.
It should be noted that FIG. 2 and FIG. 3 are only exemplary, and their specific form and content are not limited to what is shown in the figures. For example, the set of voice quality scores is not limited to a 0-5 scale.
It can be understood that the decision tree can be obtained by methods such as machine learning, which is not limited in this embodiment.
As the boosting algorithm in FIG. 3 shows, the final predicted score of a speech signal is the combination of the voice quality results of the b learners:
It can be understood that the mapping-function term in the above formula corresponds to φ in the figure.
Optimizing the above formula in function space gives:
where ρ denotes the learning rate.
From the above formula, the training value for one speech sample at each iteration is:
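The formulas this passage refers to are not reproduced in the text. As a hedged sketch only, assuming the standard gradient boosting formulation and reusing θ, φ, ρ, and b from the description, they might be written as below. The loss function $L$, the labelled score $y_i$, the sample features $x_i$, and the iteration index $m$ are notation assumed here, not taken from the original.

```latex
% Combination of the b learners (weights \theta_m, mapping functions \varphi_m)
F(x) = \sum_{m=1}^{b} \theta_m \, \varphi_m(x)

% Stagewise update in function space with learning rate \rho
F_m(x) = F_{m-1}(x) + \rho \, \theta_m \, \varphi_m(x)

% Training value (pseudo-residual) for sample i at iteration m
\tilde{y}_i^{(m)} = -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}}
```

Under a squared-error loss $L = \tfrac{1}{2}(y_i - F(x_i))^2$, the training value reduces to the residual $y_i - F_{m-1}(x_i)$, so each new learner is fitted to what the previous learners have not yet explained.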
As the above formula shows, voice quality scores can be mapped to different score intervals, for example [0, 1] and [1, 2], and different score intervals can correspond to different speech categories.
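A minimal sketch of mapping a score to an interval level is given below. The 0 to 5 range and unit-width intervals follow the examples in the description; the function name and the choice to assign a boundary score such as 1.0 to the upper interval are assumptions, because the example intervals [0, 1] and [1, 2] share their endpoints.

```python
import math


def interval_level(score: float, low: float = 0.0, high: float = 5.0,
                   width: float = 1.0) -> int:
    """Return the 0-based index of the score interval that contains `score`.

    Boundary scores go to the upper interval (assumed convention), except the
    maximum score, which stays in the last interval.
    """
    if not low <= score <= high:
        raise ValueError("score outside the supported range")
    n_levels = math.ceil((high - low) / width)
    level = int((score - low) // width)
    return min(level, n_levels - 1)
```

Two scores then belong to the same category exactly when `interval_level` returns the same index for both.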
Preferably, the identification information of the speech signal to be evaluated being different from that of the stored speech signals may include:
the voice quality of the speech signal to be evaluated being different from the voice quality of the stored speech signals, and/or the characteristic parameters of the speech signal to be evaluated being different from the characteristic parameters of the stored speech signals.
Specifically, the voice quality of the speech signal to be evaluated being different from the voice quality of the stored speech signals may include:
the voice quality of the speech signal to be evaluated and the voice quality of the stored speech signals being at the same interval level, with the difference between the two being less than a set threshold (for example, the second preset threshold); or the voice quality of the speech signal to be evaluated and the voice quality of the stored speech signals being at different interval levels.
Optionally, in this embodiment the built-in voice quality evaluation model (the first voice quality evaluation model) may first be used to evaluate the speech signal to be evaluated, so as to determine whether it is a new speech signal.
Specifically, in this embodiment speech data is acquired from outside, the built-in evaluation model is used to classify and score the voice quality of the newly acquired speech data, and it is then determined whether the newly acquired speech signal is a new sample. If speech data in different categories differ little, or the score of speech data in the same category differs too much from the scores of the old speech data, that speech data can be used as new samples.
Specifically, methods that use the characteristic parameters to determine that the identification information of the speech signal to be evaluated differs from that of the stored speech signals include, but are not limited to, the following:
(1) Detection based on the univariate normal distribution:
The original data set is $x_{i,1}, x_{i,2}, x_{i,3}, \dots, x_{i,n}$, $i \in (1, \dots, m)$, containing m samples with n-dimensional features. The mean and variance of each feature dimension can be computed as:
$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_{i,j}, \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_{i,j}-\mu_j\right)^2, \qquad j \in (1, \dots, n)$$
For new data $x = (x_1, \dots, x_n)$, its probability can be computed as:
$$p(x) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right)$$
The difference between the feature distributions of the new data and the old data can then be judged from this probability.
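The per-feature Gaussian check can be sketched as follows. This is an illustration under assumptions: the function names and the `log_threshold` parameter are hypothetical, and the sketch works with the logarithm of the probability rather than the probability itself so that products of many small densities do not underflow.

```python
import math
from typing import List, Sequence, Tuple


def fit_feature_stats(data: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and (population) variance of the old data set."""
    m, n = len(data), len(data[0])
    means = [sum(row[j] for row in data) / m for j in range(n)]
    variances = [sum((row[j] - means[j]) ** 2 for row in data) / m for j in range(n)]
    return means, variances


def log_probability(x: Sequence[float], means: Sequence[float],
                    variances: Sequence[float], eps: float = 1e-12) -> float:
    """Sum of per-dimension Gaussian log-densities (log of the product form)."""
    total = 0.0
    for xj, mu, var in zip(x, means, variances):
        var = max(var, eps)  # guard against zero variance in a constant feature
        total += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
    return total


def differs_from_old(x: Sequence[float], means: Sequence[float],
                     variances: Sequence[float], log_threshold: float) -> bool:
    """Flag new data whose log-probability falls below an assumed threshold."""
    return log_probability(x, means, variances) < log_threshold
```

The threshold would have to be chosen empirically, for instance from the log-probabilities of the stored samples themselves.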
(2) Detection based on the multivariate Gaussian distribution:
The original data set consists of m samples $x^{(1)}, \dots, x^{(m)}$, each an n-dimensional feature vector. The n-dimensional feature mean vector and the n*n covariance matrix can be computed as:
$$\mu = \frac{1}{m}\sum_{k=1}^{m} x^{(k)}$$
$$\Sigma = [\mathrm{Cov}(x_i, x_j)], \quad i, j \in (1, \dots, n)$$
For new data $x$, its probability can be computed as:
$$p(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^{T}\,\Sigma^{-1}\,(x-\mu)\right)$$
The difference between the feature distributions of the new data and the old data can be judged from this probability, where T in the formula denotes matrix transposition.
(3) Detection based on the Mahalanobis distance:
For a multidimensional data set with mean vector $\bar{a}$, the Mahalanobis distance from new data $a$ to $\bar{a}$ is:
$$D_M(a) = \sqrt{(a-\bar{a})^{T}\,S^{-1}\,(a-\bar{a})}$$
where T denotes matrix transposition and S is the covariance matrix. If this distance is too large, the feature distributions are considered different.
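A dependency-free sketch of the Mahalanobis check is shown below. The helper names are hypothetical, the covariance uses the population (divide by m) convention as an assumption, and instead of inverting S explicitly the sketch solves the linear system S·y = (a − mean) by Gaussian elimination, which gives the same quadratic form.

```python
import math
from typing import List, Sequence


def mean_vector(data: Sequence[Sequence[float]]) -> List[float]:
    m = len(data)
    return [sum(col) / m for col in zip(*data)]


def covariance_matrix(data: Sequence[Sequence[float]], mean: Sequence[float]) -> List[List[float]]:
    """Population covariance matrix (divides by the number of samples)."""
    m, n = len(data), len(mean)
    cov = [[0.0] * n for _ in range(n)]
    for row in data:
        d = [row[j] - mean[j] for j in range(n)]
        for i in range(n):
            for j in range(n):
                cov[i][j] += d[i] * d[j] / m
    return cov


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a·y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = aug[i][n] - sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = s / aug[i][i]
    return y


def mahalanobis(a: Sequence[float], mean: Sequence[float],
                cov: Sequence[Sequence[float]]) -> float:
    d = [ai - mi for ai, mi in zip(a, mean)]
    y = solve(cov, d)  # y = S^{-1} (a - mean)
    return math.sqrt(sum(di * yi for di, yi in zip(d, y)))
```

A new sample would be treated as differently distributed when `mahalanobis(...)` exceeds a distance threshold chosen for the application.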
(4) Detection based on feature importance:
Tree-based models such as GBDT can produce an importance ranking of the features.
The global importance of feature j is measured by the average of its importance in the individual trees:
$$\hat{J}_j^2 = \frac{1}{M}\sum_{m=1}^{M} \hat{J}_j^2(T_m)$$
where M is the number of trees.
The importance of feature j in a single tree is:
$$\hat{J}_j^2(T) = \sum_{t=1}^{L-1} \hat{i}_t^2 \,\mathbb{1}(\nu_t = j)$$
where L is the number of leaf nodes of the tree, and L-1 is the number of non-leaf nodes. $\nu_t$ is the feature associated with node t, $\hat{i}_t^2$ is the reduction in squared loss after node t is split, J denotes the feature set, and T denotes the set of trees.
For the top k features obtained by training on the new samples, if they differ substantially from the top features of the original data set, the new samples are considered to follow a different distribution from the original data set.
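One way to make "differ substantially" concrete is to compare the two top-k feature sets by their overlap. The sketch below is an assumption-laden illustration: the function names, the dictionary representation of per-tree importances, and the `min_overlap` cut-off are hypothetical and not taken from the description.

```python
from typing import Dict, List, Sequence


def global_importance(per_tree_importance: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each feature's importance over all trees (absent means 0)."""
    m = len(per_tree_importance)
    features = set().union(*per_tree_importance)
    return {f: sum(t.get(f, 0.0) for t in per_tree_importance) / m for f in features}


def top_k(importance: Dict[str, float], k: int) -> List[str]:
    """The k most important features; ties are broken by feature name."""
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def top_k_overlap(old: Dict[str, float], new: Dict[str, float], k: int) -> float:
    """Fraction of the top-k features shared by the old and new rankings."""
    return len(set(top_k(old, k)) & set(top_k(new, k))) / k


def distribution_changed(old: Dict[str, float], new: Dict[str, float],
                         k: int, min_overlap: float = 0.5) -> bool:
    """Treat the new samples as differently distributed when overlap is low."""
    return top_k_overlap(old, new, k) < min_overlap
```

With this measure, two rankings that share none of their top k features give an overlap of 0, and identical rankings give 1.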
In one possible implementation of this embodiment, speech data may be learned incrementally following the method flowchart shown in FIG. 4, thereby updating the built-in voice quality evaluation model.
It can be understood that the normal scoring in FIG. 4 is the scoring performed by the built-in voice quality evaluation model.
For the overall method flow of this embodiment, refer to the flowchart shown in FIG. 5. In this method, an external speech signal is acquired and preprocessed; a decision tree algorithm then classifies the speech signal quality to obtain a quality score; it is then determined whether the speech sample data has the characteristics of a new sample. When the speech signal is a new sample, and once a certain number of new samples have been collected, the built-in voice quality evaluation model is updated, and the updated model is used for standard scoring.
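The overall flow described above (score with the built-in model, collect new samples, update once their number exceeds the first preset threshold, then score with the updated model) can be sketched as a small class. Everything here is a hypothetical illustration: the class name, the three callables, and the way the model is represented as a plain function are assumptions, and the actual training step is left to the caller.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]
Model = Callable[[Features], float]


class IncrementalEvaluator:
    """Scores speech features and retrains once enough new samples accumulate."""

    def __init__(self, model: Model,
                 is_new_sample: Callable[[Features, float], bool],
                 retrain: Callable[[List[Features]], Model],
                 first_threshold: int) -> None:
        self.model = model                  # built-in (first) evaluation model
        self.is_new_sample = is_new_sample  # new-sample test, e.g. a distribution check
        self.retrain = retrain              # builds the updated (second) model
        self.first_threshold = first_threshold
        self.new_samples: List[Features] = []
        self.updates = 0

    def evaluate(self, features: Features) -> float:
        score = self.model(features)
        if self.is_new_sample(features, score):
            self.new_samples.append(features)
        # Update only when the count is strictly greater than the threshold.
        if len(self.new_samples) > self.first_threshold:
            self.model = self.retrain(self.new_samples)
            self.new_samples = []
            self.updates += 1
            score = self.model(features)    # re-score with the updated model
        return score
```

Passing the retraining step in as a callable keeps the sketch independent of any particular decision tree library.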
Based on the same concept as the method embodiment above, an embodiment of the present invention further provides an apparatus for evaluating voice quality, whose structural block diagram is shown in FIG. 6. The apparatus includes an acquiring unit 101, a determining unit 102, an updating unit 103, and an evaluating unit 104.
The acquiring unit 101 is configured to acquire the speech signal to be evaluated.
The determining unit 102 is configured to determine the identification information of the speech signal to be evaluated acquired by the acquiring unit 101, and to treat the speech signal to be evaluated as a new speech signal when its voice quality is determined to differ from the voice quality of the stored speech signals.
The updating unit 103 is configured to update the first voice quality evaluation model to obtain the second voice quality evaluation model when the number of new speech signals determined by the determining unit 102 is greater than the first preset threshold.
The stored speech signals are the speech signals acquired before the speech signal to be evaluated.
The evaluating unit 104 is configured to evaluate the speech signal to be evaluated using the second voice quality evaluation model obtained by the updating unit 103.
Specifically, the updating unit 103 is configured to update the first voice quality evaluation model to obtain the second voice quality evaluation model as follows:
acquiring the characteristic parameters of the new speech signals; and training on the characteristic parameters with a decision tree algorithm to update the first voice quality evaluation model and obtain the second voice quality evaluation model.
The characteristic parameters include at least one of the following: signal-to-noise ratio, background noise, noise level, asymmetric disturbance value of the average speech signal spectrum, high-frequency flatness analysis, spectral level range, spectral level standard deviation, relative noise floor, skewness coefficient of the linear prediction coefficients, cepstral skewness coefficient, voicing, average cross-section of the rear cavity, vocal tract amplitude variation, and speech level.
Correspondingly, the apparatus further includes a processing unit 105, configured to:
perform at least one of the following preprocessing operations on the speech signal to be evaluated: speech data validity detection, speech data normalization, and missing-value interpolation fitting.
Further, the evaluating unit 104 is also configured to:
evaluate the speech signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the speech signal to be evaluated.
The processing unit 105 is also configured to:
classify the voice quality of the speech signal to be evaluated to obtain voice quality at different interval levels, where the voice quality at different interval levels characterizes different categories of voice quality.
Optionally, the identification information of the speech signal to be evaluated being different from that of the stored speech signals includes:
the voice quality of the speech signal to be evaluated being different from the voice quality of the stored speech signals, and/or the characteristic parameters of the speech signal to be evaluated being different from the characteristic parameters of the stored speech signals.
Further, the voice quality of the speech signal to be evaluated being different from the voice quality of the stored speech signals includes:
the voice quality of the speech signal to be evaluated and the voice quality of the stored speech signals being at the same interval level, with the difference between the two being less than the second preset threshold; or the voice quality of the speech signal to be evaluated and the voice quality of the stored speech signals being at different interval levels.
It should be noted that, for the functions of the units in the voice quality evaluation apparatus described above, reference may be made to the description of the related method embodiments, and details are not repeated here.
An embodiment of the present application further provides another apparatus for evaluating voice quality. As shown in FIG. 7, the apparatus includes:
a memory 202, configured to store program instructions;
a transceiver 201, configured to receive and send voice quality evaluation instructions; and
a processor 200, configured to call the program instructions stored in the memory and, according to the instructions received by the transceiver 201, execute per the obtained program the methods performed by the processing unit (105), the determining unit (102), the updating unit (103), and the evaluating unit (104) shown in FIG. 6.
In FIG. 7, the bus architecture may include any number of interconnected buses and bridges, which link together various circuits of one or more processors, represented by the processor 200, and of memory, represented by the memory 202. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits; these are well known in the art and are therefore not described further here. The bus interface provides an interface.
The transceiver 201 may be a plurality of elements, that is, it includes a transmitter and a receiver, and provides a unit for communicating with various other apparatuses over a transmission medium.
The processor 200 is responsible for managing the bus architecture and general processing, and the memory 202 may store data used by the processor 200 when performing operations.
The processor 200 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a complex programmable logic device (CPLD).
An embodiment of the present application further provides a computer storage medium for storing the computer program instructions used by any of the apparatuses described in the embodiments above, including a program for executing any of the methods provided by the embodiments above.
The computer storage medium may be any available medium or data storage device that a computer can access, including but not limited to magnetic storage (for example, floppy disks, hard disks, magnetic tape, and magneto-optical disks (MO)), optical storage (for example, CD, DVD, BD, and HVD), and semiconductor memory (for example, ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), and solid-state drives (SSD)).
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.
Claims (14)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811544623.XA CN111326169B (en) | 2018-12-17 | 2018-12-17 | Voice quality evaluation method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111326169A (en) | 2020-06-23 |
| CN111326169B (en) | 2023-11-10 |
Family
ID=71172436
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811544623.XA Active CN111326169B (en) | 2018-12-17 | 2018-12-17 | Voice quality evaluation method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111326169B (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111816207A (en) * | 2020-08-31 | 2020-10-23 | 广州汽车集团股份有限公司 | Sound analysis method, system, automobile and storage medium |
| CN112632841A (en) * | 2020-12-22 | 2021-04-09 | 交通运输部科学研究院 | Road surface long-term performance prediction method and device |
| CN112634946A (en) * | 2020-12-25 | 2021-04-09 | 深圳市博瑞得科技有限公司 | Voice quality classification prediction method, computer equipment and storage medium |
| CN112885377A (en) * | 2021-02-26 | 2021-06-01 | 平安普惠企业管理有限公司 | Voice quality evaluation method and device, computer equipment and storage medium |
| CN113393863A (en) * | 2021-06-10 | 2021-09-14 | 北京字跳网络技术有限公司 | Voice evaluation method, device and equipment |
| CN113838168A (en) * | 2021-10-13 | 2021-12-24 | 亿览在线网络技术(北京)有限公司 | Method for generating particle special effect animation |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101740024A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Method for automatic evaluation based on generalized fluent spoken language fluency |
| US20120059650A1 (en) * | 2009-04-17 | 2012-03-08 | France Telecom | Method and device for the objective evaluation of the voice quality of a speech signal taking into account the classification of the background noise contained in the signal |
| US20140032212A1 (en) * | 2011-04-11 | 2014-01-30 | Orange | Evaluation of the voice quality of a coded speech signal |
| WO2017041553A1 (en) * | 2015-09-07 | 2017-03-16 | 中兴通讯股份有限公司 | Method and apparatus for determining voice quality |
| CN106558308A (en) * | 2016-12-02 | 2017-04-05 | 深圳撒哈拉数据科技有限公司 | A kind of internet audio quality of data auto-scoring system and method |
| CN107895582A (en) * | 2017-10-16 | 2018-04-10 | 中国电子科技集团公司第二十八研究所 | A speaker-adaptive speech emotion recognition method for multi-source information domain |
| CN108346434A (en) * | 2017-01-24 | 2018-07-31 | 中国移动通信集团安徽有限公司 | A kind of method and apparatus of speech quality evaluation |
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101740024A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Method for automatic evaluation based on generalized fluent spoken language fluency |
| US20120059650A1 (en) * | 2009-04-17 | 2012-03-08 | France Telecom | Method and device for the objective evaluation of the voice quality of a speech signal taking into account the classification of the background noise contained in the signal |
| US20140032212A1 (en) * | 2011-04-11 | 2014-01-30 | Orange | Evaluation of the voice quality of a coded speech signal |
| WO2017041553A1 (en) * | 2015-09-07 | 2017-03-16 | 中兴通讯股份有限公司 | Method and apparatus for determining voice quality |
| CN106558308A (en) * | 2016-12-02 | 2017-04-05 | 深圳撒哈拉数据科技有限公司 | A kind of internet audio quality of data auto-scoring system and method |
| CN108346434A (en) * | 2017-01-24 | 2018-07-31 | 中国移动通信集团安徽有限公司 | A kind of method and apparatus of speech quality evaluation |
| CN107895582A (en) * | 2017-10-16 | 2018-04-10 | 中国电子科技集团公司第二十八研究所 | A speaker-adaptive speech emotion recognition method for multi-source information domain |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111816207A (en) * | 2020-08-31 | 2020-10-23 | 广州汽车集团股份有限公司 | Sound analysis method, system, automobile and storage medium |
| CN112632841A (en) * | 2020-12-22 | 2021-04-09 | 交通运输部科学研究院 | Road surface long-term performance prediction method and device |
| CN112634946A (en) * | 2020-12-25 | 2021-04-09 | 深圳市博瑞得科技有限公司 | Voice quality classification prediction method, computer equipment and storage medium |
| CN112634946B (en) * | 2020-12-25 | 2022-04-12 | 博瑞得科技有限公司 | Voice quality classification prediction method, computer equipment and storage medium |
| CN112885377A (en) * | 2021-02-26 | 2021-06-01 | 平安普惠企业管理有限公司 | Voice quality evaluation method and device, computer equipment and storage medium |
| CN113393863A (en) * | 2021-06-10 | 2021-09-14 | 北京字跳网络技术有限公司 | Voice evaluation method, device and equipment |
| CN113393863B (en) * | 2021-06-10 | 2023-11-03 | 北京字跳网络技术有限公司 | Voice evaluation method, device and equipment |
| CN113838168A (en) * | 2021-10-13 | 2021-12-24 | 亿览在线网络技术(北京)有限公司 | Method for generating particle special effect animation |
| CN113838168B (en) * | 2021-10-13 | 2023-10-03 | 亿览在线网络技术(北京)有限公司 | Particle special effect animation generation method |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111326169B (en) | 2023-11-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111326169B (en) | Voice quality evaluation method and device | |
| US10964337B2 (en) | Method, device, and storage medium for evaluating speech quality | |
| Jacob | Modelling speech emotion recognition using logistic regression and decision trees | |
| JP5223673B2 (en) | Audio processing apparatus and program, and audio processing method | |
| CN113223485B (en) | Training method of beat detection model, beat detection method and device | |
| WO2018107810A1 (en) | Voiceprint recognition method and apparatus, and electronic device and medium | |
| CN112508580A (en) | Model construction method and device based on rejection inference method and electronic equipment | |
| CN107690660A (en) | Image-recognizing method and device | |
| WO2020045313A1 (en) | Mask estimation device, mask estimation method, and mask estimation program | |
| CN109378014A (en) | A method and system for source identification of mobile devices based on convolutional neural network | |
| CN114372139B (en) | Data processing method, abstract display method, device, equipment and storage medium | |
| CN114186646A (en) | Block chain abnormal transaction identification method and device, storage medium and electronic equipment | |
| CN115062678A (en) | Training method of equipment fault detection model, fault detection method and device | |
| JP2019008131A (en) | Speaker determination device, speaker determination information generation method, and program | |
| CN109104257A (en) | A kind of wireless signal detection method and device | |
| CN118228993A (en) | Demand priority determination method, device, computer equipment and storage medium | |
| CN110782879B (en) | Voiceprint clustering method, device, equipment and storage medium based on sample size | |
| CN106663110B (en) | Derivation of probability scores for audio sequence alignment | |
| CN109545198A (en) | A kind of Oral English Practice mother tongue degree judgment method based on convolutional neural networks | |
| CN119476197A (en) | Data display processing method, device, electronic device and storage medium | |
| CN118609573A (en) | Voice signal processing method, device, electronic device and non-volatile storage medium | |
| CN115757315A (en) | Method for evaluating performance of speaker log system, electronic device and storage medium | |
| CN112906999B (en) | Method, device and computing equipment for evaluating effect of traffic index optimization | |
| CN117407521A (en) | Community classification method and device | |
| CN109036390B (en) | Broadcast keyword identification method based on integrated gradient elevator |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |