CN111798857A - Information identification method and device, electronic equipment and storage medium - Google Patents
Information identification method and device, electronic equipment and storage medium
- Publication number
- CN111798857A (application CN201910277902.2A)
- Authority
- CN
- China
- Prior art keywords
- feature vector
- audio
- service
- network
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/04—Training, enrolment or model building
- G10L17/16—Hidden Markov models [HMM]
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/78—Detection of presence or absence of voice signals
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The present application relates to the field of Internet technologies, and in particular to an information identification method and apparatus, an electronic device, and a storage medium.
Background
With the development of science and technology, Internet technology has penetrated every aspect of people's lives, for example, hailing rides, shopping, and ordering takeout through the Internet. However, information on the Internet is growing explosively and is updated ever more frequently, so that a wide variety of information exists on the network. When a user purchases a service through the network, the user needs to browse a large amount of information to find the desired service, which inevitably degrades the user experience. Therefore, for service providers, how to provide users with more suitable services without burdening them is an urgent problem to be solved.
Summary of the Invention
In view of this, the purpose of the embodiments of the present application is to provide an information identification method, apparatus, electronic device, and storage medium, which can provide more suitable services and improve the user experience without the user perceiving the process.
In a first aspect, an embodiment of the present application provides an information identification method. The method includes: acquiring audio to be identified; determining a feature vector representing acoustic features of the audio to be identified; obtaining, based on the feature vector and a pre-trained information identification model, a service tag corresponding to the audio to be identified; and determining, according to the service tag, service information matching the service tag.
In an optional implementation, the service tag includes an age recognition result, and the step of obtaining the service tag corresponding to the audio to be identified based on the feature vector and the pre-trained information identification model includes: inputting the feature vector into the information identification model to obtain a probability value of the feature vector for each preset age interval; and determining, according to the probability values of the feature vector for the preset age intervals, the age recognition result corresponding to the audio to be identified.
In an optional implementation, the information identification model includes a first network and a second network, and the step of inputting the feature vector into the information identification model to obtain the probability value of the feature vector for each preset age interval includes: inputting the feature vector into the information identification model and performing feature extraction on the feature vector with the first network to obtain a first feature map; and inputting the first feature map extracted by the first network into the second network for classification to obtain the probability value of the feature vector for each preset age interval.
In an optional implementation, the first network includes a convolution layer and a statistics pooling layer, and the step of inputting the feature vector into the information identification model and performing feature extraction on the feature vector with the first network to obtain the first feature map includes: inputting the feature vector into the information identification model and performing convolution on the feature vector with the convolution kernels and kernel biases of the convolution layer to obtain a first output feature map; and performing statistics pooling on the first output feature map with the statistics pooling layer to obtain the first feature map.
In an optional implementation, the second network includes a fully connected layer and a multi-class logistic regression layer, and the step of inputting the first feature map extracted by the first network into the second network for classification to obtain the probability value of the feature vector for each preset age interval includes: inputting the first feature map extracted by the first network into the fully connected layer and reducing the dimensionality of the first feature map with the fully connected layer to obtain a first vector; and inputting the first vector into the multi-class logistic regression layer to obtain the probability value of the feature vector for each preset age interval.
In an optional implementation, the step of determining, according to the probability values of the feature vector for the preset age intervals, the age recognition result corresponding to the audio to be identified includes: taking a weighted average of the probability values of the feature vector for the preset age intervals to obtain the age recognition result corresponding to the audio to be identified.
In an optional implementation, a server pre-stores a plurality of preset age intervals and the service content corresponding to each preset age interval, and the step of determining, according to the service tag, the service information matching the service tag includes: determining, according to the age recognition result, the target age interval to which the age recognition result belongs from among the plurality of preset age intervals; and acquiring the service content corresponding to the target age interval to obtain the service information matching the service tag.
In an optional implementation, the service tag includes a gender recognition result, and the step of obtaining the service tag corresponding to the audio to be identified based on the feature vector and the pre-trained information identification model includes: inputting the feature vector into the information identification model to obtain a probability value of the feature vector for each preset gender category; and constraining the probability values of the feature vector for the preset gender categories with a pre-trained hidden Markov model to obtain the gender recognition result corresponding to the audio to be identified.
In an optional implementation, the information identification model includes a third network and a fourth network, and the step of inputting the feature vector into the information identification model to obtain the probability value of the feature vector for each preset gender category includes: inputting the feature vector into the information identification model and performing feature extraction on the feature vector with the third network to obtain a second feature map; and classifying the second feature map output by the third network with the fourth network to obtain the probability value of the feature vector for each preset gender category.
In an optional implementation, the third network includes a convolution layer and a long short-term memory layer, and the step of inputting the feature vector into the information identification model and performing feature extraction on the feature vector with the third network to obtain the second feature map includes: inputting the feature vector into the information identification model and performing convolution on the feature vector with the convolution kernels and kernel biases of the convolution layer to obtain a second output feature map; and capturing sequence information of the second output feature map with the long short-term memory layer to obtain the second feature map.
In an optional implementation, the fourth network includes a fully connected layer and a multi-class logistic regression layer, and the step of classifying the second feature map output by the third network with the fourth network to obtain the probability value of the feature vector for each preset gender category includes: inputting the second feature map output by the third network into the fully connected layer and reducing the dimensionality of the second feature map with the fully connected layer to obtain a second vector; and processing the second vector with the multi-class logistic regression layer to obtain the probability value of the feature vector for each preset gender category.
In an optional implementation, a server pre-stores a plurality of preset gender categories and the service content corresponding to each preset gender category, and the step of determining, according to the service tag, the service information matching the service tag includes: acquiring, according to the gender recognition result, the target service content corresponding to the gender recognition result from the plurality of service contents to obtain the service information matching the service tag.
In an optional implementation, the service tag includes an identity verification result, and the step of obtaining the service tag corresponding to the audio to be identified based on the feature vector and the pre-trained information identification model includes: acquiring standard audio corresponding to the audio to be identified; determining a feature vector representing acoustic features of the standard audio; inputting both the feature vector corresponding to the audio to be identified and the feature vector corresponding to the standard audio into the information identification model to obtain a similarity score between the audio to be identified and the standard audio; and determining, according to the similarity score, the identity verification result corresponding to the audio to be identified, the identity verification result including a verification success result or a verification failure result.
In an optional implementation, a server pre-stores identity verification results and the service content corresponding to each identity verification result, and the step of determining, according to the service tag, the service information matching the service tag includes: when the identity verification result is a verification success result, acquiring the service content corresponding to the verification success result as the service information matching the service tag; and when the identity verification result is a verification failure result, acquiring the service content corresponding to the verification failure result as the service information matching the service tag.
In a second aspect, an embodiment of the present application further provides an information identification apparatus. The apparatus includes: an audio acquisition module configured to acquire audio to be identified; a first execution module configured to determine a feature vector representing acoustic features of the audio to be identified; a second execution module configured to obtain, based on the feature vector and a pre-trained information identification model, a service tag corresponding to the audio to be identified; and a third execution module configured to determine, according to the service tag, service information matching the service tag.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor. When the electronic device runs, the processor communicates with the storage medium through the bus, and the processor executes the machine-readable instructions to perform the steps of the information identification method described above.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is run by a processor, it performs the steps of the information identification method described above.
In the embodiments of the present application, the audio to be identified of a user is acquired, a feature vector representing acoustic features of the audio is determined, the feature vector is identified with a pre-trained information identification model to obtain a service tag corresponding to the audio, and service information matching the service tag is finally determined according to the obtained service tag. More suitable services are thus provided without the user perceiving the process, so that the user can enjoy the services without being burdened, which improves the user experience.
In another embodiment of the present application, the determined feature vector is input into the pre-trained information identification model, the model identifies the probability value of the feature vector for each preset age interval, and the age recognition result corresponding to the audio to be identified is then determined according to the obtained probability values. That is, the service tag obtained for the audio to be identified is the age recognition result, so that service information matching the user's age can be further determined according to the age recognition result.
In another embodiment of the present application, the determined feature vector is input into the pre-trained information identification model, the model identifies the probability value of the feature vector for each preset gender category, and a pre-trained hidden Markov model is then used to constrain the obtained probability values, thereby determining a more accurate gender recognition result. That is, the service tag obtained for the audio to be identified is the gender recognition result, so that service information matching the user's gender can be further determined according to the gender recognition result.
In another embodiment of the present application, the standard audio corresponding to the audio to be identified is acquired, a feature vector representing acoustic features of the standard audio is determined, both the feature vector corresponding to the audio to be identified and the feature vector corresponding to the standard audio are input into the information identification model to obtain a similarity score between the audio to be identified and the standard audio, and the identity verification result is finally determined according to the similarity score. That is, the service tag obtained for the audio to be identified is the identity verification result, so that service information matching the user's identity can be further determined according to the identity verification result.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings only show some embodiments of the present application and therefore should not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
FIG. 1 shows an architecture diagram of an information identification system provided by an embodiment of the present application;
FIG. 2 shows a flowchart of an information identification method provided by an embodiment of the present application;
FIG. 3 is a flowchart of sub-steps of step S103 shown in FIG. 2;
FIG. 4 is a flowchart of further sub-steps of step S103 shown in FIG. 2;
FIG. 5 is a flowchart of still further sub-steps of step S103 shown in FIG. 2;
FIG. 6 shows a schematic diagram of an information identification apparatus provided by an embodiment of the present application;
FIG. 7 shows a structural block diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. It should be understood that the drawings in this application serve only the purposes of illustration and description and are not used to limit the protection scope of the present application; in addition, the schematic drawings are not drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the application. It should be understood that the operations of the flowcharts may be performed out of order, and steps without a logical dependency may be performed in reverse order or concurrently. In addition, under the guidance of the content of the present application, those skilled in the art may add one or more other operations to a flowchart or remove one or more operations from it.
In addition, the described embodiments are only some of the embodiments of the present application, not all of them. The components of the embodiments of the present application generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the application provided in the accompanying drawings is not intended to limit the scope of the application as claimed, but merely represents selected embodiments of the application. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present application.
It should be noted that the term "comprising" is used in the embodiments of the present application to indicate the existence of the features declared thereafter, without excluding the addition of other features.
FIG. 1 is a block diagram of an information identification system 100 according to some embodiments of the present application. For example, the information identification system 100 may be an online transportation service platform for transportation services such as taxi hailing, chauffeur services, express rides, carpooling, bus services, driver rental, shuttle services, and bike sharing, or any combination thereof; it may also be an online delivery service platform for delivery services such as takeout delivery, fresh-food delivery, intra-city delivery, and logistics delivery, or any combination thereof; it may further be an e-commerce service platform for shopping services such as online supermarkets, online malls, and online bookstores, or any combination thereof. The information identification system 100 may include one or more of a server 110, a network 120, a client 130, and a database 140, and the server 110 may include a processor that executes instruction operations.
In some embodiments, the server 110 may include a processor. The processor may process information and/or data related to a service request to perform one or more functions described in this application. In some embodiments, the processor may include one or more processing cores (for example, a single-core processor or a multi-core processor). By way of example only, the processor may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, or the like, or any combination thereof.
In some embodiments, the device type corresponding to the client 130 may be a mobile device, for example, a smart home device, a wearable device, a smart mobile device, a virtual reality device, or an augmented reality device; it may also be a tablet computer, a laptop computer, a built-in device in a motor vehicle, or the like.
In some embodiments, the database 140 may be connected to the network 120 to communicate with one or more components of the information identification system 100 (for example, the server 110 and the client 130). One or more components of the information identification system 100 may access data or instructions stored in the database 140 via the network 120. In some embodiments, the database 140 may be directly connected to one or more components of the information identification system 100, or the database 140 may be a part of the server 110.
For ease of understanding, the following embodiments are mainly described using the online ride-hailing application scenario as an example. For those skilled in the art, the general principles defined herein may be applied to other embodiments and application scenarios without departing from the spirit and scope of the present application. Although this application is mainly described around information identification for ride-hailing, it should be understood that this is only an exemplary embodiment.
Referring to FIG. 2, FIG. 2 shows a flowchart of an information identification method provided by an embodiment of the present application. The method may be executed by the server 110 in the information identification system 100 and includes the following steps:
Step S101: acquire the audio to be identified.
In an embodiment of the present application, the audio to be identified can be acquired in different ways depending on the application scenario of the information identification method. For example, when the information identification method is applied to an online ride-hailing scenario, the audio of telephone communication between a passenger and a driver can be obtained through the telecom operator, and the audio sent by the passenger and the driver through the ride-hailing application can also be obtained. If information identification is to be performed on the passenger, the passenger's audio can be extracted from the acquired audio to obtain the audio to be identified; if information identification is to be performed on the driver, the driver's audio can be extracted from the acquired audio to obtain the audio to be identified.
In another embodiment of the present application, the information identification model is easily disturbed by channel variation; for example, the same speaker may be identified as two different people by the information identification model under a telephone channel and a mobile wireless channel. Therefore, channel compensation can be introduced in the process of acquiring the audio to be identified to overcome this problem.
That is, the user's audio can be collected over different channels, and an independent channel model is established in advance for each channel. Based on the channel models, the audio of different channels is mapped to the same standard feature space by channel mapping, using algorithms such as maximum likelihood linear regression (MLLR), to obtain the channel-compensated audio to be identified, thereby enabling recognition of the same speaker across multiple channels. For example, the user's audio in a telephone channel and a mobile wireless channel can be collected: the audio of the telephone communication between the passenger and the driver obtained through the operator is the telephone-channel audio, and the audio sent by the passenger and the driver through the ride-hailing application is the mobile-wireless-channel audio. A large amount of statistical data can be collected in advance for the telephone channel and the mobile wireless channel to establish the channel models, and the channel models are then used to establish a mapping for each of the two channels so that the audio of the telephone channel and the mobile wireless channel is mapped to the same standard feature space, yielding the audio to be identified.
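The following is a minimal sketch of the channel-mapping idea described above, not an implementation of MLLR itself: each channel is given an affine transform into a shared reference space. The transform matrices and biases in `CHANNEL_TRANSFORMS` are hypothetical placeholders and are assumed to have been estimated offline (for example with a maximum-likelihood procedure on the per-channel statistics mentioned above).

```python
import numpy as np

# Hypothetical per-channel affine transforms, assumed to be estimated offline
# (e.g. with an MLLR-style maximum-likelihood procedure on channel statistics).
CHANNEL_TRANSFORMS = {
    "telephone":       (np.eye(40) * 0.95, np.zeros(40)),   # (A, b) placeholders
    "mobile_wireless": (np.eye(40) * 1.05, np.zeros(40)),
}

def compensate(features: np.ndarray, channel: str) -> np.ndarray:
    """Map frame-level features of shape (T, 40) from a given channel into the
    shared standard feature space via an affine transform x' = A x + b."""
    A, b = CHANNEL_TRANSFORMS[channel]
    return features @ A.T + b

# Usage: frames recorded over the telephone channel and over the in-app channel
# end up in the same feature space before being fed to the recognizer.
phone_frames = np.random.randn(300, 40)
app_frames = np.random.randn(250, 40)
aligned = np.vstack([compensate(phone_frames, "telephone"),
                     compensate(app_frames, "mobile_wireless")])
```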
Step S102: determine a feature vector representing acoustic features of the audio to be identified.
In an embodiment of the present application, the acoustic features include at least one of Mel-frequency cepstral coefficient (MFCC) features and bottleneck (BNF) features, and the feature vector representing the acoustic features of the audio to be identified may be at least one of an MFCC vector and a BNF vector.
As an implementation, the method for obtaining the MFCC vector of the audio to be identified may include the following steps. First, the audio to be identified is pre-emphasized through a high-pass filter to obtain a pre-emphasized audio signal, which boosts the high-frequency part of the audio and flattens its spectrum. Second, the pre-emphasized audio is framed and windowed: the audio is first divided into multiple frames, each frame containing N sample points, with an overlapping region between adjacent frames; each frame is then multiplied by a window function, and the values outside the window are set to 0. Third, a fast Fourier transform is applied to the framed and windowed audio to obtain its spectrum. Fourth, the transformed audio is filtered with a Mel filter bank. Fifth, a logarithmic energy computation is performed on the Mel-filtered audio. Sixth, a discrete cosine transform is applied to the log-energy output to obtain the MFCC vector of the audio to be identified.
In addition, the method for obtaining the BNF vector of the audio to be identified is similar to the above method for obtaining the MFCC vector and is not repeated here.
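A minimal NumPy/SciPy sketch of the six MFCC steps just described; the sampling rate, frame length, hop, filter count, pre-emphasis coefficient, and FFT size are illustrative assumptions rather than values taken from the text.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_mels=26, n_mfcc=13):
    # 1. Pre-emphasis with a first-order high-pass filter
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Framing with overlap, then Hamming windowing
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. FFT of each frame -> power spectrum
    mag = np.abs(np.fft.rfft(frames, n=512))
    power = (mag ** 2) / 512
    # 4. Mel filter bank (triangular filters spaced on the mel scale)
    mel_pts = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((512 + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, 257))
    for m in range(1, n_mels + 1):
        fbank[m - 1, bins[m - 1]:bins[m]] = np.linspace(0, 1, bins[m] - bins[m - 1], endpoint=False)
        fbank[m - 1, bins[m]:bins[m + 1]] = np.linspace(1, 0, bins[m + 1] - bins[m], endpoint=False)
    # 5. Log energy of the filter-bank outputs
    log_energy = np.log(power @ fbank.T + 1e-10)
    # 6. DCT to decorrelate -> one MFCC vector per frame
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_mfcc]
```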
Step S103: obtain, based on the feature vector and the pre-trained information identification model, the service tag corresponding to the audio to be identified.
In an embodiment of the present application, after the feature vector representing the acoustic features of the audio to be identified is determined, the feature vector can be input into the pre-trained information identification model, and the information identification model is used to obtain the service tag corresponding to the audio to be identified. The service tag may include at least one of an age recognition result, a gender recognition result, and an identity verification result. That is, based on the feature vector of the user's audio to be identified, the information identification model can identify the user's age and gender and can verify the user's actual identity. The process of obtaining the service tag corresponding to the audio to be identified is described in detail below.
In some embodiments, when the service tag includes the age recognition result, referring to FIG. 3, the process of obtaining the age recognition result based on the feature vector and the information identification model may include sub-steps S1031 to S1032, which are described in detail below.
Sub-step S1031: input the feature vector into the information identification model to obtain the probability value of the feature vector for each preset age interval.
In one embodiment, when the service tag includes the age recognition result, the information identification model can be used to perform age recognition on the audio to be identified. In this case, the information identification model may include a Gaussian mixture model (GMM), a neural network model, or the like, or a combination thereof; the neural network model may include a back-propagation (BP) neural network, a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory network (LSTM), and so on.
As an implementation, the information identification model may include a first network and a second network. The first network is used to perform feature extraction on the feature vector, and the second network is used to output the probability value of the feature vector for each preset age interval. The preset age intervals can be set flexibly by the user according to the actual situation, for example, three age intervals of 10-20, 20-30, and 30-40 years old, which is not limited here. In this case, the process of inputting the feature vector into the information identification model to obtain the probability value of the feature vector for each preset age interval specifically includes:
In the first step, the feature vector is input into the information identification model, and feature extraction is performed on the feature vector with the first network to obtain a first feature map.
In one embodiment, after the feature vector is input into the information identification model, the first network of the information identification model can perform feature extraction on the feature vector. The first network may consist of a convolutional layer; a convolutional layer plus a pooling layer; a convolutional layer, a pooling layer, and a dropout layer; a fully connected (FC) layer; and so on. The specific configuration of the first network can be adjusted flexibly by the user according to the actual situation and is not described in detail here.
As an implementation, the first network may include a convolution layer and a statistics pooling layer. In this case, the process of inputting the feature vector into the information identification model and performing feature extraction on the feature vector with the first network to obtain the first feature map may include:
First, the feature vector is input into the information identification model, and the feature vector is convolved with the convolution kernels and kernel biases of the convolution layer to obtain a first output feature map. The convolution layer here may have multiple layers, and the specific number of layers and kernel sizes are not limited: the convolution kernels of the convolution layer are used to perform a convolution-and-sum operation on the input feature vector, a bias is added, and the result is passed through an activation function to obtain the first output feature map.
Then, statistics pooling is performed on the first output feature map with the statistics pooling layer to obtain the first feature map. The statistics pooling layer can capture statistics across all frames of the audio to be identified, which facilitates modeling of global information. The statistics pooling layer here may have multiple layers, and the specific number of layers and receptive field sizes are not limited: the statistics pooling layer computes the mean and variance of the outputs of the nodes of the previous layer and passes them to the next layer, yielding the first feature map.
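A minimal PyTorch sketch of a first network of this kind, assuming the input is a sequence of MFCC frames; the channel sizes and kernel widths are illustrative assumptions, and the statistics pooling layer concatenates the per-channel mean and standard deviation over time as described above.

```python
import torch
import torch.nn as nn

class StatsPooling(nn.Module):
    """Concatenate the mean and standard deviation over the time axis."""
    def forward(self, x):            # x: (batch, channels, time)
        return torch.cat([x.mean(dim=2), x.std(dim=2)], dim=1)   # (batch, 2*channels)

class FirstNetwork(nn.Module):
    def __init__(self, n_mfcc=13, channels=128):
        super().__init__()
        # Frame-level convolution over the MFCC sequence
        self.conv = nn.Sequential(
            nn.Conv1d(n_mfcc, channels, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.pool = StatsPooling()

    def forward(self, mfcc):                    # mfcc: (batch, time, n_mfcc)
        x = self.conv(mfcc.transpose(1, 2))     # -> (batch, channels, time)
        return self.pool(x)                     # "first feature map", (batch, 2*channels)

# Usage: a batch of 4 utterances, 300 frames of 13-dimensional MFCCs each
first_feature_map = FirstNetwork()(torch.randn(4, 300, 13))   # -> shape (4, 256)
```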
In the second step, the first feature map extracted by the first network is input into the second network for classification to obtain the probability value of the feature vector for each preset age interval.
In one embodiment, after the first feature map is extracted by the first network, the first feature map is input into the second network for classification, and the probability value of the feature vector for each preset age interval is output. The second network may consist of a fully connected layer plus a multi-class logistic regression (softmax) layer; a 1×1 convolution layer plus a multi-class logistic regression layer; a global average pooling (GAP) layer plus a multi-class logistic regression layer; and so on. The specific configuration of the second network can be adjusted flexibly by the user according to the actual situation and is not described in detail here.
As an implementation, the second network may include a fully connected layer and a multi-class logistic regression layer. In this case, the process of inputting the first feature map extracted by the first network into the second network for classification to obtain the probability value of the feature vector for each preset age interval may include:
First, the first feature map extracted by the first network is input into the fully connected layer, and the fully connected layer is used to reduce the dimensionality of the first feature map to obtain a first vector. The first vector is a one-dimensional vector; the fully connected layer here may have multiple layers, the specific number of layers is not limited, and the fully connected layer can be used to flatten the first feature map into a one-dimensional vector.
Then, the first vector is input into the multi-class logistic regression layer to obtain the probability value of the feature vector for each preset age interval. The multi-class logistic regression layer is the final output layer of the information identification model and outputs values between 0 and 1 representing the probability that the feature vector belongs to each preset age interval; for example, it may output a probability of 0 for 10-20 years old, 0.6 for 20-30 years old, and 0.4 for 30-40 years old. Assuming the feature vector is R, the probability that the feature vector belongs to the j-th preset age interval can be obtained by the standard multi-class logistic regression (softmax) form P(y^(i) = j | R) = exp(ω_j · R + b_j) / Σ_k exp(ω_k · R + b_k), where P(y^(i)) is the probability that the feature vector belongs to the j-th preset age interval, ω is the weight, and b is the bias.
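A minimal PyTorch sketch of a second network of this kind, matching the fully connected plus multi-class logistic regression structure described above; the hidden size and the three example age intervals are illustrative assumptions, and the 256-dimensional input matches the first-network sketch given earlier.

```python
import torch
import torch.nn as nn

AGE_INTERVALS = [(10, 20), (20, 30), (30, 40)]   # example preset intervals

class SecondNetwork(nn.Module):
    def __init__(self, in_dim=256, hidden=64, n_intervals=len(AGE_INTERVALS)):
        super().__init__()
        # Fully connected layers reduce the pooled feature map to a small vector;
        # the final linear layer plus softmax realizes the multi-class logistic
        # regression P(y = j | R) = exp(w_j . R + b_j) / sum_k exp(w_k . R + b_k).
        self.fc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.logits = nn.Linear(hidden, n_intervals)

    def forward(self, first_feature_map):
        return torch.softmax(self.logits(self.fc(first_feature_map)), dim=1)

# Usage: probabilities over the preset age intervals for a batch of 4 utterances
probs = SecondNetwork()(torch.randn(4, 256))      # e.g. [[0.0, 0.6, 0.4], ...]
```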
In one embodiment, before the information identification model is used to identify the user's age, the information identification model needs to be trained. The training process of the information identification model may include the following steps:
First, a plurality of first audio samples and the age information of each first audio sample are acquired; the age information here is a specific age, for example, 27.3 years old. The plurality of first audio samples cover all the age intervals that the information identification model can identify; for example, if the information identification model is used to identify the age intervals 10-20, 20-30, and 30-40 years old, the plurality of first audio samples include audio samples corresponding to each of these intervals.
Then, the label of each first audio sample is obtained according to the age annotation information of the first audio sample and the preset age intervals. The preset age intervals may include all the age intervals that the information identification model can identify; for example, if the information identification model can identify the three age intervals 10-20, 20-30, and 30-40 years old, the preset age intervals include 10-20, 20-30, and 30-40 years old.
The label of a first audio sample is the information about the age interval to which the first audio sample belongs, that is, the label of the first audio sample is the actual age interval it falls into; for example, three first audio samples whose age annotations are 21, 22, and 23 years old all have the label 20-30 years old. In an optional embodiment, the label of each first audio sample may be obtained by manual annotation. In another optional embodiment, each first audio sample may be obtained by collecting the audio of users in a specific age interval, and the label of each first audio sample is obtained according to the user's actual age.
Next, for each acquired first audio sample, a first sample feature vector representing the acoustic features of the first audio sample is determined. The way of obtaining the first sample feature vector is similar to the way of obtaining the feature vector of the audio to be identified in step S102 and is not repeated here.
Finally, the information identification model is trained based on the first sample feature vectors and their corresponding labels. Specifically, a first sample feature vector is input into the information identification model, the information identification model identifies the first sample feature vector and outputs its probability value for each preset age interval, and the preset age interval corresponding to the maximum probability value is taken as the predicted age interval of the first sample feature vector. Since the predicted age interval identified by the information identification model should be consistent with the label corresponding to the first sample feature vector, if the predicted age interval is inconsistent with the corresponding label, the parameters of the information identification model are adjusted, the first sample feature vector is input into the parameter-adjusted information identification model, and the above process is repeated until a preset model training cutoff condition is satisfied, which completes the training of the information identification model.
The preset model training cutoff condition may include the following two cases: first, the number of training iterations reaches a preset number (for example, 200); second, the trained information identification model is tested with a test sample set and its recognition accuracy reaches a preset threshold (for example, 90%). In both cases, the information identification model after the last parameter adjustment is taken as the trained information identification model.
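A minimal sketch of such a training loop, assuming the two network sketches above, labelled first-sample feature vectors served by hypothetical data loaders, and cross-entropy as the loss that drives the parameter adjustment (the text only specifies that parameters are adjusted when the prediction disagrees with the label); the cutoff values of 200 iterations and 90% accuracy follow the examples in the text.

```python
import torch
import torch.nn as nn

def train(first_net, second_net, loader, test_loader, max_epochs=200, target_acc=0.90):
    model = nn.Sequential(first_net, second_net)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()           # penalizes predicted interval != label
    for epoch in range(max_epochs):           # cutoff 1: preset number of iterations
        for mfcc, interval_label in loader:   # interval_label: index of the age interval
            optimizer.zero_grad()
            logits = second_net.logits(second_net.fc(first_net(mfcc)))
            loss = loss_fn(logits, interval_label)
            loss.backward()
            optimizer.step()                  # parameter adjustment
        # cutoff 2: accuracy on a test sample set reaches the preset threshold
        with torch.no_grad():
            correct = total = 0
            for mfcc, interval_label in test_loader:
                pred = model(mfcc).argmax(dim=1)   # interval with maximum probability
                correct += (pred == interval_label).sum().item()
                total += interval_label.numel()
        if correct / total >= target_acc:
            break
    return model
```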
Sub-step S1032: determine the age recognition result corresponding to the audio to be identified according to the probability values of the feature vector for the preset age intervals.
In one embodiment, after the probability value of the feature vector for each preset age interval is obtained by the method described in sub-step S1031, interpolation can be performed on the probability values of the preset age intervals to obtain the age recognition result corresponding to the audio to be identified. Alternatively, a weighted average of the probability values of the feature vector for the preset age intervals can be taken to obtain the age recognition result corresponding to the audio to be identified. To simplify the calculation, the midpoint of each preset age interval can be taken and the weighted average computed; for example, if the information identification model outputs a probability of 0 for 10-20 years old, 0.6 for 20-30 years old, and 0.4 for 30-40 years old, the age recognition result is 25 + (35 - 25) × 0.4 = 29 years old.
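A small sketch of the midpoint-weighted average described above, using the same example probabilities; both formulations, the probability-weighted midpoints and the incremental form 25 + (35 - 25) × 0.4 used in the text, give 29.

```python
AGE_INTERVALS = [(10, 20), (20, 30), (30, 40)]
probs = [0.0, 0.6, 0.4]                       # model output for the example above

midpoints = [(lo + hi) / 2 for lo, hi in AGE_INTERVALS]      # [15, 25, 35]
age = sum(p * m for p, m in zip(probs, midpoints))           # 0*15 + 0.6*25 + 0.4*35 = 29.0
print(f"age recognition result: {age:.0f} years old")
```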
In other embodiments, when the service tag includes the gender recognition result, referring to FIG. 4, the process of obtaining the gender recognition result based on the feature vector and the information identification model may include sub-steps S1033 to S1034, which are described in detail below.
Sub-step S1033: input the feature vector into the information identification model to obtain the probability value of the feature vector for each preset gender category.
In one embodiment, when the service tag includes the gender recognition result, the information identification model can be used to perform gender recognition on the audio to be identified. In this case, the information identification model may include a GMM, a neural network model, or the like, or a combination thereof; the neural network model may include a BP neural network, a CNN, an RNN, an LSTM, and so on.
As an implementation, the information identification model includes a third network and a fourth network. The third network is used to perform feature extraction on the feature vector, and the fourth network is used to output the probability value of the feature vector for each preset gender category. In this embodiment, three gender categories may be preset: male, female, and uncertain, where "uncertain" refers to recognition errors caused by noise interference. In this case, the process of inputting the feature vector into the information identification model to obtain the probability value of the feature vector for each preset gender category may include:
In the first step, the feature vector is input into the information identification model, and feature extraction is performed on the feature vector with the third network to obtain a second feature map.
In one embodiment, after the feature vector is input into the information identification model, the third network of the information identification model can perform feature extraction on the feature vector. The third network may consist of a convolutional layer; a convolutional layer plus a pooling layer; a convolutional layer, a pooling layer, and a dropout layer; a fully connected layer; and so on. The specific configuration of the third network can be adjusted flexibly by the user according to the actual situation and is not described in detail here.
As an implementation, the third network includes a convolution layer and a long short-term memory layer. In this case, the process of inputting the feature vector into the information identification model and performing feature extraction on the feature vector with the third network to obtain the second feature map may include:
First, the feature vector is input into the information identification model, and the feature vector is convolved with the convolution kernels and kernel biases of the convolution layer to obtain a second output feature map. The convolution layer here may have multiple layers, and the specific number of layers and kernel sizes are not limited.
Then, the long short-term memory layer is used to capture the sequence information of the second output feature map to obtain the second feature map. Compared with the local features of traditional models, the sequence information captured by the long short-term memory layer can reflect the association between the audio to be identified and gender over a period of time.
In the second step, the fourth network classifies the second feature map output by the third network to obtain the probability value of the feature vector for each preset gender category.
In one embodiment, after the second feature map is extracted by the third network, the second feature map is fed into the fourth network for classification, which outputs the probability value of the feature vector for each preset gender category. The fourth network may include a fully connected layer plus a multi-class logistic regression layer, a 1×1 convolutional layer plus a multi-class logistic regression layer, a global pooling layer plus a multi-class logistic regression layer, and so on. The specific configuration of the fourth network can be flexibly adjusted by the user according to the actual situation, and is not repeated here.
As an implementation manner, the fourth network includes a fully connected layer and a multi-class logistic regression layer. In this case, the step of using the fourth network to classify the second feature map output by the third network to obtain the probability value of the feature vector for each preset gender category includes the following.
First, the second feature map output by the third network is input into the fully connected layer, which reduces the dimensionality of the second feature map to obtain a second vector; the second vector is a one-dimensional vector. The fully connected layer here may consist of multiple layers, and the specific number of fully connected layers is not limited; the fully connected layer can be used to flatten the second feature map into a one-dimensional vector.
Then, the multi-class logistic regression layer processes the second vector to obtain the probability value of the feature vector for each preset gender category; for example, the output probability is 0.1 for male, 0.1 for female, and 0.8 for uncertain.
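As a non-binding illustration of this third-network/fourth-network arrangement, the PyTorch sketch below stacks a convolutional layer, an LSTM layer, a fully connected layer, and a softmax (multi-class logistic regression) over the three categories male, female, and uncertain. The layer sizes, kernel size, and 40-dimensional input features are assumptions chosen for readability; the application itself does not fix them.

```python
import torch
import torch.nn as nn

class GenderNet(nn.Module):
    """Sketch of the 'third network' (conv + LSTM) and 'fourth network' (FC + softmax)."""
    def __init__(self, feat_dim=40, hidden=64, num_classes=3):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, hidden, kernel_size=5, padding=2)  # convolution over frames
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)              # captures sequence information
        self.fc = nn.Linear(hidden, num_classes)                           # maps to one score per category

    def forward(self, x):                              # x: (batch, frames, feat_dim)
        y = torch.relu(self.conv(x.transpose(1, 2)))   # -> (batch, hidden, frames)
        y, _ = self.lstm(y.transpose(1, 2))            # -> (batch, frames, hidden)
        y = y[:, -1, :]                                # last time step summarises the clip
        return torch.softmax(self.fc(y), dim=-1)       # probability per gender category

probs = GenderNet()(torch.randn(1, 200, 40))           # shape (1, 3); the three values sum to 1
```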
In one embodiment, before the information recognition model is used to determine the probability value of the feature vector for each preset gender category, the information recognition model needs to be trained. The training process may include the following steps.
First, a plurality of second audio samples and the gender label of each second audio sample are acquired. The second audio samples include both male and female audio samples, and the gender labels include male and female. In an optional embodiment, the gender label of each second audio sample may be obtained by manual annotation. In another optional embodiment, each second audio sample may be obtained by collecting audio from users of a known gender.
Then, for each acquired second audio sample, a second sample feature vector representing the acoustic features of that second audio sample is determined. The second sample feature vector is obtained in a manner similar to that of the feature vector of the audio to be recognized in step S102, and is not repeated here.
Finally, the information recognition model is trained based on the second sample feature vectors and their gender labels. Specifically, a second sample feature vector is input into the information recognition model, which identifies the second sample feature vector and outputs its probability value for each preset gender category; the gender category with the maximum probability value is taken as the predicted gender of the second sample feature vector. Since the predicted gender should be consistent with the gender label of the second sample feature vector, if the predicted gender is inconsistent with the gender label, the parameters of the information recognition model are adjusted and the second sample feature vector is input into the adjusted model again. This process is repeated until a preset training cutoff condition is met, which completes the training of the information recognition model.
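A minimal training-loop sketch for such a model is shown below. It assumes the labelled second audio samples have already been converted into (feature, label) batches by a data loader, and that labels 0, 1, 2 stand for male, female, and uncertain; the optimiser, learning rate, and fixed epoch count are placeholder choices rather than details taken from the application.

```python
import torch
import torch.nn as nn

def train_gender_model(model, loader, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    nll = nn.NLLLoss()                                   # penalises wrong gender predictions
    for _ in range(epochs):                              # stand-in for the training cutoff condition
        for feats, labels in loader:                     # feats: (batch, frames, dim); labels in {0, 1, 2}
            log_probs = torch.log(model(feats) + 1e-9)   # model outputs per-class probabilities
            loss = nll(log_probs, labels)
            opt.zero_grad()
            loss.backward()                              # adjust parameters where prediction and label disagree
            opt.step()
```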
In addition, in practical applications, the information recognition model can also be optimized by obtaining the user's actual gender, for example by surveying the user or performing real-name authentication to learn the user's gender, so as to correct recognition errors of the information recognition model.
Sub-step S1034: use a pre-trained hidden Markov model to constrain the probability values of the feature vector for each preset gender category, and obtain the gender recognition result corresponding to the audio to be recognized.
In one embodiment, after the probability values of the feature vector for each preset gender category are obtained by the method introduced in sub-step S1033, the preset gender category with the maximum probability value is taken as the predicted gender of the feature vector, and a pre-trained hidden Markov model (HMM) adds constraints in the time domain to limit jumps between the preset gender categories, that is, to constrain the predicted gender from jumping among the three categories male, female, and uncertain. Specifically, limiting category jumps can be achieved by setting the hidden-state transition probabilities of the HMM, that is, by increasing the probability of a state transitioning to itself and decreasing the probability of transitioning to other states. For example, if the information recognition model outputs a male probability of 0.1, a female probability of 0.1, and an uncertain probability of 0.8, the self-transition probabilities of the male, female, and uncertain categories may be set to 0.9, 0.8, and 0.6, respectively, and their probabilities of transitioning to other states to 0.1, 0.2, and 0.4, respectively.
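The sketch below illustrates one way such a temporal constraint could work: a small Viterbi pass in which each category's self-transition probability is high, so the decoded label is reluctant to jump between male, female, and uncertain from frame to frame. The self-transition values mirror the example figures above (0.9, 0.8, 0.6), but both the values and the NumPy implementation are illustrative assumptions, not the application's prescribed procedure.

```python
import numpy as np

def smooth_gender_labels(frame_probs, self_p=(0.9, 0.8, 0.6)):
    """frame_probs: (n_frames, 3) per-frame probabilities for male / female / uncertain."""
    n_frames, n_states = frame_probs.shape
    trans = np.zeros((n_states, n_states))
    for i, p in enumerate(self_p):
        trans[i] = (1.0 - p) / (n_states - 1)     # spread the remaining mass over the other states
        trans[i, i] = p                           # favour staying in the same state
    log_t = np.log(trans)
    score = np.log(frame_probs[0] + 1e-12)
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + log_t             # previous state -> current state
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + np.log(frame_probs[t] + 1e-12)
    path = [int(score.argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]                             # smoothed category index per frame
```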
In other embodiments, when the service tag is an identity verification result, referring to FIG. 5, the process of obtaining the identity verification result based on the feature vector and the information recognition model may include sub-steps S1035 to S1038, which are described in detail below.
Sub-step S1035: obtain the standard audio corresponding to the audio to be recognized.
In one embodiment, the database 140 of the information recognition system 100 stores in advance the standard audio of every user to be authenticated. After the audio to be recognized of the target user to be authenticated is acquired, the standard audio corresponding to that user's identity to be verified is obtained from the database 140.
Depending on the actual identity of the user, the standard audio stored in the database 140 may be obtained in different ways. For example, for drivers and passengers in a ride-hailing scenario, audio can be collected when the driver or passenger registers, and the registration audio is used as that driver's or passenger's standard audio. In addition, the standard audio may include the driver's or passenger's audio over different channels (for example, a telephone channel or a mobile wireless channel). A driver voiceprint library and a passenger voiceprint library may also be established in advance to store the standard audio of drivers and passengers, respectively.
Sub-step S1036: determine a feature vector representing the acoustic features of the standard audio.
The feature vector of the standard audio is obtained in a manner similar to that of the feature vector of the audio to be recognized in step S102, and is not repeated here.
Sub-step S1037: input both the feature vector corresponding to the audio to be recognized and the feature vector corresponding to the standard audio into the information recognition model to obtain a similarity score between the audio to be recognized and the standard audio.
In one embodiment, when the service tag includes an identity verification result, the information recognition model can be used to authenticate the audio to be recognized. In this case, the information recognition model may include a GMM, a probabilistic linear discriminant analysis (PLDA) model, or a neural network model. When the GMM, the PLDA model, or the neural network model is used to calculate the similarity score between the audio to be recognized and the standard audio, what is compared is the weighted average of the feature vectors corresponding to the audio to be recognized and the standard audio rather than individual feature vectors, which makes the comparison result more accurate.
The similarity score may be any one of a Euclidean distance, a Mahalanobis distance, a cosine similarity, a Hamming distance, a correlation coefficient, a correlation distance, and an information entropy.
Sub-step S1038: determine the identity verification result corresponding to the audio to be recognized according to the similarity score, where the identity verification result includes a verification pass result or a verification failure result.
In one embodiment, the identity verification result corresponding to the audio to be recognized may be determined according to the relationship between the similarity score and a preset similarity threshold. Specifically, the similarity score is compared with the preset similarity threshold (for example, 0.8): when the similarity score is greater than or equal to the threshold, the identity verification result corresponding to the audio to be recognized is determined to be a verification pass; when the similarity score is less than the threshold, the identity verification result is determined to be a verification failure.
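As a hedged sketch of this decision rule, the snippet below scores two sets of embeddings with cosine similarity and applies the example threshold of 0.8; the actual scoring model in the application may instead be a GMM, a PLDA model, or a neural network, and the plain mean used here stands in for the weighted average mentioned above.

```python
import numpy as np

def verify(query_vecs, enrolled_vecs, threshold=0.8):
    q = np.mean(query_vecs, axis=0)               # average embedding of the audio to be recognized
    e = np.mean(enrolled_vecs, axis=0)            # average embedding of the stored standard audio
    score = float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e) + 1e-12))
    return ("pass" if score >= threshold else "fail"), score
```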
In a ride-hailing scenario, when the driver's identity needs to be verified, the driver's audio to be recognized and standard audio are obtained, and the feature vector of the audio to be recognized is compared with the feature vector of the standard audio to verify the driver's identity, that is, to verify whether the driver actually using the ride-hailing service and the registered driver are the same person. When the passenger's identity needs to be verified, the passenger's audio to be recognized and the standard audio in the passenger voiceprint library are obtained, and the feature vector of the audio to be recognized is compared with the feature vector of the standard audio to confirm the passenger's identity, that is, to confirm the identity of the passenger actually using the ride-hailing service.
It should be pointed out that this application uses the information recognition model to obtain the service tag corresponding to the audio to be recognized, and the service tag may be one of an age recognition result, a gender recognition result, and an identity verification result, or any combination of the three. In other words, based on the feature vector of the user's audio to be recognized, the information recognition model can be used to perform at least one of age recognition, gender recognition, and identity verification on the user; in practical applications, the structure of the information recognition model can be flexibly adjusted according to requirements to obtain the final service tag. For example, in a ride-hailing scenario, age recognition, gender recognition, and identity verification can be performed on the driver at the same time, and the obtained age recognition result, gender recognition result, and identity verification result can be compared with the driver's real-name registration information to determine whether the driver actually using the ride-hailing service and the registered driver are the same person, thereby obtaining a more accurate result.
Step S104: determine, according to the service tag, the service information matching the service tag.
In one embodiment, after the service tag corresponding to the audio to be recognized is obtained by the method introduced in step S103, that is, after the information recognition model has identified the user's age and gender and the user's actual identity has been verified, the corresponding service can be matched to the user according to the service tag.
In some embodiments, when the service tag includes an age recognition result, the server 110 may store in advance a plurality of preset age intervals and the service content corresponding to each preset age interval. For example, in a ride-hailing scenario, the server 110 may store in advance the age intervals 10-20, 20-30, 30-40, 40-50, and over 50, together with the drivers corresponding to each age interval; for instance, ages 20-30 may correspond to well-reviewed drivers, and ages over 50 may correspond to drivers who drive steadily.
In this case, the process of determining the service information matching the service tag according to the service tag may include: first, according to the age recognition result, determining the target age interval to which the age recognition result belongs from the plurality of preset age intervals (for example, if the age recognition result is 29 years old, the target age interval is 20-30); then, acquiring the service content corresponding to the target age interval to obtain the service information matching the service tag (for example, if the target age interval is 20-30, one of the drivers corresponding to that interval is randomly selected for dispatching the order).
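A toy illustration of this matching step is given below; the age intervals and driver pools are made up for the example, since in practice they would be read from the configuration stored on the server 110.

```python
import random

SERVICE_BY_AGE = {                 # hypothetical driver pools per preset age interval
    (20, 30): ["well_reviewed_driver_1", "well_reviewed_driver_2"],
    (50, 120): ["steady_driver_1", "steady_driver_2"],
}

def dispatch_for_age(age):
    for (lo, hi), drivers in SERVICE_BY_AGE.items():
        if lo <= age < hi:
            return random.choice(drivers)   # e.g. age 29 -> one of the well-reviewed drivers
    return None                             # no preset interval matched
```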
Age recognition plays an important role in certain scenarios. For example, in a ride-hailing scenario, since there is no real-name authentication on the passenger side, the passenger's age usually cannot be known. In this case, the information recognition method introduced in the embodiments of this application can be used to identify the passenger's age and allocate orders accordingly; for instance, allocating young passengers' orders to well-reviewed drivers can effectively prevent accidents during the ride. As another example, in an online shopping scenario, users of different age groups may have different consumption habits and needs, so an age profile can be obtained by identifying the user's age, and product recommendations can then be made to the user based on the age profile.
In other embodiments, when the service tag includes a gender recognition result, the server 110 may store in advance a plurality of preset gender categories and the service content corresponding to each preset gender category. For example, in a ride-hailing scenario, the server 110 may store in advance the two gender categories, male and female, together with the drivers corresponding to each gender category; for instance, female passengers may correspond to female drivers and male passengers to male drivers.
In this case, the process of determining the service information matching the service tag according to the service tag may include: according to the gender recognition result, acquiring the target service content corresponding to the gender recognition result from the plurality of service contents to obtain the service information matching the service tag; for example, if the gender recognition result is female, one of the female drivers is randomly selected for dispatching the order.
Gender recognition plays an important role in certain scenarios. For example, in a ride-hailing scenario, since there is no real-name authentication on the passenger side, the passenger's gender usually cannot be known. In this case, the information recognition method introduced in the embodiments of this application can be used to identify the passenger's gender and allocate orders accordingly; for instance, allocating female passengers' orders to female drivers can effectively prevent accidents during the ride. As another example, in an online shopping scenario, the user's gender can be identified and personalized recommendations can be made to the user based on gender.
In other embodiments, when the service tag includes an identity verification result, the server 110 may store in advance the identity verification results and the service content corresponding to each identity verification result. For example, in a ride-hailing scenario, for a driver, a verification pass result corresponds to dispatching orders normally and a verification failure result corresponds to stopping order dispatch; for a passenger, a verification pass result corresponds to generating an order and a verification failure result corresponds to stopping order generation.
In this case, the process of determining the service information matching the service tag according to the service tag may include: when the identity verification result is a verification pass result, acquiring the service content corresponding to the verification pass result as the service information matching the service tag; and when the identity verification result is a verification failure result, acquiring the service content corresponding to the verification failure result as the service information matching the service tag.
Identity verification plays an important role in certain scenarios. For example, with the rapid development of the Internet, ride-hailing has come to play an important role in people's travel; however, along with its convenience come many potential safety hazards. In particular, the user (driver or passenger) actually using the ride-hailing service may not be the same person as the registered user, and may even be someone with a record of driving accidents or a criminal record. Therefore, the information recognition method introduced in the embodiments of this application can be used to verify the identity of the driver or passenger and to allocate orders according to the verification result, for example by stopping order allocation when identity verification fails. At the same time, the system can connect to the service platform of the public security authorities to assist them in law enforcement.
Based on the same inventive concept, an embodiment of this application further provides an information recognition apparatus 300 corresponding to the information recognition method. Since the principle by which the apparatus solves the problem is similar to that of the above information recognition method, the implementation of the apparatus can refer to the implementation of the method, and repeated descriptions are omitted.
Referring to FIG. 6, which shows a schematic diagram of the information recognition apparatus 300 provided by an embodiment of this application, the information recognition apparatus 300 includes an audio acquisition module 301, a first execution module 302, a second execution module 303, and a third execution module 304.
The audio acquisition module 301 is configured to acquire the audio to be recognized.
The first execution module 302 is configured to determine a feature vector representing the acoustic features of the audio to be recognized.
In an optional implementation, the acoustic features include at least one of Mel-frequency cepstral coefficient (MFCC) features and bottleneck feature (BNF) features.
The second execution module 303 is configured to obtain the service tag corresponding to the audio to be recognized based on the feature vector and a pre-trained information recognition model.
In an optional implementation, the service tag includes an age recognition result; the second execution module 303 is specifically configured to: input the feature vector into the information recognition model to obtain the probability value of the feature vector in each preset age interval; and determine the age recognition result corresponding to the audio to be recognized according to the probability values of the feature vector in the preset age intervals.
In an optional implementation, the information recognition model includes a first network and a second network; the manner in which the second execution module 303 inputs the feature vector into the information recognition model to obtain the probability value of the feature vector in each preset age interval includes: inputting the feature vector into the information recognition model and using the first network to perform feature extraction on the feature vector to obtain a first feature map; and inputting the first feature map extracted by the first network into the second network for classification to obtain the probability value of the feature vector in each preset age interval.
In an optional implementation, the first network includes a convolutional layer and a statistical pooling layer; the manner in which the second execution module 303 inputs the feature vector into the information recognition model and uses the first network to perform feature extraction on the feature vector to obtain the first feature map includes: inputting the feature vector into the information recognition model and convolving the feature vector with the convolution kernels and kernel biases of the convolutional layer to obtain a first output feature map; and performing statistical pooling on the first output feature map using the statistical pooling layer to obtain the first feature map.
In an optional implementation, the second network includes a fully connected layer and a multi-class logistic regression layer; the manner in which the second execution module 303 inputs the first feature map extracted by the first network into the second network for classification to obtain the probability value of the feature vector in each preset age interval includes: inputting the first feature map extracted by the first network into the fully connected layer and using the fully connected layer to reduce the dimensionality of the first feature map to obtain a first vector; and inputting the first vector into the multi-class logistic regression layer to obtain the probability value of the feature vector in each preset age interval.
In an optional implementation, the manner in which the second execution module 303 determines the age recognition result corresponding to the audio to be recognized according to the probability values of the feature vector in the preset age intervals includes: taking a weighted average of the probability values of the feature vector over the preset age intervals to obtain the age recognition result corresponding to the audio to be recognized.
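Purely as an illustration of such a weighted average, the lines below turn per-interval probabilities into a single age estimate using interval midpoints; the intervals and midpoints are assumptions, since the application does not fix how the weighting is performed.

```python
AGE_INTERVALS = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 70)]   # assumed preset intervals

def age_from_probs(probs):
    midpoints = [(lo + hi) / 2 for lo, hi in AGE_INTERVALS]
    return sum(p * m for p, m in zip(probs, midpoints))   # probability-weighted age estimate

print(age_from_probs([0.05, 0.70, 0.15, 0.05, 0.05]))     # -> 28.75
```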
In an optional implementation, the service tag includes a gender recognition result; the manner in which the second execution module 303 obtains the service tag corresponding to the audio to be recognized based on the feature vector and the pre-trained information recognition model includes: inputting the feature vector into the information recognition model to obtain the probability value of the feature vector for each preset gender category; and using a pre-trained hidden Markov model to constrain the probability values of the feature vector for each preset gender category to obtain the gender recognition result corresponding to the audio to be recognized.
In an optional implementation, the information recognition model includes a third network and a fourth network; the manner in which the second execution module 303 inputs the feature vector into the information recognition model to obtain the probability value of the feature vector for each preset gender category includes: inputting the feature vector into the information recognition model and using the third network to perform feature extraction on the feature vector to obtain a second feature map; and using the fourth network to classify the second feature map output by the third network to obtain the probability value of the feature vector for each preset gender category.
In an optional implementation, the third network includes a convolutional layer and a long short-term memory layer; the manner in which the second execution module 303 inputs the feature vector into the information recognition model and uses the third network to perform feature extraction on the feature vector to obtain the second feature map includes: inputting the feature vector into the information recognition model and convolving the feature vector with the convolution kernels and kernel biases of the convolutional layer to obtain a second output feature map; and using the long short-term memory layer to capture the sequence information of the second output feature map to obtain the second feature map.
In an optional implementation, the fourth network includes a fully connected layer and a multi-class logistic regression layer; the manner in which the second execution module 303 uses the fourth network to classify the second feature map output by the third network to obtain the probability value of the feature vector for each preset gender category includes: inputting the second feature map output by the third network into the fully connected layer and using the fully connected layer to reduce the dimensionality of the second feature map to obtain a second vector; and using the multi-class logistic regression layer to process the second vector to obtain the probability value of the feature vector for each preset gender category.
In an optional implementation, the service tag includes an identity verification result; the manner in which the second execution module 303 obtains the service tag corresponding to the audio to be recognized based on the feature vector and the pre-trained information recognition model includes: obtaining the standard audio corresponding to the audio to be recognized; determining a feature vector representing the acoustic features of the standard audio; inputting both the feature vector corresponding to the audio to be recognized and the feature vector corresponding to the standard audio into the information recognition model to obtain a similarity score between the audio to be recognized and the standard audio; and determining the identity verification result corresponding to the audio to be recognized according to the similarity score, where the identity verification result includes a verification pass result or a verification failure result.
The third execution module 304 is configured to determine, according to the service tag, the service information matching the service tag.
In an optional implementation, the server 110 stores in advance a plurality of preset age intervals and the service content corresponding to each preset age interval; the manner in which the third execution module 304 determines the service information matching the service tag according to the service tag includes: determining, according to the age recognition result, the target age interval to which the age recognition result belongs from the plurality of preset age intervals; and acquiring the service content corresponding to the target age interval to obtain the service information matching the service tag.
In an optional implementation, the server 110 stores in advance a plurality of preset gender categories and the service content corresponding to each preset gender category; the manner in which the third execution module 304 determines the service information matching the service tag according to the service tag includes: acquiring, according to the gender recognition result, the target service content corresponding to the gender recognition result from the plurality of service contents to obtain the service information matching the service tag.
In an optional implementation, the server 110 stores in advance the identity verification results and the service content corresponding to each identity verification result; the manner in which the third execution module 304 determines the service information matching the service tag according to the service tag includes: when the identity verification result is a verification pass result, acquiring the service content corresponding to the verification pass result as the service information matching the service tag; and when the identity verification result is a verification failure result, acquiring the service content corresponding to the verification failure result as the service information matching the service tag.
An embodiment of this application further provides an electronic device 60. As shown in FIG. 7, which is a schematic structural diagram of the electronic device 60 provided by an embodiment of this application, the electronic device 60 includes a processor 61, a memory 62, and a bus 63. The memory 62 stores machine-readable instructions executable by the processor 61 (for example, the execution instructions corresponding to the audio acquisition module 301, the first execution module 302, the second execution module 303, and the third execution module 304 in the apparatus of FIG. 6). When the electronic device 60 is running, the processor 61 and the memory 62 communicate through the bus 63, and when the machine-readable instructions are run by the processor 61, the steps of the above information recognition method are executed.
An embodiment of this application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the above information recognition method are executed. Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk; when the computer program on the storage medium is run, the above information recognition method can be executed, thereby providing more suitable services without the user's awareness and improving the user experience.
Those skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding process in the method embodiments for the specific working process of the system and apparatus described above, which is not repeated in this application. In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the modules is only a division by logical function, and there may be other divisions in actual implementation. As another example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The above are only specific embodiments of this application, but the protection scope of this application is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (17)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910277902.2A CN111798857A (en) | 2019-04-08 | 2019-04-08 | Information identification method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111798857A true CN111798857A (en) | 2020-10-20 |
Family
ID=72805657
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910277902.2A Pending CN111798857A (en) | 2019-04-08 | 2019-04-08 | Information identification method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111798857A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112163571A (en) * | 2020-10-29 | 2021-01-01 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for identifying attribute of electronic equipment user |
CN112651372A (en) * | 2020-12-31 | 2021-04-13 | 北京眼神智能科技有限公司 | Age judgment method and device based on face image, electronic equipment and storage medium |
CN113987258A (en) * | 2021-11-10 | 2022-01-28 | 北京有竹居网络技术有限公司 | Audio identification method and device, readable medium and electronic equipment |
CN113987258B (en) * | 2021-11-10 | 2025-10-03 | 北京有竹居网络技术有限公司 | Audio recognition method, device, readable medium and electronic device |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20201020 |