
WO2018137426A1 - Method and apparatus for recognizing user voice information - Google Patents

Method and apparatus for recognizing user voice information

Info

Publication number
WO2018137426A1
WO2018137426A1 (PCT/CN2017/115677)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
voice
language
user
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/115677
Other languages
English (en)
Chinese (zh)
Inventor
袁文华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Publication of WO2018137426A1 publication Critical patent/WO2018137426A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present disclosure relates to the field of communications, and in particular to a method and apparatus for identifying user voice information.
  • Identity recognition systems include face recognition, fingerprint recognition, voiceprint recognition, password recognition, and passcode recognition.
  • Passcode identification means that the user inputs or speaks one or more words, phrases, or sentences as the key for verification. However, this identification technique can only perform simple identification, and the key is easily stolen or leaked.
  • Voiceprint is a sound wave spectrum that carries speech information displayed by electroacoustic instruments.
  • The production of human speech is a complex physiological and physical process involving the human language centers and the vocal organs.
  • The vocal organs used when speaking, for example, the tongue, teeth, larynx, lungs, and nasal cavity, vary greatly in size and shape from person to person, so the voiceprints of any two people differ.
  • voiceprint recognition technology is more secure than password recognition technology.
  • However, using voiceprint recognition technology alone to identify a user still carries a security risk, because vocal features can be imitated by certain techniques: for example, an attacker may illegally record the user's speech and use the pirated voice information to imitate the user's voice, so that the user's identity information is stolen and the user suffers economic losses.
  • Embodiments of the present disclosure provide a method and apparatus for identifying user voice information.
  • According to an embodiment of the present disclosure, a method for identifying user voice information includes: acquiring the sound information of the user; extracting a first language feature and a voice feature of the sound information; searching for a preset second language feature corresponding to the voice feature; and determining whether the sound information is legal according to a first comparison result of the first language feature and the second language feature.
  • Optionally, determining whether the sound information is legal according to the first comparison result comprises: determining, according to the first comparison result, a vector similarity between the first language feature and the second language feature; and determining whether the sound information is legal according to a comparison between the vector similarity and a preset threshold, wherein the sound information is determined to be legal when the comparison result indicates that the vector similarity is greater than or equal to the preset threshold.
  • the preset threshold is an average of vector similarities between a plurality of the second language features.
  • Optionally, searching for the preset second language feature corresponding to the voice feature comprises: searching for user identification information corresponding to the voice feature; and acquiring the second language feature according to the found user identification information.
  • Optionally, the correspondence between the voice feature and the user identification information is determined by taking pre-entered voice features and user identification information as inputs to a neural network model and training that model to learn the correspondence.
  • Optionally, the first language feature and the second language feature are Mel-Frequency Cepstral Coefficients (MFCC), and the voice feature is a Linear Prediction Cepstrum Coefficient (LPCC) feature.
  • Optionally, the voice feature comprises a voiceprint feature, and the first language feature and the second language feature comprise the linguistic content of a user password.
  • According to another embodiment of the present disclosure, an apparatus for identifying user voice information includes: an obtaining module configured to acquire the sound information of a user; an extracting module configured to extract the first language feature and the voice feature of the sound information; a searching module configured to find the preset second language feature corresponding to the voice feature; and a determining module configured to determine whether the sound information is legal according to the first comparison result of the first language feature and the second language feature.
  • Optionally, the determining module is further configured to: determine a vector similarity between the first language feature and the second language feature according to the first comparison result; and determine whether the sound information is legal according to the comparison between the vector similarity and the preset threshold, wherein the sound information is determined to be legal when the comparison result indicates that the vector similarity is greater than or equal to the preset threshold.
  • the lookup module is further configured to look up user identification information corresponding to the voice feature; and acquire the second language feature according to the found user identification information.
  • a storage medium is also provided.
  • The storage medium is configured to store program code for performing the following steps: acquiring the sound information of the user; extracting the first language feature and the voice feature of the sound information; searching for the preset second language feature corresponding to the voice feature; and determining whether the sound information is legal according to the first comparison result of the first language feature and the second language feature.
  • Optionally, the storage medium is further configured to store program code for performing the following steps: determining a vector similarity between the first language feature and the second language feature according to the first comparison result; and determining whether the sound information is legal according to the comparison between the vector similarity and the preset threshold, wherein the sound information is determined to be legal when the comparison result indicates that the vector similarity is greater than or equal to the preset threshold.
  • Optionally, the storage medium is further configured to store program code for performing the following steps: searching for the user identification information corresponding to the voice feature; and acquiring the second language feature based on the found user identification information.
  • Optionally, the storage medium is further configured to store program code for training a neural network model with the pre-entered voice features and user identification information as its inputs, so as to obtain the correspondence between them.
  • In the embodiments of the present disclosure, the language features and voice features of the sound information are acquired and extracted, the corresponding language feature is looked up according to the voice feature, and whether the sound information is legal is determined according to the comparison result between the language features, thereby implementing identity recognition. Because the language features and voice features of the user's sound information are considered together, the security of identity recognition is improved.
  • FIG. 1 is a block diagram showing the hardware structure of a computer terminal for identifying a user voice information according to an embodiment of the present disclosure
  • FIG. 2 is a flowchart of a method of identifying user sound information according to an embodiment of the present disclosure
  • FIG. 3 is a structural block diagram of an apparatus for identifying user sound information according to an exemplary embodiment of the present disclosure
  • FIG. 4 is a flowchart (1) of a method of identifying user sound information according to an exemplary embodiment of the present disclosure
  • FIG. 5 is a flowchart (2) of a method of identifying user sound information according to an exemplary embodiment of the present disclosure
  • FIG. 6 is a structural block diagram of an apparatus for identifying user sound information according to an embodiment of the present disclosure.
  • Vector similarity is the similarity between language feature vectors, usually expressed by the distance between the vectors, such as the Euclidean distance or the cosine distance.
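As a hedged illustration (not part of the patent's disclosure), the two measures named above can be computed as follows, assuming the language features are plain numeric vectors:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors: 1.0 means identical direction."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors: 0.0 means identical vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.linalg.norm(u - v))
```

For identical vectors the cosine similarity is 1.0 and the Euclidean distance is 0.0; note that cosine similarity grows with closeness while Euclidean distance shrinks, so a threshold test must pick one convention.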
  • FIG. 1 is a hardware structural block diagram of a computer terminal for identifying a user voice information according to an embodiment of the present disclosure.
  • The computer terminal 10 may include one or more processors 102 (only one is shown; the processor 102 may include, but is not limited to, a processing device such as a microcontroller (MCU) or a programmable logic device such as an FPGA),
  • a memory 104 configured to store data
  • and a transmission device 106 for communication functions. It will be understood by those skilled in the art that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the above electronic device.
  • computer terminal 10 may also include more or fewer components than those shown in FIG. 1, or have a different configuration than that shown in FIG.
  • The memory 104 can be configured to store software programs and modules of application software, such as the program instructions/modules corresponding to the method for identifying user sound information in the embodiments of the present disclosure. The processor 102 runs the software programs and modules stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above method.
  • Memory 104 may include high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • memory 104 may further include memory remotely located relative to processor 102, which may be coupled to computer terminal 10 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • Transmission device 106 is configured to receive or transmit data via a network.
  • Specific examples of the above network may include a wireless network provided by a communication provider of the computer terminal 10.
  • the transmission device 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station to communicate with the Internet.
  • the transmission device 106 can be a Radio Frequency (RF) module configured to communicate with the Internet wirelessly.
  • FIG. 2 is a flowchart of a method for identifying user voice information according to an embodiment of the present disclosure. As shown in FIG. 2, the process includes the following steps:
  • Step S202: acquire the sound information of the user;
  • Step S204: extract the first language feature and the voice feature of the sound information;
  • Step S206: search for the preset second language feature corresponding to the voice feature;
  • Step S208: determine whether the sound information is legal according to the first comparison result of the first language feature and the second language feature.
  • The sound information includes a language feature and a voice feature, where the language feature is the specific content of the user password and the voice feature is a voiceprint feature extracted from the sound information. For example, when the user speaks the password "turn on the light", the words "turn on the light" are the language feature of the user's sound information, and the characteristics of the sound produced by the user's vocal organs are the voice feature.
  • the execution body of the above steps may be a data processing device such as a single chip microcomputer, but is not limited thereto.
  • Optionally, step S208 may be performed by: determining a vector similarity between the first language feature and the second language feature according to the first comparison result; and determining whether the sound information is legal according to the comparison between the vector similarity and the preset threshold, wherein the sound information is determined to be legal when the comparison result indicates that the vector similarity is greater than or equal to the preset threshold.
  • The second language feature may comprise a plurality of language features. When the first language feature is compared with the second language features, the vector similarity between the first language feature and each second language feature may be determined in turn, and the average of those vector similarities may then be taken, but this is not limiting. Because this example judges whether the sound information is legal by comparing a vector similarity with a threshold, it makes a fast and accurate judgment of the user's identity and improves the accuracy of identifying the user according to the language feature.
  • the predetermined threshold is an average of vector similarities between the plurality of second language features.
  • For example, the language feature of the user may be collected repeatedly, and the average of the vector similarities between the collected language features is then computed. This average objectively and accurately reflects the language features of the user's sound information and prevents an inaccurate result caused by interference during any single collection.
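A minimal sketch of this enrollment step, under the assumption that each collected language feature is a numeric vector and that cosine similarity is the chosen measure (the patent does not fix one):

```python
import itertools
import numpy as np

def cosine_similarity(u, v):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def enrollment_threshold(language_features):
    """Preset threshold: the average of the pairwise vector similarities
    between the n (n >= 2) language features collected during enrollment."""
    pairs = list(itertools.combinations(language_features, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)

# Three noisy repetitions of the same password yield a threshold close to 1.0.
features = [[1.0, 0.2, 0.5], [0.98, 0.22, 0.49], [1.02, 0.19, 0.51]]
threshold = enrollment_threshold(features)
```

Averaging over all pairs is one way to realize "compares n MFCCs in pairs ... and averages" described later in step S408.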
  • step S206 may be performed by searching for user identification information corresponding to the voice feature, and acquiring the second language feature according to the found user identification information.
  • the user identification information may be composed of an integer string or a string of names, typically the ID of the user, which is assigned by the system and is unique.
  • Optionally, the correspondence between the voice feature and the user identification information is obtained by taking the pre-entered voice features and user identification information as inputs to a neural network model and training that model to learn the correspondence.
  • the neural network model may be a deep neural network recognition model, such as a Convolutional Neural Network (CNN).
  • The deep neural network model is pre-trained, which requires a large training set whose contents are the pre-entered voice features and user identification information. The pre-training process is a standard deep neural network training process: the contents of the training set are fed into the deep neural network one by one, the parameters of the network are optimized with the error back-propagation (BP) algorithm according to the comparison between the output and the user identification information, and this loop continues until the accuracy of the model meets the requirement.
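The training loop described above can be sketched with a toy network; everything here (the synthetic feature clusters standing in for voice features, the layer sizes, the learning rate, the iteration count) is a hypothetical stand-in for the patent's unspecified deep model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "voice features": two users, 50 noisy 3-dimensional vectors each.
centroids = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]])
X = np.vstack([c + 0.05 * rng.standard_normal((50, 3)) for c in centroids])
y = np.repeat([0, 1], 50)  # user IDs as class labels

# One hidden layer; sizes are arbitrary illustrative choices.
W1 = 0.1 * rng.standard_normal((3, 8)); b1 = np.zeros(8)
W2 = 0.1 * rng.standard_normal((8, 2)); b2 = np.zeros(2)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, e / e.sum(axis=1, keepdims=True)  # softmax probabilities

for _ in range(300):  # error back-propagation loop
    h, p = forward(X)
    g_logits = p.copy()
    g_logits[np.arange(len(y)), y] -= 1.0   # d(cross-entropy)/d(logits)
    g_logits /= len(y)
    gW2, gb2 = h.T @ g_logits, g_logits.sum(axis=0)
    gh = (g_logits @ W2.T) * (1.0 - h ** 2)  # back through tanh
    gW1, gb1 = X.T @ gh, gh.sum(axis=0)
    for param, grad in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        param -= 0.5 * grad                  # gradient-descent update

accuracy = float((forward(X)[1].argmax(axis=1) == y).mean())
```

The loop mirrors the text: feed training examples, compare the output with the user identification labels, and repeat the BP update until accuracy meets the requirement.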
  • the first linguistic feature and the second linguistic feature are MFCC; the speech feature is LPCC.
  • The voice feature can use LPCC as its feature parameter. The LPCC parameters have the advantage of high computational efficiency; the excitation information of the speech production process is largely removed, so they effectively reflect the vocal tract response, and only a dozen or so LPCCs are needed to describe the formant characteristics of a speech signal.
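A sketch of LPCC extraction for one frame, assuming the standard autocorrelation/Levinson-Durbin route to the LPC coefficients followed by the usual LPC-to-cepstrum recursion (sign conventions vary between textbooks; this is one common form, not the patent's prescribed method):

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients [1, a1, ..., ap] via autocorrelation + Levinson-Durbin."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err  # reflection coeff.
        a[1 : i + 1] = a[1 : i + 1] + k * a[i - 1 :: -1]       # order update
        err *= 1.0 - k * k
    return a

def lpcc(frame, order=12):
    """Cepstral coefficients from LPC: c_m = -a_m - (1/m) * sum_k k*c_k*a_{m-k}."""
    a = lpc(frame, order)
    c = np.zeros(order + 1)
    for m in range(1, order + 1):
        c[m] = -a[m] - sum(k * c[k] * a[m - k] for k in range(1, m)) / m
    return c[1:]

# A dozen or so coefficients from one 25 ms frame of noise-like "speech".
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
coeffs = lpcc(frame, order=12)
```

The recursion is cheap (no FFT needed), which matches the efficiency claim: a 12th-order LPCC costs only a short autocorrelation plus two small loops per frame.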
  • Optionally, the voice feature comprises a voiceprint feature, and the first language feature and the second language feature comprise the linguistic content of a user password.
  • The apparatus includes: a sound entry module 32 configured to collect sound information; a language feature recognition module 34 configured to extract the language feature from the sound information; a voice feature recognition module 36 configured to extract the voice feature from the sound information; a feature comparison module 38 configured to compare language features and vector similarities; a feature storage module 310 configured to store the preset language feature, the user identification information corresponding to the language feature, the preset threshold, and the correspondence between the voice feature and the user identification information; and a recognition result output module 312 configured to output indication information of whether the sound information is legal.
  • FIG. 4 is a flowchart (1) of a method for identifying user voice information according to an exemplary embodiment of the present disclosure. As shown in FIG. 4, the flow includes:
  • Step S402: the sound entry module 32 collects the sound information of the user's password;
  • Step S404: the language feature recognition module 34 performs a Fourier transform on the sound information to obtain its spectrum, takes the logarithm of the spectrum, and then performs an inverse Fourier transform to obtain the MFCC;
  • Step S406: the sound entry module 32 and the language feature recognition module 34 repeat the above collection and extraction operations until n MFCCs are acquired, where n ≥ 2;
  • Step S408: the feature comparison module 38 compares the n MFCCs in pairs, obtains multiple vector similarities, and averages them to obtain the preset threshold;
  • Step S410: the voice feature recognition module 36 extracts the LPCC from the sound information, and inputs the LPCC and the user's ID into the deep neural network model to train it;
  • Step S412: the sound entry module 32 and the voice feature recognition module 36 repeat the above collection and training operations until the test performance of the deep neural network model meets the specified requirement;
  • Step S414: the storage module 310 stores the n MFCCs, the ID of the user corresponding to the MFCCs, the preset threshold, and the correspondence between the LPCCs and the user's ID.
  • This example is an operation in which the device configures relevant parameters when the user first uses the identification device of the user's voice information.
  • the sound entry module 32 can perform sound information collection through the microphone.
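The Fourier transform → logarithm → inverse Fourier transform step in S404 is essentially a real cepstrum. A minimal sketch of that step as literally described (a production MFCC pipeline would additionally insert a mel filter bank and typically use a DCT, which the description above omits):

```python
import numpy as np

def simplified_cepstrum(frame):
    """Cepstral coefficients as described in step S404:
    Fourier transform -> log of the spectrum -> inverse Fourier transform."""
    spectrum = np.abs(np.fft.rfft(frame))
    log_spectrum = np.log(spectrum + 1e-10)  # epsilon guards against log(0)
    return np.fft.irfft(log_spectrum, n=len(frame))

# One 25 ms frame (400 samples at 16 kHz) of a windowed 200 Hz tone.
t = np.arange(400) / 16000.0
frame = np.sin(2 * np.pi * 200.0 * t) * np.hamming(400)
coeffs = simplified_cepstrum(frame)
```

In practice only the first dozen or so coefficients would be kept as the language feature vector; the frame length and sampling rate here are illustrative assumptions.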
  • FIG. 5 is a flowchart (2) of a method for identifying user voice information according to an exemplary embodiment of the present disclosure. As shown in FIG. 5, the flow includes:
  • Step S502: the sound entry module 32 collects the sound information of the user's password;
  • Step S504: the language feature recognition module 34 performs a Fourier transform on the sound information to obtain its spectrum, takes the logarithm of the spectrum, and then performs an inverse Fourier transform to obtain the MFCC;
  • Step S506: the voice feature recognition module 36 extracts the LPCC from the sound information and inputs the LPCC into the pre-trained deep neural network model to obtain the ID of the user;
  • Step S508: the feature comparison module 38 searches the storage module 310 for the MFCC corresponding to the obtained user ID, and compares the found MFCC with the MFCC extracted from the password sound information to obtain a vector similarity; the vector similarity is then compared with the preset threshold: when the vector similarity is greater than or equal to the preset threshold, the password sound information is legal and user identification succeeds; when the vector similarity is less than the preset threshold, the password sound information is illegal and user identification fails;
  • Step S510: the recognition result output module 312 outputs the recognition result of successful or failed user identification as determined by the feature comparison module 38.
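Putting S506 to S510 together as a hedged sketch: `store` is a hypothetical stand-in for the feature storage module 310, and the neural-network lookup of S506 is abstracted into a user-ID argument rather than implemented:

```python
import numpy as np

def cosine_similarity(u, v):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical feature store: user ID -> (enrolled MFCC vector, preset threshold).
store = {"user-42": (np.array([0.9, 0.1, 0.3]), 0.95)}

def verify(user_id, extracted_mfcc):
    """S508: look up the enrolled MFCC for this ID and compare similarities.
    Returns True (legal) when the similarity reaches the preset threshold."""
    enrolled, threshold = store[user_id]
    return cosine_similarity(extracted_mfcc, enrolled) >= threshold

legal = verify("user-42", np.array([0.9, 0.1, 0.3]))      # identical feature
illegal = verify("user-42", np.array([-0.1, 0.9, -0.4]))  # dissimilar feature
```

Note the two-factor structure: the voiceprint (voice feature) selects whose enrolled password feature to fetch, and the password content (language feature) must still clear the threshold.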
  • For example, the user issues an "open the door" command at the door;
  • the identification device for user voice information prompts for a password, and the user speaks the preset password "I am the homeowner";
  • the identification device determines the preset language feature according to the voice feature of the spoken password;
  • the identification device compares the language feature extracted from the spoken password with the preset language feature to obtain a vector similarity, and compares the vector similarity with the preset threshold: when the vector similarity is greater than or equal to the preset threshold, the door opens automatically; when the vector similarity is less than the preset threshold, the user is prompted to re-enter the password.
  • The method according to the above embodiments can be implemented by software plus a necessary general hardware platform, or of course by hardware, but in many cases the former is the more usual implementation. The solution of the present disclosure may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) that includes a number of instructions for making a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) perform the methods described in the various embodiments of the present disclosure.
  • A module may implement a predetermined function in software and/or hardware. Although the devices described in the following embodiments are typically implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • The apparatus includes: an obtaining module 62 configured to acquire the sound information of a user; an extracting module 64 configured to extract the first language feature and the voice feature of the sound information; a search module 66 configured to search for the preset second language feature corresponding to the voice feature; and a determining module 68 configured to determine whether the sound information is legal according to the first comparison result of the first language feature and the second language feature.
  • Optionally, the determining module 68 is further configured to: determine a vector similarity between the first language feature and the second language feature according to the first comparison result; and determine whether the sound information is legal according to the comparison between the vector similarity and the preset threshold, wherein the sound information is determined to be legal when the comparison result indicates that the vector similarity is greater than or equal to the preset threshold.
  • the lookup module 66 is further configured to search for user identification information corresponding to the voice feature; and acquire the second language feature according to the found user identification information.
  • This example differs from Example 1 in the division of modules. The obtaining module 62 is similar to the sound entry module 32 in Example 1, but with some new functional features added; the extracting module 64 is similar to the language feature recognition module 34 and the voice feature recognition module 36 in Example 1, but with some new functional features added; and the functions of the feature comparison module 38 in Example 1 are implemented by the search module 66 and the determining module 68 in this example.
  • the above modules may be implemented by software or hardware.
  • The foregoing may be implemented by, but is not limited to, the following: the above modules are all located in the same processor, or the above modules are located, in any combination, in different processors.
  • Embodiments of the present disclosure also provide a storage medium.
  • the above storage medium may be configured to store program code for performing the following steps: S11, acquiring sound information of the user; S12, extracting first language features and voice features of the sound information; S13, finding and voice features Corresponding pre-set second language features; S14, determining whether the sound information is legal according to the first comparison result of the first language feature and the second language feature.
  • Optionally, the storage medium may be further configured to store program code for performing the following steps: S21, determining a vector similarity between the first language feature and the second language feature according to the first comparison result; S22, determining whether the sound information is legal according to the comparison between the vector similarity and the preset threshold, wherein the sound information is determined to be legal when the comparison result indicates that the vector similarity is greater than or equal to the preset threshold.
  • the storage medium may be further configured to store program code for performing the following steps: S31, searching for user identification information corresponding to the voice feature; S32, acquiring the second language feature based on the found user identification information.
  • The foregoing storage medium may include, but is not limited to, media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
  • the functional modules/units in the system, device, and device can be implemented as software, firmware, hardware, and suitable combinations thereof.
  • The division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components working together.
  • Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit.
  • Such software may be distributed on a computer readable medium, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and can include any information delivery media.
  • In summary, in the embodiments of the present disclosure the language features and voice features of the sound information are acquired and extracted, the corresponding language feature is looked up according to the voice feature, and whether the sound information is legal is determined according to the comparison between the language features, thereby implementing identity recognition. By effectively combining password recognition technology with voiceprint recognition technology and jointly considering the language features and voice features of the user's sound information, the embodiments improve the security of identity recognition, reduce the probability that a key is stolen or imitated by others, avoid economic losses to the user, and improve the user experience.
  • the present disclosure therefore has industrial applicability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a method and apparatus for recognizing a user's voice information. The method comprises: acquiring voice information of a user; extracting first language features and voice features of the voice information; searching for preset second language features corresponding to the voice features; and determining whether the voice information is legitimate according to a first comparison result of the first language features and the second language features.
PCT/CN2017/115677 2017-01-24 2017-12-12 Method and apparatus for recognizing user voice information Ceased WO2018137426A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710054959.7A CN108345777A (zh) 2017-01-24 2017-01-24 Method and apparatus for recognizing user voice information
CN201710054959.7 2017-01-24

Publications (1)

Publication Number Publication Date
WO2018137426A1 (fr)

Family

ID=62962910

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/115677 Ceased WO2018137426A1 (fr) 2017-01-24 2017-12-12 Method and apparatus for recognizing user voice information

Country Status (2)

Country Link
CN (1) CN108345777A (fr)
WO (1) WO2018137426A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111128129B (zh) * 2019-12-31 2022-06-03 中国银行股份有限公司 Permission management method and apparatus based on speech recognition
CN113976478A (zh) * 2021-11-15 2022-01-28 中国联合网络通信集团有限公司 Ore detection method, server, terminal, and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103986725A (zh) * 2014-05-29 2014-08-13 中国农业银行股份有限公司 Client, server, and identity authentication system and method
CN104219195A (zh) * 2013-05-29 2014-12-17 腾讯科技(深圳)有限公司 Identity verification method, apparatus, and system
CN104376250A (zh) * 2014-12-03 2015-02-25 优化科技(苏州)有限公司 Live-person identity verification method based on sound, shape, and image features
CN104834847A (zh) * 2014-02-11 2015-08-12 腾讯科技(深圳)有限公司 Identity verification method and apparatus
US20150302855A1 (en) * 2014-04-21 2015-10-22 Qualcomm Incorporated Method and apparatus for activating application by speech input
CN105635087A (zh) * 2014-11-20 2016-06-01 阿里巴巴集团控股有限公司 Method and apparatus for verifying user identity by voiceprint

Also Published As

Publication number Publication date
CN108345777A (zh) 2018-07-31

Similar Documents

Publication Publication Date Title
US11900948B1 (en) Automatic speaker identification using speech recognition features
US10593336B2 (en) Machine learning for authenticating voice
US9542948B2 (en) Text-dependent speaker identification
US20180277103A1 (en) Constructing speech decoding network for numeric speech recognition
US20110320202A1 (en) Location verification system using sound templates
KR20200012963A (ko) Object recognition method, computer device, and computer-readable storage medium
Shah et al. Biometric voice recognition in security system
WO2017197953A1 (fr) Voiceprint-based identity recognition method and device
JP2018536889A (ja) Method and apparatus for initiating an operation using voice data
Baloul et al. Challenge-based speaker recognition for mobile authentication
CN108780645B (zh) 对通用背景模型和登记说话者模型进行文本转录适配的说话者验证计算机系统
CN108766445A (zh) Voiceprint recognition method and system
US20230401338A1 (en) Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium
WO2017206375A1 (fr) Voiceprint registration and authentication methods and devices
US20100223057A1 (en) Method and system to authenticate a user and/or generate cryptographic data
KR101888058B1 (ko) Method and apparatus for identifying a speaker based on spoken words
KR20240132372A (ko) Speaker verification using a multi-task speech model
JP2007133414A (ja) Method and apparatus for estimating speech discrimination capability, and method and apparatus for registering and evaluating speaker authentication
CN110379433A (zh) Identity verification method and apparatus, computer device, and storage medium
WO2020220541A1 (fr) Speaker recognition method and terminal
WO2018137426A1 (fr) Method and apparatus for recognizing user voice information
Aronowitz et al. Efficient speaker recognition using approximated cross entropy (ACE)
KR102098956B1 (ko) Speech recognition apparatus and speech recognition method
Georgescu et al. GMM-UBM modeling for speaker recognition on a Romanian large speech corpora
CN110047491A (zh) Speaker recognition method and apparatus related to random digital passwords

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17894027

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17894027

Country of ref document: EP

Kind code of ref document: A1