WO2021010056A1 - Microphone Unit - Google Patents
- Publication number
- WO2021010056A1 (PCT/JP2020/022616)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- microphone
- sound data
- unit
- collation
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/28—Constructional details of speech recognition systems
- H04R19/016—Electrostatic transducers characterised by the use of electrets, for microphones
- H04R19/04—Microphones (electrostatic transducers)
- H04R2201/003—MEMS transducers or their use
Definitions
- The present invention relates to a microphone unit capable of determining whether or not the voice input to a first microphone is the voice of the intended speaker.
- Patent Document 1 describes a voice dialogue system.
- This voice dialogue system is composed of a voice dialogue device and a voice recognition server.
- The voice dialogue device recognizes the voice input to its voice input means and also transmits that voice to the voice recognition server.
- The voice recognition server recognizes the voice received from the voice dialogue device.
- The voice dialogue device first outputs a response based on its own recognition result, and then outputs a further response based on the difference between its own recognition result and the recognition result of the voice recognition server.
- The system of Patent Document 1 performs voice recognition processing not only in the voice dialogue device but also in the voice recognition server. It therefore requires communication for voice recognition and cannot be used where no communication infrastructure is available. Further, the technique described in Patent Document 1 is a technique for performing voice recognition; it is not intended to identify the speaker of the voice.
- The characteristic configuration of the microphone unit according to the present invention is a microphone unit capable of determining whether or not the voice input to a first microphone is the voice of the intended speaker, comprising: a sound data acquisition unit that acquires voice as sound data; a sound data registration unit that registers collation sound data; a collation unit that collates whether or not the speaker of the voice input to the first microphone is the speaker of the voice on which the collation sound data is based; and a collation result output unit that outputs the collation result of the collation unit.
- The collation sound data is created by a device different from the device on which the first microphone is mounted, and is exchanged with that device by wireless communication.
- With this configuration, collation can be performed without providing a server for collation. That is, since recognition is a so-called local process, collation can be performed safely in terms of security. In addition, a speaker whose voice has been registered in advance can be identified easily.
- When the collation unit is in a sleep state, it is preferable to end the sleep state using the acquisition of evaluation sound data by the evaluation sound data acquisition unit as a trigger.
- Preferably, the sound data acquired by the sound data acquisition unit is the voice input to a second microphone provided in a device different from the device on which the first microphone is mounted.
- Preferably, the microphone unit further includes an evaluation unit that evaluates the frequency characteristics of the first microphone and the frequency characteristics of the second microphone, and a correction unit that corrects the frequency characteristic of one of the first microphone and the second microphone so as to match the frequency characteristic of the other.
- With this configuration, the collation rate when the same user inputs voice to the first microphone and to the second microphone can be enhanced.
- Preferably, after the collation sound data is registered, the microphone unit acquires through the first microphone, as test sound data, the voice of the speaker who uttered the voice on which the collation sound data is based.
- Preferably, the microphone unit further includes a calculation unit that calculates the collation rate of that speaker based on the test sound data while changing the collation parameters used for collation, and the collation unit performs collation based on the collation parameter that gave the highest of the collation rates calculated by the calculation unit.
- Preferably, the collation parameter is an amplification factor that amplifies at least one of the test sound data and the collation sound data.
- With this configuration, the collation rate can be increased by changing the input range of at least one of the first microphone and the second microphone.
- Preferably, the microphone unit further includes a parameter changing unit that automatically changes the parameters of the first microphone based on the collation parameters when voice based on the evaluation sound data is input to the first microphone.
- With this configuration, the level of the voice input from the first microphone can be automatically adjusted to a level that improves the collation rate, so the collation rate improves automatically. For example, it is also possible to record for a certain period of time and automatically change the audio level based on the average audio level within that time.
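As a rough sketch of this level adjustment (the patent describes the behaviour but not an algorithm, so the RMS averaging and the target level below are assumptions):

```python
import numpy as np

def auto_gain(samples, target_rms=0.1):
    """Derive an amplification factor from the average (RMS) level of a
    recorded window so that subsequent input lands near target_rms.
    Both the RMS measure and target_rms are illustrative assumptions."""
    rms = float(np.sqrt(np.mean(np.square(samples))))
    return target_rms / rms if rms > 0 else 1.0
```

A window recorded at a constant amplitude of 0.05 would, for example, yield a gain of 2.0 under these assumptions.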
- Preferably, the microphone unit identifies the speaker of the voice input to the first microphone based on the collation result of the collation unit.
- Preferably, the microphone unit estimates the utterance content of the voice input to the first microphone and issues an operation command to the device equipped with the first microphone based on the estimated content.
- With this configuration, the operation of the device equipped with the first microphone can be controlled hands-free, improving convenience.
- As described above, the microphone unit according to the present invention is configured to be able to determine whether or not the input voice is the voice of the intended speaker.
- The microphone unit 1 of the present embodiment is described below.
- FIG. 1 is a block diagram schematically showing the configuration of the microphone unit 1 according to the present embodiment.
- The microphone unit 1 has functional units serving as the first microphone 10, a sound data acquisition unit 11, a sound data registration unit 12, an evaluation sound data acquisition unit 13, a collation unit 14, and a collation result output unit 15.
- Each of these functional units is built of hardware, software, or both, with a CPU as a core member, in order to perform the processing related to the above-mentioned determination.
- The first microphone 10 is a microphone element, and its configuration is not particularly limited. For example, it is preferable to use at least one of an electret condenser microphone (ECM), an analog MEMS (Micro-Electro-Mechanical System) microphone, a digital MEMS microphone, and the like.
- The sound data acquisition unit 11 acquires voice as sound data.
- In the present embodiment, the sound data acquired by the sound data acquisition unit 11 is the voice input to the second microphone 2A provided in the device 2, which is different from the device on which the first microphone 10 is mounted.
- The device on which the first microphone 10 is mounted is the microphone unit 1 itself in the present embodiment. Therefore, the second microphone 2A is provided separately from the microphone unit 1.
- For the second microphone 2A as well, it is preferable to use at least one of, for example, an electret condenser microphone (ECM), an analog MEMS microphone, a digital MEMS microphone, and the like.
- The voice input to the second microphone 2A is converted by the second microphone 2A into sound data, which is an electric signal.
- The sound data acquisition unit 11 acquires the sound data generated by this conversion in the second microphone 2A.
- The sound data registration unit 12 registers collation sound data obtained by extracting feature points from the sound data generated by the second microphone 2A.
- The sound data generated by the second microphone 2A is, as described above, generated by converting into data the voice input to the second microphone 2A.
- A feature point is a feature of the electric signal (sound data), and corresponds to, for example, a period, a peak value, a half-width, and the like. The collation sound data therefore corresponds to an extract of the features of the electric signal generated from the voice input to the second microphone 2A.
- Such collation sound data serves as the master sound data that enables the microphone unit 1 to determine whether or not the voice input to the first microphone 10 is the voice of the intended speaker, and it is recorded in the sound data registration unit 12.
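A minimal sketch of extracting such feature points (period, peak value, half-width) from a digitized signal. The patent names these feature types but does not prescribe an extraction algorithm, so the FFT-based period estimate and the half-width definition below are assumptions:

```python
import numpy as np

def extract_features(signal, sample_rate):
    """Extract simple waveform features: peak value, dominant period,
    and a half-width count. Illustrative only -- not the patent's
    actual feature-extraction method."""
    peak = float(np.max(np.abs(signal)))
    # Estimate the dominant period from the strongest FFT bin.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    dominant = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
    period = 1.0 / dominant if dominant > 0 else 0.0
    # Half-width here: count of samples above half the peak amplitude.
    half_width = int(np.sum(np.abs(signal) >= peak / 2))
    return {"peak": peak, "period": period, "half_width": half_width}
```

Such a dictionary of feature points would then stand in for the collation sound data registered in the sound data registration unit 12.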
- In the present embodiment, the collation sound data is created by a device different from the device (microphone unit 1) on which the first microphone 10 is mounted.
- A device different from the device on which the first microphone 10 is mounted means a device different from the microphone unit 1.
- In the present embodiment, the device 2 on which the second microphone 2A is mounted, and the device 3 other than the microphone unit 1 and the device 2, correspond to such a device.
- In the present embodiment, the collation sound data is generated by the collation sound data generation unit 3A provided in the device 3.
- The collation sound data is exchanged by wireless communication with the device different from the device on which the first microphone 10 is mounted.
- Wireless communication corresponds to, for example, LAN communication such as Wi-Fi (registered trademark) and short-range wireless communication such as Bluetooth (registered trademark).
- The microphone unit 1 receives the collation sound data from the device 3 (the collation sound data generation unit 3A of the device 3) via such wireless communication.
- The collation sound data generation unit 3A may instead be included in the device 2.
- In the present embodiment, it is preferable that the sound data acquisition unit 11 transmits the sound data by wireless communication to the device 3, which is other than the microphone unit 1 and the device 2, and that the sound data registration unit 12 acquires by wireless communication the collation sound data created in the device 3.
- Alternatively, the sound data acquisition unit 11 may transmit the sound data to the device 2 by wireless communication, and the sound data registration unit 12 may acquire by wireless communication the collation sound data created by the device 2. It is also possible to provide the sound data acquisition unit 11 in the device 2, and to have the sound data registration unit 12 acquire by wireless communication the collation sound data created in the device 2 based on the sound data acquired by that sound data acquisition unit 11.
- In other words, the voice input to the microphone is digitized, and the digitized voice data is transmitted to an external device (server) via an internet line.
- The collation sound data generation unit 3A then extracts the feature points from the voice data, generates the collation sound data, and sends it to the paired device via wireless communication.
- The evaluation sound data acquisition unit 13 acquires the voice input to the first microphone 10 as evaluation sound data. As described above, the voice input to the first microphone 10 is converted by the first microphone 10 into sound data, which is an electric signal. This sound data corresponds to the evaluation sound data. The evaluation sound data acquisition unit 13 therefore acquires the evaluation sound data generated by this conversion in the first microphone 10.
- Based on the collation sound data and the feature points extracted from the evaluation sound data, the collation unit 14 collates whether or not the speaker of the voice on which the evaluation sound data is based is the speaker of the voice on which the collation sound data is based.
- The collation sound data is the data registered and recorded in the sound data registration unit 12.
- The evaluation sound data is the data acquired by the evaluation sound data acquisition unit 13.
- The feature points extracted from the evaluation sound data are features of the evaluation sound data as an electric signal, and correspond to, for example, a period, a peak value, a half-width, and the like. As with the collation sound data, such feature points can be generated by a device different from the microphone unit 1 and transmitted via wireless communication. Of course, the collation unit 14 can also be configured to extract the feature points itself.
- The voice based on the evaluation sound data is the voice input to the first microphone 10 and converted into the evaluation sound data by the first microphone 10.
- The voice based on the collation sound data is, in the present embodiment, the voice input to the second microphone 2A and converted into the collation sound data by the second microphone 2A.
- That is, based on the collation sound data recorded in the sound data registration unit 12 and the feature points extracted from the evaluation sound data acquired by the evaluation sound data acquisition unit 13, the collation unit 14 collates whether or not the speaker of the voice input to the first microphone 10 and converted into the evaluation sound data is the same person as the speaker of the voice input to the second microphone 2A and converted into the collation sound data.
- The collation is preferably performed by comparing the feature portions (the above-mentioned "feature points") of the evaluation sound data and the collation sound data, extracting coincidences and differences, and judging on the basis of a degree of agreement calculated from the ratio of coincidences to differences. Specifically, when the degree of agreement is larger than a preset value, the speaker of the voice converted into the evaluation sound data can be determined to be the same person as the speaker of the voice converted into the collation sound data; when the degree of agreement is less than or equal to the preset value, the two can be determined not to be the same person. Of course, the collation can also be carried out by a different method (for example, known voiceprint analysis).
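The thresholding logic above can be sketched as follows; the relative tolerance, the agreement-ratio scoring, and the preset value of 0.8 are illustrative assumptions, not values from the description:

```python
def degree_of_agreement(eval_feats, collation_feats, tolerance=0.1):
    """Compare feature points and compute a degree of agreement.

    Hypothetical scoring rule: a feature "coincides" when the two
    values differ by less than `tolerance` (relative), and the degree
    of agreement is the ratio of coinciding features to all features.
    """
    matches = 0
    for key in collation_feats:
        a, b = eval_feats[key], collation_feats[key]
        denom = max(abs(a), abs(b), 1e-12)
        if abs(a - b) / denom < tolerance:
            matches += 1
    return matches / len(collation_feats)

def is_same_speaker(eval_feats, collation_feats, threshold=0.8):
    # Same speaker when the degree of agreement exceeds the preset value.
    return degree_of_agreement(eval_feats, collation_feats) > threshold
```

With identical feature dictionaries the degree of agreement is 1.0 and the speakers are judged the same; widely differing features fall below the preset value.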
- Here, since the collation by the collation unit 14 requires arithmetic processing, power consumption increases if the collation unit 14 is always in the operating state. It is therefore preferable that the collation unit 14 is put into the operating state only when collation is performed and into a sleep state when it is not. In such a case, when the collation unit 14 is in the sleep state, it is preferable to end the sleep state using, for example, the acquisition of evaluation sound data by the evaluation sound data acquisition unit 13 as a trigger.
- The acquisition of evaluation sound data by the evaluation sound data acquisition unit 13 may be indicated by transmitting to the collation unit 14 information stating that the evaluation sound data has been acquired, or by transmitting the evaluation sound data itself to the collation unit 14. It is also possible to transmit information indicating that the first microphone 10 has detected voice (Voice Activity Detection) to the collation unit 14 via the evaluation sound data acquisition unit 13.
- With this configuration, the collation unit 14 can be put into the operating state only when it performs collation, so power consumption can be reduced in other states.
- Such a configuration can be realized by setting the operating frequency of the collation unit 14 during sleep lower than the operating frequency during operation. To realize these different operating frequencies, the collation unit 14 can, for example, be operated on a clock supplied from outside (an external clock) during sleep and on a clock generated by the collation unit 14 itself (an internal clock) during operation.
- Of course, the collation unit 14 can also be configured to wake from the sleep state in response to a user's button operation (switch operation).
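A schematic of this sleep/operate behaviour (the class name, method names, and clock bookkeeping are illustrative; the patent describes the behaviour, not an implementation):

```python
class CollationUnit:
    """Stays in a low-frequency sleep state until evaluation sound data
    arrives (or the user presses a button), switches to the operating
    clock, collates, then returns to sleep."""

    def __init__(self, collate_fn):
        self.collate_fn = collate_fn
        self.sleeping = True      # starts asleep on the external clock
        self.clock = "external"   # low-frequency clock during sleep

    def _wake(self):
        self.sleeping = False
        self.clock = "internal"   # higher-frequency clock while operating

    def on_evaluation_data(self, evaluation_data, collation_data):
        # Acquisition of evaluation sound data is the wake trigger.
        self._wake()
        result = self.collate_fn(evaluation_data, collation_data)
        self.sleep()
        return result

    def on_button_press(self):
        # A switch operation can also end the sleep state.
        self._wake()

    def sleep(self):
        self.sleeping = True
        self.clock = "external"
```

The unit thus spends its idle time on the external clock and only runs the collation computation, on the internal clock, when data or a button press arrives.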
- The collation result output unit 15 outputs the collation result of the collation unit 14.
- The collation result of the collation unit 14 is the determination of whether or not the speaker of the voice converted into the evaluation sound data is the same person as the speaker of the voice converted into the collation sound data, that is, whether or not the speaker of the voice input to the first microphone 10 is the same person as the speaker of the voice input to the second microphone 2A.
- The collation result output unit 15 may output this determination result to a display device to display it, or output it to a speaker to announce it audibly. The determination result may also be output to another control device and used for control by that device.
- When the microphone unit identifies the speaker, the identification result may likewise be output by a speaker or display device, or output to another control device and used for control.
- The microphone unit 1 according to the second embodiment differs from the microphone unit 1 according to the first embodiment in that it includes an evaluation unit 20 and a correction unit 21. It is otherwise the same as the first embodiment, so mainly the differences are described here.
- FIG. 2 is a block diagram schematically showing the configuration of the microphone unit 1 according to the present embodiment.
- The microphone unit 1 of the present embodiment includes the functional units of the first microphone 10, the sound data acquisition unit 11, the sound data registration unit 12, the evaluation sound data acquisition unit 13, the collation unit 14, the collation result output unit 15, the evaluation unit 20, and the correction unit 21.
- The evaluation unit 20 and the correction unit 21 are also built of hardware, software, or both, with a CPU as a core member, in order to perform the processing related to the above-mentioned determination.
- The evaluation unit 20 evaluates the frequency characteristics of the first microphone 10 and the frequency characteristics of the second microphone 2A before voice has been input to both the first microphone 10 and the second microphone 2A.
- "Before voice has been input to both microphones" covers the state in which voice has been input to neither the first microphone 10 nor the second microphone 2A, the state in which voice has been input only to the first microphone 10, and the state in which voice has been input only to the second microphone 2A; that is, any state in which voice has not been input to at least one of the first microphone 10 and the second microphone 2A.
- Since the frequency characteristics of the first microphone 10 and of the second microphone 2A are predetermined for each microphone, such frequency characteristics may be stored in a storage unit (not shown) and acquired by the evaluation unit 20, or the evaluation unit 20 may actually energize the first microphone 10 and the second microphone 2A and perform frequency analysis to acquire them. The evaluation unit 20 calculates the difference between the acquired frequency characteristics of the first microphone 10 and those of the second microphone 2A.
- The correction unit 21 corrects the frequency characteristic of one of the first microphone 10 and the second microphone 2A so as to match the frequency characteristic of the other. As described above, the frequency characteristics of the two microphones are evaluated by the evaluation unit 20, and this evaluation is performed in a state where voice has not been input to at least one of the first microphone 10 and the second microphone 2A.
- Here, "the one" refers to whichever of the first microphone 10 and the second microphone 2A has not had voice input to it, and "the other" refers to whichever of the two has had, or first receives, voice input.
- That is, the correction unit 21 corrects the frequency characteristic of the microphone to which no voice has been input so as to match the frequency characteristic of the microphone to which voice is input.
- With this configuration, of the first microphone 10 and the second microphone 2A, at least the one to which voice is input later can be matched to the frequency characteristic of the one to which voice is input first, so collation errors due to differences in the frequency characteristics of the microphones can be reduced.
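One way to sketch such a correction, assuming the frequency characteristics are available as magnitude-only responses sampled at the FFT bin frequencies (the patent does not specify the correction method):

```python
import numpy as np

def correction_gains(target_response, source_response):
    """Per-bin gains that map the source microphone's magnitude
    response onto the target microphone's."""
    eps = 1e-12  # avoid division by zero in dead bins
    return np.asarray(target_response) / (np.asarray(source_response) + eps)

def apply_correction(signal, gains):
    """Apply the per-bin gains in the frequency domain; gains must
    have the same length as the signal's rfft (len(signal)//2 + 1)."""
    spectrum = np.fft.rfft(signal)
    return np.fft.irfft(spectrum * gains, n=len(signal))
```

For example, if one microphone is uniformly 6 dB more sensitive, the gains are a constant factor of about 2, and the corrected signal is the input scaled accordingly.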
- The microphone unit 1 according to the third embodiment differs from the microphone unit 1 according to the first embodiment in that it includes a test sound data acquisition unit 30, a calculation unit 31, and a parameter change unit 40. It is otherwise the same as the first embodiment, so mainly the differences are described here.
- FIG. 3 is a block diagram schematically showing the configuration of the microphone unit 1 according to the present embodiment.
- The microphone unit 1 of the present embodiment includes the functional units of the first microphone 10, the sound data acquisition unit 11, the sound data registration unit 12, the evaluation sound data acquisition unit 13, the collation unit 14, the collation result output unit 15, the test sound data acquisition unit 30, the calculation unit 31, and the parameter change unit 40.
- Like the other functional units described in the first embodiment, the test sound data acquisition unit 30, the calculation unit 31, and the parameter change unit 40 are built of hardware, software, or both, with a CPU as a core member, in order to perform the processing related to the above-mentioned determination.
- After the collation sound data is registered and before the evaluation sound data is acquired from the first microphone 10, the test sound data acquisition unit 30 acquires through the first microphone 10, as test sound data, the voice of the speaker who uttered the voice on which the collation sound data is based.
- "After the collation sound data is registered" means after the collation sound data generated from the voice input to the second microphone 2A is registered in the sound data registration unit 12.
- "Before the evaluation sound data is acquired" means before the evaluation sound data acquisition unit 13 acquires the evaluation sound data generated from the voice input to the first microphone 10.
- "The voice of the speaker who uttered the voice on which the collation sound data is based" is the voice, obtained from the first microphone 10, of the same speaker as the one who uttered the voice underlying the collation sound data registered in the sound data registration unit 12.
- That is, after the collation sound data generated from the voice input to the second microphone 2A is registered in the sound data registration unit 12, and before the evaluation sound data acquisition unit 13 acquires the evaluation sound data generated from the voice input to the first microphone 10, the test sound data acquisition unit 30 acquires as test sound data the voice input via the first microphone 10 by the same speaker as the one who uttered the voice underlying the registered collation sound data.
- The calculation unit 31 calculates the collation rate of the speaker based on the test sound data and the collation sound data while changing the collation parameters used for collation.
- Here, the calculation unit 31 acquires the test sound data from the test sound data acquisition unit 30 and the collation sound data from the sound data registration unit 12.
- The parameters used for collation are parameters that correct at least one of the test sound data and the collation sound data so that the speaker of the voice underlying the test sound data is collated as the speaker of the voice underlying the collation sound data.
- In the present embodiment, the collation parameter corresponds to an amplification factor that amplifies at least one of the test sound data and the collation sound data.
- That is, the calculation unit 31 amplifies one or both of the input test sound data and collation sound data while sequentially changing the amplification factor, and calculates whether or not the speaker of the voice underlying the test sound data is collated as the same speaker as the speaker of the voice underlying the collation sound data.
- The calculation unit 31 stores these calculation results.
- The collation parameter (amplification factor) that gave the highest collation rate is transmitted to the parameter change unit 40, which sets it for at least one of the first microphone 10 and the second microphone 2A.
- The collation unit 14 performs collation based on sound data to which the collation parameter that gave the highest of the collation rates calculated by the calculation unit 31, as set by the parameter change unit 40, is applied. With this configuration, erroneous collation, in which the speaker of the voice based on the evaluation sound data is determined not to be the same speaker even though he or she is the speaker of the voice based on the collation sound data, can be reduced.
- When the parameter change unit 40 changes the collation parameter as described above, it automatically changes the parameters of the first microphone 10 based on the collation parameter when voice based on the evaluation sound data is input to the first microphone 10. As a result, the collation parameter can be changed easily, and collation can be performed using evaluation sound data that reduces such erroneous collation.
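The sweep performed by the calculation unit 31 might look like the following; the candidate gains and the stand-in collation-rate function are assumptions, since the patent only states that the amplification factor is changed sequentially:

```python
def best_amplification(test_data, collation_data, collation_rate,
                       gains=(0.5, 1.0, 1.5, 2.0)):
    """Amplify the test sound data with each candidate amplification
    factor (the collation parameter) and keep the factor that gives
    the highest collation rate."""
    best_gain, best_rate = None, float("-inf")
    for g in gains:
        rate = collation_rate([x * g for x in test_data], collation_data)
        if rate > best_rate:
            best_gain, best_rate = g, rate
    return best_gain, best_rate
```

The winning gain would then be handed to the parameter change unit 40 and applied to the microphone input.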
- Collation processing: next, a specific application example of the microphone unit 1 is described, taking the unlocking of the door shown in FIG. 4 as an example.
- First, the user 100 inputs a voice uttering a predetermined word to the second microphone 2A provided in a mobile terminal (an example of the device 2) such as a smartphone (#1).
- Voice input via the smartphone can be performed by setting up an application on the smartphone in advance.
- The voice input to the second microphone 2A is converted into sound data and transmitted to the microphone unit 1 via a communication function (for example, wireless communication) of the smartphone. It is preferable to use a smartphone application for this conversion into sound data.
- The sound data acquisition unit 11 of the microphone unit 1 acquires this sound data, and the collation sound data obtained by the collation sound data generation unit 3A extracting feature points from the sound data is registered in the sound data registration unit 12 (#2).
- Next, the user 100 inputs voice to the first microphone 10 of the microphone unit 1 (#3). At this time, it is preferable to input the voice by uttering the above-mentioned predetermined word. At this point, the microphone unit 1 does not need to identify whether or not the person inputting the voice is the user 100.
- the input voice is converted into evaluation sound data by the first microphone 10, and is acquired by the evaluation sound data acquisition unit 13.
- the collation unit 14 collates the feature points extracted from the evaluation sound data with the collation sound data (# 5).
- When the collation succeeds, the collation result output unit 15 outputs to the lock unit a signal indicating that the speakers of the two voices are the same person, that is, that the speaker of the voice based on the evaluation sound data is the user 100, who is the speaker of the voice based on the collation sound data. As a result, the lock is released (#6).
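The enrol-then-verify flow of steps #1 through #6 can be sketched as follows. This is a toy model under stated assumptions: the feature points are reduced to a unit-norm magnitude spectrum, the collation is a thresholded similarity score, and the class name and threshold are illustrative, not the patent's implementation:

```python
import numpy as np

def extract_feature_points(samples):
    """Stand-in for feature-point extraction: a unit-norm magnitude spectrum."""
    spec = np.abs(np.fft.rfft(samples))
    norm = np.linalg.norm(spec)
    return spec / norm if norm else spec

class MicrophoneUnit:
    """Toy model of the flow: register collation sound data (#1-#2),
    then collate an incoming utterance against it (#3-#5)."""

    def __init__(self, threshold=0.95):
        self.collation_data = None
        self.threshold = threshold

    def register_collation_data(self, samples):
        self.collation_data = extract_feature_points(samples)

    def collate(self, samples):
        """Return True (the unlock signal, #6) when the similarity score
        of the incoming utterance reaches the threshold."""
        feats = extract_feature_points(samples)
        score = float(np.dot(feats, self.collation_data))
        return score >= self.threshold
```

A real collation unit would use speaker-discriminative features rather than a raw spectrum; the thresholding structure is what this sketch illustrates.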
- Next, consider the case where voice is input to the first microphone 10 of the microphone unit 1 by a user 150 who is a different person from the user 100 (#7).
- the input voice is converted into evaluation sound data by the first microphone 10, and is acquired by the evaluation sound data acquisition unit 13.
- the collation unit 14 collates the evaluation sound data with the collation sound data (# 8).
- In this case, the collation result output unit 15 outputs to the lock unit a signal indicating that the speakers of the two voices are not the same person, that is, that the speaker of the voice based on the evaluation sound data is not the user 100, who is the speaker of the voice based on the collation sound data. The lock is therefore not released, and the locked state is maintained (#9). In such a case, it is also possible to output the signal indicating that the speaker of the voice based on the evaluation sound data is not the user 100 to a notification unit (not shown) and to have the notification unit emit sound or light indicating that the speaker is different.
- If the word of the input voice differs from the word of the voice related to the collation sound data, it can also be determined from this that the user 150 related to the current utterance is not the user 100.
- When the collation unit 14 is in a sleep state, the sleep state is terminated with the acquisition of the evaluation sound data by the evaluation sound data acquisition unit 13 as a trigger.
- The collation unit 14 can also be configured not to enter the sleep state.
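The sleep behaviour above amounts to lazy activation: the collation unit stays dormant until evaluation sound data is acquired, and that acquisition is the wake-up trigger. A minimal sketch follows; the class and method names are illustrative, not taken from the patent:

```python
class CollationUnit:
    """Sketch: acquisition of evaluation sound data ends the sleep state;
    a unit built with sleep_enabled=False never sleeps."""

    def __init__(self, sleep_enabled=True):
        self.sleep_enabled = sleep_enabled
        self.asleep = sleep_enabled  # dormant until the first acquisition
        self.wake_events = 0

    def on_evaluation_data_acquired(self, data):
        if self.asleep:              # the trigger that ends the sleep state
            self.asleep = False
            self.wake_events += 1
        return self._collate(data)

    def _collate(self, data):
        return bool(data)            # placeholder for the real collation

    def go_idle(self):
        """Return to sleep between utterances, if sleeping is enabled."""
        self.asleep = self.sleep_enabled
```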
- As described above, the microphone unit 1 includes the first microphone 10, the sound data acquisition unit 11, the sound data registration unit 12, the evaluation sound data acquisition unit 13, the collation unit 14, and the collation result output unit 15. It may further include the evaluation unit 20 and the correction unit 21, and may additionally include the test sound data acquisition unit 30, the calculation unit 31, and the parameter changing unit 40.
- In the above embodiment, the microphone unit 1 was described as having the parameter changing unit 40, but the microphone unit 1 can also be configured without the parameter changing unit 40.
- In the above embodiment, the case of a single first microphone 10 was described, but a plurality of first microphones 10 can also be provided. In such a case, it is preferable to configure each first microphone 10 so that only voice from a desired direction can be input. This makes it possible to pick up only a specific voice, making it easier to collate the speaker.
- In the above embodiment, it was explained that the frequency characteristics of the first microphone 10 and the second microphone 2A are evaluated, and that the correction unit 21 corrects the frequency characteristic of one of the first microphone 10 and the second microphone 2A so as to match the frequency characteristic of the other.
- However, it is also possible to configure the system so that the user 100 inputs voice to the first microphone 10, the microphone unit 1 transfers the voice input to the first microphone 10 to the device 2 by wireless communication, and the device 2 (on the second microphone 2A side) acquires sound data based on the voice transferred from the microphone unit 1 and registers it as collation sound data. With such a configuration, the collation sound data can be generated directly from the voice input to the first microphone 10, so the correction for matching the frequency characteristics can be made unnecessary.
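The correction that matches one microphone's frequency characteristic to the other's can be illustrated as frequency-domain equalization: given each microphone's per-bin response, derive per-bin gains and apply them to a recording. This is a sketch under simplifying assumptions (linear, time-invariant responses that are already known per FFT bin), not the correction unit 21's actual method:

```python
import numpy as np

def correction_gains(resp_source, resp_target, eps=1e-12):
    """Per-bin gains mapping the source mic's response onto the target's."""
    return np.asarray(resp_target) / np.maximum(np.asarray(resp_source), eps)

def apply_correction(samples, gains):
    """Equalize a recording in the frequency domain with the given gains."""
    spec = np.fft.rfft(samples)
    return np.fft.irfft(spec * gains, n=len(samples))
```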
- In the above embodiment, an example was described in which the microphone unit 1 is used for unlocking a door lock. The microphone unit 1 can also be used for the door lock of a vehicle, for starting the power unit of a vehicle (for example, an engine or a motor), and for devices provided in a vehicle (a hands-free microphone, a box with an integrated speaker and microphone, a voice recognition microphone outside the vehicle, a voice recognition microphone inside the vehicle). Besides vehicles, it can be used for smart speakers, built-in microphones for housing, surveillance cameras, intercoms, home appliances (TVs, refrigerators, rice cookers, microwave ovens, and so on), remote controls for baths, and the like. In other words, the microphone unit 1 can be said to estimate the utterance content of the voice input to the first microphone 10 and to issue an operation command, based on the estimated content, to the device on which the first microphone 10 is mounted (here, the microphone unit 1).
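The closing remark — estimate the utterance content, then issue an operation command to the device carrying the first microphone — can be sketched as a dispatcher over recognized phrases. The recognizer is reduced to string normalization here, and the phrase and command names are illustrative placeholders:

```python
# Hypothetical mapping from recognized phrases to device operation commands.
COMMANDS = {
    "unlock": "release_door_lock",
    "start engine": "start_power_unit",
}

def estimate_utterance(transcript):
    """Stand-in for speech recognition: normalize a transcript string."""
    return " ".join(transcript.lower().split())

def dispatch(transcript):
    """Map the estimated utterance content to an operation command, if any."""
    return COMMANDS.get(estimate_utterance(transcript))
```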
- In the above embodiment, the case where the first microphone 10 and the second microphone 2A are different microphones was described, but the first microphone 10 and the second microphone 2A may be the same microphone.
- That is, instead of the step in which "the user 100 inputs a voice uttering a predetermined word into the second microphone 2A provided in a mobile terminal (an example of the device 2) such as a smartphone (#1)", the system can be configured so that the user 100 inputs the voice uttering the predetermined word into the first microphone 10, the voice input to the first microphone 10 is transmitted to the second microphone 2A side via wireless communication, and the collation sound data is generated there. Further, although it was described that "the collation unit 14 collates the feature points extracted from the evaluation sound data with the collation sound data (#5)", the extraction of feature points from the evaluation sound data can also be configured to be performed on the second microphone 2A side. In either case, each piece of data or each feature point can be configured to be transmitted via wireless communication.
- the present invention can be used for a microphone unit capable of determining whether or not the voice input to the first microphone is the voice of the intended speaker.
- 1: Microphone unit
- 2: Different device
- 2A: Second microphone
- 10: First microphone
- 11: Sound data acquisition unit
- 12: Sound data registration unit
- 13: Evaluation sound data acquisition unit
- 14: Collation unit
- 15: Collation result output unit
- 20: Evaluation unit
- 21: Correction unit
- 30: Test sound data acquisition unit
- 31: Calculation unit
- 40: Parameter changing unit
Abstract
Description
The microphone unit according to the present invention is configured to be able to determine whether or not an input voice is the voice of the intended speaker. The microphone unit 1 of the present embodiment is described below.
Next, a second embodiment will be described. The microphone unit 1 according to the second embodiment differs from the microphone unit 1 according to the first embodiment in that it further includes the evaluation unit 20 and the correction unit 21. Since it is otherwise the same as the first embodiment, the description here focuses on the differences.
Next, a third embodiment will be described. The microphone unit 1 according to the third embodiment differs from the microphone unit 1 according to the first embodiment in that it further includes the test sound data acquisition unit 30, the calculation unit 31, and the parameter changing unit 40. Since it is otherwise the same as the first embodiment, the description here focuses on the differences.
Next, a specific application example of the microphone unit 1 will be described, taking the unlocking of the door shown in FIG. 4 as an example. First, the user 100 inputs a voice uttering a predetermined word into the second microphone 2A provided in a mobile terminal (an example of the device 2) such as a smartphone (#1). Voice input via the smartphone in this way can be performed by setting up an application on the smartphone in advance.
In the first embodiment described above, it was explained that, when the collation unit 14 is in a sleep state, the sleep state is terminated with the acquisition of the evaluation sound data by the evaluation sound data acquisition unit 13 as a trigger; however, the collation unit 14 can also be configured not to enter the sleep state.
2: Different device
2A: Second microphone
10: First microphone
11: Sound data acquisition unit
12: Sound data registration unit
13: Evaluation sound data acquisition unit
14: Collation unit
15: Collation result output unit
20: Evaluation unit
21: Correction unit
30: Test sound data acquisition unit
31: Calculation unit
40: Parameter changing unit
Claims (8)
- 1. A microphone unit capable of determining whether or not a voice input to a first microphone is the voice of an intended speaker, comprising:
  a sound data acquisition unit that acquires voice as sound data;
  a sound data registration unit that registers collation sound data obtained by extracting feature points from the sound data;
  an evaluation sound data acquisition unit that acquires voice input to the first microphone as evaluation sound data;
  a collation unit that collates, based on the collation sound data and feature points extracted from the evaluation sound data, whether or not the speaker of the voice based on the evaluation sound data is the speaker of the voice based on the collation sound data; and
  a collation result output unit that outputs a collation result of the collation unit,
  wherein the collation sound data is created by a device different from the device on which the first microphone is mounted, and the collation sound data is passed between the device on which the first microphone is mounted and the different device by wireless communication.
- 2. The microphone unit according to claim 1, wherein, when the collation unit is in a sleep state, the sleep state is terminated with the acquisition of the evaluation sound data by the evaluation sound data acquisition unit as a trigger.
- 3. The microphone unit according to claim 1 or 2, wherein the sound data acquired by the sound data acquisition unit is voice input to a second microphone provided in a device different from the device on which the first microphone is mounted, the microphone unit further comprising:
  an evaluation unit that evaluates a frequency characteristic of the first microphone and a frequency characteristic of the second microphone before voice is input to both the first microphone and the second microphone; and
  a correction unit that corrects the frequency characteristic of one of the first microphone and the second microphone so as to match the frequency characteristic of the other.
- 4. The microphone unit according to any one of claims 1 to 3, further comprising:
  a test sound data acquisition unit that acquires, as test sound data, after the collation sound data has been registered and before the evaluation sound data is acquired, voice uttered into the first microphone by the speaker of the voice related to the collation sound data; and
  a calculation unit that calculates, based on the test sound data and the collation sound data, a collation rate of the speaker based on the test sound data while changing a collation parameter used for the collation,
  wherein the collation unit performs the collation based on the collation parameter at the time of the highest collation rate among the collation rates calculated by the calculation unit.
- 5. The microphone unit according to claim 4, wherein the collation parameter is an amplification factor for amplifying at least one of the test sound data and the collation sound data.
- 6. The microphone unit according to claim 4 or 5, further comprising a parameter changing unit that, when voice based on the evaluation sound data is input to the first microphone, automatically changes a parameter of the first microphone based on the collation parameter.
- 7. The microphone unit according to any one of claims 1 to 6, wherein a speaker of the voice input to the first microphone is identified based on the collation result of the collation unit.
- 8. The microphone unit according to any one of claims 1 to 7, wherein utterance content of the voice input to the first microphone is estimated, and an operation command is issued to the device on which the first microphone is mounted based on the estimated content.
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP20840120.8A EP4002356A4 (en) | 2019-07-17 | 2020-06-09 | MIC UNIT |
| JP2021532725A JP7462634B2 (ja) | 2019-07-17 | 2020-06-09 | マイクユニット |
| US17/626,982 US12057127B2 (en) | 2019-07-17 | 2020-06-09 | Microphone unit |
| CN202080051540.6A CN114080641B (zh) | 2019-07-17 | 2020-06-09 | 麦克风单元 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2019131930 | 2019-07-17 | ||
| JP2019-131930 | 2019-07-17 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021010056A1 true WO2021010056A1 (ja) | 2021-01-21 |
Family
ID=74210556
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2020/022616 Ceased WO2021010056A1 (ja) | 2019-07-17 | 2020-06-09 | マイクユニット |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US12057127B2 (ja) |
| EP (1) | EP4002356A4 (ja) |
| JP (1) | JP7462634B2 (ja) |
| CN (1) | CN114080641B (ja) |
| WO (1) | WO2021010056A1 (ja) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116453516A (zh) * | 2023-03-13 | 2023-07-18 | 思必驰科技股份有限公司 | 设备唤醒方法、电子设备和存储介质 |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS63223965A (ja) * | 1987-03-13 | 1988-09-19 | Toshiba Corp | 知的ワ−クステ−シヨン |
| JP2005241215A (ja) * | 2004-02-27 | 2005-09-08 | Mitsubishi Electric Corp | 電気機器、冷蔵庫、冷蔵庫の操作方法 |
| JP2006003451A (ja) * | 2004-06-15 | 2006-01-05 | Brother Ind Ltd | 対象者特定装置,催事動向分析装置及び催事動向分析システム |
| JP2006126558A (ja) * | 2004-10-29 | 2006-05-18 | Asahi Kasei Corp | 音声話者認証システム |
| WO2014112375A1 (ja) * | 2013-01-17 | 2014-07-24 | 日本電気株式会社 | 話者識別装置、話者識別方法、および話者識別用プログラム |
| US20180040323A1 (en) * | 2016-08-03 | 2018-02-08 | Cirrus Logic International Semiconductor Ltd. | Speaker recognition |
| JP2018045190A (ja) | 2016-09-16 | 2018-03-22 | トヨタ自動車株式会社 | 音声対話システムおよび音声対話方法 |
Family Cites Families (30)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH08223281A (ja) * | 1995-02-10 | 1996-08-30 | Kokusai Electric Co Ltd | 携帯電話機 |
| JP2003066985A (ja) * | 2001-08-22 | 2003-03-05 | Nec Corp | 携帯型通信機器の使用者認証方式 |
| RU2230375C2 (ru) | 2002-09-03 | 2004-06-10 | Общество с ограниченной ответственностью "Центр речевых технологий" | Метод распознавания диктора и устройство для его осуществления |
| JP2004219728A (ja) * | 2003-01-15 | 2004-08-05 | Matsushita Electric Ind Co Ltd | 音声認識装置 |
| JP2007057805A (ja) * | 2005-08-24 | 2007-03-08 | Denso Corp | 車両用情報処理装置 |
| JP2006023773A (ja) * | 2005-08-29 | 2006-01-26 | Toshiba Corp | 音声処理システム |
| US20100097178A1 (en) * | 2008-10-17 | 2010-04-22 | Pisz James T | Vehicle biometric systems and methods |
| CN110096253B (zh) * | 2013-07-11 | 2022-08-30 | 英特尔公司 | 利用相同的音频输入的设备唤醒和说话者验证 |
| US9639854B2 (en) * | 2014-06-26 | 2017-05-02 | Nuance Communications, Inc. | Voice-controlled information exchange platform, such as for providing information to supplement advertising |
| DE102014110054A1 (de) * | 2014-07-17 | 2016-01-21 | Osram Oled Gmbh | Optoelektronische Baugruppe und Verfahren zum Herstellen einer optoelektronischen Baugruppe |
| JP6081966B2 (ja) * | 2014-07-18 | 2017-02-15 | キャンバスマップル株式会社 | 情報検索装置、情報検索プログラム、および情報検索システム |
| US9257120B1 (en) * | 2014-07-18 | 2016-02-09 | Google Inc. | Speaker verification using co-location information |
| CN106034063A (zh) * | 2015-03-13 | 2016-10-19 | 阿里巴巴集团控股有限公司 | 一种在通信软件中通过语音启动业务的方法及相应装置 |
| US9704488B2 (en) * | 2015-03-20 | 2017-07-11 | Microsoft Technology Licensing, Llc | Communicating metadata that identifies a current speaker |
| US11437020B2 (en) * | 2016-02-10 | 2022-09-06 | Cerence Operating Company | Techniques for spatially selective wake-up word recognition and related systems and methods |
| US20190057703A1 (en) * | 2016-02-29 | 2019-02-21 | Faraday&Future Inc. | Voice assistance system for devices of an ecosystem |
| US10635800B2 (en) * | 2016-06-07 | 2020-04-28 | Vocalzoom Systems Ltd. | System, device, and method of voice-based user authentication utilizing a challenge |
| US20180018973A1 (en) * | 2016-07-15 | 2018-01-18 | Google Inc. | Speaker verification |
| CN206224756U (zh) * | 2016-11-30 | 2017-06-06 | 南京小脚印网络科技有限公司 | 声纹识别设备 |
| US10593328B1 (en) * | 2016-12-27 | 2020-03-17 | Amazon Technologies, Inc. | Voice control of remote device |
| JP6531776B2 (ja) * | 2017-04-25 | 2019-06-19 | トヨタ自動車株式会社 | 音声対話システムおよび音声対話方法 |
| US10332517B1 (en) * | 2017-06-02 | 2019-06-25 | Amazon Technologies, Inc. | Privacy mode based on speaker identifier |
| US10504511B2 (en) * | 2017-07-24 | 2019-12-10 | Midea Group Co., Ltd. | Customizable wake-up voice commands |
| US10887764B1 (en) * | 2017-09-25 | 2021-01-05 | Amazon Technologies, Inc. | Audio verification |
| US10861453B1 (en) * | 2018-05-01 | 2020-12-08 | Amazon Technologies, Inc. | Resource scheduling with voice controlled devices |
| CN108926111A (zh) * | 2018-07-23 | 2018-12-04 | 广州维纳斯家居股份有限公司 | 智能升降桌声音控制方法、装置、智能升降桌及存储介质 |
| US10861457B2 (en) * | 2018-10-26 | 2020-12-08 | Ford Global Technologies, Llc | Vehicle digital assistant authentication |
| US11004454B1 (en) * | 2018-11-06 | 2021-05-11 | Amazon Technologies, Inc. | Voice profile updating |
| US11232788B2 (en) * | 2018-12-10 | 2022-01-25 | Amazon Technologies, Inc. | Wakeword detection |
| US11361764B1 (en) * | 2019-01-03 | 2022-06-14 | Amazon Technologies, Inc. | Device naming-indicator generation |
2020
- 2020-06-09 WO PCT/JP2020/022616 patent/WO2021010056A1/ja not_active Ceased
- 2020-06-09 CN CN202080051540.6A patent/CN114080641B/zh active Active
- 2020-06-09 JP JP2021532725A patent/JP7462634B2/ja active Active
- 2020-06-09 EP EP20840120.8A patent/EP4002356A4/en not_active Withdrawn
- 2020-06-09 US US17/626,982 patent/US12057127B2/en active Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4002356A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4002356A1 (en) | 2022-05-25 |
| CN114080641B (zh) | 2024-11-01 |
| EP4002356A4 (en) | 2023-05-24 |
| JPWO2021010056A1 (ja) | 2021-01-21 |
| US12057127B2 (en) | 2024-08-06 |
| CN114080641A (zh) | 2022-02-22 |
| US20220415330A1 (en) | 2022-12-29 |
| JP7462634B2 (ja) | 2024-04-05 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10485049B1 (en) | Wireless device connection handover | |
| US8364486B2 (en) | Speech understanding method and system | |
| US20190318745A1 (en) | Speaker recognition with assessment of audio frame contribution | |
| CN113643707B (zh) | 一种身份验证方法、装置和电子设备 | |
| JP6402748B2 (ja) | 音声対話装置および発話制御方法 | |
| CN102204233A (zh) | 车辆生物测定系统和方法 | |
| CN108463369B (zh) | 车辆信息娱乐和连接性系统 | |
| WO2022199405A1 (zh) | 一种语音控制方法和装置 | |
| US10425746B2 (en) | Method for operating a hearing apparatus, and hearing apparatus | |
| CN110024027A (zh) | 说话人识别 | |
| US11064281B1 (en) | Sending and receiving wireless data | |
| JP6239826B2 (ja) | 話者認識装置、話者認識方法及び話者認識プログラム | |
| KR102374054B1 (ko) | 음성 인식 방법 및 이에 사용되는 장치 | |
| JP2019184809A (ja) | 音声認識装置、音声認識方法 | |
| WO2021010056A1 (ja) | マイクユニット | |
| JP2015055835A (ja) | 話者認識装置、話者認識方法及び話者認識プログラム | |
| JP2006025079A (ja) | ヘッドセット及び無線通信システム | |
| JP2018055155A (ja) | 音声対話装置および音声対話方法 | |
| KR101863098B1 (ko) | 음성 인식 장치 및 방법 | |
| JP3522421B2 (ja) | 話者認識システムおよび話者認識方法 | |
| KR102495028B1 (ko) | 휘파람소리 인식 기능이 구비된 사운드장치 | |
| WO2022233239A1 (zh) | 一种升级方法、装置及电子设备 | |
| WO2021124258A1 (en) | Instruction validation system | |
| JP2003066985A (ja) | 携帯型通信機器の使用者認証方式 | |
| JP7310594B2 (ja) | 車両用通信システム |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20840120; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2021532725; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2020840120; Country of ref document: EP; Effective date: 20220217 |
| | WWW | Wipo information: withdrawn in national office | Ref document number: 2020840120; Country of ref document: EP |