WO2023088156A1 - A sound velocity correction method and device - Google Patents
A sound velocity correction method and device
- Publication number
- WO2023088156A1 (PCT/CN2022/131002)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- sound source
- microphone array
- sound velocity
- correction signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01H—MEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
- G01H5/00—Measuring propagation velocity of ultrasonic, sonic or infrasonic waves, e.g. of pressure waves
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/20—Position of source determined by a plurality of spaced direction-finders
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R29/00—Monitoring arrangements; Testing arrangements
Definitions
- the embodiments of the present application relate to the field of audio processing, and in particular, to a sound velocity correction method and device.
- a microphone array is often used in the conference terminal to pick up the sound of the participants and locate the sound source, so as to locate and track the position of the participants, and realize functions such as sound screen and broadcasting.
- the sound velocity is one of the important parameters in the microphone array sound source localization algorithm.
- the sound velocity depends on the ambient temperature. In an actual conference scene, temperature changes can cause the sound velocity to vary from 337 to 350 m/s, which affects the accurate positioning and tracking of participants.
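The 337-350 m/s range quoted above corresponds to a plausible room-temperature span under the standard linear approximation of sound speed versus temperature. This sketch uses that textbook fit, which is an assumption here; the passage itself only states the resulting range.

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in air (m/s) at temperature temp_c (degrees C).

    The linear fit v = 331.4 + 0.6 * T is a standard textbook approximation,
    assumed here; the text above only states the resulting 337-350 m/s range.
    """
    return 331.4 + 0.6 * temp_c

# Roughly 10 to 31 degrees C spans the 337-350 m/s range noted above.
print(speed_of_sound(10.0))  # ≈ 337.4
print(speed_of_sound(31.0))  # ≈ 350.0
```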
- face recognition is used to correct the speed of sound.
- the angle of the speaker is determined through face recognition, referred to as the face angle.
- the microphone array performs sound source localization based on the preset sound speed to measure the speaker angle, referred to as the sound angle.
- the current actual sound speed can be determined by continuously fine-tuning the sound speed so that the difference between the sound angle and the face angle is within the preset range.
- the real-time performance of sound source localization is poor.
- the present application provides a sound velocity correction method and device, which are used to reduce the calculation amount of sound velocity correction and improve the real-time performance of sound velocity correction.
- the first aspect of the present application provides a sound velocity correction method, the method comprising: receiving a sound velocity correction signal from a correction sound source through a microphone array; and determining a target sound velocity, where the target sound velocity is related to the spatial relative distance between the microphone array and the correction sound source, and to the sound velocity correction signal.
- the execution body of the embodiment of the present application may be a sound source localization device. A correction sound source may be provided in the conference venue and may output a sound velocity correction signal, which the microphone array included in the sound source localization device collects.
- the sound velocity correction signal can be used to calculate the time delay between the microphones in the microphone array. The sound source localization device can obtain the position information of the correction sound source and the microphone array, determine their spatial relative distance based on that information, and then obtain the target sound velocity from the relationship between time and distance.
- the sound source localization device does not need face recognition, and can determine the target sound velocity at one time, reducing the calculation amount of sound velocity correction and improving the real-time performance of sound velocity correction.
- the sound velocity correction signal is an ultrasonic wave or the first sound, and the frequency of the first sound is outside the preset frequency range.
- the above step of receiving the sound velocity correction signal from the correction sound source through the microphone array includes: receiving the sound velocity correction signal from the correction sound source in real time through the microphone array.
- the correction sound source is an ultrasonic transmitter. Since the human ear cannot hear ultrasonic waves, the ultrasonic transmitter can output the sound velocity correction signal in real time without affecting the voice signals of the participants at the meeting site.
- the sound velocity correction signal is the first sound
- the correction sound source can be a loudspeaker in this case, and the preset frequency range is the human voice range. The sound source localization device only locates sounds within the human voice frequency range when performing sound source localization, so the first sound will not affect sound source localization.
- the sound source localization device can also receive the sound speed correction signal in real time to update the target sound speed and improve the accuracy of the target sound speed.
- the sound velocity correction signal is the second sound
- the frequency of the second sound is within the preset frequency range
- the above step of receiving the sound velocity correction signal from the correction sound source through the microphone array includes: periodically receiving the sound velocity correction signal from the correction sound source through the microphone array.
- the correction sound source can be a loudspeaker in this case, and the preset frequency range is the frequency range of human vocalization. Because the frequency band of the second sound overlaps with that range, the sound source localization device collects the second sound and the human voice at the same time, which may make sound source localization inaccurate. If someone is speaking at the conference site, the loudspeaker can only send the second sound to the microphone array periodically, so the sound source localization device periodically corrects the target sound velocity, staggering the correction with the meeting speech and improving the accuracy of sound source localization.
- the sound output by the loudspeaker during far-end single-talk can also be used as the second sound.
- in that case the sound source localization device does not need to provide the second sound itself.
- the sound velocity correction signal is used to determine the time delay between the microphones in the microphone array; the target sound velocity is proportional to the spatial relative distance and inversely proportional to the time delay.
- after the sound source localization device receives the sound velocity correction signal through the microphone array, it can measure the time delays with which different microphones in the array receive the signal, and then determine the target sound velocity based on the differences in the spatial relative distance between the correction sound source and the different microphones.
- since those distance differences are fixed, the lower the delay, the higher the target sound velocity. This provides a way to determine the target sound velocity directly, improving real-time performance.
- the method further includes: acquiring the position of the correction sound source through a camera; and determining the spatial relative distance according to the position of the correction sound source and the position of the microphone array.
- the step of determining the spatial relative distance according to the position of the correction sound source and the position of the microphone array includes: determining the first coordinate of the correction sound source in a three-dimensional space coordinate system according to the position of the correction sound source; determining the second coordinate of the microphone array in the three-dimensional space coordinate system according to the position of the microphone array; and determining the spatial relative distance according to the geometric relationship between the first coordinate and the second coordinate.
- by establishing a three-dimensional space coordinate system, the sound source localization device gives both the correction sound source and the microphone array coordinates in that system, and the spatial relative distance can then be obtained from the geometric relationship between the coordinates, improving the accuracy of the distance calculation.
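The geometric relationship between the first and second coordinates amounts to a Euclidean distance. A minimal sketch, with hypothetical coordinates (the patent does not fix any particular layout):

```python
import math

def spatial_relative_distance(first_coordinate, second_coordinate):
    """Euclidean distance between the first coordinate (correction sound
    source) and a second coordinate (a microphone) in the three-dimensional
    space coordinate system described above; coordinates are (x, y, z) in metres."""
    return math.dist(first_coordinate, second_coordinate)

# Hypothetical layout, with the origin at the centre of the microphone array:
mic = (0.2, 0.0, 0.0)      # one microphone, 0.2 m along the array axis
source = (0.2, 1.0, 0.5)   # correction source 1 m in front and 0.5 m up
print(spatial_relative_distance(source, mic))  # sqrt(1.25) ≈ 1.118
```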
- the method further includes: performing sound source localization according to a target sound velocity.
- after the sound source localization device determines the target sound velocity, it can use the target sound velocity to localize the positions of the participants at the meeting site, which reduces the influence of temperature on sound source localization and improves its accuracy.
- the second aspect of the present application provides a sound velocity correction device, which can implement the method in the first aspect or any possible implementation manner of the first aspect.
- the apparatus includes corresponding units or modules for performing the above method.
- the units or modules included in the device can be realized by means of software and/or hardware.
- the device can be, for example, a network device, or a chip, a chip system, or a processor that supports the network device to implement the above method, or a logic module or software that can realize all or part of the functions of the network device.
- the third aspect of the present application provides a computer device, including a processor coupled with a memory, the memory being used to store instructions.
- when the processor executes the instructions, the device implements the method in the first aspect or any possible implementation of the first aspect.
- the apparatus may be, for example, a network device, or may be a chip or a chip system that supports the network device to implement the foregoing method.
- the fourth aspect of the present application provides a computer-readable storage medium in which instructions are stored; when the instructions are executed, the computer executes the method provided in the first aspect or any possible implementation of the first aspect.
- the fifth aspect of the present application provides a computer program product.
- the computer program product includes computer program code; when the computer program code is executed, the computer executes the method in the aforementioned first aspect or any possible implementation of the first aspect.
- FIG. 1 is a schematic structural diagram of a sound source localization device provided by an embodiment of the present application
- FIG. 2 is a schematic structural diagram of a uniform linear microphone array provided in an embodiment of the present application.
- FIG. 3 is a schematic structural diagram of a uniform circular microphone array provided by an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of a uniform spherical microphone array provided in an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a three-dimensional uniform linear microphone array provided by an embodiment of the present application.
- FIG. 6 is a schematic flowchart of a sound velocity correction method provided in an embodiment of the present application.
- FIG. 7 is a schematic diagram of a delay estimation process provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of sound source localization provided by an embodiment of the present application.
- FIG. 9 is a schematic diagram of another sound source localization provided by an embodiment of the present application.
- FIG. 10 is a schematic diagram of a sound velocity correction process provided by an embodiment of the present application.
- FIG. 11 is a schematic diagram of the position of a correction sound source provided by an embodiment of the present application.
- FIG. 12 is a schematic diagram of the spatial relative distance between the correction sound source and the microphone array provided by an embodiment of the present application.
- FIG. 13 is a schematic diagram of the position of another correction sound source provided by an embodiment of the present application.
- FIG. 14 is another schematic diagram of the spatial relative distance between the correction sound source and the microphone array provided by an embodiment of the present application.
- FIG. 15 is a schematic diagram of the position of another correction sound source provided by an embodiment of the present application.
- FIG. 16 is another schematic diagram of the spatial relative distance between the correction sound source and the microphone array provided by an embodiment of the present application.
- FIG. 17 is a schematic structural diagram of a sound velocity correction device provided by an embodiment of the present application.
- FIG. 18 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
- Embodiments of the present application provide a sound velocity correction method and device, which are used to reduce the calculation amount of sound velocity correction and improve the real-time performance of sound velocity correction.
- Microphone array A system composed of a certain number of acoustic sensors (usually microphones) used to sample and process the spatial characteristics of the sound field.
- mean(·) denotes the average, and argmax(f(s)) denotes the value of s at which f(s) is largest.
- the sound source correction method provided in this application can be executed by a sound source localization device, and is applied to various scenarios that require sound pickup, for example, video calls, voice calls, multi-person conferences, recording or video recording, and other scenarios.
- the sound source localization device may include a variety of terminals capable of picking up sound, such as a large-screen conference terminal, a TV, a tablet computer, a head-mounted display (HMD), augmented reality (AR) equipment, mixed reality (MR) equipment, a personal digital assistant (PDA), vehicle-mounted electronic equipment, a laptop computer, a personal computer (PC), monitoring equipment, a robot, a vehicle-mounted terminal, a wearable device, or a self-driving vehicle.
- in the following description, a large-screen conference terminal is taken as an example of the terminal.
- the structure of a sound source localization device may be shown in FIG. 1 , and the sound source localization device 1 may include a microphone array 11 and a processor 12 .
- the microphone array 11 may include an array composed of multiple microphones for collecting voice signals.
- the structure formed by the plurality of microphones may include a centralized array structure or a distributed array structure. For example, when the sound pressure of the user's voice exceeds the sound source detection threshold, the voice signal is collected through the microphone array.
- each microphone produces one voice signal, and the multiple voice signals are fused to form the data collected in the current environment.
- the microphones in this application can be ordinary omnidirectional microphones, and a microphone array formed by multiple microphones according to a certain topology can take any array form: for example, the uniform linear microphone array shown in Figure 2, formed by 8 ordinary omnidirectional microphones with distance d between adjacent microphones; the uniform circular microphone array shown in Figure 3, formed by 8 ordinary omnidirectional microphones in which the angle between the lines connecting adjacent microphones to the circle centre is o; the uniform spherical microphone array shown in Figure 4, composed of 18 ordinary omnidirectional microphones; or the three-dimensional uniform linear microphone array shown in Figure 5, formed by 10 ordinary omnidirectional microphones with distance d between adjacent microphones in each dimension.
- a uniform linear microphone array composed of a plurality of ordinary omnidirectional microphones is taken as an example.
- the processor 12 can be used to process the data collected by the microphone array, so as to extract the voice data corresponding to the sound source. It can be understood that the steps of the sound source correction method provided in this application can be executed by the processor 12 .
- the sound source localization device may also include devices such as Octopus conferencing devices, Internet of Things (IoT) devices, smart speakers, or smart robots.
- please refer to FIG. 6, which shows a sound velocity correction method provided by an embodiment of the present application; the flow of the method is described in detail as follows.
- Step 601. The corrected sound source sends a sound velocity correction signal to the microphone array, and accordingly, the sound source localization device receives the sound velocity correction signal from the corrected sound source through the microphone array.
- a corrected sound source may be set in the meeting place, and the corrected sound source may output a sound velocity correction signal, and the microphone array included in the sound source localization device may collect the sound velocity correction signal.
- the sound velocity correction signal may be any sound wave signal, that is, any sound wave signal may be selected to participate in the sound velocity correction process.
- the microphone array picks up all the sound wave signals within the range of the microphone.
- the pickup distance of the microphone can be determined according to the specific application environment. For example, if the room is 5 meters long, 10 meters wide, and 4 meters high, and the microphone array is required to process all the sounds in the room, the pickup distance should be at least 10 meters.
- the microphone array usually collects sound wave signals whose sound pressure exceeds a certain threshold; for example, the sound pressure of speech exceeds this threshold.
- this threshold is referred to as the sound source detection threshold, and sound wave signals that do not exceed it are usually discarded.
- Step 602. The sound source localization device determines the target sound velocity, and the target sound velocity is related to the spatial relative distance between the microphone array and the corrected sound source, and the sound velocity correction signal.
- after the sound source localization device obtains the sound velocity correction signal through the microphone array, it can first obtain the position information of the correction sound source and the microphone array, determine the spatial relative distance between them based on that information, use the sound velocity correction signal to calculate the time delay between the microphones in the array, and then obtain the target sound velocity from the relationship between time and distance.
- the position of the correction sound source may be predetermined, that is, the user inputs the position information of the correction sound source after setting it up.
- the position of the correction sound source can also be obtained by the sound source localization device by collecting images through a camera and performing image recognition; this is not limited here.
- the way for the sound source localization device to determine the spatial relative distance may be to set up a three-dimensional space coordinate system within the sound pickup range of the microphone array. The origin of the coordinate system may be at any position within the pickup range; exemplarily, in this embodiment it may be the centre position of the microphone array, the position of any microphone in the array, or another position.
- the second coordinates of each microphone and the first coordinate of the correction sound source can be determined according to the positions of the microphone array and the correction sound source in the three-dimensional space coordinate system, and the spatial relative distance between the correction sound source and the microphone array is then obtained from the geometric relationship between the second coordinates and the first coordinate.
- after the sound source localization device receives the sound velocity correction signal through the microphone array, it can measure the time delays with which different microphones in the array receive the signal, and then determine the target sound velocity based on the differences in the spatial relative distance between the correction sound source and the different microphones; since those differences are fixed, the lower the delay, the higher the target sound velocity.
- the target sound velocity estimate can be obtained from equations of the form t ij = (d i − d j )/v, where t ij represents the time delay between the i-th microphone and the j-th microphone, and d i and d j are the spatial relative distances from the correction sound source to those microphones.
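Rearranging that delay-distance relation gives the sound speed for a single microphone pair. A minimal sketch under that assumed relation (the patent's own equations are not reproduced on this page); the numbers in the example are hypothetical:

```python
def target_sound_velocity(d_i, d_j, t_ij):
    """Estimate the sound speed from one microphone pair by rearranging
    t_ij = (d_i - d_j) / v, where d_i and d_j are the spatial relative
    distances (m) from the correction sound source to the i-th and j-th
    microphones and t_ij is the measured delay (s)."""
    if t_ij == 0:
        raise ValueError("microphones equidistant from the correction source")
    return (d_i - d_j) / t_ij

# Hypothetical pair: a 0.10 m path-length difference arriving 0.29 ms apart.
print(target_sound_velocity(1.10, 1.00, 0.29e-3))  # ≈ 344.8 m/s
```

In practice the estimates from several microphone pairs could be averaged (the mean(·) notation defined earlier), which would reduce the effect of delay-measurement noise.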
- FIG. 7 is a delay estimation flow chart provided by the embodiment of the present application.
- because the spatial relative distances from the correction sound source to the different microphones differ, the phases of the received sound velocity correction signals are different;
- step 702: perform a fast Fourier transform (FFT) on the sound velocity correction signal of the i-th microphone to obtain signal 1; perform an FFT on the sound velocity correction signal of the j-th microphone and take the conjugate to obtain signal 2;
- step 703: multiply signal 1 and signal 2 to obtain the cross-spectrum, signal 3;
- step 704: apply power-spectrum weighting to signal 3 to obtain signal 4;
- step 705: perform an inverse fast Fourier transform (IFFT) on signal 4 to obtain signal 5;
- step 706: use the time delay corresponding to the peak value of signal 5 as the time delay between the i-th microphone and the j-th microphone.
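The steps above can be sketched as a generalized cross-correlation delay estimator. The PHAT weighting is assumed for the power-spectrum weighting step (the page names the weighting function but its exact value was lost in extraction); the impulse signals in the example are hypothetical:

```python
import numpy as np

def estimate_delay(x_i, x_j, fs):
    """Sketch of steps 702-706; returns the delay t_ij (s), i.e. the
    arrival time at mic i minus the arrival time at mic j."""
    n = len(x_i) + len(x_j)                        # zero-pad against circular wrap-around
    sig1 = np.fft.rfft(x_i, n=n)                   # step 702: FFT of mic i (signal 1)
    sig2 = np.conj(np.fft.rfft(x_j, n=n))          # step 702: conjugated FFT of mic j (signal 2)
    sig3 = sig1 * sig2                             # step 703: cross-spectrum (signal 3)
    sig4 = sig3 / np.maximum(np.abs(sig3), 1e-15)  # step 704: PHAT weighting (signal 4)
    sig5 = np.fft.irfft(sig4, n=n)                 # step 705: IFFT (signal 5)
    max_shift = n // 2
    sig5 = np.concatenate((sig5[-max_shift:], sig5[:max_shift + 1]))
    lag = np.argmax(np.abs(sig5)) - max_shift      # step 706: peak location
    return lag / fs

# Hypothetical check: an impulse reaching mic j 5 samples after mic i.
fs = 16000
x_i = np.zeros(256)
x_i[40] = 1.0
x_j = np.zeros(256)
x_j[45] = 1.0
print(estimate_delay(x_i, x_j, fs))  # -0.0003125 s: mic i heard the pulse first
```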
- ψ ij (ω) is a weighting function; a commonly used choice is the PHAT weighting ψ ij (ω) = 1/|G ij (ω)|, where G ij (ω) is the cross-spectrum (signal 3), τ is the delay difference parameter, and ω is the angular frequency, so that t ij = argmax over τ of the integral of ψ ij (ω)·G ij (ω)·e^(jωτ) dω. The cross-spectrum is calculated as G ij (ω) = X i (ω)·X j *(ω), where X i (ω) is the spectrum of the signal received by the i-th microphone (signal 1) and X j *(ω) is the conjugate spectrum of the signal received by the j-th microphone (signal 2).
- the correction sound source may be a speaker or an ultrasonic transmitter, which is not limited here.
- the sound velocity correction signal is ultrasonic; since the human ear cannot hear ultrasonic waves, the ultrasonic transmitter can output the sound velocity correction signal in real time.
- the sound source localization device can then receive the sound velocity correction signal in real time to update the target sound velocity, further reducing the influence of temperature on sound source localization and improving the real-time performance of sound velocity correction.
- step 801: the microphone array picks up sound and samples;
- step 802: a high-pass filter is used to extract the sound velocity correction signal, and a low-pass filter is used to extract the human voice signal;
- step 803: the time delay of the sound velocity correction signal is estimated, and the target sound velocity is determined;
- step 804: sound source localization is performed on the human voice signal based on the target sound velocity.
- the loudspeaker may be controlled by a sound source localization device, or may be controlled by other devices, which is not limited here.
- the sound output by the loudspeaker may be the first sound, whose frequency is outside the preset frequency range. The human voice range is roughly 100 Hz (bass) to 10 kHz (soprano), so the preset frequency range may be set to 100 Hz to 10 kHz. If the frequency of the first sound is 18 kHz, the frequency bands of the first sound and the human voice do not overlap, so the first sound does not affect the collection of the human voice by the sound source localization device. In this case the sound source localization device can also receive the sound velocity correction signal in real time to update the target sound velocity.
- step 901: the microphone array picks up sound and samples;
- step 902: a band-pass filter is used to extract the sound velocity correction signal (exemplarily, the band collected by the band-pass filter can be 10 kHz-20 kHz), and a low-pass filter is used to extract the human voice signal;
- step 903: the time delay of the sound velocity correction signal is estimated, and the target sound velocity is determined;
- step 904: sound source localization is performed on the human voice signal based on the target sound velocity.
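The band separation of step 902 can be sketched with ideal frequency-domain masks standing in for the band-pass and low-pass filters (a simplification; a real system would use proper filters). The band edges are the ones given in the surrounding passage; the 440 Hz/18 kHz test tones are hypothetical:

```python
import numpy as np

def split_bands(x, fs, voice_band=(100.0, 10_000.0), probe_band=(10_000.0, 20_000.0)):
    """Separate the picked-up signal into a human-voice component and a
    sound-velocity-correction component using ideal FFT masks.
    Returns (voice_signal, correction_signal)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    voice = np.fft.irfft(np.where((f >= voice_band[0]) & (f <= voice_band[1]), X, 0), n=len(x))
    probe = np.fft.irfft(np.where((f > probe_band[0]) & (f <= probe_band[1]), X, 0), n=len(x))
    return voice, probe

# Hypothetical mixture: a 440 Hz "voice" tone plus an 18 kHz correction tone.
fs = 48000
t = np.arange(4800) / fs
mix = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 18000 * t)
voice, probe = split_bands(mix, fs)
```

The separated `probe` component would then feed the delay estimation of step 903, while `voice` feeds the localization of step 904.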
- the sound source localization device can update the target sound velocity only periodically, staggering the times at which the loudspeaker outputs the sound velocity correction signal from the times at which sound sources at the meeting site are being localized, to avoid affecting sound source localization.
- when the loudspeaker of the sound source localization device outputs the voice signal of far-end single-talk, since the position of the loudspeaker is already known, the target sound velocity can be determined directly from the far-end single-talk voice signal and the loudspeaker position, and is then input as a parameter for subsequent sound source localization; no sound velocity correction signal needs to be provided locally in this case.
- the process of correcting the sound velocity may refer to the schematic flow chart of the sound velocity correction shown in FIG. 10 .
- step 1001: determine whether far-end single-talk is currently occurring; if so, execute step 1002, otherwise execute step 1003;
- step 1002: determine the target sound velocity based on the voice signal currently output by the loudspeaker;
- step 1003: perform sound source localization based on the current target sound velocity.
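The decision in steps 1001-1003 can be sketched as a small control function. `estimate_from_loudspeaker` is a hypothetical callable standing in for the delay-estimation pipeline applied to the loudspeaker's far-end voice signal:

```python
def correction_step(far_end_single_talk, current_velocity, estimate_from_loudspeaker=None):
    """Sketch of steps 1001-1003: during far-end single-talk the loudspeaker
    output doubles as the correction signal, so the target sound velocity is
    re-estimated; otherwise localization proceeds with the current value."""
    if far_end_single_talk and estimate_from_loudspeaker is not None:
        return estimate_from_loudspeaker()  # step 1002: re-estimate the target sound velocity
    return current_velocity                 # step 1003: localize with the current value

print(correction_step(False, 343.0))                # 343.0
print(correction_step(True, 343.0, lambda: 345.2))  # 345.2
```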
- there is no limitation on the location of the correction sound source.
- when the correction sound source is controlled by the sound source localization device and the microphone array is built into the device, the correction sound source may also be built into the sound source localization device; please refer to FIG. 11.
- FIG. 11 is a schematic diagram of the location of the correction sound source provided in an embodiment of the present application.
- the correction sound source can be at any position in the sound source localization device, such as any of the dotted circles in FIG. 11; taking one of these positions as an example, the spatial relative distance between the correction sound source and the microphone array is shown in FIG. 12.
- the correction sound source can also be built into the microphone array; please refer to FIG. 13, which is a schematic diagram of the position of another correction sound source provided by an embodiment of the present application.
- the correction sound source can be at any position in the microphone array, such as any of the dotted circles in FIG. 13.
- the correction sound source can also be placed outside the microphone array, and it may or may not be connected to the sound source localization equipment; this is not limited here.
- in this case the spatial relative distance between the correction sound source and the microphone array is shown in FIG. 14.
- please refer to FIG. 15, which is a schematic diagram of the location of another correction sound source provided by an embodiment of the present application.
- the correction sound source can be set at any position outside the sound source localization equipment; taking one of these positions as an example, the spatial relative distance between the correction sound source and the microphone array is shown in FIG. 16.
- after the sound source localization equipment determines the target sound velocity, it can use the target sound velocity to locate the positions of the participants at the conference site, which reduces the influence of temperature on sound source localization and improves its accuracy.
- the correction sound source sends a sound velocity correction signal to the microphone array;
- the sound source localization device receives the sound velocity correction signal through the microphone array and determines the target sound velocity according to the spatial relative distance between the correction sound source and the microphone array, as well as the sound velocity correction signal;
- the device does not need face recognition and can determine the target sound velocity in a single step, reducing the calculation amount of sound velocity correction and improving its real-time performance.
- as shown in FIG. 17, the device 170 includes:
- a receiving unit 1701 configured to receive a sound velocity correction signal from a correction sound source through a microphone array
- the determining unit 1702 is configured to determine a target sound velocity, the target sound velocity is related to the spatial relative distance between the microphone array and the corrected sound source, and the sound velocity correction signal.
- the sound velocity correction signal is an ultrasonic wave or the first sound
- the frequency of the first sound is outside the preset frequency range
- the receiving unit 1701 is specifically configured to: receive the sound velocity correction signal from the correction sound source in real time through the microphone array.
- the sound velocity correction signal is the second sound
- the frequency of the second sound is within the preset frequency range
- the receiving unit 1701 is specifically configured to periodically receive the sound velocity correction signal from the correction sound source through the microphone array.
- the sound velocity correction signal is used to determine the time delay between the microphones in the microphone array, the target sound speed is proportional to the relative space distance, and the target sound speed is inversely proportional to the time delay.
- the device 170 further includes: an acquisition unit 1703, which is specifically configured to: obtain the position of the rectified sound source through a camera; the determination unit 1702 is also configured to: determine the spatial relative distance.
- the determining unit 1702 is specifically configured to: determine the first coordinate of the correction sound source in the three-dimensional coordinate system according to the position of the correction sound source; determine the second coordinate of the microphone array in the three-dimensional coordinate system according to the position of the microphone array; and determine the spatial relative distance according to the geometric relationship between the first coordinate and the second coordinate.
- the device 170 further includes a positioning unit 1704, and the positioning unit 1704 is specifically configured to: perform sound source localization according to a target sound velocity.
- the receiving unit 1701 of the apparatus 170 is configured to execute step 601 in FIG. 6
- the determining unit 1702 of the apparatus 170 is configured to execute step 602 in FIG. 6 , which will not be repeated here.
- FIG. 18 is a schematic diagram of a possible logical structure of a computer device 180 provided by an embodiment of the present application.
- the computer device 180 includes: a processor 1801 , a communication interface 1802 , a storage system 1803 and a bus 1804 .
- the processor 1801 , the communication interface 1802 and the storage system 1803 are connected to each other through a bus 1804 .
- the processor 1801 is used to control and manage the actions of the computer device 180, for example, the processor 1801 is used to execute the steps performed by the sound source localization device in the method embodiment in FIG. 6 .
- the communication interface 1802 is used to support the computer device 180 in communicating.
- the storage system 1803 is used for storing program codes and data of the computer device 180 .
- the processor 1801 may be a central processing unit, a general processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
- the processor 1801 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
- the bus 1804 can be a PCI bus or an EISA bus, etc.
- the bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in FIG. 18 , but it does not mean that there is only one bus or one type of bus.
- the receiving unit 1701 in the device 170 is equivalent to the communication interface 1802 in the computer device 180
- the determining unit 1702 , obtaining unit 1703 and positioning unit 1704 in the device 170 are equivalent to the processor 1801 in the computer device 180 .
- the computer device 180 in this embodiment may correspond to the sound source localization device in the method embodiment in FIG. 6 above, and the communication interface 1802 in the computer device 180 can implement the functions of, and/or the various steps implemented by, that sound source localization device; for the sake of brevity, details are not repeated here.
- each unit in the device can be implemented in the form of software called by the processing element; they can also be implemented in the form of hardware; some units can also be implemented in the form of software called by the processing element, and some units can be implemented in the form of hardware.
- each unit can be a separate processing element, or it can be integrated in a certain chip of the device.
- a unit can also be stored in the memory in the form of a program, which is called by a certain processing element of the device to execute the unit's function.
- all or part of these units can be integrated together, or implemented independently.
- the processing element mentioned here may also be a processor, which may be an integrated circuit with signal processing capabilities.
- each step of the above method or each unit above may be implemented by an integrated logic circuit of hardware in the processor element or implemented in the form of software called by the processing element.
- the units in any of the above devices may be one or more integrated circuits configured to implement the above method, for example: one or more application specific integrated circuits (ASIC), or one or more microprocessors (digital signal processor, DSP), or one or more field programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.
- when the units in the device are implemented in the form of a program scheduled by a processing element, the processing element can be a general-purpose processor, such as a central processing unit (CPU) or another processor that can call programs.
- these units can be integrated together and implemented in the form of a system-on-a-chip (SOC).
- a computer-readable storage medium is also provided, in which computer-executable instructions are stored; when the processor of a device executes the computer-executable instructions, the device executes the method performed by the sound source localization device in the above method embodiments.
- a computer program product in another embodiment of the present application includes computer-executable instructions stored in a computer-readable storage medium; when the processor of a device executes the computer-executable instructions, the device executes the method performed by the sound source localization device in the foregoing method embodiments.
- the disclosed system, device and method can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a division by logical function; in actual implementation there may be other division methods, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
- the integrated unit is realized in the form of a software function unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- General Health & Medical Sciences (AREA)
- Otolaryngology (AREA)
- Circuit For Audible Band Transducer (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
Abstract
A sound velocity correction method and apparatus. The method includes: a correction sound source sends a sound velocity correction signal to a microphone array (11) (601); a sound source localization device (1) receives the sound velocity correction signal through the microphone array (11), and determines a target sound velocity according to the relative spatial position of the correction sound source and the microphone array (11), as well as the sound velocity correction signal (602). The sound source localization device (1) does not need to perform face recognition and can determine the target sound velocity in one pass, reducing the computation of sound velocity correction and improving its real-time performance.
Description
This application claims priority to the Chinese patent application No. 202111382634.4, filed with the China National Intellectual Property Administration on November 22, 2021 and entitled "Method for Correcting Sound Velocity and Conference System", and to the Chinese patent application No. CN202111672798.0, filed with the China National Intellectual Property Administration on December 31, 2021 and entitled "Sound Velocity Correction Method and Apparatus", both of which are incorporated herein by reference in their entirety.
Embodiments of this application relate to the field of audio processing, and in particular to a sound velocity correction method and apparatus.
Conference terminals often use a microphone array to pick up sound from and localize participants, so as to locate and track their positions and implement functions such as sound curtain and automatic director view. The speed of sound is one of the important parameters in microphone-array sound source localization algorithms, and its value depends on the ambient temperature. In an actual conference scenario, as the temperature changes, the speed of sound can vary between 337 and 350 m/s, which affects accurate localization and tracking of participants.
At present, sound velocity is corrected using face recognition: the speaker's angle is determined by face recognition (the face angle), and the microphone array measures the speaker's angle by sound source localization based on a preset sound velocity (the sound angle). Taking the face angle as the reference, the sound velocity is continuously fine-tuned until the difference between the sound angle and the face angle falls within a preset range, at which point the current actual sound velocity is determined. However, the image recognition used in face recognition is computationally expensive, and the continuous adjustment of the sound velocity gives sound source localization poor real-time performance.
Summary of the Invention
This application provides a sound velocity correction method and apparatus, which are used to reduce the computation of sound velocity correction and improve its real-time performance.
A first aspect of this application provides a sound velocity correction method, including: receiving, through a microphone array, a sound velocity correction signal from a correction sound source; and determining a target sound velocity, where the target sound velocity is related to the relative spatial distance between the microphone array and the correction sound source, and to the sound velocity correction signal.
In the above aspect, the executing entity of the embodiments of this application may be a sound source localization device. A correction sound source may be set up in the conference venue and output a sound velocity correction signal, which the microphone array included in the sound source localization device can collect. The sound velocity correction signal can be used to compute the time delays between microphones in the microphone array; the sound source localization device can then obtain the position information of the correction sound source and the microphone array, determine their relative spatial distance based on that position information, and obtain the target sound velocity from the relationship between time and distance. The sound source localization device does not need face recognition and can determine the target sound velocity in one pass, which reduces the computation of sound velocity correction and improves its real-time performance.
In a possible implementation, the sound velocity correction signal is an ultrasonic wave or a first sound whose frequency is outside a preset frequency band, and receiving the sound velocity correction signal from the correction sound source through the microphone array includes: receiving the sound velocity correction signal from the correction sound source in real time through the microphone array.
In the above possible implementation, when the sound velocity correction signal is an ultrasonic wave, the correction sound source is an ultrasonic transmitter. Since ultrasound is inaudible to humans, the ultrasonic transmitter can output the sound velocity correction signal in real time without affecting the speech signals of participants at the conference site. When the sound velocity correction signal is the first sound, the correction sound source may be a loudspeaker; the preset frequency band is the range of human vocal frequencies, and the sound source localization device localizes only sounds within the human vocal band, so the first sound does not affect sound source localization. In this case, the sound source localization device can likewise receive the sound velocity correction signal in real time to update the target sound velocity, improving its accuracy.
In a possible implementation, the sound velocity correction signal is a second sound whose frequency is within the preset frequency band, and receiving the sound velocity correction signal from the correction sound source through the microphone array includes: periodically receiving the sound velocity correction signal from the correction sound source through the microphone array.
In the above possible implementation, when the sound velocity correction signal is the second sound, the correction sound source may be a loudspeaker and the preset frequency band is the range of human vocal frequencies. Because the frequency band of the second sound overlaps the human vocal band, collecting the second sound and human speech at the same time can make sound source localization inaccurate. If someone is speaking at the conference site, the loudspeaker can only send the second sound to the microphone array periodically; the sound source localization device then corrects the target sound velocity periodically, staggered with conference speech, improving localization accuracy. When a person outside the conference site speaks through a loudspeaker inside the site (far-end single-talk) and no one inside is speaking, the sound output by the loudspeaker during far-end single-talk can also serve as the second sound, and the sound source localization device does not need to provide the second sound itself.
In a possible implementation, the sound velocity correction signal is used to determine the time delays between microphones in the microphone array; the target sound velocity is directly proportional to the relative spatial distance and inversely proportional to the time delay.
In the above possible implementation, after receiving the sound velocity correction signal through the microphone array, the sound source localization device can measure the time delays with which different microphones in the array receive the signal, and then determine the target sound velocity from the differences of the relative spatial distances between the correction sound source and the different microphones. Since these distance differences are fixed, the lower the delay, the higher the target sound velocity. This provides a way to determine the target sound velocity directly, improving real-time performance.
In a possible implementation, the method further includes: obtaining the position of the correction sound source through a camera; and determining the relative spatial distance according to the position of the correction sound source and the position of the microphone array.
In the above possible implementation, besides being entered in advance, the position of the correction sound source may also be obtained by the sound source localization device through image recognition on images captured by a camera; the relative spatial distance is obtained from the geometric relationship between the position of the correction sound source in the image and the pre-entered position of the microphone array, improving the flexibility of the solution.
In a possible implementation, determining the relative spatial distance according to the position of the correction sound source and the position of the microphone array includes: determining a first coordinate of the correction sound source in a three-dimensional coordinate system according to the position of the correction sound source; determining a second coordinate of the microphone array in the three-dimensional coordinate system according to the position of the microphone array; and determining the relative spatial distance according to the geometric relationship between the first coordinate and the second coordinate.
In the above possible implementation, the sound source localization device can determine the relative spatial distance by establishing a three-dimensional coordinate system in which both the correction sound source and the microphone array have coordinates; the relative spatial distance is then obtained through geometric operations between the coordinates, improving the accuracy of the distance calculation.
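The coordinate-based distance computation described above can be sketched as follows (a minimal illustration, not the claimed implementation; the function name and point model are assumptions, and `math.dist` is the standard-library Euclidean distance):

```python
import math

def relative_distances(source_xyz, mic_xyzs):
    """Euclidean distance from the correction sound source (first
    coordinate) to each microphone (second coordinates), all expressed
    in one shared three-dimensional coordinate system."""
    return [math.dist(source_xyz, m) for m in mic_xyzs]

# Source 1 m above the array origin; two mics 0.5 m apart on the x-axis.
dists = relative_distances((0.0, 0.0, 1.0),
                           [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)])
```

The differences between these per-microphone distances are exactly the fixed distance differences used in the velocity estimate.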
In a possible implementation, the method further includes: performing sound source localization according to the target sound velocity.
In the above possible implementation, after determining the target sound velocity, the sound source localization device can use it to localize the positions of participants at the conference site, which reduces the influence of temperature on sound source localization and improves its accuracy.
A second aspect of this application provides a sound velocity correction apparatus that can implement the method of the first aspect or any possible implementation thereof. The apparatus includes corresponding units or modules for executing the above method, which may be implemented in software and/or hardware. The apparatus may be, for example, a network device, or a chip, chip system, or processor that supports a network device in implementing the above method, or a logical module or software capable of implementing all or part of the functions of a network device.
A third aspect of this application provides a computer device, including: a processor coupled to a memory, the memory being used to store instructions that, when executed by the processor, cause the apparatus to implement the method of the first aspect or any possible implementation thereof. The apparatus may be, for example, a network device, or a chip or chip system that supports a network device in implementing the above method.
A fourth aspect of this application provides a computer-readable storage medium storing instructions that, when executed, cause a computer to execute the method provided by the first aspect or any possible implementation thereof.
A fifth aspect of this application provides a computer program product including computer program code that, when executed, causes a computer to execute the method provided by the first aspect or any possible implementation thereof.
FIG. 1 is a schematic structural diagram of a sound source localization apparatus provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a uniform linear microphone array provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a uniform circular microphone array provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of a uniform spherical microphone array provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of a three-dimensional uniform linear microphone array provided by an embodiment of this application;
FIG. 6 is a schematic flowchart of a sound velocity correction method provided by an embodiment of this application;
FIG. 7 is a schematic flowchart of time delay estimation provided by an embodiment of this application;
FIG. 8 is a schematic diagram of sound source localization provided by an embodiment of this application;
FIG. 9 is another schematic diagram of sound source localization provided by an embodiment of this application;
FIG. 10 is a schematic flowchart of sound velocity correction provided by an embodiment of this application;
FIG. 11 is a schematic diagram of the position of a correction sound source provided by an embodiment of this application;
FIG. 12 is a schematic diagram of the relative spatial distance between a correction sound source and a microphone array provided by an embodiment of this application;
FIG. 13 is another schematic diagram of the position of a correction sound source provided by an embodiment of this application;
FIG. 14 is another schematic diagram of the relative spatial distance between a correction sound source and a microphone array provided by an embodiment of this application;
FIG. 15 is another schematic diagram of the position of a correction sound source provided by an embodiment of this application;
FIG. 16 is another schematic diagram of the relative spatial distance between a correction sound source and a microphone array provided by an embodiment of this application;
FIG. 17 is a schematic structural diagram of a sound velocity correction apparatus provided by an embodiment of this application;
FIG. 18 is a schematic structural diagram of a computer device provided by an embodiment of this application.
Embodiments of this application provide a sound velocity correction method and apparatus for reducing the computation of sound velocity correction and improving its real-time performance.
The embodiments of this application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. Persons of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided by the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. Furthermore, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such process, method, product, or device.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are given in the following detailed description to better illustrate this application. Persons skilled in the art should understand that this application can be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail so as to highlight the gist of this application.
Some concepts involved in the embodiments of this application are first explained.
Microphone array: a system composed of a certain number of acoustic sensors (generally microphones), used to sample and process the spatial characteristics of a sound field.
Far-end single-talk: speech originating from the far end, output as a voice signal by the local loudspeaker.
mean(·) denotes averaging; argmax(f(s)) denotes the value of s at which f(s) is maximized.
The sound velocity correction method provided by this application can be executed by a sound source localization device and applied to various scenarios requiring sound pickup, such as video calls, voice calls, multi-person conferences, and audio or video recording.
First, the sound source localization device provided by this application is introduced. The sound source localization device may include various terminals capable of sound pickup, including large-screen conference terminals, televisions, tablet computers, head mounted display (HMD) devices, augmented reality (AR) devices, mixed reality (MR) devices, personal digital assistants (PDA), vehicle-mounted electronic devices, laptop computers, personal computers (PC), monitoring devices, robots, vehicle-mounted terminals, wearable devices, autonomous vehicles, and the like. In the following embodiments, a large-screen conference terminal is taken as the example terminal.
Exemplarily, the structure of the sound source localization apparatus (which may also be called a sound pickup apparatus) may be as shown in FIG. 1. The sound source localization device 1 may include a microphone array 11 and a processor 12.
The microphone array 11 may include an array of multiple microphones for collecting voice signals. The multiple microphones may form a centralized array structure or a distributed array structure. For example, when the sound pressure of a user's speech exceeds the sound source detection threshold, voice signals are collected through the microphone array; each microphone can form one voice signal channel, and the multiple channels are fused to form the data collected in the current environment.
The microphones in this application may be ordinary omnidirectional microphones, and the microphone array formed by multiple microphones in a certain topology may take any array form: for example, a uniform linear microphone array of 8 ordinary omnidirectional microphones as shown in FIG. 2, with distance d between adjacent microphones; a uniform circular microphone array of 8 ordinary omnidirectional microphones as shown in FIG. 3, where the angle between the lines connecting adjacent microphones to the center of the circle is o; a uniform spherical microphone array of 18 ordinary omnidirectional microphones as shown in FIG. 4; or a three-dimensional uniform linear microphone array of 10 ordinary omnidirectional microphones as shown in FIG. 5, with distance d between adjacent microphones in each dimension. In this embodiment, a uniform linear microphone array of multiple ordinary omnidirectional microphones is taken as an example.
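As an illustrative sketch of the uniform linear array geometry described above (the function name and centering convention are assumptions, not part of the embodiment), the microphone coordinates can be generated as:

```python
def uniform_linear_array(num_mics, d):
    """Coordinates (x, y, z) of a uniform linear microphone array laid
    along the x-axis with spacing d between adjacent microphones,
    centered on the origin."""
    offset = (num_mics - 1) * d / 2.0
    return [(i * d - offset, 0.0, 0.0) for i in range(num_mics)]

mics = uniform_linear_array(8, 0.04)  # 8 mics, 4 cm apart
```

Such coordinates serve as the "second coordinates" of the microphones when the array center is chosen as the origin of the three-dimensional coordinate system.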
The processor 12 can be used to process the data collected by the microphone array so as to extract the voice data corresponding to the sound source. In other words, the steps of the sound velocity correction method provided by this application can be executed by the processor 12.
Optionally, the sound source localization device may include devices such as an octopus conference device, Internet of things (IoT) devices, smart speakers, or intelligent robots.
Referring to FIG. 6, FIG. 6 shows a sound velocity correction method provided by an embodiment of this application. The flow of the method is specifically as follows.
Step 601. The correction sound source sends a sound velocity correction signal to the microphone array; correspondingly, the sound source localization device receives the sound velocity correction signal from the correction sound source through the microphone array.
In this embodiment, a correction sound source can be set up in the conference venue to output a sound velocity correction signal, which the microphone array included in the sound source localization device can collect. The sound velocity correction signal may be any acoustic signal; that is, any acoustic signal can be selected to participate in the sound velocity correction flow.
Specifically, the microphone array picks up all acoustic signals within the microphones' pickup range. The pickup distance of the microphones can be determined according to the specific application environment: for example, if the room is 5 m long, 10 m wide, and 4 m high, the microphone array may be required to process all sounds in the room, and the pickup distance of the microphones should then be at least 10 m.
Generally, a microphone array collects acoustic signals whose sound pressure exceeds a certain threshold, such as speech whose sound pressure exceeds the threshold; this threshold is hereinafter referred to as the sound source detection threshold, and acoustic signals below it are usually discarded. Generally, the higher the sound source detection threshold, the lower the pickup sensitivity; the lower the threshold, the higher the sensitivity.
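The threshold gating described above can be sketched as follows (a minimal illustration; the frame representation and threshold value are assumptions):

```python
def gate_frames(frames, threshold):
    """Keep only frames whose peak amplitude exceeds the sound source
    detection threshold; quieter frames are discarded."""
    return [f for f in frames if max(abs(s) for s in f) > threshold]

kept = gate_frames([[0.01, -0.02], [0.4, -0.6], [0.0, 0.05]], 0.1)
```

Raising the threshold in this sketch drops more frames, which mirrors the trade-off between the detection threshold and pickup sensitivity noted above.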
Step 602. The sound source localization device determines a target sound velocity, which is related to the relative spatial distance between the microphone array and the correction sound source, and to the sound velocity correction signal.
In this embodiment, after obtaining the sound velocity correction signal through the microphone array, the sound source localization device can first obtain the position information of the correction sound source and the microphone array and, based on that position information, determine their relative spatial distance. The sound velocity correction signal can be used to compute the time delays between microphones in the microphone array, so the target sound velocity can be obtained from the relationship between time and distance.
The position of the correction sound source may be predetermined, i.e., after the user sets up the correction sound source, its position information is entered into the correction sound source. The position of the correction sound source may also be obtained by the sound source localization device through image recognition on images captured by a camera; the device obtains the relative spatial distance from the geometric relationship between the position of the correction sound source in the image and the pre-entered position of the microphone array. This is not limited here.
Specifically, the sound source localization device can determine the relative spatial distance by setting up a three-dimensional coordinate system in the three-dimensional space within the pickup range of the microphone array. The origin of the coordinate system can be at any position within the pickup range; exemplarily, in this embodiment, the origin may be the center of the microphone array, the position of any microphone in the array, or another position. After the three-dimensional coordinate system is determined, the second coordinate of each microphone and the first coordinate of the correction sound source can be determined from their positions in the coordinate system; correspondingly, the relative spatial distance between the correction sound source and the microphone array can be obtained from the geometric relationship between the microphones' second coordinates and the correction sound source's first coordinate.
After receiving the sound velocity correction signal through the microphone array, the sound source localization device can measure the time delays with which different microphones in the array receive the signal, and then determine the target sound velocity from the differences of the relative spatial distances between the correction sound source and the different microphones; since these distance differences are fixed, the lower the delay, the higher the target sound velocity.
Specifically, the target sound velocity can be estimated through the following equations (reconstructed here from the surrounding description; d_i denotes the relative spatial distance between the correction sound source and the i-th microphone):

v̂_ij = (d_i − d_j) / t_ij,  v̂ = mean(v̂_ij)

where t_ij denotes the time delay between the i-th microphone and the j-th microphone, v̂_ij denotes the sound velocity estimated between the i-th and j-th microphones, and v̂ denotes the overall sound velocity estimate; the more values of t_ij, the more accurate v̂. For a linear microphone array, the delays between microphones other than the first can all be obtained from differences of t_1j for different values of j, for example t_23 = t_13 − t_12, so a linear microphone array may compute only the velocities v̂_1j.
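A minimal sketch of the pairwise velocity estimate above (the function name and looping scheme are illustrative, not the claimed implementation):

```python
def estimate_sound_velocity(dists, delays):
    """Estimate the target sound velocity as the mean of pairwise
    estimates v_ij = (d_i - d_j) / t_ij, where d_i is the distance
    from the correction source to microphone i and t_ij is the delay
    with which microphone i receives the signal relative to j."""
    estimates = []
    n = len(dists)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            t_ij = delays[i] - delays[j]
            if t_ij != 0:
                estimates.append((dists[i] - dists[j]) / t_ij)
    return sum(estimates) / len(estimates)

# Synthetic check: arrival times generated from a known speed of 343 m/s.
c = 343.0
d = [1.0, 1.2, 1.5]
t = [x / c for x in d]
```

With noise-free delays, every pairwise estimate equals the true speed, so the mean recovers it exactly; averaging over many pairs is what makes the estimate robust when the measured delays are noisy.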
The estimation of t_ij can use the generalized cross correlation (GCC) method. The specific flow can be seen in FIG. 7, a flowchart of time delay estimation provided by an embodiment of this application, as follows. Step 701: the i-th microphone and the j-th microphone receive the sound velocity correction signal; the phases of the signal received by different microphones differ. Step 702: perform a fast Fourier transform (FFT) on the i-th microphone's sound velocity correction signal to obtain signal 1, and perform an FFT plus a conjugate operation on the j-th microphone's sound velocity correction signal to obtain signal 2. Step 703: multiply signal 1 and signal 2 to obtain signal 3. Step 704: apply power-spectrum weighting to signal 3 to obtain signal 4. Step 705: perform an inverse fast Fourier transform (IFFT) on signal 4 to obtain signal 5. Step 706: take the delay corresponding to the peak of signal 5 as the delay between the i-th microphone and the j-th microphone.
Specifically, the GCC method corresponds to the calculation (reconstructed from steps 701-706 above, with ψ(f) denoting the power-spectrum weighting function and X_i(f) the FFT of the i-th microphone's signal):

t_ij = argmax( IFFT[ ψ(f) · X_i(f) · X_j*(f) ](s) )
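A simplified time-domain sketch of the delay estimation (plain cross-correlation peak-picking rather than the frequency-domain GCC weighting of steps 701-706; the signals and function name are illustrative):

```python
def delay_samples(x, y, max_lag):
    """Return the lag (in samples) of y relative to x that maximizes
    their cross-correlation -- a time-domain stand-in for the GCC
    peak-picking described in steps 701-706."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = sum(x[n] * y[n + lag]
                   for n in range(len(x))
                   if 0 <= n + lag < len(y))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# y is x delayed by 3 samples.
x = [0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, -0.3, 0.0]
```

Dividing the recovered lag by the sampling rate gives t_ij in seconds; in practice the frequency-domain GCC formulation with spectral weighting is preferred for its robustness to noise and reverberation.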
In the embodiments of this application, the correction sound source may be a loudspeaker or an ultrasonic transmitter, which is not limited here. When the correction sound source is an ultrasonic transmitter, the sound velocity correction signal is an ultrasonic wave; since humans cannot hear ultrasound, the ultrasonic transmitter can output the sound velocity correction signal in real time, and correspondingly the sound source localization device can receive it in real time to update the target sound velocity, further reducing the influence of temperature on sound source localization and improving the real-time performance of the correction. For example, FIG. 8 shows a schematic diagram of sound source localization: Step 801: the microphone array picks up and samples sound; Step 802: a high-pass filter extracts the sound velocity correction signal and a low-pass filter extracts the human voice signal; Step 803: estimate the time delays of the sound velocity correction signal and determine the target sound velocity; Step 804: localize the human voice signal based on the target sound velocity.
When the correction sound source is a loudspeaker, the loudspeaker may be controlled by the sound source localization device or by another device, which is not limited here. The sound output by the loudspeaker may be a first sound whose frequency is outside the preset frequency band. Since the human vocal range is between 100 Hz (bass) and 10 kHz (soprano), the preset band can be set to 100 Hz to 10 kHz; if the frequency of the first sound is, say, 18 kHz, its band does not overlap with human voice and does not affect the device's collection of human voice, and the sound source localization device can likewise receive the sound velocity correction signal in real time to update the target sound velocity. For example, FIG. 9 shows another schematic diagram of sound source localization: Step 901: the microphone array picks up and samples sound; Step 902: a band-pass filter extracts the sound velocity correction signal (exemplarily, the band-pass filter may collect the band 10 kHz-20 kHz) and a low-pass filter extracts the human voice signal; Step 903: estimate the time delays of the sound velocity correction signal and determine the target sound velocity; Step 904: localize the human voice signal based on the target sound velocity.
When the frequency of the sound output by the loudspeaker is within the preset frequency band, the sound source localization device can only update the target sound velocity periodically, staggering the time at which the loudspeaker outputs the sound velocity correction signal from the time at which sound sources at the conference site output signals for localization, so that sound source localization is not affected.
Far-end single-talk often occurs in normal conferences. When the loudspeaker of the sound source localization device outputs a far-end single-talk voice signal, since the position of the loudspeaker is known, the target sound velocity can be determined directly from the far-end single-talk voice signal and the loudspeaker's position and used as a parameter input for subsequent sound source localization; in this case there is no need to provide a sound velocity correction signal locally. Exemplarily, when no sound velocity correction signal is configured, the correction flow may follow the schematic flowchart of sound velocity correction shown in FIG. 10. Step 1001: determine whether the current state is far-end single-talk; if yes, execute Step 1002; otherwise execute Step 1003. Step 1002: perform the operation of determining the target sound velocity based on the voice signal currently output by the loudspeaker. Step 1003: perform sound source localization based on the current target sound velocity.
The position of the correction sound source is not limited in the embodiments of this application. Specifically, when the correction sound source is controlled by the sound source localization device and the microphone array is built into the device, the correction sound source may be built into the device. Referring to FIG. 11, FIG. 11 is a schematic diagram of the position of a correction sound source provided by an embodiment of this application; the correction sound source can be at any position in the sound source localization device, such as any position marked by a dotted circle in FIG. 11. Taking one of those positions as an example, the relative spatial distance between the correction sound source and the microphone array is shown in FIG. 12. When the microphone array is external to the sound source localization device, the correction sound source may be built into the microphone array; referring to FIG. 13, FIG. 13 is another schematic diagram of the position of a correction sound source, and the correction sound source can be at any position in the microphone array, such as any dotted-circle position in FIG. 13. Alternatively, the correction sound source may be external to the microphone array, and it may or may not be connected to the sound source localization device, which is not limited here; taking one position as an example, the relative spatial distance between the correction sound source and the microphone array is shown in FIG. 14. Referring to FIG. 15, FIG. 15 is another schematic diagram of the position of a correction sound source; the correction sound source can be set at any position outside the sound source localization device, and in this case the microphone array may be either built into or external to the device. Taking one position as an example, the relative spatial distance between the correction sound source and the microphone array is shown in FIG. 16.
After determining the target sound velocity, the sound source localization device can use it to localize the positions of participants at the conference site, reducing the influence of temperature on sound source localization and improving its accuracy.
The correction sound source sends a sound velocity correction signal to the microphone array; the sound source localization device receives the sound velocity correction signal through the microphone array and determines the target sound velocity according to the relative spatial position of the correction sound source and the microphone array, as well as the sound velocity correction signal. The sound source localization device does not need face recognition and can determine the target sound velocity in one pass, reducing the computation of sound velocity correction and improving its real-time performance.
The sound velocity correction method has been described above; the apparatus that executes the method is described below.
Referring to FIG. 17, FIG. 17 shows a sound velocity correction apparatus provided by an embodiment of this application. The apparatus 170 includes:
a receiving unit 1701, configured to receive, through a microphone array, a sound velocity correction signal from a correction sound source;
a determining unit 1702, configured to determine a target sound velocity, where the target sound velocity is related to the relative spatial distance between the microphone array and the correction sound source, and to the sound velocity correction signal.
Optionally, the sound velocity correction signal is an ultrasonic wave or a first sound whose frequency is outside a preset frequency band, and the receiving unit 1701 is specifically configured to receive the sound velocity correction signal from the correction sound source in real time through the microphone array.
Optionally, the sound velocity correction signal is a second sound whose frequency is within the preset frequency band, and the receiving unit 1701 is specifically configured to periodically receive the sound velocity correction signal from the correction sound source through the microphone array.
Optionally, the sound velocity correction signal is used to determine the time delays between microphones in the microphone array; the target sound velocity is directly proportional to the relative spatial distance and inversely proportional to the time delay.
Optionally, the apparatus 170 further includes an obtaining unit 1703, specifically configured to obtain the position of the correction sound source through a camera; the determining unit 1702 is further configured to determine the relative spatial distance according to the position of the correction sound source and the position of the microphone array.
Optionally, the determining unit 1702 is specifically configured to: determine a first coordinate of the correction sound source in a three-dimensional coordinate system according to the position of the correction sound source; determine a second coordinate of the microphone array in the three-dimensional coordinate system according to the position of the microphone array; and determine the relative spatial distance according to the geometric relationship between the first coordinate and the second coordinate.
Optionally, the apparatus 170 further includes a positioning unit 1704, specifically configured to perform sound source localization according to the target sound velocity.
The receiving unit 1701 of the apparatus 170 is configured to execute step 601 in FIG. 6, and the determining unit 1702 of the apparatus 170 is configured to execute step 602 in FIG. 6; details are not repeated here.
FIG. 18 is a schematic diagram of a possible logical structure of a computer device 180 provided by an embodiment of this application. The computer device 180 includes a processor 1801, a communication interface 1802, a storage system 1803, and a bus 1804; the processor 1801, the communication interface 1802, and the storage system 1803 are connected to each other through the bus 1804. In this embodiment of this application, the processor 1801 is used to control and manage the actions of the computer device 180; for example, the processor 1801 is used to execute the steps performed by the sound source localization device in the method embodiment of FIG. 6. The communication interface 1802 is used to support the computer device 180 in communicating. The storage system 1803 is used to store the program code and data of the computer device 180.
The processor 1801 may be a central processing unit, a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor 1801 may also be a combination that implements a computing function, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 1804 may be a PCI bus or an EISA bus, etc., and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 18, but this does not mean that there is only one bus or one type of bus.
The receiving unit 1701 in the apparatus 170 corresponds to the communication interface 1802 in the computer device 180, and the determining unit 1702, the obtaining unit 1703, and the positioning unit 1704 in the apparatus 170 correspond to the processor 1801 in the computer device 180.
The computer device 180 in this embodiment may correspond to the sound source localization device in the method embodiment of FIG. 6 above; the communication interface 1802 in the computer device 180 can implement the functions of, and/or the various steps implemented by, that sound source localization device, and for brevity, details are not repeated here.
It should be understood that the above division of units in the apparatus is merely a division by logical function; in actual implementation, the units may be fully or partially integrated into one physical entity, or physically separated. The units in the apparatus may all be implemented in the form of software called by a processing element, or all in the form of hardware, or some units in the form of software called by a processing element and others in the form of hardware. For example, each unit may be a separately established processing element, or may be integrated in a chip of the apparatus; a unit may also be stored in the memory in the form of a program, which is called by a processing element of the apparatus to execute the unit's function. All or part of these units may be integrated together or implemented independently. The processing element mentioned here may also be called a processor and may be an integrated circuit with signal processing capability. In the implementation process, the steps of the above method or the above units may be implemented by an integrated logic circuit of hardware in the processor element, or in the form of software called by the processing element.
In one example, the units in any of the above apparatuses may be one or more integrated circuits configured to implement the above method, for example: one or more application specific integrated circuits (ASIC), or one or more microprocessors (digital signal processor, DSP), or one or more field programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms. As another example, when the units in the apparatus are implemented in the form of a program scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can call programs. As yet another example, these units may be integrated together and implemented in the form of a system-on-a-chip (SOC).
In another embodiment of this application, a computer-readable storage medium is also provided, in which computer-executable instructions are stored; when the processor of a device executes the computer-executable instructions, the device executes the method performed by the sound source localization device in the above method embodiments.
In another embodiment of this application, a computer program product is also provided, including computer-executable instructions stored in a computer-readable storage medium; when the processor of a device executes the computer-executable instructions, the device executes the method performed by the sound source localization device in the above method embodiments.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working process of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and in actual implementation there may be other division methods, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, each functional unit in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated units may be implemented in the form of hardware or in the form of software functional units.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (17)
- A sound velocity correction method, comprising: receiving, through a microphone array, a sound velocity correction signal from a correction sound source; and determining a target sound velocity, wherein the target sound velocity is related to the relative spatial distance between the microphone array and the correction sound source, and to the sound velocity correction signal.
- The method according to claim 1, wherein the sound velocity correction signal is an ultrasonic wave or a first sound whose frequency is outside a preset frequency band, and receiving the sound velocity correction signal from the correction sound source through the microphone array comprises: receiving the sound velocity correction signal from the correction sound source in real time through the microphone array.
- The method according to claim 1, wherein the sound velocity correction signal is a second sound whose frequency is within a preset frequency band, and receiving the sound velocity correction signal from the correction sound source through the microphone array comprises: periodically receiving the sound velocity correction signal from the correction sound source through the microphone array.
- The method according to any one of claims 1 to 3, wherein the sound velocity correction signal is used to determine time delays between microphones in the microphone array, the target sound velocity is directly proportional to the relative spatial distance, and the target sound velocity is inversely proportional to the time delay.
- The method according to any one of claims 1 to 4, further comprising: obtaining the position of the correction sound source through a camera; and determining the relative spatial distance according to the position of the correction sound source and the position of the microphone array.
- The method according to claim 5, wherein determining the relative spatial distance according to the position of the correction sound source and the position of the microphone array comprises: determining a first coordinate of the correction sound source in a three-dimensional coordinate system according to the position of the correction sound source; determining a second coordinate of the microphone array in the three-dimensional coordinate system according to the position of the microphone array; and determining the relative spatial distance according to the geometric relationship between the first coordinate and the second coordinate.
- The method according to any one of claims 1 to 6, further comprising: performing sound source localization according to the target sound velocity.
- A sound velocity correction apparatus, comprising: a receiving unit, configured to receive, through a microphone array, a sound velocity correction signal from a correction sound source; and a determining unit, configured to determine a target sound velocity, wherein the target sound velocity is related to the relative spatial distance between the microphone array and the correction sound source, and to the sound velocity correction signal.
- The apparatus according to claim 8, wherein the sound velocity correction signal is an ultrasonic wave or a first sound whose frequency is outside a preset frequency band, and the receiving unit is specifically configured to: receive the sound velocity correction signal from the correction sound source in real time through the microphone array.
- The apparatus according to claim 8, wherein the sound velocity correction signal is a second sound whose frequency is within a preset frequency band, and the receiving unit is specifically configured to: periodically receive the sound velocity correction signal from the correction sound source through the microphone array.
- The apparatus according to any one of claims 8 to 10, wherein the sound velocity correction signal is used to determine time delays between microphones in the microphone array, the target sound velocity is directly proportional to the relative spatial distance, and the target sound velocity is inversely proportional to the time delay.
- The apparatus according to any one of claims 8 to 11, further comprising an obtaining unit specifically configured to: obtain the position of the correction sound source through a camera; wherein the determining unit is further configured to: determine the relative spatial distance according to the position of the correction sound source and the position of the microphone array.
- The apparatus according to claim 12, wherein the determining unit is specifically configured to: determine a first coordinate of the correction sound source in a three-dimensional coordinate system according to the position of the correction sound source; determine a second coordinate of the microphone array in the three-dimensional coordinate system according to the position of the microphone array; and determine the relative spatial distance according to the geometric relationship between the first coordinate and the second coordinate.
- The apparatus according to any one of claims 8 to 13, further comprising a positioning unit specifically configured to: perform sound source localization according to the target sound velocity.
- A computer device, comprising a processor coupled to a memory, the processor being configured to execute instructions stored in the memory so that the computer device executes the method according to any one of claims 1 to 7.
- A computer-readable storage medium storing instructions that, when executed, cause a computer to execute the method according to any one of claims 1 to 7.
- A computer program product comprising computer program code that, when run on a computer, causes the computer to implement the method according to any one of claims 1 to 7.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111382634.4 | 2021-11-22 | ||
| CN202111382634 | 2021-11-22 | ||
| CN202111672798.0A CN116148769A (zh) | 2021-11-22 | 2021-12-31 | 一种声速矫正方法以及装置 |
| CN202111672798.0 | 2021-12-31 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023088156A1 true WO2023088156A1 (zh) | 2023-05-25 |
Family
ID=86357001
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2022/131002 Ceased WO2023088156A1 (zh) | 2021-11-22 | 2022-11-10 | 一种声速矫正方法以及装置 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN116148769A (zh) |
| WO (1) | WO2023088156A1 (zh) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2001141578A (ja) * | 1999-11-10 | 2001-05-25 | Ishikawajima Harima Heavy Ind Co Ltd | 温度検出方法及び温度検出装置 |
| JP2004309265A (ja) * | 2003-04-04 | 2004-11-04 | Nec Corp | マルチスタティック水中音速計測方法および方式 |
| CN201247251Y (zh) * | 2008-08-21 | 2009-05-27 | 中国船舶重工集团公司第七一一研究所 | 管道气体流速和声速测量计 |
| CN105307063A (zh) * | 2014-07-15 | 2016-02-03 | 松下知识产权经营株式会社 | 声速校正装置 |
| CN109164414A (zh) * | 2018-09-07 | 2019-01-08 | 深圳市天博智科技有限公司 | 基于麦克风阵列的定位方法、装置和存储介质 |
-
2021
- 2021-12-31 CN CN202111672798.0A patent/CN116148769A/zh active Pending
-
2022
- 2022-11-10 WO PCT/CN2022/131002 patent/WO2023088156A1/zh not_active Ceased
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2001141578A (ja) * | 1999-11-10 | 2001-05-25 | Ishikawajima Harima Heavy Ind Co Ltd | 温度検出方法及び温度検出装置 |
| JP2004309265A (ja) * | 2003-04-04 | 2004-11-04 | Nec Corp | マルチスタティック水中音速計測方法および方式 |
| CN201247251Y (zh) * | 2008-08-21 | 2009-05-27 | 中国船舶重工集团公司第七一一研究所 | 管道气体流速和声速测量计 |
| CN105307063A (zh) * | 2014-07-15 | 2016-02-03 | 松下知识产权经营株式会社 | 声速校正装置 |
| CN109164414A (zh) * | 2018-09-07 | 2019-01-08 | 深圳市天博智科技有限公司 | 基于麦克风阵列的定位方法、装置和存储介质 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116148769A (zh) | 2023-05-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN107534725B (zh) | 一种语音信号处理方法及装置 | |
| JP6246792B2 (ja) | ユーザのグループのうちのアクティブに話しているユーザを識別するための装置及び方法 | |
| KR101659712B1 (ko) | 입자 필터링을 이용한 음원 위치를 추정 | |
| CN105874535B (zh) | 语音处理方法和语音处理装置 | |
| CN102324237B (zh) | 麦克风阵列语音波束形成方法、语音信号处理装置及系统 | |
| WO2014161309A1 (zh) | 一种移动终端实现声源定位的方法及装置 | |
| CN109804559A (zh) | 空间音频系统中的增益控制 | |
| CN104244164A (zh) | 生成环绕立体声声场 | |
| WO2016014254A1 (en) | System and method for determining audio context in augmented-reality applications | |
| WO2021037129A1 (zh) | 一种声音采集方法及装置 | |
| WO2014101429A1 (zh) | 一种终端双麦克风降噪的方法及装置 | |
| CN104699445A (zh) | 一种音频信息处理方法及装置 | |
| CN105245811B (zh) | 一种录像方法及装置 | |
| CN112015364A (zh) | 拾音灵敏度的调整方法、装置 | |
| WO2015184893A1 (zh) | 移动终端通话语音降噪方法及装置 | |
| CN109270493B (zh) | 声源定位方法和装置 | |
| CN106338711A (zh) | 一种基于智能设备的语音定向方法及系统 | |
| WO2019061678A1 (zh) | 移动侦测方法、装置和监控设备 | |
| WO2022007030A1 (zh) | 音频信号处理方法、装置、设备及可读介质 | |
| CN114255781B (zh) | 一种多通道音频信号获取方法、装置及系统 | |
| WO2019200722A1 (zh) | 声源方向估计方法和装置 | |
| CN112466325B (zh) | 声源定位方法和装置,及计算机存储介质 | |
| CN115914905A (zh) | 会议音频处理方法、设备及其存储介质 | |
| WO2023088156A1 (zh) | 一种声速矫正方法以及装置 | |
| CN114038452B (zh) | 一种语音分离方法和设备 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22894693 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 22894693 Country of ref document: EP Kind code of ref document: A1 |