CN113763982A

CN113763982A - Audio processing method and device, electronic equipment and readable storage medium

Info

Publication number: CN113763982A
Application number: CN202010507347.0A
Authority: CN
Inventors: 方博伟
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-06-05
Filing date: 2020-06-05
Publication date: 2021-12-07

Abstract

Embodiments of the present disclosure disclose an audio processing method, an apparatus, an electronic device, and a readable storage medium. The audio processing method includes: acquiring first audio data collected by a first microphone and second audio data, corresponding to the first audio data, collected by a second microphone; determining a main sound source azimuth of the first audio data and the second audio data; determining a target-to-noise ratio based on the first audio data, the second audio data and the main sound source azimuth, the target-to-noise ratio representing, for each of the first audio data and the second audio data, the ratio of desired signal energy to undesired signal energy; and, based on the target-to-noise ratio, filtering the first audio data and/or the second audio data and obtaining target audio data from the filtered first audio data and/or second audio data. This improves the accuracy with which noise parameters are estimated, so that the signal of the desired sound source can be better extracted from the environment.

Description

Audio processing method, apparatus, electronic device and readable storage medium

Technical Field

The present disclosure relates to the field of computer technology, and in particular to an audio processing method, an apparatus, an electronic device and a readable storage medium.

Background

When shooting video, recording audio, or taking part in a voice call or remote conference, the signal received by a microphone is the superposition of the desired signal and undesired noise. Real environments are often accompanied by many different types of noise, including stationary Gaussian white noise and non-stationary noise. In places such as canteens, supermarkets and restaurants the ambient sound is complex, so the received sound is often very noisy and the listening experience suffers. In severe cases the desired sound may even be masked by the noise, so that the desired speech content cannot be recovered.

The basic approach to audio noise reduction is spectral subtraction: background-noise samples from many different environments are stored in advance, and the most similar sample is matched in order to cope with the actual environment. However, this approach suppresses non-stationary noise poorly. A dual-microphone array can localize a sound source and extract the source at the desired position through beamforming, which cancels ambient noise to some extent; but extraction is poor in strongly reverberant or noisy environments, so spectral subtraction is still needed to achieve the final noise reduction. Moreover, in a noisy environment this method localizes the target sound source inaccurately, which directly degrades noise-parameter estimation and may lead to misjudgment, causing large distortion of the desired speech and a poor listening experience.

Summary of the Invention

To solve the problems in the related art, embodiments of the present disclosure provide an audio processing method, an apparatus, an electronic device and a readable storage medium.

In a first aspect, an embodiment of the present disclosure provides an audio processing method.

Specifically, the audio processing method includes:

acquiring first audio data collected by a first microphone and second audio data, corresponding to the first audio data, collected by a second microphone;

determining a main sound source azimuth of the first audio data and the second audio data, the main sound source azimuth including, among multiple sound source azimuths localized from the first audio data and the second audio data, a sound source azimuth whose probability meets a preset condition;

determining a target-to-noise ratio based on the first audio data, the second audio data and the main sound source azimuth, the target-to-noise ratio representing, for each of the first audio data and the second audio data, the ratio of desired signal energy to undesired signal energy; and

filtering, based on the target-to-noise ratio, the first audio data and/or the second audio data, and obtaining target audio data based on the filtered first audio data and/or second audio data.

With reference to the first aspect, in a first implementation of the first aspect, the method further includes:

acquiring, before the target-to-noise ratio is determined, a spectrum of the first audio data and a spectrum of the second audio data.
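As a minimal sketch of this step, assuming short-time Fourier transform (STFT) analysis with a periodic square-root Hann window, 512-sample frames and 50 % overlap (all assumptions; the disclosure does not fix these parameters), the two spectra could be obtained as follows. The function names `stft` and `two_channel_spectra` are hypothetical.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Return a (frames, n_fft // 2 + 1) complex spectrogram of a 1-D signal.

    A periodic square-root Hann window is used so that the same window can
    later serve as the synthesis window for 50 % overlap-add.
    """
    if len(x) < n_fft:
        raise ValueError("signal shorter than one frame")
    n = np.arange(n_fft)
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft))
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def two_channel_spectra(x1, x2, n_fft=512, hop=256):
    """Frame-aligned spectra of the first and second audio data."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if x1.shape != x2.shape:
        raise ValueError("both channels must have the same length")
    return stft(x1, n_fft, hop), stft(x2, n_fft, hop)
```

Each row of the returned arrays is one frame and each column one frequency bin, which is the per-bin layout the later steps operate on.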

With reference to the first implementation of the first aspect, in a second implementation of the first aspect, determining the target-to-noise ratio based on the first audio data, the second audio data and the main sound source azimuth includes:

determining, for a specified frequency bin, a correlation function between the spectrum of the first audio data and the spectrum of the second audio data;

determining the target-to-noise ratio of the specified frequency bin based on the correlation function and the main sound source azimuth.

With reference to the second implementation of the first aspect, in a third implementation of the first aspect, determining the target-to-noise ratio of the specified frequency bin based on the correlation function and the main sound source azimuth includes:

determining a desired-signal component representation and an undesired-signal component representation of the real part of the correlation function;

determining a desired-signal component representation and an undesired-signal component representation of the imaginary part of the correlation function;

determining the target-to-noise ratio of the specified frequency bin based on the desired-signal and undesired-signal component representations of the real part of the correlation function, the desired-signal and undesired-signal component representations of the imaginary part of the correlation function, and the main sound source azimuth.
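The disclosure does not spell out its formula at this point. As an illustration only, one established way to turn the real and imaginary parts of the inter-microphone correlation, together with a known source azimuth, into a per-bin desired-to-undesired energy ratio is a coherent-to-diffuse ratio (CDR) estimator. It assumes (1) that the desired source arrives as a plane wave from the main azimuth, (2) that undesired sound forms a spherically diffuse field, and (3) equal power at both microphones. Under those assumptions the measured coherence Γ_x mixes the source coherence Γ_s and the diffuse coherence Γ_n as Γ_x = (CDR·Γ_s + Γ_n)/(CDR + 1), and solving for the ratio gives CDR = (Γ_n − Γ_x)/(Γ_x − Γ_s); taking the real part of that complex quotient is one common choice. This is a swapped-in technique, not necessarily the method of this disclosure, and `coherence` and `snr_from_coherence` are hypothetical names.

```python
import numpy as np


def coherence(X1, X2, eps=1e-12):
    """Complex coherence per frequency bin, averaged over frames (axis 0)."""
    phi12 = np.mean(X1 * np.conj(X2), axis=0)  # cross-power spectrum (correlation function)
    phi11 = np.mean(np.abs(X1) ** 2, axis=0)   # auto-power of mic 1
    phi22 = np.mean(np.abs(X2) ** 2, axis=0)   # auto-power of mic 2
    return phi12 / np.sqrt(phi11 * phi22 + eps)


def snr_from_coherence(gamma_x, freqs, d, theta_deg, c=343.0):
    """Per-bin desired-to-undesired energy ratio from measured coherence.

    gamma_x  : measured complex coherence, one value per bin
    freqs    : bin centre frequencies in Hz
    d        : microphone spacing in metres
    theta_deg: main sound source azimuth; 0 deg is end-fire on the mic-1 side,
               so tau > 0 means the wavefront reaches mic 1 first
    """
    omega = 2 * np.pi * np.asarray(freqs, dtype=float)
    tau = d * np.cos(np.deg2rad(theta_deg)) / c
    gamma_s = np.exp(1j * omega * tau)          # plane wave from the main azimuth
    gamma_n = np.sinc(omega * d / (np.pi * c))  # diffuse field: sin(wd/c) / (wd/c)
    cdr = (gamma_n - gamma_x) / (gamma_x - gamma_s + 1e-12)
    return np.maximum(cdr.real, 0.0)            # negative estimates are clipped to 0
```

Because Γ_s depends on the main azimuth, an azimuth error shifts the estimate directly, which is consistent with the background's point that localization accuracy drives noise-parameter accuracy.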

With reference to the second implementation of the first aspect, in a fourth implementation of the first aspect, determining the target-to-noise ratio based on the first audio data, the second audio data and the main sound source azimuth further includes obtaining the target-to-noise ratio of every frequency bin in the spectrum.

With reference to the first implementation of the first aspect, in a fifth implementation of the first aspect, filtering the first audio data and the second audio data based on the target-to-noise ratio and obtaining the target audio data based on the filtered first audio data and second audio data includes:

filtering the spectrum of the first audio data and the spectrum of the second audio data based on the target-to-noise ratio;

obtaining a time-domain representation of the first audio data from the filtered spectrum of the first audio data as third audio data, and/or obtaining a time-domain representation of the second audio data from the filtered spectrum of the second audio data as fourth audio data;

obtaining the target audio data based on the third audio data and/or the fourth audio data.
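A minimal sketch of the spectrum-domain filtering and the return to the time domain follows. It assumes the spectra came from a periodic square-root Hann STFT with 50 % overlap, and it uses a simple Wiener-style gain G = ρ/(1 + ρ) of the per-bin target-to-noise ratio ρ, floored at 0.1 to limit musical noise. Both the gain rule and the floor are illustrative assumptions rather than the disclosure's own filter, and `wiener_gain`, `istft` and `filter_and_resynthesize` are hypothetical names.

```python
import numpy as np


def wiener_gain(snr, floor=0.1):
    """Map a desired-to-undesired energy ratio to a spectral gain in [floor, 1)."""
    snr = np.asarray(snr, dtype=float)
    return np.maximum(snr / (1.0 + snr), floor)


def istft(spec, n_fft=512, hop=256):
    """Weighted overlap-add inverse of a (frames, n_fft // 2 + 1) spectrogram.

    Assumes periodic sqrt-Hann analysis with hop == n_fft // 2, so that the
    product of analysis and synthesis windows sums to one across frames.
    """
    if 2 * hop != n_fft:
        raise ValueError("this sketch only supports 50 % overlap")
    n = np.arange(n_fft)
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft))
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
    return out


def filter_and_resynthesize(spec, snr, n_fft=512, hop=256):
    """Apply the SNR-driven gain to every frame and return time-domain audio."""
    return istft(spec * wiener_gain(snr), n_fft, hop)
```

With a gain of one the round trip reproduces the input exactly, apart from the first and last half-frames, which are covered by only one window.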

With reference to the fifth implementation of the first aspect, in a sixth implementation of the first aspect, filtering the first audio data and the second audio data based on the target-to-noise ratio includes:

obtaining an azimuth range of a desired sound source;

obtaining, based on the main sound source azimuth and the azimuth range of the desired sound source, a judgment result indicating whether current audio data is desired audio data or undesired audio data, the current audio data being the first audio data or the second audio data;

updating spatial filter coefficients based on the judgment result, the current audio data and the target-to-noise ratio;

filtering the current audio data with the updated spatial filter coefficients.
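The judgment step can be sketched as a simple range test on the main sound source azimuth. This assumes azimuths in degrees and allows a desired range that wraps through 0°; `is_desired` is a hypothetical helper, and the disclosure may apply additional criteria.

```python
def is_desired(main_azimuth_deg: float, desired_range_deg: tuple) -> bool:
    """True if the main azimuth lies inside the desired source's azimuth range.

    desired_range_deg is (start, end) in degrees, measured in the increasing
    direction, so (330, 30) means a 60-degree sector centred on 0 degrees.
    """
    start, end = desired_range_deg
    if end - start >= 360:
        return True                      # the whole circle is desired
    az, start, end = main_azimuth_deg % 360, start % 360, end % 360
    if start <= end:
        return start <= az <= end
    return az >= start or az <= end      # sector wraps through 0 degrees
```

Frames judged undesired are the ones whose data feed the noise statistics in the seventh implementation.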

With reference to the sixth implementation of the first aspect, in a seventh implementation of the first aspect, updating the spatial filter coefficients based on the judgment result, the current audio data and the target-to-noise ratio includes:

when the current audio data is desired audio data, updating a global covariance matrix of the current audio data based on the current audio data and the target-to-noise ratio;

when the current audio data is undesired audio data, updating a noise covariance matrix and the global covariance matrix of the current audio data based on the current audio data and the target-to-noise ratio;

updating the spatial filter coefficients based on the noise covariance matrix and the global covariance matrix.
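One way to realize these updates, offered only as a sketch, is recursive averaging of per-bin covariance matrices followed by an MVDR filter computed directly from the two matrices, in the steering-vector-free form w = R_n⁻¹R_s·e_ref / tr(R_n⁻¹R_s) with R_s = R_x − R_n. The smoothing factor α = 0.95 and the rule that lets a low target-to-noise ratio speed up the noise update are hypothetical choices, not taken from the disclosure, and `update_covariances` and `mvdr_weights` are hypothetical names.

```python
import numpy as np


def update_covariances(R_x, R_n, x, desired, snr, alpha=0.95):
    """One recursive update of the global (R_x) and noise (R_n) covariance for one bin.

    x      : (M,) complex STFT values of the M microphones at this bin and frame
    desired: judgment result; True if the frame is desired audio data
    snr    : target-to-noise ratio estimated for this bin (>= 0)
    """
    outer = np.outer(x, np.conj(x))
    R_x = alpha * R_x + (1 - alpha) * outer  # global covariance tracks every frame
    if not desired:
        # Hypothetical weighting: lower SNR -> larger step toward the current frame.
        step = (1 - alpha) / (1.0 + snr)
        R_n = (1 - step) * R_n + step * outer
    return R_x, R_n


def mvdr_weights(R_x, R_n, ref=0, eps=1e-9):
    """MVDR weights from covariance matrices only (no explicit steering vector)."""
    m = R_x.shape[0]
    R_s = R_x - R_n                                  # desired-signal covariance estimate
    A = np.linalg.solve(R_n + eps * np.eye(m), R_s)  # R_n^{-1} R_s
    return A[:, ref] / (np.trace(A) + eps)
```

The filtered bin is then y = wᴴx, i.e. `np.vdot(w, x)`. When R_s is rank one (a single source with steering vector d), these weights pass that source unchanged at the reference microphone, wᴴd = d_ref.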

With reference to the first aspect, in an eighth implementation of the first aspect, the sound source azimuth whose probability meets the preset condition includes the sound source azimuth with the highest probability.
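To illustrate how a "most probable" azimuth might be picked, the sketch below scans a grid of candidate azimuths for a two-microphone pair with SRP-PHAT (steered response power with phase transform), normalizes the non-negative response into a pseudo-probability, and returns its maximum. SRP-PHAT is a swapped-in, commonly used localizer; the disclosure does not state which localizer it uses. The 0 to 180° grid (a two-microphone pair cannot distinguish front from back), the 343 m/s speed of sound and the function names are assumptions.

```python
import numpy as np


def doa_probabilities(X1, X2, freqs, d, c=343.0, grid_deg=None):
    """Pseudo-probability of each candidate azimuth for a two-microphone pair.

    X1, X2: (frames, bins) spectra; freqs: bin centre frequencies in Hz;
    d: microphone spacing in metres. 0 deg is end-fire on the mic-1 side.
    """
    if grid_deg is None:
        grid_deg = np.arange(0, 181)                    # 1-degree grid over 0..180 deg
    cross = np.sum(X1 * np.conj(X2), axis=0)            # cross-spectrum summed over frames
    phat = cross / (np.abs(cross) + 1e-12)              # PHAT: keep phase, drop magnitude
    tau = d * np.cos(np.deg2rad(grid_deg)) / c          # inter-mic delay per candidate
    steer = np.exp(-2j * np.pi * np.outer(tau, freqs))  # (angles, bins) phase compensation
    power = np.maximum(np.real(steer @ phat), 0.0)      # steered response power per angle
    return grid_deg, power / (power.sum() + 1e-12)


def main_azimuth(X1, X2, freqs, d, c=343.0):
    """Azimuth with the highest pseudo-probability (the 'preset condition' here)."""
    grid, prob = doa_probabilities(X1, X2, freqs, d, c)
    return grid[int(np.argmax(prob))]
```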

In a second aspect, an embodiment of the present disclosure provides an audio processing apparatus.

Specifically, the audio processing apparatus includes:

a first acquisition module configured to acquire first audio data collected by a first microphone and second audio data, corresponding to the first audio data, collected by a second microphone;

a first determination module configured to determine a main sound source azimuth of the first audio data and the second audio data, the main sound source azimuth including, among multiple sound source azimuths localized from the first audio data and the second audio data, a sound source azimuth whose probability meets a preset condition;

a second determination module configured to determine a target-to-noise ratio based on the first audio data, the second audio data and the main sound source azimuth, the target-to-noise ratio representing, for each of the first audio data and the second audio data, the ratio of desired signal energy to undesired signal energy; and

a second acquisition module configured to filter, based on the target-to-noise ratio, the first audio data and/or the second audio data, and to obtain target audio data based on the filtered first audio data and/or second audio data.

With reference to the second aspect, in a first implementation of the second aspect, the apparatus further includes:

a third acquisition module configured to acquire, before the target-to-noise ratio is determined, a spectrum of the first audio data and a spectrum of the second audio data.

With reference to the first implementation of the second aspect, in a second implementation of the second aspect, the second determination module includes:

a first determination submodule configured to determine, for a specified frequency bin, a correlation function between the spectrum of the first audio data and the spectrum of the second audio data;

a second determination submodule configured to determine the target-to-noise ratio of the specified frequency bin based on the correlation function and the main sound source azimuth.

With reference to the second implementation of the second aspect, in a third implementation of the second aspect, the second determination submodule includes:

a first determination unit configured to determine a desired-signal component representation and an undesired-signal component representation of the real part of the correlation function;

a second determination unit configured to determine a desired-signal component representation and an undesired-signal component representation of the imaginary part of the correlation function;

a third determination unit configured to determine the target-to-noise ratio of the specified frequency bin based on the desired-signal and undesired-signal component representations of the real part of the correlation function, the desired-signal and undesired-signal component representations of the imaginary part of the correlation function, and the main sound source azimuth.

With reference to the second implementation of the second aspect, in a fourth implementation of the second aspect, the second determination module further includes a first acquisition submodule configured to obtain the target-to-noise ratio of every frequency bin in the spectrum.

With reference to the first implementation of the second aspect, in a fifth implementation of the second aspect, the second acquisition module includes:

a filtering submodule configured to filter the spectrum of the first audio data and the spectrum of the second audio data based on the target-to-noise ratio;

a second acquisition submodule configured to obtain a time-domain representation of the first audio data from the filtered spectrum of the first audio data as third audio data, and/or to obtain a time-domain representation of the second audio data from the filtered spectrum of the second audio data as fourth audio data;

a third acquisition submodule configured to obtain target audio data based on the third audio data and/or the fourth audio data.

With reference to the fifth implementation of the second aspect, in a sixth implementation of the second aspect, the filtering submodule includes:

a first acquisition unit configured to obtain an azimuth range of a desired sound source;

a second acquisition unit configured to obtain, based on the main sound source azimuth and the azimuth range of the desired sound source, a judgment result indicating whether current audio data is desired audio data or undesired audio data, the current audio data being the first audio data or the second audio data;

an update unit configured to update spatial filter coefficients based on the judgment result, the current audio data and the target-to-noise ratio;

a filtering unit configured to filter the current audio data with the updated spatial filter coefficients.

With reference to the sixth implementation of the second aspect, in a seventh implementation of the second aspect, the update unit includes:

a first update subunit configured to, when the current audio data is desired audio data, update a global covariance matrix of the current audio data based on the current audio data and the target-to-noise ratio;

a second update subunit configured to, when the current audio data is undesired audio data, update a noise covariance matrix and the global covariance matrix of the current audio data based on the current audio data and the target-to-noise ratio;

a third update subunit configured to update the spatial filter coefficients based on the noise covariance matrix and the global covariance matrix.

With reference to the second aspect, in an eighth implementation of the second aspect, the sound source azimuth whose probability meets the preset condition includes the sound source azimuth with the highest probability.

In a third aspect, an embodiment of the present disclosure provides an audio processing method, including:

acquiring N mutually corresponding pieces of audio data collected respectively by N microphones, where N ≥ 3;

determining one or more audio data pairs based on the N pieces of audio data;

for each audio data pair: determining a main sound source azimuth of the audio data in the pair, the main sound source azimuth including, among multiple sound source azimuths localized from the pair, a sound source azimuth whose probability meets a preset condition; determining a target-to-noise ratio based on the audio data in the pair and the main sound source azimuth, the target-to-noise ratio representing, for each piece of audio data in the pair, the ratio of desired signal energy to undesired signal energy; and filtering the audio data in the pair based on the target-to-noise ratio to obtain filtered audio data;

determining target audio data based on the filtered audio data obtained from the one or more audio data pairs.

With reference to the third aspect, in a first implementation of the third aspect, determining the one or more audio data pairs based on the N pieces of audio data includes:

determining the one or more audio data pairs according to the positional relationship of the N microphones; or

forming an audio data pair from any two of the N pieces of audio data.

With reference to the first implementation of the third aspect, in a second implementation of the third aspect, determining the one or more audio data pairs according to the positional relationship of the N microphones includes:

if the N microphones are arranged linearly, forming an audio data pair from the audio data of the two microphones closest to the geometric center of the array formed by the N microphones.
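Both pairing strategies are easy to sketch. The helpers below (hypothetical names) either enumerate every two-microphone combination or, for a linear array, pick the two microphones nearest the array's geometric center, taken here as the centroid of the microphone positions. Positions are assumed to be coordinate tuples in metres.

```python
import math
from itertools import combinations


def all_pairs(n_mics: int) -> list:
    """Every unordered pair of microphone indices ("any two audio data")."""
    return list(combinations(range(n_mics), 2))


def center_pair(positions: list) -> tuple:
    """Indices of the two microphones closest to the array's geometric center."""
    dim = len(positions[0])
    centre = [sum(p[i] for p in positions) / len(positions) for i in range(dim)]
    dist = [math.dist(p, centre) for p in positions]
    nearest = sorted(range(len(positions)), key=lambda i: dist[i])[:2]
    return tuple(sorted(nearest))
```

For four equally spaced microphones on a line, `center_pair` returns the two middle ones, (1, 2); with N microphones, `all_pairs` yields N(N − 1)/2 pairs.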

With reference to the third aspect, in a third implementation of the third aspect, determining the target audio data based on the filtered audio data obtained from the one or more audio data pairs includes:

obtaining the target audio data as a weighted sum of the filtered audio data obtained from the one or more audio data pairs; or

selecting, from the filtered audio data obtained from the one or more audio data pairs, the filtered audio data corresponding to the microphone at a preset position as the target audio data.
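Both options for forming the target audio data can be sketched in a few lines. Equal weights are used when none are given; that default, the normalization of the weights to sum to one, and the names `combine_outputs` and `select_preset` are assumptions for illustration.

```python
import numpy as np


def combine_outputs(filtered, weights=None):
    """Weighted sum of per-pair filtered signals of equal length (pairs, samples)."""
    filtered = np.asarray(filtered, dtype=float)
    if weights is None:
        weights = np.ones(len(filtered))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the overall level is preserved
    return weights @ filtered


def select_preset(filtered_by_mic: dict, preset_mic: int):
    """Pick the filtered output associated with the microphone at the preset position."""
    return filtered_by_mic[preset_mic]
```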

With reference to the third aspect, in a fourth implementation of the third aspect, the sound source azimuth whose probability meets the preset condition includes the sound source azimuth with the highest probability.

In a fourth aspect, an embodiment of the present disclosure provides an audio processing apparatus, including:

a fourth acquisition module configured to acquire N mutually corresponding pieces of audio data collected respectively by N microphones, where N ≥ 3;

a third determination module configured to determine one or more audio data pairs based on the N pieces of audio data;

a filtering module configured to, for each audio data pair: determine a main sound source azimuth of the audio data in the pair, the main sound source azimuth including, among multiple sound source azimuths localized from the pair, a sound source azimuth whose probability meets a preset condition; determine a target-to-noise ratio based on the audio data in the pair and the main sound source azimuth, the target-to-noise ratio representing, for each piece of audio data in the pair, the ratio of desired signal energy to undesired signal energy; and filter the audio data in the pair based on the target-to-noise ratio to obtain filtered audio data;

a fourth determination module configured to determine target audio data based on the filtered audio data obtained from the one or more audio data pairs.

With reference to the fourth aspect, in a first implementation of the fourth aspect, the third determination module includes:

a third determination submodule configured to determine the one or more audio data pairs according to the positional relationship of the N microphones; or

a fourth determination submodule configured to form an audio data pair from any two of the N pieces of audio data.

With reference to the first implementation of the fourth aspect, in a second implementation of the fourth aspect, the third determination submodule is configured to:

if the N microphones are arranged linearly, form an audio data pair from the audio data of the two microphones closest to the geometric center of the array formed by the N microphones.

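With reference to the fourth aspect, in a third implementation of the fourth aspect, the fourth determination module is configured to:

obtain the target audio data as a weighted sum of the filtered audio data obtained from the one or more audio data pairs; or

select, from the filtered audio data obtained from the one or more audio data pairs, the filtered audio data corresponding to the microphone at a preset position as the target audio data.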

With reference to the fourth aspect, in a fourth implementation of the fourth aspect, the sound source azimuth whose probability meets the preset condition includes the sound source azimuth with the highest probability.

In a fifth aspect, an embodiment of the present disclosure provides an electronic device including a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method according to any one of the first aspect, the first to eighth implementations of the first aspect, the third aspect, and the first to fourth implementations of the third aspect.

In a sixth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method according to any one of the first aspect, the first to eighth implementations of the first aspect, the third aspect, and the first to fourth implementations of the third aspect.

According to the technical solutions provided by the embodiments of the present disclosure, first audio data collected by a first microphone and corresponding second audio data collected by a second microphone are acquired; a main sound source azimuth of the first and second audio data is determined, the main sound source azimuth including, among multiple sound source azimuths localized from the first and second audio data, a sound source azimuth whose probability meets a preset condition; a target-to-noise ratio is determined based on the first audio data, the second audio data and the main sound source azimuth, the target-to-noise ratio representing, for each of the first and second audio data, the ratio of desired signal energy to undesired signal energy; and, based on the target-to-noise ratio, the first and second audio data are filtered and target audio data is obtained from the filtered first and second audio data. This improves the accuracy of noise-parameter estimation, so that the signal of the desired sound source can be better extracted from the environment.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief Description of the Drawings

Other features, objects and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:

FIG. 1 shows a flowchart of an audio processing method according to an embodiment of the present disclosure;

FIG. 2 shows a flowchart of determining a target-to-noise ratio according to an embodiment of the present disclosure;

FIG. 3 shows a flowchart of obtaining target audio data according to an embodiment of the present disclosure;

FIG. 4A shows a flowchart of filtering the first audio data and the second audio data according to an embodiment of the present disclosure;

FIG. 4B shows a schematic diagram of the azimuth range of a desired sound source according to an embodiment of the present disclosure;

FIG. 5 shows a schematic diagram of updating spatial filter coefficients according to an embodiment of the present disclosure;

FIG. 6 shows a schematic diagram of an audio processing method according to another embodiment of the present disclosure;

FIG. 7 shows a block diagram of an audio processing apparatus according to an embodiment of the present disclosure;

FIG. 8 shows a block diagram of a second determination module according to an embodiment of the present disclosure;

FIG. 9 shows a block diagram of a second acquisition module according to an embodiment of the present disclosure;

FIG. 10 shows a block diagram of a filtering submodule according to an embodiment of the present disclosure;

FIG. 11 shows a flowchart of an audio processing method according to another embodiment of the present disclosure;

FIGS. 12A to 12C show schematic diagrams of multiple microphones according to an embodiment of the present disclosure;

FIG. 13 shows a block diagram of an audio processing apparatus according to another embodiment of the present disclosure;

FIG. 14 shows a block diagram of an electronic device according to an embodiment of the present disclosure; and

FIG. 15 shows a schematic structural diagram of a computer system suitable for implementing the audio processing method according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings so that those skilled in the art can readily implement them. For clarity, parts unrelated to the description of the exemplary embodiments are omitted from the drawings.

In the present disclosure, terms such as "include" or "have" are intended to indicate the presence of the features, numbers, steps, acts, components, parts or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, parts or combinations thereof exist or are added.

It should also be noted that, where no conflict arises, the embodiments of the present disclosure and the features of the embodiments may be combined with each other. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

For audio noise reduction, single-microphone approaches are generally based on spectral subtraction: noise-only segments are identified through voice activity detection, the noise spectrum is estimated from them, and the estimated noise spectrum is subtracted from the received spectrum to obtain the desired speech component. To cope with the diversity of environments, some devices store background-noise samples recorded in many different environments and match the most similar sample at run time. Because real environments are highly random and the sound components within the same environment vary widely, this built-in-sample approach is somewhat effective against stationary noise, but it requires collecting a large number of noise samples and suppresses non-stationary noise poorly. Moreover, a single microphone cannot localize a sound source and therefore cannot selectively enhance sound within a particular pickup range.

Therefore, in small communication or recording devices, a dual-microphone array has become the preferred way to enhance the desired sound and suppress noise. The basic idea is to rely on sound source localization results and use beamforming to extract the sound source in the desired region. The traditional delay-and-sum method uses the estimated source location to compensate the delay between the two channels and then sums the two channels, which cancels ambient noise to some extent; however, it performs poorly in highly reverberant or noisy environments, and spectral subtraction is still needed to achieve the final noise reduction. The minimum variance distortionless response (MVDR) method is an adaptive beamforming method: it estimates the noise spectrum parameters from noise frames identified by voice activity detection and from the localization result, computes a steering vector to update the spatial filter coefficients, and thereby separates the desired signal. In a noisy environment, however, inaccurate localization of the target sound source directly degrades the estimation of the noise parameters (especially for non-stationary interference), which can lead to misjudgments, severe distortion of the desired speech, and a poorer listening experience.

The audio processing method provided by the embodiments of the present disclosure acquires first audio data collected by a first microphone and second audio data, collected by a second microphone, that corresponds to the first audio data; determines a main sound source orientation of the first and second audio data, the main sound source orientation being, among the multiple sound source orientations localized from the first and second audio data, the one whose probability meets a preset condition; determines a target-to-noise ratio based on the first audio data, the second audio data and the main sound source orientation, the target-to-noise ratio representing, for each of the first and second audio data, the ratio of desired signal energy to undesired signal energy; and, based on the target-to-noise ratio, filters the first audio data and/or the second audio data and obtains target audio data from the filtered first and/or second audio data. This improves the accuracy of noise-parameter estimation, so that the signal of the desired sound source can be extracted from the environment more reliably.

FIG. 1 shows a flowchart of an audio processing method according to an embodiment of the present disclosure. As shown in FIG. 1, the audio processing method includes the following steps S110 to S140:

In step S110, first audio data collected by a first microphone and second audio data, collected by a second microphone, corresponding to the first audio data are acquired;

In step S120, a main sound source orientation of the first audio data and the second audio data is determined, the main sound source orientation being the orientation, among the multiple sound source orientations localized from the first and second audio data, whose probability meets a preset condition;

In step S130, a target-to-noise ratio is determined based on the first audio data, the second audio data and the main sound source orientation, the target-to-noise ratio representing, for each of the first and second audio data, the ratio of desired signal energy to undesired signal energy;

In step S140, the first audio data and/or the second audio data are filtered based on the target-to-noise ratio, and target audio data is obtained from the filtered first and/or second audio data.

According to the technical solution of the embodiments of the present disclosure, steps S110 to S140 determine the signal-to-noise ratio between the desired signal energy inside the pickup area and the undesired signal energy outside it. This improves the accuracy of noise-parameter estimation, so that the signal of the desired sound source can be extracted more reliably even in a noisy environment.

According to an embodiment of the present disclosure, the sound source orientation whose probability meets the preset condition includes the sound source orientation with the highest probability.

According to an embodiment of the present disclosure, a sound source orientation may be represented by the angle between the ray that starts at the midpoint of the directed line connecting the two microphones and points toward the source, and the direction of that directed line.
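A minimal sketch of this azimuth convention, assuming 3-D Cartesian coordinates in meters for both microphones and the source; the function name is hypothetical and not part of the disclosure:

```python
import numpy as np

def source_azimuth_deg(mic1, mic2, source):
    """Angle in degrees between the ray from the midpoint M0 of the directed
    line mic1 -> mic2 toward the source and the direction of that line."""
    mic1, mic2, source = (np.asarray(p, dtype=float) for p in (mic1, mic2, source))
    axis = mic2 - mic1                      # direction of the directed line M1 -> M2
    ray = source - (mic1 + mic2) / 2.0      # ray from the midpoint M0 toward the source
    cos_theta = np.dot(axis, ray) / (np.linalg.norm(axis) * np.linalg.norm(ray))
    # Clip to guard against rounding slightly outside [-1, 1] before arccos
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# Microphones 10 cm apart on the x-axis; a source straight ahead is broadside (about 90 degrees)
print(source_azimuth_deg([-0.05, 0, 0], [0.05, 0, 0], [0, 1.0, 0]))
```

A source on the extension of the line beyond mic 2 gives about 0 degrees, and one beyond mic 1 about 180 degrees.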

According to an embodiment of the present disclosure, the method further includes acquiring the spectrum of the first audio data and the spectrum of the second audio data before the target-to-noise ratio is determined.

According to the technical solution of the embodiments of the present disclosure, acquiring the spectra of the first and second audio data before the target-to-noise ratio is determined improves the accuracy of noise-parameter estimation, so that the signal of the desired sound source can be extracted from the environment more reliably.

According to an embodiment of the present disclosure, the audio data received by the first microphone may be divided into frames to obtain the first audio data, and the audio data received by the second microphone may be divided into frames to obtain the second audio data; the first and second audio data are audio frames containing audio collected over the same time period. The first audio data may be windowed and Fourier-transformed to obtain its spectrum. For example, framing may use a frame length of 20 ms and a frame shift of 10 ms, dividing the first audio data into segments (audio frames) of 0-20 ms, 10-30 ms, 20-40 ms, and so on. Each segment is windowed so that its beginning and end taper smoothly, allowing the segment to be treated as one period of a periodic signal. A Fourier transform of the windowed result yields the spectrum of each audio frame. The Fourier transform length L can be preset, for example to 1024 or higher, and it determines the discrete frequency points of the spectrum: ω_l = 2πl/L, l = 0, 1, 2, ..., L-1. Likewise, the second audio data received by the second microphone can be windowed and Fourier-transformed to obtain its spectrum.
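The framing, windowing and transform steps can be sketched as follows. The 16 kHz sampling rate and the Hann window are assumptions (the text specifies neither), and the function names are hypothetical. Because the input is real, only bins l = 0 to L/2 are returned; the remaining bins are complex conjugates of these.

```python
import numpy as np

def frame_spectra(x, fs=16000, frame_ms=20, hop_ms=10, n_fft=1024):
    """Split x into overlapping frames, window each one and return its
    one-sided spectrum with shape (n_frames, n_fft // 2 + 1); row k is Y(omega_l, k)."""
    frame_len = int(fs * frame_ms / 1000)       # 320 samples at 16 kHz
    hop = int(fs * hop_ms / 1000)               # 160 samples at 16 kHz
    window = np.hanning(frame_len)              # tapers both frame edges toward zero
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[k * hop:k * hop + frame_len] for k in range(n_frames)])
    # Zero-pad each windowed frame to n_fft (the Fourier transform length L)
    return np.fft.rfft(frames * window, n=n_fft, axis=1)

def bin_frequencies(n_fft=1024):
    """Digital angular frequencies omega_l = 2*pi*l/L of the returned bins."""
    return 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft

t = np.arange(16000) / 16000
Y = frame_spectra(np.sin(2 * np.pi * 1000 * t))   # 1 s of a 1 kHz tone
print(Y.shape, np.argmax(np.abs(Y[0])))           # the tone falls on bin 1000/16000*1024 = 64
```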

According to an embodiment of the present disclosure, the main sound source orientation of the current frame can be detected using existing sound source localization techniques; it is the orientation, among the multiple sound source orientations localized from the first and second audio data, whose probability meets the preset condition. An audio frame may contain sounds arriving from different angles; sound source localization can identify several source angles together with their probabilities, and the angle with the highest probability is taken as the main sound source orientation. The main sound source orientation also serves as the basis for judging whether the current frame is a desired frame.
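The disclosure relies on existing localization techniques without naming one. As one possible stand-in, the sketch below uses an SRP-PHAT scan (steered response power with phase-transform weighting) over candidate angles, normalizes the scores into pseudo-probabilities, and takes the most probable angle as the main sound source orientation. The function name, the 5° grid, c = 343 m/s and the sign convention (mic 2 hears the source f_s·d·cosθ/c samples after mic 1) are all assumptions.

```python
import numpy as np

def localize_main_source(x1, x2, fs, d, c=343.0, n_fft=1024, angles_deg=None):
    """SRP-PHAT scan for a two-microphone array (a hypothetical stand-in for
    "existing sound source localization"). Returns candidate angles, a
    pseudo-probability per angle, and the most probable angle."""
    if angles_deg is None:
        angles_deg = np.arange(0, 181, 5)
    X1 = np.fft.rfft(x1, n_fft)
    X2 = np.fft.rfft(x2, n_fft)
    cross = X1 * np.conj(X2)                  # cross-spectrum, same order as E[Y1 Y2*]
    cross /= np.abs(cross) + 1e-12            # PHAT weighting: keep phase only
    omega = 2 * np.pi * np.arange(cross.size) / n_fft
    scores = []
    for theta in np.deg2rad(angles_deg):
        lag = fs * d * np.cos(theta) / c      # assumed delay of mic 2 behind mic 1, in samples
        # Remove the expected phase e^{j*omega*lag}; bins add coherently at the true angle
        scores.append(np.sum(np.real(cross * np.exp(-1j * omega * lag))))
    scores = np.maximum(np.asarray(scores), 0.0)
    probs = scores / (scores.sum() + 1e-12)
    return angles_deg, probs, angles_deg[np.argmax(probs)]
```

For example, with d = 8 cm, f_s = 16 kHz and c = 320 m/s, a 2-sample delay of mic 2 behind mic 1 corresponds to cosθ = 0.5, so the scan peaks at 60°.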

FIG. 2 shows a flowchart of determining the target-to-noise ratio according to an embodiment of the present disclosure. As shown in FIG. 2, step S130 may include the following steps S210 to S220:

In step S210, for a specified frequency point, a correlation function between the spectrum of the first audio data and the spectrum of the second audio data is determined;

In step S220, the target-to-noise ratio at the specified frequency point is determined based on the correlation function and the main sound source orientation.

According to the technical solution of the embodiments of the present disclosure, determining, for a specified frequency point, the correlation function between the two spectra and then deriving the target-to-noise ratio at that point from the correlation function and the main sound source orientation improves the accuracy of noise-parameter estimation, so that the signal of the desired sound source can be extracted from the environment more reliably.

The correlation function of two signals U and V can be expressed as:

$$\Gamma_{uv}(\omega,k)=\frac{\Phi_{uv}(\omega,k)}{\sqrt{\Phi_{uu}(\omega,k)\,\Phi_{vv}(\omega,k)}} \qquad (1)$$

where $\Phi_{uv}(\omega,k)$ denotes the cross-power spectral density, defined as $\Phi_{uv}(\omega,k)=E[U(\omega,k)V^{*}(\omega,k)]$, with $E[\cdot]$ denoting the expectation (energy averaging) and $^{*}$ denoting complex conjugation; $\Phi_{uu}(\omega,k)$ and $\Phi_{vv}(\omega,k)$ denote the auto-power spectral densities, defined as $\Phi_{uu}(\omega,k)=E[|U(\omega,k)|^{2}]$ and likewise for $\Phi_{vv}$. Here ω is the frequency point, k is the index of the audio frame, and $U(\omega,k)$ and $V(\omega,k)$ are the spectra of signals U and V, respectively.

For two signals produced by a single source incident at angle θ, the correlation function can also be expressed as:

$$\Gamma_{\theta}(\omega)=\exp\!\left(j\,\omega f_s\frac{d}{c}\cos\theta\right) \qquad (2)$$

where $f_s$ is the sampling rate, c is the speed of sound, and d is the microphone spacing in meters.

Expressed in terms of the target-to-noise ratios, the correlation function of the two signals can be written as:

$$\Gamma_{y}(\omega,k)=\sqrt{\frac{TNR_1\,TNR_2}{(1+TNR_1)(1+TNR_2)}}\;\Gamma_{x}(\omega,k)+\frac{1}{\sqrt{(1+TNR_1)(1+TNR_2)}}\;\Gamma_{n}(\omega,k) \qquad (3)$$

where $\Gamma_{y}$, $\Gamma_{x}$ and $\Gamma_{n}$ denote the correlation functions of the noisy signal, the desired signal and the undesired noise signal, respectively, and $TNR_1$ and $TNR_2$ denote the target-to-noise ratios of the two microphone signals.

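Since the spacing between the two microphones of a dual-microphone array is typically only 5 to 20 cm, $TNR_1 \approx TNR_2$. Denoting the estimated target-to-noise ratio as $\widehat{TNR}$, the above equation can be simplified to:

$$\Gamma_{y}(\omega,k)=\frac{\widehat{TNR}}{1+\widehat{TNR}}\,\Gamma_{x}(\omega,k)+\frac{1}{1+\widehat{TNR}}\,\Gamma_{n}(\omega,k) \qquad (4)$$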

Substituting equation (2) into equation (4), and assuming that the main sound source to be extracted lies at angle θ to the 0° direction of the microphone array, the correlation function can be written as:

$$\Gamma_{y}(\omega,k)=\frac{\widehat{TNR}}{1+\widehat{TNR}}\,e^{\,j\omega\tau\cos\theta}+\frac{1}{1+\widehat{TNR}}\,e^{\,j\omega\tau\beta} \qquad (5)$$

where $\tau=f_s(d/c)$ and $\beta=\cos\beta_1+\cos\beta_2+\dots+\cos\beta_N$ represents the combined contribution of the noise sources at angles other than θ. For example, if sound source localization determines the main sound source azimuth θ to be 100 degrees, then $\beta_1, \beta_2, \dots, \beta_N$ are the other azimuths, different from 100 degrees, that localization detects in the current frame.

In equations (3), (4) and (5), the first term on the right-hand side is the desired-signal component of the correlation function, and the second term is the undesired-signal component. The method of the embodiments of the present disclosure determines the target-to-noise ratio at the specified frequency point from these two components, which improves the accuracy of noise-parameter estimation, so that the signal of the desired sound source can be extracted from the environment more reliably.

According to an embodiment of the present disclosure, determining the target-to-noise ratio at the specified frequency point based on the desired signal component and the undesired signal component includes: determining the desired-signal and undesired-signal component representations of the real part of the correlation function; determining the desired-signal and undesired-signal component representations of the imaginary part of the correlation function; and determining the target-to-noise ratio at the specified frequency point based on these representations and the main sound source orientation.

According to the technical solution of the embodiments of the present disclosure, determining the target-to-noise ratio at the specified frequency point from the desired- and undesired-signal component representations of both the real and imaginary parts of the correlation function, together with the main sound source orientation, improves the accuracy of noise-parameter estimation, so that the signal of the desired sound source can be extracted from the environment more reliably.

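According to an embodiment of the present disclosure, on the basis of equation (5), let

$$\alpha=\omega\tau\cos\theta \qquad (6)$$

Then the real part R and the imaginary part I of the correlation function can be expressed respectively as:

$$R=\frac{\widehat{TNR}\cos\alpha+\cos(\omega\tau\beta)}{1+\widehat{TNR}},\qquad I=\frac{\widehat{TNR}\sin\alpha+\sin(\omega\tau\beta)}{1+\widehat{TNR}} \qquad (7)$$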

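After rearrangement:

$$\cos(\omega\tau\beta)=R-\widehat{TNR}\,(\cos\alpha-R),\qquad \sin(\omega\tau\beta)=I+\widehat{TNR}\,(I-\sin\alpha) \qquad (8)$$

Solving the two equations of (8) simultaneously to eliminate $\widehat{TNR}$ yields the following equations in $\cos(\omega\tau\beta)$ and $\sin(\omega\tau\beta)$:

$$(I-\sin\alpha)\cos(\omega\tau\beta)+(\cos\alpha-R)\sin(\omega\tau\beta)+R\sin\alpha-I\cos\alpha=0,\qquad \cos^{2}(\omega\tau\beta)+\sin^{2}(\omega\tau\beta)=1 \qquad (9)$$

where I and R can be calculated by equation (1), and $\alpha=\omega\tau\cos\theta$ is a known quantity; therefore $\cos(\omega\tau\beta)$ or $\sin(\omega\tau\beta)$ can be solved for.

For example, denote:

$$A=I-\sin\alpha,\qquad B=\cos\alpha-R,\qquad C=R\sin\alpha-I\cos\alpha \qquad (10)$$

Solving the equations gives:

$$\cos(\omega\tau\beta)=\frac{-AC\pm B\sqrt{A^{2}+B^{2}-C^{2}}}{A^{2}+B^{2}},\qquad \sin(\omega\tau\beta)=\frac{-BC\mp A\sqrt{A^{2}+B^{2}-C^{2}}}{A^{2}+B^{2}} \qquad (11)$$

One of the two roots is always $(\cos\alpha,\sin\alpha)$, i.e. the desired-signal term itself; the other root is the noise term. Substituting that root of equation (11) into equation (8), for example $\widehat{TNR}=\bigl(R-\cos(\omega\tau\beta)\bigr)/B$ or $\widehat{TNR}=\bigl(\sin(\omega\tau\beta)-I\bigr)/A$, determines the target-to-noise ratio at the specified frequency point.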
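Equations (8) to (11) can be turned into a per-frequency-point routine. The sketch assumes the single effective noise-phasor model of equation (5); the function names are hypothetical. Rather than relying on the ± ordering in (11), it evaluates both intersection points and discards the one that coincides with the target phasor.

```python
import numpy as np

def model_coherence(tnr, alpha, noise_phase):
    """Forward model of equation (5): weighted sum of the target phasor
    e^{j*alpha} and one effective noise phasor e^{j*noise_phase}."""
    z = (tnr * np.exp(1j * alpha) + np.exp(1j * noise_phase)) / (1.0 + tnr)
    return z.real, z.imag

def estimate_tnr(R, I, alpha):
    """Target-to-noise ratio at one frequency point via equations (8)-(11).
    R, I: real/imaginary parts of the measured correlation function;
    alpha: omega * tau * cos(theta) for the main sound source azimuth theta."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    A, B, C = I - sa, ca - R, R * sa - I * ca          # equation (10)
    n2 = A * A + B * B
    if n2 < 1e-12:
        return np.inf                                  # measurement equals the pure-target phasor
    s = np.sqrt(max(n2 - C * C, 0.0))                  # clip tiny negatives from rounding
    # The line A*x + B*y + C = 0 meets the unit circle twice (equation (11)):
    # one point is (cos(alpha), sin(alpha)), the other is the noise phasor.
    roots = [((-A * C + sgn * B * s) / n2, (-B * C - sgn * A * s) / n2) for sgn in (1.0, -1.0)]
    cos_n, sin_n = max(roots, key=lambda p: (p[0] - ca) ** 2 + (p[1] - sa) ** 2)
    # Back-substitute into equation (8), dividing by the larger coefficient for stability
    if abs(B) >= abs(A):
        return (R - cos_n) / B
    return (sin_n - I) / A

R, I = model_coherence(3.0, 0.7, -1.2)
print(estimate_tnr(R, I, 0.7))   # recovers approximately 3.0
```

In a full pipeline R and I would come from the measured correlation function of each bin rather than from the forward model.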

According to an embodiment of the present disclosure, determining the target-to-noise ratio based on the first audio data, the second audio data and the main sound source orientation further includes acquiring the target-to-noise ratio of every frequency point in the spectrum. By performing the above computation for each frequency point, the target-to-noise ratio of every frequency point is obtained, so that each frequency of the audio frame spectrum can subsequently be filtered to obtain the target audio data, allowing the signal of the desired sound source to be extracted more reliably from a noisy environment.

FIG. 3 shows a flowchart of acquiring target audio data according to an embodiment of the present disclosure. As shown in FIG. 3, step S140 may include the following steps:

In step S310, the spectrum of the first audio data and/or the spectrum of the second audio data are filtered based on the target-to-noise ratio;

In step S320, a time-domain representation of the first audio data is obtained from the filtered spectrum of the first audio data as third audio data, and/or a time-domain representation of the second audio data is obtained from the filtered spectrum of the second audio data as fourth audio data;

In step S330, target audio data is obtained based on the third audio data and/or the fourth audio data.

According to an embodiment of the present disclosure, in step S320 the filtered spectrum of the first audio data and/or of the second audio data can be converted into the time-domain third audio data and/or fourth audio data by an inverse Fourier transform. For example, for the filtered spectrum of the first audio data, the inverse Fourier transform yields the time-domain representation of each audio frame; the frames are then windowed and overlapped-and-added to form the time-domain third audio data. The windowing keeps the concatenated audio relatively smooth and continuous across frame boundaries.
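The inverse transform with windowed overlap-add can be sketched as follows. The square-root periodic Hann window, applied at both analysis and synthesis with 50% overlap (20 ms frames and a 10 ms hop at an assumed 16 kHz), is an assumption; the two windows multiply to a Hann window whose overlapping copies sum to one, so unfiltered spectra reconstruct the input exactly except in the first and last half-frame. The forward transform is repeated here so the block is self-contained; function names are hypothetical.

```python
import numpy as np

def _sqrt_hann(frame_len):
    # Periodic Hann (drop the last sample of a length N+1 symmetric window), then sqrt
    return np.sqrt(np.hanning(frame_len + 1)[:-1])

def stft_sqrt_hann(x, frame_len=320, hop=160, n_fft=1024):
    """Analysis: overlapping frames, square-root Hann window, one-sided FFT."""
    win = _sqrt_hann(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[k * hop:k * hop + frame_len] for k in range(n_frames)])
    return np.fft.rfft(frames * win, n=n_fft, axis=1)

def istft_overlap_add(Y, frame_len=320, hop=160, n_fft=1024):
    """Synthesis: inverse FFT per frame, synthesis window, then overlap-add."""
    win = _sqrt_hann(frame_len)
    frames = np.fft.irfft(Y, n=n_fft, axis=1)[:, :frame_len] * win
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        out[k * hop:k * hop + frame_len] += frame    # overlap-add adjacent frames
    return out
```

In the method described here, the spectra passed to `istft_overlap_add` would be the filtered ones from step S310.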

According to an embodiment of the present disclosure, step S330 can be implemented in various ways. For example, in step S330 the third audio data and the fourth audio data may be superimposed into a single target audio data stream that is output to the user. Alternatively, in step S330 the third and fourth audio data may be output together as the target audio data, for example as left and right stereo channels, or either one of them may be selected and output to the user as the target audio data.

According to the technical solution of the embodiments of the present disclosure, filtering the spectrum of the first audio data and/or of the second audio data based on the target-to-noise ratio, converting the filtered spectra back to the time domain as third and/or fourth audio data, and obtaining the target audio data from them improves the accuracy of noise-parameter estimation, so that the signal of the desired sound source can be extracted from the environment more reliably.

FIG. 4A shows a flowchart of filtering the first audio data and the second audio data according to an embodiment of the present disclosure. As shown in FIG. 4A, step S310 may include the following steps:

In step S410, the azimuth range of the desired sound source is acquired;

In step S420, based on the main sound source orientation and the azimuth range of the desired sound source, a judgment result is obtained as to whether the current audio data is desired audio data or undesired audio data, the current audio data being the first audio data or the second audio data;

In step S430, the spatial filter coefficients are updated based on the judgment result, the current audio data and the target-to-noise ratio;

In step S440, the current audio data is filtered using the updated spatial filter coefficients.

According to an embodiment of the present disclosure, two microphones may be arranged on the electronic device to form a microphone array. Based on how the electronic device is typically used, the range of directions in which the user's mouth is expected to lie relative to the microphones can be predefined as the azimuth range of the desired sound source.

For example, a directed line may be drawn between the two microphones, and the angular region of rays that pass through its midpoint and form an angle within a preset range with its direction is taken as the azimuth range of the desired sound source. The range may be symmetric, for example 45 to 135 degrees, in which case its boundary forms a conical surface in space. It may also be asymmetric, for example 40 to 180 degrees: a frame whose azimuth is 160 degrees is then judged to be desired data, whereas a frame whose azimuth is 20 degrees is judged to be undesired data. According to an embodiment of the present disclosure, the azimuth of a sound source is defined as the angle between the line connecting the source position to the midpoint of the directed line between the two microphones and the direction of that directed line. In practice, the azimuth range of the desired sound source can be set as required.

For example, as shown in FIG. 4B, M₁ and M₂ denote the positions of the two microphones, and M₀ is the midpoint of the directed line from M₁ to M₂. The angular region of rays that pass through M₀ and form an angle with the direction of the directed line within a preset range (between θ₁ and θ₂) is taken as the azimuth range of the desired sound source, as shown by the dashed lines in FIG. 4B. For example, θ₁ may be 45 degrees and θ₂ may be 135 degrees. FIG. 4B is only a two-dimensional schematic; in three-dimensional space, the dashed lines are rotated about the directed line as the axis, yielding an axially symmetric spatial region as the azimuth range of the desired sound source.

According to an embodiment of the present disclosure, the current audio data is the data of one audio frame. For each audio frame, as described above, it is determined whether the main sound source orientation of that frame lies within the azimuth range of the desired sound source. If it does, the frame is determined to be a desired frame; otherwise, it is determined not to be a desired frame. Based on this judgment result and the target-to-noise ratio determined in the preceding steps, the spatial filter coefficients can be updated, and the audio frame is filtered with the updated coefficients.
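The per-frame decision can be sketched as follows. It assumes the localization step returns candidate angles with probabilities; the function name and the default 45° to 135° range (taken from the example above) are illustrative only.

```python
import numpy as np

def is_desired_frame(angles_deg, probs, range_deg=(45.0, 135.0)):
    """Pick the most probable angle as the main sound source orientation and
    flag the frame as desired if it lies inside the pickup range [low, high]."""
    main_azimuth = float(angles_deg[int(np.argmax(probs))])
    low, high = range_deg
    return bool(low <= main_azimuth <= high), main_azimuth

# Asymmetric range from the text: 160 degrees is desired, 20 degrees is not
print(is_desired_frame(np.array([20.0, 160.0]), np.array([0.1, 0.9]), (40.0, 180.0)))
print(is_desired_frame(np.array([20.0, 160.0]), np.array([0.9, 0.1]), (40.0, 180.0)))
```

Only the range test is needed, not the exact angle, which is why a coarse localization result is sufficient for this step.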

According to the technical solution of the embodiments of the present disclosure, acquiring the azimuth range of the desired sound source, judging from the main sound source orientation and that range whether the current audio data is desired or undesired audio data, updating the spatial filter coefficients based on the judgment result, the current audio data and the target-to-noise ratio, and filtering the current audio data with the updated coefficients improves the accuracy of noise-parameter estimation, so that the signal of the desired sound source can be extracted from the environment more reliably.

FIG. 5 shows a schematic diagram of updating the spatial filter coefficients according to an embodiment of the present disclosure.

As shown in FIG. 5, updating the spatial filter coefficients based on the judgment result, the current audio data and the target-to-noise ratio includes: when the current audio data is desired audio data, updating the global covariance matrix of the current audio data based on the current audio data and the target-to-noise ratio; when the current audio data is undesired audio data, updating both the noise covariance matrix and the global covariance matrix of the current audio data based on the current audio data and the target-to-noise ratio; and updating the spatial filter coefficients based on the noise covariance matrix and the global covariance matrix.

According to an embodiment of the present disclosure, the noise covariance matrix represents the covariance of the noise signal, and the global covariance matrix represents the covariance of the overall signal (including both noise and non-noise components).

For example, if the signal value of the current frame at a certain frequency is X and the target-to-noise ratio is 0.25, then a fraction 1/(1+0.25) = 0.8 of X is noise, that is, the undesired component, and the remaining 0.25/(1+0.25) = 0.2 is the desired component. The elements of the covariance matrices can be computed in this way to update them. According to the minimum variance distortionless response (MVDR) algorithm, the noise covariance matrix and the global covariance matrix can be used to determine the spatial filter coefficients, so that the current audio data can be filtered by the MVDR algorithm.
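The text does not give the exact update rules or the MVDR form that combines the two matrices. The sketch below is one plausible reading, under stated assumptions: both covariances are smoothed recursively; in undesired frames the noise covariance receives the noise share 1/(1+TNR) of each frame's outer product; and the weights use a known steering-vector-free MVDR form, w = R_n⁻¹R_x u / tr(R_n⁻¹R_x) with R_x = R_y − R_n and u selecting a reference microphone. All names and the smoothing and diagonal-loading constants are hypothetical.

```python
import numpy as np

def update_covariances(Ry, Rn, Y, tnr, is_desired, smoothing=0.95):
    """One frame of recursive covariance updates for every frequency bin.
    Ry, Rn: (n_bins, M, M) global / noise covariances; Y: (n_bins, M) spectra;
    tnr: per-bin target-to-noise ratio (scalar or shape (n_bins,))."""
    outer = Y[:, :, None] * np.conj(Y[:, None, :])            # y y^H per bin
    Ry = smoothing * Ry + (1 - smoothing) * outer               # global: every frame
    if not is_desired:
        # Assumed: only the noise share 1/(1+TNR) of the energy enters Rn
        noise_frac = (1.0 / (1.0 + np.asarray(tnr, dtype=float))).reshape(-1, 1, 1)
        Rn = smoothing * Rn + (1 - smoothing) * noise_frac * outer
    return Ry, Rn

def mvdr_weights(Ry, Rn, ref=0, load=1e-6):
    """Steering-vector-free MVDR: w = Rn^{-1} Rx u_ref / tr(Rn^{-1} Rx)."""
    M = Ry.shape[-1]
    G = np.linalg.solve(Rn + load * np.eye(M), Ry - Rn)         # Rn^{-1} Rx, batched over bins
    tr = np.trace(G, axis1=-2, axis2=-1)
    return G[..., :, ref] / (tr[..., None] + 1e-12)

def apply_weights(W, Y):
    """Beamformer output w^H y for every bin."""
    return np.sum(np.conj(W) * Y, axis=-1)
```

For a single desired source the weights pass that source unchanged at the reference microphone, which is the distortionless property MVDR is named for.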

根据本公开实施例的技术方案，通过在所述当前音频数据为期望音频数据的情况下，基于所述当前音频数据和所述目标噪音比更新所述当前音频数据的全局协方差矩阵；在所述当前音频数据为非期望音频数据的情况下，基于所述当前音频数据和所述目标噪音比更新所述当前音频数据的噪音协方差矩阵和所述全局协方差矩阵；基于所述噪音协方差矩阵和所述全局协方差矩阵更新空域滤波器系数，提升了噪音参数的估计准确率，从而可以更好地从环境中提取期望音源的信号。According to the technical solutions of the embodiments of the present disclosure, when the current audio data is desired audio data, the global covariance matrix of the current audio data is updated based on the current audio data and the target noise ratio; When the current audio data is undesired audio data, the noise covariance matrix and the global covariance matrix of the current audio data are updated based on the current audio data and the target noise ratio; based on the noise covariance The matrix and the global covariance matrix update the coefficients of the spatial filter, which improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.

FIG. 6 shows a schematic diagram of an audio processing method according to another embodiment of the present disclosure.

As shown in FIG. 6, the audio data of channel 1 collected by microphone 1 is y_1(m), and the audio data of channel 2 collected by microphone 2 is y_2(m). Each is framed, windowed and Fourier transformed (FFT) to obtain the frequency-domain audio frames Y_1(ω,k) and Y_2(ω,k). On the one hand, sound source localization is performed on Y_1(ω,k) and Y_2(ω,k) to determine the main sound source direction of each audio frame, and a pickup-range decision is made according to the azimuth range of the desired sound source (the pickup range in FIG. 6) to determine whether the frame is a desired frame; the result can be marked, for example, by a Flag (identifier). On the other hand, the cross power spectral density Φ₁₂(ω,k) and the auto power spectral densities Φ₁₁(ω,k) and Φ₂₂(ω,k) are calculated from Y_1(ω,k) and Y_2(ω,k), and the target-to-noise ratio TNR described above is then determined from them.

From the Flag, the TNR, Y_1(ω,k) and Y_2(ω,k), the spatial filter coefficients W(ω,k) required by the MVDR algorithm can be determined. Y_1(ω,k) and Y_2(ω,k) are filtered with W(ω,k), and the target audio data y(m) is finally obtained through the inverse Fourier transform (IFFT) and overlap-add, and is then output.
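The analysis half of FIG. 6 (framing, windowing, FFT, then auto- and cross-spectral densities) might look like the following NumPy sketch. The periodic Hann window, frame length 512, hop 256 and first-order recursive smoothing with factor `beta` are assumptions chosen for illustration; the disclosure does not fix these values here, and the function names are hypothetical.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Split x into frames, apply a periodic Hann window and take the FFT.
    Returns an array of shape (n_frames, frame_len // 2 + 1)."""
    n = np.arange(frame_len)
    win = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)   # periodic Hann
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[k * hop:k * hop + frame_len] * win
                       for k in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def smoothed_psds(Y1, Y2, beta=0.8):
    """Recursively averaged auto-PSDs Phi11, Phi22 and cross-PSD
    Phi12 = E[Y1 * conj(Y2)], one value per (frame, bin)."""
    phi11 = np.empty(Y1.shape)
    phi22 = np.empty(Y1.shape)
    phi12 = np.empty(Y1.shape, dtype=complex)
    p11 = np.zeros(Y1.shape[1])
    p22 = np.zeros(Y1.shape[1])
    p12 = np.zeros(Y1.shape[1], dtype=complex)
    for k in range(Y1.shape[0]):
        # First-order recursive averaging over frames.
        p11 = beta * p11 + (1 - beta) * np.abs(Y1[k]) ** 2
        p22 = beta * p22 + (1 - beta) * np.abs(Y2[k]) ** 2
        p12 = beta * p12 + (1 - beta) * Y1[k] * np.conj(Y2[k])
        phi11[k], phi22[k], phi12[k] = p11, p22, p12
    return phi11, phi22, phi12
```

For identical channels the cross-PSD equals the auto-PSD, which is a quick sanity check on the conjugation convention.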

By adding a target-to-noise ratio estimator on top of MVDR beamforming with a dual-microphone array, the technical solutions of the embodiments of the present disclosure can accurately estimate the signal-to-noise ratio between the desired signal and the undesired noise signal within the pickup area, improving the estimation accuracy of the noise parameters and enabling directional pickup of the desired sound source in noisy environments with only two microphones. Moreover, because the method does not rely on an accurate sound source localization result and only judges whether the source lies within the incident region of the desired sound source, it lowers the requirements on localization, so the desired components can still be extracted well even when localization is inaccurate in noisy environments.

FIG. 7 shows a block diagram of an audio processing apparatus 700 according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both.

As shown in FIG. 7, the audio processing apparatus 700 includes a first acquisition module 710, a first determination module 720, a second determination module 730 and a second acquisition module 740.

The first acquisition module 710 is configured to acquire first audio data collected by a first microphone and second audio data, collected by a second microphone, corresponding to the first audio data;

The first determination module 720 is configured to determine the main sound source direction of the first audio data and the second audio data, where the main sound source direction includes the sound source direction, among the multiple sound source directions localized from the first and second audio data, whose probability meets a preset condition;

The second determination module 730 is configured to determine a target-to-noise ratio based on the first audio data, the second audio data and the main sound source direction, where the target-to-noise ratio represents the ratio of desired signal energy to undesired signal energy in each of the first audio data and the second audio data; and

The second acquisition module 740 is configured to filter the first audio data and/or the second audio data based on the target-to-noise ratio, and to obtain target audio data based on the filtered first audio data and/or second audio data.

In the technical solutions of the embodiments of the present disclosure, the first acquisition module acquires the first audio data collected by the first microphone and the corresponding second audio data collected by the second microphone; the first determination module determines the main sound source direction of the first and second audio data, i.e., the direction, among the multiple localized sound source directions, whose probability meets a preset condition; the second determination module determines, from the first audio data, the second audio data and the main sound source direction, a target-to-noise ratio representing the ratio of desired to undesired signal energy in each of the two audio data; and the second acquisition module filters the first and/or second audio data based on the target-to-noise ratio and obtains the target audio data from the filtered result. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.

According to an embodiment of the present disclosure, the apparatus further includes a third acquisition module configured to acquire the spectrum of the first audio data and the spectrum of the second audio data before the target-to-noise ratio is determined.

In the technical solutions of the embodiments of the present disclosure, the third acquisition module acquires the spectra of the first and second audio data before the target-to-noise ratio is determined, which improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.

FIG. 8 shows a block diagram of a second determination module 800 according to an embodiment of the present disclosure.

As shown in FIG. 8, the second determination module 800 may include a first determination sub-module 810 and a second determination sub-module 820.

The first determination sub-module 810 is configured to determine, for a specified frequency bin, a correlation function between the spectrum of the first audio data and the spectrum of the second audio data;

The second determination sub-module 820 is configured to determine the target-to-noise ratio of the specified frequency bin based on the correlation function and the main sound source direction.

In the technical solutions of the embodiments of the present disclosure, the first determination sub-module determines, for a specified frequency bin, the correlation function between the spectra of the first and second audio data, and the second determination sub-module determines the target-to-noise ratio of that bin based on the correlation function and the main sound source direction. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.

According to an embodiment of the present disclosure, the second determination sub-module 820 may include a first determination unit, a second determination unit and a third determination unit. The first determination unit is configured to determine a desired-signal component representation and an undesired-signal component representation of the real part of the correlation function; the second determination unit is configured to determine a desired-signal component representation and an undesired-signal component representation of the imaginary part of the correlation function; and the third determination unit is configured to determine the target-to-noise ratio of the specified frequency bin based on these representations of the real and imaginary parts and on the main sound source direction. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.
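The text here does not reproduce the formula that maps the real and imaginary parts of the correlation function to a target-to-noise ratio. As a stand-in, the sketch below uses a known coherence-based technique, a coherent-to-diffuse ratio (CDR) estimate with a known source direction. It models the normalised correlation as a mix of a plane wave from the main source direction and spherically diffuse noise, Γ_x = (CDR·Γ_s + Γ_n)/(CDR + 1), solves for CDR in every bin and keeps the real part. This is a substitute technique, not the disclosure's own estimator; the microphone spacing, the speed of sound (343 m/s), the phase sign convention and all function names are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def coherence(phi11, phi22, phi12, eps=1e-12):
    """Complex coherence: the correlation function normalised by the auto-PSDs."""
    return phi12 / np.sqrt(phi11 * phi22 + eps)

def tnr_from_coherence(gamma_x, freqs, mic_dist, theta, c=SPEED_OF_SOUND, eps=1e-9):
    """CDR-style ratio per bin (assumed model, not the disclosure's formula).

    gamma_x: measured complex coherence per frequency bin.
    freqs: bin centre frequencies in Hz (avoid 0 Hz, where the model is degenerate).
    theta: main source direction in radians, measured from the array axis.
    """
    # Coherence of a plane wave arriving from theta (sign convention assumed).
    gamma_s = np.exp(-1j * 2 * np.pi * freqs * mic_dist * np.cos(theta) / c)
    # Coherence of spherically diffuse noise, sin(wd/c)/(wd/c); np.sinc is sin(pi x)/(pi x).
    gamma_n = np.sinc(2 * freqs * mic_dist / c)
    ratio = (gamma_n - gamma_x) / (gamma_x - gamma_s + eps)
    # Keep the real part and clip: an energy ratio cannot be negative.
    return np.maximum(ratio.real, 0.0)
```

Because the function is vectorised over `freqs`, it yields the ratio for every bin of the spectrum in one call.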

According to an embodiment of the present disclosure, the second determination module further includes a first acquisition sub-module configured to acquire the target-to-noise ratio of each frequency bin in the spectrum.

In the technical solutions of the embodiments of the present disclosure, the first acquisition sub-module acquires the target-to-noise ratio of each frequency bin in the spectrum, so that the signal of the desired sound source can be better extracted from the environment.

FIG. 9 shows a block diagram of a second acquisition module 900 according to an embodiment of the present disclosure.

As shown in FIG. 9, the second acquisition module 900 may include a filtering sub-module 910, a second acquisition sub-module 920 and a third acquisition sub-module 930.

The filtering sub-module 910 is configured to filter the spectrum of the first audio data and the spectrum of the second audio data based on the target-to-noise ratio;

The second acquisition sub-module 920 is configured to obtain, from the filtered spectrum of the first audio data, a time-domain representation of the first audio data as third audio data, and/or to obtain, from the filtered spectrum of the second audio data, a time-domain representation of the second audio data as fourth audio data;

The third acquisition sub-module 930 is configured to obtain target audio data based on the third audio data and/or the fourth audio data.

In the technical solutions of the embodiments of the present disclosure, the filtering sub-module filters the spectra of the first and second audio data based on the target-to-noise ratio; the second acquisition sub-module obtains the time-domain representation of the filtered first audio data as third audio data and/or that of the filtered second audio data as fourth audio data; and the third acquisition sub-module obtains the target audio data from the third and/or fourth audio data. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.
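The filtering and time-domain conversion steps can be sketched as follows: apply per-bin weights as Z = wᴴy, inverse-FFT each frame and overlap-add. The sketch assumes the analysis used a periodic Hann window with 50% overlap, whose shifted copies sum to one, so plain overlap-add without a synthesis window reconstructs the interior of the signal (the first and last half-frame remain tapered). Array shapes and function names are hypothetical.

```python
import numpy as np

def apply_weights(W, Y):
    """Per-bin beamformer output Z = w^H y.
    W, Y: complex arrays of shape (n_frames, n_bins, M); returns (n_frames, n_bins)."""
    return np.einsum("kfm,kfm->kf", W.conj(), Y)

def istft(Z, frame_len=512, hop=256):
    """Inverse FFT of each frame followed by overlap-add.
    Assumes a periodic Hann analysis window with hop = frame_len // 2, whose
    shifted copies sum to one, so no synthesis window is applied."""
    frames = np.fft.irfft(Z, n=frame_len, axis=1)
    out = np.zeros(hop * (Z.shape[0] - 1) + frame_len)
    for k, frame in enumerate(frames):
        out[k * hop:k * hop + frame_len] += frame
    return out
```

With weights that simply select one microphone, the pipeline returns that microphone's signal unchanged away from the edges, which is a useful check before plugging in real MVDR weights.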

FIG. 10 shows a block diagram of a filtering sub-module 1000 according to an embodiment of the present disclosure.

As shown in FIG. 10, the filtering sub-module 1000 may include a first acquisition unit 1010, a second acquisition unit 1020, an update unit 1030 and a filtering unit 1040.

The first acquisition unit 1010 is configured to acquire the azimuth range of the desired sound source;

The second acquisition unit 1020 is configured to obtain, based on the sound source position of the current audio data and the azimuth range of the desired sound source, a judgment result indicating whether the current audio data is desired audio data or undesired audio data, where the current audio data is the first audio data or the second audio data;

The update unit 1030 is configured to update the spatial filter coefficients based on the judgment result, the current audio data and the target-to-noise ratio;

The filtering unit 1040 is configured to filter the current audio data with the updated spatial filter coefficients.

In the technical solutions of the embodiments of the present disclosure, the first acquisition unit acquires the azimuth range of the desired sound source; the second acquisition unit judges, from the sound source position of the current audio data and that azimuth range, whether the current audio data is desired or undesired audio data; the update unit updates the spatial filter coefficients based on the judgment result, the current audio data and the target-to-noise ratio; and the filtering unit filters the current audio data with the updated coefficients. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.
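The judgment made by the second acquisition unit 1020 (the Flag in FIG. 6) reduces to testing whether the main source azimuth falls inside the desired azimuth range. A minimal sketch, assuming azimuths in degrees and a range given as start and end angles, measured counter-clockwise, that may wrap through 0°; the function name is hypothetical.

```python
def in_pickup_range(doa_deg, start_deg, end_deg):
    """Return True (desired frame) if the main-source azimuth doa_deg lies in the
    desired range [start_deg, end_deg], measured counter-clockwise in degrees.
    Ranges that cross 0 degrees, such as 330..30, are supported."""
    doa = doa_deg % 360.0
    start = start_deg % 360.0
    end = end_deg % 360.0
    if start <= end:
        return start <= doa <= end
    # The range wraps through 0 degrees.
    return doa >= start or doa <= end
```

Because only membership in the range is tested, a coarse localization result is sufficient, which matches the reduced localization requirement described for FIG. 6.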

According to an embodiment of the present disclosure, the update unit includes a first update sub-unit, a second update sub-unit and a third update sub-unit. The first update sub-unit is configured to update the global covariance matrix of the current audio data based on the current audio data and the target-to-noise ratio when the current audio data is desired audio data; the second update sub-unit is configured to update the noise covariance matrix and the global covariance matrix of the current audio data based on the current audio data and the target-to-noise ratio when the current audio data is undesired audio data; and the third update sub-unit is configured to update the spatial filter coefficients based on the noise covariance matrix and the global covariance matrix. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.

FIG. 11 shows a flowchart of an audio processing method according to another embodiment of the present disclosure.

As shown in FIG. 11, the audio processing method includes the following steps S1110 to S1140:

In step S1110, N mutually corresponding pieces of audio data, collected respectively by N microphones, are acquired, where N ≥ 3;

In step S1120, one or more audio data pairs are determined based on the N pieces of audio data;

In step S1130, each audio data pair is processed to obtain the filtered audio data corresponding to that pair. Specifically, for each audio data pair: the main sound source direction of the audio data in the pair is determined, where the main sound source direction includes the direction, among the multiple sound source directions localized for the pair, whose probability meets a preset condition; a target-to-noise ratio is determined based on the audio data in the pair and the main sound source direction, where the target-to-noise ratio represents the ratio of desired signal energy to undesired signal energy in each audio data of the pair; and the audio data in the pair are filtered based on the target-to-noise ratio to obtain filtered audio data;

In step S1140, target audio data is determined based on the filtered audio data obtained from the one or more audio data pairs.

In the technical solutions of the embodiments of the present disclosure, N mutually corresponding pieces of audio data collected by N microphones (N ≥ 3) are acquired; one or more audio data pairs are determined from them; for each pair, the main sound source direction is determined, a target-to-noise ratio representing the ratio of desired to undesired signal energy is determined from the pair's audio data and that direction, and the pair's audio data are filtered based on the target-to-noise ratio; and the target audio data is determined from the filtered audio data of all pairs. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.

The embodiment illustrated in FIG. 11 differs from the embodiments described above with reference to FIGS. 1 to 6 in that the number of microphones is three or more. In step S1120, the method determines one or more audio data pairs from the N pieces of audio data, which reduces the problem to the two-microphone case described in FIGS. 1 to 6; in step S1130, the audio in each pair is filtered based on the target-to-noise ratio in a manner similar to the preceding embodiments; and in step S1140, the target audio data is determined from the filtered results.

According to an embodiment of the present disclosure, in step S1120 the one or more audio data pairs can be determined from the plurality of audio data either according to the positional relationship of the N microphones, or by forming an audio data pair from any two of the plurality of audio data. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.

For example, when the audio data pairs are determined according to the positional relationship of the N microphones and the N microphones are arranged linearly, the audio data of the two microphones closest to the geometric center of the array formed by the N microphones can be selected as an audio data pair. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.

For example, in the embodiment shown in FIG. 12A, an even number of microphones (M₁, M₂, M₃, M₄) are arranged linearly, and the two microphones closest to the geometric center of the array, M₂-M₃, can be selected as an audio data pair.

As another example, in the embodiment shown in FIG. 12B, three microphones M₁, M₂, M₃ are arranged linearly, and either M₁-M₂ or M₂-M₃, each closest to the geometric center of the array, can be selected as an audio data pair.

According to an embodiment of the present disclosure, when the N microphones are arranged non-linearly, the two microphones closest to the center can likewise be selected to form an audio data pair.

According to an embodiment of the present disclosure, any two of the plurality of audio data can also form an audio data pair. For example, in the embodiment shown in FIG. 12C, three microphones M₁, M₂, M₃ are arranged centrosymmetrically, and M₁-M₂, M₂-M₃ and M₁-M₃ can each be taken as an audio data pair, giving three audio data pairs in total, which are processed separately in step S1130.
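The two pairing strategies can be sketched in a few lines: pick the two microphones closest to the array's geometric centre (taken here as the centroid of the microphone positions), or enumerate every pair. Microphone coordinates as an (N, D) array and index-order tie-breaking are assumptions, and the function names are hypothetical. With three collinear microphones the tie between M₁ and M₃ is broken in favour of M₁, giving M₁-M₂, one of the two choices allowed for FIG. 12B.

```python
import itertools

import numpy as np

def center_pair(positions):
    """Indices of the two microphones closest to the array's geometric centre.
    positions: (N, D) microphone coordinates; ties are broken by index."""
    pos = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(pos - pos.mean(axis=0), axis=1)
    i, j = np.argsort(dist, kind="stable")[:2]
    return tuple(sorted((int(i), int(j))))

def all_pairs(n_mics):
    """Every unordered microphone pair, e.g. M1-M2, M1-M3, M2-M3 for three mics."""
    return list(itertools.combinations(range(n_mics), 2))
```

Indices are zero-based, so (1, 2) corresponds to M₂-M₃.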

According to an embodiment of the present disclosure, the target audio data can be determined from the filtered audio data obtained from the one or more audio data pairs either by weighting and summing that filtered audio data, or by selecting, from that filtered audio data, the filtered audio data corresponding to the microphone at a preset position. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.

According to an embodiment of the present disclosure, processing the one or more audio data pairs yields filtered audio data for at least two microphones. The filtered audio data of one microphone can be selected and output as the target audio data, or the filtered audio data of two or more microphones can be weighted and summed to obtain the target audio data. For example, in the embodiment shown in FIG. 12A, the filtered audio data of microphone M₂ or microphone M₃ can be selected and output as the target audio data, or the filtered audio data of microphones M₂ and M₃ can be weighted and summed and the result output as the target audio data; in the embodiment shown in FIG. 12B, the filtered audio data of microphone M₂ can be selected and output as the target audio data; and in the embodiment shown in FIG. 12C, the filtered audio data of microphones M₁, M₂ and M₃ can be weighted and summed and the result output as the target audio data.
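Both ways of forming the target audio data, selecting one microphone's filtered output or taking a weighted sum, are sketched below. The dict-of-signals input, the equal-weight default and the function name are assumptions; the disclosure does not specify the weights.

```python
import numpy as np

def combine_outputs(filtered, weights=None, select=None):
    """Form target audio data from per-microphone filtered signals.

    filtered: dict mapping microphone index -> filtered time-domain signal
              (all signals the same length).
    select: if given, return that microphone's signal unchanged.
    weights: optional dict mapping microphone index -> weight; equal weights otherwise.
    """
    if select is not None:
        return np.asarray(filtered[select])
    keys = sorted(filtered)
    sig = np.stack([np.asarray(filtered[k], dtype=float) for k in keys])
    if weights is None:
        w = np.full(len(keys), 1.0 / len(keys))   # assumed default: plain average
    else:
        w = np.asarray([weights[k] for k in keys], dtype=float)
    return w @ sig                                # weighted sum across microphones
```

For FIG. 12B one would call `combine_outputs(filtered, select=1)` (M₂ with zero-based indices); for FIG. 12C, a weighted sum over all three outputs.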

FIG. 13 shows a block diagram of an audio processing apparatus according to another embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both.

As shown in FIG. 13, the audio processing apparatus 1300 includes a fourth acquisition module 1310, a third determination module 1320, a filtering module 1330 and a fourth determination module 1340.

The fourth acquisition module 1310 is configured to acquire N mutually corresponding pieces of audio data collected respectively by N microphones, where N ≥ 3;

The third determination module 1320 is configured to determine one or more audio data pairs based on the N pieces of audio data;

The filtering module 1330 is configured to, for each audio data pair: determine the main sound source direction of the audio data in the pair, where the main sound source direction includes the direction, among the multiple sound source directions localized for the pair, whose probability meets a preset condition; determine a target-to-noise ratio based on the audio data in the pair and the main sound source direction, where the target-to-noise ratio represents the ratio of desired signal energy to undesired signal energy in each audio data of the pair; and filter the audio data in the pair based on the target-to-noise ratio to obtain filtered audio data;

The fourth determination module 1340 is configured to determine target audio data based on the filtered audio data obtained from the one or more audio data pairs.

In the technical solutions of the embodiments of the present disclosure, the fourth acquisition module acquires N mutually corresponding pieces of audio data collected by N microphones (N ≥ 3); the third determination module determines one or more audio data pairs from them; the filtering module determines, for each pair, the main sound source direction and a target-to-noise ratio representing the ratio of desired to undesired signal energy, and filters the pair's audio data based on that ratio; and the fourth determination module determines the target audio data from the filtered audio data of all pairs. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.

According to an embodiment of the present disclosure, the third determination module includes a third determination sub-module or a fourth determination sub-module. The third determination sub-module is configured to determine the one or more audio data pairs according to the positional relationship of the N microphones; the fourth determination sub-module is configured to form an audio data pair from any two of the plurality of audio data. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.

According to an embodiment of the present disclosure, the third determining submodule is configured to:

According to the technical solutions of the embodiments of the present disclosure, the third determining submodule is configured to act when the N microphones are arranged linearly. In that case, it selects the audio data corresponding to the two microphones closest to the geometric center point of the array formed by the N microphones to form an audio data pair. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.
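The selection rule for a linear array can be sketched as follows. The sketch assumes each microphone position is given as a single coordinate along the array axis, and that the geometric center is the mean of these coordinates. The function name is hypothetical. With an odd N, the middle microphone is paired with its nearer neighbor, and near-ties between equally spaced neighbors are broken arbitrarily.

```python
def center_pair(mic_positions_m):
    """Hypothetical helper: indices of the two mics nearest the linear array's center.

    mic_positions_m holds one coordinate (in meters) per microphone along the array axis.
    """
    if len(mic_positions_m) < 3:
        raise ValueError("this embodiment assumes N >= 3 microphones")
    center = sum(mic_positions_m) / len(mic_positions_m)  # geometric center of the array
    order = sorted(range(len(mic_positions_m)),
                   key=lambda i: abs(mic_positions_m[i] - center))
    return tuple(sorted(order[:2]))
```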

According to an embodiment of the present disclosure, the fourth determining module is configured to:

According to the technical solutions of the embodiments of the present disclosure, the fourth determining module obtains the target audio data in one of two ways. It may compute a weighted sum of the filtered audio data obtained from the one or more audio data pairs. Alternatively, from that filtered audio data, it may select the filtered audio data corresponding to the microphone at a preset position as the target audio data. This improves the estimation accuracy of the noise parameters, so that the signal of the desired sound source can be better extracted from the environment.
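The two options for the fourth determining module can be sketched in Python as below. The sketch makes three assumptions, none of which this section specifies:

- The filtered signals from the different pairs are time-aligned lists of equal length.
- The caller knows which entry belongs to the microphone at the preset position.
- When no weights are given, uniform weights are used.

The function name is also illustrative.

```python
def combine_filtered_outputs(filtered, weights=None, preset_index=None):
    """Hypothetical helper: derive target audio data from per-pair filtered signals.

    Either return the signal of the microphone at a preset position, or a weighted
    sum of all filtered signals (uniform weights summing to 1 if none are given).
    """
    if preset_index is not None:
        return list(filtered[preset_index])
    if weights is None:
        weights = [1.0 / len(filtered)] * len(filtered)
    if len(weights) != len(filtered):
        raise ValueError("one weight per filtered signal is required")
    length = len(filtered[0])
    # Sample-by-sample weighted summation across the filtered signals.
    return [sum(w * sig[t] for w, sig in zip(weights, filtered)) for t in range(length)]
```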

An embodiment of the present disclosure further provides an electronic device. FIG. 14 shows a block diagram of the electronic device according to an embodiment of the present disclosure.

As shown in FIG. 14, the electronic device 1400 includes a memory 1401 and a processor 1402. The memory 1401 is configured to store one or more computer instructions, which are executed by the processor 1402 to implement the method according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the memory 1401 is configured to store one or more computer instructions, which are executed by the processor 1402 to implement the following steps:

filtering the first audio data and the second audio data based on the target noise ratio, and obtaining target audio data based on the filtered first audio data and second audio data.

According to an embodiment of the present disclosure, the one or more computer instructions, when executed by the processor 1402, further implement:

According to an embodiment of the present disclosure, determining the target noise ratio based on the first audio data, the second audio data and the main sound source orientation includes:

According to an embodiment of the present disclosure, determining the target noise ratio of the specified frequency point based on the correlation function and the main sound source orientation includes:

According to an embodiment of the present disclosure, determining the target noise ratio based on the first audio data, the second audio data and the main sound source orientation further includes obtaining the target noise ratio of each frequency point in the spectrum.
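The section says the target noise ratio is computed per frequency point from three inputs: the correlation function of the two spectra, that function's real and imaginary parts, and the main sound source orientation. It does not give the formula. One known estimator of this shape is the coherent-to-diffuse power ratio (CDR).

Assume, hypothetically, that the desired signal is a plane wave from the main sound source orientation θ. Its coherence is then Γ_s = e^{jωτ}, with τ = d cos θ / c. Assume also that the undesired signal is a diffuse field with real coherence Γ_n = sin(ωd/c)/(ωd/c). Under these assumptions the measured coherence satisfies Γ_x = (RΓ_s + Γ_n)/(R + 1).

- The real part gives (R + 1)Re Γ_x = R cos ωτ + Γ_n.
- The imaginary part gives (R + 1)Im Γ_x = R sin ωτ.

Solving both in the least-squares sense yields R = Re{(Γ_n - Γ_x)/(Γ_x - Γ_s)}.

The Python sketch below computes a recursively smoothed coherence for one bin, then this estimate. Calling it for every bin gives the ratio at each frequency point. The smoothing factor, the regularization constant and the diffuse-noise model are assumptions, and the patent's own formula may differ.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def smoothed_coherence(frames_x1, frames_x2, alpha=0.9):
    """Hypothetical helper: recursively smoothed normalized cross-correlation at one bin.

    frames_x1 / frames_x2 hold the complex STFT values of the two channels at this
    bin, one per frame; the coherence after the last frame is returned.
    """
    p11 = p22 = 0.0
    p12 = 0j
    for x1, x2 in zip(frames_x1, frames_x2):
        p11 = alpha * p11 + (1 - alpha) * abs(x1) ** 2
        p22 = alpha * p22 + (1 - alpha) * abs(x2) ** 2
        p12 = alpha * p12 + (1 - alpha) * x1 * x2.conjugate()
    return p12 / math.sqrt(p11 * p22 + 1e-12)  # small constant avoids division by zero


def target_noise_ratio(coh, freq_hz, mic_distance_m, theta_deg):
    """Hypothetical CDR-style estimate of desired-to-undesired energy at one bin.

    Model (assumed): coh = (R * g_s + g_n) / (R + 1), where g_s is the plane-wave
    coherence for the main source orientation and g_n the diffuse-field coherence.
    Least squares over real and imaginary parts: R = Re{(g_n - coh) / (coh - g_s)}.
    """
    omega = 2.0 * math.pi * freq_hz
    tau = mic_distance_m * math.cos(math.radians(theta_deg)) / SPEED_OF_SOUND
    g_s = cmath.exp(1j * omega * tau)
    kd = omega * mic_distance_m / SPEED_OF_SOUND
    g_n = math.sin(kd) / kd if kd > 0 else 1.0
    denom = coh - g_s
    if abs(denom) < 1e-12:
        return float("inf")  # coherence matches the source model exactly: no noise
    return max(0.0, ((g_n - coh) / denom).real)  # negative estimates clipped to 0
```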

According to an embodiment of the present disclosure, filtering the first audio data and the second audio data based on the target noise ratio, and obtaining the target audio data based on the filtered first audio data and second audio data, includes:

obtaining, from the spectrum of the filtered first audio data, a time-domain representation of the first audio data as third audio data, and obtaining, from the spectrum of the filtered second audio data, a time-domain representation of the second audio data as fourth audio data; and

obtaining the target audio data based on the third audio data and the fourth audio data.
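Recovering the third and fourth audio data amounts to an inverse short-time Fourier transform of each filtered channel. The sketch below makes two assumptions. First, each frame's filtered spectrum is stored as a full-length (two-sided) DFT. Second, analysis used a periodic Hann window with 50% overlap, for which plain overlap-add restores the signal. The inverse DFT is a naive O(N^2) loop so that the sketch runs on the Python standard library alone; a real implementation would call an FFT routine. The function names are hypothetical.

```python
import cmath
import math


def idft_real(spectrum):
    """Hypothetical helper: naive inverse DFT, keeping the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def overlap_add(frame_spectra, hop):
    """Hypothetical helper: rebuild a time signal from per-frame (filtered) spectra.

    Assumes the analysis window and hop satisfy the constant-overlap-add condition,
    e.g. a periodic Hann window with hop = frame_length // 2.
    """
    frame_len = len(frame_spectra[0])
    out = [0.0] * (hop * (len(frame_spectra) - 1) + frame_len)
    for m, spec in enumerate(frame_spectra):
        # Each inverse-transformed frame is added back at its original offset.
        for t, sample in enumerate(idft_real(spec)):
            out[m * hop + t] += sample
    return out
```

Applying `overlap_add` to each channel separately yields the third and fourth audio data. These can then be combined, for example by a weighted sum.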

According to an embodiment of the present disclosure, filtering the first audio data and the second audio data based on the target noise ratio includes:

obtaining, based on the main sound source orientation and the azimuth range of the desired sound source, a judgment result indicating whether the current audio data is desired audio data or undesired audio data;
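Claims 7 and 8 name a judgment result, a global covariance matrix and a noise covariance matrix, but this section gives no formulas. The Python sketch below illustrates one possible reading for two microphones and a single frequency bin:

- The current frame is treated as desired audio data when the main sound source orientation falls inside the desired azimuth range.
- The global covariance is always updated.
- The noise covariance is updated only for undesired data.
- The spatial filter is derived as a multichannel Wiener filter w = Φ_y^{-1}(Φ_y - Φ_n)e_ref, with the desired-signal covariance taken as Φ_y - Φ_n.

The smoothing constant, the 1/(1 + ratio) scaling of the noise-covariance step and the Wiener-filter form are all hypothetical choices, not the patent's stated method. The sketch also omits the regularization a singular Φ_y would need.

```python
def is_desired(main_orientation_deg, desired_range_deg):
    """Judgment result: True if the main orientation lies in the desired azimuth range."""
    low, high = desired_range_deg
    return low <= main_orientation_deg <= high


def outer(x):
    """x x^H for a 2-element complex vector."""
    return [[x[i] * x[j].conjugate() for j in range(2)] for i in range(2)]


def blend(a, b, weight):
    """Element-wise (1 - weight) * a + weight * b for 2x2 matrices."""
    return [[(1 - weight) * a[i][j] + weight * b[i][j] for j in range(2)] for i in range(2)]


def update_covariances(phi_y, phi_n, x, desired, ratio, alpha=0.05):
    """Hypothetical recursive update for one bin and one frame.

    The global covariance is always updated; the noise covariance only for
    undesired data, with its step scaled by 1 / (1 + ratio) (assumed weighting).
    """
    xx = outer(x)
    phi_y = blend(phi_y, xx, alpha)
    if not desired:
        phi_n = blend(phi_n, xx, alpha / (1.0 + ratio))
    return phi_y, phi_n


def mwf_weights(phi_y, phi_n, ref=0):
    """Assumed spatial filter: w = phi_y^{-1} (phi_y - phi_n) e_ref (2x2 case)."""
    (a, b), (c, d) = phi_y
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    col = [phi_y[i][ref] - phi_n[i][ref] for i in range(2)]
    return [inv[i][0] * col[0] + inv[i][1] * col[1] for i in range(2)]


def apply_filter(w, x):
    """Filtered output y = w^H x for the current bin."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))
```

When the noise covariance equals the global covariance, the filter output is zero. When no noise has been observed, the filter passes the reference channel unchanged.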

According to an embodiment of the present disclosure, updating the spatial filter coefficients based on the judgment result, the current audio data and the target noise ratio includes:

According to an embodiment of the present disclosure, determining one or more audio data pairs based on the plurality of audio data includes:

According to an embodiment of the present disclosure, determining the one or more audio data pairs according to the positional relationship of the N microphones includes:

FIG. 15 shows a schematic structural diagram of a computer system suitable for implementing the audio processing method according to an embodiment of the present disclosure.

As shown in FIG. 15, the computer system 1500 includes a processing unit 1501. The processing unit 1501 can perform the various processes in the above embodiments according to a program stored in a read-only memory (ROM) 1502, or a program loaded from a storage section 1508 into a random access memory (RAM) 1503. The RAM 1503 also stores various programs and data required for the operation of the system 1500. The processing unit 1501, the ROM 1502 and the RAM 1503 are connected to each other through a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.

The following components are connected to the I/O interface 1505:

- an input section 1506 including a keyboard, a mouse, etc.;
- an output section 1507 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.;
- a storage section 1508 including a hard disk, etc.; and
- a communication section 1509 including a network interface card such as a LAN card or a modem.

The communication section 1509 performs communication processing via a network such as the Internet. A drive 1510 is also connected to the I/O interface 1505 as needed. A removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 1510 as needed, so that a computer program read from it can be installed into the storage section 1508 as needed. The processing unit 1501 may be implemented as a CPU, GPU, TPU, FPGA, NPU or other processing unit.

In particular, according to embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the above method. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 1509, and/or installed from the removable medium 1511.

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code. That module, segment or portion contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and each combination of such blocks, can be implemented in two ways. One is a dedicated hardware-based system that performs the specified functions or operations. The other is a combination of dedicated hardware and computer instructions.

The units or modules involved in the embodiments of the present disclosure may be implemented in software or in programmable hardware. The described units or modules may also be provided in a processor, and the names of these units or modules do not, in some cases, constitute a limitation on the units or modules themselves.

In another aspect, the present disclosure further provides a computer-readable storage medium. It may be the computer-readable storage medium included in the electronic device or computer system of the above embodiments, or it may exist separately without being assembled into a device. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the methods described in the present disclosure.

The above description covers only the preferred embodiments of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept. One example is a technical solution formed by replacing the above features with technical features of similar function disclosed in (but not limited to) the present disclosure.

Claims

1. An audio processing method, comprising:

acquiring first audio data collected by the first microphone and second audio data corresponding to the first audio data collected by the second microphone;

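determining a main sound source orientation of the first audio data and the second audio data, the main sound source orientation including the sound source orientation whose probability meets a preset condition among multiple sound source orientations located for the first audio data and the second audio data;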

determining a target noise ratio based on the first audio data, the second audio data and the main sound source orientation, the target noise ratio representing, for each of the first audio data and the second audio data, the ratio of desired signal energy to undesired signal energy; and

Based on the target noise ratio, filtering the first audio data and/or the second audio data and obtaining target audio data based on the filtered first audio data and/or the second audio data.

2. The method according to claim 1, wherein the method further comprises:

Before determining the target noise ratio, the frequency spectrum of the first audio data and the frequency spectrum of the second audio data are acquired.

3. The method according to claim 2, wherein determining the target noise ratio based on the first audio data, the second audio data and the main sound source orientation comprises:

For a specified frequency point, determine a correlation function between the frequency spectrum of the first audio data and the frequency spectrum of the second audio data;

determining the target noise ratio of the specified frequency point based on the correlation function and the main sound source orientation.

4. The method according to claim 3, wherein determining the target noise ratio of the specified frequency point based on the correlation function and the main sound source orientation comprises:

determining a desired signal component representation and an undesired signal component representation of the real part of the correlation function;

determining a desired signal component representation and an undesired signal component representation of the imaginary part of the correlation function;

determining the target noise ratio of the specified frequency point based on the desired signal component representation and the undesired signal component representation of the real part of the correlation function, the desired signal component representation and the undesired signal component representation of the imaginary part of the correlation function, and the main sound source orientation.

5. The method according to claim 3, wherein determining the target noise ratio based on the first audio data, the second audio data and the main sound source orientation further comprises:

obtaining the target noise ratio of each frequency point in the spectrum.

6. The method according to claim 2, wherein filtering the first audio data and/or the second audio data based on the target noise ratio, and obtaining target audio data based on the filtered first audio data and/or the filtered second audio data, comprises:

filtering the frequency spectrum of the first audio data and/or the frequency spectrum of the second audio data based on the target noise ratio;

Obtain a time-domain representation of the first audio data from the spectrum of the filtered first audio data as third audio data and/or obtain a time-domain representation of the second audio data from the spectrum of the filtered second audio data as fourth audio data;

Target audio data is acquired based on the third audio data and/or the fourth audio data.

7. The method according to claim 6, wherein the filtering the first audio data and the second audio data based on the target noise ratio comprises:

Obtain the azimuth range of the desired sound source;

obtaining, based on the main sound source orientation and the azimuth range of the desired sound source, a judgment result of whether the current audio data is desired audio data or undesired audio data, the current audio data being the first audio data or the second audio data;

updating spatial filter coefficients based on the judgment result, the current audio data, and the target noise ratio;

The current audio data is filtered by the updated spatial filter coefficients.

8. The method according to claim 7, wherein the updating of the spatial filter coefficients based on the judgment result, the current audio data and the target noise ratio comprises:

When the current audio data is desired audio data, update the global covariance matrix of the current audio data based on the current audio data and the target noise ratio;

When the current audio data is undesired audio data, update the noise covariance matrix and the global covariance matrix of the current audio data based on the current audio data and the target noise ratio;

Spatial filter coefficients are updated based on the noise covariance matrix and the global covariance matrix.

9. The method according to claim 1, wherein the sound source orientation whose probability meets the preset condition includes the sound source orientation with the highest probability.

10. An audio processing method, comprising:

Obtain N pieces of audio data corresponding to each other collected by N microphones, N≥3;

determining one or more pairs of audio data based on the N audio data;

for each audio data pair:

- determining a main sound source orientation of the audio data corresponding to the pair, the main sound source orientation including the sound source orientation whose probability meets a preset condition among multiple sound source orientations located for the pair;
- determining a target noise ratio based on the audio data corresponding to the pair and the main sound source orientation, the target noise ratio representing, for each piece of audio data corresponding to the pair, the ratio of desired signal energy to undesired signal energy; and
- filtering the audio data corresponding to the pair based on the target noise ratio to obtain filtered audio data;

Target audio data is determined based on the filtered audio data obtained from the one or more pairs of audio data.

11. The method of claim 10, wherein the determining one or more pairs of audio data based on the plurality of audio data comprises:

determining the one or more pairs of audio data according to the positional relationship of the N microphones; or

Any two audio data in the plurality of audio data are formed into an audio data pair.

12. The method according to claim 11, wherein the determining the one or more pairs of audio data according to the positional relationship of the N microphones comprises:

If the N microphones are arranged in a linear manner, audio data corresponding to two microphones closest to the geometric center point of the array formed by the N microphones are selected to form an audio data pair.

13. The method of claim 10, wherein the determining of target audio data based on the filtered audio data obtained from the one or more pairs of audio data comprises:

obtaining the target audio data by a weighted summation of the filtered audio data obtained from the one or more pairs of audio data; or

Among the filtered audio data obtained from the one or more pairs of audio data, the filtered audio data corresponding to the microphone at the preset position is selected as the target audio data.

14. The method according to claim 10, wherein the sound source orientation whose probability meets the preset condition includes the sound source orientation with the highest probability.

15. An audio processing device, comprising:

a first acquisition module, configured to acquire first audio data collected by a first microphone and second audio data corresponding to the first audio data collected by a second microphone;

a first determining module, configured to determine a main sound source orientation of the first audio data and the second audio data, the main sound source orientation including the sound source orientation whose probability meets a preset condition among multiple sound source orientations located for the first audio data and the second audio data;

A second determination module configured to determine a target noise ratio based on the first audio data, the second audio data and the main audio source orientation, the target noise ratio representing the first audio data and the second audio data the respective ratios of desired signal energy to undesired signal energy; and

a second obtaining module, configured to filter the first audio data and/or the second audio data based on the target noise ratio, and to obtain target audio data based on the filtered first audio data and/or the filtered second audio data.

16. The apparatus of claim 15, wherein the apparatus further comprises:

The third acquisition module is configured to acquire the frequency spectrum of the first audio data and the frequency spectrum of the second audio data before determining the target noise ratio.

17. The apparatus according to claim 16, wherein the second determining module comprises:

a first determining submodule, configured to determine, for a specified frequency point, a correlation function between the frequency spectrum of the first audio data and the frequency spectrum of the second audio data;

The second determination submodule is configured to determine the target noise ratio of the specified frequency point based on the correlation function and the position of the main sound source.

18. The apparatus according to claim 17, wherein the second determination submodule comprises:

a first determination unit configured to determine a desired signal component representation and an undesired signal component representation of the real part of the correlation function;

a second determination unit configured to determine a desired signal component representation and an undesired signal component representation of the imaginary part of the correlation function;

a third determination unit, configured to determine the target noise ratio of the specified frequency point based on the desired signal component representation and the undesired signal component representation of the real part of the correlation function, the desired signal component representation and the undesired signal component representation of the imaginary part of the correlation function, and the main sound source orientation.

19. The apparatus according to claim 17, wherein the second determining module further comprises:

The first obtaining sub-module is configured to obtain the target noise ratio of each frequency point in the frequency spectrum.

20. The apparatus according to claim 16, wherein the second obtaining module comprises:

a filtering submodule, configured to filter the frequency spectrum of the first audio data and the frequency spectrum of the second audio data based on the target noise ratio;

A second acquisition sub-module configured to acquire a time-domain representation of the first audio data from the spectrum of the filtered first audio data as third audio data, and/or to acquire from the spectrum of the filtered second audio data a time domain representation of the second audio data as the fourth audio data;

A third obtaining sub-module is configured to obtain target audio data based on the third audio data and/or the fourth audio data.

21. The apparatus according to claim 20, wherein the filtering sub-module comprises:

a first obtaining unit, configured to obtain the azimuth range of the desired sound source;

a second obtaining unit, configured to obtain, based on the main sound source orientation and the azimuth range of the desired sound source, a judgment result of whether the current audio data is desired audio data or undesired audio data, the current audio data being the first audio data or the second audio data;

an update unit configured to update the spatial filter coefficient based on the judgment result, the current audio data and the target noise ratio;

A filtering unit configured to filter the current audio data through the updated spatial filter coefficients.

22. The apparatus according to claim 21, wherein the updating unit comprises:

A first update subunit, configured to update the global covariance matrix of the current audio data based on the current audio data and the target noise ratio when the current audio data is the desired audio data;

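a second update subunit, configured to update the noise covariance matrix and the global covariance matrix of the current audio data based on the current audio data and the target noise ratio when the current audio data is undesired audio data;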

A third update subunit is configured to update spatial filter coefficients based on the noise covariance matrix and the global covariance matrix.

23. The device according to claim 15, wherein the sound source orientation whose probability meets the preset condition includes the sound source orientation with the highest probability.

24. An audio processing device, comprising:

a fourth acquisition module, configured to acquire N pieces of audio data corresponding to each other collected by the N microphones, N≥3;

a third determining module configured to determine one or more pairs of audio data based on the N pieces of audio data;

a filtering module, configured to perform, for each audio data pair:

- determining a main sound source orientation of the audio data corresponding to the pair, the main sound source orientation including the sound source orientation whose probability meets a preset condition among multiple sound source orientations located for the pair;
- determining a target noise ratio based on the audio data corresponding to the pair and the main sound source orientation, the target noise ratio representing, for each piece of audio data corresponding to the pair, the ratio of desired signal energy to undesired signal energy; and
- filtering the audio data corresponding to the pair based on the target noise ratio to obtain filtered audio data;

A fourth determination module configured to determine target audio data based on the filtered audio data obtained from the one or more pairs of audio data.

25. The apparatus according to claim 24, wherein the third determining module comprises:

a third determination submodule, configured to determine the one or more audio data pairs according to the positional relationship of the N microphones; or

The fourth determining submodule is configured to form an audio data pair from any two audio data in the plurality of audio data.

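26. The apparatus according to claim 25, wherein the third determination sub-module is configured to:

if the N microphones are arranged in a linear manner, select the audio data corresponding to the two microphones closest to the geometric center point of the array formed by the N microphones to form an audio data pair.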

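27. The apparatus of claim 24, wherein the fourth determining module is configured to:

obtain the target audio data by a weighted summation of the filtered audio data obtained from the one or more pairs of audio data; or

select, among the filtered audio data obtained from the one or more pairs of audio data, the filtered audio data corresponding to the microphone at a preset position as the target audio data.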

28. The device according to claim 24, wherein the sound source orientation whose probability meets the preset condition includes the sound source orientation with the highest probability.

29. An electronic device, comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, which are executed by the processor to implement the method steps of any one of claims 1-14.

30. A readable storage medium storing computer instructions, wherein the computer instructions, when executed by a processor, implement the method steps of any one of claims 1-14.