CN1299254C - Approaching end voice detection realizing method for echo inhibitor

Info

Publication number: CN1299254C
Application number: CNB2004100091628A
Authority: CN
Inventors: 肖志方; 杜军; 王侃
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2004-05-31
Filing date: 2004-05-31
Publication date: 2007-02-07
Anticipated expiration: 2024-05-31
Also published as: CN1584977A

Abstract

The invention relates to a method for implementing near-end voice detection in an echo suppressor. Near-end voice detection uses the Geigel algorithm. M sampling points of the far-end input sequence y(i) are taken as a subframe, where M is less than the filter length N of the echo suppressor. When the first sampling point of each subframe is processed, the maximum of the N-M+1 most recent values |y(i)| to |y(i-N+M)| is computed first and saved as Frame_MAX. The M sampling points of the subframe are then processed one by one: for each of them, the maximum of Frame_MAX and the remaining M-1 values in the current sliding window is taken as that sampling point's maximum MAX. If the near-end input s(i) at a sampling point of the subframe satisfies |s(i)| ≥ u*MAX, near-end voice is present. The method produces results identical to those of the existing method while greatly reducing the computation required, which lowers cost.

Description

Method for Implementing Near-End Voice Detection in an Echo Suppressor

Technical field

The invention relates to a method for implementing near-end voice detection in an echo suppressor of a communication device.

Background

Echo suppression is widely used in communications. Its purpose is to eliminate echo caused by circuit impedance mismatch or by acoustic reflection, so that both parties to a call receive higher-quality voice service.

The most widely used echo suppressor consists of three parts: a duplex detector, an echo estimation canceller, and a nonlinear processor. The duplex detector tells the echo estimation canceller whether to update the filter coefficients and whether to enable the nonlinear processor. The core of the echo estimation canceller is an adaptive filter, which estimates the echo from the far-end input speech and subtracts that estimate from the near-end input to weaken or eliminate the echo. The filter also updates its coefficients from the estimation error (the difference between the near-end input and the estimate) according to a convergence algorithm, so that the coefficients approach the echo path parameters and the filter converges. The nonlinear processor removes residual echo through nonlinear processing, further improving voice quality.

The most important task of the duplex detector is to detect whether voice is present at the near end. When near-end voice is detected, coefficient updating of the adaptive filter is stopped immediately and the nonlinear processor is disabled. In that case the difference between the near-end input and the filter's echo estimate is no longer the estimation error alone but the sum of the estimation error and the near-end speech. Updating the filter coefficients at that point would put the filter into an abnormal working state, and with some convergence algorithms the filter can even diverge. Continuing nonlinear processing would clip the near-end speech and degrade voice quality. Near-end voice detection therefore plays a very important role in an echo suppressor. To process more channels of echo suppression or integrate more functions on the same hardware platform, an efficient and stable near-end voice detection method is needed.

The classic algorithm for near-end voice detection is the Geigel algorithm. It assumes that the echo is attenuated by at least 6 dB relative to the speech, that is, the echo amplitude is less than half of the far-end speech amplitude. Let the filter length be N, the current near-end sample be s(i), and the far-end input sequence be y(i) to y(i-N+1). If the inequality |s(i)| ≥ 0.5*max(|y(i)|, |y(i-1)|, ..., |y(i-N+1)|) holds, the near end must have voice input.
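The inequality above can be sketched directly in Python. This is only an illustrative sketch: the function name `geigel_direct`, the list-based inputs, and the assumption that the far-end history is all zeros before the first sample are not part of the patent text.

```python
from collections import deque


def geigel_direct(s, y, n, u=0.5):
    """Return one near-end-voice flag per sample using the direct Geigel test.

    s, y : equal-length sequences of near-end and far-end samples
    n    : filter length N (size of the far-end sliding window)
    u    : echo amplitude attenuation factor (0.5 for 6 dB)
    """
    # Sliding window of the last n values of |y|; zero history is assumed.
    window = deque([0] * n, maxlen=n)
    flags = []
    for s_i, y_i in zip(s, y):
        window.append(abs(y_i))          # oldest value drops out automatically
        # Full n-point maximum for every sample: this is the costly part.
        flags.append(abs(s_i) >= u * max(window))
    return flags
```

Note that when the whole far-end window is zero the test is satisfied by any near-end sample, because the right-hand side of the inequality is zero.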

As the inequality shows, if Method 1 (Figure 1) applies this decision condition directly, the maximum of the N-point far-end input sequence must be computed for every near-end input sample, so the computation is proportional to the filter order N. Take 16 ms of echo suppression in a telephone system sampled at 8 kHz as an example, so that N = 16*8 = 128. Finding the maximum of N points on a general-purpose processor or DSP takes at least 2N instruction cycles, so the computation required per second is 16*8*2*8000 = 2048000 cycles, about 2 MIPS. This is close to or even greater than the computation of the core filter, so using this method directly on a general-purpose processor or DSP is very uneconomical.
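Written out step by step, the arithmetic for this example is as follows (the figure of 2 cycles per point is the text's own lower bound, not a measured value):

```latex
\begin{align*}
N &= 16\ \text{ms} \times 8\ \text{samples/ms} = 128 \\
\text{cycles per sample} &\ge 2N = 256 \\
\text{cycles per second} &\ge 256 \times 8000 = 2\,048\,000 \approx 2\ \text{MIPS}
\end{align*}
```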

An improved implementation is Method 2 (Figure 2). The maximum of the far-end input sequence is computed once at initialization. In subsequent processing, the maximum for the current window is simply the larger of |y(i)| and the maximum of the previous window. After each calculation, however, the method must check whether |y(i-N+1)| equals the current maximum. If it does, the value about to leave the sliding window is the maximum, and the maximum of the other N-1 points must be recomputed for use at the next sample. For normal speech this greatly reduces the computation. In an extreme case, however, such as a period of far-end silence in which all far-end input values have the same absolute value, the computation is the same as for Method 1. Method 2 thus lowers the average computation to an indeterminate level but leaves the peak unchanged. In a real-time operation such as echo suppression, processing of one data point or data frame must finish before the next one arrives, so developers must allocate computing resources according to the peak load. Because the peak of Method 2 equals that of Method 1, it offers no practical improvement over Method 1. In addition, its average computation varies with the far-end input signal, and the actual peak cannot be obtained by testing, which makes performance budgeting inconvenient for developers.
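The worst case described above can be made visible with a small sketch that counts how often Method 2 has to rescan the window. The function name `method2_rescans`, the zero initial history, and the requirement n ≥ 2 are assumptions made for illustration; Figure 2 itself is not reproduced here.

```python
from collections import deque


def method2_rescans(y, n):
    """Running-maximum scheme in the style of Method 2 (assumes n >= 2).

    Returns (maxima, rescans): the window maximum for every sample and the
    number of samples after which an (n-1)-point rescan was needed.
    """
    window = deque([0] * n, maxlen=n)   # |y(i-n+1)| ... |y(i)|, zero history assumed
    prev_max = 0                        # maximum of the n-1 values carried forward
    rescans = 0
    maxima = []
    for y_i in y:
        window.append(abs(y_i))
        cur_max = max(prev_max, abs(y_i))        # cheap two-point update
        maxima.append(cur_max)
        if window[0] == cur_max:
            # The value about to leave the window is the maximum:
            # recompute the maximum of the other n-1 points.
            prev_max = max(list(window)[1:])
            rescans += 1
        else:
            prev_max = cur_max
    return maxima, rescans
```

With a constant-magnitude input such as `[5] * 10` and `n = 4`, a rescan is triggered on every sample once the window has filled, whereas a steadily rising input triggers none. This is the input-dependent behaviour the text objects to.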

Summary of the invention

The technical problem addressed by the invention is to provide a low-computation method for near-end voice detection, so that echo suppression for more channels can be implemented on the same computing platform, or more processor resources are left for other modules, thereby reducing cost. The method also allows the computation of the module to be calculated exactly at the development stage, which helps developers with performance budgeting and resource allocation and benefits product stability and compatibility.

To achieve this, the invention provides a method for implementing near-end voice detection in an echo suppressor, using the Geigel algorithm for near-end voice detection. M sampling points of the far-end input sequence y(i) are taken as a subframe, where M is less than the filter length N of the echo suppressor. When the first sampling point of each subframe is processed, the maximum of the N-M+1 most recent values |y(i)| to |y(i-N+M)| in the far-end sliding window is obtained first and saved as Frame_MAX. The M sampling points of the subframe are then processed one by one: for each of them, the maximum of Frame_MAX and the remaining M-1 values in the current sliding window is taken as that sampling point's maximum MAX. If the near-end input s(i) at a sampling point of the subframe satisfies |s(i)| ≥ u*MAX, near-end voice is present.

In the above method, near-end voice detection on a subframe of the far-end input sequence y(i) comprises the following steps:

Step 1: initialize a subframe counter to 0;

Step 2: take the absolute value |s(i)| of the near-end input data and the absolute value |y(i)| of the far-end input data, and save |y(i)| in the far-end data buffer, which holds the N values from |y(i)| to |y(i-N+1)|;

Step 3: if the subframe counter is 0, the current far-end sampling point is the first sampling point of the subframe; compute the maximum of the N-M+1 most recent values |y(i)| to |y(i-N+M)| in the far-end sliding window and save it as Frame_MAX; if the subframe counter is not 0, go directly to step 4;

Step 4: compute the maximum MAX of Frame_MAX and the remaining M-1 sampling points in the far-end data buffer;

Step 5: compare the product u*MAX of the attenuation factor u and MAX with the absolute value |s(i)| of the near-end input data; if |s(i)| ≥ u*MAX, the near end has voice input;

Step 6: update the subframe counter by adding 1 to its value;

Step 7: if the subframe counter equals M, reset it to 0;

Step 8: return to step 2 and process the next sampling point.

In the above method, when the method is used in an application that itself has a voice frame length, M should also be a divisor of that voice frame length.

In the above method, for a fixed system and depending on the design requirements, there are one or two optimal values of M for which the number of processor instruction cycles consumed is smallest.

In the above method, the optimal value of M can be obtained by actual testing.

In the above method, the computation is the same for every subframe, while the computation for the individual sampling points within a subframe differs.

In the above method, given that echo in the subscriber circuit is attenuated by at least 6 dB relative to the speech, the echo amplitude attenuation factor u is set to 0.5.

Compared with the prior art, the proposed method greatly reduces the computation required while producing results identical to those of the existing method. More channels of echo suppression can therefore be processed on the same platform, or enough processing capacity is left to implement more functions, which reduces cost. In addition, the maximum and the average computation of the method are equal: for given parameters the computation is a fixed value, which helps developers with performance budgeting and resource allocation and benefits the stability and compatibility of the product.

The invention is described in detail below with reference to the drawings and a specific embodiment, which do not limit the invention.

Description of the drawings

Figure 1 is a flowchart of an existing method for near-end voice detection in echo suppression;

Figure 2 is a flowchart of another existing method for near-end voice detection in echo suppression;

Figure 3 is a flowchart of the method of the invention.

Detailed description

The invention provides a method for implementing near-end voice detection in an echo suppressor in the field of communications. Its principle is as follows:

(1) Near-end voice detection uses the Geigel algorithm. The filter length is N, the near-end input sequence is s(i), the far-end input sequence is y(i), and the echo amplitude attenuation factor is u. Near-end voice is present when the following inequality holds:

|s(i)| ≥ u*max(|y(i)|, |y(i-1)|, ..., |y(i-N+1)|)

(2) M sampling points are taken as a subframe, where M is less than the filter length N. If the invention is used in an application such as VoIP that itself has a voice frame length, M should also be a divisor of the frame length. In a G.729 application, for example, the frame length is 10 ms, which corresponds to 80 sampling points at 8000 Hz, so M should divide 80 evenly. The invention guarantees that the computation is exactly the same for every subframe.

(3) When the first sampling point of each subframe is processed, the maximum of the N-M+1 values |y(i)| to |y(i-N+M)| is computed first and saved as Frame_MAX. The maximum of Frame_MAX and the remaining M-1 points in the sliding window is then taken as the maximum for that sampling point.

(4) When the remaining M-1 sampling points of the subframe are processed one by one, the N-M+1 points used to compute Frame_MAX are still inside the sliding window (they are the same points as for the first sampling point, so the calculation need not be repeated). Only the maximum of Frame_MAX and the M-1 other points currently in the sliding window has to be computed, which is equivalent to an M-point maximum operation.
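The reason the result matches the direct N-point maximum can be written out explicitly. In the sketch below, i_0 (the index of the first sampling point of the subframe) and k (the offset within the subframe, k = 0, ..., M-1) are notation introduced here for illustration; an index range that is empty is simply omitted.

```latex
\begin{align*}
\mathrm{Frame\_MAX} &= \max_{i_0-N+M \,\le\, j \,\le\, i_0} |y(j)| \\
\mathrm{MAX}(i_0+k) &= \max\Bigl(\mathrm{Frame\_MAX},\
  \max_{i_0+1 \,\le\, j \,\le\, i_0+k} |y(j)|,\
  \max_{i_0+k-N+1 \,\le\, j \,\le\, i_0-N+M-1} |y(j)|\Bigr)
\end{align*}
```

The three index ranges are disjoint and together cover i_0+k-N+1 to i_0+k, which is exactly the N-point window of the sampling point i_0+k. The second range contributes k new points and the third contributes M-1-k old points, so M-1 points are always combined with Frame_MAX regardless of k.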

Referring to Figure 3, the method of the invention is carried out in the following steps:

Step 1: Initialize the subframe counter COUNT to 0.

Step 2: Take the absolute value |s(i)| of the near-end input data. Take the absolute value |y(i)| of the far-end input data and save it in the buffer BUFFER, which holds the N values from |y(i)| to |y(i-N+1)|.

Step 3: Check whether the subframe counter COUNT is 0. If it is 0, perform step 4; otherwise go directly to step 5.

Step 4: A subframe counter of 0 means that the current far-end sampling point is the first sampling point of the subframe. Compute the maximum of the N-M+1 most recent points in the far-end sliding window, that is, |y(i)| to |y(i-N+M)|, and save it as Frame_MAX.

Step 5: Compute the maximum of Frame_MAX and the remaining M-1 points in the far-end data buffer BUFFER. This value is MAX.

Step 6: Compare the product u*MAX of the attenuation factor u and MAX with the absolute value |s(i)| of the near-end input data.

Step 7: If the inequality |s(i)| ≥ u*MAX holds, the near end has voice input (since echo in the subscriber circuit is attenuated by at least 6 dB relative to the speech, u can be set to 0.5). If it does not hold, go directly to step 8.

Step 8: Update the subframe counter by adding 1 to its value.

Step 9: Check whether the subframe counter equals M. If it does not, return to step 2 and process the next sampling point; if it does, perform step 10.

Step 10: Reset the subframe counter to 0 and return to step 2 to process the next sampling point.
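The ten steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the patented implementation: the name `detect_near_end`, the list-based buffer with zero initial history, and the batch interface (whole sequences in, a list of flags out) are all chosen for illustration. A DSP implementation would more likely use a circular buffer instead of `pop(0)`.

```python
def detect_near_end(s, y, n, m, u=0.5):
    """Subframe variant of the Geigel test; returns one flag per sample.

    n : filter length N, m : subframe length M (1 <= m < n),
    u : echo amplitude attenuation factor.
    """
    if not 1 <= m < n:
        raise ValueError("expected 1 <= m < n")
    buf = [0] * n        # BUFFER: |y(i-n+1)| ... |y(i)|, oldest first
    count = 0            # COUNT (step 1)
    frame_max = 0        # Frame_MAX
    flags = []
    for s_i, y_i in zip(s, y):
        buf.pop(0)                         # step 2: slide the window
        buf.append(abs(y_i))
        if count == 0:                     # steps 3-4: first sample of a subframe
            frame_max = max(buf[m - 1:])   # newest n-m+1 values
        # Step 5: the other m-1 values are the (m-1-count) oldest entries
        # plus the `count` entries that arrived after Frame_MAX was computed.
        rest = buf[:m - 1 - count] + buf[n - count:]
        cur_max = max([frame_max] + rest)
        flags.append(abs(s_i) >= u * cur_max)   # steps 6-7
        count += 1                              # step 8
        if count == m:                          # steps 9-10
            count = 0
    return flags
```

For any input, the flags should match those of the direct N-point Geigel test, since each sample's `cur_max` is taken over exactly the same N buffer entries; only the grouping of the comparisons changes.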

With the above steps, a whole subframe requires N+M*(M-1) two-point maximum operations instead of N*M, which achieves the goal of reducing computation. The computation is also the same for every subframe and can be calculated exactly at the design stage.

Consider a 16 ms echo suppressor, and assume that one two-point maximum operation takes two instruction cycles on the processor. For a system sampled at 8000 Hz, the filter length is N = (8000/1000)*16 = 128. With M = 16, near-end voice detection for one subframe of M points using the method of Figure 1 consumes N*M*2 = 128*16*2 = 4096 instruction cycles in computing maxima, whereas the method of the invention consumes (N+M*(M-1))*2 = (128+16*(16-1))*2 = 736 instruction cycles. Compared with the original method, this saves more than 80% of the processor computation.
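The figures in this example can be reproduced with a few lines of Python. The function names and the `cycles_per_max` parameter are illustrative; the two cost formulas are the ones given in the text.

```python
def direct_cycles(n, m, cycles_per_max=2):
    """Cycles spent on maxima for one subframe with the Figure 1 method."""
    return n * m * cycles_per_max


def subframe_cycles(n, m, cycles_per_max=2):
    """Cycles spent on maxima for one subframe with the subframe method."""
    return (n + m * (m - 1)) * cycles_per_max


n = (8000 // 1000) * 16          # filter length for 16 ms at 8000 Hz
m = 16
saving = 1 - subframe_cycles(n, m) / direct_cycles(n, m)
print(n, direct_cycles(n, m), subframe_cycles(n, m), f"{saving:.1%}")
```

The saving here works out to roughly 82%, consistent with the "more than 80%" stated in the text.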

For a fixed system and depending on the design requirements, there are one or two optimal values of M for which the number of processor instruction cycles consumed is smallest, that is, for which (N+M*(M-1))/M is smallest among all values of M that meet the system requirements. The optimal value of M can also be obtained by actual testing.
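One way to find the optimal M at design time is to evaluate (N+M*(M-1))/M for every admissible M. The sketch below does this with exact fractions so that ties are detected reliably. The function names, and the decision to treat "M divides the voice frame length" as the only system requirement, are assumptions made for illustration.

```python
from fractions import Fraction


def per_sample_cost(n, m):
    """Two-point maximum operations per sample: (N + M*(M-1)) / M."""
    return Fraction(n + m * (m - 1), m)


def best_subframe_lengths(n, frame_len=None):
    """Return every M < N with minimal per-sample cost.

    If frame_len is given, only divisors of frame_len are considered.
    """
    candidates = [m for m in range(1, n)
                  if frame_len is None or frame_len % m == 0]
    best = min(per_sample_cost(n, m) for m in candidates)
    return [m for m in candidates if per_sample_cost(n, m) == best]


print(best_subframe_lengths(128, frame_len=80))   # N = 128, 80-sample voice frames
print(best_subframe_lengths(132))                 # hypothetical N with a tie
```

Under these assumptions, N = 128 with an 80-sample frame gives a single optimum, M = 10, while a hypothetical N = 132 without a frame constraint gives two equally good values, M = 11 and M = 12, which illustrates the "one or two optimal values" statement. Because the cost equals N/M + M - 1, the optimum always lies near the square root of N.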

In summary, the invention has the following features:

(1) The invention uses the concept of a subframe and finishes processing one subframe before the data of the next subframe arrives. The subframe length M is less than the filter order N. In an application that has a voice frame, the number of sampling points per voice frame, FRAME, should be divisible by the subframe length M.

(2) The invention guarantees that the computation is the same for every subframe, although the computation for the individual sampling points within a subframe differs.

(3) For each subframe, the maximum Frame_MAX of the N-M+1 most recent points is always computed first; for the subsequent sampling points of the subframe only an M-point maximum has to be computed.

The invention may of course have various other embodiments. Those skilled in the art can make corresponding changes and modifications based on the invention without departing from its spirit and essence, and all such changes and modifications fall within the scope of protection of the appended claims.

Claims

1. A method for implementing near-end voice detection in an echo suppressor, using the Geigel algorithm for near-end voice detection, characterized in that M sampling points of the far-end input sequence y(i) are taken as a subframe, M being less than the filter length N of the echo suppressor; when the first sampling point of each subframe is processed, the maximum of the N-M+1 most recent values |y(i)| to |y(i-N+M)| in the far-end sliding window is obtained first and saved as Frame_MAX; the M sampling points of the subframe are then processed one by one, the maximum of Frame_MAX and the remaining M-1 values in the current sliding window being taken as the maximum MAX for each sampling point of the subframe; and if the near-end input s(i) at a sampling point of the subframe satisfies |s(i)| ≥ u*MAX, near-end voice is present, where u is the echo amplitude attenuation factor.

2. The method for implementing near-end voice detection in an echo suppressor according to claim 1, characterized in that near-end voice detection on a subframe of the far-end input sequence y(i) comprises the following steps:

Step 1: initialize a subframe counter to 0;

Step 2: take the absolute value |s(i)| of the near-end input data and the absolute value |y(i)| of the far-end input data, and save |y(i)| in the far-end data buffer, which holds the N values from |y(i)| to |y(i-N+1)|;

Step 3: if the subframe counter is 0, the current far-end sampling point is the first sampling point of the subframe; compute the maximum of the N-M+1 most recent values |y(i)| to |y(i-N+M)| in the far-end sliding window and save it as Frame_MAX; if the subframe counter is not 0, go directly to step 4;

Step 4: compute the maximum MAX of Frame_MAX and the remaining M-1 sampling points in the far-end data buffer;

Step 5: compare the product u*MAX of the attenuation factor u and MAX with the absolute value |s(i)| of the near-end input data; if |s(i)| ≥ u*MAX, the near end has voice input;

Step 6: update the subframe counter by adding 1 to its value;

Step 7: if the subframe counter equals M, reset it to 0;

Step 8: return to step 2 and process the next sampling point.

3. The method for implementing near-end voice detection in an echo suppressor according to claim 1 or 2, characterized in that, when the method is used in an application that itself has a voice frame length, M is also a divisor of the voice frame length.

4. The method for implementing near-end voice detection in an echo suppressor according to claim 3, characterized in that, for a fixed system and depending on the design requirements, there are one or two optimal values of M for which the number of processor instruction cycles consumed is smallest.

5. The method for implementing near-end voice detection in an echo suppressor according to claim 4, characterized in that the optimal value of M is obtained by actual testing.

6. The method for implementing near-end voice detection in an echo suppressor according to claim 4, characterized in that the computation is the same for every subframe, while the computation for the individual sampling points within a subframe differs.

7. The method for implementing near-end voice detection in an echo suppressor according to claim 3, characterized in that, given that echo in the subscriber circuit is attenuated by at least 6 dB relative to the speech, the echo amplitude attenuation factor u is set to 0.5.