CN120034813A - Howling/feedback suppression method, device, medium and product in sound reinforcement scene - Google Patents
Howling/feedback suppression method, device, medium and product in sound reinforcement scene
- Publication number
- CN120034813A (application CN202510479617.4A)
- Authority
- CN
- China
- Prior art keywords
- signal
- feedback
- frame
- howling
- transfer function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R27/00—Public address systems
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention provides a howling/feedback suppression method, device, medium and product in a sound reinforcement scene. The method comprises: obtaining the microphone signal of the current frame; performing frequency-domain adaptive filtering on the microphone signal to be processed; and outputting a predicted target signal to a loudspeaker. The frequency-domain adaptive filtering comprises: multiplying the estimated acoustic transfer function of the previous frame by the loudspeaker signal of the previous frame in the frequency domain to obtain a feedback estimate; subtracting the feedback estimate from the microphone signal to obtain a residual signal; inverse-transforming the residual signal to serve as the predicted target signal and outputting it to the loudspeaker; and, at the same time, iterating the estimated acoustic transfer function of the previous frame to obtain the estimated acoustic transfer function of the current frame, the iteration step size being controlled by a recurrent neural network model. The method and device improve the accuracy of acoustic feedback estimation in various actual scenes, reduce distortion in the sound reinforcement scene and increase the maximum gain.
Description
Technical Field
The invention relates to the technical field of audio processing, and in particular to a howling/feedback suppression method, device, medium and product in a sound reinforcement scene.
Background
Howling is a common problem in audio systems and typically occurs in conference systems, classroom sound reinforcement, lectures, KTV, hearing aids, and other scenarios. The root cause of howling is that the sound output by the loudspeaker is picked up again by the microphone and amplified, forming a positive feedback loop. In this cycle, signal components at certain specific frequencies accumulate and are amplified rapidly in the feedback loop, producing a harsh squeal. Existing howling suppression techniques mainly include frequency shifting, notching, deep speech enhancement, and adaptive filtering.
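As general background (a standard feedback-stability result, not specific to any of the techniques below), howling can build up at any frequency \omega where the open-loop response satisfies

\lvert G(\omega)\,H(\omega)\rvert \ge 1 \quad\text{and}\quad \angle\big(G(\omega)\,H(\omega)\big) = 2\pi n,\ n \in \mathbb{Z},

where G(\omega) is the forward amplification path and H(\omega) is the acoustic feedback path; the maximum stable gain is the largest forward gain for which the magnitude condition is not met at any phase-aligned frequency.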
Frequency-shift technology appeared early and is simple to implement. Shifting the frequency moves the peaks of the feedback signal so that they are staggered rather than repeatedly superimposed in the feedback loop, which slows the build-up of feedback. Frequency shifting raises the gain at which howling occurs only slightly, typically by 1 dB-2 dB, and it introduces noticeable distortion while the achievable gain increase remains limited.
The notch technique includes static notching and adaptive notching. Static notching requires measuring a specific acoustic environment and designing a series of notch filters for the bands where howling is most likely to occur. Adaptive notching detects howling automatically, estimates the howling frequencies, and automatically designs filters to suppress them. Howling-frequency detection falls into two classes: fixed rules based on spectral features, and deep-learning methods. However, notch filters distort the signal. In complex acoustic environments, such as multi-loudspeaker and multi-microphone setups or scenes with a moving microphone, too many frequencies may howl, and a large number of notches causes severe distortion and unintelligible speech. When howling detection is reliable, notching typically raises the gain at which howling occurs by 6 dB-8 dB.
Deep speech enhancement implements howling suppression as a deep-learning speech-enhancement task. These methods rely neither on the known loudspeaker signal nor on an estimate of the acoustic transfer function; a neural network removes howling directly from the microphone signal, much as it would remove noise. Because no acoustic-transfer-function estimate is needed, they are very robust to changes in the acoustic environment. However, without the loudspeaker signal as a reference, it is difficult for the model to estimate the feedback accurately and restore the original source stably. In particular, when the feedback is strong, the signal-to-noise ratio of the microphone signal is low and the enhanced signal is often severely distorted. Sounds whose spectra resemble howling, such as some musical instruments, are also easily removed by mistake.
Adaptive filtering is a signal-processing technique that continuously iterates its parameters, driven by the system's real-time input and output, toward a predefined error minimum. It estimates the feedback signal and cancels it in order to restore the target speech. Because the loudspeaker signal is known, if the acoustic transfer function were known exactly, howling could be avoided entirely, the feedback cancelled perfectly, and the target sound source restored faithfully; ideal adaptive filtering therefore has no upper limit on the achievable gain increase. The difficulty lies in estimating the acoustic transfer function: the filter feeds back the result of the cancellation and iteratively adjusts the estimate so that it gradually approaches the actual acoustic transfer function. According to the domain in which the estimated transfer function acts, adaptive filtering is divided into time-domain and frequency-domain variants. Time-domain adaptive filtering operates by time-domain convolution; its latency is small but it fits long acoustic transfer functions poorly, so it is used mainly in hearing aids. Frequency-domain adaptive filtering processes the signal in the frequency domain, replacing the convolution with multiplication; its complexity grows slowly with filter length, so it supports longer filters, fits long acoustic transfer functions better, and is suitable for sound reinforcement scenes.
In conventional frequency-domain adaptive filtering, the control of the transfer-function iteration is categorized, according to different theoretical assumptions, into Wiener filtering, Kalman filtering, and so on. These assume, for example, that the acoustic transfer function does not change abruptly and that the feedback signal is uncorrelated with the current target signal. In practice these assumptions are often violated: the direction and position of the microphone may change abruptly, and during long sustained notes in singing the feedback is highly correlated with the target signal. The estimates of the acoustic transfer function and of the feedback then contain large errors, causing severe distortion of the reinforced sound or a loss of howling suppression. One existing scheme predicts feedback-path changes with a neural network to overcome the locking problem in Kalman filtering; it addresses one weakness of statistical adaptive filtering but cannot systematically remedy the deficiencies of the statistical approach. Another existing scheme assists adaptive filtering with a neural network but still adapts statistically, using the network only to detect when the step size has locked and to reset it in a targeted way, which cannot meet the requirements of multi-source scenes.
Disclosure of Invention
The first object of the present invention is to provide a howling/feedback suppression method in a sound reinforcement scene, which improves the accuracy of acoustic feedback estimation in various actual scenes, reduces distortion in the sound reinforcement scene and increases the maximum gain.
A second object of the present invention is to provide a computer apparatus implementing the above howling/feedback suppression method in a sound reinforcement scene.
A third object of the present invention is to provide a computer-readable storage medium implementing the above howling/feedback suppression method in a sound reinforcement scene.
A fourth object of the present invention is to provide a computer program product implementing the above howling/feedback suppression method in a sound reinforcement scene.
In order to achieve the first object, the howling/feedback suppression method in a sound reinforcement scene comprises the steps of: obtaining a microphone signal of a current frame; carrying out frequency-domain adaptive filtering on the signal of the current frame; and outputting a predicted target signal to a loudspeaker. The frequency-domain adaptive filtering comprises: multiplying the estimated acoustic transfer function of the previous frame by the loudspeaker signal of the previous frame in the frequency domain to obtain a feedback estimate; subtracting the feedback estimate from the microphone signal of the current frame in the frequency domain to obtain a residual signal; inverse-transforming the residual signal to obtain the predicted target signal and outputting it to the loudspeaker through a feedforward path; and, at the same time, iterating the estimated acoustic transfer function of the previous frame toward smaller cross-correlation between the residual signal of the current frame and the loudspeaker signal of the previous frame to obtain the estimated acoustic transfer function of the current frame, with the iteration step size controlled by a pre-trained recurrent neural network model.
According to this scheme, howling suppression is performed by frequency-domain adaptive filtering in which the iteration of the estimated acoustic transfer function is driven by a step size controlled by a pre-trained recurrent neural network model, replacing the statistical-assumption-based control of the adaptation process used in conventional adaptive filtering. This improves the accuracy of acoustic feedback estimation across a wider range of actual scenes and significantly reduces distortion and increases the maximum gain in the sound reinforcement scene.
The training of the recurrent neural network model comprises the steps of: simulating impulse responses that vary along different movement trajectories in rooms of different sizes; passing the collected speech and music data through the generated impulse responses to simulate feedback paths; and processing with adaptive filtering, wherein the iteration step size of the adaptive filtering is controlled by the recurrent neural network model and the loss function is the mean square error, over the whole simulation, between the magnitude spectra of the simulated feedback signal and of the feedback signal estimated in the adaptive filtering.
Thus, the collected speech and music data serve as training data covering diverse scenes such as abrupt changes of the acoustic transfer function, long sustained notes, and strong background noise. During adaptation, the iteration step size of the acoustic-transfer-function estimate is learned by minimizing the feedback-estimation error over the training data, ensuring that the feedback estimation error is lowest and the sound restoration most accurate across a wide range of real scenes. This addresses the problem that conventional adaptive filtering does not model real audio data and is insufficiently optimized for many real scenes.
In a further scheme, during the training of the recurrent neural network model, while the collected speech and music data are passed through the generated impulse responses to simulate a feedback path, the feedback gain corresponding to the feedback path is drawn randomly from a set range, and the feedback gain is set below the critical gain when training starts.
This avoids premature divergence of the estimated acoustic transfer function at the beginning of training, so that the model can learn better.
In a further scheme, the input of the recurrent neural network model is a vector formed by concatenating the magnitude spectrum of the microphone signal and the magnitude spectrum of the residual signal of each frame; the recurrent layer outputs a hidden vector of the same length as the input, the hidden vector is passed through a linear layer and activated with a logistic (sigmoid) function, and a single scalar is output as the step size shared by all subbands of the current frame during the iteration.
In a further scheme, the input of the recurrent neural network model is a vector formed by concatenating the magnitude spectrum of the microphone signal and the magnitude spectrum of the residual signal of each frame; the recurrent layer outputs a hidden vector of the same length as the input, the hidden vector is passed through a linear layer and activated with a logistic (sigmoid) function, and a separate step size is output for each subband iteration of the current frame.
It can be seen that the recurrent neural network model can also output a step size for each subband of the current frame, which improves the iteration of the estimated acoustic transfer function.
Further, the model structure of the recurrent neural network model is LSTM or GRU.
Further, the estimated acoustic transfer function of the previous frame is iterated toward smaller cross-correlation between the residual signal of the current frame and the loudspeaker signal of the previous frame, expressed as:

\nabla_k(m) = \dfrac{E_k(m)\,X_{k-1}^{*}(m)}{P_k(m)}, \qquad P_k(m) = \alpha\,P_{k-1}(m) + (1-\alpha)\,\lvert X_{k-1}(m)\rvert^{2},

where \nabla_k(m) is the gradient of the m-th subband used when iterating the estimated acoustic transfer function from the (k-1)-th frame to the k-th frame, E_k(m) is the m-th subband of the residual signal of the k-th frame, X_{k-1}(m) is the m-th subband of the loudspeaker signal of the previous frame, X_{k-1}^{*}(m) is its complex conjugate, P_k(m) is the recursively smoothed loudspeaker spectral energy of the m-th subband at the k-th frame, and \alpha is the smoothing coefficient. The estimated acoustic transfer function of the current frame is then obtained as:

\hat{H}_k(m) = \hat{H}_{k-1}(m) + \mu_k\,\nabla_k(m),

where \hat{H}_k(m) is the m-th subband of the estimated acoustic transfer function of the k-th frame, \hat{H}_{k-1}(m) is the m-th subband of the estimated acoustic transfer function of the (k-1)-th frame, and \mu_k is the iteration step size.
In order to achieve the second object, the present invention provides a computer apparatus comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, implements the above howling/feedback suppression method in a sound reinforcement scene.
In order to achieve the third object, the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the above howling/feedback suppression method in a sound reinforcement scene.
In order to achieve the fourth object, the present invention provides a computer program product comprising computer instructions, wherein the computer instructions, when executed by a processor, implement the above howling/feedback suppression method in a sound reinforcement scene.
Drawings
Fig. 1 is a flowchart of an embodiment of the howling/feedback suppression method in a sound reinforcement scene of the present invention.
Fig. 2 is a schematic block diagram of an embodiment of the howling/feedback suppression method in a sound reinforcement scene of the present invention.
The invention is further described below with reference to the drawings and examples.
Detailed Description
According to the howling/feedback suppression method in a sound reinforcement scene of the present invention, the iteration step size is controlled by a recurrent neural network trained on a large amount of real data, which improves the prediction accuracy of the estimated acoustic transfer function in various actual scenes, raises the maximum achievable gain and reduces distortion in the sound reinforcement scene. The invention also provides a computer apparatus, a computer-readable storage medium and a computer program product implementing the howling suppression method.
An embodiment of the howling/feedback suppression method in a sound reinforcement scene:
This embodiment is described for an indoor sound reinforcement scene. A microphone and a loudspeaker are arranged in the room, and the signal collected by the microphone is processed by the adaptive filtering system and then output to the loudspeaker. In this scene, the target sound source is picked up by the microphone and output to the loudspeaker; the sound played by the loudspeaker is reflected by the surroundings and picked up again by the microphone; and the feedback signal collected by the microphone is analyzed and cancelled in real time by the adaptive filtering system, so that howling does not occur while the target sound source is reinforced, achieving the goal of howling suppression.
The loudspeaker signal is played, its sound waves are reflected in the room, picked up again by the microphone and played again by the loudspeaker, forming a closed-loop system. The acoustic transfer function (Acoustic Transfer Function, ATF) is a mathematical representation describing how the audio signal changes as sound propagates from the source to the receiving point; in essence it describes how an acoustic system "transfers" or "transforms" an input signal. Based on the assumption that the physical environment is stable and the positions of the sound source and receiver are fixed, the acoustic transfer function is typically modelled as a linear time-invariant (Linear Time-Invariant, LTI) system.
Thus, in this closed-loop system with feedback, the microphone signal to be processed is represented as

y(n) = \mathbf{h}^{T}(n)\,\mathbf{x}(n) + s(n),

where \mathbf{h}(n) denotes the parameters of a linear system of length L at time point n, \mathbf{x}(n) denotes the last L samples of the loudspeaker signal at time point n, and s(n) denotes the signal of the target sound source, which is the target that the adaptive filtering system needs to restore.
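Purely as an illustration of this time-domain signal model (a hypothetical helper in Python, not part of the patent's implementation), one microphone sample can be computed as:

```python
import numpy as np

def mic_sample(h_n, spk_last_L, s_n):
    """y(n) = h(n)^T x(n) + s(n): the microphone sample is the feedback
    (the last L loudspeaker samples filtered by the length-L feedback path h(n))
    plus the target-source sample s(n)."""
    return float(np.dot(h_n, spk_last_L)) + s_n
```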
The adaptive filtering system realizes howling suppression by executing the howling/feedback suppression method in the sound reinforcement scene of this embodiment. Specifically, the adaptive filtering system is implemented by a computer program which, referring to fig. 1, comprises the following steps when executed:
S11, acquiring the microphone signal of the current frame.
S12, performing frequency-domain adaptive filtering on the microphone signal of the current frame.
S13, outputting the predicted target signal to the loudspeaker, and iterating the estimated acoustic transfer function of the previous frame to obtain the estimated acoustic transfer function of the current frame.
Referring to fig. 2, in the above step S11, the microphone 20 collects the microphone signal Y_k(m) of the current frame in the frequency domain, which comprises the signal S_k(m) of the target sound source and the actual feedback signal obtained after the loudspeaker signal X_{k-1}(m) of the previous frame enters the loudspeaker 30 and passes through the actual acoustic transfer function H.
Each frame of the microphone signal has the same duration as each frame of the loudspeaker signal and, after the short-time Fourier transform, is divided into the same number of subbands in the frequency domain. The m-th subband of the microphone signal of the current frame (the k-th frame) is denoted Y_k(m), the m-th subband of the loudspeaker signal of the previous frame (the (k-1)-th frame) is denoted X_{k-1}(m), and the m-th subband of the estimated acoustic transfer function of the previous frame is denoted \hat{H}_{k-1}(m). In step S12, the feedback estimation signal is obtained by multiplying the estimated acoustic transfer function of the previous frame with the loudspeaker signal of the previous frame in the frequency domain; specifically, \hat{H}_{k-1}(m) is multiplied by X_{k-1}(m) to give the m-th subband of the feedback estimation signal of the k-th frame, \hat{D}_k(m) = \hat{H}_{k-1}(m)\,X_{k-1}(m). The feedback estimation signal is then subtracted from the microphone signal of the current frame in the frequency domain to obtain the residual signal; specifically, the corresponding subband of the feedback estimation signal is subtracted from each subband of the microphone signal of the current frame, the m-th subband of the residual signal of the k-th frame being E_k(m) = Y_k(m) - \hat{D}_k(m).
In the above step S13, the residual signal is subjected to an inverse short-time Fourier transform to obtain the prediction target signal in the time domain, which is output to the loudspeaker 30 through the feedforward path G. The feedforward path G is the path the signal takes from the system input to the output and typically includes post-processing such as a limiter. At the same time, the estimated acoustic transfer function of the previous frame is iterated toward smaller cross-correlation between the residual signal of the current frame and the loudspeaker signal of the previous frame, yielding the estimated acoustic transfer function of the current frame, with the iteration step size controlled by the pre-trained recurrent neural network model 10.
Each subband is updated independently during the iteration of the estimated acoustic transfer function of the previous frame. Specifically, while the loudspeaker output is being determined, the estimated acoustic transfer function \hat{H}_{k-1}(m) of the previous frame is iterated toward smaller cross-correlation between the residual signal E_k(m) of the current frame and the loudspeaker signal X_{k-1}(m) of the previous frame. The process is expressed as:

\nabla_k(m) = \dfrac{E_k(m)\,X_{k-1}^{*}(m)}{P_k(m)}, \qquad P_k(m) = \alpha\,P_{k-1}(m) + (1-\alpha)\,\lvert X_{k-1}(m)\rvert^{2},

where \nabla_k(m) is the gradient of the m-th subband used when iterating the estimated acoustic transfer function from the (k-1)-th frame to the k-th frame, E_k(m) is the m-th subband of the residual signal of the k-th frame, X_{k-1}(m) is the m-th subband of the loudspeaker signal of the previous frame, X_{k-1}^{*}(m) is its complex conjugate, P_k(m) is the recursively smoothed loudspeaker spectral energy of the m-th subband at the k-th frame, and \alpha is the smoothing coefficient, empirically taken between 0.8 and 0.9. The m-th subband of the estimated acoustic transfer function of the k-th frame is then obtained as

\hat{H}_k(m) = \hat{H}_{k-1}(m) + \mu_k\,\nabla_k(m),

where \hat{H}_k(m) and \hat{H}_{k-1}(m) are the m-th subbands of the estimated acoustic transfer functions of the k-th and (k-1)-th frames, and \mu_k is the iteration step size, i.e. the step size applied to each subband when iterating the estimated acoustic transfer function of the (k-1)-th frame. The step size of each frame's iteration is controlled by the pre-trained recurrent neural network model so as to find the most appropriate step size relative to the actual feedback signal.
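As a concrete illustration, a minimal per-frame sketch of this subband update in Python/NumPy follows; the variable names, the regularization constant and the exact normalization form are assumptions drawn from the equations above, not the patent's reference implementation:

```python
import numpy as np

def afc_frame_update(Y_k, X_prev, H_prev, P_prev, mu_k, alpha=0.85):
    """One frame of frequency-domain adaptive feedback cancellation.

    Y_k    : complex spectrum of the current microphone frame, shape (M,)
    X_prev : complex spectrum of the previous loudspeaker frame, shape (M,)
    H_prev : estimated acoustic transfer function of the previous frame, shape (M,)
    P_prev : recursively smoothed loudspeaker spectral energy, shape (M,)
    mu_k   : iteration step size from the RNN (scalar, or shape (M,) per subband)
    """
    D_hat = H_prev * X_prev                          # feedback estimate per subband
    E_k = Y_k - D_hat                                # residual = predicted target spectrum
    P_k = alpha * P_prev + (1.0 - alpha) * np.abs(X_prev) ** 2
    grad = E_k * np.conj(X_prev) / (P_k + 1e-12)     # normalized cross-correlation gradient
    H_k = H_prev + mu_k * grad                       # iterate the transfer-function estimate
    return E_k, H_k, P_k
```

In use, the residual E_k would be inverse-transformed to the time domain and sent to the loudspeaker through the feedforward path, while H_k and P_k are carried over to the next frame.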
The input of the recurrent neural network model of this embodiment is a vector formed by concatenating the magnitude spectrum of the microphone signal and the magnitude spectrum of the residual signal of each frame. The recurrent layer outputs a hidden vector of the same length as the input; this hidden vector is passed through a linear layer and activated with a logistic (sigmoid) function, and a single scalar is output as the step size shared by all subbands of the current frame during the iteration:

(z_k,\ h_k) = \mathrm{RNN}\big(\big[\lvert Y_k\rvert,\ \lvert E_k\rvert\big],\ h_{k-1}\big), \qquad \mu_k = \sigma\big(\mathbf{w}^{T} z_k + b\big),

where RNN is the recurrent neural network model and h_k is the hidden state of the recurrent neural network, which memorizes earlier information in the time sequence and passes it forward.
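One possible realization of this step-size controller is sketched below in PyTorch (an assumption for illustration: the GRU choice, the layer sizes, and names such as StepSizeRNN and per_subband are hypothetical; the patent only specifies an LSTM/GRU recurrent layer followed by a linear layer with logistic activation):

```python
import torch
import torch.nn as nn

class StepSizeRNN(nn.Module):
    """Predicts the adaptation step size from |Y_k| and |E_k| of each frame."""

    def __init__(self, num_subbands, per_subband=False):
        super().__init__()
        in_dim = 2 * num_subbands                       # concat of |Y_k| and |E_k|
        self.rnn = nn.GRU(in_dim, in_dim, batch_first=True)
        out_dim = num_subbands if per_subband else 1    # scalar or one step per subband
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, mic_mag, res_mag, hidden=None):
        # mic_mag, res_mag: (batch, frames, num_subbands) magnitude spectra
        x = torch.cat([mic_mag, res_mag], dim=-1)
        z, hidden = self.rnn(x, hidden)                 # hidden vector, same length as input
        mu = torch.sigmoid(self.proj(z))                # logistic activation -> step in (0, 1)
        return mu, hidden
```

Setting per_subband=True yields one step size per subband, matching the variant described in the next paragraph.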
In other embodiments, the pre-trained recurrent neural network model outputs a vector \boldsymbol{\mu}_k = \big[\mu_k(1), \mu_k(2), \ldots, \mu_k(M)\big], whose elements are the step sizes used for the respective subband iterations of the current frame, i.e. \hat{H}_k(m) = \hat{H}_{k-1}(m) + \mu_k(m)\,\nabla_k(m). Thus, when the estimated acoustic transfer function of the previous frame is iterated to obtain that of the current frame, each subband of the current frame is iterated with its own step size; for example, the m-th subband of the estimated acoustic transfer function of the k-th frame uses \mu_k(m), the (m-1)-th subband uses \mu_k(m-1), and the (m-2)-th subband uses \mu_k(m-2).
Optionally, the model structure of the recurrent neural network model is a Long Short-Term Memory (LSTM) network or a gated recurrent unit (Gated Recurrent Unit, GRU).
The input and output of the recurrent neural network model during training are generated by real-time simulation of a feedback environment. The training process comprises the following steps:
Impulse responses that vary along different movement trajectories in rooms of different sizes are simulated, which can be done with the existing image method. The collected speech and music data are then passed through the generated impulse responses to simulate feedback paths and are processed with adaptive filtering, where the iteration step size of the adaptive filter is controlled by the recurrent neural network model and the loss function is the mean square error, over the whole simulation, between the magnitude spectra of the simulated feedback signal and of the feedback signal estimated in the adaptive filtering. Taking the loss over the whole simulation prevents the model's feedback prediction from over-fitting to any single frame. The feedback path is the path along which the system's output signal travels from the output back to the input; in the sound reinforcement scene it is the actual acoustic transfer function.
During training of the recurrent neural network model, while the collected speech and music data are passed through the generated impulse responses to simulate the feedback path, the feedback gain corresponding to the feedback path is drawn randomly from a set range. At the start of training the feedback gain must be kept below the critical gain, so that the estimated acoustic transfer function does not diverge too early and prevent the model from learning. As the model becomes more accurate and howling can be avoided at higher gains, the feedback-gain range in the simulation can be raised progressively, so that the model gradually acquires the ability to cope with higher feedback gains. The critical gain, in the context of howling suppression, is the maximum gain the acoustic system can sustain before howling begins, usually expressed in decibels (dB); it is also known as the maximum stable gain (Maximum Stable Gain, MSG), i.e. the upper gain limit at which the system can operate safely in practice.
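A schematic training loop consistent with this description is sketched below; it is an illustrative assumption rather than the patent's pipeline, reuses the hypothetical StepSizeRNN and subband update from the earlier sketches, and omits room-impulse-response generation, batching, and optimizer details:

```python
import torch

def simulate_and_loss(model, src_spec, spk_init, H_true, gain, alpha=0.85):
    """Run one simulated sound-reinforcement pass and return the whole-run loss.

    model    : StepSizeRNN-like module mapping (|Y_k|, |E_k|) to the step size
    src_spec : (frames, M) complex spectra of collected speech/music frames
    spk_init : (M,) complex spectrum of the initial loudspeaker frame
    H_true   : (frames, M) complex spectra of the simulated, time-varying feedback path
    gain     : loop gain of the feedback path (kept below the critical gain early on)
    """
    frames, M = src_spec.shape
    H_est = torch.zeros(M, dtype=torch.complex64)    # estimated transfer function
    P = torch.ones(M)                                # smoothed loudspeaker energy
    X_prev = spk_init                                # previous loudspeaker frame
    hidden, loss = None, 0.0
    for k in range(frames):
        fb_true = gain * H_true[k] * X_prev          # simulated feedback signal
        Y = src_spec[k] + fb_true                    # simulated microphone spectrum
        D_hat = H_est * X_prev                       # estimated feedback signal
        E = Y - D_hat                                # residual -> loudspeaker output
        mu, hidden = model(Y.abs().view(1, 1, -1), E.abs().view(1, 1, -1), hidden)
        P = alpha * P + (1 - alpha) * X_prev.abs() ** 2
        H_est = H_est + mu.view(-1) * E * torch.conj(X_prev) / (P + 1e-12)
        # loss: MSE between magnitude spectra of simulated and estimated feedback,
        # accumulated over the whole simulation rather than a single frame
        loss = loss + torch.mean((fb_true.abs() - D_hat.abs()) ** 2)
        X_prev = E                                   # the loudspeaker plays the residual
    return loss / frames

# Typical use per simulated run (hypothetical):
#   loss = simulate_and_loss(model, src_spec, spk_init, H_true, gain=0.5)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```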
In summary, howling suppression is performed by frequency-domain adaptive filtering in which the iteration of the estimated acoustic transfer function is driven by a step size controlled by a pre-trained recurrent neural network model, replacing the statistical-assumption-based control of the adaptation process used in conventional adaptive filtering. The advantage is that there is no theoretical upper limit on the achievable added stable gain (Added Stable Gain, ASG), and no processing that damages sound quality is used, so the processed sound is highly faithful. A howling suppressor or feedback suppressor realized on the basis of the invention improves the accuracy of acoustic feedback estimation across more actual scenes and significantly reduces distortion and increases the maximum gain in the sound reinforcement scene.
Computer apparatus embodiment:
The computer apparatus of this embodiment comprises a processor and a memory, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the above embodiment of the howling/feedback suppression method in a sound reinforcement scene.
Computer devices may include, but are not limited to, processors and memory. Those skilled in the art will appreciate that a computer apparatus may include more or fewer components, or may combine certain components, or different components, e.g., a computer apparatus may also include input and output devices, network access devices, buses, etc.
For example, the processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microcontroller, or the processor may be any conventional processor, etc. The processor is the control center of the computer apparatus and connects the various parts of the whole computer apparatus using various interfaces and lines.
The memory may be used to store computer programs and/or modules, and the controller implements the various functions of the computer apparatus by running or executing the computer programs and/or modules stored in the memory and invoking the data stored in the memory. For example, the memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function (e.g., a sound-receiving function, a speech-to-text function, etc.), and the data storage area may store data created according to the use of the device (e.g., audio data, text data, etc.). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device.
Computer-readable storage medium embodiments:
The modules integrated in the computer apparatus of the above embodiment may be stored in a computer-readable storage medium if implemented in the form of software functional units and sold or used as a stand-alone product. Based on this understanding, all or part of the flow of the above embodiment of the howling/feedback suppression method in a sound reinforcement scene may also be accomplished by a computer program instructing the relevant hardware; the computer program may be stored in a computer-readable storage medium and, when executed by the controller, may implement the steps of the above method embodiment. The computer program comprises computer program code, which may be in source-code form, object-code form, an executable file, some intermediate form, or the like. The storage medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable medium may be increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions the computer-readable medium does not include electrical carrier signals and telecommunications signals.
Computer program product embodiments:
The computer program product of the present embodiment includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps of the howling/feedback suppression method embodiment in the above-mentioned sound amplifying scenario.
Finally, it should be emphasized that the foregoing description is merely illustrative of the preferred embodiments of the invention, and that various changes and modifications can be made by those skilled in the art without departing from the spirit and principles of the invention, and any such modifications, equivalents, improvements, etc. are intended to be included within the scope of the invention.
Claims (9)
1. A howling/feedback suppression method in a sound reinforcement scene, characterized by comprising the following steps:
acquiring a microphone signal of a current frame;
Performing frequency domain adaptive filtering processing on the microphone signal of the current frame, and outputting a prediction target signal to a loudspeaker:
the frequency domain adaptive filtering process comprises the following steps:
Multiplying the estimated acoustic transfer function of the previous frame with the loudspeaker signal of the previous frame in the frequency domain to obtain a feedback estimated signal;
subtracting the feedback estimation signal from the microphone signal of the current frame in the frequency domain to obtain a residual signal;
the residual signal, after inverse transformation, is used as the prediction target signal and output to a loudspeaker through a feedforward path; at the same time, the estimated acoustic transfer function of the previous frame is iterated toward smaller cross-correlation between the residual signal of the current frame and the loudspeaker signal of the previous frame, so as to obtain the estimated acoustic transfer function of the current frame, and the iteration step size is controlled by a pre-trained recurrent neural network model;
the training process of the recurrent neural network model comprises the following steps:
simulating impulse responses that vary along different movement trajectories in rooms of different sizes;
passing the collected speech and music data through the generated impulse responses to simulate feedback paths, and processing with adaptive filtering, wherein the iteration step size of the adaptive filtering is controlled by the recurrent neural network model, and the loss function is the mean square error, over the whole simulation, between the magnitude spectra of the simulated feedback signal and of the feedback signal estimated in the adaptive filtering.
2. The howling/feedback suppression method in a sound reinforcement scene as claimed in claim 1, wherein:
during training of the recurrent neural network model, while the collected speech and music data are passed through the generated impulse responses to simulate a feedback path, the feedback gain corresponding to the feedback path is drawn randomly from a set range, wherein the feedback gain is set below a critical gain when training starts.
3. The howling/feedback suppression method in a sound reinforcement scene as claimed in claim 1, wherein:
the model structure of the recurrent neural network model is LSTM or GRU.
4. The howling/feedback suppression method in a sound reinforcement scene as claimed in claim 1, wherein:
the input of the recurrent neural network model is a vector formed by concatenating the magnitude spectrum of the microphone signal and the magnitude spectrum of the residual signal of each frame; the recurrent layer outputs a hidden vector of the same length as the input, the hidden vector is passed through a linear layer and activated with a logistic (sigmoid) function, and a single scalar is output as the step size shared by all subbands of the current frame during the iteration.
5. The howling/feedback suppression method in a sound reinforcement scene as claimed in claim 1, wherein:
the input of the recurrent neural network model is a vector formed by concatenating the magnitude spectrum of the microphone signal and the magnitude spectrum of the residual signal of each frame; the recurrent layer outputs a hidden vector of the same length as the input, the hidden vector is passed through a linear layer and activated with a logistic (sigmoid) function, and a separate step size is output for each subband iteration of the current frame.
6. The howling/feedback suppression method in a sound reinforcement scene according to any one of claims 1 to 4, characterized in that:
the estimated acoustic transfer function of the previous frame is iterated toward smaller cross-correlation between the residual signal of the current frame and the loudspeaker signal of the previous frame, expressed as:

\nabla_k(m) = \dfrac{E_k(m)\,X_{k-1}^{*}(m)}{P_k(m)}, \qquad P_k(m) = \alpha\,P_{k-1}(m) + (1-\alpha)\,\lvert X_{k-1}(m)\rvert^{2},

where \nabla_k(m) is the gradient of the m-th subband used when iterating the estimated acoustic transfer function from the (k-1)-th frame to the k-th frame, E_k(m) is the m-th subband of the residual signal of the k-th frame, X_{k-1}(m) is the m-th subband of the loudspeaker signal of the previous frame, X_{k-1}^{*}(m) is its complex conjugate, P_k(m) is the recursively smoothed loudspeaker spectral energy of the m-th subband at the k-th frame, and \alpha is the smoothing coefficient;
the estimated acoustic transfer function of the current frame is obtained as:

\hat{H}_k(m) = \hat{H}_{k-1}(m) + \mu_k\,\nabla_k(m),

where \hat{H}_k(m) is the m-th subband of the estimated acoustic transfer function of the k-th frame, \hat{H}_{k-1}(m) is the m-th subband of the estimated acoustic transfer function of the (k-1)-th frame, and \mu_k is the iteration step size.
7. A computer apparatus comprising a processor and a memory, characterized in that:
the memory has stored thereon a computer program which, when executed by the processor, implements the howling/feedback suppression method in a sound reinforcement scene as claimed in any one of claims 1 to 6.
8. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the howling/feedback suppression method in a sound reinforcement scene as claimed in any one of claims 1 to 6.
9. A computer program product comprising computer instructions, characterized in that:
the computer instructions, when executed by a processor, implement the howling/feedback suppression method in a sound reinforcement scene as claimed in any one of claims 1 to 6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510479617.4A CN120034813B (en) | 2025-04-17 | 2025-04-17 | Feedback inhibition method, device, medium and product under sound expansion field |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN120034813A | 2025-05-23 |
| CN120034813B | 2025-07-25 |
Family
ID=95732669
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510479617.4A Active CN120034813B (en) | 2025-04-17 | 2025-04-17 | Feedback inhibition method, device, medium and product under sound expansion field |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120034813B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP1703767A2 (en) * | 2005-03-18 | 2006-09-20 | Yamaha Corporation | Howling canceler apparatus and sound amplification system |
| US20070104335A1 (en) * | 2005-11-09 | 2007-05-10 | Gpe International Limited | Acoustic feedback suppression for audio amplification systems |
| CN102740214A (en) * | 2011-04-01 | 2012-10-17 | 中国科学院声学研究所 | Howling suppression method based on feedback signal spectrum estimation |
| CN117789742A (en) * | 2023-11-21 | 2024-03-29 | 本相空间(珠海)科技有限公司 | Method and device for speech enhancement using deep learning model in cepstral domain |
Also Published As
| Publication number | Publication date |
|---|---|
| CN120034813B (en) | 2025-07-25 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |