CN1094280C

CN1094280C - Silence Detection Method in Internet Telephony

Info

Publication number: CN1094280C
Application number: CN 98118363
Authority: CN
Inventors: 张景嵩; 谢纲; 薛克忠; 温周斌
Original assignee: Inventec Corp
Current assignee: Inventec Corp
Priority date: 1998-08-17
Filing date: 1998-08-17
Publication date: 2002-11-13
Anticipated expiration: 2018-08-17
Also published as: CN1245376A

Abstract

A silence detection method for Internet telephony comprises three detections. First, when the audio data of the current frame is collected, its amplitude values are summed to obtain the short-time average energy; speech is detected when the short-time average energy exceeds a speech energy threshold. Second, the short-time average energy of the current frame is passed through a low-pass filter to obtain the frame's long-time average energy; silence is detected when the long-time average energy of several consecutive frames stays below a silence energy threshold. Third, the number of sign changes between adjacent samples of the current frame (the zero-crossing rate) is counted; noise is detected when the zero-crossing rate exceeds its threshold.

Description

Silence detection method in Internet telephony

Technical field

The invention relates to a silence detection method for Internet telephony.

In this method, the detection parts of a silence detection module in an Internet phone separate speech from background noise, so that redundant audio data can be filtered out and the best call quality obtained within limited network bandwidth. When no speech is being sent, a silence frame is transmitted; it uses less bandwidth than speech data, and on receiving it the receiver plays back background noise, which keeps the conversation feeling synchronized. In addition, a conversion module is provided for the half-duplex speak/listen mode: it switches automatically to listening mode when speech data arrives from the network and to speaking mode when speech is detected locally, and a key is provided so that the user can speak at any time.

Background

Internet telephony is a real-time communication system, but because network bandwidth is limited, data transmission must be reduced as far as possible without degrading voice quality. In addition, a user with a half-duplex Sound Blaster card cannot speak and listen at the same time and can only converse indirectly by switching between speaking and listening modes. To address these two problems, silence detection techniques from the field of speech recognition have been borrowed to filter out redundant voice data and to switch the half-duplex speak/listen mode automatically.

However, silence detection has been used only as an auxiliary measure and still has the following shortcomings:

1. Detecting silence by short-time average energy alone adapts poorly to the environment;

2. Simply discarding silent data without further processing can make the conversation feel unsynchronized;

3. Automatic half-duplex speak/listen switching ignores the initiative of the speaker.

To overcome these shortcomings of the conventional technique, the present invention provides a silence detection method for Internet telephony.

Summary of the invention

The object of the invention is to provide a silence detection method for Internet telephony in which the detection parts of a silence detection module correctly separate speech from background noise, so that redundant audio data can be filtered out.

To achieve this object, the invention provides a silence detection method for Internet telephony carried out by a silence detection module whose detection parts include speech detection, silence detection and noise detection. The steps are as follows. First, when the audio data of the current frame is collected, its amplitude values are summed to obtain the short-time average energy; once the short-time average energy exceeds the speech energy threshold, speech is detected. Second, the short-time average energy of the current frame is passed through a low-pass filter to obtain the frame's long-time average energy; when the long-time average energy of several consecutive frames is below the silence energy threshold, silence is detected. Third, the audio data of the current frame is collected and the number of sign changes between adjacent samples (the zero-crossing rate) is counted; once the zero-crossing rate exceeds its threshold, noise is detected. These three detections together separate speech from background noise, giving the best call quality within limited network bandwidth.

In the method provided by the invention, when the sender's silence detection system detects that the sender is not transmitting speech data, a silence frame is transmitted, which uses less bandwidth than speech data. When the receiver receives a silence frame, the receiver in turn transmits its local background noise to the sender, so neither party perceives the conversation as out of sync because of network delay.

In the method provided by the invention, a conversion module is provided for the half-duplex speak/listen mode. It switches automatically to listening mode when speech data is received from the network and to speaking mode when speech is detected locally, and a function key lets the user speak at any time.

Description of the drawings

To give a fuller understanding of the object, structure, features and effects of the invention, an embodiment is described in detail below with reference to the accompanying drawings:

Fig. 1 is a schematic diagram of the hardware structure of the invention.

Fig. 2 is a block diagram of the hardware structure for full-duplex silence detection.

Fig. 3 is a block diagram of the hardware structure for half-duplex silence detection.

Fig. 4 is a hardware structure diagram of the full-duplex recording-end silence module.

Fig. 5 is a flowchart of full-duplex recording data processing.

Fig. 6 is a hardware structure diagram of the half-duplex recording-end silence module.

Fig. 7 is a flowchart of half-duplex recording data processing.

Fig. 8 is a flowchart of speech detection.

Fig. 9 is a flowchart of silence detection.

Fig. 10 is a flowchart of noise detection.

Fig. 11 is a hardware structure diagram of the full-duplex playback-end silence module.

Fig. 12 is a flowchart of full-duplex playback data processing.

Fig. 13 is a hardware structure diagram of the half-duplex playback-end silence module.

Fig. 14 is a flowchart of half-duplex playback data processing.

Figs. 15A-15B are the overall flowchart of the recording end.

Fig. 16 is the overall flowchart of the playback end.

Detailed description

Referring to Fig. 1, the invention is a silence detection method for Internet telephony whose hardware comprises a personal computer 11, a Sound Blaster card 12, a microphone 13, a loudspeaker 14, and a modem or network card 15. The microphone 13 converts the recorded sound into an electrical signal and feeds it to the Sound Blaster card 12, and the loudspeaker 14 converts the electrical signal output by the Sound Blaster card 12 back into sound. Referring to Figs. 2 and 3, telephone calls generally run in full-duplex mode (Fig. 2) or half-duplex mode (Fig. 3). In full-duplex mode both parties can speak and listen at the same time. In half-duplex mode the Sound Blaster card of the Internet telephone system works in half duplex: at any moment it can either record or play back, but not both. The silence detection technique of the invention applies to both modes.

Figs. 2 and 3 are block diagrams of how the Internet phone sends and receives speech in full-duplex and half-duplex mode. When sending in either mode, the speech recorded by the microphone 13 passes in turn through the mixer 21, the analog-to-digital converter 22, the recording application programming interface (recording API) 23, the recording-end silence module 24, and the modem or network card 15, and is then transmitted over the network to the receiver. When receiving in either mode, the speech data arrives over the network and passes in turn through the modem or network card 15, the playback-end silence module 25, the receiving application programming interface (receiving API) 26, the digital-to-analog converter 27, and the mixer 21, and is then played by the loudspeaker 14. The silence detection technique of the invention is applied in the recording-end silence module 24 and the playback-end silence module 25. In half-duplex mode it additionally includes a conversion module 28, which provides automatic and forced switching of the speak/listen mode.

Figs. 4 and 5 show the hardware structure and flowchart of the full-duplex recording-end silence module 24. The audio data sampled by the recording API 23 is first examined. If it is speech, the encoder 31 is started, the speech data is encoded, and the data selector switch 32 is set to pass the encoded data to the modem or network card 15. If it is not speech, the encoder 31 is shut off and the data selector switch 32 is set to pass the silence frame 33 to the modem or network card 15.
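The full-duplex recording-end decision described above can be sketched in Python as follows. This is an illustrative sketch only, not the patented implementation: `select_outgoing_frame`, `toy_encode` and the one-byte `SILENCE_FRAME` marker are hypothetical names, the codec is a placeholder, and the speech/non-speech decision is taken as an input rather than computed here.

```python
from typing import Callable, Sequence

# Hypothetical one-byte marker standing in for silence frame 33.
SILENCE_FRAME = b"\x00"


def toy_encode(samples: Sequence[int]) -> bytes:
    """Placeholder codec: pack each sample as 16-bit little-endian PCM."""
    return b"".join(s.to_bytes(2, "little", signed=True) for s in samples)


def select_outgoing_frame(
    samples: Sequence[int],
    is_speech: bool,
    encode: Callable[[Sequence[int]], bytes] = toy_encode,
) -> bytes:
    """Return the payload handed to the modem or network card for one frame."""
    if is_speech:
        # Encoder on: the selector switch routes encoded speech to the network.
        return encode(samples)
    # Encoder off: the selector switch routes the short silence frame instead.
    return SILENCE_FRAME
```

In this sketch the bandwidth saving comes from the silence payload being much shorter than an encoded speech frame.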

Figs. 6 and 7 show the hardware structure and flowchart of the half-duplex recording-end silence module 24. The audio data sampled by the recording API 23 is first examined. If it is speech, the encoder 31 is started and the speech data is encoded and passed to the modem or network card 15. If it is not speech, the encoder 31 is shut off and the detection result is fed to the conversion module 28 to trigger a speak/listen mode switch.

Both the full-duplex and the half-duplex recording-end silence module 24 contain a silence detection module 34, which the invention uses to find the start and end of speech against background noise. The silence detection module comprises speech detection, silence detection and noise detection. Referring to Fig. 8, speech detection finds the start of speech (the beginning of a sentence or passage) using short-time average energy. The N audio samples of the current frame are collected, their amplitude values are summed, and the sum is passed through a short-time filter to give the frame's short-time average energy Se′. Once Se′ exceeds the speech energy threshold Et, speech is detected, and the audio data from this frame onward is treated as speech data until silence is detected.

Speech detection is carried out according to the following formulas:

First, the speech energy Se of the current frame is computed as the sum of the sample amplitudes, $Se = \sum_{i=0}^{N-1} |Xin(i)|$, where N is the number of audio samples per frame.

Second, the zero-crossing rate Sz of the current frame is computed: $Sz = \sum_{i=0}^{N-1} \left| \operatorname{sgn}[Xin(i)] - \operatorname{sgn}[Xin(i-1)] \right| / 2N$

where $\operatorname{sgn}[Xin(i)] = 1$ for $Xin(i) \ge 0$,

and $\operatorname{sgn}[Xin(i)] = -1$ for $Xin(i) < 0$.

Then the speech energy is passed through the short-time filter to obtain the short-time average energy Se′:

$Se' = 0.5\,Se' + 0.5\,Se$

Thus, when (Se′ > Et) and (Szmin < Sz < Szmax), a speech signal is present: the current state is set to the speaking state and the program proceeds to silence detection. Here Et is the speech energy threshold, and Szmin and Szmax are the lower and upper limits of the zero-crossing rate.
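The speech-detection rule above can be sketched in Python as follows. It is a minimal sketch under stated assumptions, not the patented implementation: the frame energy is taken as the sum of absolute sample amplitudes, as the description states; the short-time filter is read as an update of the running average Se′ from the new frame energy Se; the function names are hypothetical; and the default thresholds are the values the description quotes for its N = 320 embodiment, which may need recalibrating elsewhere. The zero-crossing rate Sz is passed in as a precomputed value.

```python
def frame_energy(samples):
    """Se: sum of absolute sample amplitudes over one frame."""
    return sum(abs(x) for x in samples)


def smooth_short(prev_se_avg, se):
    """Short-time filter: Se' = 0.5 * Se' + 0.5 * Se."""
    return 0.5 * prev_se_avg + 0.5 * se


def is_speech(se_avg, sz, et=250_000, sz_min=6, sz_max=36):
    """Speech is present when Se' > Et and Szmin < Sz < Szmax."""
    return se_avg > et and sz_min < sz < sz_max


# Usage: three loud frames in a row, starting from a running average of zero.
se_avg = 0.0
for _ in range(3):
    frame = [1000] * 320               # made-up frame, not measured data
    se_avg = smooth_short(se_avg, frame_energy(frame))
print(is_speech(se_avg, sz=10))        # the running average is now 280000.0
```

Because of the smoothing, a single loud frame does not cross the threshold by itself here; the running average has to build up over a few frames.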

Referring to Fig. 9, silence detection finds the silence in gaps in the conversation (the intervals between sentences or passages). Because silence has low energy and lasts a long time, long-time average energy is used. The short-time average energy Se′ of the current frame is passed through a low-pass filter to give the frame's long-time average energy Ss′. Silence is detected only when Ss′ stays below the silence energy threshold Est for several consecutive frames; from that frame onward the audio data is treated as background noise until speech is detected. The number of consecutive frames required is derived from the pause length of normal conversation.

Silence detection is carried out according to the following formulas:

The speech energy is passed through the long-time filter to obtain the long-time average energy Ss′:

$Ss' = 0.9\,Ss' + 0.1\,Ss$

Then, when (Ss′ < Est), COUNT++ (the counter is incremented by 1);

when (Ss′ > Est), COUNT = 0.

Here Est is the silence energy threshold. When COUNT = M, silence is detected and the current state is set to the listening state, where M is the number of consecutive frames of silence required.
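The long-time filter and frame counter above can be sketched as a small Python class. This is a sketch under assumptions rather than the patented implementation: the class name, the default M = 25 and the `initial` starting value of the running average are hypothetical (the description gives no value for M), the comparison is made on the filtered value, and the check uses `>= M` so that silence stays flagged after the count passes M.

```python
class SilenceDetector:
    """Long-time energy tracker with a consecutive-frame counter."""

    def __init__(self, est=100_000, m=25, initial=0.0):
        self.est = est          # silence energy threshold Est
        self.m = m              # M: consecutive quiet frames required (hypothetical default)
        self.ss_avg = initial   # running long-time average Ss'
        self.count = 0          # the frame counter from the text

    def update(self, energy):
        """Feed one frame's energy; return True once silence is detected."""
        # Long-time filter: Ss' = 0.9 * Ss' + 0.1 * (new energy value).
        self.ss_avg = 0.9 * self.ss_avg + 0.1 * energy
        if self.ss_avg < self.est:
            self.count += 1
        else:
            # The text resets on Ss' > Est; equality is treated as a reset here.
            self.count = 0
        return self.count >= self.m
```

The 0.9 weight makes the average decay slowly, so a short pause inside a sentence does not immediately register as silence; only a sustained run of low-energy frames does.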

The speech energy threshold and silence energy threshold above are obtained as follows. First, with the user silent, the average microphone input energy Ens is measured; then, while the user reads a sentence aloud, the average microphone input energy Ent is measured. Then:

Speech energy threshold: Et = Ens + 0.5(Ent - Ens)

Silence energy threshold: Est = Ens + 0.2(Ent - Ens)
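The two threshold formulas can be written as one small Python function. The function name and the sample measurements below are hypothetical and purely illustrative; only the 0.5 and 0.2 weights come from the text.

```python
def calibrate_thresholds(ens, ent):
    """Return (Et, Est) from the quiet-level energy Ens and speaking-level energy Ent."""
    span = ent - ens
    et = ens + 0.5 * span    # speech energy threshold Et
    est = ens + 0.2 * span   # silence energy threshold Est
    return et, est


# Usage with made-up measurements (not values from the patent):
et, est = calibrate_thresholds(ens=50_000, ent=450_000)
print(et, est)
```

Both thresholds sit between the quiet and speaking levels, with Est closer to the quiet level, so a frame must be fairly quiet to count toward silence but only moderately loud to count as speech.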

In one embodiment, with N = 320 audio samples per frame, theoretical analysis and experiment give a speech energy threshold Et = 250000, a silence energy threshold Est = 100000, and zero-crossing rate limits Szmin = 6 and Szmax = 36.

Referring to Fig. 10, noise detection uses the zero-crossing rate. The audio data of the current frame is collected and the number of sign changes between adjacent samples (the zero-crossing rate) is counted. When the zero-crossing rate is above its threshold, noise is detected, and the frame is likewise treated as background noise. As an example of counting sign changes, take the data 20, 50, 100, 40, 10, -30, -50, -10, 10, 60, 90, 50: the sign changes only twice across the sequence, so the zero-crossing rate is 2. The zero-crossing rate threshold is obtained from statistical properties.
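The sign-change count in this example can be reproduced with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical; the count is kept as a per-frame count (not divided by N), which is how the worked example and the embodiment limits of 6 and 36 treat it; only pairs of samples inside the frame are compared, so the last sample of the previous frame is ignored; and the upper limit Szmax = 36 is used as the noise threshold by default.

```python
def sign(x):
    """sgn as defined in the text: 1 for x >= 0, otherwise -1."""
    return 1 if x >= 0 else -1


def zero_crossings(samples):
    """Count sign changes between adjacent samples in one frame."""
    # Each sign change contributes |sgn(b) - sgn(a)| = 2, hence the division by 2.
    total = sum(abs(sign(b) - sign(a)) for a, b in zip(samples, samples[1:]))
    return total // 2


def is_noise(samples, sz_threshold=36):
    """Flag the frame as noise when its zero-crossing count exceeds the threshold."""
    return zero_crossings(samples) > sz_threshold


example = [20, 50, 100, 40, 10, -30, -50, -10, 10, 60, 90, 50]
print(zero_crossings(example))  # 2, matching the worked example in the text
```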

Separating speech from background noise with these three detections gives the best call quality within limited network bandwidth.

Figs. 15A-15B show the overall flowchart of the recording end, which combines the full-duplex silence module, the half-duplex silence module and the silence detection module. Speech detection first determines where speech starts, silence detection then determines where it ends, and noise detection finally filters out noise mixed into the speech, leaving "pure" speech data. In full-duplex mode, the speech data is encoded and sent over the network, and silence frames are sent to keep the conversation synchronized. In half-duplex mode, the speech data is encoded and sent over the network, and the presence or absence of speech data drives the switch between speaking and listening modes.

In the full-duplex recording-end silence module 24 described above, when it is detected that the sender is not transmitting speech data, a silence frame 33 is transmitted, which uses less bandwidth than speech data. When the receiver receives a silence frame, it knows the sender is not transmitting speech data and can transmit its local background noise to the sender, so neither party perceives the conversation as out of sync because of network delay. Figs. 11 and 12 show the hardware structure and flowchart of the full-duplex playback-end silence module 25. The network monitoring module 35 monitors the data received by the modem or network card 15. When it detects encoded data, that is, speech data, the decoder 36 is started and the data selector switch 32 is set to pass the decoded speech data to the receiving API 26. When no encoded data is detected, that is, the sender is not transmitting speech data, the decoder 36 is shut off and the data selector switch 32 is set to pass the background noise 37 to the receiving API 26.
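The full-duplex playback-end selection described above can be sketched in Python as follows. It is an illustrative sketch, not the patented implementation: `select_playback_frame`, `toy_decode` and the one-byte `SILENCE_FRAME` marker are hypothetical, the codec is a placeholder for 16-bit PCM, and an empty payload is treated the same way as a silence frame.

```python
from typing import Callable, List, Sequence

# Hypothetical one-byte marker standing in for silence frame 33.
SILENCE_FRAME = b"\x00"


def toy_decode(payload: bytes) -> List[int]:
    """Placeholder codec: unpack 16-bit little-endian PCM samples."""
    return [
        int.from_bytes(payload[i:i + 2], "little", signed=True)
        for i in range(0, len(payload), 2)
    ]


def select_playback_frame(
    payload: bytes,
    background_noise: Sequence[int],
    decode: Callable[[bytes], List[int]] = toy_decode,
) -> List[int]:
    """Return the samples handed to the receiving API for one frame."""
    if not payload or payload == SILENCE_FRAME:
        # No encoded speech: decoder off, the switch routes background noise 37.
        return list(background_noise)
    # Encoded speech: decoder on, the switch routes the decoded samples.
    return decode(payload)
```

Playing background noise rather than dead air during silence frames keeps the line from sounding disconnected while nobody is speaking.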

Figs. 13 and 14 show the hardware structure and flowchart of the half-duplex playback-end silence module 25. The network monitoring module 35 monitors the data received by the modem or network card 15. When it detects encoded data, that is, speech data, the decoder 36 is started and the decoded speech data is passed to the receiving API 26. When no encoded data is detected, that is, no speech data has been sent, the decoder 36 is shut off and the detection result is fed to the conversion module 28 to trigger a listen/speak mode switch.

Fig. 16 shows the overall flowchart of the playback end, which combines the full-duplex and half-duplex playback-end silence modules. The silence module continuously monitors whether encoded speech data has been received. In full-duplex mode, the encoded speech data is decoded and played back through the Sound Blaster card, and background noise is played back to keep the conversation synchronized. In half-duplex mode, the encoded speech data is decoded and played back through the Sound Blaster card, and the presence or absence of encoded speech data drives the switch between listening and speaking modes.

The conversion module 28 works in half-duplex mode and switches the speak/listen mode either automatically or by force. Once the conversion module 28 is active, the rules are as follows. At the recording-end silence module 24, speaking mode is kept as long as the silence detection module 34 detects speech data; otherwise the mode switches automatically to listening. At the playback-end silence module 25, listening mode is kept as long as the network monitoring module 35 detects speech data; otherwise the mode switches automatically to speaking. In addition, in listening mode, if the user presses a function key 29, the conversion module 28 immediately forces a switch to speaking mode.
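The switching rules of the conversion module can be sketched as a small state function in Python. This is a sketch under assumptions, not the patented implementation: the `Mode` enum and `next_mode` are hypothetical names, and the rules are evaluated once per frame.

```python
from enum import Enum


class Mode(Enum):
    SPEAK = "speak"
    LISTEN = "listen"


def next_mode(mode, local_speech, remote_speech, key_pressed=False):
    """Return the half-duplex mode for the next frame."""
    if mode is Mode.LISTEN:
        if key_pressed:
            # Function key 29 forces speaking mode while listening.
            return Mode.SPEAK
        # Stay in listening mode only while speech arrives from the network.
        return Mode.LISTEN if remote_speech else Mode.SPEAK
    # Speaking mode: stay only while local speech is detected.
    return Mode.SPEAK if local_speech else Mode.LISTEN


# Usage: the remote party is talking, then the local user presses the key.
mode = Mode.LISTEN
mode = next_mode(mode, local_speech=False, remote_speech=True)
mode = next_mode(mode, local_speech=False, remote_speech=True, key_pressed=True)
print(mode)  # Mode.SPEAK
```

The forced switch is what gives the user the initiative: without it, a listener could not break in while speech keeps arriving from the network.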

In summary, combining several detections correctly separates speech from background noise and so reduces data transmission; silent data receives special handling to keep the conversation synchronized; and the half-duplex speak/listen mode switches freely so that the user can speak at any time. The invention is therefore an ideal silence detection technique.

The above are only some feasible embodiments of the invention and are not intended to limit its scope of protection. Any equivalent implementation that makes other changes in accordance with the content, features and spirit of the claims falls within the scope of protection of the invention.

Claims

1. A silence detection method for Internet telephony, used in the full-duplex and half-duplex call modes of an Internet phone to correctly separate speech from background noise by means of several detections so as to filter out redundant audio data, the detection method comprising the following steps:

(1) detecting the start of speech: when the audio data of the current frame is collected, summing its amplitude values to obtain the short-time average energy, speech being detected once the short-time average energy exceeds the speech energy threshold;

(2) detecting silence in gaps in the conversation: passing the short-time average energy computed for the current frame through a low-pass filter to obtain the frame's long-time average energy, silence being detected when the long-time average energy of several consecutive frames is below the silence energy threshold;

(3) detecting noise in gaps in the conversation: collecting the audio data of the current frame and counting the number of sign changes between adjacent samples, that is, the zero-crossing rate, noise being detected once the zero-crossing rate exceeds its threshold.

2. The silence detection method for Internet telephony as claimed in claim 1, wherein, when the Internet phone sends speech, the speech recorded by the microphone passes in turn through a mixer, an analog-to-digital converter, a recording application programming interface (recording API), a recording-end silence module, and a modem or network card, and is then transmitted over the network to the receiver;

and wherein the recording-end silence module first examines the audio data sampled by the recording API and, according to whether it is speech data, either starts the encoder to encode the speech data, or shuts off the encoder and sets the data selector switch to pass silence frame data to the modem.

3. The silence detection method for Internet telephony as claimed in claim 1, wherein, when the Internet phone receives speech, the speech data arrives over the network and passes in turn through a modem or network card, a playback-end silence module, a receiving application programming interface (receiving API), a digital-to-analog converter, and a mixer, and is then played by the loudspeaker;

and wherein, in the playback-end silence module, a network monitoring module first monitors whether the data at the modem is encoded data and accordingly starts or shuts off the decoder; if the decoder is started, the data selector switch is set to pass the decoded speech data to the application programming interface, and if the decoder is shut off, the detection result is input to the conversion module to trigger a listen/speak mode switch.

4. The silence detection method for Internet telephony as claimed in claim 1, wherein the speech detection uses short-time average energy: the N audio samples of the current frame are collected, their amplitude values are summed, and the sum is passed through a short-time filter to obtain the frame's short-time average energy Se′; once the short-time average energy Se′ exceeds the speech energy threshold Et, speech is detected, and the audio data from this frame onward is treated as speech data until silence is detected.

5. The silence detection method for Internet telephony as claimed in claim 4, wherein the speech energy threshold is obtained as follows: first, with the user silent, the average microphone input energy Ens is measured; then, while the user reads a sentence aloud, the average microphone input energy Ent is measured; the speech energy threshold is then

Et = Ens + 0.5(Ent - Ens).

6. The silence detection method for Internet telephony as claimed in claim 1, wherein the silence detection uses long-time average energy: the N audio samples of the current frame are collected and their amplitude values are summed to obtain the frame's short-time average energy Se′; the short-time average energy Se′ is then passed through a low-pass filter to obtain the frame's long-time average energy Ss′; silence is detected only when the long-time average energy Ss′ of several consecutive frames is below the silence energy threshold Est, and the audio data from this frame onward is treated as background noise until speech is detected.

7. The silence detection method for Internet telephony as claimed in claim 6, wherein the silence energy threshold is obtained as follows: first, with the user silent, the average microphone input energy Ens is measured; then, while the user reads a sentence aloud, the average microphone input energy Ent is measured; the silence energy threshold is then

Est = Ens + 0.2(Ent - Ens).

8. The silence detection method for Internet telephony as claimed in claim 2, wherein, in the full-duplex recording-end silence module, when it is detected that the sender is not transmitting speech data, silence frame data is transmitted, which uses less bandwidth than speech data; when the receiver receives the silence frame data, which indicates that the sender is not transmitting speech data, the receiver can transmit its local background noise to the sender, so that neither party perceives the conversation as out of sync because of network delay.

9. The silence detection method for Internet telephony as claimed in claim 1, wherein the Internet phone, in half-duplex mode, has a conversion module that lets the half-duplex speak/listen mode switch freely, so that the user can speak at any time.

10. The silence detection method for Internet telephony as claimed in claim 9, wherein, once the conversion module is active: at the recording-end silence module, speaking mode is kept as soon as the silence detection module detects speech data, and otherwise the mode switches automatically to listening mode; and at the playback-end silence module, listening mode is kept when the network monitoring module detects speech data, and otherwise the mode switches automatically to speaking mode.

11. The silence detection method for Internet telephony as claimed in claim 10, wherein the conversion module is further provided with a function key, and, in listening mode, if the user presses the function key, the conversion module forces a switch to speaking mode.