CN114627886A - Conference voice processing method and device - Google Patents

Conference voice processing method and device

Publication number: CN114627886A (granted as CN114627886B)
Application number: CN202210234843.2A
Authority: CN (China); original language: Chinese (zh)
Prior art keywords: conference, audio, characteristic value, conference terminal, mixing
Inventors: 李文, 郑相全, 夏启斌, 都赟赟, 张鸿
Assignee: Institute of Network Engineering, Institute of Systems Engineering, Academy of Military Sciences
Legal status: Active (granted)

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/003 - Changing voice quality, e.g. pitch or formants
    • G10L 21/007 - Changing voice quality, e.g. pitch or formants, characterised by the process used

Abstract

The invention discloses a conference voice processing method and device. The method comprises the following steps: receiving audio data packets from a plurality of conference terminals and performing format adaptation to obtain an adapted audio data set, and scanning the conference terminals to obtain a terminal type information set; analyzing the adapted audio data set by using a preset voice analysis rule to obtain an audio analysis result set; processing the terminal type information by using a preset terminal type rule to obtain a sound mixing rule set; and processing the audio analysis result set by using the sound mixing rules, mixing the audio analysis results that conform to the sound mixing rules. The method and device are therefore well suited to complex conference environments and improve the sound mixing effect of multi-party audio conferences.

Description

Conference voice processing method and device
Technical Field
The present invention relates to the field of speech signal processing technologies, and in particular, to a conference speech processing method and apparatus.
Background
With the development of network technology, audio conferences are used more and more widely. Each receiving end needs to hear the sound emitted by the other terminals but must not hear its own sound, so a sound mixing function is required. In the prior art, noise reduction is performed on the sound emitted by each terminal, and mixing is performed after background noise is removed, so that the voice of a speaking participant is not drowned out by the background noise of other participants. At present, in most conferences, the mixing platform can hardly ensure that all terminals participate in the mixing effectively: for example, the speech of some conference terminals is treated as noise because its volume is low, while the sound of other conference terminals becomes intermittent for the same reason, so the voice quality after mixing cannot meet the requirements of the participants.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a conference voice processing method and device, which can adapt to a complex conference environment and improve the audio mixing effect of a multi-party audio conference.
In order to solve the above technical problem, a first aspect of an embodiment of the present invention discloses a conference voice processing method, where the method includes:
101. receiving audio data packets of a plurality of conference terminals and carrying out format adaptation to obtain an adaptive audio data set, wherein the adaptive audio data set comprises a plurality of adaptive audio data; scanning each conference terminal to obtain a terminal type information set, wherein the terminal type information set comprises a plurality of terminal type information;
102. analyzing the adaptive audio data set by using a preset voice analysis rule to obtain an audio analysis result set; the audio analysis result set comprises a plurality of audio analysis results;
103. processing the terminal type information by using a preset terminal type rule to obtain a sound mixing rule set; the terminal type rule comprises an amplitude characteristic value threshold value and a frequency characteristic value threshold value;
104. and processing the audio analysis result set by using the audio mixing rule, and mixing the audio analysis results conforming to the audio mixing rule.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the parsing the adapted audio data set by using a preset voice parsing rule includes: for the adaptive audio data of any conference terminal, processing the adaptive audio data of the conference terminal by using a preset first analysis rule to obtain first target audio information corresponding to the conference terminal; processing the adaptive audio data by using a preset first analysis rule, including analyzing the adaptive audio data of the conference terminal, extracting audio information for coding, decoding and converting, and establishing a digital filter for digitally filtering the audio information to obtain the first target audio information; processing the first target audio information by using a preset second analysis rule to obtain second target audio information corresponding to the conference terminal; and processing the first target audio information by using a preset second analysis rule, wherein the processing comprises slicing the first target audio information, writing the sliced first target audio information into an external storage device of a voice channel corresponding to the conference terminal, and regularly reading data of the voice channel and adapting the data to be a TDM data stream to obtain second target audio information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the processing, by using the mixing rule, the set of audio parsing results includes: analyzing the adaptive audio data of any conference terminal to obtain a first amplitude characteristic value and a first frequency characteristic value; and if the first amplitude characteristic value is greater than the amplitude activation threshold corresponding to the conference terminal and the first frequency characteristic value is less than the frequency activation threshold corresponding to the conference terminal, judging that the conference terminal participates in sound mixing, otherwise, judging that the conference terminal does not participate in sound mixing.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, processing the audio analysis result set by using the mixing rule includes: analyzing second target audio information of any conference terminal to obtain a second amplitude characteristic value and a second frequency characteristic value; and if the second amplitude characteristic value is greater than the amplitude activation threshold corresponding to the conference terminal and the second frequency characteristic value is less than the frequency activation threshold corresponding to the conference terminal, judging that the conference terminal participates in sound mixing, otherwise, judging that the conference terminal does not participate in sound mixing.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, processing the audio analysis result set by using the mixing rule includes: s1401, analyzing the adaptive audio data of any conference terminal to obtain a first amplitude characteristic value and a first frequency characteristic value; s1402, if the first amplitude characteristic value is larger than the amplitude activation threshold corresponding to the conference terminal and the first frequency characteristic value is smaller than the frequency activation threshold corresponding to the conference terminal, preliminarily judging that the conference terminal participates in sound mixing, and executing S1403; otherwise, the preliminary judgment is that the audio mixing is not involved, and S1405 is executed; s1403, analyzing the second target audio information of the conference terminal to obtain a second amplitude characteristic value and a second frequency characteristic value; s1404, if a second amplitude characteristic value is smaller than the amplitude activation threshold corresponding to the conference terminal, or a second frequency characteristic value is larger than the frequency activation threshold corresponding to the conference terminal, judging whether effective voices exist in the first R pieces of the adaptive audio data, wherein R is a positive integer; if yes, the voice mixing is finally judged to be involved; if not, the final judgment is that the sound mixing is not participated in; s1405, performing the operations from S1401 to S1404 on all the conference terminals to obtain a result of determining whether each conference terminal participates in mixing.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the adaptive audio data of each conference terminal is mirrored into two paths that are processed simultaneously: the first path executes step 102, analyzing the adaptive audio data set by using the preset voice analysis rule to obtain the audio analysis result set; the second path executes step 104, analyzing the adaptive audio data of each conference terminal to extract a first amplitude characteristic value and a first frequency characteristic value, and analyzing the second target audio information of each conference terminal to extract a second amplitude characteristic value and a second frequency characteristic value.
A second aspect of an embodiment of the present invention discloses a conference voice processing apparatus, including:
the scanning receiving module is used for receiving audio data packets of a plurality of conference terminals and carrying out format adaptation to obtain an adaptive audio data set, wherein the adaptive audio data set comprises a plurality of adaptive audio data; scanning each conference terminal to obtain a terminal type information set, wherein the terminal type information set comprises a plurality of terminal type information;
the first processing module is used for analyzing the adaptive audio data set by using a preset voice analysis rule to obtain an audio analysis result set; the audio analysis result set comprises a plurality of audio analysis results;
the second processing module is used for processing the terminal type information by using a preset terminal type rule to obtain a sound mixing rule set; the terminal type rule comprises an amplitude characteristic value threshold value and a frequency characteristic value threshold value;
and the third processing module is used for processing the audio analysis result set by using the audio mixing rule and mixing the audio analysis results conforming to the audio mixing rule.
The third aspect of the present invention discloses another conference voice processing apparatus, including:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program codes stored in the memory to execute part or all of the steps in the conference voice processing method.
A fourth aspect of the present invention discloses a computer storage medium, where the computer storage medium stores computer instructions, and when the computer instructions are called, the computer instructions are used to execute part or all of the steps in the conference voice processing method disclosed in the first aspect of the embodiments of the present invention.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, the conference terminal is scanned to obtain the audio terminal type information, the terminal type rule is utilized to process the audio terminal type information to obtain the audio mixing rule set, the audio mixing rule is utilized to process the audio data analysis result, and the audio data conforming to the audio mixing rule is subjected to audio mixing, so that the problems of wrong mixing (missing mixing, multiple mixing and interruption) during audio mixing are solved, the adaptation to a complex conference environment is facilitated, and the audio mixing effect of a multi-party audio conference is further improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a conference voice processing method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a conference voice processing apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of another conference voice processing apparatus disclosed in the embodiment of the present invention;
fig. 4 is a schematic structural diagram of another conference voice processing apparatus according to the embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or apparatus that comprises a list of steps or elements is not limited to those listed but may alternatively include other steps or elements not listed or inherent to such process, method, product, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
RTP: Real-time Transport Protocol. An audio RTP packet is a data packet that carries an audio signal and is transmitted over the RTP protocol.
CPS: Common Part Sublayer.
The invention discloses a conference voice processing method and device, which can obtain audio terminal type information by scanning a conference terminal, process the audio terminal type information by using a terminal type rule to obtain a sound mixing rule set, process an audio data analysis result by using a sound mixing rule, mix audio of audio data according with the sound mixing rule, solve the problem of mismixing (missing mixing, multiple mixing and intermittence) during sound mixing, are beneficial to adapting to a complex conference environment and further improve the sound mixing effect of a multi-party audio conference. The following are detailed below.
Example one
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a conference voice processing method according to an embodiment of the present invention. The conference voice processing method described in fig. 1 is applied to a voice data processing system, such as a platform end or a terminal side for conference voice processing, and the embodiment of the present invention is not limited thereto. As shown in fig. 1, the conference voice processing method may include the following operations:
101. receiving audio data packets of a plurality of conference terminals and carrying out format adaptation to obtain an adaptive audio data set, wherein the adaptive audio data set comprises a plurality of adaptive audio data; scanning each conference terminal to obtain a terminal type information set, wherein the terminal type information set comprises a plurality of terminal type information;
102. analyzing the adaptive audio data set by using a preset voice analysis rule to obtain an audio analysis result set; the audio analysis result set comprises a plurality of audio analysis results;
103. processing the terminal type information by using a preset terminal type rule to obtain a sound mixing rule set; the terminal type rule comprises an amplitude characteristic value threshold value and a frequency characteristic value threshold value;
104. and processing the audio analysis result set by using the audio mixing rule, and mixing the audio analysis results meeting the audio mixing rule.
As an optional implementation manner, the specific manner of scanning each of the conference terminals to obtain the terminal type information is as follows:
carrying out signaling interaction identification by utilizing the switch to which each conference terminal is attached to determine a terminal type information set; the terminal type information set comprises a plurality of terminal type information.
Optionally, the terminal type information includes channel type information.
Optionally, the channel type information includes a wired channel, and/or a short-distance wireless channel, and/or a long-distance wireless channel, and/or other channel types, which is not limited in the embodiments of the present invention.
Optionally, the format of the audio data packet includes an RTP packet format, and/or an Ahelp coding HDLC frame format, and/or a g.729 format, and/or a CVSD format, and/or other audio data formats, which is not limited in the embodiment of the present invention.
In this optional embodiment, as another optional implementation, the specific manner of performing format adaptation on the received audio data packets of the multiple conference terminals is as follows:
and when the audio data packet format sent by the conference terminal is the RTP packet format, converting the audio data packet into the PCM coding format.
When the audio data format sent by the conference terminal is Ahelp, G.729 or CVSD coding format, the audio data packet is adapted and converted into the RTP packet format of PCM coding.
Therefore, the conference voice processing method described in the embodiment of the present invention can perform format adaptation processing on different types of audio data packets, and convert the audio data packets into a PCM encoded RTP packet format, so as to unify the formats of the audio data, realize communication between terminals of different voice types, facilitate adaptation to a complex conference environment, and further improve the audio mixing effect of a multi-party audio conference.
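For illustration, the sketch below shows one way such a format-adaptation step could be organized; the format labels and the to_pcm helper are assumptions, since the embodiment does not prescribe a particular decoder interface.

```python
def to_pcm(codec: str, payload: bytes) -> bytes:
    """Placeholder for the codec-specific conversion to PCM coding (assumed interface)."""
    raise NotImplementedError(f"plug in a real {codec} -> PCM converter here")

def adapt_packet(packet_format: str, payload: bytes) -> bytes:
    """Normalize one incoming audio data packet to a PCM-encoded RTP payload."""
    if packet_format == "RTP":
        # Already RTP-framed: convert the carried audio to PCM coding.
        return to_pcm("rtp-payload", payload)
    if packet_format in ("AHELP_HDLC", "G729", "CVSD"):
        # Decode and repack into the PCM-encoded RTP packet format.
        return to_pcm(packet_format.lower(), payload)
    raise ValueError(f"unsupported audio packet format: {packet_format}")
```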
In an optional embodiment, the specific manner of parsing the adapted audio data set by using the preset voice parsing rule is as follows:
for the adaptive audio data of any conference terminal, processing the adaptive audio data by using a preset first analysis rule to obtain first target audio information corresponding to the conference terminal;
and processing the first target audio information by using a preset second analysis rule to obtain second target audio information corresponding to the conference terminal.
In this optional embodiment, as an optional implementation manner, the specific manner of processing the adapted audio data by using the preset first parsing rule to obtain the first target audio information corresponding to the conference terminal is as follows:
and analyzing the packet header of the adaptive audio data corresponding to the conference terminal, extracting audio information for coding, decoding and converting, and establishing a digital filter for digitally filtering the audio information.
In this optional embodiment, as an optional implementation, the specific way of extracting the audio information for encoding and decoding conversion is as follows:
extracting audio information in the RTP voice packet, shaping the audio information into 8-bit signed number, and converting the audio information from PCM coding into linear code coding data.
The reason for converting the PCM code into the linear code coded data is that the linear code data really represent the amplitude of a signal, and the amplitude characteristic value can be obtained by directly operating the linear code data.
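The description does not name the companding law used for the PCM coding; assuming standard ITU-T G.711 A-law, the expansion to linear code could look like the following sketch.

```python
def alaw_to_linear(sample: int) -> int:
    """Expand one 8-bit G.711 A-law sample to a 16-bit linear value.
    (A-law is an assumption; the description only says 'PCM coding'.)"""
    a = sample ^ 0x55                 # undo the even-bit inversion required by A-law
    t = (a & 0x0F) << 4               # 4-bit mantissa
    seg = (a & 0x70) >> 4             # 3-bit segment (exponent)
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if (a & 0x80) else -t    # highest bit carries the sign

# The linear values directly represent signal amplitude, so the amplitude
# characteristic value can be computed on them without further decoding.
linear_frame = [alaw_to_linear(b) for b in b"\x55\xd5\x2a"]  # illustrative bytes
```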
In this optional embodiment, as an optional implementation, the above establishing a digital filter to digitally filter the audio information specifically includes:
and filtering the linear code coded data converted from the PCM code by using a band-pass filter with a passband of 50 Hz-3000 kHz, and converting the linear code coded data into a PCM packet after filtering is finished, namely the first target audio information.
Therefore, the conference voice processing method described in the embodiment of the present invention analyzes the packet header of the RTP voice packet encoded by the PCM corresponding to each conference terminal, extracts the audio information, performs encoding and decoding conversion, and then establishes the digital filter to digitally filter the audio information after the encoding and decoding conversion, so as to filter the data in the non-human voice spectrum range in the voice, which is beneficial to adapting to a complex conference environment, and further improve the sound mixing effect of a multi-party audio conference.
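A minimal sketch of such a digital filter, assuming an 8 kHz narrowband sampling rate and a Butterworth design (the description fixes only the 50 Hz to 3000 Hz passband):

```python
import numpy as np
from scipy.signal import butter, lfilter

def filter_voice_band(linear_samples, fs=8000):
    """Band-pass the linear-coded samples to the 50 Hz - 3000 Hz voice band.
    The 4th-order Butterworth design and the 8 kHz rate are assumptions."""
    b, a = butter(4, [50, 3000], btype="bandpass", fs=fs)
    return lfilter(b, a, np.asarray(linear_samples, dtype=float))
```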
In this optional embodiment, as an optional implementation manner, a specific manner of processing the first target audio information by using a preset second parsing rule is as follows:
and analyzing the packet header of the adapted audio data (RTP voice packet coded by PCM), and analyzing to obtain the voice channel corresponding to the adapted audio data according to the information of destination MAC, destination IP and the like in the packet header.
Optionally, any conference terminal corresponds to a unique voice channel, and a Mac address, an ip address, and the like are allocated according to packet header information of the conference terminal.
The first target audio information is sliced and written into an external storage device of the corresponding voice channel.
Alternatively, the above slice format may be a 10-byte cps slice.
And carrying out timing reading time configuration on the voice channel, and periodically reading the data of the voice channel from the external storage device and adapting the data to be a TDM data stream.
Alternatively, the timing read time is set to read 16 cps pieces, i.e., 160 bytes of audio data, every 20ms to constitute one PCM-encoded RTP packet.
Therefore, the conference voice processing method described in the embodiment of the present invention guarantees the continuity and stability of the sound by performing slice caching and timing reading on the filtered first target audio information, can effectively alleviate the problem of easy sound leakage caused by the unstable RTP packet, and is beneficial to adapting to a complex conference environment, thereby improving the sound mixing effect of a multi-party audio conference.
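As an illustration of the slice caching and timed reading described above, the sketch below buffers 10-byte cps slices per voice channel and drains 16 of them (160 bytes, i.e. 20 ms of 8 kHz, 8-bit audio) on each timer tick; the class and method names are illustrative.

```python
from collections import deque

SLICE_BYTES = 10        # one cps slice
SLICES_PER_FRAME = 16   # 16 slices * 10 bytes = 160 bytes = 20 ms at 8 kHz, 8 bit

class VoiceChannelBuffer:
    """Per-voice-channel cache: written as 10-byte slices, drained every 20 ms."""

    def __init__(self):
        self._slices = deque()

    def write(self, first_target_audio: bytes):
        # Slice the filtered first target audio information and cache it.
        for i in range(0, len(first_target_audio), SLICE_BYTES):
            self._slices.append(first_target_audio[i:i + SLICE_BYTES])

    def read_frame(self) -> bytes:
        # Called by the 20 ms timer: return one 160-byte frame, padded with
        # zero slices if the channel has temporarily fallen behind, ready to
        # be adapted into the TDM data stream.
        frame = bytearray()
        for _ in range(SLICES_PER_FRAME):
            frame += self._slices.popleft() if self._slices else bytes(SLICE_BYTES)
        return bytes(frame)
```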
In another optional embodiment, the processing the terminal type information by using a preset terminal type rule to obtain a mixing rule set specifically includes:
each terminal type corresponds to a unique preset terminal type rule;
the terminal type rule comprises an amplitude characteristic value threshold and a frequency characteristic value threshold corresponding to the conference terminal of the type and is used for judging whether any conference terminal of the type participates in sound mixing;
as an optional implementation manner, the specific manner of determining the terminal type rule is as follows:
For a conference terminal whose channel type is a wired channel, the following parameters are acquired from the switch bearing the conference terminal: the amplitude characteristic value Dm1 and frequency characteristic value Zm1 of microphone silence when background noise is low, and the amplitude characteristic value Dn1 of a person speaking at normal volume.
The amplitude eigenvalue threshold D1 and the frequency eigenvalue threshold Z1 of a conference terminal whose channel type is the wired channel are respectively:
D1=Dm1+A*(Dn1-Dm1)
Z1=Zm1*B
a, B is a configurable parameter, where a is 0.2 and B is 1.1 in general, which is suitable for most conference situations.
Optionally, if the environmental noise greatly changes during a call, the value of A, B needs to be adjusted appropriately, generally without changing, and a default value is used.
The amplitude eigenvalue threshold D2 and frequency eigenvalue threshold Z2 of a conference terminal whose channel type is a short-distance wireless channel are set in the same way as for the wired-channel conference terminal, and the description is not repeated here.
The amplitude eigenvalue threshold D3 and frequency eigenvalue threshold Z3 of a conference terminal whose channel type is a long-distance wireless channel are set in the same way as for the wired-channel conference terminal, and the description is not repeated here.
In this embodiment, default thresholds are adopted for conference terminals of channel types other than the wired channel, short-distance wireless channel and long-distance wireless channel, namely the amplitude eigenvalue threshold D4 and the frequency eigenvalue threshold Z4, calculated as follows:
D4=(D1+D2+D3)/6
Z4=(Z1+Z2+Z3)/2
therefore, by implementing the conference voice processing method described in the embodiment of the present invention, different thresholds are set for different recognized types of conference terminal channels, so that whether to activate the conference terminal can be determined according to different characteristics of different conference terminal connection channels, and voice quality can be improved. The embodiment obtains the activation threshold suitable for the voice characteristics of different conference terminal devices by analyzing the amplitude and frequency characteristics of the different conference terminal devices. For example, analog telephony, because it is a wired channel transmission, has a background noise spectrum that is a relatively lower proportion of the total spectrum than a radio station transmitting over a wireless channel, and has a relatively high energy transmission efficiency, while the frequency characteristic threshold set in the manner described above is relatively small and the amplitude characteristic threshold is relatively large, matching the characteristics of the audio transmitted over the wired channel.
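The threshold derivation above can be summarized in a short sketch; the function names are illustrative, the constants follow the defaults A = 0.2 and B = 1.1 given in the description, and the numeric example values are hypothetical.

```python
def type_thresholds(d_m: float, z_m: float, d_n: float,
                    a: float = 0.2, b: float = 1.1):
    """Amplitude and frequency activation thresholds for one channel type:
    D = Dm + A * (Dn - Dm), Z = Zm * B."""
    return d_m + a * (d_n - d_m), z_m * b

def default_type_thresholds(wired, short_wireless, long_wireless):
    """Default thresholds for other channel types:
    D4 = (D1 + D2 + D3) / 6, Z4 = (Z1 + Z2 + Z3) / 2."""
    d4 = (wired[0] + short_wireless[0] + long_wireless[0]) / 6
    z4 = (wired[1] + short_wireless[1] + long_wireless[1]) / 2
    return d4, z4

# Hypothetical measurements for a wired terminal: Dm1 = 50, Zm1 = 40, Dn1 = 400
d1, z1 = type_thresholds(50, 40, 400)   # -> (120.0, ~44.0)
```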
In yet another optional embodiment, the processing the audio analysis result set by using the mixing rule includes:
judging whether any audio analysis result meets the condition of participating in audio mixing according to the audio mixing rule;
mixing the audio analysis results meeting the audio mixing condition, packaging the audio analysis results into RTP packets, sending the RTP packets to the switches of a plurality of conference terminals participating in the conference, and sending the RTP packets to each conference terminal through the switches;
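The mixing and repackaging step is not spelled out in detail in the description; a common approach, shown below as an assumption, is to sum the linear frames of the participating terminals with saturation before re-encoding them into the PCM RTP payload.

```python
import numpy as np

def mix_participating_frames(frames):
    """Sum the 16-bit linear frames of the terminals judged to participate in
    mixing, saturating to the int16 range (saturating summation is an assumed
    choice; the description does not specify the summation method)."""
    if not frames:
        return np.zeros(160, dtype=np.int16)   # one 20 ms frame of silence
    acc = np.sum(np.stack([np.asarray(f, dtype=np.int32) for f in frames]), axis=0)
    return np.clip(acc, -32768, 32767).astype(np.int16)
```

In practice one such mix would be produced per receiving terminal, excluding that terminal's own audio, consistent with the requirement stated in the Background.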
in this optional embodiment, as a first optional determining manner, determining whether the mixing rule satisfies a mixing participation condition according to the mixing rule includes:
analyzing the adaptive audio data of any conference terminal to obtain a first amplitude characteristic value and a first frequency characteristic value;
wherein the first amplitude characteristic value calculation formula is as follows:
[The formula is shown as an image in the source: it computes p_all from the per-sample magnitudes d_m.]
where p_all is the amplitude characteristic value and d_m is the value of the lower seven bits of the m-th sampling point, i.e. the absolute value of the audio signal amplitude at that sampling point. The number of sampling points can be set as required and is not limited in this embodiment.
The calculation formula of the first frequency characteristic value is as follows:
[The formula is shown as an image in the source: it computes Z from the per-sample sign bits z_n.]
where Z is the frequency characteristic value and z_n is the highest bit of the n-th sampling point, representing whether the sample is positive or negative. The number of sampling points can be set as required and is not limited in this embodiment.
And if the first amplitude characteristic value is greater than the amplitude activation threshold corresponding to the conference terminal and the first frequency characteristic value is less than the frequency activation threshold corresponding to the conference terminal, judging that the conference terminal participates in sound mixing, otherwise, judging that the conference terminal does not participate in sound mixing.
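The exact formulas appear only as equation images in the source; one reading consistent with the surrounding definitions, and only an assumption here, is to average the lower-seven-bit magnitudes for the amplitude characteristic value and to count sign-bit changes for the frequency characteristic value, as sketched below.

```python
def characteristic_values(samples: bytes):
    """Amplitude and frequency characteristic values of one packet of 8-bit
    sign-magnitude samples. Averaging the magnitudes and counting sign-bit
    changes is an assumed reading of the formulas shown as images above."""
    if not samples:
        return 0.0, 0
    magnitudes = [s & 0x7F for s in samples]        # lower seven bits: |amplitude|
    signs = [s >> 7 for s in samples]               # highest bit: positive/negative
    amplitude = sum(magnitudes) / len(magnitudes)
    frequency = sum(a != b for a, b in zip(signs, signs[1:]))  # sign changes
    return amplitude, frequency
```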
In this optional embodiment, as a second optional determining manner, determining whether the mixing rule satisfies a mixing participation condition according to the mixing rule includes:
analyzing second target audio information of any conference terminal to obtain a second amplitude characteristic value and a second frequency characteristic value;
in the embodiment of the present invention, the calculation manner of the second amplitude characteristic value and the second frequency characteristic value may refer to the calculation manner of the first amplitude characteristic value and the first frequency characteristic value, and is not described again.
And if the second amplitude characteristic value is greater than the amplitude activation threshold corresponding to the conference terminal and the second frequency characteristic value is less than the frequency activation threshold corresponding to the conference terminal, judging that the conference terminal participates in sound mixing, otherwise, judging that the conference terminal does not participate in sound mixing.
In this optional embodiment, as a third optional determining manner, determining whether the mixing rule satisfies a mixing participation condition according to the mixing rule includes:
s1401, analyzing the adaptive audio data of any conference terminal to obtain a first amplitude characteristic value and a first frequency characteristic value;
s1402, if the first amplitude characteristic value is larger than the amplitude activation threshold corresponding to the conference terminal and the first frequency characteristic value is smaller than the frequency activation threshold corresponding to the conference terminal, preliminarily judging that the conference terminal participates in sound mixing, and executing S1403; otherwise, the preliminary judgment is that the audio mixing is not involved, and S1405 is executed;
s1403, analyzing second target audio information of the conference terminal to obtain a second amplitude characteristic value and a second frequency characteristic value;
s1404, if the second amplitude characteristic value is smaller than the amplitude activation threshold corresponding to the conference terminal, or the second frequency characteristic value is larger than the frequency activation threshold corresponding to the conference terminal, determining whether there is valid voice in the first R pieces of adapted audio data, where R is a positive integer; if yes, the voice mixing is finally judged to be involved; if not, finally judging that the sound mixing is not participated in;
s1405, performing the operations from S1401 to S1404 on all the conference terminals to obtain a result of determining whether each conference terminal participates in mixing.
Optionally, R is set to 7. If R is too small, the protection against misjudgment is not obvious; if R is too large, the amount of calculation becomes excessive, reducing the efficiency of the effective-voice judgment and affecting the mixing efficiency.
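Putting S1401 to S1404 together for a single terminal, the decision can be sketched as follows; has_valid_speech stands in for the effective-voice test, and R = 7 follows the suggestion above.

```python
R = 7  # look-back window over earlier adapted-audio packets, as suggested above

def participates_in_mixing(first_amp, first_freq, second_amp, second_freq,
                           amp_threshold, freq_threshold, recent_packets,
                           has_valid_speech):
    """Three-stage mixing decision (S1401-S1404) for one conference terminal."""
    # S1402: preliminary judgment on the adapted audio data (RTP side).
    if not (first_amp > amp_threshold and first_freq < freq_threshold):
        return False
    # S1404: cross-check against the second target audio information (TDM side);
    # if it disagrees, look for valid speech in the previous R adapted packets.
    if second_amp < amp_threshold or second_freq > freq_threshold:
        return any(has_valid_speech(p) for p in recent_packets[-R:])
    return True
```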
By adopting the mode, the following advantages are achieved:
(1) Since the RTP packet data (the adapted audio data) is more timely than the TDM stream data (the second target audio information), performing the preliminary judgment on the RTP packet data allows the transition of the voice packets from absent to present to be recognized more quickly.
(2) Since RTP packets are not particularly stable, the preliminary judgment above is more prone to error; voice termination is therefore judged again on the TDM stream data (the second target audio information), and a terminal is finally excluded only when none of R consecutive voice packets contains valid voice, which prevents misjudgment.
(3) Synthesizing the analysis results of RTP packet data (adaptive audio data) and TDM stream data (second target audio information) to automatically activate and control the participants of the voice conference, so that the activation control is more accurate;
under different meeting environment requirements, different sound mixing rule strategies are adopted in the implementation mode, and user requirements can be better matched. For example, when the real-time requirement is very high, the first or second mixing rule strategy can be adopted; when the requirements for noise removal and continuity are very high, a third mixing rule may be employed.
Therefore, the conference voice processing method described in the embodiment of the invention sets different sound mixing rules, can match the complexity of the conference environment and different sound mixing requirements, and is beneficial to adapting to the complex conference environment.
Example two
Another conference voice processing method disclosed in the embodiment of the present invention may include the following operations:
201. receiving audio data packets of a plurality of conference terminals and carrying out format adaptation to obtain an adaptive audio data set, wherein the adaptive audio data set comprises a plurality of adaptive audio data; scanning each conference terminal to obtain a terminal type information set, wherein the terminal type information set comprises a plurality of terminal type information;
202. analyzing the adaptive audio data set by using a preset voice analysis rule to obtain an audio analysis result set; the audio analysis result set comprises a plurality of audio analysis results;
203. processing the terminal type information by using a preset terminal type rule to obtain a sound mixing rule set;
204. and processing the audio analysis result set by using the audio mixing rule, and mixing the audio analysis results conforming to the audio mixing rule.
In the embodiment of the present invention, the processing of the audio analysis result set by using the mixing rule is performed synchronously with step 202. The method specifically comprises the following steps:
and mirroring the adaptive audio data of each conference terminal into two paths to be processed simultaneously: wherein the first path performs step 202; the second path analyzes the adaptive audio data of each conference terminal and extracts a first amplitude characteristic and a first frequency characteristic; and analyzing the second target audio information of each conference terminal, and extracting a second amplitude characteristic value and a second frequency characteristic value.
Optionally, as a fourth optional determining means, determining whether the mixing rule satisfies a mixing participation condition according to the mixing rule, specifically:
s2401, analyzing adaptive audio data of any conference terminal to obtain a first amplitude characteristic value and a first frequency characteristic value;
s2402, if the first amplitude characteristic value is greater than the amplitude activation threshold corresponding to the conference terminal and the first frequency characteristic value is less than the frequency activation threshold corresponding to the conference terminal, preliminarily judging that the conference terminal participates in sound mixing, and executing S2403; otherwise, the preliminary judgment is that the sound mixing is not involved, and S2405 is executed;
s2403, analyzing second target audio information of the conference terminal to obtain a second amplitude characteristic value and a second frequency characteristic value;
s2404, if the second amplitude characteristic value is smaller than an amplitude activation threshold corresponding to the conference terminal, or the second frequency characteristic value is larger than a frequency activation threshold corresponding to the conference terminal, judging whether effective voices exist in the previous R pieces of adaptive audio data, wherein R is a positive integer; if yes, the voice mixing is finally judged to be involved; if not, finally judging that the sound mixing is not participated in;
s2405, performing the operations from S2401 to S2404 on all the conference terminals, and obtaining a result of determining whether each conference terminal participates in audio mixing.
Optionally, R is set to 7. If R is too small, the protection against misjudgment is not obvious; if R is too large, the amount of calculation becomes excessive, reducing the efficiency of the effective-voice judgment and affecting the mixing efficiency.
It can be seen that, in addition to having the advantages of the third mixing rule described in the first embodiment, the conference voice processing method described in this embodiment processes two mirrored paths simultaneously: the first path analyzes the adapted audio data set by using the preset voice analysis rule, and the second path processes the audio analysis result set by using the obtained mixing rules. Finally, the audio analysis results that conform to the mixing rule are mixed.
Therefore, by implementing the conference voice processing method described in the embodiment of the present invention, whether the voice data of the conference terminal participates in the audio mixing is determined by bypassing the mirror image processing, so that the requirements of removing noise and voice continuity can be satisfied while obviously improving the timeliness of the voice in the conference.
In the embodiment of the present invention, for specific technical details and technical noun explanations of step 201 to step 203, reference may be made to the detailed description of step 101 to step 103 in the first embodiment, and details are not repeated in the embodiment of the present invention.
EXAMPLE III
Referring to fig. 3, fig. 3 is a schematic structural diagram of a conference voice processing apparatus according to an embodiment of the present invention. The apparatus described in fig. 3 can be applied to a data processing system, such as a platform side or a terminal side for conference voice processing, and the embodiment of the present invention is not limited thereto. As shown in fig. 3, the apparatus may include:
a scanning receiving module 301, configured to receive audio data packets of multiple conference terminals and perform format adaptation to obtain an adapted audio data set, where the adapted audio data set includes multiple adapted audio data; scanning each conference terminal to obtain a terminal type information set, wherein the terminal type information set comprises a plurality of terminal type information;
the first processing module 302 is configured to analyze the adaptive audio data set by using a preset voice analysis rule to obtain an audio analysis result set; the audio analysis result set comprises a plurality of audio analysis results;
a second processing module 303, configured to process the terminal type information by using a preset terminal type rule to obtain a sound mixing rule set; the terminal type rule comprises an amplitude characteristic value threshold value and a frequency characteristic value threshold value;
the third processing module 304 is configured to process the audio analysis result set by using the audio mixing rule, and mix audio analysis results meeting the audio mixing rule.
It can be seen that, by implementing the conference voice processing apparatus described in fig. 3, it is possible to obtain audio terminal type information by scanning a conference terminal, and process the audio terminal type information by using a terminal type rule to obtain a mixing rule set, and then process an audio data analysis result by using a mixing rule to mix audio data conforming to the mixing rule, thereby solving the problem of mismixing (missing mixing, multiple mixing, and intermittence) occurring during mixing audio, facilitating adaptation to a complex conference environment, and further improving the mixing effect of a multi-party audio conference.
In another alternative embodiment, as shown in fig. 3, the preset voice parsing rules include a first parsing rule and a second parsing rule;
the first processing module 302 includes a first conversion sub-module, a second conversion sub-module, wherein:
the first conversion sub-module 3021 is configured to, for adaptive audio data of any conference terminal, process the adaptive audio data of the conference terminal by using a preset first parsing rule to obtain first target audio information corresponding to the conference terminal;
the processing of the adapted audio data by using the preset first analysis rule includes analyzing the adapted audio data of the conference terminal, extracting audio information for coding, decoding and converting, and establishing a digital filter for digitally filtering the audio information to obtain the first target audio information;
the second conversion submodule 3022, configured to process the first target audio information by using a preset second parsing rule, to obtain second target audio information corresponding to the conference terminal;
and processing the first target audio information by using a preset second analysis rule, wherein the step of slicing the first target audio information is included, the first target audio information is written into an external storage device of a voice channel corresponding to the conference terminal, and the data of the voice channel is read at regular time and is adapted to be a TDM data stream to obtain the second target audio information.
As can be seen, with the conference voice processing apparatus described in fig. 3, by analyzing the packet header of the RTP voice packet encoded by the PCM corresponding to each conference terminal, extracting the audio information and performing encoding, decoding and conversion, and then establishing a digital filter to perform digital filtering on the audio information after encoding, decoding and conversion, data in the non-human voice spectrum range in the voice can be filtered; the continuity and stability of sound are guaranteed by carrying out slice caching and timing reading on the filtered first target audio information, the problem that an RTP packet is not stable enough to cause sound leakage easily can be effectively solved, the method is favorable for adapting to a complex conference environment, and the sound mixing effect of a multi-party audio conference is further improved.
In another optional embodiment, the specific way of processing the terminal type information by using a preset terminal type rule in the second processing module 303 to obtain the mixing rule set is as follows:
each terminal type corresponds to a unique preset terminal type rule;
the terminal type rule comprises an amplitude characteristic value threshold and a frequency characteristic value threshold corresponding to the conference terminal of the type and is used for judging whether any conference terminal of the type participates in sound mixing;
the detailed description of the first embodiment of determining the terminal type rule is already provided, and is not repeated here.
It can be seen that, with the conference speech processing apparatus described in fig. 3, different thresholds are set for different recognized types of conference terminal channels, so that whether the conference terminal is activated can be determined according to different characteristics of different conference terminal connection channels, and the voice quality can be improved.
In another alternative embodiment, the third processing module 304 includes a first determining submodule 3041, a second determining submodule 3042, a third determining submodule 3043:
the first judgment sub-module 3041: the adaptive audio data analysis module is used for analyzing the adaptive audio data of any conference terminal to obtain a first amplitude characteristic value and a first frequency characteristic value; and if the first amplitude characteristic value is greater than the amplitude activation threshold corresponding to the conference terminal and the first frequency characteristic value is less than the frequency activation threshold corresponding to the conference terminal, judging that the conference terminal participates in sound mixing, otherwise, judging that the conference terminal does not participate in sound mixing.
Second determination sub-module 3042: the second target audio information of any conference terminal is analyzed to obtain a second amplitude characteristic value and a second frequency characteristic value; and if the second amplitude characteristic value is greater than the amplitude activation threshold corresponding to the conference terminal and the second frequency characteristic value is less than the frequency activation threshold corresponding to the conference terminal, judging that the conference terminal participates in sound mixing, otherwise, judging that the conference terminal does not participate in sound mixing.
Third determination sub-module 3043: the adaptive audio data analysis device is used for analyzing the adaptive audio data of any conference terminal to obtain a first amplitude characteristic value and a first frequency characteristic value; if the first amplitude characteristic value is larger than the amplitude activation threshold value corresponding to the conference terminal and the first frequency characteristic value is smaller than the frequency activation threshold value corresponding to the conference terminal, preliminarily judging that the conference terminal participates in sound mixing, and executing the next step; otherwise, the preliminary judgment is that the voice mixing is not participated.
Analyzing second target audio information of the conference terminal to obtain a second amplitude characteristic value and a second frequency characteristic value; if the second amplitude characteristic value is smaller than the amplitude activation threshold value corresponding to the conference terminal, or the second frequency characteristic value is larger than the frequency activation threshold value corresponding to the conference terminal, judging whether effective voices exist in the first R pieces of adaptive audio data, wherein R is a positive integer; if yes, the voice mixing is finally judged to be involved; if not, finally judging that the sound mixing is not participated in;
and performing the operation on all the conference terminals to obtain a judgment result of whether each conference terminal participates in sound mixing.
Therefore, the conference voice processing device described in fig. 3 is implemented, different mixing rules are set, the complexity of the conference environment and different mixing requirements can be matched, and the conference voice processing device is beneficial to adapting to the complex conference environment.
In another optional embodiment, the first processing module 302 and the third processing module 304 simultaneously process the adaptive audio data of each conference terminal, specifically:
the first processing module 302 is configured to analyze the adapted audio data set by using a preset voice analysis rule to obtain an audio analysis result set; the third processing module 304 is configured to analyze the adapted audio data of each conference terminal, and extract a first amplitude characteristic and a first frequency characteristic; and analyzing the second target audio information of each conference terminal, and extracting a second amplitude characteristic value and a second frequency characteristic value.
According to the third determining sub-module 3043, the audio analysis result set is processed by using the audio mixing rule, which is not described herein again.
Therefore, by implementing the conference voice processing apparatus described in fig. 3, whether the voice data of the conference terminal participates in sound mixing is determined in a manner of bypass mirroring and simultaneous processing of two paths, so that the requirements of removing noise and voice continuity can be met while obviously improving the timeliness of the voice in a conference.
Example four
Referring to fig. 4, fig. 4 is a schematic structural diagram of another conference voice processing apparatus according to an embodiment of the present invention. The apparatus described in fig. 4 can be applied to a data processing system, such as a local server or a cloud server used for conference voice processing, and the embodiment of the present invention is not limited thereto.
As shown in fig. 4, the apparatus may include:
a memory 401 storing executable program code;
a processor 402 coupled with the memory 401;
the processor 402 calls the executable program code stored in the memory 401 to execute the steps of the conference voice processing method described in the first embodiment or the second embodiment.
EXAMPLE five
The embodiment of the invention discloses a computer-readable storage medium which stores a computer program for electronic data exchange, wherein the computer program causes a computer to execute the steps of the conference voice processing method described in the first embodiment or the second embodiment.
EXAMPLE six
The embodiment of the invention discloses a computer program product, which comprises a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to execute the steps of the conference voice processing method described in the first embodiment or the second embodiment.
The above-described embodiments of the apparatus are merely illustrative, and the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above detailed description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general hardware platform, or by hardware. Based on this understanding, the above technical solutions, or the part contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, the storage medium including a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, a magnetic disk or other magnetic storage, a tape storage, or any other computer-readable medium that can be used to carry or store data.
Finally, it should be noted that the conference voice processing method and apparatus disclosed in the embodiments of the present invention are only preferred embodiments of the present invention, used solely to illustrate its technical solutions and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A conference voice processing method, comprising:
101, receiving audio data packets of a plurality of conference terminals and performing format adaptation to obtain an adapted audio data set, wherein the adapted audio data set comprises a plurality of adapted audio data; scanning each conference terminal to obtain a terminal type information set, wherein the terminal type information set comprises a plurality of terminal type information;
102, analyzing the adaptive audio data set by using a preset voice analysis rule to obtain an audio analysis result set; the audio analysis result set comprises a plurality of audio analysis results;
103, processing the terminal type information by using a preset terminal type rule to obtain a sound mixing rule set; the terminal type rule comprises an amplitude characteristic value threshold and a frequency characteristic value threshold;
and 104, processing the audio analysis result set by using the sound mixing rule set, and mixing the audio analysis results that conform to the sound mixing rules.
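
For illustration only, the following Python sketch walks through steps 101 to 104 of claim 1 end to end. Everything concrete in it — the helper names, the assumption that every packet carries 16-bit PCM, and the use of a peak level and a zero-crossing count as stand-ins for the amplitude and frequency characteristic values — is an assumption of this sketch, not something defined by the claims.

import struct
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class MixingRule:
    # Per terminal-type thresholds produced in step 103 (values are illustrative).
    amplitude_threshold: float
    frequency_threshold: float

def adapt_format(packet: bytes) -> List[float]:
    # Step 101 stand-in: treat every packet as little-endian 16-bit PCM
    # and normalise the samples to [-1, 1].
    samples = struct.unpack("<%dh" % (len(packet) // 2), packet)
    return [s / 32768.0 for s in samples]

def parse_audio(samples: List[float]) -> Dict[str, float]:
    # Step 102 stand-in: a toy "voice analysis rule" producing two features.
    peak = max((abs(s) for s in samples), default=0.0)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return {"amplitude": peak, "frequency": float(crossings)}

def process(packets: Dict[str, bytes],
            terminal_types: Dict[str, str],
            type_rules: Dict[str, MixingRule]) -> List[str]:
    adapted = {tid: adapt_format(p) for tid, p in packets.items()}      # step 101
    parsed = {tid: parse_audio(s) for tid, s in adapted.items()}        # step 102
    rules = {tid: type_rules[terminal_types[tid]] for tid in packets}   # step 103
    # Step 104: keep only the terminals whose features satisfy their mixing rule.
    return [tid for tid, feat in parsed.items()
            if feat["amplitude"] > rules[tid].amplitude_threshold
            and feat["frequency"] < rules[tid].frequency_threshold]

if __name__ == "__main__":
    pcm = struct.pack("<4h", 12000, -11000, 9000, -8000)
    rules = {"soft_phone": MixingRule(amplitude_threshold=0.1, frequency_threshold=4.0)}
    print(process({"t1": pcm}, {"t1": "soft_phone"}, rules))  # ['t1']
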
2. The conference voice processing method according to claim 1, wherein analyzing the adaptive audio data set by using the preset voice analysis rule comprises:
for the adaptive audio data of any conference terminal, processing the adaptive audio data of the conference terminal by using a preset first analysis rule to obtain first target audio information corresponding to the conference terminal;
wherein processing the adaptive audio data by using the preset first analysis rule comprises: analyzing the adaptive audio data of the conference terminal, extracting the audio information for encoding, decoding and conversion, and establishing a digital filter to digitally filter the audio information, to obtain the first target audio information;
processing the first target audio information by using a preset second analysis rule to obtain second target audio information corresponding to the conference terminal;
and processing the first target audio information by using the preset second analysis rule comprises: slicing the first target audio information, writing the sliced first target audio information into an external storage device of the voice channel corresponding to the conference terminal, and periodically reading the data of the voice channel and adapting the data into a TDM data stream, to obtain the second target audio information.
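
The second analysis rule of claim 2 can be pictured roughly as below: slice the first target audio into fixed-size frames, queue the frames per voice channel, and periodically read one frame per channel back out as a TDM-style stream. The frame size, the in-memory deques standing in for the external storage device, and the round-robin channel order are all assumptions of this sketch.

from collections import deque
from typing import Deque, Dict, List

FRAME = 160  # samples per slice (e.g. 20 ms at 8 kHz) -- assumed value

def slice_and_store(channel_buffers: Dict[str, Deque[List[float]]],
                    terminal_id: str, first_target_audio: List[float]) -> None:
    # Slice one terminal's first target audio and append the slices to its
    # voice-channel buffer (standing in for the external storage device).
    buf = channel_buffers.setdefault(terminal_id, deque())
    for start in range(0, len(first_target_audio), FRAME):
        frame = first_target_audio[start:start + FRAME]
        if len(frame) == FRAME:          # drop a trailing partial frame
            buf.append(frame)

def read_tdm_frame(channel_buffers: Dict[str, Deque[List[float]]]) -> List[float]:
    # Periodic read: take one slice per channel in a fixed order and
    # concatenate them into one TDM-style frame (the second target audio).
    tdm: List[float] = []
    for terminal_id in sorted(channel_buffers):
        if channel_buffers[terminal_id]:
            tdm.extend(channel_buffers[terminal_id].popleft())
    return tdm

if __name__ == "__main__":
    buffers: Dict[str, Deque[List[float]]] = {}
    slice_and_store(buffers, "t1", [0.1] * 480)   # 3 slices
    slice_and_store(buffers, "t2", [0.2] * 320)   # 2 slices
    print(len(read_tdm_frame(buffers)))           # 320 (one slice per channel)
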
3. The conference voice processing method according to claim 1, wherein processing the audio analysis result set by using the mixing rule includes:
analyzing the adaptive audio data of any conference terminal to obtain a first amplitude characteristic value and a first frequency characteristic value;
and if the first amplitude characteristic value is greater than the amplitude activation threshold corresponding to the conference terminal and the first frequency characteristic value is less than the frequency activation threshold corresponding to the conference terminal, determining that the conference terminal participates in sound mixing; otherwise, determining that the conference terminal does not participate in sound mixing.
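
The activation test in this claim can be illustrated with the sketch below, which treats the first amplitude characteristic value as an RMS level and the first frequency characteristic value as the dominant FFT frequency. Those two concrete feature choices, the sample rate and the example thresholds are assumptions made for this illustration, not definitions taken from the claims.

import numpy as np

def amplitude_feature(samples: np.ndarray) -> float:
    # RMS level as a stand-in for the amplitude characteristic value.
    return float(np.sqrt(np.mean(samples ** 2)))

def frequency_feature(samples: np.ndarray, sample_rate: int) -> float:
    # Dominant FFT frequency as a stand-in for the frequency characteristic value.
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return float(freqs[int(np.argmax(spectrum[1:])) + 1])  # skip the DC bin

def participates_in_mixing(samples: np.ndarray, sample_rate: int,
                           amp_threshold: float, freq_threshold: float) -> bool:
    # A terminal joins the mix only when its amplitude feature exceeds the
    # amplitude activation threshold AND its frequency feature stays below
    # the frequency activation threshold.
    return (amplitude_feature(samples) > amp_threshold
            and frequency_feature(samples, sample_rate) < freq_threshold)

if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    speech_like = 0.5 * np.sin(2 * np.pi * 300 * t)   # strong 300 Hz tone
    hiss_like = 0.01 * np.sin(2 * np.pi * 3500 * t)   # quiet high-frequency tone
    print(participates_in_mixing(speech_like, sr, 0.05, 1000.0))  # True
    print(participates_in_mixing(hiss_like, sr, 0.05, 1000.0))    # False
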
4. The conference voice processing method according to claim 2, wherein processing the audio analysis result set by using the mixing rule includes:
analyzing second target audio information of any conference terminal to obtain a second amplitude characteristic value and a second frequency characteristic value;
and if the second amplitude characteristic value is greater than the amplitude activation threshold corresponding to the conference terminal and the second frequency characteristic value is less than the frequency activation threshold corresponding to the conference terminal, determining that the conference terminal participates in sound mixing; otherwise, determining that the conference terminal does not participate in sound mixing.
5. The conference voice processing method according to claim 2, wherein processing the audio analysis result set by using the mixing rule includes:
S1401, analyzing the adaptive audio data of any conference terminal to obtain a first amplitude characteristic value and a first frequency characteristic value;
S1402, if the first amplitude characteristic value is greater than the amplitude activation threshold corresponding to the conference terminal and the first frequency characteristic value is less than the frequency activation threshold corresponding to the conference terminal, preliminarily determining that the conference terminal participates in sound mixing, and executing S1403; otherwise, preliminarily determining that the conference terminal does not participate in sound mixing, and executing S1405;
S1403, analyzing the second target audio information of the conference terminal to obtain a second amplitude characteristic value and a second frequency characteristic value;
S1404, if the second amplitude characteristic value is less than the amplitude activation threshold corresponding to the conference terminal, or the second frequency characteristic value is greater than the frequency activation threshold corresponding to the conference terminal, determining whether valid voice exists in the first R pieces of the adaptive audio data, where R is a positive integer; if so, finally determining that the conference terminal participates in sound mixing; if not, finally determining that the conference terminal does not participate in sound mixing;
and S1405, performing the operations of S1401 to S1404 on all the conference terminals to obtain a determination result of whether each conference terminal participates in sound mixing.
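
A compact sketch of the staged decision S1401 to S1405 is given below. Each terminal supplies a precomputed (amplitude, frequency) pair for the adapted audio, another pair for the second target audio, and a short history of voice-activity flags used for the R-packet check; these data shapes, and the reading of the R-packet check as a look at recently received packets, are assumptions of this sketch rather than details fixed by the claim.

from typing import Dict, List, Tuple

Features = Tuple[float, float]          # (amplitude value, frequency value)

def decide_mixing(first: Features, second: Features,
                  recent_voice_flags: List[bool],
                  amp_threshold: float, freq_threshold: float,
                  r: int = 5) -> bool:
    amp1, freq1 = first
    # S1401/S1402: preliminary decision on the adapted audio features.
    if not (amp1 > amp_threshold and freq1 < freq_threshold):
        return False                                    # preliminarily out
    # S1403/S1404: confirm on the second target audio features.
    amp2, freq2 = second
    if amp2 > amp_threshold and freq2 < freq_threshold:
        return True                                     # confirmed in
    # S1404 fallback: was there valid voice among the last R packets?
    return any(recent_voice_flags[-r:])

def decide_all(terminals: Dict[str, dict],
               amp_threshold: float, freq_threshold: float) -> Dict[str, bool]:
    # S1405: repeat the decision for every conference terminal.
    return {tid: decide_mixing(term["first"], term["second"], term["history"],
                               amp_threshold, freq_threshold)
            for tid, term in terminals.items()}

if __name__ == "__main__":
    terminals = {
        "t1": {"first": (0.4, 300.0), "second": (0.3, 350.0), "history": []},
        "t2": {"first": (0.4, 280.0), "second": (0.01, 320.0),
               "history": [False, True, False]},
        "t3": {"first": (0.01, 300.0), "second": (0.3, 300.0), "history": []},
    }
    print(decide_all(terminals, amp_threshold=0.05, freq_threshold=1000.0))
    # {'t1': True, 't2': True, 't3': False}
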
6. The conference voice processing method according to claim 2, wherein the adaptive audio data of each conference terminal is mirrored into two paths that are processed simultaneously: the first path executes step 102, analyzing the adaptive audio data set by using the preset voice analysis rule to obtain the audio analysis result set; the second path executes step 104, analyzing the adaptive audio data of each conference terminal and extracting the first amplitude characteristic value and the first frequency characteristic value, and analyzing the second target audio information of each conference terminal and extracting the second amplitude characteristic value and the second frequency characteristic value.
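
Mirroring each terminal's adapted audio into two concurrently processed copies, as claim 6 describes, might look like the sketch below. The thread pool, the toy parsing and feature functions, and the specific feature definitions are assumptions of this sketch.

from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

def parsing_path(samples: List[float]) -> List[float]:
    # First path: stand-in for the preset voice analysis rule of step 102.
    return [min(1.0, max(-1.0, s)) for s in samples]        # toy clamp

def feature_path(samples: List[float]) -> Tuple[float, float]:
    # Second path: extract toy amplitude / frequency-like feature values.
    peak = max((abs(s) for s in samples), default=0.0)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return peak, float(crossings)

def process_mirrored(adapted: Dict[str, List[float]]):
    parsed, features = {}, {}
    with ThreadPoolExecutor(max_workers=2) as pool:
        for tid, samples in adapted.items():
            copy_a, copy_b = list(samples), list(samples)    # mirror into two paths
            f1 = pool.submit(parsing_path, copy_a)
            f2 = pool.submit(feature_path, copy_b)
            parsed[tid], features[tid] = f1.result(), f2.result()
    return parsed, features

if __name__ == "__main__":
    parsed, features = process_mirrored({"t1": [0.2, -0.4, 1.3, -0.1]})
    print(features["t1"])   # (1.3, 3.0)
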
7. The conference voice processing method according to claim 6, wherein processing the audio analysis result set by using the mixing rule includes:
S2401, analyzing the adaptive audio data of any conference terminal to obtain a first amplitude characteristic value and a first frequency characteristic value;
S2402, if the first amplitude characteristic value is greater than the amplitude activation threshold corresponding to the conference terminal and the first frequency characteristic value is less than the frequency activation threshold corresponding to the conference terminal, preliminarily determining that the conference terminal participates in sound mixing, and executing S2403; otherwise, preliminarily determining that the conference terminal does not participate in sound mixing, and executing S2405;
S2403, analyzing the second target audio information of the conference terminal to obtain a second amplitude characteristic value and a second frequency characteristic value;
S2404, if the second amplitude characteristic value is less than the amplitude activation threshold corresponding to the conference terminal, or the second frequency characteristic value is greater than the frequency activation threshold corresponding to the conference terminal, determining whether valid voice exists in the first R pieces of the adaptive audio data, where R is a positive integer; if so, finally determining that the conference terminal participates in sound mixing; if not, finally determining that the conference terminal does not participate in sound mixing;
and S2405, performing the operations of S2401 to S2404 on all the conference terminals to obtain a determination result of whether each conference terminal participates in sound mixing.
8. A conference voice processing apparatus, characterized in that the apparatus comprises:
the scanning receiving module is used for receiving audio data packets of a plurality of conference terminals and carrying out format adaptation to obtain an adaptive audio data set, wherein the adaptive audio data set comprises a plurality of adaptive audio data; scanning each conference terminal to obtain a terminal type information set, wherein the terminal type information set comprises a plurality of terminal type information;
the first processing module is used for analyzing the adaptive audio data set by using a preset voice analysis rule to obtain an audio analysis result set; the audio analysis result set comprises a plurality of audio analysis results;
the second processing module is used for processing the terminal type information by using a preset terminal type rule to obtain a sound mixing rule set; the terminal type rule comprises an amplitude characteristic value threshold and a frequency characteristic value threshold;
and the third processing module is used for processing the audio analysis result set by using the sound mixing rule set, and mixing the audio analysis results that conform to the sound mixing rules.
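
As a rough picture of how the four modules of claim 8 could fit together, the sketch below wraps them in a single class whose methods hand results to one another; the method names and the trivial toy bodies are assumptions of this sketch.

from typing import Dict, List, Tuple

class ConferenceVoiceProcessor:
    def scan_and_receive(self, packets: Dict[str, bytes]) -> Tuple[Dict, Dict]:
        # Scanning/receiving module: format adaptation plus terminal type scan.
        adapted = {tid: list(p) for tid, p in packets.items()}   # toy adaptation
        terminal_types = {tid: "generic" for tid in packets}     # toy scan result
        return adapted, terminal_types

    def first_processing(self, adapted: Dict[str, List[int]]) -> Dict[str, float]:
        # First processing module: parse adapted audio into analysis results.
        return {tid: max(samples, default=0) / 255.0
                for tid, samples in adapted.items()}

    def second_processing(self, terminal_types: Dict[str, str]) -> Dict[str, float]:
        # Second processing module: derive a mixing threshold per terminal type.
        return {tid: 0.1 for tid in terminal_types}              # toy rule set

    def third_processing(self, analysis: Dict[str, float],
                         rules: Dict[str, float]) -> List[str]:
        # Third processing module: select the terminals that satisfy their rule.
        return [tid for tid, level in analysis.items() if level > rules[tid]]

if __name__ == "__main__":
    proc = ConferenceVoiceProcessor()
    adapted, types = proc.scan_and_receive({"t1": bytes([10, 200, 30])})
    print(proc.third_processing(proc.first_processing(adapted),
                                proc.second_processing(types)))   # ['t1']
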
9. A conference voice processing apparatus, characterized in that the apparatus comprises:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute the conference voice processing method according to any one of claims 1 to 7.
10. A computer storage medium, wherein the computer storage medium stores computer instructions which, when invoked, are used to perform the conference voice processing method according to any one of claims 1 to 7.
CN202210234843.2A 2022-03-10 2022-03-10 Conference voice processing method and device Active CN114627886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210234843.2A CN114627886B (en) 2022-03-10 2022-03-10 Conference voice processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210234843.2A CN114627886B (en) 2022-03-10 2022-03-10 Conference voice processing method and device

Publications (2)

Publication Number Publication Date
CN114627886A true CN114627886A (en) 2022-06-14
CN114627886B CN114627886B (en) 2024-08-16

Family

ID=81899443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210234843.2A Active CN114627886B (en) 2022-03-10 2022-03-10 Conference voice processing method and device

Country Status (1)

Country Link
CN (1) CN114627886B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6418125B1 (en) * 1998-06-18 2002-07-09 Cisco Technology, Inc. Unified mixing, speaker selection, and jitter buffer management for multi-speaker packet audio systems
US20020172342A1 (en) * 2001-04-30 2002-11-21 O'malley William Audio conference platform with dynamic speech detection threshold
US20050062843A1 (en) * 2003-09-22 2005-03-24 Bowers Richard D. Client-side audio mixing for conferencing
CN1946029A (en) * 2006-10-30 2007-04-11 北京中星微电子有限公司 Method and its system for treating audio signal
CN106161814A (en) * 2015-03-24 2016-11-23 北京视联动力国际信息技术有限公司 The sound mixing method of a kind of Multi-Party Conference and device
US20200162524A1 (en) * 2017-07-11 2020-05-21 Zte Corporation Control method of multimedia conference terminal and multimedia conference server
CN107276777A (en) * 2017-07-27 2017-10-20 苏州科达科技股份有限公司 The audio-frequency processing method and device of conference system
CN109920445A (en) * 2019-03-04 2019-06-21 北京佳讯飞鸿电气股份有限公司 A kind of sound mixing method, device and equipment
CN110675885A (en) * 2019-10-17 2020-01-10 浙江大华技术股份有限公司 Sound mixing method, device and storage medium
CN111585776A (en) * 2020-05-26 2020-08-25 腾讯科技(深圳)有限公司 Data transmission method, device, equipment and computer readable storage medium
CN111741177A (en) * 2020-06-12 2020-10-02 浙江齐聚科技有限公司 Audio mixing method, device, equipment and medium for online conference
CN112118264A (en) * 2020-09-21 2020-12-22 苏州科达科技股份有限公司 Conference sound mixing method and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115881131A (en) * 2022-11-17 2023-03-31 广州市保伦电子有限公司 A Method of Speech Transcription under Multi-voice
CN115881131B (en) * 2022-11-17 2023-10-13 广东保伦电子股份有限公司 Voice transcription method under multiple voices
CN117749947A (en) * 2023-12-22 2024-03-22 广东保伦电子股份有限公司 Multi-terminal protocol-based multi-party call processing method and system

Also Published As

Publication number Publication date
CN114627886B (en) 2024-08-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant