CN109803059A - Audio processing method and device
- Publication number: CN109803059A (CN201811543825.2A)
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Telephonic Communication Services (AREA)
Abstract
The embodiments of the present application provide an audio processing method and device. The method includes: obtaining audio signals sent by multiple series-connected microphone arrays; obtaining a target audio signal according to the audio signals sent by the multiple series-connected microphone arrays; and sending the target audio signal to a server, where the target audio signal is used by the server to generate meeting minutes. Because the meeting minutes are obtained from the audio signals sent by multiple series-connected microphone arrays, the efficiency of obtaining the meeting minutes is improved and the accuracy of the minutes obtained is ensured.
Description
Technical field
The embodiments of the present application relate to computer technology, and in particular to an audio processing method and device.
Background
With the rapid development of business, conference systems are now used very widely. Most meetings are multi-person meetings, and meeting minutes need to be taken to record the consensus reached in the meeting.
In such a multi-person meeting scenario, a typical conference system uses multiple analog directional microphones for sound pickup and transmission. With an analog directional microphone, the speaker must be close to and aimed at the microphone; even a slightly greater distance strongly affects the picked-up signal, so the quality of the collected audio signal is poor. As a result, when a user produces meeting minutes from the audio signals collected by the microphones, both the efficiency and the accuracy of obtaining the minutes are low.
Summary of the invention
The embodiments of the present application provide an audio processing method and device that improve the efficiency of obtaining meeting minutes and ensure the accuracy of the minutes obtained.
In a first aspect, an embodiment of the present application provides an audio processing method, comprising:
Obtaining audio signals sent by multiple series-connected microphone arrays;
Obtaining a target audio signal according to the audio signals sent by the multiple series-connected microphone arrays;
Sending the target audio signal to a server, where the target audio signal is used by the server to generate meeting minutes.
In a possible design, the method further comprises:
Obtaining an identifier of a participant;
Sending the identifier of the participant to the server, where the meeting minutes include the identifier of the participant.
In a possible design, the method further comprises:
Obtaining a historical audio signal of the participant;
Sending the historical audio signal of the participant to the server, where the historical audio signal of the participant is used to determine the current audio signal of the participant in the target audio signal.
In a possible design, obtaining the historical audio signal of the participant comprises:
Obtaining a preset text from the server;
Displaying the preset text;
Obtaining the historical audio signal that the participant generates according to the preset text.
In a possible design, before the preset text is obtained from the server, the method further comprises:
Obtaining a current conference topic;
Sending the current conference topic to the server, where the current conference topic is used by the server to obtain the preset text according to a correspondence; the correspondence includes multiple conference topics and a preset text set corresponding to each conference topic, and the preset text is a text in the preset text set corresponding to the current conference topic.
In a possible design, obtaining the target audio signal according to the audio signals sent by the multiple series-connected microphone arrays comprises:
Selecting the audio signal with the largest amplitude among the audio signals sent by the multiple series-connected microphone arrays, and using that audio signal as the target audio signal.
In a second aspect, an embodiment of the present application provides an audio processing method, comprising:
Receiving a target audio signal sent by a group audio terminal, where the target audio signal is obtained according to audio signals sent by multiple series-connected microphone arrays;
Generating meeting minutes according to the target audio signal.
In a possible design, generating the meeting minutes according to the target audio signal comprises:
Performing speech recognition and natural language processing on the target audio signal to obtain the meeting minutes.
In a possible design, the method further comprises:
Receiving an identifier of a participant sent by the group audio terminal.
In a possible design, the method further comprises:
Receiving a historical audio signal of the participant;
Determining the current audio signal of the participant in the target audio signal according to the historical audio signal of the participant;
Correspondingly, performing speech recognition and natural language processing on the target audio signal to obtain the meeting minutes comprises:
Performing speech recognition and natural language processing on the current audio signal of the participant, and writing the processed current audio signal and the identifier of the participant into the meeting minutes in association with each other.
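The source does not say how the historical audio signal is compared with the target audio signal. As a minimal sketch only, assume each signal has already been reduced to a fixed-length feature vector (the feature extraction, the function names, and the data layout below are all hypothetical); each recognized segment is then attributed to the participant whose historical vector is most similar by cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def attribute_segments(segments, history):
    """Label each segment with the identifier whose historical vector is closest.

    segments: list of (feature_vector, recognized_text)   # hypothetical layout
    history:  dict mapping participant identifier -> historical feature vector
    """
    minutes = []
    for features, text in segments:
        speaker = max(history, key=lambda pid: cosine_similarity(features, history[pid]))
        minutes.append((speaker, text))
    return minutes

# Made-up vectors standing in for real voice features.
history = {"Zhang": [1.0, 0.1, 0.0], "Li": [0.0, 0.2, 1.0]}
segments = [([0.9, 0.2, 0.1], "Let us begin."), ([0.1, 0.1, 0.8], "Agreed.")]
minutes = attribute_segments(segments, history)
```

Each entry of `minutes` pairs a participant identifier with recognized text, which mirrors the associated writing of the identifier and the processed audio described above.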
In a possible design, after the meeting minutes are generated according to the target audio signal, the method further comprises:
Sending the meeting minutes to a terminal device.
In a possible design, before the historical audio signal of the participant is received, the method further comprises:
Sending a preset text to the group audio terminal, where the preset text is used by the group audio terminal to obtain the historical audio signal of the participant.
In a possible design, a correspondence is stored, and the correspondence includes multiple conference topics and a preset text set corresponding to each conference topic; before the preset text is sent to the group audio terminal, the method further comprises:
Receiving a current conference topic from the group audio terminal;
Obtaining the preset text set corresponding to the current conference topic according to the current conference topic and the correspondence;
Determining the preset text from the preset text set corresponding to the current conference topic.
In a third aspect, an embodiment of the present application provides an audio processing device, comprising:
An obtaining module, configured to obtain audio signals sent by multiple series-connected microphone arrays;
The obtaining module is further configured to obtain a target audio signal according to the audio signals sent by the multiple series-connected microphone arrays;
A sending module, configured to send the target audio signal to a server, where the target audio signal is used by the server to generate meeting minutes.
In a possible design, the obtaining module is further configured to obtain an identifier of a participant;
The sending module is further configured to send the identifier of the participant to the server, where the meeting minutes include the identifier of the participant.
In a possible design, the obtaining module is further configured to obtain a historical audio signal of the participant;
The sending module is further configured to send the historical audio signal of the participant to the server, where the historical audio signal of the participant is used to determine the current audio signal of the participant in the target audio signal.
In a possible design, the obtaining module is further configured to obtain a preset text from the server;
The device further includes a display module, configured to display the preset text;
The obtaining module is specifically configured to obtain the historical audio signal that the participant generates according to the preset text.
In a possible design, the obtaining module is further configured to, before obtaining the preset text from the server:
Obtain a current conference topic;
The sending module is further configured to send the current conference topic to the server, where the current conference topic is used by the server to obtain the preset text according to a correspondence; the correspondence includes multiple conference topics and a preset text set corresponding to each conference topic, and the preset text is a text in the preset text set corresponding to the current conference topic.
In a possible design, the obtaining module is specifically configured to:
Select the audio signal with the largest amplitude among the audio signals sent by the multiple series-connected microphone arrays, and use that audio signal as the target audio signal.
In a fourth aspect, an embodiment of the present application provides an audio processing device, comprising:
A receiving module, configured to receive a target audio signal sent by a group audio terminal, where the target audio signal is obtained according to audio signals sent by multiple series-connected microphone arrays;
A generation module, configured to generate meeting minutes according to the target audio signal.
In a possible design, the generation module is specifically configured to perform speech recognition and natural language processing on the target audio signal to obtain the meeting minutes.
In a possible design, the receiving module is further configured to receive an identifier of a participant sent by the group audio terminal.
In a possible design, the receiving module is further configured to receive a historical audio signal of the participant;
The device further includes a determining module, configured to determine the current audio signal of the participant in the target audio signal according to the historical audio signal of the participant;
Correspondingly, the generation module is specifically configured to perform speech recognition and natural language processing on the current audio signal of the participant, and to write the processed current audio signal and the identifier of the participant into the meeting minutes in association with each other.
In a possible design, the device further includes a sending module, configured to send the meeting minutes to a terminal device after the meeting minutes are generated according to the target audio signal.
In a possible design, the sending module is further configured to send a preset text to the group audio terminal before the historical audio signal of the participant is received, where the preset text is used by the group audio terminal to obtain the historical audio signal of the participant.
In a possible design, a correspondence is stored in the audio processing device, and the correspondence includes multiple conference topics and a preset text set corresponding to each conference topic; the receiving module is further configured to receive a current conference topic from the group audio terminal before the preset text is sent to the group audio terminal;
The determining module is further configured to obtain the preset text set corresponding to the current conference topic according to the current conference topic and the correspondence, and to determine the preset text from the preset text set corresponding to the current conference topic.
In a fifth aspect, an embodiment of the present application provides a readable storage medium including a program or instructions; when the program or instructions are run on a computer, any method of the first aspect or the second aspect is performed.
In a sixth aspect, an embodiment of the present application provides an electronic device, comprising a processor coupled to a memory;
The memory is configured to store a computer program;
The processor is configured to call the computer program stored in the memory to implement any method of the first aspect or the second aspect.
In the present application, the group audio terminal obtains a target audio signal according to the audio signals sent by multiple series-connected microphone arrays, and the server obtains meeting minutes according to the target audio signal. Connecting the microphone arrays in series ensures that all of them share a common clock source. Therefore, after the group audio terminal obtains the audio signals sent by the series-connected microphone arrays, the time-delay error when calibrating the audio signals sent by the individual microphone arrays is very small, so the group audio terminal determines the maximum-amplitude target audio signal among those audio signals fairly accurately. This ensures the integrity of the audio signal generated during the meeting and its continuity in time, and in turn the accuracy of the meeting minutes finally obtained. In addition, the meeting minutes do not have to be produced manually, which improves the efficiency of obtaining them.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a system architecture diagram provided by an embodiment of the present application;
Fig. 2 is an interaction diagram of Embodiment 1 of the audio processing method provided by the present application;
Fig. 3 is a schematic diagram of the connection between multiple series-connected microphone arrays and the group audio terminal provided by an embodiment of the present application;
Fig. 4 is an interaction diagram of Embodiment 2 of the audio processing method provided by the present application;
Fig. 5 is a structural schematic diagram of Embodiment 1 of the audio processing device provided by the present application;
Fig. 6 is a structural schematic diagram of Embodiment 2 of the audio processing device provided by the present application;
Fig. 7 is a structural schematic diagram of Embodiment 3 of the audio processing device provided by the present application;
Fig. 8 is a structural schematic diagram of Embodiment 4 of the audio processing device provided by the present application;
Fig. 9 is a structural schematic diagram of the electronic device provided by the present application.
Detailed description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
In the present application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate that A exists alone, that A and B both exist, or that B exists alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following items" or a similar expression refers to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may indicate: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple. The terms "first", "second", and the like in the present application are used to distinguish similar objects and do not describe a particular order or sequence.
First, the technical terms involved in the embodiments of the present application are explained.
Microphone array: an audio front-end system composed of multiple microphones, which uses these microphones to collect audio, obtain the sound source direction, and form a directed beam, so as to raise the signal-to-noise ratio of the audio signal. The microphones of one microphone array are arranged on the same circuit board and are each connected to the same processing chip.
The geometric arrangement of the microphones of a microphone array on the circuit board may be triangular, rectangular, circular, or linear.
Beam pointing: the behavior whereby a microphone array collects only audio from a specific direction and suppresses audio from other directions.
Omnidirectional microphone: a microphone that picks up sound equally well from all directions over 360 degrees.
Fig. 1 is a system architecture diagram provided by an embodiment of the present application. Referring to Fig. 1, the system architecture includes multiple series-connected microphone arrays, a group audio terminal, a server, and a terminal device. The series-connected microphone arrays and the group audio terminal are arranged inside a meeting room; the microphones inside a microphone array may be omnidirectional microphones.
The series-connected microphone arrays are connected to the group audio terminal by wire or wirelessly, and the group audio terminal is connected to the server by wire or wirelessly. If the series-connected microphone arrays are wired to the group audio terminal, they may be connected by twisted pair through an audio bus interface; the microphone arrays are likewise connected to one another by twisted pair through audio bus interfaces.
Here, a wired connection between the series-connected microphone arrays and the group audio terminal means that one microphone array among them is wired to the group audio terminal, for example the first or the last microphone array in the series.
The audio processing method of the present application is described below through specific embodiments, based on the system architecture above. Fig. 2 is an interaction diagram of Embodiment 1 of the audio processing method provided by the present application. As shown in Fig. 2, the method of this embodiment may include:
Step S101: the group audio terminal obtains audio signals sent by multiple series-connected microphone arrays.
Specifically, when a user speaks in the current meeting, each microphone array collects the audio signal of the speaking user and determines the direction of the speaking user, that is, the direction of the sound source, from the collected audio signal. It then steers its beam so that the beam direction matches the sound source direction and continues collecting the audio signal of the speaking user.
To facilitate subsequent processing and obtain accurate meeting minutes, each microphone array may preprocess the collected audio signal. Existing algorithms may be used for this preprocessing.
For example, each microphone array first performs echo cancellation on the audio signal collected by each of its microphones, then performs directional pickup through spatial-directivity processing (algorithms such as weighting, delaying, and summing), then performs noise suppression according to audio signal characteristics such as frequency, intensity, and duration, and finally performs dereverberation based on the spatial characteristics of the sound field together with nonlinear processing such as signal amplification (for example, automatic gain or dynamic range adjustment).
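The weighting, delaying, and summing mentioned above is commonly realized as delay-and-sum beamforming. The following is only an illustrative sketch, not the preprocessing actually used by the microphone arrays: it assumes whole-sample delays that are already known, and the function and variable names are hypothetical.

```python
def delay_and_sum(channels, delays, weights=None):
    """Weighted delay-and-sum over equal-length channels.

    channels: list of sample lists, one per microphone
    delays:   per-microphone delay in whole samples (non-negative integers)
    weights:  optional per-microphone weights; defaults to a plain average
    """
    n_mics = len(channels)
    length = len(channels[0])
    if weights is None:
        weights = [1.0 / n_mics] * n_mics
    output = [0.0] * length
    for samples, delay, weight in zip(channels, delays, weights):
        for t in range(delay, length):
            # Delaying a channel by `delay` samples aligns it with the others.
            output[t] += weight * samples[t - delay]
    return output

# The same pulse reaches microphone 0 two samples later than microphone 1.
mic0 = [0, 0, 0, 1, 0, 0]
mic1 = [0, 1, 0, 0, 0, 0]
aligned = delay_and_sum([mic0, mic1], delays=[0, 2])
unaligned = delay_and_sum([mic0, mic1], delays=[0, 0])
```

With the correct delays the two pulses add coherently into a single full-height pulse in `aligned`, whereas `unaligned` contains two half-height pulses. This is why steering toward the speaker strengthens the wanted signal relative to sound arriving from other directions.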
Each microphone array sends its preprocessed audio signal to the group audio terminal, and the group audio terminal receives the audio signal sent by each microphone array. In other words, the audio signal that a microphone array sends to the group audio terminal is the result of that microphone array preprocessing the original audio signal it collected.
Further, the audio signal preprocessed by a microphone array may be transmitted to the group audio terminal over twisted pair through an audio interface, for example an audio bus (A2B) interface. The audio bus (A2B) can carry not only digital audio signals but also control signals (such as Inter-Integrated Circuit bus control, abbreviated I2C control) and phantom power to supply the microphone array modules. This greatly reduces the wiring complexity of the conference microphones and also ensures high-fidelity, real-time transmission of the digital audio signals, which improves how faithfully the conference audio is reproduced and thus helps obtain accurate meeting minutes.
Fig. 3 is a schematic diagram of the connection between multiple series-connected microphone arrays and the group audio terminal provided by an embodiment of the present application.
Referring to Fig. 3, microphone array A sends its preprocessed audio signal to the group audio terminal through microphone array B, microphone array C, and microphone array D; microphone array B sends its preprocessed audio signal to the group audio terminal through microphone array C and microphone array D; microphone array C sends its preprocessed audio signal to the group audio terminal through microphone array D; and microphone array D sends its preprocessed audio signal to the group audio terminal directly.
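The forwarding order in Fig. 3 can be expressed as a small routing function. This is a hedged sketch of the topology only; the list layout, the function name, and the label "terminal" are assumptions made for illustration.

```python
def route_to_terminal(chain, source):
    """Return the hops a signal from `source` takes to reach the terminal.

    chain: array names ordered from the far end of the series to the
           array that is wired directly to the group audio terminal
    """
    start = chain.index(source)
    return chain[start + 1:] + ["terminal"]

chain = ["A", "B", "C", "D"]
routes = {name: route_to_terminal(chain, name) for name in chain}
```

Here `routes["A"]` lists B, C, D, and then the terminal, while `routes["D"]` contains only the terminal, matching the description of Fig. 3.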
Because connecting the microphone arrays in series ensures that all of them share a common clock source, the time-delay error is very small when the group audio terminal, after obtaining the audio signals sent by the series-connected microphone arrays, calibrates the audio signals sent by the individual microphone arrays.
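The source does not describe how the delay between arrays is calibrated. One common way to estimate the offset between two streams that share a sample clock is to find the shift that maximizes their cross-correlation; the sketch below illustrates that idea under this assumption, with hypothetical names and toy data.

```python
def estimate_lag(reference, other, max_lag):
    """Return the shift (in samples) of `other` relative to `reference`
    that maximizes their cross-correlation, searched over [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t, ref_sample in enumerate(reference):
            u = t + lag
            if 0 <= u < len(other):
                score += ref_sample * other[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

array_a = [0, 0, 1, 3, 1, 0, 0, 0]
array_b = [0, 0, 0, 0, 1, 3, 1, 0]   # same pulse, two samples later
lag = estimate_lag(array_a, array_b, max_lag=3)
```

Because the arrays are clocked from one source, an offset estimated this way would stay constant rather than drift, which is consistent with the small calibration error claimed above.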
Step S102: the group audio terminal obtains a target audio signal according to the audio signals sent by the multiple series-connected microphone arrays.
Specifically, before the group audio terminal obtains the target audio signal according to the audio signals sent by the series-connected microphone arrays, it receives a user-input instruction to enable the target audio signal acquisition function and enables that function according to the instruction.
In a first scheme, the group audio terminal obtains the target audio signal according to the audio signals sent by the series-connected microphone arrays as follows: it selects the audio signal with the largest amplitude among the audio signals sent by the series-connected microphone arrays and uses that audio signal as the target audio signal.
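The first scheme can be sketched as a simple maximum search. The source does not state whether "amplitude" means peak value, average level, or a per-frame measure; the example below assumes peak absolute value over the compared window, and all names and sample values are hypothetical.

```python
def peak_amplitude(signal):
    """Largest absolute sample value in one signal."""
    return max((abs(sample) for sample in signal), default=0.0)

def select_target(signals):
    """Return (array_name, signal) for the signal with the largest peak amplitude.

    signals: dict mapping a microphone array name to its preprocessed samples
    """
    name = max(signals, key=lambda key: peak_amplitude(signals[key]))
    return name, signals[name]

signals = {
    "A": [0.01, -0.02, 0.02],
    "B": [0.10, -0.45, 0.30],   # array closest to the speaker
    "C": [0.05, -0.12, 0.08],
}
target_name, target_signal = select_target(signals)
```

In practice the comparison would presumably be repeated for each short time window so that the selected array can change as different participants speak.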
In a second scheme, the group audio terminal obtains the target audio signal according to the audio signals sent by the series-connected microphone arrays as follows: it selects the audio signal with the largest amplitude among the audio signals sent by the series-connected microphone arrays and uses that audio signal as a preselected audio signal; it then performs sentence segmentation on the preselected audio signal to obtain the target audio signal.
Existing algorithms may be used for sentence segmentation.
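The source leaves the sentence segmentation algorithm open. One simple existing approach is to cut the signal wherever a sufficiently long pause occurs; the sketch below shows that approach with hypothetical threshold values and names, and is not presented as the algorithm used by the group audio terminal.

```python
def split_on_pauses(samples, threshold, min_pause):
    """Split samples into (start, end) index ranges separated by pauses.

    A pause is a run of at least `min_pause` samples whose absolute value
    stays below `threshold`. `end` is exclusive.
    """
    segments = []
    start = None      # index where the current segment began
    quiet = 0         # length of the current run of quiet samples
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause:
                # Close the segment at the first quiet sample of the pause.
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        segments.append((start, len(samples) - quiet))
    return segments

preselected = [0, 5, 6, 0, 0, 0, 7, 8, 0]
segments = split_on_pauses(preselected, threshold=3, min_pause=2)
```

For the toy input, the two loud runs are returned as separate index ranges, each of which would be treated as one sentence of the target audio signal.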
Therefore, because the time-delay error is very small when the group audio terminal calibrates the audio signals sent by the individual microphone arrays, the group audio terminal determines the maximum-amplitude target audio signal among those audio signals fairly accurately. This ensures the integrity of the audio signal generated during the meeting (no audio signal of any time period is omitted or repeated) and its continuity in time (that is, in the target audio signal, the preprocessed audio signal corresponding to audio collected by a microphone array at a first time never comes after the preprocessed audio signal corresponding to audio collected at a second time, where the first time is earlier than the second time), and in turn the accuracy of the meeting minutes finally obtained.
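The integrity and continuity properties described above can be stated as a check over the time ranges that make up the target audio signal. This is a sketch for illustration; representing the signal as a list of (start, end) time pairs is an assumption, and the function name is hypothetical.

```python
def check_timeline(chunks):
    """Check chunks of the target audio signal for gaps, repeats, and ordering.

    chunks: list of (start_time, end_time) pairs in the order they appear
            in the target audio signal; times are in seconds
    Returns a list of human-readable problems (empty if the timeline is sound).
    """
    problems = []
    for (prev_start, prev_end), (start, end) in zip(chunks, chunks[1:]):
        if start < prev_start:
            problems.append(f"out of order: {start} after {prev_start}")
        elif start > prev_end:
            problems.append(f"gap: {prev_end} to {start}")
        elif start < prev_end:
            problems.append(f"repeat: {start} to {min(end, prev_end)}")
    return problems

good = check_timeline([(0.0, 1.0), (1.0, 2.5), (2.5, 3.0)])
bad = check_timeline([(0.0, 1.0), (1.5, 2.0), (1.8, 3.0), (0.5, 0.7)])
```

A gap corresponds to an omitted time period, a repeat to a duplicated one, and an out-of-order chunk to a break in temporal continuity.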
Step S103: the group audio terminal sends the target audio signal to the server.
Specifically, the server may be a cloud server.
Step S104: the server obtains meeting minutes according to the target audio signal.
Specifically, the server obtains the meeting minutes according to the target audio signal by performing speech recognition and natural language processing on the target audio signal to obtain meeting minutes in text form.
If, in step S102, the group audio terminal obtained the target audio signal according to the first scheme, the speech recognition in this embodiment may include: performing sentence segmentation on the target audio signal, and then performing speech recognition on the target audio signal after sentence segmentation.
If, in step S102, the group audio terminal obtained the target audio signal according to the second scheme, the speech recognition in this embodiment does not include sentence segmentation of the target audio signal, because the target audio signal has already been segmented into sentences; speech recognition is performed on it directly.
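The difference between the two schemes on the server side is only whether sentence segmentation still has to run. The sketch below shows that control flow with toy stand-ins for the segmentation, recognition, and language-processing steps; none of these callables is a real speech API, and all names are hypothetical.

```python
def generate_minutes(target, already_segmented, segment, recognize, summarize):
    """Turn a target audio signal into text minutes.

    already_segmented: True when the terminal used the second scheme
    segment, recognize, summarize: callables supplied by the caller
    """
    sentences = target if already_segmented else segment(target)
    transcript = [recognize(sentence) for sentence in sentences]
    return summarize(transcript)

# Toy stand-ins so the control flow can be exercised without real models.
toy_segment = lambda audio: audio.split("|")
toy_recognize = lambda sentence: sentence.strip().capitalize()
toy_summarize = lambda lines: "\n".join(lines)

scheme_one = generate_minutes("hello all | budget approved", False,
                              toy_segment, toy_recognize, toy_summarize)
scheme_two = generate_minutes(["hello all", "budget approved"], True,
                              toy_segment, toy_recognize, toy_summarize)
```

Both calls yield the same text, which reflects that the two schemes differ only in where segmentation happens, not in the resulting minutes.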
Existing algorithms may be used for speech recognition and natural language processing.
In the method of this embodiment, the group audio terminal obtains a target audio signal according to the audio signals sent by multiple series-connected microphone arrays, and the server obtains meeting minutes according to the target audio signal. Connecting the microphone arrays in series ensures that all of them share a common clock source. Therefore, after the group audio terminal obtains the audio signals sent by the series-connected microphone arrays, the time-delay error when calibrating the audio signals sent by the individual microphone arrays is very small, so the group audio terminal determines the maximum-amplitude target audio signal among those audio signals fairly accurately. This ensures the integrity of the audio signal generated during the meeting and its continuity in time, and in turn the accuracy of the meeting minutes finally obtained. In addition, the meeting minutes do not have to be produced manually, which improves the efficiency of obtaining them.
The technical solution of the method embodiment shown in Fig. 2 is described in detail below through a specific embodiment.
Fig. 4 is an interaction diagram of Embodiment 2 of the audio processing method provided by the present application. As shown in Fig. 4, the method of this embodiment may include:
Step S201, group audio terminal obtains the audio signal that multiple concatenated microphone arrays are sent.
Specifically, the specific implementation of step S201 can refer to the step S101 in an embodiment.
Step S202, group audio terminal obtains target audio according to the audio signal that multiple concatenated microphone arrays are sent
Signal.
Specifically, the specific implementation of step S202 can refer to the step S102 in an embodiment.
Step S203, target audio signal is sent to server by group audio terminal.
Specifically, the specific implementation of step S203 can refer to the step S103 in an embodiment.
Step S204, group audio terminal obtains the mark of participant.
Specifically, when can be in session, user inputs the mark of each participant by group audio terminal, and group audio terminal obtains
The mark of participant.Wherein, the mark of participant may include at least one in following: the duty of the name, participant of participant
Position, participant position code.
Step S205, group audio terminal sends the mark of participant to server.
Step S206: the group audio terminal obtains the history audio signal of a participant.
Specifically, before the conference begins, the group audio terminal may record the voices of the company's employees in advance to obtain their history audio signals, and each employee's history audio signal is stored on the server in association with that employee's identifier.
During the conference, after the group audio terminal sends the identifier of each participant to the server, the server determines whether it has stored a history audio signal for each participant. If there is a target participant without a history audio signal, the server sends a preset text and the identifier of the target participant to the group audio terminal. On receiving them, the group audio terminal displays the preset text. The target participant reads the preset text aloud, and the group audio terminal receives the history audio signal that the target participant generates according to the preset text (that is, the audio signal produced when the target participant reads the preset text). The group audio terminal then sends the obtained history audio signal together with the identifier of the target participant to the server, which stores them in association.
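By way of illustration only, the server-side check described above could be sketched as follows. The names `EnrollmentStore`, `missing_participants`, `enroll`, and `enrollment_requests` are hypothetical and do not appear in the application, and audio is represented as raw bytes purely for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EnrollmentStore:
    """Hypothetical server-side store: participant identifier -> history audio signal."""
    history_audio: Dict[str, bytes] = field(default_factory=dict)

    def missing_participants(self, participant_ids: List[str]) -> List[str]:
        # Target participants are those with no stored history audio signal.
        return [pid for pid in participant_ids if pid not in self.history_audio]

    def enroll(self, participant_id: str, audio: bytes) -> None:
        # Store the history audio signal in association with the identifier.
        self.history_audio[participant_id] = audio


def enrollment_requests(store: EnrollmentStore,
                        participant_ids: List[str],
                        preset_text: str) -> List[Tuple[str, str]]:
    # One (identifier, preset text) pair per participant who still has to read the text.
    return [(pid, preset_text) for pid in store.missing_participants(participant_ids)]


store = EnrollmentStore({"alice": b"\x00\x01"})
requests = enrollment_requests(store, ["alice", "bob"], "Quarterly budget review")
print(requests)

# After the terminal returns the recording, the server stores it.
store.enroll("bob", b"\x02\x03")
print(store.missing_participants(["alice", "bob"]))
```

In this sketch only participants without a stored signal are asked to read the preset text, which mirrors the condition stated in the embodiment.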
The preset text may be text related to the conference, so that the participant's history audio signal is highly similar to the audio signal generated when the participant speaks in the conference. This improves the efficiency of subsequently determining each participant's current audio signal, and in turn the efficiency of obtaining the meeting minutes.
For example, the server stores a correspondence between conference topics and preset text sets; the correspondence includes multiple conference topics and the preset text set corresponding to each conference topic. It can be understood that the preset texts in the set corresponding to a conference topic are related to that topic. In this case, the group audio terminal may also obtain the current conference topic entered by the user and send it to the server. The server determines the preset text set corresponding to the current conference topic according to the current conference topic and the correspondence, selects one preset text from that set, and sends the preset text to the group audio terminal.
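A minimal sketch of the lookup just described is given below, under stated assumptions. The dictionary `CORRESPONDENCE`, its sample topics and texts, and the function `pick_preset_text` are hypothetical. The application does not say how one text is chosen from the set, so random choice is assumed here.

```python
import random
from typing import Dict, List, Optional

# Hypothetical correspondence: conference topic -> preset text set.
CORRESPONDENCE: Dict[str, List[str]] = {
    "budget review": [
        "The quarterly budget covers staffing and equipment costs.",
        "Spending forecasts are revised at the end of each quarter.",
    ],
    "product launch": [
        "The launch schedule depends on the final test results.",
    ],
}


def pick_preset_text(topic: str,
                     correspondence: Dict[str, List[str]],
                     rng: Optional[random.Random] = None) -> str:
    """Look up the preset text set for the topic and pick one text from it."""
    texts = correspondence.get(topic)
    if not texts:
        raise KeyError(f"no preset text set stored for topic {topic!r}")
    # Assumption: any text in the set is acceptable, so one is chosen at random.
    return (rng or random.Random()).choice(texts)


text = pick_preset_text("product launch", CORRESPONDENCE)
print(text)
```

Passing an explicit `random.Random` instance makes the choice reproducible, which may be convenient when testing.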
Step S207: the group audio terminal sends the history audio signal of the participant to the server.
Step S208: the server determines the current audio signal of the participant in the target audio signal according to the history audio signal of the participant.
Specifically, the server retrieves the stored history audio signal of each participant according to the received participant identifiers, compares the acoustic features of the target audio signal with those of each history audio signal, and thereby determines the current audio signal of each participant in the target audio signal. An existing algorithm may be used for the acoustic feature comparison.
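The application leaves the comparison algorithm open. As one possible illustration, and not the method of the application, the sketch below assumes that the target audio signal has already been split into segments and that each segment and each history audio signal has been reduced to a fixed-length acoustic feature vector. Each segment is then attributed to the participant whose history features have the highest cosine similarity. All names and the sample vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_segments(segment_features: List[Sequence[float]],
                    history_features: Dict[str, Sequence[float]]) -> List[str]:
    """For each segment, return the identifier with the most similar history features."""
    assignments = []
    for features in segment_features:
        best_id = max(
            history_features,
            key=lambda pid: cosine_similarity(features, history_features[pid]),
        )
        assignments.append(best_id)
    return assignments


# Hypothetical feature vectors derived from the history audio signals.
history = {"alice": [1.0, 0.1, 0.0], "bob": [0.0, 0.2, 1.0]}
# Hypothetical feature vectors for three segments of the target audio signal.
segments = [[0.9, 0.2, 0.1], [0.1, 0.1, 0.8], [1.0, 0.0, 0.0]]
print(assign_segments(segments, history))  # ['alice', 'bob', 'alice']
```

The segments attributed to one identifier together form that participant's current audio signal in this sketch.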
Step S209: the server performs speech recognition and natural language processing on the current audio signal of the participant, and writes the processed current audio signal and the participant's identifier into the meeting minutes in correspondence.
Specifically, for each participant, the server performs speech recognition and natural language processing on that participant's current audio signal, and writes the processed current audio signal and the participant's identifier into the meeting minutes in correspondence.
Here, the processed current audio signal is the text corresponding to the current audio signal. Writing the processed current audio signal and the participant's identifier into the meeting minutes in correspondence means that the meeting minutes include the participant's identifier and the participant's speech text (the content of the speech), with each identifier associated with that participant's speech text.
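The resulting structure of the meeting minutes can be pictured with the short sketch below. The `MinutesEntry` class, the `write_minutes` function, and the "identifier: text" layout are assumptions made for illustration; the application does not prescribe a storage format, and the speech recognition step itself is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MinutesEntry:
    participant_id: str  # identifier of the participant (for example, name and job title)
    speech_text: str     # text recognized from that participant's current audio signal


def write_minutes(entries: List[MinutesEntry]) -> str:
    """Render the minutes with each identifier next to the corresponding speech text."""
    return "\n".join(f"{e.participant_id}: {e.speech_text}" for e in entries)


minutes = write_minutes([
    MinutesEntry("Zhang (manager)", "Let us start with the budget."),
    MinutesEntry("Li (engineer)", "The test results are ready."),
])
print(minutes)
```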
Step S210: the server sends the meeting minutes to a terminal device.
Specifically, the terminal device may be, for example, a mobile phone, a computer terminal, or the group audio terminal.
This embodiment can guarantee the accuracy and traceability of the resulting meeting minutes.
The audio processing method of the embodiments of the present application has been described above with reference to Figs. 2 to 4. The audio processing apparatus of the embodiments of the present application is described below with reference to Figs. 5 to 9.
Fig. 5 is a schematic structural diagram of Embodiment 1 of the audio processing apparatus provided by the present application. As shown in Fig. 5, the apparatus of this embodiment may include an acquisition module 51 and a sending module 52.
The acquisition module 51 is configured to obtain the audio signals sent by multiple concatenated microphone arrays.
The acquisition module 51 is further configured to obtain a target audio signal according to the audio signals sent by the multiple concatenated microphone arrays.
The sending module 52 is configured to send the target audio signal to a server, where the target audio signal is used by the server to generate meeting minutes.
Optionally, the acquisition module 51 is further configured to obtain the identifier of a participant.
The sending module 52 is further configured to send the identifier of the participant to the server, where the meeting minutes include the identifier of the participant.
Optionally, the acquisition module 51 is further configured to obtain the history audio signal of the participant.
The sending module 52 is further configured to send the history audio signal of the participant to the server, where the history audio signal of the participant is used to determine the current audio signal of the participant in the target audio signal.
Optionally, the acquisition module 51 is specifically configured to:
select the audio signal with the largest amplitude from the audio signals sent by the multiple concatenated microphone arrays, and use that audio signal as the target audio signal.
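The selection performed by the acquisition module 51 might be sketched as follows. The application does not define how amplitude is measured, so the peak absolute sample value is assumed here (an RMS measure would be an equally plausible reading). The function names and the array identifiers are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def peak_amplitude(samples: Sequence[float]) -> float:
    # Assumption: amplitude is the largest absolute sample value; an empty signal counts as 0.
    return max((abs(s) for s in samples), default=0.0)


def select_target_signal(signals: Dict[str, Sequence[float]]) -> Tuple[str, Sequence[float]]:
    """Return the (array identifier, samples) pair whose signal has the largest amplitude."""
    if not signals:
        raise ValueError("no microphone array signals were received")
    best = max(signals, key=lambda array_id: peak_amplitude(signals[array_id]))
    return best, signals[best]


# Hypothetical samples received from three concatenated microphone arrays.
signals = {
    "array_1": [0.10, -0.20, 0.15],
    "array_2": [0.05, -0.90, 0.30],
    "array_3": [],
}
array_id, target = select_target_signal(signals)
print(array_id, target)
```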
The apparatus of this embodiment can be used to execute the technical solution corresponding to the group audio terminal in the foregoing method embodiments. Its implementation principle and technical effect are similar and are not repeated here.
Fig. 6 is a schematic structural diagram of Embodiment 2 of the audio processing apparatus provided by the present application. As shown in Fig. 6, on the basis of the apparatus structure shown in Fig. 5, the apparatus of this embodiment may further include a display module 53.
The acquisition module 51 is further configured to obtain a preset text from the server.
The display module 53 is configured to display the preset text.
The acquisition module 51 is specifically configured to obtain the history audio signal generated by the participant according to the preset text.
Optionally, the acquisition module 51 is further configured to obtain a current conference topic before obtaining the preset text from the server.
The sending module 52 is further configured to send the current conference topic to the server, where the current conference topic is used by the server to obtain the preset text according to a correspondence. The correspondence includes multiple conference topics and the preset text set corresponding to each conference topic, and the preset text is a text in the preset text set corresponding to the current conference topic.
The apparatus of this embodiment can be used to execute the technical solution corresponding to the group audio terminal in the foregoing method embodiments. Its implementation principle and technical effect are similar and are not repeated here.
Fig. 7 is a schematic structural diagram of Embodiment 3 of the audio processing apparatus provided by the present application. As shown in Fig. 7, the apparatus of this embodiment may include a receiving module 71 and a generation module 72.
The receiving module 71 is configured to receive the target audio signal sent by the group audio terminal, where the target audio signal is obtained according to the audio signals sent by multiple concatenated microphone arrays.
The generation module 72 is configured to generate meeting minutes according to the target audio signal.
Optionally, the generation module 72 is specifically configured to perform speech recognition and natural language processing on the target audio signal to obtain the meeting minutes.
Optionally, the receiving module 71 is further configured to receive the identifier of the participant sent by the group audio terminal.
The apparatus of this embodiment can be used to execute the technical solution corresponding to the server in the foregoing method embodiments. Its implementation principle and technical effect are similar and are not repeated here.
Fig. 8 is a schematic structural diagram of Embodiment 4 of the audio processing apparatus provided by the present application. As shown in Fig. 8, on the basis of the apparatus structure shown in Fig. 7, the apparatus of this embodiment may further include a determining module 73 and a sending module 74.
The receiving module 71 is further configured to receive the history audio signal of the participant.
The determining module 73 is configured to determine the current audio signal of the participant in the target audio signal according to the history audio signal of the participant.
Correspondingly, the generation module 72 is specifically configured to perform speech recognition and natural language processing on the current audio signal of the participant, and to write the processed current audio signal and the identifier of the participant into the meeting minutes in correspondence.
The sending module 74 is configured to send the meeting minutes to a terminal device after the meeting minutes are generated according to the target audio signal.
Optionally, the sending module 74 is further configured to send a preset text to the group audio terminal before the history audio signal of the participant is received, where the preset text is used by the group audio terminal to obtain the history audio signal of the participant.
Optionally, a correspondence is stored in the audio processing apparatus, and the correspondence includes multiple conference topics and the preset text set corresponding to each conference topic. The receiving module 71 is further configured to receive a current conference topic from the group audio terminal before the preset text is sent to the group audio terminal.
The determining module 73 is further configured to obtain the preset text set corresponding to the current conference topic according to the current conference topic and the correspondence, and to determine the preset text from the preset text set corresponding to the current conference topic.
The apparatus of this embodiment can be used to execute the technical solution corresponding to the server in the foregoing method embodiments. Its implementation principle and technical effect are similar and are not repeated here.
Fig. 9 is a schematic structural diagram of the electronic device provided by an embodiment of the present application. Referring to Fig. 9, the electronic device of this embodiment includes a processor 62, a memory 61, and a communication bus 63. The communication bus 63 connects the processor 62 and the memory 61, and the processor 62 is coupled to the memory 61.
The memory 61 is configured to store a computer program.
The processor 62 is configured to call the computer program stored in the memory 61 to implement the method of the group audio terminal or of the server in the foregoing method embodiments.
The computer program may also be stored in a memory external to the electronic device.
It should be understood that, in the embodiments of the present application, the processor may be a CPU, or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 61 may include read-only memory and random access memory, and provides instructions and data to the processor 62. The memory 61 may also include non-volatile random access memory. For example, the memory 61 may also store device type information.
The memory 61 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
In addition to a data bus, the bus 63 may also include a power bus, a control bus, a status signal bus, and the like. For clarity of illustration, however, the various buses are all denoted as bus 63 in the figure.
An embodiment of the present application provides a readable storage medium including a program or instructions. When the program or instructions are run on a computer, the method of the group audio terminal or of the server in any of the foregoing method embodiments is performed.
Those of ordinary skill in the art will understand that all or part of the steps of the foregoing method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
Claims (17)
1. An audio processing method, characterized by comprising:
obtaining audio signals sent by multiple concatenated microphone arrays;
obtaining a target audio signal according to the audio signals sent by the multiple concatenated microphone arrays; and
sending the target audio signal to a server, wherein the target audio signal is used by the server to generate meeting minutes.
2. The method according to claim 1, characterized by further comprising:
obtaining an identifier of a participant; and
sending the identifier of the participant to the server, wherein the meeting minutes include the identifier of the participant.
3. The method according to claim 2, characterized by further comprising:
obtaining a history audio signal of the participant; and
sending the history audio signal of the participant to the server, wherein the history audio signal of the participant is used to determine a current audio signal of the participant in the target audio signal.
4. The method according to claim 3, characterized in that obtaining the history audio signal of the participant comprises:
obtaining a preset text from the server;
displaying the preset text; and
obtaining the history audio signal generated by the participant according to the preset text.
5. The method according to claim 4, characterized in that, before obtaining the preset text from the server, the method further comprises:
obtaining a current conference topic; and
sending the current conference topic to the server, wherein the current conference topic is used by the server to obtain the preset text according to a correspondence, the correspondence comprises multiple conference topics and a preset text set corresponding to each conference topic, and the preset text is a text in the preset text set corresponding to the current conference topic.
6. The method according to any one of claims 1 to 5, characterized in that obtaining the target audio signal according to the audio signals sent by the multiple concatenated microphone arrays comprises:
selecting the audio signal with the largest amplitude from the audio signals sent by the multiple concatenated microphone arrays, and using the audio signal with the largest amplitude as the target audio signal.
7. An audio processing method, characterized by comprising:
receiving a target audio signal sent by a group audio terminal, wherein the target audio signal is obtained according to audio signals sent by multiple concatenated microphone arrays; and
generating meeting minutes according to the target audio signal.
8. The method according to claim 7, characterized in that generating the meeting minutes according to the target audio signal comprises:
performing speech recognition and natural language processing on the target audio signal to obtain the meeting minutes.
9. The method according to claim 7 or 8, characterized by further comprising:
receiving an identifier of a participant sent by the group audio terminal.
10. The method according to claim 9, characterized by further comprising:
receiving a history audio signal of the participant; and
determining a current audio signal of the participant in the target audio signal according to the history audio signal of the participant;
correspondingly, performing speech recognition and natural language processing on the target audio signal to obtain the meeting minutes comprises:
performing speech recognition and natural language processing on the current audio signal of the participant, and writing the processed current audio signal and the identifier of the participant into the meeting minutes in correspondence.
11. The method according to claim 7 or 8, characterized in that, after generating the meeting minutes according to the target audio signal, the method further comprises:
sending the meeting minutes to a terminal device.
12. The method according to claim 10, characterized in that, before receiving the history audio signal of the participant, the method further comprises:
sending a preset text to the group audio terminal, wherein the preset text is used by the group audio terminal to obtain the history audio signal of the participant.
13. The method according to claim 12, characterized in that a correspondence is stored, the correspondence comprising multiple conference topics and a preset text set corresponding to each conference topic; and before sending the preset text to the group audio terminal, the method further comprises:
receiving a current conference topic from the group audio terminal;
obtaining the preset text set corresponding to the current conference topic according to the current conference topic and the correspondence; and
determining the preset text from the preset text set corresponding to the current conference topic.
14. An audio processing apparatus, characterized by comprising:
an acquisition module, configured to obtain audio signals sent by multiple concatenated microphone arrays;
the acquisition module being further configured to obtain a target audio signal according to the audio signals sent by the multiple concatenated microphone arrays; and
a sending module, configured to send the target audio signal to a server, wherein the target audio signal is used by the server to generate meeting minutes.
15. An audio processing apparatus, characterized by comprising:
a receiving module, configured to receive a target audio signal sent by a group audio terminal, wherein the target audio signal is obtained according to audio signals sent by multiple concatenated microphone arrays; and
a generation module, configured to generate meeting minutes according to the target audio signal.
16. A readable storage medium, characterized by comprising a program or instructions, wherein when the program or instructions are run on a computer, the method according to any one of claims 1 to 6 or 7 to 13 is performed.
17. An electronic device, characterized by comprising a processor, the processor being coupled to a memory;
the memory is configured to store a computer program; and
the processor is configured to call the computer program stored in the memory to implement the method according to any one of claims 1 to 6 or 7 to 13.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811543825.2A CN109803059A (en) | 2018-12-17 | 2018-12-17 | Audio-frequency processing method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109803059A (en) | 2019-05-24 |
Family
ID=66556863
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811543825.2A (Pending), published as CN109803059A (en) | Audio-frequency processing method and device | 2018-12-17 | 2018-12-17 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109803059A (en) |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110717751A (en) * | 2019-09-27 | 2020-01-21 | 维沃移动通信有限公司 | Data processing method and mobile terminal |
| CN111131616A (en) * | 2019-12-28 | 2020-05-08 | 科大讯飞股份有限公司 | Audio sharing method based on intelligent terminal and related device |
| CN112055122A (en) * | 2020-08-07 | 2020-12-08 | 联想(北京)有限公司 | Conference component equipment, conference equipment and data processing method |
| CN113111215A (en) * | 2021-03-30 | 2021-07-13 | 深圳市冠标科技发展有限公司 | User behavior analysis method and device, electronic equipment and storage medium |
| CN113963694A (en) * | 2020-07-20 | 2022-01-21 | 中移(苏州)软件技术有限公司 | A speech recognition method, speech recognition device, electronic device and storage medium |
| CN114143909A (en) * | 2021-12-06 | 2022-03-04 | 阿里巴巴达摩院(杭州)科技有限公司 | Data transmission method, pluggable switching equipment and computer storage medium |
| CN115348241A (en) * | 2022-08-17 | 2022-11-15 | 深圳市拔超科技股份有限公司 | Microphone cascading method |
| CN119446132A (en) * | 2025-01-09 | 2025-02-14 | 深圳市鸿哲智能系统工程有限公司 | A speech transcription processing system |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103813239A (en) * | 2012-11-12 | 2014-05-21 | 雅马哈株式会社 | Signal processing system and signal processing method |
| CN104579628A (en) * | 2015-01-07 | 2015-04-29 | 中国人民解放军国防科学技术大学 | Audio conference safety secrecy system and method |
| US9215327B2 (en) * | 2011-06-11 | 2015-12-15 | Clearone Communications, Inc. | Methods and apparatuses for multi-channel acoustic echo cancelation |
| CN106789133A (en) * | 2017-01-19 | 2017-05-31 | 广州市花都区国光音频科技中心(普通合伙) | A New Type of Digital Conference System |
| CN107978317A (en) * | 2017-12-18 | 2018-05-01 | 北京百度网讯科技有限公司 | Meeting summary synthetic method, system and terminal device |
| CN108022583A (en) * | 2017-11-17 | 2018-05-11 | 平安科技(深圳)有限公司 | Meeting summary generation method, application server and computer-readable recording medium |
| CN108335697A (en) * | 2018-01-29 | 2018-07-27 | 北京百度网讯科技有限公司 | Minutes method, apparatus, equipment and computer-readable medium |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109803059A (en) | Audio-frequency processing method and device | |
| CN110992974B (en) | Speech recognition method, apparatus, device and computer readable storage medium | |
| US11295760B2 (en) | Method, apparatus, system and storage medium for implementing a far-field speech function | |
| CN112017681B (en) | Method and system for enhancing directional voice | |
| CN118098260B (en) | Voice signal processing method and related equipment | |
| US11115539B2 (en) | Smart voice system, method of adjusting output voice and computer readable memory medium | |
| CN111654806A (en) | Audio playback method, device, storage medium and electronic device | |
| US10978089B2 (en) | Method, apparatus for blind signal separating and electronic device | |
| CN108335697A (en) | Minutes method, apparatus, equipment and computer-readable medium | |
| WO2022005615A1 (en) | Speech enhancement | |
| CN112364144A (en) | Interaction method, device, equipment and computer readable medium | |
| Huang et al. | Advances in microphone array processing and multichannel speech enhancement | |
| US20250240565A1 (en) | Kalman-filter-based adaptive microphone array noise reduction method and apparatus | |
| US20200184973A1 (en) | Transcription of communications | |
| CN116206606B (en) | Speech processing method, device, computer equipment and storage medium | |
| CN118018674A (en) | An intelligent conference system | |
| WO2024158629A1 (en) | Guided speech-enhancement networks | |
| US11830120B2 (en) | Speech image providing method and computing device for performing the same | |
| US20220329960A1 (en) | Audio capture using room impulse responses | |
| CN110446142B (en) | Audio information processing method, server, device, storage medium and client | |
| CN114299932A (en) | Voice data processing method and device, computer equipment and storage medium | |
| CN114449341B (en) | Audio processing methods, devices, readable media and electronic equipment | |
| CN111650560A (en) | Sound source localization method and device | |
| CN112911465B (en) | Signal sending method and device and electronic equipment | |
| US20230260505A1 (en) | Information processing method, non-transitory recording medium, information processing apparatus, and information processing system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190524 |