CN109803059A - Audio processing method and device

Info

Publication number: CN109803059A
Application number: CN201811543825.2A
Authority: CN
Inventors: 耿雷
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-12-17
Filing date: 2018-12-17
Publication date: 2019-05-24

Abstract

Embodiments of the present application provide an audio processing method and device. The method includes: obtaining audio signals sent by multiple concatenated microphone arrays; obtaining a target audio signal according to the audio signals sent by the multiple concatenated microphone arrays; and sending the target audio signal to a server, where the target audio signal is used by the server to generate meeting minutes. Because the meeting minutes are obtained from audio signals sent by multiple concatenated microphone arrays, the embodiments improve the efficiency of obtaining meeting minutes and help ensure their accuracy.

Description

Audio processing method and device

Technical field

Embodiments of the present invention relate to computer technology, and in particular to an audio processing method and device.

Background

With the rapid development of business, conference systems are now widely used. Most meetings involve multiple participants, and meeting minutes are needed to record the consensus reached in the meeting.

In such multi-person meetings, a typical conference system uses multiple analog directional microphones for sound pickup and transmission. An analog directional microphone requires the speaker to be close to and facing the microphone; even a slightly greater distance significantly degrades the picked-up signal, so the quality of the acquired audio signal is poor. As a result, when a user produces meeting minutes from the audio signals acquired by the microphones, both the efficiency and the accuracy of obtaining the minutes are low.

Summary of the invention

Embodiments of the present application provide an audio processing method and device that improve the efficiency of obtaining meeting minutes and help ensure the accuracy of the minutes obtained.

In a first aspect, an embodiment of the present application provides an audio processing method, including:

obtaining audio signals sent by multiple concatenated microphone arrays;

obtaining a target audio signal according to the audio signals sent by the multiple concatenated microphone arrays; and

sending the target audio signal to a server, where the target audio signal is used by the server to generate meeting minutes.

In one possible design, the method further includes:

obtaining an identifier of a participant; and

sending the identifier of the participant to the server, where the meeting minutes include the identifier of the participant.

In one possible design, the method further includes:

obtaining a historical audio signal of the participant; and

sending the historical audio signal of the participant to the server, where the historical audio signal of the participant is used to determine the current audio signal of the participant in the target audio signal.

In one possible design, obtaining the historical audio signal of the participant includes:

obtaining a preset text from the server;

displaying the preset text; and

obtaining the historical audio signal generated by the participant according to the preset text.

In one possible design, before the preset text is obtained from the server, the method further includes:

obtaining a current meeting topic; and

sending the current meeting topic to the server, where the current meeting topic is used by the server to obtain the preset text according to a correspondence, the correspondence including multiple meeting topics and the preset text set corresponding to each meeting topic, and the preset text being a text in the preset text set corresponding to the current meeting topic.

In one possible design, obtaining the target audio signal according to the audio signals sent by the multiple concatenated microphone arrays includes:

selecting the audio signal with the largest amplitude among the audio signals sent by the multiple concatenated microphone arrays, and using the audio signal with the largest amplitude as the target audio signal.

In a second aspect, an embodiment of the present application provides an audio processing method, including:

receiving a target audio signal sent by a group audio terminal, where the target audio signal is obtained according to audio signals sent by multiple concatenated microphone arrays; and

generating meeting minutes according to the target audio signal.

In one possible design, generating the meeting minutes according to the target audio signal includes:

performing speech recognition and natural language processing on the target audio signal to obtain the meeting minutes.

In one possible design, the method further includes:

receiving an identifier of a participant sent by the group audio terminal.

In one possible design, the method further includes:

receiving a historical audio signal of the participant; and

determining the current audio signal of the participant in the target audio signal according to the historical audio signal of the participant.

Correspondingly, performing speech recognition and natural language processing on the target audio signal to obtain the meeting minutes includes:

performing speech recognition and natural language processing on the current audio signal of the participant, and writing the processed current audio signal into the meeting minutes in association with the identifier of the participant.

In one possible design, after the meeting minutes are generated according to the target audio signal, the method further includes:

sending the meeting minutes to a terminal device.

In one possible design, before the historical audio signal of the participant is received, the method further includes:

sending a preset text to the group audio terminal, where the preset text is used by the group audio terminal to obtain the historical audio signal of the participant.

In one possible design, a correspondence is stored, the correspondence including multiple meeting topics and the preset text set corresponding to each meeting topic; before the preset text is sent to the group audio terminal, the method further includes:

receiving a current meeting topic from the group audio terminal;

obtaining the preset text set corresponding to the current meeting topic according to the current meeting topic and the correspondence; and

determining the preset text from the preset text set corresponding to the current meeting topic.

In a third aspect, an embodiment of the present application provides an audio processing device, including:

an acquisition module, configured to obtain audio signals sent by multiple concatenated microphone arrays,

the acquisition module being further configured to obtain a target audio signal according to the audio signals sent by the multiple concatenated microphone arrays; and

a sending module, configured to send the target audio signal to a server, where the target audio signal is used by the server to generate meeting minutes.

In one possible design, the acquisition module is further configured to obtain an identifier of a participant; and

the sending module is further configured to send the identifier of the participant to the server, where the meeting minutes include the identifier of the participant.

In one possible design, the acquisition module is further configured to obtain a historical audio signal of the participant; and

the sending module is further configured to send the historical audio signal of the participant to the server, where the historical audio signal of the participant is used to determine the current audio signal of the participant in the target audio signal.

In one possible design, the acquisition module is further configured to obtain a preset text from the server;

the device further includes a display module, configured to display the preset text; and

the acquisition module is specifically configured to obtain the historical audio signal generated by the participant according to the preset text.

In one possible design, the acquisition module is further configured to obtain a current meeting topic before the preset text is obtained from the server; and

the sending module is further configured to send the current meeting topic to the server, where the current meeting topic is used by the server to obtain the preset text according to a correspondence, the correspondence including multiple meeting topics and the preset text set corresponding to each meeting topic, and the preset text being a text in the preset text set corresponding to the current meeting topic.

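In one possible design, the acquisition module is specifically configured to select the audio signal with the largest amplitude among the audio signals sent by the multiple concatenated microphone arrays, and to use the audio signal with the largest amplitude as the target audio signal.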

In a fourth aspect, an embodiment of the present application provides an audio processing device, including:

a receiving module, configured to receive a target audio signal sent by a group audio terminal, where the target audio signal is obtained according to audio signals sent by multiple concatenated microphone arrays; and

a generation module, configured to generate meeting minutes according to the target audio signal.

In one possible design, the generation module is specifically configured to perform speech recognition and natural language processing on the target audio signal to obtain the meeting minutes.

In one possible design, the receiving module is further configured to receive an identifier of a participant sent by the group audio terminal.

In one possible design, the receiving module is further configured to receive a historical audio signal of the participant;

the device further includes a determining module, configured to determine the current audio signal of the participant in the target audio signal according to the historical audio signal of the participant; and

correspondingly, the generation module is specifically configured to perform speech recognition and natural language processing on the current audio signal of the participant, and to write the processed current audio signal into the meeting minutes in association with the identifier of the participant.

In one possible design, the device further includes a sending module, configured to send the meeting minutes to a terminal device after the meeting minutes are generated according to the target audio signal.

In one possible design, the sending module is further configured to send a preset text to the group audio terminal before the historical audio signal of the participant is received, where the preset text is used by the group audio terminal to obtain the historical audio signal of the participant.

In one possible design, a correspondence is stored in the audio processing device, the correspondence including multiple meeting topics and the preset text set corresponding to each meeting topic; the receiving module is further configured to receive a current meeting topic from the group audio terminal before the preset text is sent to the group audio terminal; and

the determining module is further configured to obtain the preset text set corresponding to the current meeting topic according to the current meeting topic and the correspondence, and to determine the preset text from the preset text set corresponding to the current meeting topic.

In a fifth aspect, an embodiment of the present application provides a readable storage medium including a program or instructions; when the program or instructions are run on a computer, any method of the first aspect or the second aspect is performed.

In a sixth aspect, an embodiment of the present application provides an electronic device, including a processor, the processor being coupled to a memory;

the memory is configured to store a computer program; and

the processor is configured to call the computer program stored in the memory to implement any method of the first aspect or the second aspect.

In the present application, the group audio terminal obtains a target audio signal according to the audio signals sent by multiple concatenated microphone arrays, and the server obtains meeting minutes according to the target audio signal. Connecting the microphone arrays in series ensures that the clocks of the microphone arrays share a common source. Therefore, after the group audio terminal obtains the audio signals sent by the multiple concatenated microphone arrays, the delay error in aligning the audio signals sent by the microphone arrays is very small, so the group audio terminal can determine the target audio signal with the largest amplitude among those audio signals relatively accurately. This also ensures the integrity and temporal continuity of the audio signal generated during the meeting, and in turn the accuracy of the meeting minutes finally obtained. In addition, the meeting minutes need not be produced manually, which improves the efficiency of obtaining them.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a system architecture diagram provided by an embodiment of the present application;

Fig. 2 is an interaction diagram of Embodiment 1 of the audio processing method provided by the present application;

Fig. 3 is a schematic diagram of the connection between multiple concatenated microphone arrays and the group audio terminal provided by an embodiment of the present application;

Fig. 4 is an interaction diagram of Embodiment 2 of the audio processing method provided by the present application;

Fig. 5 is a schematic structural diagram of Embodiment 1 of the audio processing device provided by the present application;

Fig. 6 is a schematic structural diagram of Embodiment 2 of the audio processing device provided by the present application;

Fig. 7 is a schematic structural diagram of Embodiment 3 of the audio processing device provided by the present application;

Fig. 8 is a schematic structural diagram of Embodiment 4 of the audio processing device provided by the present application;

Fig. 9 is a schematic structural diagram of the electronic device provided by the present application.

Detailed description of the embodiments

To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.

Specifically, in the present application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, that A and B both exist, or that B exists alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple. The terms "first", "second", and so on in the present application are used to distinguish similar objects and do not describe a particular order or sequence.

The technical terms used in the embodiments of the present invention are explained first.

Microphone array: an audio front-end system composed of multiple microphones. The microphones are used to acquire audio, determine the direction of the sound source, and form a directional beam, thereby improving the signal-to-noise ratio of the audio signal. The multiple microphones of one microphone array are arranged on the same circuit board and are each connected to the same processing chip.

The microphones of a microphone array may be arranged on the circuit board in a triangular, rectangular, circular, or linear layout.

Beam pointing: the behavior in which a microphone array acquires only audio from a specific direction and suppresses audio from other directions.

Omnidirectional microphone: a microphone that picks up sound equally in all directions over 360 degrees.

Fig. 1 is a system architecture diagram provided by an embodiment of the present application. Referring to Fig. 1, the system architecture includes multiple concatenated microphone arrays, a group audio terminal, a server, and a terminal device. The multiple concatenated microphone arrays and the group audio terminal are located inside the meeting room; the microphones in a microphone array may be omnidirectional microphones.

The multiple concatenated microphone arrays are connected to the group audio terminal by wire or wirelessly, and the group audio terminal is connected to the server by wire or wirelessly. If the multiple concatenated microphone arrays are wired to the group audio terminal, the connection can use twisted-pair cable through an audio bus interface; the microphone arrays are likewise connected to one another by twisted-pair cable through audio bus interfaces.

Here, a wired connection between the multiple concatenated microphone arrays and the group audio terminal means that one microphone array in the chain, for example the first or the last, is wired to the group audio terminal.

Based on the above system architecture, the audio processing method of the present application is described below using specific embodiments. Fig. 2 is an interaction diagram of Embodiment 1 of the audio processing method provided by the present application. As shown in Fig. 2, the method of this embodiment may include:

Step S101: the group audio terminal obtains the audio signals sent by multiple concatenated microphone arrays.

Specifically, when a user speaks in the current meeting, each microphone array acquires the audio signal of the speaking user and determines the direction of the speaking user, that is, the direction of the sound source, from the acquired audio signal. It then steers its beam so that the beam direction matches the direction of the sound source, and continues to acquire the audio signal of the speaking user.

To facilitate subsequent processing and obtain accurate meeting minutes, each microphone array can preprocess the audio signals it acquires. Existing algorithms can be used for this preprocessing.

For example, each microphone array first performs echo cancellation on the audio signal acquired by each of its microphones, then performs spatial directional processing (using algorithms such as weighting, delaying, and summing) for directional pickup. It then performs noise suppression according to characteristics of the audio signal such as frequency, intensity, and duration, and performs dereverberation according to the spatial characteristics of the sound field, as well as processing such as nonlinear processing and signal amplification (for example, automatic gain or dynamic range adjustment).
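The weighting, delaying, and summing step mentioned above can be illustrated with a minimal delay-and-sum sketch. It is not the preprocessing algorithm of the application, which only says that existing algorithms can be used. The function name delay_and_sum, the use of whole-sample delays, and the per-channel weights are all assumptions made for illustration.

```python
from typing import List, Sequence


def delay_and_sum(
    channels: Sequence[Sequence[float]],
    delays: Sequence[int],
    weights: Sequence[float],
) -> List[float]:
    """Weight, delay, and sum the microphone channels of one array.

    delays[i] is a non-negative whole-sample delay applied to channel i
    (an assumption; practical systems often use fractional delays).
    """
    if not (len(channels) == len(delays) == len(weights)):
        raise ValueError("channels, delays and weights must have equal length")
    length = min(len(channel) for channel in channels)
    output = [0.0] * length
    for channel, delay, weight in zip(channels, delays, weights):
        for n in range(delay, length):
            # Output sample n uses sample n - delay of this channel.
            output[n] += weight * channel[n - delay]
    return output
```

Choosing the delays so that sound from the desired direction lines up across channels makes those contributions add coherently, while sound from other directions adds less coherently.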

Each microphone array sends the preprocessed audio signal to the group audio terminal, and the group audio terminal receives the audio signal sent by each microphone array. That is, the audio signal a microphone array sends to the group audio terminal is the result of the microphone array preprocessing the original audio signal it acquired.

Further, the audio signal preprocessed by a microphone array can be transmitted to the group audio terminal over twisted-pair cable through an audio interface such as an audio bus (A2B) interface. The audio bus (A2B) can not only carry digital audio signals but also provide control signals (such as Inter-Integrated Circuit control, abbreviated I2C control) and phantom power to the microphone array modules. This greatly reduces the wiring complexity of the conference microphones and provides high-fidelity, real-time transmission of digital audio signals, which improves the fidelity of the meeting audio signal so that accurate meeting minutes can be obtained.

Fig. 3 is a schematic diagram of the connection between multiple concatenated microphone arrays and the group audio terminal provided by an embodiment of the present application.

Referring to Fig. 3, microphone array A sends its preprocessed audio signal to the group audio terminal through microphone arrays B, C, and D; microphone array B sends its preprocessed audio signal to the group audio terminal through microphone arrays C and D; microphone array C sends its preprocessed audio signal to the group audio terminal through microphone array D; and microphone array D sends its preprocessed audio signal to the group audio terminal directly.
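The forwarding order described for Fig. 3 can be written out as a small sketch. The function name forwarding_paths and the string labels are hypothetical; the only thing taken from the text is that each array forwards its signal through every later array in the chain, and that the last array is wired to the group audio terminal.

```python
from typing import Dict, List


def forwarding_paths(chain: List[str], terminal: str) -> Dict[str, List[str]]:
    """Return, for each array, the hops its signal traverses to reach the terminal.

    chain lists the arrays in series order; the last one is wired to the terminal.
    """
    paths: Dict[str, List[str]] = {}
    for index, array in enumerate(chain):
        # Every later array in the chain relays the signal, then the terminal receives it.
        paths[array] = chain[index + 1:] + [terminal]
    return paths
```

For the chain A, B, C, D this gives A the hops B, C, D and then the terminal, while D reaches the terminal in a single hop.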

Because connecting the microphone arrays in series ensures that the clocks of the microphone arrays share a common source, the delay error in aligning the audio signals sent by the microphone arrays is very small after the group audio terminal obtains the audio signals sent by the multiple concatenated microphone arrays.

Step S102: the group audio terminal obtains a target audio signal according to the audio signals sent by the multiple concatenated microphone arrays.

Specifically, before the group audio terminal obtains the target audio signal according to the audio signals sent by the multiple concatenated microphone arrays, the group audio terminal receives a user-input instruction to enable the function of obtaining the target audio signal, and enables that function according to the instruction.

In a first scheme, the group audio terminal obtains the target audio signal as follows: it selects the audio signal with the largest amplitude among the audio signals sent by the multiple concatenated microphone arrays and uses that audio signal as the target audio signal.

In a second scheme, the group audio terminal obtains the target audio signal as follows: it selects the audio signal with the largest amplitude among the audio signals sent by the multiple concatenated microphone arrays and uses that audio signal as a preselected audio signal; it then performs sentence segmentation on the preselected audio signal to obtain the target audio signal.

Existing algorithms in the prior art can be used for sentence segmentation.
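The largest-amplitude selection shared by both schemes can be sketched as follows. The text does not say over what time span amplitudes are compared or how amplitude is measured, so this sketch assumes a frame-by-frame comparison using the peak absolute sample value; select_target_signal, peak_amplitude, and frame_size are hypothetical names. It also assumes the signals from the arrays are already time-aligned.

```python
from typing import List, Sequence


def peak_amplitude(frame: Sequence[float]) -> float:
    """Largest absolute sample value in a frame (0.0 for an empty frame)."""
    return max((abs(sample) for sample in frame), default=0.0)


def select_target_signal(
    array_signals: Sequence[Sequence[float]], frame_size: int
) -> List[float]:
    """Build the target signal by picking, frame by frame, the loudest array."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    if not array_signals:
        return []
    length = min(len(signal) for signal in array_signals)
    target: List[float] = []
    for start in range(0, length, frame_size):
        end = min(start + frame_size, length)
        frames = [signal[start:end] for signal in array_signals]
        # max() keeps the first array on ties, so the choice is deterministic.
        target.extend(max(frames, key=peak_amplitude))
    return target
```

Because every frame of the timeline is taken from exactly one array, the output has the same length as the aligned inputs, with no period skipped or duplicated.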

Therefore, when the delay error in aligning the audio signals sent by the microphone arrays is very small, the group audio terminal can determine the target audio signal with the largest amplitude among those audio signals relatively accurately. This also ensures the integrity of the audio signal generated during the meeting (no audio signal of any period is omitted or repeated) and its temporal continuity (that is, in the target audio signal, the preprocessed audio signal corresponding to audio acquired by a microphone array at a first time never appears after the preprocessed audio signal corresponding to audio acquired at a second time, where the first time is earlier than the second time), and in turn the accuracy of the meeting minutes finally obtained.
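The integrity and continuity properties described above can be expressed as a simple check. This is only an illustrative sketch: it assumes the target audio signal is represented as an ordered list of (start, end) sample-index pairs, which is a representation chosen here and not one given in the application.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_sample, end_sample), end exclusive


def is_complete_and_continuous(segments: List[Segment], start: int, end: int) -> bool:
    """True if the segments, in order, tile [start, end) with no gap, overlap, or reordering."""
    cursor = start
    for seg_start, seg_end in segments:
        # A gap, an overlap, or an out-of-order segment all break seg_start == cursor.
        if seg_start != cursor or seg_end <= seg_start:
            return False
        cursor = seg_end
    return cursor == end
```

A gap corresponds to an omitted period, an overlap to a repeated period, and an out-of-order segment to a loss of temporal continuity.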

Step S103: the group audio terminal sends the target audio signal to the server.

Specifically, the server may be a cloud server.

Step S104: the server obtains meeting minutes according to the target audio signal.

Specifically, the server obtains the meeting minutes according to the target audio signal by performing speech recognition and natural language processing on the target audio signal to obtain meeting minutes in text form.

If the group audio terminal obtains the target audio signal according to the first scheme in step S102, speech recognition in this embodiment may include performing sentence segmentation on the target audio signal and then performing speech recognition on the segmented target audio signal.

If the group audio terminal obtains the target audio signal according to the second scheme in step S102, speech recognition in this embodiment does not include sentence segmentation of the target audio signal; instead, speech recognition is performed directly on the target audio signal, which has already been segmented.

Existing algorithms in the prior art can be used for speech recognition and natural language processing.

In the audio processing method of this embodiment, the group audio terminal obtains a target audio signal according to the audio signals sent by multiple concatenated microphone arrays, and the server obtains meeting minutes according to the target audio signal. Connecting the microphone arrays in series ensures that the clocks of the microphone arrays share a common source. Therefore, after the group audio terminal obtains the audio signals sent by the multiple concatenated microphone arrays, the delay error in aligning the audio signals sent by the microphone arrays is very small, so the group audio terminal can determine the target audio signal with the largest amplitude among those audio signals relatively accurately. This also ensures the integrity and temporal continuity of the audio signal generated during the meeting, and in turn the accuracy of the meeting minutes finally obtained. In addition, the meeting minutes need not be produced manually, which improves the efficiency of obtaining them.

The technical solution of the method embodiment shown in Fig. 2 is described in detail below using a specific embodiment.

Fig. 4 is an interaction diagram of Embodiment 2 of the audio processing method provided by the present application. As shown in Fig. 4, the method of this embodiment may include:

Step S201: the group audio terminal obtains the audio signals sent by multiple concatenated microphone arrays.

Specifically, for the implementation of step S201, refer to step S101 in the previous embodiment.

Step S202: the group audio terminal obtains a target audio signal according to the audio signals sent by the multiple concatenated microphone arrays.

Specifically, for the implementation of step S202, refer to step S102 in the previous embodiment.

Step S203: the group audio terminal sends the target audio signal to the server.

Specifically, for the implementation of step S203, refer to step S103 in the previous embodiment.

Step S204: the group audio terminal obtains the identifiers of the participants.

Specifically, during the meeting, a user can enter the identifier of each participant through the group audio terminal, and the group audio terminal thereby obtains the identifiers of the participants. The identifier of a participant may include at least one of the following: the participant's name, the participant's job title, and the participant's position code.

Step S205: the group audio terminal sends the identifiers of the participants to the server.

Step S206: the group audio terminal obtains the historical audio signals of the participants.

Specifically, before the meeting begins, the group audio terminal can record the voices of company employees in advance to obtain their historical audio signals, and each employee's historical audio signal is stored in the server in association with the employee's identifier.

During the meeting, after the group audio terminal sends the identifier of each participant to the server, the server determines whether a historical audio signal is stored for each participant. If there is a target participant with no historical audio signal, the server sends a preset text and the identifier of the target participant to the group audio terminal. On receiving them, the group audio terminal displays the preset text. The target participant reads the preset text aloud, and the group audio terminal receives the historical audio signal generated by the target participant according to the preset text (that is, the audio signal produced when the target participant reads the preset text). The group audio terminal then sends the target participant's historical audio signal and identifier to the server, which stores them in association.
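The server-side check in this flow can be sketched as below. The storage is modeled as an in-memory dictionary from participant identifier to raw audio bytes, which is an assumption for illustration; the function names participants_needing_enrollment and store_enrollment are hypothetical.

```python
from typing import Dict, Iterable, List


def participants_needing_enrollment(
    participant_ids: Iterable[str], stored_history: Dict[str, bytes]
) -> List[str]:
    """Return the identifiers with no stored historical audio, preserving input order."""
    return [pid for pid in participant_ids if pid not in stored_history]


def store_enrollment(
    stored_history: Dict[str, bytes], participant_id: str, audio: bytes
) -> None:
    """Associate a newly recorded historical audio signal with its participant identifier."""
    stored_history[participant_id] = audio
```

In this sketch, the server would send a preset text only for the identifiers returned by the first function, and call the second once the group audio terminal returns the recorded audio.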

The preset text can be text related to the meeting, so that the obtained historical audio signal of a participant is highly similar to the audio signal produced when the participant speaks in the meeting. This improves the efficiency of subsequently determining the current audio signal of each participant, and in turn the efficiency of obtaining the meeting minutes.

For example, the server stores a correspondence between meeting topics and preset text sets; the correspondence includes multiple meeting topics and the preset text set corresponding to each meeting topic. The preset texts in the preset text set corresponding to a meeting topic are related to that topic. In this case, the group audio terminal can also obtain the current meeting topic entered by the user and send it to the server. The server determines the preset text set corresponding to the current meeting topic according to the current meeting topic and the correspondence, determines one preset text from that preset text set, and sends the preset text to the group audio terminal.
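A minimal sketch of this lookup follows. The topics and sentences in CORRESPONDENCE are invented sample data, and choosing a text at random is an assumption, since the application only says that one preset text is determined from the set.

```python
import random
from typing import Dict, List, Optional

# Hypothetical correspondence: meeting topic -> preset text set.
CORRESPONDENCE: Dict[str, List[str]] = {
    "budget review": [
        "The quarterly budget will be reviewed item by item.",
        "Please confirm the projected spending for next quarter.",
    ],
    "product launch": [
        "The launch schedule depends on the final test results.",
    ],
}


def choose_preset_text(
    topic: str,
    correspondence: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return one preset text for the topic, or None if the topic is unknown."""
    texts = correspondence.get(topic)
    if not texts:
        return None
    # Use the caller's generator when given, else the module-level one.
    chooser = rng if rng is not None else random
    return chooser.choice(texts)
```

Passing a seeded random.Random instance as rng makes the choice reproducible, which can be convenient when testing.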

Step S207: the group audio terminal sends the historical audio signals of the participants to the server.

Step S208: the server determines the current audio signal of each participant in the target audio signal according to the historical audio signals of the participants.

Specifically, the server obtains the historical audio signal of each participant stored in the server according to the received participant identifiers, compares the acoustic features of the target audio signal with those of each historical audio signal, and thereby determines the current audio signal of each participant in the target audio signal. Existing algorithms can be used for the acoustic feature comparison.
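One way to picture the acoustic feature comparison is a nearest-match assignment. The application does not name a comparison algorithm, so this sketch assumes that each segment of the target audio signal and each historical audio signal has already been reduced to a fixed-length feature vector by some existing method, and it uses cosine similarity as a stand-in measure. The names cosine_similarity and assign_segments are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_segments(
    segment_features: List[Sequence[float]],
    history_features: Dict[str, Sequence[float]],
) -> List[str]:
    """Label each segment with the participant whose historical features are most similar.

    Raises ValueError if history_features is empty.
    """
    return [
        max(history_features, key=lambda pid: cosine_similarity(feat, history_features[pid]))
        for feat in segment_features
    ]
```

The segments labeled with a given identifier would then together form that participant's current audio signal.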

Step S209: the server performs speech recognition and natural speech processing on the participant's current audio signal, and writes the processed current audio signal and the participant's identifier into the meeting minutes in correspondence with each other.

Specifically, for each participant, the server performs speech recognition and natural speech processing on that participant's current audio signal, and writes the processed current audio signal and the participant's identifier into the meeting minutes in correspondence with each other.

The processed current audio signal is the text corresponding to the current audio signal. Writing the processed current audio signal and the participant's identifier into the meeting minutes in correspondence means that the meeting minutes include the participant's identifier and the participant's speech text (speech content), with each participant's identifier corresponding to that participant's speech text.
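A minimal sketch of such minutes, assuming the speech text has already been produced by the speech recognition step, is given below. The class names MinutesEntry and MeetingMinutes, the field names, and the "identifier: text" rendering are hypothetical and are not taken from the original.

```python
from dataclasses import dataclass, field


@dataclass
class MinutesEntry:
    participant_id: str  # the participant's identifier
    speech_text: str     # text recognized from the current audio signal


@dataclass
class MeetingMinutes:
    entries: list = field(default_factory=list)

    def write(self, participant_id, speech_text):
        """Record a speech text against the identifier of its speaker."""
        self.entries.append(MinutesEntry(participant_id, speech_text))

    def render(self):
        """Return the minutes as plain text, one entry per line."""
        return "\n".join(
            f"{entry.participant_id}: {entry.speech_text}" for entry in self.entries
        )


minutes = MeetingMinutes()
minutes.write("participant-01", "Let us start with the schedule.")
minutes.write("participant-02", "The schedule looks feasible.")
rendered = minutes.render()
```

Storing the identifier next to each speech text is what lets a reader of the minutes see who said what.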

Step S210: the server sends the meeting minutes to a terminal device.

Specifically, the terminal device may be, for example, a mobile phone, a computer terminal, or the group audio terminal.

This embodiment helps ensure the accuracy and traceability of the resulting meeting minutes.

The audio processing method of the embodiments of the present application has been described above with reference to Fig. 2 to Fig. 4. The audio processing apparatus of the embodiments of the present application is described below with reference to Fig. 5 to Fig. 9.

Fig. 5 is a schematic structural diagram of Embodiment 1 of the audio processing apparatus provided by the present application. As shown in Fig. 5, the apparatus of this embodiment may include: an acquisition module 51 and a sending module 52;

The acquisition module 51 is configured to obtain audio signals sent by multiple concatenated microphone arrays;

The acquisition module 51 is further configured to obtain a target audio signal according to the audio signals sent by the multiple concatenated microphone arrays;

The sending module 52 is configured to send the target audio signal to a server, where the target audio signal is used by the server to generate meeting minutes.

Optionally, the acquisition module 51 is further configured to obtain an identifier of a participant;

The sending module 52 is further configured to send the participant's identifier to the server, where the meeting minutes include the participant's identifier.

Optionally, the acquisition module 51 is further configured to obtain the participant's historical audio signal;

The sending module 52 is further configured to send the participant's historical audio signal to the server, where the participant's historical audio signal is used to determine the participant's current audio signal in the target audio signal.

Optionally, the acquisition module 51 is specifically configured to:

The apparatus of this embodiment may be used to execute the technical solution corresponding to the group audio terminal in the foregoing method embodiments. Its implementation principle and technical effects are similar and are not described again here.
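Claim 6 of this document states that the target audio signal may be the audio signal with the largest amplitude among those sent by the microphone arrays. A minimal sketch of that selection follows. It assumes that each array's signal is a list of samples and that "amplitude" means the peak absolute sample value; both are assumptions, and the names peak_amplitude and select_target_signal are hypothetical.

```python
def peak_amplitude(samples):
    """Peak absolute sample value of one audio signal (0.0 if empty)."""
    return max((abs(s) for s in samples), default=0.0)


def select_target_signal(array_signals):
    """Return the signal with the largest peak amplitude.

    array_signals holds one list of samples per microphone array.
    """
    if not array_signals:
        raise ValueError("no audio signals received")
    return max(array_signals, key=peak_amplitude)


# Hypothetical sample data from three microphone arrays.
signals = [[0.1, -0.2, 0.05], [0.4, -0.9, 0.3], [0.0, 0.3, -0.3]]
target = select_target_signal(signals)
```

Other amplitude measures, such as the mean absolute value over a frame, could be substituted by changing only peak_amplitude.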

Fig. 6 is a schematic structural diagram of Embodiment 2 of the audio processing apparatus provided by the present application. As shown in Fig. 6, on the basis of the apparatus structure shown in Fig. 5, the apparatus of this embodiment may further include: a display module 53;

The acquisition module 51 is further configured to obtain a preset text from the server;

The display module 53 is configured to display the preset text;

The acquisition module 51 is specifically configured to obtain the historical audio signal generated by the participant according to the preset text.

Optionally, the acquisition module 51 is further configured to obtain a current conference topic before obtaining the preset text from the server;

The sending module 52 is further configured to send the current conference topic to the server, where the current conference topic is used by the server to obtain the preset text according to a correspondence, and the correspondence includes: multiple conference topics and the preset text set corresponding to each conference topic. The preset text is a text in the preset text set corresponding to the current conference topic.

Fig. 7 is a schematic structural diagram of Embodiment 3 of the audio processing apparatus provided by the present application. As shown in Fig. 7, the apparatus of this embodiment may include: a receiving module 71 and a generation module 72;

The receiving module 71 is configured to receive a target audio signal sent by a group audio terminal, where the target audio signal is obtained according to audio signals sent by multiple concatenated microphone arrays;

The generation module 72 is configured to generate meeting minutes according to the target audio signal.

Optionally, the generation module 72 is specifically configured to perform speech recognition and natural speech processing on the target audio signal to obtain the meeting minutes.

Optionally, the receiving module 71 is further configured to receive the identifier of a participant sent by the group audio terminal.

The apparatus of this embodiment may be used to execute the technical solution corresponding to the server in the foregoing method embodiments. Its implementation principle and technical effects are similar and are not described again here.

Fig. 8 is a schematic structural diagram of Embodiment 4 of the audio processing apparatus provided by the present application. As shown in Fig. 8, on the basis of the apparatus structure shown in Fig. 7, the apparatus of this embodiment may further include: a determining module 73 and a sending module 74;

The receiving module 71 is further configured to receive the participant's historical audio signal;

The determining module 73 is configured to determine the participant's current audio signal in the target audio signal according to the participant's historical audio signal;

Correspondingly, the generation module 72 is specifically configured to perform speech recognition and natural speech processing on the participant's current audio signal, and to write the processed current audio signal and the participant's identifier into the meeting minutes in correspondence with each other.

The sending module 74 is configured to send the meeting minutes to a terminal device after the meeting minutes are generated according to the target audio signal.

Optionally, the sending module 74 is further configured to send a preset text to the group audio terminal before the participant's historical audio signal is received, where the preset text is used by the group audio terminal to obtain the participant's historical audio signal.

Optionally, a correspondence is stored in the audio processing apparatus, and the correspondence includes: multiple conference topics and the preset text set corresponding to each conference topic. The receiving module 71 is further configured to receive a current conference topic from the group audio terminal before the preset text is sent to the group audio terminal;

The determining module 73 is further configured to obtain, according to the current conference topic and the correspondence, the preset text set corresponding to the current conference topic, and to determine the preset text from the preset text set corresponding to the current conference topic.

Fig. 9 is a schematic structural diagram of the electronic device provided by an embodiment of the present application. Referring to Fig. 9, the server of this embodiment includes: a processor 62, a memory 61, and a communication bus 63, where the communication bus 63 is used to connect the processor 62 and the memory 61, and the processor 62 is coupled with the memory 61;

The memory 61 is configured to store a computer program;

The processor 62 is configured to call the computer program stored in the memory 61 to implement the method of the group audio terminal or the server in the foregoing method embodiments.

The computer program may also be stored in a memory external to the electronic device.

It should be understood that, in the embodiments of the present application, the processor 62 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 61 may include read-only memory and random access memory, and provides instructions and data to the processor 62. The memory 61 may also include non-volatile random access memory. For example, the memory 61 may also store information about the device type.

The memory 61 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).

In addition to a data bus, the bus 63 may also include a power bus, a control bus, a status signal bus, and the like. For clarity of illustration, however, the various buses are all labeled as bus 63 in the figure.

An embodiment of the present application provides a readable storage medium including a program or instructions. When the program or instructions are run on a computer, the method of the group audio terminal or the server in any of the foregoing method embodiments is performed.

Those of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be completed by hardware related to program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. An audio processing method, characterized by comprising:

sending the target audio signal to a server, wherein the target audio signal is used by the server to generate meeting minutes.

2. The method according to claim 1, characterized by further comprising:

obtaining an identifier of a participant;

sending the identifier of the participant to the server, wherein the meeting minutes comprise the identifier of the participant.

3. The method according to claim 2, characterized by further comprising:

obtaining a historical audio signal of the participant;

sending the historical audio signal of the participant to the server, wherein the historical audio signal of the participant is used to determine a current audio signal of the participant in the target audio signal.

4. The method according to claim 3, characterized in that the obtaining a historical audio signal of the participant comprises:

obtaining a preset text from the server;

displaying the preset text;

5. The method according to claim 4, characterized by further comprising, before the obtaining a preset text from the server:

obtaining a current conference topic;

sending the current conference topic to the server, wherein the current conference topic is used by the server to obtain the preset text according to a correspondence, and the correspondence comprises: multiple conference topics and a preset text set corresponding to each conference topic; wherein the preset text is a text in the preset text set corresponding to the current conference topic.

6. The method according to any one of claims 1 to 5, characterized in that the obtaining a target audio signal according to the audio signals sent by the multiple concatenated microphone arrays comprises:

selecting the audio signal with the largest amplitude from the audio signals sent by the multiple concatenated microphone arrays, and using the audio signal with the largest amplitude as the target audio signal.

7. An audio processing method, characterized by comprising:

receiving a target audio signal sent by a group audio terminal, wherein the target audio signal is obtained according to audio signals sent by multiple concatenated microphone arrays;

generating meeting minutes according to the target audio signal.

8. The method according to claim 7, characterized in that the generating meeting minutes according to the target audio signal comprises:

9. The method according to claim 7 or 8, characterized by further comprising:

receiving an identifier of a participant sent by the group audio terminal.

10. The method according to claim 9, characterized by further comprising:

receiving a historical audio signal of the participant;

determining a current audio signal of the participant in the target audio signal according to the historical audio signal of the participant;

correspondingly, the performing speech recognition and natural speech processing on the target audio signal to obtain the meeting minutes comprises:

performing speech recognition and natural speech processing on the current audio signal of the participant, and writing the processed current audio signal and the identifier of the participant into the meeting minutes in correspondence with each other.

11. The method according to claim 7 or 8, characterized by further comprising, after the generating meeting minutes according to the target audio signal:

sending the meeting minutes to a terminal device.

12. The method according to claim 10, characterized by further comprising, before the receiving a historical audio signal of the participant:

sending a preset text to the group audio terminal, wherein the preset text is used by the group audio terminal to obtain the historical audio signal of the participant.

13. The method according to claim 12, characterized in that a correspondence is stored, the correspondence comprising: multiple conference topics and a preset text set corresponding to each conference topic; and the method further comprises, before the sending a preset text to the group audio terminal:

receiving a current conference topic from the group audio terminal;

14. An audio processing apparatus, characterized by comprising:

the acquisition module, configured to obtain a target audio signal according to the audio signals sent by the multiple concatenated microphone arrays;

a sending module, configured to send the target audio signal to a server, wherein the target audio signal is used by the server to generate meeting minutes.

15. An audio processing apparatus, characterized by comprising:

a receiving module, configured to receive a target audio signal sent by a group audio terminal, wherein the target audio signal is obtained according to audio signals sent by multiple concatenated microphone arrays;

a generation module, configured to generate meeting minutes according to the target audio signal.

16. A readable storage medium, characterized by comprising a program or instructions, wherein when the program or instructions are run on a computer, the method according to any one of claims 1 to 6 or 7 to 13 is performed.

17. An electronic device, characterized by comprising: a processor, the processor being coupled with a memory;

the memory is configured to store a computer program;

the processor is configured to call the computer program stored in the memory to implement the method according to any one of claims 1 to 6 or 7 to 13.