
CN120496534A - A method for generating a real-time summary framework for a meeting and related equipment - Google Patents

A method for generating a real-time summary framework for a meeting and related equipment

Info

Publication number
CN120496534A
CN120496534A (Application CN202510781138.8A)
Authority
CN
China
Prior art keywords
module
transmitting
real
current voice
text data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510781138.8A
Other languages
Chinese (zh)
Inventor
鲁骏
李文轩
方亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Dingchuangzhi Technology Co ltd
Original Assignee
Guangdong Dingchuangzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Dingchuangzhi Technology Co ltd filed Critical Guangdong Dingchuangzhi Technology Co ltd
Priority to CN202510781138.8A priority Critical patent/CN120496534A/en
Publication of CN120496534A publication Critical patent/CN120496534A/en
Pending legal-status Critical Current

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The embodiment of the application belongs to the technical field of microphone systems and relates to a method for generating a real-time conference summary framework and related equipment. The method comprises the following steps: performing real-time transcription on a current voice signal through an ASR module to obtain current voice text data; sending the current voice text data from a transmitter-side transceiver module to a receiver-side transceiver module; performing a structuring operation on the current voice text data received by the receiver-side transceiver module through an LLM module to obtain a hierarchical logic framework; performing a visualization operation on the hierarchical logic framework through a graphics engine to obtain visualized data; and transmitting the visualized data through a connection module to a display device for display. The application significantly improves collaborative efficiency and decision accuracy.

Description

Method for generating a real-time conference summary framework and related equipment
Technical Field
The application relates to the technical field of microphone systems, and in particular to a method for generating a real-time conference summary framework and related equipment.
Background
In the traditional conference recording mode, conference content must be manually organized into a meeting summary after the conference ends. This mode suffers from information lag, easily missed key information, and low efficiency. Meanwhile, in multi-person conferences or high-concurrency scenarios, transmitting the original audio data places high demands on network bandwidth and is prone to stuttering or interruption, which affects the accuracy and timeliness of the conference record.
Therefore, the traditional meeting summary generation method suffers from poor real-time performance and low accuracy.
Disclosure of Invention
The embodiment of the application aims to provide a method for generating a real-time conference summary framework and related equipment, so as to solve the problems of poor real-time performance and low accuracy in traditional meeting summary generation.
In order to solve the technical problems described above, an embodiment of the present application provides a method for generating a real-time conference summary framework. The method is applied to a wireless microphone comprising a transmitter and a receiver; the transmitter includes a recording module, an ASR module, and a transmitter-side transceiver module, and the receiver includes a receiver-side transceiver module, an LLM module, a graphics engine, and a connection module. The method adopts the following technical scheme:
collecting current voice signals of participants in real time with the recording module;
performing real-time transcription on the current voice signal with the ASR module to obtain current voice text data;
sending the current voice text data from the transmitter-side transceiver module to the receiver-side transceiver module;
performing a structuring operation on the current voice text data received by the receiver-side transceiver module with the LLM module to obtain a hierarchical logic framework;
performing a visualization operation on the hierarchical logic framework with the graphics engine to obtain visualized data;
and transmitting the visualized data through the connection module to a display device for display.
Further, the step of performing real-time transcription on the current voice signal with the ASR module to obtain current voice text data specifically includes the following steps:
caching the original audio clip corresponding to the current voice text data in a transmitter-side database, and constructing an index identifier corresponding to the current voice text data.
Further, after the step of caching the original audio clip corresponding to the current voice text data in the transmitter-side database and constructing the index identifier corresponding to the current voice text data, the method further comprises the following steps:
performing an audio cleanup operation on the original audio clips stored in the transmitter-side database according to a preset cleanup strategy.
Further, the step of performing the visualization operation on the hierarchical logic framework with the graphics engine to obtain visualized data specifically includes the following steps:
determining whether a historical hierarchical logic framework exists;
and if a historical hierarchical logic framework exists, comparing the current hierarchical logic framework against it to obtain the differing content, and rendering the differing content on top of the historical visualized data to obtain the visualized data.
Further, after the step of performing the visualization operation on the hierarchical logic framework with the graphics engine to obtain the visualized data, the method further includes the following steps:
adding a backtracking trigger button at each title node of the visualized data, wherein the backtracking trigger button is used to obtain the index identification information corresponding to that title node.
Further, after the step of transmitting the visualized data through the connection module to the display device for display, the method further comprises the following steps:
when a user clicks the backtracking trigger button, obtaining the index identification information corresponding to the title node and sending an audio request instruction carrying the index identification information to the transmitter through the receiver;
the transmitter retrieves the audio clip corresponding to the index identification information from the transmitter-side database and transmits it to the receiver;
and the receiver outputs the audio clip through a speaker of the display device.
In order to solve the above technical problems, an embodiment of the application also provides a system for generating a real-time conference summary framework, which adopts the following technical scheme:
the system comprises a transmitter and a receiver; the transmitter includes a recording module, an ASR module, and a transmitter-side transceiver module, and the receiver includes a receiver-side transceiver module, an LLM module, a graphics engine, and a connection module, wherein:
the recording module is used to collect current voice signals of the participants in real time;
the ASR module is used to transcribe the current voice signal in real time to obtain current voice text data;
the transmitter-side transceiver module is used to send the current voice text data to the receiver-side transceiver module;
the LLM module is used to perform a structuring operation on the current voice text data received by the receiver-side transceiver module to obtain a hierarchical logic framework;
the graphics engine is used to perform a visualization operation on the hierarchical logic framework to obtain visualized data;
and the connection module is used to transmit the visualized data to a display device for display.
Further, the ASR module includes:
a voice text caching sub-module, used to cache the original audio clip corresponding to the current voice text data in a transmitter-side database and to construct an index identifier corresponding to the current voice text data.
In order to solve the above technical problems, an embodiment of the present application further provides a computer device, which adopts the following technical scheme:
the computer device comprises a memory and a processor, wherein the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the steps of the method for generating a real-time conference summary framework described above.
In order to solve the above technical problems, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical scheme:
the computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the method for generating a real-time conference summary framework described above.
The application provides a method for generating a real-time conference summary framework, comprising: collecting current voice signals of participants in real time with a recording module; transcribing the current voice signals in real time with an ASR module to obtain current voice text data; sending the current voice text data from a transmitter-side transceiver module to a receiver-side transceiver module; performing a structuring operation on the received current voice text data with an LLM module to obtain a hierarchical logic framework; performing a visualization operation on the hierarchical logic framework with a graphics engine to obtain visualized data; and transmitting the visualized data through a connection module to a display device for display. Compared with the prior art, by processing the conference speech content in real time and dynamically generating a visual framework, participants can see the structured thread of discussion and the core conclusions during the discussion itself, without waiting for post-conference manual organization; errors and omissions can be corrected immediately, historical remarks can be replayed at any time to keep information synchronized, and the dynamic visual presentation of the logic helps the conference stay focused on the key issues, so that collaborative efficiency and decision accuracy are significantly improved.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description cover only some embodiments of the application, and that a person of ordinary skill in the art could derive other drawings from them without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
fig. 2 is a flowchart of an implementation of a method for generating a real-time conference summary framework according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a conference real-time summary frame generating system according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms used in the description are for describing particular embodiments only and are not intended to limit the application. The terms "comprising" and "having", and any variations thereof, in the description, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the description, the claims, or the above drawings are used to distinguish between different objects and not necessarily to describe a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include a terminal device 101, a network 102, and a server 103, where the terminal device 101 may be a notebook 1011, a tablet 1012, or a cell phone 1013. Network 102 is the medium used to provide communication links between terminal device 101 and server 103. Network 102 may include various connection types such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 103 via the network 102 using the terminal device 101 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal device 101.
The terminal device 101 may be any of various electronic devices having a display screen and supporting web browsing; in addition to the notebook 1011, the tablet 1012, or the mobile phone 1013, it may be an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop portable computer, a desktop computer, and the like.
The server 103 may be a server providing various services, such as a background server providing support for pages displayed on the terminal device 101.
It should be noted that, the method for generating the meeting real-time summary frame provided by the embodiment of the application is generally executed by a server/terminal device, and accordingly, the meeting real-time summary frame generating system is generally arranged in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flowchart of one embodiment of a method for generating a real-time conference summary framework according to the present application is shown. The method comprises steps S201 through S206.
In step S201, current voice signals of participants are collected in real time according to a recording module;
in step S202, performing real-time transcription operation on the current speech signal according to the ASR module to obtain current speech text data;
in step S203, current voice text data is sent to the receiving-end transceiver module according to the transmitting-end transceiver module;
In step S204, performing a structuring processing operation on the current voice text data received by the receiving-end transceiver module according to the LLM module to obtain a hierarchical logic frame;
In step S205, performing a visualization processing operation on the hierarchical logic frame according to the graphics engine to obtain visualized data;
In step S206, the visualized data is transmitted to the display device according to the connection module for display.
In the embodiment of the application, the wireless microphone comprises a transmitter and a receiver. An ASR model is loaded on the processing module of the transmitter; it can be integrated on the original chip, or an additional SoC-level chip can be provided for dedicated processing. The LLM model is integrated on the processing module of the receiver. In use, the receiver is plugged into a display device. When the conference starts, the voice signals of the participants are converted into text data directly on the transmitter chip by the on-device ASR model; the transmitter chip wirelessly sends the text data to the receiver, and the receiver converts the text data into visual structured data in real time through the LLM model and transmits it to the display device, thereby realizing the visual real-time conference summary described in this scheme.
In the embodiment of the application, the implementation process of the visual real-time conference summary can be as follows:
Step 1: voice acquisition and preprocessing
The wireless microphone transmitter has a built-in high-sensitivity microphone array that collects the voice signals of the participants in real time.
Environmental noise (keyboard sounds, air-conditioning sounds) is filtered out by a hardware noise-reduction module (such as a DSP chip), enhancing the clarity of the human voice.
Step 2: on-device real-time ASR transcription
An ASR chip built into the transmitter (such as an SoC with an integrated NPU) converts the preprocessed voice stream into text data in real time.
The text data carries a timestamp and a speaker ID (different speakers are distinguished by voiceprint recognition) and is cached in segments (e.g., one segment every 5 seconds).
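The segment record described in Step 2 can be sketched as follows. This is an illustrative assumption, not the patent's implementation: the `TextSegment` fields and the 5-second bucketing rule are taken from the text, while the function and field names are hypothetical.

```python
from dataclasses import dataclass

SEGMENT_SECONDS = 5  # "one segment every 5 seconds", per the example above

@dataclass
class TextSegment:
    start_ts: float   # segment start time (epoch seconds)
    speaker_id: str   # from voiceprint recognition (assumed upstream)
    text: str         # ASR output for this segment

def segment_stream(words, t0, speaker_id):
    """Group (offset_seconds, word) pairs into 5-second TextSegments."""
    buckets = {}
    for offset, word in words:
        idx = int(offset // SEGMENT_SECONDS)
        buckets.setdefault(idx, []).append(word)
    return [
        TextSegment(start_ts=t0 + idx * SEGMENT_SECONDS,
                    speaker_id=speaker_id,
                    text=" ".join(ws))
        for idx, ws in sorted(buckets.items())
    ]

# Words at 0.4s and 2.1s fall in segment 0; words at 6.0s and 7.5s in segment 1.
segs = segment_stream([(0.4, "hello"), (2.1, "team"), (6.0, "next"), (7.5, "item")],
                      t0=1000.0, speaker_id="spk_a")
```

A real implementation would attach word-level timestamps from the ASR chip; this sketch only shows the per-segment grouping.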
Step 3: text data transmission
The transmitter sends the encrypted text data to the receiver via a low-power wireless protocol (e.g., Bluetooth 5.3 or Wi-Fi Direct); a wired connection may also be used.
The transmitted content contains only the structured text, not the original audio, which saves bandwidth and improves privacy.
Step 4: LLM structuring on the receiver
The receiver has a built-in LLM model; after receiving the text data, it generates a hierarchical logic framework in combination with a context buffer (historical speech content).
The output structure is dynamically adjusted according to preset rules (such as a "meeting summary template" or "mind-map node types"), for example:
identifying a decision point and automatically marking it as a red node;
extracting "task allocation" items to generate sub-branches and associating the responsible persons.
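A minimal sketch of what the hierarchical logic framework might look like as structured data. The node fields ("type", "color", "owner") are assumptions for illustration; the patent only specifies that decision points are marked red and that task branches carry a responsible person.

```python
import json

framework = {
    "title": "Q3 product planning",
    "children": [
        {
            "title": "Release date moved to Oct 15",
            "type": "decision",
            "color": "red",          # decision points auto-marked red
            "children": [],
        },
        {
            "title": "Task allocation",
            "type": "tasks",
            "children": [
                {"title": "Prepare demo", "owner": "Li", "children": []},
            ],
        },
    ],
}

def decision_nodes(node):
    """Collect all decision-point titles in the tree, depth-first."""
    found = [node["title"]] if node.get("type") == "decision" else []
    for child in node.get("children", []):
        found += decision_nodes(child)
    return found

payload = json.dumps(framework)  # what the graphics engine would receive
```

Serializing the tree as JSON matches the "JSON tree" mentioned in Step 5 below.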
Step 5: visualization data generation and rendering
The structured data (e.g., a JSON tree) output by the LLM is converted into a visualization format by the receiver's graphics engine (a built-in lightweight rendering library).
The data is transmitted to the display device through an HDMI/USB-C interface and rendered in real time as an interactive mind map or outline summary, supporting the following functions:
node expand/collapse: click a parent node to view its child content;
timeline backtracking: slide to view historical discussion nodes.
The embodiment of the application provides a method for generating a real-time conference summary framework, comprising: collecting current voice signals of participants in real time with a recording module; transcribing the current voice signals in real time with an ASR module to obtain current voice text data; sending the current voice text data from a transmitter-side transceiver module to a receiver-side transceiver module; performing a structuring operation on the received current voice text data with the LLM module to obtain a hierarchical logic framework; performing a visualization operation on the hierarchical logic framework with a graphics engine to obtain visualized data; and transmitting the visualized data through a connection module to a display device for display. Compared with the prior art, by processing the conference speech content in real time and dynamically generating a visual framework, participants can see the structured thread of discussion and the core conclusions during the discussion itself, without waiting for post-conference manual organization; errors and omissions can be corrected immediately, historical remarks can be replayed at any time to keep information synchronized, and the dynamic visual presentation of the logic helps the conference stay focused on the key issues, so that collaborative efficiency and decision accuracy are significantly improved.
In some optional implementations of the embodiments of the present application, the step of transcribing the current voice signal in real time with the ASR module to obtain current voice text data specifically includes the following steps:
caching the original audio clip corresponding to the current voice text data in a transmitter-side database, and constructing an index identifier corresponding to the current voice text data.
In an embodiment of the application, while converting speech to text, the transmitter caches the original audio clips in local storage by timestamp (e.g., one clip every 5 seconds) and creates a unique index ID (e.g., timestamp + speaker ID) together with the generated text data. As an example, the text node "AI helper module" is associated with the audio index 20231105_1430_a.
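The index ID construction can be sketched as below. The "YYYYMMDD_HHMM_speaker" format is inferred from the single example in the text (20231105_1430_a); the real scheme is not specified beyond "timestamp + speaker ID", and the in-memory dict stands in for the transmitter-side database.

```python
import datetime

audio_cache = {}  # index_id -> raw audio bytes (stand-in for the transmitter DB)

def make_index_id(ts, speaker_id):
    """Build a unique index ID from the clip timestamp and speaker ID."""
    return ts.strftime("%Y%m%d_%H%M") + "_" + speaker_id

def cache_clip(ts, speaker_id, audio_bytes):
    """Cache one raw audio clip and return its index ID."""
    index_id = make_index_id(ts, speaker_id)
    audio_cache[index_id] = audio_bytes
    return index_id

idx = cache_clip(datetime.datetime(2023, 11, 5, 14, 30), "a", b"\x00\x01")
```

The same ID would then be attached to the corresponding text node so the receiver can request the clip later.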
In some optional implementations of the embodiments of the present application, after the step of caching the original audio clip corresponding to the current voice text data in the transmitter-side database and constructing the index identifier corresponding to the current voice text data, the method further includes the following steps:
performing an audio cleanup operation on the original audio clips stored in the transmitter-side database according to a preset cleanup strategy.
In the embodiment of the application, the transmitter dynamically cleans up old audio clips according to the storage capacity (for example, retaining only the data of the last 2 hours) to avoid memory overflow. Specifically, the cleanup strategy may be configured to retain clips by time or by important-node markers.
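A hedged sketch of that cleanup strategy: drop cached clips older than a retention window unless they are pinned by an important-node marker. The function shape and the `pinned` set are illustrative assumptions; only the 2-hour window comes from the text.

```python
RETENTION_SECONDS = 2 * 3600  # keep the last 2 hours, per the example above

def clean_cache(clips, now, pinned=frozenset()):
    """clips: {index_id: (created_ts, audio_bytes)} -> cleaned dict.

    Keeps a clip if it is pinned by an important-node marker or still
    inside the retention window; everything else is discarded.
    """
    return {
        idx: entry
        for idx, entry in clips.items()
        if idx in pinned or now - entry[0] <= RETENTION_SECONDS
    }

clips = {
    "old":    (0.0,    b"a"),   # far outside the 2h window, not pinned
    "recent": (9000.0, b"b"),   # inside the window
    "marked": (0.0,    b"c"),   # outside the window but pinned
}
kept = clean_cache(clips, now=10000.0, pinned={"marked"})
```

In a real device the cleanup would run periodically or on a storage-pressure trigger.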
In some optional implementations of the embodiments of the present application, the step of performing the visualization operation on the hierarchical logic framework with the graphics engine to obtain the visualized data specifically includes the following steps:
determining whether a historical hierarchical logic framework exists;
and if a historical hierarchical logic framework exists, comparing the current hierarchical logic framework against it to obtain the differing content, and rendering the differing content on top of the historical visualized data to obtain the visualized data.
In the embodiment of the application, new speech content triggers an incremental update of the mind map, rendering only the changed parts (such as a new branch or a modified task state). The user can make real-time corrections (e.g., merging duplicate nodes, adjusting priorities) through an external touch screen or the receiver's physical keys.
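The incremental update can be sketched as a diff between the historical and current frameworks, so the engine re-renders only the changed nodes. This is a simplification under a stated assumption: node paths built from titles stand in for stable node IDs, which a real engine would need to track moves and renames.

```python
def flatten(node, path=""):
    """Map each node's path (built from titles) to the node itself."""
    here = path + "/" + node["title"]
    out = {here: node}
    for child in node.get("children", []):
        out.update(flatten(child, here))
    return out

def diff_frameworks(old, new):
    """Return node paths added and removed between two framework trees."""
    old_paths, new_paths = set(flatten(old)), set(flatten(new))
    return {"added": sorted(new_paths - old_paths),
            "removed": sorted(old_paths - new_paths)}

history = {"title": "Meeting", "children": [{"title": "Topic A", "children": []}]}
current = {"title": "Meeting", "children": [
    {"title": "Topic A", "children": []},
    {"title": "Topic B", "children": []},   # newly spoken content
]}
delta = diff_frameworks(history, current)
```

Only the nodes in `delta` would be repainted on top of the historical visualized data.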
In some optional implementations of the embodiments of the present application, after the step of performing the visualization operation on the hierarchical logic framework with the graphics engine to obtain the visualized data, the method further includes the following steps:
adding a backtracking trigger button at each title node of the visualized data, wherein the backtracking trigger button is used to obtain the index identification information corresponding to that title node.
In the embodiment of the application, in the visual structured data generated by the receiver, a backtracking icon (such as a speaker symbol) is automatically added after each title node. When the user clicks the icon, the front end triggers a callback event to obtain the audio index ID associated with that node.
In some optional implementations of the embodiments of the present application, after the step of transmitting the visualized data to the display device for display through the connection module, the method further includes the following steps:
when the user clicks the backtracking trigger button, obtaining the index identification information corresponding to the title node, and sending an audio request instruction carrying the index identification information to the transmitter through the receiver;
the transmitter retrieves the audio clip corresponding to the index identification information from the transmitter-side database and transmits it to the receiver;
and the receiver outputs the audio clip through a speaker of the display device.
In the embodiment of the application, when the user clicks the icon, the front end triggers a callback event to obtain the audio index ID associated with the node. The receiver then sends an audio request instruction to the transmitter over a wireless link (such as Bluetooth); the instruction includes the target index ID and a request type (such as "play"). The transmitter retrieves the corresponding audio clip from its local cache according to the index ID, and returns an error code if the clip has expired (e.g., has exceeded the storage duration). The audio stream is transmitted to the receiver in a compressed format (such as OPUS encoding) over a high-priority wireless channel (such as Wi-Fi QoS). The receiver's built-in audio decoding module converts the compressed audio into PCM waveform data and outputs it through the speaker of the display device.
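The request/response exchange described above can be sketched as follows. The field names (`index_id`, `type`, `status`) and the error code are illustrative assumptions; the patent only specifies that the request carries a target index ID and a request type, and that an expired clip yields an error code.

```python
def handle_audio_request(request, cache):
    """Transmitter-side handler: request {"index_id": str, "type": "play"}."""
    clip = cache.get(request["index_id"])
    if clip is None:
        # Clip was cleaned up (e.g., older than the retention window).
        return {"status": "error", "code": "CLIP_EXPIRED"}
    # In practice the clip would be sent as compressed (e.g., OPUS) audio.
    return {"status": "ok", "audio": clip}

cache = {"20231105_1430_a": b"opus-bytes"}
ok = handle_audio_request({"index_id": "20231105_1430_a", "type": "play"}, cache)
gone = handle_audio_request({"index_id": "20231105_1200_b", "type": "play"}, cache)
```

The receiver would decode the returned bytes to PCM and route them to the display device's speaker.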
In some optional implementations of the embodiments of the present application, when an audio clip is played, the visual interface synchronously highlights the associated text node (e.g., the node frame blinks), enhancing feedback.
In summary, by processing conference speech content in real time and dynamically generating a visual framework, the method and system allow participants to see the structured thread of discussion and the core conclusions during the discussion itself. Errors and omissions can be corrected in real time without waiting for post-conference manual organization, historical remarks can be replayed at any time to keep information synchronized, and the dynamic visual presentation of the logic helps the conference stay focused on the key issues. Collaborative efficiency and decision accuracy are thereby significantly improved, solving the problem of repeated discussion and conclusion drift caused by information lag in traditional schemes. In addition, through a three-part design of a streaming processing pipeline, incremental framework updates, and interactive visualization, the system can summarize and dynamically visualize conference content in real time without manual intervention. The technical key is to balance the depth of semantic understanding against real-time requirements; in the future, the accuracy of logical reasoning (content pre-reading) could be further improved by combining a Neuro-Symbolic AI system. For example, when microphone product planning is mentioned, the AI model could recognize the keywords and display related content such as recent microphone product quotations. Real-time dynamic summarization is realized through streaming data processing and incremental framework updates, achieving instant structuring and visualization of conference content. The core advantage is that lagging information processing is transformed into a dynamic cognitive aid that accompanies the conference, directly improving collaborative efficiency and decision quality.
Transmitting text data in real time from the end-side microphone significantly optimizes transmission efficiency and timeliness. The on-device ASR model converts voice into text in real time and transmits only the text content; compared with the original audio data (e.g., PCM/WAV format), this reduces the data volume by tens to hundreds of times, greatly relieving the bandwidth pressure on the wireless channel and ensuring smooth transmission in high-concurrency scenarios (such as multi-person conferences). Text transmission also tolerates network fluctuation better: real-time updates can be maintained even with unstable signals or limited bandwidth (such as mobile conferences or remote areas), avoiding loss of conference information due to audio-stream stuttering or interruption. Privacy and security are improved as well: the original voice data remains on the end device throughout, and only the desensitized text content is transmitted outward, avoiding sensitive-information leakage at the source and meeting the compliance requirements of industries such as finance and healthcare. Interaction convenience and system intelligence are also improved: the user can trigger an audio backtracking request by clicking the icon associated with a visual node, and the system precisely locates and promptly replays the corresponding clip on demand, instead of continuously streaming audio and occupying channel and storage resources.
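A back-of-the-envelope check of the bandwidth claim, comparing 16 kHz / 16-bit mono PCM against plain UTF-8 text for the same 5-second segment. The audio format and the per-segment text size are rough assumptions for illustration; the patent itself only claims a reduction of "tens to hundreds of times".

```python
PCM_BYTES_PER_SEC = 16_000 * 2      # 16 kHz sample rate, 16-bit mono samples
SEGMENT_SECONDS = 5
TEXT_BYTES_PER_SEGMENT = 200        # ~a few dozen words of UTF-8 text (assumed)

pcm_bytes = PCM_BYTES_PER_SEC * SEGMENT_SECONDS   # raw audio for one segment
ratio = pcm_bytes / TEXT_BYTES_PER_SEGMENT        # audio-to-text size ratio
```

Under these assumptions one 5-second segment is 160,000 bytes of raw PCM versus roughly 200 bytes of text, which is consistent with the "tens to hundreds of times" reduction stated above.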
Meanwhile, the end-side audio cache, combined with an automatic cleanup mechanism (e.g., retaining the content of the last 2 hours), saves storage space as far as possible while ensuring the availability of the function. Multi-modal interaction is also fused: when audio is played, the visual interface synchronously highlights the associated text node, and operations such as dragging the progress bar and double-speed playback are supported, forming a closed-loop experience of text positioning, audio verification, and focus linkage that enhances the reliability and auditability of the conference record.
The embodiment of the application can acquire and process the related data based on the artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by computer readable instructions stored in a computer readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or a volatile storage medium such as a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise a plurality of sub-steps or stages that are not necessarily completed at the same moment but may be executed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps, or with at least a part of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a system for generating a real-time summary frame of a meeting, where an embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 3, the conference real-time summary frame generation system 200 according to the embodiment of the present application includes:
The transmitter 210 and the receiver 220, the transmitter 210 is composed of a sound recording module 211, an ASR module 212 and a transmitting-end transceiver module 213, the receiver 220 is composed of a receiving-end transceiver module 221, an LLM module 222, a graphic engine 223 and a connection module 224, wherein:
the recording module 211 is used for collecting current voice signals of the participants in real time;
The ASR module 212 is configured to perform a real-time transcription operation on the current speech signal to obtain current speech text data;
a transmitting-end transceiver module 213 for transmitting the current voice text data to the receiving-end transceiver module 221;
The LLM module 222 is configured to perform a structuring processing operation on the current voice text data received by the receiving-end transceiver module 221, so as to obtain a hierarchical logic framework;
the graphic engine 223 is used for performing visualization processing operation on the hierarchical logic framework to obtain visualization data;
the connection module 224 is configured to transmit the visual data to the display device for display.
In the embodiment of the application, a conference real-time summary frame generating system 200 is provided, which comprises a transmitter 210 and a receiver 220. The transmitter 210 comprises a recording module 211, an ASR module 212 and a transmitting-end transceiver module 213; the receiver 220 comprises a receiving-end transceiver module 221, an LLM module 222, a graphics engine 223 and a connection module 224. The recording module 211 collects the current voice signals of the participants in real time; the ASR module 212 performs a real-time transcription operation on the current voice signal to obtain current voice text data; the transmitting-end transceiver module 213 sends the current voice text data to the receiving-end transceiver module 221; the LLM module 222 performs a structuring processing operation on the received current voice text data to obtain a hierarchical logic framework; the graphics engine 223 performs a visualization processing operation on the hierarchical logic framework to obtain visualized data; and the connection module 224 transmits the visualized data to a display device for presentation. Compared with the prior art, by processing the conference speech content in real time and dynamically generating a visual framework, participants can intuitively see the structured discussion context and core conclusions during the discussion itself without waiting for post-conference manual collation, so that errors can be corrected and omissions supplemented immediately, historical speech can be traced back at any time to keep information synchronized, and the dynamic visual presentation of logic guides the conference to focus on key issues, remarkably improving collaborative efficiency and decision accuracy.
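The receiver-side data flow described above can be sketched as follows. The `build_framework` and `render_outline` functions are simplified stand-ins for the LLM module and the graphics engine respectively, and the dict-of-lists representation of the hierarchical logic framework is an assumption; the application does not fix a concrete data format:

```python
# Hypothetical sketch of the receiver-side pipeline: structure transcript
# utterances into a hierarchical logic framework, then render it.
def build_framework(tagged_utterances):
    """Stand-in for the LLM module: group utterances under their topics."""
    framework = {}
    for topic, text in tagged_utterances:
        framework.setdefault(topic, []).append(text)
    return framework


def render_outline(framework):
    """Stand-in for the graphics engine: emit an indented text outline."""
    lines = []
    for topic, points in framework.items():
        lines.append(topic)
        lines.extend(f"  - {p}" for p in points)
    return "\n".join(lines)


outline = render_outline(build_framework([
    ("Budget", "Q3 spend is over plan"),
    ("Budget", "freeze new hires"),
    ("Schedule", "ship date moves to October"),
]))
print(outline)
```

In the real system the topics would come from the LLM's structuring prompt and the outline would be rendered as interactive graphics, but the topology of the data flow is the same.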
In some optional implementations of embodiments of the present application, the ASR module includes:
And the voice text caching sub-module is used for caching the original audio fragment corresponding to the current voice text data to the sender database and constructing an index identifier corresponding to the current voice text data.
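One possible shape for such an index identifier is sketched below; the scheme (session id, sequence number, and a short digest of the text) and the `sender_db` dictionary standing in for the transmitting-end database are hypothetical, since the application does not specify a concrete format:

```python
import hashlib

sender_db = {}  # index identifier -> original audio fragment (stand-in database)


def make_index_id(session_id: str, seq_no: int, text: str) -> str:
    """Build a stable identifier linking a transcript segment to its cached
    audio fragment. The scheme is hypothetical, not from the application."""
    digest = hashlib.sha1(f"{session_id}:{seq_no}:{text}".encode("utf-8")).hexdigest()
    return f"{session_id}-{seq_no:06d}-{digest[:8]}"


def cache_fragment(session_id: str, seq_no: int, text: str, pcm_bytes: bytes) -> str:
    """Cache the original audio fragment under its index identifier."""
    idx = make_index_id(session_id, seq_no, text)
    sender_db[idx] = pcm_bytes
    return idx
```

Because the identifier is deterministic, the receiver can carry it in an audio request instruction and the transmitter can look the fragment up without any extra negotiation.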
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to an embodiment of the present application.
The computer device 300 includes a memory 310, a processor 320, and a network interface 330 communicatively coupled to each other via a system bus. It should be noted that only a computer device 300 having components 310-330 is shown in the figure, but it should be understood that not all of the illustrated components need be implemented, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 310 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 310 may be an internal storage unit of the computer device 300, such as a hard disk or memory of the computer device 300. In other embodiments, the memory 310 may also be an external storage device of the computer device 300, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 300. Of course, the memory 310 may also include both an internal storage unit and an external storage device of the computer device 300. In the embodiment of the present application, the memory 310 is generally used to store an operating system and various application software installed on the computer device 300, such as computer readable instructions of the conference real-time summary frame generation method. In addition, the memory 310 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 320 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 320 is generally used to control the overall operation of the computer device 300. In an embodiment of the present application, the processor 320 is configured to execute computer readable instructions stored in the memory 310 or process data, for example, execute computer readable instructions of the method for generating a real-time summary frame of a meeting.
The network interface 330 may include a wireless network interface or a wired network interface, the network interface 330 typically being used to establish communication connections between the computer device 300 and other electronic devices.
According to the computer device provided by the application, by processing the conference speech content in real time and dynamically generating a visual framework, participants can intuitively see the structured discussion context and core conclusions during the discussion itself without waiting for post-conference manual collation, so that errors can be corrected and omissions supplemented immediately, historical speech can be traced back at any time to keep information synchronized, and the dynamic visual presentation of logic guides the conference to focus on key issues, remarkably improving collaborative efficiency and decision accuracy.
The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the method for generating a meeting real-time summary frame as described above.
The computer readable storage medium provided by the application, by processing the conference speech content in real time and dynamically generating a visual framework, enables participants to intuitively see the structured discussion context and core conclusions during the discussion itself without waiting for post-conference manual collation, so that errors can be corrected and omissions supplemented immediately, historical speech can be traced back at any time to keep information synchronized, and the dynamic visual presentation of logic guides the conference to focus on key issues, remarkably improving collaborative efficiency and decision accuracy.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
It is apparent that the above-described embodiments are only some embodiments of the present application, not all of them; the preferred embodiments of the present application are shown in the drawings, which do not limit the scope of the claims. This application may be embodied in many different forms; these embodiments are provided so that the present disclosure will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. All equivalent structures made using the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Claims (10)

1. A conference real-time summary frame generation method, characterized in that the method is applied to a wireless microphone, wherein the wireless microphone comprises a transmitter and a receiver, the transmitter comprises a recording module, an ASR module and a transmitting-end transceiver module, and the receiver comprises a receiving-end transceiver module, an LLM module, a graphics engine and a connection module, and the method comprises the following steps:
collecting current voice signals of participants in real time according to the recording module;
performing a real-time transcription operation on the current voice signal according to the ASR module to obtain current voice text data;
transmitting the current voice text data to the receiving-end transceiver module according to the transmitting-end transceiver module;
performing a structuring processing operation on the current voice text data received by the receiving-end transceiver module according to the LLM module to obtain a hierarchical logic framework;
performing a visualization processing operation on the hierarchical logic framework according to the graphics engine to obtain visualized data; and
transmitting the visualized data to a display device for display according to the connection module.
2. The conference real-time summary frame generation method according to claim 1, wherein the step of performing a real-time transcription operation on the current voice signal according to the ASR module to obtain current voice text data specifically comprises the following step:
caching the original audio fragment corresponding to the current voice text data in a transmitting-end database, and constructing an index identifier corresponding to the current voice text data.
3. The conference real-time summary frame generation method according to claim 2, further comprising, after the step of caching the original audio fragment corresponding to the current voice text data in the transmitting-end database and constructing the index identifier corresponding to the current voice text data, the step of:
performing an audio cleaning operation on the original audio fragments stored in the transmitting-end database according to a preset cleaning strategy.
4. The method for generating a real-time summary frame of a conference according to claim 1, wherein the step of performing a visualization processing operation on the hierarchical logic frame according to the graphic engine to obtain visualized data specifically comprises the following steps:
judging whether a history hierarchical logic framework exists or not;
and if the history hierarchical logic framework exists, comparing the hierarchical logic framework with the difference content of the history hierarchical logic framework, and rendering the difference content on the basis of the history visual data to obtain the visual data.
5. The method for generating a real-time summary frame for a conference according to claim 1, further comprising, after the step of performing a visualization processing operation on the hierarchical logical frame according to the graphic engine to obtain visualized data, the steps of:
adding a backtracking trigger button at each title node of the visual data, wherein the backtracking trigger button is used for acquiring index identification information corresponding to the title node.
6. The method for generating a real-time summary frame for a conference according to claim 5, further comprising, after said step of transmitting said visual data to a display device for presentation according to said connection module, the steps of:
after a user clicks the backtracking trigger button, acquiring, by the receiver, index identification information corresponding to the title node and sending an audio request instruction carrying the index identification information to the transmitter;
acquiring, by the transmitter, the audio fragment corresponding to the index identification information from the transmitting-end database and transmitting the audio fragment to the receiver; and
outputting, by the receiver, the audio fragment through a speaker of the display device.
7. A conference real-time summary framework generation system, the system comprising:
a transmitter and a receiver, wherein the transmitter comprises a recording module, an ASR module and a transmitting-end transceiver module, and the receiver comprises a receiving-end transceiver module, an LLM module, a graphics engine and a connection module, wherein:
the recording module is used for collecting current voice signals of participants in real time;
the ASR module is used for performing a real-time transcription operation on the current voice signal to obtain current voice text data;
the transmitting-end transceiver module is used for transmitting the current voice text data to the receiving-end transceiver module;
the LLM module is used for performing a structuring processing operation on the current voice text data received by the receiving-end transceiver module to obtain a hierarchical logic framework;
the graphics engine is used for performing a visualization processing operation on the hierarchical logic framework to obtain visualized data; and
the connection module is used for transmitting the visualized data to a display device for display.
8. The meeting real-time summary framework generation system of claim 7, wherein the ASR module comprises:
and the voice text caching sub-module is used for caching the original audio fragment corresponding to the current voice text data to a sender database and constructing an index identifier corresponding to the current voice text data.
9. A computer device comprising a memory and a processor, wherein the memory has stored therein computer readable instructions which when executed by the processor implement the steps of the meeting real-time summary frame generation method of any one of claims 1 to 6.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the meeting real-time summary frame generation method of any of claims 1 to 6.
CN202510781138.8A 2025-06-12 2025-06-12 A method for generating a real-time summary framework for a meeting and related equipment Pending CN120496534A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510781138.8A CN120496534A (en) 2025-06-12 2025-06-12 A method for generating a real-time summary framework for a meeting and related equipment


Publications (1)

Publication Number Publication Date
CN120496534A true CN120496534A (en) 2025-08-15

Family

ID=96669545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510781138.8A Pending CN120496534A (en) 2025-06-12 2025-06-12 A method for generating a real-time summary framework for a meeting and related equipment

Country Status (1)

Country Link
CN (1) CN120496534A (en)


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination