Disclosure of Invention
In order to solve the problems, the invention provides a multi-video conference collaborative meeting method, a terminal and a storage medium.
The invention is realized by the following technical scheme:
the invention provides a multi-video conference collaborative meeting method, which comprises the following steps:
acquiring voice information respectively input by a plurality of terminal devices;
selecting the terminal device whose voice information has the highest decibel level as the target terminal device;
and closing the voice input of the terminal devices other than the target terminal device.
Further, before acquiring the voice information respectively input by the plurality of terminal devices, the method further includes:
acquiring the physical address of each terminal and generating coding information according to the physical address.
Acquiring the voice information respectively input by the plurality of terminal devices then specifically includes:
placing the acquired voice information in one-to-one correspondence with the coding information.
Further, after closing the voice input of the terminal devices other than the target terminal device, the method further includes:
recording the number of utterances when each terminal device serves as the target terminal device;
recording the number of speakers when each terminal device serves as the target terminal device;
and recording the speaking duration when each terminal device serves as the target terminal device.
Further, after recording the speaking duration when each terminal device serves as the target terminal device, the method further includes:
analyzing the resulting data and assigning weights to the plurality of terminal devices.
Further, before acquiring the voice information respectively input by the plurality of terminal devices, the method further includes:
judging whether a terminal device has received voice information;
if not, repeating the judgment of whether the terminal device has received voice information;
if yes, obtaining the decibel value of the voice information.
Further, recording the number of utterances when each terminal device serves as the target terminal device includes:
judging whether another terminal device is inputting voice information;
if not, increasing the utterance count of the terminal device by one;
if yes, leaving the utterance count of the terminal device unchanged.
Further, recording the number of speakers when each terminal device serves as the target terminal device includes:
judging whether the same person is speaking;
if not, increasing the speaker count of the terminal device by one;
if yes, leaving the speaker count of the terminal device unchanged.
Further, recording the speaking duration when each terminal device serves as the target terminal device includes:
judging whether the speech of the same person has ended;
if not, continuing to accumulate the speaking duration of the terminal device;
if yes, recording the accumulated speaking duration of the terminal device.
A terminal comprising a memory, a processor, and a multi-video conference collaborative meeting program stored on the memory and runnable on the processor, the program, when executed by the processor, implementing the multi-video conference collaborative meeting method of any one of claims 1 to 8.
A storage medium storing a multi-video conference collaborative meeting program which, when executed by a processor, implements the multi-video conference collaborative meeting method of any one of claims 1 to 8.
The invention has the beneficial effects that:
The invention provides a multi-video conference collaborative meeting method, a terminal and a storage medium. A plurality of video conference terminal devices are deployed according to the size of the conference room. During initialization and deployment, the video conference terminals in the same conference room are assigned to the same conference group. During a meeting, the video conference terminal devices in the same group send their collected video data to the local video conference software, which splices the pictures together and sends the result to the video conference server. The video conference server then forwards it to the video conference software of the other conference participants. Among the conference terminals in the same group, only one terminal plays sound at any given time, and the terminal that plays sound can be selected dynamically according to the amount of speech detected by each conference terminal. For sound pickup, the video conference terminal devices in the same group use sound source localization to select the device nearest to the speaker. Whether simultaneous speaking is supported within a group is configurable. If simultaneous speaking is not supported, then when one video conference terminal detects speech, the microphones of the other conference terminals in the group are automatically turned off. According to the sound source direction detected by the terminal that hears the speech, the local conference software is notified to mark the region of interest in the video picture and to notify the other conference participants, so that they can follow the current state of the conference.
Through the above scheme, the audio-visual effect within the same conference room is optimized and the video conference runs more efficiently, with the following three advantages:
1. The meeting efficiency of the video conference is improved.
2. The audio and video quality in the video conference is improved.
3. The participation of the meeting participants in the video conference is improved.
Detailed Description
To describe the technical scheme of the invention more clearly and completely, the invention is further described below with reference to the accompanying drawings.
Referring to figs. 1 to 6, the present invention provides a multi-video conference collaborative meeting method, a terminal and a storage medium. The multi-video conference collaborative meeting method comprises:
acquiring voice information respectively input by a plurality of terminal devices;
selecting the terminal device whose voice information has the highest decibel level as the target terminal device;
and closing the voice input of the terminal devices other than the target terminal device.
Further, before acquiring the voice information respectively input by the plurality of terminal devices, the method further includes:
acquiring the physical address of each terminal and generating coding information according to the physical address.
Acquiring the voice information respectively input by the plurality of terminal devices then specifically includes:
placing the acquired voice information in one-to-one correspondence with the coding information.
Further, after closing the voice input of the terminal devices other than the target terminal device, the method further includes:
recording the number of utterances when each terminal device serves as the target terminal device;
recording the number of speakers when each terminal device serves as the target terminal device;
and recording the speaking duration when each terminal device serves as the target terminal device.
Further, after recording the speaking duration when each terminal device serves as the target terminal device, the method further includes:
analyzing the resulting data and assigning weights to the plurality of terminal devices.
Further, before acquiring the voice information respectively input by the plurality of terminal devices, the method further includes:
judging whether a terminal device has received voice information;
if not, repeating the judgment of whether the terminal device has received voice information;
if yes, obtaining the decibel value of the voice information.
Further, recording the number of utterances when each terminal device serves as the target terminal device includes:
judging whether another terminal device is inputting voice information;
if not, increasing the utterance count of the terminal device by one;
if yes, leaving the utterance count of the terminal device unchanged.
Further, recording the number of speakers when each terminal device serves as the target terminal device includes:
judging whether the same person is speaking;
if not, increasing the speaker count of the terminal device by one;
if yes, leaving the speaker count of the terminal device unchanged.
Further, recording the speaking duration when each terminal device serves as the target terminal device includes:
judging whether the speech of the same person has ended;
if not, continuing to accumulate the speaking duration of the terminal device;
if yes, recording the accumulated speaking duration of the terminal device.
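The counting and accumulation rules above can be sketched as follows. This is a minimal illustration in Python; the class, method and field names are our own and not part of the specification.

```python
class TerminalStats:
    """Per-terminal speaking statistics kept by the conference software."""

    def __init__(self):
        self.utterances = 0        # times this terminal became the target device
        self.speakers = 0          # distinct speakers heard on this terminal
        self.speaking_seconds = 0.0

    def on_voice_input(self, other_terminal_speaking: bool):
        # The utterance count grows only when no other terminal is inputting voice.
        if not other_terminal_speaking:
            self.utterances += 1

    def on_tone_check(self, same_person: bool):
        # A different tone color (a new speaker) increments the speaker count.
        if not same_person:
            self.speakers += 1

    def on_tick(self, seconds: float, speech_ended: bool):
        # While the same person keeps speaking, keep accumulating the duration;
        # once the speech ends, the accumulated total is the recorded duration.
        if not speech_ended:
            self.speaking_seconds += seconds


stats = TerminalStats()
stats.on_voice_input(other_terminal_speaking=False)   # first utterance
stats.on_tone_check(same_person=False)                # first speaker
stats.on_tick(2.5, speech_ended=False)
stats.on_tick(1.5, speech_ended=False)
stats.on_tick(0.0, speech_ended=True)                 # speech over; total stands
```

The terminal thus ends up with one utterance, one speaker and four seconds of accumulated speaking time, which is the kind of record the weighting step later consumes.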
In this embodiment, a number of video conference terminal devices corresponding to the size of the conference room are deployed. For convenience of explanation, this scheme uses three video terminal devices, numbered MT-A, MT-B and MT-C respectively. The specific details of the scheme are set forth below.
In this solution, besides the video conference terminal devices, a display and a host need to be configured in the conference room. For example, a conference tablet (host and display in one device) may be used, as may a conventional host computer with a separate display (host and display as two devices). Corresponding conference software must be installed on the host; it processes the audio and video to achieve the goal of a multiparty video meeting.
To realize the multiparty video conference, the scheme is divided into the following four stages: a deployment stage, a conference preparation stage, a conference proceeding stage and a conference ending stage. The specific tasks performed in each stage are described separately below.
1. Deployment phase
1. The video conference terminal devices are connected to a video conference host through USB cables, using the USB Video Class/USB Audio Class protocols. When the video conference host detects a connected video conference terminal, it numbers the device. The numbering rule is MT-X, where X starts from the capital letter A and runs at most to Z; the rule can be extended according to the practical situation. For example, the first video conference terminal device connected to the host is numbered MT-A, the second is numbered MT-B, the third is numbered MT-C, and so on.
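The numbering rule can be sketched as below. The function name is our own; the text leaves extension beyond Z open, so this sketch simply rejects larger indices.

```python
def terminal_number(index: int) -> str:
    """Number the index-th (0-based) connected terminal as MT-A .. MT-Z."""
    if not 0 <= index < 26:
        raise ValueError("numbering beyond MT-Z needs an extended rule")
    return "MT-" + chr(ord("A") + index)
```

So the first, second and third terminals connected to the host are numbered MT-A, MT-B and MT-C, matching the example above.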
2. After a video conference terminal device is connected to the video conference host, it must report, via the USB device descriptor, its MAC address in addition to the device name and device version; the MAC address uniquely identifies the device.
3. When the video conference software runs, it automatically detects the currently present video conference terminal devices and writes them, with their generated numbers, into a database for storage. If a previously known device (identified by its MAC address) is not found in the detection, it is set to the offline state; if it is found, it is set to the online state; if a device is newly added, it is written into the database and set to the online state. The devices are stored in the database as follows:
| ID | Device number | Device name | Device MAC | Device type | Status |
|----|---------------|-------------|------------|-------------|--------|
| 1 | MT-A | MeettingTerminal-A | 11-22-33-44-55-66 | Audio and video | Online |
| 2 | MT-B | MeettingTerminal-B | 22-33-44-55-66-77 | Video | Online |
| 3 | MT-C | MeettingTerminal-C | 33-44-55-66-77-88 | Audio and video | Online |
| 4 | MT-D | MeettingTerminal-D | 44-55-66-77-88-99 | Audio | Offline |
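The detection-and-reconciliation rule of step 3 can be sketched as follows. The dict-based schema and function name are an illustration, not the database layout of the specification.

```python
def refresh_device_states(db: dict, detected: dict) -> dict:
    """Reconcile the device table with one detection pass.

    `db` maps a device MAC to its stored record; `detected` maps the MACs
    found in this pass to their records. Known devices that were not
    detected go offline, detected known devices go online, and newly seen
    devices are inserted in the online state.
    """
    for mac, record in db.items():
        record["status"] = "Online" if mac in detected else "Offline"
    for mac, record in detected.items():
        if mac not in db:
            db[mac] = dict(record, status="Online")
    return db


# MT-A was known but is absent in this pass; MT-D is newly connected.
db = {"11-22-33-44-55-66": {"number": "MT-A", "status": "Online"}}
detected = {"44-55-66-77-88-99": {"number": "MT-D"}}
db = refresh_device_states(db, detected)
```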
4. When the video conference software runs, it generates a unique UUID from hardware information of the machine it runs on, such as the MAC address and CPU ID. All video conference terminal devices connected to the current video conference host are automatically placed in the group corresponding to this UUID.
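One way to derive such a UUID deterministically is a name-based (version 5) UUID over the hardware identifiers. The text only says the UUID is generated from hardware info such as the MAC address and CPU ID; the namespace choice and function name below are our assumptions.

```python
import uuid


def host_group_id(host_mac: str, cpu_id: str) -> str:
    """Derive a stable group UUID from the host's hardware identifiers."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, host_mac + "/" + cpu_id))
```

Because the derivation is deterministic, the same host always maps to the same group ID across restarts, while different hosts get different group IDs.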
2. Conference preparation stage
1. After the video conference software connects to the cloud video conference server, a multiparty video conference can be held.
2. When the multiparty conference is about to start, the video conference software reports its current grouping information to the video conference server. Meanwhile, the local video conference software collects images from the devices in the current group that support video capture and performs scene splicing. The specific scene splicing technology is prior art and is not described in detail in this scheme.
3. The video conference server stores the reported grouping information of each conference room in association with the created conference ID, which facilitates subsequent tracking of conference information and of the conference progress. The grouping information of each conference room includes only the video conference terminal devices participating in the conference; offline or non-participating devices are not reported to the video conference server.
4. When the packet ID is already present on the video conference server, the original packet information is automatically updated; otherwise, the packet information is newly added.
5. After the connection between the conference host and the video conference server is established, the conference proceeding stage is entered. The conference host invites participants into the conference, and the grouping information is automatically associated with the corresponding conference ID. The time each participant joins and leaves the meeting is also recorded.
6. The relevant conference information stored on the video conference server is as follows:
7. The grouping information stored on the video conference server is as follows:
| Packet ID | Device MAC | Device type |
|-----------|------------|-------------|
| f46e5e53-00a3-4ad0-bef7-91d2695d7049 | 11-22-33-44-55-66 | Audio and video |
| f46e5e53-00a3-4ad0-bef7-91d2695d7049 | 22-33-44-55-66-77 | Video |
| f46e5e53-00a3-4ad0-bef7-91d2695d7049 | 33-44-55-66-77-88 | Audio and video |
| 0b7bd4a6-7129-4bef-91ea-b34ac27ff68a | aa-bb-cc-dd-ee-ff | Audio and video |
| 0b7bd4a6-7129-4bef-91ea-b34ac27ff68a | bb-cc-dd-ee-ff-gg | Audio |
| 01aa2db1-c4a0-4d5f-a32b-de81b0666e3f | 1a-2b-3c-4d-5e-6f | Audio and video |
| 00e7dd8d-aeb4-4047-8109-3842ad560eda | zz-yy-xx-ww-vv-uu | Audio and video |
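The update-or-add rule for reported grouping information (step 4 of the preparation stage) can be sketched as follows; the structure and names are illustrative, not the server's actual storage format.

```python
def upsert_group(server_groups: dict, packet_id: str, members: list) -> None:
    """Server-side handling of reported grouping information.

    If the packet ID already exists, the stored grouping is replaced
    (updated); otherwise it is newly added. `members` contains only the
    (MAC, device type) pairs of terminals actually joining the conference.
    """
    server_groups[packet_id] = list(members)


groups = {}
upsert_group(groups, "f46e5e53-00a3-4ad0-bef7-91d2695d7049",
             [("11-22-33-44-55-66", "Audio and video"),
              ("33-44-55-66-77-88", "Audio and video")])
# Reporting the same packet ID again replaces the old grouping
# rather than adding a second entry.
upsert_group(groups, "f46e5e53-00a3-4ad0-bef7-91d2695d7049",
             [("11-22-33-44-55-66", "Audio and video")])
```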
3. Conference proceeding stage
1. When the video conference server detects that another participant has joined the conference, it notifies the participants and the host to process their audio and video data and send the processed data to the video conference server.
2. The video conference server forwards the received video data to the video conference software of all participants other than the video source, for displaying the video picture and playing the voice.
3. The video stream information stored on the video conference server is as follows.
4. The video forwarding information stored on the video conference server is as follows:
| StreamID | Destination |
|----------|-------------|
| bPKpxbcc7Z | 0b7bd4a6-7129-4bef-91ea-b34ac27ff68a |
| bPKpxbcc7Z | 01aa2db1-c4a0-4d5f-a32b-de81b0666e3f |
| bPKpxbcc7Z | 00e7dd8d-aeb4-4047-8109-3842ad560eda |
| Xj3muLZTVd | f46e5e53-00a3-4ad0-bef7-91d2695d7049 |
| Xj3muLZTVd | 01aa2db1-c4a0-4d5f-a32b-de81b0666e3f |
| Xj3muLZTVd | 00e7dd8d-aeb4-4047-8109-3842ad560eda |
| LRaQCsrDm8 | f46e5e53-00a3-4ad0-bef7-91d2695d7049 |
| LRaQCsrDm8 | 0b7bd4a6-7129-4bef-91ea-b34ac27ff68a |
| LRaQCsrDm8 | 00e7dd8d-aeb4-4047-8109-3842ad560eda |
| o8OFRSEwbw | f46e5e53-00a3-4ad0-bef7-91d2695d7049 |
| o8OFRSEwbw | 0b7bd4a6-7129-4bef-91ea-b34ac27ff68a |
| o8OFRSEwbw | 01aa2db1-c4a0-4d5f-a32b-de81b0666e3f |
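The fan-out pattern of the forwarding table can be sketched as below: each stream goes to every conference group except the one it originated from. The mapping from stream ID to source group is assumed, since the text does not give its storage format.

```python
def forwarding_table(stream_sources: dict, groups: list) -> list:
    """Build (StreamID, Destination) rows like the table above.

    `stream_sources` maps a stream ID to its source group ID; each stream
    is forwarded to every group other than its source.
    """
    rows = []
    for stream_id, source in stream_sources.items():
        for group in groups:
            if group != source:
                rows.append((stream_id, group))
    return rows
```

With four groups and four streams (one per group), this yields the twelve rows shown above: each stream fans out to the three groups that did not produce it.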
5. Initially, a default video conference terminal is set as the sound input and sound output device, e.g. the first detected video conference terminal device; the other video conference terminal devices turn off their pickup and loudspeaker functions. During the meeting, whenever a video conference terminal detects sound input, it measures the energy of the detected sound, expressed in dB. If several terminals detect sound, the video conference terminal with the highest energy is used for pickup and the other terminals turn off their microphones. Suppose three video conference terminal devices are currently present; since only two of them have microphones, only those two can pick up and detect sound. Suppose the detected sound levels of the two devices are MT-A 56 dB and MT-C 48 dB. When the video terminal devices deliver the detected sound energy levels to the video conference software, the software compares them, selects MT-A as the pickup microphone, and notifies MT-C to turn off pickup. It also counts, for each video conference terminal device, the number of times it served as the pickup microphone and the number of distinct speakers (tone colors) it picked up. At any moment only one microphone picks up sound while the others are closed, which avoids echo.
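The selection of the pickup microphone by sound energy can be sketched as follows. The function name is our own; how ties are broken is not specified in the text, so this sketch falls back to the first terminal seen.

```python
def select_pickup(levels_db: dict):
    """Choose the pickup microphone from detected sound levels (in dB).

    The loudest terminal keeps its microphone on; every other terminal is
    told to turn pickup off. Returns (selected, terminals_to_mute).
    """
    if not levels_db:
        return None, []
    selected = max(levels_db, key=levels_db.get)
    return selected, [t for t in levels_db if t != selected]


# With the example levels from the text: MT-A at 56 dB, MT-C at 48 dB.
picked, to_mute = select_pickup({"MT-A": 56, "MT-C": 48})
```

Here MT-A is selected for pickup and MT-C is notified to turn its microphone off, matching the example in the text.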
6. When the other microphones are turned off, it must be ensured that one microphone can still pick up sound normally. If the microphone currently picking up sound fails, the video conference software must be notified in time so that another video conference terminal device can be switched on for pickup, preventing loss of sound. The scheme can also let multiple video conference terminal devices pick up sound simultaneously, with the video conference software selecting, according to the sound energy, the sound with the highest energy and transmitting it to the video conference server. The overlapping sounds can meanwhile be cancelled to prevent echo.
7. When the main pickup video conference terminal device detects a change of tone color (i.e. a different speaker), it must notify the video conference software in time. The video conference software then tells the other video conference terminal devices to turn on their microphones and pick up sound, and the process of step 5 is repeated. At a given moment, the information about each conference terminal stored by the video conference software is as follows:
| Device number | Device MAC | Number of utterances | Number of speakers | Speaking time (minutes) |
|---------------|------------|----------------------|--------------------|-------------------------|
| MT-A | 11-22-33-44-55-66 | 3 | 5 | 20 |
| MT-C | 33-44-55-66-77-88 | 1 | 3 | 32 |
8. When the video conference software receives the audio and video data of the other parties sent by the video conference server, the video data is decoded and output to a display. For the audio data, the device used to play the sound is selected according to a policy based on the statistics recorded by the video conference software for each video conference terminal device, such as the number of utterances and the number of speakers. The policy can be chosen from: utterance-count priority, speaker-count priority, speaking-duration priority, or weighted-calculation priority. The weighted calculation computes the final weight as w = α·n_utterances + β·n_speakers + λ·t_duration, where the coefficients α, β, λ may be set according to the actual situation and satisfy α + β + λ = 1. Assuming the speaker-count priority policy is selected, then according to the currently stored data, the MT-A video conference terminal device should be chosen as the loudspeaker device to play the sound.
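The weighted-calculation policy can be sketched as below. The coefficient values are illustrative: the text only requires α + β + λ = 1 and leaves the values to the actual situation. Note that with these particular coefficients the long-speaking MT-C would win, whereas the speaker-count priority policy in the text selects MT-A.

```python
def playback_weight(n_utterances, n_speakers, minutes,
                    alpha=0.4, beta=0.3, lam=0.3):
    """Final weight w = alpha*n_utterances + beta*n_speakers + lam*minutes."""
    assert abs(alpha + beta + lam - 1.0) < 1e-9  # coefficients must sum to 1
    return alpha * n_utterances + beta * n_speakers + lam * minutes


def pick_playback_device(stats: dict) -> str:
    """Select the playback device with the highest weight."""
    return max(stats, key=lambda dev: playback_weight(*stats[dev]))


# Stored statistics from the table above: (utterances, speakers, minutes).
stats = {"MT-A": (3, 5, 20), "MT-C": (1, 3, 32)}
best = pick_playback_device(stats)
```

Mixing raw counts with minutes without normalisation follows the formula exactly as stated; in practice one might normalise each term before weighting.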
9. The approximate direction of the current speaker is obtained with a sound source localization algorithm. The direction information is sent to the video conference software, which automatically marks the approximate region of the current speaker during image processing. Further image processing can build on this: for example, the speaker's picture may be automatically centered or digitally magnified, or, in combination with the camera configuration, the camera orientation may be adjusted through pan/tilt control.
4. Conference ending stage
1. When the video conference software of a participant exits the video conference, it notifies the video conference server. On receiving the exit message, the server updates the current conference information, removes the departing participant, and stops sending the other parties' audio and video data to that participant.
2. When the last participant exits the conference, the corresponding video conference is ended on the video conference server and the occupied resources are released.
Based on the multi-video conference collaborative meeting method, the invention also discloses a terminal 1. The terminal 1 comprises a memory 2, a processor 3 and a multi-video conference collaborative meeting program 4 stored in the memory and runnable on the processor; the multi-video conference collaborative meeting method is realized when the multi-video conference collaborative meeting program 4 is executed by the processor 3.
Based on the multi-video conference collaborative meeting method, the invention also discloses a storage medium storing a multi-video conference collaborative meeting program 4; the multi-video conference collaborative meeting method is realized when the multi-video conference collaborative meeting program 4 is executed by a processor.
Of course, the present invention can be implemented in various other embodiments; based on this embodiment, those skilled in the art can derive other embodiments without any inventive effort, all of which fall within the scope of the present invention.