CN112258912B

CN112258912B - Network interactive teaching method, device, computer equipment and storage medium

Info

Publication number: CN112258912B
Application number: CN202011078978.1A
Authority: CN
Inventors: 陈志亮
Original assignee: Ifreecomm Technology Co ltd
Current assignee: Ifreecomm Technology Co ltd
Priority date: 2020-10-10
Filing date: 2020-10-10
Publication date: 2022-08-16
Anticipated expiration: 2040-10-10
Also published as: CN112258912A

Abstract

The application relates to a network interactive teaching method, a network interactive teaching device, computer equipment and a storage medium. The method comprises the following steps: in the network interactive teaching process, receiving the class data of a listening party, the video data of a calling party and the audio data of the calling party which are pushed by a plurality of listening terminals; decoding the classroom data of the listening and speaking party, and playing the decoded data; preprocessing the video data and the audio data of the main speaker; pushing the class data of the main speaker obtained after the preprocessing to each listening and speaking terminal; when receiving a triggering operation of an interaction mode in the data playing process, determining a target listening and speaking terminal according to the triggering operation; and encapsulating the speaker classroom data corresponding to the target speaker terminal and the preprocessed main speaker classroom data, and distributing the classroom data obtained after encapsulation to other speaker terminals so that the other speaker terminals play the classroom data obtained after encapsulation. By adopting the method, the network interactive teaching efficiency can be improved.

Description

Network interactive teaching method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for network interactive teaching, a computer device, and a storage medium.

Background

With the development of internet technology, the network audio and video technology is widely applied to a plurality of fields such as shopping, entertainment, education and the like. The network interactive teaching is an important application in the field of education, educational resources can be shared and propagated, interaction between teachers and students in the teaching process is achieved, and teaching efficiency is improved. In the traditional mode, network interactive teaching is realized through a video conference system, in the interactive discussion process, after a main speaking terminal in the video conference system receives multi-channel video data and multi-channel leading audio data pushed by a listening and speaking terminal, multi-picture coding is required to be carried out on the multi-channel video data, audio mixing processing is carried out on the multi-channel audio data, and then the processed video data and audio data are pushed to the listening and speaking terminal.

However, the multi-picture coding and audio data mixing process takes much time, which results in higher delay of the network interactive teaching and lower efficiency of the network interactive teaching.

Disclosure of Invention

In view of the above, it is desirable to provide a method, an apparatus, a computer device and a storage medium for interactive network education, which can improve the efficiency of interactive network education by reducing the delay of interactive network education.

A method for interactive network instruction, the method comprising:

in the network interactive teaching process, receiving the class data of a listening and speaking party, the video data of a main speaking party and the audio data of the main speaking party which are pushed by a plurality of listening and speaking terminals;

decoding the classroom data of the listening and speaking party, and playing the decoded data;

preprocessing the main speaker video data and the main speaker audio data to obtain preprocessed main speaker classroom data;

pushing the preprocessed main speaker classroom data to each listening and speaking terminal so that each listening and speaking terminal plays the preprocessed main speaker classroom data;

when a triggering operation of an interactive mode is received in the data playing process, determining a target listening and speaking terminal according to the triggering operation, and taking the listening and speaking terminals except the target listening and speaking terminal in a plurality of listening and speaking terminals as other listening and speaking terminals;

and encapsulating the speaker classroom data corresponding to the target speaker terminal and the preprocessed main speaker classroom data, and distributing the classroom data obtained after encapsulation to other speaker terminals so that the other speaker terminals play the classroom data obtained after encapsulation.

In one embodiment, the method further comprises:

receiving a tracking instruction in the network interactive teaching process, wherein the tracking instruction carries a teaching mode tracking identifier;

determining a target teaching mode according to the teaching mode tracking identification, and acquiring target speaker video data and target speaker audio data corresponding to the target teaching mode;

and switching the video data of the current speaker to the video data of the target speaker and switching the audio data of the current speaker to the audio data of the target speaker according to the tracking instruction.

In one embodiment, the decoding the lecturer classroom data, and playing the decoded data includes:

decoding the classroom data of the listening and speaking party to obtain target video data pushed by each listening and speaking terminal and target audio data pushed by each listening and speaking terminal;

splicing target video data pushed by a plurality of listening and speaking terminals, and displaying a plurality of classroom pictures obtained by splicing;

and performing audio mixing processing on the target audio data pushed by the plurality of listening and speaking terminals, and playing the audio data after the audio mixing processing.

In one embodiment, the preprocessing the video data of the main speaker and the audio data of the main speaker to obtain preprocessed classroom data of the main speaker includes:

encoding the audio data of the main speaker to obtain encoded audio data of the main speaker;

carrying out synchronous processing on the coded main speaker audio data and the main speaker video data to obtain synchronized main speaker audio data and synchronized main speaker video data;

and packaging the synchronized audio data of the main speaker and the synchronized video data of the main speaker, and taking the audio data obtained after packaging as the preprocessed classroom data of the main speaker.

In one embodiment, the plurality of listening and speaking terminals are listening and speaking terminals after the class state is updated, and before receiving the listening and speaking party class data, the speaker video data and the speaker audio data pushed by the plurality of listening and speaking terminals in the network interactive teaching process, the method further includes:

when prompt information which is ready to be finished is received, determining a prompt terminal according to the prompt information, and updating the class state of the prompt terminal to a ready state;

and after the class state of the prompt terminal is updated, receiving the class data of the listening and speaking party pushed by the prompt terminal.

In one embodiment, the method further comprises:

after the class state of the prompt terminal is updated, sending a heartbeat packet to the prompt terminal according to a first time interval;

when the response information of the prompting terminal is not received within a preset time period, marking the class state of the prompting terminal as a ready state;

and sending a lesson instruction to the prompt terminal according to a second preset time interval until the prompt information which is returned by the prompt terminal and is ready to finish is received.

An interactive network instruction device, the device comprising:

the first communication module is used for receiving the classroom data of the listening and speaking party pushed by a plurality of listening and speaking terminals in the network interactive teaching process;

the second communication module is used for receiving the video data and the audio data of the main speaker;

the decoding module is used for decoding the classroom data of the audiences and the interphones and playing the decoded data;

the preprocessing module is used for preprocessing the video data of the main speaker and the audio data of the main speaker to obtain preprocessed classroom data of the main speaker;

the first communication module is further configured to push the preprocessed speaker class data to each listening and speaking terminal, so that each listening and speaking terminal plays the preprocessed speaker class data;

the determining module is used for determining a target listening and speaking terminal according to the triggering operation when the triggering operation of the interactive mode is received in the data playing process, and taking the listening and speaking terminals except the target listening and speaking terminal in the plurality of listening and speaking terminals as other listening and speaking terminals;

the first communication module is further configured to encapsulate the lecturer classroom data corresponding to the target lecturer terminal and the preprocessed main lecturer classroom data, and distribute the classroom data obtained after encapsulation to the other lecturer terminals, so that the other lecturer terminals play the classroom data obtained after encapsulation.

In one embodiment, the apparatus further comprises:

the second communication module is also used for receiving a tracking instruction in the network interactive teaching process, and the tracking instruction carries a teaching mode tracking identifier;

the first communication module is further used for determining a target teaching mode according to the teaching mode tracking identifier, and acquiring target speaker video data and target speaker audio data corresponding to the target teaching mode;

and the switching module is used for switching the video data of the current main speaker to the video data of the target main speaker according to the tracking instruction and switching the audio data of the current main speaker to the audio data of the target main speaker.

A computer device comprising a memory and a processor, the memory storing a computer program operable on the processor, the processor implementing the steps in the various method embodiments described above when executing the computer program.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the respective method embodiments described above.

According to the network interactive teaching method, the network interactive teaching device, the computer equipment and the storage medium, in the network interactive teaching process, the lecturer classroom data, the speaker video data and the speaker audio data which are pushed by the plurality of lecturer terminals are received, the lecturer classroom data are decoded, and the decoded data are played. The teacher can watch the panoramic images of the students in the listening and speaking classrooms corresponding to the listening and speaking terminals in the course of lecturing, and can know the participation situation of the listening and speaking classrooms in time. The method comprises the steps of preprocessing video data and audio data of a main speaker to obtain preprocessed classroom data of the main speaker, pushing the preprocessed classroom data of the main speaker to each listening terminal, and enabling each listening terminal to learn in classroom. When the triggering operation of the interactive mode is received in the data playing process, the target listening and speaking terminal is determined according to the triggering operation, the main speaking terminal can directly encapsulate the listening and speaking party classroom data corresponding to the target listening and speaking terminal and the preprocessed main speaking party classroom data, and the classroom data obtained after encapsulation is distributed to other listening and speaking terminals, so that students in the listening and speaking classroom can watch the main speaking teacher corresponding to the main speaking terminal and the video data in the target listening and speaking classroom corresponding to the target listening and speaking terminal in the interactive discussion process and listen to the audio data in the interactive discussion process. The main speaking terminal does not need to carry out multi-picture coding and sound mixing processing on the speaking party class data corresponding to the target listening and speaking terminal and the preprocessed main speaking party class data before data transmission, so that the one-time decoding and coding process is reduced, and the time delay caused by multi-picture coding and sound mixing processing is reduced. Meanwhile, after receiving the speaker class data corresponding to the target speaker terminal and the preprocessed main speaker class data, the speaker terminal performs decoding, multi-picture splicing and sound mixing processing, so that the resource occupation space between the main speaker terminal and the speaker terminal can be balanced, the main speaker terminal and the speaker terminal do not need to have strong coding capacity, only multi-channel decoding capacity is needed, and the hardware cost in network interactive teaching is effectively reduced.

Drawings

FIG. 1 is a diagram of an application environment of a network interactive teaching method according to an embodiment;

FIG. 2 is a flow chart illustrating a method for interactive network instruction in one embodiment;

FIG. 3 is a flowchart illustrating the teaching mode switching step according to the tracking command in one embodiment;

FIG. 4 is a block diagram of an embodiment of a networked interactive teaching device;

FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application.

The network interactive teaching method provided by the application can be applied to the application environment shown in fig. 1. The main terminal 102 and the plurality of listening and speaking terminals 104 communicate with each other through a network. The main terminal 102 communicates with the first video capture device 106 and the second video data capture device 108 over a network. The intercom terminal 102 communicates with the first audio capture device 110 through an audio port. The first video capture device 106 may be a plurality, the second video data capture device 108 may be a plurality, and the first audio capture device 110 may be a plurality. Each of the listening and speaking terminals 104 communicates with a corresponding third video capture device 112 and a corresponding second audio capture device 114 over a network. The first video capture device and the third video capture device may be, but are not limited to, a webcam. The second video capture device may be, but is not limited to, various personal computers, laptops, tablets. The first audio capturing device and the second audio capturing device may be a microphone, an audio processor, or the like. In the network interactive teaching process, the third video capture device 112 transmits the captured video data of the listening and speaking party to the listening and speaking party terminal 104, and the second audio capture device 114 transmits the captured audio data of the listening and speaking party to the listening and speaking party terminal 104. After the encoded video data and the audio data of the listening and speaking party are synchronized and encapsulated, the listening and speaking party terminal 104 pushes the classroom data of the listening and speaking party obtained after encapsulation to the main speaking terminal 102. Similarly, the first video capture device 106 transmits the captured first video data of the lecture room where the main speaker is located to the main speaker terminal 102. The second video data acquisition device 108 transmits the acquired courseware image data of the speaker to the speaker terminal 102, and at this time, the speaker terminal 102 receives the speaker video data. The video data of the speaker comprises first video data and courseware picture data. The first audio capture device 110 transmits the captured audio data of the main speaker to the main speaker terminal 102. The main phone terminal 102 decodes the classroom data of each listening and speaking party and plays the decoded data. The main speaker terminal 102 preprocesses the main speaker video data and the main speaker audio data to obtain preprocessed main speaker classroom data. The main speaking terminal 102 pushes the preprocessed main speaking party classroom data to each listening and speaking terminal, so that each listening and speaking terminal plays the preprocessed main speaking party classroom data. When the master terminal 102 receives a trigger operation of a master teacher to the interactive mode in a data playing process, a target listening and speaking terminal is determined according to the trigger operation, and listening and speaking terminals other than the target listening and speaking terminal in the plurality of listening and speaking terminals are used as other listening and speaking terminals. The main speaking terminal 102 encapsulates the speaker class data corresponding to the target listening and speaking terminal and the preprocessed main speaking class data, and distributes the encapsulated class data to the other listening and speaking terminals 104, so that the other listening and speaking terminals 104 play the encapsulated class data. The main speaking terminal 102 and the listening and speaking terminal 104 may be, but are not limited to, various recording and broadcasting terminals, live broadcasting terminals, personal computers, notebook computers, smart phones, and tablet computers.

In one embodiment, as shown in fig. 2, a network interactive teaching method is provided, which is described by taking the example that the method is applied to the main speaking terminal in fig. 1, and includes the following steps:

step 202, in the process of network interactive teaching, receiving the class data of the listening and speaking party, the video data of the main speaking party and the audio data of the main speaking party which are pushed by a plurality of listening and speaking terminals.

The lecturer classroom data refers to classroom data in a lecturer classroom. The lecturer classroom data may include video data of a lecturer classroom, such as panoramic video of students, and audio data. When the listening and speaking classroom is not closed, the audio data is the actual sound of the listening and speaking classroom, and when the listening and speaking classroom is closed, the audio data is the mute data. The video data of the main speaker refers to video data in the main speaker classroom. The speaker audio data refers to audio data in a speaker classroom.

In the network interactive teaching process, the main speaking terminal receives the classroom data of the listening and speaking parties pushed by the plurality of listening and speaking terminals. The lecturer classroom data comprises lecturer video data and lecturer audio data. The lecturer classroom data is data encapsulated by an RTMP (Real Time Messaging Protocol) Protocol. Specifically, the video data of the listening and speaking party is obtained by the listening and speaking terminal acquiring the video data of the lecture room through the third video acquisition device and encoding the acquired video data through the third video acquisition device. The third video capture device may be a network camera, such as an RTMP (Real Time Messaging Protocol) camera. The coding mode of the listening and speaking video data can be H.264high Profile. The RTMP camera can be connected with the listening and speaking terminal through an RJ45 network port on the listening and speaking terminal.

The audio data of the listening and speaking party can be determined by the current teaching mode, if the audio data is in the interactive mode, the collected audio data of the listening and speaking party is transmitted to the listening and speaking party terminal through the second audio collecting equipment, and the listening and speaking party terminal codes the audio data of the listening and speaking party. The second audio acquisition device can communicate with the main speaking terminal through an audio interface, such as a 3.5mm audio interface, a lotus head, a phoenix terminal and the like. If the lecture mode is adopted, the lecturer terminal directly adopts locally pre-stored mute data to carry out audio coding or directly uses pre-stored mute data after coding. The encoding method of the Audio data of the listening and speaking party may be AAC (Advanced Audio Coding). The speaker terminal performs timestamp alignment on the encoded speaker video data and speaker audio data, encapsulates the speaker video data and the speaker audio data after the timestamp alignment according to a Real Time Messaging Protocol (RTMP), and transmits the encapsulated speaker class data to the main speaker terminal. The video data of the main speaker can comprise a first video data of a main speaker classroom where the main speaker is located, courseware image data of the main speaker and other multi-path video data. The first video data may include, among other things, a teacher image, a blackboard-writing image, a student image, and so on. The main speaking terminal can acquire first video data of a main speaking classroom where the main speaking party is located through the first video acquisition equipment, acquire courseware picture video data of the main speaking classroom where the main speaking party is located through the second video acquisition equipment and acquire audio data of the main speaking classroom through the first audio acquisition equipment. The first video acquisition equipment has a video coding function, and transmits the coded first video data to the talkback terminal. The first video capture device may be a webcam, such as an RTMP camera. The RTMP camera can be connected with the main speaking terminal through an RJ45 network port on the main speaking terminal. The encoding mode of the first video data may be h.264High Profile. The second video data acquisition equipment runs a preset application program, and acquires courseware image data of a speaker through the preset application program. The first audio acquisition equipment transmits the acquired audio data of the main speaker to the main speaker terminal. And the speaker terminal respectively encodes the audio data of the speaker and the courseware image data. The encoding mode of the audio data of the main speaker may be AAC. The coding mode of the courseware picture data can be H.264High Profile. The first audio acquisition device can communicate with the talkback terminal through interfaces such as a 3.5mm audio interface, a lotus head and a phoenix terminal.

Because the video data in the classroom received by the main speaking terminal and the listening and speaking terminal are the coded video data, the main speaking terminal and the listening and speaking terminal do not need to code the video data, the requirement on the coding capacity of the main speaking terminal and the listening and speaking terminal is reduced, and the hardware cost is reduced.

In one embodiment, the video capture device and the audio capture device can be arranged in different listening and speaking classrooms or main speaking classroom areas and have different purposes. For example, the video capture device may be an RTMP camera. When a plurality of RTMP cameras are arranged, the plurality of RTMP cameras can be connected through the switch and connected with the listening and speaking terminal through the RJ45 net port on the listening and speaking terminal, and therefore the plurality of RTMP cameras and the listening and speaking terminal are connected. For example, the audio acquisition device may be a microphone, the microphone may be connected to the audio processor through a 3.5mm audio interface on the listening and speaking terminal, a lotus head, or a phoenix terminal interface, and an output port of the audio processor is connected to the listening and speaking terminal through a 3.5mm audio interface on the listening and speaking terminal.

And step 204, decoding the classroom data of the listening and speaking party and playing the decoded data.

After receiving the lecturer classroom data, the main lecturer video data and the main lecturer audio data, the main lecturer terminal decodes the lecturer classroom data to obtain the decoded lecturer classroom data. The main speaker terminal can select data to be played according to the current teaching scene. The data to be played may include video data as well as audio data. The selected data to be played can be data of any one of the listening terminal and the speaking terminal or the main speaking terminal. When the selected data to be played is only the data of one terminal, the data to be played can be directly sent to the terminal for playing. When the selected data to be played is data of a plurality of terminals, video data of each terminal in the data to be played can be zoomed, and the zoomed video data is spliced according to a preset display position and played. For the audio data of each terminal in the data to be played, the audio data of multiple terminals can be mixed and played. In one embodiment, when the selected data to be played is data of a plurality of terminals, the method decodes the classroom data of the listening and speaking party, and playing the decoded data includes: decoding classroom data of the listening and speaking party to obtain target video data pushed by each listening and speaking terminal and target audio data pushed by each listening and speaking terminal; splicing target video data pushed by a plurality of listening and speaking terminals, and displaying a plurality of classroom pictures obtained by splicing; and performing audio mixing processing on the target audio data pushed by the plurality of listening and speaking terminals, and playing the audio data after the audio mixing processing.

The class data of the listening and speaking party is data obtained by packaging corresponding video data and audio data of the listening and speaking party by each listening and speaking terminal. The main speaking terminal can firstly analyze the class data of the listening and speaking party corresponding to each listening and speaking terminal to obtain the separated video data and audio data of the listening and speaking party. The main terminal decodes the video data of the listening and speaking party to obtain decoded video data, namely target video data. And decoding the audio data of the listening and speaking party to obtain decoded audio data, namely target audio data. The main speaking terminal can sequentially splice the target video data according to the preset display position of the received target video data to obtain a plurality of classroom pictures. The splicing mode can be GLSurfaceView. Therefore, the main speaking terminal outputs the combined multiple classroom pictures to the display device through the HDMI output port, for example, the display device can be a television, and the display device displays the spliced multiple classroom pictures. The main speaking terminal transmits the target audio data to the audio mixer, and the audio mixer performs audio mixing processing on the target audio data, so that mutual interference among multiple paths of audio data is avoided. And the main speaking terminal transmits the audio data after the audio mixing processing to an audio player for playing. For example, the audio player may be a sound box. And the teacher can watch the panoramic images of the students in the listening and speaking classrooms corresponding to the listening and speaking terminals in the process of lecturing, so that the participation situation of each listening and speaking classroom can be known in time.

And step 206, preprocessing the video data and the audio data of the main speaker to obtain preprocessed classroom data of the main speaker.

And step 208, pushing the preprocessed class data of the main speaker to each listening and speaking terminal so that each listening and speaking terminal plays the preprocessed class data of the main speaker.

After receiving the video data of the main speaker transmitted by the first video acquisition device and the second video acquisition device and the audio data of the main speaker transmitted by the first audio acquisition device, the main speaker terminal can preprocess the video data of the main speaker and the audio data of the main speaker, and the preprocessing mode can include coding, synchronizing and packaging. In one embodiment, the preprocessing the video data and the audio data of the main speaker to obtain the preprocessed classroom data of the main speaker includes: encoding the audio data of the main speaker to obtain encoded audio data of the main speaker; carrying out synchronous processing on the coded main speaker audio data and the main speaker video data to obtain synchronized main speaker audio data and synchronized main speaker video data; and packaging the synchronized audio data of the main speaker and the synchronized video data of the main speaker, and taking the audio data obtained after packaging as the preprocessed classroom data of the main speaker.

Because the video data of the main speaker received by the main speaker terminal is the video data after being coded, the main speaker terminal only needs to code the audio data of the main speaker. For example, the Audio data may be encoded in AAC (Advanced Audio Coding). Therefore, the main speaker terminal carries out synchronous processing on the coded main speaker audio data and the main speaker video data, and the synchronous processing can be that the coded main speaker audio data and the coded main speaker video data are aligned in time stamp so as to ensure that the coded main speaker audio data and the coded main speaker video data are synchronous. And then the main speaker terminal packages the synchronized main speaker audio data and the synchronized main speaker video data into the same RTMP channel according to a preset protocol, and the audio and video data obtained after packaging is used as the preprocessed main speaker classroom data. For example, the preset protocol may be an RTMP protocol.

And the main speaker terminal pushes the preprocessed main speaker classroom data to each listening and speaking terminal through an RTMP channel, and each listening and speaking terminal decodes the preprocessed main speaker classroom data after receiving the preprocessed main speaker classroom data source to obtain the decoded main speaker video data and the decoded main speaker audio data. And the listening and speaking terminal transmits the decoded video data of the main speaking party to the display equipment. For example, the display device may be a television. And displaying through the display device. And the listening and speaking terminal transmits the decoded audio data of the listening and speaking party to an audio player for playing. For example, the audio player may be a sound box.

And step 210, when the triggering operation of the interactive mode is received, determining a target listening and speaking terminal according to the triggering operation, and using listening and speaking terminals except the target listening and speaking terminal in the plurality of listening and speaking terminals as other listening and speaking terminals.

And 212, encapsulating the speaker class data corresponding to the target speaker terminal and the preprocessed main speaker class data, and distributing the encapsulated class data to other speaker terminals so that the other speaker terminals play the encapsulated class data.

When the master teacher receives the trigger operation of the master teacher to the interactive mode button, the master terminal responds to the trigger operation, and determines a target listening and speaking terminal based on the trigger operation. And taking the listening and speaking terminals except the target listening and speaking terminal in the plurality of listening and speaking terminals as other listening and speaking terminals.

The main speaking terminal acquires the speaker class data corresponding to the target speaker terminal, and the main speaking terminal can directly transmit the speaker class data corresponding to the target speaker terminal and the preprocessed main speaking class data to other speaker terminals by adopting a multi-path distribution technology. Specifically, the main speaking terminal determines RTMP channels through which the main speaking terminal performs data communication with other listening and speaking terminals, so that the obtained speaker class data corresponding to the target listening and speaking terminal and the preprocessed main speaking class data are packaged in the determined RTMP channels, and the packaged class data are transmitted to the corresponding listening and speaking terminals through the RTMP channels. After receiving the encapsulated classroom data, the listening and speaking terminal decodes the encapsulated classroom data to obtain the listening and speaking party video data, the listening and speaking party audio data, the main speaking party video data and the main speaking party audio data corresponding to the target listening and speaking terminal. The listening and speaking terminal can sequentially splice the listening and speaking party video data and the main speaking party video data corresponding to the target listening and speaking terminal according to the data receiving time sequence to obtain a plurality of classroom pictures. The splicing mode may be GLSurfaceView. The listening and speaking terminal outputs the combined plurality of classroom pictures to the display device through the HDMI output port, for example, the display device may be a television, and the display device displays the spliced plurality of classroom pictures. The listening and speaking terminal transmits listening and speaking audio data and main speaking audio data corresponding to the target listening and speaking terminal to the audio mixer, and audio mixing processing is carried out through the audio mixer, so that mutual interference between the listening and speaking audio data and main speaking audio data audio is avoided. And the listening and speaking terminal transmits the audio data after the audio mixing processing to the audio player for playing. For example, the audio player may be a sound box. And then the students in the listening and speaking classroom corresponding to the listening and speaking terminal can watch the video data in the main speaking teacher corresponding to the main speaking terminal and the target listening and speaking classroom corresponding to the target listening and speaking terminal and hear the audio data in the interactive discussion process.

In one embodiment, an RTMP (Real Time Messaging Protocol) Protocol may be used to implement communication between the talkback terminal and the listen and talk terminal. Compared with a complex H.323 communication protocol system adopted by a traditional mode, the method omits the steps of establishing control communication, establishing a media channel, transmitting media and the like, so that the development cost of the interactive teaching service is lower. In addition, the RTMP protocol is adopted for communication, only a teaching mode is required to be concerned, a conference mode is not required to be concerned, complex conference control operation is not required to be realized, and the interactive teaching service control flow and the teaching mode switching flow are simplified.

In one embodiment, the main speaking terminal and the listening and speaking terminal can output the audio data to be played to the audio processor through the 3.5mm audio interface, and perform echo cancellation through the audio processor. The Echo Cancellation method may be Acoustic Echo Cancellation (AEC), Line Echo Cancellation (LEC), or the like. And the audio processor transmits the audio data subjected to echo cancellation to an audio player for playing. By eliminating the echo, the accuracy of the audio data played by the main speaking terminal and the listening and speaking terminal can be improved.

In this embodiment, in the network interactive teaching process, the main speaking terminal receives the listening party classroom data, the main speaking party video data and the main speaking party audio data pushed by the plurality of listening terminals, decodes the listening party classroom data, and plays the decoded data. The teacher can watch the panoramic images of the students in the listening and speaking classrooms corresponding to the listening and speaking terminals in the course of lecturing, and can know the participation situation of the listening and speaking classrooms in time. The main speaker terminal preprocesses the main speaker video data and the main speaker audio data to obtain preprocessed main speaker classroom data, and pushes the preprocessed main speaker classroom data to each listening terminal, so that each listening terminal can learn in classroom. When the triggering operation of the interactive mode is received in the data playing process, the target listening and speaking terminal is determined according to the triggering operation, the main speaking terminal directly encapsulates the listening and speaking party classroom data corresponding to the target listening and speaking terminal and the preprocessed main speaking party classroom data, and the classroom data obtained after encapsulation is distributed to other listening and speaking terminals, so that students in the listening and speaking classroom can watch the main speaking teacher corresponding to the main speaking terminal and the video data in the target listening and speaking classroom corresponding to the target listening and speaking terminal in the interactive discussion process and listen to the audio data in the interactive discussion process. The main speaking terminal does not need to carry out multi-picture coding and sound mixing processing on the speaking party class data corresponding to the target listening and speaking terminal and the preprocessed main speaking party class data before data transmission, so that the one-time decoding and coding process is reduced, and the time delay caused by multi-picture coding and sound mixing processing is reduced. Meanwhile, after receiving the speaker class data corresponding to the target speaker terminal and the preprocessed main speaker class data, the speaker terminal performs decoding, multi-picture splicing and sound mixing processing, so that the resource occupation space between the main speaker terminal and the speaker terminal can be balanced, the main speaker terminal and the speaker terminal do not need to have strong coding capacity, only multi-channel decoding capacity is needed, and the hardware cost in network interactive teaching is effectively reduced.

In one embodiment, as shown in fig. 3, the method further comprises: the method comprises the following steps of switching teaching modes according to a tracking instruction, wherein the steps specifically comprise:

step 302, in the network interactive teaching process, receiving a tracking instruction, wherein the tracking instruction carries a teaching mode tracking identifier.

And step 304, determining a target teaching mode according to the teaching mode tracking identifier, and acquiring target speaker video data and target speaker audio data corresponding to the target teaching mode.

And step 306, switching the video data of the current main speaker to the video data of the target main speaker and switching the audio data of the current main speaker to the audio data of the target main speaker according to the tracking instruction.

In the network interactive teaching process, the master terminal can also switch the teaching mode. And after the first video acquisition equipment corresponding to the main speaking terminal acquires the video data, tracking and identifying the acquired video data. The captured video data may include a plurality of frames of images. The image tracking algorithm used in the tracking identification process may be, but is not limited to, MeanShift, Camshift, Kalman Filter. The teacher classroom that the terminal that talkbacks corresponds is provided with in advance and carries out data acquisition's first video equipment to different regions, and including teacher region and student area, teacher's regional first video acquisition equipment gathers teacher's image promptly, and student's regional first video acquisition equipment gathers student's image.

Specifically, the first video acquisition device monitors the human body of each frame of image, monitors the human body target in each frame of image, obtains the image containing the human body target, thereby detecting the motion of the image containing the human body target, and identifies the motion of the human body target, thereby determining the motion condition of the target human body. The action of the human target may be standing up, sitting down, etc. The first video equipment determines a corresponding teaching mode tracking identifier according to the motion condition of the target human body, generates a tracking instruction and sends the tracking instruction to the talkback terminal. For example, the instructional mode tracking identifier may be a tracking code. And the speaker terminal analyzes the tracking instruction to obtain a teaching mode tracking identifier, and determines a target teaching mode according to the teaching mode tracking identifier. The target teaching mode can be one or more of various teaching modes such as appearance of a teacher, teaching of the teacher, disappearance of the teacher, walking on a platform, walking off the platform, standing up of a student, sitting down of the student, standing up of multiple persons, starting writing on a blackboard, stopping writing on a blackboard and the like. The main speaker terminal obtains target main speaker video data and target main speaker audio data corresponding to the target teaching mode from the first video collecting device or the second video collecting device. And switching the video data of the current main speaker to the video data of the target main speaker according to the tracking instruction, and switching the audio data of the current main speaker to the audio data of the target main speaker. For example, when a teacher speaks a class, the master terminal acquires close-up images and sound of the teacher and sends the close-up images and the sound to each listening and speaking terminal; when the teacher speaks courseware, courseware picture data acquired by the second video acquisition equipment and teacher sound acquired by the first audio acquisition equipment are acquired and sent to each listening and speaking terminal; when a teacher writes on a blackboard, the master terminal acquires blackboard writing images and teacher sound and sends the blackboard writing images and the teacher sound to each listening and speaking terminal; when the students answer the questions, the main speaking terminal acquires the close-up images and the sound of the standing students and sends the close-up images and the sound to each listening and speaking terminal; during interactive discussion, panoramic images and sounds of a teacher, students and students in an interactive listening and speaking classroom are collected and sent to other listening and speaking terminals.

Similarly, the manner of switching the teaching mode of the listening/speaking terminal is the same as the manner of switching the teaching mode of the main speaking terminal. And will not be described in detail herein.

In this embodiment, the speaker terminal determines the target teaching mode according to the teaching mode tracking identifier in the tracking instruction by receiving the tracking instruction sent by the first video acquisition device, so as to obtain corresponding target speaker video data and target speaker audio data in the first video acquisition device or the second video acquisition device according to the target teaching mode, and further perform teaching mode switching according to the target speaker video data and the target speaker audio data. Can effectively restore each side's scene of participating in teaching at interactive teaching in-process, automatically, tailor away unnecessary rubbish image, only need obtain the target owner's party video data and the target owner's party audio data that the target teaching mode corresponds, send to each listen to talk terminal can, realize the switching of complicated scene, simultaneously, avoided student's attention to be dispersed, effectively improved the effect of giving lessons.

In one embodiment, the plurality of listening and speaking terminals are listening and speaking terminals after the class state is updated, and before receiving the listening and speaking party class data, the speaker video data and the speaker audio data pushed by the plurality of listening and speaking terminals in the network interactive teaching process, the method further includes: when the prompt information which is ready is received, determining a prompt terminal according to the prompt information, and updating the class state of the prompt terminal to a ready state; and after the class state of the prompt terminal is updated, receiving the class data of the listening and speaking party pushed by the prompt terminal.

The main speaking terminal can update the classroom state of the listening and speaking terminal before receiving the classroom data, the main speaking video data and the main speaking audio data of the listening and speaking terminals. Specifically, when the main speaking terminal receives the prepared prompt information, the main speaking terminal determines the prompt terminal according to the prompt information, and the prompt terminal is a listening and speaking terminal for sending the prepared prompt information. And the main speaking terminal updates the classroom state of the prompting terminal into a ready state. The ready state indicates that the terminal is ready to finish listening. After the class state of the prompt terminal is updated, the main speaker terminal can receive the class data of the audiences pushed by the prompt terminal. For the listening and speaking terminals which do not receive the prompt messages ready for completion, the main speaking terminal can regularly send the class request to the corresponding listening and speaking terminals until the prompt messages ready for completion are received.

In this embodiment, the main speaker terminal may receive the data of the lecturer class pushed by the prompt terminal only after the class state of the prompt terminal is updated to the ready state. The effectiveness of received lecturer classroom data can be ensured, and the transmission of junk data is avoided, so that the waste of transmission resources is avoided.

In one embodiment, the method further comprises: after the class state of the prompt terminal is updated, sending a heartbeat packet to the prompt terminal according to a first time interval; when the response information of the prompting terminal is not received within a preset time period, marking the class state of the prompting terminal as a ready state; and sending a lesson instruction to the prompt terminal according to a second preset time interval until receiving prompt information which is returned by the prompt terminal and is ready to finish.

After the class state of the prompt terminal is updated, namely the class state of the prompt terminal is updated to be a ready state, the main speaker terminal sends a heartbeat packet to the prompt terminal according to a first time interval, namely the heartbeat packet is sent to the prompt terminal at a certain frequency. The normal connection between the main speaking terminal and the listening and speaking terminal is ensured by sending the heartbeat packet.

When the main speaking terminal does not receive the response information of the prompt terminal within the preset time period, the heartbeat of the prompt terminal is abnormal, the main speaking terminal can be automatically disconnected from the prompt terminal, and the class state of the prompt terminal is restored to the ready state. At this time, the main speaking terminal will send the lesson request to the prompting terminal again at regular time until receiving the prompt information of the preparation completion sent by the prompting terminal. Meanwhile, the main speaking terminal can reestablish the connection with the prompt terminal. Therefore, the safety of data transmission between the main speaking terminal and the listening and speaking terminal is improved.

In one embodiment, the main teacher needs to prepare the main terminal and the listening terminal before the lecture. Specifically, the main lecture terminal receives a lecture preparation instruction triggered by a main lecture teacher, and creates an RTMP Server according to the lecture preparation instruction, namely, presetting a network address, an RTMP Client and establishing an RTMP channel. The RTMP Client is used for pushing flow, namely the preprocessed class data of the main speaker is transmitted to a preset network address. The RTMP channel is a transmission channel for transmitting the preprocessed main speaker classroom data pushed by the RTMP Client to a preset network address.

Then, the main speaking terminal obtains first configuration information, and the first configuration information may include a main speaking terminal identifier, a first video collecting device identifier, a second video collecting device identifier, a first audio collecting device identifier, and the like. The talkback terminal starts a first video acquisition device corresponding to a first video acquisition device identification according to the first configuration information, a second video acquisition device corresponding to a second video acquisition device identification and a first audio acquisition device corresponding to a first audio acquisition device identification, receives the first video acquisition device, the speaker video data transmitted by the second video acquisition device and the speaker audio data transmitted by the second audio acquisition device, and pre-processes the speaker video data and the speaker audio data, wherein the pre-processing mode comprises coding, synchronization and packaging, and then obtains the pre-processed speaker classroom data. The main speaker terminal pushes the preprocessed main speaker classroom data to a preset network address, and then the main speaker terminal obtains the preprocessed main speaker classroom data through the preset network address and decodes the obtained data to obtain decoded main speaker video data and decoded main speaker audio data. And the main speaking terminal transmits the decoded main speaking video data to the display equipment. For example, the display device may be a television. And displaying through the display device. And the main speaking terminal transmits the decoded audio data of the listening and speaking party to an audio player for playing. For example, the audio player may be a sound box. At this time, the main speaking terminal completes the preparation work before class. The master teacher can see the local teaching picture and hear the local teaching sound.

And the main speaking terminal acquires the interactive class identification and the course configuration information after completing the preparation work before class. The course configuration information comprises a plurality of listening and speaking terminal identifications corresponding to the preset interactive course identifications. And the main speaking terminal determines the listening and speaking terminal identification corresponding to the interactive course identification in the course configuration information. When the main speaking terminal receives the selection operation of the plurality of listening terminal identifications corresponding to the interactive class identification triggered by the main speaking teacher, the main speaking terminal responds to the selection operation of the plurality of listening terminal identifications, generates corresponding class-giving requests according to the listening terminal identifications, and sequentially sends class-giving requests to the listening terminals corresponding to the listening terminal identifications. Because the course configuration information is pre-configured and stored in the main speaking terminal, when the main speaking terminal starts the live broadcast, the corresponding listening terminal identification of the interactive course can be directly selected according to the course configuration information so as to send a lesson request, thereby realizing the quick start of the interactive teaching.

After receiving the request of class, the listening terminal responds to the request of class and replies confirmation information to the main speaking terminal. And after receiving the confirmation information, the main speaking terminal updates the network state of the corresponding listening and speaking terminal to be on-line. For the listening and speaking terminals which do not receive the confirmation information, the main speaking terminal can regularly send the class request to the listening and speaking terminals corresponding to the confirmation information until the confirmation information is received.

After updating the network state of the listening and speaking terminal, the main speaking terminal may send a heartbeat packet to the listening and speaking terminal after updating the network state at a certain frequency. The normal connection between the main speaking terminal and the listening and speaking terminal is ensured by sending the heartbeat packet. When the main speaking terminal does not receive the response information of the listening terminal for updating the network state within the preset time period, the heartbeat of the listening terminal is abnormal, the main speaking terminal can automatically disconnect the listening terminal, and the class state of the prompting terminal is recovered to an offline state. At this time, the main speaking terminal will send the class request to the listening and speaking terminal again at regular time until receiving the confirmation message sent by the listening and speaking terminal. Meanwhile, the main speaking terminal can reestablish the connection with the listening and speaking terminal. Therefore, the safety of data transmission between the main speaking terminal and the listening and speaking terminal is improved.

After the listening terminal replies the confirmation message, the listening terminal can complete the preparation work before class by adopting the way of the preparation work before class of the main speaking terminal. And will not be described in detail herein. And the main speaking terminal updates the classroom state of the prompting terminal corresponding to the prompting information into a ready state when receiving the prompting information which is ready. After the class state of the prompt terminal is updated, the main speaker terminal can receive the class data of the audiences pushed by the prompt terminal.

In this embodiment, the main speaking terminal and the listening terminal complete preparation before class, and both parties can be ensured to receive corresponding data normally locally. The main speaking terminal can receive the class data of the listening and speaking party pushed by the listening and speaking terminal only when the network state of the listening and speaking terminal is on-line and the class state is ready. The effectiveness of received lecturer classroom data can be ensured, and the transmission of junk data is avoided, so that the waste of transmission resources is avoided.

It should be understood that although the steps in the flowcharts of fig. 2 to 3 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least some of the steps in fig. 2-3 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performing the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 4, there is provided a network interactive teaching device, including: a first communication module 402, a second communication module 404, a decoding module 406, a preprocessing module 408, and a determination module 410, wherein:

the first communication module 402 is configured to receive lecturer classroom data pushed by multiple lecturer terminals in the network interactive teaching process.

A second communication module 404 for receiving the main speaker video data and the main speaker audio data.

And the decoding module 406 is configured to decode the classroom data of the listening and speaking party, and play the decoded data.

The preprocessing module 408 is configured to preprocess the speaker video data and the speaker audio data to obtain preprocessed speaker classroom data.

The first communication module 402 is further configured to push the preprocessed main speaker classroom data to each listening and speaking terminal, so that each listening and speaking terminal plays the preprocessed main speaker classroom data.

The determining module 410 is configured to determine, when a trigger operation of the interaction mode is received in a data playing process, a target listening and speaking terminal according to the trigger operation, and use a listening and speaking terminal other than the target listening and speaking terminal in the plurality of listening and speaking terminals as another listening and speaking terminal.

The first communication module 402 is further configured to encapsulate the lecturer classroom data corresponding to the target lecturer terminal and the preprocessed speaker classroom data, and distribute the classroom data obtained after encapsulation to other lecturer terminals, so that the other lecturer terminals play the classroom data obtained after encapsulation.

In one embodiment, the above apparatus further comprises:

the second communication module 404 is further configured to receive a tracking instruction during the network interactive teaching process, where the tracking instruction carries a teaching mode tracking identifier.

The first communication module 402 is further configured to determine a target teaching mode according to the teaching mode tracking identifier, and acquire target speaker video data and target speaker audio data corresponding to the target teaching mode.

And the switching module is used for switching the audio data of the current main speaker to the video data of the target main speaker according to the tracking instruction and switching the audio data of the current main speaker to the audio data of the target main speaker.

In one embodiment, the above apparatus further comprises:

and the detection module is used for detecting the current network parameters before encapsulating the speaker classroom data corresponding to the target speaker terminal and the preprocessed main speaker classroom data.

And the generating module is used for generating corresponding interaction prompt information according to the target listening and speaking terminal identification when the network parameter is detected to be smaller than the threshold value.

The first communication module 402 is further configured to send the interaction prompt information to other listening and speaking terminals, so that the other listening and speaking terminals pull the data of the listening and speaking party class corresponding to the target listening and speaking terminal according to the interaction prompt information.

In an embodiment, the decoding module 406 is further configured to decode the classroom data of the listening and speaking party to obtain target video data pushed by each listening and speaking terminal and target audio data pushed by each listening and speaking terminal; splicing target video data pushed by a plurality of listening and speaking terminals, and displaying a plurality of classroom pictures obtained by splicing; and performing audio mixing processing on the target audio data pushed by the plurality of listening and speaking terminals, and playing the audio data after the audio mixing processing.

In one embodiment, the preprocessing module 408 is further configured to encode the audio data of the main speaker to obtain encoded audio data of the main speaker; carrying out synchronous processing on the coded main speaker audio data and the main speaker video data to obtain synchronized main speaker audio data and synchronized main speaker video data; and packaging the synchronized audio data of the main speaker and the synchronized video data of the main speaker, and taking the audio data obtained after packaging as the preprocessed classroom data of the main speaker.

In one embodiment, the above apparatus further comprises:

and the state updating module is used for determining the prompt terminal according to the prompt information and updating the classroom state of the prompt terminal to a ready state before receiving the speaker classroom data, the speaker video data and the speaker audio data which are pushed by the plurality of speaker terminals in the network interactive teaching process and when the prompt information which is ready is received.

The first communication module 402 is further configured to receive the class data of the lecturer pushed by the prompt terminal after the class state of the prompt terminal is updated.

In one embodiment, the above apparatus further comprises:

and the heartbeat packet sending module is used for sending heartbeat packets to the prompt terminal according to a first time interval after the classroom state of the prompt terminal is updated.

And the state marking module is used for marking the class state of the prompting terminal as a ready state when the response information of the prompting terminal is not received in a preset time period.

And the instruction sending module is used for sending the instruction for class to the prompting terminal according to the second preset time interval until the prompt information which is returned by the prompting terminal and is ready to be finished is received.

For specific limitations of the network interactive teaching device, reference may be made to the above limitations of the network interactive teaching method, which are not described herein again. All modules in the network interactive teaching device can be completely or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a walkie-talkie terminal, the internal structure of which may be as shown in fig. 5. The computer apparatus includes a processor, a memory, a communication interface, an input device, and an output device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a network interactive teaching method. The input device of the computer equipment can be a touch layer covered on a display screen, a key, a track ball or a touch pad arranged on a shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like. The output means of the computer device may comprise a display screen, a loudspeaker, etc.

Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the various embodiments described above when the processor executes the computer program.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when executed by a processor, performs the steps in the various embodiments described above.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these are all within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A network interactive teaching method is characterized in that the method comprises the following steps:

determining channels for data communication with the other listening and speaking terminals, encapsulating the listening and speaking party class data corresponding to the target listening and speaking terminal and the preprocessed main speaking party class data in the channels, and distributing the encapsulated class data to the other listening and speaking terminals through the channels so that the other listening and speaking terminals play the encapsulated class data.

2. The method of claim 1, further comprising:

and switching the video data of the current main speaker to the video data of the target main speaker according to the tracking instruction, and switching the audio data of the current main speaker to the audio data of the target main speaker.

3. The method of claim 1, wherein the decoding the lecturer classroom data, and wherein the playing the decoded data comprises:

4. The method of claim 1, wherein the pre-processing the main speaker video data and the main speaker audio data to obtain pre-processed main speaker classroom data comprises:

5. The method according to any one of claims 1 to 4, wherein the plurality of listening and speaking terminals are listening and speaking terminals after the class status is updated, and before receiving the listening and speaking class data, the speaker video data and the speaker audio data pushed by the plurality of listening and speaking terminals in the network interactive teaching process, the method further comprises:

6. The method of claim 5, further comprising:

7. An interactive network teaching device, the device comprising:

the first communication module is further configured to push the preprocessed main speaker classroom data to each listening and speaking terminal, so that each listening and speaking terminal plays the preprocessed main speaker classroom data;

the first communication module is further configured to determine a channel for performing data communication with each of the other listening and speaking terminals, encapsulate the talker classroom data corresponding to the target listening and speaking terminal and the preprocessed main talker classroom data in the channel, and distribute the encapsulated classroom data to the other listening and speaking terminals through the channel, so that the other listening and speaking terminals play the encapsulated classroom data.

8. The apparatus of claim 7, further comprising:

9. A computer device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.