CN105429983B

CN105429983B - Method for collecting media data, media terminal and music teaching system

Info

Publication number: CN105429983B
Application number: CN201510846324.1A
Authority: CN
Inventors: 刘军
Original assignee: Individual
Current assignee: Individual
Priority date: 2015-11-27
Filing date: 2015-11-27
Publication date: 2018-09-14
Anticipated expiration: 2035-11-27
Also published as: CN105429983A

Abstract

The invention discloses a method for collecting media data, a media terminal and a music teaching system. The media terminal includes a video acquisition unit, a video buffer, an audio acquisition unit, an audio buffer, a sending buffer, a transmission unit and a control unit. The video acquisition unit captures images and encodes them into video frames. The video buffer stores the video frames. The audio acquisition unit captures sound and encodes it into audio frames. The audio buffer stores the audio frames. The sending buffer stores the data frames to be sent, each of which is a video frame or an audio frame. The transmission unit is adapted to transmit the data frames to be sent to a media server. The control unit is adapted to check the audio buffer and push the audio frames in it to the sending buffer. When the audio buffer is empty, if the number of data frames to be sent does not exceed a threshold and the video buffer is not empty, the control unit extracts a video frame from the video buffer and pushes it to the sending buffer.

Description

Method for collecting media data, media terminal and music teaching system

Technical field

The invention relates to the field of communications, and in particular to a method for collecting media data, a media terminal and a music teaching system.

Background

At present, in real-time communication solutions such as video conferencing or live webcasting, a terminal that collects media data can capture media data such as video frames and audio frames and transmit them to a media player. For example, video frames and audio frames may be encapsulated together and transmitted over the network. Alternatively, the terminal encapsulates and transmits the video frames and the audio frames separately.

However, network conditions are complex and changeable; for example, there may be network jitter and intermittent interruptions. When the collecting terminal sends audio and video data over the network, it encounters problems such as network delay and network congestion. As a result, when the media player obtains audio and video data from the collecting terminal, playback stutters and is not smooth.

Summary of the invention

To this end, the present invention provides a new solution for collecting media data that effectively solves at least one of the above problems.

According to one aspect of the present invention, a media terminal is provided, including a video acquisition unit, a video buffer, an audio acquisition unit, an audio buffer, a sending buffer, a transmission unit and a control unit. The video acquisition unit is adapted to capture images and encode them into video frames. The video buffer is adapted to store video frames from the video acquisition unit. The audio acquisition unit is adapted to capture sound and encode it into audio frames. The audio buffer is adapted to store audio frames from the audio acquisition unit. The sending buffer is adapted to store one or more data frames to be sent, where each data frame to be sent is a video frame from the video buffer or an audio frame from the audio buffer. The transmission unit is adapted to transmit the one or more data frames to be sent to a media server. The control unit is adapted to check the audio buffer and push the audio frames in it to the sending buffer and, when the audio buffer is empty, to determine whether the number of data frames to be sent in the sending buffer does not exceed a threshold and the video buffer is not empty. If the number of data frames to be sent does not exceed the threshold and the video buffer is not empty, the control unit extracts one video frame from the video buffer and pushes it to the sending buffer.

According to another aspect of the present invention, a method for collecting media data is provided. The method is suitable for execution in a media terminal. The media terminal includes a sending buffer adapted to store one or more data frames to be sent, where each data frame to be sent is a video frame or an audio frame. The method includes the following steps. An image is captured and encoded into a video frame, and the video frame is stored in a video buffer. Sound is captured and encoded into an audio frame, and the audio frame is stored in an audio buffer. The audio buffer is checked and the audio frames in it are pushed to the sending buffer. If the audio buffer is empty, it is determined whether the number of data frames to be sent in the sending buffer does not exceed a threshold and the video buffer is not empty. If the number of data frames to be sent does not exceed the threshold and the video buffer is not empty, one video frame is extracted from the video buffer and pushed to the sending buffer. The one or more data frames to be sent are transmitted to a media server.

According to yet another aspect of the present invention, a music teaching system is provided, including a media terminal according to the present invention, a media server and a media player. The media server is adapted to receive the audio frames and video frames sent by the media terminal. The media player is adapted to obtain the audio frames and video frames from the media server and play them.

According to the media data collection solution of the present invention, audio frames in the audio buffer are pushed to the sending buffer first, and video frames in the video buffer are pushed to the sending buffer only when the audio buffer is empty and the number of data frames in the sending buffer does not exceed the threshold. In this way, the solution always gives priority to transmitting audio frames and transmits video frames in the gaps between audio frames. In particular, when network bandwidth is low (that is, when the number of data frames in the sending buffer exceeds the threshold), the solution can stop pushing video frames to the sending buffer while continuing to push audio frames from the audio buffer to the sending buffer as normal. This reduces the amount of data the transmission unit has to send when bandwidth is low and thereby helps ensure real-time transmission of audio frames. In other words, the solution implements priority transmission of audio frames and thereby avoids stuttering of the sound played by the media player. It should be noted that, in settings where sound is particularly important, such as music teaching, the solution ensures real-time transmission of sound data so that the media player can play continuous, undistorted audio, which greatly improves the user experience.

Description of the drawings

To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the accompanying drawings. These aspects are indicative of the various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.

FIG. 1 shows a block diagram of an exemplary music teaching system 100 according to the present invention;

FIG. 2 shows a block diagram of a media terminal 200 according to some embodiments of the present invention; and

FIG. 3 shows a flowchart of a method 300 for collecting media data according to some embodiments of the present invention.

Detailed description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.

FIG. 1 shows a block diagram of an exemplary music teaching system 100 according to the present invention. As shown in FIG. 1, the music teaching system 100 may include a plurality of student clients 110, a server 120 and a teacher client 130. In the music teaching system 100, the student client 110 and the teacher client 130 communicate in real time through the server 120 in order to carry out online music teaching. For example, when a student is performing, the student client 110 may be implemented as a media terminal that collects media data related to the performance, such as video and audio, and transmits the media data to the teacher client 130 through the server 120. The teacher client 130 may be implemented as a media player that receives and plays the media data, so that the teacher can follow the student's performance in real time. Likewise, the teacher client 130 may also be implemented as a media terminal that collects media data of the teacher's feedback on the student's performance, teaching demonstrations and similar content, and transmits it to the student client 110 through the server 120. The student client 110 may be implemented as a media player that receives and plays media data from the teacher client 130, so that the teacher can give real-time feedback on the student's performance or give teaching demonstrations in real time. In short, both the student client 110 and the teacher client 130 can be implemented as a media terminal and as a media player. Here, the media data includes, for example, teaching content such as instrument fingering, breathing, instrument sound and instructional text, but is not limited thereto.

In general, the music teaching system 100 faces unstable transmission bandwidth, for example network jitter and intermittent network interruptions. High-quality music teaching, however, places high demands on the real-time delivery, synchronization and smoothness of media data. The present invention proposes a new media terminal for the media data collection stage of the music teaching system. The media terminal in the music teaching system is described further below by way of example with reference to FIG. 2. The media terminal may be a student client or a teacher client; to simplify the description, the specific type of media terminal is not distinguished below. Likewise, both the student client 110 and the teacher client 130 can be implemented as media players. It should be noted that the media terminal according to the present invention can be applied in a music teaching system, but is not limited thereto. For example, it can also be applied in real-time streaming media solutions such as video conferencing and live broadcasts of competitions.

FIG. 2 shows a block diagram of a media terminal 200 according to some embodiments of the present invention. Here, a computing device may be configured as the media terminal 200. The computing device may be implemented as part of a small-form-factor portable (or mobile) electronic device, such as a cellular phone, a personal digital assistant (PDA), a personal media player device, a wireless web-browsing device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. The computing device may also be implemented as a personal computer, including desktop and notebook configurations, but is not limited thereto.

As shown in FIG. 2, the media terminal 200 includes a video acquisition unit 210, a video buffer 220, an audio acquisition unit 230, an audio buffer 240, a sending buffer 250, a transmission unit 260 and a control unit 270.

The video acquisition unit 210 is adapted to capture images and encode them into video frames. For example, the video acquisition unit 210 films a student playing a musical instrument in order to obtain a sequence of video frames. According to one embodiment of the present invention, the video acquisition unit 210 includes a camera 211 and an encoding unit 212. The camera 211 is adapted to capture original image frames. The capture parameters of the original image frames are, for example, a size of 640*480 and 25 frames per second, but are not limited thereto. In addition, each time an original image frame is captured, the video acquisition unit 210 may record the current time value as the first timestamp of that original image frame. According to one embodiment of the present invention, an example of the format of an original image frame is:

{dwstamp videodata}

where dwstamp is the first timestamp and videodata is the image frame in YUV420 format.
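As an illustration of this layout, the following is a minimal C++ sketch of a raw frame container. The field names follow the format above; the types are assumptions (the `dw` prefix suggests a 32-bit unsigned value, and the time unit is assumed to be milliseconds), and `RawImageFrame`, `yuv420FrameBytes` and `makeRawFrame` are hypothetical names that do not come from the patent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for one raw frame. Field names follow the
// {dwstamp videodata} layout in the text; the types are assumptions.
struct RawImageFrame {
    std::uint32_t dwstamp;                // first timestamp, assumed milliseconds
    std::vector<std::uint8_t> videodata;  // YUV420 pixel data
};

// Bytes in one YUV420 frame: a full-resolution Y plane plus U and V planes
// subsampled by 2 in each direction (width and height assumed even).
std::size_t yuv420FrameBytes(std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height * 3 / 2;
}

// Build a blank frame stamped with the capture time.
RawImageFrame makeRawFrame(std::uint32_t captureTimeMs,
                           std::size_t width, std::size_t height) {
    RawImageFrame frame;
    frame.dwstamp = captureTimeMs;
    frame.videodata.assign(yuv420FrameBytes(width, height), 0);
    return frame;
}
```

With the example capture parameters of 640*480 and 25 frames per second, each raw frame occupies 460,800 bytes, or 11,520,000 bytes per second before encoding, which is why the frames are compressed before transmission.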

The encoding unit 212 is adapted to encode the original image frames. For example, the encoding unit 212 may encode the original image frames in H.264 format (a highly compressed digital video codec standard proposed by the Joint Video Team (JVT), formed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG)). The encoding parameters of the encoding unit 212 include the group of pictures (GOP). The length of a GOP is, for example, 100 frames, that is, one group of pictures contains 100 video frames. Here, a GOP starts with an I frame, followed by a number of P frames. B frames may also appear between adjacent P frames. For example, part of the sequence of a GOP is I P B P B P P P P B P. An I frame is an intra-coded frame that contains complete image information and can be reconstructed without reference to any other frame. A P frame is a forward-predicted frame, predicted from the P frame or I frame that precedes it. A B frame is a bidirectionally predicted compressed frame. When compressing an image frame into a B frame, the encoding unit 212 compresses the current frame according to the differences between it and the adjacent previous and subsequent frames. The encoding parameters of the encoding unit 212 may also include the encoding frame rate and the encoding size, but are not limited thereto. According to one embodiment of the present invention, a code example of the encoding operation performed by the encoding unit 212 is as follows:

// Initialize the image compression engine
Ret = CLDC_Open(width, height, bitrate, mode);
// width: width of the compressed video
// height: height of the compressed video
// bitrate: target bit rate for video compression
// mode: video compression mode control

// Encode one image; the output is an H.264 bitstream containing SPS and PPS
Ret = CLDC_Encode(pBuf, dwbase, m_pVideoBuffer, nEncoderLen);
// pBuf: buffer holding the captured image data
// dwbase: length of the original image data
// m_pVideoBuffer: buffer for the compressed image
// nEncoderLen: length of the compressed image

// Close the image encoding engine
Ret = CLDC_Close();

The video buffer 220 is adapted to store the video frames generated by the video acquisition unit 210. Here, the video buffer 220 is, for example, a ring buffer. For example, the video buffer 220 may always hold the 20 most recently generated frames.
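One way to picture a buffer that always holds the newest frames is the following minimal C++ sketch. It is an assumption-laden illustration rather than the patent's implementation: `EncodedFrame` and `FrameRing` are hypothetical names, a `std::deque` stands in for a fixed-size ring, and the policy of discarding the oldest frame when full is inferred from the phrase "the 20 most recently generated frames".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical encoded frame: a timestamp plus the compressed bytes.
struct EncodedFrame {
    std::uint32_t timestampMs;
    std::vector<std::uint8_t> payload;
};

// Bounded FIFO that keeps only the newest `capacity` frames.
class FrameRing {
public:
    explicit FrameRing(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Store a frame; when the buffer is full, the oldest frame is discarded first.
    void push(EncodedFrame frame) {
        if (frames_.size() == capacity_) {
            frames_.pop_front();
        }
        frames_.push_back(std::move(frame));
    }

    // Move the oldest frame into `out`; returns false if the buffer is empty.
    bool pop(EncodedFrame& out) {
        if (frames_.empty()) {
            return false;
        }
        out = std::move(frames_.front());
        frames_.pop_front();
        return true;
    }

    bool empty() const { return frames_.empty(); }
    std::size_t size() const { return frames_.size(); }

private:
    std::size_t capacity_;
    std::deque<EncodedFrame> frames_;
};
```

The sketch is not thread-safe. In a terminal where the acquisition unit and the control unit run on separate threads, access would need to be guarded, for example with a mutex.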

The audio acquisition unit 230 is adapted to capture sound and encode it into audio frames. In one embodiment according to the present invention, the audio acquisition unit 230 may encode the captured sound in AAC (Advanced Audio Coding) format at a bit rate of 192 kbps. Here, the audio acquisition unit 230 may capture the music and breathing produced while an instrument is played, but is not limited thereto. In one embodiment according to the present invention, a code example of the audio capture and encoding operations performed by the audio acquisition unit 230 is as follows:

int nSamples = AUDIO_SAMPLERATE;  // audio sampling rate
int nChannels = 1;                // number of channels
int nBits = 16;                   // bits per audio sample
int nAudioBitrate = 192000;       // bit rate of the audio output
int nRet = 0;

// Set the audio compression parameters and open the audio compressor
nRet = m_pFaacCodec->Open(nSamples, nChannels, nBits, nAudioBitrate);

// Encode the audio; the encoded data is [ADTS header] + [compressed audio data]
nCodecRet = m_pFaacCodec->Encode(pBuf, nLen, streamabuffer, nEncodeLen);
// pBuf: raw audio data
// nLen: length of the audio data
// streamabuffer: buffer for the encoded output
// nEncodeLen: length of the encoded data

// End audio encoding
nRet = m_pFaacCodec->Close();

In addition, the audio acquisition unit 230 may capture the time value of each captured audio frame and record it as a second timestamp. The second timestamp of each audio frame is, for example, the capture time of the first audio sampling point of that frame.

The audio buffer 240 is adapted to store the audio frames generated by the audio acquisition unit 230. Here, the audio buffer 240 is, for example, a ring buffer. Each audio frame may also include its corresponding second timestamp. In this way, when the media player obtains the audio frames and video frames transmitted by the media terminal 200, it can synchronize them according to the first timestamp and the second timestamp.
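The text does not specify how the media player uses the two timestamps. The following minimal C++ sketch shows one common approach, in which audio playback acts as the master clock and each video frame is scheduled against it. The policy, the 40 ms default tolerance (one frame period at 25 frames per second) and the names `VideoAction` and `scheduleVideoFrame` are assumptions for illustration, and the sketch assumes that both timestamps come from the same clock on the media terminal.

```cpp
#include <cassert>
#include <cstdint>

// What the player should do with the next video frame.
enum class VideoAction { Wait, Show, Drop };

// Compare a video frame's first timestamp with the second timestamp of the
// audio currently being played, and decide how to keep the two aligned.
VideoAction scheduleVideoFrame(std::int64_t videoStampMs,
                               std::int64_t audioClockMs,
                               std::int64_t toleranceMs = 40) {
    assert(toleranceMs >= 0);
    const std::int64_t lead = videoStampMs - audioClockMs;
    if (lead > toleranceMs) {
        return VideoAction::Wait;   // video is ahead of the sound: hold the frame
    }
    if (lead < -toleranceMs) {
        return VideoAction::Drop;   // video is too late: skip it to catch up
    }
    return VideoAction::Show;       // within tolerance: display now
}
```

Treating audio as the reference is consistent with the priority the terminal gives to sound: the audio is played without interruption and the picture is adjusted around it.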

The sending buffer 250 is adapted to store one or more data frames to be sent. Each data frame to be sent is a video frame from the video buffer 220 or an audio frame from the audio buffer 240.

The transmission unit 260 is adapted to transmit the data frames to be sent in the sending buffer 250 to the media server in sequence. The media server can then forward these data frames to the media player, which receives and plays the video data and audio data.

As described above, the sending buffer 250 holds the data frames waiting to be transmitted by the transmission unit 260. The control unit 270 is adapted to extract data frames from the audio buffer 240 and the video buffer 220 and push them to the sending buffer 250. Generally speaking, in streaming media systems such as music teaching, sound is more important than data such as video. Each time it performs a push operation, the control unit 270 checks the audio buffer 240 first. In one case, the audio buffer 240 contains audio frames, and the control unit 270 extracts one audio frame and pushes it to the sending buffer 250. In the other case, the control unit 270 finds that the audio buffer 240 is empty; in other words, there is no audio frame to send when the control unit 270 performs this push operation. The control unit 270 then checks whether the video buffer 220 is empty and whether the number of data frames to be sent in the sending buffer 250 exceeds a threshold. The threshold is, for example, 5. The reason for checking whether the number of data frames in the sending buffer 250 exceeds the threshold is to judge from that number whether the transmission unit 260 is currently blocked (that is, whether the network is abnormal). When the transmission unit 260 is not blocked, it can transmit the data frames in the sending buffer 250 over the network in time, so the sending buffer 250 does not accumulate more data frames than the threshold. Conversely, when the transmission unit 260 is blocked, it cannot transmit the data frames in the sending buffer 250 in time, and the number of data frames accumulated in the sending buffer 250 exceeds the threshold.

It should be noted that the present invention does not restrict the order in which the video buffer 220 and the sending buffer 250 are checked. In one embodiment, the control unit 270 first checks whether the video buffer 220 is empty and, if it is, does not go on to check whether the number of data frames in the sending buffer 250 exceeds the threshold. In other words, if the video buffer 220 is empty, the current push operation ends. Here, the interval between two adjacent push operations is, for example, 10 milliseconds. If the video buffer 220 is not empty, the control unit 270 checks whether the number of data frames in the sending buffer 250 exceeds the threshold. In another embodiment, the control unit 270 checks the sending buffer 250 first and, when the number of data frames does not exceed the threshold, checks whether the video buffer 220 is empty. In one implementation according to the present invention, a code example of the working process of the control unit 270 is as follows:
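The listing referred to above does not appear in this text. As a stand-in, the following is a minimal C++ sketch of a single push operation, written from the description rather than taken from the patent. The names `DataFrame`, `PushResult` and `pushOnce` are hypothetical, and plain `std::deque` containers stand in for the three buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

enum class FrameKind { Audio, Video };

// Hypothetical data frame: only the fields needed to show the scheduling.
struct DataFrame {
    FrameKind kind;
    std::uint32_t timestampMs;
};

enum class PushResult { PushedAudio, PushedVideo, Idle };

// One push operation of the control unit.
PushResult pushOnce(std::deque<DataFrame>& audioBuffer,
                    std::deque<DataFrame>& videoBuffer,
                    std::deque<DataFrame>& sendBuffer,
                    std::size_t threshold) {
    // Audio is checked first on every push operation, whatever the
    // state of the sending buffer.
    if (!audioBuffer.empty()) {
        assert(audioBuffer.front().kind == FrameKind::Audio);
        sendBuffer.push_back(audioBuffer.front());
        audioBuffer.pop_front();
        return PushResult::PushedAudio;
    }
    // Audio buffer is empty: video goes out only if a frame is waiting and
    // the sending buffer holds no more than `threshold` frames.
    if (videoBuffer.empty() || sendBuffer.size() > threshold) {
        return PushResult::Idle;
    }
    sendBuffer.push_back(videoBuffer.front());
    videoBuffer.pop_front();
    return PushResult::PushedVideo;
}
```

A caller would invoke `pushOnce` periodically, for example every 10 milliseconds with a threshold of 5 as in the text. Because the two video conditions are combined in one test, the sketch gives the same result for either checking order described above.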

As described above, when the control unit 270 finds that the video buffer 220 is not empty and the number of data frames to be sent does not exceed the threshold, it extracts one video frame from the video buffer 220 and pushes it to the sending buffer 250. In summary, the control unit 270 according to the present invention pushes audio frames in the audio buffer 240 to the sending buffer 250 first, and pushes video frames in the video buffer 220 to the sending buffer 250 only when the audio buffer 240 is empty and the number of data frames in the sending buffer does not exceed the threshold. In this way, the media terminal 200 according to the present invention always gives priority to transmitting audio frames and transmits video frames in the gaps between audio frames. In particular, when network bandwidth is low (that is, when the number of data frames in the sending buffer 250 exceeds the threshold), the media terminal 200 can stop pushing video frames to the sending buffer 250 while continuing to push audio frames from the audio buffer 240 to the sending buffer 250 as normal. This reduces the amount of data the transmission unit 260 has to send when bandwidth is low and thereby helps ensure real-time transmission of audio frames. In other words, the media terminal 200 according to the present invention implements priority transmission of audio frames and thereby avoids stuttering of the sound played by the media player. It should be noted that, in settings where sound is particularly important, such as music teaching, the media terminal according to the present invention ensures real-time transmission of sound data, so that the media player can play continuous, undistorted audio.

In addition, when the number of data frames in the sending buffer 250 is greater than the threshold, the control unit 270 determines that the network is currently abnormal. The control unit 270 is also adapted to calculate the bit rate at which the transmission unit 260 sends the video frames in the sending buffer, so that the video acquisition unit 210 can adjust the bit rate at which video frames are generated according to that sending bit rate. According to one embodiment of the present invention, the control unit 270 may count the number of video frames from the sending buffer that the transmission unit sends within a predetermined time (for example, 2 seconds) and calculate the sending bit rate from that count. The video acquisition unit 210 may then adjust its image capture parameters and encoding parameters according to the sending bit rate in order to adjust the generation bit rate. For example, the video acquisition unit 210 may adjust the size (resolution) or the frame rate at which original images are captured. As another example, the video acquisition unit 210 may adjust the encoding parameters used when encoding the original image frames, such as the encoding frame rate and the encoded size of the generated video frames. After this adjustment, the generation bit rate matches the bit rate at which the transmission unit 260 sends video frames. The transmission unit 260 can then send video frames in real time, which reduces how often the number of data frames in the sending buffer 250 exceeds the threshold and avoids the problem of video frames in the video buffer 220 not being transmitted in time because their bit rate is too high. In addition, because the video acquisition unit can adjust the encoding frame rate, the sequence of video frames transmitted by the transmission unit 260 is uniform in capture time. In this way, the media player avoids excessive delay and picture jumps in the video it plays.
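The bit rate calculation can be sketched as follows in C++. The text only says that sent video frames are counted over a predetermined time; summing their sizes in bytes, and clamping the resulting encoder target between a minimum and a maximum, are assumptions added here for illustration. `sendBitrateBps` and `nextEncoderBitrateBps` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sending bit rate in bits per second, given the sizes (in bytes) of the
// video frames sent during a measurement window of `windowMs` milliseconds.
std::uint64_t sendBitrateBps(const std::vector<std::size_t>& sentFrameBytes,
                             std::uint64_t windowMs) {
    assert(windowMs > 0);
    std::uint64_t totalBytes = 0;
    for (std::size_t bytes : sentFrameBytes) {
        totalBytes += bytes;
    }
    return totalBytes * 8 * 1000 / windowMs;
}

// Target bit rate for the encoder: follow the measured sending rate, but
// stay inside the range the encoder is configured to support.
std::uint64_t nextEncoderBitrateBps(std::uint64_t measuredBps,
                                    std::uint64_t minBps,
                                    std::uint64_t maxBps) {
    assert(minBps <= maxBps);
    if (measuredBps < minBps) {
        return minBps;
    }
    if (measuredBps > maxBps) {
        return maxBps;
    }
    return measuredBps;
}
```

For example, 50 video frames of 2,500 bytes each sent within a 2-second window correspond to 125,000 bytes, or 500,000 bits per second. The video acquisition unit could then pass the clamped value to the encoder as its new target bit rate, or lower the capture resolution or frame rate to reach it.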

Fig. 3 shows a flowchart of a method 300 for collecting media data according to some embodiments of the present invention. The method 300 is suitable for execution in a media terminal according to the present invention.

As shown in Fig. 3, the method 300 starts at step S310. In step S310, sound is captured and encoded into an audio frame, and the audio frame is stored in an audio buffer. The method 300 may also include step S320, in which an image is captured and encoded into a video frame, and the video frame is stored in a video buffer. Here, the audio buffer and the video buffer are, for example, ring buffers. In addition, in step S320, the time at which the original image corresponding to the video frame was captured may be recorded as a first timestamp, and each video frame may include its corresponding first timestamp. In step S310, the capture time of the audio frame may likewise be recorded as a second timestamp. The second timestamp is, for example, the timestamp of the first sampling point of the audio frame, and the audio frame may include this second timestamp. In this way, when the media player obtains the audio frames and video frames transmitted by the media terminal 200, it can synchronize them according to the first and second timestamps.
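The buffering and timestamping in steps S310 and S320 can be sketched as follows. This is an illustrative sketch under stated assumptions: `Frame`, `RingBuffer` and `av_offset_ms` are hypothetical names, timestamps are assumed to be integer milliseconds, and the ring buffer is assumed to overwrite its oldest frame when full (the source does not specify the overflow policy).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str          # "audio" or "video"
    timestamp_ms: int  # capture time: first/second timestamp in the text
    payload: bytes     # encoded frame data


class RingBuffer:
    """Fixed-capacity FIFO; assumed to drop the oldest frame on overflow."""

    def __init__(self, capacity: int):
        self._frames = deque(maxlen=capacity)

    def push(self, frame: Frame) -> None:
        self._frames.append(frame)

    def pop(self):
        """Return the oldest frame, or None if the buffer is empty."""
        return self._frames.popleft() if self._frames else None

    def __len__(self) -> int:
        return len(self._frames)


def av_offset_ms(video: Frame, audio: Frame) -> int:
    """Capture-time offset a player could use to align the two streams."""
    return video.timestamp_ms - audio.timestamp_ms
```

A positive offset means the video frame was captured later than the audio frame, so a player would present it correspondingly later.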

A media terminal according to the present invention includes a sending buffer. The sending buffer is adapted to store one or more data frames to be sent. Each data frame to be sent is a video frame from the video buffer or an audio frame from the audio buffer. For the audio frames stored in the audio buffer and the video frames stored in the video buffer, the method 300 controls network transmission by performing steps S330, S340 and S350. In step S330, the audio buffer is detected and the audio frames in it are pushed to the sending buffer. If the audio buffer is found to be empty in step S330, the method 300 performs step S340. In step S340, it is judged whether the number of data frames to be sent in the sending buffer does not exceed the threshold and the video buffer is not empty. Specifically, according to one embodiment of the present invention, step S340 first detects whether the video buffer is empty. If the video buffer is empty, there is currently no video frame to send, and the method 300 returns to step S330. If the video buffer is not empty, step S340 goes on to detect whether the number of data frames to be sent in the sending buffer exceeds the threshold. If the number of data frames to be sent exceeds the threshold, the transmission network is currently congested.
To give priority to audio frames, no video frame is pushed to the sending buffer in that case, and the method returns to step S330. According to yet another embodiment of the present invention, step S340 first detects whether the number of data frames to be sent in the sending buffer exceeds the threshold. If the threshold is exceeded, the video buffer is not checked, and the method returns to step S330.

In addition, if step S340 detects that the number of data frames to be sent does not exceed the threshold and the video buffer is not empty, step S350 is performed. In step S350, one video frame is extracted from the video buffer and pushed to the sending buffer.
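One pass through steps S330, S340 and S350 can be sketched as follows. This is a minimal sketch rather than the patented implementation: `schedule_once` is a hypothetical name, the buffers are modeled as `collections.deque` objects, step S330 is assumed to move every pending audio frame in one pass, and the default threshold of 5 is the example value given elsewhere in the document.

```python
from collections import deque

THRESHOLD = 5  # example threshold value stated in the document


def schedule_once(audio_buf: deque, video_buf: deque, send_buf: deque,
                  threshold: int = THRESHOLD):
    """Run one pass of S330-S350; return the kind of frame queued, or None."""
    # S330: audio always goes first (assumption: drain all pending audio).
    if audio_buf:
        while audio_buf:
            send_buf.append(audio_buf.popleft())
        return "audio"

    # S340: audio buffer is empty; queue video only if there is a video
    # frame and the backlog does not exceed the threshold.
    if video_buf and len(send_buf) <= threshold:
        # S350: move exactly one video frame to the sending buffer.
        send_buf.append(video_buf.popleft())
        return "video"

    # Either no video is pending or the network is congested.
    return None
```

Because the two checks in S340 are joined by a logical AND, the two embodiments (checking the video buffer first, or the backlog first) give the same result; they differ only in which check is skipped when the first one fails.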

As described above, the method 300 according to the present invention generates audio frames and video frames by performing steps S310 and S320, selects the audio frames and video frames to be sent by performing steps S330, S340 and S350, and stores the selected data frames in the sending buffer. The data frames in the sending buffer are then transmitted over the network in step S360. In step S360, the data frames to be sent in the sending buffer are transmitted to the media server. More specific implementations of the method 300 are consistent with the operation of the media terminal 200 in Fig. 2 and are not repeated here.
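Step S360 can be sketched as a loop that drains the sending buffer. This is an assumption-laden illustration: `transmit_pending` is a hypothetical name, and the `send` callable stands in for whatever transport carries frames to the media server (the source does not name a protocol). A frame that fails to send is assumed to stay at the head of the buffer, which is how a backlog builds up under congestion.

```python
from collections import deque
from typing import Callable


def transmit_pending(send_buf: deque, send: Callable[[bytes], bool],
                     max_frames: int = 1) -> int:
    """Send up to max_frames frames in FIFO order; return how many were sent."""
    sent = 0
    while send_buf and sent < max_frames:
        # Peek first, so a failed send leaves the frame queued.
        if not send(send_buf[0]):
            break  # network stalled; the backlog stays in the buffer
        send_buf.popleft()
        sent += 1
    return sent
```

When `send` keeps failing, the sending buffer fills past the threshold and the selection logic stops queuing video, so the audio frames that do get through are not delayed behind video.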

A10. The method according to A8 or A9, wherein the video buffer, the audio buffer and the sending buffer are ring buffers.

A11. The method according to any one of A8-A10, wherein the step of judging whether the number of data frames to be sent in the sending buffer does not exceed the threshold and the video buffer is not empty includes: detecting whether the video buffer is empty and, if it is not empty, then detecting whether the number of data frames to be sent in the sending buffer exceeds the threshold.

A12. The method according to any one of A8-A11, wherein the step of judging whether the number of data frames to be sent in the sending buffer does not exceed the threshold and the video buffer is not empty includes: detecting whether the number of data frames to be sent in the sending buffer exceeds the threshold and, when the threshold is not exceeded, detecting whether the video buffer is empty.

A13. The method according to any one of A8-A12, wherein the video frame includes a first timestamp, which is the capture time of the image corresponding to the video frame; and the audio frame includes a second timestamp, which is the capture time of the sound corresponding to the audio frame.

A14. The method according to any one of A8-A13, wherein the threshold is 5.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be appreciated that, in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in order to streamline this disclosure and to aid understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will understand that the modules, units or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiment, or alternatively may be located in one or more devices different from the device in the example. The modules in the preceding examples may be combined into one module or may be divided into a plurality of sub-modules.

Those skilled in the art will understand that the modules of the device in an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components in the embodiments may be combined into one module, unit or component, and may also be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.

Furthermore, those skilled in the art will understand that, although some embodiments described herein include some features included in other embodiments but not others, combinations of features from different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

Furthermore, some of the embodiments are described herein as a method, or a combination of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the described function. Thus, a processor with the necessary instructions for carrying out such a method or method element forms a means for carrying out the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.

As used herein, unless otherwise specified, the use of the ordinal numbers "first", "second", "third" and so on to describe a common object merely indicates that different instances of similar objects are being referred to, and is not intended to imply that the objects so described must be in a given order, whether temporally, spatially, in ranking, or in any other manner.

While the invention has been described in terms of a limited number of embodiments, it will be apparent to a person skilled in the art having the benefit of the above description that other embodiments are conceivable within the scope of the invention thus described. In addition, it should be noted that the language used in the specification has been chosen primarily for readability and instructional purposes rather than to explain or delimit the inventive subject matter. Accordingly, many modifications and alterations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, not restrictive, of the scope of the invention, which is defined by the appended claims.

Claims

1. A media terminal, comprising:

a video acquisition unit adapted to capture images and encode them into video frames;

a video buffer adapted to store video frames from the video acquisition unit;

an audio acquisition unit adapted to capture sound and encode it into audio frames;

an audio buffer adapted to store audio frames from the audio acquisition unit;

a sending buffer adapted to store one or more data frames to be sent, wherein each data frame to be sent is a video frame from the video buffer or an audio frame from the audio buffer;

a transmission unit adapted to transmit the one or more data frames to be sent to a media server; and

a control unit adapted to detect the audio buffer and push the audio frames therein into the sending buffer, and, when the audio buffer is empty, to judge whether the number of data frames to be sent in the sending buffer does not exceed a threshold and the video buffer is not empty,

and, if the number of data frames to be sent does not exceed the threshold and the video buffer is not empty, to extract one video frame from the video buffer and push it into the sending buffer;

wherein the control unit is further adapted, upon detecting that the number of data frames to be sent in the sending buffer exceeds the threshold, to detect the bit rate at which the transmission unit sends video frames and to generate a bit-rate adjustment parameter according to that sending bit rate; and

the video acquisition unit is further adapted to adjust the generation bit rate of the video frames according to the bit-rate adjustment parameter.

2. The media terminal according to claim 1, wherein the video buffer, the audio buffer and the sending buffer are ring buffers.

3. The media terminal according to claim 1 or 2, wherein the control unit is adapted to judge whether the number of data frames to be sent in the sending buffer does not exceed the threshold and the video buffer is not empty in the following manner:

detecting whether the video buffer is empty and, if it is not empty, then detecting whether the number of data frames to be sent in the sending buffer exceeds the threshold.

4. The media terminal according to claim 1 or 2, wherein the control unit is adapted to judge whether the number of data frames to be sent in the sending buffer does not exceed the threshold and the video buffer is not empty in the following manner:

detecting whether the number of data frames to be sent in the sending buffer exceeds the threshold and, when the threshold is not exceeded, detecting whether the video buffer is empty.

5. The media terminal according to claim 1 or 2, wherein

the video frame includes a first timestamp, which is the capture time of the image corresponding to the video frame; and

the audio frame includes a second timestamp, which is the capture time of the sound corresponding to the audio frame.

6. The media terminal according to claim 1 or 2, wherein the threshold is 5.

7. A method for collecting media data, adapted to be executed in a media terminal, the media terminal comprising a sending buffer adapted to store one or more data frames to be sent, wherein each data frame to be sent is a video frame or an audio frame, the method comprising:

capturing an image and encoding it into a video frame, and storing the video frame in a video buffer;

capturing sound and encoding it into an audio frame, and storing the audio frame in an audio buffer;

detecting the audio buffer and pushing the audio frames therein into the sending buffer;

if the audio buffer is empty, judging whether the number of data frames to be sent in the sending buffer does not exceed a threshold and the video buffer is not empty,

and, if the number of data frames to be sent does not exceed the threshold and the video buffer is not empty, extracting one video frame from the video buffer and pushing it into the sending buffer;

upon detecting that the number of data frames to be sent in the sending buffer exceeds the threshold, detecting the bit rate at which a transmission unit sends video frames, and generating a bit-rate adjustment parameter according to that sending bit rate;

adjusting the generation bit rate of the video frames according to the bit-rate adjustment parameter; and

transmitting the one or more data frames to be sent to a media server.

8. The method according to claim 7, wherein the video buffer, the audio buffer and the sending buffer are ring buffers.

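9. The method according to claim 7 or 8, wherein the step of judging whether the number of data frames to be sent in the sending buffer does not exceed the threshold and the video buffer is not empty includes: detecting whether the video buffer is empty and, if it is not empty, then detecting whether the number of data frames to be sent in the sending buffer exceeds the threshold.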

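10. The method according to claim 7 or 8, wherein the step of judging whether the number of data frames to be sent in the sending buffer does not exceed the threshold and the video buffer is not empty includes: detecting whether the number of data frames to be sent in the sending buffer exceeds the threshold and, when the threshold is not exceeded, detecting whether the video buffer is empty.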

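11. The method according to claim 7 or 8, wherein the video frame includes a first timestamp, which is the capture time of the image corresponding to the video frame; and the audio frame includes a second timestamp, which is the capture time of the sound corresponding to the audio frame.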

12. The method according to claim 7 or 8, wherein the threshold is 5.

13. A music teaching system, comprising:

the media terminal according to any one of claims 1-6;

a media server adapted to receive the audio frames and video frames sent by the media terminal; and

a media player adapted to obtain the audio frames and video frames from the media server and play them.