
WO2023050921A1 - Video and audio data sending method, display method, sending end and receiving end - Google Patents

Video and audio data sending method, display method, sending end and receiving end

Info

Publication number
WO2023050921A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
data
audio data
audio
sending
Prior art date
Application number
PCT/CN2022/100589
Other languages
French (fr)
Chinese (zh)
Inventor
刘志龙
石挺干
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2023050921A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00 - Arrangements for monitoring or testing data switching networks
    • H04L 43/08 - Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L 65/00 - Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/60 - Network streaming of media packets
    • H04L 65/80 - Responding to QoS
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/60 - Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N 21/63 - Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N 21/647 - Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless

Definitions

  • The embodiments of the present application relate to the field of data transmission, and in particular to a method for sending video and audio data, a display method, a sending end, a receiving end, an electronic device, and a storage medium.
  • An embodiment of the present application provides a method for sending video and audio data, applied to a sending end, including: encoding collected audio data and video data; and, according to a real-time detection result of network quality, sending the video and audio data using the video and audio sending strategy corresponding to the real-time detection result. The real-time detection result includes at least a first quality level and a second quality level, and the network quality of the second quality level is lower than that of the first quality level. The video and audio sending strategy corresponding to the second quality level includes sending the encoded audio data and part of the encoded video data to the receiving end, where the partial video data includes at least some key frames and the motion information of untransmitted video frames.
  • An embodiment of the present application also provides a method for displaying video and audio data, applied to a receiving end, including: receiving encoded data; decoding the received encoded data; when the decoded data includes audio data and partial video data, reconstructing the untransmitted video frames according to their motion information to obtain reconstructed video frames, where the partial video data includes at least some key frames and the motion information of the untransmitted video frames; and rendering and displaying the reconstructed video frames, the key frames in the decoded data, and the audio data.
  • An embodiment of the present application also provides a sending end, including: an encoding module configured to encode collected audio data and video data; and a sending module configured to send the video and audio data, according to a real-time detection result of network quality, using the video and audio sending strategy corresponding to the real-time detection result. The real-time detection result includes at least a first quality level and a second quality level, and the network quality of the second quality level is lower than that of the first quality level. The video and audio sending strategy corresponding to the second quality level includes sending the encoded audio data and part of the encoded video data to the receiving end, where the partial video data includes at least some key frames and the motion information of untransmitted video frames.
  • An embodiment of the present application also provides a receiving end, including: a receiving module configured to receive encoded data; a decoding module configured to decode the received encoded data and, when the decoded data includes audio data and partial video data, reconstruct the untransmitted video frames according to their motion information to obtain reconstructed video frames, where the partial video data includes at least some key frames and the motion information of the untransmitted video frames; and a display module configured to render and display the reconstructed video frames, the key frames in the decoded data, and the audio data when the decoded data includes audio data and partial video data.
  • An embodiment of the present application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above method for sending video and audio data or the above method for displaying video and audio data.
  • An embodiment of the present application also provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the above method for sending video and audio data or the above method for displaying video and audio data is implemented.
  • FIG. 1 is a flowchart of a method for sending video and audio data according to an embodiment of the present application;
  • FIG. 2 is a flowchart of a method for displaying video and audio data according to another embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of a sending end according to another embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a receiving end according to another embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
  • An embodiment of the present application relates to a method for sending video and audio data, applied to a sending end.
  • The method includes: encoding collected audio data and video data; and, according to a real-time detection result of network quality, sending the video and audio data using the video and audio sending strategy corresponding to the real-time detection result. The real-time detection result includes at least a first quality level and a second quality level, and the network quality of the second quality level is lower than that of the first quality level. The video and audio sending strategy corresponding to the second quality level includes sending the encoded audio data together with the motion information of the key frames and non-key frames in the encoded video data to the receiving end.
  • Application scenarios of the embodiments of the present application may include, but are not limited to, video conferencing, video chat, and intelligent customer service.
  • Step 101: The sending end collects video and audio data and encodes them.
  • The collection devices include, but are not limited to, a camera and a microphone.
  • Step 102: According to the real-time detection result of network quality, judge whether the current network quality is at the first quality level; if so, go to step 103; if not, go to step 104.
  • Before sending the collected video and audio data, the sending end needs to judge which quality level the current network quality is at, and then sends the video and audio data using the video and audio sending strategy corresponding to the real-time detection result.
  • When detecting network quality in real time, the referenced information consists of quality indicators such as packet loss rate, delay, jitter, false alarm rate, round-trip time (RTT) and bandwidth, or any combination of these indicators. When the network is at different quality levels, the corresponding sending strategies differ accordingly.
  • When the network quality is at the first quality level, the current network status is normal and can carry the normal, simultaneous transmission of audio data and video data.
  • When the current network quality is at the first quality level, go to step 103: send the encoded audio data and the encoded video data to the receiving end; the sending is completed and the process ends.
  • That is, the video and audio sending strategy corresponding to the first quality level is adopted, namely, the encoded audio data and the encoded video data are sent to the receiving end. The first quality level means that the current network supports the simultaneous transmission of normally encoded audio data and video data, which is the most ideal network state in this embodiment of the application.
  • Step 104: Judge whether the current network is at the second quality level; if so, go to step 105; if not, go to step 106.
  • When judging the quality level, the referenced information is the same as in step 102 and is not repeated here.
  • When the current network quality is at the second quality level, the video and audio sending strategy corresponding to the second quality level is adopted. The second quality level means that the current network does not support the simultaneous transmission of audio data and complete video data, but can support the normal transmission of audio data and partial video data.
  • In one example, after it is determined that the current network is at the second quality level, the key points and Jacobian matrices of the video frames can be extracted according to a fixed quantization step, and the extracted key points and Jacobian matrices are used as the motion information of the video frames. This is because, during data transmission, the amount of motion data used to characterize the key features of a video frame is much smaller than the amount of data in a conventional video frame; transmitting motion information that represents the key features of the video can therefore greatly reduce the demand on network bandwidth.
  • Step 105: Send the encoded audio data and part of the encoded video data to the receiving end; the sending is completed and the process ends.
  • The partial video data includes at least some key frames and the motion information of untransmitted video frames.
  • The motion information of a video frame is used to reconstruct that video frame, so that the receiving end can render and display according to the reconstructed video frames, the transmitted key frames, and the audio data.
  • In one example, the sending end may select some key frames as the transmitted video frames; that is, the transmitted video data includes the selected key frames together with the motion information of the unselected key frames and of the non-key frames. For example, the first, fifth, tenth, ..., Nth key frames are selected as the transmitted video frames, and the key frames not transmitted in between, as well as the non-key frames, are all reconstructed at the receiving end from their motion information alone.
  • In another example, the sending end may use all key frames as the transmitted video frames; that is, the transmitted video data includes all key frames and the motion information of the non-key frames.
  • The non-key frames are reconstructed at the receiving end according to their motion information. Whether some key frames or all key frames are selected as the transmitted video frames can be decided according to the image quality requirements of the service.
  • A key frame refers to the frame in which a key action of a moving or changing character or object occurs; the number of key frames is not limited in the embodiments of the present application.
  • In one example, while encoding the audio data, the sending end extracts the motion information of the non-key frames of a game picture: it first selects one reference game frame for transmission and then extracts the key points and Jacobian matrices of the non-reference game frames according to a fixed quantization step, which are used to represent the motion information of these non-reference game frames.
  • In the above process, the amount of data in the motion information representing the key features of the video is much smaller than the amount of data in the true frames of the optimized picture in conventional techniques, thereby reducing the requirements on the network.
  • After decoding, when the obtained data includes audio data and partial video data, the receiving end reconstructs the untransmitted video frames according to their motion information to obtain reconstructed video frames, where the partial video data includes at least some key frames and the motion information of the untransmitted video frames, and then renders and displays the reconstructed video frames, the transmitted key frames, and the audio data.
  • When it is determined in step 104 that the current network is not at the second quality level, the current network quality is at the third quality level; go to step 106: send the encoded audio data to the receiving end; the sending is completed and the process ends.
  • When the network quality is at the third quality level, the video and audio sending strategy corresponding to the third quality level is adopted, that is, only the encoded audio data is sent to the receiving end.
  • The third quality level means that the current network cannot carry the simultaneous transmission of audio data and video data and can only carry the normal transmission of audio data.
  • In the method for sending video and audio data proposed in the embodiments of the present application, the sending end encodes the collected audio data and video data and, according to the real-time detection result of network quality, adopts the corresponding video and audio sending strategy to send the video and audio data.
  • When the network quality is weak, key frames and non-key frames are determined in the encoded video data, and the encoded audio data together with the motion information of the key frames and non-key frames in the encoded video data is sent to the receiving end.
  • The real-time detection result includes at least a first quality level and a second quality level, and the video and audio sending strategy corresponding to the second quality level is to send the encoded audio data together with the motion information of the key frames and non-key frames in the encoded video data to the receiving end.
  • Because the amount of data in the motion information characterizing a video frame is much smaller than that of conventional video frame data, the requirement on network bandwidth can be greatly reduced; the above process therefore solves the problem that, in weak-network or network-switching environments, unsatisfactory transmission of video and audio data leads to frozen, blurred, or interrupted pictures and thus to a poor user experience.
  • The amount of data transmitted during transmission is greatly reduced, ensuring that the data required by the receiving end can still be delivered in different network environments.
  • Another embodiment of the present application relates to a method for displaying video and audio data, applied to a receiving end.
  • The implementation details of the video and audio display method of this embodiment are described in detail below with reference to FIG. 2.
  • The following content gives implementation details only for ease of understanding and is not required for implementing the solution.
  • Application scenarios of the embodiments of the present application may include, but are not limited to, video conferencing, video chat, and intelligent customer service.
  • Step 201: Receive encoded data.
  • The encoded data refers to encoded audio data and/or encoded video data.
  • Step 202: After decoding the received encoded data, judge whether the obtained data includes only audio data; if so, go to step 206; if not, go to step 203.
  • Step 203: Further judge whether the decoded data includes audio data and partial video data. If the decoded data includes audio data and partial video data, go to step 204; if the decoded data includes audio data and complete video data, go to step 208.
  • Step 204: Reconstruct the untransmitted video frames according to their motion information to obtain reconstructed video frames.
  • That is, each untransmitted video frame is reconstructed from its motion information for subsequent rendering and display.
  • Step 205: Render and display according to the reconstructed video frames, the key frames in the decoded data, and the audio data, and end the process.
  • In one example, the decoded data includes the audio data together with the key frames and the motion information of non-key frames of the video data; the receiving end reconstructs the non-key frames according to the motion information of the key frames and non-key frames to obtain reconstructed non-key frames, and then renders and displays the reconstructed non-key frames, the key frames, and the audio data.
  • If it is determined in step 202 that the obtained data includes only audio data, go to step 206: drive a virtual human model to generate dynamic video frames of a virtual human whose actions change with the audio data, and go to step 207.
  • When the decoded data includes only audio data, the receiving end drives the virtual human model with the audio data, thereby generating dynamic video frames of a virtual human whose actions change with the audio data.
  • Step 207: Render and display the above virtual human dynamic video frames and the audio data, and end the process.
  • The above virtual human model is a human model of a different role preset in a database; when driven by the audio data, it generates the virtual human dynamic video frames.
  • If there is no preset human model in the database, dynamic video frames are generated from the previous image frame along with the audio data, which avoids frozen pictures when the network environment deteriorates and improves the user experience.
  • If it is determined in step 203 that the decoded data includes audio data and complete video data, go to step 208: render and display the decoded audio data and video data.
  • The method for displaying video and audio data includes: when the decoded data includes only audio data, using the audio data to drive the virtual human model to generate dynamic video frames of a virtual human whose actions change with the audio data, and rendering and displaying the virtual human dynamic video frames and the audio data; when the decoded data includes audio data and partial video data, first reconstructing the untransmitted video frames and then rendering and displaying according to the reconstructed video frames, the transmitted key frames, and the audio data.
  • The above process enables the receiving end, in scenarios such as a weak network environment or network switching, to ensure on the basis of the decoded data that the displayed video and audio pictures are not frozen, blurred, or directly interrupted, which greatly improves the user experience.
  • Another embodiment of the present application also relates to a sending end, as shown in FIG. 3, including an encoding module 301 and a sending module 302.
  • The encoding module 301 is configured to encode the collected audio data and video data; the sending module 302 is configured to send the video and audio data, according to the real-time detection result of network quality, using the video and audio sending strategy corresponding to the real-time detection result.
  • The real-time detection result includes at least a first quality level and a second quality level, and the network quality of the second quality level is lower than that of the first quality level. The video and audio sending strategy corresponding to the second quality level includes sending the encoded audio data and part of the encoded video data to the receiving end, where the partial video data includes at least some key frames and the motion information of untransmitted video frames.
  • The video and audio sending strategy corresponding to the first quality level includes sending the encoded audio data and the encoded video data to the receiving end.
  • The real-time detection result further includes a third quality level, and the video and audio sending strategy corresponding to the third quality level includes sending only the encoded audio data to the receiving end.
  • The encoding module 301 is further configured to extract the key points and Jacobian matrices of video frames according to a fixed quantization step and to use the extracted key points and Jacobian matrices as the motion information of the video frames.
  • The encoding module 301 is further configured to detect quality indicators of the transmission network in real time before the video and audio sending strategy corresponding to the real-time detection result is adopted, where the quality indicators include one or any combination of the following: packet loss rate, delay, jitter, false alarm rate, round-trip time (RTT), and bandwidth.
  • The encoding module 301 mainly performs functions such as video and audio encoding and motion information extraction, and consists of a video and audio encoding sub-module and a feature extraction sub-module. The video and audio encoding sub-module is the same as the encoding module in a conventional video and audio system and is mainly responsible for encoding and compressing the original video and audio data and outputting media data for network transmission.
  • The feature extraction sub-module is mainly responsible for extracting the motion information that characterizes the key features of the video.
  • First, a reference image frame is selected for transmission, and then the key points and Jacobian matrices of the untransmitted video frames are extracted according to a fixed quantization step; these are used to represent the motion information of the untransmitted video frames. Because the amount of data in the motion information representing the key features of the video is much smaller than that of conventional video frame data, the requirement on network bandwidth can be greatly reduced.
  • The sending end in this embodiment adopts different sending strategies under different network environments: it encodes the collected audio data and video data and, according to the real-time detection result of network quality, adopts the corresponding strategy to send the video and audio data, which greatly reduces the amount of data transmitted during transmission and ensures that the data required by the receiving end can still be delivered in scenarios such as a weak network environment or network switching.
  • Another embodiment of the present application also relates to a receiving end, as shown in FIG. 4, including a receiving module 401, a decoding module 402, and a display module 403.
  • The receiving module 401 is configured to receive encoded data. The decoding module 402 is configured to decode the received encoded data and, when the decoded data includes audio data and partial video data, reconstruct the untransmitted video frames according to their motion information to obtain reconstructed video frames, where the partial video data includes at least some key frames and the motion information of the untransmitted video frames. The display module 403 is configured to render and display the reconstructed video frames, the key frames in the decoded data, and the audio data when the decoded data includes audio data and partial video data.
  • The decoding module 402 is further configured to drive a virtual human model with the audio data when the decoded data includes only audio data, generating dynamic video frames of a virtual human whose actions change with the audio data; the display module 403 is further configured to render and display the virtual human dynamic video frames and the audio data when the decoded data includes only audio data.
  • The decoding module 402 mainly performs functions such as decoding video and audio and reconstructing video frames from the motion information that represents the key features of the video. It can consist of a video and audio decoding sub-module, an audio-driven video sub-module, and a video reconstruction sub-module.
  • The video and audio decoding sub-module is the same as the decoding module in a conventional video and audio system and is mainly responsible for decoding the video and audio media data.
  • The audio-driven video sub-module is responsible for driving the virtual human model with the transmitted audio information, so that the virtual human model generates dynamic video pictures of the virtual human that follow the changes in the audio data.
  • The video reconstruction sub-module is mainly responsible for using the motion information that characterizes the key features of the video to drive the previously transmitted video frames and reconstruct them into moving video pictures, thereby realizing the reconstruction and restoration of the video.
  • When the decoded data includes only audio data, the receiving end uses the audio data to drive the virtual human model to generate dynamic video frames of a virtual human whose actions change with the audio data, and renders and displays the virtual human dynamic video frames and the audio data; when the decoded data includes the audio data and the key frames and motion information of non-key frames of the video data, the non-key frames are first reconstructed, and rendering and display are then performed according to the reconstructed non-key frames, the key frames, and the audio data.
  • The above process enables the receiving end, in scenarios such as a weak network environment or network switching, to ensure on the basis of the decoded data that the displayed video and audio pictures are not frozen, blurred, or directly interrupted, which greatly improves the user experience.
  • This embodiment is an apparatus embodiment corresponding to the above method embodiment and can be implemented in cooperation with the above method embodiment. The relevant technical details and technical effects mentioned in the above embodiment remain valid in this embodiment and are not repeated here to reduce repetition; the relevant technical details mentioned in this embodiment can also be applied in the above embodiment.
  • The modules involved in this embodiment and the previous embodiment are logical modules. A logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. Units that are not closely related to solving the technical problem proposed in the present application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
  • Another embodiment of the present application also relates to an electronic device, as shown in FIG. 5, including: at least one processor 501; and a memory 502 communicatively connected to the at least one processor 501. The memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can perform the above method for sending video and audio data or the above method for displaying video and audio data.
  • The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and the bus connects one or more processors and various circuits of the memory together.
  • The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore are not further described herein.
  • A bus interface provides an interface between the bus and a transceiver.
  • The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other devices over a transmission medium.
  • The data processed by the processor is transmitted on the wireless medium through an antenna; further, the antenna also receives data and transfers the data to the processor.
  • The processor is responsible for managing the bus and general processing, and can also provide functions including timing, peripheral interfacing, voltage regulation, power management, and other control functions, while the memory can be used to store data that the processor uses when performing operations.
  • Another embodiment of the present application also relates to a computer-readable storage medium storing a computer program.
  • When the computer program is executed by a processor, the above method for sending video and audio data or the above method for displaying video and audio data is implemented.
  • The storage medium includes several instructions to cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Environmental & Geological Engineering (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments of the present application relate to the field of data transmission. Disclosed is a video and audio sending method, which is applied to a sending end. The method comprises: encoding collected audio data and video data; and according to a real-time detection result of network quality, sending the video and audio data by using a video and audio sending policy corresponding to the real-time detection result, wherein the real-time detection result at least comprises a first quality level and a second quality level, the network quality of the second quality level is lower than that of the first quality level, and a video and audio sending policy corresponding to the second quality level comprises: sending the encoded audio data and part of the encoded video data to a receiving end, wherein the part of video data comprises at least some key frames and motion information of video frames which have not been transmitted.

Description

Video and Audio Data Sending Method, Display Method, Sending End and Receiving End
Cross-Reference
This application is based on the Chinese patent application with application number 202111165982.6, filed on September 30, 2021, and claims priority to that Chinese patent application, the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of data transmission, and in particular to a method for sending video and audio data, a display method, a sending end, a receiving end, an electronic device, and a storage medium.
Background Art
With the continuous development of video and audio technology and mobile Internet technology, video and audio data transmission systems have been widely used in people's daily lives, and the audience of typical video and audio systems such as video conferencing and video chat keeps growing. At the same time, a problem has emerged: the many weak-network environments in existing networks cannot always meet the demand for simultaneous transmission of video and audio data. For example, in elevators, underground garages, on public transportation, on high-speed trains, in crowded places, and in scenarios such as switching between fifth-generation and fourth-generation mobile communication networks, video and audio pictures often freeze, blur, or are directly interrupted, resulting in a poor user experience.
Summary of the Invention
An embodiment of the present application provides a method for sending video and audio data, applied to a sending end, including: encoding collected audio data and video data; and, according to a real-time detection result of network quality, sending the video and audio data using the video and audio sending strategy corresponding to the real-time detection result. The real-time detection result includes at least a first quality level and a second quality level, and the network quality of the second quality level is lower than that of the first quality level. The video and audio sending strategy corresponding to the second quality level includes sending the encoded audio data and part of the encoded video data to the receiving end, where the partial video data includes at least some key frames and the motion information of untransmitted video frames.
An embodiment of the present application also provides a method for displaying video and audio data, applied to a receiving end, including: receiving encoded data; decoding the received encoded data; when the decoded data includes audio data and partial video data, reconstructing the untransmitted video frames according to their motion information to obtain reconstructed video frames, where the partial video data includes at least some key frames and the motion information of the untransmitted video frames; and rendering and displaying the reconstructed video frames, the key frames in the decoded data, and the audio data.
An embodiment of the present application also provides a sending end, including: an encoding module configured to encode collected audio data and video data; and a sending module configured to send the video and audio data, according to a real-time detection result of network quality, using the video and audio sending strategy corresponding to the real-time detection result. The real-time detection result includes at least a first quality level and a second quality level, and the network quality of the second quality level is lower than that of the first quality level. The video and audio sending strategy corresponding to the second quality level includes sending the encoded audio data and part of the encoded video data to the receiving end, where the partial video data includes at least some key frames and the motion information of untransmitted video frames.
An embodiment of the present application also provides a receiving end, including: a receiving module configured to receive encoded data; a decoding module configured to decode the received encoded data and, when the decoded data includes audio data and partial video data, reconstruct the untransmitted video frames according to their motion information to obtain reconstructed video frames, where the partial video data includes at least some key frames and the motion information of the untransmitted video frames; and a display module configured to render and display the reconstructed video frames, the key frames in the decoded data, and the audio data when the decoded data includes audio data and partial video data.
An embodiment of the present application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above method for sending video and audio data or the above method for displaying video and audio data.
An embodiment of the present application also provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the above method for sending video and audio data or the above method for displaying video and audio data is implemented.
Brief Description of the Drawings
FIG. 1 is a flowchart of a method for sending video and audio data according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for displaying video and audio data according to another embodiment of the present application;
FIG. 3 is a schematic structural diagram of a sending end according to another embodiment of the present application;
FIG. 4 is a schematic structural diagram of a receiving end according to another embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art can understand that many technical details are provided in the embodiments so that readers can better understand the present application; even without these technical details and the various changes and modifications based on the following embodiments, the technical solutions claimed in the present application can still be realized. The division into the following embodiments is for convenience of description and should not constitute any limitation on the specific implementation of the present application; the embodiments can be combined with, and refer to, one another on the premise that they do not contradict each other.
An embodiment of the present application relates to a method for sending video and audio data, applied to a sending end. The method includes: encoding collected audio data and video data; and, according to a real-time detection result of network quality, sending the video and audio data using the video and audio sending strategy corresponding to the real-time detection result. The real-time detection result includes at least a first quality level and a second quality level, and the network quality of the second quality level is lower than that of the first quality level. The video and audio sending strategy corresponding to the second quality level includes sending the encoded audio data together with the motion information of the key frames and non-key frames in the encoded video data to the receiving end.
The implementation details of the video and audio sending method of this embodiment are described in detail below with reference to FIG. 1. The following content gives implementation details only for ease of understanding and is not required for implementing the solution.
Application scenarios of the embodiments of the present application may include, but are not limited to, video conferencing, video chat, and intelligent customer service.
Step 101: The sending end collects video and audio data and encodes them.
Specifically, after the sending end collects the audio data and video data, it encodes them. The collection devices include, but are not limited to, a camera and a microphone.
Step 102: According to the real-time detection result of network quality, judge whether the current network quality is at the first quality level; if so, go to step 103; if not, go to step 104.
Specifically, before sending the collected video and audio data, the sending end needs to judge which quality level the current network quality is at, and then sends the video and audio data using the video and audio sending strategy corresponding to the real-time detection result. When detecting network quality in real time, the referenced information consists of quality indicators such as packet loss rate, delay, jitter, false alarm rate, round-trip time (RTT) and bandwidth, or any combination of these indicators. When the network is at different quality levels, the corresponding sending strategies differ accordingly.
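For illustration only, the following Python sketch shows one way such a quality-level judgment could be made from a few of the listed indicators; the threshold values, class names, and metric fields are assumptions chosen for demonstration and are not specified by the present application.

```python
from dataclasses import dataclass
from enum import Enum


class QualityLevel(Enum):
    FIRST = 1   # network can carry audio plus fully encoded video
    SECOND = 2  # network can carry audio plus partial video (key frames + motion info)
    THIRD = 3   # network can carry audio only


@dataclass
class NetworkMetrics:
    packet_loss: float   # fraction of packets lost, e.g. 0.02 means 2%
    rtt_ms: float        # round-trip time in milliseconds
    jitter_ms: float     # jitter in milliseconds
    bandwidth_kbps: float


def classify_network(m: NetworkMetrics) -> QualityLevel:
    """Map real-time quality indicators to one of the three quality levels.

    The thresholds below are illustrative assumptions, not values taken from
    the patent; a real deployment would tune them per service.
    """
    if m.packet_loss < 0.02 and m.rtt_ms < 100 and m.bandwidth_kbps > 2000:
        return QualityLevel.FIRST
    if m.packet_loss < 0.10 and m.rtt_ms < 400 and m.bandwidth_kbps > 300:
        return QualityLevel.SECOND
    return QualityLevel.THIRD
```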
When the above network quality level is at the first quality level, the current network status is normal and can carry the normal, simultaneous transmission of audio data and video data.
When the current network quality level is at the first quality level, go to step 103: send the encoded audio data and the encoded video data to the receiving end; the sending is completed and the process ends.
Specifically, when the current network quality level is at the first quality level, the video and audio sending strategy corresponding to the first quality level is adopted, namely, the encoded audio data and the encoded video data are sent to the receiving end. The first quality level means that the current network supports the simultaneous transmission of normally encoded audio data and video data, which is the most ideal network state in this embodiment of the application.
Step 104: Judge whether the current network is at the second quality level; if so, go to step 105; if not, go to step 106.
Specifically, when judging which quality level the network is at, the referenced information is the same as in step 102 and is not repeated here. When the current network quality is at the second quality level, the video and audio sending strategy corresponding to the second quality level is adopted. The second quality level means that the current network does not support the simultaneous transmission of audio data and complete video data, but can support the normal transmission of audio data and partial video data.
In one example, after it is determined that the current network is at the second quality level, the key points and Jacobian matrices of the video frames can be extracted according to a fixed quantization step, and the extracted key points and Jacobian matrices are used as the motion information of the video frames. This is because, during data transmission, the amount of motion data used to characterize the key features of a video frame is much smaller than the amount of data in a conventional video frame; transmitting motion information that represents the key features of the video can therefore greatly reduce the demand on network bandwidth.
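As a hedged sketch of what extracting key points and Jacobian matrices with a fixed quantization step could look like, the snippet below leaves the keypoint detector as a hypothetical callable and spells out only the fixed-step quantization of the extracted values; the step value and data shapes are assumptions for illustration.

```python
import numpy as np

QUANT_STEP = 1.0 / 64.0  # fixed quantization step; the value is an assumption


def quantize(values: np.ndarray, step: float = QUANT_STEP) -> np.ndarray:
    """Quantize floating-point motion data onto a fixed grid of spacing `step`."""
    return np.round(np.asarray(values) / step).astype(np.int16)


def extract_motion_info(frame: np.ndarray, keypoint_model) -> dict:
    """Describe one frame by quantized key points and per-keypoint Jacobians.

    `keypoint_model` is a hypothetical detector (for example, a keypoint-based
    animation network) returning K keypoint coordinates of shape (K, 2) and
    K local Jacobians of shape (K, 2, 2).
    """
    keypoints, jacobians = keypoint_model(frame)
    return {
        "keypoints": quantize(keypoints),
        "jacobians": quantize(jacobians),
    }
```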
Step 105: Send the encoded audio data and part of the encoded video data to the receiving end; the sending is completed and the process ends. The partial video data includes at least some key frames and the motion information of untransmitted video frames. The motion information of a video frame is used to reconstruct that video frame, so that the receiving end can render and display according to the reconstructed video frames, the transmitted key frames, and the audio data.
In one example, the sending end may select some key frames as the transmitted video frames; that is, the transmitted video data includes the selected key frames together with the motion information of the unselected key frames and of the non-key frames. For example, the first, fifth, tenth, ..., Nth key frames are selected as the transmitted video frames, and the key frames not transmitted in between, as well as the non-key frames, are all reconstructed at the receiving end from their motion information alone.
In another example, the sending end may also use all key frames as the transmitted video frames; that is, the transmitted video data includes all key frames and the motion information of the non-key frames. The non-key frames are reconstructed at the receiving end according to their motion information. Whether some key frames or all key frames are selected as the transmitted video frames can be decided according to the image quality requirements of the service.
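A minimal sketch of how a sender might assemble this second-level payload is given below: every Nth key frame is kept as an encoded frame and every other frame is replaced by its motion information. The payload layout, the `(frame, is_key_frame)` container, and the helper callables are assumptions made only for illustration.

```python
def build_partial_video_payload(frames, extract_motion_info, encode_frame,
                                keyframe_stride: int = 5):
    """Assemble the video data sent under the second quality level.

    `frames` is an iterable of (frame, is_key_frame) pairs. Every
    `keyframe_stride`-th key frame is sent fully encoded; all other frames
    (skipped key frames and non-key frames) are sent only as motion
    information, to be reconstructed at the receiving end.
    """
    payload = []
    key_frame_index = 0
    for frame, is_key_frame in frames:
        if is_key_frame and key_frame_index % keyframe_stride == 0:
            payload.append({"type": "key_frame", "data": encode_frame(frame)})
        else:
            payload.append({"type": "motion", "data": extract_motion_info(frame)})
        if is_key_frame:
            key_frame_index += 1
    return payload
```

Setting `keyframe_stride` to 1 corresponds to the variant in which all key frames are transmitted and only non-key frames are replaced by motion information.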
The key frame refers to the frame in which a key action of a moving or changing character or object occurs; the number of key frames is not limited in the embodiments of the present application.
In one example, while encoding the audio data, the sending end extracts the motion information of the non-key frames of a game picture: it first selects one reference game frame for transmission and then extracts the key points and Jacobian matrices of the non-reference game frames according to a fixed quantization step, which are used to represent the motion information of these non-reference game frames. In the above process, the amount of data in the motion information representing the key features of the video is much smaller than the amount of data in the true frames of the optimized picture in conventional techniques, thereby reducing the requirements on the network. After decoding, when the obtained data includes audio data and partial video data, the receiving end reconstructs the untransmitted video frames according to their motion information to obtain reconstructed video frames, where the partial video data includes at least some key frames and the motion information of the untransmitted video frames, and then renders and displays the reconstructed video frames, the transmitted key frames, and the audio data.
When it is determined in step 104 that the current network is not at the second quality level, the current network quality is at the third quality level; go to step 106: send the encoded audio data to the receiving end; the sending is completed and the process ends.
When the network quality is at the third quality level, the video and audio sending strategy corresponding to the third quality level is adopted, that is, only the encoded audio data is sent to the receiving end. The third quality level means that the current network cannot carry the simultaneous transmission of audio data and video data and can only carry the normal transmission of audio data.
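Tying the three quality levels together, the following sketch dispatches to the sending strategy matching the detected level; it is illustrative only, the channel and encoder interfaces are hypothetical, and the helpers are passed in as parameters so that the logic stands on its own.

```python
def send_media(level: int, audio_frames, video_frames, channel,
               encode_audio, encode_frame, extract_motion_info,
               build_partial_video_payload):
    """Send audio and video per quality level (1, 2 or 3, as in steps 103/105/106)."""
    # Audio is sent under every quality level.
    channel.send({"audio": encode_audio(audio_frames)})

    if level == 1:
        # First quality level: audio plus fully encoded video.
        channel.send({"video": [encode_frame(f) for f, _ in video_frames]})
    elif level == 2:
        # Second quality level: audio plus key frames and motion information.
        channel.send({"video": build_partial_video_payload(
            video_frames, extract_motion_info, encode_frame)})
    # Third quality level: only the audio payload is sent, so no video branch.
```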
In the method for sending video and audio data proposed in this embodiment of the present application, the sending end encodes the collected audio data and video data and, according to the real-time detection result of network quality, adopts the corresponding video and audio sending strategy to send the video and audio data. When the network quality is weak, key frames and non-key frames are determined in the encoded video data, and the encoded audio data together with the motion information of the key frames and non-key frames in the encoded video data is sent to the receiving end. The real-time detection result includes at least a first quality level and a second quality level, and the video and audio sending strategy corresponding to the second quality level is to send the encoded audio data together with the motion information of the key frames and non-key frames in the encoded video data to the receiving end. Because the amount of data in the motion information characterizing a video frame is much smaller than that of conventional video frame data, the requirement on network bandwidth can be greatly reduced; the above process therefore solves the problem that, in weak-network or network-switching environments, unsatisfactory transmission of video and audio data leads to frozen, blurred, or interrupted pictures and thus to a poor user experience. The amount of data transmitted during transmission is greatly reduced, ensuring that the data required by the receiving end can still be delivered in different network environments.
Another embodiment of the present application relates to a method for displaying video and audio data, applied to a receiving end. The implementation details of the video and audio display method of this embodiment are described in detail below with reference to FIG. 2. The following content gives implementation details only for ease of understanding and is not required for implementing the solution.
Application scenarios of the embodiments of the present application may include, but are not limited to, video conferencing, video chat, and intelligent customer service.
Step 201: Receive encoded data.
Specifically, the encoded data refers to encoded audio data and/or encoded video data.
Step 202: after decoding the received encoded data, determine whether the obtained data includes only audio data; if so, proceed to step 206; otherwise, proceed to step 203.
After the received encoded data is decoded, it is necessary to determine whether the obtained data includes only audio data or also includes video data in addition to the audio data; depending on the result, different processing is applied to the relevant data before it is rendered and displayed.
Step 203: further determine whether the decoded data includes audio data and partial video data. If the decoded data includes audio data and partial video data, proceed to step 204; if it includes audio data and complete video data, proceed to step 208.
Step 204: reconstruct the untransmitted video frames according to the motion information of the untransmitted video frames to obtain reconstructed video frames.
Specifically, if the decoded data includes, in addition to the audio data, motion information from the video data, the video frames are reconstructed according to the motion information of the video frames for subsequent rendering and display.
Step 205: render and display according to the reconstructed video frames, the key frames in the decoded data and the audio data, and end the process.
Specifically, the decoded data includes the audio data as well as the key frames and the motion information of the non-key frames from the video data.
In one example, the decoded data includes the audio data as well as the key frames and the motion information of the non-key frames from the video data. The receiving end reconstructs the non-key frames according to the key frames and the motion information of the non-key frames to obtain reconstructed non-key frames, and then renders and displays the reconstructed non-key frames, the key frames and the audio data.
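The application does not prescribe a particular reconstruction algorithm. The following sketch shows one plausible way a receiving end could warp a transmitted key frame towards an untransmitted frame using per-key-point Jacobians (in the spirit of first-order motion models); the nearest-key-point assignment, the array shapes and the local affine formula are assumptions made here for illustration.

```python
import numpy as np

def reconstruct_frame(key_frame, kp_key, jac_key, kp_cur, jac_cur):
    """Warp a transmitted key frame towards an untransmitted frame.

    key_frame : (h, w, 3) image that was actually transmitted
    kp_*      : (K, 2) key-point coordinates (x, y)
    jac_*     : (K, 2, 2) local Jacobian matrices around each key point
    Returns a crude reconstruction; a real decoder would use a learned
    dense-motion network instead of this nearest-key-point warp.
    """
    h, w, _ = key_frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs, ys], axis=-1).astype(np.float32)            # (h, w, 2)

    # Assign每 pixel to the closest key point of the *current* (untransmitted) frame.
    dists = np.linalg.norm(grid[:, :, None, :] - kp_cur[None, None, :, :], axis=-1)
    nearest = np.argmin(dists, axis=-1)                              # (h, w)

    # Local affine motion: x_key ~ kp_key + J_key @ inv(J_cur) @ (x_cur - kp_cur)
    out = np.zeros_like(key_frame)
    for k in range(kp_cur.shape[0]):
        mask = nearest == k
        A = jac_key[k] @ np.linalg.inv(jac_cur[k])
        src = (grid[mask] - kp_cur[k]) @ A.T + kp_key[k]
        src_x = np.clip(src[:, 0], 0, w - 1).astype(int)
        src_y = np.clip(src[:, 1], 0, h - 1).astype(int)
        out[mask] = key_frame[src_y, src_x]
    return out
```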
If it is determined in step 202 that the obtained data includes only audio data, proceed to step 206: drive the virtual human model to generate dynamic virtual human video frames whose actions change with the audio data, and then proceed to step 207.
When the decoded data includes only audio data, the receiving end drives the virtual human model according to the audio data, thereby generating dynamic virtual human video frames whose actions change with the audio data.
Step 207: render and display the above dynamic virtual human video frames and the audio data, and end the process.
Specifically, the above virtual human model is one of the human models of different roles preset in a database; when driven by the audio data, it generates dynamic virtual human video frames. If there is no preset human model in the database, dynamic video frames are generated from the previous frame image along with the audio data. This avoids a frozen picture when the network environment deteriorates and improves the user experience.
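As a rough illustration of the fallback described in this paragraph, the snippet below shows one way the audio-only branch could be organised. The AvatarModel interface, the role-keyed database and the frame representation are placeholders invented for this sketch; the application does not define how the virtual human model is driven internally.

```python
from typing import Dict, List, Optional

class AvatarModel:
    """Placeholder for a preset virtual-human model (hypothetical interface)."""
    def animate(self, audio_chunk: bytes) -> List[bytes]:
        # A real model would map audio features (e.g. phonemes) to mouth/head motion.
        return [b"avatar-frame"]

def frames_for_audio_only(audio_chunk: bytes,
                          avatar_db: Dict[str, AvatarModel],
                          last_frame: Optional[bytes],
                          role: str = "default") -> List[bytes]:
    """Fallback when only audio was decoded (all names are illustrative).

    Prefer a preset avatar model for the given role; if none exists,
    keep showing the last decoded frame so playback never freezes to black.
    """
    model = avatar_db.get(role)
    if model is not None:
        return model.animate(audio_chunk)
    if last_frame is not None:
        return [last_frame]   # repeat the previous image alongside the audio
    return []
```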
If it is determined in step 203 that the decoded data includes audio data and complete video data, proceed to step 208: render and display the decoded audio data and video data.
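Putting steps 202 to 208 together, the receiver-side decision can be summarised by a small dispatcher. The field names used to probe the decoded payload are hypothetical; the sketch only mirrors the branching logic described above.

```python
from enum import Enum

class Branch(Enum):
    AUDIO_ONLY = "audio only -> drive the avatar model (steps 206-207)"
    PARTIAL_VIDEO = "audio + key frames + motion info -> reconstruct (steps 204-205)"
    FULL_VIDEO = "audio + complete video -> render directly (step 208)"

def classify_decoded(decoded: dict) -> Branch:
    """Mirror the step 202/203 decision on hypothetical payload field names."""
    if decoded.get("video"):                                   # complete video present
        return Branch.FULL_VIDEO
    if decoded.get("motion_info") or decoded.get("key_frames"):  # partial video present
        return Branch.PARTIAL_VIDEO
    return Branch.AUDIO_ONLY
```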
The method for displaying video and audio data provided in this embodiment includes: when the decoded data includes only audio data, driving the virtual human model with the audio data to generate dynamic virtual human video frames whose actions change with the audio data, and rendering and displaying these virtual human video frames together with the audio data; when the decoded data includes audio data and partial video data, first reconstructing the untransmitted video frames and then rendering and displaying according to the reconstructed video frames, the transmitted key frames and the audio data. This process enables the receiving end, even in a weak-network environment or during network switching, to guarantee from the decoded data that the displayed video and audio picture does not stutter, break up or simply cut out, which greatly improves the user experience.
Another embodiment of the present application relates to a sending end which, as shown in FIG. 3, includes an encoding module 301 and a sending module 302.
Specifically, the encoding module 301 is configured to encode the collected audio data and video data; the sending module 302 is configured to send the video and audio data according to the real-time detection result of network quality, using the video and audio sending strategy corresponding to the real-time detection result. The real-time detection result includes at least a first quality level and a second quality level, and the network quality of the second quality level is lower than that of the first quality level. The video and audio sending strategy corresponding to the second quality level includes sending the encoded audio data and encoded partial video data to the receiving end, where the partial video data includes at least some key frames and the motion information of the untransmitted video frames.
In one example, the video and audio sending strategy corresponding to the first quality level includes sending the encoded audio data and the encoded video data to the receiving end.
In one example, the real-time detection result further includes a third quality level, and the video and audio sending strategy corresponding to the third quality level includes sending only the encoded audio data to the receiving end.
In one example, the encoding module 301 is further configured to extract the key points and Jacobian matrices of the video frames according to a fixed quantization step, and to use the extracted key points and Jacobian matrices as the motion information of the video frames.
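The application names the fixed quantization step but not the extraction network that produces the key points and Jacobian matrices. Assuming such a network already exists, a minimal sketch of the fixed-step quantization of the resulting motion information might look as follows; the step value, array shapes and int16 packing are choices made only for this illustration.

```python
import numpy as np

def quantize_motion(key_points: np.ndarray,
                    jacobians: np.ndarray,
                    step: float = 1.0 / 512) -> bytes:
    """Quantize key points (K, 2) and Jacobians (K, 2, 2) with a fixed step.

    Only the fixed-step uniform quantizer is shown here; per frame this yields
    roughly 12*K bytes, far less than an encoded video frame.
    """
    flat = np.concatenate([key_points.ravel(), jacobians.ravel()])
    q = np.round(flat / step).astype(np.int16)   # fixed quantization step
    return q.tobytes()

def dequantize_motion(payload: bytes, num_kp: int, step: float = 1.0 / 512):
    """Inverse operation used by the receiving end before reconstruction."""
    q = np.frombuffer(payload, dtype=np.int16).astype(np.float32) * step
    key_points = q[: num_kp * 2].reshape(num_kp, 2)
    jacobians = q[num_kp * 2 :].reshape(num_kp, 2, 2)
    return key_points, jacobians
```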
In one example, the encoding module 301 is further configured to detect the quality indicators of the transmission network in real time before the video and audio sending strategy corresponding to the real-time detection result is adopted, where the quality indicators include one of the following or any combination thereof: packet loss rate, delay, jitter, false report rate, round-trip time (RTT) and bandwidth.
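To make the relationship between the listed indicators and the quality levels concrete, here is an illustrative classifier. The thresholds are invented for this sketch; the application does not specify how the indicators are combined into a level.

```python
from dataclasses import dataclass

@dataclass
class NetworkStats:
    loss_rate: float        # packet loss rate, 0..1
    rtt_ms: float           # round-trip time in milliseconds
    jitter_ms: float
    bandwidth_kbps: float

def classify_quality(stats: NetworkStats) -> int:
    """Map measured indicators to the three quality levels (thresholds are illustrative)."""
    if stats.loss_rate < 0.02 and stats.bandwidth_kbps > 1500 and stats.rtt_ms < 150:
        return 1   # first level: full audio + video
    if stats.loss_rate < 0.10 and stats.bandwidth_kbps > 200:
        return 2   # second level: audio + key frames + motion information
    return 3       # third level: audio only
```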
In one example, the encoding module 301 mainly performs video and audio encoding, motion information extraction and related functions, and consists of a video and audio encoding sub-module and a feature extraction sub-module. The video and audio encoding sub-module is the same as the encoding module in a traditional video and audio system; it is mainly responsible for encoding and compressing the raw video and audio data and outputting media data suitable for network transmission. The feature extraction sub-module is mainly responsible for extracting the motion information that characterizes the key features of the video: it first selects one reference frame for transmission and then extracts, with a fixed quantization step, the key points and Jacobian matrices of the untransmitted video frames, which are used to represent the motion information of those frames. Because the amount of data in the motion information that characterizes the key features of the video is far smaller than that of conventional video frame data, the requirement on network bandwidth can be greatly reduced.
The sending end in this embodiment applies different sending strategies in different network environments: it encodes the collected audio data and video data and, according to the real-time detection result of network quality, adopts the corresponding strategy to send the video and audio data. This greatly reduces the amount of data transmitted and ensures that the data needed by the receiving end can still be delivered in scenarios such as a weak-network environment or network switching.
Another embodiment of the present application relates to a receiving end which, as shown in FIG. 4, includes a receiving module 401, a decoding module 402 and a display module 403.
Specifically, the receiving module 401 is configured to receive encoded data; the decoding module 402 is configured to decode the received encoded data and, when the decoded data includes audio data and partial video data, reconstruct the untransmitted video frames according to the motion information of the untransmitted video frames to obtain reconstructed video frames, where the partial video data includes at least some key frames and the motion information of the untransmitted video frames; the display module 403 is configured to render and display the reconstructed video frames, the key frames in the decoded data and the audio data when the decoded data includes audio data and partial video data.
In one example, the decoding module 402 is further configured to, when the decoded data includes only audio data, drive the virtual human model according to the audio data to generate dynamic virtual human video frames whose actions change with the audio data; the display module 403 is further configured to render and display the dynamic virtual human video frames and the audio data in that case.
In one example, the decoding module 402 mainly performs video and audio decoding and the reconstruction of video frames from the motion information that characterizes the key features of the video. It may consist of a video and audio decoding sub-module, an audio-driven video sub-module and a video reconstruction sub-module. The video and audio decoding sub-module is the same as the decoding module in a traditional video and audio system and is mainly responsible for decoding the video and audio media data. The audio-driven video sub-module is responsible for driving the virtual human model with the received audio information so that the model produces dynamic virtual human video images that follow the changes in the audio data. The video reconstruction sub-module is mainly responsible for using the motion information that characterizes the key features of the video to drive the previously transmitted video frames and reconstruct them into a moving video picture, thereby restoring the video.
When the decoded data includes only audio data, the receiving end in this embodiment drives the virtual human model with the audio data to generate dynamic virtual human video frames whose actions change with the audio data, and renders and displays these video frames together with the audio data. When the decoded data includes the audio data as well as the key frames and the motion information of the non-key frames from the video data, it first reconstructs the non-key frames and then renders and displays according to the reconstructed non-key frames, the key frames and the audio data. This process enables the receiving end, even in a weak-network environment or during network switching, to guarantee from the decoded data that the displayed video and audio picture does not stutter, break up or simply cut out, which greatly improves the user experience.
It is easy to see that this embodiment is an apparatus embodiment corresponding to the above method embodiment, and that this embodiment can be implemented in cooperation with the above method embodiment. The relevant technical details and technical effects mentioned in the above embodiment remain valid in this embodiment and are not repeated here to avoid repetition. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied in the above embodiment.
It is worth mentioning that the modules involved in this embodiment and the previous embodiment are all logical modules. In practical applications, a logical unit may be a physical unit, part of a physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present application, units that are not closely related to solving the technical problem proposed by the present application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
Another embodiment of the present application relates to an electronic device which, as shown in FIG. 5, includes at least one processor 501 and a memory 502 communicatively connected to the at least one processor 501. The memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can perform the above method for sending video and audio data or the above method for displaying video and audio data.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges and links the various circuits of the one or more processors and the memory. The bus may also link various other circuits such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium via an antenna; the antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory may be used to store data used by the processor when performing operations.
Another embodiment of the present application relates to a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the above method for sending video and audio data or the above method for displaying video and audio data is implemented.
That is, those skilled in the art can understand that all or some of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program. The program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Those of ordinary skill in the art can understand that the above embodiments are specific examples for implementing the present application, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of the present application.

Claims (13)

  1. A method for sending video and audio data, applied to a sending end, comprising:
    encoding collected audio data and video data;
    sending video and audio data according to a real-time detection result of network quality, using a video and audio sending strategy corresponding to the real-time detection result;
    wherein the real-time detection result comprises at least a first quality level and a second quality level, and the network quality of the second quality level is lower than that of the first quality level;
    the video and audio sending strategy corresponding to the second quality level comprises: sending encoded audio data and encoded partial video data to the receiving end, the partial video data comprising at least some key frames and motion information of untransmitted video frames.
  2. The method for sending video and audio data according to claim 1, wherein the real-time detection result further comprises a third quality level, and the network quality of the third quality level is lower than that of the second quality level;
    the video and audio sending strategy corresponding to the third quality level comprises: sending only the encoded audio data to the receiving end.
  3. The method for sending video and audio data according to claim 1 or 2, wherein before the sending video and audio data according to a real-time detection result of network quality, using a video and audio sending strategy corresponding to the real-time detection result, the method further comprises:
    extracting key points and Jacobian matrices of the video frames according to a fixed quantization step, and using the extracted key points and Jacobian matrices as the motion information of the video frames.
  4. The method for sending video and audio data according to any one of claims 1 to 3, wherein
    the video and audio sending strategy corresponding to the first quality level comprises: sending the encoded audio data and the encoded video data to the receiving end.
  5. The method for sending video and audio data according to any one of claims 1 to 4, wherein before the sending video and audio data according to a real-time detection result of network quality, using a video and audio sending strategy corresponding to the real-time detection result, the method further comprises:
    detecting quality indicators of the transmission network in real time, wherein the quality indicators comprise one of the following or any combination thereof:
    packet loss rate, delay, jitter, false report rate, round-trip time (RTT), and bandwidth.
  6. A method for displaying video and audio data, applied to a receiving end, comprising:
    receiving encoded data;
    decoding the received encoded data;
    in a case where the decoded data comprises audio data and partial video data, reconstructing untransmitted video frames according to motion information of the untransmitted video frames to obtain reconstructed video frames, wherein the partial video data comprises at least some key frames and the motion information of the untransmitted video frames;
    rendering and displaying the reconstructed video frames, key frames in the decoded data, and the audio data.
  7. The method for displaying video and audio data according to claim 6, wherein after the decoding the received encoded data, the method further comprises:
    in a case where the decoded data comprises only audio data, driving a virtual human model according to the audio data to generate dynamic virtual human video frames whose actions change with the audio data;
    rendering and displaying the dynamic virtual human video frames and the audio data.
  8. A sending end, comprising:
    an encoding module, configured to encode collected audio data and video data;
    a sending module, configured to send video and audio data according to a real-time detection result of network quality, using a video and audio sending strategy corresponding to the real-time detection result;
    wherein the real-time detection result comprises at least a first quality level and a second quality level, and the network quality of the second quality level is lower than that of the first quality level;
    the video and audio sending strategy corresponding to the second quality level comprises: sending encoded audio data and encoded partial video data to the receiving end, the partial video data comprising at least some key frames and motion information of untransmitted video frames.
  9. The sending end according to claim 8, wherein the real-time detection result further comprises a third quality level, and the network quality of the third quality level is lower than that of the second quality level;
    the video and audio sending strategy corresponding to the third quality level comprises: sending only the encoded audio data to the receiving end.
  10. A receiving end, comprising:
    a receiving module, configured to receive encoded data;
    a decoding module, configured to decode the received encoded data and, in a case where the decoded data comprises audio data and partial video data, reconstruct untransmitted video frames according to motion information of the untransmitted video frames to obtain reconstructed video frames, wherein the partial video data comprises at least some key frames and the motion information of the untransmitted video frames;
    a display module, configured to render and display the reconstructed video frames, key frames in the decoded data, and the audio data in the case where the decoded data comprises audio data and partial video data.
  11. The receiving end according to claim 10, wherein
    the decoding module is further configured to, in a case where the decoded data comprises only audio data, drive a virtual human model according to the audio data to generate dynamic virtual human video frames whose actions change with the audio data;
    the display module is further configured to render and display the dynamic virtual human video frames and the audio data in the case where the decoded data comprises only audio data.
  12. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method for sending video and audio data according to any one of claims 1 to 5, or to perform the method for displaying video and audio data according to claim 6 or 7.
  13. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the method for sending video and audio data according to any one of claims 1 to 5 is implemented, or the method for displaying video and audio data according to claim 6 or 7 is implemented.
PCT/CN2022/100589 2021-09-30 2022-06-22 Video and audio data sending method, display method, sending end and receiving end WO2023050921A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111165982.6A CN115914653A (en) 2021-09-30 2021-09-30 Video and audio data sending method, display method, sending end and receiving end
CN202111165982.6 2021-09-30

Publications (1)

Publication Number Publication Date
WO2023050921A1 (en) 2023-04-06

Family

ID=85733973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100589 WO2023050921A1 (en) 2021-09-30 2022-06-22 Video and audio data sending method, display method, sending end and receiving end

Country Status (2)

Country Link
CN (1) CN115914653A (en)
WO (1) WO2023050921A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024160031A1 (en) * 2023-01-31 2024-08-08 华为技术有限公司 Digital human communication method and apparatus

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117278709B (en) * 2023-09-27 2025-09-12 北京京东电解智科技有限公司 Audio and video call adjustment method, adjustment device, AR device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404769A (en) * 2008-09-26 2009-04-08 北大方正集团有限公司 Video encoding/decoding method, apparatus and system
CN105847182A (en) * 2016-04-18 2016-08-10 武汉烽火众智数字技术有限责任公司 Method and system thereof for preferentially transmitting audio in audio and video system
US20180160077A1 (en) * 2016-04-08 2018-06-07 Maxx Media Group, LLC System, Method and Software for Producing Virtual Three Dimensional Avatars that Actively Respond to Audio Signals While Appearing to Project Forward of or Above an Electronic Display
CN110177308A (en) * 2019-04-15 2019-08-27 广州虎牙信息科技有限公司 Mobile terminal and its audio-video frame losing method in record screen, computer storage medium
CN110225347A (en) * 2019-06-24 2019-09-10 北京大米科技有限公司 Method of transmitting video data, device, electronic equipment and storage medium
CN110248256A (en) * 2019-06-25 2019-09-17 腾讯科技(深圳)有限公司 Processing method and processing device, storage medium and the electronic device of data
CN113192162A (en) * 2021-04-22 2021-07-30 清华珠三角研究院 Method, system, device and storage medium for driving image by voice
CN113299312A (en) * 2021-05-21 2021-08-24 北京市商汤科技开发有限公司 Image generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115914653A (en) 2023-04-04

Legal Events

121  Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22874318; Country of ref document: EP; Kind code of ref document: A1)
NENP  Non-entry into the national phase (Ref country code: DE)
122  Ep: pct application non-entry in european phase (Ref document number: 22874318; Country of ref document: EP; Kind code of ref document: A1)