
CN121309819A - Video encoding methods, apparatus, devices, and readable storage media - Google Patents

Video encoding methods, apparatus, devices, and readable storage media

Info

Publication number
CN121309819A
CN121309819A
Authority
CN
China
Prior art keywords
video
video frame
value
frame
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410913458.XA
Other languages
Chinese (zh)
Inventor
李志成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202410913458.XA priority Critical patent/CN121309819A/en
Priority to PCT/CN2025/095634 priority patent/WO2026011968A1/en
Publication of CN121309819A publication Critical patent/CN121309819A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 ... using adaptive coding
    • H04N19/102 ... characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/114 Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H04N19/134 ... characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/149 ... by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N19/169 ... characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 ... the unit being an image region, e.g. an object
    • H04N19/172 ... the region being a picture, frame or field
    • H04N19/182 ... the unit being a pixel
    • H04N19/186 ... the unit being a colour or a chrominance component

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract


This application discloses a video encoding method, apparatus, device, and readable storage medium. The method involves: acquiring the current encoding bitrate of a video encoder; acquiring a sequence of frames to be encoded from a target video source, and encoding the initial video frame in the sequence at that bitrate using the video encoder; determining the first image texture value of the first video frame currently prepared for encoding in the sequence, and determining the second image texture value of the second video frame immediately preceding the first video frame, where the first video frame is any video frame after the initial video frame; determining an image texture ratio from the first and second image texture values when the first image texture value is greater than the second image texture value; adjusting the encoding bitrate of the video encoder according to the image texture ratio to obtain a target encoding bitrate; and encoding the first video frame at the target encoding bitrate. This frame-level adjustment of the encoder's bitrate preserves the picture quality of the encoded video frames.

Description

Video encoding method, apparatus, device and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video encoding method, apparatus, device, and readable storage medium.
Background
The continued development of streaming media technology has driven streaming services such as online live broadcast, video download and playback, remote education, and telemedicine, enriching people's lives. For video-related streaming services, video encoding is required during transmission in order to carry more video traffic over limited network bandwidth. During video encoding, the bitrate of the video source must be determined first, so the encoding rate of the video encoder is set according to the video source rate before encoding.
To determine the video source rate, the related art measures it once per fixed-length interval and synchronously adjusts the coding rate of the video encoder according to the measured source rate.
In researching and practicing the related art, the inventor of this application found that because the source rate is measured over a fixed interval, the measured value is relatively static. After the encoder's coding rate is synchronously adjusted to it, the encoder codes at that fixed rate for the whole interval. Given the diversity of video content, this can degrade encoding quality and reduce the picture quality of the encoded video frames.
Disclosure of Invention
This application provides a video coding method, apparatus, device, and readable storage medium that can flexibly adjust the coding rate of a video encoder at the frame level, preserving the picture quality of video frames after encoding.
In order to solve the technical problems, the application provides the following technical scheme:
the embodiment of the application provides a video coding method, which comprises the following steps:
Acquiring a current coding rate of a video coder;
acquiring a frame sequence to be encoded of a target video source, and encoding, by the video encoder, the initial video frame in the frame sequence to be encoded according to the coding rate;
determining a first image texture value of a first video frame currently prepared for encoding in the frame sequence to be encoded, and determining a second image texture value of a second video frame immediately preceding the first video frame, wherein the first video frame is any video frame after the initial video frame;
determining an image texture ratio according to the first image texture value and the second image texture value when the first image texture value is greater than the second image texture value;
And adjusting the coding code rate of the video coder according to the image texture ratio to obtain a target coding code rate, and coding the first video frame according to the target coding code rate.
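The method steps above can be sketched as a minimal frame-level rate-adjustment rule. This is an illustrative sketch, not the claimed embodiment; the function and variable names are hypothetical, and the texture values are assumed to be precomputed scalars:

```python
def frame_texture_ratio_bitrate(current_bitrate, texture_prev, texture_curr):
    """Core of the method: when the current frame's image texture value
    exceeds the previous frame's, scale the coding rate by their ratio."""
    if texture_prev <= 0 or texture_curr <= texture_prev:
        return current_bitrate            # no adjustment needed
    ratio = texture_curr / texture_prev   # image texture ratio
    return current_bitrate * ratio        # target coding rate

# Texture rises 20% between frames, so the coding rate follows.
assert frame_texture_ratio_bitrate(2000, 50.0, 60.0) == 2400
# Texture falls, so the rate is left unchanged.
assert frame_texture_ratio_bitrate(2000, 60.0, 50.0) == 2000
```

In practice the result would still be clamped to the encoder's configured upper rate limit, as the embodiments describe later.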
Accordingly, an embodiment of the present application provides a video encoding apparatus, including:
the acquisition unit is used for acquiring the current coding rate of the video coder;
the encoding unit is used for acquiring a frame sequence to be encoded of a target video source and encoding, by the video encoder, the initial video frame in the frame sequence to be encoded according to the coding rate;
a first determining unit, configured to determine a first image texture value of a first video frame currently prepared for encoding in the frame sequence to be encoded, and determine a second image texture value of a second video frame immediately preceding the first video frame, wherein the first video frame is any video frame after the initial video frame;
a second determining unit configured to determine an image texture ratio according to the first image texture value and the second image texture value when the first image texture value is greater than the second image texture value;
and the adjusting unit is used for adjusting the coding rate of the video coder according to the image texture ratio to obtain a target coding rate and coding the first video frame according to the target coding rate.
In some embodiments, the video encoding apparatus further includes a third determining unit configured to:
determining a similarity between the first video frame and the second video frame;
the first determining unit is further configured to:
And when the similarity is smaller than a preset similarity threshold, determining a first image texture value of a first video frame currently prepared for encoding in the frame sequence to be encoded.
In some embodiments, the similarity includes a color similarity, and the third determining unit is further configured to:
Acquiring a first pixel value of each first pixel point in a first video frame currently prepared for encoding in the frame sequence to be encoded, and generating a first histogram for the first video frame according to the statistical number of different first pixel values;
acquiring a second pixel value of each second pixel point in the second video frame, and generating a second histogram for the second video frame according to the statistical number of different second pixel values;
A color similarity between the first video frame and the second video frame is determined based on a difference between the first histogram and the second histogram.
In some embodiments, the third determining unit is further configured to:
Determining a first statistics of the first video frame in each pixel value dimension from the first histogram and a second statistics of the second video frame in each pixel value dimension from the second histogram;
Determining a pixel number difference value of the first video frame and the second video frame in each pixel value dimension according to the first statistics and the second statistics in each pixel value dimension;
The color similarity between the first video frame and the second video frame is determined based on the difference in the number of pixels in each pixel value dimension.
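The histogram comparison described above can be realized as follows. This sketch assumes 8-bit pixel values and a particular normalization of the per-bin count differences; neither choice is prescribed by the application:

```python
def histogram(pixels, bins=256):
    """Count how many pixels fall into each pixel-value dimension."""
    h = [0] * bins
    for p in pixels:
        h[p] += 1
    return h

def color_similarity(pixels_a, pixels_b, bins=256):
    """Similarity in [0, 1] from the pixel-count difference in each
    pixel-value dimension of the two frames' histograms."""
    ha, hb = histogram(pixels_a, bins), histogram(pixels_b, bins)
    diff = sum(abs(a - b) for a, b in zip(ha, hb))
    return 1.0 - diff / (len(pixels_a) + len(pixels_b))

# Identical frames are fully similar; disjoint pixel values are not.
assert color_similarity([0, 0, 255], [0, 0, 255]) == 1.0
assert color_similarity([0, 0, 0], [255, 255, 255]) == 0.0
```

When the similarity exceeds the preset threshold, the texture computation for the first video frame can be skipped entirely, saving compute on near-identical frames.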
In some embodiments, the second determining unit is further configured to:
Acquiring an image texture difference value between a first image texture value and the second image texture value;
and when the image texture difference value is larger than a preset texture difference threshold value, calculating an image texture ratio of the first image texture value and the second image texture value.
In some embodiments, the first determining unit is further configured to:
Acquiring a gray value of each first pixel point in a first video frame currently prepared for encoding in the frame sequence to be encoded;
And determining a first image texture value of the first video frame according to the gray value of each first pixel point.
In some embodiments, the first determining unit is further configured to:
obtaining a target local texture window;
dividing a first pixel point in the first video frame according to the target local texture window to obtain a plurality of divided target local areas;
Determining a first local texture value corresponding to each target local region in the first video frame based on the gray value of a first pixel point contained in each target local region;
A first image texture value for the first video frame is determined based on the first local texture value for each target local region.
In some embodiments, the first determining unit is further configured to:
Determining a central first pixel point in each target local area and edge first pixel points around the central first pixel point;
Comparing the gray value of each edge first pixel point with the gray value of the center first pixel point according to each target local area to obtain a plurality of comparison results of each target local area;
And respectively carrying out binarization processing on a plurality of comparison results of each target local area to obtain a first local texture value corresponding to each target local area.
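The compare-and-binarize step above is essentially the classic local binary pattern (LBP) operator. A minimal 3×3 sketch (hypothetical names; the bit ordering is one common convention, not mandated by the application):

```python
def local_texture_value(region):
    """LBP-style local texture value for a 3x3 gray-value region: compare
    the 8 edge pixels against the center pixel, binarize each comparison
    result, and pack the bits into one value."""
    center = region[1][1]
    # The 8 edge pixels, taken clockwise around the center.
    edges = [region[0][0], region[0][1], region[0][2], region[1][2],
             region[2][2], region[2][1], region[2][0], region[1][0]]
    value = 0
    for bit, gray in enumerate(edges):
        if gray >= center:          # comparison result ...
            value |= 1 << bit       # ... binarized into one bit
    return value

# A flat region: every edge pixel >= center, so all 8 bits are set.
assert local_texture_value([[5, 5, 5], [5, 5, 5], [5, 5, 5]]) == 255
# A bright center suppresses every bit.
assert local_texture_value([[1, 1, 1], [1, 9, 1], [1, 1, 1]]) == 0
```

The first image texture value of the frame would then be aggregated from the local texture values of all target local regions, for example by averaging.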
In some embodiments, the first determining unit is further configured to:
Acquiring a candidate local texture window, and determining a candidate window range value of the candidate local texture window;
Acquiring a target available resource amount of an image computing resource, and inquiring a window range list according to the target available resource amount to obtain a target window range value, wherein the window range list comprises association relations between different available resource amounts and window range values;
When the candidate window range value is smaller than the target window range value, the candidate local texture window is adjusted according to the target window range value, and a target local texture window is obtained;
And when the candidate window range value is greater than or equal to the target window range value, determining the candidate local texture window as a target local texture window.
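The resource-driven window selection above amounts to a table lookup plus one comparison. In this sketch the contents of the window range list are invented for illustration; the application only specifies that the list associates available resource amounts with window range values:

```python
# Hypothetical association list: more free compute allows a larger window.
WINDOW_RANGE_LIST = [
    (0.25, 3),   # up to 25% of resources free -> 3x3 window
    (0.50, 5),
    (1.00, 7),
]

def select_texture_window(candidate_range, available_resources):
    """Query the window range list with the target available resource
    amount, then enlarge a too-small candidate window to the target
    range, or keep an already large enough candidate as-is."""
    target_range = next(r for cap, r in WINDOW_RANGE_LIST
                        if available_resources <= cap)
    return max(candidate_range, target_range)

assert select_texture_window(3, 0.6) == 7   # candidate enlarged to target
assert select_texture_window(9, 0.6) == 9   # candidate kept unchanged
```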
In some embodiments, the video encoding apparatus further comprises a processing unit for:
Acquiring image resolution information corresponding to a first video frame currently prepared for encoding in the frame sequence to be encoded;
performing downsampling processing on the first video frame based on the image resolution information to obtain a downsampled first video frame;
the first determining unit is further configured to obtain a gray value of each first pixel point in the first video frame after the downsampling process.
In some embodiments, the adjusting unit is further configured to:
Acquiring an upper limit value of a coding rate for the video encoder;
Weighting the coding code rate according to the image texture ratio to obtain a candidate coding code rate;
when the upper limit value of the coding rate is larger than the candidate coding rate, the coding rate of the video coder is adjusted according to the candidate coding rate, so that a target coding rate is obtained;
And when the upper limit value of the coding rate is smaller than the candidate coding rate, adjusting the coding rate of the video coder according to the upper limit value of the coding rate to obtain a target coding rate.
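The cap-and-weight logic above, as a standalone sketch (hypothetical names):

```python
def capped_target_bitrate(coding_rate, texture_ratio, rate_upper_limit):
    """Weight the current coding rate by the image texture ratio, then
    clamp the candidate to the encoder's configured upper limit."""
    candidate = coding_rate * texture_ratio
    # If the upper limit exceeds the candidate, the candidate becomes
    # the target coding rate; otherwise the upper limit itself does.
    return candidate if rate_upper_limit > candidate else rate_upper_limit

assert capped_target_bitrate(3000, 1.5, 8000) == 4500  # under the cap
assert capped_target_bitrate(3000, 1.5, 4000) == 4000  # cap applies
```

The clamp prevents a sudden texture spike from pushing the encoder past the bandwidth budget of the transmission channel.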
In some embodiments, the adjusting unit is further configured to:
acquiring a third image texture value of a third video frame which is positioned behind the first video frame and is adjacent to the first video frame in the frame sequence to be coded;
Determining a target image texture ratio between the third video frame and the first video frame when the third image texture value is detected to be greater than the first image texture value;
Adjusting the target coding rate of the video encoder according to the target image texture ratio to obtain an adjusted target coding rate, and coding the third video frame according to the adjusted target coding rate;
and when the third image texture value is detected to be smaller than or equal to the first image texture value, encoding the third video frame according to the target encoding code rate.
In some embodiments, the acquisition unit is further configured to:
acquiring a current video source code rate of a target video source;
And synchronously updating the coding rate of the video coder according to the video source code rate.
In addition, the embodiment of the application also provides computer equipment, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the video coding method when executing the computer program.
In addition, the embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the video coding method.
Furthermore, embodiments of the present application provide a computer program product or a computer program comprising computer instructions stored in a storage medium. The processor of the computer device reads the computer instructions from the storage medium and the processor executes the computer instructions so that the video encoding method described above is implemented.
In an embodiment of this application, the current coding rate of a video encoder is acquired; a frame sequence to be encoded of a target video source is acquired, and the video encoder starts encoding the initial video frame in the sequence at that coding rate; a first image texture value is determined for the first video frame currently prepared for encoding in the sequence, and a second image texture value is determined for the second video frame immediately preceding it, the first video frame being any video frame after the initial video frame; when the first image texture value is greater than the second image texture value, an image texture ratio is determined from the two values; and the coding rate of the video encoder is adjusted according to the image texture ratio to obtain a target coding rate, at which the first video frame is encoded.
On this basis, the encoder's current coding rate is obtained in real time, and the video frames in the current frame sequence of the target video source are encoded at that rate. During encoding, the texture complexity of the first video frame currently prepared for encoding is compared in real time with that of the preceding, adjacent second video frame; texture complexity represents the richness of the picture content and is used to estimate the coding rate the frame requires. If the current first video frame has more complex texture than the previous frame, the image texture ratio between them is determined from the first and second image texture values, and the encoder's coding rate is adjusted to a target coding rate according to that ratio, so that the encoder codes the first video frame at the adjusted target rate. Compared with measuring the source rate over fixed intervals, which leaves the encoder coding at a fixed rate within each interval, this scheme adjusts the coding rate frame by frame according to the texture complexity of the frames themselves, so the rate tracks the content and the picture quality of the encoded video frames is preserved.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a video coding system according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of steps of a video encoding method according to an embodiment of the present application;
fig. 3 is an exemplary diagram of an image texture value calculation scene in a video encoding method according to an embodiment of the present application;
Fig. 4 is a flowchart illustrating another step of the video encoding method according to an embodiment of the present application;
fig. 5 is an exemplary diagram of a video encoding scene provided in an embodiment of the present application;
fig. 6 is a diagram illustrating a code rate variation in a code control scenario of video coding according to an embodiment of the present application;
Fig. 7 is a diagram illustrating an example of a video encoding process according to an embodiment of the present application;
FIG. 8 is a diagram illustrating an exemplary calculation of image texture values of a video frame according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a video encoding device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the solution of the present application, a technical solution of an embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiment of the present application, and it is apparent that the described embodiment is only a part of the embodiment of the present application, not all the embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
It should be understood that the specific embodiments of this application involve related data such as the target video source. When these embodiments are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use, and processing of such data must comply with applicable laws, regulations, and standards.
In addition, when an embodiment of this application needs to acquire related data such as the target video source, separate permission or consent is obtained through a popup window or by jumping to a confirmation page; only after such separate permission or consent is explicitly granted is the data necessary for the embodiment to operate normally acquired.
It should be noted that some of the processes described in the specification, claims, and drawings include multiple steps presented in a particular order, but the steps may be performed out of order or in parallel; the step numbers merely distinguish the steps and do not themselves imply any execution order. Likewise, terms such as "first", "second", or "target" distinguish similar objects and do not necessarily describe a particular sequence or chronological order.
Before proceeding to further detailed description of the disclosed embodiments, the terms and terms involved in the disclosed embodiments are described, which are applicable to the following explanation:
IDR frame: in video coding standards (H.264/H.265/H.266/AV1, etc.), images (i.e., video frames) are organized in units of sequences. The first picture of a sequence is an Instantaneous Decoding Refresh (IDR) frame, which belongs to the intra-picture (I) frames and can also be understood as a key frame.
I frame: intra picture, i.e., key frame. An IDR frame causes the decoded picture buffer (Decoded Picture Buffer, DPB) to empty its reference frame list, while an ordinary I frame does not. An IDR picture is necessarily an I picture, but an I picture is not necessarily an IDR picture. A sequence may contain many I pictures, and frames following an I picture may use frames between two I pictures as motion references.
Encoder rate control: allocating bits to the frame to be encoded based on the encoder's available information, the number of bits actually produced after encoding, and the buffer status, and adjusting the quantization parameter (Quantization Parameter, QP) accordingly, so that reasonable bit allocation yields the best possible video quality under a limited target bitrate.
Code rate/stream refers to the data flow rate used by a video file per unit time, also known as code rate or code flow rate; it is colloquially understood as the sampling rate and is the most important part of picture-quality control in video coding, typically expressed in kilobits per second (kbps) or megabits per second (Mbps). At the same resolution, the larger the code stream of the video file, the smaller the compression ratio and the higher the picture quality. The larger the code stream, the higher the sampling rate per unit time and the higher the precision of the data stream; the encoded file is then closer to the original file, the image quality is better and clearer, and the higher the decoding capability required of the playback device.
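The relationship between code rate, duration, and encoded size described above can be checked with simple arithmetic; the figures below are illustrative examples, not values from this application.

```python
def encoded_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate payload size of an encoded clip: (bits per second * seconds) / 8."""
    return bitrate_kbps * 1000 * duration_s / 8

# A 60-second clip at 4000 kbps occupies about 30 MB before container overhead.
print(encoded_size_bytes(4000, 60) / 1_000_000)  # 30.0
```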
CQP, constant quantization parameter (Constant Quantization Parameter).
QoS is quality of service (Quality of Service), which is mainly responsible for service management and service differentiation from the network perspective; network entities process different services according to different quality requirements. Common monitoring indexes include the stalling rate, stalling duration, first-frame time, etc.
QoE, namely quality of experience (Quality of Experience): the evaluation subject is the end user, and the evaluation object is the service and the network supporting the service. Taking the video field as an example, evaluation indexes include, but are not limited to, average user viewing duration, picture-quality clarity, picture delay, stalling, and the like; all of these indexes affect the viewing experience (QoE) of the audience end and may in turn affect advertisement conversion and the like.
GOP, group of pictures, specifically refers to the interval between two I frames, such as an interval duration or interval length. The Instantaneous Decoding Refresh (IDR) frame is the first I-frame in the GOP, at which the video stream restarts a new sequence of encoding; its role is to let the decoder refresh immediately so that prediction errors do not propagate.
The embodiment of the application provides a video coding method, a video coding device, video coding equipment and a readable storage medium. Specifically, the embodiments of the present application will be described in terms of dimensions of a video encoding apparatus, where the video encoding apparatus may be specifically integrated in a computer device, and the computer device may be a server, or may be a device such as an object terminal. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms. The object terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart home appliance, a vehicle-mounted terminal, a smart voice interaction device, an aircraft, and the like.
The video source code rate is counted according to the intervals between key frames in a video frame sequence, the code rate is updated to the video encoder, on the basis, the code rate of the video encoder is adjusted according to the frame level by comparing the texture complexity between the front frame and the rear frame, and the video frame is encoded according to the adjusted code rate. The following examples are provided to illustrate the application:
The related technology calculates the video source code rate over a fixed duration, or determines it from code rate information carried in the original video, so as to update the coding code rate of the video encoder, and the video encoder then encodes the video frames in a sequence of the video source according to that coding code rate. However, when the video source code rate is counted over a fixed duration and the encoder code rate is updated accordingly, the code rate of the video encoder is fixed within a certain period. If the video encoder encodes the video frames in a sequence according to this fixed code rate, video frames with rich image content and complex textures may have poor image quality after encoding, which reduces the picture quality of the video frames and affects viewing of the video.
In order to cope with the above-mentioned problems, the embodiment of the present application proposes a video coding method, which can compare the texture complexity between the front and rear video frames in the coding process of the frame sequence to be coded of the target video source, and when the texture of the rear video frame is relatively complex to that of the front video frame, adjust the coding rate of the video coder according to the texture complexity ratio of the front and rear video frames, so as to flexibly and finely adjust the coding rate of the video coder according to the frame level, thereby ensuring the coding quality of the video coder to the rear video frame and ensuring the image quality of the video frame after video coding. Please refer to the following examples.
The embodiment of the application provides a video coding system, equipment in the scene system can comprise a server and/or a terminal, and the terminal can request the server to execute the video coding method.
The system comprises a server or a terminal. The terminal or the server may acquire the current coding code rate of a video encoder, acquire a frame sequence to be encoded of a target video source, and begin encoding, through the video encoder, the leading video frame of the frame sequence to be encoded according to the coding code rate. It then determines a first image texture value of a first video frame currently ready for encoding in the frame sequence to be encoded, and determines a second image texture value of a second video frame adjacent to the first video frame, where the first video frame is any video frame after the leading video frame. When the first image texture value is greater than the second image texture value, an image texture ratio is determined according to the first image texture value and the second image texture value, the coding code rate of the video encoder is adjusted according to the image texture ratio to obtain a target coding code rate, and the first video frame is encoded according to the target coding code rate.
For another example, referring to fig. 1, a schematic view of a video coding system according to an embodiment of the present application is provided, where the system includes a server and a terminal, and a client is installed on the terminal, and real-time streaming data of a video, such as object attribute data of each service area, query object attribute data, and so on, can be sent or obtained to a server background through the client.
In addition, the terminal can be provided with a video client, and the video client can send or acquire the real-time streaming data of the video to the server.
When the server executes the video encoding method, it obtains the current coding code rate of the video encoder, acquires a frame sequence to be encoded of a target video source, and begins encoding, through the video encoder, the leading video frame of the frame sequence to be encoded according to the coding code rate. It determines a first image texture value of a first video frame currently ready for encoding in the frame sequence to be encoded, and a second image texture value of a second video frame adjacent to the first video frame, where the first video frame is any video frame after the leading video frame. When the first image texture value is greater than the second image texture value, an image texture ratio is determined according to the first image texture value and the second image texture value, the coding code rate of the video encoder is adjusted according to the image texture ratio to obtain a target coding code rate, and the first video frame is encoded according to the target coding code rate.
It should be noted that the video encoding system may be applicable to live video or video on demand scenes. In the video encoding process of live video broadcast, the terminal may include a main broadcasting end and an audience end, where the main broadcasting end sends video source data to the server in real time, carries out encoding processing on video frames of the video source through the server, and sends the video frame stream data after the encoding processing to the audience end, so that the audience end decodes and plays the video frame stream data. In the video encoding process of video on demand, the terminal may be understood as an audience terminal or an on demand terminal, the server directly loads video from a pre-stored or other service terminals, takes the pre-stored or loaded video as a video source, encodes video frames according to a video frame stream sequence of the video source, and sends the encoded video frame stream data to the terminal, so that the audience terminal decodes and plays the video.
Taking live video as an example, assume a live game broadcast is playing. The anchor end transmits the stream data of the game picture, as a target video source, to the server in real time in sequence form. The server determines the video source code rate according to the intervals between key frames in the sequence; specifically, it is determined in real time at the granularity of the frame interval (group of pictures, GOP) between two key frames (I frames), where the frame interval can represent one video frame group (the group of frames to be encoded). Concretely, the frame length (i.e., the interval) between two key frames (I frames) is determined first, along with the frame rate of the video source, so that the number of video frame groups per unit time, i.e., the video source code rate, is determined according to the ratio between the frame rate and the frame length. The video source code rate is then dynamically loaded and updated to the video encoding kernel (i.e., the video encoder) according to the current value, so that the video encoder obtains the current base coding code rate.
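The GOP-granularity statistic described above can be sketched as follows. This is a hypothetical illustration (the function name and the idea of measuring received bytes per GOP are assumptions of the sketch, not the application's actual implementation):

```python
def gop_source_bitrate_kbps(gop_bytes: int, gop_frames: int,
                            frame_rate: float) -> float:
    """Source code rate over one key-frame interval (GOP).

    The GOP spans gop_frames / frame_rate seconds, so the code rate is
    the bits received in the GOP divided by that duration.
    """
    gop_duration_s = gop_frames / frame_rate
    return gop_bytes * 8 / gop_duration_s / 1000

# 1.5 MB received over a 60-frame GOP at 30 fps (a 2-second GOP) -> 6000 kbps.
print(gop_source_bitrate_kbps(1_500_000, 60, 30))  # 6000.0
```

Updating the encoder's base rate from a per-GOP measurement like this tracks the source more closely than a fixed 1- or 3-second sampling window, since each update aligns with a key-frame boundary.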
Further, for the video frames in the frame sequence to be encoded, the server begins encoding them according to the coding code rate of the video encoder. During encoding, the complexity of the video frames between two adjacent frames is determined; for example, complexity can be represented by image texture. The image texture value of the current frame is determined first, along with the image texture value of the previous frame, and by comparing the two it is determined whether the current frame is more complex than the previous one. If the image texture value of the video frame to be encoded is greater than that of the preceding adjacent video frame, the current video frame is more complex than the previous one. In that case, the image texture ratio between the current and previous video frames is determined, and because this ratio is greater than 1, the coding code rate of the video encoder is adjusted according to the image texture ratio, so that the adjusted target coding code rate is greater than the code rate used when encoding the previous frame. The current video frame is then encoded according to the target coding code rate, ensuring the picture quality of the current, more complex video frame. The encoded video frame data is then transmitted to the audience end for decoding and playback, and the audience end displays video frame data with better image quality for the content-rich, complex game pictures, which provides reliability.
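The frame-level adjustment above can be sketched as below. The cap on the boost (`max_boost`) is an added assumption to keep a single very complex frame from spiking the code rate, and is not taken from this application:

```python
def frame_level_rate(base_rate_kbps: float, texture_cur: float,
                     texture_prev: float, max_boost: float = 2.0) -> float:
    """Scale the encoder's base code rate by the texture ratio when the
    current frame is more complex than its predecessor; otherwise keep
    the base rate unchanged."""
    if texture_prev > 0 and texture_cur > texture_prev:
        ratio = min(texture_cur / texture_prev, max_boost)  # ratio > 1
        return base_rate_kbps * ratio
    return base_rate_kbps

print(frame_level_rate(4000, 120, 100))  # more complex frame -> 4800.0
print(frame_level_rate(4000, 90, 100))   # simpler frame -> 4000
```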
It should be noted that, for a real-time audio/video scene, for example, an instant communication scene of a voice/video call, reference may be made to the above video live broadcast scene, and the actual processes are the same or similar, which is not described here in detail.
For another example, take video on demand, which requires real-time delivery of the video stream data. The server loads video directly from pre-stored data or from other service terminals and takes the pre-stored or loaded video as the video source. For the video frame stream sequence of the video source, the video source code rate is determined according to the intervals between key frames in the sequence; specifically, it is determined in real time at the granularity of the frame interval (group of pictures, GOP) between two key frames (I frames), where the frame interval can represent one video frame group (the group of frames to be encoded). Concretely, the frame length between two key frames (I frames) is determined first, along with the frame rate of the playback client, so that the number of video frame groups per unit time, i.e., the video source code rate, is determined according to the ratio between the frame rate and the frame length. The video source code rate is then dynamically loaded and updated to the video encoding core (i.e., the video encoder), so that the video encoder obtains the current base coding code rate.
Further, for the video frames in the frame sequence to be encoded, the server begins encoding them according to the coding code rate of the video encoder. During encoding, the complexity of the video frames between two adjacent frames is determined; for example, complexity can be represented by image texture. The image texture value of the current frame is determined first, along with the image texture value of the previous frame, and by comparing the two it is determined whether the current frame is more complex than the previous one. If the image texture value of the video frame to be encoded is greater than that of the preceding adjacent video frame, the current video frame is more complex than the previous one. In that case, the image texture ratio between the current and previous video frames is determined, and because this ratio is greater than 1, the coding code rate of the video encoder is adjusted according to the image texture ratio, so that the adjusted target coding code rate is greater than the code rate used when encoding the previous frame. The current video frame is then encoded according to the target coding code rate, ensuring the picture quality of the current, more complex video frame. The encoded video frame data is then transmitted to the terminal (video client) for decoding and playback, and the terminal displays video frame data with good image quality for the content-rich, complex pictures, which provides reliability.
Based on the above examples, compared with schemes in which the video encoder encodes video frames according to a video source code rate counted over a fixed duration, the present application can compare the texture complexity between the preceding and following video frames during encoding of the frame sequence to be encoded of the target video source, and, when the texture of the following video frame is more complex than that of the preceding one, adjust the coding code rate of the video encoder according to their texture ratio. This enables flexible, fine-grained frame-level adjustment of the coding code rate, ensuring the encoding quality of the following video frame and the picture quality of the encoded video.
It should be noted that the above is only an example, and may be applied to other video encoding scenarios, which are not described herein.
For ease of understanding, each step of the video encoding method will be described in detail below. The order of description of the following embodiments is not intended to limit the preferred order of the embodiments.
In the embodiment of the present application, description will be made from the dimension of the video encoding apparatus, and the video encoding apparatus may be integrated in a computer device, such as a terminal or a server. Referring to fig. 2, fig. 2 is a schematic flow chart of steps of a video encoding method according to an embodiment of the present application, where in the embodiment of the present application, a video encoding device is specifically integrated on a server, and when a processor on the server executes program instructions corresponding to the video encoding method, the specific flow chart is as follows:
101. The current coding rate of the video coder is obtained.
Video coding can be understood as compressing video source data and transmitting the compressed data to a playback end for decoding and playback, so that the page at the playback end presents the video picture corresponding to the video source. It should be noted that the video stream data is organized and transmitted in sequence form, and the video encoder encodes the video frames in a sequence at its coding code rate. That is, when encoding the video frames in a sequence, the video encoder compresses them in order according to the coding code rate, so that the code rate after compression is smaller than the code rate before compression, yielding a video file corresponding to the sequence (which can be understood as a video clip) that occupies a smaller communication bandwidth when transmitted to the playback end for decoding and playback.
A video encoder is an application program for compressing and converting a video signal into a digital format; its main function is to reduce the size of a video file through compression so that it occupies less bandwidth and storage space when transmitted over a network or stored. A video encoder can convert the original video signal into various digital formats, such as H.264, MPEG-4, AVC, VP9, etc. Its functions include compressing video data, changing video formats, and adjusting image quality. It accepts a variety of input signals (e.g., HDMI, SDI, IP video, etc.) and encodes them into a compressed digital format, such as H.264 or H.265, which requires less bandwidth to transmit and store. The data is then transmitted over the network to a monitoring center at the back end and decoded by corresponding decoding software or hardware running on the terminal.
However, the content may differ between video frames; that is, the content richness, i.e., the complexity, differs between frames, and a content-rich video frame generally occupies more memory than a simpler one. Consequently, when frames are encoded at the same coding code rate of the video encoder, the image quality of the simpler frames after encoding is relatively good, while that of the content-rich frames is relatively poor. In this regard, the coding code rate of the video encoder can be adjusted to reduce the occurrence of poor-quality encoded frames.
Wherein the code rate may be the number of bits per second of transmitted information, and may be expressed in units of kilobits per second (kbps) or megabits per second (mbps). Specifically, the code rate is the number of data bits or data flow transmitted per unit time during data transmission, reflecting the transmission rate and quality of video data. In the video coding process, the coding rate can be controlled to optimize the video quality and the file size.
In order to make the coding code rate of the video encoder more reasonable, it can be adjusted on the basis of the video source code rate of the video source. The related art periodically acquires the source code rate over a fixed time period (e.g., a fixed period of 1 second, 3 seconds, etc.) and periodically adjusts the coding code rate of the video encoder accordingly. However, if the coding code rate is adjusted according to a source code rate obtained over a fixed period, it remains fixed within that period and the granularity is large. Because the sequence of a video source transmitted in real time contains multiple video frames of differing complexity, encoding them at such a coarse-grained code rate causes the more complex frames to become blurred after encoding, resulting in poor image quality.
Where the video source rate may be the number of bits transmitted per second by the source video, and may be expressed in units of kilobits per second (kbps) or megabits per second (mbps). The higher the source code rate, the larger the data quantity transmitted in unit time, the higher the accuracy and quality of the video, and the clearer the picture. The code rate is one of important factors for determining the size and the image quality of a video file, and the higher the source code rate is, the larger the video file capacity and the better the image quality are under the same resolution.
In the embodiment of the application, in order to ensure that the image quality of the video frame with higher complexity after encoding is better, the video source code rate of the target video source is acquired by reducing the acquisition granularity of the video source code rate, so that the encoding code rate of the video encoder is synchronously adjusted, the encoding code rate of the video encoder is finely adjusted, and the video encoder is used as the basic code rate of video encoding, so that the video encoding method has reliability.
The target video source may be a video source transmitted in real time, which may be understood as a video source, or a video source end, which may provide real-time streaming data. For example, the streaming data of the video frames uploaded and transmitted to the server by the anchor in the live video broadcast, namely, the signal source, the streaming data of the video frames read and loaded in real time by the server from other platforms or databases in the video on demand scene, the streaming data of the audio and video in the audio and video call scene, and the like can be realized. The above is merely an example, and is not intended to be limiting, as other video sources capable of real-time video streaming data are equally applicable, and are not illustrated herein.
In some embodiments, the video source code rate may be determined based on the spacing between key video frames in the target video source to synchronously update the coding rate of the video encoder based on the video source code rate. For example, step 101 may include obtaining a current video source code rate of the target video source and synchronously updating the coding rate of the video encoder based on the video source code rate.
The coding code rate of the video encoder is adjustable, and its real-time value can be adjusted according to the interval between two adjacent key video frames of the target video source. The video frame stream data of the target video source may be organized and transmitted to the server in the form of a video frame sequence, which may include different types of video frames, such as key video frames (Intra-coded frames, I frames), forward predicted frames (Predicted frames, P frames), and bi-directional predicted frames (Bi-directional predicted frames, B frames). The video source code rate between two key frames may be statistically determined based on the spacing (group of pictures, GOP) between the two key frames. Specifically, the frame rate of the target video source is determined, the interval between two adjacent key video frames in the transmission sequence (such as the interval length or interval duration) is detected in real time, and the video source code rate is then calculated from this interval and the frame rate, for example according to the ratio between the frame rate and the interval. Further, the coding code rate in the video encoder is synchronously updated according to the video source code rate, so that the current coding code rate is adapted to the frame sequence to be encoded between the current two key frames in the video frame sequence, which can be understood as a sub-sequence or fragment sequence.
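Detecting the key-frame spacing from a frame-type sequence can be sketched as follows (a toy model: a real encoder reads frame types from the bitstream rather than from a list of letters):

```python
def keyframe_intervals(frame_types):
    """Frame counts between consecutive key (I) frames in a sequence."""
    positions = [i for i, t in enumerate(frame_types) if t == 'I']
    return [nxt - cur for cur, nxt in zip(positions, positions[1:])]

# Two GOPs: I P B P | I P P | I  ->  intervals of 4 and 3 frames.
print(keyframe_intervals(['I', 'P', 'B', 'P', 'I', 'P', 'P', 'I']))  # [4, 3]
```

Dividing the source frame rate by such an interval then gives the number of video frame groups per unit time, which is the per-GOP statistic the text describes.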
Based on the method, the real-time video frame stream data of the target video source are organized in a sequence form, and the video source code rate in the real-time stage is determined by adopting the intervals among key frames and is used as a basic code rate when a video encoder encodes, so that the requirement of the encoding code rate of each video frame sequence is more met, and the flexibility and the reliability are realized.
In the embodiment of the application, the coding rate can have instantaneity. The current coding rate of the video encoder can be the initial coding rate synchronously updated according to the video source code rate, and can also be the historical coding rate used in the coding process of the previous video frame, and the current coding rate is specific to the actual situation. It should be noted that, the "previous video frame" may refer to a "video frame in a frame sequence to be encoded" in the embodiment of the present application, and the "frame sequence to be encoded" will be described later, which is not discussed herein.
By the method, the real-time coding rate of the video encoder can be obtained and used as the basic rate of video coding, so that the requirement of the coding rate of each video frame sequence is met, and the video encoder has reliability.
102. A frame sequence to be encoded of the target video source is acquired, and encoding of the first video frame in the frame sequence to be encoded is started, according to the coding code rate, by the video encoder.
In the embodiment of the application, after the real-time coding code rate of the video encoder is obtained, the frame sequence to be coded corresponding to the target video source can be obtained, so that the video frames in the frame sequence to be coded can be coded according to the current coding code rate by the video encoder.
The frame sequence to be encoded may be a sub-sequence or a sequence segment of real-time transmission of the target video source. Specifically, all video frames provided by the target video source are transmitted in a video frame sequence, and the frame sequence to be encoded may be a sub-sequence of the entire video frame sequence, for example, a sequence formed by two key video frames that are currently close together and a non-key video frame (such as the forward predicted frame and/or the bi-directional predicted frame) located between the two key video frames, or a sequence formed by a plurality of key video frames and a non-key video frame, which is not limited herein.
The first video frame may refer to the video frame arranged first in the current frame sequence to be encoded. It should be noted that this refers to the first frame of the sub-sequence currently being transmitted by the target video source; that is, during video encoding, the first frame differs at different moments and is determined by the real-time frame sequence to be encoded. Thus, the first video frame may be the first frame of the entire video frame sequence corresponding to the target video source, or an intermediate frame within the video frame sequence, which is not limited herein. The type of the first video frame may be a key video frame.
When performing video encoding, the server in the embodiment of the present application can receive the video frame stream data of the target video source and encode the data acquired in real time; when the data is transmitted in sequence form, it can sequentially receive each frame sequence to be encoded of the target video source and sequentially encode the video frames within each. The encoding of the video stream data of the target video source is therefore performed in real time: by the time the server acquires a frame sequence to be encoded, it has already finished encoding the video frames of the historical frame sequences acquired earlier, and once the video encoder in the server has been updated to the corresponding coding code rate, the video encoder takes the current coding code rate as the base code rate, i.e., begins encoding the first video frame of the frame sequence to be encoded according to the current real-time coding code rate.
Taking live video as an example, the target video source can be understood as the live video source of the anchor end. The server acquires the live picture frame sequence from the anchor end in real time and, after synchronously updating the coding code rate of the video encoder according to the video source code rate calculated from the intervals between key video frames in the live picture frame sequence, the video encoder begins encoding the first live picture frame of the frame sequence to be encoded according to the current real-time coding code rate, so as to obtain encoded stream data for subsequent transmission to the audience end.
For another example, taking video on demand, the target video source can be understood as target video stream data loaded in real time from a video database or another server and transmitted to the server in sequence form. The server acquires the frame sequence to be encoded of the target video stream data in real time and, after synchronously updating the coding code rate of the video encoder according to the video source code rate calculated from the intervals between key video frames, the video encoder begins encoding the first target video frame of the frame sequence to be encoded according to the current real-time coding code rate, so as to obtain encoded stream data for subsequent transmission to the playback end.
By the method, the frame sequence to be encoded corresponding to the target video source can be obtained, the current encoding code rate of the video encoder is used as a basic code rate, the video frames in the frame sequence to be encoded are encoded according to the current encoding code rate by the video encoder, the video frames are compressed, the data transmission flow in unit time under a specific bandwidth is improved, and the video playing fluency of a playing end is ensured.
103. A first image texture value of a first video frame currently ready for encoding in a sequence of frames to be encoded is determined, and a second image texture value of a second video frame located adjacent to the first video frame is determined.
In the embodiment of the application, during encoding of the video frames in the frame sequence to be encoded, the complexity of the video frame corresponding to the current encoding progress and the complexity of the preceding video frame can both be determined. A subsequent comparison of these complexities then shows whether the video frame at the current encoding progress is more complex than the preceding one, on which basis it is decided whether to adjust the coding rate of the video encoder, which provides reliability.
It should be noted that, when representing the complexity of a video frame, the embodiment of the present application may describe it in terms of the constituent elements and/or gray level variation of the video frame, for example by means of Euclidean distance, statistical histograms, the local binary pattern (Local Binary Patterns, LBP) detection algorithm, a convolutional neural network (Convolutional Neural Networks, CNN) feature extraction and classification algorithm, and the like. For ease of understanding, the embodiments of the present application represent the complexity of the preceding and following video frames with the local binary pattern detection algorithm, which uses image texture to represent the complexity of a video frame.
The first video frame may be the video frame to be encoded corresponding to the current encoding progress in the frame sequence to be encoded, and the second video frame may be the video frame preceding the first video frame in that sequence. It should be noted that the second video frame is typically a video frame currently being encoded or already encoded, while the first video frame is the next video frame ready to be encoded, so the first video frame may be any video frame after the first frame of the frame sequence to be encoded.
The image texture value may be a parameter representing the texture features of a video frame, used to describe those features quantitatively or qualitatively; it may be determined by analyzing the texture features based on the important surface gray information in the video frame. A texture feature is a value calculated from the video frame that quantifies the gray level variation within a region; specifically, a texture feature image is generated by computing the gray level co-occurrence matrix and texture feature value of the sub-image formed by each small window. The image texture value can therefore reflect not only the visual complexity of the image but also serve as an important index for quantifying the complexity of the image information. Thus, the image information complexity of the first video frame may be represented by a first image texture value, and that of the second video frame by a second image texture value. It should be noted that an image texture value may be a single numerical value or a matrix; for example, an image texture value represented as an image matrix includes three two-dimensional matrices (R, G, B), with values between 0 and 255 representing color depth.
In some embodiments, the corresponding first image texture value may be determined from a gray value of each first pixel point in the first video frame. For example, "determining a first image texture value of a first video frame currently ready for encoding in a sequence of frames to be encoded" in step 103 may include:
(103.1) acquiring a gray value of each first pixel point in a first video frame currently prepared for encoding in a frame sequence to be encoded;
(103.2) determining a first image texture value for the first video frame based on the gray value for each first pixel point.
The gray value may be the color depth of a pixel point in a black-and-white image, that is, the color depth of each first pixel point in the first video frame. The gray value generally ranges from 0 to 255, with white as 255 and black as 0; it represents the change of brightness from dark to light, with the corresponding color in the video frame ranging from black to white, and can reflect the statistical characteristics and contrast of the video frame.
Specifically, in the process of encoding the video frames in the frame sequence to be encoded, in order to determine whether the first video frame is more complex than the preceding video frame (i.e., the second video frame), the image texture values of the first and second video frames must each be determined. Taking the calculation of the first image texture value as an example, for the first video frame to be encoded at the current encoding progress in the frame sequence to be encoded, the gray value corresponding to each first pixel point can be obtained. For example, the gray value of each first pixel point can be computed as a weighted average of the color values of its three color channels (red, green and blue), so as to reflect each channel's contribution to brightness; alternatively, the color value of each first pixel point in the three color channels can be determined, and the maximum or minimum of those channel values used as the gray value of that pixel point. Further, the first image texture value of the first video frame is determined according to the gray value of each first pixel point.
Similarly, a second image texture value for a second video frame may be determined in the above manner. In this way, it is then determined whether the first video frame is more complex than the previous video frame (i.e., the second video frame) based on the magnitude relationship between the first image texture value and the second image texture value, so as to perform rate adjustment of video encoding.
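As a minimal sketch of the gray value computation described above, the weighted-average and maximum-channel variants might look as follows; the channel weights used are a common luminance convention and an assumption, since the embodiment does not fix them:

```python
import numpy as np

def to_gray(frame_rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB frame to gray values in [0, 255] using a
    weighted average of the three color channels (weights are assumed)."""
    weights = np.array([0.299, 0.587, 0.114])  # illustrative luminance weights
    gray = frame_rgb.astype(np.float64) @ weights
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)

def to_gray_max(frame_rgb: np.ndarray) -> np.ndarray:
    """Alternative mentioned in the text: take the maximum channel value
    of each first pixel point as its gray value."""
    return frame_rgb.max(axis=2)
```

Either function yields one gray value per first pixel point, from which the first image texture value is then computed.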
It should be noted that, when calculating the first image texture value of the first video frame, the value may be computed over all the region information of the first video frame, or it may be determined according to the gray values of target region information and/or target object information in the first video frame; similarly, the second image texture value may be determined according to the gray values of target region information and/or target object information in the second video frame.
For example, in determining the image texture values of the first video frame and the second video frame, the image texture values may be calculated for gray values of pixel points within a target image region in the video frame, where the target image region may be an image region corresponding to a specific object or an image region of a fixed region.
Specifically, taking the calculation of the first image texture value of the first video frame as an example, a target mask image is obtained, the target mask image being an image that displays the pixels of the target image area in the first video frame. Mask processing is performed on the first video frame according to the target mask image to obtain a processed target first video frame, and the gray value of each first pixel point in the target first video frame is then obtained, so that the first image texture value of the target first video frame can be calculated from those gray values. In the target first video frame, the target image area displays the corresponding area content of the original first video frame, while the other image areas outside the target image area are displayed as blank. Calculating the image texture value on this basis describes the key texture features of the target image area in a targeted manner and improves the efficiency of computing texture complexity. Since only the target image area, in which the content of the first and second video frames actually differs (as with forward-predicted or bidirectionally-predicted video frames from which image redundancy has been removed), serves as the main basis of the subsequent image texture value calculation for the first and second video frames, the amount of computation in the image texture value calculation is reduced.
In some embodiments, the local texture value of each video frame sub-region may be determined according to the gray value of the first pixel point in each video frame sub-region in the first video frame, and the first image texture value of the first video frame may be determined by combining the local texture values of the respective video frame sub-regions. For example, step (103.2) may comprise:
(103.2.1) obtaining a target local texture window;
(103.2.2) dividing a first pixel point in a first video frame according to a target local texture window to obtain a plurality of divided target local areas;
(103.2.3) determining a first local texture value corresponding to each target local region in the first video frame based on the gray value of the first pixel point contained in each target local region;
(103.2.4) determining a first image texture value for the first video frame based on the first local texture value for each target local region.
The target local texture window may be a neighborhood region of a specific size, for example a region of 3 pixels by 3 pixels. Specifically, the target local texture window may be understood as the coverage area of the operator in the local binary pattern algorithm. Within the target local texture window, the gray value of the first pixel point located at the center of the window is taken as the gray judgment threshold of the current window, and the gray values of the adjacent pixel points are compared with this threshold: if the gray value of a surrounding pixel point is greater than the gray value of the first pixel point at the center, that pixel position is marked, or assigned as 1; otherwise it is assigned as 0.
The target local area may be an image sub-area of the first video frame, that is, a video frame sub-region. It can be understood that the first pixel points of the first video frame are divided according to the size of the target local texture window, so that the first video frame is divided into a plurality of target local areas, the size of each target local area being consistent with the size of the target local texture window. For example, taking a target local area of the first video frame, if the target local texture window is a range window of 3 pixels by 3 pixels, then each target local area is 3 first pixel points long and 3 wide, i.e., one target local area contains 9 first pixel points; if the target local texture window is a range window of 4 pixels by 4 pixels, one target local area contains 16 first pixel points; and for 5 pixels by 5 pixels, the target local area contains 25 first pixel points.
Specifically, the first image texture value of the first video frame may be calculated from the first local texture values of a plurality of regions in the first video frame. The target local texture window is obtained first, and the first video frame is divided according to the size of that window into a plurality of target local areas of equal size. After each target local area in the first video frame has been determined, the first local texture value of each target local area may be determined from the gray values of the first pixel points it contains; this first local texture value may be understood as the texture feature value of the first pixel point at the center of the target local area. A summary calculation is then performed on the first local texture values of all target local areas to obtain the first image texture value of the first video frame; the calculation may be a weighted combination or a concatenation, and is not limited here. In this way, the local texture value of each video frame sub-region is determined from the gray values of its first pixel points, and the first image texture value of the first video frame is obtained by combining the local texture values of the sub-regions for the subsequent complexity comparison between the first and second video frames, thereby improving the accuracy and reliability of the video frame complexity comparison.
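The divide-and-summarize procedure above can be sketched as follows. The row-major neighbor ordering and the plain mean used for the summary step are illustrative assumptions, since the text leaves both choices open:

```python
import numpy as np

def frame_texture_value(gray: np.ndarray, window: int = 3) -> float:
    """Divide the frame into non-overlapping window x window target local
    areas, compute each area's first local texture value with the LBP rule
    (edge pixel > center pixel -> bit 1), and summarize the per-area
    values; a plain mean is used here as the summary calculation."""
    h, w = gray.shape
    center_idx = (window * window) // 2  # index of the center first pixel point
    values = []
    for i in range(0, h - window + 1, window):
        for j in range(0, w - window + 1, window):
            block = gray[i:i + window, j:j + window].flatten()
            center = block[center_idx]
            bits = ''.join('1' if v > center else '0'
                           for k, v in enumerate(block) if k != center_idx)
            values.append(int(bits, 2))
    return float(np.mean(values))
```

A uniform frame has no gray level variation, so every target local area yields 0 and the summarized texture value is 0.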
In some embodiments, for each target local area in the first video frame, the differences between the gray value of the first pixel point at the center of the area and the gray values of the surrounding first pixel points are determined, and binarization conversion is performed on these differences to obtain the first local texture value of each target local area. For example, step (103.2.3) may include: determining the center first pixel point of each target local area and the edge first pixel points around it; comparing, for each target local area, the gray value of each edge first pixel point with the gray value of the center first pixel point to obtain a plurality of comparison results; and performing binarization processing on the plurality of comparison results of each target local area to obtain the first local texture value of that target local area.
The center first pixel point may be the pixel point at the center of the corresponding target local area in the first video frame, and the edge pixel points are the other pixel points around the center first pixel point in that area. For example, if the target local area is the image area corresponding to a target local texture window of 3 pixels by 3 pixels, the area contains 9 first pixel points: the 1 pixel point at the center is the center first pixel point, and the other 8 pixel points around it are all edge first pixel points. It should be noted that the center first pixel point and its edge first pixel points together constitute one target local area, and each target local area in the first video frame has its own center first pixel point and edge first pixel points.
Specifically, after the first video frame is divided into a plurality of target local areas, in order to obtain the first local texture value of a target local area, the first pixel point located at the center of the area is first determined as the center first pixel point, and the other first pixel points around it are determined as edge first pixel points. The gray value of the center first pixel point is then used as a gray reference value (or gray threshold), and the gray value of each edge first pixel point is compared with this reference value to obtain the gray comparison result of each edge first pixel point relative to the center first pixel point. Finally, binarization conversion is performed on these gray comparison results to obtain the first local texture value of the target local area.
For example, each target local area is the image area corresponding to a target local texture window of 3 pixels by 3 pixels and contains 9 first pixel points, namely 1 center first pixel point and 8 edge first pixel points. Taking one target local area as an example, the gray value of the center first pixel point is 83, and the 8 surrounding edge first pixel points have gray values 44, 118, 192, 204, 250, 174, 61 and 32 respectively. The gray value of each edge first pixel point is compared with the gray value of the center first pixel point, giving 8 gray comparison results, and each comparison result is then converted by binarization: if the gray value of an edge first pixel point is greater than that of the center first pixel point, it is assigned the value 1; if it is less than or equal to the gray value of the center first pixel point, it is assigned the value 0. For the above 8 gray comparison results, binarization conversion thus yields the values 0, 1, 1, 1, 1, 1, 0 and 0, expressed as the 8-bit binary number 01111100. Because the value is represented with 8 binary bits, the first local texture value of each target local area has 256 possible results. Converting the binary form of the first local texture value of this target local area to decimal gives 124; both the binary number "01111100" and the decimal number "124" can represent the first local texture value of the target local area.
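The worked example above can be checked directly. The neighbor order below simply follows the order in which the eight edge gray values are listed in the text:

```python
def local_texture_value(center: int, neighbors: list[int]) -> int:
    """Binarize each edge gray value against the center gray value
    (1 if strictly greater, else 0) and read the resulting bits as one
    8-bit binary number, most significant bit first."""
    bits = ''.join('1' if g > center else '0' for g in neighbors)
    return int(bits, 2)

# Center gray value 83, edge gray values as listed in the example:
code = local_texture_value(83, [44, 118, 192, 204, 250, 174, 61, 32])
# bits 01111100 -> decimal 124
```

The single comparison rule (strictly greater yields 1, otherwise 0) is what limits each 3x3 area to 256 possible local texture values.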
It should be noted that, for target local areas divided according to target local texture windows of other specifications, one or more pixel points located in the middle of the target local area may be determined as a center pixel point combination area, which can be treated as a whole, that is, as the "center pixel point". For example, for a target local area of the first video frame, the center first pixel point combination area contains the several first pixel points located at the center of the target local area, and the other first pixel points outside the combination area are taken as edge first pixel points. When calculating the image texture value of a target local area of such a specification, the gray value of the center first pixel point combination area, defined as its target gray value, can be determined from the gray values of the pixel points it contains, for example by taking a weighted average of the gray values of the several first pixel points in the combination area. The gray value of each edge pixel point is then compared with the target gray value of the center pixel point combination area to obtain a plurality of gray comparison results, and the first local texture value of the target local area is obtained by binarization conversion of those comparison results.
For example, each target local area is the image area corresponding to a target local texture window of 4 pixels by 4 pixels and contains 16 first pixel points. The center first pixel point combination area of such a target local area is determined; the 4 first pixel points in the 2-pixel-by-2-pixel area at the center of the target local area can be used as the center first pixel point combination area. Based on the gray values of these 4 first pixel points, the target gray value of the combination area is determined, for example by taking a weighted average of the 4 gray values, or by taking their maximum or minimum as the target gray value. The 12 edge first pixel points surrounding the center first pixel point combination area in the target local area are then each compared with the target gray value of the combination area to obtain 12 gray comparison results, and the first local texture value of the target local area is obtained by binarization conversion of these comparison results.
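A sketch of the 4-pixel-by-4-pixel variant follows, assuming a plain mean for the target gray value of the center combination area and row-major order for the 12 edge bits; neither choice is fixed by the text:

```python
import numpy as np

def local_texture_value_4x4(block: np.ndarray) -> int:
    """4x4 target local area: average the 2x2 center first pixel point
    combination area into one target gray value, then binarize each of
    the 12 edge first pixel points against it (1 if strictly greater)."""
    assert block.shape == (4, 4)
    target_gray = block[1:3, 1:3].mean()  # target gray value of the center area
    bits = []
    for i in range(4):
        for j in range(4):
            if 1 <= i <= 2 and 1 <= j <= 2:
                continue  # the center combination area contributes no bit
            bits.append('1' if block[i, j] > target_gray else '0')
    return int(''.join(bits), 2)
```

With 12 edge bits, the local texture value of a 4x4 area ranges over 4096 possible results rather than 256.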
In addition, the target local area may be further divided according to target local texture windows with other sizes, and the implementation process may refer to the above examples, which are not described in detail herein.
In some embodiments, in order to reduce the amount of calculation when computing the image texture value of a video frame and improve its calculation efficiency, the specification size of the local texture window used to divide the local areas of the video frame may be adjusted before the division, so that the adjusted target local texture window is used. For example, step (103.2.1) may include: obtaining a candidate local texture window and determining its candidate window range value; obtaining the target available resource amount of the image computing resources and querying a window range list according to the target available resource amount to obtain a target window range value, the window range list containing the associations between different available resource amounts and window range values; when the candidate window range value is less than the target window range value, adjusting the candidate local texture window according to the target window range value to obtain the target local texture window; and when the candidate window range value is greater than or equal to the target window range value, determining the candidate local texture window as the target local texture window. The candidate local texture window may be a neighborhood region of a preset specification range, which may be understood as the default operator coverage of the local binary pattern algorithm; for example, the candidate local texture window is a neighborhood region of 3 pixels by 3 pixels.
The image computing resources may be the computing resources used for image processing in the server, such as one or more of a central processing unit (CPU), a graphics processing unit (GPU), memory, cache, and the like, which can affect the calculation efficiency of the image texture value. The target available resource amount may be the available amount of any one or more of the above computing resources, for example GPU computing resources.
The window range list may be a list of the available resource amounts of one or more image computing resources and corresponding window range values; specifically, it contains the associations between the available resource amounts of one or more types of image computing resources and window range values. It should be noted that the server may detect the remaining available amount of its local image computing resources in real time, so as to synchronously update the available resource amount of the corresponding type of image computing resource in the window range list.
Specifically, the default candidate local texture window corresponding to the local binary pattern algorithm is obtained, and the candidate window range value of the candidate local texture window is determined, the candidate window range value representing the size of the candidate local texture window. The target available resource amount of the server's image computing resources is then obtained, and the window range list is queried according to the target available resource amount so as to determine, from the list, the target window range value matching the current target available resource amount. Finally, the target window range value is compared with the candidate window range value: if the target window range value is larger, the size of the candidate local texture window is adjusted according to the target window range value to obtain the adjusted target local texture window; otherwise, if the target window range value is less than or equal to the candidate window range value, the candidate local texture window is used directly as the target local texture window. In this way, a texture window with larger coverage can be selected as the target local texture window whenever possible, so that when the video frame is divided into target local areas according to the target local texture window, the number of windows (i.e., target local areas) is kept as small as possible. This reduces the amount of subsequent image texture value computation and improves its calculation efficiency, effectively preventing the image texture value calculation from delaying the subsequent adjustment of the video encoder's coding rate and thereby affecting video encoding progress and efficiency, which provides reliability.
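The lookup-and-adjust rule might be sketched as follows; the resource unit (megabytes of available GPU memory) and the threshold pairs in the window range list are purely illustrative assumptions:

```python
# Window range list: (minimum available resource amount, window range value).
# These pairs are illustrative; a real list would be synchronously updated
# from the server's remaining image computing resources.
WINDOW_RANGE_LIST = [(0, 5), (2048, 4), (8192, 3)]

def select_target_window(candidate_range: int, available_amount: int) -> int:
    """Query the window range list for the target window range value
    matching the available resource amount, then enlarge the candidate
    window only when the target range value exceeds its range value."""
    target_range = candidate_range
    for min_amount, range_value in sorted(WINDOW_RANGE_LIST):
        if available_amount >= min_amount:
            target_range = range_value
    return target_range if target_range > candidate_range else candidate_range
```

Under this illustrative mapping, a smaller available resource amount yields a larger window (fewer target local areas, hence less computation), consistent with the goal stated above.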
In the embodiment of the application, in order to accelerate the calculation of the image texture value of a video frame, besides enlarging the specification size of the local texture window, the resolution of any video frame in the embodiment of the application can also be reduced to cut the amount of image texture value computation, thereby improving the calculation efficiency of the image texture value of the video frame. It should be noted that reducing the image resolution of a video frame reduces the amount of calculation on its own; on this basis, the computation can be reduced further by also using the target local texture window of enlarged specification size. This further improves the calculation efficiency of the image texture value and effectively prevents the image texture value calculation from delaying subsequent video encoding, which provides reliability.
In some embodiments, taking the reduction of the calculation amount of the first video frame when computing its image texture value as an example, the resolution of the first video frame may be reduced by downsampling, so that the subsequent image texture value calculation is performed on the downsampled first video frame, thereby improving the calculation efficiency of the image texture value. For example, before step (103.1), the method may include: obtaining the image resolution information corresponding to the first video frame currently prepared for encoding in the frame sequence to be encoded, and performing downsampling processing on the first video frame based on the image resolution information to obtain a downsampled first video frame; step (103.1) may then include obtaining the gray value of each first pixel point in the downsampled first video frame.
The image resolution information may represent the amount of information contained in the video frame, expressed as a number of pixels; specifically, it may represent the number of pixels contained in the video frame per unit length (for example, per inch), that is, the number of pixels along the length and width dimensions. It will be appreciated that the image resolution information determines how fine the video frame is: the higher the resolution, the clearer the image.
It should be noted that the amount of image texture value computation for the first video frame can be reduced by pixel point downsampling. Specifically, for the first video frame to be encoded at the current encoding progress in the frame sequence to be encoded, the image resolution information of the first video frame may be obtained, this information representing the distribution and number of the first pixel points contained in the first video frame. The pixel value of each first pixel point in the first video frame is then obtained, and downsampling is performed according to the magnitude relationship of those pixel values over the first pixel points reflected by the image resolution information: for example, taking first pixel areas of 2 pixels by 2 pixels or 3 pixels by 3 pixels as units, the first pixel point with the maximum pixel value in each first pixel area is retained, yielding the downsampled first video frame. The gray value of each first pixel point in the downsampled first video frame is then obtained, so that the first image texture value of the first video frame is determined based on those gray values.
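The max-value pixel downsampling described above can be sketched as follows; a 2-pixel-by-2-pixel unit is assumed, and for simplicity the frame dimensions are assumed divisible by the block size:

```python
import numpy as np

def downsample_max(frame: np.ndarray, block: int = 2) -> np.ndarray:
    """Keep, for each block x block first pixel area, only the pixel with
    the maximum value, reducing resolution by `block` along each axis."""
    h, w = frame.shape
    h2, w2 = h // block, w // block
    # Group pixels into block x block tiles and take the maximum per tile.
    view = frame[:h2 * block, :w2 * block].reshape(h2, block, w2, block)
    return view.max(axis=(1, 3))
```

A 2x2 unit quarters the number of pixels, and hence roughly quarters the subsequent image texture value computation.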
In the embodiment of the application, because network bandwidth resources are limited, if the difference between the first video frame and the second video frame in the frame sequence to be encoded is too small, the video encoder continues to encode the current video frame with the coding rate it used when encoding the previous video frame, and the image quality of the currently encoded video frame will be close to that of the previous one. Therefore, if the difference between the first and second video frames is too small, the coding rate of the video encoder is not adjusted even when the texture complexity of the first video frame is greater than that of the second; instead, the coding rate used when encoding the second video frame is used to encode the first video frame. This reduces the frequency of coding rate adjustments in the video encoding process, so that the bandwidth occupied by transmitting the encoded video frame stream data is reduced as much as possible.
In some embodiments, to reduce the occupancy of bandwidth resources by encoded video frame stream data, a similarity between a first video frame to be encoded and a second video frame that has been encoded in a previous frame may be determined first, and if the similarity is less than a certain threshold, a first image texture value of the first video frame is determined to determine the texture complexity of the first video frame. For example, before step 103, the method may further include:
(103. A) determining a similarity between the first video frame and the second video frame;
step 103 may comprise determining a first image texture value of a first video frame currently ready for encoding in the sequence of frames to be encoded when the similarity is smaller than a preset similarity threshold.
The preset similarity threshold may be a threshold for judging the degree of similarity between the first video frame to be encoded and the previously encoded second video frame, so as to determine the degree of difference between them: the greater the similarity, the smaller the difference between the first and second video frames; conversely, the smaller the similarity, the greater the difference. The difference between the first and second video frames is judged against this preset similarity threshold in order to decide whether the image texture values of the two frames need to be determined and, in turn, whether the coding rate of the video encoder should be adjusted.
When determining the similarity between the first video frame and the second video frame, algorithms such as color similarity estimation (Hue Saturation Value, HSV), histogram statistics, and hash similarity statistics may be adopted; any of these algorithms can determine the similarity between the first video frame to be encoded at the current encoding progress in the frame sequence to be encoded and the second video frame encoded in the previous frame. After the similarity between the first video frame and the second video frame is determined, it is compared with the preset similarity threshold.
In one aspect, when the similarity between the first video frame and the second video frame is smaller than the preset similarity threshold, the difference between the first video frame and the second video frame is relatively large. At this time, a first image texture value of the first video frame currently prepared for encoding in the frame sequence to be encoded is determined, and a second image texture value of the second video frame adjacent to the first video frame is determined, so that the encoding rate of the video encoder is adjusted when the first image texture value is greater than the second image texture value.
On the other hand, when the similarity is greater than or equal to the preset similarity threshold, the first video frame currently prepared for encoding in the frame sequence to be encoded is encoded by the video encoder according to the current encoding code rate (i.e., the encoding code rate when encoding the second video frame).
In some implementations, the color similarity between the first video frame and the second video frame is calculated with a histogram statistical algorithm. For example, the similarity comprises a color similarity, and step (103. A) may comprise:
(103. A.1) obtaining a first pixel value of each first pixel point in a first video frame currently prepared for encoding in a sequence of frames to be encoded, and generating a first histogram for the first video frame according to the statistical number of different first pixel values;
(103. A.2) obtaining a second pixel value for each second pixel point in the second video frame and generating a second histogram for the second video frame based on the statistical number of different second pixel values;
(103. A.3) determining a color similarity between the first video frame and the second video frame based on a difference between the first histogram and the second histogram.
The first pixel value refers to the pixel value of the corresponding first pixel point in the first video frame, and the second pixel value refers to the pixel value of the corresponding second pixel point in the second video frame.
The first histogram is used for counting the number of first pixels corresponding to each pixel value dimension of the first video frame, the width of the first histogram may include 256 pixel value dimensions, that is, pixel values 0 to 255, each bin in the first histogram corresponds to a target pixel value, and the height of the bin represents the counted number of first pixels corresponding to the target pixel value in the first video frame, and the first histogram may reflect the color condition of the first video frame. Similarly, the second histogram is used to count the number of second pixels corresponding to each pixel value dimension of the second video frame, and the width of the second histogram may include 256 pixel value dimensions, that is, pixel values 0 to 255, and may reflect the color condition of the second video frame.
The color similarity may represent a degree of similarity in color between the first video frame and the second video frame, and also reflect a color difference between the first video frame and the second video frame.
Specifically, when determining the color similarity between the first video frame and the second video frame according to the histogram statistical algorithm, first, a first pixel value of each first pixel point in the first video frame may be obtained, and the statistical number of first pixel points corresponding to each first pixel value dimension in the first video frame may be counted. Taking each first pixel value dimension as the width (i.e., the abscissa) of the histogram and the statistical number of first pixel points in each dimension as the height of the corresponding column, the first histogram for the first video frame is generated to represent the color condition of the first video frame. Meanwhile, a second histogram corresponding to the second video frame is obtained; its generation process can refer to that of the first histogram, and if the second histogram was already generated at a historical time, it can be obtained directly. Further, the color similarity between the first video frame and the second video frame is calculated based on the differences between the first histogram and the second histogram in each pixel value dimension. In this way, the degree of difference in color characteristics between the two frames is evaluated through the color similarity, so that a subsequent decision can be made on whether to adjust the encoding rate of the video encoder according to the texture difference between the first video frame and the second video frame.
In some embodiments, the color similarity between the first video frame and the second video frame is determined from a difference in the number of pixels of the first histogram and the second histogram in each pixel value dimension. For example, step (103. A.3) may include determining a first statistics of first video frames in each pixel value dimension from the first histogram and determining a second statistics of second video frames in each pixel value dimension from the second histogram, determining a difference in number of pixels in each pixel value dimension for the first video frames and the second video frames from the first statistics and the second statistics in each pixel value dimension, and determining a color similarity between the first video frames and the second video frames based on the difference in number of pixels in each pixel value dimension.
The first statistics may be the number of first pixels corresponding to the corresponding first pixel value dimension in the first histogram, and the second statistics may be the number of second pixels corresponding to the corresponding second pixel value dimension in the second histogram.
The pixel number difference value is the difference between the first histogram and the second histogram in the same pixel value dimension. For example, if the number of first pixels with a pixel value of 100 in the first histogram is 60 and the number of second pixels with a pixel value of 100 in the second histogram is 70, the pixel number difference value in the dimension of the pixel value 100 is 10.
When determining the color similarity between the first video frame and the second video frame according to the pixel number difference value in each pixel value dimension, specifically, the maximum of the first statistic and the second statistic in that pixel value dimension is first obtained as a target value. Then, for each pixel value dimension, a target ratio between the pixel number difference value and the target value is determined; the target ratio represents the difference proportion, i.e., the degree of difference, between the first video frame and the second video frame in the current pixel value dimension. Since the similarity is at most 1, the target ratio is subtracted from 1 to obtain the color sub-similarity of the first video frame and the second video frame in the current pixel value dimension. Finally, the color sub-similarities over all pixel value dimensions are summed and averaged to obtain the color similarity.
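The per-dimension computation described above (pixel number difference divided by the larger bin count, subtracted from 1, then averaged over all dimensions) can be sketched as follows. This is a minimal illustration rather than the patent's implementation; the function names and the handling of empty bins (both empty is treated as identical) are assumptions:

```python
def build_histogram(pixels, bins=256):
    """Count pixels per pixel-value dimension (0..bins-1)."""
    hist = [0] * bins
    for p in pixels:
        hist[p] += 1
    return hist

def color_similarity(pixels_a, pixels_b, bins=256):
    """Histogram-based color similarity between two frames, given as flat
    iterables of 8-bit pixel values. Returns a value in [0, 1]."""
    hist_a = build_histogram(pixels_a, bins)
    hist_b = build_histogram(pixels_b, bins)
    total = 0.0
    for count_a, count_b in zip(hist_a, hist_b):
        target = max(count_a, count_b)     # target value: larger bin count
        if target == 0:                    # both bins empty: treat as identical
            total += 1.0
            continue
        diff = abs(count_a - count_b)      # pixel number difference value
        total += 1.0 - diff / target       # color sub-similarity in this dimension
    return total / bins                    # averaged over all dimensions
```

For example, two identical frames yield a similarity of 1.0, while frames whose pixels fall entirely into two disjoint bins score close to, but below, 1.0, because only two of the 256 dimensions differ.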
It should be noted that, for the calculation process of the second image texture value of the second video frame, reference may be made to the description of the "first video frame" above, and details thereof are omitted herein.
By the method, the complexity of the video frame corresponding to the current coding progress and the complexity of the previous video frame can be determined in the coding process of the video frame, so that the coding code rate of the video coder can be adjusted according to the frame level decision when the texture complexity increment of the current video frame relative to the previous video frame is determined, the picture quality requirement of the current video frame is met, and the reliability is realized.
104. When the first image texture value is greater than the second image texture value, an image texture ratio is determined from the first image texture value and the second image texture value.
In the embodiment of the application, after the first image texture value of the first video frame and the second image texture value of the second video frame are obtained, the two values are compared, and this comparison of texture complexity between the first video frame and the second video frame determines whether the encoding rate of the video encoder is subsequently adjusted. On the one hand, when the first image texture value is greater than the second image texture value, it is decided to subsequently adjust the encoding rate of the video encoder; on the other hand, when the first image texture value is less than or equal to the second image texture value, the encoding rate is not adjusted. In this way, the encoding rate of the video encoder can satisfy the picture quality requirement of the first video frame currently ready to be encoded, and the scheme is reliable.
The image texture ratio may be a ratio representing a texture complexity of the first video frame relative to the second video frame, and it is understood that when the texture complexity of the first video frame is greater than the texture complexity of the second video frame, the image texture ratio is greater than 1, which represents that the first video frame is more complex than the second video frame.
It should be noted that, when it is decided to subsequently adjust the encoding rate of the video encoder, in order to adjust the rate reasonably, the ratio between the first image texture value and the second image texture value may be determined as the image texture ratio between the first video frame and the second video frame. The encoding rate of the video encoder, specifically the rate used when the second video frame was encoded, can then be adjusted according to this image texture ratio. In this way, the adjusted encoding rate can meet the picture quality requirement of the first video frame currently ready to be encoded while reducing the transmission bandwidth occupied by the video frame stream data, and the scheme is reliable.
In some embodiments, to avoid more frequent adjustment of the coding rate of the video encoder, when the first image texture value is greater than the second image texture value, the image texture ratio is calculated to be generated according to the greater degree of difference in texture complexity between the first video frame and the second video frame, so as to adjust the coding rate of the video encoder later. For example, determining an image texture ratio from the first image texture value and the second image texture value in step 104 may include obtaining an image texture difference value of the first image texture value and the second image texture value, and calculating the image texture ratio of the first image texture value and the second image texture value when the image texture difference value is greater than a preset texture difference threshold.
The image texture difference value may represent a degree of texture difference between the first video frame and the second video frame, and the texture difference value is determined according to the first image texture value and the second image texture value, if the image texture difference value is larger, the texture difference between the first video frame and the second video frame is larger, otherwise, if the image texture difference value is smaller, the texture difference between the first video frame and the second video frame is smaller.
Specifically, the first image texture value is compared with the second image texture value, when the first image texture value is larger than the second image texture value, the image texture of the first video frame which is currently ready to be encoded is more complex than the image texture of the second video frame which is already encoded, if the difference of the image texture complexity of the first video frame relative to the second video frame is smaller, the first video frame can be encoded by continuously using the encoding code rate of the video encoder when the second video frame is encoded, the image quality of the first video frame is less influenced, and the occupation amount of transmission bandwidth resources can be reduced.
Therefore, to avoid frequently increasing the coding rate of the video encoder, when the first image texture value is greater than the second image texture value, an image texture difference value between the first image texture value and the second image texture value may also be determined, and the image texture difference value may be compared with a preset texture difference threshold value, where the preset texture difference threshold value is used to determine a threshold value of a degree of difference between a first video frame to be coded and a second video frame that has been coded before. On the one hand, if the image texture difference value is greater than the preset texture difference threshold value, it indicates that the difference degree between the first video frame and the second video frame is greater, and the previous encoding rate of the video frame encoder cannot meet the image quality requirement of the first video frame, at this time, the image texture ratio between the first image texture value and the second image texture value needs to be determined, so that the previous encoding rate of the video encoder is adjusted to be higher according to the image texture ratio. On the other hand, if the image texture difference value is less than or equal to the preset texture difference threshold value, it is indicated that the difference degree between the first video frame and the second video frame is smaller, the previous coding rate of the video encoder may not be increased, and the first video frame may be continuously coded along the coding rate used for coding the second video frame. Therefore, the adjustment frequency of the coding rate of the video coder is reduced, and the occupation amount of transmission bandwidth resources can be reduced.
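The two-stage decision above can be sketched as follows. The function name and the idea of returning `None` to mean "keep the current rate" are illustrative choices, and the concrete threshold value would be tuned per deployment; the patent fixes neither:

```python
def texture_ratio_decision(first_texture, second_texture, diff_threshold):
    """Return the image texture ratio when a rate increase is warranted,
    or None when the second frame's encoding rate should be kept."""
    if first_texture <= second_texture:
        return None          # current frame is no more complex: keep the rate
    if first_texture - second_texture <= diff_threshold:
        return None          # difference too small to justify an adjustment
    return first_texture / second_texture   # ratio drives the rate increase
```

For instance, with a threshold of 5, texture values of 20 versus 10 produce a ratio of 2.0, while 12 versus 10 keeps the previous rate because the difference does not exceed the threshold.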
By the method, the comparison result of the texture complexity between the first video frame and the second video frame can be determined by comparing the first image texture value with the second image texture value, so that when the first video frame is more complex than the texture of the second video frame, the texture complexity of the current video frame relative to the previous video frame is increased, at the moment, a decision is made to adjust the coding rate of the video encoder according to the image texture ratio between the first video frame and the second video frame, the coding rate of the video encoder is adjusted according to the frame level decision, and finer granularity and finer adjustment of the coding rate of the video encoder are realized.
105. And adjusting the coding code rate of the video coder according to the image texture ratio to obtain a target coding code rate, and coding the first video frame according to the target coding code rate.
In the embodiment of the present application, after the image texture ratio between the first video frame and the second video frame is obtained, the previous encoding rate of the video encoder may be adjusted according to the image texture ratio. Because the encoding rate of the video encoder may be adjusted in real time while the previous video frames in the frame sequence to be encoded are encoded, the current encoding rate of the video encoder can be understood as the rate used when the second video frame was encoded. Adjusting this rate according to the image texture ratio makes the adjustment amplitude match the texture complexity difference ratio between the two adjacent video frames, so that the encoding rate is adjusted more finely and accurately at the frame level. The current first video frame is then encoded at the newly adjusted rate, which maximizes bandwidth savings while ensuring the picture quality of the encoded first video frame, and the scheme is reliable.
The target coding rate can be a coding rate which is redetermined after adjustment of the video encoder, and the target coding rate can be obtained by adjusting the previous coding rate according to the image texture ratio and is used for coding the first video frame so as to cope with the coding of the first video frame with rich image content and complex information and ensure the image quality of the first video frame after the coding.
It should be noted that, because network bandwidth resources are limited, they need to be used reasonably when the encoded video frame resources are subsequently transmitted, so as to avoid video playback stuttering while ensuring the picture quality of the video frames, and to save network bandwidth as much as possible. Based on this, in the rate adjustment/rate control process of the video encoder, an upper limit value of the encoding rate can be set to prevent the encoding rate of the video encoder from exceeding the upper limit value, so that network bandwidth resources are used reasonably and video frame picture quality is ensured while playback stuttering is avoided.
In some embodiments, the coding rate of the video encoder is adjusted according to the image texture ratio, and the adjusted coding rate is compared with the limiting coding rate of the video encoder, so that the final target coding rate of the video encoder is determined. For example, the step 105 of adjusting the encoding rate of the video encoder according to the image texture ratio to obtain the target encoding rate may include obtaining an encoding rate upper limit value for the video encoder, weighting the encoding rate according to the image texture ratio to obtain a candidate encoding rate, adjusting the encoding rate of the video encoder according to the candidate encoding rate when the encoding rate upper limit value is greater than the candidate encoding rate to obtain the target encoding rate, and adjusting the encoding rate of the video encoder according to the encoding rate upper limit value when the encoding rate upper limit value is less than the candidate encoding rate to obtain the target encoding rate.
The upper limit value of the coding rate can be a coding rate limit value of a video encoder, and can be understood as the maximum coding rate of the video encoder, and the upper limit value of the coding rate can be set according to the available bandwidth resources or the expected target bandwidth resources, so that the maximum coding rate cannot exceed the upper limit value of the coding rate in the process of coding rate adjustment (namely code control), thereby maximally ensuring the image quality of video frames, simultaneously enabling the stream data transmission of the coded video frames to be smooth under the limited bandwidth resources, avoiding the phenomenon of video playing blocking, and having reliability.
Specifically, when the coding rate of the video encoder is determined to be adjusted, the current coding rate of the video encoder is obtained, for example, the coding rate used when the second video frame is encoded is obtained, the upper limit value of the coding rate aiming at the video encoder is obtained, further, the coding rate is weighted according to the image texture ratio, the coding rate of the video encoder is adjusted according to the texture complex difference ratio of the front video frame and the rear video frame as an adjustment amplitude, and the coding rate of the video encoder is adjusted to be higher, so that the coding rate of the video encoder is adjusted according to the frame level, and the candidate coding rate is obtained. Further, the upper limit value of the coding rate of the video encoder is used as a rate adjustment limiting threshold value, if the adjusted candidate coding rate is greater than or equal to the upper limit value of the coding rate of the video encoder, the upper limit value of the coding rate of the video encoder is used as a target coding rate, otherwise, if the adjusted candidate coding rate is less than the upper limit value of the coding rate of the video encoder, the coding rate of the video encoder is adjusted according to the candidate coding rate, namely, the candidate coding rate is used as the target coding rate of the video encoder.
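The weighting-and-clamping step can be sketched as below. It assumes "weighting the encoding rate according to the image texture ratio" means multiplying the current rate by the ratio; the patent does not fix the exact weighting formula, so this formula and the function name are assumptions:

```python
def adjust_encoding_rate(current_rate, texture_ratio, rate_cap):
    """Weight the current rate by the image texture ratio to obtain a
    candidate rate, then clamp the candidate to the encoder's upper limit."""
    candidate = current_rate * texture_ratio
    # If the candidate reaches the upper limit, the upper limit wins.
    return candidate if candidate < rate_cap else rate_cap
```

For example, a 1000 kbps rate with a texture ratio of 1.5 yields 1500 kbps under a 2000 kbps cap, while a ratio of 3.0 is clamped to the 2000 kbps cap.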
In some embodiments, the video encoding process encodes the video frames in the sequence of frames to be encoded sequentially, and other video frames following the first video frame in the sequence of frames to be encoded are also encoded according to the encoding process of the first video frame. For example, after the first video frame is encoded according to the target encoding rate in step 105, the method may further include obtaining a third image texture value of a third video frame located after and adjacent to the first video frame in the sequence of frames to be encoded, determining a target image texture ratio between the third video frame and the first video frame when the third image texture value is detected to be greater than the first image texture value, adjusting a target encoding rate of the video encoder according to the target image texture ratio, obtaining an adjusted target encoding rate, and encoding the third video frame according to the target encoding rate when the third image texture value is detected to be less than or equal to the first image texture value.
The third video frame may be a video frame in the sequence of frames to be encoded, where the timing sequence is located after the first video frame, that is, a video frame subsequent to the first video frame. And the third image texture value may represent the image texture complexity of the third video frame.
Specifically, after the encoding of the first video frame in the frame sequence to be encoded is completed, the encoding of the next frame (i.e., the third video frame) of the first video frame is prepared next. A third image texture value for the third video frame may be determined and compared to the first image texture value prior to encoding the third video frame.
On the one hand, if the third image texture value is greater than the first image texture value, the third video frame has more complex texture than the first video frame and richer frame content, and it is decided to adjust the encoding rate of the video encoder. At this time, the ratio between the third image texture value and the first image texture value is calculated as the target image texture ratio between the third video frame and the first video frame, and the target encoding rate of the video encoder (i.e., the rate at which the first video frame was encoded) is adjusted according to the target image texture ratio to obtain a target candidate encoding rate. The target candidate encoding rate is then compared with the upper limit value of the encoding rate: if the target candidate encoding rate is greater than the upper limit value, the upper limit value is determined as the current encoding rate of the video encoder and the target candidate encoding rate is not adopted; if the target candidate encoding rate is less than the upper limit value, the target candidate encoding rate is taken as the adjusted target encoding rate of the video encoder. Further, the third video frame is encoded according to the adjusted target encoding rate.
On the other hand, if the third image texture value is smaller than or equal to the first image texture value, the third video frame is encoded along with the previous target encoding code rate of the video encoder, so that the video picture quality of the third video frame after encoding is ensured, and the reliability is realized.
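Putting the per-frame steps together, the sequential process can be sketched as the loop below. It is a simplified illustration that omits the similarity gate and the texture-difference threshold described earlier; all names (`encode_sequence`, `texture_of`, `encode_frame`) are illustrative, not the patent's:

```python
def encode_sequence(frames, texture_of, base_rate, rate_cap, encode_frame):
    """Frame-level rate-control loop: the rate is raised (up to the cap)
    only when a frame's texture value exceeds the previous frame's, and is
    otherwise carried over unchanged."""
    rate = base_rate
    prev_texture = None
    for frame in frames:
        texture = texture_of(frame)
        if prev_texture is not None and texture > prev_texture:
            ratio = texture / prev_texture      # image texture ratio
            rate = min(rate * ratio, rate_cap)  # clamp to the upper limit
        encode_frame(frame, rate)               # encode at the current rate
        prev_texture = texture
```

With rising texture values, each frame after the first is encoded at a proportionally higher rate until the cap is reached, after which the rate stays at the cap.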
By the method, the coding rate of the video encoder can be adjusted according to the image texture ratio between the front video frame and the rear video frame, so that the adjustment amplitude of the coding rate of the video encoder is matched with the texture complex difference ratio of the front video frame and the rear video frame, the coding rate of the video encoder is adjusted more finely and accurately according to the frame level, the current first video frame is encoded according to the newly adjusted coding rate, and the maximum bandwidth resource saving is realized while the picture quality after the first video frame is encoded is ensured, so that the video encoder has reliability.
As can be seen from the above overall description of the embodiments of the present application, the embodiments of the present application can obtain the current encoding rate of the video encoder, obtain the frame sequence to be encoded of the target video source, and start encoding, by the video encoder, the video frames in the frame sequence to be encoded according to the encoding rate; determine a first image texture value of a first video frame currently prepared to be encoded in the frame sequence to be encoded, and determine a second image texture value of a second video frame adjacent to and preceding the first video frame, where the first video frame is any video frame following the initial video frame of the sequence; when the first image texture value is greater than the second image texture value, determine an image texture ratio according to the first image texture value and the second image texture value; and adjust the encoding rate of the video encoder according to the image texture ratio to obtain a target encoding rate, and encode the first video frame according to the target encoding rate.
Based on this method, the current encoding rate of the video encoder can be obtained in real time, and the video frames in the current frame sequence to be encoded of the target video source are encoded according to that rate. During encoding, the texture complexity of the first video frame currently ready to be encoded is compared in real time with that of the preceding adjacent second video frame; the texture complexity represents the richness of the frame's picture content and is used to evaluate the required encoding rate. If the current first video frame has more complex texture than the previous frame, the image texture ratio between the first video frame and the second video frame is determined according to the first image texture value and the second image texture value, and the encoding rate of the video encoder is adjusted to the target encoding rate according to the image texture ratio, so that the video encoder encodes the first video frame at the adjusted target rate. Compared with adjusting the encoding rate according to statistics gathered over fixed time windows, this frame-level comparison of texture complexity between adjacent video frames allows the encoding rate to be adjusted at a finer granularity, so that the picture quality of each encoded video frame is ensured while bandwidth resources are saved.
The methods described in connection with the above embodiments are described in further detail below by way of example.
Fig. 4 is a flowchart illustrating another step of the video encoding method according to an embodiment of the present application. For ease of understanding, embodiments of the present application are described in conjunction with FIG. 4.
In the embodiments of the present application, description will be made from the viewpoint of a video encoding apparatus, which may be integrated in a computer device such as a terminal or a server in particular. For example, when the processor on the computer device executes a program corresponding to the video encoding method, the specific flow of the video encoding method is as follows:
201. and acquiring the current video source code rate of the target video source, and synchronously updating the coding code rate of the video encoder according to the video source code rate.
In the embodiment of the application, the encoding rate of the video encoder is adjustable. To keep the encoding rate reasonable, it can be adjusted according to the video source code rate of the video source, so that video frames of higher complexity have better picture quality after encoding. By reducing the acquisition granularity of the video source code rate, the video source code rate of the target video source can be acquired and the encoding rate of the video encoder synchronously adjusted, which enables finer adjustment of the encoding rate; this rate serves as the base rate for video encoding, and the scheme is reliable.
Specifically, the video source code rate of the target video source can be determined according to the interval between two adjacent key video frames of the target video source, so as to adjust the encoding rate of the video encoder. It should be noted that the video frame stream data of the target video source may be organized and transmitted to the server in the form of a video frame sequence, and the video frame sequence may include different types of video frames, such as key video frames (Intra-coded Frame, I-frame), forward predicted frames (Predicted Frame, P-frame), and bi-directionally predicted frames (Bi-directional Predicted Frame, B-frame). The video source rate between two key frames may be statistically determined based on the interval (Group of Pictures, GOP) between the two key frames.
Specifically, the frame rate of the target video source is determined, the interval, such as the interval length or the interval duration, between two adjacent key video frames in the transmission sequence of the target video source is detected in real time, and then the video source code rate is calculated according to the interval between the two adjacent key video frames and the frame rate, such as the video source code rate is determined according to the ratio between the frame rate and the interval. Further, the coding rate in the video encoder is synchronously updated according to the video source code rate, so that the current coding rate can be adapted to the frame sequence to be coded between the current two key frames in the video frame sequence, and can be understood as a sub-sequence or a fragment sequence. Based on the method, the real-time video frame stream data of the target video source are organized in a sequence form, and the video source code rate in the real-time stage is determined by adopting the intervals among key frames and is used as a basic code rate when a video encoder encodes, so that the requirement of the encoding code rate of each video frame sequence is more met, and the flexibility and the reliability are realized.
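As a hedged illustration, one plausible reading of this GOP-based statistic is the number of bits accumulated between two adjacent key frames divided by the time the GOP spans at the given frame rate. The patent only states that the source rate is derived from the key-frame interval and the frame rate, so the exact formula and the function name below are assumptions:

```python
def source_bitrate_from_gop(gop_bytes, gop_frames, frame_rate):
    """Estimate the source bitrate over one GOP: bytes accumulated between
    two adjacent key frames, divided by the GOP's duration in seconds."""
    gop_duration = gop_frames / frame_rate   # seconds spanned by the GOP
    return gop_bytes * 8 / gop_duration      # bits per second
```

For example, a 50-frame GOP at 25 fps spans 2 seconds, so 125000 bytes of payload corresponds to 500 kbps; this estimate would then serve as the encoder's base rate for the next frame sequence.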
202. And acquiring a frame sequence to be encoded of the target video source, and starting to encode the first video frame in the frame sequence to be encoded according to the encoding code rate by a video encoder.
In the embodiment of the application, after the real-time coding code rate of the video encoder is obtained, the frame sequence to be coded corresponding to the target video source can be obtained, so that the video frames in the frame sequence to be coded can be coded according to the current coding code rate by the video encoder.
When the server performs video encoding, the server may receive and acquire video frame stream data of the target video source and encode the video frame stream data acquired in real time. If the video frame stream data is transmitted in sequence form, the server may sequentially receive each frame sequence to be encoded of the target video source and sequentially encode the video frames in each frame sequence to be encoded. The encoding of the video stream data of the target video source is therefore performed in real time: when the server acquires a frame sequence to be encoded, the server has already finished encoding the video frames in the historical frame sequences acquired at earlier times, and the video encoder in the server has been updated and adjusted to the corresponding encoding code rate. The video encoder then uses the current encoding code rate as the basic code rate, that is, it begins encoding the first video frame in the frame sequence to be encoded according to the current real-time encoding code rate.
203. A similarity between a first video frame currently ready for encoding and a second video frame located adjacent to the first video frame in a sequence of frames to be encoded is determined.
In the embodiment of the application, because network bandwidth resources are limited, if the difference between the first video frame and the second video frame in the frame sequence to be encoded is small enough, the video encoder may continue to encode the current video frame with the encoding code rate used for the previous video frame, and the image quality of the current video frame will be similar to that of the previous video frame. Therefore, if the difference between the first video frame and the second video frame is too small, the encoding rate of the video encoder is not adjusted even if the texture complexity of the first video frame is greater than that of the second video frame; instead, the encoding rate used when encoding the second video frame is reused to encode the first video frame. In this way, the frequency of adjusting the coding rate in the video coding process is reduced, so that the occupancy of bandwidth resources by the transmission of encoded video frame stream data is reduced as much as possible.
The similarity may refer to a color similarity between the first video frame and the second video frame.
Specifically, when determining the color similarity between the first video frame and the second video frame according to a histogram statistical algorithm, first, a first pixel value of each first pixel point in the first video frame may be obtained, and the statistical number of first pixel points corresponding to each first pixel value dimension in the first video frame may be counted. Taking each first pixel value dimension as the width of the histogram (i.e., the value on the abscissa) and the statistical number of first pixel points corresponding to each first pixel value dimension as the height of the corresponding column in the histogram, a first histogram for the first video frame may be generated to represent the color condition of the first video frame. Meanwhile, a second histogram corresponding to the second video frame is obtained; the generation process of the second histogram may refer to that of the first histogram, and if the second histogram has already been generated at a historical time, it can be obtained directly. Further, a first statistical number of pixels in each pixel value dimension is determined from the first histogram, a second statistical number of pixels in each pixel value dimension is determined from the second histogram, a difference in the number of pixels in each pixel value dimension between the first video frame and the second video frame is determined based on the first statistical number and the second statistical number in each pixel value dimension, and the color similarity between the first video frame and the second video frame is determined based on the difference in the number of pixels in each pixel value dimension.
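The per-pixel-value statistics described above can be sketched as follows. This assumes 8-bit grayscale frames represented as nested lists; the function name is illustrative only:

```python
def frame_histogram(frame):
    """Return a 256-bin histogram for an 8-bit frame: bin i holds the
    statistical number of pixel points whose pixel value is i
    (the count per "pixel value dimension")."""
    hist = [0] * 256
    for row in frame:
        for value in row:
            hist[value] += 1
    return hist

# A tiny 2x3 frame: two pixels of value 0, three of 128, one of 255.
frame = [[0, 0, 255], [128, 128, 128]]
hist = frame_histogram(frame)
```

The second frame's histogram is built the same way (or reused from an earlier step), and the per-bin differences between the two histograms drive the similarity computation.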
In this way, the degree of difference in color characteristics between the first video frame and the second video frame is evaluated through the color similarity, so that a subsequent decision is made whether to adjust the coding rate of the video encoder according to the texture difference between the first video frame and the second video frame.
204. When the similarity is less than a preset similarity threshold, determining a first image texture value of the first video frame and determining a second image texture value of the second video frame.
In the embodiment of the application, the similarity between the first video frame and the second video frame is compared with a preset similarity threshold, and the preset similarity threshold can be used for judging the similarity between the first video frame to be encoded and the second video frame which is encoded before, so as to judge the difference degree between the first video frame and the second video frame. The greater the similarity, the lesser the degree of difference of the first video frame relative to the second video frame, and vice versa. And carrying out difference judgment between the first video frame and the second video frame through the preset similarity threshold value so as to judge whether the image texture values of the first video frame and the second video frame need to be determined or not.
In one aspect, when the similarity between the first video frame and the second video frame is smaller than the preset similarity threshold, it indicates that the difference between the first video frame and the second video frame is excessively large. At this time, a first image texture value of the first video frame currently prepared for encoding in the frame sequence to be encoded is determined, and a second image texture value of the second video frame adjacent to the first video frame is determined, so that the encoding code rate of the video encoder can be adjusted when the first image texture value is greater than the second image texture value.
On the other hand, when the similarity is greater than or equal to the preset similarity threshold, the first video frame currently prepared for encoding in the frame sequence to be encoded is encoded by the video encoder according to the current encoding code rate (i.e., the encoding code rate when encoding the second video frame).
205. When the first image texture value is greater than the second image texture value, an image texture ratio is determined from the first image texture value and the second image texture value.
In the embodiment of the application, after the first image texture value of the first video frame and the second image texture value of the second video frame are obtained, the first image texture value is compared with the second image texture value to determine the texture complexity difference between the first video frame and the second video frame. If the image texture difference value is greater than a preset texture difference threshold, it indicates that the degree of difference between the first video frame and the second video frame is large, and the previous coding rate of the video encoder cannot meet the image quality requirement of the first video frame; at this time, the image texture ratio between the first image texture value and the second image texture value needs to be determined, so that the previous coding rate of the video encoder can subsequently be adjusted according to the image texture ratio. On the other hand, if the image texture difference value is less than or equal to the preset texture difference threshold, it indicates that the degree of difference between the first video frame and the second video frame is small; the previous coding rate of the video encoder need not be increased, and the first video frame may continue to be coded at the coding rate used for coding the second video frame. In this way, the adjustment frequency of the coding rate of the video encoder is reduced, and the occupation of transmission bandwidth resources can be reduced.
206. And adjusting the coding code rate of the video coder according to the image texture ratio to obtain a target coding code rate, and coding the first video frame according to the target coding code rate.
In the embodiment of the application, the previous coding rate of the video encoder can be adjusted according to the image texture ratio. Because the coding rate of the video encoder may be adjusted in real time during the encoding of earlier video frames in the frame sequence to be coded, the current coding rate of the video encoder can be understood as the coding rate used when the second video frame was encoded. Adjusting the previous coding rate of the video encoder according to the image texture ratio matches the adjustment amplitude of the coding rate to the texture complexity difference ratio of the two adjacent video frames, so that the coding rate of the video encoder is adjusted more finely and accurately at the frame level, and the current first video frame is encoded according to the newly adjusted coding rate, thereby maximizing bandwidth resource savings and reliability while ensuring the picture quality of the encoded first video frame.
Specifically, when it is determined that the coding rate of the video encoder is to be adjusted, the current coding rate of the video encoder is obtained (for example, the coding rate used when the second video frame was encoded), and the upper limit value of the coding rate of the video encoder is obtained. Further, the coding rate is weighted according to the image texture ratio, that is, the coding rate of the video encoder is adjusted upward with the texture complexity difference ratio of the two adjacent video frames as the adjustment amplitude, so that the coding rate of the video encoder is adjusted at the frame level, and a candidate coding rate is obtained. Further, the upper limit value of the coding rate of the video encoder is used as a rate adjustment limiting threshold: if the adjusted candidate coding rate is greater than or equal to the upper limit value of the coding rate of the video encoder, the upper limit value is used as the target coding rate; otherwise, if the adjusted candidate coding rate is less than the upper limit value, the coding rate of the video encoder is adjusted according to the candidate coding rate, that is, the candidate coding rate is used as the target coding rate of the video encoder.
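The weighting-and-clamping step above can be sketched in a few lines; the function and parameter names are illustrative, not from the patent:

```python
def adjust_rate(current_bitrate: float, texture_ratio: float,
                max_bitrate: float) -> float:
    """Scale the current coding rate by the image texture ratio (>= 1 when
    the new frame is more texture-complex) to get a candidate rate, then
    clamp at the configured upper limit of the encoder's coding rate."""
    candidate = current_bitrate * texture_ratio
    return min(candidate, max_bitrate)

# 2 Mbps current rate, new frame 1.5x more complex, 2.5 Mbps cap:
capped = adjust_rate(2_000_000, 1.5, 2_500_000)    # hits the upper limit
uncapped = adjust_rate(2_000_000, 1.1, 2_500_000)  # candidate rate is kept
```

The clamp is what keeps the frame-level up-adjustments from exceeding the target bandwidth, matching the rate-adjustment limiting threshold described above.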
The upper limit value of the coding rate can be a coding rate limit value of a video encoder, and can be understood as the maximum coding rate of the video encoder, and the upper limit value of the coding rate can be set according to the available bandwidth resources or the expected target bandwidth resources, so that the maximum coding rate cannot exceed the upper limit value of the coding rate in the process of coding rate adjustment (namely code control), thereby maximally ensuring the image quality of video frames, simultaneously enabling the stream data transmission of the coded video frames to be smooth under the limited bandwidth resources, avoiding the phenomenon of video playing blocking, and having reliability.
By the method, the coding rate of the video encoder can be adjusted according to the image texture ratio between the front video frame and the rear video frame, so that the adjustment amplitude of the coding rate of the video encoder is matched with the texture complex difference ratio of the front video frame and the rear video frame, the coding rate of the video encoder is adjusted more finely and accurately according to the frame level, the current first video frame is encoded according to the newly adjusted coding rate, and the maximum bandwidth resource saving is realized while the picture quality after the first video frame is encoded is ensured, so that the video encoder has reliability.
For the convenience of understanding the embodiments of the present application, the embodiments of the present application will be described with specific application scenario examples. Specifically, the application scenario example is described by executing steps 201-206 above.
It should be noted that the video encoding method is suitable for video live broadcast, real-time audio and video, and other scenes, such as live broadcasting, game pictures, multi-user team collaborative game play, online video watching, and the like. Video coding mainly compresses the video frames in a live broadcast to reduce the data volume of the video file per unit time, which can be understood as the data transmission quantity, so that the best balance between video picture quality and the duration of playback stalls experienced by the user is achieved on the premise of meeting a target bandwidth, and the video picture quality is improved. The video coding method is described below by way of a video coding scene example, which specifically comprises the following steps:
1. The video coding scene instance profile is as follows:
Video coding can be realized based on video coding code control (i.e., code rate control). Video coding code control adjusts the quantization parameters of the image layer and the coding unit layer during the coding process by analyzing the complexity of the video and the current number of remaining bits, so that the encoder achieves the best balance between video quality and the duration of playback stalls experienced by the user on the premise of meeting a target bandwidth (for example, a range of values close to the video source code rate). Therefore, in live broadcast or real-time audio and video scenes, the coding rate of the encoder can be adjusted according to the video source code rate of the video source so as to ensure the video picture quality.
The video coding scene example is based on frame-level code rate calculation to adjust the coding rate of the encoder, so that the encoder achieves the best balance between video quality and the duration of playback stalls experienced by the user on the premise of meeting a target bandwidth (for example, a range of values close to the video source code rate), and video image quality is ensured.
Fig. 5 is an exemplary diagram of a video encoding scene according to an embodiment of the present application. In order to solve the problem of optimally allocating video data transmission within limited bandwidth resources, as shown in fig. 5, the video coding scene is specifically as follows: a video source is input, and the video source is transmitted as a sequence of video frames (images), the sequence comprising a plurality of video frames. In the process of coding each video frame, the video frame can be pre-analyzed to determine its spatio-temporal complexity, so that the code rate of the encoder is controlled based on the spatio-temporal complexity of the video frame; specifically, the quantization parameters (Quantization Parameter, QP) of the image layer in the encoder are adjusted, and the quantization parameters of the coding units in the encoder are adjusted, so that the actual code rate of the encoder is compressed and buffered to encode the video frames according to the target code rate, and the video image quality is as good as possible under the target bandwidth.
Fig. 6 is a diagram illustrating code rate variation in a code control scenario of video coding according to an embodiment of the present application. Referring to fig. 6, in the video coding progress, no code control is performed between progress A and progress B: in the stage from A to B, the actual code rate fluctuates, sometimes high and sometimes low, and is unstable, and the actual code rate exceeds the upper limit of the target bandwidth. Therefore, in order to enable the coding code rate to meet the target bandwidth requirement, code control can be performed, so that the output code rate generally stays below the upper limit of the target bandwidth and tends to be stable, without large up-and-down fluctuations, which effectively ensures the quality of the coded video picture and provides reliability.
2. The specific implementation process of the video coding scene example is as follows:
The video encoding scene example is not limited to application in video live broadcast and video-on-demand scenes, and supports encoding of video media stream data in various data formats, for example, Real-Time Messaging Protocol (RTMP), Flash Video (FLV), HTTP Live Streaming (HLS), Dynamic Adaptive Streaming over HTTP (DASH), Web Real-Time Communications (WebRTC), Transport Stream (TS) for digital television broadcasting, MPEG-4 (MP4), and the like. Taking a live video scene as an example, the client of a host can transmit video stream data to a video coding server in real time so that the live video stream is coded by the video coding server, wherein the live video stream can be an image stream formed by organizing a plurality of continuous video frames in units of frame sequences, and each video frame in the frame sequence is coded by the video coding server.
The video coding process may include two parts of coding rate control adjustment and video frame coding, and the present video coding scene example will be described again with reference to the coding rate control adjustment process.
(1) Coding rate control adjustment
Fig. 7 is a diagram illustrating an example of a video encoding process according to an embodiment of the present application. In conjunction with fig. 7, the adjustment of the coding rate control in the video coding process is described as follows:
The video frames are organized in units of a sequence. The first video frame of a sequence is called an instantaneous decoder refresh (Instantaneous Decoder Refresh, IDR) video frame, and IDR frames are key frame (I-frame) images. A frame sequence to be encoded may contain a number of I-frame images, and the images following an I-frame may use the images between I-frames as motion references.
And (1.1) counting the video source code rate according to the interval duration between key frames.
In a live video or real-time audio/video scene, in order to meet the code rate requirement of the video coding process, the coding code rate of the video encoder can be preliminarily adjusted according to the code rate of the video source and used as the basic video coding rate of the current frame sequence to be coded, so that coding rate adjustment control can be performed on this basis. The video source code rate can be determined in real time at the granularity of the frame interval (group of pictures, GOP) between two key frames (I frames); the frame interval can represent one video frame group (group of frames to be coded). Specifically, the frame length between two key frames (I frames) can first be determined, and the frame rate of the playing client can be determined, so that the number of video frame groups per unit time, namely the video source code rate, is determined according to the ratio between the frame rate and the frame length. The video source code rate is then dynamically loaded and updated into the video coding kernel (i.e., the video encoder), so that the video encoder obtains the current basic coding rate.
(1.2) Detecting, at the frame level, whether a sudden change occurs between any two adjacent video pictures (video frames); the detection method may be color similarity estimation in the Hue-Saturation-Value (HSV) space, histogram statistics, hash similarity statistics, and the like. Taking histogram-based detection of whether any two adjacent video pictures change suddenly as an example, the method specifically comprises the following steps:
For two adjacent frames of video pictures, a histogram of each video picture is calculated and generated respectively. Specifically, the pixel value of each pixel point in the video frame is calculated, and the number of pixel points corresponding to each pixel value is counted over the 256 pixel value scalars from 0 to 255, thereby generating a histogram of the video frame with respect to color based on the number of pixel points corresponding to each pixel value.
Further, based on the histograms of the two video frames, the color similarity between the video frames is calculated, and the specific expression is as follows:

S_h = (1/n) × Σ_{i=1}^{n} (1 − |a_i − b_i| / max(a_i, b_i))

Wherein S_h represents the color similarity between two video frames: the closer the similarity is to 1, the more similar the two video frames are, and the closer it is to 0, the greater the difference between the two video frames. n represents the width of the histogram, a_i represents the value of one of the video frames at pixel value i of the histogram (i.e., the statistical number of pixel points), and b_i represents the value of the other video frame at pixel value i of the histogram.
In this way, the similarity between two video frames can be determined, and thus the difference value between the two video frames, the difference value specifically being given by the expression 1 − S_h.
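The per-bin histogram comparison described above, using the symbols a_i, b_i, and n defined in the text, can be sketched as follows. One convention is assumed here: a bin that is empty in both histograms contributes full similarity rather than a division by zero.

```python
def histogram_similarity(a, b):
    """Average, over the n histogram bins, of 1 - |a_i - b_i| / max(a_i, b_i).
    Returns a value near 1 for similar frames and near 0 for very
    different frames; the frame difference value is then 1 - S_h."""
    n = len(a)
    total = 0.0
    for ai, bi in zip(a, b):
        m = max(ai, bi)
        total += 1.0 if m == 0 else 1.0 - abs(ai - bi) / m
    return total / n

s = histogram_similarity([2, 3, 1], [2, 3, 1])  # identical histograms
diff = 1 - s                                    # difference value
```

With the 20% threshold of step (1.3), texture complexity would be computed whenever `diff` exceeds 0.2.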
(1.3) If the difference value between two adjacent video pictures exceeds 20%, the texture complexity of the two frames of video pictures is calculated. For example, analysis methods for picture texture complexity include the Euclidean distance, statistical histograms, the local binary pattern (Local Binary Patterns, LBP) detection algorithm, convolutional neural network (Convolutional Neural Networks, CNN) feature extraction and classification algorithms, and the like. Taking the LBP detection algorithm as an example, the local binary pattern is an operator for describing local features of an image, and LBP features have significant advantages such as gray-scale invariance and rotation invariance. The specific procedure is as follows:
Fig. 8 is a diagram illustrating an example of calculating an image texture value of a video frame according to an embodiment of the present application. Referring to fig. 8, the process of calculating an image texture value is described. The original LBP operator is defined on a 3*3 window: taking the central pixel of the window as a threshold, the gray values of the 8 adjacent pixels are compared with it; if a surrounding pixel value is greater than the central pixel value, the position of that pixel point is marked as 1, otherwise it is marked as 0. In this way, comparing the 8 points in the 3*3 neighborhood generates an 8-bit binary number (usually converted into a decimal number, namely the LBP code, 256 types in total), which is the LBP value of the central pixel point of the window. As shown in fig. 8, the binary number 01111100 is obtained, i.e., a decimal value of 124; this value reflects the texture information of the area, and the texture LBP value of the video frame is calculated by integrating the LBP values of the areas detected in each picture.
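The 3*3 LBP operator described above can be sketched as follows. The neighbor ordering (clockwise from top-left) is an assumption, as is the sample window; the text's example bit pattern 01111100 corresponds to decimal 124:

```python
def lbp_code(window):
    """window: 3x3 list of gray values. Compare each of the 8 neighbors
    with the central pixel (> center -> bit 1, otherwise 0) and pack the
    bits, clockwise from the top-left, into one 8-bit LBP code."""
    center = window[1][1]
    neighbors = [window[0][0], window[0][1], window[0][2],
                 window[1][2], window[2][2], window[2][1],
                 window[2][0], window[1][0]]
    code = 0
    for v in neighbors:
        code = (code << 1) | (1 if v > center else 0)
    return code

# A window whose comparison pattern is 01111100 (binary) = 124 (decimal):
window = [[3, 7, 8],
          [1, 5, 9],
          [2, 6, 7]]
code = lbp_code(window)
```

Sliding this operator over every 3*3 region and aggregating the codes gives the frame-level texture LBP value used in step (1.4).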
It should be noted that, when detecting the image texture value of a video frame, the smaller the operator window, the more windows the video frame can be divided into and the larger the amount of calculation; conversely, a larger window yields fewer divided windows and a relatively smaller amount of calculation. Therefore, the size of the detection area of the video frame can be automatically adjusted according to the resolution of the video frame and the available computing power, so as to reduce the amount of calculation when computing the image texture value.
When detecting the image texture value of a video frame, the larger the resolution of the video frame, the more windows of the same size it is divided into, and the larger the amount of calculation. Based on this, the resolution of the video picture can be reduced by downsampling: for example, a video picture with 4K resolution is downsampled to obtain a video picture with 2K resolution, the number of divided windows is reduced by a factor of 4, and the amount of calculation is reduced by a factor of 4. The texture value can thus be calculated from the 2K-resolution video picture, and this approximate value can be used in place of the value that would be calculated from the windows of the 4K-resolution video picture.
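The 4× reduction argument above can be checked with a small sketch; stride-2 sampling is an assumed simple downsampling scheme, and the window count uses non-overlapping 3*3 windows:

```python
def downsample_2x(frame):
    """Halve each dimension by keeping every second row and column."""
    return [row[::2] for row in frame[::2]]

def window_count(height, width, win=3):
    """Number of non-overlapping win x win detection windows in a frame."""
    return (height // win) * (width // win)

full = window_count(2160, 3840)  # 4K frame (3840x2160)
half = window_count(1080, 1920)  # after 2x downsampling (1920x1080)
# half is a quarter of full, matching the 4x workload reduction above.
```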
(1.4) The higher the texture complexity, the higher the code rate consumed by encoding. If the image texture value of the current video frame is greater than that of the previous video frame, the code rate of the video encoder is adjusted. For example, let the image texture value of the current video frame be LBP1 and the image texture value of the previous video frame be LBP0. If LBP1 > LBP0, the video coding rate of the video coding kernel is updated and increased in real time: if the coding rate value of the current video coding kernel is bitrate and the configurable maximum video rate of the video coding kernel is max_bitrate, the coding rate of the video coding kernel is adjusted to obtain the target coding rate, specifically expressed as bitrate = min(max_bitrate, bitrate × LBP1 / LBP0).
In the present video coding scenario, bitrate may be updated iteratively as multiple video frames in the same sequence are encoded, each adjustment being based on the bitrate coding rate of the previous frame.
(2) Video frame encoding. Specifically, for each video frame in the frame sequence to be encoded, each frame is encoded sequentially in a progressive manner, and in the encoding process, the encoding code rate of the video encoding core is adjusted according to the flows (1.1) to (1.4) in the above (1), so as to encode the video frame of the current frame according to the adjusted encoding code rate.
And (3) in the video coding scene of the whole video live broadcast or video on demand, the coding of the whole live broadcast or on-demand process is completed according to (1) coding rate control adjustment and (2) video frame coding. Whether the video picture is suddenly cut from a static or relatively low-bit-rate scene to a relatively high-bit-rate scene with a more complex picture is detected dynamically in real time; if so, the video coding bit rate is adjusted in real time, and only adjusted upward, not downward. This solves the statistical delay caused by scene switching in existing video bit rate statistics methods, which would otherwise lead to relatively poor picture quality because the code rate allocated to the video coding kernel is insufficient when the statistical video bit rate falls short.
By executing this video coding scene example, image quality code control is improved based on frame-level code rate calculation and optimization, so that the coding rate of the video encoder is as close as possible to the video source code rate without exceeding the set target bandwidth. After the video picture scene stabilizes, fixed-duration code rate statistics can ensure that the counted video code rate does not exceed that of the original source stream, preventing the code rate from running out of control due to real-time frame-level dynamic adjustment with relatively large upward increases, which would ultimately make the video code rate too high and affect the video watching quality experience of later audiences.
As can be seen from the foregoing, the embodiment of the present application can acquire the current coding rate of the video encoder in real time and then encode the video frames in the current frame sequence to be coded of the target video source according to that coding rate. Further, during encoding, the texture complexity of the first video frame currently prepared for coding is compared in real time with that of the second video frame adjacent to it; the texture complexity represents the richness of the picture content of a video frame and is used to evaluate the required coding rate. If the current first video frame has more complex texture than the previous frame, the image texture ratio between the first video frame and the second video frame is determined according to the first image texture value and the second image texture value. Finally, the coding rate of the video encoder is adjusted to the target coding rate according to the image texture ratio, so that the video encoder encodes the first video frame according to the adjusted target coding rate. Compared with prior-art schemes that encode video frames according to a video source code rate counted over a fixed duration, the target coding rate obtained by frame-level adjustment better matches the texture of the video frame to be coded, thereby ensuring the picture quality of the video frame after encoding.
The specific implementation of the above steps can be referred to the previous embodiments, and will not be repeated here.
In order to facilitate better implementation of the video coding method provided by the embodiment of the application, the embodiment of the application also provides a device based on the video coding method. Where the meaning of the terms is the same as in the video coding method described above, specific implementation details may be referred to in the description of the method embodiments.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application; the video encoding apparatus may be integrated into a computer device and may include a first obtaining unit 401, an encoding unit 402, a first determining unit 403, a second determining unit 404, and an adjusting unit 405.
A first obtaining unit 401, configured to obtain a current coding rate of the video encoder;
the encoding unit 402 is configured to obtain a frame sequence to be encoded of the target video source, and encode, by using the video encoder, a first video frame in the frame sequence to be encoded according to an encoding rate;
a first determining unit 403, configured to determine a first image texture value of a first video frame currently prepared for encoding in the frame sequence to be encoded, and determine a second image texture value of a second video frame located adjacent to the first video frame, where the first video frame is any video frame following the initial video frame in the sequence;
a second determining unit 404, configured to determine an image texture ratio according to the first image texture value and the second image texture value when the first image texture value is larger than the second image texture value;
and the adjusting unit 405 is configured to adjust the coding rate of the video encoder according to the image texture ratio, obtain a target coding rate, and encode the first video frame according to the target coding rate.
In some embodiments, the video encoding apparatus further includes a third determining unit configured to:
Determining a similarity between the first video frame and the second video frame;
the first determining unit 403 is further configured to:
And when the similarity is smaller than a preset similarity threshold value, determining a first image texture value of a first video frame currently prepared for encoding in the frame sequence to be encoded.
In some embodiments, the similarity includes a color similarity, and the third determining unit is further configured to:
Acquiring a first pixel value of each first pixel point in a first video frame currently prepared for encoding in a frame sequence to be encoded, and generating a first histogram for the first video frame according to the statistical number of different first pixel values;
acquiring a second pixel value of each second pixel point in the second video frame, and generating a second histogram for the second video frame according to the statistical number of different second pixel values;
A color similarity between the first video frame and the second video frame is determined based on a difference between the first histogram and the second histogram.
In some embodiments, the third determining unit is further configured to:
determining a first statistics of the first video frame in each pixel value dimension from the first histogram and a second statistics of the second video frame in each pixel value dimension from the second histogram;
Determining a pixel number difference value of the first video frame and the second video frame in each pixel value dimension according to the first statistics and the second statistics in each pixel value dimension;
color similarity between the first video frame and the second video frame is determined based on the difference in the number of pixels in each pixel value dimension.
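The histogram-based color-similarity check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the normalisation of the per-bin pixel-count differences into a [0, 1] similarity score are assumptions, since the text only specifies counting pixels per value and comparing the counts of the two frames in each pixel value dimension.

```python
def build_histogram(frame, num_levels=256):
    """Count how many pixels take each value (one bin per pixel value dimension)."""
    hist = [0] * num_levels
    for row in frame:
        for value in row:
            hist[value] += 1
    return hist

def color_similarity(frame_a, frame_b, num_levels=256):
    """Similarity in [0, 1]: 1.0 for identical histograms, 0.0 for disjoint ones.

    Assumes both frames have the same pixel count (adjacent frames of one video).
    """
    hist_a = build_histogram(frame_a, num_levels)
    hist_b = build_histogram(frame_b, num_levels)
    total = sum(hist_a)
    # Sum of per-bin pixel-count differences, normalised by the total pixel count.
    diff = sum(abs(a - b) for a, b in zip(hist_a, hist_b))
    return 1.0 - diff / (2 * total)
```

A similarity below the preset threshold would then trigger the texture-value computation of the first determining unit.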
In some embodiments, the second determining unit 404 is further configured to:
acquiring an image texture difference value of the first image texture value and the second image texture value;
And when the image texture difference value is larger than a preset texture difference threshold value, calculating an image texture ratio of the first image texture value and the second image texture value.
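The gated ratio computation above can be sketched as a short helper. Returning `None` when the texture difference does not exceed the threshold is an assumption used here to signal "keep the current coding rate"; the patent only states that the ratio is calculated when the difference is larger than the preset texture difference threshold.

```python
def image_texture_ratio(first_texture, second_texture, diff_threshold):
    """Ratio of the two texture values, gated by a texture-difference threshold."""
    if first_texture - second_texture > diff_threshold:
        return first_texture / second_texture
    return None  # difference too small: no bitrate adjustment is triggered
```

The gate keeps small frame-to-frame texture fluctuations from causing constant rate changes.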
In some embodiments, the first determining unit 403 is further configured to:
Acquiring a gray value of each first pixel point in a first video frame currently prepared for encoding in a frame sequence to be encoded;
and determining a first image texture value of the first video frame according to the gray value of each first pixel point.
In some embodiments, the first determining unit 403 is further configured to:
obtaining a target local texture window;
Dividing a first pixel point in a first video frame according to a target local texture window to obtain a plurality of divided target local areas;
determining a first local texture value corresponding to each target local area in the first video frame based on the gray value of the first pixel point contained in each target local area;
a first image texture value for a first video frame is determined based on the first local texture value for each target local region.
In some embodiments, the first determining unit 403 is further configured to:
Determining a central first pixel point in each target local area and edge first pixel points around the central first pixel point;
comparing the gray value of the first pixel point at each edge with the gray value of the first pixel point at the center according to each target local area to obtain a plurality of comparison results of each target local area;
And respectively carrying out binarization processing on a plurality of comparison results of each target local area to obtain a first local texture value corresponding to each target local area.
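The per-region computation above (compare each edge pixel's gray value with the central pixel, then binarize the comparison results) is in the spirit of a local binary pattern. The sketch below assumes a 3x3 target local texture window and a clockwise bit ordering; neither is fixed by the text.

```python
def local_texture_value(region):
    """region: 3x3 list of gray values; returns the binarized comparison pattern."""
    center = region[1][1]
    # Edge first pixel points, taken clockwise from the top-left neighbour.
    edges = [region[0][0], region[0][1], region[0][2], region[1][2],
             region[2][2], region[2][1], region[2][0], region[1][0]]
    value = 0
    for bit, gray in enumerate(edges):
        if gray >= center:        # one comparison result per edge pixel ...
            value |= 1 << bit     # ... binarized into one bit of the pattern
    return value
```

A flat region yields an all-ones pattern, while a bright isolated centre yields zero, so the value grows with local gray-level structure.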
In some embodiments, the first determining unit 403 is further configured to:
acquiring a candidate local texture window, and determining a candidate window range value of the candidate local texture window;
acquiring a target available resource amount of image computing resources, and querying a window range list according to the target available resource amount to obtain a target window range value, where the window range list includes associations between different available resource amounts and window range values;
When the candidate window range value is smaller than the target window range value, the candidate local texture window is adjusted according to the target window range value, and a target local texture window is obtained;
And when the candidate window range value is greater than or equal to the target window range value, determining the candidate local texture window as a target local texture window.
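The resource-aware window selection above can be sketched as follows. The concrete window range list (available resource amount mapped to a window side length) is illustrative only; the patent states just that an association list is queried and that a candidate window smaller than the target range is widened to it.

```python
# Hypothetical association list: (minimum available resources, window range value).
WINDOW_RANGE_LIST = [(25, 3), (50, 5), (100, 7)]

def select_window(candidate_range, available_resources):
    """Pick the target local texture window range for the given resource budget."""
    target_range = 3
    for min_resources, window_range in WINDOW_RANGE_LIST:
        if available_resources >= min_resources:
            target_range = window_range
    # A too-small candidate window is adjusted up to the target range;
    # otherwise the candidate window itself is used.
    return target_range if candidate_range < target_range else candidate_range
```

More available image computing resources thus allow a larger (more descriptive, more expensive) texture window.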
In some embodiments, the video encoding apparatus further comprises a processing unit for:
Acquiring image resolution information corresponding to a first video frame currently prepared for encoding in a frame sequence to be encoded;
Based on the image resolution information, performing downsampling processing on the first video frame to obtain a downsampled first video frame;
The first determining unit is further configured to obtain a gray value of each first pixel point in the first video frame after the downsampling process.
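The resolution-driven downsampling above can be sketched minimally. Keeping every `factor`-th pixel in both directions is an assumption about the downsampling method; the text only says the first video frame is downsampled based on its image resolution information before the gray values are read.

```python
def downsample(frame, factor):
    """Keep every `factor`-th pixel in both directions of a 2D pixel grid."""
    return [row[::factor] for row in frame[::factor]]
```

For high-resolution frames this shrinks the pixel grid before the texture computation, reducing its cost.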
In some embodiments, the adjusting unit 405 is further configured to:
acquiring an upper limit value of a coding rate for a video encoder;
weighting the coding code rate according to the image texture ratio to obtain candidate coding code rate;
When the upper limit value of the coding rate is larger than the candidate coding rate, the coding rate of the video coder is adjusted according to the candidate coding rate, so that a target coding rate is obtained;
And when the upper limit value of the coding rate is smaller than the candidate coding rate, adjusting the coding rate of the video coder according to the upper limit value of the coding rate to obtain the target coding rate.
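The rate-adjustment rule above (weight the current coding rate by the image texture ratio, then clamp against the encoder's upper limit) can be sketched as follows. Treating the "weighting" as a plain multiplication is an assumption; the text does not fix the weighting function.

```python
def target_coding_rate(coding_rate, texture_ratio, rate_upper_limit):
    """Weight the coding rate by the texture ratio, capped at the encoder limit."""
    candidate_rate = coding_rate * texture_ratio  # weighted candidate coding rate
    # The cap ensures a sharp texture jump cannot push the rate past the limit.
    return candidate_rate if rate_upper_limit > candidate_rate else rate_upper_limit
```

A texture ratio of 1.5 on a 1000 kbps stream would request 1500 kbps, but never more than the configured upper limit.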
In some embodiments, the adjusting unit 405 is further configured to:
acquiring a third image texture value of a third video frame located after and adjacent to the first video frame in the frame sequence to be encoded;
When the third image texture value is detected to be larger than the first image texture value, determining a target image texture ratio between the third video frame and the first video frame;
adjusting the target coding rate of the video coder according to the target image texture ratio to obtain an adjusted target coding rate, and coding a third video frame according to the adjusted target coding rate;
And when the third image texture value is detected to be smaller than or equal to the first image texture value, encoding the third video frame according to the target encoding code rate.
In some embodiments, the obtaining unit 401 is further configured to:
acquiring a current video source code rate of a target video source;
and synchronously updating the coding rate of the video coder according to the video source code rate.
As can be seen from the foregoing, the embodiment of the present application can acquire the current coding rate of the video encoder in real time, and then encode the video frames in the current frame sequence to be encoded of the target video source according to the coding rate. Further, during encoding, the texture complexity of the first video frame currently prepared for encoding is compared in real time with that of the adjacent second video frame; the texture complexity represents the richness of the picture content of a video frame and is thus used to evaluate the required coding rate. If the texture of the current first video frame is more complex than that of the previous frame, the image texture ratio between the first video frame and the second video frame is determined according to the first image texture value and the second image texture value. Finally, the coding rate of the video encoder is adjusted to the target coding rate according to the image texture ratio, so that the video encoder encodes the first video frame at the adjusted target coding rate. Compared with prior-art schemes that adjust the coding rate of a video source according to statistics gathered over a fixed duration, this allows the coding rate to track the texture changes of the frame sequence to be encoded frame by frame, so that video frames with richer picture content are encoded at a higher rate and the quality of the encoded video frames is better guaranteed.
The specific implementation of each unit can be referred to the previous embodiments, and will not be repeated here.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application, showing a structural block of the portion of the terminal 140 implementing an embodiment of the present application. The terminal 140 includes a radio frequency (Radio Frequency, RF) circuit 510, a memory 515, an input unit 520, a display unit 540, a sensor 550, an audio circuit 560, a wireless fidelity (Wireless Fidelity, WiFi) module 570, a processor 580, and a power supply 590. It will be appreciated by those skilled in the art that the terminal 140 structure shown in fig. 10 does not constitute a limitation on a mobile phone or computer, and the terminal may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The RF circuit 510 may be used for receiving and transmitting signals during a message or a call, specifically, receiving downlink information from a base station, processing the downlink information by the processor 580, and transmitting uplink data to the base station.
The memory 515 may be used to store software programs and modules, and the processor 580 performs various functional applications of the terminal and video encoding by executing the software programs and modules stored in the memory 515.
The input unit 520 may be used to receive input numerical or character information and to generate key signal inputs related to the setting and function control of the terminal. Specifically, the input unit 520 may include a touch panel 531 and other input devices 532.
The display unit 540 may be used to display input information or provided information and various menus of the terminal. The display unit 540 may include a display panel 541.
Audio circuitry 560, speakers 561, and microphone 562 may provide an audio interface.
In this embodiment, the processor 580 included in the terminal 140 may perform the video encoding method of the previous embodiment.
The terminal 140 of the embodiment of the present application includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, an intelligent home appliance, a vehicle-mounted terminal, an aircraft, etc. The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent transportation, auxiliary driving and the like.
Fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application, showing a structural frame of the portion of the server 600 implementing an embodiment of the present application. The server 600 may vary considerably in configuration or performance and may include one or more central processing units (Central Processing Units, CPUs) 622 (e.g., one or more processors), memory 632, and one or more storage media 630 (e.g., one or more mass storage devices) that store applications 642 or data 644. The memory 632 and the storage medium 630 may be transitory or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations in the server 600. Still further, the central processor 622 may be configured to communicate with the storage medium 630 and execute the series of instruction operations in the storage medium 630 on the server 600.
The server 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The central processor 622 in the server 600 may be used to perform the video encoding method according to the embodiment of the present application, specifically as follows:
The method includes: obtaining a current coding rate of the video encoder; obtaining a frame sequence to be encoded of a target video source, and starting to encode, by the video encoder, the initial video frame in the frame sequence to be encoded according to the coding rate; determining a first image texture value of a first video frame currently prepared for encoding in the frame sequence to be encoded, and determining a second image texture value of a second video frame located immediately before the first video frame, wherein the first video frame is any video frame after the initial video frame; when the first image texture value is greater than the second image texture value, determining an image texture ratio according to the first image texture value and the second image texture value; and adjusting the coding rate of the video encoder according to the image texture ratio to obtain a target coding rate, and encoding the first video frame according to the target coding rate.
The embodiment of the application also provides a computer readable storage medium, which is used for storing program codes, and the program codes are used for executing the video coding method of each embodiment, and the method is specifically as follows:
The method includes: obtaining a current coding rate of the video encoder; obtaining a frame sequence to be encoded of a target video source, and starting to encode, by the video encoder, the initial video frame in the frame sequence to be encoded according to the coding rate; determining a first image texture value of a first video frame currently prepared for encoding in the frame sequence to be encoded, and determining a second image texture value of a second video frame located immediately before the first video frame, wherein the first video frame is any video frame after the initial video frame; when the first image texture value is greater than the second image texture value, determining an image texture ratio according to the first image texture value and the second image texture value; and adjusting the coding rate of the video encoder according to the image texture ratio to obtain a target coding rate, and encoding the first video frame according to the target coding rate.
Embodiments of the present application also provide a computer program product comprising a computer program. The processor of the computer device reads the computer program and executes it, causing the computer device to execute the video encoding method as described above.
Furthermore, the terms "comprises," "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "and/or" is used to describe an association relationship of an associated object, and indicates that three relationships may exist, for example, "a and/or B" may indicate that only a exists, only B exists, and three cases of a and B exist simultaneously, where a and B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one of a, b or c may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
It should be understood that in the description of the embodiments of the present application, plural (or multiple) means two or more, and that greater than, less than, exceeding, etc. are understood to not include the present number, and that greater than, less than, within, etc. are understood to include the present number.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.
It should also be appreciated that the various embodiments provided by the embodiments of the present application may be arbitrarily combined to achieve different technical effects.
In the present embodiment, the term "module" or "unit" refers to a computer program or a part of a computer program having a predetermined function and working together with other relevant parts to achieve a predetermined object, and may be implemented in whole or in part by using software, hardware (such as a processing circuit or a memory), or a combination thereof. Also, a processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that incorporates the functionality of the module or unit.
The embodiments of the present application have been described in detail, but the present application is not limited to the above embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present application, and the equivalent modifications or substitutions are included in the scope of the present application as defined in the appended claims.

Claims (17)

1.一种视频编码方法,其特征在于,包括:1. A video encoding method, characterized in that it includes: 获取视频编码器当前的编码码率;Get the current bitrate of the video encoder; 获取目标视频源的待编码帧序列,并通过所述视频编码器按照所述编码码率对所述待编码帧序列中的首个视频帧进行编码;The target video source is obtained as a sequence of frames to be encoded, and the first video frame in the sequence is encoded by the video encoder according to the encoding bitrate. 确定所述待编码帧序列中当前预备编码的第一视频帧的第一图像纹理值,以及确定位于所述第一视频帧之前相邻的第二视频帧的第二图像纹理值,所述第一视频帧为所述首个视频帧之后的任意视频帧;Determine the first image texture value of the first video frame currently to be encoded in the frame sequence to be encoded, and determine the second image texture value of the second video frame adjacent to the first video frame, wherein the first video frame is any video frame after the first video frame; 当所述第一图像纹理值大于所述第二图像纹理值时,根据所述第一图像纹理值和所述第二图像纹理值确定图像纹理比值;When the first image texture value is greater than the second image texture value, the image texture ratio is determined based on the first image texture value and the second image texture value. 根据所述图像纹理比值调整所述视频编码器的编码码率,得到目标编码码率,并按照所述目标编码码率对所述第一视频帧进行编码。The encoding bitrate of the video encoder is adjusted according to the image texture ratio to obtain the target encoding bitrate, and the first video frame is encoded according to the target encoding bitrate. 2.根据权利要求1所述的方法,其特征在于,所述确定所述待编码帧序列中当前预备编码的第一视频帧的第一图像纹理值之前,还包括:2. 
The method according to claim 1, characterized in that, before determining the first image texture value of the first video frame currently to be encoded in the frame sequence to be encoded, it further includes: 确定所述第一视频帧与所述第二视频帧之间的相似度;Determine the similarity between the first video frame and the second video frame; 则所述确定所述待编码帧序列中当前预备编码的第一视频帧的第一图像纹理值,包括:Determining the first image texture value of the first video frame currently to be encoded in the frame sequence includes: 当所述相似度小于预设相似度阈值时,确定所述待编码帧序列中当前预备编码的第一视频帧的第一图像纹理值。When the similarity is less than a preset similarity threshold, the first image texture value of the first video frame currently to be encoded in the frame sequence to be encoded is determined. 3.根据权利要求2所述的方法,其特征在于,所述相似度包括色彩相似度,所述确定所述第一视频帧与所述第二视频帧之间的相似度,包括:3. The method according to claim 2, wherein the similarity includes color similarity, and determining the similarity between the first video frame and the second video frame includes: 获取所述待编码帧序列中当前预备编码的第一视频帧中每个第一像素点的第一像素值,并根据不同的所述第一像素值的统计数量,生成针对所述第一视频帧的第一直方图;Obtain the first pixel value of each first pixel in the first video frame currently to be encoded in the frame sequence to be encoded, and generate a first histogram for the first video frame based on the statistical number of different first pixel values. 获取所述第二视频帧中每个第二像素点的第二像素值,并根据不同的所述第二像素值的统计数量,生成针对所述第二视频帧的第二直方图;Obtain the second pixel value of each second pixel point in the second video frame, and generate a second histogram for the second video frame based on the statistical count of different second pixel values; 基于所述第一直方图和所述第二直方图之间的差异,确定所述第一视频帧与所述第二视频帧之间的色彩相似度。The color similarity between the first video frame and the second video frame is determined based on the difference between the first histogram and the second histogram. 4.根据权利要求3所述的方法,其特征在于,所述基于所述第一直方图和所述第二直方图之间的差异,确定所述第一视频帧与所述第二视频帧之间的所述色彩相似度,包括:4. 
The method according to claim 3, wherein determining the color similarity between the first video frame and the second video frame based on the difference between the first histogram and the second histogram includes: 根据所述第一直方图确定所述第一视频帧在每个像素值维度上的第一统计数,以及根据所述第二直方图确定所述第二视频帧在每个像素值维度上的第二统计数;The first statistic of the first video frame in each pixel value dimension is determined based on the first histogram, and the second statistic of the second video frame in each pixel value dimension is determined based on the second histogram; 根据每个像素值维度上的第一统计数和第二统计数,确定所述第一视频帧与所述第二视频帧在每个像素值维度上的像素数量差值;Based on the first and second statistics in each pixel value dimension, determine the difference in the number of pixels between the first video frame and the second video frame in each pixel value dimension; 基于每个像素值维度上的像素数量差值,确定所述第一视频帧与所述第二视频帧之间的所述色彩相似度。The color similarity between the first video frame and the second video frame is determined based on the difference in the number of pixels in each pixel value dimension. 5.根据权利要求1所述的方法,其特征在于,所述根据所述第一图像纹理值和所述第二图像纹理值确定图像纹理比值,包括:5. The method according to claim 1, wherein determining the image texture ratio based on the first image texture value and the second image texture value comprises: 获取第一图像纹理值与所述第二图像纹理值的图像纹理差值;Obtain the image texture difference between the first image texture value and the second image texture value; 当所述图像纹理差值大于预设纹理差阈值时,计算所述第一图像纹理值和所述第二图像纹理值的图像纹理比值。When the image texture difference is greater than a preset texture difference threshold, the image texture ratio of the first image texture value and the second image texture value is calculated. 6.根据权利要求1或5所述的方法,其特征在于,所述确定所述待编码帧序列中当前预备编码的第一视频帧的第一图像纹理值,包括:6. 
The method according to claim 1 or 5, characterized in that determining the first image texture value of the first video frame currently to be encoded in the frame sequence to be encoded includes: 获取所述待编码帧序列中当前预备编码的第一视频帧中每个第一像素点的灰度值;Obtain the grayscale value of each first pixel in the first video frame currently to be encoded in the frame sequence to be encoded; 根据每个所述第一像素点的灰度值确定所述第一视频帧的第一图像纹理值。The first image texture value of the first video frame is determined based on the grayscale value of each of the first pixels. 7.根据权利要求6所述的方法,其特征在于,所述根据每个所述第一像素点的灰度值确定所述第一视频帧的第一图像纹理值,包括:7. The method according to claim 6, wherein determining the first image texture value of the first video frame based on the grayscale value of each first pixel includes: 获取目标局部纹理窗口;Get the target local texture window; 按照所述目标局部纹理窗口,对所述第一视频帧中的第一像素点进行划分,得到划分后的多个目标局部区域;According to the target local texture window, the first pixel in the first video frame is divided to obtain multiple target local regions after division. 基于每个目标局部区域内包含的第一像素点的灰度值,确定所述第一视频帧中每个目标局部区域对应的第一局部纹理值;Based on the grayscale value of the first pixel contained in each target local region, determine the first local texture value corresponding to each target local region in the first video frame; 基于每个目标局部区域对应的第一局部纹理值,确定所述第一视频帧的第一图像纹理值。The first image texture value of the first video frame is determined based on the first local texture value corresponding to each target local region. 8.根据权利要求7所述的方法,其特征在于,所述基于每个目标局部区域内包含的第一像素点的灰度值,确定所述第一视频帧中每个目标局部区域对应的第一局部纹理值,包括:8. 
The method according to claim 7, wherein determining the first local texture value corresponding to each target local region in the first video frame based on the grayscale value of the first pixel points contained in each target local region includes: 确定每个目标局部区域内的中心第一像素点以及所述中心第一像素点周围的边缘第一像素点;Determine the central first pixel point and the edge first pixels point surrounding the central first pixel point within each target local region; 针对每个目标局部区域,将每个所述边缘第一像素点的灰度值与所述中心第一像素点的灰度值进行大小对比,得到每个目标局部区域的多个对比结果;For each target local region, the gray value of each edge first pixel is compared with the gray value of the center first pixel to obtain multiple comparison results for each target local region; 将每个目标局部区域的多个对比结果分别进行二值化处理,得到每个目标局部区域对应的第一局部纹理值。The multiple comparison results of each target local region are binarized to obtain the first local texture value corresponding to each target local region. 9.根据权利要求8所述的方法,其特征在于,所述获取目标局部纹理窗口,包括:9. The method according to claim 8, wherein acquiring the target local texture window comprises: 获取候选局部纹理窗口,并确定所述候选局部纹理窗口的候选窗口范围值;Obtain candidate local texture windows and determine the candidate window range value of the candidate local texture windows; 获取图像算力资源的目标可用资源量,并根据所述目标可用资源量查询窗口范围列表,得到目标窗口范围值,所述窗口范围列表包括不同的可用资源量与窗口范围值之间的关联关系;Obtain the target available resource amount of image computing power resources, and query the window range list based on the target available resource amount to obtain the target window range value. The window range list includes the correlation between different available resource amounts and window range values. 
When the candidate window range value is smaller than the target window range value, adjusting the candidate local texture window according to the target window range value to obtain a target local texture window; and when the candidate window range value is greater than or equal to the target window range value, determining the candidate local texture window as the target local texture window.

10. The method according to any one of claims 6 to 9, characterized in that, before the obtaining a grayscale value of each first pixel in the first video frame currently to be encoded in the frame sequence to be encoded, the method further comprises:
obtaining image resolution information corresponding to the first video frame currently to be encoded in the frame sequence to be encoded; and
downsampling the first video frame based on the image resolution information to obtain a downsampled first video frame;
wherein the obtaining a grayscale value of each first pixel in the first video frame currently to be encoded in the frame sequence to be encoded comprises:
obtaining a grayscale value of each first pixel in the downsampled first video frame.

11. The method according to claim 1, characterized in that the adjusting the encoding bitrate of the video encoder according to the image texture ratio to obtain a target encoding bitrate comprises:
obtaining an encoding bitrate upper limit for the video encoder;
weighting the encoding bitrate according to the image texture ratio to obtain a candidate encoding bitrate;
when the encoding bitrate upper limit is greater than the candidate encoding bitrate, adjusting the encoding bitrate of the video encoder to the candidate encoding bitrate to obtain the target encoding bitrate; and
when the encoding bitrate upper limit is less than the candidate encoding bitrate, adjusting the encoding bitrate of the video encoder to the encoding bitrate upper limit to obtain the target encoding bitrate.

12. The method according to claim 1 or 11, characterized in that, after the encoding the first video frame according to the target encoding bitrate, the method further comprises:
obtaining a third image texture value of a third video frame that is located after and adjacent to the first video frame in the frame sequence to be encoded;
when it is detected that the third image texture value is greater than the first image texture value, determining a target image texture ratio between the third video frame and the first video frame, adjusting the target encoding bitrate of the video encoder according to the target image texture ratio to obtain an adjusted target encoding bitrate, and encoding the third video frame according to the adjusted target encoding bitrate; and
when it is detected that the third image texture value is less than or equal to the first image texture value, encoding the third video frame according to the target encoding bitrate.

13. The method according to claim 1, characterized in that the obtaining a current encoding bitrate of a video encoder comprises:
obtaining a current video source bitrate of the target video source; and
synchronously updating the encoding bitrate of the video encoder according to the video source bitrate.

14. A video encoding apparatus, characterized in that it comprises:
an acquisition unit, configured to obtain a current encoding bitrate of a video encoder;
an encoding unit, configured to obtain a frame sequence to be encoded of a target video source, and encode, by the video encoder, the initial video frame in the frame sequence to be encoded according to the encoding bitrate;
a first determining unit, configured to determine a first image texture value of a first video frame currently to be encoded in the frame sequence to be encoded, and determine a second image texture value of a second video frame adjacent to and preceding the first video frame, the first video frame being any video frame after the initial video frame;
a second determining unit, configured to determine, when the first image texture value is greater than the second image texture value, an image texture ratio based on the first image texture value and the second image texture value; and
an adjustment unit, configured to adjust the encoding bitrate of the video encoder according to the image texture ratio to obtain a target encoding bitrate, and encode the first video frame according to the target encoding bitrate.

15. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the video encoding method according to any one of claims 1 to 13.

16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the video encoding method according to any one of claims 1 to 13.

17. A computer program product, comprising a computer program or instructions, characterized in that the computer program or instructions, when executed by a processor, implement the video encoding method according to any one of claims 1 to 13.
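The claimed texture-driven rate control can be sketched as follows. This is a minimal illustration, not the patent's implementation: the texture metric here is a plain mean absolute gradient over a 2-D grayscale array, a hypothetical stand-in for the patent's grayscale/local-texture-window computation, and all function and parameter names (`texture_value`, `adjust_bitrate`, `bitrate_cap`) are illustrative only.

```python
def texture_value(gray):
    """Mean absolute horizontal + vertical gradient of a grayscale frame.

    A simple stand-in for the per-frame image texture value; `gray` is a
    2-D list of grayscale intensities. A flat frame yields 0; a busy
    frame yields a large value.
    """
    h, w = len(gray), len(gray[0])
    total = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(gray[y][x + 1] - gray[y][x])
            if y + 1 < h:
                total += abs(gray[y + 1][x] - gray[y][x])
    return total / (h * w)


def adjust_bitrate(bitrate, tex_curr, tex_prev, bitrate_cap):
    """Raise the encoder bitrate when frame texture increases.

    When the current frame's texture value exceeds the previous frame's,
    the current bitrate is weighted by the texture ratio and the result
    is clamped to the encoder's bitrate upper limit; otherwise the
    bitrate is left unchanged.
    """
    if tex_prev <= 0 or tex_curr <= tex_prev:
        return bitrate
    ratio = tex_curr / tex_prev          # image texture ratio
    candidate = bitrate * ratio          # weighted candidate bitrate
    return min(candidate, bitrate_cap)   # cap at the upper limit
```

For example, with a 1000 kbps bitrate, a texture ratio of 2.0, and a 1500 kbps cap, the candidate 2000 kbps is clamped to 1500; when texture does not increase, the bitrate stays at 1000.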
CN202410913458.XA 2024-07-09 2024-07-09 Video encoding methods, apparatus, devices, and readable storage media Pending CN121309819A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202410913458.XA CN121309819A (en) 2024-07-09 2024-07-09 Video encoding methods, apparatus, devices, and readable storage media
PCT/CN2025/095634 WO2026011968A1 (en) 2024-07-09 2025-05-19 Video encoding method and apparatus, device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410913458.XA CN121309819A (en) 2024-07-09 2024-07-09 Video encoding methods, apparatus, devices, and readable storage media

Publications (1)

Publication Number Publication Date
CN121309819A true CN121309819A (en) 2026-01-09

Family

ID=98288608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410913458.XA Pending CN121309819A (en) 2024-07-09 2024-07-09 Video encoding methods, apparatus, devices, and readable storage media

Country Status (2)

Country Link
CN (1) CN121309819A (en)
WO (1) WO2026011968A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9305603B2 (en) * 2010-07-07 2016-04-05 Adobe Systems Incorporated Method and apparatus for indexing a video stream
CN110166781B (en) * 2018-06-22 2022-09-13 腾讯科技(深圳)有限公司 Video coding method and device, readable medium and electronic equipment
CN111385576B (en) * 2018-12-28 2021-08-10 北京字节跳动网络技术有限公司 Video coding method and device, mobile terminal and storage medium
CN113347421B (en) * 2021-06-02 2023-07-14 黑芝麻智能科技(上海)有限公司 Video encoding and decoding method, device and computer equipment

Also Published As

Publication number Publication date
WO2026011968A1 (en) 2026-01-15


Legal Events

Date Code Title Description
PB01 Publication