HK1189741A - Video coding method and system - Google Patents
- Publication number
- HK1189741A (application HK14102813.5A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- video
- version
- stream
- frame rate
- encoded stream
Abstract
The present invention is directed to a video coding method and system. In one embodiment, a method comprises receiving, at a single encoding engine, an input video stream according to a first version of a video characteristic (such as frame rate, or profile and level of a coding standard), and generating, by the single encoding engine, in parallel, a plurality of streams comprising a first encoded stream according to the first version of the video characteristic and a second encoded stream according to a second version of the video characteristic, the second encoded stream being generated based on video coding information used to generate the first encoded stream.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority from U.S. patent application No. 13/545,261, filed July 10, 2012, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates generally to video encoding/transcoding.
Background
Advances in video technology have resulted in a variety of mechanisms by which consumers can receive and enjoy video (and audio) presentations. For example, the signal may be received over satellite or cable at an electronic device at a home or business and distributed as a high bit rate, High Definition (HD) stream for viewing in one room over a multimedia over coax alliance (MoCA) network, or distributed as a low bit rate stream for viewing over wireless on a portable device, or distributed as streaming content to another client device for off-site viewing over the internet. As technology improves, various methods of achieving these functions continue to evolve.
Disclosure of Invention
The present disclosure provides a method comprising: receiving, at a single encoding engine, an input video stream according to a first version of a video characteristic; and generating, by the single encoding engine, a plurality of streams in parallel, the plurality of streams including a first encoded stream according to the first version of the video characteristic and a second encoded stream according to a second version of the video characteristic, the second encoded stream being generated based on video encoding information used to generate the first encoded stream.
Preferably, the video characteristic comprises a frame rate, and wherein the frame rate values of the first and second versions are different.
Preferably, the video coding information includes a motion vector search result for inter prediction.
Preferably, the motion vector search result includes a motion vector, a partition of one coding unit, a motion vector search range, or any combination thereof.
Preferably, the second version comprises a lower frame rate than the first version, wherein the generating comprises: generating a second encoded stream on the lower frame rate video by sharing motion vector search results applied when generating the first encoded stream, wherein the generation of the second encoded stream occurs without performing additional motion search operations.
Preferably, the second version comprises a lower frame rate than the first version, wherein the generating comprises: generating a second encoded stream on the lower frame rate video by deriving its motion vectors from the motion vectors corresponding to the first encoded stream.
Preferably, the derivation is based on a repeating or non-repeating pattern of pictures corresponding to the first encoded stream.
Preferably, the method further comprises selecting the intra period of the repeating pattern of pictures so as to align the intra pictures of streams of different frame rates.
Preferably, the method further comprises replacing a reference picture corresponding to the first coded stream and not available for the second coded stream with another reference picture.
Preferably, the deriving further comprises scaling the motion vector corresponding to the first coded stream based on the difference between the temporal distance from the current picture to the replacement reference picture and the temporal distance from the current picture to the unavailable reference picture.
Preferably, the method further comprises selecting the intra period of the repeating pattern of pictures as a multiple of the temporal reduction factor.
Preferably, the first encoded stream comprises a combination of base layer and enhancement layer video streams, while the second encoded stream comprises only base layer or enhancement layer video streams, and the motion search results correspond to generation of the first encoded stream and are used to generate motion vectors for the base layer video stream.
Preferably, the video coding information includes mode decisions for inter and intra prediction.
Preferably, the partitions corresponding to inter-prediction of the second encoded stream are the same as the partitions corresponding to inter-prediction of the first encoded stream.
Preferably, when the same intra picture is shared between two streams, the intra prediction mode of each coding unit decided for the first encoded stream is reused for the second encoded stream.
Preferably, the video characteristics include a profile and a level, and wherein the first version and the second version differ in profile, level, or a combination of both.
Preferably, the video coding information includes a motion vector search result for inter prediction, the motion vector search result including a motion vector, a partition of one coding unit, a motion vector search range, or any combination thereof.
Preferably, the generation of the second coded stream is based on a first motion vector search range used to provide the first coded stream, or on a second motion vector search range that is a subset of the first motion vector search range, the first coded stream and the second coded stream being provided based on a single common search operation.
Preferably, the video coding information comprises a selection between inter and intra prediction, or an intra mode decision for intra prediction, for a given coding unit.
The present disclosure also provides a system comprising: a single encoding engine configured in hardware that receives an input video stream according to a first version of video characteristics and generates in parallel a plurality of encoded streams including different versions of video characteristics, the plurality of encoded streams being generated based on video encoding information used to generate one of the plurality of encoded streams.
Drawings
Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the figures, like reference numerals designate corresponding parts throughout the several views.
FIG. 1 is a block diagram of an example environment in which embodiments of a video encoding system may be employed.
Fig. 2A-2B are schematic diagrams illustrating the generation of video streams of different frame rates from an input video stream and the alternative selection of reference images that are not available in a reduced frame rate stream.
FIG. 3 is a block diagram illustrating one embodiment of an example motion search range for different video coding standards or different profiles of the same coding standard in one embodiment of an encoding engine.
Fig. 4A-4B are block diagrams illustrating particular embodiments of an example encoding engine.
Fig. 5 is a flow diagram illustrating one embodiment of an example video encoding method.
Detailed Description
Certain embodiments of video encoding systems and methods are disclosed herein that include a single encoding engine that shares video encoding information among multiple real-time parallel encoding operations, thereby providing multiple encoded streams. The video coding information includes motion vector search results (e.g., motion vectors, partitions of a coding unit or a macroblock, motion vector search ranges, etc.) and, in some embodiments, mode decisions such as the inter or intra prediction mode for a coding unit or macroblock, as well as the intra prediction direction if intra prediction is selected for the coding unit or macroblock. Those of ordinary skill in the art will understand that a coding unit refers to the basic coding unit in the emerging HEVC video compression standard, while a macroblock refers to the basic coding unit in the MPEG-2, AVC, VC-1, and VP8 video compression standards; the two terms may be used interchangeably herein. References herein to encoding include both encoding (e.g., based on the receipt of a non-compressed stream) and transcoding (e.g., based on the receipt of a compressed stream and re-compression, with or without full decompression).
In one embodiment of a video coding system addressing different frame rates (frame rate, also referred to herein as picture rate, being one example video characteristic), a single coding engine generates in real-time one or more compressed streams of the original high frame rate input video, as well as one or more streams at frame rates lower than the original input video, by sharing motion vector search results for inter prediction and/or sharing mode decisions for both inter and intra prediction when coding the lower frame rate video from the same input.
In one embodiment of a video coding system addressing multiple profiles and levels (profile and level being another example video characteristic), a single coding engine generates compressed streams of different profiles and levels in real-time from the same input by sharing motion vector search results for inter prediction, and/or sharing intra mode decisions for intra prediction, and/or sharing the selection between inter and intra prediction for coding units or macroblocks.
In conventional systems, multiple instances of the same encoding engine may be used to support real-time parallel encoding of the raw input video together with lower frame rate or different profile and level versions of that video, which increases silicon cost; alternatively, the same encoding engine may encode the raw input video and those other versions at a speed that is a multiple of the real-time video rate, which increases circuit clock rate and power consumption. Motion vector search is one of the functions that consumes the most processing resources and DRAM bandwidth, whether implemented in hardware or software. It is also one of the functions that can significantly degrade coding quality if the search range is insufficient. As performed by particular embodiments of the video encoding system, sharing motion search results between versions of the same video at different sizes, different frame rates, or different profiles may save silicon and DRAM costs. Furthermore, certain embodiments of the video encoding system generate multiple different encoded streams at different frame rates and/or profiles and/or levels without operating faster than the real-time video frame rate of the input video.
Having summarized features of particular embodiments of a video encoding system, reference will now be made in detail to the description of the disclosure, as illustrated in the accompanying drawings. While the present disclosure will be described in conjunction with these drawings, it is not intended to limit it to the embodiment or embodiments disclosed herein. Moreover, while the description identifies or describes details of one or more embodiments, such details are not necessarily a part of each embodiment, nor are all of the advantages described in connection with a single embodiment or all embodiments. On the contrary, the intent is to cover all alternatives, modifications and equivalents as defined by the appended claims, including within the spirit and scope of the disclosure. Furthermore, it should be understood that in the specification of the present disclosure, the claims are not necessarily limited to the specific embodiments described in the specification.
Referring to fig. 1, a block diagram of an example environment is shown in which embodiments of a video encoding system may be employed. It will be appreciated by those of ordinary skill in the art that other systems that may be used for encoding are contemplated in the context of the present disclosure; fig. 1 is for illustrative purposes only, and other variations are also contemplated within the scope of the present disclosure. The environment shown in fig. 1 includes a home entertainment system 100 that includes an electronic device 102, an encoding engine 104 embedded in the electronic device 102, and a plurality of multimedia devices including a smartphone 106, a notebook computer 108, and a television 110. In one embodiment, the electronic device 102 is configured as a home media gateway set-top box, where the same input video from cable or satellite (or terrestrial) is simultaneously encoded in real-time by a single encoding engine 104 to different bit rates, such as a high bit rate High Definition (HD) stream for viewing over MoCA on a television 110 in a bedroom, and a low bit rate stream for wireless use with portable devices (e.g., smartphone 106, mobile phones, PDAs, etc.) and/or for streaming to another client (e.g., notebook computer 108) for off-site viewing over the internet. In some embodiments, the electronic device 102 may be implemented as a server device, a router, a computer, or a television, among other electronic devices.
The low bit rate stream may have a lower-value video characteristic than the original input video provided to the electronic device 102 (e.g., a lower frame rate, such as a stream at half the frame rate of the original input, or a lower profile and level, such as levels 2.1 and 3.1 of the AVC/H.264 video coding standard). The following description starts with embodiments involving different frame rates as the video characteristic, followed by specific embodiments addressing other video characteristics. Multiple frame rate streams of the same video content are particularly useful in heterogeneous video consumption environments. For example, with multiple screens, a live sports game in ultra-high quality video such as 1080p60 can be viewed in the living room on the large screen 110, while the same game at 1080p30 is simultaneously viewed on a portable device (e.g., smartphone 106, iPad, etc.) in the kitchen or backyard over WiFi using a home wireless router; or, when a user has to drive away in the middle of the game while their family is still viewing it at home, the same game can be viewed at 1080p30 on a vehicle display screen over a 3G/4G wireless IP network. Seamless consumption of the same video content on multiple screens at different locations at the same time may require the real-time encoding engine 104 to generate multiple frame rate videos from the same input video at the same time.
The real-time multi-rate video coding engine 104 also has one or more applications in wireless video display, such as video over WiFi or video over WiGig, where the available bandwidth for video transmission can change very quickly because a moving object may block the transmission path between the transmitter and receiver.
Particular embodiments of the video encoding system may improve video quality of service if the transmitter, which typically includes the video encoding engine 104, generates both high and low bit rate streams. The low bit rate stream may be the same video at a lower frame rate than the original input stream, and therefore fits within the lower transmission rate when the available bandwidth decreases.
A real-time multi-rate video coding engine 104 (e.g., one that generates multiple compressed video streams from the same input video in real-time) may be attractive because video frame rate converters have become a popular feature of display devices, allowing a low frame rate stream to be converted to a high frame rate when displayed on the screen.
While the above emphasis has been placed on applications and/or features/benefits related to using different frame rates, the ability to produce streams of different profiles and levels also has numerous beneficial applications contemplated herein. For example, a set-top box application that streams the same video content to multiple screens simultaneously is an important feature. Different screens may require the set-top box to transmit the same video content not only at different bit rates, frame rates, and/or image sizes, but also at different profiles and levels. That is, different video display devices may have different decoding capabilities, so that each device may support only selected profiles and levels, even for the same encoding standard. For example, a portable device such as an iPad may support AVC profile/level 4.2, while a mobile phone may only support AVC profile/level 3.2. The real-time multi-rate video coding engine 104 may therefore need to generate multiple compressed video streams with different profiles and levels from the same input video in real-time.
In the following description, various embodiments related to a video encoding method providing a plurality of encoded streams different in exemplary video characteristics (e.g., frame rate and profile and level) are disclosed in conjunction with the descriptions in fig. 2A through 2B (frame rate) and fig. 3 (motion search range for different profiles and levels or encoding standards). It should be understood that within the specification of the present disclosure, a particular embodiment of a video coding system may employ one or any combination of these methods.
Reference will now be made to figs. 2A-2B, which illustrate some example video encoding methods involving the provision of multiple encoded streams at different frame rates. In one embodiment of the video encoding method, when a low frame rate video is encoded by sharing the motion search results of a stream encoded at the original (e.g., as received at a given electronic device) input video frame rate, the motion search is performed on the video at the original input frame rate. For example, the original input video may be 1920 × 1080 at 60 frames per second (1080p60), while the lower frame rate video may be 1920 × 1080 at 30 frames per second (1080p30). In this example, the motion search is performed when encoding the 1080p60 video, while the motion vectors for encoding the 1080p30 video are derived from those of the 1080p60 video without performing a separate motion search. How the motion vectors for the lower frame rate video are derived depends on the GOP (group of pictures) structure. In one embodiment, by selecting the intra period of a GOP to be an even number of pictures, the video coding method facilitates motion vector derivation: the same pictures in the video sequence can be selected as intra pictures (intra-coded or I pictures) in both the 1080p60 and 1080p30 streams, and the intra picture distance in the 1080p30 stream is half that of the 1080p60 stream.
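The intra-period constraint described above can be illustrated with a small sketch (the helper names are ours, not the patent's): the intra pictures of the reduced-rate stream land on the same source frames as those of the full-rate stream only when the GOP intra period is a multiple of the temporal reduction factor.

```python
def intra_pictures(num_frames, intra_period):
    """Frame indices coded as intra (I) pictures in the full-rate stream."""
    return set(range(0, num_frames, intra_period))

def retained_frames(num_frames, reduction_factor):
    """Frame indices kept when the frame rate is divided by reduction_factor."""
    return list(range(0, num_frames, reduction_factor))

def intra_aligned(num_frames, intra_period, reduction_factor):
    """True when every full-rate I picture survives in the reduced-rate
    stream, so both streams can share the same intra pictures."""
    kept = set(retained_frames(num_frames, reduction_factor))
    return intra_pictures(num_frames, intra_period) <= kept
```

For a 2:1 reduction (1080p60 to 1080p30), any even intra period keeps the I pictures aligned, while an odd one does not.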
As an example, attention is directed to diagram 200A in fig. 2A, which shows an uncompressed input video stream 201 provided to the input of the encoding engine 104, and two generated video streams 202 and 204 (each showing a sequence of GOPs) exhibiting different picture rates. It should be noted that, for brevity, the terms "picture" and "frame" are used interchangeably in this disclosure. The same idea applies to video in interlaced format, where a "field" is the basic unit of a picture. For purposes of illustration, it is assumed that the generated video stream 202 corresponds to a GOP of 1080p60 video, while the other generated video stream 204 corresponds to a GOP of 1080p30 video. It should be noted that additional video streams may be generated by the encoding engine 104, as indicated by the horizontal dashed line and the partial picture shown below the video stream 204. As shown in this example, the selected GOP in the 1080p60 video 202 is "I0 P1 P2 P3 P4 P5 P6 I1" (in display order), and the GOP selected by the video coding system in the 1080p30 video 204 is "I0 P1 P3 P5 I1" (again in display order). If a selected P picture in the 1080p30 video 204 uses a reference that is not one of the pictures in the 1080p30 video sequence (e.g., not present in the 1080p30 video 204), for example where P1 uses P0 as a reference in the 1080p60 sequence and P0 is not one of the pictures in the 1080p30 sequence (as represented by the dashed reference arrow from P1 to the absent P0), then P1 uses the previous available reference picture (e.g., the I0 picture, as represented by the solid reference arrow from P1 to I0) as its reference in the 1080p30 video 204. The motion vectors from P1 to P0 in the 1080p60 video 202 are scaled for the P1-to-I0 reference in the 1080p30 video 204 by the ratio of the temporal distance 208 between P1 and I0 to the temporal distance 206 between P1 and P0.
In this example, if the temporal distance 206 between P1 and P0 is one unit, the temporal distance 208 between P1 and I0 is two units, and the motion vector is scaled by a factor of 2. If a selected P picture in the 1080p60 video 202 uses a reference that is still one of the pictures in the 1080p30 video 204, for example where P1 uses I0 as a reference and I0 is still one of the pictures in the 1080p30 video 204, the same motion vectors from P1 to I0 can be used in the 1080p30 video sequence without scaling.
In yet another example, in the context of the 1080p60 and 1080p30 videos shown in diagram 200B of fig. 2B, assume that the selected GOP in the non-compressed input video 209 and the 1080p60 video 210 is "I0 B1 P0 B2 B3 P1 B4 B5 P2 B6 B7 I1" and the selected GOP in the 1080p30 video 212 is "I0 B1 B2 P1 B5 B6 I1". As illustrated by the horizontal dashed line and the partial picture located below the video stream 212, additional streams may be generated in some implementations. If a selected B picture in the 1080p30 video 212 uses a reference that is not one of the pictures in the stream 212, for example where B1 uses I0 (a reference shown by the solid arrow) and P0 as its two references for bi-prediction, and P0 is no longer in the 1080p30 sequence 212 (as indicated by the dashed reference arrow), the selected B picture uses its nearest neighboring I or P pictures as the references for B1 inter prediction (e.g., I0 and P1, each indicated by a solid reference arrow). Since P1 is not a reference for B1 in the 1080p60 video 210, the motion vectors from B1 to P1 can be derived by scaling the motion vectors from B1 to P0 by the ratio of the temporal distance 216 between B1 and P1 to the temporal distance 214 between B1 and P0. In this example, if the temporal distance 214 between B1 and P0 is one unit, the temporal distance 216 between B1 and P1 is four units, and the motion vectors between B1 and P1 are derived by scaling those between B1 and P0 by a factor of 4.
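In both of the examples above, the derivation reduces to multiplying the shared motion vector by the ratio of temporal distances. A minimal sketch (the function name is ours, not the patent's; quarter-pel rounding is an assumption based on AVC motion vector precision):

```python
def scale_motion_vector(mv, dist_new, dist_old):
    """Scale a shared motion vector (mvx, mvy) from a reference at temporal
    distance dist_old to a substitute reference at distance dist_new,
    rounding to quarter-pel precision as used by AVC motion vectors."""
    factor = dist_new / dist_old
    return tuple(round(c * factor * 4) / 4 for c in mv)

# P1 -> I0 in fig. 2A: the temporal distance doubles, so the vector doubles
assert scale_motion_vector((4.0, -2.0), 2, 1) == (8.0, -4.0)
# B1 -> P1 in fig. 2B: a distance ratio of four scales the vector by 4
assert scale_motion_vector((1.0, 0.5), 4, 1) == (4.0, 2.0)
```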
After the motion vectors are found for the 1080p30 video 212, motion compensation and other processing functions, such as transform, quantization, inverse transform, reconstruction, loop filtering, and entropy coding, may be performed for the 1080p30 video 212 independently of the 1080p60 video 210 to prevent any drift. It should be noted that, in the above example, the reconstructed picture P1 used in 1080p30 inter prediction is different from the one used in 1080p60 inter prediction.
The above-described example video encoding methods may be applied in some embodiments to encode video at frame rates reduced by different temporal reduction factors. When the temporal reduction factor is other than two, the intra period of the selected GOP may be chosen as a multiple of the temporal reduction factor. In some embodiments, the intra period of the selected GOP may not be a multiple of the temporal reduction factor, in which case the intra pictures of the different frame rate streams are not aligned. In this latter case, the video at the lower frame rate may have its own intra mode decision block and select its own intra pictures in place of inter predicted pictures in the original video GOP. The intra mode decision block typically does not access DRAM and consumes negligible silicon area and power. In some implementations, the scaled motion vectors in the lower frame rate video may also be refined by performing a refinement search within a small search range.
With respect to partitions for inter prediction purposes, in one embodiment the inter prediction partitions at the lower frame rate may retain those used at the higher frame rate. In some embodiments, when the low frame rate and high frame rate videos share the same intra picture, the intra prediction mode decision for each coding unit or macroblock may also be shared between the low frame rate and high frame rate videos. For example, in AVC, P or B pictures may also contain macroblocks coded in intra mode; the inter or intra decision for each macroblock can likewise be shared between the low and high frame rate videos.
In some embodiments, the motion search sharing scheme may be extended to a real-time scalable video encoder, where different temporal layers may be encoded in real-time by the same encoder (e.g., encoding engine 104). The motion search results of the enhancement layer may be used to generate motion vectors for the base layer, which has a lower frame rate.
In some embodiments, the motion search sharing scheme may be applied to a real-time 3D video encoder, where multiple views may be encoded in real-time by the same encoder. The motion search result of one view may be shared by neighboring views in a multi-view coding method for 3D video.
Having described particular implementations of a video encoding method with respect to the picture rate video characteristic, attention is now directed to fig. 3, which is associated with a discussion of particular video encoding methods with respect to different profiles, levels, and/or encoding standards. While different profiles and levels may have different maximum allowable frame rates, maximum allowable image sizes, and/or maximum allowable motion search ranges, in order to limit implementation cost for different applications while achieving interoperability between encoders and decoders of different manufacturers, there are some common toolsets that apply to all profiles and levels. For example, the intra prediction modes are the same for all AVC profiles and levels. Hardware and/or software implementations of the intra prediction mode decision may therefore be shared when generating AVC streams of different profiles and levels.
In another example, the vertical motion vector range differs across AVC profiles and levels, such as [-256, +255.75] for AVC levels 2.1 through 3, and [-512, +511.75] for AVC levels 3.1 and above, as shown by the corresponding motion vector ranges 304 and 302 in diagram 300 of FIG. 3. In one embodiment, the video encoding method may use a common vertical motion vector range for all target AVC profiles and levels generated by the encoder, such as motion vector range 304 (e.g., [-256, +255.75]) when generating AVC level 3 and level 4.2 streams. In some implementations, when the motion search covers the larger range of the higher profile/level stream, the video encoding method may also apply a subset of the motion search results to the lower profile/level stream. In this example, the motion search finds the best motion vector within the motion vector range 302 of AVC levels 3.1 and above (e.g., [-512, +511.75]), and finds the best motion vector within the smaller range 304 of AVC levels 2.1 through 3 during the same search operation. The motion search may be performed in either the high or low profile/level encoding path in the first method (e.g., common set), while it is performed in the high profile/level encoding path in the second method (e.g., subset).
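The subset approach above can be sketched as a single search pass that tracks two winners, one per range. The code below is our illustration only, using the quarter-pel vertical limits quoted in the text; it is not the patent's implementation, and `candidates` stands in for whatever cost/vector pairs the search hardware evaluates.

```python
# AVC quarter-pel vertical motion vector limits quoted in the text
RANGE_HIGH = (-512.0, 511.75)  # AVC levels 3.1 and above (range 302)
RANGE_LOW = (-256.0, 255.75)   # AVC levels 2.1 through 3 (range 304)

def best_mv_per_range(candidates):
    """candidates: iterable of (cost, (mvx, mvy)) pairs from one search pass.
    Returns the lowest-cost MV overall (valid for the high range) and the
    lowest-cost MV whose vertical component also fits the low range, so
    both streams are served by a single shared search operation."""
    best_high = best_low = None
    for cost, mv in candidates:
        if best_high is None or cost < best_high[0]:
            best_high = (cost, mv)
        if RANGE_LOW[0] <= mv[1] <= RANGE_LOW[1]:
            if best_low is None or cost < best_low[0]:
                best_low = (cost, mv)
    return best_high, best_low
```

A candidate far outside the low range can win for the high-level stream while a slightly costlier in-range candidate serves the low-level stream.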
In yet another AVC example, the minimum luma bi-prediction size is limited to 8 × 8 at levels 3.1 and above, while levels below 3.1 have no such limitation. When generating streams at both level 3.1 or above and level 3 or below (e.g., using a common set), the motion search may limit the minimum luma bi-prediction size to 8 × 8.
In the example of an AVC encoder, P or B pictures may also have macroblocks encoded in intra mode. In some embodiments of the video coding method, the inter or intra decision for each macroblock may also be shared between the low and high profiles/levels. For intra prediction, all AVC profiles and levels may share the same intra mode decision, with no profile- or level-specific restrictions.
Because the target maximum bit rate may differ between profiles and levels, other encoding parameters or tools may not be shared across the encoding paths of the different profiles and levels. For example, the quantization parameters may differ, and the resulting reconstructed pictures may then differ. In general, such other functions may not be shared.
In some embodiments, the above-described video coding methods involving profiles and levels, and the motion search and mode decision sharing schemes, may be applied to the coding of multiple streams of different profiles and levels of any video coding standard (including the emerging HEVC or H.265, etc.) in which multiple profiles and levels may be defined for different applications. In some embodiments, these methods and sharing schemes may also be extended to the encoding of multiple streams of different video standards where there is a common set of motion search parameters, such as motion vector ranges, that can be shared by both video encoding standards. In this case, the same implementation block may be used to search for a common coarse motion vector before refinement for the different coding standards according to each standard's partition constraints.
Attention is now directed to figs. 4A-4B, which illustrate an example video encoding system implemented as a single encoding engine 104. In one embodiment, the single encoding engine 104 may be implemented in hardware, although some embodiments may include software (including firmware) or a combination of software and hardware. For example, some embodiments may include a processor (e.g., a CPU) that provides instructions and/or data to one or more of the logic units shown in figs. 4A and 4B. The example single encoding engine 104 includes a first processing unit 402 and a second processing unit 404. It should be understood that, while two processing units 402 and 404 are shown, this number is merely illustrative and particular embodiments may include additional processing units. The processing units 402 and 404 generate respective bitstreams (e.g., "bitstream 1" and "bitstream 2") corresponding to different frame rates, image sizes, and/or profiles and levels. For purposes of illustration, the single encoding engine 104 is shown generating two bitstreams; however, some embodiments may be extended to generate any number of bitstreams. The number of bitstreams may depend, for example, on the applications executed on the electronic device housing the single encoding engine 104.
Video is received at a video input 406 (e.g., an interface). For example, the video received at the interface 406 may include the input video 201 shown in fig. 2A (or the input video 209 shown in fig. 2B). The interface 406 performs a replication function according to well-known methods, wherein the input video is split into multiple video streams (two (2) in this example) that mirror the frame rate (and/or profile/level) of the input video 201. The multiple streams (e.g., input frame rate video 202 as shown in fig. 2A or input frame rate video 210 as shown in fig. 2B) are output from the interface 406 and provided to the respective processing units 402 and 404. At the first processing unit 402, the input frame rate video 202 (210) is provided to encoding decision logic, which includes intra mode decision logic 408, where a prediction direction determination is made for a macroblock in intra prediction mode, and inter/intra decision logic 410, which selects between inter and intra prediction for a given macroblock. Also shown is motion estimation logic 412 (e.g., a motion search function), which determines the partitions of macroblocks and their motion vectors. The single encoding engine 104 further includes additional processing logic 438 and 440, which (referring to fig. 4B) may include respective motion compensation logic 414, 424 for inter prediction, where the partitions to be retrieved and their associated motion vectors are identified by the motion estimation (search) logic 412.
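The inter/intra decision of logic 410 can be illustrated with a minimal cost-based chooser (a hypothetical sketch; real encoders use rate-distortion optimized costs and many intra prediction directions rather than the single DC mode shown here):

```python
def sad(block_a, block_b):
    # Sum of absolute differences between two equally sized pixel blocks.
    return sum(abs(p - q) for ra, rb in zip(block_a, block_b)
               for p, q in zip(ra, rb))

def dc_intra_prediction(block):
    # Simplest possible intra predictor: every pixel predicted by the mean.
    pixels = [p for row in block for p in row]
    dc = round(sum(pixels) / len(pixels))
    return [[dc] * len(block[0]) for _ in block]

def inter_intra_decision(cur_block, inter_prediction, mv_rate_cost=0):
    # Choose the cheaper mode; mv_rate_cost stands in for the extra bits
    # needed to signal a motion vector in inter mode.
    intra_cost = sad(cur_block, dc_intra_prediction(cur_block))
    inter_cost = sad(cur_block, inter_prediction) + mv_rate_cost
    return "intra" if intra_cost <= inter_cost else "inter"
```

A flat block with a poor inter match comes out "intra"; a block well predicted from the reference comes out "inter".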
As shown in fig. 4A, another of the multiple input frame rate videos output by the interface 406 is provided to the second processing unit 404, and specifically to temporal scaling logic 436. The temporal scaling logic 436 performs frame rate conversion and outputs reduced frame rate video, such as the reduced frame rate video 204 (fig. 2A) or 212 (fig. 2B). In some embodiments, the temporal scaling logic 436 may be omitted, such as in some embodiments involving the generation of multiple profiles and/or levels. In some embodiments, spatial scaling may also be employed. The reduced frame rate video is provided to additional processing logic 440, which is described below in conjunction with fig. 4B. The video coding information includes motion vectors, motion vector search areas, mode decisions, and so forth, and, as explained above, is shared between the first and second processing units 402 and 404. In one embodiment, the motion vectors, motion vector search areas, and/or mode decisions determined for the first processing unit 402 are provided to derivation logic 434.
In embodiments where multiple picture rates are provided by the encoding engine 104, the derivation logic 434 derives motion vectors based on those used in the first processing unit 402 and based on received information (e.g., from the temporal scaling logic 436, or in some embodiments from other logic such as the interface 406, or from the CPU) about the GOP structure and the appropriate reference pictures and their temporal distances from the current picture (e.g., where a reference picture present in the stream processed by the first processing unit 402 is absent from the stream processed by the second processing unit 404, or where the temporal distance in the original video processed by 402 differs from the temporal distance in the lower frame rate video processed by 404). In some embodiments corresponding to different picture rate streams, the intra prediction mode decision provided by the intra mode decision logic 408 is also shared between the high picture rate and low picture rate video streams (e.g., between the first and second processing units 402 and 404, respectively) when they share the same intra picture. The derivation logic 434 and the temporal scaling logic 436 share information directly or indirectly (e.g., through CPU intervention), as represented by the dashed line between 436 and 434. For example, the temporal scaling logic 436 may pass information corresponding to frame rate and/or picture type to the derivation logic 434 (with or without processor intervention). As described above, the temporal scaling logic 436 performs temporal scaling to provide the reduced frame rate video (e.g., 204 or 212).
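The temporal-distance scaling performed by the derivation logic can be sketched as a simple linear derivation (an assumption for illustration; the actual derivation logic 434 may combine or refine vectors differently):

```python
def derive_mv(mv_full_rate, dist_full, dist_reduced):
    # Scale a motion vector found in the full frame rate stream to the
    # reduced frame rate stream, in proportion to the temporal distance
    # between the current picture and its reference picture in each stream.
    # E.g. if every other frame is dropped, references are twice as far
    # apart and the vector roughly doubles (assuming linear motion).
    scale = dist_reduced / dist_full
    return (round(mv_full_rate[0] * scale), round(mv_full_rate[1] * scale))
```

So a vector of (3, -2) over a one-frame distance becomes roughly (6, -4) over a two-frame distance in the half-rate stream.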
In embodiments with sharing of motion vector search results and mode decisions, such information is provided to the derivation logic 434 for the encoding of the reduced frame rate video stream 204 or 212.
In embodiments involving multiple profiles and levels, when encoding video from the same input at different profiles and levels, the first and second processing units 402 and 404 generate one or more compressed streams at different profiles and levels (with or without reduced frame rates) in real time by sharing motion vector search results for inter prediction, and/or sharing intra mode decisions for intra prediction, and/or sharing the selection between inter and intra prediction for macroblocks or coding units. The derivation logic 434 can determine whether to apply a common set of motion vector ranges to all target AVC profiles and levels, or to apply a subset of the motion search results to the lower profile stream when the motion search covers the larger range of the higher profile stream. Such video coding information is used to encode the video stream 204 or 212.
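One way to reuse a higher-profile search result for a lower profile/level stream is simply to clamp the vector into the lower level's permitted range, as in this hedged sketch (the numeric limits used below are placeholders, not values from any specific AVC level table):

```python
def clamp_mv_to_level(mv, max_vertical, max_horizontal):
    # Reuse a motion vector found with the wider higher-profile search range
    # by clamping it into the lower level's allowed range, avoiding a second
    # motion search for the lower profile/level stream.
    dy = max(-max_vertical, min(max_vertical, mv[0]))
    dx = max(-max_horizontal, min(max_horizontal, mv[1]))
    return (dy, dx)
```

This corresponds to the "subset of the motion search results" case: the shared vector is kept where it already lies inside the lower range and bounded where it does not.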
While various algorithms and/or methods have been described as being performed, at least in part, in the derivation logic 434 in conjunction with the temporal scaling logic 436, it should be understood that in some implementations one or more of the functions described above may be performed by other logic, or distributed among a plurality of different logic units.
In the encoding process, a current frame or picture in a group of pictures (GOP) is provided for encoding. The current picture may be processed in units of macroblocks, or of coding units in the emerging video coding standard HEVC, where a macroblock or coding unit corresponds to, for example, a 16 × 16 or 32 × 32 block of pixels in the original picture. Each macroblock may be encoded in an intra-coding mode, or in an inter-coding mode for a P picture or a B picture. In inter-coding mode, motion compensated prediction may be performed by the additional processing logic 438 and 440, such as the respective motion compensation logic 414 and 424 (fig. 4B) in each processing unit 402 and 404, and may be based on at least one previously encoded, reconstructed picture.
Referring to fig. 4B, which further illustrates the additional processing logic 438 and 440 of each processing unit 402, 404, the predicted macroblock P may be subtracted from the current macroblock by the residual calculation logic 416, 426 of each bitstream, thereby generating a difference macroblock, and the difference macroblock may be transformed and quantized by the respective transformer/quantizer logic 418, 428 of each bitstream. The output of each transformer/quantizer logic 418, 428 may be entropy encoded by respective entropy encoder logic 420, 430 and output as a compressed bitstream, the bitstreams corresponding to different bit rates.
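The difference/transform/quantize path can be sketched with a toy 2 × 2 Hadamard transform standing in for the integer transform of the transformer/quantizer logic (an illustrative simplification; real codecs use 4 × 4 or larger integer transforms and rate-distortion-tuned quantizers):

```python
def hadamard2(block):
    # 2x2 Hadamard transform standing in for a codec's integer transform.
    (a, b), (c, d) = block
    return [[a + b + c + d, a - b + c - d],
            [a + b - c - d, a - b - c + d]]

def quantize(coeffs, qstep):
    # Uniform scalar quantization; each bitstream may use its own step size,
    # which is one way the parallel streams end up at different bit rates.
    return [[int(v / qstep) for v in row] for row in coeffs]

def encode_block(cur, pred, qstep):
    # difference macroblock -> transform -> quantize
    residual = [[x - p for x, p in zip(rc, rp)]
                for rc, rp in zip(cur, pred)]
    return quantize(hadamard2(residual), qstep)
```

The quantized coefficient levels returned here are what the entropy encoder stage would then code into the bitstream.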
The encoded video bitstreams (e.g., "bitstream 1" and "bitstream 2") include the entropy encoded video content and any side information needed to decode the macroblocks. During the reconstruction operation for each bitstream, the results of the respective transformer/quantizer logic 418, 428 may be dequantized, inverse transformed, added to the prediction, and loop filtered by the respective inverse quantizer/transformer logic 418, 428 and reconstruction logic 422, 432 to generate reconstructed difference macroblocks for each bitstream.
In this regard, each bitstream is associated with a respective processing unit 402, 404 comprising residual calculation logic 416, 426, each of which is configured to generate a residual and, subsequently, quantized transform coefficients. It should be noted, however, that different quantization parameters may be applied. Each processing unit 402, 404 further includes reconstruction logic 422, 432 coupled to the inverse quantizer/transformer logic 418, 428, wherein each reconstruction logic 422, 432 is configured to generate respective reconstructed pixels. As illustrated, the reconstruction logic 422, 432 performs reconstruction of decoded pixels at different frame rates and different profiles and levels, depending on the respective quantization parameters applied. It should be noted that one or more functions related to the various logic described in association with figs. 4A and 4B may be combined into a single logic unit, further distributed among additional logic units, or omitted in some embodiments.
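The reconstruction path (dequantize, inverse transform, add prediction) can be sketched with a toy 2 × 2 Hadamard example; the tiny transform and integer step size are illustrative assumptions, not the patent's implementation:

```python
def dequantize(levels, qstep):
    # Inverse of uniform scalar quantization with step size qstep.
    return [[v * qstep for v in row] for row in levels]

def ihadamard2(coeffs):
    # Inverse 2x2 Hadamard (same butterfly as the forward pass, scaled by
    # 1/4), standing in for a codec's inverse integer transform.
    (A, B), (C, D) = coeffs
    return [[(A + B + C + D) // 4, (A - B + C - D) // 4],
            [(A + B - C - D) // 4, (A - B - C + D) // 4]]

def reconstruct(levels, pred, qstep):
    # dequantize -> inverse transform -> add prediction; a coarser qstep
    # yields a less faithful reconstruction, so the two processing units'
    # different quantization parameters produce different decoded pixels.
    res = ihadamard2(dequantize(levels, qstep))
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(pred, res)]
```

Running the same coefficient levels through with two different step sizes illustrates why each bitstream keeps its own reconstructed reference pictures.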
It should be noted that the various embodiments disclosed are applicable to various video standards including, but not limited to, MPEG-2, VC-1, VP8, and HEVC, the latter of which provides more coding tools that can be shared. For example, in HEVC, inter prediction unit sizes may range anywhere from 4 × 4 to 32 × 32, which requires processing a large amount of data to perform motion search and mode decision.
It should be appreciated, in the context of the present disclosure, that one embodiment of a video encoding method 500, shown in fig. 5 and implemented in one embodiment by a single encoding engine (e.g., 104), includes receiving at the single encoding engine an input video stream according to a first version of a video characteristic (502). For example, the input video stream may be an uncompressed stream (e.g., input video stream 201 or 209 of figs. 2A and 2B), while the video characteristic may be a frame rate (picture rate), profile/level, and/or encoding standard. The method 500 further includes generating, by the single encoding engine, a plurality of streams in parallel based on the input video stream (504). For example, the generated first and second streams may include video stream 202 of fig. 2A (or 210 of fig. 2B) and video stream 204 of fig. 2A (or 212 of fig. 2B). The method 500 further includes selecting a GOP and/or picture type (506). For example, the encoding engine 104 (e.g., the interface 406, the temporal scaling logic 436, or in some embodiments the CPU, in figs. 4A-4B) may determine the GOP and/or picture type.
The method 500 further includes performing a motion vector search in a first generated stream, such as the video stream 202 or 210 (508). For example, the first processing unit (e.g., the motion estimation logic 412 of figs. 4A-4B) may perform such a search on image blocks of the first generated stream. The method 500 further includes determining reference pictures for a second generated stream, such as the video stream 204 or 212 (510). In one embodiment, this latter function may be performed by the temporal scaling logic 436. Based on these determinations (506, 508, and 510), the method 500 scales the motion vectors from the first stream to the second generated stream according to the frame rate difference (512). The method 500 further includes generating, in parallel, a plurality of encoded streams (including the reduced frame rate stream) based on the video encoding information (514). For example, the plurality of streams may include a first encoded stream according to the first version of the video characteristic and a second encoded stream according to a second version of the video characteristic (e.g., a lower frame rate, a different profile and level, and/or a different standard). The second encoded stream is generated based on video encoding information used to generate the first encoded stream, wherein the video encoding information includes motion vector search results and/or mode information. In some embodiments, repeating or non-repeating picture patterns (e.g., open GOPs) may be used. Thus, the second version may comprise a lower frame rate than the first version, and the generating may comprise generating the second encoded stream at the lower frame rate by deriving motion vectors from the motion vectors used in generating the first encoded stream.
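The reduced-rate portion of this flow can be sketched as follows, assuming uniform frame dropping and roughly linear motion (the function names and the vector-composition rule are hypothetical illustrations, not the patented method):

```python
def temporal_scale(frames, factor):
    # Frame rate reduction by dropping frames: factor=2 keeps every other one.
    return frames[::factor]

def derive_reduced_mvs(mvs_full, factor):
    # mvs_full[k] is the motion vector of full-rate frame k+1 against frame
    # k. Summing the vectors across each dropped gap approximates the vector
    # of a kept frame against the previous kept frame, so no new motion
    # search is run for the reduced-rate stream.
    out = []
    for i in range(0, len(mvs_full) - factor + 1, factor):
        gap = mvs_full[i:i + factor]
        out.append((sum(v[0] for v in gap), sum(v[1] for v in gap)))
    return out
```

With half-rate dropping, two consecutive full-rate vectors combine into one reduced-rate vector spanning twice the temporal distance.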
With respect to profile and level differences, the method 500 bounds the motion vectors from the first stream according to the profile/level of the second generated stream (516), and the method 500 further includes generating the second encoded stream based on a first motion vector search range used for the first encoded stream, or on a second motion vector search range that is a subset of the first motion vector search range, the first and second encoded streams being provided based on a common search operation (518). It should be understood in the context of this disclosure that in some embodiments, one or more of the above-described logic functions may be omitted, or additional logic functions may be included. For example, the sharing of mode information is also contemplated within the scope of certain embodiments of the method 500.
The video encoding system may be implemented in hardware, software (e.g., including firmware), or a combination thereof. In one embodiment, the video encoding system is implemented using any one or a combination of the following technologies, all of which are well known in the art: discrete logic circuitry with logic gates implementing logical functions on data signals, an Application Specific Integrated Circuit (ASIC) with appropriate combinational logic gates, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), and so forth. In embodiments where all or part of the video encoding system is implemented in software, the software is stored in a memory and executed by a suitable instruction execution system (e.g., a computer system including one or more processors, a memory encoded with the encoding software/firmware, an operating system, etc.).
It should be appreciated by those skilled in the art that any flow illustrations or blocks in the flow charts should be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of the present disclosure, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations, merely set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
Claims (10)
1. A method, comprising:
receiving, at a single encoding engine, an input video stream according to a first version of a video characteristic; and
generating, by the single encoding engine, a plurality of streams in parallel, the plurality of streams including a first encoded stream according to the first version of the video characteristic and a second encoded stream according to a second version of the video characteristic, the second encoded stream being generated based on video encoding information used to generate the first encoded stream.
2. The method of claim 1, wherein the video characteristic comprises a frame rate, and wherein the first version and the second version differ in frame rate value.
3. The method of claim 2, wherein the video coding information comprises motion vector search results for inter prediction.
4. The method of claim 3, wherein the motion vector search result comprises a motion vector, a partition of one coding unit, a motion vector search range, or any combination thereof.
5. The method of claim 3, wherein the second version comprises a lower frame rate than the first version, wherein generating comprises: generating the second encoded stream on a lower frame rate video by sharing the motion vector search results applied when generating the first encoded stream, wherein the generation of the second encoded stream occurs without performing an additional motion search operation.
6. The method of claim 3, wherein the second version comprises a lower frame rate than the first version, wherein generating comprises: generating the second encoded stream on lower frame rate video by deriving motion vectors from motion vectors corresponding to the first encoded stream.
7. The method of claim 3, wherein the first encoded stream comprises a combination of base layer and enhancement layer video streams and the second encoded stream comprises only a base layer or enhancement layer video stream, the motion search result corresponding to generation of the first encoded stream and used to generate motion vectors for the base layer video stream.
8. The method of claim 2, wherein the video coding information comprises mode decisions for inter and intra prediction.
9. The method of claim 1, wherein the video characteristics include a profile and a rating, and wherein the first version and the second version differ in profile, rating, or a combination of both.
10. A system, comprising:
a single encoding engine configured in hardware that receives an input video stream according to a first version of a video characteristic and generates in parallel a plurality of encoded streams including different versions of the video characteristic, the plurality of encoded streams generated based on video encoding information used to generate one of the plurality of encoded streams.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/545,261 | 2012-07-10 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1189741A true HK1189741A (en) | 2014-06-13 |