HK1189740A - Video coding method and system - Google Patents
- Publication number
- HK1189740A
- Authority
- HK
- Hong Kong
Abstract
The disclosure is directed to a video coding method and system. In one embodiment, a method comprises receiving at a single encoding engine an input video stream having one or more pictures of a first size; and generating by the single encoding engine, in parallel, plural encoded streams, a first of the encoded streams comprising one or more pictures of the first size and a second of the encoded streams comprising one or more pictures of a second size that is smaller than the first size, the encoding of the second stream based on sharing video coding information used in encoding the first encoded stream.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority from U.S. patent application No. 13/545,242, filed on July 10, 2012, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates generally to video encoding/transcoding.
Background
Advances in video technology have resulted in a variety of mechanisms by which consumers can receive and enjoy video (and audio) presentations. For example, the signal may be received over satellite or cable at an electronic device at a home or business and distributed as a high bit rate, High Definition (HD) stream for viewing in one room over a multimedia over coax alliance (MoCA) network, or distributed as a low bit rate stream for viewing over wireless on a portable device, or distributed as streaming content to another client device for off-site viewing over the internet. As technology improves, various methods of achieving these functions continue to evolve.
Disclosure of Invention
The present disclosure provides a method comprising: receiving, at a single encoding engine, an input video stream having one or more images of a first size; and generating, by the single encoding engine, a plurality of encoded streams in parallel, a first encoded stream comprising one or more images of the first size and a second encoded stream comprising one or more images of a second size, the second size being smaller than the first size, the encoding of the second stream being based on sharing video coding information used in encoding the first encoded stream.
Preferably, the video coding information includes a motion vector search result for inter prediction.
Preferably, the motion vector search result includes a motion vector, a partition of one coding unit, a motion vector search range, or any combination thereof.
Preferably, a motion vector search is performed for the first encoded stream, and the resulting motion vectors and associated coding unit partitions are used to generate the second encoded stream without performing the motion vector search function for the second encoded stream.
Preferably, the vertical and horizontal dimensions of each of the plurality of blocks of the one or more images used in generating the first encoded stream are reduced by a defined factor to derive a plurality of reduced blocks of the second encoded stream.
Preferably, the method further comprises mapping motion vectors of each of the plurality of blocks of the input video stream to a plurality of reduced blocks corresponding to the second encoded stream, the mapped motion vectors corresponding to the second encoded stream being based on the motion vectors used in generating the first encoded stream adjusted by the defined scaling factor.
Preferably, the method further comprises merging mapped motion vectors of a non-compatible block size to form a compatible block size.
Preferably, the method further comprises determining a partition of each macroblock or coding unit containing compatible mapped blocks based on neighboring mapped motion vectors.
Preferably, the method further comprises averaging or applying a median operation to the mapped motion vectors when the macroblock boundaries of the reduced encoded stream are not aligned with the macroblock boundaries of the input video stream.
Preferably, the video coding information includes an intra prediction mode for intra prediction.
Preferably, the video coding information comprises a selection between inter-prediction and intra-prediction for the coding unit.
Preferably, the video coding information comprises motion vector search results, intra prediction modes for intra prediction, selection between inter prediction and intra prediction for the coding unit, or any combination thereof.
Preferably, the method further comprises generating one or more additional encoded streams in parallel with the first encoded stream and the second encoded stream, the generating based on the shared video coding information.
Preferably, the generation occurs in real time.
Preferably, the method further comprises processing some of the encoding functions of the first encoded stream and the second encoded stream independently of each other.
Preferably, the first encoded stream is generated according to a different video coding standard than the second encoded stream.
Preferably, the first encoded stream comprises an enhancement layer stream and the second encoded stream comprises a base layer stream.
Preferably, the first encoded stream corresponds to one three-dimensional view of the scene and the second encoded stream corresponds to a different three-dimensional view of the scene.
The present disclosure provides a system comprising a single encoding engine configured in hardware to receive an input video stream having one or more images of a first size and to generate a plurality of encoded streams in parallel, a first encoded stream comprising one or more images of the first size and a second encoded stream comprising one or more images of a second size, the second size being smaller than the first size, the encoding of the second stream being based on sharing video coding information used in encoding the first encoded stream.
Preferably, the video coding information comprises motion vector search results, intra prediction modes for intra prediction, selection between inter prediction and intra prediction for the coding unit, or any combination thereof.
Drawings
Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the figures, like reference numerals designate corresponding parts throughout the several views.
FIG. 1 is a block diagram of an example environment in which embodiments of a video encoding system may be employed.
Fig. 2 is a schematic diagram illustrating generation of video streams of different image sizes from an input video stream.
Fig. 3A to 3D are block diagrams illustrating merging of motion vectors.
Fig. 4A-4B are block diagrams illustrating particular embodiments of an example encoding engine.
Fig. 5 is a flow diagram illustrating one embodiment of an example video encoding method.
Detailed Description
Disclosed herein are particular embodiments of video encoding systems and methods, including a single encoding engine that shares video encoding information among multiple real-time parallel encoding operations, thereby providing multiple encoded streams of spatially scaled video of the same source. The video coding information includes motion vector search results (e.g., motion vectors, partitions of one coding unit or one macroblock, motion vector search resolution, etc.) and, in particular embodiments, mode decisions such as inter or intra prediction modes for coding units (the basic coding units in the emerging HEVC video compression standard) or macroblocks (the basic coding units in the MPEG2, AVC, VC-1, and VP8 video compression standards), and intra prediction directions if intra prediction modes are selected for the coding units or macroblocks (the two terms coding unit and macroblock are used interchangeably in this application). References to encoding herein include both encoding (e.g., based on the receipt of a non-compressed stream) and transcoding (e.g., based on the receipt of a compressed stream and a re-compression operation, with or without full decompression).
In one embodiment of a video coding system producing multiple spatially scaled streams of the same source, a single coding engine is used to generate in real time one or more compressed streams of the original input video and one or more reduced versions of the original input video. When different sizes of video are coded from the same input, they are coded by sharing motion vector search results for inter prediction, and/or intra prediction modes for intra prediction, and/or the selection between inter and intra prediction for coding units or macroblocks.
In conventional systems, multiple instances of the same encoding engine may be used to support parallel encoding of the original input video and the scaled version of the input video in real time, which increases silicon cost; alternatively, the original input video and the scaled version may be encoded by the same encoding engine at a speed that is a multiple of the real-time video rate, which increases circuit clock rate and power consumption. Motion vector search is one of the functions that consumes the most processing resources and DRAM bandwidth, whether implemented in hardware or software. It is also one of the functions that can significantly affect coding quality if the search range is not sufficient. As performed by particular embodiments of the video coding system, sharing motion search results between the same video at different sizes may save silicon and DRAM costs.
Having summarized features of particular embodiments of a video encoding system, reference will now be made in detail to the description of the disclosure as illustrated in the accompanying drawings. While the present disclosure will be described in conjunction with these drawings, it is not intended to be limited to the embodiment or embodiments disclosed herein. Moreover, while the description identifies or describes details of one or more embodiments, such details are not necessarily a part of each embodiment, nor are all of the advantages described in connection with a single embodiment or all embodiments. On the contrary, the intent is to cover all alternatives, modifications and equivalents included within the spirit and scope of the disclosure as defined by the appended claims. Furthermore, it should be understood that, in the context of the specification of the present disclosure, the claims are not necessarily limited to the specific embodiments described in the specification.
Referring to fig. 1, a block diagram of an example environment is shown in which embodiments of a video encoding system may be employed. It will be appreciated by those of ordinary skill in the art that other systems that may be used for encoding are contemplated in the context of the present disclosure, and thus fig. 1 is for illustrative purposes only, with other variations also contemplated within the scope of the present disclosure. The environment shown in fig. 1 includes a home entertainment system 100 that includes an electronic device 102, an encoding engine 104 embedded in the electronic device 102, and a plurality of multimedia devices including a smartphone 106, a notebook computer 108, and a television 110. In one embodiment, the electronic device 102 is configured as a home media gateway set-top box, where the same input video from cable or satellite (or terrestrial) is simultaneously encoded in real time by a single encoding engine 104 to different bit rates, such as a high bit rate High Definition (HD) stream for viewing over MoCA on a television 110 in a bedroom, and a low bit rate stream for wireless use with portable devices (e.g., smartphones 106, mobile phones, PDAs, etc.) and/or for streaming over the internet to another client (e.g., notebook computer 108) for viewing at an off-site location. In some embodiments, the electronic device 102 may be implemented as a server device, a router, a computer, or a television, among other electronic devices.
The low bit rate stream may have a lower-value video characteristic (e.g., a smaller image size or lower resolution, such as a stream of half the image width and height of the original input) than the original input video provided to the input of the electronic device 102. Multiple streams of the same video content with different image sizes are particularly useful in heterogeneous video consumption environments. For example, a larger image size or higher resolution picture corresponding to a live sports game, e.g., 1920 × 1080, may be viewed on the large screen 110 in the living room, while the same game at a lower image size or resolution, e.g., 960 × 540, may be viewed on a portable device (e.g., smartphone 106, iPad, etc.) in the kitchen or in the backyard over WiFi using a home wireless router. Alternatively, a 960 × 540 resolution stream may be viewed on a display screen in a vehicle over a 3G/4G wireless IP network when a user needs to leave during the game, while his or her family may still view the game at home. Seamless consumption of the same video content on multiple display screens at different locations at the same time may require the real-time encoding engine 104 to generate multiple image-size video streams from the same input video at the same time.
The real-time multi-rate video coding engine 104 also has one or more applications in wireless video display, such as video over WiFi or video over WiGig, where the available bandwidth for video transmission can change very quickly because a moving object may block the transmission path between the transmitter and receiver.
Particular embodiments of the video coding system may benefit the quality of video services if a transmitter, which typically includes, for example, the video coding engine 104, generates both a high bit rate stream and a low bit rate stream. The low bit rate stream may be the same video at a smaller image size than the original input stream, thus fitting within a lower transmission rate when the available bandwidth drops. Further, when spatial scaling is used, the low bit rate stream may be a downscaled (down-scaled) version of the original input stream, thereby achieving relatively high compression quality at a low bit rate when the receiver can scale it up to the original input size.
When spatial scaling is used, the real-time multi-rate video coding engine 104 can take advantage of emerging ultra-high definition video formats, such as the 4K × 2K format, for more applications where a greater variety of video sizes can coexist in a video home network environment.
Reference is now made to fig. 2, which shows an embodiment of a video coding method that shares motion vectors between two streams of different picture sizes. In a particular embodiment of a video coding system in which streams of different image sizes are coded from the same input video, the reduced video is coded by sharing motion search results with the original input video, applying operations such as scaling and median or weighted averaging to motion vectors of collocated macroblocks in the original input video. For example, assume that the size of the original input video is 1920 × 1080 and the reduced video size is 960 × 540. In this example, motion search is performed in the 1920 × 1080 video coding, and the resulting motion vectors and their associated macroblock partitions are used to generate the motion vectors and associated macroblock partitions of the reduced video without performing a motion search function in the reduced video. Assume in this example that both the horizontal and vertical dimensions are reduced by a factor of two (2). In this example, a 16 × 16 macroblock in the 1920 × 1080 video corresponds to an 8 × 8 block in the reduced 960 × 540 video, and an 8 × 8 macroblock partition in the 1920 × 1080 video corresponds to a 4 × 4 block in the reduced 960 × 540 video.
In fig. 2, an input video 201 and respective GOPs (group of pictures) of a plurality of streams generated (e.g., by the encoding engine 104) are shown, such as 1920 × 1080 video 202 and 960 × 540 video 204. The input video 201 may comprise an uncompressed (non-compressed) video stream. Each picture of the 1920 x 1080 video 202 includes a plurality of macroblocks (e.g., of size 16 x 16), as partially shown by an example 8 x 8 block (one-quarter of a macroblock) 206 with a motion vector mv (x, y) for picture P0 with reference to the corresponding block in picture I0. Note that the image rate of the input video 201 is the same as the image rate of the video 202. Further, although two generated streams are shown, additional streams may be generated in parallel with streams 202 and 204 in some embodiments. Also, each picture of the 960 × 540 video 204 includes a plurality of 4 × 4 sized blocks corresponding to each 8 × 8 block, such as the 4 × 4 blocks 208 with reference to the motion vectors (mvx/scale _ x, mvy/scale _ y) of the corresponding block in the picture I0, where scale _ x and scale _ y are the scaling factors in the horizontal and vertical directions, respectively, used to derive the motion vectors (e.g., two (2) in this example). Each block, such as block 206 in 1920 x 1080 video 202, may find a corresponding reduced block, such as block 208 in 960 x 540 video 204, which may also be referred to herein as a paired block in the original video and the reduced video, where the position (x, y) of the top left corner of the block in 1920 x 1080 video maps to the position (x/2, y/2) of the top left corner of the paired block in 960 x 540 video. 
Thus, the motion vectors (e.g., mvx, mvy) associated with each partitioned block in the 1920 × 1080 video 202 may be mapped (per mapping reference line 210) to the counterpart block (e.g., block 208) in the 960 × 540 video 204 by downscaling the motion vectors by the same factors (e.g., mvx/scale_x, mvy/scale_y, where scale_x and scale_y are both equal to two (2) in the above example) used to downscale the video image size in the horizontal and vertical directions, respectively.
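The position and motion vector mapping just described can be sketched in a few lines. This is an illustrative sketch only; the function names and the scale factor of two are assumptions for this example, not part of the disclosure:

```python
def map_block_position(x, y, scale_x=2, scale_y=2):
    """Map the top-left corner (x, y) of a block in the full-size
    picture to the top-left corner of its paired block in the
    reduced picture."""
    return x // scale_x, y // scale_y

def map_motion_vector(mvx, mvy, scale_x=2, scale_y=2):
    """Downscale a motion vector by the same factors used to reduce
    the image size in each direction."""
    return mvx / scale_x, mvy / scale_y

# A block at (64, 32) with motion vector (12, -6) in the 1920x1080
# stream pairs with a block at (32, 16) with motion vector (6.0, -3.0)
# in the 960x540 stream.
print(map_block_position(64, 32))  # (32, 16)
print(map_motion_vector(12, -6))   # (6.0, -3.0)
```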
In some embodiments, referring to figs. 3A through 3D (each figure showing an example of one 16 × 16 macroblock in the reduced video), where each square block element represents one luma sample as a simple illustration (with the understanding that different macroblock partitions are contemplated within the scope of the present disclosure), the size of the counterpart block in the reduced video may not be a legal or compatible partition size of the video coding standard in use, such as the illegal 2 × 2 block size under the AVC coding standard shown in diagram 300A of fig. 3A (only four 2 × 2 blocks of a macroblock are highlighted, to avoid obscuring the drawing). In that case, a motion vector merge operation may be performed to merge the four 2 × 2 mapped motion vectors (e.g., motion vectors 302A, 302B, 302C, 302D) to form a 4 × 4 block 304A (only one 4 × 4 block is highlighted, so as not to obscure the figure), which is a legal or compatible partition size of the AVC coding standard, as shown in diagram 300B. The merge operation may be a median or weighted average of the four motion vectors of the four 2 × 2 blocks, as reflected by the resulting motion vector 302E of block 304A.
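The merge operation described above can be illustrated as follows. This is a hedged sketch using the per-component median as the merge operation (a weighted average would work similarly); the names are illustrative:

```python
from statistics import median

def merge_motion_vectors(mvs):
    """Merge the motion vectors of four 2x2 mapped blocks into one
    vector for a 4x4 block, which is a legal AVC partition size.
    Uses the per-component median of the four vectors."""
    xs = [mv[0] for mv in mvs]
    ys = [mv[1] for mv in mvs]
    return median(xs), median(ys)

# Four 2x2 mapped motion vectors (cf. 302A-302D) merged into a single
# vector for the 4x4 block (cf. 302E):
print(merge_motion_vectors([(3, 1), (3, 1), (4, 1), (3, 2)]))  # (3.0, 1.0)
```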
In some implementations, after all motion vectors are mapped to legally sized blocks in the reduced 960 × 540 video 204 (fig. 2), the partitions of a 16 × 16 macroblock in the 960 × 540 video 204 may be determined by examining the motion vectors of neighboring blocks to see if larger partition sizes can be formed. For example, and referring to diagram 300C of fig. 3C, if all four 4 × 4 blocks 304A, 304B, 304C, and 304D in the 960 × 540 video 204 have the same motion vector, then an 8 × 8 sub-partition may be applied to the corresponding 8 × 8 block under the AVC coding standard, as shown by 8 × 8 block 306 in diagram 300D of fig. 3D. Likewise, if both the upper left and upper right 8 × 8 blocks use 8 × 8 sub-partitions and have the same motion vector, and the same applies to the lower left and lower right 8 × 8 blocks, then a 16 × 8 partition may be selected for the macroblock when the lower pair's motion vector differs from the upper pair's. In other words, particular embodiments of the video encoding method determine the maximum possible legal partition size as the selected partition size for the reduced video after merging the mapped motion vectors from the original video.
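One plausible reading of this merge-upwards rule can be sketched as follows. The function names, the string return values, and the exact order of the checks are illustrative assumptions, not the disclosure's specification:

```python
def merge_quads(quads):
    """Return the shared motion vector if all four sub-block motion
    vectors match, else None (keep the smaller sub-partitions)."""
    if all(mv == quads[0] for mv in quads):
        return quads[0]
    return None

def macroblock_partition(mb):
    """mb: four 8x8 quadrants [tl, tr, bl, br], each a list of four
    4x4 motion vectors. Returns the largest legal partition that the
    mapped motion vectors allow, merging upwards 4x4 -> 8x8 -> 16x8
    (or 8x16) -> 16x16."""
    merged = [merge_quads(q) for q in mb]
    if any(m is None for m in merged):
        return "8x8 with sub-partitions"
    tl, tr, bl, br = merged
    if tl == tr == bl == br:
        return "16x16"
    if tl == tr and bl == br:          # top pair and bottom pair agree
        return "16x8"
    if tl == bl and tr == br:          # left pair and right pair agree
        return "8x16"
    return "8x8"

same = [(2, 1)] * 4
mb = [same, same, [(5, 0)] * 4, [(5, 0)] * 4]
print(macroblock_partition(mb))  # 16x8
```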
When the motion vector search function is shared, the reduced video and the original video received at the electronic device typically share the same GOP structure. The downscaled video still uses its own reconstructed image, at the same temporal position as the reconstructed image of the original video, as its reference for motion compensation. This prevents any drift. In addition to motion vector search results, the downscaled video may share intra mode decisions and/or inter/intra mode decisions for each coding unit or macroblock. The downscaled video's own coding functions, such as transform, quantization, inverse transform, reconstruction, loop filtering, and entropy coding, are processed independently of the original video coding process.
In some embodiments of the video coding method, the above scheme may be applied to coding reduced video at other scale factors. When the reduction factor is not even, the macroblock boundaries of the reduced video may not be aligned with the macroblock boundaries of the original video. In this case, a 4 × 4 block in the reduced video may correspond to more than one 8 × 8 block in the original video, each such 8 × 8 block being paired with the 4 × 4 block when the original video is reduced. One way to find the motion vector of a 4 × 4 block in the reduced video is to use the mean or median of all mapped motion vectors of all paired blocks in the original video.
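The mean-or-median combination of the paired blocks' mapped motion vectors can be sketched as follows (illustrative names; it is assumed the mapped vectors have already been downscaled):

```python
from statistics import mean, median

def combine_paired_mvs(mapped_mvs, op="mean"):
    """Combine the mapped motion vectors of all original blocks that
    pair with one reduced block, using the per-component mean or
    median, as the text suggests for non-even scale factors."""
    xs = [mv[0] for mv in mapped_mvs]
    ys = [mv[1] for mv in mapped_mvs]
    f = mean if op == "mean" else median
    return f(xs), f(ys)

# A 4x4 block in the reduced video paired with two overlapping 8x8
# blocks in the original:
print(combine_paired_mvs([(2, 4), (4, 8)]))                       # (3, 6)
print(combine_paired_mvs([(1, 2), (3, 4), (9, 6)], op="median"))  # (3, 4)
```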
The motion search sharing scheme can be extended to the case where the reduced video is encoded in a different video coding standard than that used for the original video. One example constraint of this operation concerns reference pictures, whose legal temporal locations may differ between standards: the temporal locations of the reference images selected by the motion search function of the original video must conform to the coding standard of the reduced video. This requirement can be met because all video coding standards allow pictures that precede the current picture to be references for inter prediction. The minimum partition size, motion vector resolution, and motion vector range of the reduced video must also comply with its coding standard. This may be achieved by: taking the mean or median of the motion vectors of the paired blocks that make up the minimum partition size in the reduced video; rounding the motion vector resolution when the mapped motion vector from the original video has a higher resolution than the legal motion vector resolution of the reduced video; and limiting the motion vector when the mapped motion vector from the original video exceeds the legal motion vector range of the reduced video. For example, the original video may be AVC encoded, while the downscaled video is MPEG-2 encoded. MPEG-2 has a minimum partition size of 16 × 8 and a motion vector resolution of half a pixel, whereas AVC has a minimum partition size of 4 × 4 and a motion vector resolution of one quarter of a pixel.
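Rounding the motion vector resolution and limiting the range, as in the AVC-to-MPEG-2 example, can be sketched as follows. The range value is an illustrative assumption (not a figure from the disclosure), and a real transcoder would also merge partitions up to MPEG-2's 16 × 8 minimum:

```python
def conform_mv(mv_quarter_pel, max_range_half_pel=512):
    """Round an AVC quarter-pel motion vector component to MPEG-2
    half-pel units, then limit it to a legal range (the range here
    is an assumed placeholder)."""
    half_pel = round(mv_quarter_pel / 2)   # quarter-pel -> half-pel
    return max(-max_range_half_pel, min(max_range_half_pel, half_pel))

print(conform_mv(7))      # 4
print(conform_mv(2000))   # 512 (clamped to the assumed range)
```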
In some embodiments of the video encoding method, the motion search sharing scheme may be applied to a real-time scalable video encoder (e.g., the encoding engine 104), where different spatial layers may be encoded by the same encoder in real-time. The motion search results of the enhancement layer may be used to generate motion vectors of the base layer as a lower resolution image.
In some implementations, the motion search sharing scheme can be applied when the same encoder simultaneously encodes a spatially scaled pair of streams, for example a 1920 × 1080 two-dimensional video and a reduced-size three-dimensional video, such as 960 × 540. In this case, the motion search of the lower resolution three-dimensional video may use the downscaled motion vectors from the higher resolution two-dimensional video.
Attention is now directed to figs. 4A-4B, which illustrate an example video encoding system implemented as a single encoding engine 104. In one embodiment, the single encoding engine 104 may be implemented in hardware, although some embodiments may include software (including firmware) or a combination of software and hardware. For example, some embodiments may include a processor (e.g., a CPU) that provides instructions and/or data to one or more of the logic units shown in figs. 4A and 4B. The example single encoding engine 104 includes a first processing unit 402 and a second processing unit 404. It should be understood that, in the context of this disclosure, while two processing units 402 and 404 are shown, this number is merely illustrative and particular embodiments may include additional processing units. The processing units 402 and 404 generate respective bitstreams (e.g., "bitstream 1" and "bitstream 2") corresponding to different image sizes of the same input video. For purposes of illustration, the single encoding engine 104 is shown generating two bitstreams. However, some embodiments of the single encoding engine 104 may be extended to generate any number of bitstreams. The number of bitstreams may depend, for example, on the applications executed on the electronic device housing the single encoding engine 104.
Video is received at a video input 406 (e.g., an interface). For example, the video received at the interface 406 may include the input video 201 shown in fig. 2. The interface 406 performs a copy function according to well-known methods, wherein the input video is split into multiple video streams (two (2) in this example) that mirror the image size of the input video 201. The multiple streams (e.g., input image size video 202 as in fig. 2) are output from the interface 406 and provided to the respective processing units 402 and 404. At the first processing unit 402, the input image size video 202 is provided to encoding decision logic, which includes intra mode decision logic 408, where the prediction direction for a macroblock in intra prediction mode is determined, and inter/intra decision logic 410, which selects between inter and intra prediction for a given macroblock. Also shown is motion estimation (motion search) logic 412, which determines the partitions of macroblocks and their motion vectors. The single encoding engine 104 further includes additional processing logic 438, which (referring to fig. 4B) may include motion compensation logic 414, 424 for inter prediction, where the partitions to retrieve and their associated motion vectors are identified by the motion estimation (search) logic 412.
As shown in fig. 4A, another of the plurality of input image size videos output by the interface 406 is provided to the second processing unit 404, in particular to spatial scaler logic 436. The spatial scaler logic 436 performs spatial scaling to produce the reduced image size. The amount of spatial scaling may be determined, for example, by a processor (e.g., a CPU) or based on user input. In other words, the spatial scaler logic 436 reduces the original input video to a desired size or resolution. The reduced image size video is provided to additional processing logic 440, described below in conjunction with fig. 4B. In some embodiments, a combination of temporal scaling and spatial scaling may be performed. The output of the spatial scaler logic 436 may include the reduced image size video 204 (fig. 2). The video coding information, which includes motion vectors, motion vector search areas, mode decisions, etc., is shared between the first processing unit 402 and the second processing unit 404 for encoding the video streams 202 and 204, as described below. In one embodiment, the motion vector, motion vector search area, and/or mode decision determined by the first processing unit 402 is provided to the derivation logic 434.
In embodiments implementing spatial scaling to derive a reduced size stream for encoding, the spatial scaler logic 436 performs the reduction as described above, and the derivation logic 434 performs motion vector mapping, motion vector scaling, and so on. For example, in one embodiment, the derivation logic 434 performs mapping of blocks and motion vectors between pictures of different sizes, scales the motion vectors, performs additional processing to merge blocks to maintain conformance with a given encoding standard, and, after merging the motion vectors mapped from the video encoding process performed by the first processing unit 402, finds the maximum legal partition size as the selected partition size of the reduced video. The derivation logic 434 and the spatial scaler logic 436 share information, either directly or indirectly (e.g., through CPU intervention), as represented by the dashed line between 436 and 434. For example, in one embodiment utilizing direct transfer (without CPU intervention), the scale factors are passed directly from the spatial scaler logic 436 to the derivation logic 434. The spatial scaler logic 436 performs spatial scaling to provide the reduced image size video (e.g., 204), and encoding of the reduced image size video 204 then occurs based on the video encoding information (e.g., motion vectors) derived from the first processing unit 402 and based on the information (e.g., image size, bit rate, scaling factor) passed from the spatial scaler logic 436 to the derivation logic 434. Although various algorithms and/or methods are described as being performed, at least in part, in the derivation logic 434 in conjunction with the spatial scaler logic 436, it should be understood that in some implementations one or more of the foregoing functions may be performed by other logic or distributed among a number of different logic units.
In the encoding process, a current frame or picture in a group of pictures (GOP) is provided for encoding. The current picture may be processed in units of macroblocks, or coding units in the emerging video coding standard HEVC, where a macroblock or coding unit corresponds to, for example, a 16 × 16 or 32 × 32 block of pixels in the original picture. Each macroblock may be encoded in an intra-coding mode, or in an inter-coding mode for P pictures or B pictures. In inter-coding mode, motion compensated prediction may be performed by the additional processing logic 438 and 440, such as the respective motion compensation logic 414 and 424 (fig. 4B) in each processing unit 402 and 404, and may be based on at least one previously encoded reconstructed image.
Referring to fig. 4B, and further illustrating the additional processing logic 438 and 440 of each processing unit 402, 404, the predicted macroblock P may be subtracted from the current macroblock by the logic 416, 426 of each bitstream, thereby generating a difference (residual) macroblock, and the difference macroblock may be transformed and quantized by the respective transformer/quantizer logic 418, 428 of each bitstream. The output of each transformer/quantizer logic 418, 428 may be entropy encoded by the respective entropy encoder logic 420, 430 and output as compressed bitstreams corresponding to different bit rates.
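The per-bitstream residual/quantize/reconstruct path can be illustrated with a toy numeric example. This sketch omits the transform step (which a real codec would apply to the residual before quantization), and the quantizer step sizes are illustrative:

```python
def encode_block(current, prediction, q):
    """Residual computation (cf. logic 416/426) followed by a toy
    scalar quantizer (cf. logic 418/428); the transform is omitted."""
    return [round((c - p) / q) for c, p in zip(current, prediction)]

def reconstruct_block(quantized, prediction, q):
    """Dequantize and add the prediction back (cf. reconstruction
    logic 422/432)."""
    return [qv * q + p for qv, p in zip(quantized, prediction)]

cur, pred = [100, 104, 98, 102], [96, 100, 100, 100]
# Each bitstream may apply a different quantization parameter:
print(encode_block(cur, pred, 2))   # finer quantizer:   [2, 2, -1, 1]
print(encode_block(cur, pred, 8))   # coarser quantizer: [0, 0, 0, 0]
print(reconstruct_block(encode_block(cur, pred, 2), pred, 2))  # [100, 104, 98, 102]
```

With the finer quantizer the block reconstructs exactly; with the coarser one the residual is quantized away and the reconstruction falls back to the prediction, illustrating the rate/quality trade-off between the two bitstreams.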
The encoded video bitstreams (e.g., "bitstream 1 and bitstream 2") include entropy-encoded video content and any side information needed to decode the macroblocks. During the reconstruction operation for each bitstream, the results of the respective transformer/quantizer logic 418, 428 may be dequantized and inverse transformed by the respective inverse quantizer/transformer logic 418, 428 to generate reconstructed difference macroblocks for each bitstream, which are then added to the prediction and loop filtered by the respective reconstruction logic 422, 432.
In this regard, each bitstream is associated with a respective processing unit 402, 404 comprising residual calculation logic 416, 426, each of which is configured to generate a residual and, subsequently, quantized transform coefficients. It should be noted, however, that different quantization parameters may be applied. Each processing unit 402, 404 further includes reconstruction logic 422, 432 coupled to the inverse quantizer/transformer logic 418, 428, wherein each reconstruction logic 422, 432 is configured to generate respective reconstructed pixels. As illustrated, the reconstruction logic 422, 432 performs reconstruction of the decoded pixels at different image sizes, depending on the respective quantization parameters applied. It should be noted that one or more functions related to the various logic described in association with figs. 4A-4B may be combined into a single logic unit or further distributed among additional logic units.
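The reconstruction path, and the effect of applying different quantization parameters per bitstream, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the transform and loop filtering stages are omitted, and a scalar quantizer step stands in for a standard's quantization parameter.

```python
import numpy as np

def quantize(residual, q_step):
    """Scalar quantization of a residual block (transform omitted)."""
    return np.round(residual / q_step)

def reconstruct(quantized, predicted, q_step):
    """Dequantize the residual and add the prediction, as reconstruction
    logic 422/432 does per bitstream (loop filtering omitted)."""
    return predicted + quantized * q_step

# The same residual reconstructed under two quantization parameters:
residual = np.array([5.0, -3.0, 2.0, 0.0])
pred = np.zeros(4)
fine = reconstruct(quantize(residual, 2), pred, 2)    # error bounded by 1
coarse = reconstruct(quantize(residual, 8), pred, 8)  # error bounded by 4
```

The coarser quantizer step yields a lower bit rate at the cost of larger reconstruction error, which is why each bitstream may apply its own quantization parameters.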
It should be noted that the various embodiments disclosed are applicable to various video standards including, but not limited to, MPEG-2, VC-1, VP8, and HEVC, the last of which provides more coding tools that can be shared. For example, in HEVC, inter prediction unit sizes may range anywhere from 4 × 4 to 32 × 32, which requires a large amount of data processing to perform motion search and mode decision.
It should be appreciated that, in the context of the present disclosure, one embodiment of a video encoding method 500, described in fig. 5 and implemented in one embodiment by a single encoding engine (e.g., encoding engine 104), includes receiving, at the single encoding engine, an input video stream having one or more images of a first size (502). For example, the input video stream may be an uncompressed stream (e.g., input video stream 201 in fig. 2). The method 500 further includes generating, by the single encoding engine, a plurality of streams of different image sizes in parallel (504). In other words, the interface 406 (figs. 4A and 4B) replicates the input video stream and generates multiple (e.g., two in this example) streams corresponding to the video stream 202. The spatial scalar logic 436 generates a reduced-image-size stream (e.g., video stream 204) and provides the stream to additional processing logic 440. Furthermore, the first processing unit 402 provides motion vectors and/or mode information to the derivation logic 434, which is used to encode the reduced-image-size stream. In one embodiment, method 500 maps macroblocks of a reduced image from one generated stream (e.g., the reduced-image-size stream) to blocks in another generated stream (e.g., an image of the input stream having the first image size) (506), and maps the motion vector of each of a plurality of blocks of the input video stream to a plurality of reduced blocks of a second encoded stream, the mapped motion vectors being based on the motion vectors used in generating the first encoded stream, adjusted by a defined scaling factor (508).
The method 500 further includes generating a plurality of encoded streams, a first encoded stream including one or more images of the first size and a second encoded stream including one or more images of a second size, wherein the second size is smaller than the first size, the encoding of the second stream based on sharing video encoding information used in encoding the first encoded stream (510). In some implementations, the method 500 includes merging mapped motion vectors to form compatible block sizes in response to non-compatible block sizes. The method 500 may further include, in some implementations, determining a partition of each macroblock or coding unit containing compatible mapped blocks based on neighboring mapped motion vectors, and averaging or applying a median operation to the mapped motion vectors when boundaries of macroblocks of the downscaled coded stream are not aligned with boundaries of macroblocks of the input video stream. It should be understood that, in the context of the present disclosure, one or more of the described logic functions may be omitted, or additional logic functions may be included, in some embodiments. For example, sharing mode information is also contemplated within the scope of certain embodiments of method 500.
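The averaging and median operations over mapped motion vectors might look like the following hypothetical Python sketch; the per-component median and the rounding are assumptions, not the disclosed implementation.

```python
from statistics import median

def merge_mapped_mvs(mapped_mvs, mode="median"):
    """Collapse the mapped motion vectors of several blocks into a
    single vector for a macroblock of the reduced stream whose
    boundaries do not align with the input stream's macroblocks."""
    xs = [mv[0] for mv in mapped_mvs]
    ys = [mv[1] for mv in mapped_mvs]
    if mode == "median":
        return (round(median(xs)), round(median(ys)))
    # mode == "average"
    return (round(sum(xs) / len(xs)), round(sum(ys) / len(ys)))

# The median keeps a single outlier vector from dominating:
# merge_mapped_mvs([(4, 0), (6, 2), (40, 1)]) -> (6, 1)
```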
The video encoding system may be implemented in hardware, software (e.g., including firmware), or a combination thereof. In one embodiment, the video encoding system is implemented using any one or a combination of the following technologies, all of which are well known in the art: discrete logic circuitry with logic gates implementing logical functions on data signals, an Application Specific Integrated Circuit (ASIC) with appropriate combinational logic gates, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), and so forth. In an embodiment where all or part of the video encoding system is implemented in software, the software is stored in a memory and executed by a suitable instruction execution system (e.g., a computer system including one or more processors, memory, and an operating system, encoded with the encoding software/firmware, etc.).
It should be appreciated by those of skill in the art that any flow illustrations or blocks in flow charts should be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of the present disclosure in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations, merely set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
Claims (10)
1. A method, comprising:
receiving, at a single encoding engine, an input video stream having one or more images of a first size; and
generating, by the single encoding engine, in parallel, a plurality of encoded streams, a first encoded stream comprising one or more images of the first size, and a second encoded stream comprising one or more images of a second size, the second size being smaller than the first size, the encoding of the second stream being based on sharing video encoding information for use in encoding the first encoded stream.
2. The method of claim 1, wherein the video coding information comprises motion vector search results for inter prediction.
3. The method of claim 2, wherein the motion vector search result comprises a motion vector, a partition of a coding unit, a motion vector search range, or any combination thereof.
4. The method of claim 2, wherein a motion vector search is performed for the first encoded stream, and the resulting motion vectors and associated coding unit partitions are used to generate the second encoded stream without performing a motion vector search function for the second encoded stream.
5. The method of claim 2, wherein the vertical and horizontal dimensions of each of the plurality of blocks of the one or more images used in generating the first encoded stream are reduced by a defined factor to derive the plurality of reduced blocks of the second encoded stream.
6. The method of claim 1, wherein the video coding information comprises intra prediction modes for intra prediction.
7. The method of claim 1, wherein the video coding information comprises a selection between inter-prediction and intra-prediction for a coding unit.
8. The method of claim 1, wherein the video coding information comprises motion vector search results, an intra prediction mode for intra prediction, a selection between inter prediction and intra prediction for a coding unit, or any combination thereof.
9. A system comprising a single encoding engine configured in hardware for receiving an input video stream having one or more images of a first size and generating, in parallel, a plurality of encoded streams, a first encoded stream comprising one or more images of the first size, and a second encoded stream comprising one or more images of a second size, the second size being smaller than the first size, the encoding of the second stream being based on sharing video encoding information for use in encoding the first encoded stream.
10. The system of claim 9, wherein the video coding information comprises motion vector search results, an intra prediction mode for intra prediction, a selection between inter prediction and intra prediction for a coding unit, or any combination thereof.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/545,242 | 2012-07-10 | 2012-07-10 | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1189740A true HK1189740A (en) | 2014-06-13 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9426498B2 (en) | | Real-time encoding system of multiple spatially scaled video based on shared video coding information |
| US12506908B2 (en) | | Real-time video coding system of multiple temporally scaled video and of multiple profile and standards based on shared video coding information |
| JP7106744B2 (en) | | Encoders, decoders and corresponding methods using IBC dedicated buffers and default refresh for luma and chroma components |
| RU2732996C1 (en) | | Video coding and coding of images with wide-angle internal prediction |
| JP7164710B2 (en) | | Video decoding method and video decoder |
| US10178410B2 (en) | | Method and apparatus of motion information management in video coding |
| JP7279154B2 (en) | | Motion vector prediction method and apparatus based on affine motion model |
| KR102611845B1 (en) | | Optical flow-based video inter prediction |
| JP2022535859A (en) | | Method for constructing MPM list, method for obtaining intra-prediction mode of chroma block, and apparatus |
| CN105264890A (en) | | Interlayer video encoding method and apparatus and interlayer video decoding method and apparatus for compensating luminance difference |
| TW202236852A (en) | | Efficient video encoder architecture |
| CN112088534B (en) | | Method, device and equipment for inter-frame prediction and storage medium |
| US20160249059A1 (en) | | Method and apparatus for applying view synthesized prediction according to illumination compensation |
| CN105308961A (en) | | Interlayer video encoding method and apparatus and interlayer video decoding method and apparatus for compensating luminance difference |
| WO2011070730A1 (en) | | Video coding device and video decoding device |
| CN105340273B (en) | | Method for predicting disparity vector for inter-layer video decoding, and encoding method and device |
| CN118216148A (en) | | A video encoding method and related device |
| HK1189740A (en) | | Video coding method and system |
| US20240406437A1 (en) | | Image encoding/decoding method and device for adaptively changing resolution, and method for transmitting bitstream |
| RU2822447C2 (en) | | Method and equipment for mutual prediction |
| RU2809192C2 (en) | | Encoder, decoder and related methods of interframe prediction |
| HK1189741A (en) | | Video coding method and system |
| Latja | | Parallel acceleration of H.265 video processing |