
HK1036864A - Decoding an encoded image having a first resolution directly into a decoded image having a second resolution - Google Patents


Info

Publication number
HK1036864A
HK1036864A (application HK01106901.4A)
Authority
HK
Hong Kong
Prior art keywords
image
values
transform
version
pixel
Prior art date
Application number
HK01106901.4A
Other languages
Chinese (zh)
Inventor
Ramachandran Natarajan
T. George Campbell
Original Assignee
Equator Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Equator Technologies, Inc.
Publication of HK1036864A publication Critical patent/HK1036864A/en

Description

Direct decoding of an encoded image having a first resolution into a decoded image having a second resolution
Technical Field
The present invention relates generally to image-processing circuits and techniques, and more particularly to circuits and methods for directly decoding an encoded version of an image having one resolution into a decoded version of the image having another resolution. For example, such a circuit may directly down-convert an encoded image having a high-resolution (hereinafter "hi-res") form into a decoded image having a low-resolution (hereinafter "lo-res") form, without the intermediate step of fully decoding the image at its high resolution.
Background
It is sometimes desirable to change the resolution of electronic images. For example, electronic display devices such as television sets and computer monitors have a maximum display resolution; if the resolution of an image is higher than the maximum display resolution of the device, the image must be down-converted to a resolution less than or equal to that maximum before it can be displayed. For clarity, this is described hereinafter as down-converting a high-resolution version of an image to a low-resolution version of the image.
Fig. 1 shows a pixel map of a high-resolution version 10 and a pixel map of a low-resolution version 12 of an image. The high-resolution version 10 is n pixels wide by t pixels high, so it has n × t pixels P0,0 through Pt,n. But if the maximum display resolution of the display device (not shown) is (n × g) pixels wide by (t × h) pixels high, where g and h are less than 1, one typically converts the high-resolution version 10 into a low-resolution version 12 having a resolution less than or equal to the maximum display resolution of the display device. Thus, in order to display the image on the display device at the highest possible resolution, the low-resolution version 12 has (n × g) × (t × h) pixels P0,0 through P(t×h),(n×g). For example, assume that the high-resolution version 10 is n = 1920 pixels wide by t = 1088 pixels high, and further assume that the maximum resolution of the display device is n × g = 720 pixels wide by t × h = 544 pixels high. Then the maximum horizontal resolution of the low-resolution version 12 is g = 3/8 of the horizontal resolution of the high-resolution version 10, and the maximum vertical resolution of the low-resolution version 12 is h = 1/2 of the vertical resolution of the high-resolution version 10.
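The arithmetic above can be checked directly. The short sketch below (Python; the function name `downconverted_size` is mine, not the patent's) computes the low-resolution dimensions from the scale factors g and h:

```python
from fractions import Fraction

def downconverted_size(width, height, g, h):
    """Compute the pixel dimensions of a low-resolution version of an
    image, given horizontal and vertical scale factors g and h (< 1).
    Exact fractions avoid floating-point surprises."""
    return int(width * g), int(height * h)

# The HDTV example from the text: 1920 x 1088 scaled by g = 3/8, h = 1/2.
lo_w, lo_h = downconverted_size(1920, 1088, Fraction(3, 8), Fraction(1, 2))
```

For the example in the text this yields 720 × 544, and for a single 8 × 8 block it yields the 3 × 4 block discussed in connection with fig. 2.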
Referring to fig. 2, many versions of images, such as version 10 of fig. 1, are encoded using conventional block-based compression schemes before being transmitted or stored, and so for these versions the resolution reduction discussed above in connection with fig. 1 is typically done block by block. In particular, fig. 2 depicts an example of the block-level g = 3/8, h = 1/2 down-conversion discussed above in connection with fig. 1. An image block 14 of the high-resolution version 10 (fig. 1) is 8 pixels wide by 8 pixels high, and the corresponding image block 16 of the low-resolution version 12 (fig. 1) is 8 × 3/8 = 3 pixels wide by 8 × 1/2 = 4 pixels high. The pixels in block 16, commonly referred to as subsampled pixels, are evenly distributed within block 16, and this even spacing continues across the boundaries of the adjacent blocks (not shown) of the low-resolution version 12. For example, referring to block 16, the subsampled pixel P0,2 is the same distance from P0,1 as it is from the pixel P0,0 of the block (not shown) immediately to the right of block 16, and likewise the subsampled pixel P3,0 is the same distance from P2,0 as it is from the pixel P0,0 of the block (not shown) immediately below block 16.
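The geometry of the block-level down-conversion above can be illustrated with a simple nearest-neighbour subsampling scheme. This is only a sketch of the 8 × 8 → 3 × 4 geometry under an assumed even-spacing rule, not the patent's actual down-conversion filter:

```python
def subsample_block(block, out_h, out_w):
    """Nearest-neighbour subsampling of a block (list of rows) down to
    out_h x out_w. The surviving pixels are evenly spaced with stride
    in_size / out_size, so the spacing continues across block
    boundaries, as described for block 16 in the text."""
    in_h, in_w = len(block), len(block[0])
    ys = [int(i * in_h / out_h) for i in range(out_h)]
    xs = [int(j * in_w / out_w) for j in range(out_w)]
    return [[block[y][x] for x in xs] for y in ys]

# An 8x8 block whose value encodes each pixel's position (row*8 + col),
# down-converted to 3 pixels wide by 4 pixels high (g = 3/8, h = 1/2).
block = [[8 * r + c for c in range(8)] for r in range(8)]
small = subsample_block(block, 4, 3)
```

With this scheme the retained columns are 0, 2 and 5 and the retained rows are 0, 2, 4 and 6, so the horizontal stride of 8/3 pixels wraps evenly into the neighbouring block.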
Unfortunately, because algorithms that decode an encoded high resolution version of an image into a decoded low resolution version of the image are inefficient, image processing circuits that execute these algorithms typically require relatively high performance processors and large amounts of memory, and are therefore also relatively expensive.
For example, U.S. patent No.5,262,845 describes an algorithm that decodes an encoded high-resolution version of an image at full resolution and then down-converts the decoded high-resolution version into a decoded low-resolution version. Because only the decoded low-resolution version is displayed, producing the decoded high-resolution version of the image is an unnecessary and inefficient step.
Moreover, motion-compensation algorithms used while decoding and down-converting encoded video images as described above are generally inefficient, and this inefficiency further increases the processing power and memory requirements, and thus the cost, of the image-processing circuitry. For example, U.S. patent No.5,262,845 describes the following technique. First, a low-resolution version of a reference frame is generated from its high-resolution version in a conventional manner and stored in a reference-frame buffer. Next, the encoded high-resolution version of a motion-compensated frame, whose motion vectors point to macroblocks of the reference frame, is decoded at its full resolution. But the motion vectors, which were generated with respect to the high-resolution version of the reference frame, are incompatible with the low-resolution version of the reference frame. The processing circuitry therefore needs to up-convert each pointed-to macroblock of the low-resolution reference frame into a high-resolution macroblock that is compatible with the motion vectors; it performs this up-conversion using interpolation. Next, the processing circuitry combines the residuals with the high-resolution reference macroblocks to produce the decoded macroblocks of the motion-compensated frame. After decoding the entire motion-compensated frame into a decoded high-resolution version in this way, the processing circuitry down-converts the decoded high-resolution version into a decoded low-resolution version. This technique is therefore inefficient because the reference macroblocks are down-converted for storage and display and then up-converted again for motion compensation.
Unfortunately, the image processing circuitry that performs the down-conversion and motion compensation techniques described above is too expensive for many users. For example, with the advent of High Definition Television (HDTV), it is estimated that many users are not burdened with replacing their ordinary television sets with HDTV receivers/displays. Accordingly, HDTV decoders that down-convert HDTV video frames to standard resolution video frames that can be displayed on a common television set are expected to have a large consumer market. However, many users who cannot afford HDTV receivers also cannot afford HDTV decoders if these decoders include the relatively expensive image processing circuitry described above.
Overview of conventional image compression techniques
To assist the reader in more readily understanding the concepts discussed above and in the following description of the invention, a basic conventional image-compression technique is first reviewed below.
Electronic transmission of a higher-resolution image over a lower-bandwidth channel, or electronic storage of such an image in a smaller memory space, typically requires compression of the digital data that represent the image. Such image compression generally involves reducing the number of data bits necessary to represent the image. For example, video images for High Definition Television (HDTV) are compressed so that they can be transmitted over existing television channels; uncompressed, HDTV video images would require a transmission channel with a much wider bandwidth than existing television channels. Likewise, an image sent over the internet is compressed to reduce its transmission time to an acceptable level, and images stored on a CD-ROM or a server may be compressed to increase the storage capacity of the medium.
Referring to fig. 3A through 9, elements of the popular block-based Moving Picture Experts Group (MPEG) compression standards, including MPEG-1 and MPEG-2, are discussed. For purposes of description, the discussion is based on the use of the MPEG 4:2:0 format to compress video images represented in the Y, CB, CR color space. However, the concepts discussed also apply to other MPEG formats, to images represented in other color spaces, and to other block-based compression standards, such as the Joint Photographic Experts Group (JPEG) standard, which is commonly used for compressing still images. In addition, although many details of the MPEG standards and the Y, CB, CR color space are omitted for brevity, these details are well known and are disclosed in a large number of available references.
Still referring to figs. 3A through 9, the MPEG standards are typically used to compress temporal sequences of images, for the purposes described here video frames such as those found in television broadcasts. Each video frame is divided into partitions called macroblocks, each of which comprises one or more pixels. Fig. 3A shows a 16 × 16-pixel macroblock 30 having 256 pixels 32 (not drawn to scale). In the MPEG standards, macroblocks are always 16 × 16 pixels, although other compression standards may use macroblocks of other dimensions. In the original video frame, i.e. the frame before compression, each pixel 32 has a respective luminance value Y and a respective pair of color-difference, i.e. chrominance, values CB and CR.
Referring to figs. 3A-3D, before compressing a frame, the digital luminance (Y) and chrominance-difference (CB and CR) values to be used for compression, i.e. the pre-compression values, are generated from the original Y, CB and CR values of the original frame. In the MPEG 4:2:0 format, the pre-compression Y values are the same as the original Y values; thus each pixel 32 simply retains its original luminance value Y. However, to reduce the amount of data to be compressed, the 4:2:0 format allows only one pre-compression CB value and one pre-compression CR value for each group 34 of four pixels 32. Each pre-compression CB and CR value is respectively derived from the original CB and CR values of the four pixels 32 in the respective group 34. For example, a pre-compression CB value may equal the average of the original CB values of the four pixels 32 in the respective group 34. Thus, referring to figs. 3B-3D, the pre-compression Y, CB and CR values generated for the macroblock 30 are arranged as follows: one 16 × 16 matrix 36 of pre-compression Y values (for each pixel 32, the pre-compression Y value equals its original Y value), one 8 × 8 matrix 38 of pre-compression CB values (for each group 34 of four pixels 32, the pre-compression CB value equals the derived CB value), and one 8 × 8 matrix 40 of pre-compression CR values (for each group 34 of four pixels 32, the pre-compression CR value equals the derived CR value). The matrices 36, 38 and 40 are commonly referred to as "blocks" of values. Furthermore, because it is convenient to perform the compression transform on 8 × 8 blocks of pixel values rather than on 16 × 16 blocks, the block 36 of pre-compression Y values is subdivided into four 8 × 8 blocks 42a-42d, which respectively correspond to the 8 × 8 blocks A-D of pixels in the macroblock 30. Thus, referring to figs. 3A through 3D, six 8 × 8 blocks of pre-compression pixel data are generated for each macroblock 30: four 8 × 8 blocks 42a-42d of pre-compression Y values, one 8 × 8 block 38 of pre-compression CB values, and one 8 × 8 block 40 of pre-compression CR values.
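The averaging derivation of the 4:2:0 chroma values described above can be sketched as follows. The function name is mine, and averaging is only one of the permitted derivations:

```python
def derive_420_chroma(plane):
    """Derive one pre-compression chroma value per 2x2 group of pixels
    by averaging the group's four original values, as in the example
    in the text (the 4:2:0 format permits other derivations)."""
    h, w = len(plane), len(plane[0])
    return [[(plane[r][c] + plane[r][c + 1] +
              plane[r + 1][c] + plane[r + 1][c + 1]) / 4
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

# A 4x4 original CB plane becomes a 2x2 block of pre-compression values.
cb = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
cb_420 = derive_420_chroma(cb)
```

Applied to the full 16 × 16 macroblock, this turns each 16 × 16 chroma plane into one 8 × 8 block, matching blocks 38 and 40 of figs. 3C and 3D.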
Fig. 4 is a block diagram of an MPEG compressor 50, more commonly referred to as an encoder. In general, the encoder 50 converts the pre-compression data for a frame or sequence of frames into encoded data that represent the same frame or frames with substantially fewer data bits than the pre-compression data. To perform this conversion, the encoder 50 reduces or eliminates redundancies in the pre-compression data and reformats the remaining data using efficient transform and coding techniques.
More specifically, the encoder 50 includes a frame-reordering buffer 52 that receives the pre-compression data for one or more frames of a sequence and reorders the frames into a sequence suitable for encoding; the reordered sequence is thus typically different from the sequence in which the frames were generated and will be displayed. The encoder 50 assigns each stored frame to a respective group, called a group of pictures (GOP), and marks each frame as an intra frame (i-frame) or a non-intra frame (non-i-frame). For example, each GOP may include three i-frames and 12 non-i-frames, for a total of 15 frames. The encoder 50 always encodes an i-frame without reference to another frame, but can, and typically does, encode a non-i-frame with reference to one or more other frames in the same GOP. The encoder 50 does not, however, encode a non-i-frame with reference to a frame in a different GOP.
Referring to figs. 4 and 5, in encoding an i-frame, the 8 × 8 blocks of pre-compression Y, CB and CR values representing the i-frame (figs. 3B-3D) pass through an adder 54 to a discrete cosine transformer (DCT) 56, which transforms each of these blocks of values into a respective 8 × 8 block of one DC (zero-frequency) transform value and 63 AC (nonzero-frequency) transform values. Fig. 5 shows a block 57 of luminance transform values Y-DCT(0,0)a through Y-DCT(7,7)a, which correspond to the pre-compression luminance pixel values Y(0,0)a through Y(7,7)a in block 42a of fig. 3B. Thus, block 57 has the same number of luminance transform values Y-DCT as block 42a has luminance pixel values Y. Similarly, blocks of chrominance transform values CB-DCT and CR-DCT (not shown) correspond to the chrominance pixel values in blocks 38 and 40. Furthermore, the pre-compression Y, CB and CR values pass through the adder 54 without being summed with any other values, because the adder 54 is not needed when the encoder 50 encodes an i-frame. As discussed below, however, the adder 54 is typically needed when the encoder 50 encodes a non-i-frame.
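The transform step above is the standard two-dimensional DCT-II. The sketch below is a plain reference implementation with orthonormal scaling (a common convention; practical encoders use fast, integer-scaled variants), illustrating that a uniform block produces a single DC value and zero AC values:

```python
import math

def dct_2d(block):
    """8x8 two-dimensional DCT-II with orthonormal scaling: one DC
    coefficient at (0,0) and 63 AC coefficients, as described in the
    text. A slow reference implementation, not an optimized one."""
    n = 8
    def c(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y] *
                    math.cos((2 * x + 1) * u * math.pi / (2 * n)) *
                    math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = c(u) * c(v) * s
    return out

flat = [[100] * 8 for _ in range(8)]  # a uniform (zero-frequency) block
coeffs = dct_2d(flat)                 # only the DC term is nonzero
```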
Referring to figs. 4 and 6, a quantizer and zigzag scanner 58 limits each of the transform values from the DCT 56 to a respective maximum value and provides the quantized AC and DC transform values on respective paths 60 and 62. Fig. 6 shows an example of a zigzag scan pattern 63 that the quantizer and zigzag scanner 58 may implement. Specifically, the quantizer and scanner 58 reads the transform values of a transform block (e.g., transform block 57 of fig. 5) in the order indicated: it first reads the transform value in the "0" position, then the transform value in the "1" position, then the transform value in the "2" position, and so on until it finally reads the transform value in the "63" position. The quantizer and zigzag scanner 58 reads the transform values in this zigzag pattern to improve coding efficiency, as is well known. Of course, depending on the coding technique and the type of image being encoded, the quantizer and zigzag scanner 58 may implement other scan patterns.
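The zigzag order above can be generated programmatically by walking the anti-diagonals of the block and alternating direction, so low-frequency coefficients come first. This sketch uses one common convention (the JPEG/MPEG ordering that begins row-first); the exact pattern 63 of fig. 6 may differ in its starting direction:

```python
def zigzag_order(n=8):
    """Generate the (row, col) visiting order of a conventional zigzag
    scan for an n x n transform block: traverse anti-diagonals,
    alternating direction, so position 0 is the DC coefficient and the
    highest-frequency coefficient comes last."""
    order = []
    for d in range(2 * n - 1):
        diag = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        order.extend(diag if d % 2 else reversed(diag))
    return order

zz = zigzag_order()  # 64 positions, from (0, 0) to (7, 7)
```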
Referring again to fig. 4, a prediction encoder 64 predictively encodes the DC transform values, and a variable-length encoder 66 converts the quantized AC transform values and the quantized and predictively encoded DC transform values into variable-length codes, such as Huffman codes. These codes form the encoded data that represent the pixel values of the i-frame being encoded. A transmit buffer 68 then temporarily stores these codes to allow synchronized transmission of the encoded data to a decoder (discussed below in connection with fig. 8). Alternatively, if the encoded data is to be stored rather than transmitted, the encoder 66 may provide the variable-length codes directly to a storage medium such as a CD-ROM.
If an i-frame will be used as a reference (as it often is) for one or more non-i-frames in the GOP, then, for the following reason, the encoder 50 generates a corresponding reference frame by decoding the encoded i-frame with a decoding technique similar or identical to the one used by the decoder (fig. 8). When decoding non-i-frames with reference to an i-frame, the decoder has no choice but to use the decoded i-frame as the reference frame. Because MPEG encoding and decoding are lossy (some information is lost due to the quantization of the AC and DC transform values), the pixel values of the decoded i-frame will typically differ from the pre-compression pixel values of the original i-frame. Therefore, using the pre-compression i-frame as a reference frame during encoding could cause additional artifacts in the decoded non-i-frame, because the reference frame used for decoding (the decoded i-frame) would differ from the reference frame used for encoding (the pre-compression i-frame).
Thus, so that the encoder's reference frames are similar or identical to the decoder's reference frames, the encoder 50 includes a dequantizer and inverse zigzag scanner 70 and an inverse DCT 72, which are designed to mimic the corresponding dequantizer and inverse scanner and inverse DCT of the decoder (fig. 8). The dequantizer and inverse scanner 70 first implements the inverse of the zigzag scan performed by the quantizer and scanner 58, so that the DCT values are properly located within each decoded transform block. Next, the dequantizer and inverse scanner 70 dequantizes the quantized DCT values, and the inverse DCT 72 transforms the dequantized DCT values into corresponding 8 × 8 blocks of decoded Y, CB and CR pixel values, which together compose the reference frame. Because of the losses incurred during quantization, however, some or all of these decoded pixel values may differ from their corresponding pre-compression pixel values, and thus, as stated above, the reference frame may differ from its corresponding pre-compression frame. The decoded pixel values then pass through an adder 74 (used when generating a reference frame from a non-i-frame, as discussed below) to a reference-frame buffer 76, which stores the reference frame.
In encoding a non-i-frame, the encoder 50 initially encodes each macroblock of the non-i-frame in at least two ways: in the manner discussed above for i-frames, and using motion prediction as discussed below. The encoder 50 then saves and transmits whichever resulting encoding has the fewest bits. This technique ensures that the macroblocks of the non-i-frame are encoded using the fewest bits possible.
With respect to motion prediction, an object in a frame exhibits motion if its position changes relative to a previous or subsequent frame. For example, a horse exhibits relative motion if it gallops across the screen; or, if the camera follows the horse, the background exhibits relative motion with respect to the horse. Generally, each of the subsequent frames in which the object appears contains at least some macroblocks of the same pixels as the previous frame, but such matching macroblocks typically occupy positions in the subsequent frames that differ from the positions they occupy in the previous frame. Alternatively, a macroblock that includes a stationary object (e.g., a tree) or a portion of the background (e.g., the sky) may occupy the same position in each successive frame, and thus exhibit "zero motion". In either case, instead of encoding each frame independently, it typically takes fewer data bits to tell the decoder, in effect, "the macroblocks at positions R and Z of frame 1 (a non-i-frame) are the same as the macroblocks at positions S and T, respectively, of frame 0 (the reference frame)". This "statement" is encoded as a motion vector. For a relatively fast-moving object, the position value of the motion vector is relatively large; conversely, for a stationary or relatively slow-moving object or background, the position value of the motion vector is relatively small or equal to zero.
Fig. 7 depicts the concept of motion vectors in terms of the non-i-frame 1 and the reference frame 0 discussed above. A motion vector MVR indicates that a match for the macroblock at position R of frame 1 can be found at position S of reference frame 0. MVR comprises three parts: the first part, here 0, identifies the frame (here frame 0) in which the matching macroblock can be found. The other two parts, XR and YR, together form a two-dimensional position value that indicates where in frame 0 the matching macroblock is located. Thus, in this example, because position S of frame 0 has the same X-Y coordinates as position R of frame 1, XR = YR = 0. In contrast, the macroblock at position T matches a macroblock at position Z, which has X-Y coordinates different from those of position T; XZ and YZ therefore represent position T relative to position Z. For example, assume that position T is 10 pixels to the left of position Z (negative X direction) and 7 pixels down (negative Y direction); then MVZ = 0, -10, -7. Although many other motion-vector schemes are available, they are all based on the same general concept. For example, the macroblock at position R may be bidirectionally encoded; that is, it may have two motion vectors pointing to respective matching positions in different frames, one preceding frame 1 and one following it. During decoding, the pixel values at these matching positions are averaged or otherwise combined to calculate the pixel values at position R.
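Finding the matching macroblock that a motion vector points to is conventionally done with a block-matching search. The toy sketch below (function name and parameters are mine; real encoders use larger search ranges and faster search strategies) minimizes the sum of absolute differences (SAD):

```python
def best_match(ref, block, top, left, radius=2):
    """Exhaustive block-matching motion search: among offsets (dy, dx)
    within +/-radius of (top, left), return (sad, dy, dx) for the
    position in `ref` whose block has the smallest sum of absolute
    differences (SAD) from `block`. (dy, dx) plays the role of the
    motion-vector position value described in the text."""
    bh, bw = len(block), len(block[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > len(ref) or x + bw > len(ref[0]):
                continue  # candidate block would fall outside the frame
            sad = sum(abs(ref[y + r][x + c] - block[r][c])
                      for r in range(bh) for c in range(bw))
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best

# A toy reference frame with a unique value per pixel, and a 2x2 block
# copied from it at row 1, column 2; the search should recover (1, 2).
ref = [[10 * r + c for c in range(8)] for r in range(8)]
block = [row[2:4] for row in ref[1:3]]
match = best_match(ref, block, 0, 0, radius=3)
```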
Referring again to fig. 4, motion prediction is now discussed in detail. In encoding a non-i-frame, a motion predictor 78 compares the pre-compression Y values of the macroblocks in the non-i-frame with the decoded Y values of the respective macroblocks in the reference i-frame (the CB and CR values are not used in motion prediction) and identifies matching macroblocks. For each macroblock in the non-i-frame for which a match is found in the reference i-frame, the motion predictor 78 generates a motion vector that identifies the reference frame and the location of the matching macroblock within it. Thus, as discussed below in connection with fig. 8, in decoding these motion-encoded macroblocks of the non-i-frame, the decoder uses the motion vectors to obtain the pixel values of the motion-encoded macroblocks from the matching macroblocks in the reference frame. The prediction encoder 64 predictively encodes the motion vectors, and the encoder 66 generates respective codes for the encoded motion vectors and provides these codes to the transmit buffer 68.
Furthermore, because a macroblock in the non-i-frame and its matching macroblock in the reference i-frame are typically similar but not identical, the encoder 50 encodes the differences between them along with the motion vector so that the decoder can account for them. More specifically, the motion predictor 78 provides the decoded Y values of the matching macroblock in the reference frame to the adder 54, which effectively subtracts, on a pixel-by-pixel basis, these Y values from the pre-compression Y values of the matching macroblock in the non-i-frame. These differences, called residuals, are arranged in 8 × 8 blocks and are processed by the DCT 56, the quantizer and scanner 58, the encoder 66, and the buffer 68 in a manner similar to that discussed above, except that the quantized DC transform values of the residual blocks are coupled directly to the encoder 66 via the line 60 and thus are not predictively encoded by the prediction encoder 64.
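The residual step above, and its reversal in the decoder, amount to a pixel-wise subtraction and addition. A minimal sketch (function names are mine):

```python
def residual(current, reference):
    """Per-pixel residual between a macroblock of the frame being
    encoded and its matching reference macroblock: the subtraction
    the adder 54 effectively performs."""
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(current, reference)]

def reconstruct(reference, res):
    """The decoder's reversal: add the residual back onto the matched
    reference macroblock to recover the current macroblock."""
    return [[r + d for r, d in zip(ref_row, res_row)]
            for ref_row, res_row in zip(reference, res)]

cur = [[5, 6], [7, 8]]       # toy 2x2 "macroblock" of the current frame
ref_mb = [[1, 2], [3, 4]]    # its matching reference macroblock
res = residual(cur, ref_mb)  # what actually gets transformed and coded
```

In the real pipeline the residual blocks pass through the DCT and quantizer before coding, so the round trip is only approximate; the sketch omits that lossy step.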
In addition, a non-i-frame may itself be used as a reference frame. When a non-i-frame is used as a reference frame, the quantized residuals produced by the quantizer and zigzag scanner 58 are respectively dequantized, reordered and inverse transformed by the dequantizer and inverse scanner 70 and the inverse DCT 72, so that this non-i reference frame will be identical to the reference frame used by the decoder, for the same reasons discussed above. The motion predictor 78 provides to the adder 74 the decoded Y values of the reference frame from which the residuals were generated. The adder 74 adds the residuals from the inverse DCT 72 to these decoded Y values to generate the Y values of the non-i reference frame. The reference-frame buffer 76 then stores this reference non-i-frame along with the reference i-frame for use in motion-encoding subsequent non-i-frames.
Although the circuits 58 and 70 are described as performing zigzag and inverse zigzag scanning, respectively, in other embodiments another circuit may perform the zigzag scanning and the inverse zigzag scanning may be omitted. For example, the encoder 66 may perform the zigzag scanning, and the circuit 58 may perform only the quantization. Because the zigzag scanning would then lie outside the reference-frame loop, the dequantizer 70 could omit the inverse zigzag scanning, saving processing power and processing time.
Still referring to fig. 4, the encoder 50 also includes a rate controller 80 to ensure that the transmit buffer 68, which typically transmits the encoded frame data at a fixed rate, never overflows and never empties, i.e., underflows. If either of these conditions occurs, errors are introduced into the encoded data stream; for example, if the buffer 68 overflows, data from the encoder 66 is lost. Thus, based on the degree of fullness of the transmit buffer 68, the rate controller 80 uses feedback to adjust the quantization scale factor used by the quantizer and scanner 58. Specifically, the fuller the buffer 68, the larger the controller 80 makes the scale factor and the fewer data bits the encoder 66 produces. Conversely, the emptier the buffer 68, the smaller the controller 80 makes the scale factor and the more data bits the encoder 66 produces. This continuous adjustment ensures that the buffer 68 neither overflows nor underflows.
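One feedback step of such a rate controller can be sketched as a simple proportional adjustment. The function name, gain, target fullness and 1..31 clamp below are assumptions for illustration (the clamp mimics common MPEG practice), not the patent's or the standard's prescribed controller:

```python
def adjust_quant_scale(scale, fullness, capacity,
                       target=0.5, gain=4.0, lo=1, hi=31):
    """One feedback step of a toy rate controller: raise the
    quantization scale factor when the transmit buffer is fuller than
    the target fraction (so the encoder produces fewer bits), and
    lower it when the buffer is emptier (so it produces more bits)."""
    error = fullness / capacity - target
    return max(lo, min(hi, round(scale + gain * error)))
```

For example, with a buffer 90% full the scale rises, with it 10% full the scale falls, and a saturated buffer drives the scale to its clamp.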
Fig. 8 is a block diagram of a conventional MPEG decompressor 82, more commonly referred to as a decoder, which can decode frames that were encoded by the encoder 50 of fig. 4.
Referring to figs. 8 and 9, for i-frames, and for non-i-frames that were not motion predicted, a variable-length decoder 84 decodes the variable-length codes received from the encoder 50. A prediction decoder 86 decodes the predictively encoded DC transform values, and a dequantizer and inverse zigzag scanner 87, which is similar or identical to the dequantizer and inverse zigzag scanner 70 of fig. 4, dequantizes and reorders the decoded AC and DC transform values. Alternatively, another circuit, such as the decoder 84, may perform the inverse zigzag scanning. An inverse DCT 88, which is similar or identical to the inverse DCT 72 of fig. 4, transforms the dequantized transform values into pixel values. For example, fig. 9 shows a block 89 of luminance inverse-transform values Y-IDCT, i.e. decoded luminance pixel values, which respectively correspond to the luminance transform values Y-DCT in block 57 of fig. 5 and to the pre-compression luminance pixel values Y in block 42a of fig. 3B. Because of the losses incurred during the quantization and dequantization performed by the encoder 50 (fig. 4) and the decoder 82, respectively, the decoded pixel values in block 89 typically differ from the respective pixel values in block 42a.
Still referring to fig. 8, the decoded pixel values from the inverse DCT 88 pass through an adder 90 (which is used in decoding the motion-predicted macroblocks of non-i-frames, as discussed below) into a frame-reordering buffer 92, which stores the decoded frames and arranges them in the proper order for display on a video display device 94. If a decoded frame is also used as a reference frame, it is stored in the reference-frame buffer 96 as well.
For motion-predicted macroblocks of non-i-frames, the decoder 84, the dequantizer and inverse scanner 87, and the inverse DCT 88 process the residual transform values in the same way as discussed above for the i-frame transform values. The prediction decoder 86 decodes the motion vectors, and a motion interpolator 98 provides to the adder 90 the pixel values from the reference-frame macroblocks to which the motion vectors point. The adder 90 adds these reference pixel values to the residual pixel values to produce the decoded macroblock pixel values and provides them to the frame-reordering buffer 92. If the encoder 50 (fig. 4) uses a decoded non-i-frame as a reference frame, then that decoded non-i-frame is stored in the reference-frame buffer 96.
Referring to fig. 4 and 8, although the encoder 50 and decoder 82 are described as including multi-function circuit blocks, they may be implemented in hardware, software, or a combination of both. For example, encoder 50 and decoder 82 are typically implemented by respective one or more processors that perform the respective functions of the circuit blocks.
A more detailed discussion of the MPEG encoder 50 and MPEG decoder 82 of figs. 4 and 8, and of the MPEG standards in general, is available in many publications, such as Peter D. Symes, Video Compression, McGraw-Hill, 1998, which is incorporated herein by reference. Furthermore, other well-known block-based compression techniques exist for encoding and decoding both video and still images.
Summary of the invention
In one aspect of the invention, an image processing circuit includes a processor that receives an encoded portion of a first version of an image. The processor directly decodes the encoded portion into a decoded portion of a second version of the image, the second version having a resolution different from the resolution of the first version.
Thus, such an image processing circuit may directly decode an encoded high resolution version of an image into a decoded low resolution version of the image. That is, such a circuit eliminates the inefficient step of decoding the encoded high resolution version at its full resolution level prior to down-converting to the low resolution version. Such image processing circuits are therefore generally faster, simpler and cheaper than the circuits used in the prior art for decoding and down-converting images.
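One well-known way to realize this kind of direct decoding, sketched below for concreteness, is to keep only the low-frequency corner of each 8 × 8 DCT block and apply a smaller inverse DCT, so the spatial output emerges already at the lower resolution. This is an illustration of the general idea under assumed orthonormal scaling, not necessarily the patented method:

```python
import math

def idct_2d(coeffs):
    """Orthonormal 2D inverse DCT of a square coefficient block."""
    n = len(coeffs)
    def c(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    return [[sum(c(u) * c(v) * coeffs[u][v] *
                 math.cos((2 * x + 1) * u * math.pi / (2 * n)) *
                 math.cos((2 * y + 1) * v * math.pi / (2 * n))
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]

def direct_halfres_decode(dct8):
    """Down-convert an 8x8 DCT block straight to a 4x4 spatial block:
    keep the 4x4 low-frequency coefficients, rescale by m/n (the
    factor that makes the truncated orthonormal transform preserve
    pixel magnitudes), and run a 4-point inverse DCT. The full-
    resolution 8x8 pixel block is never produced."""
    m, n = 4, 8
    sub = [[dct8[u][v] * (m / n) for v in range(m)] for u in range(m)]
    return idct_2d(sub)

# A DC-only block (the transform of a uniform block of value 100)
# decodes directly to a uniform 4x4 block of value 100.
dct8 = [[0.0] * 8 for _ in range(8)]
dct8[0][0] = 800.0
lo = direct_halfres_decode(dct8)
```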
In another aspect of the invention, the image processing circuit includes a processor that modifies a motion vector associated with a portion of the first version of the first image. The processor then identifies a portion of the second image to which the modified motion vector points, the second image having a different resolution than the first version of the first image. Next, the processor generates a portion of a second version of the first image from the identified portion of the second image, the second version of the first image having the same resolution as the second image.
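The modification of a motion vector so that it points into a differently sized reference can be illustrated by rescaling its position value by the resolution ratios g and h. This is a plain rescaling sketch under my own naming and rounding assumptions, not necessarily the patented modification; fractional results imply sub-pixel interpolation, which this sketch simply rounds away:

```python
from fractions import Fraction

def scale_motion_vector(mv, g, h):
    """Rescale a (frame, x, y) motion vector computed against a
    high-resolution reference so it points into a low-resolution
    reference: multiply the horizontal component by g and the
    vertical component by h, then round to whole pixels."""
    frame, x, y = mv
    return (frame, round(Fraction(x) * g), round(Fraction(y) * h))

# The MVZ example from the text, rescaled for g = 3/8, h = 1/2.
mv_z = (0, -10, -7)
scaled = scale_motion_vector(mv_z, Fraction(3, 8), Fraction(1, 2))
```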
Thus, such an image-processing circuit can decode a motion-predicted macroblock using a version of a reference frame whose resolution differs from that of the reference-frame version used to encode the macroblock. Such an image-processing circuit is therefore typically faster, simpler and cheaper than the prior-art circuits used to convert motion-predicted images.
Brief description of the drawings
Fig. 1 is a pixel diagram of a high resolution version and a low resolution version of an image.
Fig. 2 is a pixel diagram of the high resolution version and the low resolution version of the macroblock of fig. 1, respectively.
Fig. 3A is a diagram of a conventional pixel macroblock in an image.
Fig. 3B is a block of conventional pre-compression luminance values, which respectively correspond to the pixels in the macroblock of Fig. 3A.
Figs. 3C and 3D are blocks of conventional pre-compression chrominance values, which respectively correspond to the pixel groups in the macroblock of Fig. 3A.
Fig. 4 is a block diagram of a conventional MPEG encoder.
Fig. 5 is a block of luminance conversion values generated by the encoder of fig. 4, which correspond to the luminance pixel values of fig. 3B before compression, respectively.
Fig. 6 is a conventional zigzag sampling pattern, which may be performed by the quantizer and zigzag scanner of fig. 4.
Fig. 7 describes the concept of a conventional motion vector.
Fig. 8 is a block diagram of a conventional MPEG decoder.
Fig. 9 is a block of inverse transform values generated by the decoder of fig. 8, which correspond to the luminance transform values of fig. 5 and the luminance pixel values before compression of fig. 3B, respectively.
Fig. 10 is a block diagram of an MPEG decoder according to an embodiment of the present invention.
Fig. 11 illustrates a technique for converting a high resolution non-interlaced block of pixel values to a low resolution non-interlaced block of pixel values according to one embodiment of the invention.
Fig. 12 illustrates a technique for converting a high resolution interlaced block of pixel values to a low resolution interlaced block of pixel values according to one embodiment of the invention.
FIG. 13A shows the low resolution block of FIG. 11 overlaid with the high resolution block of FIG. 11 according to one embodiment of the invention.
Fig. 13B shows the low resolution block of fig. 11 overlaid with the high resolution block of fig. 11 according to another embodiment of the invention.
Fig. 14 shows the low resolution block of fig. 12 overlaid with the high resolution block of fig. 12 according to one embodiment of the invention.
Fig. 15A shows a sub-set of transform values for directly down-converting the high resolution block of fig. 11 to the low resolution block of fig. 11, in accordance with one embodiment of the present invention.
Fig. 15B shows a sub-set of transform values for directly down-converting the high resolution block of fig. 12 to the low resolution block of fig. 12, in accordance with one embodiment of the present invention.
Fig. 16 shows a series of one-dimensional IDCT calculations instead of one two-dimensional IDCT calculation associated with the subset of transformed values in fig. 15A.
FIG. 17 illustrates a motion decoding technique according to one embodiment of the invention.
Fig. 10 is a block diagram of an image decoder and processing circuit 110 according to one embodiment of the invention. The circuit 110 includes: a buffer 112 for receiving and storing the encoded high resolution versions of respective images; a variable length decoder 114 for receiving the encoded image data from the buffer 112 and separating the blocks of data representing an image from the control data accompanying the image data; a state controller 116 for receiving the control data and providing, on lines 118, 120 and 122 respectively, a signal indicating whether the image being decoded is interlaced or non-interlaced, a signal indicating whether the block currently being decoded is motion predicted, and the motion vector to be decoded; a transform value selection and inverse zigzag circuit 124 for selecting the needed transform values from each image block and scanning them according to the desired inverse zigzag pattern (alternatively, another circuit such as the decoder 114 may perform the inverse zigzag scan); an inverse quantizer 126 for dequantizing the selected transform values; and an inverse DCT and subsampler circuit 128 for directly converting the dequantized transform values of the high resolution version of an image into pixel values of a low resolution version of the same image.
For an I-block being decoded, the pixel values subsampled by the circuit 128 are passed through the adder 130 to the image buffer 132, which stores the low resolution version of the decoded image.
For blocks that are motion predicted, the motion vector scaling circuit 134 scales the motion vectors from the state controller 116 to the resolution of the low resolution version of the image stored in the buffer 132. The motion compensation circuit 136 determines the values of the pixels in the matching macroblock, which is stored in the buffer 132 and pointed to by the scaled motion vector. In response to the signal on line 120, the switch 137 couples the pixel values from the circuit 136 to the adder 130, which adds them to the respective decoded and subsampled residuals from the circuit 128. The resulting sums are the pixel values of the decoded macroblock, which are stored in the frame buffer 132. The frame buffer 132 stores the low resolution version of the decoded image in display order and provides this low resolution version to the HDTV receiver/display 138.
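The data path just described can be sketched in code. This is a hypothetical sketch only: the helper functions are trivial stand-ins for circuits 124, 126, 128 and 130 of Fig. 10, and every name is illustrative rather than taken from the patent.

```python
# Hypothetical sketch of the decode data path of circuit 110 (Fig. 10).
# The helpers are trivial stand-ins, not the patent's algorithms; they
# only show how the stages are chained together.

def select_low_frequency(block8x8):
    """Stand-in for circuit 124: keep only the 4 x 4 low-frequency
    corner of an 8 x 8 transform block (see Fig. 15A)."""
    return [row[:4] for row in block8x8[:4]]

def dequantize(coeffs, qscale=1):
    """Stand-in for inverse quantizer 126 (uniform scaling only)."""
    return [[c * qscale for c in row] for row in coeffs]

def idct_and_subsample(coeffs):
    """Stand-in for circuit 128 (identity here; really a direct IDCT
    plus subsampling, as developed later in the text)."""
    return coeffs

def decode_block(block8x8, prediction=None):
    """Adder 130: an I-block passes straight through; a motion-predicted
    block adds the motion-compensated pixels (circuit 136) to the
    decoded residual."""
    residual = idct_and_subsample(dequantize(select_low_frequency(block8x8)))
    if prediction is None:
        return residual
    return [[r + p for r, p in zip(rrow, prow)]
            for rrow, prow in zip(residual, prediction)]
```

The sketch makes the branch on line 120 explicit: with no prediction the residual is the decoded block; otherwise the two are summed element by element.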
Fig. 11 illustrates the resolution reduction performed on non-interlaced images by the IDCT and subsampler circuit 128 of Fig. 10, in accordance with one embodiment of the present invention. Although the circuit 128 directly converts the encoded high resolution version of the non-interlaced image into the decoded low resolution version of the image, for clarity Fig. 11 depicts this reduction in resolution in the pixel domain. More specifically, an 8 x 8 block 140 of pixels P from the high resolution version of the image is down-converted to a 4 x 3 block 142 of sub-sampled pixels S. Thus, in this example, the horizontal resolution of block 142 is 3/8 of the horizontal resolution of block 140, and the vertical resolution of block 142 is 1/2 of the vertical resolution of block 140. The value of sub-sampled pixel S00 in block 142 is determined by a weighted combination of the values of the pixels P in sub-block 144 of block 140. That is, S00 is a weighted combination of w00P00, w01P01, w02P02, w03P03, w10P10, w11P11, w12P12 and w13P13, where w00-w13 are the respective weights of the values of P00-P13. The calculation of the weights w is discussed below in connection with Figs. 13A and 13B. Similarly, the value of sub-sampled pixel S01 is determined by a weighted combination of the values of the pixels P in sub-block 146, the value of sub-sampled pixel S02 is determined by a weighted combination of the values of the pixels P in sub-block 148, and so on. In addition, although the blocks 140 and 142 and the sub-blocks 144, 146 and 148 are shown having particular dimensions, they may have other dimensions in other embodiments of the invention.
Fig. 12 depicts the resolution reduction performed on interlaced images by the IDCT and subsampler circuit 128 of Fig. 10, according to one embodiment of the present invention. Although the circuit 128 directly converts the encoded high resolution version of the interlaced image into the decoded low resolution version of the image, for clarity Fig. 12 depicts this reduction in resolution in the pixel domain. More specifically, an 8 x 8 block 150 of pixels P from the high resolution version of the image is down-converted to a 4 x 3 block 152 of sub-sampled pixels S. Thus, in this example, the horizontal resolution of block 152 is 3/8 of the horizontal resolution of block 150, and the vertical resolution of block 152 is 1/2 of the vertical resolution of block 150. The value of sub-sampled pixel S00 in block 152 is determined by a weighted combination of the values of the pixels P in sub-block 154 of block 150. That is, S00 is a weighted combination of w00P00, w01P01, w02P02, w03P03, w20P20, w21P21, w22P22 and w23P23, where w00-w23 are the respective weights of the values of P00-P23. Similarly, the value of sub-sampled pixel S01 is determined by a weighted combination of the values of the pixels P in sub-block 156, the value of sub-sampled pixel S02 is determined by a weighted combination of the values of the pixels P in sub-block 158, and so on. In addition, although the blocks 150 and 152 and the sub-blocks 154, 156 and 158 are shown having particular dimensions, they may have other dimensions in other embodiments of the invention.
Fig. 13A shows the low resolution block 142 of Fig. 11 overlaid on the high resolution block 140 of Fig. 11, according to one embodiment of the invention. Block boundary 160 is the boundary common to the overlapping blocks 140 and 142; the sub-sampled pixels S are marked as X's and the pixels P as dots. The sub-sampled pixels S are spaced apart from each other by a horizontal distance Dsh and a vertical distance Dsv, both within the block boundary 160 and across it. Similarly, the pixels P are spaced apart from each other by a horizontal distance Dph and a vertical distance Dpv. In the example shown, Dsh = 8/3(Dph) and Dsv = 2(Dpv). Because S00 is horizontally aligned with, and thus horizontally closest to, the pixels P01 and P11, the values of these pixels are weighted more heavily than the values of the pixels P00, P10, P02, P12, P03 and P13, which are horizontally farther from S00. In addition, because S00 lies midway between row 0 of pixels P (i.e., P00, P01, P02 and P03) and row 1 (i.e., P10, P11, P12 and P13), all the pixels P of rows 0 and 1 are weighted equally in the vertical direction. For example, in one embodiment, the weights of pixels P00, P02, P03, P10, P12 and P13 are w = 0, so that they contribute nothing to the value of S00, and the values of P01 and P11 are averaged to obtain the value of S00. The values of S01 and S02 may be calculated in a similar manner using the weights of the pixels P in sub-blocks 146 and 148 (Fig. 11), respectively. But because the sub-sampled pixels S00, S01 and S02 are located at different horizontal positions within their respective sub-blocks 144, 146 and 148, the sets of weights w used to calculate the values of S00, S01 and S02 also differ. The values of the remaining sub-sampled pixels S are calculated in a similar manner.
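The weighting just described can be illustrated numerically. The sketch below is illustrative only, using the example weights in which only P01 and P11 contribute to S00.

```python
# Illustrative sketch of the Fig. 13A example: S00 is horizontally
# aligned with P01/P11 and vertically midway between rows 0 and 1, so
# with the example weights only column 1 contributes.

def subsample_s00(rows):
    """rows: rows 0 and 1 of sub-block 144, each 4 pixel values wide.
    The horizontal weights are the text's example (w = 0 except for
    column 1); the factor 1/2 averages the two equally weighted rows."""
    w = [0.0, 1.0, 0.0, 0.0]   # w00..w03 = w10..w13 in this example
    return 0.5 * sum(w[i] * (rows[0][i] + rows[1][i]) for i in range(4))
```

With rows [[10, 20, 30, 40], [14, 24, 34, 44]], the result is the average of 20 and 24, i.e., 22.0.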
Fig. 13B shows the low resolution block 142 of Fig. 11 overlaid on the high resolution block 140 of Fig. 11, according to another embodiment of the invention. The main difference between the overlays of Figs. 13A and 13B is that in Fig. 13B the sub-sampled pixels S are shifted to the left in the horizontal direction from their positions in Fig. 13A. Because of this shift, the pixel weights w differ from those used in Fig. 13A. Except for this difference in weights, however, the values of the sub-sampled pixels S are calculated in a manner similar to that described above in connection with Fig. 13A.
Fig. 14 shows the low resolution block 152 of Fig. 12 overlaid on the high resolution block 150 of Fig. 12, according to one embodiment of the invention. The sub-sampled pixels S occupy the same horizontal positions as in Fig. 13A, and therefore their horizontal weights are exactly the same as in Fig. 13A. However, because the pixels P are interlaced, the sub-sampled pixel S00 does not lie midway between row 0 of sub-block 154 (i.e., P00, P01, P02 and P03) and row 2 (i.e., P20, P21, P22 and P23). The pixels P of row 0 are therefore weighted more heavily than the respective pixels P of row 2. For example, in one embodiment, the weights of pixels P00, P02, P03, P20, P22 and P23 are w = 0, so that they contribute nothing to the value of S00, and the value of P01 is weighted more heavily than the value of P21. For example, the value of S00 may be calculated from the values of P01 and P21 by straight-line interpolation, i.e., bilinear filtering.
The techniques described above in connection with Figs. 13A, 13B and 14 may be used to calculate both the luminance and the chrominance values of the sub-sampled pixels S.
Referring to Figs. 10 and 15A, the variable length decoder 114 provides a block 160 of transform values (shown as dots) representing an encoded non-interlaced image to the selection and inverse zigzag circuit 124. The circuit 124 selects, and the decoder uses, only the sub-block 162 of transform values to generate the values of the non-interlaced sub-sampled pixels S of Figs. 11, 13A and 13B. Because the circuit 110 decodes and down-converts the received image to a lower resolution form, the inventors have discovered that much of the encoded information, i.e., many of the transform values, can be discarded before the inverse DCT and subsampler circuit 128 decodes and down-converts the encoded macroblock. Discarding this information greatly reduces the processing power and processing time that the decoder 110 needs to decode and down-convert the encoded image. More specifically, the low resolution version of the image lacks the detail of the high resolution version, and the detail of an image block is represented by the higher-frequency transform values in the corresponding transform block. These higher-frequency transform values are located in the lower right portion of the transform block; conversely, the lower-frequency transform values are located in the upper left quadrant, which corresponds to sub-block 162. Thus, by keeping the 16 lower-frequency transform values in sub-block 162 and discarding the remaining 48 higher-frequency transform values in block 160, the circuit 128 does not waste processing power and processing time incorporating the higher-frequency transform values into its decoding and down-conversion algorithms. And because these discarded higher-frequency transform values contribute little or nothing to the quality of the low resolution version of the decoded image, discarding them has little or no effect on that quality.
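One way to see which entries of the coefficient stream the circuit 124 must pick out is to enumerate the zig-zag scan of Fig. 6 and mark the positions that fall in the upper left 4 x 4 quadrant. The sketch below assumes the conventional MPEG zig-zag order (an assumption about the scan used); it shows that all 16 retained coefficients lie within the first 25 scanned values.

```python
# Sketch: which positions of the conventional MPEG zig-zag scan of an
# 8 x 8 block (Fig. 6) land in the upper-left 4 x 4 quadrant (sub-block
# 162 of Fig. 15A).

def zigzag_positions(n=8):
    """(row, col) pairs of an n x n block in zig-zag scan order."""
    order = []
    for d in range(2 * n - 1):                     # the anti-diagonals
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        # odd diagonals run top-to-bottom, even ones bottom-to-top
        order.extend(cells if d % 2 else reversed(cells))
    return order

kept = [i for i, (r, c) in enumerate(zigzag_positions()) if r < 4 and c < 4]
```

`kept` has 16 entries and its largest scan index is 24, so the 16 low-frequency values of sub-block 162 are all found within the first 25 coefficients of the scan.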
Fig. 15B shows a block 164 of transform values representing an encoded interlaced image and a sub-block 166 of transform values, which the circuit 124 uses to generate the values of the interlaced sub-sampled pixels S of Figs. 12 and 14. The inventors have found that the transform values in sub-block 166 give good decoding and down-conversion results. Because the sub-block 166 is not square, the inverse zigzag scan pattern of the circuit 124 may be modified so that the circuit 124 scans the transform values of sub-block 166 into a square matrix form such as a 4 x 4 matrix.
Referring to Figs. 10-15B, the mathematical details of the decoding and sub-sampling algorithms performed by the decoder 110 are discussed below. For purposes of example, these algorithms are discussed as operating on a sub-block of the non-interlaced block 57 of luminance values Y (Fig. 5), where the sub-block is the same as the sub-block 162 of Fig. 15A.
For an 8 x 8 block of transform values f(u, v), the inverse DCT (IDCT) is:

F(x, y) = (1/4) Σ(u=0 to 7) Σ(v=0 to 7) Cu · Cv · f(u, v) · cos[(2x+1)uπ/16] · cos[(2y+1)vπ/16]    (1)

where F(x, y) is the IDCT value, i.e., the pixel value, at the x, y position of the 8 x 8 IDCT matrix. The constants Cu and Cv have known values whose particulars are unimportant to this discussion. Equation (1) can be written in matrix form as:

P(x, y) = D(x, y) * YDCT    (2)

where P(x, y) is the pixel value being calculated, the matrix YDCT is the block of transform values YDCT(u, v) corresponding to the block of decoded pixel values to which P(x, y) belongs, and the matrix D(x, y) is a matrix of constant coefficients representing all the terms of equation (1) other than the transform values f(u, v). Each pixel value P(x, y) can therefore be obtained from equation (2). YDCT remains the same, while D(x, y), being a function of x and y, differs for each pixel value P calculated.
The one-dimensional IDCT algorithm is represented by the following equation:

F(x) = (1/2) Σ(u=0 to 7) Cu · f(u) · cos[(2x+1)uπ/16]    (3)

where F(x) is a single row of inverse-transformed values and f(u) is a single row of transform values. In matrix form, equation (3) can be written as:

P = D · YDCT    (4)

where each decoded pixel value P is equal to the inner product of the row of transform values YDCT0-YDCT7 with a row of the matrix D. That is, for example, P0 = [YDCT0, ..., YDCT7] · [D00, ..., D07], and so on. Thus, in general, in the one-dimensional case, the pixel value Pi can be derived according to the following formula:
Pi = YDCT · Di    (5)

where Di is the i-th row of the matrix D of equation (4). Now, as described above in connection with Fig. 11, the values of a number of pixels in the first and second rows of sub-block 144 are combined to produce the sub-sampled pixel S00. For the moment, however, assume that only row 0 of pixels P exists, and that only one row of sub-sampled pixels S0, S1 and S2 is to be calculated. Applying the one-dimensional IDCT of equations (4) and (5) to a single row, e.g., row 0, gives:

SZ = Σ(i=0 to n) wi · Pi    (6)

where SZ is a sub-sampled pixel value, wi is the weight of the value of pixel Pi, and i = 0-n identifies the particular pixels P within the row that contribute to the value of SZ. For example, still assuming that only row 0 of pixels P of sub-block 144 exists, then for S0 we obtain:

S0 = w0·P0 + w1·P1 + w2·P2 + w3·P3    (7)

where, for i = 0-3, Pi equals P0, P1, P2 and P3, respectively. Substituting equation (5) for Pi, we obtain:

SZ = YDCT · RZ, where RZ = Σ(i=0 to n) wi · Di    (8)

We have therefore derived a one-dimensional formula that relates the sub-sampled pixel value SZ directly to the matrix YDCT of corresponding one-dimensional transform values and the rows of coefficients Di. That is, the formula eliminates the need to first calculate the values Pi in order to calculate the value of SZ.
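Equation (8) can be checked numerically. The sketch below assumes the standard one-dimensional IDCT basis with C0 = 1/√2 and Cu = 1 otherwise (an assumption; the text leaves the constants unspecified), and verifies that folding the weights into R gives the same result as decoding the pixels first.

```python
import math

N = 8
C = [1 / math.sqrt(2)] + [1.0] * (N - 1)   # assumed IDCT normalization

def D_row(i):
    """Row i of the one-dimensional IDCT coefficient matrix D (eq. (4))."""
    return [0.5 * C[u] * math.cos((2 * i + 1) * u * math.pi / (2 * N))
            for u in range(N)]

def s_direct(Y, w):
    """Equation (8): S = YDCT . R with R = sum_i w_i D_i; the pixel
    values P_i are never formed."""
    R = [sum(w[i] * D_row(i)[u] for i in range(len(w))) for u in range(N)]
    return sum(y * r for y, r in zip(Y, R))

def s_via_pixels(Y, w):
    """Reference path: decode pixels with eq. (5), then weight them as
    in eqs. (6)-(7)."""
    P = [sum(y * d for y, d in zip(Y, D_row(i))) for i in range(len(w))]
    return sum(wi * pi for wi, pi in zip(w, P))
```

Because both paths are linear in the transform values, they agree for any YDCT and any set of weights.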
Now, referring to two-dimensional equations (1) and (2), equation (5) can be generalized to two dimensions as follows:
Px,y = Dx,y * YDCT = Dx,y(0,0)·YDCT(0,0) + ... + Dx,y(7,7)·YDCT(7,7)    (9)

where the asterisk denotes the inner product of the matrices, i.e., each element of the matrix Dx,y is multiplied by the corresponding element of the matrix YDCT, and the sum of these products equals the value of Px,y. Equation (8) can likewise be generalized to two dimensions as follows:
thus, the matrix RYZIs a matrix D of weighting factors from i =0-niThe sum of (1). For example, referring again to FIG. 11, the sub-sampled pixel S00The value of (d) can be given by:wherein i =0-7 corresponds to the value P, respectively00,P01,P02,P03,P10,P11,P12And P13. Thus, the circuit 124 of FIG. 10 can calculate the subsampled pixel S directly from the transformed values and the associated transform coefficient matrix00The value of (c). Therefore, the circuit 124 does not need to perform this intermediate step of conversion to the pixel value P.
Equation (11) can be simplified further because, as described above in connection with Fig. 15A, only the 16 transform values in sub-block 162 are used. Because an inner product is being taken, the matrix RYZ needs only the 16 elements corresponding to the 16 transform values in sub-block 162. This reduces the amount of computation and processing time to approximately 1/4 of what would otherwise be required.
Because in the above example the matrices RYZ and YDCT each have 16 significant elements, the processor can perform the inner-product calculation by treating each matrix as a one-dimensional array of 16 elements. Alternatively, if the processing circuitry handles one-dimensional vectors of four elements more efficiently, the matrices RYZ and YDCT can each be split into four one-dimensional four-element vectors, so that the value of the sub-sampled pixel SYZ can be calculated as the sum of four inner products. As discussed above in connection with Fig. 15B, the inverse zigzag scanning algorithm of the circuit 124 of Fig. 10 may be modified to place the selected transform values into an efficient matrix form for an interlaced image, or for any sub-block of transform values that does not initially form such a matrix.
Referring to Fig. 16, in another embodiment of the invention, the values of the sub-sampled pixels SYZ are calculated using a series of one-dimensional IDCT calculations instead of a single two-dimensional calculation. More specifically, Fig. 16 depicts such a series of one-dimensional IDCT calculations performed on the sub-block 162 of transform values. This technique may also be used for other sub-blocks of transform values, such as the sub-block 166 of Fig. 15B. Because the general principles of such one-dimensional techniques are well known, they are not discussed further.
Next, according to one embodiment of the invention, the calculation of the weights w discussed above in connection with Figs. 11 and 13A is described. As explained in connection with Fig. 13A, because the sub-sampled pixels S00-S02 lie exactly midway between the first and second rows of pixels P, the weights w of the pixel values in the first row are the same as the weights w of the respective pixel values in the second row. Thus, for the 8 pixel values in sub-block 144, only 4 weights w need to be calculated. To calculate these weights, one embodiment uses a 4-tap (one tap for each of four pixel values) Lagrangian interpolator having fractional delays of 1, 1-2/3 and 1-1/3, corresponding respectively to the sub-sampled pixel values S00-S02. In one embodiment, the weights w are given by:

w0 = -1/6(d-1)(d-2)(d-3)    (12)
w1 = 1/2(d)(d-2)(d-3)    (13)
w2 = -1/2(d)(d-1)(d-3)    (14)
w3 = 1/6(d)(d-1)(d-2)    (15)
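Equations (12)-(15) are the standard 4-tap Lagrange interpolation weights for taps at integer positions 0-3; a direct transcription:

```python
def lagrange_weights(d):
    """4-tap Lagrangian interpolator weights of equations (12)-(15),
    for a fractional delay d measured from the first tap."""
    w0 = -(d - 1) * (d - 2) * (d - 3) / 6
    w1 = d * (d - 2) * (d - 3) / 2
    w2 = -d * (d - 1) * (d - 3) / 2
    w3 = d * (d - 1) * (d - 2) / 6
    return [w0, w1, w2, w3]
```

At the integer delay d = 1 only w1 survives, matching the observation that S00 takes its value from P01 and P11 alone; and for any delay d the four weights sum to 1, as an interpolating filter's weights must.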
Referring to Fig. 13A, the first two delays, 1 and 1-2/3, correspond to the sub-sampled pixel values S00 and S01. More specifically, these delays give the positions of the sub-sampled pixels S00 and S01 relative to the first, i.e., leftmost, pixels P of the respective sub-blocks 144 and 146 (Fig. 11). For example, because S00 is aligned with P01 and P11, it is spaced a horizontal distance of 1 pixel interval Dph from the first pixels P00 and P10 of sub-block 144. Thus, when the delay value 1 is substituted into equations (12)-(15), the only nonzero weight is w1, which corresponds to the pixel values P01 and P11. This makes sense, because the pixel S00 is aligned directly with the pixels P01 and P11, so the weights of the other pixels P can be set to zero. Similarly, referring to Figs. 11 and 13A, the sub-sampled pixel S01 is spaced a horizontal distance of 1-2/3 pixel intervals Dph from the first pixels P02 and P12 of sub-block 146. Because the pixel S01 is not aligned with any of the pixels P, none of the weights w equals 0. Thus, for the sub-sampled pixel S01, w0 is the weight of the values of pixels P02 and P12, w1 is the weight of the values of P03 and P13, w2 is the weight of the values of P04 and P14, and w3 is the weight of the values of P05 and P15.
In one embodiment, the delay for the sub-sampled pixel S02 is calculated differently than the delays for S00 and S01. To make the Lagrangian filter closer to ideal, the delay for S02 is preferably 1-1/3. By contrast, if the delay were calculated in the same way as for S00 and S01, it would be 2-1/3, because S02 is spaced 2-1/3 pixel intervals Dph from the first pixels P04 and P14 of sub-block 148. To be able to use the more nearly ideal delay of 1-1/3, the delay is calculated as if P05 and P15 were the first pixels of sub-block 148, and two dummy pixels P08 and P18 are added and given the same values as P07 and P17, respectively. The weight functions w0-w3 then correspond to the pixels P05 and P15; P06 and P16; P07 and P17; and the dummy pixels P08 and P18, respectively. Although this technique for calculating the delay for S02 may be less accurate than using the delay 2-1/3, the increase in Lagrangian filter efficiency obtained by using the delay 1-1/3 compensates for this potential inaccuracy.
In addition, as described above, because the sub-sampled pixels S00-S02 all lie exactly midway between rows 0 and 1 of the pixels P, a factor of 1/2 may be included in each weight to effectively average the values of the pixels P in rows 0 and 1. Of course, if the sub-sampled pixels S00-S02 were not exactly midway between rows 0 and 1 of the pixels P, a second Lagrangian filter could be applied in the vertical direction in a manner similar to that described above for the horizontal direction. Alternatively, the horizontal and vertical Lagrangian filters may be combined into a single two-dimensional Lagrangian filter.
Referring to Figs. 12 and 14, for the interlaced block 150, the sub-sampled pixels S00-S02 are located 1/4 of the way down between rows 0 and 2 of the pixels in the vertical direction. Thus, in addition to being multiplied by the respective weight functions, the values of the pixels P in the respective sub-blocks may be bilinearly weighted in the vertical direction, i.e., the pixel values in row 0 weighted by 3/4 and the pixel values in row 2 weighted by 1/4, to account for the uneven vertical alignment. Alternatively, if the sub-sampled pixels S and the pixels P are not in a fixed vertical alignment from block to block, a Lagrangian filter may be used in the vertical direction as well.
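The 3/4-1/4 vertical blending for the interlaced case can be sketched as follows (illustrative only; the function name is hypothetical):

```python
def interlaced_vertical_blend(row0, row2, frac=0.25):
    """Bilinear vertical weighting for the interlaced case of Fig. 14:
    the sub-sampled pixel sits frac = 1/4 of the way from row 0 toward
    row 2, so row 0 is weighted 3/4 and row 2 is weighted 1/4."""
    return [(1 - frac) * a + frac * b for a, b in zip(row0, row2)]
```

For example, interlaced_vertical_blend([8, 16], [16, 32]) yields [10.0, 20.0].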
The techniques described above for calculating the subsampled pixel S values may be used to calculate the luminance and chrominance values of the pixel S.
Referring to Fig. 17, the motion compensation performed by the decoder 110 of Fig. 10 is discussed according to one embodiment of the invention. For purposes of example, assume that the encoded version of the image is non-interlaced and includes 8 x 8 blocks of transform values, and that the circuit 128 of Fig. 10 decodes and down-converts these encoded blocks into 4 x 3 blocks of sub-sampled pixels S, such as the block 142 of Fig. 11. Also assume that the encoded motion vectors have a resolution of 1/2 pixel in the horizontal direction and 1/2 pixel in the vertical direction. Because the low resolution version of the image has 3/8 the horizontal resolution and 1/2 the vertical resolution of the high resolution version, the scaled motion vectors from the circuit 134 (Fig. 10) have a horizontal resolution of 3/8 x 1/2 = (3/16)Dsh and a vertical resolution of 1/2 x 1/2 = (1/4)Dsv. Thus, the delay of the horizontal component is a multiple of 1/16, and the delay of the vertical component is a multiple of 1/4. Further assume that the encoded motion vector has a value of 2.5 in the horizontal direction and 1.5 in the vertical direction. The scaled motion vector in this example is therefore equal to 2-1/2 x 3/8 = 15/16 in the horizontal direction and 1-1/2 x 1/2 = 3/4 in the vertical direction. This scaled motion vector points to the matching macroblock 170, whose pixels S are represented by X's.
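The scaling arithmetic of this example can be reproduced exactly with rational numbers (a sketch; the function name is illustrative):

```python
from fractions import Fraction

def scale_motion_vector(mv_h, mv_v,
                        scale_h=Fraction(3, 8), scale_v=Fraction(1, 2)):
    """Scale an encoded motion vector (in high-resolution pixel units)
    to the low-resolution grid, as the motion vector scaler 134 does
    in the worked example (3/8 horizontally, 1/2 vertically)."""
    return mv_h * scale_h, mv_v * scale_v
```

For example, scale_motion_vector(Fraction(5, 2), Fraction(3, 2)) returns (Fraction(15, 16), Fraction(3, 4)), i.e., the 15/16 and 3/4 values of the worked example.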
But the pixels of block 170 are not aligned with the pixels S (represented by dots) of the reference macroblock 172. The reference block 172 is larger than the matching block 170 so that it includes the entire area in which the block 170 can fall. For example, the pixel S00 of block 170 can fall on the reference pixels SI, SJ, SM and SN, or anywhere in between. Thus, in a manner similar to the processing of the pixels S of the block 142 (Fig. 11) described above, each pixel S of the matching block 170 is calculated from the weighted values of the respective pixels S in a filter block 174, which includes the blocks 170 and 172. In the illustrated embodiment, each pixel S of the block 170 is calculated from a sub-block of 4 x 4 = 16 pixels in the filter block 174. For example, the value of S00 is calculated from the weighted values of the 16 pixels in the sub-block 176 of the filter block 174.
In one embodiment, a four-tap polyphase finite-impulse-response (FIR) filter (e.g., a Lagrangian filter) with one delay per (1/16)Dsh is used in the horizontal direction, and a four-tap FIR filter with one delay per (1/4)Dsv is used in the vertical direction. The two filters can thus be viewed as combining into a set of 16 x 4 = 64 two-dimensional filters, one for each pair of horizontal and vertical phases. In this example, the pixel S00 is located a horizontal distance of (1-15/16)Dsh from the first column of pixels (i.e., Sa, Sh, Sl and Sq) in the sub-block 176, and the horizontal contributions to the respective weights w may be calculated in a manner similar to that described above in connection with Fig. 13A. Likewise, the pixel S00 is located a vertical distance of (1-3/4)Dsv from the first row of pixels (i.e., Sa-Sd) in the sub-block 176, and the vertical contributions to the respective weights w are calculated in a manner similar to that used for the horizontal contributions. The horizontal and vertical contributions are then combined to obtain, for each pixel in the sub-block 176, a weight associated with S00, from which the value of S00 can be calculated. The values of the other pixels S in the matching block 170 are calculated in a similar manner. For example, the value of the pixel S01 is calculated using the weighted values of the pixels in sub-block 178, and the value of the pixel S10 is calculated using the weighted values of the pixels in sub-block 180.
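Because the filtering is separable, the 4 x 4 weight matrix for a sub-block such as 176 is simply the outer product of the horizontal and vertical 4-tap weight vectors. A sketch, assuming Lagrangian weights (equations (12)-(15)) in both directions:

```python
def lagrange_weights(d):
    """4-tap Lagrangian weights (equations (12)-(15)) for delay d."""
    return [-(d - 1) * (d - 2) * (d - 3) / 6,
            d * (d - 2) * (d - 3) / 2,
            -d * (d - 1) * (d - 3) / 2,
            d * (d - 1) * (d - 2) / 6]

def separable_weights(dh, dv):
    """Combine one horizontal and one vertical filter phase into a 4 x 4
    weight matrix (outer product), one matrix per phase pair."""
    wh, wv = lagrange_weights(dh), lagrange_weights(dv)
    return [[v * h for h in wh] for v in wv]
```

For the example phases of S00 (horizontal delay 1-15/16, vertical delay 1-3/4), the 16 combined weights sum to 1, as an interpolating filter's weights must.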
Thus, the pixel values of all the motion-compensated pixels S00-S75 of the matching block 170 can be calculated using a total of 552 multiply-and-accumulate operations (MACs): 4 MACs per pixel x 6 pixels per row x 11 rows (in the filter block 174) = 264 MACs for the horizontal filtering, and 4 MACs per pixel x 8 pixels per column x 9 columns = 288 MACs for the vertical filtering. With vector image processing circuitry that operates on 1 x 4 vectors, the horizontal filtering can be decomposed into 264/4 = 66 1 x 4 inner products, and the vertical filtering into 288/4 = 72 1 x 4 inner products.
Referring to Figs. 10 and 17, once the motion compensation circuit 136 has calculated the values of the pixels in the matching block 170, the adder 130 adds these pixel values to the respective residuals from the inverse DCT and subsampler circuit 128 to generate the decoded macroblock of the low resolution version of the image. The decoded macroblock is then provided to the frame buffer 132 for display on the HDTV receiver/display 138. If the decoded macroblock is part of a reference frame, it may also be provided to the motion compensation circuit 136 for use in decoding another motion-predicted macroblock.
Motion decoding of the chrominance values of the pixels may be performed in the same manner as described above. Alternatively, because the human eye is less sensitive to variations in color than to variations in luminance, good results may still be achieved using bilinear filtering instead of the more complex Lagrangian technique described above.
In addition, as described above in connection with Fig. 7, some motion-predicted macroblocks have motion vectors that point to respective matching blocks in different frames. In this case, the values of the pixels in each matching block are calculated as described above in connection with Fig. 17, and the sets of values are averaged before the residuals are added to generate the decoded macroblock. Alternatively, processing time and bandwidth can be reduced by using only one matching block to decode the macroblock. It has been found that this approach can produce frames of acceptable quality while greatly reducing decoding time.
From the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. For example, although the down-conversion of an image for display on a lower resolution display screen is discussed, the techniques described above have other applications; for example, they may be used to down-convert one image for display within another image, commonly called a picture-in-picture (PIP) display. In addition, although the decoder 110 of Fig. 10 is described as including a number of circuits, the functions of these circuits may be performed by one or more conventional or special-purpose processors, or by dedicated hardware.

Claims (49)

1. An image processing circuit comprising:
a processor for receiving an encoded portion of a first version of an image, the first version having a resolution;
and for directly converting the encoded portion into a decoded portion of a second version of the image, the second version having a resolution different from the resolution of the first version.
2. The image processing circuit of claim 1 wherein the resolution of the second version of the image is lower than the resolution of the first version of the image.
3. The image processing circuit of claim 1, wherein:
the encoded portion of the first version of the image is represented by transform values;
the decoded part of the second version of the image is represented by pixel values.
4. An image processing circuit comprising:
a processor for receiving a first set of transform values representing a portion of a first version of an image;
for selecting a second set of transform values from the first set, the second set having fewer transform values than the first set; and
for directly converting the second set of transform values into a first set of pixel values representing a portion of a second version of the image, the second version having fewer pixels than the first version.
5. The image processing circuit of claim 4 wherein each transform value in the first set comprises a respective discrete cosine transform value.
6. The image processing circuit of claim 4, wherein:
the first set of transform values comprises an 8 x 8 block of transform values, the block having 4 quadrants; and
the second set of transform values consists of transform values from one quadrant of the block.
7. The image processing circuit of claim 4, wherein:
the first set of transform values comprises an 8 x 8 block of transform values having an upper left quadrant; and
the second set of transform values consists of transform values from the upper left quadrant of the block.
8. The image processing circuit of claim 4, wherein:
the first set of transform values comprises a block of 8 rows by 8 columns of transform values; and
the second set of transform values consists of the first three transform values from each of the first four rows of the block and the first transform value from each of the last four rows of the block.
9. The image processing circuit of claim 4, wherein:
the image comprises one video frame;
the portions of the first form of the video frame are non-interlaced; and
the portions of the second form of the video frame are non-interlaced.
10. The image processing circuit of claim 4, wherein:
the image comprises one video frame;
portions of the first form of the video frame are interlaced; and
the portions of the second form of the video frame are interlaced.
11. The image processing circuit of claim 4, wherein:
the first form of the image is 1920 pixels wide by 1088 pixels high; and
the second form of the image is 720 pixels wide by 544 pixels high.
12. The image processing circuit of claim 4, wherein:
the first set of transform values represents a second set of pixel values, the second set representing the portion of the first version of the image; and
the processor is operable to convert the second set of transform values directly into the first set of pixel values by mathematically combining transform coefficients associated with the second set of pixel values.
13. The image processing circuit of claim 4, wherein:
the first set of transform values represents a second set of pixel values, the second set representing the portion of the first version of the image; and
the processor is operable to convert the second set of transform values directly into the first set of pixel values by:
weighting the transform coefficients associated with the second set of pixel values; and
mathematically combining the weighted transform coefficients.
14. The image processing circuit of claim 4, wherein:
the first set of transform values represents a second set of pixel values, the second set representing the portion of the first version of the image; and
the processor is operable to convert the second set of transform values directly into the first set of pixel values by:
weighting the transform coefficients associated with the second set of pixel values; and
accumulating the corresponding weighted transform coefficients.
15. The image processing circuit of claim 4, wherein:
each transform value in the first set of transform values comprises a discrete cosine transform value;
the first set of transform values represents a second set of pixel values, the second set representing the portion of the first version of the image; and
the processor is operable to convert the second set of transform values directly into the first set of pixel values by:
weighting the inverse discrete cosine transform coefficients associated with the second set of pixel values;
accumulating the corresponding weighted coefficients; and
mathematically combining the second set of transform values and the accumulated coefficients according to an inverse discrete cosine transform algorithm.
16. An image processing circuit comprising:
a processor operable to modify a motion vector associated with a portion of a first version of a first image;
to identify a portion of a second image to which the modified motion vector points, the second image having a different resolution than the first version of the first image; and
to generate a portion of a second version of the first image from the identified portion of the second image, the second version of the first image having the same resolution as the second image.
17. The image processing circuit of claim 16 wherein the second image has a lower resolution than the first version of the first image.
18. The image processing circuit of claim 16, wherein:
the motion vector is compatible with the first version of the first image; and
the processor is operable to modify the motion vector to be compatible with the second image.
19. An image processing circuit comprising:
a processor operable to modify a motion vector, associated with a portion of a first version of a first image, to be compatible with a second image having a different resolution than the first version of the first image;
to identify a portion of the second image to which the modified motion vector points;
to convert a first set of residuals representing the portion of the first version of the first image into a second set of residuals representing a portion of a second version of the first image, the second version having the same resolution as the second image; and
to mathematically combine the second set of residuals with pixel values representing the identified portion of the second image to generate pixel values representing the portion of the second version of the first image.
20. The image processing circuit of claim 19 wherein the second image and the second version of the first image have a lower resolution than the first version of the first image.
21. The image processing circuit of claim 19, wherein:
the second image and the second version of the first image have a lower resolution than the first version of the first image; and
the second set of residuals has fewer residuals than the first set of residuals.
22. The image processing circuit of claim 19 wherein the processor is operable to modify the motion vector by multiplying the motion vector by a scaling factor between the first version of the first image and the second image.
23. The image processing circuit of claim 19 wherein the modified motion vector has a resolution finer than 1/2 pixel in at least one dimension.
24. The image processing circuit of claim 19 wherein the processor is operable to calculate pixel values representing the identified portion of the second image.
25. The image processing circuit of claim 19 wherein the processor is operable to mathematically combine by adding each residual in the second set to a respective pixel value representing the identified portion of the second image.
26. The image processing circuit of claim 19 wherein the pixel values representing the identified portion of the second image correspond to interpolated pixels that are offset from actual pixels of the second image.
27. The image processing circuit of claim 19 wherein the processor is operable to convert the first set of residuals by:
selecting a first set of transform values from a second set of transform values representing the first set of residuals, the first set of transform values being smaller than the second set of transform values; and
directly converting the first set of transform values into the second set of residuals.
28. A method, comprising:
receiving an encoded portion of a first version of an image, the first version having a resolution;
directly converting the encoded portion into a decoded portion of a second version of the image, the second version having a resolution different from the resolution of the first version.
29. The method of claim 28, wherein the resolution of the first form of image is higher than the resolution of the second form of image.
30. The method of claim 28, wherein:
the receiving comprises receiving transform values representing the encoded portion of the first version of the image; and
the converting comprises converting the transform values into pixel values representing the decoded portion of the second version of the image.
31. A method, comprising:
receiving a first set of transform values representing a portion of a first version of an image;
selecting a second set of transform values from the first set, the second set being smaller than the first set; and
directly converting the second set of transform values into a first set of pixel values representing the portion of a second version of the image, the second version having fewer pixels than the first version.
32. The method of claim 31, wherein:
the image comprises a video frame;
the video frame portions of the first form are non-interlaced;
the video frame portions of the second form are non-interlaced;
the first set of transform values comprises an 8 x 8 block of transform values, the block having an upper left quadrant; and
the second set of transform values consists of transform values in the upper left quadrant of the block.
33. The method of claim 31, wherein:
the image comprises a video frame;
the video frame portions of the first form are interlaced;
the video frame portions of the second form are interlaced;
the first set of transform values comprises a block of 8 rows by 8 columns of transform values;
the second set of transform values consists of the first three transform values from each of the first four rows of the block and the first transform value from each of the last four rows of the block.
34. The method of claim 31, wherein:
the first set of transform values represents a second set of pixel values, the second set representing the portion of the first version of the image; and
the converting comprises mathematically combining the transform coefficients associated with each respective subgroup of the second set of pixel values to produce each pixel value of the first set.
35. The method of claim 31, wherein:
the first set of transform values represents a second set of pixel values, the second set representing the portion of the first version of the image; and
the converting comprises:
weighting the sets of transform coefficients associated with respective subgroups of the second set of pixel values; and
mathematically combining the weighted transform coefficients in each set of transform coefficients.
36. The method of claim 31, wherein:
the first set of transform values represents a second set of pixel values, the second set representing the portion of the first version of the image; and
the converting comprises:
weighting the sets of transform coefficients associated with respective subgroups of the second set of pixel values; and
calculating the sum of the corresponding weighted transform coefficients in each set of transform coefficients.
37. The method of claim 31, wherein:
each transform value in the first set of transform values comprises a discrete cosine transform value;
the first set of transform values represents a second set of pixel values, the second set representing the portion of the first version of the image; and
the converting comprises:
weighting each set of inverse discrete cosine transform coefficients associated with each respective subgroup of the second set of pixel values;
calculating the sum of the corresponding weighted coefficients in each set to produce sets of summed coefficients; and
mathematically combining the second set of transform values and the sets of summed coefficients according to an inverse discrete cosine transform algorithm.
38. A method, comprising:
modifying a motion vector associated with a portion of a first version of a first image;
identifying a portion of a second image to which the modified motion vector points, the second image having a different resolution than the first version of the first image; and
generating a portion of a second version of the first image from the identified portion of the second image, the second version of the first image having the same resolution as the second image.
39. The method of claim 38, wherein the second image has a lower resolution than the first version of the first image.
40. The method of claim 38, wherein the modifying comprises modifying the motion vector from being compatible with the first version of the first image to being compatible with the second image.
41. A method, comprising:
modifying a motion vector, associated with a portion of a first version of a first image, to be compatible with a second image having a different resolution than the first version of the first image;
identifying a portion of the second image to which the modified motion vector points;
converting a first set of residuals representing the portion of the first version of the first image into a second set of residuals representing a portion of a second version of the first image, the second version having the same resolution as the second image; and
mathematically combining the second set of residuals with pixel values representing the identified portion of the second image to generate pixel values representing the portion of the second version of the first image.
42. The method of claim 41, wherein the second image and the second version of the first image have a lower resolution than the first version of the first image.
43. The method of claim 41, wherein:
the second image and the second version of the first image have a lower resolution than the first version of the first image; and
the second set of residuals has fewer residuals than the first set of residuals.
44. The method of claim 41, wherein the modifying comprises multiplying the motion vector by a scaling factor between the first version of the first image and the second image.
45. The method of claim 41, wherein the modifying comprises modifying the motion vector to have a resolution finer than 1/2 pixel in at least one dimension.
46. The method of claim 41, further comprising calculating pixel values representative of the identified portion of the second image.
47. The method of claim 41, wherein the mathematically combining comprises summing each residual in the second set with a respective pixel value representing the identified portion of the second image.
48. The method of claim 41, further comprising computing pixel values representing the identified portion of the second image by interpolating between actual pixels of the second image.
49. The method of claim 41, wherein converting comprises:
selecting a first set of transform values from a second set of transform values representing the first set of residuals, the first set of transform values being smaller than the second set of transform values;
directly converting the first set of transform values into the second set of residuals.
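The motion-vector modification recited in claims 22-23 and 44-45 above — scaling the full-resolution vector by the resolution ratio, which can yield positions finer than 1/2 pixel — can be sketched as follows. The use of exact rational arithmetic to represent sub-pixel positions is an assumption for illustration:

```python
from fractions import Fraction

def scale_motion_vector(mv, src_width, dst_width):
    """Scale a (dx, dy) motion vector from the full-resolution image to
    the reduced-resolution image; the result may point to sub-pixel
    positions finer than 1/2 pixel."""
    ratio = Fraction(dst_width, src_width)
    return tuple(Fraction(c) * ratio for c in mv)

# 1920-pixel-wide source down-converted to 720 pixels wide: ratio 3/8.
print(scale_motion_vector((16, -8), 1920, 720))  # (Fraction(6, 1), Fraction(-3, 1))
print(scale_motion_vector((5, 3), 1920, 720))    # (Fraction(15, 8), Fraction(9, 8))
```

The second result, 15/8 = 1.875 pixels, lands between half-pixel positions, which is why the identified portion of the second image must be interpolated as in claims 26 and 48.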
HK01106901.4A 1998-06-19 1999-06-18 Decoding an encoded image having a first resolution directly into a decoded image having a second resolution HK1036864A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US60/089,832 1998-06-19

Publications (1)

Publication Number Publication Date
HK1036864A true HK1036864A (en) 2002-01-18


Similar Documents

Publication Publication Date Title
US6690836B2 (en) Circuit and method for decoding an encoded version of an image having a first resolution directly into a decoded version of the image having a second resolution
CN1162008C (en) Efficient down-conversion system for 2:1 sampling
CN1122413C (en) Down conversion system using pre-sampling filter
CN104811714B (en) Use the enhancing intraframe predictive coding of plane expression
US10735746B2 (en) Method and apparatus for motion compensation prediction
WO2001019091A2 (en) Circuit and method for modifying a region of an encoded image
CN1255018A (en) Upward sampling filter for down conversion system
JP4361987B2 (en) Method and apparatus for resizing an image frame including field mode encoding
KR100942475B1 (en) Image information encoding method and apparatus
US8588308B2 (en) Method and apparatus for low complexity video encoding and decoding
JP2000341695A (en) Device and method for providing expansion-decoded low- resolution video signal from encoded high-definition video signal
US7113646B2 (en) Decoding of predicted AC coefficient without division
US20070140351A1 (en) Interpolation unit for performing half pixel motion estimation and method thereof
US20020181587A1 (en) Video transcoder with up-sampling
HK1036864A (en) Decoding an encoded image having a first resolution directly into a decoded image having a second resolution
JP2008109700A (en) Digital signal conversion method and digital signal conversion apparatus
US7095785B2 (en) Determination of prediction direction in MPEG-4
WO2026012058A1 (en) Methods and apparatus of matrix weighted intra prediction for chroma component in video coding
JP2002359851A (en) Image prediction coding apparatus and method
HK1026099A (en) System for deriving a decoded reduced-resolution video signal from a coded high-definition video signal
HK1019534A (en) Upsampling filter and half-pixel generator for an hdtv downconversion system