HK1067266A

HK1067266A - Interframe encoding method and apparatus

Info

Publication number: HK1067266A
Application number: HK04110243.0A
Authority: HK
Inventors: A．C．厄维尼; V．R．拉维恩德兰
Original assignee: 高通股份有限公司
Priority date: 2001-06-07
Filing date: 2002-06-06
Publication date: 2005-04-01

Description

Method and device for interframe coding

Technical Field

The present invention relates to digital signal processing, and more particularly to a lossless method of encoding digital image information.

Background

Digital image processing holds a prominent position within the general discipline of digital signal processing. The importance of human visual perception has encouraged tremendous interest and advances in the art and science of digital image processing. In the field of transmission and reception of video signals, such as those used for projecting films or movies, various improvements have been made to image compression techniques. Many current and proposed video systems employ digital encoding techniques. Aspects of this field include image coding, image restoration, and image feature selection. Image coding is the attempt to transmit pictures over a digital communication channel in an efficient manner, using as few bits as possible to reduce the required bandwidth while keeping distortion within certain limits. Image restoration is the effort to recover the true image of the object. A coded image transmitted over a communication channel may be distorted by various factors, and the degradation may originate in the creation of the image from the object. Feature selection refers to the selection of certain attributes of the picture. Such attributes may be required for recognition, classification, and decision making in a wider context.

Digital encoding of video, such as in digital cinema, is an area that benefits from improved image compression techniques. Digital image compression can generally be divided into two categories: lossless methods and lossy methods. A losslessly compressed image is restored without any loss of information. A lossy method involves some irrecoverable loss of information, the amount of which depends on the compression ratio, the quality of the compression algorithm, and the implementation of the algorithm. In general, lossy compression methods are considered in order to achieve the compression ratios required for a cost-effective digital cinema approach. To achieve the quality level of digital cinema, the compression method should provide a visually lossless level of performance. That is, although there is a mathematical loss of information as a result of the compression process, the image distortion caused by that loss should not be perceptible to a viewer under normal viewing conditions.

Existing digital image compression techniques were developed for other applications, typically television systems. These techniques made design compromises appropriate to their intended application and do not meet the quality requirements of cinema presentation.

Digital cinema compression techniques should provide the visual quality that a regular moviegoer has previously experienced. Ideally, the visual quality of digital cinema should attempt to exceed that of a high-quality release print film. At the same time, the compression technique should have high coding efficiency to be practical. As defined herein, coding efficiency refers to the bit rate needed for the compressed image to meet a certain quality level.

Typical video compression techniques are based on differential pulse code modulation (DPCM), discrete cosine transform (DCT), motion compensation (MC), entropy coding, fractal compression, and wavelet transform. One compression technique that provides a significant level of compression while maintaining the desired level of quality for video signals employs adaptively sized blocks and sub-blocks of encoded DCT coefficient data. This technique is hereinafter referred to as the Adaptive Block Size Discrete Cosine Transform (ABSDCT) method.

One key aspect of video compression is the similarity between adjacent frames in a sequence. A prominent prior-art technique in this area is motion compensation, as used in MPEG. Motion compensation encodes a picture using imperfect predictions from adjacent frames in the sequence. Such prediction and/or compensation schemes introduce errors between the original source and the decoded video sequence. These errors can often grow to unacceptable levels and cause undesirable problems in high-quality applications. For example, motion artifacts are often observable in material compressed with the Moving Picture Experts Group (MPEG) standards. A motion artifact is the effect of a previous or following frame that can be seen in the current frame, that is, a ghost. Such artifacts also make frame-by-frame video editing a difficult task. Accordingly, there is a need for an interframe coding scheme that overcomes the shortcomings of current interframe coding techniques and reduces visual defects such as motion artifacts.

Disclosure of Invention

Embodiments of the present invention disclose a method of interframe coding that effectively increases the compression gain provided by any transform-based compression technique without introducing any additional distortion. Such methods, referred to herein as delta encoders or delta encoding processes, exploit the spatial and temporal redundancy of video sequences in the frequency domain. That is, the delta encoder exploits the fact that a sequence has a high degree of correlation in the time domain if there is only a small change in the sequence from one frame to the next. As a result, the transform-domain representation retains very significant coherence between adjacent frames in the video sequence.

In a system adapted to encode digital video, a method of interframe coding is described. The digital video comprises an anchor frame (also called a fixed frame) and at least one subsequent frame. The anchor frame and each subsequent frame comprise a plurality of pixel elements. The pixel elements of the anchor frame and of each subsequent frame are converted from pixel-domain elements to frequency-domain elements. The frequency-domain elements are quantized to emphasize those elements to which the human visual system is more sensitive and to de-emphasize those to which it is less sensitive. A difference is determined between each quantized frequency-domain element of the anchor frame and the corresponding quantized frequency-domain element of each subsequent frame. In one embodiment, an anchor frame is associated with a predetermined number of subsequent frames. In another embodiment, the anchor frame is associated with subsequent frames until the correlation characteristic between the subsequent frame and the anchor frame reaches an unacceptable level. In yet another embodiment, a rolling anchor frame is employed.

It is, therefore, a feature and advantage of the present invention to enable efficient encoding of image data.

Another feature and advantage of the present invention is to reduce the effects of motion artifacts.

Drawings

The features, objects, and advantages of the present invention will be more clearly understood from the following description of the preferred embodiments, taken in conjunction with the accompanying drawings. Like reference numerals designate corresponding parts throughout the drawings, in which:

FIG. 1 is a block diagram of an image processing system incorporating the variance based block size assignment system of the present invention and method thereof;

FIG. 2 is a flow chart illustrating the processing steps involved in variance-based block size assignment;

FIG. 3 is a flow chart illustrating the processing steps involved in inter-frame encoding;

FIG. 4 is a flow chart illustrating the processing steps involved in delta encoder operation.

Description of the preferred embodiments

In order to facilitate digital transmission of digital signals and enjoy the benefits associated therewith, some form of signal compression is generally required. It is also important to maintain high image quality in order to achieve high definition in the resulting image. Furthermore, computational efficiency is desirable to permit compact hardware implementation, which is important in many applications.

In one embodiment, the image compression of the present invention is based on Discrete Cosine Transform (DCT) techniques. Generally, the images to be processed in the digital domain are composed of pixel data, and these images may be divided into a series of non-overlapping blocks, N × N in size. A two-dimensional DCT may be performed for each block. The two-dimensional DCT can be defined by the following relationship:

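$$X(k,l) = \frac{\alpha(k)\,\beta(l)}{\sqrt{N \cdot M}} \sum_{m=0}^{N-1} \sum_{n=0}^{M-1} x(m,n) \cos\left[\frac{(2m+1)\pi k}{2N}\right] \cos\left[\frac{(2n+1)\pi l}{2M}\right], \quad 0 \le k \le N-1,\ 0 \le l \le M-1$$

where

$$\alpha(k),\ \beta(k) = \begin{cases} 1, & k = 0 \\ \sqrt{2}, & k \ne 0 \end{cases}$$

x(m, n) is the pixel at location (m, n) within an N × M block, and

X(k, l) is the corresponding DCT coefficient.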

Since the pixel values are non-negative, the DCT component X (0, 0) is always positive and has the largest energy. In fact, for a typical image, most of the transform energy is concentrated around the X (0, 0) component. This energy compaction property makes DCT an attractive compression method.
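The energy compaction property can be illustrated with a minimal sketch that evaluates the standard orthonormal 2-D DCT-II directly from its definition. The function name `dct2` and the sample block are illustrative assumptions rather than part of the disclosure, and a practical encoder would use a fast algorithm instead of this quadruple loop.

```python
import math

def dct2(block):
    """Direct (slow) 2-D DCT-II of an N x M block given as a list of rows."""
    n_rows, n_cols = len(block), len(block[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for k in range(n_rows):
        for l in range(n_cols):
            # Scale factors: 1 for the zero-frequency index, sqrt(2) otherwise.
            alpha = 1.0 if k == 0 else math.sqrt(2.0)
            beta = 1.0 if l == 0 else math.sqrt(2.0)
            acc = 0.0
            for m in range(n_rows):
                for n in range(n_cols):
                    acc += (block[m][n]
                            * math.cos((2 * m + 1) * math.pi * k / (2 * n_rows))
                            * math.cos((2 * n + 1) * math.pi * l / (2 * n_cols)))
            out[k][l] = alpha * beta * acc / math.sqrt(n_rows * n_cols)
    return out

# A flat 4 x 4 block (assumed sample data): all energy lands in X(0, 0).
flat = [[10] * 4 for _ in range(4)]
coeffs = dct2(flat)
```

For this flat block, X(0, 0) equals 40 and every other coefficient is zero up to floating-point rounding, which is the extreme case of the compaction described above.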

It will be appreciated that most natural images are composed of flat, relatively slowly varying regions and busy regions such as object boundaries and high-contrast textures. Contrast-adaptive coding schemes take advantage of this by allocating more bits to the busy regions and fewer bits to the less busy regions. This technique is disclosed in U.S. Patent No. 5,021,891, entitled "Adaptive Block Size Image Compression Method and System," which is assigned to the assignee of the present invention and is incorporated herein by reference. DCT techniques are also disclosed in U.S. Patent No. 5,107,345, entitled "Adaptive Block Size Image Compression Method and System," which is assigned to the assignee of the present invention and is incorporated herein by reference. Furthermore, the use of the ABSDCT technique in combination with a differential quadtree transform technique is disclosed in U.S. Patent No. 5,452,104, entitled "Adaptive Block Size Image Compression Method and System," which is also assigned to the assignee of the present invention and is incorporated herein by reference. The systems disclosed in these patents employ what is referred to as "intra-frame" coding, in which each frame of image data is encoded without regard to the content of any other frame. Using the ABSDCT technique, the achievable data rate can be reduced to a large extent without discernible degradation of image quality.

Using ABSDCT, the video signal will typically be divided into blocks of pixels suitable for processing. For each block, the luminance and chrominance components are input to the interleaver of the block. For example, 16 x 16 (pixel) blocks may be provided to a block interleaver, which orders or organizes the image samples within each 16 x 16 block to produce data blocks and composite sub-blocks suitable for Discrete Cosine Transform (DCT) analysis. A DCT operator is a method of converting a time-sampled signal into a frequency representation of the same signal. By converting to a frequency representation, DCT techniques have been shown to have a very high degree of compression when the quantizer can be designed to take advantage of the frequency distribution characteristics of an image. In a preferred embodiment, one 16 × 16 DCT is used for the first ordering, four 8 × 8 DCTs are used for the second ordering, sixteen 4 × 4 DCTs are used for the third ordering, and sixty-four 2 × 2 DCTs are used for the fourth ordering.

For image processing purposes, the DCT operation is performed on pixel data that is divided into an array of non-overlapping blocks. It should be noted that although the block sizes discussed herein are in N × N size, it should be apparent that various other block sizes may be used. For example, an N × M block size may be used where N and M are both integers and M is either greater than or less than N. Another important aspect is that each block can be divided into at least one layer of sub-blocks, e.g., N/i × N/i, N/i × N/j, N/i × M/j, and so on, where i and j are integers. Further, the block exemplified herein is a 16 × 16 pixel block corresponding to a block of DCT coefficients and a sub-block. It should also be understood that various other integers, such as two integers that are both odd or even, may be used, for example, 9 x 9.

Generally, an image is divided into blocks of pixels suitable for processing. The color signal may be converted from RGB space to YC₁C₂ space, where Y is the luminance or brightness component and C₁ and C₂ are the chrominance or color components. Since the eye has low spatial sensitivity to color, many systems sub-sample the C₁ and C₂ components by a factor of four in the horizontal and vertical directions. However, such sub-sampling is not required. A full-resolution image, known as 4:4:4 format, may be useful or necessary in some applications, such as those referred to as covering "digital cinema." Two possible YC₁C₂ representations are the YIQ representation and the YUV representation, both of which are well known in the art. It is also possible to use a variant of the YUV representation known as YCbCr.

Referring now to FIG. 1, an image processing system 100 incorporating the present invention is shown. The image processing system 100 includes an encoder 102 that compresses a received video signal. The compressed signal is transmitted over a transmission channel 104, or conveyed on a physical medium, and is received by a decoder 106. The decoder 106 decodes the received signal into image samples, which may then be displayed.

In the preferred embodiment, each of the Y, Cb, and Cr components is processed without sub-sampling. Thus, an input of a 16 x 16 block of pixels is provided to the encoder 102. The encoder 102 may include a block size assignment element 108, which performs block size assignment in preparation for video compression. The block size assignment element 108 determines the block decomposition of a 16 x 16 block based on the perceptual characteristics of the image in the block. Depending on the activity within a 16 x 16 block, block size assignment subdivides the block into smaller blocks in a quadtree fashion. The block size assignment element 108 generates quadtree data, referred to as PQR data, whose length can be between 1 and 21 bits. Thus, if block size assignment determines that a 16 x 16 block is to be subdivided, the R bit of the PQR data is set and is followed by four additional bits of Q data corresponding to the four subdivided 8 x 8 blocks. If block size assignment determines that any of the 8 x 8 blocks is to be subdivided, four additional bits of P data are added for each subdivided 8 x 8 block.
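As a rough sketch of how the quadtree flags add up, the following builds the flag bits for one 16 x 16 block. The function name `pqr_bits`, its argument layout, and the ordering of the P bits after all Q bits are assumptions made for illustration; the text only states which bits are present. The longest case is 1 + 4 + 4 × 4 = 21 bits.

```python
def pqr_bits(split16, split8=None, split4=None):
    """Return the quadtree flag bits for one 16 x 16 block as a list of 0/1.

    split16: True if the 16 x 16 block is subdivided (R bit).
    split8:  four booleans, one per 8 x 8 block (Q bits), used if split16.
    split4:  dict mapping the index of each subdivided 8 x 8 block to four
             booleans, one per 4 x 4 block (P bits).
    """
    bits = [1 if split16 else 0]
    if not split16:
        return bits
    bits += [1 if s else 0 for s in split8]
    # Assumed ordering: P bits follow the Q bits, one group per split 8 x 8 block.
    for i, s in enumerate(split8):
        if s:
            bits += [1 if p else 0 for p in split4[i]]
    return bits

unsplit = pqr_bits(False)
full = pqr_bits(True, [True] * 4, {i: [True] * 4 for i in range(4)})
partial = pqr_bits(True, [True, False, False, True],
                   {0: [True, False, False, False], 3: [False] * 4})
```

Here `unsplit` has one bit, `full` has 21 bits, and `partial` has 13 bits (one R bit, four Q bits, and two groups of four P bits).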

Referring now to FIG. 2, a flow diagram showing details of the operation of the block size assignment component 108 is provided. The algorithm uses the variance of one block as a metric to decide to subdivide one block. Beginning at step 202, a 16 x 16 block of pixels is read. At 204, the variance, v16, of the 16 x 16 block is calculated. The variance can be calculated using the following method:

$$\mathrm{var} = \frac{1}{N^2}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{i,j}^2 - \left(\frac{1}{N^2}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{i,j}\right)^2$$

where N = 16 and x_{i,j} is the pixel in the i-th row and j-th column of the N × N block. At step 206, the variance threshold T16 is first modified to provide a new threshold T'16 if the mean value of the block lies between two predetermined values, and the block variance is then compared against the new threshold T'16.

If the variance v16 is not greater than the threshold T16, then at step 208 the starting address of the 16 x 16 block is written and the R bit of the PQR data is set to 0 to indicate that the 16 x 16 block is not subdivided. The algorithm then reads the next 16 x 16 block of pixels. If the variance v16 is greater than the threshold T16, then at step 210 the R bit of the PQR data is set to 1 to indicate that the 16 x 16 block is to be subdivided into four 8 x 8 blocks.

As shown in step 212, the four 8 x 8 blocks, i = 1:4, are then considered for further subdivision. For each 8 x 8 block, the variance v8_i is calculated at step 214. At step 216, the variance threshold T8 is first modified to provide a new threshold T'8 if the mean value of the block lies between two predetermined values, and the block variance is then compared against this new threshold.

If the variance v8_i is not greater than the threshold T8, then at step 218 the starting address of the 8 x 8 block is written and the corresponding Q bit, Q_i, is set to 0. The next 8 x 8 block is then processed. If the variance v8_i is greater than the threshold T8, then at step 220 the corresponding Q bit, Q_i, is set to 1 to indicate that the 8 x 8 block is to be subdivided into four 4 x 4 blocks.

As shown in step 222, the four 4 x 4 blocks, j_i = 1:4, are then considered for further subdivision. For each 4 x 4 block, the variance v4_ij is calculated at step 224. At step 226, the variance threshold T4 is first modified to provide a new threshold T'4 if the mean value of the block lies between two predetermined values, and the block variance is then compared against this new threshold.

If the variance v4_ij is not greater than the threshold T4, then at step 228 the address of the 4 x 4 block is written and the corresponding P bit, P_ij, is set to 0. The next 4 x 4 block is then processed. If the variance v4_ij is greater than the threshold T4, then at step 230 the corresponding P bit, P_ij, is set to 1 to indicate that the 4 x 4 block is to be subdivided into four 2 x 2 blocks. In addition, the addresses of the four 2 x 2 blocks are written.

The thresholds T16, T8, and T4 may be predetermined constants. This is known as a hard decision. Alternatively, an adaptive or soft decision may be implemented. The soft decision varies the variance thresholds depending on the mean pixel value of the 2N x 2N blocks, where N may be 8, 4, or 2. Thus, functions of the mean pixel values may be used as the thresholds.

For illustrative purposes, consider the following example. Let the predetermined variance thresholds for the Y component be 50, 1100, and 880 for the 16 x 16, 8 x 8, and 4 x 4 blocks, respectively. In other words, T16 = 50, T8 = 1100, and T4 = 880. Let the range of mean values be 80 to 100. Assume that the calculated variance of the 16 x 16 block is 60 and that its mean value is 90. Since the variance of 60 is greater than T16, the 16 x 16 block is subdivided into four 8 x 8 sub-blocks. Assume that the calculated variances of the 8 x 8 blocks are 1180, 935, 980, and 1210. Since two of the 8 x 8 blocks have variances that exceed T8, these two blocks are further subdivided to produce a total of eight 4 x 4 sub-blocks. Finally, assume that the variances of the eight 4 x 4 blocks are 620, 630, 670, 610, 590, 525, 930, and 690, and that the mean values of the first four are 90, 120, 110, and 115. Since the mean value of the first 4 x 4 block falls within the range (80, 100), its threshold is lowered to T'4 = 200, which is less than 880. Therefore, this 4 x 4 block is subdivided, as is the seventh 4 x 4 block, whose variance of 930 exceeds T4.
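The 4 x 4 stage of this example can be reproduced with a short sketch. The helper names `block_stats` and `should_split`, the inclusive treatment of the mean range, and the mean values assigned to the last four blocks (which the example does not give) are assumptions made only for illustration.

```python
def block_stats(block):
    """Mean and variance of an N x N block (list of rows) of pixel values."""
    n = len(block) * len(block[0])
    total = sum(sum(row) for row in block)
    total_sq = sum(v * v for row in block for v in row)
    mean = total / n
    # Variance as mean of squares minus square of the mean.
    return mean, total_sq / n - mean * mean

def should_split(variance, mean, threshold, soft_threshold, mean_range=(80, 100)):
    """Soft decision: use soft_threshold when the mean lies inside mean_range."""
    low, high = mean_range
    t = soft_threshold if low <= mean <= high else threshold
    return variance > t

T4, T4_SOFT = 880, 200
variances = [620, 630, 670, 610, 590, 525, 930, 690]
means = [90, 120, 110, 115, 150, 150, 150, 150]   # last four values are assumed
split = [should_split(v, m, T4, T4_SOFT) for v, m in zip(variances, means)]
```

With these inputs only the first and seventh 4 x 4 blocks are marked for subdivision, matching the outcome described in the example.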

It is noted that a similar procedure may be used to assign block sizes for the color components C₁ and C₂. The color components may be decimated horizontally, vertically, or in both directions. It is also noted that although block size assignment has been described as a top-down approach, in which the largest block (16 x 16 in the present example) is evaluated first, a bottom-up approach may be used instead. The bottom-up approach evaluates the smallest blocks (2 x 2 in the present example) first.

Referring again to FIG. 1, other portions of image processing system 100 will be discussed. The PQR data, along with the selected block address, is provided to the DCT element 110. The DCT element 110 performs an appropriately sized discrete cosine transform on the selected block using the PQR data. Only the selected block needs to be DCT processed.

The image processing system 100 may optionally include a DQT element 112 for reducing the redundancy among the DC coefficients of the DCTs. A DC coefficient is found at the top left corner of each DCT block. The DC coefficients are, in general, large compared to the AC coefficients. This discrepancy in size makes it difficult to design an efficient variable length coder. Accordingly, it is advantageous to reduce the redundancy among the DC coefficients.

The DQT element 112 performs 2-D DCTs on the DC coefficients, taken 2 x 2 at a time. Starting with the 2 x 2 blocks within 4 x 4 blocks, a 2-D DCT is performed on the four DC coefficients. This 2 x 2 DCT is called the differential quadtree transform, or DQT, of the four DC coefficients. Next, the DC coefficient of the DQT, together with the three neighboring DC coefficients within an 8 x 8 block, is used to compute the next-level DQT. Finally, the DC coefficients of the four 8 x 8 blocks within a 16 x 16 block are used to compute the DQT. Thus, in a 16 x 16 block there is one true DC coefficient, and the rest are AC coefficients corresponding to the DCT and DQT.
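One level of this transform can be sketched as a 2 x 2 DCT applied to four DC coefficients. The orthonormal scaling (division by 2) and the function name `dqt_2x2` are assumptions for illustration; the text does not specify the normalization.

```python
def dqt_2x2(dc):
    """Orthonormal 2 x 2 DCT of four DC coefficients [[a, b], [c, d]]."""
    (a, b), (c, d) = dc
    return [[(a + b + c + d) / 2, (a - b + c - d) / 2],
            [(a + b - c - d) / 2, (a - b - c + d) / 2]]

# Four equal DC values collapse into a single non-zero coefficient.
uniform = dqt_2x2([[10, 10], [10, 10]])
varied = dqt_2x2([[8, 12], [10, 14]])
```

When neighboring DC values are similar, most of the output is close to zero, which is the redundancy reduction the DQT is meant to provide. The top-left output would feed the next level of the quadtree.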

The transform coefficients (both DCT and DQT) are provided to a quantizer 114 for quantization. In a preferred embodiment, the DCT coefficients are quantized using frequency weighting masks (FWMs) and a quantization scale factor. An FWM is a table of frequency weights with the same dimensions as the block of input DCT coefficients. The frequency weights apply different weights to different DCT coefficients. The weights are designed to emphasize input samples having frequency content to which the human visual system is more sensitive and to de-emphasize samples having frequency content to which the visual system is less sensitive. The weights may also be designed based on factors such as viewing distance.

Huffman codes can be designed from either the measured or the theoretical statistics of an image. It has been observed that most natural images are made up of flat or relatively slowly varying regions and busy regions such as object boundaries and high-contrast textures. Huffman coders combined with frequency-domain transforms such as the DCT exploit these features by assigning more bits to the busy regions and fewer bits to the flat regions. In general, Huffman coders use look-up tables to code the run lengths and the non-zero values.

The weights may be selected based on empirical data. The design of weighting masks for 8 x 8 DCT coefficients is discussed in ISO/IEC JTC1 CD 10918, "Digital compression and encoding of continuous-tone still images, Part 1: Requirements and guidelines," International Organization for Standardization, 1994, which is incorporated herein by reference. In general, two FWMs are designed, one for the luminance component and one for the chrominance components. The FWM tables for block sizes 2 x 2 and 4 x 4 may be obtained by decimation, and the table for 16 x 16 by interpolation, of the FWM table for the 8 x 8 block. The scale factor controls the quality and bit rate of the quantized coefficients.

Thus, the individual DCT coefficients may be quantized according to the following relationship:

$$\mathrm{DCT}_q(i,j) = \left\lfloor \frac{8 \cdot \mathrm{DCT}(i,j)}{fwm(i,j) \cdot q} \pm \frac{1}{2} \right\rfloor$$

where DCT(i, j) is the input DCT coefficient, fwm(i, j) is the frequency weighting mask, q is the scale factor, and DCT_q(i, j) is the quantized coefficient. Note that, depending on the sign of the DCT coefficient, the first term inside the brackets is rounded up or down. The DQT coefficients are also quantized using a suitable weighting mask. However, multiple tables or masks may be used and applied to each of the Y, Cb, and Cr components.

The quantized coefficients are provided to a delta encoder 115. The delta encoder 115 effectively increases the compression gain provided by any transform-based compression technique, such as DCT or ABSDCT, without adding any further distortion or quantization noise. The delta encoder 115 is configured to determine the coefficient differences of the non-zero coefficients across adjacent frames and to encode the difference information losslessly. In another embodiment, the difference information may be encoded with slight loss. Such an embodiment may be desirable when balancing quality considerations against space and/or speed requirements.

The delta-coded coefficients for the anchor frame and the corresponding subsequent frames are provided to a zig-zag scan serializer 116. The serializer 116 scans the blocks of quantized coefficients in a zig-zag fashion to produce a serialized stream of quantized coefficients. A number of different zig-zag scanning patterns, as well as patterns other than zig-zag, may also be chosen. One embodiment uses 8 x 8 block sizes for the zig-zag scan, although other sizes such as 32 x 32, 16 x 16, 4 x 4, 2 x 2, or combinations thereof may be used.
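A minimal sketch of one common zig-zag pattern, the anti-diagonal traversal familiar from JPEG, is given below. The function names `zigzag_order` and `serialize` are illustrative assumptions, and the source leaves the exact scan pattern open.

```python
def zigzag_order(n):
    """Return (row, col) pairs of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):           # s = row + col indexes each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diag = [(r, s - r) for r in rows]
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def serialize(block):
    """Flatten a square block of coefficients into a zig-zag ordered list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

stream = serialize([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])
```

Because low-frequency coefficients sit near the top-left corner, this ordering tends to push the zeros of a quantized block toward the end of the stream, which lengthens the zero runs seen by the run-length coder.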

Notably, the zig-zag scan serializer 116 may be disposed before or after the quantizer 114. The end result is the same.

In any case, the stream of quantized coefficients is provided to a variable length encoder 118. The variable length encoder 118 may use run-length encoding of zeros followed by encoding. This technique is discussed in detail in the aforementioned U.S. Patent Nos. 5,021,891, 5,107,345, and 5,452,104, which are incorporated herein by reference. A run-length coder takes the quantized coefficients and separates the runs of successive coefficients from the non-successive coefficients. The successive values are referred to as run-length values and are encoded. The non-successive values are encoded separately. In one embodiment, the successive coefficients are zero values and the non-successive coefficients are non-zero values. Typically, the run length ranges from 0 to 63 and the size is an AC value from 1 to 10. An end-of-file code adds one additional code, so there are a total of 641 possible codes.
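The run-length step can be sketched as follows, pairing each non-zero value with the number of zeros that precede it. The function name `run_length_pairs` and the `END` marker are hypothetical stand-ins, and the subsequent entropy coding of the pairs is omitted. The code count follows from 64 run lengths (0 to 63) times 10 sizes, plus one end code.

```python
END = "END"  # hypothetical end marker standing in for the end-of-file code

def run_length_pairs(coeffs):
    """Encode a serialized coefficient list as (zero_run, value) pairs."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append(END)   # trailing zeros are implied by the end marker
    return pairs

pairs = run_length_pairs([12, 0, 0, -3, 1, 0, 0, 0, 0])
code_count = 64 * 10 + 1   # run lengths 0..63, sizes 1..10, plus the end code
```

For the sample stream the result is three pairs followed by the end marker, and `code_count` evaluates to 641.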

The compressed image signal generated by the encoder 102 is sent to the decoder 106 via the transmission channel 104. The PQR data, which contains the block size assignment information, is also provided to the decoder 106. The decoder 106 includes a variable length decoder 120, which decodes the run-length values and the non-zero values.

Frequency-domain methods, such as the DCT, transform a block of pixels into a new block of less correlated and fewer transform coefficients. Such frequency-domain compression schemes also use knowledge of the distortion perceived in images to improve the objective performance of the coding scheme. FIG. 3 illustrates such a process for an interframe encoder 300. A frame of data to be encoded is initially read into the system in the pixel domain 304. Each frame of data is then divided into blocks of pixels 308. In one embodiment, the block size is variable and is assigned using the adaptive block size discrete cosine transform (ABSDCT) technique. The block size may vary depending on the amount of detail in a given region. Any block size may be used, for example, 2 x 2, 4 x 4, 8 x 8, 16 x 16, or 32 x 32.

The data is then processed to convert it from the pixel domain to elements in the frequency domain 312. This involves the DCT and DQT processing described with respect to FIG. 1. DCT/DQT processing is also discussed in the pending U.S. patent application entitled "Apparatus and Method for Computing Discrete Cosine Transforms Using a Butterfly Processor" (filed June 6, 2001, serial number unknown, attorney docket No. 990437), the contents of which are specifically incorporated herein by reference.

The frequency-domain elements are then quantized 316. Quantization may involve frequency weighting according to contrast sensitivity followed by coefficient quantization, so that the resulting block of frequency-domain data has few non-zero coefficients to be encoded. Corresponding blocks of frequency-domain data in adjacent frames typically have similar characteristics, both in the location and pattern of zeros and in the values of the coefficients. The quantized frequency elements are then delta encoded 320. The delta encoder computes the coefficient differences of the non-zero coefficients across adjacent frames and encodes this information losslessly. Lossless encoding of the information is accomplished by serialization 324 and run-length amplitude encoding 328. In one embodiment, run-length amplitude encoding is followed by entropy encoding, such as Huffman encoding. The serialization process 324 can be extended across the frames of interest to achieve longer run lengths, further increasing the efficiency of the delta encoder. In one embodiment, zig-zag ordering is also employed.

FIG. 4 illustrates the operation of a delta encoder 400. A plurality of adjacent frames may be viewed as a first frame, or anchor frame, and corresponding adjacent frames, or subsequent frames. First, a block of frequency-domain elements of the anchor frame is input 404. At 408, the corresponding block of elements of the next, or subsequent, frame is also read. In one embodiment, a 16 x 16 block size is used regardless of the block size breakdown produced by block size assignment (BSA). However, it is contemplated that any block size may be used.

In one embodiment, variable block sizes as defined by the BSA may be used. The difference between corresponding elements of the anchor frame and the subsequent frame is determined 412. In one embodiment, only the corresponding AC values in the blocks of the anchor frame and each subsequent frame are compared. In another embodiment, both the DC value and the AC values are compared. The subsequent frame may then be represented 416 by the differences between the anchor frame and the subsequent frame, as long as the differences are associated with the appropriate anchor frame. Block by block, all corresponding elements of the anchor frame and the subsequent frame are compared and their differences calculated. A query is then made as to whether there is another subsequent frame 420. If so, the anchor frame is compared in the same manner with the next subsequent frame. This process is repeated until the anchor frame has been compared with all of its related subsequent frames.
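The differencing and its inverse can be sketched in a few lines. Frames are represented here as flat lists of quantized coefficients, and the names `delta_encode` and `delta_decode` as well as the sample values are assumptions for illustration.

```python
def delta_encode(anchor, subsequent_frames):
    """Return, per subsequent frame, element-wise differences from the anchor."""
    return [[s - a for a, s in zip(anchor, frame)] for frame in subsequent_frames]

def delta_decode(anchor, deltas):
    """Rebuild the subsequent frames by adding each difference list to the anchor."""
    return [[a + d for a, d in zip(anchor, delta)] for delta in deltas]

anchor = [40, 3, 0, -2, 0, 0]
frames = [[40, 3, 0, -2, 0, 0],
          [41, 3, 0, -1, 0, 0]]
deltas = delta_encode(anchor, frames)
restored = delta_decode(anchor, deltas)
```

Because the differences are computed on already-quantized coefficients and are exactly invertible, this step adds no distortion. Similar frames yield difference lists that are mostly zero, which the later run-length stage can code compactly.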

In one embodiment, one anchor frame is associated with four subsequent frames, although it is contemplated that any number of frames may be used. In another embodiment, one anchor frame is associated with N subsequent frames, where N depends on the correlation characteristics of the image sequence. In other words, a new anchor frame is established once the calculated difference between an anchor frame and a given subsequent frame exceeds a specified threshold. In one embodiment, the threshold is predetermined. It has been found that a correlation of approximately 95% between frames balances quality while maintaining an acceptable bit rate. However, this may vary depending on the material being processed. In another embodiment, the threshold may be configured to any degree of correlation.
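The correlation-driven choice of anchor frames might be sketched as below. The source does not define how correlation is measured, so the Pearson coefficient used here, the function names, and the sample frames are all assumptions made for illustration.

```python
import math

def correlation(a, b):
    """Pearson correlation of two equal-length coefficient lists (an assumed metric)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        # Degenerate constant frames: treat identical frames as fully correlated.
        return 1.0 if a == b else 0.0
    return cov / math.sqrt(va * vb)

def assign_anchors(frames, threshold=0.95):
    """Return, for each frame, the index of the anchor it is coded against."""
    anchors, current = [], 0
    for i, frame in enumerate(frames):
        if i != current and correlation(frames[current], frame) < threshold:
            current = i          # correlation too low: this frame starts a new group
        anchors.append(current)
    return anchors

frames = [[10, 5, 0, 1], [10, 5, 0, 2], [0, 1, 9, 7], [0, 1, 9, 8]]
anchors = assign_anchors(frames)
```

In this sample the third frame differs sharply from the first, so it becomes a new anchor and the fourth frame is coded against it.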

In yet another embodiment, a rolling anchor frame is employed. Once the computation for the first subsequent frame is complete, that subsequent frame becomes the new anchor frame 424 and is compared with its adjacent frame. Thus, once the differences between an anchor frame and a subsequent frame have been determined, the subsequent frame becomes the new anchor frame and the comparison is made again. For example, if frame 1 is the anchor frame and frame 2 is the subsequent frame, the differences between frame 1 and frame 2 are determined in the manner discussed above. Frame 2 then becomes the new anchor frame and is compared with frame 3, and the differences between corresponding elements are calculated again. This process is repeated until all frames of the material have been processed.
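A rolling anchor can be sketched by differencing each frame against its immediate predecessor. The function names and sample frames are illustrative assumptions, with frames again represented as flat lists of quantized coefficients.

```python
def rolling_delta_encode(frames):
    """Keep the first frame as is; code every later frame against its predecessor."""
    first = list(frames[0])
    deltas = [[c - p for p, c in zip(prev, cur)]
              for prev, cur in zip(frames, frames[1:])]
    return first, deltas

def rolling_delta_decode(first, deltas):
    """Rebuild all frames by accumulating the differences in order."""
    frames = [list(first)]
    for delta in deltas:
        frames.append([p + d for p, d in zip(frames[-1], delta)])
    return frames

frames = [[5, 1, 0], [6, 1, 0], [6, 2, 0]]
first, deltas = rolling_delta_encode(frames)
restored = rolling_delta_decode(first, deltas)
```

One consequence of this design is that decoding any frame requires all of the preceding differences back to the first frame, whereas a shared anchor lets each subsequent frame be rebuilt from the anchor alone.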

The compression coding algorithms and methods employed in aspects of the embodiments are included in many compression and digital video processing schemes. Embodiments of the present invention may reside in a computer or an application-specific integrated circuit that performs compression and encoding of digital video. The algorithm itself may be implemented in software or in programmable or dedicated hardware.

Referring again to FIG. 1, the output of the variable length decoder 120 is provided to an inverse zigzag scan serializer 122, which orders the coefficients according to the scanning scheme employed. The inverse zigzag scan serializer 122 may accept the PQR data to assist in properly ordering the coefficients into a composite block of coefficients.
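As an illustration of what the inverse zigzag scan serializer does, the sketch below places a serialized coefficient list back into a square block. It assumes the conventional zigzag pattern used in DCT-based coders and a single fixed block size; the text only says the ordering follows "the scanning scheme employed", and the use of PQR data for variable block sizes is not modeled. The function names are hypothetical.

```python
from typing import List, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n block in conventional zigzag order."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diag = [(r, s - r) for r in rows]
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def inverse_zigzag(coeffs: List[int], n: int) -> List[List[int]]:
    """Place a serialized coefficient list back into an n x n block."""
    if len(coeffs) != n * n:
        raise ValueError("expected exactly n*n coefficients")
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, zigzag_order(n)):
        block[r][c] = value
    return block


block = inverse_zigzag(list(range(9)), 3)
```

For the 3 x 3 case above, the values 0 through 8 land at positions that trace the familiar zigzag path starting from the top-left (DC) corner.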

The composite block is provided to an inverse quantizer 124, which undoes the processing introduced by the frequency weighting mask. The resulting block of coefficients is then provided to an IDQT element 126, followed by an IDCT element 128, if a differential quadtree transform has been applied. Otherwise, the block of coefficients is provided directly to the IDCT element 128. The IDQT element 126 and the IDCT element 128 inverse transform the coefficients to produce a block of pixel data. The block of pixel data must then be interpolated, converted to RGB format, and stored for subsequent display.
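The last two decoder stages can be illustrated with a small sketch. The dequantization rule used here (level multiplied by a per-coefficient weight and a scalar step) is an assumption, since this passage does not state the formula, and the IDQT stage is omitted. The inverse transform is the standard orthonormal 2-D inverse DCT; `dequantize` and `idct_2d` are hypothetical names.

```python
import math
from typing import List

Matrix = List[List[float]]


def dequantize(levels: List[List[int]], weights: Matrix, step: float) -> Matrix:
    """Undo quantization. Assumed rule: coefficient = level * weight * step."""
    return [[lvl * w * step for lvl, w in zip(l_row, w_row)]
            for l_row, w_row in zip(levels, weights)]


def idct_2d(coeffs: Matrix) -> Matrix:
    """Orthonormal 2-D inverse DCT of a square block of coefficients."""
    n = len(coeffs)

    def basis(k: int, x: int) -> float:
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        return scale * math.cos((2 * x + 1) * k * math.pi / (2 * n))

    return [[sum(coeffs[u][v] * basis(u, x) * basis(v, y)
                 for u in range(n) for v in range(n))
             for y in range(n)]
            for x in range(n)]


levels = [[2, 0], [0, 0]]                # only the DC level is non-zero
weights = [[1.0, 1.0], [1.0, 1.0]]       # flat mask, chosen for the example
coeffs = dequantize(levels, weights, step=2.0)
pixels = idct_2d(coeffs)
```

With only a DC coefficient present, every reconstructed pixel takes the same value, which is a quick sanity check for the transform.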

As examples, the various illustrative logical blocks, flowcharts, and steps discussed in connection with the embodiments disclosed herein may be implemented or performed in hardware or software with an Application Specific Integrated Circuit (ASIC), a programmable logic device, discrete gate or transistor logic, discrete hardware components (e.g., registers and FIFOs), a processor executing a set of firmware instructions, any conventional programmable software and a processor, or any combination thereof. The processor may be a microprocessor or, alternatively, any conventional processor, controller, microcontroller, or state machine. The software may reside in RAM memory, flash memory, ROM memory, registers, a hard disk, a removable disk, a CD-ROM, a DVD-ROM, or any other form of storage medium known in the art.

The above discussion of the preferred embodiments is intended to enable any person skilled in the art to make and use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. In a system adapted to encode digital video, the digital video comprising a fixed frame and at least one subsequent frame, the fixed frame and each subsequent frame comprising a plurality of pixel elements, a method of inter-frame coding, the method comprising:

converting a plurality of pixels in the fixed frame and each subsequent frame from pixel domain elements to frequency domain elements, the frequency domain elements being representable as DC elements and AC elements;

quantizing the frequency domain elements to emphasize those elements that are more sensitive to the human visual system and de-emphasize those elements that are less sensitive to the human visual system; and

determining a difference between each quantized frequency domain element of the fixed frame and an associated quantized frequency domain element of each subsequent frame.

2. The method of claim 1, wherein the converting employs a Discrete Cosine Transform (DCT).

3. The method of claim 2, wherein the operation of converting further comprises employing Discrete Quadtree Transformation (DQT).

4. The method of claim 1, wherein the act of quantizing further comprises weighting elements using a frequency weighting mask.

5. The method of claim 4, wherein the act of quantizing further comprises employing a quantizer step function.

6. The method of claim 1, wherein four subsequent frames are compared with the fixed frame.

7. The method of claim 1, wherein only differences between AC quantized frequency domain elements are determined.

8. The method of claim 1, further comprising grouping the plurality of pixel elements into a 16 x 16 block size.

9. The method of claim 1, wherein the act of quantizing produces lossless frequency domain elements.

10. The method of claim 9, wherein the act of quantizing produces lossy frequency domain elements.

11. The method of claim 1, further comprising representing the subsequent frame as a difference between the quantized frequency domain elements of the fixed frame and corresponding frequency domain elements of the subsequent frame.

12. The method of claim 1, further comprising serializing the quantized frequency domain elements.

13. The method of claim 12, further comprising variable length coding the serialized quantized frequency domain elements.

14. In a system for encoding digital video, the digital video comprising a plurality of frames 1, 2, 3.

Converting the plurality of pixels in each frame from pixel domain elements to frequency domain elements, the frequency domain elements being representable in rows and columns;

determining a difference between a quantized frequency domain element of the first frame and a corresponding quantized frequency domain element of the second frame; and

repeating the process of determining the difference between the quantized frequency domain elements of subsequent frames such that the quantized frequency domain elements of each frame are compared with the quantized frequency domain elements of the frame immediately preceding it.

15. The method of claim 14, further comprising representing each of frames 2 through N as a difference between the quantized frequency domain elements of frames 2 through N and the corresponding frequency domain elements of frames 1 through N-1.

16. The method of claim 14, wherein the converting further employs a Discrete Cosine Transform (DCT).

17. The method of claim 16, wherein the converting further employs Discrete Quadtree Transformation (DQT).

18. The method of claim 14, wherein the quantization operation further comprises weighting elements using a frequency weighting mask.

19. The method of claim 18, wherein the quantization operation further comprises employing a quantizer step function.

20. The method of claim 14, wherein only differences between AC quantized frequency domain elements are determined.

21. The method of claim 14, further comprising grouping the plurality of pixel elements into a 16 x 16 block size.

22. The method of claim 14, wherein the act of determining produces lossless frequency domain elements.

23. The method of claim 14, wherein the act of determining produces lossy frequency domain elements.

24. The method of claim 14, further comprising representing the subsequent frame as a difference between the quantized frequency domain elements of the fixed frame and corresponding frequency domain elements of the subsequent frame.

25. The method of claim 14, further comprising serializing the quantized frequency domain elements.

26. The method of claim 25, further comprising variable length coding the serialized quantized frequency domain elements.

27. The method of claim 26, wherein the variable length coded serialized quantized frequency domain elements are Huffman coded.

28. In a system for encoding digital video, the digital video including a fixed frame and at least one subsequent frame, the fixed frame and each subsequent frame including a plurality of pixel elements, an apparatus for inter-frame coding, the apparatus comprising:

means for converting the plurality of pixels in the fixed frame and each subsequent frame from pixel domain elements to frequency domain elements, the frequency domain elements being representable as DC elements and AC elements;

means for quantizing the frequency domain elements to emphasize those elements that are more sensitive to the human visual system and de-emphasize those elements that are less sensitive to the human visual system; and

means for determining a difference between each quantized frequency domain element of the fixed frame and a corresponding quantized frequency domain element of each subsequent frame.

29. The apparatus of claim 28, wherein the means for converting employs a Discrete Cosine Transform (DCT).

30. The apparatus of claim 29, wherein the means for converting further employs a Discrete Quadtree Transform (DQT).

31. The apparatus of claim 28, wherein the means for quantizing further comprises weighting elements using a frequency weighting mask.

32. The apparatus of claim 31, wherein the means for quantizing further comprises employing a quantizer step function.

33. The apparatus of claim 28, wherein four subsequent frames are compared with the fixed frame.

34. The apparatus of claim 28, wherein the means for determining determines only differences between AC quantized frequency domain elements.

35. The apparatus of claim 28, further comprising means for grouping the plurality of pixel elements into a 16 x 16 block size.

36. The apparatus of claim 28, wherein the means for quantizing produces lossless frequency domain elements.

37. The apparatus of claim 36, wherein the means for quantizing produces lossy frequency domain elements.

38. The apparatus of claim 28, further comprising means for representing a subsequent frame as a difference between the quantized frequency domain elements of the fixed frame and corresponding frequency domain elements of the subsequent frame.

39. The apparatus of claim 28, further comprising means for serializing the quantized frequency domain elements.

40. The apparatus of claim 39, further comprising means for variable length encoding the serialized quantized frequency domain elements.

41. In a system for encoding digital video, the digital video comprising a plurality of frames 1, 2, 3.

Means for converting the plurality of pixels in each frame from pixel domain elements to frequency domain elements, the frequency domain elements being representable in rows and columns;

means for determining a difference between a quantized frequency domain element of the first frame and a corresponding quantized frequency domain element of the second frame; and

means for repeating the process of determining the difference between quantized frequency domain elements of a subsequent frame such that the quantized frequency domain elements of each frame are compared with the quantized frequency domain elements of the frame immediately preceding it.

42. The apparatus of claim 41, further comprising means for representing each of frames 2 through N as a difference between a quantized frequency domain element of frames 2 through N and a corresponding frequency domain element of frames 1 through N-1.

43. The apparatus of claim 41, further comprising means for representing a subsequent frame as a difference between a quantized frequency domain element of the fixed frame and a corresponding frequency domain element of the subsequent frame.

44. In a system for encoding digital video, the digital video comprising a plurality of frames 1, 2, 3.

A DCT/DQT transformer configured to transform the plurality of pixels in each frame from pixel domain elements to frequency domain elements, the frequency domain elements being representable in rows and columns;

a quantizer, coupled to the transformer, configured to quantize the frequency domain elements to emphasize elements that are more sensitive to the human visual system and de-emphasize elements that are less sensitive to the human visual system; and

a delta encoder, coupled to the quantizer, configured to determine differences between quantized frequency domain elements of a first frame and associated quantized frequency domain elements of a second frame, and to repeat the process of determining differences between quantized frequency domain elements of successive frames such that the quantized frequency domain elements of each frame are compared with the quantized frequency domain elements of the frame immediately preceding it.

45. The apparatus of claim 44, wherein only differences between AC quantized frequency domain elements are determined.

46. The apparatus of claim 44, further comprising a block size assignment configured to group a plurality of pixel elements into variable block sizes.

47. The apparatus of claim 44 wherein the delta encoder produces lossless frequency domain elements.

48. The apparatus of claim 44 wherein the delta encoder produces lossy frequency domain elements.

49. The apparatus of claim 44 further comprising a serializer, coupled to the quantizer, configured to accept the quantized frequency domain elements and reorder the quantized frequency domain elements.

50. The apparatus of claim 49 further comprising a variable length coder coupled to the serializer and configured to variable length code the quantized frequency domain elements.