
HK1053213B - Quality based image compression - Google Patents


Info

Publication number
HK1053213B
HK1053213B (application no. HK03105415.3A)
Authority
HK
Hong Kong
Prior art keywords
block
sub-blocks
image
pixel data
Prior art date
Application number
HK03105415.3A
Other languages
Chinese (zh)
Other versions
HK1053213A1 (en)
Inventor
K.S. Thyagarajan
S.A. Morley
Original Assignee
Qualcomm Incorporated
Priority date
Filing date
Publication date
Priority claimed from US 09/494,192 (now U.S. Pat. No. 6,600,836 B1)
Application filed by Qualcomm Incorporated
Publication of HK1053213A1
Publication of HK1053213B


Description

Quality-based image compression
Background
Field of the invention
The present invention relates to image compression. More particularly, the present invention relates to a quality-based image signal compression scheme utilizing adaptively sized blocks and sub-blocks of encoded discrete cosine transform coefficient data.
Description of Related Art
In the field of transmission and reception of video signals, such as those used for showing movies or films, various improvements are being made to image compression techniques. Many current and proposed video systems utilize digital encoding techniques. Digital coding provides robustness of the communication link against impairments such as multipath fading and jamming or signal interference, each of which could otherwise severely degrade image quality. Furthermore, digital techniques facilitate the use of digital encryption, which is useful or even necessary for governmental and many new and evolving commercial broadcast applications.
High definition television is an area that benefits from improved image compression techniques. When first proposed, over-the-air transmission of high definition television (or even transmission over wire or fiber) seemed impractical due to excessive bandwidth requirements: existing wireless and other transmission systems were not designed to conveniently provide the needed bandwidth. However, it has since been realized that digital video signals can be compressed to a level that enables transmission using a reasonable amount of bandwidth. Such a level of signal compression, in conjunction with digital transmission of the signal, enables a video system to transmit with less power and with greater immunity to channel impairments, while occupying a more desirable and useful bandwidth.
Many available compression techniques provide a significant level of compression but result in a degradation of the quality of the video signal. In general, techniques for transmitting compressed information require that the compressed information be transmitted at a constant bit rate.
One compression technique capable of providing significant levels of compression while preserving the desired video signal quality utilizes adaptively sized blocks and sub-blocks of encoded Discrete Cosine Transform (DCT) coefficient data. This technique will hereinafter be referred to as the Adaptive Block Size Discrete Cosine Transform (ABSDCT) method. This technique is disclosed in U.S. Pat. No. 5,021,891, entitled "Adaptive Block Size Image Compression Method And System," which is assigned to the assignee of the present invention and incorporated herein by reference. DCT techniques are also disclosed in U.S. Pat. No. 5,107,345, entitled "Adaptive Block Size Image Compression Method And System," which is assigned to the assignee of the present invention and incorporated herein by reference. Further, the use of the ABSDCT technique in combination with a Differential Quadtree Transform technique is discussed in U.S. Pat. No. 5,452,104, entitled "Adaptive Block Size Image Compression Method And System," which is also assigned to the assignee of the present invention and incorporated herein by reference. The systems disclosed in these patents use what is referred to as "intra-frame" encoding, in which each frame of image data is encoded without regard to the content of any other frame. Using the ABSDCT technique, the achievable data rate may be reduced from around 15 megabits per second to about 5 megabits per second without discernible degradation of image quality.
The ABSDCT technique may be used to compress either a black and white or a color image, or a signal representing the image. The color input signal may be in YIQ format, where, for each 4x4 block of pixels, Y is the luminance, or brightness, sample and I and Q are the chrominance, or color, samples. Other known formats, such as YUV, YCbCr, or RGB, may also be used. Because of the low spatial sensitivity of the eye to color, most research has shown that sub-sampling the color components by a factor of four in the horizontal and vertical directions is reasonable. Accordingly, a video signal may be represented by four luminance components and two chrominance components.
Using ABSDCT, a video signal will generally be segmented into blocks of pixels for processing. For each block, the luminance and chrominance components are passed to a block interleaver. For example, a 16x16 (pixel) block may be presented to the block interleaver, which orders or organizes the image samples within each 16x16 block to produce blocks and composite sub-blocks of data for Discrete Cosine Transform (DCT) analysis. The DCT operator is one method of converting a time- and space-sampled signal to a frequency representation of the same signal. By converting to the frequency representation, DCT techniques have been shown to allow for very high levels of compression, because quantizers can be designed to take advantage of the frequency distribution characteristics of an image. In a preferred embodiment, one 16x16 DCT is applied to a first ordering, four 8x8 DCTs are applied to a second ordering, sixteen 4x4 DCTs are applied to a third ordering, and sixty-four 2x2 DCTs are applied to a fourth ordering.
The DCT operation reduces the spatial redundancy inherent in the video source. After the DCT is performed, most of the video signal energy tends to be concentrated in a few DCT coefficients. An additional transform, the Differential Quadtree Transform (DQT), may be used to reduce the redundancy among the DCT coefficients.
For the 16x16 block and each sub-block, the DCT coefficient values and, if the DQT is used, the DQT values are analyzed to determine the number of bits required to encode the block or sub-block. The block or combination of sub-blocks requiring the least number of bits to encode is then chosen to represent the image segment. For example, two 8x8 sub-blocks, six 4x4 sub-blocks, and eight 2x2 sub-blocks may be chosen to represent one image segment.
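The selection just described, choosing the block or combination of sub-blocks that needs the fewest bits, amounts to a recursive quadtree minimization. The sketch below illustrates the idea with a hypothetical per-block bit-cost model; `bits_for_block` is an assumption for illustration, not the actual ABSDCT cost function.

```python
def best_cost(bits_for_block, block_size, min_size=2):
    """Recursively choose, for one square block, the cheaper of (a) coding
    the block whole or (b) splitting it into four half-size sub-blocks,
    each coded optimally in turn. Returns the minimum total bit count."""
    whole = bits_for_block(block_size)
    if block_size <= min_size:
        return whole
    split = 4 * best_cost(bits_for_block, block_size // 2, min_size)
    return min(whole, split)

# Two hypothetical cost models (assumptions, not the patent's actual costs):
# a flat block is cheapest coded whole; a busy block rewards subdivision.
flat_cost = best_cost(lambda size: size, 16)        # coding whole wins
busy_cost = best_cost(lambda size: size ** 3, 16)   # full subdivision wins
```

With the "flat" model the whole 16x16 block is cheapest (16 bits); with the "busy" model full subdivision into 2x2 blocks is cheapest (64 blocks at 8 bits each).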
The chosen block or combination of sub-blocks is then properly arranged in order within the 16x16 block. The DCT/DQT coefficient values may then undergo frequency weighting, quantization, and coding (such as variable length coding) in preparation for transmission. Although the ABSDCT technique described above performs remarkably well, it is computationally intensive. As a result, compact hardware implementation of the technique may be difficult.
Summary of the Invention
Other techniques that make hardware implementation more efficient provide certain advantages. Some systems utilize adaptively sized blocks and sub-blocks of Discrete Cosine Transform (DCT) coefficient data. Although some DCT-based systems utilize quality as a compression parameter, others base compression on the coding rate rather than on a quality-based metric. An example of such a rate-based parameter is the quantization step selection of a contrast-based, block size adaptive image compression algorithm.
The present invention is a quality based image compression system and method that utilizes adaptively sized blocks and sub-blocks of discrete cosine transform coefficient data together with a quality based quantization scale factor. A block of pixel data is input to an encoder. The encoder comprises a block size assignment (BSA) element, which segments the input block of pixels for processing. The block size assignment is based on the variances of the input block and of further subdivided blocks. In general, areas with larger variances are subdivided into smaller blocks and areas with smaller variances are not subdivided, with the decision depending on whether the mean values of the block and sub-blocks fall within predetermined ranges. Thus, the variance threshold of a block is first modified from its nominal value based on the block's mean value, the variance of the block is then compared with this threshold, and, if the variance is greater than the threshold, the block is subdivided.
The block size assignment is provided to a transform element, which transforms the pixel data into frequency domain data. Only the blocks and sub-blocks selected through the block size assignment are transformed. The transformed data are then scaled and quantized. Quantization of the transformed data is based on an image quality metric, such as a scale factor that adjusts with respect to contrast, coefficient count, rate distortion, density of the block size assignments, and/or past scale factors. The data may also be serialized, for example by zigzag scanning, to produce a stream of data. The data stream may then be coded by a variable length coder in preparation for transmission. The coded data are sent over a transmission channel to a decoder, where the pixel data are reconstructed in preparation for display.
It is a feature and advantage of the present invention to provide a quality based image compression system.
It is another feature and advantage of the present invention to allow flexible picture quality control by managing the bit rate in a frame-by-frame manner.
It is another feature and advantage of the present invention to allow flexible picture quality control by managing the bit rate in a block-by-block manner.
It is another feature and advantage of the present invention to maintain quality image compression and bit rate control for data with activity, such as sudden motion.
It is another feature and advantage of the present invention to use the signal to noise ratio parameter to quantify the image quality.
It is another feature and advantage of the present invention to utilize quantization scale factors that are adjusted with respect to the contrast of an image.
It is another feature and advantage of the present invention to utilize quantization scale factors that are adjusted with respect to the count of AC coefficients of the DCT blocks comprising an image.
It is another feature and advantage of the present invention to utilize quantization scale factors that are adjusted with respect to rate distortion and the inter-frame bit rate.
It is another feature and advantage of the present invention to utilize quantization scale factors that are adjusted in terms of past quantization scale factors.
Brief Description of Drawings
The features and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout, and wherein:
FIG. 1 is a block diagram of an encoder portion of a quality-based image processing system, including the variance-based block size assignment system and method of the present invention;
FIG. 2 is a block diagram of a decoder portion of a quality-based image processing system including the variance-based block size assignment system and method of the present invention;
FIG. 3 is a flow chart illustrating the processing steps involved in variance-based block size assignment;
FIG. 4a, FIG. 4b, and FIG. 4c illustrate an exemplary block size assignment, the corresponding quadtree decomposition, and the corresponding PQR data;
FIG. 5 is a graph illustrating the relationship between the quantization scale factor and the average bits per pixel rate;
FIG. 6 is a flow diagram of a quantization scale factor based on the AC coefficient count of a DCT block;
FIG. 7a is a flow chart of rate-distortion based quantization scale factors; and
FIG. 7b is a continuation of the flow chart of rate-distortion based quantization scale factors of FIG. 7a.
Detailed description of the preferred embodiments
In order to facilitate digital transmission of digital signals and enjoy the corresponding benefits, it is generally necessary to employ some form of signal compression. In achieving high compression of the resulting image, it is also important that high quality of the image be maintained. Furthermore, computational efficiency is desired for compact hardware implementation, which is important in many applications.
The present invention provides a system or apparatus and method of quality based image compression that takes into account both image quality and computational efficiency in performing image compression. Controlling the bit rate of a truly variable rate system for the purpose of maintaining a target bit rate would defeat the purpose of maintaining image quality. Instead, the present invention is directed to a quality based rate control strategy. The quality based image compression may operate at the block level or at the frame level. Because block-level systems spend additional bits identifying particular blocks, block-level systems typically use a greater number of coded bits per frame than frame-level control.
Before one embodiment of the invention is explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the following drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting.
The image compression of the present invention is based on discrete cosine transform (DCT) techniques, such as that disclosed in co-pending U.S. patent application Ser. No. 09/436,085, "Contrast Sensitive Variance Based Adaptive Block Size DCT Image Compression," filed on November 8, 1999, assigned to the assignee of the present invention and incorporated herein by reference. Generally, an image to be processed in the digital domain is composed of pixel data divided into an array of non-overlapping blocks of size NxN. A two-dimensional DCT may be performed on each block. The two-dimensional DCT is defined by the following relationship:
X(k,l) = (2/N) c(k) c(l) Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} x(m,n) cos[(2m+1)kπ/2N] cos[(2n+1)lπ/2N], for k, l = 0, 1, ..., N-1,
wherein
c(k) = 1/√2 for k = 0, and c(k) = 1 otherwise,
x(m,n) is the pixel at location (m,n) within the NxN block, and
X(k,l) is the corresponding DCT coefficient.
Since the pixel values are non-negative, the DCT component X (0, 0) is always positive and usually has most of the energy. In fact, for a typical image, most of the transform energy is concentrated around the component X (0, 0). This energy compression property makes DCT techniques an attractive compression method.
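As a minimal illustration of the relationship above and of the energy compaction property, the sketch below implements the 2-D DCT directly from the stated formula (the orthonormal scaling shown is one common normalization, assumed here) and shows that, for a slowly varying block, most of the energy concentrates around X(0, 0).

```python
import numpy as np

def dct2(x):
    """2-D DCT per X(k,l) = (2/N) c(k) c(l) *
    sum_{m,n} x(m,n) cos((2m+1)k*pi/2N) cos((2n+1)l*pi/2N),
    with c(0) = 1/sqrt(2) and c(k) = 1 otherwise."""
    n = x.shape[0]
    idx = np.arange(n)
    # Basis matrix C[k, m] = c(k) * cos((2m+1) k pi / 2N)
    c = np.cos((2 * idx[None, :] + 1) * idx[:, None] * np.pi / (2 * n))
    c[0, :] *= 1.0 / np.sqrt(2.0)
    return (2.0 / n) * c @ x @ c.T

# A smooth, slowly varying 8x8 block of non-negative pixel values.
block = np.add.outer(np.arange(8), np.arange(8)).astype(float) + 100.0
coeffs = dct2(block)
total_energy = float(np.sum(coeffs ** 2))
dc_fraction = float(coeffs[0, 0] ** 2) / total_energy  # share of energy in X(0,0)
```

With this normalization the transform is orthonormal, so the total energy is preserved, and for this smooth block well over 99% of it sits in the single DC coefficient.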
The image compression technique utilizes contrast adaptive coding to achieve further bit rate reduction. It has been observed that most natural images consist of relatively slowly varying flat areas and busy areas such as object boundaries and high contrast textures. Contrast adaptive coding schemes take advantage of this by allocating more bits to busy regions and fewer bits to less busy regions.
Contrast adaptive coding is also useful for reducing blocking artifacts, as discussed in the above-referenced co-pending patent application Ser. No. 09/436,085. Blocking artifacts are more noticeable in the busy areas of an image. However, it has been realized that blocking artifacts are reduced when a smaller sized DCT is used. When a 2x2 DCT is used, the blocking artifacts become practically invisible, although the bits per frame performance will suffer.
Furthermore, the contrast adaptive methods utilize intra-frame coding (spatial processing) rather than inter-frame coding (spatio-temporal processing). Inter-frame coding inherently requires multiple frame buffers in addition to more complex processing circuitry. In many applications, reduced complexity is needed for practical implementation. Intra-frame coding is also useful in situations where spatio-temporal coding schemes break down and perform poorly. For example, a 24-frame-per-second movie can fall into this category because of the short integration time of the mechanical shutter. The short integration time allows a higher degree of temporal aliasing: as fast motion appears jerky, the assumption of inter-frame correlation breaks down. Intra-frame coding is also easier to standardize when both 50 Hz and 60 Hz power line frequencies are involved. Televisions currently transmit signals at either 50 Hz or 60 Hz. A digital approach using an intra-frame scheme can accommodate both 50 Hz and 60 Hz operation, or even 24-frame-per-second film, by trading off frame rate against spatial resolution.
For image processing purposes, the DCT operation is performed on pixel data divided into an array of non-overlapping blocks. Note that although block sizes are discussed herein as being NxN in size, it is envisioned that various block sizes may be used. For example, an NxM block size may be utilized, where both N and M are integers and M is either greater than or less than N. Another important aspect is that the block is divisible into at least one level of sub-blocks, such as N/i x N/i, N/i x N/j, N/i x M/j, and so forth, where i and j are integers. Furthermore, the exemplary block size discussed herein is a 16x16 pixel block, with corresponding block and sub-blocks of DCT coefficients. It is further envisioned that various other integer sizes, either even or odd, may be used, such as 9x9.
Fig. 1 and 2 illustrate an image processing system 100 incorporating the quality-based compression system of the present invention. The image processing system 100 consists of an encoder 104 that compresses a received video signal. The compressed signal is transmitted using the transmit channel 108 and received by the decoder 112. The decoder 112 decodes the received signal into image samples that are then displayed.
Generally, an image is divided into blocks of pixels for processing. An RGB to YC1C2 converter 116 may be used to convert a color signal from RGB space to YC1C2 space, where Y is the luminance, or brightness, component and C1 and C2 are the chrominance, or color, components. Because of the low spatial sensitivity of the eye to color, many systems sub-sample the C1 and C2 components by a factor of four in the horizontal and vertical directions. However, the sub-sampling is not necessary. A full resolution image, known as the 4:4:4 format, may be useful or necessary in some applications, such as those referred to as "digital cinema". Two possible YC1C2 representations are the YIQ representation and the YUV representation, both of which are well known in the art. A variation of the YUV representation known as YCbCr may also be employed.
In a preferred embodiment, each of the Y, Cb, and Cr components is processed without sub-sampling. Thus, an input of each component of a 16x16 block of pixels is provided to the encoder 104. For illustration purposes, the encoder 104 for the Y component is described; similar encoders are used for the Cb and Cr components. The encoder 104 comprises a block size assignment element 120, which performs block size assignment in preparation for video compression. The block size assignment element 120 determines the block decomposition of the 16x16 block based on the perceptual characteristics of the image within the block. Block size assignment subdivides each 16x16 block into smaller blocks in a quadtree fashion depending on the activity within the 16x16 block. The block size assignment element 120 generates a quadtree data structure, called the PQR data, whose length can be between 1 and 21 bits. Thus, if block size assignment determines that the 16x16 block is to be divided, the R bit of the PQR data is set and is followed by four additional bits of Q data corresponding to the four divided 8x8 blocks. If block size assignment determines that any of the 8x8 blocks is to be subdivided, then four additional bits of P data are added for each 8x8 block that is subdivided.
Referring now to FIG. 3, a flow diagram showing details of the operation of the block size assignment element 120 is provided. The variance of a block is used as a metric in the decision to subdivide a block. Beginning at step 202, a 16x16 block of pixels is read. At step 204, the variance, v16, of the 16x16 block is computed. The variance is computed as follows:
σ² = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} x_{i,j}² − ( (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} x_{i,j} )²
wherein N = 16 and x_{i,j} is the pixel in the ith row and jth column within the NxN block. At step 206, if the mean value of the block is between two predetermined values, then the variance threshold T16 is first modified to provide a new threshold T'16, and the block variance is then compared with the new threshold T'16.
If the variance v16 is not greater than the threshold T16, then at step 208 the starting address of the 16x16 block is written into temporary storage, and the R bit of the PQR data is set to 0 to indicate that the 16x16 block is not subdivided. The algorithm then reads the next 16x16 block of pixels. If the variance v16 is greater than the threshold T16, then at step 210 the R bit of the PQR data is set to 1 to indicate that the 16x16 block is to be subdivided into four 8x8 blocks.
As shown in step 212, the four 8x8 blocks, i = 1:4, are considered sequentially for further subdivision. For each 8x8 block, the variance v8_i is computed at step 214. At step 216, if the mean value of the block is between two predetermined values, then the variance threshold T8 is first modified to provide a new threshold T'8, and the block variance is then compared with this new threshold.
If the variance v8_i is not greater than the threshold T8, then at step 218 the starting address of the 8x8 block is written into temporary storage, and the corresponding Q bit, Q_i, is set to 0. The next 8x8 block is then processed. If the variance v8_i is greater than the threshold T8, then at step 220 the corresponding Q bit, Q_i, is set to 1 to indicate that the 8x8 block is to be subdivided into four 4x4 blocks.
As shown in step 222, the four 4x4 blocks, j_i = 1:4, are considered sequentially for further subdivision. For each 4x4 block, the variance v4_ij is computed at step 224. At step 226, if the mean value of the block is between two predetermined values, then the variance threshold T4 is first modified to provide a new threshold T'4, and the block variance is then compared with this new threshold.
If the variance v4_ij is not greater than the threshold T4, then at step 228 the address of the 4x4 block is written into temporary storage, and the corresponding P bit, P_ij, is set to 0. The next 4x4 block is then processed. If the variance v4_ij is greater than the threshold T4, then at step 230 the corresponding P bit, P_ij, is set to 1 to indicate that the 4x4 block is to be subdivided into four 2x2 blocks. In addition, the addresses of the four 2x2 blocks are written into temporary storage.
The thresholds T16, T8, and T4 may be predetermined constants. This is called a hard decision. Alternatively, adaptive or soft decision may be implemented. For example, the soft decision changes the variance threshold based on the average pixel value of a 2Nx2N block, where N may be 8, 4, or 2. Thus, a function of the average pixel value can be used as a threshold.
For purposes of illustration, consider the following example. Suppose the predetermined variance thresholds of the Y component are 50, 1100, and 880 for the 16x16, 8x8, and 4x4 blocks, respectively; in other words, T16 = 50, T8 = 1100, and T4 = 880. Let the range of mean values be 80 and 100. Suppose the computed variance for the 16x16 block is 60. Since 60 is greater than T16 and the mean value of the block, 90, is between 80 and 100, the 16x16 block is subdivided into four 8x8 sub-blocks. Suppose the computed variances for the 8x8 blocks are 1180, 935, 980, and 1210. Since two of the 8x8 blocks have variances that exceed T8, these two blocks are further subdivided to produce a total of eight 4x4 sub-blocks. Finally, suppose the variances of the eight 4x4 blocks are 620, 630, 670, 610, 590, 525, 930, and 690, with corresponding means of 90, 120, 110, and 115. Since the mean value of the first 4x4 block falls within the range (80, 100), its threshold is lowered to T'4 = 200, which is less than 880. So this 4x4 block is subdivided, as is the seventh 4x4 block, whose variance of 930 exceeds T4. The resulting block size assignment is shown in FIG. 4a. The corresponding quadtree decomposition is shown in FIG. 4b. Additionally, the PQR data generated by this block size assignment is shown in FIG. 4c.
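The threshold logic of the worked example can be sketched as follows. The variances and thresholds are taken from the example; the mean values of the last several 4x4 blocks are hypothetical fill-ins for illustration (the text lists only the first few), chosen to lie outside the (80, 100) range so they do not affect the outcome.

```python
def subdivide(variance, mean, nominal_t, modified_t, mean_range=(80, 100)):
    """Decide whether to split one block: when the block mean falls inside
    mean_range, the nominal variance threshold is replaced by modified_t
    before the comparison (the mean-dependent 'soft decision' above)."""
    low, high = mean_range
    threshold = modified_t if low < mean < high else nominal_t
    return variance > threshold

# Thresholds from the worked example: T16 = 50, T8 = 1100, T4 = 880,
# with a mean in (80, 100) lowering the 4x4 threshold to T'4 = 200.
split16 = subdivide(60, 90, 50, 50)                  # 16x16 -> four 8x8
split8 = [v > 1100 for v in [1180, 935, 980, 1210]]  # 1st and 4th split
v4 = [620, 630, 670, 610, 590, 525, 930, 690]
m4 = [90, 120, 110, 115, 105, 112, 101, 122]         # trailing means: hypothetical
split4 = [subdivide(v, m, 880, 200) for v, m in zip(v4, m4)]
```

As in the example, only the first 4x4 block (mean 90 inside the range, so compared against 200) and the seventh (variance 930 above 880) are subdivided.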
Note that a similar process is used to assign block sizes for the color components C1 and C2. The color components may be decimated horizontally, vertically, or both.
In addition, note that although block size assignment has been described as a top-down approach, in which the largest block (16x16 in the present example) is evaluated first, a bottom-up approach may instead be used. The bottom-up approach will evaluate the smallest blocks (2x2 in the present example) first.
Referring back to fig. 1, the PQR data is fed into the DCT element 124 along with the address of the selected block. The DCT element 124 performs an appropriately sized discrete cosine transform on the selected block using the PQR data. Only the selected blocks need to be subjected to DCT processing.
The image processing system 100 may optionally include a DQT element 128 to reduce the redundancy among the DC coefficients of the DCTs. A DC coefficient is encountered at the top left corner of each DCT block. The DC coefficients are, in general, large compared to the AC coefficients. The discrepancy in sizes makes it difficult to design an efficient variable length coder. Accordingly, it is advantageous to reduce the redundancy among the DC coefficients.
The DQT element 128 performs 2-D DCTs on the DC coefficients, taken 2x2 at a time. Starting with the 2x2 blocks within the 4x4 blocks, a 2-D DCT is performed on the four DC coefficients. This 2x2 DCT is called the differential quadtree transform, or DQT, of the four DC coefficients. Next, the DC coefficient of the DQT, along with the three neighboring DC coefficients within an 8x8 block, is used to compute the next level DQT. Finally, the DC coefficients of the four 8x8 blocks within the 16x16 block are used to compute the DQT. Thus, in a 16x16 block, there is one true DC coefficient, and the rest are AC coefficients corresponding to the DCT and DQT.
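A single DQT stage as described, a 2x2 DCT over four neighboring DC coefficients, might be sketched as follows; the orthonormal 2x2 DCT matrix used here is an assumption about the normalization.

```python
import numpy as np

def dqt_stage(dc):
    """One DQT stage: a 2x2 orthonormal DCT applied to a 2x2 array of
    neighboring DC coefficients. The [0, 0] output carries the combined
    DC value; the other three entries are difference (AC-like) terms."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    return h @ dc @ h.T

# Four equal DC coefficients: all of the redundancy collapses into the
# single combined DC term, and the three difference terms vanish.
out = dqt_stage(np.full((2, 2), 10.0))
```

This illustrates why the DQT helps the variable length coder: correlated DC values turn into one large term plus small (here zero) differences.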
The transform coefficients (both DCT and DQT) are provided to a quantizer for quantization. In a preferred embodiment, the DCT coefficients are quantized using frequency weighting masks (FWMs) and a quantization scale factor. A FWM is a table of frequency weights of the same dimensions as the block of input DCT coefficients. The frequency weights apply different weights to the different DCT coefficients. The weights are designed to emphasize the input samples having frequency content to which the human visual or optical system is more sensitive, and to de-emphasize samples having frequency content to which the visual or optical system is less sensitive. The weights may also be designed based on factors such as viewing distance.
The weights are selected based on empirical data. A method for designing the weighting masks for 8x8 DCT coefficients is disclosed in ISO/IEC JTC1 CD 10918, "Digital compression and encoding of continuous-tone still images, Part 1: Requirements and guidelines," International Organization for Standardization, 1994. In general, two FWMs are designed, one for the luminance component and one for the chrominance components. The FWM tables for block sizes 2x2 and 4x4 are obtained by decimation, and that for the 16x16 block size by interpolation, of the table for the 8x8 block. The scale factor controls the quality and bit rate of the quantized coefficients.
Thus, each DCT coefficient is quantized according to the relationship:
DCT_q(i,j) = ⌊ DCT(i,j) / (fwm(i,j) · q) ± 1/2 ⌋
where DCT(i,j) is the input DCT coefficient, fwm(i,j) is the frequency weighting mask, q is the scale factor, and DCT_q(i,j) is the quantized coefficient. Note that the first term inside the brackets is rounded up or down depending on the sign of the DCT coefficient. The DQT coefficients are also quantized using a suitable weighting mask. However, multiple tables or masks may be used and applied to each of the Y, Cb, and Cr components.
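A sketch of the quantization step, assuming a round-half-away-from-zero rule for the sign-dependent rounding described above; the specific mask values and scale factor here are hypothetical.

```python
import numpy as np

def quantize(dct, fwm, q):
    """Quantize DCT coefficients with a frequency weighting mask (fwm) and
    scale factor q, rounding to the nearest integer half away from zero,
    i.e. the sign-dependent +/- 1/2 in the relationship above."""
    scaled = dct / (fwm * q)
    return np.trunc(scaled + 0.5 * np.sign(scaled))

# Hypothetical 2x2 example: larger weights on the higher-frequency
# positions quantize those coefficients more coarsely.
dct = np.array([[100.0, 40.0], [-35.0, 9.0]])
fwm = np.array([[1.0, 2.0], [2.0, 4.0]])
dq = quantize(dct, fwm, q=2.0)
```

Note how the negative coefficient -35 scales to -8.75 and rounds away from zero to -9, while the high-frequency 9 scales to 1.125 and rounds to 1.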
The blocks of pixel data and the frequency weighting masks are then scaled by a quantization scale factor element. In a preferred embodiment, there are 32 scale factors corresponding to average bit rates. Unlike other compression methods such as MPEG-2, the average bit rate is controlled based on the quality of the processed image, instead of on target bit rate and buffer status. In a preferred embodiment, a particular quantization scale factor is preselected by selector 130, although it should be noted that the invention does not require the quantization scale element to be preselected.
In one block-based embodiment, the quantization scale factor is based on contrast (132). It has been found that the human visual system is more sensitive to high contrast regions than to low contrast regions. The contrast-based quantization scale factor uses a predetermined relationship between the bit rate and the quantization scale factor. In an embodiment, the block size assignment function segments an MxN block into sub-blocks as a function of contrast, such that there are more subdivisions in high contrast blocks and fewer subdivisions in low contrast blocks. The invention quantifies contrast as the ratio of the square root of the variance to the mean luminance. The quantization scale factor is selected as a function of the block size assignment, given by the following relationship:
R = Σ_i p_i R_i
wherein R_i is the number of bits of the ith block, determined from its variance σ_i², and p_i is the percentage of blocks of the ith size obtained from the block size assignment function. The quantization scale factor is determined using the predetermined relationship between the bit rate and the quantization scale factor.
FIG. 5 illustrates the relationship between the bit rate and the quantization scale factor. The relationship is predetermined by estimating, over a set of images, the average number of bits necessary to describe a pixel. As illustrated in the figure, the quantization scale factor increases as the average number of bits per pixel decreases. Thus, a particular scale factor Q may be selected for the corresponding value of R_i.
In another block-based embodiment, the quantization scale factor is selected based on a coefficient count (136) of the DCT. In general, the coefficient count system and method requires one bit of overhead per block. FIG. 6 illustrates the quantizer scale factor method 300. Following block size assignment, a portion of data comprising an NxN block of data is selected (304). In a preferred embodiment, the portion of data comprises a block of pixel data 16 rows by 16 columns in size. A DCT is performed on the block of data (308), with sub-blocks designated as described with respect to FIG. 3. A threshold is determined for each block of data (312). The threshold may be determined at any stage of processing; in a preferred embodiment, the threshold is predetermined empirically. The number of AC coefficients whose magnitudes exceed the predetermined threshold is then determined (316) by the relationship:
M = Σ_{(k,l)≠(0,0)} 1( |X(k,l)| > T )
where T is the threshold, X(k,l) denotes the DCT coefficient at location (k,l) within the NxN block, 1(·) is the indicator function, and M is the number of AC coefficients of the NxN DCT block that exceed the threshold. A scale factor is then assigned as a function of the number of AC coefficients exceeding the threshold T (320).
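The count M can be computed directly from the relationship above; the coefficient values in this example are hypothetical.

```python
import numpy as np

def count_ac_above(block_coeffs, t):
    """M = number of AC coefficients in an NxN block of DCT coefficients
    whose magnitude exceeds threshold T. The DC term X(0,0) is excluded."""
    mask = np.abs(block_coeffs) > t
    mask[0, 0] = False  # exclude the DC coefficient
    return int(mask.sum())

coeffs = np.array([[120.0, 15.0, 2.0, 0.0],
                   [-18.0,  6.0, 1.0, 0.0],
                   [  3.0, -1.0, 0.5, 0.0],
                   [  0.0,  0.0, 0.0, 0.0]])
m = count_ac_above(coeffs, t=5.0)  # 15, -18, and 6 exceed the threshold
```

The large DC value of 120 is excluded from the count, so M reflects only AC activity, which is what the scale factor assignment keys on.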
After the first data block in a portion of data has been evaluated and assigned a scale factor, the next NxN data block in the portion is selected (324). The scale factor is incremented (328) and another DCT is performed (332). The mean square error between the old and new DCTs is determined (336). The scale factor is then decremented (340) and another DCT is performed (344). The mean square error between the old and new DCTs is again determined (348). The two mean square errors are compared, and the quantizer scale factor associated with the lower mean square error is assigned to the data block (352). The loop is then repeated for each data block in the portion of data (356). After each data block in the portion has been assigned a quantizer, the next portion of data is selected and the loop continues (360).
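One plausible reading of steps 328-352, sketched in Python (the `quantize` helper and its signature are hypothetical stand-ins; the patent gives no code, and "performing another DCT" is interpreted here as re-quantizing the DCT coefficients at the candidate scale factor):

```python
import numpy as np

def mse(a, b):
    """Mean square error between two coefficient blocks."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def refine_scale_factor(block_dct, base_q, quantize):
    """Pick base_q + 1 or base_q - 1, whichever yields the lower MSE against
    the reference quantization at base_q (steps 328-352 of the method).
    `quantize(coeffs, q)` is a hypothetical helper returning quantized
    coefficients at scale factor q."""
    reference = quantize(block_dct, base_q)
    err_up = mse(reference, quantize(block_dct, base_q + 1))    # step 336
    err_down = mse(reference, quantize(block_dct, base_q - 1))  # step 348
    # Step 352: keep the candidate with the lower mean square error.
    return base_q + 1 if err_up < err_down else base_q - 1
```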
Another block-based embodiment for determining the quantization scale factor is the rate-distortion based quantizer scale (134), illustrated in fig. 7 (400). In general, the rate-distortion quantizer operates in a manner similar to the coefficient-counting quantizer, but it requires more overhead bits depending on how many scale factors are examined. After block size allocation, a portion of data comprising NxN data blocks is selected (404). In the preferred embodiment, the portion of data comprises blocks of 16 rows and 16 columns of pixel data. As depicted in fig. 3, a DCT is performed on the data block (408), and sub-blocks are further specified. A threshold is again determined for each data block, and may be determined at any stage of processing (412). An average distortion D and an average bit rate per pixel R are also determined (416). The number M of AC coefficients that exceed a predetermined threshold in magnitude is then determined (420) using the relationship described for the coefficient counting method. A scale factor is then assigned based on the number of AC coefficients that exceed the threshold T (424).
After the first data block in a portion of data has been evaluated and assigned a scale factor, the next NxN data block in the portion is selected (428). The scale factor is incremented from that of the previous block (432) and another DCT is performed (436). The mean square error between the old and new DCTs is determined (440). The average distortion D and average bit rate per pixel R corresponding to the quantization scale factor associated with the mean square error of step 440 are then determined (444).
The scale factor is then decremented from that of the previous block (448) and another DCT is performed (452). The mean square error (MSE) between the old and new DCTs is again determined (456). The average distortion D and average bit rate per pixel R corresponding to the quantization scale factor associated with the mean square error of step 456 are then determined (460). The difference between the two average distortions D and the difference between the two per-pixel bit rates R are determined (464). A ratio is then determined for each of the two candidate scale factors (468) by dividing the change in per-pixel bit rate by the change in average distortion:
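The equation image is missing here. From the surrounding description (a reconstruction, not the patent's verbatim formula), each candidate scale factor is scored by the ratio

```latex
\frac{\Delta R}{\Delta D} \;=\; \frac{R_{\text{new}} - R_{\text{old}}}{D_{\text{new}} - D_{\text{old}}}
```

evaluated once for the incremented candidate and once for the decremented candidate.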
the scaling factor obtained from the largest absolute ratio is selected for the current block (472). The loop is then repeated for each data block in the portion of data (476). After each data block in the portion of data has been assigned a quantizer, the next portion of data block is selected and the loop continues (480).
The quantization scale factor may also be selected on a frame-by-frame basis. However, frame-based rate control generally achieves a lower average bit rate than the block-based rate control described above. Furthermore, frame-based rate control requires greater use of buffers, since the bit rate varies from frame to frame rather than from block to block.
In a frame-by-frame quantizer embodiment, the quantization scale factor may depend on the density of the various block size assignments (FIG. 1, block 144). A quantization scale factor for an entire frame of the image is determined from a metric related to the image quality of the image. Image quality is inferred by determining the number of 16x16 blocks and the number of 8x8 blocks in a given frame. That is, if a given frame has more 8x8 blocks than 16x16 blocks, the image is considered more complex, and more complex images generally require finer quantization to maintain quality. Conversely, if a given frame has more 16x16 blocks than 8x8 blocks, the image is considered less complex, so a lower quantization scale factor can be used while still maintaining image quality. Thus, the block size assignment algorithm is performed for each frame, and the scale factor is based on the following equation:
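The equation image is missing. One plausible reconstruction consistent with the variables defined on the following line (an assumption only; the patent's exact form may differ) is the fraction of 8x8 blocks in the frame,

```latex
L \;=\; \frac{n_{8}}{n_{16} + n_{8}}
```

which is then compared against the predetermined threshold.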
where n_16 represents the number of 16x16 blocks in the frame, n_8 represents the number of 8x8 blocks in the frame, and L represents the resulting ratio, which is compared against a threshold. In a preferred embodiment, the threshold is predetermined empirically. The scale factor is then selected based on the value of L and the number of 8x8 blocks in the given frame.
Alternatively, the frame-based quantization scale factor may be based on the quantization scale factor of the previous frame (148). In this embodiment, the current frame is compressed using the scale factor of the previous frame, the scale factor corresponding to the previous frame's scale factor plus one, and the scale factor corresponding to the previous frame's scale factor minus one. During compression of the current frame, the actual scale factor is determined from the block size assignment information, as discussed above for the frame-by-frame visual quality method. The compressed output corresponding to the scale factor closest to the actual scale factor of the current frame is identified and used, along with its corresponding scale factor. That is:
Otherwise
where q(s_{n-1}) represents the scale factor of the previous frame, q(s_{n-1}+1) represents the candidate scale factor above that of the previous frame, q(s_{n-1}-1) represents the candidate scale factor below that of the previous frame, and q_n represents the actual scale factor of the current frame.
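The three-candidate search described above can be sketched as follows (the `compress` helper and its return value are hypothetical stand-ins; the patent gives no code):

```python
def select_frame_scale_factor(frame, prev_q, compress, actual_q):
    """Compress the current frame at the previous frame's scale factor and at
    that factor plus/minus one, then keep the output whose scale factor is
    closest to the actual scale factor derived from the block size assignment.
    `compress(frame, q)` is a hypothetical helper returning the compressed
    output for scale factor q."""
    outputs = {q: compress(frame, q) for q in (prev_q - 1, prev_q, prev_q + 1)}
    best_q = min(outputs, key=lambda q: abs(q - actual_q))
    return best_q, outputs[best_q]
```

Compressing all three candidates costs extra work per frame, but it guarantees a usable output is available as soon as the actual scale factor is known.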
Referring back to figs. 1 and 2, the quantized coefficients are provided to a zigzag scan serializer 156. Serializer 156 scans the blocks of quantized coefficients in a zigzag fashion to produce a serialized stream of quantized coefficients. A number of different zigzag scan patterns, as well as patterns other than zigzag, may also be chosen. Although other sizes may be used, the preferred technique employs an 8x8 block size for the zigzag scanning.
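A minimal sketch of the zigzag ordering itself (this particular implementation is illustrative; as noted above, several scan patterns may be chosen):

```python
def zigzag_order(n=8):
    """Generate the zigzag scan order for an n x n block: coefficients are
    visited along anti-diagonals, alternating direction, so that the
    low-frequency coefficients near the top-left corner come first."""
    order = []
    for s in range(2 * n - 1):             # s = k + l indexes each anti-diagonal
        diagonal = [(k, s - k) for k in range(n) if 0 <= s - k < n]
        if s % 2:                          # odd diagonals run top-right to bottom-left
            order.extend(diagonal)
        else:                              # even diagonals run bottom-left to top-right
            order.extend(reversed(diagonal))
    return order
```

For an 8x8 block this yields the familiar order (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ..., which groups the long zero runs of the high-frequency coefficients at the end of the serialized stream, where the run-length coder exploits them.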
Note that the zig-zag scan serializer 156 may be placed before or after the quantizers 132, 136, 140, 144, or 148. The end result is generally the same.
In any case, the stream of quantized coefficients is fed into the variable length encoder 152. The variable length encoder 152 may utilize run-length encoding of zeros followed by Huffman encoding. This technique is described in detail in the above-mentioned U.S. Pat. Nos. 5,021,891, 5,107,345, and 5,452,104, and is summarized here. Run-length coding takes the quantized coefficients and separates the zero coefficients from the non-zero coefficients. The runs of zeros are represented by run-length values, which are Huffman encoded; the non-zero values are separately Huffman encoded.
Huffman coding of modified quantized coefficients is also possible and is used in the preferred embodiment. Here, after the zigzag scan, the run-length encoder determines the run-length/size pairs within each 8x8 block. These run-length/size pairs are then Huffman encoded.
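A sketch of the run-length/size pairing for one serialized block (the triple format and the end-of-block marker are illustrative assumptions; the Huffman table lookup that follows is not shown):

```python
def run_size_pairs(coeffs):
    """Convert a zigzag-serialized list of quantized AC coefficients into
    (run, size, value) triples: `run` counts the zeros preceding a non-zero
    coefficient, and `size` is the number of bits needed for its magnitude,
    the quantity that, paired with the run, would index a Huffman table."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            size = abs(c).bit_length()     # bits needed to represent |c|
            pairs.append((run, size, c))
            run = 0
    if run:                                # trailing zeros: end-of-block marker
        pairs.append(("EOB",))
    return pairs
```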
Huffman codes are designed from either the measured or theoretical statistics of images. It has been observed that most natural images consist of flat or slowly varying areas and busy areas such as object boundaries and high-contrast textures. Huffman coding, together with frequency domain transforms such as the DCT, takes advantage of these characteristics by allocating more bits to busy regions and fewer bits to flat regions. In general, Huffman coding uses look-up tables to encode the run lengths and the non-zero values. Although one or two tables may be used, multiple tables are typically employed as needed; three tables are preferred in the present invention.
The compressed image signal generated by the encoder is temporarily stored using the buffer 160 and then transferred to the decoder 112 using the transmission channel 108. PQR data containing block size allocation information is also fed to the decoder 112. The decoder 112 includes a buffer 164 and a variable length decoder 168 that decodes run length values and non-0 values.
The output of the variable length decoder 168 is provided to an inverse zigzag scan serializer 172, which orders the coefficients according to the scanning scheme employed. The inverse zigzag scan serializer 172 receives the PQR data to assist in the proper ordering of the coefficients into composite coefficient blocks.
The composite blocks are provided via selector 174 to the inverse quantizer 176, 178, 180, 182, or 184 corresponding to the quantizer used in encoder 104, which undoes the processing due to the quantization scale factors and the frequency weighting masks.
If a differential quadtree transform has been applied, the coefficient block is first fed into an IDQT element 186, followed by an IDCT element 190. Otherwise, the coefficient block is fed directly into the IDCT element 190. The IDQT element 186 and the IDCT element 190 inverse transform the coefficients to produce blocks of pixel data. The blocks of pixel data may then need to be interpolated, converted to RGB form, and stored for future display.
Accordingly, a system and method for block size assignment based on pixel variance for image compression is presented. Variance-based block size assignment provides several advantages. Since the discrete cosine transform is performed after the block size is determined, efficient calculation is achieved. Only computationally intensive transformations need to be performed on the selected blocks. Furthermore, the block selection process is efficient since the variance of the pixel values is mathematically easy to calculate. Yet another advantage of variance-based block size assignment is that it is perceptually-based. The pixel variance is a measure of activity in the block and gives an indication of the presence of edges, texture, etc. It helps to obtain more detail than metrics such as average pixel values. Thus, the variance-based scheme of the present invention assigns smaller blocks to regions with more edges and larger blocks to flatter regions. As a result, excellent quality can be obtained in the reconstructed image.
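The variance-based block size assignment summarized above (determine the variance, compare it against a threshold, subdivide if the block is too active) can be sketched as a quadtree recursion. The helper below is an illustrative assumption, not the patent's implementation; in particular, the patent allows the threshold to vary with the subdivision level and with the block's pixel mean, which is not modeled here.

```python
import numpy as np

def block_size_assignment(block, threshold, min_size=4):
    """Recursively subdivide a square pixel block into quadrants whenever the
    variance of its pixel values exceeds `threshold`, stopping at `min_size`.
    Returns a list of (row, col, size) assignments: smaller blocks land on
    high-activity regions (edges, texture), larger blocks on flat regions."""
    def recurse(r, c, size):
        sub = block[r:r + size, c:c + size]
        if size > min_size and np.var(sub) > threshold:
            half = size // 2
            out = []
            for dr in (0, half):           # visit the four quadrants
                for dc in (0, half):
                    out.extend(recurse(r + dr, c + dc, half))
            return out
        return [(r, c, size)]              # block is flat enough (or minimal)
    return recurse(0, 0, block.shape[0])
```

Note that the computationally cheap variance test runs before any DCT, which is the efficiency advantage claimed in the paragraph above.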
Yet another important advantage is that the block size allocation works with any of the several quantization scale factor schemes described above, regardless of bit rate, allowing greater accuracy and flexibility in controlling image quality. By basing the quantization scale factor on quality, the image compression operates at a truly variable bit rate. This allows the image compression system to retain detail in all regions that lie just above the visibility threshold. Moreover, unlike techniques such as Moving Picture Experts Group (MPEG) compression, variance-based image compression introduces less degradation in image quality. This is crucial for applications such as digital cinema.
Due to the high demand for digital video, piracy is a significant threat. Digital watermarking is therefore an important requirement to deter copyright infringement and revenue loss. Variance-based block size assignment is a natural candidate for watermarking, because watermarks can be embedded in perceptually important regions of the image.
The previous description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Additional features and advantages of the invention are set forth in the following claims.

Claims (24)

1. A method of compressing a digital image comprised of pixel data, the method comprising the steps of:
reading a block of pixel data;
performing a block size assignment, wherein the block size assignment includes an act of dividing a block of pixels into sub-blocks of pixel data, the act comprising:
determining a variance of pixel values of the block of pixel data;
comparing the variance to a threshold;
making a decision, based on the result of the act of comparing, whether to subdivide the block;
if the decision is to subdivide the block, then repeating the acts of determining, comparing, and making a decision for each sub-block until a predetermined criterion is met; and
designating each block or sub-block that is no longer partitioned as a block size allocation;
transforming the sub-blocks of pixel data into corresponding frequency domain representations using a discrete cosine transform;
scaling the frequency domain representation into a data stream on a frame-by-frame basis according to the number and size of block size assignments per frame, wherein the act of scaling is based on a quality metric associated with image quality.
2. The method as recited in claim 1, wherein the act of scaling further comprises an act of providing a frequency weighted mask to the sub-block of pixel data such that the frequency weighted mask emphasizes portions of the image to which the human visual system is more sensitive and does not emphasize portions of the image to which the human visual system is less sensitive.
3. The method of any preceding claim, wherein the act of scaling further comprises an act of quantizing the sub-blocks of pixel data based on image quality.
4. The method of claim 1, wherein the quality metric is a signal-to-noise ratio.
5. The method of claim 1, wherein the transforming act performs a discrete cosine transform followed by a differential quadtree transform.
6. The method of claim 1, wherein the scaling act is based on a quantization scale factor selected on a block-by-block basis according to the contrast of the sub-blocks of pixel data, the number of AC coefficients in the sub-blocks that exceed a predetermined threshold in magnitude, and/or the mean square error of the sub-blocks, or on a frame-by-frame basis according to the number and size of block size assignments per frame and/or the scale factor of a previous frame.
7. The method of claim 6, wherein the act of deciding requires that blocks be re-partitioned if the variance is greater than a threshold.
8. The method of claim 7, wherein the threshold is predetermined, is a function of the mean of the pixel values of the block or sub-block being evaluated, and/or varies with each level of subdivision.
9. The method of any of claims 6-8, wherein the determining, comparing and deciding steps are not repeated when a predetermined criterion is met, the predetermined criterion being based on a size of a preselected minimum block of pixel data.
10. The method of claim 1, wherein the act of scaling further comprises encoding the data stream into a serialized data stream in preparation for transmission.
11. The method of claim 1, wherein the performing act is based on perceptual characteristics of the image.
12. The method of claim 1, wherein the scaling is given by the relationship:
where R_i is the number of bits of the i-th block, σ_i² is the variance of the i-th block, and p_i is the percentage of the i-th block.
13. An image compression system for compressing pixel data based on quality, the system comprising:
block size distribution means for dividing the pixel block into sub-blocks of pixel data;
transform means for transforming the sub-blocks of pixel data into respective frequency domain representations using a discrete cosine transform; and
scaling means for scaling the frequency domain representation into a data stream on a frame-by-frame basis according to the number and size of block size assignments per frame, wherein the scaling means is based on a quality metric associated with image quality.
14. The system of claim 13, wherein the scaling means further comprises frequency weighted masking means for the sub-blocks of pixel data such that the frequency weighted mask emphasizes portions of the image to which the human visual system is more sensitive and does not emphasize portions of the image to which the human visual system is less sensitive.
15. The system of claim 13 or 14, wherein the quality metric is a signal to noise ratio.
16. The system of claim 13, wherein the transform means is configured to perform a discrete cosine transform.
17. The system of claim 13 wherein the transforming means is configured to perform a discrete cosine transform followed by a differential quadtree transform.
18. The system of claim 13, wherein the scaling means scales data based on a quantization scale factor selected on a block-by-block basis according to the contrast of the sub-blocks of pixel data, the number of AC coefficients in the sub-blocks that exceed a predetermined threshold in magnitude, and/or the mean square error of the sub-blocks, or on a frame-by-frame basis according to the number and size of block size assignments per frame and/or the scale factor of a previous frame.
19. The system of claim 13, wherein the block size allocation means is configured to: determining a variance of pixel values of the block of pixel data; comparing the variance to a threshold; deciding to re-partition the block based on the result of the act of comparing, wherein if the decision is to re-partition the block, then repeating determining the variance, comparing to a threshold, and deciding to re-partition for each sub-block until a predetermined criterion is met; each block or sub-block that is no longer partitioned is designated as a block size assignment.
20. The system of claim 19, wherein the decision re-segmentation calls for re-segmenting blocks if the variance is greater than a threshold.
21. The system of claim 20, wherein the threshold is predetermined and is a function of the mean of the pixel values of the block or sub-block being evaluated and/or of a threshold variation per subdivision level.
22. The system of claim 19 wherein the predetermined criterion is based on a preselected minimum pixel data block size.
23. The system of claim 13, wherein the scaling means further comprises an encoder that encodes the data stream into a serialized data stream in preparation for transmission.
24. The system of claim 13, wherein the block size assignment means is based on perceptual characteristics of the image.
HK03105415.3A 2000-01-28 2001-01-26 Quality based image compression HK1053213B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09/494,192 2000-01-28
US09/494,192 US6600836B1 (en) 2000-01-28 2000-01-28 Quality based image compression
PCT/US2001/002751 WO2001056298A1 (en) 2000-01-28 2001-01-26 Quality based image compression

Publications (2)

Publication Number Publication Date
HK1053213A1 HK1053213A1 (en) 2003-10-10
HK1053213B true HK1053213B (en) 2007-08-10
