HK1073040A - Configurable pattern optimizer - Google Patents
Configurable pattern optimizer
- Publication number
- HK1073040A (application HK05105544.5A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- data
- blocks
- block
- color components
- image
- Prior art date
Description
Background
I. Field of the invention
The present invention relates to image processing and compression, and more particularly to a configurable pattern optimizer for compressing images.
Description of the Related Art
Digital image processing has gained prominence within digital signal processing, and the importance of human vision has prompted great interest and progress in the field. Various improvements have been made to image compression for the transmission and reception of video signals, such as those used for projection film or motion pictures, and many current and proposed video systems utilize digital encoding techniques. Aspects of this field include image coding, image restoration, and image feature selection. Image coding represents an attempt to transmit pictures over a digital communication channel in an efficient manner, using as few bits as possible to minimize the required bandwidth while keeping distortion within certain limits. Image restoration represents efforts to recover the true image of an object: the encoded image transmitted over the communication channel may have been distorted by various factors, and the source of degradation may even lie in how the image was originally created from the object. Feature selection refers to selecting certain attributes of a picture; such attributes may be required for identification, classification, and decision-making in a wide range of environments.
Digital encoding of video, such as in digital cinema, is an area that benefits from improved image compression techniques. Digital image compression methods may be generally divided into two categories: lossless and lossy. Lossless methods restore the image without any loss of information. Lossy methods lose some information irrecoverably, with the amount depending on the compression ratio, the quality of the compression algorithm, and the implementation of the algorithm. In general, lossy compression is considered necessary to obtain the compression ratios desired for cost-effective digital cinema. To achieve a digital cinema quality level, the compression method should provide a visually lossless level of performance: despite the mathematical loss of information in the compression process, the image distortion caused by this loss should be imperceptible to an observer under normal viewing conditions.
Existing digital image compression techniques have been developed for other applications, namely television systems. These techniques make design compromises suited to their intended applications, but they do not meet the quality requirements of cinema presentation.
Digital cinema compression techniques should provide the visual quality that moviegoers have come to expect; ideally, the visual quality of digital cinema should exceed that of a high-quality release print. At the same time, the compression technique should have high coding efficiency to be practical. As used herein, coding efficiency refers to the bit rate required to achieve a certain quality level in the compressed image. Furthermore, the system and coding techniques should have inherent flexibility to accommodate different formats and should be cost effective, i.e., permit a small and efficient decoder or encoder.
Many available compression techniques provide significant levels of compression but result in degradation of video signal quality. In general, techniques for transferring compressed information require that the compressed information be transferred at a constant bit rate.
One compression technique that can provide significant levels of compression while maintaining desirable video signal quality uses adaptively sized blocks and sub-blocks of encoded Discrete Cosine Transform (DCT) coefficient data. This technique is referred to hereinafter as the Adaptive Block Size Discrete Cosine Transform (ABSDCT) method. This technique is disclosed in U.S. patent No. 5021891, entitled "Adaptive Block Size image compression Method and System," which is assigned to the assignee of the present invention and is incorporated herein by reference. DCT techniques are also disclosed in U.S. patent No. 5107345 entitled "Adaptive Block Size Image Compression Method and system," which is assigned to the assignee of the present invention and is incorporated herein by reference. Also, a combination of the ABSDCT technique with a differential quadtree transform technique is discussed in U.S. patent No. 5452104 entitled "Adaptive Block Size Image Compression method and System," which is also assigned to the assignee of the present invention and incorporated herein by reference. The systems disclosed in these patents use what is known as "intra" coding, in which each frame of image data is coded regardless of the content of the other frames. By using ABSDCT techniques, the achievable data rates may be reduced from about 1.5 gigabits per second to about 50 megabits per second without a discernible degradation in signal quality.
The ABSDCT technique may be used to compress either black and white or color images, or signals representing such images. The color input signal may be in YIQ format, where Y is the luminance sample and I and Q are the chrominance (i.e., color) samples for each 4 x 4 block of pixels. Other known formats, such as YUV, YCbCr, or RGB, may also be used. Because of the eye's low spatial sensitivity to color, most studies indicate that sub-sampling of the color components by a factor of 4 in the horizontal and vertical directions is reasonable. Accordingly, a video signal can be represented by 4 luminance components and 2 chrominance components.
Using ABSDCT, the video signal is typically segmented into blocks of pixels for processing. For each block, the luminance and chrominance components are passed to a block interleaver. For example, a 16 x 16 (pixel) block may be provided to the block interleaver, which orders or organizes the samples within each 16 x 16 block to produce blocks and composite sub-blocks of data for Discrete Cosine Transform (DCT) analysis. The DCT operator converts a time-sampled signal into a frequency representation of the same signal. By converting to a frequency representation, the DCT technique has been shown to allow very high degrees of compression, since the quantizer can be designed to take advantage of the frequency distribution characteristics of an image. In a preferred embodiment, one 16 x 16 DCT is applied to a first ordering, four 8 x 8 DCTs are applied to a second ordering, sixteen 4 x 4 DCTs are applied to a third ordering, and sixty-four 2 x 2 DCTs are applied to a fourth ordering.
The DCT operation reduces the spatial redundancy inherent in the video source. After performing the DCT, most of the video signal energy is concentrated in a few DCT coefficients. An additional transform, Differential Quadtree Transform (DQT), may be used to reduce redundancy between DCT coefficients.
For a 16 x 16 block and each sub-block, the DCT coefficient values and DQT values (if DQT is used) are analyzed to determine the number of bits required to encode the block or sub-block. The block or combination of sub-blocks requiring the least number of bits is then selected to represent the image segment. For example, two 8 x 8 sub-blocks, six 4 x 4 sub-blocks, and eight 2 x 2 sub-blocks may together be selected to represent an image segment.
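The selection described above amounts to a recursive minimum-cost decision: at each level, code the block whole or as its four sub-blocks, whichever costs fewer bits. The sketch below illustrates that decision for one level; the bit costs are hypothetical inputs, not values taken from the patent.

```python
def best_encoding(bits_whole, bits_subblocks):
    """Choose between coding a block at full size or as its four
    sub-blocks, whichever requires fewer bits.

    bits_whole: hypothetical bit cost of coding the block whole.
    bits_subblocks: list of four bit costs, one per sub-block (each may
    itself have been chosen recursively in the same way).
    Returns (cost, use_subblocks).
    """
    split_cost = sum(bits_subblocks)
    if split_cost < bits_whole:
        return split_cost, True
    return bits_whole, False

# Example: coding the four sub-blocks separately is cheaper here.
cost, split = best_encoding(900, [180, 220, 210, 200])
```

Applied bottom-up over the quadtree, this yields the least-bits combination of blocks and sub-blocks for each image segment.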
The selected blocks or combinations of sub-blocks are then arranged in proper order to form a 16 x 16 block. The DCT/DQT coefficient values may then be subjected to frequency weighting, quantization and coding (e.g., variable length coding) in preparation for transmission. Although the ABSDCT technique described above performs very well, it is computationally intensive. Thus, a small hardware implementation of this technique may be difficult.
Variable length coding is done in the form of run lengths and sizes. Other compression methods, such as the Joint Photographic Experts Group (JPEG) and Moving Picture Experts Group (MPEG-2) standards, use a standard zig-zag scan across the entire block size being processed; with ABSDCT, however, different block sizes are generated based on the variance within the data blocks.
Summary of the Invention
Embodiments of the present invention provide apparatus and methods for an optimal pattern determiner. In one embodiment, the optimal pattern may be configured on a frame-by-frame basis. In another embodiment, a default pattern of predetermined block sizes is used, regardless of the actual block size as determined by the Adaptive Block Size Discrete Cosine Transform (ABSDCT) technique.
The present invention is a quality-based system and method of image compression that uses adaptively sized blocks and sub-blocks of discrete cosine transform coefficient data and a quality-based quantization scale factor. A block of pixel data is input to an encoder. The encoder includes a block size assignment (BSA) element that segments the input block of pixels for processing. The block size assignment is based on the variances of the input block and of further-subdivided blocks. In general, regions with larger variances are subdivided into smaller blocks, while regions with smaller variances are not, subject to the means of the blocks and sub-blocks falling within predetermined ranges. Thus, the variance threshold of a block is first modified from its nominal value based on the block's mean; the variance of the block is then compared to the threshold, and if the variance is greater than the threshold, the block is subdivided.
The block size assignment is provided to a transform element that transforms the pixel data into frequency domain data. The transform is performed only on the blocks and sub-blocks selected by the block size assignment. The transformed data then undergoes quantization scaling and serialization. The quantization of the transform data is based on image quality metrics, such as a scale factor adjusted for contrast, coefficient counts, rate distortion, the density of the block size assignments, and/or past scale factors. Serialization aims to create the longest possible run lengths of identical values. In one embodiment, a fixed-block-size zigzag scan is used to serialize the data into a data stream regardless of the block size assignment; in another embodiment, that fixed block size is 8 x 8. In preparation for transmission, the data stream may be encoded with a variable length encoder. The encoded data is sent over a transmission channel to a decoder, where the pixel data is reconstructed for display.
In another embodiment, a method of serializing frequency-based image data in a digital cinema system is described. At least one set of data is compiled, which may be represented as a 16 x 16 block of data. Alternatively, one data frame is compiled. The set of data is divided into four groups, each group represented as an 8 x 8 block. Each of the four 8 x 8 data blocks is serialized using a zigzag, vertical, and/or horizontal scan.
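The division of a 16 x 16 block into four 8 x 8 groups can be sketched as follows. This is an illustrative helper, not code from the patent; the row-major ordering of the quadrants is an assumption.

```python
def quadrants(block16):
    """Split a 16 x 16 block (a list of 16 rows of 16 samples) into four
    8 x 8 blocks, ordered top-left, top-right, bottom-left, bottom-right."""
    quads = []
    for r0 in (0, 8):
        for c0 in (0, 8):
            quads.append([row[c0:c0 + 8] for row in block16[r0:r0 + 8]])
    return quads

# A ramp block whose sample at (r, c) is 16*r + c, for checking corners.
block = [[16 * r + c for c in range(16)] for r in range(16)]
q = quadrants(block)
```

Each of the four returned 8 x 8 blocks would then be handed to the chosen scan (zigzag, vertical, or horizontal) for serialization.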
Thus, one aspect of the embodiments is to process data blocks within an 8 x 8 block using a fixed scan pattern, regardless of the actual block size allocation.
Another aspect of the embodiments is to determine and implement the optimal scanning technique on a frame-by-frame basis.
It is another aspect of the embodiments to provide a user with configurable scan patterns.
Brief Description of Drawings
The features, nature, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like elements bear like reference numerals:
FIG. 1 is a block diagram of an encoder portion of a quality-based image processing system incorporating a variance-based block size assignment system and method of the present invention;
FIG. 2 is a block diagram of a decoder portion of a quality-based image processing system incorporating a variance-based block size assignment system and method of the present invention;
FIG. 3 is a flow chart illustrating the processing steps involved in variance-based block size assignment;
FIG. 4a illustrates an exemplary block size allocation;
FIG. 4b illustrates a zigzag scan pattern through a 16 × 16 block size;
FIG. 4c illustrates a zigzag scan pattern within each variable block size;
FIG. 5a illustrates a zigzag scan pattern of 8 × 8 blocks independent of the actual block size;
FIG. 5b illustrates different scan patterns implemented within an 8 × 8 block independent of the actual block size;
FIG. 6a illustrates an embodiment of a serialization process; and
FIG. 6b illustrates another embodiment of a serialization process.
Description of the Preferred Embodiment
To facilitate digital transmission of digital signals and enjoy the corresponding benefits, it is generally necessary to employ some form of signal compression. To achieve high compression, it is also important that the resulting image maintain high quality. Furthermore, computational efficiency is desirable for compact hardware implementations, which are important in many applications.
Before one embodiment of the invention is explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangements of components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting.
One aspect of an embodiment employs image compression based on Discrete Cosine Transform (DCT) techniques, such as that disclosed in co-pending U.S. patent application serial No. 09/436085, filed November 8, 1999, assigned to the assignee of the present invention and incorporated herein by reference. In general, an image to be processed in the digital domain is composed of pixel data divided into an array of non-overlapping blocks of size N x N. A two-dimensional DCT is performed on each block. The two-dimensional DCT is defined by the following relation:
X(k, l) = α(k) α(l) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} x(m, n) cos[(2m + 1)kπ / 2N] cos[(2n + 1)lπ / 2N],  0 ≤ k, l ≤ N − 1

wherein α(k) = √(1/N) for k = 0 and α(k) = √(2/N) for k ≠ 0,
x(m, n) is the pixel value at position (m, n) within the N x N block, and
X(k, l) is the corresponding DCT coefficient.
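A direct (and deliberately unoptimized) sketch of this relation in pure Python follows; real implementations use fast factored transforms, but this form mirrors the equation term by term.

```python
import math

def dct2(block):
    """Two-dimensional DCT of an N x N block, per the relation above."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            s = 0.0
            for m in range(n):
                for p in range(n):
                    s += (block[m][p]
                          * math.cos((2 * m + 1) * k * math.pi / (2 * n))
                          * math.cos((2 * p + 1) * l * math.pi / (2 * n)))
            out[k][l] = alpha(k) * alpha(l) * s
    return out

# A flat block concentrates all of its energy in the DC term X(0, 0).
flat = [[10.0] * 4 for _ in range(4)]
coeffs = dct2(flat)
```

For the flat 4 x 4 block above, X(0, 0) = α(0)² x 160 = 40 and every AC coefficient is (numerically) zero, illustrating the energy compaction property discussed next.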
Since the pixel values are non-negative, the DCT component X (0, 0) is always positive and typically has the most energy. In fact, for a general image, most of the transform energy is concentrated near the component X (0, 0). This energy compaction property makes DCT techniques an attractive compression method.
Image compression techniques use contrast adaptive coding to achieve further bit rate reduction. It has been observed that most natural images are composed of flat, relatively slowly varying areas and busy areas such as object boundaries and high-contrast textures. Contrast adaptive coding schemes take advantage of this by allocating more bits to the busy areas and fewer bits to the less busy areas.
The contrast adaptive method uses intra-frame coding (spatial processing) rather than inter-frame coding (spatiotemporal processing). Inter-frame coding requires multiple frame buffers in addition to complex processing circuitry, and many practical implementations demand reduced complexity. Intra-frame coding is also useful where spatiotemporal coding schemes break down and perform poorly. For example, 24-frame-per-second movies can fall into this category, since the relatively short integration time produced by a mechanical shutter allows a high degree of temporal aliasing; the assumption of frame-to-frame correlation breaks down for fast motion, which becomes jerky. Intra-frame coding also eases standardization when both 50 Hz and 60 Hz power line frequencies are involved: television currently transmits signals at 50 Hz or 60 Hz, and a digital intra-frame scheme can accommodate both, or even 24-frame-per-second film, by trading frame rate against spatial resolution.
For image processing purposes, the DCT operation is performed on pixel data divided into an array of non-overlapping blocks. Note that although the block size dimensions discussed herein are N x N, various block sizes may be used. For example, an N x M block size may be used, where N and M are both integers and M is either greater or less than N. Another important aspect is that the block may be divided into at least one level of sub-blocks, such as N/i x N/i, N/i x N/j, N/i x M/j, and so forth, where i and j are integers. Also, the exemplary block size discussed herein is a 16 x 16 block of pixels, with corresponding blocks and sub-blocks of DCT coefficients. Various other integer values, whether even or odd, may also be used, e.g., 9 x 9.
Fig. 1 and 2 illustrate an image processing system 100 that incorporates the concept of a configurable serializer. The image processing system 100 includes an encoder 104 that compresses the received video signal. The compressed signal is transmitted using a transmission channel or physical medium 108 and received by a decoder 112. Decoder 112 decodes the received encoded data into image samples, which can be displayed.
Typically, an image is divided into a plurality of blocks of pixels for processing. The color signals may be converted from RGB space to YC1C2 space by a converter 116, where Y is the luminance component and C1 and C2 are the chrominance (i.e., color) components. Because of the eye's low spatial sensitivity to color, many systems sub-sample the C1 and C2 components by a factor of 4 in the horizontal and vertical directions. However, such sub-sampling is not necessary. A full resolution image, known as 4:4:4, may be very useful or necessary in certain applications, such as those covered by the term "digital cinema". Two possible YC1C2 representations are the YIQ representation and the YUV representation, both of which are well known in the art. A variant of the YUV representation, known as YCbCr, is also possible. The components may be further split into odd and even parts; thus, in one embodiment, the representations Y-even, Y-odd, Cb-even, Cb-odd, Cr-even, and Cr-odd are used.
In a preferred embodiment, each of the odd and even Y, Cb, and Cr components is processed without sub-sampling. Thus, each of the 6 components of a 16 x 16 pixel block is provided as input to the encoder 104. For illustrative purposes, the encoder 104 for the Y-even component is described; similar encoders are used for the Y-odd component and for the odd and even Cb and Cr components. The encoder 104 includes a block size assignment element 120, which performs block size assignment in preparation for video compression. The block size assignment element 120 determines the block decomposition of a 16 x 16 block based on the perceptual characteristics of the image within the block. Block size assignment subdivides each 16 x 16 block into smaller blocks, such as 8 x 8, 4 x 4, and 2 x 2, in a quadtree fashion depending on the activity within the 16 x 16 block. The block size assignment element 120 generates quadtree data, called PQR data, whose length is between 1 and 21 bits. Thus, if the block size assignment determines that a 16 x 16 block is to be decomposed, the R bit of the PQR data is set and is followed by 4 additional bits of Q data corresponding to the four divided 8 x 8 blocks. If the block size assignment determines that any of the 8 x 8 blocks is to be subdivided, then 4 additional bits of P data for each such 8 x 8 block are added.
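The 1-to-21-bit range of the PQR data follows directly from this structure, as the sketch below shows. The function and its arguments are illustrative names, not identifiers from the patent.

```python
def pqr_length(split16, split8):
    """Length in bits of the PQR data for one 16 x 16 block.

    split16: whether the 16 x 16 block is subdivided (the R bit).
    split8: for each of the four 8 x 8 blocks, whether it is further
    subdivided (the Q bits); ignored when split16 is False.
    Each subdivided 8 x 8 block contributes four more P bits.
    """
    if not split16:
        return 1                  # R bit only
    bits = 1 + 4                  # R bit plus four Q bits
    bits += 4 * sum(split8)       # four P bits per subdivided 8 x 8 block
    return bits

# No subdivision: 1 bit. Full subdivision: 1 + 4 + 4*4 = 21 bits.
```

The extremes, 1 bit for an undivided block and 21 bits when all four 8 x 8 blocks are subdivided, match the range stated above.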
Referring now to FIG. 3, a flow diagram illustrating details of the operation of the block size assignment element 120 is provided. The variance of a block is used as the metric in the decision to subdivide the block. Starting at step 202, a 16 x 16 block of pixels is read. At step 204, the variance v16 of the 16 x 16 block is computed as:

v16 = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} x_{i,j}² − ( (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} x_{i,j} )²

wherein N = 16 and x_{i,j} is the pixel in the i-th row and j-th column within the N x N block. At step 206, if the mean of the block is between two predetermined values, the variance threshold T16 is first modified to provide a new threshold T'16, and the block variance is then compared to this new threshold.
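The variance relation, mean of the squared samples minus the square of the mean, can be sketched directly:

```python
def block_variance(block):
    """Variance of an N x N block: the mean of the squared samples minus
    the square of the mean, per the relation above."""
    n = len(block)
    total = sum(sum(row) for row in block)
    total_sq = sum(x * x for row in block for x in row)
    mean = total / (n * n)
    return total_sq / (n * n) - mean * mean

# A flat block has zero variance and so is never subdivided.
flat = [[128] * 16 for _ in range(16)]
```

The subdivision decision then reduces to comparing this value against T16 (or the mean-adjusted T'16).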
If the variance v16 is not greater than the threshold T16, then in step 208 the starting address of the 16 x 16 block is written to temporary memory and then the R bit of the PQR data is set to 0 to indicate that the 16 x 16 block is not subdivided. The algorithm then reads the next 16 x 16 pixel block. If the variance v16 is greater than the threshold T16, then in step 210, the R bit of the PQR data is set to 1 to indicate that the 16 × 16 block is to be subdivided into 4 8 × 8 blocks.
As shown in step 212, the four 8 x 8 blocks (i = 1:4) are considered sequentially for further subdivision. For each 8 x 8 block, the variance v8_i is computed at step 214. At step 216, if the mean of the block is between two predetermined values, the variance threshold T8 is first modified to provide a new threshold T'8, and the block variance is then compared to this new threshold.

If the variance v8_i is not greater than the threshold T8, then at step 218 the starting address of the 8 x 8 block is written to temporary memory and the corresponding Q bit, Q_i, is set to 0. The next 8 x 8 block is then processed. If the variance v8_i is greater than the threshold T8, then at step 220 the corresponding Q bit, Q_i, is set to 1 to indicate that the 8 x 8 block is to be subdivided into four 4 x 4 blocks.

As shown in step 222, the four 4 x 4 blocks (j_i = 1:4) are considered sequentially for further subdivision. For each 4 x 4 block, the variance v4_ij is computed at step 224. At step 226, if the mean of the block is between two predetermined values, the variance threshold T4 is first modified to provide a new threshold T'4, and the block variance is then compared to this new threshold.

If the variance v4_ij is not greater than the threshold T4, then at step 228 the starting address of the 4 x 4 block is written to temporary memory and the corresponding P bit, P_ij, is set to 0. The next 4 x 4 block is then processed. If the variance v4_ij is greater than the threshold T4, then at step 230 the corresponding P bit, P_ij, is set to 1 to indicate that the 4 x 4 block is to be subdivided into four 2 x 2 blocks. In addition, the addresses of the four 2 x 2 blocks are written to temporary memory.
The thresholds T16, T8, and T4 may be predetermined constants; this is known as a hard decision. Alternatively, an adaptive or soft decision may be implemented. A soft decision varies the variance threshold based on the mean pixel value of the 2N x 2N block, where N may be 8, 4, or 2. Thus, a function of the mean pixel value may be used as the threshold.
For purposes of illustration, consider the following example. Let the predetermined variance thresholds for the Y component be 50, 1100, and 880 for the 16 x 16, 8 x 8, and 4 x 4 blocks, respectively; in other words, T16 = 50, T8 = 1100, and T4 = 880. Let the mean range be 80 to 100. Assume that the variance computed for the 16 x 16 block is 60. Since 60 is greater than T16 and the mean of 90 is between 80 and 100, the 16 x 16 block is subdivided into four 8 x 8 sub-blocks. Assume that the variances computed for the 8 x 8 sub-blocks are 1180, 935, 980, and 1210. Since two of the 8 x 8 blocks have variances exceeding T8, those two blocks are further subdivided, producing a total of eight 4 x 4 sub-blocks. Finally, assume that the eight 4 x 4 blocks have variances of 620, 630, 670, 610, 590, 525, 930, and 690, with corresponding means of 90, 120, 110, 115. Since the mean of the first 4 x 4 block falls within the range (80, 100), its threshold is lowered to T'4 = 200, which is less than 880. Therefore, the first 4 x 4 block is subdivided, as is the seventh 4 x 4 block. The resulting block size assignment is shown in FIG. 4a. The corresponding quadtree decomposition is shown in FIG. 4b. Also shown in FIG. 4c is the PQR data resulting from this block size assignment.
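The 8 x 8 stage of this worked example can be reproduced with a one-level, hard-decision sketch (the mean-based threshold adjustment is omitted here for brevity; the function name is illustrative):

```python
def subdivide_decisions(variances, threshold):
    """Which blocks at one level exceed the variance threshold and are
    therefore subdivided (hard-decision case, fixed threshold)."""
    return [v > threshold for v in variances]

# Numbers from the example above: T8 = 1100 and the four 8 x 8 variances.
splits = subdivide_decisions([1180, 935, 980, 1210], 1100)
# Two 8 x 8 blocks split, producing 2 * 4 = 8 4 x 4 sub-blocks.
```

The first and fourth 8 x 8 blocks exceed T8 and are split, yielding the eight 4 x 4 sub-blocks the example goes on to analyze.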
Note that similar steps are used to assign block sizes for the color components Y-odd, Cb-even, Cb-odd, Cr-even, and Cr-odd. The color components may be decimated horizontally, vertically, or in both directions.
Also, note that although the block size assignment has been described as a top-down approach, in which the largest block (16 x 16 in this example) is evaluated first, a bottom-up approach may be used instead.
Referring back to FIG. 1, the PQR data is provided to the DCT element 124, along with the address of the selected block. DCT element 124 performs an appropriately sized discrete cosine transform on the selected block using the PQR data. Only the selected blocks need to undergo DCT processing.
The image processing system 100 further comprises a DQT element 128 for reducing the redundancy among the DC coefficients of the DCTs. A DC coefficient is located in the upper-left corner of each DCT block. The DC coefficients are generally large compared to the AC coefficients, and this size difference makes it difficult to design an efficient variable length encoder. It is therefore advantageous to reduce the redundancy among the DC coefficients.
The DQT element 128 performs a two-dimensional DCT on the DC coefficients, 2 x 2 at a time. Starting with the 2 x 2 blocks within the 4 x 4 blocks, a two-dimensional DCT is performed on the 4 DC coefficients. This 2 x 2 DCT is called the differential quadtree transform, or DQT, of the 4 DC coefficients. Next, the DC coefficient of the DQT, together with the 3 neighboring DC coefficients within an 8 x 8 block, is used to compute the next level DQT. Finally, the DC coefficients of the four 8 x 8 blocks within the 16 x 16 block are used to compute the DQT. Thus, in a 16 x 16 block, there is one true DC coefficient, the rest being AC coefficients corresponding to the DCT and DQT.
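One level of this cascade is just a 2 x 2 DCT applied to four DC values, which has the simple closed form below (sum, horizontal difference, vertical difference, cross term, each halved). This is a sketch of the transform itself, not of the patent's full multi-level pipeline.

```python
def dqt2x2(dc):
    """2 x 2 DCT applied to four DC coefficients [[a, b], [c, d]],
    as used for one level of the differential quadtree transform."""
    a, b = dc[0]
    c, d = dc[1]
    return [[(a + b + c + d) / 2.0, (a - b + c - d) / 2.0],
            [(a + b - c - d) / 2.0, (a - b - c + d) / 2.0]]

# Four equal DC values collapse to a single nonzero term, which is
# exactly the DC redundancy the DQT is designed to remove.
out = dqt2x2([[100.0, 100.0], [100.0, 100.0]])
```

Applying this repeatedly up the quadtree leaves one true DC coefficient per 16 x 16 block, as described above.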
The transform coefficients (both DCT and DQT) are provided to a quantizer for quantization. In a preferred embodiment, the DCT coefficients are quantized using frequency weighted masks (FWMs) and a quantization scale factor. An FWM is a table of frequency weights with the same dimensions as the block of input DCT coefficients. The frequency weights apply different weights to the different DCT coefficients: they are designed to emphasize input samples whose frequency content the human visual system is more sensitive to, and to de-emphasize samples whose frequency content the visual system is less sensitive to. The weights may also be designed based on factors such as viewing distance.
The weights are selected based on empirical data. A method for designing the weighting masks for 8 x 8 DCT coefficients is disclosed in ISO/IEC JTC1 CD 10918, "Digital compression and encoding of continuous-tone still images - part 1: Requirements and guidelines," published by the International Organization for Standardization in 1994, incorporated herein by reference. Typically, two FWMs are designed, one for the luminance component and one for the chrominance components. The FWM tables for block sizes of 2 x 2 and 4 x 4 are obtained by decimation, and the 16 x 16 FWM table by interpolation, of the FWM table for the 8 x 8 block. The scale factor controls the quality and bit rate of the quantized coefficients.
Thus, each DCT coefficient is quantized according to the following relationship:

DCTq(i, j) = ⌊ DCT(i, j) / (fwm(i, j) · q) ± 1/2 ⌋

where DCT(i, j) is the input DCT coefficient, fwm(i, j) is the frequency weighting mask, q is the scale factor, and DCTq(i, j) is the quantized coefficient. Note that the first term inside the brackets is rounded up or down depending on the sign of the DCT coefficient. The DQT coefficients are likewise quantized using a suitable weighting mask; multiple tables or masks may be used and applied to each of the Y, Cb, and Cr components.
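The sign-dependent rounding can be sketched as follows. This is an illustrative reading of the relation above (divide by the weighted step, then round toward the nearest integer symmetrically about zero); the patent's exact arithmetic, including any fixed scaling constants, may differ.

```python
import math

def quantize(dct, fwm, q):
    """Quantize one DCT coefficient with frequency weight fwm and scale
    factor q. The half is added for positive inputs and subtracted for
    negative ones, giving sign-symmetric rounding as described above."""
    ratio = dct / (fwm * q)
    if dct >= 0:
        return math.floor(ratio + 0.5)
    return math.ceil(ratio - 0.5)

# Positive and negative coefficients of equal magnitude quantize to
# values of equal magnitude, e.g. quantize(70, 16, 1) vs quantize(-70, 16, 1).
```

Larger weights fwm(i, j) or scale factors q produce coarser quantization, which is how the scale factor trades quality against bit rate.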
The block of pixel data and the frequency weighted masks are then scaled by a quantizer 130, or scale factor element. In a preferred embodiment, there are 32 scale factors corresponding to average bit rates. Unlike other compression methods, such as MPEG-2, the average bit rate is controlled based on the quality of the processed image rather than on a target bit rate and buffer status.
The quantized coefficients are provided to a scan serializer 152. The serializer 152 scans the block of quantized coefficients to produce a serialized stream of quantized coefficients. Zigzag scanning, column scanning, or row scanning may be employed. Many different zigzag scanning patterns, as well as patterns other than a zigzag, may also be selected. A preferred technique uses an 8 x 8 block size for zigzag scanning, although other sizes may be used.
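The traversal order of such an 8 x 8 zigzag scan can be generated as in the sketch below (an illustrative implementation following the usual JPEG-style anti-diagonal pattern, not code from the patent):

```python
def zigzag_order(n=8):
    """Traversal order of an n x n block along anti-diagonals, starting
    at the DC term in the upper left and alternating direction, as in
    the usual zig-zag scan."""
    order = []
    for d in range(2 * n - 1):
        diag = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        if d % 2 == 0:
            diag.reverse()  # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

scan = zigzag_order(8)
# scan begins (0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), ...
```

Reading coefficients in this order tends to place the large low-frequency values first and the near-zero high-frequency values last, which is what produces the long zero runs the serializer is after.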
Referring to FIGS. 4 and 5, various scanning techniques are described herein. FIG. 4b illustrates a zigzag scan of an entire 16 x 16 block 400. In frequency-based blocks, such as those produced by the DCT, the values are arranged such that the DC value is in the upper-left corner and the AC values diminish toward the lower-right corner. Thus, regardless of the block size assignment within a 16 x 16 block, a zigzag scan across the entire 16 x 16 block results in coding inefficiency; in other words, such a scan produces shorter run lengths of identical values.
FIG. 4c illustrates a preferred scanning technique that follows the order of the coefficients within each assigned block. Each of blocks 404, 406, 408, 410, 412, 414, 416, 418, 420, 422, 424, 426, and 428 is scanned with its own zigzag scan. In one embodiment, each block may use a different scan pattern, such as vertical, horizontal, or inverted zigzag. Although this approach is very good at preserving maximum run lengths, computing a separate zigzag scan for each block is computationally intensive and difficult to implement in hardware.
Thus, it has been determined that scan implementations like those described in FIGS. 5a and 5b may best balance maximizing run lengths against ease of hardware implementation. FIG. 5a illustrates a 16 x 16 block 500 that is subdivided by block size assignment into blocks 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, and 528. In one embodiment, regardless of the block size assignment, a zigzag scan of each 8 x 8 quadrant of the 16 x 16 block is employed. Thus, blocks 504, 506, 508, and 510 are serialized by one zig-zag scan; block 512 by another; block 514 by another; and blocks 516, 518, 520, 522, 524, 526, and 528 by yet another.
Fig. 5b illustrates a 16 x 16 block 550 subdivided by block size assignments into blocks 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 576 and 578. In this embodiment, a different scan type is taken for each 8 × 8 quadrant of the 16 × 16 block. The type of scan employed is determined by evaluating the values within the 8 x 8 block and determining which scan method is most efficient. For example, in FIG. 5b, horizontal scanning is assumed for blocks 554, 556, 558, 560, block 562 is serialized by a zig-zag scan, block 564 is serialized by a vertical scan, and blocks 566, 568, 570, 572, 574, 576 and 578 are serialized by a zig-zag scan. In another embodiment, the optimal scanning method is determined on a frame-by-frame basis, as opposed to on a block-by-block basis. Determining the optimal scan method on a frame-by-frame basis is less computationally intensive than the block-by-block approach.
FIG. 6a illustrates a process 600 in which serialization occurs. A set of data is read 604. Since the data being read is based on variable block sizes, it does not have a uniform size or length. The data is compiled 608, or otherwise structured, into a form that can be represented as 16 x 16 blocks. The data is then divided 612 into four 8 x 8 blocks. Zigzag scanning 616 is then performed on each 8 x 8 block. The data is then routed to a buffer 620.
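The steps of process 600 can be sketched as below, under the assumption that the compiled 16 x 16 block arrives as a list of 16 rows of 16 coefficients; the function names are illustrative, not from the patent.

```python
def quadrants(block16):
    """Split a 16 x 16 block into four 8 x 8 quadrant blocks (step 612)."""
    quads = []
    for r0 in (0, 8):
        for c0 in (0, 8):
            quads.append([row[c0:c0 + 8] for row in block16[r0:r0 + 8]])
    return quads

def zigzag_serialize(block8):
    """Serialize an 8 x 8 block in zigzag order (step 616)."""
    out = []
    for s in range(15):                      # 15 anti-diagonals in an 8 x 8 block
        diag = [(r, s - r) for r in range(8) if 0 <= s - r < 8]
        for r, c in (diag if s % 2 else diag[::-1]):
            out.append(block8[r][c])
    return out

def serialize_16x16(block16, buffer):
    """Zigzag-scan each quadrant and route the result to the buffer (step 620)."""
    for quad in quadrants(block16):
        buffer.extend(zigzag_serialize(quad))
```

Each 16 x 16 block thus contributes 256 coefficients to the buffer, 64 per quadrant, with every quadrant scanned independently.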
FIG. 6b illustrates another embodiment 650 of serialization. A data frame is read 654. The data frame is evaluated 658 to determine the best serialization technique. Depending on the evaluation, a zigzag scan 662, a vertical scan 664, or a horizontal scan 668 is taken. After serialization according to one of the scan methods, the data is routed to a buffer 672.
Referring back to fig. 1, the stream of serialized, quantized coefficients is provided to a variable length encoder 156. Variable length encoder 156 may use run-length encoding of zero values followed by Huffman encoding. This technique is discussed in detail in the above-mentioned U.S. Patent Nos. 5,021,891, 5,107,345, and 5,452,104, each of which is incorporated herein by reference. The run-length encoder takes the quantized coefficients and separates the runs of zero values from the non-zero coefficients. The lengths of the zero runs are called run-length values and are Huffman coded. Non-zero values are separately Huffman coded.
Modified Huffman coding of the quantized coefficients is also possible and is used in the preferred embodiment. Here, after zigzag scanning, the run-length encoder determines the run-length/size pairs within each 8 x 8 block. These run-length/size pairs are then Huffman coded.
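The pairing step can be sketched as follows. This is an assumption-laden illustration, not the patent's encoder: each nonzero coefficient in the zigzag-serialized block is paired with the count of zeros preceding it, and a trailing all-zero run is collapsed into a single end-of-block marker (the `EOB` symbol is an assumed convention here).

```python
EOB = ('EOB',)  # assumed end-of-block symbol for this sketch

def run_length_pairs(coeffs):
    """Form (run, value) pairs from a zigzag-serialized coefficient list:
    run = number of zeros preceding the nonzero value."""
    pairs, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:                          # trailing zeros collapse to one EOB symbol
        pairs.append(EOB)
    return pairs
```

For example, a serialized block beginning 5, 0, 0, 3 and followed only by zeros yields the pairs (0, 5), (2, 3), EOB, which are then the symbols fed to the Huffman coder.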
Huffman codes are designed from the measured or theoretical statistics of an image. It has been observed that most natural images are composed of flat, relatively slowly varying areas and busy areas, such as object boundaries and high-contrast textures. A Huffman encoder paired with a frequency-domain transform such as the DCT exploits these features by allocating more bits to busy regions and fewer bits to flat regions. Typically, a Huffman encoder uses look-up tables to encode the run lengths and the non-zero values. Multiple tables are generally used; three tables are preferred in the present invention, although one or two tables may be employed as desired.
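How such a table is designed from statistics can be sketched with the standard heap-based Huffman construction. This is a generic sketch, not the patent's tables: a production codec would precompute the resulting lookup table, as the text describes, rather than build it per image.

```python
import heapq

def huffman_code(freqs):
    """Map each symbol to a prefix-free bit string from a {symbol: frequency}
    dict; frequent symbols get shorter codes. Standard heap-based algorithm."""
    # Each heap entry: (frequency, tie-breaker, {symbol: partial code}).
    heap = [(f, i, {sym: ''}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + b for s, b in c1.items()}   # prefix left branch
        merged.update({s: '1' + b for s, b in c2.items()})  # right branch
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]
```

Note the degenerate single-symbol case yields an empty code string; real table designs handle it explicitly. With frequencies 5, 2, 1 for three symbols, the most frequent symbol receives a one-bit code and the other two receive two-bit codes.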
The compressed image signal generated by the encoder 104 may be temporarily stored by a buffer 160 and then transmitted to the decoder 112 by the transmission channel 108. PQR data containing block size allocation information is also provided to decoder 112. The decoder 112 includes a buffer 164 and a variable length decoder 168 that decodes run length values and non-zero values.
The output of the variable length decoder 168 is provided to an inverse serializer 172, which orders the coefficients according to the scanning scheme employed. For example, if a mix of zigzag, vertical, and horizontal scanning was used, the inverse serializer 172 reorders the coefficients according to the type of scan employed. The inverse serializer 172 receives the PQR data to help properly order the coefficients into a composite coefficient block.
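The core of this reordering step can be sketched as an inverse permutation: given the scan order the encoder used (recoverable from the side information), each serialized coefficient is placed back at its block position. The function name is illustrative.

```python
def deserialize(coeffs, order, n=8):
    """Invert serialization: coeffs[i] belongs at block position order[i],
    where order is the (row, col) scan order the encoder used."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, order):
        block[r][c] = value
    return block
```

For any scan order, deserializing the serialized coefficients with the same order recovers the original block exactly, which is why the decoder must be told which scan type each block used.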
The composite block is provided to an inverse quantizer 176, with a selector 174, which undoes the processing performed through the use of the quantizer scale factors and the frequency weighting mask.
Then, if a differential quadtree transform was applied, the block of coefficients is provided to an IDQT element 186, followed by an IDCT element 190. Otherwise, the coefficient block is provided directly to the IDCT element 190. The IDQT element 186 and the IDCT element 190 inverse transform the coefficients to produce a block of pixel data. The pixel data may then be interpolated, converted to RGB form, and stored for future display.
Thus, a system and method for image compression has been presented that performs block size assignment based on pixel variance. Variance-based block size assignment provides several benefits. Because the discrete cosine transform is performed after the block sizes are determined, computation is efficient: the computationally intensive transform need be performed only on the selected blocks. The block selection process itself is also efficient, since calculating the variance of pixel values is mathematically simple. Another benefit of variance-based block size assignment is that it is perceptually based. The pixel variance is a measure of activity within a block and indicates the presence of edges, texture, and the like; it captures the detail of a block better than a measure such as the pixel mean. The variance-based scheme of the present invention therefore assigns smaller blocks to regions with more edges and larger blocks to flatter regions. As a result, superior quality can be achieved in the reconstructed image.
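The variance criterion can be sketched as a recursive quadtree split; the threshold, minimum size, and return convention here are illustrative assumptions, not the patent's actual BSA parameters.

```python
def variance(block):
    """Pixel variance of a square block given as a list of rows."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def assign_block_sizes(block, threshold=100.0, min_size=2):
    """Recursively split a square block while its variance exceeds the
    threshold. Returns either the block itself (kept whole) or a list of
    the four subdivided quadrants; a real implementation would emit the
    PQR-style side information instead of this nested structure."""
    n = len(block)
    if n <= min_size or variance(block) <= threshold:
        return block
    h = n // 2
    return [assign_block_sizes([row[c0:c0 + h] for row in block[r0:r0 + h]],
                               threshold, min_size)
            for r0 in (0, h) for c0 in (0, h)]
```

A flat block is kept whole at zero cost, while a block straddling a high-contrast edge is split so that its busy quadrants can be coded with smaller blocks.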
By way of example, the various illustrative logical blocks, flow diagrams, and steps described in connection with the embodiments disclosed herein may be implemented in hardware or software with: an Application Specific Integrated Circuit (ASIC), a programmable logic device, a microprocessor, discrete gate or transistor logic, discrete hardware components such as registers and FIFOs, a processor executing a set of firmware instructions, conventional programmable software and a processor, or any combination thereof. The processor is preferably a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. The software may reside in RAM memory, flash memory, ROM memory, registers, a hard disk, a removable disk, a CD-ROM, a DVD-ROM, or any other form of storage medium known in the art.
The previous description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (35)
1. In a digital cinema system, a method of serializing frequency-based image data, the method comprising:
editing at least one set of data, which can be represented as a 16 x 16 block;
dividing the set of data into groups that can be represented as 4 8 x 8 blocks;
serializing each of the 4 8 x 8 blocks of data.
2. The method of claim 1, wherein the serializing comprises zigzag scanning each of the 4 8 x 8 data blocks.
3. The method of claim 1, wherein the serializing comprises scanning each of the 4 8 x 8 blocks of data vertically.
4. The method of claim 1, wherein the serializing comprises scanning each of 4 8 x 8 blocks of data horizontally.
5. The method of claim 1, wherein said editing at least one group comprises: a frame of data is compiled that can be represented as a plurality of 16 x 16 blocks.
6. The method of claim 1, wherein the frequency-based image data is separated into Y, Cb and Cr color components.
7. The method of claim 6, wherein said Y, Cb and Cr color components are further separated into even and odd color components.
8. In a digital cinema system, a method of compressing a digital image, the image comprising pixel data, the pixel data being divided into color components, the method comprising the acts of:
reading a set of color components of the pixel data;
generating a block size allocation to divide the set of color components of the pixel into sub-blocks of pixel data;
transforming the sub-blocks of pixel data into corresponding frequency domain representations; and
scaling the frequency domain representation into a data stream, wherein the scaling action is based on a quality metric related to image quality;
editing at least one set of data from the data stream, which may be represented as a 16 x 16 block;
dividing the 16 x 16 data group into groups that can be represented as 4 8 x 8 blocks; and
serializing each of the 4 8 x 8 blocks of data.
9. The method of claim 8, wherein the act of scaling further comprises: an act of providing a frequency weighted mask to the sub-block of pixel data such that the frequency weighted mask emphasizes image portions that are more sensitive to the human visual system and less emphasizes image portions that are less sensitive to the human visual system.
10. The method of claim 8, wherein the act of scaling further comprises: an act of quantizing the sub-blocks of pixel data according to image quality.
11. The method of claim 8, wherein the quality metric is a signal-to-noise ratio.
12. The method of claim 8, wherein the act of transforming performs a discrete cosine transform.
13. The method of claim 8, wherein the act of transforming performs a discrete cosine transform followed by a differential quadtree transform.
14. The method of claim 8, wherein the color components are Y, Cb and Cr color components.
15. The method of claim 14, wherein said Y, Cb and Cr color components are separated into even and odd color components.
16. In a digital cinema system, an apparatus for serializing frequency-based image data, the apparatus comprising:
means for editing at least one set of data, the set of data being representable as a 16 x 16 block;
means for dividing the set of data into groups that can be represented as 4 8 x 8 blocks;
means for serializing each of the 4 8 x 8 blocks of data.
17. The apparatus of claim 16, wherein the means for serializing comprises means for zigzag scanning each of the 4 8 x 8 data blocks.
18. The apparatus of claim 16, wherein the means for serializing comprises means for vertically scanning each of the 4 8 x 8 blocks of data.
19. The apparatus of claim 16, wherein the means for serializing comprises means for scanning each of 4 8 x 8 blocks of data horizontally.
20. The apparatus of claim 16, wherein said means for editing at least one group comprises: apparatus for editing a frame of data represented as a plurality of 16 x 16 blocks.
21. The apparatus of claim 16, wherein the frequency-based image data is separated into Y, Cb and Cr color components.
22. The apparatus of claim 21, wherein said Y, Cb and Cr color components are further separated into even and odd color components.
23. In a digital cinema system, an apparatus for compressing a digital image, the image comprising pixel data, the pixel data being divided into color components, the apparatus comprising:
means for reading a set of pixel data;
means for generating a block size allocation to divide the set of pixels into sub-blocks of pixel data;
means for transforming the sub-blocks of pixel data into respective frequency domain representations; and
means for scaling the frequency domain representation into a data stream, wherein the scaling action is based on a quality metric related to image quality;
means for editing at least one set of data from the data stream, the set of data being representable as a 16 x 16 block;
means for dividing the 16 x 16 data group into groups that can be represented as 4 8 x 8 blocks; and
means for serializing each of the 4 8 x 8 blocks of data.
24. The apparatus of claim 23, wherein the act of transforming performs a discrete cosine transform.
25. The apparatus of claim 23, wherein the act of transforming performs a discrete cosine transform followed by a differential quadtree transform.
26. The apparatus of claim 23, wherein the color components are Y, Cb and Cr color components.
27. The apparatus of claim 26, wherein said Y, Cb and Cr color components are separated into even and odd color components.
28. In a digital cinema system, an apparatus for serializing frequency-based image data, the apparatus comprising:
an editor for editing at least one set of data representable as a 16 x 16 block;
a divider for dividing the set of data into groups that can be represented as 4 8 x 8 blocks;
a serializer for serializing each of the 4 8 x 8 blocks of data.
29. The apparatus of claim 28, wherein the serializer further comprises a zig-zag scanner for zig-zag scanning each of the 4 8 x 8 blocks of data.
30. The apparatus of claim 28, wherein the serializer further comprises a vertical scanner for vertically scanning each of the 4 8 x 8 blocks of data.
31. The apparatus of claim 28, wherein the serializer further comprises a horizontal scanner for horizontally scanning each of the 4 8 x 8 blocks of data.
32. The apparatus of claim 28, wherein the editor is operative to edit a frame of data represented as a plurality of 16 x 16 blocks.
33. The apparatus of claim 28, wherein the frequency-based image data is separated into Y, Cb and Cr color components.
34. The apparatus of claim 33, wherein said Y, Cb and Cr color components are further separated into even and odd color components.
35. In a digital cinema system, an apparatus for compressing a digital image, the image comprising pixel data, the apparatus comprising:
a reader for reading a set of pixel data;
a generator for generating a block size allocation to divide the pixel group into sub-blocks of pixel data;
a transformer for transforming the sub-blocks of pixel data into corresponding frequency domain representations;
a scaler for scaling the frequency domain representation into a data stream, wherein the scaling action is based on a quality metric related to image quality;
an editor for editing at least one set of data from the data stream, the set of data being representable as a 16 x 16 block;
a divider for dividing the 16 × 16 data group into groups representable as 4 8 × 8 blocks; and
a serializer for serializing each of the 4 8 x 8 blocks of data.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/882,753 | 2001-06-15 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1073040A true HK1073040A (en) | 2005-09-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN1186942C (en) | Variance based adaptive block size DCT image compression | |
| EP1405524B1 (en) | Configurable pattern optimizer | |
| JP5107495B2 (en) | Quality-based image compression | |
| JP4870743B2 (en) | Selective chrominance decimation for digital images | |
| CN1293509C (en) | Apparatus and method for encoding digital image data in a lossless manner | |
| AU2002315160A1 (en) | Configurable pattern optimizer | |
| CN1539239A (en) | Method and device for interframe coding | |
| HK1073040A (en) | Configurable pattern optimizer | |
| CN1992896A (en) | An apparatus and method for encoding digital image data in a lossless manner | |
| HK1053213B (en) | Quality based image compression | |
| HK1053565B (en) | Variance based adaptive block size dct image compression | |
| HK1166429B (en) | Configurable pattern optimizer | |
| HK1067266A (en) | Interframe encoding method and apparatus | |
| HK1070519A (en) | Selective chrominance decimation for digital images |