HK1114227A

HK1114227A - Tiled prefetched and cached depth buffer

Info

Publication number: HK1114227A
Application number: HK08109256.2A
Authority: HK
Inventors: 迈克尔‧休‧安德森; 丹‧明伦‧庄; 杰弗里‧希普佩; 拉雅‧拉金德尔库马尔‧达万
Original assignee: Qualcomm Incorporated (高通股份有限公司)
Priority date: 2005-03-21
Filing date: 2006-03-21
Publication date: 2008-10-24

Abstract

A 3D graphics pipeline includes a prefetch mechanism that feeds a cache of depth tiles. The prefetch mechanism may be predictive, using triangle geometry information from earlier pipeline stages to preload the cache and thereby improve memory bandwidth efficiency. A z-value compression technique may optionally be used to further reduce power consumption and memory bandwidth.

Description

Tiled prefetched and cached depth buffer

Technical Field

The present disclosure relates generally to graphics processors, and more particularly to 3D graphics pipelines included in graphics processors.

Background

Three-dimensional (3D) images have been displayed on stationary display devices, such as computer and television screens, using graphics engines. These engines are typically included in desktop systems powered by conventional AC power outlets and are therefore not significantly constrained by power consumption. However, the current trend is to incorporate 3D graphics engines into battery-powered handheld devices, such as mobile phones and personal digital assistants (PDAs). Unfortunately, conventional graphics engines consume a large amount of power and are therefore not well suited to these low-power operating environments.

FIG. 1 is a schematic block diagram of a basic OpenGL rasterization pipeline included in a conventional 3D graphics engine. As shown, the rasterization pipeline of this example includes a triangle setup stage 101, a pixel shading stage 102, a texture mapping stage 103, a texture blending stage 104, a scissor test stage 105, an alpha test stage 106, a stencil test stage 107, a Hidden Surface Removal (HSR) stage 108, an alpha blending stage 109, and a logical operations stage 110.

In a 3D graphics system, each object to be displayed is typically divided into surface triangles defined by vertex information, although other primitive shapes may also be used. Graphics pipelines are also typically designed to process successive batches of triangles of an object or image. The triangles of any given batch may visually overlap the triangles of another batch, and the triangles within a given batch may also overlap one another.

Referring to FIG. 1, the triangle setup stage 101 "sets up" each triangle by computing setup coefficients to be used in computations performed by later pipeline stages.

The pixel shading stage 102 uses the setup coefficients to calculate which pixels each triangle covers. Since triangles may overlap one another, multiple pixels with different depths may be located at the same point on the screen. In particular, the pixel shading stage 102 uses the vertex information to interpolate color, fog, depth values, texture coordinates, alpha values, etc., for each pixel. Any of a variety of shading techniques may be used to achieve this, and shading operations may occur on a per-triangle or per-pixel basis.

The texture mapping stage 103 and the texture blending stage 104 are used to apply and blend textures onto each pixel of the triangles in the batch being processed. Very generally, this is done by mapping a predefined texture onto pixels according to the texture coordinates contained in the vertex information. As with shading, texturing may be achieved using a variety of techniques. A technique known as anti-aliasing may also be implemented.

The scissor test stage 105 is used to discard pixels contained in a portion (fragment) of a triangle that falls outside the field of view of the displayed scene. Typically, this is done by determining whether a pixel lies within a so-called scissor rectangle.

The alpha test stage 106 conditionally discards a fragment of a triangle (more precisely, the pixels contained in the fragment) based on a comparison between the alpha value (transparency value) associated with the fragment and a reference alpha value. Similarly, the stencil test stage 107 conditionally discards fragments based on a comparison between each fragment and a stored stencil value.

The HSR stage 108 (also referred to as a depth test stage) discards pixels contained in a triangle fragment based on the depth values of other pixels at the same display location. Typically, this is done by comparing the z-axis value (depth value) of the pixel undergoing the depth test with the z-axis value stored at the corresponding location of a so-called z-buffer (or depth buffer). The tested pixel is discarded when its z-axis value indicates that it will be hidden from view by another pixel whose z-axis value is stored in the z-buffer. Otherwise, the z-buffer value is overwritten with the z-axis value of the tested pixel. In this way, underlying pixels that are hidden from view are discarded, while overlying pixels remain.

The alpha blending stage 109 combines the rendered pixels with previously stored pixels in the color buffer based on the alpha values to achieve transparency of the object.

The logical operations stage 110 generally represents the remaining miscellaneous pipeline processing used to ultimately obtain pixel display data.

In any graphics system, it is desirable to save processor and memory bandwidth as much as possible while maintaining satisfactory performance. This is particularly the case in portable or handheld devices where bandwidth may be limited. Also, as previously explained, there is a particular need in the industry to minimize power consumption and improve bandwidth efficiency when processing 3D graphics for display on portable or handheld devices.

Disclosure of Invention

According to one aspect of embodiments of the present invention, there is provided a graphics processor, including a rasterization pipeline including a plurality of serially arranged processing stages that render display pixel data from input primitive object data. The processor further includes a memory that stores data utilized by at least one of the processing stages of the rasterization pipeline, and also includes a pre-fetch mechanism that retrieves data utilized by the at least one processing stage regarding a processed pixel prior to the processed pixel reaching the at least one processing stage.

According to another aspect of embodiments of the present invention, there is provided a graphics processor, including a rasterization pipeline including a plurality of serially arranged processing stages that render display pixel data from input primitive object data, wherein the processing stages include a Hidden Surface Removal (HSR) stage. The processor further includes: a depth buffer storing depth values of previously rendered pixels; a memory controller that retrieves depth values for the previously rendered pixels; and a cache coupled to the HSR stage of the pipeline and storing depth values retrieved by the memory controller.

According to yet another aspect of embodiments of the present invention, there is provided a graphics processor, including a rasterization pipeline including a plurality of serially arranged processing stages that render display pixel data from input primitive object data, wherein the processing stages include a Hidden Surface Removal (HSR) stage. The processor further includes: a depth buffer that stores depth values of two-dimensional pixel blocks; a block address generator that generates a block address for a two-dimensional block of pixels that includes processed pixels; a cache coupled to the HSR stage of the rasterization pipeline; and a memory controller that retrieves depth values for the two-dimensional pixel block from the depth buffer in response to the block address and stores the depth values in the cache.

According to yet another aspect of the embodiments of the present invention, there is provided a graphics processor, including: a rasterization pipeline including a plurality of serially arranged processing stages that render display pixel data from input primitive object data; and means for pre-fetching data from a main memory and supplying the data to at least one of the processing stages before pixel data reaches the at least one processing stage through the rasterization pipeline.

According to yet another aspect of embodiments of the present invention, there is provided a graphics processor, including a rasterization pipeline including a plurality of serially arranged processing stages that render display pixel data from input primitive object data, wherein the processing stages include a Hidden Surface Removal (HSR) stage. The processor further includes: a hierarchical depth buffer storing depth values of two-dimensional pixel blocks; a random access memory coupled to the HSR stage and storing the maximum and minimum depth values for the two-dimensional block of pixels; a block address generator that generates a block address for a two-dimensional block of pixels that includes processed pixels; a cache coupled to the HSR stage of the rasterization pipeline; and a memory controller that retrieves depth values for the two-dimensional pixel block from the depth buffer in response to the block address and stores the depth values in the cache.

According to yet another aspect of embodiments of the present invention, there is provided a graphics processor, including a rasterization pipeline including a plurality of serially arranged processing stages that render display pixel data from input primitive object data, wherein the processing stages include a Hidden Surface Removal (HSR) stage. The processor further includes a depth buffer including two-dimensional blocks of depth value data associated with pixel data rendered by the rasterization pipeline, wherein the primitive object data is indicative of a primitive shape, and wherein depth value data of a two-dimensional block is compressed if the two-dimensional block is completely contained within a primitive shape containing a processed pixel.

According to another aspect of the embodiments of the present invention, there is provided a graphics processing method, including: supplying primitive object data to a rasterization pipeline, the rasterization pipeline including a plurality of serially arranged processing stages that render display pixel data from input primitive object data; storing data utilized by at least one of the processing stages of the rasterization pipeline in a memory; and pre-fetching data about the processed pixel utilized by the at least one processing stage from the memory before the processed pixel reaches the at least one processing stage.

According to still another aspect of the embodiments of the present invention, there is provided a graphics processing method, including: supplying primitive object data to a rasterization pipeline, the rasterization pipeline including a plurality of serially arranged processing stages that render display pixel data from input primitive object data, wherein the processing stages include a Hidden Surface Removal (HSR) stage; and selectively compressing the two-dimensional blocks of depth value data in the depth buffer. The primitive object data indicates a primitive shape, and depth value data for a two-dimensional block is compressed when the two-dimensional block is completely contained within the primitive shape containing the processed pixels.

Drawings

The above and other aspects of the disclosed embodiments will be readily apparent from the following detailed description, with reference to the accompanying drawings, in which:

FIG. 1 is a schematic block diagram of an example of a basic OpenGL rasterization pipeline included in a 3D graphics engine;

FIG. 2 illustrates a simplified example of a circuit block configuration of a graphics pipeline, according to an embodiment of the present invention;

FIG. 3 is a view for explaining predictive prefetching of a pixel block according to another embodiment of the present invention;

FIG. 4 illustrates a simplified example of a circuit block configuration of a graphics pipeline, according to another embodiment of the invention;

FIG. 5 illustrates a block diagram of another embodiment of the present invention in which z-values for tiles are predictively prefetched and stored in a cache;

FIG. 6 illustrates a block diagram for explaining the operation of the depth cache illustrated in FIG. 5; and

FIG. 7 is a view for explaining pixel blocks that are candidates for z-compression according to an embodiment of the present invention.

Detailed Description

Some embodiments described herein feature, at least in part, a 3D graphics pipeline that includes a prefetch mechanism feeding a cache of depth pixel tiles. The prefetch mechanism may be predictive, using triangle geometry information from earlier pipeline stages to preload the cache, thereby improving memory bandwidth efficiency.

Other embodiments feature, at least in part, a z-value compression technique that achieves a reduction in power consumption and memory bandwidth.

Several preferred but non-limiting embodiments will now be described.

In a 3D graphics pipeline, the triangle setup block may be preceded by a block referred to herein as the command block. The command block contains all relevant data for each triangle, including pixel screen position information. According to an embodiment of the invention, the pixel screen position data is fed forward into the pipeline and used by later pipeline stages to calculate the addresses of the data needed for pixel processing. By the time a pixel reaches a given stage, the data associated with that stage will already be in the cache, thereby improving bandwidth efficiency.

FIG. 2 is a simplified block diagram illustrating an embodiment of the present invention. A 3D graphics pipeline is depicted with a command block 200 and first through nth pipeline blocks 201a to 201n. At least one of the pipeline blocks is operably equipped with a cache memory 202a to 202d. By forwarding address information in advance to pipeline stages 1, 2, n-1 and/or n, relevant data can be retrieved from main memory before the processed pixels reach those stages. In this way, memory throughput is increased.
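The benefit of forwarding addresses ahead of the pixels can be illustrated with a deliberately simple timing model. The Python sketch below is not taken from the patent: it assumes a fixed main-memory latency, one read per pixel, and a stage that stalls until its data arrives. The functions `stall_cycles` and `total_stalls` and their parameters are hypothetical.

```python
def stall_cycles(stage_depth, latency, prefetch):
    """Cycles a pixel waits at one stage for its data (hypothetical timing model).

    stage_depth: cycles for a pixel to travel from the command block to the stage.
    latency:     cycles for a main-memory read to complete.
    prefetch:    True if the address was forwarded and the read issued when the
                 pixel left the command block; False if issued on arrival.
    """
    if prefetch:
        # The read has already been in flight for stage_depth cycles.
        return max(0, latency - stage_depth)
    return latency


def total_stalls(num_pixels, stage_depth, latency, prefetch):
    # Pessimistic simplification: one read per pixel and stalls never overlap.
    return num_pixels * stall_cycles(stage_depth, latency, prefetch)


print(total_stalls(100, stage_depth=12, latency=20, prefetch=False))  # 2000
print(total_stalls(100, stage_depth=12, latency=20, prefetch=True))   # 800
```

In this model, once the stage sits at least `latency` cycles down the pipeline, a forwarded read completes before the pixel arrives and the stall disappears entirely.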

In an alternative embodiment, the prefetch mechanism is accompanied by a prediction mechanism to further improve memory efficiency. This is described below with reference to FIG. 3, which relates to an example of predictively prefetching z-values (depth values) from a depth buffer.

A three-dimensional (3D) rasterization pipeline uses a "depth test" to determine whether a newly processed pixel is occluded by a previously rendered pixel. The mechanism involves accessing a "depth buffer" (also referred to as a "z-buffer") in which depth values (i.e., z-values) are stored and checked during rasterization. Essentially, the distance of each visible pixel from the viewer is stored as a depth value in the depth buffer. Subsequently, another processed pixel may attempt to occupy the same position on the screen. The depth value of the previously rendered pixel (i.e., the depth value stored in the depth buffer at that pixel location) is read and compared with the depth value of the newly processed pixel. If the comparison indicates that the new pixel is closer to the viewer, it is considered visible, and the previous depth value in the depth buffer may be overwritten with that of the new pixel. The new pixel is then further processed by the pipeline and eventually rendered into a frame buffer. If, on the other hand, the comparison indicates that the new pixel is farther from the viewer, it is deemed invisible; the new pixel may be discarded and the previous depth value in the depth buffer retained. This process is referred to herein as Hidden Surface Removal (HSR).
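The per-pixel depth test described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's hardware: it assumes that a smaller z means closer to the viewer (a "less-than" depth function, so equal depths count as hidden) and that the depth buffer is a list of rows indexed `[y][x]`. The name `hsr_test` is hypothetical.

```python
import math


def hsr_test(depth_buffer, x, y, z_new):
    """Depth-test one pixel against the z-buffer; return True if it is visible.

    Assumes smaller z means closer to the viewer and depth_buffer[y][x] layout.
    """
    z_old = depth_buffer[y][x]
    if z_new < z_old:
        depth_buffer[y][x] = z_new  # new pixel is closer: overwrite stored depth
        return True                 # pixel continues down the pipeline
    return False                    # occluded: discard, keep the previous depth


# A 2 x 2 depth buffer cleared to the far value (infinity).
zbuf = [[math.inf] * 2 for _ in range(2)]
print(hsr_test(zbuf, 1, 0, 0.5))  # True  (first pixel at this location)
print(hsr_test(zbuf, 1, 0, 0.8))  # False (behind the stored 0.5)
print(hsr_test(zbuf, 1, 0, 0.2))  # True  (in front; buffer now holds 0.2)
```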

FIG. 3 illustrates an example of how a triangle strip may be mapped onto z-value tiles. The triangles are labeled A through E and enter the pipeline in that order. The pixel blocks are labeled 1 to 13. Processing triangle A requires pixel blocks 1, 2, 3, 4, 5 and 8, so the z-values for these tiles are prefetched from the depth buffer and stored in the cache. Next, processing triangle B requires pixel blocks 4, 5, 8 and 9. Since pixel blocks 4, 5 and 8 are already in the cache, only pixel block 9 has to be prefetched from the depth buffer. Similarly, for triangle C only pixel block 6 has to be prefetched. Predictively caching tiles in this manner improves memory bandwidth efficiency.
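The FIG. 3 walk-through for triangles A and B can be reproduced with a small tile cache model. The sketch below is an assumption-laden illustration: the patent does not specify a cache capacity or replacement policy, so the class `TileCache`, its capacity of 8 tiles, and the least-recently-used eviction are hypothetical choices. Only the tile lists for A and B come from the text.

```python
from collections import OrderedDict


class TileCache:
    """Tiny LRU cache of depth tiles keyed by tile index (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # tile index -> placeholder tile data

    def prefetch(self, tiles):
        """Make every tile in `tiles` resident; return those fetched from memory."""
        fetched = []
        for t in tiles:
            if t in self.lines:
                self.lines.move_to_end(t)  # hit: refresh its LRU position
                continue
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used tile
            self.lines[t] = f"z-values of tile {t}"  # stands in for a depth-buffer read
            fetched.append(t)
        return fetched


cache = TileCache(capacity=8)
print(cache.prefetch([1, 2, 3, 4, 5, 8]))  # triangle A: [1, 2, 3, 4, 5, 8]
print(cache.prefetch([4, 5, 8, 9]))        # triangle B: [9]
```

Because consecutive triangles in a strip share edges, they tend to touch the same tiles, which is why most of triangle B's requests hit.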

FIG. 4 is a block diagram of an example of a 3D graphics pipeline configured for prefetching z-values to be utilized in a Hidden Surface Removal (HSR) block of the pipeline. In the figure, the pipeline includes a command block 400, a triangle setup block 401, a pixel shading block 402, an HSR block 403, a texture mapping block 404, and a texture blending block 405. In addition, the HSR block 403 is provided with a depth cache 406 and can access a depth buffer 407.

In operation, the address information for the depth pixel block is forwarded directly from the command block 400 to the HSR block 403. The HSR block 403 is configured to prefetch depth values from the depth buffer 407 according to the address information and then store the depth values in the depth cache 406. As such, when the processed pixel reaches the HSR block 403 through the pipeline, the depth value of the previously rendered pixel may be quickly retrieved from the cache 406 for HSR processing.

The predictive prefetching technique for depth buffer management of embodiments of the present invention lends itself well to the use of a so-called hierarchical z-buffer, an example of which is described next.

FIGS. 5 and 6 are functional block diagrams illustrating another embodiment of the present invention, wherein FIG. 6 is a functional block diagram for explaining the operation of the depth test block 504 illustrated in FIG. 5.

FIG. 5 illustrates a command engine 501, a triangle setup block 502, a pixel shading block 503, a depth test block 504 (containing a hierarchical z-buffer, not shown), a memory system 505 (containing a depth buffer), and remaining pipeline blocks 506.

In operation, triangle data from the command engine 501 is applied to the triangle setup block 502. The triangle setup block outputs the corresponding depth coefficients, geometry data and attribute coefficients, all of which are applied to the pixel shading block 503. The pixel shading block 503 then supplies the pixel attributes and pixel addresses to the depth test block 504, along with the triangle bounding box data from the command engine 501 and the depth coefficients from the triangle setup block 502. The depth test block 504 then performs a depth test on the processed pixels against depth values stored in a cache (not shown). Preferably, the depth values are predictively retrieved from the memory system 505 and stored in the cache before the depth test is actually executed. Depending on the result of the depth test, each processed pixel is either discarded or transmitted to the remaining pipeline blocks 506 in the form of a pixel address and pixel attributes.

As already mentioned, FIG. 6 is a functional block diagram for explaining the operation of the depth test block 504 illustrated in FIG. 5. As shown in FIG. 6, the depth test block of this example generally includes a tile index predictor 601, a tile index generator 602, a depth interpolator 603, a tile test block 604, a pixel test block 607, an attribute buffer 608, and a depth cache 609.

The attribute buffer 608 stores the pixel attributes of incoming pixels as they are passed down the pipeline. The depth test block is itself pipelined, and the attribute buffer 608 matches that pipeline. As explained below, the discard_pixel signal is effectively an erase (clear) signal for pixels flowing through the pipeline 621.

The tile index predictor 601 uses the bounding box information bounding_box to predictively generate a series of tile indices indicating the tiles occupied by the processed triangle. As previously discussed in connection with FIG. 3, memory bandwidth efficiency is improved by predictively caching the pixel blocks touched by the triangles being processed. The prefetch logic 610 uses the tile indices from the tile index predictor 601 to control the cache read block 612 of the depth cache 609. The operation of the cache read block 612 is explained later; for now, it suffices to note that the prefetch logic 610 makes early tile requests to the cache read block 612, so that the pixels later requested by the pixel test block 607 are more likely to be present in the cache RAM.

The tile index generator 602 generates a tile index signal tile_index_in from the incoming pixel address pixel_address_in. It should be noted that, since the same tile index will already have been predicted by the tile index predictor 601, logic may be shared between the tile index predictor 601 and the tile index generator 602.
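The logic sharing between the tile index predictor and the tile index generator can be sketched as follows. This Python example rests on assumptions not stated for FIG. 6: 4 × 4 tiles (the size used in the z-compression example later), tiles numbered row-major across the screen, and an inclusive pixel bounding box `(x_min, y_min, x_max, y_max)`. The names `tile_index`, `predict_tiles` and `tiles_per_row` are hypothetical.

```python
TILE_W = TILE_H = 4  # assumed tile size in pixels


def tile_index(x, y, tiles_per_row):
    """Tile index generator: map a pixel address (x, y) to its row-major tile index."""
    return (y // TILE_H) * tiles_per_row + (x // TILE_W)


def predict_tiles(bbox, tiles_per_row):
    """Tile index predictor: list every tile touched by a triangle's bounding box.

    bbox = (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
    Reuses tile_index(), mirroring the shared logic described above.
    """
    x_min, y_min, x_max, y_max = bbox
    tiles = []
    for ty in range(y_min // TILE_H, y_max // TILE_H + 1):
        for tx in range(x_min // TILE_W, x_max // TILE_W + 1):
            tiles.append(tile_index(tx * TILE_W, ty * TILE_H, tiles_per_row))
    return tiles


# A screen 16 pixels wide has 4 tiles per row.
print(predict_tiles((2, 1, 9, 6), tiles_per_row=4))  # [0, 1, 2, 4, 5, 6]
print(tile_index(9, 6, tiles_per_row=4))             # 6, already predicted above
```

A bounding box is conservative: it can name tiles that the triangle itself never touches, which costs some unnecessary prefetches but never misses a needed tile.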

The depth interpolator 603 rasterizes the depth value z_in for the incoming pixel address pixel_address_in using the depth coefficients z_coefficients and the bounding box information bounding_box. The depth interpolator 603 could instead be included as part of the pixel shading block (see FIG. 5). In this example, however, it is implemented in the depth test block, because the same interpolator can be used to decompress z-values when only the coefficients are stored for a given tile. In this regard, it should be noted that a depth interpolator (616) also appears within the depth cache 609.

The tile test block 604 is essentially a hierarchical z test block and includes a limit table 605 and a visibility check block 606. The limit table 605 contains the maximum (farthest) depth value z_max_far and the minimum (nearest) depth value z_min_near for each screen tile. The tile_index from the tile index generator 602 is used as an address into the limit table 605, which therefore produces z_min_near and z_max_far for the tile containing the processed pixel. These two values are then applied, together with z_in, to the visibility check block 606. The visibility check block 606 compares z_in with z_min_near and z_max_far, and the comparison has three possible outcomes: z_in is farther than the tile's z_max_far; z_in is closer than the tile's z_min_near; or z_in is closer than the tile's z_max_far but farther than its z_min_near.
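The three-way decision of the visibility check block 606 maps naturally onto a small classification function. The Python sketch below assumes that a smaller z means nearer, and it sends values exactly equal to a limit to the per-pixel test; neither convention is specified in the text. `TileResult`, `visibility_check` and the dictionary standing in for the limit table 605 are hypothetical.

```python
from enum import Enum


class TileResult(Enum):
    DISCARD = "discard_pixel"         # behind every pixel stored in the tile
    UPDATE = "update_pixel"           # in front of every pixel stored in the tile
    PIXEL_TEST = "pixel_test_enable"  # ambiguous: the stored per-pixel z is needed


def visibility_check(z_in, z_min_near, z_max_far):
    """Hierarchical (tile-level) z test, assuming smaller z = nearer."""
    if z_in > z_max_far:
        return TileResult.DISCARD
    if z_in < z_min_near:
        return TileResult.UPDATE
    return TileResult.PIXEL_TEST


limit_table = {7: (0.3, 0.6)}  # tile_index -> (z_min_near, z_max_far)
print(visibility_check(0.9, *limit_table[7]).name)   # DISCARD
print(visibility_check(0.1, *limit_table[7]).name)   # UPDATE
print(visibility_check(0.45, *limit_table[7]).name)  # PIXEL_TEST
```

Only the third outcome requires reading the stored depth of an individual pixel, which is what lets the tile test save depth-cache traffic.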

Where z_in is farther than the tile's z_max_far, the pixel is discarded by asserting the discard_pixel signal on the attribute buffer 608.

Where z_in is closer than the tile's z_min_near, the pixel is visible and must be updated: the update_pixel signal is enabled, and the signals shown in FIG. 6 as update_pixel_tile_index, update_pixel_address, update_pixel_z, and update_pixel_z_coefficients are transmitted to the cache write block 617. The cache write block 617 includes cache tag management. When a pixel is updated, the cache write block 617 updates the cache RAM 619 and maintains data coherency with the external memory system 620. Also, when a tile is stored back into the cache 619 or the external memory system 620, the cache write block 617 streams the tile's depth information pixel_z to the limit generator 618.

The limit generator 618 calculates z_max_far and z_min_near for the tile as it is stored in the memory system 620. The update_tile signal is then enabled, and the signals update_tile_index, z_max_far, and z_min_near are transmitted to the tile test block 604 to update the limit table 605.
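The limit generator's job, folding a stream of depth values into a per-tile minimum and maximum and then refreshing the limit table, can be sketched in Python as below. It assumes smaller z means nearer (so z_min_near is the numeric minimum) and models the limit table 605 as a dictionary keyed by tile index; `tile_limits` and `update_limit_table` are hypothetical names.

```python
import math


def tile_limits(pixel_z):
    """Limit generator: fold one tile's streamed depth values into
    (z_min_near, z_max_far), assuming smaller z = nearer."""
    z_min_near, z_max_far = math.inf, -math.inf
    for z in pixel_z:  # values arrive one at a time, as if streamed by block 617
        z_min_near = min(z_min_near, z)
        z_max_far = max(z_max_far, z)
    return z_min_near, z_max_far


def update_limit_table(limit_table, update_tile_index, pixel_z):
    """Apply an update_tile event: overwrite the tile's entry in the limit table."""
    limit_table[update_tile_index] = tile_limits(pixel_z)


limit_table = {}
update_limit_table(limit_table, 5, [0.4, 0.2, 0.9, 0.5])
print(limit_table)  # {5: (0.2, 0.9)}
```

A single running min/max pass needs no buffering, which suits values that arrive as a stream while the tile is written back.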

As previously mentioned, the cache write block 617 receives the signals update_pixel_tile_index, update_pixel_address, update_pixel_z, and update_pixel_z_coefficients. The update_pixel_tile_index signal is essentially a cache block index (or cache line index). update_pixel_address is a cache address used to address an individual pixel. update_pixel_z is the depth value (z-value) of an individual pixel. The update_pixel_z_coefficients signal contains the coefficients used by the z-compression technique: the compression table 611 of the depth cache 609 keeps track of which tiles store only their coefficients. When the cache read block 612 encounters such a tile, the coefficients are read from the cache RAM 619 and passed through the depth interpolator 616 to recover the individual depth values.

Where z_in is closer than the tile's z_max_far but farther than its z_min_near, the pixel lies between the tile's minimum and maximum values. An individual pixel test is therefore performed by enabling the pixel_test_enable signal. In response, the pixel test block 607 sends the request_pixel, request_pixel_tile_index, and request_pixel_address signals to the depth cache 609 to request the depth value of the previously processed pixel. The request_pixel signal is essentially a cache read command, and request_pixel_tile_index and request_pixel_address are the tile and pixel addresses, respectively. In response to these signals, the cache read block 612 retrieves the requested z-value of the previously processed pixel from the cache RAM 619 via the memory interface 613. The cache read block 612 includes cache tag checking and management. The requested z-value is supplied as the request_pixel_z signal to the pixel test block 607, which then determines whether the processed pixel is visible. If the pixel is determined to be invisible, the discard_pixel signal is enabled, as previously described for the tile test block 604. If the pixel is determined to be visible, the update_pixel signal is enabled, and the update_pixel_tile_index, update_pixel_address, update_pixel_z, and update_pixel_z_coefficients signals are used in the same manner as previously described for the tile test block 604.

It should be noted that another level of hierarchical z-buffering may be implemented in which, if a triangle lies completely within a tile, the entire triangle is discarded based on the tile's maximum and minimum depth values.
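One plausible reading of this triangle-level test is sketched below in Python. The rejection rule is an assumption, not the patent's stated condition: a triangle whose three vertices all lie inside the tile is rejected when even its nearest vertex is farther than the tile's z_max_far (smaller z = nearer). Because depth varies linearly across a triangle, its nearest point is always one of its vertices, so checking the vertices suffices. The function name and argument layout are hypothetical.

```python
def triangle_rejectable(vertices, tile_rect, z_max_far):
    """Whole-triangle rejection against one tile's limits (hypothetical rule).

    vertices:  three (x, y, z) tuples in screen coordinates.
    tile_rect: (x0, y0, x1, y1), inclusive pixel bounds of the tile.
    Assumes smaller z = nearer.
    """
    x0, y0, x1, y1 = tile_rect
    # A triangle is convex, so it lies inside the tile iff all its vertices do.
    inside = all(x0 <= x <= x1 and y0 <= y <= y1 for x, y, _ in vertices)
    # Depth is linear over the triangle, so its nearest point is a vertex.
    nearest_z = min(z for _, _, z in vertices)
    return inside and nearest_z > z_max_far


tri = [(1, 1, 0.8), (3, 1, 0.9), (2, 3, 0.85)]
print(triangle_rejectable(tri, (0, 0, 3, 3), z_max_far=0.7))   # True
print(triangle_rejectable(tri, (0, 0, 3, 3), z_max_far=0.85))  # False
```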

The embodiments of FIGS. 5 and 6 use a tile mode of operation in which depth values for tiles are stored in and retrieved from a depth buffer. To further increase bandwidth efficiency, it may be desirable to compress the data representing the pixel blocks. One such z-compression technique according to an embodiment of the present invention is described below.

In the description of this embodiment, it is assumed that the depth buffer is divided into tiles (e.g., 4 × 4 pixels) and that triangles are rendered tile by tile.

Early in the pipeline process, the depth value for the pixels of each triangle is calculated from the vertex information associated with the triangle. Typically, linear interpolation is used for this purpose.

As such, if a tile corresponds to a location in the z-buffer that is updated by rendering a triangle, the depth values in the tile may be represented as a linear function:

$$Z(x, y) = A_z x + B_z y + C_z$$

Here, x and y denote the horizontal and vertical coordinates of each pixel within a 4 × 4 pixel block. Given the depth value of the upper-left pixel of the block, $Z_{00}$, together with the coefficients $A_z$ and $B_z$, the depth values of the remaining pixels of the block can be obtained by interpolation using the following equation:

$$Z_{ij} = A_z i + B_z j + Z_{00}, \quad i = 0, \ldots, 3, \; j = 0, \ldots, 3$$

Thus, if a tile is compressible, the depth buffer need not be updated with the depth values of all 16 of its pixels, but only with $Z_{00}$, $A_z$ and $B_z$. Assuming $A_z$, $B_z$ and $Z_{00}$ are stored with the same precision as an individual depth value, this is only 3/16 of the data of a conventional pixel block. When the same compressed tile is read back from the z-buffer, only $Z_{00}$, $A_z$ and $B_z$ need to be read, and a decompression function based on the above formula is applied to recover the depth values of the entire pixel block.
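The compression and decompression formulas above can be checked with a short Python sketch. It assumes the triangle's depth plane is expressed in screen coordinates as Z(x, y) = a_z·x + b_z·y + c_z and that (x0, y0) is the tile's upper-left pixel, so Z00 is that plane evaluated at (x0, y0); under this assumption Z(x0 + i, y0 + j) = a_z·i + b_z·j + Z00, matching the equation for Z_ij. The names `compress_tile` and `decompress_tile` are hypothetical. The coefficients in the usage example are exact binary fractions so that the printed values are exact.

```python
def compress_tile(a_z, b_z, c_z, x0, y0):
    """Compressed form (Z00, A_z, B_z) of a 4 x 4 tile fully covered by one triangle.

    The triangle's depth plane is Z(x, y) = a_z*x + b_z*y + c_z in screen
    coordinates, and (x0, y0) is the tile's upper-left pixel.
    """
    z00 = a_z * x0 + b_z * y0 + c_z
    return (z00, a_z, b_z)  # 3 stored values instead of 16


def decompress_tile(z00, a_z, b_z, size=4):
    """Rebuild every depth with Z_ij = a_z*i + b_z*j + Z00 (i: column, j: row)."""
    return [[a_z * i + b_z * j + z00 for i in range(size)] for j in range(size)]


packed = compress_tile(0.0625, 0.125, 0.25, x0=4, y0=8)
print(packed)                   # (1.5, 0.0625, 0.125)
tile = decompress_tile(*packed)
print(tile[0][0], tile[3][3])   # 1.5 2.0625
```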

A pixel block may be compressed only when it is fully contained in a triangle, as illustrated in FIG. 7. As illustrated, tile A may be compressed, whereas tiles B and C may not, because they intersect the triangle boundaries. To determine whether a pixel block falls completely within a triangle, it is usually sufficient to check whether all four corner pixels of the block are within the triangle.
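The four-corner test works because a triangle is convex: if all four corners of the tile are inside, every pixel between them is inside too. The Python sketch below uses standard edge functions (signed areas) as one common way to test a point against a triangle; that technique is a swapped-in choice, not something the text specifies. It treats pixel coordinates as points and ignores rasterizer fill rules and pixel-center offsets, and it assumes a non-degenerate triangle. All function names are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: sign tells which side of edge A->B the point P is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def point_in_triangle(tri, px, py):
    """True if P is inside or on the triangle; works for either winding order.

    Assumes a non-degenerate triangle (nonzero area).
    """
    (ax, ay), (bx, by), (cx, cy) = tri
    e0 = edge(ax, ay, bx, by, px, py)
    e1 = edge(bx, by, cx, cy, px, py)
    e2 = edge(cx, cy, ax, ay, px, py)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)


def tile_fully_covered(tri, tile_x0, tile_y0, size=4):
    """Compressibility check: all four corner pixels of the tile are inside."""
    x1, y1 = tile_x0 + size - 1, tile_y0 + size - 1
    corners = [(tile_x0, tile_y0), (x1, tile_y0), (tile_x0, y1), (x1, y1)]
    return all(point_in_triangle(tri, x, y) for x, y in corners)


tri = [(0, 0), (20, 0), (0, 20)]
print(tile_fully_covered(tri, 0, 0))    # True: like tile A in FIG. 7
print(tile_fully_covered(tri, 12, 12))  # False: crosses the hypotenuse
```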

Since not every tile is compressible, on-chip memory may be utilized to store an array of flags (1 bit per tile) that may indicate whether a particular tile is compressed in the depth buffer. When a tile is read from the depth buffer, its respective compression flag is checked to determine if decompression of the data is required. When a tile is updated to the depth buffer, if it is compressible, the compressed data is written to the depth buffer and the corresponding compression flag is set.
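The flag array and the read and write paths just described can be modeled in Python as follows. This is a sketch under stated assumptions: the depth buffer is a dictionary from tile index to either 16 raw depths (row-major) or a compressed `(Z00, A_z, B_z)` triple, and the caller supplies the triple only when the tile is fully covered by one triangle. `CompressionFlags`, `write_tile` and `read_tile` are hypothetical names.

```python
class CompressionFlags:
    """On-chip flag array: 1 bit per tile, set when the tile is stored compressed."""

    def __init__(self, num_tiles):
        self.bits = bytearray((num_tiles + 7) // 8)

    def set(self, tile, compressed):
        byte, bit = divmod(tile, 8)
        if compressed:
            self.bits[byte] |= 1 << bit
        else:
            self.bits[byte] &= ~(1 << bit) & 0xFF

    def is_compressed(self, tile):
        byte, bit = divmod(tile, 8)
        return bool((self.bits[byte] >> bit) & 1)


def write_tile(depth_buffer, flags, tile, depths, plane=None):
    """Update a tile: store (Z00, A_z, B_z) if compressible, else all 16 depths."""
    if plane is not None:
        depth_buffer[tile] = plane
        flags.set(tile, True)
    else:
        depth_buffer[tile] = list(depths)
        flags.set(tile, False)


def read_tile(depth_buffer, flags, tile):
    """Read a tile, decompressing with Z_ij = A_z*i + B_z*j + Z00 when flagged."""
    data = depth_buffer[tile]
    if flags.is_compressed(tile):
        z00, a_z, b_z = data
        return [a_z * i + b_z * j + z00 for j in range(4) for i in range(4)]
    return data


flags, zbuffer = CompressionFlags(16), {}
write_tile(zbuffer, flags, 3, None, plane=(0.5, 0.0625, 0.125))
print(flags.is_compressed(3), read_tile(zbuffer, flags, 3)[5])  # True 0.6875
```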

In the drawings and specification, there have been disclosed typical preferred embodiments and, although specific examples are set forth, they are used in a generic and descriptive sense only and not for purposes of limitation. It is therefore to be understood that the scope of the invention is to be construed by the appended claims, and not by the exemplary embodiments.

Claims

1. A graphics processor, comprising:

a rasterization pipeline comprising a plurality of serially arranged processing stages that render display pixel data from input primitive object data;

a memory storing data utilized by at least one of the processing stages of the rasterization pipeline; and

a pre-fetch mechanism that retrieves data about a processed pixel utilized by the at least one processing stage before the processed pixel reaches the at least one processing stage.

2. The graphics processor of claim 1, wherein the retrieved data is stored in a cache of the at least one of the processing stages of the rasterization pipeline.

3. A graphics processor, comprising:

a rasterization pipeline comprising a plurality of serially arranged processing stages that render display pixel data from input primitive object data, wherein the processing stages include a Hidden Surface Removal (HSR) stage;

a depth buffer storing data utilized by the HSR stage of the rasterization pipeline; and

a pre-fetch mechanism that retrieves data about processed pixels utilized by the HSR stage from the depth buffer before the processed pixels reach the HSR stage through the rasterization pipeline.

4. The graphics processor of claim 3, wherein the retrieved data is stored in a cache of the HSR stage of the rasterization pipeline.

5. A graphics processor, comprising:

a rasterization pipeline comprising a plurality of serially arranged processing stages that render display pixel data from input primitive object data, wherein processing stages include a Hidden Surface Removal (HSR) stage;

a depth buffer that stores depth values of two-dimensional pixel blocks;

a pixel block address generator that generates a pixel block address for the two-dimensional pixel block that includes a processed pixel;

a cache coupled to the HSR stage of the rasterization pipeline; and

a memory controller that retrieves depth values for the two-dimensional block of pixels from the depth buffer in response to the pixel block addresses and stores the depth values in the cache.

6. The graphics processor of claim 5, wherein the depth buffer is a hierarchical depth buffer.
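
Claim 6 does not define the structure of the hierarchical depth buffer. One common form keeps, for each block, the minimum and maximum of its depth values, which are the per-block values that claim 10 places in the cache. The following is a minimal sketch, assuming a depth buffer held as a list of rows and a block size of 4. `build_min_max_level` is a hypothetical name.

```python
def build_min_max_level(depth, tile=4):
    """Build one coarse level of a hierarchical depth buffer (hypothetical sketch).

    depth: list of rows, depth[y][x].
    Returns {(tx, ty): (zmin, zmax)} for every tile x tile block; blocks at the
    right or bottom edge may be partial.
    """
    height, width = len(depth), len(depth[0])
    level = {}
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            block = [depth[y][x]
                     for y in range(ty, min(ty + tile, height))
                     for x in range(tx, min(tx + tile, width))]
            level[(tx // tile, ty // tile)] = (min(block), max(block))
    return level


depth = [[x + 8 * y for x in range(8)] for y in range(8)]
level = build_min_max_level(depth)
print(level[(0, 0)])  # (0, 27)
print(level[(1, 1)])  # (36, 63)
```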

7. A graphics processor, comprising:

a rasterization pipeline comprising a plurality of serially arranged processing stages that render display pixel data from input primitive object data; and

means for pre-fetching data from a main memory and supplying it to at least one of the processing stages before the associated pixel data passes through the rasterization pipeline to the at least one processing stage.

8. The graphics processor of claim 7, wherein the at least one processing stage is a Hidden Surface Removal (HSR) stage.

9. The graphics processor of claim 8, wherein the means comprises a cache memory storing data from the main memory and coupled to the HSR stage.

10. A graphics processor, comprising:

a hierarchical depth buffer storing depth values of two-dimensional pixel blocks;

a random access cache coupled to the HSR stage and storing a maximum depth value and a minimum depth value of depth values for the two-dimensional block of pixels;

11. The graphics processor of claim 10, further comprising a pixel block test block that compares a depth value of a processed pixel to minimum and maximum depth values of a pixel block containing the processed pixel.

12. The graphics processor of claim 11, wherein the pixel block test block is operable to discard the processed pixel if a depth value of the processed pixel is less than a minimum depth value for the pixel block containing the processed pixel.

13. The graphics processor of claim 11, wherein the pixel block test block is operable to update the cache if a depth value of the processed pixel is greater than a maximum depth value for the pixel block containing the processed pixel.

14. The graphics processor of claim 13, further comprising a pixel test block that compares a depth value of the processed pixel to a depth value previously stored in the cache memory.

15. The graphics processor of claim 14, wherein the pixel block test block is operable to enable the pixel test block if a depth value of the processed pixel is between the minimum and maximum depth values of the pixel block containing the processed pixel.
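
Claims 11 to 15 describe a two-level test, which is sketched below. The sketch compares the pixel against the cached block minimum and maximum, and it falls back to the per-pixel comparison only when the result is ambiguous. It assumes the depth convention that claims 12 and 13 imply: a larger depth value is nearer the viewer, so a pixel below the block minimum is hidden and a pixel above the block maximum is visible. The cache-entry layout and all names are hypothetical. For simplicity, the block minimum and maximum are recomputed by scanning the 16 values.

```python
from enum import Enum


class BlockResult(Enum):
    REJECT = "reject"          # claim 12: behind every stored value in the block
    ACCEPT = "accept"          # claim 13: in front of every stored value
    PIXEL_TEST = "pixel_test"  # claim 15: ambiguous, so run the per-pixel test


def block_test(z, zmin, zmax):
    # Assumed convention (read from claims 12-13): larger depth = nearer the viewer.
    if z < zmin:
        return BlockResult.REJECT
    if z > zmax:
        return BlockResult.ACCEPT
    return BlockResult.PIXEL_TEST


def hidden_surface_test(z, index, block):
    """Return True if the pixel is visible, updating the cached block.

    block: hypothetical cache entry {'depth': [16 values], 'zmin': ..., 'zmax': ...};
    index: the pixel's position (0-15) inside its 4 x 4 block.
    """
    result = block_test(z, block["zmin"], block["zmax"])
    if result is BlockResult.REJECT:
        return False
    if result is BlockResult.PIXEL_TEST and z <= block["depth"][index]:
        return False  # per-pixel test failed against the stored value
    block["depth"][index] = z
    block["zmin"] = min(block["depth"])
    block["zmax"] = max(block["depth"])
    return True


block = {"depth": [0.5] * 16, "zmin": 0.5, "zmax": 0.5}
print(hidden_surface_test(0.4, 0, block))   # False: rejected at block level
print(hidden_surface_test(0.7, 0, block))   # True: accepted at block level
print(hidden_surface_test(0.6, 1, block))   # True: per-pixel test passes
print(hidden_surface_test(0.55, 0, block))  # False: per-pixel test fails
```

The per-pixel comparison, and the stored value it needs, is used only in the ambiguous case, which is where the block-level minimum and maximum save work.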

16. The graphics processor of claim 10, further comprising: a pixel block index predictor block that generates pixel block information based on primitive object data associated with the processed pixel; and prefetch logic that retrieves depth values for the pixel block based on pixel block information generated by the pixel block index predictor block.
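
Claim 16 predicts which blocks to prefetch from the primitive's geometry. The following sketch is hypothetical. It uses the simplest predictor, the set of 4 x 4 blocks overlapped by the triangle's screen-space bounding box, listed in raster order. It then issues fetches only for blocks that are not already cached. A bounding box is conservative and can over-predict. In the example, block (1, 1) is predicted even though the triangle covers none of its pixels. A tighter predictor could apply edge tests per block. `predict_blocks` and `issue_prefetches` are illustrative names.

```python
import math


def predict_blocks(vertices, width, height, tile=4):
    """List the (tx, ty) blocks overlapped by a triangle's bounding box,
    in raster order, clipped to a width x height render target."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    tx0 = max(0, math.floor(min(xs))) // tile
    tx1 = min(width - 1, math.floor(max(xs))) // tile
    ty0 = max(0, math.floor(min(ys))) // tile
    ty1 = min(height - 1, math.floor(max(ys))) // tile
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]


def issue_prefetches(blocks, cached, fetch):
    """Call fetch(block) for each predicted block that is not already cached."""
    for b in blocks:
        if b not in cached:
            fetch(b)
            cached.add(b)


blocks = predict_blocks([(1, 1), (6, 1), (1, 6)], 16, 16)
print(blocks)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
fetched = []
issue_prefetches(blocks, {(0, 0)}, fetched.append)
print(fetched)  # [(1, 0), (0, 1), (1, 1)]
```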

17. A graphics processor, comprising:

a depth buffer comprising two-dimensional blocks of pixels of depth value data associated with pixel data rendered by the rasterization pipeline, wherein the primitive object data indicates a primitive shape, and wherein the depth value data of a two-dimensional block of pixels is compressed if that block is completely contained within the primitive shape containing the processed pixel.

18. The graphics processor of claim 17, wherein the primitive shapes are triangles.

19. The graphics processor of claim 18, wherein the two-dimensional block of pixels is a 4 x 4 block of pixels.
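
Claims 17 to 19 compress a 4 x 4 block only when the triangle fully contains it. Because a triangle is convex, it is enough to check the four corner sample points of the block against the triangle's three edge functions. If all four corners are inside, all 16 samples are inside. The following sketch assumes that samples lie at pixel centres (x + 0.5, y + 0.5) and that points exactly on an edge count as inside. Real rasterizers apply tie-breaking rules that this sketch ignores. The function names are hypothetical.

```python
def edge(a, b, p):
    """Signed area term: > 0 if p is to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def block_inside_triangle(tri, bx, by, size=4):
    """True if every pixel centre of the size x size block at (bx, by) lies in tri."""
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return False  # degenerate triangle covers no block
    sign = 1 if area > 0 else -1  # normalise for clockwise/counter-clockwise input
    corners = [(bx + dx + 0.5, by + dy + 0.5)
               for dx in (0, size - 1) for dy in (0, size - 1)]
    return all(sign * edge(a, b, p) >= 0 and
               sign * edge(b, c, p) >= 0 and
               sign * edge(c, a, p) >= 0
               for p in corners)


tri = ((0, 0), (16, 0), (0, 16))
print(block_inside_triangle(tri, 0, 0))  # True
print(block_inside_triangle(tri, 4, 4))  # True
print(block_inside_triangle(tri, 8, 4))  # False: straddles the hypotenuse
```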

20. The graphics processor of claim 17, wherein the depth value data is compressed by storing the coefficients of an equation describing the depth values of the two-dimensional block of pixels relative to one another.

21. The graphics processor of claim 20, wherein the equation is a linear equation.
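
Claims 20 and 21 store the coefficients of a linear equation instead of the individual depth values. For a block inside a single triangle, depth is planar, so z(x, y) = z_origin + dzdx * (x - bx) + dzdy * (y - by). Three numbers then replace sixteen. The following is a minimal sketch, assuming floating-point arithmetic and samples at integer pixel coordinates. A hardware design would instead need a fixed-point format whose reconstruction matches the rasterizer's interpolation bit for bit. The function names are hypothetical.

```python
def plane_from_triangle(v0, v1, v2):
    """Return (dzdx, dzdy) of the plane through three (x, y, z) vertices."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = v0, v1, v2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle")
    dzdx = ((z1 - z0) * (y2 - y0) - (z2 - z0) * (y1 - y0)) / det
    dzdy = ((z2 - z0) * (x1 - x0) - (z1 - z0) * (x2 - x0)) / det
    return dzdx, dzdy


def compress_block(v0, v1, v2, bx, by):
    """Compress a block at (bx, by) to 3 numbers: origin depth and two gradients."""
    dzdx, dzdy = plane_from_triangle(v0, v1, v2)
    x0, y0, z0 = v0
    z_origin = z0 + dzdx * (bx - x0) + dzdy * (by - y0)
    return (z_origin, dzdx, dzdy)


def decompress_block(coeffs, size=4):
    """Rebuild the size * size depth values in row-major order."""
    z_origin, dzdx, dzdy = coeffs
    return [z_origin + dzdx * i + dzdy * j for j in range(size) for i in range(size)]


# A triangle lying in the plane z = 1 + x + 2y.
verts = ((0, 0, 1), (16, 0, 17), (0, 16, 33))
coeffs = compress_block(*verts, 4, 4)
print(coeffs)                        # (13.0, 1.0, 2.0)
print(decompress_block(coeffs)[15])  # 22.0, the depth at pixel (7, 7)
```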

22. A graphics processing method, comprising:

supplying primitive object data to a rasterization pipeline comprising a plurality of serially arranged processing stages that render display pixel data from input primitive object data;

storing data utilized by at least one of the processing stages of the rasterization pipeline in a memory; and

pre-fetching, from the memory, data about a processed pixel utilized by the at least one processing stage before the processed pixel reaches the at least one processing stage.

23. The method of claim 22, further comprising storing the retrieved data in a cache of the at least one of the processing stages of the rasterization pipeline.

24. The method of claim 23, wherein the at least one processing stage is a Hidden Surface Removal (HSR) stage.

25. The method of claim 24, further comprising performing a pixel block test that compares a depth value of a processed pixel to minimum and maximum depth values of a two-dimensional pixel block containing the processed pixel.

26. The method of claim 25, wherein the pixel block test includes updating the cache if a depth value of the processed pixel is greater than a maximum depth value of the pixel block containing the processed pixel.

27. The method of claim 26, further comprising selectively performing a pixel test that compares a depth value of the processed pixel to a depth value previously stored in the cache.

28. The method of claim 27, wherein the pixel block test includes enabling the pixel test if a depth value of the processed pixel is between the minimum and maximum depth values of the pixel block containing the processed pixel.

29. The method of claim 22, further comprising generating pixel block information based on primitive object data associated with the processed pixel, and prefetching depth values for a pixel block based on the pixel block information.

30. A graphics processing method, comprising:

supplying primitive object data to a rasterization pipeline, the rasterization pipeline including a plurality of serially arranged processing stages that render display pixel data from input primitive object data, wherein the processing stages include a Hidden Surface Removal (HSR) stage;

selectively compressing two-dimensional blocks of pixels of depth value data in a depth buffer, wherein the primitive object data indicates a primitive shape, and wherein the depth value data for a two-dimensional block of pixels is compressed when the two-dimensional block of pixels is completely contained within the primitive shape containing the processed pixels.