US20200202594A1 - Integration of variable rate shading and super-sample shading - Google Patents
- Publication number
- US20200202594A1 (US16/228,692)
- Authority
- US
- United States
- Prior art keywords
- quads
- shading rate
- shading
- rate
- modified
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/40—Filling a planar surface by adding surface attributes, e.g. colour or texture
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/005—General purpose rendering architectures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/50—Lighting effects
- G06T15/80—Shading
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/28—Indexing scheme for image data processing or generation, in general involving image processing hardware
Definitions
- Three-dimensional (“3D”) graphics processing pipelines perform a series of steps to convert input geometry into a two-dimensional (“2D”) image for display on a screen. Some of the steps include rasterization and pixel shading. Rasterization involves identifying which pixels (or sub-pixel samples) are covered by triangles provided by stages of the pipeline prior to the rasterizer. The output of rasterization includes quads (blocks of 2×2 pixels) and coverage data that indicates which samples of the pixels of the quads are covered. The pixel shader shades the pixels of the quads, and the pixels of the quads are then written to a frame buffer. Because pixel shading is very resource-intensive, techniques are constantly being developed to improve efficiency of pixel shading.
- FIG. 1 is a block diagram of an example device in which one or more features of the disclosure can be implemented;
- FIG. 2 illustrates details of the device of FIG. 1, according to an example;
- FIG. 3 is a block diagram showing additional details of the graphics processing pipeline illustrated in FIG. 2; and
- FIGS. 4A-4D illustrate a technique for performing rasterization at a different resolution than pixel shading, according to an example.
- a technique for performing rasterization and pixel shading with decoupled resolution involves performing rasterization as normal to generate quads.
- the quads are accumulated into a tile buffer.
- a shading rate is determined for the contents of the tile buffer. If the shading rate is a sub-sampling shading rate, then the quads in the tile buffer are down-sampled, which reduces the amount of work to be performed by a pixel shader. The shaded down-sampled quads are then restored to the resolution of the render target. If the shading rate is a super-sampling shading rate, then the quads in the tile buffer are up-sampled. The results of the shaded down-sampled or up-sampled quads are written to the render target.
- FIG. 1 is a block diagram of an example device 100 in which one or more features of the disclosure can be implemented.
- the device 100 could be one of, but is not limited to, for example, a computer, a gaming device, a handheld device, a set-top box, a television, a mobile phone, a tablet computer, or other computing device.
- the device 100 includes a processor 102 , a memory 104 , a storage 106 , one or more input devices 108 , and one or more output devices 110 .
- the device 100 also includes one or more input drivers 112 and one or more output drivers 114 .
- any of the input drivers 112 are embodied as hardware, a combination of hardware and software, or software, and serve the purpose of controlling input devices 108 (e.g., controlling operation, receiving inputs from, and providing data to input devices 108).
- any of the output drivers 114 are embodied as hardware, a combination of hardware and software, or software, and serve the purpose of controlling output devices 110 (e.g., controlling operation, receiving inputs from, and providing data to output devices 110). It is understood that the device 100 can include additional components not shown in FIG. 1.
- the processor 102 includes a central processing unit (CPU), a graphics processing unit (GPU), a CPU and GPU located on the same die, or one or more processor cores, wherein each processor core can be a CPU or a GPU.
- the memory 104 is located on the same die as the processor 102 , or is located separately from the processor 102 .
- the memory 104 includes a volatile or non-volatile memory, for example, random access memory (RAM), dynamic RAM, or a cache.
- the storage 106 includes a fixed or removable storage, for example, without limitation, a hard disk drive, a solid state drive, an optical disk, or a flash drive.
- the input devices 108 include, without limitation, a keyboard, a keypad, a touch screen, a touch pad, a detector, a microphone, an accelerometer, a gyroscope, a biometric scanner, or a network connection (e.g., a wireless local area network card for transmission and/or reception of wireless IEEE 802 signals).
- the output devices 110 include, without limitation, a display, a speaker, a printer, a haptic feedback device, one or more lights, an antenna, or a network connection (e.g., a wireless local area network card for transmission and/or reception of wireless IEEE 802 signals).
- the input driver 112 and output driver 114 include one or more hardware, software, and/or firmware components that are configured to interface with and drive input devices 108 and output devices 110 , respectively.
- the input driver 112 communicates with the processor 102 and the input devices 108 , and permits the processor 102 to receive input from the input devices 108 .
- the output driver 114 communicates with the processor 102 and the output devices 110 , and permits the processor 102 to send output to the output devices 110 .
- the output driver 114 includes an accelerated processing device (“APD”) 116 which is coupled to a display device 118 , which, in some examples, is a physical display device or a simulated device that uses a remote display protocol to show output.
- the APD 116 is configured to accept compute commands and graphics rendering commands from processor 102 , to process those compute and graphics rendering commands, and to provide pixel output to display device 118 for display.
- the APD 116 includes one or more parallel processing units configured to perform computations in accordance with a single-instruction-multiple-data (“SIMD”) paradigm.
- any processing system that performs processing tasks in accordance with a SIMD paradigm may be configured to perform the functionality described herein.
- computing systems that do not perform processing tasks in accordance with a SIMD paradigm can also perform the functionality described herein.
- FIG. 2 illustrates details of the device 100 and the APD 116 , according to an example.
- the processor 102 ( FIG. 1 ) executes an operating system 120 , a driver 122 , and applications 126 , and may also execute other software alternatively or additionally.
- the operating system 120 controls various aspects of the device 100 , such as managing hardware resources, processing service requests, scheduling and controlling process execution, and performing other operations.
- the APD driver 122 controls operation of the APD 116 , sending tasks such as graphics rendering tasks or other work to the APD 116 for processing.
- the APD driver 122 also includes a just-in-time compiler that compiles programs for execution by processing components (such as the SIMD units 138 discussed in further detail below) of the APD 116 .
- the APD 116 executes commands and programs for selected functions, such as graphics operations and non-graphics operations that may be suited for parallel processing.
- the APD 116 can be used for executing graphics pipeline operations such as pixel operations, geometric computations, and rendering an image to display device 118 based on commands received from the processor 102 .
- the APD 116 also executes compute processing operations that are not directly related to graphics operations, such as operations related to video, physics simulations, computational fluid dynamics, or other tasks, based on commands received from the processor 102 .
- the APD 116 includes compute units 132 that include one or more SIMD units 138 that are configured to perform operations at the request of the processor 102 (or another unit) in a parallel manner according to a SIMD paradigm.
- the SIMD paradigm is one in which multiple processing elements share a single program control flow unit and program counter and thus execute the same program but are able to execute that program with different data.
- each SIMD unit 138 includes sixteen lanes, where each lane executes the same instruction at the same time as the other lanes in the SIMD unit 138 but can execute that instruction with different data. Lanes can be switched off with predication if not all lanes need to execute a given instruction. Predication can also be used to execute programs with divergent control flow. More specifically, for programs with conditional branches or other instructions where control flow is based on calculations performed by an individual lane, predication of lanes corresponding to control flow paths not currently being executed, and serial execution of different control flow paths allows for arbitrary control flow.
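The predication scheme described above can be illustrated with a small software emulation. This is a minimal sketch, not the hardware mechanism: the function name `predicated_select` and its parameters are hypothetical, and only the lane count of sixteen comes from the text.

```python
from typing import Callable, List

LANES = 16  # lane count stated in the text for one SIMD unit


def predicated_select(data: List[float],
                      cond: Callable[[float], bool],
                      then_op: Callable[[float], float],
                      else_op: Callable[[float], float]) -> List[float]:
    """Emulate a divergent branch on one SIMD unit using a lane mask."""
    if len(data) != LANES:
        raise ValueError("expected one value per lane")
    mask = [cond(x) for x in data]  # per-lane branch decision
    out = list(data)
    # Pass 1: the "then" path. Lanes whose mask bit is False are switched off.
    for lane in range(LANES):
        if mask[lane]:
            out[lane] = then_op(data[lane])
    # Pass 2: the "else" path, run serially after pass 1 with the mask inverted.
    for lane in range(LANES):
        if not mask[lane]:
            out[lane] = else_op(data[lane])
    return out
```

Both control flow paths are visited one after the other, and each lane keeps only the result of the path it actually took, which is how serial execution plus predication yields arbitrary per-lane control flow.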
- the basic unit of execution in compute units 132 is a work-item.
- Each work-item represents a single instantiation of a program that is to be executed in parallel in a particular lane.
- Work-items can be executed simultaneously (or partially simultaneously and partially sequentially) as a “wavefront” on a single SIMD processing unit 138 .
- One or more wavefronts are included in a “work group,” which includes a collection of work-items designated to execute the same program.
- a work group can be executed by executing each of the wavefronts that make up the work group. In alternatives, the wavefronts are executed on a single SIMD unit 138 or on different SIMD units 138 .
- Wavefronts can be thought of as the largest collection of work-items that can be executed simultaneously (or pseudo-simultaneously) on a single SIMD unit 138 . “Pseudo-simultaneous” execution occurs in the case of a wavefront that is larger than the number of lanes in a SIMD unit 138 . In such a situation, wavefronts are executed over multiple cycles, with different collections of the work-items being executed in different cycles.
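The arithmetic behind "pseudo-simultaneous" execution can be sketched as below. The helper names are hypothetical, and the wavefront size of 64 used in the note that follows is an assumed example value, not one stated in the text; only the sixteen-lane figure comes from the description.

```python
import math
from typing import List


def cycles_for_wavefront(wavefront_size: int, lanes: int = 16) -> int:
    """Number of passes needed to run every work-item of one wavefront."""
    return math.ceil(wavefront_size / lanes)


def lane_batches(wavefront_size: int, lanes: int = 16) -> List[List[int]]:
    """Work-item indices executed together in each successive cycle."""
    return [
        list(range(start, min(start + lanes, wavefront_size)))
        for start in range(0, wavefront_size, lanes)
    ]
```

Under these assumptions, a 64 work-item wavefront on a sixteen-lane SIMD unit takes four cycles, with work-items 0 to 15 in the first cycle, 16 to 31 in the second, and so on.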
- An APD scheduler 136 is configured to perform operations related to scheduling various workgroups and wavefronts on compute units 132 and SIMD units 138 .
- the parallelism afforded by the compute units 132 is suitable for graphics related operations such as pixel value calculations, vertex transformations, and other graphics operations.
- a graphics pipeline 134 which accepts graphics processing commands from the processor 102 , provides computation tasks to the compute units 132 for execution in parallel.
- the compute units 132 are also used to perform computation tasks not related to graphics or not performed as part of the “normal” operation of a graphics pipeline 134 (e.g., custom operations performed to supplement processing performed for operation of the graphics pipeline 134 ).
- An application 126 or other software executing on the processor 102 transmits programs that define such computation tasks to the APD 116 for execution.
- FIG. 3 is a block diagram showing additional details of the graphics processing pipeline 134 illustrated in FIG. 2 .
- the graphics processing pipeline 134 includes stages that each perform specific functionality of the graphics processing pipeline 134. Each stage is implemented partially or fully as shader programs executing in the programmable compute units 132, or partially or fully as fixed-function, non-programmable hardware external to the compute units 132.
- the input assembler stage 302 reads primitive data from user-filled buffers (e.g., buffers filled at the request of software executed by the processor 102 , such as an application 126 ) and assembles the data into primitives for use by the remainder of the pipeline.
- the input assembler stage 302 can generate different types of primitives based on the primitive data included in the user-filled buffers.
- the input assembler stage 302 formats the assembled primitives for use by the rest of the pipeline.
- the vertex shader stage 304 processes vertices of the primitives assembled by the input assembler stage 302 .
- the vertex shader stage 304 performs various per-vertex operations such as transformations, skinning, morphing, and per-vertex lighting. Transformation operations include various operations to transform the coordinates of the vertices. These operations include one or more of modeling transformations, viewing transformations, projection transformations, perspective division, and viewport transformations, which modify vertex coordinates, and other operations that modify non-coordinate attributes.
- the vertex shader stage 304 is implemented partially or fully as vertex shader programs to be executed on one or more compute units 132 .
- the vertex shader programs are provided by the processor 102 and are based on programs that are pre-written by a computer programmer.
- the driver 122 compiles such computer programs to generate the vertex shader programs having a format suitable for execution within the compute units 132 .
- the hull shader stage 306 , tessellator stage 308 , and domain shader stage 310 work together to implement tessellation, which converts simple primitives into more complex primitives by subdividing the primitives.
- the hull shader stage 306 generates a patch for the tessellation based on an input primitive.
- the tessellator stage 308 generates a set of samples for the patch.
- the domain shader stage 310 calculates vertex positions for the vertices corresponding to the samples for the patch.
- the hull shader stage 306 and domain shader stage 310 can be implemented as shader programs to be executed on the compute units 132 , that are compiled by the driver 122 as with the vertex shader stage 304 .
- the geometry shader stage 312 performs vertex operations on a primitive-by-primitive basis.
- operations can be performed by the geometry shader stage 312 , including operations such as point sprite expansion, dynamic particle system operations, fur-fin generation, shadow volume generation, single pass render-to-cubemap, per-primitive material swapping, and per-primitive material setup.
- a geometry shader program that is compiled by the driver 122 and that executes on the compute units 132 performs operations for the geometry shader stage 312 .
- the rasterizer stage 314 accepts and rasterizes simple primitives (triangles) generated upstream from the rasterizer stage 314 .
- Rasterization consists of determining which screen pixels (or sub-pixel samples) are covered by a particular primitive. Rasterization is performed by fixed function hardware.
- the pixel shader stage 316 calculates output values for screen pixels based on the primitives generated upstream and the results of rasterization.
- the pixel shader stage 316 may apply textures from texture memory. Operations for the pixel shader stage 316 are performed by a pixel shader program that is compiled by the driver 122 and that executes on the compute units 132 .
- the output merger stage 318 accepts output from the pixel shader stage 316 and merges those outputs into a frame buffer, performing operations such as z-testing and alpha blending to determine the final color for the screen pixels.
- the rasterization performed by the rasterizer stage 314 is done at the same resolution as pixel shading performed by the pixel shader stage 316 .
- the rasterizer stage 314 accepts triangles from earlier stages and performs scan conversion on the triangles to generate fragments.
- the fragments are data for individual pixels of a render target and include information such as location, depth, and coverage data, and later, after the pixel shader stage, shading data such as colors.
- the render target is the destination image to which rendering is occurring (i.e., colors or other values are being written).
- the fragments are grouped into quads, each quad including fragments corresponding to four neighboring pixel locations (that is, 2×2 fragments).
- Scan conversion of a triangle involves generating a fragment for each pixel location covered by the triangle. If the render target is a multi-sample image, then each pixel has multiple sample locations, each of which is tested for coverage. The fragment records coverage data for the samples of that fragment.
- the fragments that are generated by the rasterizer stage 314 are transmitted to the pixel shader stage 316 , which shades the fragments (determines color values for those fragments), and may determine other values as well.
- Performing rasterization and pixel shading at the same resolution means that for each fragment generated by the rasterizer, the pixel shader 316 performs a calculation to determine a color for that fragment.
- the area of screen-space occupied by a pixel is the same area as the precision with which colors are determined.
- each fragment generated by the rasterizer stage 314 is shaded by a different work-item.
- the rasterizer stage 314 typically performs depth testing, culling fragments occluded by previously-rendered fragments.
- helper fragments are fragments that are not covered by a triangle but that are generated as part of a quad anyway to assist with calculating derivatives for texture sampling.
- Another way to understand the mode of operation in which rasterization is performed at the same resolution as shading is that the resolution at which the edges of a triangle can be defined is equivalent to the resolution at which colors of that triangle can be defined.
- VRS: variable rate shading
- SSAA: super sample anti-aliasing
- FIG. 4 illustrates a technique for rasterizing, shading, and outputting a rendered image using one of SSAA, VRS, or neither, according to an example.
- the technique begins with step 402 , where the rasterizer stage 314 rasterizes a triangle received from an earlier stage of the graphics processing pipeline 134 to determine covered samples and to generate fragments including indications of those covered samples.
- the rasterization generates one fragment for each pixel in the render target for which there is coverage by a triangle.
- a fragment is a grouping of data that corresponds to a single pixel and has information such as sample coverage, color data for each sample (after the pixel shader stage), depth data for each sample, and possibly other types of data.
- Fragments are used to color the pixels of the frame buffer in the output merger stage 318 .
- a sample is a point within a screen pixel for which information such as coverage information, depth information, and color information can be determined individually.
- the purpose of having multiple samples for each render target pixel is to perform anti-aliasing, which improves the visual appearance of hard edges within images.
- the rasterizer stage 314 determines which samples are covered by received primitives and which samples are not covered.
- the rasterizer stage 314 receives triangles from earlier stages of the graphics processing pipeline 134 and rasterizes those triangles to generate the fragments. Rasterizing a triangle includes determining which pixels of the render target are covered by the triangle, and which samples within those covered pixels are covered by the triangle, if there are multiple samples per pixel. Any technically feasible technique for rasterizing triangles may be used. A fragment is generated for each pixel for which at least one sample is covered.
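Since the text allows any technically feasible rasterization technique, one common choice, the edge-function test, is sketched here as an illustration of per-sample coverage. This is an assumption rather than the method of the disclosure: the function names and sample offsets are hypothetical, samples lying exactly on an edge are counted as covered, and fill rules for shared edges are omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Signed area term: positive on one side of edge a->b, negative on the other."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def sample_coverage(tri: List[Point], pixel: Tuple[int, int],
                    offsets: List[Point]) -> int:
    """Return a bitmask with bit i set when sample i of the pixel is inside tri."""
    v0, v1, v2 = tri
    mask = 0
    for i, (ox, oy) in enumerate(offsets):
        s = (pixel[0] + ox, pixel[1] + oy)
        w0, w1, w2 = edge(v0, v1, s), edge(v1, v2, s), edge(v2, v0, s)
        # Inside when all three terms share a sign (either winding order).
        if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
            mask |= 1 << i
    return mask
```

A fragment would be generated for a pixel only when the returned mask is non-zero, which matches the rule that at least one sample must be covered.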
- the rasterizer stage 314 also performs depth testing at step 402 .
- depth testing involves examining the depth value for each sample covered by the triangle and comparing those depth values to a depth buffer that stores depth values for already-processed triangles. The depth value for a particular sample is compared to the depth value stored at the depth buffer for the same position as the particular sample. If the depth buffer indicates that the sample is occluded, then that sample is marked as not covered and if the depth buffer indicates that the sample is not occluded, then that sample survives. The data indicating which sample locations are covered and not occluded is passed on to other parts of the graphics processing pipeline 134 for later processing as described elsewhere in this description.
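The per-sample depth test can be sketched as a mask update. The sketch assumes that a smaller depth value means closer to the viewer and that equal depth counts as occluded; neither convention is stated in the text, and the function name is hypothetical. Writing surviving depths back to the depth buffer is left out.

```python
from typing import List


def depth_test(covered: List[bool],
               sample_depth: List[float],
               depth_buffer: List[float]) -> List[bool]:
    """Return per-sample coverage after marking occluded samples as not covered."""
    return [
        c and z < z_stored  # assumed convention: smaller depth is closer
        for c, z, z_stored in zip(covered, sample_depth, depth_buffer)
    ]
```

The returned list is the "covered and not occluded" data that later stages consume.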
- the term “covered” when applied to a sample means that the sample is covered by a triangle and passes the depth test, and the term “not covered” or “uncovered” means that a sample is either not covered by a triangle or is covered by a triangle but does not pass the depth test.
- Rasterization outputs fragments in 2×2 groups known as quads. More specifically, for each pixel of the render target that has at least one sample covered by the triangle, the rasterizer stage 314 generates a fragment. The rasterizer 314 creates quads from these fragments. Quads include fragments for an adjacent section of 2×2 pixels, even if one or more such fragments are completely not covered by the triangle (where “completely not covered” means that no samples of the fragment are covered by the triangle and not occluded). The fragments that are completely not covered are called helper fragments. Helper fragments are used by the pixel shader stage 316 to calculate spatial derivatives for shading. Often, these spatial derivatives are used for mipmap selection and filtering for textures, but the spatial derivatives can be used for other purposes.
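Quad assembly with helper fragments can be sketched as follows. The data layout (a dictionary from pixel coordinates to a sample bitmask) and the name `build_quads` are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def build_quads(coverage: Dict[Pixel, int]) -> Dict[Pixel, List[dict]]:
    """Group per-pixel sample masks into 2x2 quads keyed by the quad's top-left pixel."""
    # A quad exists only if at least one of its pixels has a covered sample.
    origins = sorted({(x - x % 2, y - y % 2)
                      for (x, y), mask in coverage.items() if mask})
    quads: Dict[Pixel, List[dict]] = {}
    for ox, oy in origins:
        fragments = []
        for dy in (0, 1):
            for dx in (0, 1):
                mask = coverage.get((ox + dx, oy + dy), 0)
                # A fragment with no covered samples is kept as a helper fragment.
                fragments.append({"pixel": (ox + dx, oy + dy),
                                  "mask": mask,
                                  "helper": mask == 0})
        quads[(ox, oy)] = fragments
    return quads
```

Every emitted quad always holds four fragments, so a shader can take differences between neighboring fragments even when only one pixel of the quad is actually covered.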
- the rasterizer stage 314 determines one or more shading rates for the samples of the triangle.
- the shading rate may be one of a sub-sample shading rate, a one-to-one shading rate, or a super-sample shading rate.
- a sub-sample shading rate means that the resolution of pixel shading is lower than the resolution of the render target (but not the resolution of the samples).
- a one-to-one shading rate means that the resolution of pixel shading is the same as the resolution of the render target.
- a super-sample shading rate means that the resolution of pixel shading is higher than the resolution of the render target.
- the resolution of pixel shading can be different from the resolution of rasterization (coverage determination) even with a super-sample shading rate.
- it is possible for the rasterizer to determine sample coverage for a particular number of samples per pixel and then for pixel shading to occur at a lower rate than that number of samples. For example, it is possible for rasterization to occur for four samples for each fragment, but for pixel shading to occur only twice per fragment.
- the resolution of pixel shading defines the number of fragments that are shaded together in the pixel shader stage 316. More specifically, for sub-sampling, the resolution of pixel shading determines how many pixel locations in the render target are given the color determined by a single work-item in the pixel shader stage 316. For example, if the shading rate is one quarter, then a work-item in the pixel shader stage 316 determines a color for four pixel locations in the render target. For super-sampling, the resolution of pixel shading determines how many samples of a given fragment are given the color determined by a single work-item. For example, if the resolution of pixel shading is “4×,” then four different work-items determine colors for four different samples per fragment generated by the rasterizer stage 314.
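The relationship between shading rate and pixel shader workload in the two examples above reduces to simple arithmetic. The sketch below uses a hypothetical helper, `work_items`, and ignores helper fragments and partially covered quads.

```python
import math
from fractions import Fraction


def work_items(num_pixels: int, shading_rate: Fraction) -> int:
    """Work-items needed to shade num_pixels render-target pixels.

    shading_rate is shading invocations per pixel: 1/4 for quarter-rate
    sub-sampling, 1 for one-to-one shading, 4 for 4x super-sampling.
    """
    return math.ceil(num_pixels * shading_rate)
```

For a 4×4 block of sixteen pixels, a one-quarter rate needs four work-items, a one-to-one rate needs sixteen, and a 4× rate needs sixty-four.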
- the shading rate may be determined on a per-triangle basis, a per-shading rate tile basis, or on a per-shading rate tile basis for individual triangles.
- a unit in the graphics processing pipeline 134 upstream of the pixel shader determines a shading rate for triangles sent to the rasterizer stage 314 .
- a vertex shader stage 304 determines shading rates for the triangles processed by that stage.
- the geometry shader stage 312 determines shading rates for triangles emitted by that stage.
- the rasterizer stage 314 determines shading rates for different shading rate tiles of the render target.
- the render target is divided into shading rate tiles that each comprises multiple pixels of the render target. More specifically, the render target is “tiled” into shading rate tiles, each of which can have a different shading rate. Any technically feasible technique for determining the shading rate for a shading rate tile may be used. In one example, a shading rate tile image is used. A shading rate tile image has information for different shading rate tiles of a render target that indicates the shading rate of those shading rate tiles. The shading rate image may be specified explicitly or algorithmically by the application.
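A shading rate tile image lookup can be sketched as an index computation. The tile size of 8 pixels and the string labels for rates are assumed example values, and `shading_rate_at` is a hypothetical name; the text does not specify an encoding.

```python
from typing import List


def shading_rate_at(rate_image: List[List[str]], tile_size: int,
                    x: int, y: int) -> str:
    """Return the shading rate of the tile containing render-target pixel (x, y)."""
    return rate_image[y // tile_size][x // tile_size]


# Example: a 16x16 render target split into four 8x8 shading rate tiles.
EXAMPLE_RATE_IMAGE = [
    ["1/4", "1x"],
    ["1x", "4x"],
]
```

All pixels inside one tile resolve to the same entry, which is what lets each tile carry its own shading rate.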
- each triangle is associated with a triangle shading rate image that defines the shading rates for the different portions of the triangle.
- it is possible for a shading rate tile to cover the same number of render target pixels as the tile buffer, or more pixels than that buffer. However, the contents of the tile buffer at any particular point in time will share a single shading rate.
- the rasterizer stage 314 accumulates quads generated as the result of rasterization in step 402 into a tile buffer 510 .
- a tile buffer may store any technically feasible number of quads.
- a tile buffer stores four adjacent quads in a 2×2 array.
- the quads in the tile buffer correspond to a contiguous portion of the render target. This allows for down-sampling of the quads into a smaller number of quads when VRS is used.
- the rasterizer stage 314 triggers step 406 . Note, this triggering may occur with at least some portion of the tile buffer 510 empty. More specifically, the tile buffer 510 stores quads from a contiguous portion of screen space, from the same triangle.
- a non-full tile buffer 510 would be used in step 406 (generating modified-rate quads based on the shading rate).
- the rasterizer stage 314 examines the contents of the tile buffer 510 and generates modified-rate quads based on the shading rate. There are three possible ways this can happen. As described above, for any particular instance of the contents of the tile buffer, a shading rate is defined for all those contents. This shading rate can be one of a sub-sampling rate, a 1:1 rate, or a super sampling rate. If the shading rate is a sub-sampling rate, then the rasterizer stage 314 down-samples the quads of the tile buffer 510 to generate modified-rate quads. The resulting down-sampled quads include coarse fragments that are bigger than the pixels of the render target.
- the purpose of down-sampling quads is to reduce the number of pixel shader work-items that are spawned to shade the fragments. Specifically, because the pixel shader launches one work-item per fragment, making the fragments larger results in fewer work-items being spawned, which results in a faster completion of the shading workload.
- with a sub-sampling shading rate, it is possible that the amount of coverage information available in a down-sampled quad is insufficient to represent the full resolution of coverage data of the quads in the tile buffer 510. If that is the case, then down-sampling also includes compressing the coverage data.
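Down-sampling with coverage compression can be sketched for a single-sample tile. The compression rule used here, a logical OR over each coarse fragment's footprint, is an assumption chosen for illustration; the text does not say how coverage is compressed. The name `downsample_coverage` is hypothetical.

```python
from typing import List


def downsample_coverage(fine: List[List[bool]], factor: int = 2) -> List[List[bool]]:
    """Reduce per-pixel coverage to per-coarse-fragment coverage.

    A coarse fragment is marked covered when any pixel in its
    factor x factor footprint is covered (assumed compression rule).
    """
    rows, cols = len(fine) // factor, len(fine[0]) // factor
    return [
        [
            any(fine[cy * factor + dy][cx * factor + dx]
                for dy in range(factor) for dx in range(factor))
            for cx in range(cols)
        ]
        for cy in range(rows)
    ]
```

With `factor=2`, a 4×4 pixel tile (four quads, sixteen fragments) collapses to a single quad of four coarse fragments, so at most four work-items are spawned instead of sixteen. The fine coverage would have to be kept elsewhere so that it can be reapplied after shading.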
- the rasterizer stage 314 simply outputs the quads of the tile buffer 510 unmodified, as the modified-rate quads.
- the rasterizer stage 314 up-samples the quads of the tile buffer 510 to generate modified-rate quads.
- the resulting up-sampled quads include more quads than the quads in the tile buffer 510 .
- the factor by which the number of quads is increased is equal to the super-sampling rate.
- the rasterizer stage 314 assigns centroid positions for the fragments of the quads.
- the manner in which this is done depends on several factors, including the shading rate, the numbers and positions of samples in the tile buffer quads, and possibly other factors.
- the centroid is the position at which pixel attributes such as texture coordinates are evaluated.
- the pixel shader stage 316 shades the fragments of the quads. As described elsewhere herein, one work-item is spawned per fragment. The pixel shader shades fragments using the centroids determined at step 408 . It is also possible for the pixel shader to modify coverage for any particular fragment, by, for example, switching one or more samples of the fragment from covered to not covered or from not covered to covered. In an example, the pixel shader determines that an alpha value corresponding to a particular covered sample is completely transparent (e.g., has an alpha value of 0) and therefore sets that sample to be not covered. It should be understood that the foregoing is just one example and that a pixel shader program, which can be written by an application developer, could potentially modify coverage in any technically feasible manner.
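The alpha-based coverage example can be sketched as a bitmask update. The mask layout (bit i for sample i) and the function name are hypothetical.

```python
from typing import List


def discard_transparent(mask: int, sample_alpha: List[float]) -> int:
    """Clear the coverage bit of every sample whose alpha is exactly 0."""
    for i, alpha in enumerate(sample_alpha):
        if alpha == 0.0:
            mask &= ~(1 << i)  # switch sample i from covered to not covered
    return mask
```

A shader program could equally set bits rather than clear them; this sketch shows only the fully transparent case described in the text.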
- the output merger stage 318 restores the original resolution of those quads, which includes applying fine coverage data from the rasterizer stage 314 . Additional details are provided with respect to FIG. 4D .
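Restoring the render-target resolution after coarse shading can be sketched as a broadcast masked by the fine coverage. This is an illustrative assumption about how the restoration could work for a single-sample tile, not the procedure of FIG. 4D; `restore_resolution` and the use of `None` for pixels that are not written are hypothetical.

```python
from typing import List, Optional, Tuple

Color = Tuple[float, float, float]


def restore_resolution(coarse_colors: List[List[Color]],
                       fine_coverage: List[List[bool]],
                       factor: int = 2) -> List[List[Optional[Color]]]:
    """Give each covered pixel the color of the coarse fragment that contains it."""
    return [
        [coarse_colors[y // factor][x // factor] if covered else None
         for x, covered in enumerate(row)]
        for y, row in enumerate(fine_coverage)
    ]
```

Pixels that the fine coverage marks as not covered receive no color, so triangle edges keep their full-resolution shape even though shading ran at a lower rate.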
- the output merger stage 318 performs late pixel operations and writes the samples of the quads to the frame buffer. If the shaded quads were down-sampled (i.e., if VRS was used), then the output merger stage 318 writes the data from the quads restored at step 412 . If the shaded quads were up-sampled or if a 1:1 shading rate was used, then the data from the quads output by the pixel shader 316 is used to shade the render target.
- FIG. 4B illustrates operations for generating modified shading rate quads based on the contents of a tile buffer 510 for a super sample shading rate, according to an example.
- FIG. 4B represents the operations of step 406 for a super-sampling shading rate.
- the tile buffer 510 is shown in a state after having accumulated quads generated by the rasterizer stage 314 (step 404 ).
- the shading rate determined for the contents of the tile buffer is a super sample shading rate, meaning that pixel shading occurs at a resolution that is higher than the resolution of the render target.
- the shading rate is 4×, but the teachings herein apply to any super-sample shading rate.
- the tile buffer 510 has 3 quads (the space for quad 1 is empty as there were no covered samples for that quad), each of which has four fragments. Each fragment in the tile buffer 510 has four coverage samples.
- To generate the modified shading rate quads 422, for each quad in the tile buffer 510 for which at least one sample is covered, the rasterizer stage 314 generates a number of quads equal to the shading rate. Each fragment in the generated quad has a subset of the samples of the fragments in the tile buffer 510.
- the ratio of the number of samples of the fragments in the tile buffer 510 to the number of samples of the fragments that are generated is equal to the shading rate.
- the fragments in the tile buffer 510 have four times as many samples as the modified shading rate fragments.
- the fragments in any particular generated quad have samples from the same sample locations of the fragments of a corresponding quad in the tile buffer.
- each fragment in a generated quad has a sample at location “sample a” of the pixel template 420 illustrated.
- four quads are generated—one for each sample, such that each generated quad includes fragments with samples at the same sample location and the samples assigned to different quads are different.
- quad 1 is empty and does not result in any modified shading rate quads.
- Quad 2 results in quads 2 a , 2 b , 2 c , and 2 d being generated.
- the fragments of quad 2 a have sample a from the fragments of quad 2 .
- the fragments of quad 2 b have sample b from the fragments of quad 2 .
- the fragments of quad 2 c have sample c from the fragments of quad 2 .
- the fragments of quad 2 d have sample d from the fragments of quad 2 .
- Quads 3 a - 3 d and 4 a - 4 d derive their samples from quads 3 and 4 in a similar manner. Note that it is possible for the number of coverage samples per fragment to be different from the shading rate. In that case, the fragments of the modified shading rate quads get multiple samples from the quads in the tile buffer.
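The splitting of a tile buffer quad into per-sample quads can be sketched as follows. This is a minimal illustration, not the claimed hardware implementation: the function name `split_quad_for_supersampling`, the list-of-booleans coverage layout, and the choice to hand each generated quad a contiguous group of sample locations when there are more samples than the shading rate are all assumptions made for the example.

```python
def split_quad_for_supersampling(quad, shading_rate):
    """Split one tile-buffer quad into `shading_rate` modified shading rate quads.

    `quad` is a list of four fragments; each fragment is a list of per-sample
    coverage flags (True = covered), ordered by sample location.
    """
    samples_per_fragment = len(quad[0])
    if samples_per_fragment % shading_rate != 0:
        raise ValueError("sample count must be a multiple of the shading rate")

    # A quad with no covered samples produces no modified shading rate quads.
    if not any(any(fragment) for fragment in quad):
        return []

    # Assumption: when there are more samples than the shading rate, each
    # generated fragment receives a contiguous group of sample locations.
    samples_per_new_fragment = samples_per_fragment // shading_rate
    new_quads = []
    for k in range(shading_rate):
        first = k * samples_per_new_fragment
        last = first + samples_per_new_fragment
        # Every fragment of generated quad k takes the same sample locations.
        new_quads.append([fragment[first:last] for fragment in quad])
    return new_quads


# Hypothetical coverage for "quad 2": four fragments, four samples (a-d) each.
quad_2 = [
    [True, False, True, True],
    [False, False, False, False],
    [True, True, True, True],
    [False, True, False, False],
]
quads_2a_to_2d = split_quad_for_supersampling(quad_2, 4)
```

With a 4× rate and four samples per fragment, `quads_2a_to_2d[0]` holds only sample a of each fragment of quad 2, `quads_2a_to_2d[1]` holds only sample b, and so on. Calling the function with a rate of 2 instead gives two quads whose fragments carry two samples each.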
- the centroids for the fragments of the quads are assigned in step 408 .
- the centroids are locations where attributes, such as texture coordinates, are evaluated.
- a centroid for a fragment is assigned based on the locations of the samples assigned to that fragment. For example, the fragments of quads 2 a, 3 a, and 4 a get centroids at the location of sample a.
- the fragments of quads 2 b, 3 b, and 4 b get centroids at the location of sample b.
- the fragments of quads 2 c, 3 c, and 4 c get centroids at the location of sample c.
- the fragments of quads 2 d, 3 d, and 4 d get centroids at the location of sample d.
- the centroids are located at a location that is representative of those samples. In an example, the centroid is at the location of one of the covered samples, is midway between the covered samples, or is at any other location representative of the samples.
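One way to pick a representative centroid is to average the positions of the covered samples, sketched below. The description allows any representative location, so averaging is only one option. The sample positions in `PIXEL_TEMPLATE`, the function name `fragment_centroid`, and the fallback to the pixel center when nothing is covered are assumptions for illustration.

```python
# Hypothetical positions of samples a, b, c, d inside a unit pixel.
PIXEL_TEMPLATE = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def fragment_centroid(sample_positions, coverage):
    """Return an (x, y) location representative of the covered samples."""
    covered = [
        position
        for position, is_covered in zip(sample_positions, coverage)
        if is_covered
    ]
    if not covered:
        # Assumption: fall back to the pixel center when nothing is covered.
        return (0.5, 0.5)
    x = sum(position[0] for position in covered) / len(covered)
    y = sum(position[1] for position in covered) / len(covered)
    return (x, y)


# A fragment holding only sample a gets its centroid at sample a.
centroid_single = fragment_centroid(PIXEL_TEMPLATE[:1], [True])
# A fragment holding samples a and b, both covered, gets the midpoint.
centroid_pair = fragment_centroid(PIXEL_TEMPLATE[:2], [True, True])
```

When a fragment holds exactly one sample, this reduces to the per-sample centroid placement described for quads 2 a through 4 d.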
- the modified shading rate quads 422 are shaded in step 410 .
- Each fragment of each modified shading rate quad 422 is shaded using a different work-item, and thus the samples that originated from a single fragment in the tile buffer 510 can be given different colors.
- the pixel shader stage 316 can modify coverage, for example, by marking covered samples as uncovered.
- the output merger stage 318 writes the shaded fragments to the render target. Details on writing shaded samples to a render target are generally known and are not described herein in detail. Generally, this operation includes performing a z-test to determine whether samples are occluded by older samples, and if blending is enabled, blending the color of samples with those in the render target. Other operations may be performed as well.
- FIG. 4C illustrates operations related to down-sampling quads in the tile buffer 510 when a sub-sample shading rate (VRS) is used, according to an example.
- the down-sampling operation includes converting the quads of the tile buffer 510 into a smaller number of one or more modified shading rate quads 440 .
- the number of quads generated is equal to the number of quads in the tile buffer 510 multiplied by the shading rate (although a smaller number may be generated if the tile buffer 510 is not completely filled with quads or if there are generated quads that have no coverage).
- in the illustrated example, the shading rate is 1/4 and the number of quads in the tile buffer 510 is four, so a single quad is generated.
- Each generated quad includes four fragments.
- the coverage assigned to each such fragment is the amalgamation of the coverage assigned to the fragments of the quads in the tile buffer 510 .
- in some cases, such an amalgamation would result in the fragments of the modified shading rate quads 440 having too much coverage data.
- the graphics processing pipeline 134 may have a limitation on the number of bits that can be used to specify coverage data for a fragment. In this situation, when coverage data is amalgamated into coverage data for a fragment of a generated quad, that data is reduced in fidelity (compressed). The coverage data that remains would be geometrically representative of the coverage of the fragments of the quads in the tile buffer 510 .
- each fragment of the quads in the tile buffer 510 has four samples.
- the shading rate is 1/4, meaning that four fragments of the fragments in the tile buffer 510 are shaded together as a single fragment in the pixel shader stage 316 .
- the pixel shading hardware has a limit on the number of samples that can be processed per fragment, and that limit is eight. Due to these factors, the down-sample operation 442 generates the modified shading rate quads 440 in the following manner. The shading rate of 1/4 results in each quad in the tile buffer 510 being converted into a single fragment in the modified shading rate quads 440 .
- Because each quad has four fragments and the shading rate is 1/4, the four fragments of a quad are converted into a single fragment. Because the tile buffer 510 has four quads, the contents of the tile buffer 510 are converted into a single quad. Each coarse fragment of the quad corresponds to four fragments of the tile buffer 510.
- the sixteen samples of each quad in the tile buffer 510 are compressed to eight samples for each coarse fragment.
- Each sample is geometrically representative of two samples in the tile buffer 510 .
- this compression operation is conservative in that, if either or both of the samples that correspond to a compressed sample are covered in the tile buffer 510, then the sample of the coarse fragment is also covered, but if neither sample is covered, then the sample in the coarse fragment is not covered.
- dotted lines are provided in the modified shading rate quads 440 to illustrate the corresponding areas of the fragments in the tile buffer 510 . It can be seen that each sample in those corresponding areas corresponds to two samples in the tile buffer 510 .
- the top-left sample in the portion of the coarse fragment corresponding to the “fine fragment” corresponds to the two top samples of that fine fragment and the bottom-right sample in the portion of the coarse fragment corresponding to the fine fragment corresponds to the two bottom samples of that fine fragment.
- Although a shading rate of 1/4 is illustrated, other shading rates, such as 1/2 horizontal (a row of two fragments in the tile buffer 510 forms a coarse fragment in the modified shading rate quads), 1/2 vertical (a column of two fragments in the tile buffer 510 forms a coarse fragment in the modified shading rate quads), or any other rate can be used.
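The 1/4-rate down-sample with conservative coverage compression can be sketched as below. This is a simplified model under stated assumptions: each fine fragment stores its four samples as `[top, top, bottom, bottom]`, an empty tile buffer slot is represented by `None`, and the function names are hypothetical.

```python
def compress_fine_fragment(coverage):
    """Reduce four fine samples to two, conservatively (logical OR per pair)."""
    top = coverage[0] or coverage[1]
    bottom = coverage[2] or coverage[3]
    return [top, bottom]


def downsample_quad(quad):
    """Turn one tile-buffer quad (4 fragments x 4 samples) into one coarse
    fragment with 8 samples, two per fine fragment."""
    coarse_fragment = []
    for fine_fragment in quad:
        coarse_fragment.extend(compress_fine_fragment(fine_fragment))
    return coarse_fragment


def downsample_tile(tile_quads):
    """Turn four tile-buffer quads into one coarse quad of four coarse fragments.

    Assumption: an empty slot in the tile buffer is passed as None and yields
    a coarse fragment with no coverage.
    """
    return [
        downsample_quad(quad) if quad is not None else [False] * 8
        for quad in tile_quads
    ]


fine_quad = [
    [True, False, False, False],   # only one top sample covered
    [False, False, False, False],  # nothing covered
    [True, True, True, True],      # fully covered
    [False, False, False, True],   # only one bottom sample covered
]
coarse_quad = downsample_tile([None, fine_quad, fine_quad, fine_quad])
```

Here the sixteen samples of `fine_quad` become the eight samples of one coarse fragment, and a compressed sample is covered whenever either of its two source samples is covered.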
- the centroids are assigned to the fragments of the generated quads.
- the centroid for each coarse fragment is set in any technically feasible manner.
- the centroid is representative of the locations of the covered samples of the coarse fragment.
- the location of one of the fragments is chosen.
- the center of the coarse fragment is used as the centroid.
- the centroid is used as the location at which the pixel shader stage 316 calculates attributes such as texture coordinates.
- the pixel shader stage 316 shades the fragments of the generated quads. Specifically, one work-item per coarse fragment is launched and the color (and other attributes) determined for each coarse fragment is applied to each covered sample of that fragment. It is also possible for the pixel shader stage 316 to modify the coverage of the coarse fragments, such as by setting a covered sample to be not covered or setting a non-covered sample to be covered.
- the output merger stage 318 applies fine coverage data from the rasterizer stage 314 to the shaded quads to generate fragments at the resolution of the render target.
- FIG. 4D illustrates an example of this operation.
- the output merger stage 318 up-samples the shaded coarse quads to generate shaded upsampled quads.
- the output merger stage 318 divides each of the coarse fragments into upsampled fragments based on the shading rate. For a shading rate of 1/4, each coarse fragment is converted to four upsampled fragments. The samples of each upsampled fragment get the color of the coarse fragment from which those samples originate.
- the sample resolution is restored if the samples were originally compressed, with each restored sample getting the color of the corresponding sample of the coarse fragment. The coverage (covered or not covered) of each restored sample is the same as the coverage of the corresponding sample of the coarse fragment.
- the up-sample operation proceeds as follows.
- the coarse fragment 1 has no coverage. Therefore, the quad that would be generated from that fragment has no coverage and is discarded.
- Coarse fragment 2 has color 1 and has six covered samples as shown.
- the corresponding up-sampled quad (quad 2 ) has three fragments with four samples covered each and one fragment with no covered samples. Each sample of quad 2 has the color of coarse fragment 2 .
- the coverage and colors of coarse fragment 3 and coarse fragment 4 are used to generate quad 3 and quad 4 .
- the original coverage data generated by the rasterizer stage 314 is used to modulate the coverage data generated in the up-sample operation.
- the modulation is an “AND” operation: if a sample is covered in both the original coverage data and the coverage data from the up-sample operation, then the output sample is considered covered, and if the sample is uncovered in either or both, then the output sample is considered uncovered.
- the result is a set of quads, with modulated coverage and with colors generated by the pixel shader 316 .
- the quads are written to the render target as per usual (e.g., depth testing, blending, and other operations are performed to combine the colors of these output quads with the colors in the render target).
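The up-sample and coverage modulation steps can be sketched as follows, mirroring the coarse fragment 2 example. The sketch assumes the same hypothetical layout as a 1/4 rate with eight compressed samples per coarse fragment (two per fine fragment); the function names and the use of `None` as the color of an uncovered sample are illustrative choices, not part of the described hardware.

```python
def upsample_coarse_fragment(coarse_coverage):
    """Restore one coarse fragment (8 samples) to a quad of 4 fragments x 4
    samples. Each compressed sample is expanded to the two samples it stood for."""
    quad = []
    for i in range(4):
        top = coarse_coverage[2 * i]
        bottom = coarse_coverage[2 * i + 1]
        quad.append([top, top, bottom, bottom])
    return quad


def modulate_coverage(upsampled_quad, original_quad):
    """AND the up-sampled coverage with the rasterizer's fine coverage."""
    return [
        [up and fine for up, fine in zip(up_fragment, fine_fragment)]
        for up_fragment, fine_fragment in zip(upsampled_quad, original_quad)
    ]


def apply_color(quad_coverage, color):
    """Give every covered sample the coarse fragment's color (None otherwise)."""
    return [
        [color if covered else None for covered in fragment]
        for fragment in quad_coverage
    ]


# Coarse fragment with six covered samples, as in the coarse fragment 2 example.
coarse_fragment_2 = [True, True, True, True, True, True, False, False]
upsampled = upsample_coarse_fragment(coarse_fragment_2)

# Hypothetical fine coverage saved from rasterization for the same quad.
original = [
    [True, True, True, True],
    [True, False, True, True],
    [False, False, True, True],
    [True, True, True, True],
]
final_coverage = modulate_coverage(upsampled, original)
shaded_quad = apply_color(final_coverage, "color 1")
```

The up-sampled quad has three fully covered fragments and one empty fragment. After modulation, only samples covered in both the up-sampled data and the original rasterizer data remain covered.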
- in some implementations, the rasterizer stage 314 first generates quads and then accumulates those quads into the tile buffer 510.
- in other implementations, the rasterizer stage 314 generates the quads in the tile buffer 510 directly and does not need to perform the two separate steps of generating the quads and then accumulating those quads into the tile buffer 510.
- processors include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits, any other type of integrated circuit (IC), and/or a state machine.
- Such processors can be manufactured by configuring a manufacturing process using the results of processed hardware description language (HDL) instructions and other intermediary data including netlists (such instructions capable of being stored on a computer readable media). The results of such processing can be maskworks that are then used in a semiconductor manufacturing process to manufacture a processor which implements features of the disclosure.
- non-transitory computer-readable storage mediums include a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs).
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Image Generation (AREA)
Abstract
Description
- Three-dimensional (“3D”) graphics processing pipelines perform a series of steps to convert input geometry into a two-dimensional (“2D”) image for display on a screen. Some of the steps include rasterization and pixel shading. Rasterization involves identifying which pixels (or sub-pixel samples) are covered by triangles provided by stages of the pipeline prior to the rasterizer. The output of rasterization includes quads—a block of 2×2 pixels—and coverage data that indicates which samples are covered by the pixels of the quads. The pixel shader shades the pixels of the quads, and the pixels of the quads are then written to a frame buffer. Because pixel shading is very resource-intensive, techniques are constantly being developed to improve efficiency of pixel shading.
- A more detailed understanding can be had from the following description, given by way of example in conjunction with the accompanying drawings wherein:
- FIG. 1 is a block diagram of an example device in which one or more features of the disclosure can be implemented;
- FIG. 2 illustrates details of the device of FIG. 1, according to an example;
- FIG. 3 is a block diagram showing additional details of the graphics processing pipeline illustrated in FIG. 2; and
- FIGS. 4A-4D illustrate a technique for performing rasterization at a different resolution than pixel shading, according to an example.
- A technique for performing rasterization and pixel shading with decoupled resolution is provided herein. The technique involves performing rasterization as normal to generate quads. The quads are accumulated into a tile buffer. A shading rate is determined for the contents of the tile buffer. If the shading rate is a sub-sampling shading rate, then the quads in the tile buffer are down-sampled, which reduces the amount of work to be performed by a pixel shader. The shaded down-sampled quads are then restored to the resolution of the render target. If the shading rate is a super-sampling shading rate, then the quads in the tile buffer are up-sampled. The results of the shaded down-sampled or up-sampled quads are written to the render target.
-
FIG. 1 is a block diagram of an example device 100 in which one or more features of the disclosure can be implemented. The device 100 could be one of, but is not limited to, for example, a computer, a gaming device, a handheld device, a set-top box, a television, a mobile phone, a tablet computer, or other computing device. The device 100 includes a processor 102, a memory 104, a storage 106, one or more input devices 108, and one or more output devices 110. The device 100 also includes one or more input drivers 112 and one or more output drivers 114. Any of the input drivers 112 are embodied as hardware, a combination of hardware and software, or software, and serve the purpose of controlling input devices 108 (e.g., controlling operation, receiving inputs from, and providing data to input devices 108). Similarly, any of the output drivers 114 are embodied as hardware, a combination of hardware and software, or software, and serve the purpose of controlling output devices 110 (e.g., controlling operation, receiving inputs from, and providing data to output devices 110). It is understood that the device 100 can include additional components not shown in FIG. 1.
- In various alternatives, the processor 102 includes a central processing unit (CPU), a graphics processing unit (GPU), a CPU and GPU located on the same die, or one or more processor cores, wherein each processor core can be a CPU or a GPU. In various alternatives, the memory 104 is located on the same die as the processor 102, or is located separately from the processor 102. The memory 104 includes a volatile or non-volatile memory, for example, random access memory (RAM), dynamic RAM, or a cache.
- The storage 106 includes a fixed or removable storage, for example, without limitation, a hard disk drive, a solid state drive, an optical disk, or a flash drive. The input devices 108 include, without limitation, a keyboard, a keypad, a touch screen, a touch pad, a detector, a microphone, an accelerometer, a gyroscope, a biometric scanner, or a network connection (e.g., a wireless local area network card for transmission and/or reception of wireless IEEE 802 signals). The output devices 110 include, without limitation, a display, a speaker, a printer, a haptic feedback device, one or more lights, an antenna, or a network connection (e.g., a wireless local area network card for transmission and/or reception of wireless IEEE 802 signals).
- The input driver 112 and output driver 114 include one or more hardware, software, and/or firmware components that are configured to interface with and drive input devices 108 and output devices 110, respectively. The input driver 112 communicates with the processor 102 and the input devices 108, and permits the processor 102 to receive input from the input devices 108. The output driver 114 communicates with the processor 102 and the output devices 110, and permits the processor 102 to send output to the output devices 110. The output driver 114 includes an accelerated processing device (“APD”) 116 which is coupled to a display device 118, which, in some examples, is a physical display device or a simulated device that uses a remote display protocol to show output. The APD 116 is configured to accept compute commands and graphics rendering commands from processor 102, to process those compute and graphics rendering commands, and to provide pixel output to display device 118 for display. As described in further detail below, the APD 116 includes one or more parallel processing units configured to perform computations in accordance with a single-instruction-multiple-data (“SIMD”) paradigm. Thus, although various functionality is described herein as being performed by or in conjunction with the APD 116, in various alternatives, the functionality described as being performed by the APD 116 is additionally or alternatively performed by other computing devices having similar capabilities that are not driven by a host processor (e.g., processor 102) and are configured to provide graphical output to a display device 118. For example, it is contemplated that any processing system that performs processing tasks in accordance with a SIMD paradigm may be configured to perform the functionality described herein. Alternatively, it is contemplated that computing systems that do not perform processing tasks in accordance with a SIMD paradigm perform the functionality described herein.
-
FIG. 2 illustrates details of the device 100 and the APD 116, according to an example. The processor 102 (FIG. 1) executes an operating system 120, a driver 122, and applications 126, and may also execute other software alternatively or additionally. The operating system 120 controls various aspects of the device 100, such as managing hardware resources, processing service requests, scheduling and controlling process execution, and performing other operations. The APD driver 122 controls operation of the APD 116, sending tasks such as graphics rendering tasks or other work to the APD 116 for processing. The APD driver 122 also includes a just-in-time compiler that compiles programs for execution by processing components (such as the SIMD units 138 discussed in further detail below) of the APD 116.
- The APD 116 executes commands and programs for selected functions, such as graphics operations and non-graphics operations that may be suited for parallel processing. The APD 116 can be used for executing graphics pipeline operations such as pixel operations, geometric computations, and rendering an image to display device 118 based on commands received from the processor 102. The APD 116 also executes compute processing operations that are not directly related to graphics operations, such as operations related to video, physics simulations, computational fluid dynamics, or other tasks, based on commands received from the processor 102.
- The APD 116 includes compute units 132 that include one or more SIMD units 138 that are configured to perform operations at the request of the processor 102 (or another unit) in a parallel manner according to a SIMD paradigm. The SIMD paradigm is one in which multiple processing elements share a single program control flow unit and program counter and thus execute the same program but are able to execute that program with different data. In one example, each SIMD unit 138 includes sixteen lanes, where each lane executes the same instruction at the same time as the other lanes in the SIMD unit 138 but can execute that instruction with different data. Lanes can be switched off with predication if not all lanes need to execute a given instruction. Predication can also be used to execute programs with divergent control flow. More specifically, for programs with conditional branches or other instructions where control flow is based on calculations performed by an individual lane, predication of lanes corresponding to control flow paths not currently being executed, and serial execution of different control flow paths allows for arbitrary control flow.
- The basic unit of execution in compute units 132 is a work-item. Each work-item represents a single instantiation of a program that is to be executed in parallel in a particular lane. Work-items can be executed simultaneously (or partially simultaneously and partially sequentially) as a “wavefront” on a single SIMD processing unit 138. One or more wavefronts are included in a “work group,” which includes a collection of work-items designated to execute the same program. A work group can be executed by executing each of the wavefronts that make up the work group. In alternatives, the wavefronts are executed on a single SIMD unit 138 or on different SIMD units 138. Wavefronts can be thought of as the largest collection of work-items that can be executed simultaneously (or pseudo-simultaneously) on a single SIMD unit 138. “Pseudo-simultaneous” execution occurs in the case of a wavefront that is larger than the number of lanes in a SIMD unit 138. In such a situation, wavefronts are executed over multiple cycles, with different collections of the work-items being executed in different cycles. An APD scheduler 136 is configured to perform operations related to scheduling various workgroups and wavefronts on compute units 132 and SIMD units 138.
- The parallelism afforded by the compute units 132 is suitable for graphics related operations such as pixel value calculations, vertex transformations, and other graphics operations. Thus in some instances, a graphics pipeline 134, which accepts graphics processing commands from the processor 102, provides computation tasks to the compute units 132 for execution in parallel.
- The compute units 132 are also used to perform computation tasks not related to graphics or not performed as part of the “normal” operation of a graphics pipeline 134 (e.g., custom operations performed to supplement processing performed for operation of the graphics pipeline 134). An application 126 or other software executing on the processor 102 transmits programs that define such computation tasks to the APD 116 for execution.
-
FIG. 3 is a block diagram showing additional details of the graphics processing pipeline 134 illustrated in FIG. 2. The graphics processing pipeline 134 includes stages that each performs specific functionality of the graphics processing pipeline 134. Each stage is implemented partially or fully as shader programs executing in the programmable compute units 132, or partially or fully as fixed-function, non-programmable hardware external to the compute units 132.
- The input assembler stage 302 reads primitive data from user-filled buffers (e.g., buffers filled at the request of software executed by the processor 102, such as an application 126) and assembles the data into primitives for use by the remainder of the pipeline. The input assembler stage 302 can generate different types of primitives based on the primitive data included in the user-filled buffers. The input assembler stage 302 formats the assembled primitives for use by the rest of the pipeline.
- The vertex shader stage 304 processes vertices of the primitives assembled by the input assembler stage 302. The vertex shader stage 304 performs various per-vertex operations such as transformations, skinning, morphing, and per-vertex lighting. Transformation operations include various operations to transform the coordinates of the vertices. These operations include one or more of modeling transformations, viewing transformations, projection transformations, perspective division, and viewport transformations, which modify vertex coordinates, and other operations that modify non-coordinate attributes.
- The vertex shader stage 304 is implemented partially or fully as vertex shader programs to be executed on one or more compute units 132. The vertex shader programs are provided by the processor 102 and are based on programs that are pre-written by a computer programmer. The driver 122 compiles such computer programs to generate the vertex shader programs having a format suitable for execution within the compute units 132.
- The hull shader stage 306, tessellator stage 308, and domain shader stage 310 work together to implement tessellation, which converts simple primitives into more complex primitives by subdividing the primitives. The hull shader stage 306 generates a patch for the tessellation based on an input primitive. The tessellator stage 308 generates a set of samples for the patch. The domain shader stage 310 calculates vertex positions for the vertices corresponding to the samples for the patch. The hull shader stage 306 and domain shader stage 310 can be implemented as shader programs to be executed on the compute units 132, that are compiled by the driver 122 as with the vertex shader stage 304.
- The geometry shader stage 312 performs vertex operations on a primitive-by-primitive basis. A variety of different types of operations can be performed by the geometry shader stage 312, including operations such as point sprite expansion, dynamic particle system operations, fur-fin generation, shadow volume generation, single pass render-to-cubemap, per-primitive material swapping, and per-primitive material setup. In some instances, a geometry shader program that is compiled by the driver 122 and that executes on the compute units 132 performs operations for the geometry shader stage 312.
- The rasterizer stage 314 accepts and rasterizes simple primitives (triangles) generated upstream from the rasterizer stage 314. Rasterization consists of determining which screen pixels (or sub-pixel samples) are covered by a particular primitive. Rasterization is performed by fixed function hardware.
- The pixel shader stage 316 calculates output values for screen pixels based on the primitives generated upstream and the results of rasterization. The pixel shader stage 316 may apply textures from texture memory. Operations for the pixel shader stage 316 are performed by a pixel shader program that is compiled by the driver 122 and that executes on the compute units 132.
- The output merger stage 318 accepts output from the pixel shader stage 316 and merges those outputs into a frame buffer, performing operations such as z-testing and alpha blending to determine the final color for the screen pixels.
- In one mode of operation, the rasterization performed by the rasterizer stage 314 is done at the same resolution as pixel shading performed by the pixel shader stage 316. By way of background, the rasterizer stage 314 accepts triangles from earlier stages and performs scan conversion on the triangles to generate fragments. The fragments are data for individual pixels of a render target and include information such as location, depth, and coverage data, and later, after the pixel shader stage, shading data such as colors. The render target is the destination image to which rendering is occurring (i.e., colors or other values are being written).
- Typically, the fragments are grouped into quads, each quad including fragments corresponding to four neighboring pixel locations (that is, 2×2 fragments). Scan conversion of a triangle involves generating a fragment for each pixel location covered by the triangle. If the render target is a multi-sample image, then each pixel has multiple sample locations, each of which is tested for coverage. The fragment records coverage data for the samples of that fragment. The fragments that are generated by the rasterizer stage 314 are transmitted to the pixel shader stage 316, which shades the fragments (determines color values for those fragments), and may determine other values as well.
- Performing rasterization and pixel shading at the same resolution means that for each fragment generated by the rasterizer, the pixel shader 316 performs a calculation to determine a color for that fragment. In other words, the area of screen space occupied by a pixel is the same as the granularity at which colors are determined. In one example, in the SIMD-based hardware of the compute units 132, each fragment generated by the rasterizer stage 314 is shaded by a different work-item. Thus, there is a one-to-one correspondence between generated fragments and work-items spawned to shade those fragments. Note that the rasterizer stage 314 typically performs depth testing, culling fragments occluded by previously-rendered fragments. Thus, there is a one-to-one correspondence between fragments that survive this depth culling and work-items spawned to color those surviving fragments, although additional work-items may be spawned to render helper fragments for quads, which are ultimately discarded. Helper fragments are fragments that are not covered by a triangle but that are generated as part of a quad anyway to assist with calculating derivatives for texture sampling. Another way to understand the mode of operation in which rasterization is performed at the same resolution as shading is that the resolution at which the edges of a triangle can be defined is equivalent to the resolution at which colors of that triangle can be defined.
- One issue with the above mode of operation, in which rasterization occurs at the same resolution as pixel shading, occurs for triangles that have a fixed color or a low-frequency change in color. For such triangles, pixel shading operations on nearby fragments produce the same or similar color and are effectively redundant. A similar result could therefore be achieved with a much smaller number of pixel shader operations. Thus, it is advantageous to reduce the shading resolution, with respect to the rasterization resolution, according to a technique referred to herein as variable rate shading (“VRS”). The advantage of such a technique is a reduction in the number of pixel shader operations being performed, which reduces processing load and improves performance. VRS is described in detail below.
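The reduction in pixel shader work can be illustrated with a small calculation. This is an idealized sketch: the function name `pixel_shader_work_items` is hypothetical, and the count ignores helper fragments, partially filled tiles, and depth culling.

```python
import math
from fractions import Fraction


def pixel_shader_work_items(fine_fragments, shading_rate):
    """Idealized number of work-items needed to shade `fine_fragments`
    render-target fragments at the given shading rate (1 = one work-item
    per fragment, 1/4 = one work-item per four fragments)."""
    return math.ceil(fine_fragments * Fraction(shading_rate))


# A fully covered 16x16 pixel region holds 256 fragments.
full_rate = pixel_shader_work_items(256, 1)
half_rate = pixel_shader_work_items(256, Fraction(1, 2))
quarter_rate = pixel_shader_work_items(256, Fraction(1, 4))
```

Under these assumptions, a 1/4 shading rate shades the same region with a quarter of the work-items used at the 1:1 rate.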

One issue with VRS is its integration with super-sample anti-aliasing (“SSAA”). SSAA is a technique whereby each render target pixel has multiple coverage and color samples. More specifically, in this technique, the graphics processing pipeline 134 both rasterizes and shades at a resolution that is higher than the resolution of the render target to generate a super-sampled image. Then, the graphics processing pipeline 134 “resolves” that super-sampled image through an anti-aliasing technique to generate an image at the resolution of the render target.

An issue arises in a system that is capable of performing both VRS and SSAA. Specifically, in one implementation, it is possible to switch both VRS and SSAA on, such that VRS is operating to reduce the resolution of shading with respect to the resolution of the render target and SSAA is operating to increase the resolution of shading with respect to the resolution of the render target. This mode of operation could produce undefined or unexpected results and thus may not be desirable. Thus, techniques are presented herein for integrating SSAA and VRS cohesively into a graphics processing pipeline 134.

FIG. 4 illustrates a technique for rasterizing, shading, and outputting a rendered image using one of SSAA, VRS, or neither, according to an example. The technique begins with step 402, where the rasterizer stage 314 rasterizes a triangle received from an earlier stage of the graphics processing pipeline 134 to determine covered samples and to generate fragments including indications of those covered samples. The rasterization generates one fragment for each pixel in the render target for which there is coverage by a triangle. A fragment is a grouping of data that corresponds to a single pixel and has information such as sample coverage, color data for each sample (after the pixel shader stage), depth data for each sample, and possibly other types of data. Fragments are used to color the pixels of the frame buffer in the output merger stage 318. A sample is a point within a screen pixel for which information such as coverage information, depth information, and color information can be determined individually. In some modes of operation, there are multiple samples for each render target pixel. In general, the purpose of having multiple samples for each render target pixel is to perform anti-aliasing, which improves the visual appearance of hard edges within images. In other modes of operation, there is only one sample per render target pixel.

In step 402, the rasterizer stage 314 determines which samples are covered by received primitives and which samples are not covered. In general, the rasterizer stage 314 receives triangles from earlier stages of the graphics processing pipeline 134 and rasterizes those triangles to generate the fragments. Rasterizing a triangle includes determining which pixels of the render target are covered by the triangle, and which samples within those covered pixels are covered by the triangle, if there are multiple samples per pixel. Any technically feasible technique for rasterizing triangles may be used. A fragment is generated for each pixel for which at least one sample is covered.

The rasterizer stage 314 also performs depth testing at step 402. In one example, depth testing involves examining the depth value for each sample covered by the triangle and comparing those depth values to a depth buffer that stores depth values for already-processed triangles. The depth value for a particular sample is compared to the depth value stored at the depth buffer for the same position as the particular sample. If the depth buffer indicates that the sample is occluded, then that sample is marked as not covered; if the depth buffer indicates that the sample is not occluded, then that sample survives. The data indicating which sample locations are covered and not occluded is passed on to other parts of the graphics processing pipeline 134 for later processing as described elsewhere in this description. Herein, the term “covered” when applied to a sample means that the sample is covered by a triangle and passes the depth test, and the term “not covered” or “uncovered” means that a sample is either not covered by a triangle or is covered by a triangle but does not pass the depth test.

Rasterization outputs fragments in 2×2 groups known as quads. More specifically, for each pixel of the render target that has at least one sample covered by the triangle, the rasterizer stage 314 generates a fragment. The rasterizer stage 314 creates quads from these fragments. Quads include fragments for an adjacent section of 2×2 pixels, even if one or more such fragments are completely not covered by the triangle (where “completely not covered” means that no samples of the fragment are covered by the triangle and not occluded). The fragments that are completely not covered are called helper fragments. Helper fragments are used by the pixel shader stage 316 to calculate spatial derivatives for shading. Often, these spatial derivatives are used for mipmap selection and filtering for textures, but the spatial derivatives can be used for other purposes.

Also at
step 402, the rasterizer stage 314 determines one or more shading rates for the samples of the triangle. The shading rate may be one of a sub-sample shading rate, a one-to-one shading rate, or a super-sample shading rate. A sub-sample shading rate means that the resolution of pixel shading is lower than the resolution of the render target (but not the resolution of the samples). A one-to-one shading rate means that the resolution of pixel shading is the same as the resolution of the render target. A super-sample shading rate means that the resolution of pixel shading is higher than the resolution of the render target. Note that it is possible for the resolution of pixel shading to be different from the resolution of rasterization (coverage determination) even with a super-sample shading rate. Specifically, it is possible for the rasterizer to determine sample coverage for a particular number of samples per pixel and then for pixel shading to occur at a lower rate than that number of samples. For example, it is possible for rasterization to occur for four samples for each fragment, but for pixel shading to occur only twice per fragment.

The resolution of pixel shading, also called the shading rate, defines the number of fragments that are shaded together in the pixel shader stage 316. More specifically, for sub-sampling, the resolution of pixel shading determines how many pixel locations in the render target are given the color determined by a single work-item in the pixel shader stage 316. For example, if the shading rate is one quarter, then a work-item in the pixel shader stage 316 determines a color for four pixel locations in the render target. For super-sampling, the resolution of pixel shading determines how many samples of a given fragment are given the color determined by a single work-item. For example, if the resolution of pixel shading is “4×,” then four different work-items determine colors for four different samples per fragment generated by the rasterizer stage 314.

The shading rate may be determined on a per-triangle basis, a per-shading rate tile basis, or on a per-shading rate tile basis for individual triangles. For shading on a per-triangle basis, a unit in the graphics processing pipeline 134 upstream of the pixel shader determines a shading rate for triangles sent to the rasterizer stage 314. In an example, a vertex shader stage 304 determines shading rates for the triangles processed by that stage. In another example, the geometry shader stage 312 determines shading rates for triangles emitted by that stage. For shading on a per-shading rate tile basis, the rasterizer stage 314 determines shading rates for different shading rate tiles of the render target. The render target is divided into shading rate tiles that each comprise multiple pixels of the render target. More specifically, the render target is “tiled” into shading rate tiles, each of which can have a different shading rate. Any technically feasible technique for determining the shading rate for a shading rate tile may be used. In one example, a shading rate tile image is used. A shading rate tile image has information for different shading rate tiles of a render target that indicates the shading rate of those shading rate tiles. The shading rate tile image may be specified explicitly or algorithmically by the application.

For shading on a per-shading rate tile basis for individual triangles, the combination of per-shading rate tile and per-triangle information is used to determine a shading rate for any given quad. Specifically, each triangle is associated with a triangle shading rate image that defines the shading rates for the different portions of the triangle.

A shading rate tile may cover the same number of render target pixels as the tile buffer or a larger number. In either case, all of the contents of the tile buffer at any particular point in time have the same shading rate.
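
The per-shading rate tile lookup described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the 8×8 tile size, the encoding of shading rates as fractions (below 1 for sub-sampling, 1 for one-to-one, above 1 for super-sampling), and the names `TILE_SIZE`, `rate_image`, and `shading_rate_for_quad` are hypothetical and are not taken from the disclosure.

```python
from fractions import Fraction

# Hypothetical tile size: each shading rate tile covers 8x8 render target pixels.
TILE_SIZE = 8

def shading_rate_for_quad(rate_image, quad_x, quad_y, tile_size=TILE_SIZE):
    """Look up the shading rate for the quad whose top-left pixel is (quad_x, quad_y).

    rate_image[row][col] holds the rate of one shading rate tile. In this sketch,
    a rate below 1 is sub-sampling, 1 is one-to-one, above 1 is super-sampling.
    """
    return rate_image[quad_y // tile_size][quad_x // tile_size]

# A 2x2 shading rate tile image covering a 16x16 render target (illustrative values).
rate_image = [
    [Fraction(1, 4), Fraction(1)],
    [Fraction(1), Fraction(4)],
]

top_left_rate = shading_rate_for_quad(rate_image, 2, 2)        # tile (0, 0)
bottom_right_rate = shading_rate_for_quad(rate_image, 10, 12)  # tile (1, 1)
```

Because every quad in a 2×2-quad tile buffer falls inside one 8×8 tile in this sketch, a single lookup suffices for the whole tile buffer, which matches the statement that the tile buffer contents share one shading rate.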

At step 404, the rasterizer stage 314 accumulates quads generated as the result of rasterization in step 402 into a tile buffer 510. A tile buffer may store any technically feasible number of quads. In one example, a tile buffer stores four adjacent quads in a 2×2 array. The quads in the tile buffer correspond to a contiguous portion of the render target. This allows for down-sampling of the quads into a smaller number of quads when VRS is used. After accumulating quads into the tile buffer, the rasterizer stage 314 triggers step 406. Note that this triggering may occur with at least some portion of the tile buffer 510 empty. More specifically, the tile buffer 510 stores quads from a contiguous portion of screen space, from the same triangle. It is possible for there to be no coverage for a particular triangle in at least some of that contiguous portion, even if there is coverage in a different part of that contiguous portion. In such situations, a non-full tile buffer 510 would be used in step 406 (generating modified-rate quads based on the shading rate).

At step 406, the rasterizer stage 314 examines the contents of the tile buffer 510 and generates modified-rate quads based on the shading rate. There are three possible ways this can happen. As described above, for any particular instance of the contents of the tile buffer, a shading rate is defined for all those contents. This shading rate can be one of a sub-sampling rate, a 1:1 rate, or a super-sampling rate. If the shading rate is a sub-sampling rate, then the rasterizer stage 314 down-samples the quads of the tile buffer 510 to generate modified-rate quads. The resulting down-sampled quads include coarse fragments that are bigger than the pixels of the render target. The purpose of down-sampling quads is to reduce the number of pixel shader work-items that are spawned to shade the fragments. Specifically, because the pixel shader launches one work-item per fragment, making the fragments larger results in fewer work-items being spawned, which results in a faster completion of the shading workload.

With a sub-sampling shading rate, it is possible that the amount of coverage information available in a down-sampled quad is insufficient to represent the full resolution of coverage data of the quads in the tile buffer 510. If that is the case, then down-sampling also includes compressing the coverage data.

If the shading rate is a 1:1 rate, then the rasterizer stage 314 simply outputs the quads of the tile buffer 510 unmodified, as the modified-rate quads.

If the shading rate is a super-sampling rate, then the rasterizer stage 314 up-samples the quads of the tile buffer 510 to generate modified-rate quads. The up-sampling produces more quads than are in the tile buffer 510. The factor by which the number of quads is increased is equal to the super-sampling rate.

At
step 408, the rasterizer stage 314 assigns centroid positions for the fragments of the quads. The manner in which this is done depends on several factors, including the shading rate, the numbers and positions of samples in the tile buffer quads, and possibly other factors. The centroid is the position at which pixel attributes such as texture coordinates are evaluated.

At step 410, the pixel shader stage 316 shades the fragments of the quads. As described elsewhere herein, one work-item is spawned per fragment. The pixel shader shades fragments using the centroids determined at step 408. It is also possible for the pixel shader to modify coverage for any particular fragment, by, for example, switching one or more samples of the fragment from covered to not covered or from not covered to covered. In an example, the pixel shader determines that an alpha value corresponding to a particular covered sample is completely transparent (e.g., has an alpha value of 0) and therefore sets that sample to be not covered. It should be understood that the foregoing is just one example and that a pixel shader program, which can be written by an application developer, could potentially modify coverage in any technically feasible manner.

At step 412, if the quads were down-sampled, then the output merger stage 318 restores the original resolution of those quads, which includes applying fine coverage data from the rasterizer stage 314. Additional details are provided with respect to FIG. 4D.

At step 414, the output merger stage 318 performs late pixel operations and writes the samples of the quads to the frame buffer. If the shaded quads were down-sampled (i.e., if VRS was used), then the output merger stage 318 writes the data from the quads restored at step 412. If the shaded quads were up-sampled or if a 1:1 shading rate was used, then the data from the quads output by the pixel shader 316 is used to shade the render target.
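
The control flow of steps 402 through 414 can be summarized with a short Python sketch. It is a simplified model, not the disclosed hardware: the function names, the representation of shading rates as fractions, and the rounding used for sub-sampling are assumptions made for illustration. The sub-sampling count follows the relationship stated for FIG. 4C (quads in the tile buffer multiplied by the shading rate), and the super-sampling count follows the statement that the number of quads grows by a factor equal to the super-sampling rate.

```python
import math
from fractions import Fraction

def classify_rate(rate):
    """Classify a shading rate relative to the render target resolution."""
    if rate < 1:
        return "sub-sample"
    if rate == 1:
        return "one-to-one"
    return "super-sample"

def steps_for_rate(rate):
    """List the steps that run; step 412 applies only to down-sampled quads."""
    steps = [402, 404, 406, 408, 410]
    if classify_rate(rate) == "sub-sample":
        steps.append(412)
    steps.append(414)
    return steps

def max_modified_quads(capacity, occupied, rate):
    """Upper bound on the modified-rate quads generated in step 406.

    capacity: quads the tile buffer can hold (four in the 2x2 example).
    occupied: quads in the tile buffer that have at least one covered sample.
    """
    kind = classify_rate(rate)
    if kind == "sub-sample":
        # Fewer quads may result if the buffer is not full or a quad has no coverage.
        return math.ceil(capacity * rate)
    if kind == "one-to-one":
        return occupied
    return int(occupied * rate)

quarter_rate_quads = max_modified_quads(4, 4, Fraction(1, 4))  # 4 * 1/4
four_x_quads = max_modified_quads(4, 3, 4)                     # 3 non-empty quads * 4
```

With three non-empty quads and a 4× rate, the sketch yields twelve modified-rate quads, consistent with the FIG. 4B example in which quads 2, 3, and 4 each produce four quads.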

FIG. 4B illustrates operations for generating modified shading rate quads based on the contents of a tile buffer 510 for a super-sample shading rate, according to an example. In other words, FIG. 4B represents the operations of step 406 for a super-sampling shading rate. The tile buffer 510 is shown in a state after having accumulated quads generated by the rasterizer stage 314 (step 404). The shading rate determined for the contents of the tile buffer is a super-sample shading rate, meaning that pixel shading occurs at a resolution that is higher than the resolution of the render target. In the example of FIG. 4B, the shading rate is 4×, but the teachings herein apply to any super-sample shading rate.

As shown, the tile buffer 510 has three quads (the space for quad 1 is empty as there were no covered samples for that quad), each of which has four fragments. Each fragment in the tile buffer 510 has four coverage samples. To generate the modified shading rate quads 422, for each quad in the tile buffer 510 for which at least one sample is covered, the rasterizer stage 314 generates a number of quads equal to the shading rate. Each fragment in the generated quad has a subset of the samples of the fragments in the tile buffer 510.

The ratio of the number of samples of the fragments in the tile buffer 510 to the number of samples of the fragments that are generated is equal to the shading rate. For a 4× shading rate, the fragments in the tile buffer 510 have four times as many samples as the modified shading rate fragments. The fragments in any particular generated quad have samples from the same sample locations of the fragments of a corresponding quad in the tile buffer. In an example, each fragment in a generated quad has a sample at location “sample a” of the pixel template 420 illustrated. In this example, for each quad with at least one covered sample, four quads are generated, one for each sample, such that each generated quad includes fragments with samples at the same sample location and the samples assigned to different quads are different. In the example shown, quad 1 is empty and does not result in any modified shading rate quads. Quad 2 results in quads 2a, 2b, 2c, and 2d being generated. The fragments of quad 2a have sample a from the fragments of quad 2. The fragments of quad 2b have sample b from the fragments of quad 2. The fragments of quad 2c have sample c from the fragments of quad 2. The fragments of quad 2d have sample d from the fragments of quad 2. Quads 3a-3d and 4a-4d derive their samples from quads 3 and 4 in a similar manner. Note that it is possible for the number of coverage samples per fragment to be different from the shading rate. In that case, the fragments of the modified shading rate quads get multiple samples from the quads in the tile buffer.

As described with respect to FIG. 4A, subsequent to generating the modified shading rate quads 422, the centroids for the fragments of the quads are assigned in step 408. The centroids are locations where attributes, such as texture coordinates, are evaluated. A centroid for a fragment is assigned based on the locations of the samples assigned to that fragment. For example, the fragments of quads 2a, 3a, and 4a get centroids at the location of sample a. Similarly, the fragments of quads 2b, 3b, and 4b get centroids at the location of sample b, the fragments of quads 2c, 3c, and 4c get centroids at the location of sample c, and the fragments of quads 2d, 3d, and 4d get centroids at the location of sample d. If the modified shading rate quads 422 have multiple samples, then the centroids are located at a location that is representative of those samples. In an example, the centroid is at the location of one of the covered samples, is midway between the covered samples, or is at any other location representative of the samples.

As also described with respect to FIG. 4A, the modified shading rate quads 422 are shaded in step 410. Each fragment of each modified shading rate quad 422 is shaded using a different work-item, and thus the samples that originated from a single fragment in the tile buffer 510 can be given different colors. It is also possible for the pixel shader stage 316 to modify coverage, for example, by marking covered samples as uncovered. At step 414, the output merger stage 318 writes the shaded fragments to the render target. Details on writing shaded samples to a render target are generally known and are not described herein in detail. Generally, this operation includes performing a z-test to determine whether samples are occluded by older samples, and if blending is enabled, blending the color of samples with those in the render target. Other operations may be performed as well.
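
The 4× up-sampling of FIG. 4B can be illustrated with the following Python sketch. The data layout is an assumption made for illustration: a quad is modeled as a list of four fragments, each fragment is a dictionary mapping a sample name (`a` through `d`, after the pixel template) to a coverage flag, and the names `upsample_quad` and `upsample_tile_buffer` are hypothetical.

```python
SAMPLE_NAMES = ("a", "b", "c", "d")  # sample locations of the pixel template

def upsample_quad(quad):
    """Split one tile buffer quad into one quad per sample location (4x rate).

    quad: list of four fragments; each fragment maps a sample name to
    True (covered) or False (not covered).
    Returns a dict from sample name to the generated quad, or an empty dict
    if the quad has no covered samples.
    """
    if not any(covered for frag in quad for covered in frag.values()):
        return {}
    return {
        name: [{name: frag[name]} for frag in quad]
        for name in SAMPLE_NAMES
    }

def upsample_tile_buffer(tile_buffer):
    """Generate modified shading rate quads keyed like '2a', '2b', ..."""
    out = {}
    for quad_id, quad in tile_buffer.items():
        for name, new_quad in upsample_quad(quad).items():
            out[f"{quad_id}{name}"] = new_quad
    return out

def frag(a, b, c, d):
    """Helper that builds one fragment with four coverage samples."""
    return {"a": a, "b": b, "c": c, "d": d}

empty = frag(False, False, False, False)
full = frag(True, True, True, True)
tile_buffer = {
    1: [empty, empty, empty, empty],                      # no coverage
    2: [frag(True, False, False, False), full, full, full],
    3: [full, full, empty, full],
    4: [full, full, full, full],
}
modified_quads = upsample_tile_buffer(tile_buffer)
```

Each generated fragment carries a single sample, so one work-item per fragment shades exactly one sample location, and the centroid of such a fragment can simply be that sample's location.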

FIG. 4C illustrates operations related to down-sampling quads in the tile buffer 510 when a sub-sample shading rate (VRS) is used, according to an example. The down-sampling operation includes converting the quads of the tile buffer 510 into a smaller number of one or more modified shading rate quads 440. The number of quads generated is equal to the number of quads in the tile buffer 510 multiplied by the shading rate (although a smaller number may be generated if the tile buffer 510 is not completely filled with quads or if there are generated quads that have no coverage). In an example, the shading rate is 1/4, the number of quads in the tile buffer 510 is four, and the number of quads that are generated from these quads is one (4*1/4=1).

Each generated quad includes four fragments. The coverage assigned to each such fragment is the amalgamation of the coverage assigned to the fragments of the quads in the tile buffer 510. In some situations, such an amalgamation would result in the fragments of the modified shading rate quads 440 having too much coverage data. More specifically, the graphics processing pipeline 134 may have a limitation on the number of bits that can be used to specify coverage data for a fragment. In this situation, when coverage data is amalgamated into coverage data for a fragment of a generated quad, that data is reduced in fidelity (compressed). The coverage data that remains would be geometrically representative of the coverage of the fragments of the quads in the tile buffer 510.

In the example of FIG. 4C, each fragment of the quads in the tile buffer 510 has four samples. Moreover, the shading rate is 1/4, meaning that four fragments of the fragments in the tile buffer 510 are shaded together as a single fragment in the pixel shader stage 316. In addition, the pixel shading hardware has a limit on the number of samples that can be processed per fragment, and that limit is eight. Due to these factors, the down-sample operation 442 generates the modified shading rate quads 440 in the following manner. The shading rate of 1/4 results in each quad in the tile buffer 510 being converted into a single fragment in the modified shading rate quads 440. Specifically, because each quad has four fragments, and the shading rate is 1/4, the four fragments of a quad are converted into a single fragment. Because the tile buffer 510 has four quads, the contents of the tile buffer 510 are converted into a single quad. Each coarse fragment of the quad corresponds to four fragments of the tile buffer 510.

Further, because the pixel shader 316 can only handle eight samples per fragment, the sixteen samples of each quad in the tile buffer 510 are compressed to eight samples for each coarse fragment. Each sample is geometrically representative of two samples in the tile buffer 510. Further, this compression operation is conservative in that, if either or both of the samples that correspond to a compressed sample are covered in the tile buffer 510, then the sample of the coarse fragment is also covered, but if neither sample is covered, then the sample in the coarse fragment is not covered. In the example of FIG. 4C, dotted lines are provided in the modified shading rate quads 440 to illustrate the corresponding areas of the fragments in the tile buffer 510. It can be seen that each sample in those corresponding areas corresponds to two samples in the tile buffer 510. Moreover, the top-left sample in the portion of the coarse fragment corresponding to the “fine fragment” corresponds to the two top samples of that fine fragment and the bottom-right sample in the portion of the coarse fragment corresponding to the fine fragment corresponds to the two bottom samples of that fine fragment. Note that if the number of samples to be amalgamated into a single coarse fragment does not exceed the sample limit for that fragment, then compression does not occur. Note also that although a shading rate of 1/4 is illustrated, other shading rates, such as 1/2 horizontal (a row of two fragments in the tile buffer 510 forms a coarse fragment in the modified shading rate quads), 1/2 vertical (a column of two fragments in the tile buffer 510 forms a coarse fragment in the modified shading rate quads), or any other rate can be used.

After
step 406, the centroids are assigned to the fragments of the generated quads. The centroid for each coarse fragment is set in any technically feasible manner. In one example, the centroid is representative of the locations of the covered samples of the coarse fragment. In another example, the location of one of the fragments is chosen. In yet another example, the center of the coarse fragment is used as the centroid. As described above, the centroid is used as the location at which the pixel shader stage 316 calculates attributes such as texture coordinates.

At step 410, the pixel shader stage 316 shades the fragments of the generated quads. Specifically, one work-item per coarse fragment is launched and the color (and other attributes) determined for each coarse fragment is applied to each covered sample of that fragment. It is also possible for the pixel shader stage 316 to modify the coverage of the coarse fragments, such as by setting a covered sample to be not covered or setting a non-covered sample to be covered.

At step 412, the output merger stage 318 applies fine coverage data from the rasterizer stage 314 to the shaded quads to generate fragments at the resolution of the render target. FIG. 4D illustrates an example of this operation. First, the output merger stage 318 up-samples the shaded coarse quads to generate shaded up-sampled quads. To do this, the output merger stage 318 divides each of the coarse fragments into up-sampled fragments based on the shading rate. For a shading rate of 1/4, each coarse fragment is converted to four up-sampled fragments. The samples of each up-sampled fragment get the color of the coarse fragment from which those samples originate. In addition, the sample resolution is restored if the samples were originally compressed, with each restored sample getting the color of the corresponding sample of the coarse fragment. The coverage (covered or not covered) of each restored sample is the same as the coverage of the corresponding sample of the coarse fragment.

In FIG. 4D, the up-sampling proceeds as follows. Coarse fragment 1 has no coverage. Therefore, the quad that would be generated from that fragment has no coverage and is discarded. Coarse fragment 2 has color 1 and has six covered samples as shown. The corresponding up-sampled quad (quad 2) has three fragments with four samples covered each and one fragment with no covered samples. Each sample of quad 2 has the color of coarse fragment 2. Similarly, the coverage and colors of coarse fragment 3 and coarse fragment 4 are used to generate quad 3 and quad 4.

At this point, the original coverage data generated by the rasterizer stage 314 is used to modulate the coverage data generated in the up-sample operation. The modulation is an “AND” operation: if a sample is covered in both the original coverage data and the coverage data from the up-sample operation, then the output sample is considered covered, and if the sample is uncovered in either or both, then the output sample is considered uncovered. The result is a set of quads, with modulated coverage and with colors generated by the pixel shader 316. The quads are written to the render target as per usual (e.g., depth testing, blending, and other operations are performed to combine the colors of these output quads with the colors in the render target).

It should be understood that many variations are possible based on the disclosure herein. Although features and elements are described above in particular combinations, each feature or element can be used alone without the other features and elements or in various combinations with or without other features and elements. One example is an alternative technique for populating the tile buffer 510 described above. More specifically, in the technique described above, the rasterizer stage 314 first generates quads and then accumulates those quads into the tile buffer 510. In another technique, the rasterizer stage 314 generates the quads in the tile buffer 510 directly and does not need to perform the two separate steps of generating the quads and then accumulating those quads into the tile buffer 510.

The methods provided can be implemented in a general purpose computer, a processor, or a processor core. Suitable processors include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), and/or a state machine. Such processors can be manufactured by configuring a manufacturing process using the results of processed hardware description language (HDL) instructions and other intermediary data including netlists (such instructions capable of being stored on a computer-readable medium). The results of such processing can be maskworks that are then used in a semiconductor manufacturing process to manufacture a processor which implements features of the disclosure.

The methods or flow charts provided herein can be implemented in a computer program, software, or firmware incorporated in a non-transitory computer-readable storage medium for execution by a general purpose computer or a processor. Examples of non-transitory computer-readable storage media include a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).
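
As one software illustration of the coverage handling described for FIG. 4C and FIG. 4D, the following Python sketch models the conservative compression of sixteen fine samples into eight coarse samples, the restoration of sample resolution, and the “AND” modulation against the original coverage. The tuple layout (two top samples followed by two bottom samples per fine fragment) and all function names are assumptions made for illustration, and color handling is omitted.

```python
def compress_fragment(fine):
    """Compress four fine coverage flags (top, top, bottom, bottom) to two.

    The compression is conservative: a coarse sample is covered if either
    of the two fine samples it represents is covered.
    """
    return (fine[0] or fine[1], fine[2] or fine[3])

def downsample_quad(quad):
    """Turn a quad of four fine fragments into one coarse fragment (8 samples)."""
    coarse = []
    for fine in quad:
        coarse.extend(compress_fragment(fine))
    return tuple(coarse)

def restore_quad(coarse):
    """Expand a coarse fragment back to four fine fragments of four samples.

    Each restored sample takes the coverage of its corresponding coarse sample.
    """
    quad = []
    for i in range(0, 8, 2):
        top, bottom = coarse[i], coarse[i + 1]
        quad.append((top, top, bottom, bottom))
    return quad

def modulate(restored, original):
    """AND the restored coverage with the rasterizer's original fine coverage."""
    return [
        tuple(r and o for r, o in zip(restored_frag, original_frag))
        for restored_frag, original_frag in zip(restored, original)
    ]

original = [
    (True, False, False, False),
    (True, True, True, True),
    (False, False, False, False),
    (False, False, True, True),
]
coarse = downsample_quad(original)
final = modulate(restore_quad(coarse), original)

# Model a pixel shader that marks one coarse sample as not covered.
shader_modified = list(coarse)
shader_modified[2] = False  # top samples of the second fine fragment
final_after_shader = modulate(restore_quad(tuple(shader_modified)), original)
```

Because the compression is conservative, the restored coverage is a superset of the original coverage whenever the shader leaves coverage unchanged, so the modulation recovers the rasterizer's fine coverage exactly in that case.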
Claims (19)
Priority Applications (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/228,692 US11276211B2 (en) | 2018-12-20 | 2018-12-20 | Integration of variable rate shading and super-sample shading |
| CN201980084278.2A CN113196333A (en) | 2018-12-20 | 2019-12-16 | Integration of variable rate shading and supersample shading |
| EP19897802.5A EP3899858A4 (en) | 2018-12-20 | 2019-12-16 | VARIABLE RATE SHADING INTEGRATION AND SUPERSAMPLING SHADING |
| PCT/US2019/066500 WO2020131679A1 (en) | 2018-12-20 | 2019-12-16 | Integration of variable rate shading and super-sample shading |
| KR1020217019698A KR102869715B1 (en) | 2018-12-20 | 2019-12-16 | Integration of variable rate shading and supersampled shading |
| JP2021530935A JP2022512082A (en) | 2018-12-20 | 2019-12-16 | Integration of variable rate shading and supersampling shading |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/228,692 US11276211B2 (en) | 2018-12-20 | 2018-12-20 | Integration of variable rate shading and super-sample shading |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20200202594A1 | 2020-06-25 |
| US11276211B2 | 2022-03-15 |
Family
ID=71097730
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/228,692 Active US11276211B2 (en) | 2018-12-20 | 2018-12-20 | Integration of variable rate shading and super-sample shading |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US11276211B2 (en) |
| EP (1) | EP3899858A4 (en) |
| JP (1) | JP2022512082A (en) |
| KR (1) | KR102869715B1 (en) |
| CN (1) | CN113196333A (en) |
| WO (1) | WO2020131679A1 (en) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11004258B2 (en) * | 2016-09-22 | 2021-05-11 | Advanced Micro Devices, Inc. | Combined world-space pipeline shader stages |
| US11127109B1 (en) * | 2020-03-23 | 2021-09-21 | Samsung Electronics Co., Ltd. | Methods and apparatus for avoiding lockup in a graphics pipeline |
| US11257273B2 (en) * | 2019-12-19 | 2022-02-22 | Advanced Micro Devices, Inc. | Data output rate with variable rate shading |
| US20220277411A1 (en) * | 2021-01-28 | 2022-09-01 | Arm Limited | Tile-based graphics processing systems |
| US20220414950A1 (en) * | 2021-06-29 | 2022-12-29 | Advanced Micro Devices, Inc. | Per-pixel variable rate shading controls using stencil data |
| US11763521B2 (en) | 2021-08-13 | 2023-09-19 | Samsung Electronics Co., Ltd. | Method and apparatus for the automation of variable rate shading in a GPU driver context |
| WO2023177887A1 (en) * | 2022-03-17 | 2023-09-21 | Advanced Micro Devices, Inc. | Super resolution upscaling |
| CN116842022A (en) * | 2022-03-25 | 2023-10-03 | 中移(上海)信息通信科技有限公司 | Data update method, device, edge computing node and client |
| US12493990B2 (en) | 2022-03-17 | 2025-12-09 | Advanced Micro Devices, Inc. | Locking mechanism for image classification |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115022678B (en) * | 2022-05-30 | 2024-07-02 | 中国电信股份有限公司 | Image processing method, system, device, equipment and storage medium |
| US20250148691A1 (en) * | 2023-11-02 | 2025-05-08 | Nvidia Corporation | Avoiding artifacts from texture patterns in content generation systems and applications |
Citations (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100002000A1 (en) * | 2008-07-03 | 2010-01-07 | Everitt Cass W | Hybrid Multisample/Supersample Antialiasing |
| US8044956B1 (en) * | 2007-08-03 | 2011-10-25 | Nvidia Corporation | Coverage adaptive multisampling |
| US8547395B1 (en) * | 2006-12-20 | 2013-10-01 | Nvidia Corporation | Writing coverage information to a framebuffer in a computer graphics system |
| US20140327696A1 (en) * | 2013-05-03 | 2014-11-06 | Advanced Micro Devices Inc. | Variable acuity rendering using multisample anti-aliasing |
| US20150170345A1 (en) * | 2013-12-12 | 2015-06-18 | Karthik Vaidyanathan | Decoupled Shading Pipeline |
| US20170161940A1 (en) * | 2015-12-04 | 2017-06-08 | Gabor Liktor | Merging Fragments for Coarse Pixel Shading Using a Weighted Average of the Attributes of Triangles |
| US20170293995A1 (en) * | 2016-04-08 | 2017-10-12 | Qualcomm Incorporated | Per-vertex variable rate shading |
| US20180240268A1 (en) * | 2017-02-17 | 2018-08-23 | Microsoft Technology Licensing, Llc | Variable rate shading |
| US20180308280A1 (en) * | 2017-04-21 | 2018-10-25 | Intel Corporation | Fragment compression for coarse pixel shading |
| US20190005713A1 (en) * | 2017-06-30 | 2019-01-03 | Microsoft Technology Licensing, Llc | Variable rate deferred passes in graphics rendering |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6943805B2 (en) | 2002-06-28 | 2005-09-13 | Microsoft Corporation | Systems and methods for providing image rendering using variable rate source sampling |
| US8368706B2 (en) | 2007-06-01 | 2013-02-05 | Gvbb Holdings S.A.R.L. | Image processing device and method for pixel data conversion |
| KR20090040515A (en) | 2007-10-22 | 2009-04-27 | 삼성전자주식회사 | Image Space-based Shading Apparatus and Method Using Adaptive Search Technique |
| US9355483B2 (en) | 2013-07-19 | 2016-05-31 | Nvidia Corporation | Variable fragment shading with surface recasting |
| US9569886B2 (en) * | 2013-12-19 | 2017-02-14 | Intel Corporation | Variable shading |
| US9905046B2 (en) * | 2014-04-03 | 2018-02-27 | Intel Corporation | Mapping multi-rate shading to monolithic programs |
| US9589367B2 (en) * | 2014-06-27 | 2017-03-07 | Samsung Electronics Co., Ltd. | Reconstruction of missing data point from sparse samples during graphics processing using cubic spline polynomials |
| US10535186B2 (en) * | 2016-08-30 | 2020-01-14 | Intel Corporation | Multi-resolution deferred shading using texel shaders in computing environments |
| US10510185B2 (en) * | 2017-08-25 | 2019-12-17 | Advanced Micro Devices, Inc. | Variable rate shading |
2018
- 2018-12-20: US, application US16/228,692 (US11276211B2), status: Active
2019
- 2019-12-16: CN, application CN201980084278.2A (CN113196333A), status: Pending
- 2019-12-16: EP, application EP19897802.5A (EP3899858A4), status: Pending
- 2019-12-16: KR, application KR1020217019698A (KR102869715B1), status: Active
- 2019-12-16: JP, application JP2021530935A (JP2022512082A), status: Pending
- 2019-12-16: WO, application PCT/US2019/066500 (WO2020131679A1), status: Ceased
Cited By (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210272354A1 (en) * | 2016-09-22 | 2021-09-02 | Advanced Micro Devices, Inc. | Combined world-space pipeline shader stages |
| US11004258B2 (en) * | 2016-09-22 | 2021-05-11 | Advanced Micro Devices, Inc. | Combined world-space pipeline shader stages |
| US11869140B2 (en) * | 2016-09-22 | 2024-01-09 | Advanced Micro Devices, Inc. | Combined world-space pipeline shader stages |
| US11257273B2 (en) * | 2019-12-19 | 2022-02-22 | Advanced Micro Devices, Inc. | Data output rate with variable rate shading |
| US11127109B1 (en) * | 2020-03-23 | 2021-09-21 | Samsung Electronics Co., Ltd. | Methods and apparatus for avoiding lockup in a graphics pipeline |
| US20210295465A1 (en) * | 2020-03-23 | 2021-09-23 | Samsung Electronics Co., Ltd. | Methods and apparatus for avoiding lockup in a graphics pipeline |
| US11798121B2 (en) * | 2021-01-28 | 2023-10-24 | Arm Limited | Tile-based graphics processing systems |
| US20220277411A1 (en) * | 2021-01-28 | 2022-09-01 | Arm Limited | Tile-based graphics processing systems |
| US20220414950A1 (en) * | 2021-06-29 | 2022-12-29 | Advanced Micro Devices, Inc. | Per-pixel variable rate shading controls using stencil data |
| US12067649B2 (en) * | 2021-06-29 | 2024-08-20 | Advanced Micro Devices, Inc. | Per-pixel variable rate shading controls using stencil data |
| US11763521B2 (en) | 2021-08-13 | 2023-09-19 | Samsung Electronics Co., Ltd. | Method and apparatus for the automation of variable rate shading in a GPU driver context |
| WO2023177887A1 (en) * | 2022-03-17 | 2023-09-21 | Advanced Micro Devices, Inc. | Super resolution upscaling |
| US12293485B2 (en) | 2022-03-17 | 2025-05-06 | Advanced Micro Devices, Inc. | Super resolution upscaling |
| US12493990B2 (en) | 2022-03-17 | 2025-12-09 | Advanced Micro Devices, Inc. | Locking mechanism for image classification |
| CN116842022A (en) * | 2022-03-25 | 2023-10-03 | 中移(上海)信息通信科技有限公司 | Data update method, device, edge computing node and client |
Also Published As
| Publication number | Publication date |
|---|---|
| KR102869715B1 (en) | 2025-10-14 |
| WO2020131679A1 (en) | 2020-06-25 |
| EP3899858A1 (en) | 2021-10-27 |
| US11276211B2 (en) | 2022-03-15 |
| CN113196333A (en) | 2021-07-30 |
| EP3899858A4 (en) | 2022-09-21 |
| KR20210095914A (en) | 2021-08-03 |
| JP2022512082A (en) | 2022-02-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11276211B2 (en) | | Integration of variable rate shading and super-sample shading |
| US12118656B2 (en) | | VRS rate feedback |
| US10510185B2 (en) | | Variable rate shading |
| US12067649B2 (en) | | Per-pixel variable rate shading controls using stencil data |
| CN112189215B (en) | | Compiler-Assisted Techniques for Memory Usage Reduction in Graphics Pipelines |
| US12505604B2 (en) | | Hybrid binning |
| US11030791B2 (en) | | Centroid selection for variable rate shading |
| US20220414939A1 (en) | | Render target compression scheme compatible with variable rate shading |
| US11257273B2 (en) | | Data output rate with variable rate shading |
| US12266139B2 (en) | | Method and system for integrating compression |
| US20230298261A1 (en) | | Distributed visibility stream generation for coarse grain binning |
| US12141915B2 (en) | | Load instruction for multi sample anti-aliasing |
| US12511815B2 (en) | | System and method for primitive ID map sampling |
| US20250111598A1 (en) | | Hybrid deferred decoupled rendering |
| US11900499B2 (en) | | Iterative indirect command buffers |
| US20220319091A1 (en) | | Post-depth visibility collection with two level binning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: ADVANCED MICRO DEVICES, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SALEH, SKYLER JONATHON; POMIANOWSKI, ANDREW S.; SIGNING DATES FROM 20190104 TO 20190114; REEL/FRAME: 048251/0455 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 4 |