WO2000004502A1

WO2000004502A1 - Graphics processing for efficient polygon handling

Info

Publication number: WO2000004502A1
Application number: PCT/US1999/016080
Authority: WO
Inventors: William Lazenby; Dale Kirkland
Original assignee: Intergraph Corp
Current assignee: Intergraph Corp
Priority date: 1998-07-17
Filing date: 1999-07-15
Publication date: 2000-01-27
Anticipated expiration: 2001-01-17

Abstract

A method and graphic accelerator that process a polygon having N vertices utilizes a single vertex buffer location for storing data for a single base vertex of the polygon while the polygon is processed. To that end, the method and graphics processor first designates one of the vertices of the polygon to be the base vertex, and then defines N-2 different triangles by each of the vertices of the polygon. Each of the N-2 triangles is defined by the base vertex and two other of the N-1 vertices. A buffer is provided for storing the vertex data while it is being processed. The buffer includes a selected number of vertex locations for storing vertex data for one vertex. The selected number is less than N. In preferred embodiments, the base vertex is stored in the first vertex location in the buffer. The base vertex may be retrieved from the first vertex location each time one of the N-2 triangles is processed.

Description

GRAPHICS PROCESSING FOR EFFICIENT POLYGON HANDLING

FIELD OF THE INVENTION

The invention generally relates to computer systems and, more particularly, the invention relates to processing graphics request data for display on a computer display device.

BACKGROUND OF THE INVENTION

Three dimensional graphics request data commonly is processed in a computer system as a plurality of polygons having vertices. Each of the vertices has associated attribute data (e.g., color, transparency, depth, etc.) that is utilized to rasterize pixels on a computer display device. Items to be displayed commonly are in the form of a "T-fan." As known by those skilled in the art, a T-fan is a polygon having more than three vertices such as, for example, five vertices. Various techniques have been utilized to process T-fan shaped items.

SUMMARY OF THE INVENTION

In accordance with one aspect of the invention, a method and graphics accelerator that processes a polygon having N vertices utilizes a single vertex buffer location for storing data for a single base vertex of the polygon while the polygon is processed. To that end, the method and graphics processor first designates one of the vertices of the polygon to be the base vertex, and then defines N-2 different triangles by each of the vertices of the polygon. Each of the N-2 triangles is defined by the base vertex and two of the N vertices other than the base vertex. A buffer is provided for storing the vertex data while the polygon is being processed. The buffer includes a selected number of vertex locations each for storing vertex data for one vertex. The selected number is less than N. In preferred embodiments, the base vertex data is stored in the first vertex location in the buffer. The base vertex data preferably is retrieved from the first vertex location each time one of the N-2 triangles is processed.

In accordance with other embodiments of the invention, the base vertex data is stored in the first vertex location until each of the N-2 triangles is processed. The first vertex location then may be utilized for storing other data after each of the triangles is processed.

In accordance with another aspect of the invention, each of the N vertices except the base vertex defines a set of vertices, while the buffer includes a set of vertex locations other than the first vertex location. The graphics accelerator and method further process each of the vertices in the set of vertices in the set of vertex locations. In some embodiments, the set of vertices is processed in the set of vertex locations in a round robin manner. In other embodiments, the set of vertices includes N-1 vertices and the set of vertex locations includes fewer than N-1 vertex locations. In such embodiments, selected vertices in the set of vertex locations are overwritten when each of the other vertex locations in the set of vertex locations is filled with vertex data for other vertices.

In accordance with still other aspects of the invention, a modulo counter is utilized to perform a round-robin method of processing and storing vertex data. In preferred embodiments, the polygon is a T-fan.

BRIEF DESCRIPTION OF THE DRAWINGS The foregoing and other objects and advantages of the invention will be appreciated more fully from the following further description thereof with reference to the accompanying drawings wherein: Figure 1 schematically shows a portion of an exemplary computer system on which preferred embodiments of the invention may be implemented.

Figure 2 schematically shows a preferred graphics accelerator that may be utilized in accordance with preferred embodiments of the invention.

Figure 3A schematically shows a preferred embodiment of a geometry accelerator stage shown in Figure 2.

Figure 3B schematically shows an alternate embodiment of a geometry accelerator stage shown in Figure 2.

Figure 4 shows a preferred header for sending data to the rasterization stage from the geometry accelerator stage in the graphics accelerator shown in Figure 2. Figure 5 A shows an exemplary T-fan.

Figure 5B shows a processed exemplary T-fan that may be processed in accordance with preferred embodiments of the invention. Figures 5C and 5D show exemplary primitives.

DESCRIPTION OF SPECIFIC EMBODIMENTS

Figure 1 shows a portion of an exemplary computer system 100 on which preferred embodiments of the invention may be implemented. More particularly, the computer system 100 includes a host processor 104 (i.e., a central processing unit) for executing application level programs and system functions, volatile host memory 102 for short term data storage (i.e., random access memory), a graphics accelerator 106 for processing graphics request code in accord with preferred embodiments of the invention (see Figure 4), a display device 108 for displaying the graphics request code processed by the accelerator 106, and a bus 110 coupling all of the prior noted elements of the system 100.

The graphics accelerator 106 preferably utilizes any well known graphics processing application program interface such as, for example, the OPENGL™ application program interface (available from Silicon Graphics, Inc. of Mountain View, California) for processing three dimensional ("3D") and two dimensional ("2D") graphical request code. In preferred embodiments, the host processor 104 executes a graphical drawing application program such as, for example, the PLANT DESIGN SYSTEM™ drawing program, available from Intergraph Corporation of Huntsville, Alabama.

Figure 2 shows several elements of the graphics accelerator 106. In preferred embodiments, the graphics accelerator 106 includes a double buffered frame buffer 200 (i.e., having a back buffer and a front buffer) for storing the processed graphics request code in accordance with the OPENGL™ interface. Among other things, the graphics accelerator 106 also preferably includes a geometry accelerator 202 for performing geometry operations that commonly are executed in graphics processing, a rasterizer 204 for rasterizing pixels on the display device 108, and a resolver 206 for storing data in the frame buffer 200 and transmitting data from the frame buffer 200 to the display device

108. As noted above, the graphics accelerator 106 preferably is adapted to process both 2D and 3D graphical data. For more information relating to preferred embodiments of the graphics accelerator 106, see, for example, copending patent application entitled "Wide Instruction Word Graphics Processor", filed on even date herewith and naming Vernon Brethour, Gary Shelton, William Lazenby, and Dale Kirkland as inventors, the disclosure of which is incorporated herein, in its entirety, by reference.

Figure 3 A schematically shows a preferred embodiment of the geometry accelerator stage 202 shown in Figure 2. The geometry accelerator stage 202 processes graphics primitives, which are two-dimensional surfaces such as points, lines, and polygons that are combined in 2D and 3D space to create 2D and 3D objects. Each endpoint (i.e., comer) of a primitive is called a vertex. Figure 5C shows some exemplary primitives including a line, a triangle, and a quadrilateral. A line has two vertices, a triangle has three vertices, and a quadrilateral has four vertices. More complex primitives such as strip primitives may be formed by using primitives as sub-elements, or sub- primitives. Figure 5D shows exemplary strip primitives, including a line strip (Lstrip), a triangle strip (Tstrip), and a quadrilateral strip (Qstrip). Lstrips are composed of sub- primitives that are Lines, Tstrips are composed of sub-primitives that are Triangles, and Qstrips are composed of sub-primitives that are Quadrilaterals. The geometry accelerator

202 processes these and other primitives. More particularly, selected elements of the geometry accelerator 202 are shown for processing a polygon having more than three vertices. The geometry accelerator 202 includes an input buffer 320 for storing the input vertex data. In preferred embodiments, the input buffer 320 includes four vertex locations for storing all of the necessary vertex data for four vertices. Among that data is color data, transparency data, depth data, and other commonly known 2D or 3D graphical data. As is known in the art, such data may be represented by a plurality of numbers (e.g., thirty-two bit words) such as, for example, forty-eight numbers. Accordingly, each of the vertex locations in the input buffer preferably includes an array of forty-eight thirty-two bit words.

The geometry accelerator 202 also includes a geometry accelerator processor 330 ("processor") for executing geometry and other known math functions on the vertex data. Among other functions, the math functions may include multiplication operations, addition operations, and reciprocal operations. The processor 330 produces processed vertex data that is stored in an output buffer 340 that also is a part of the geometry accelerator 202. In some embodiments, data in the output buffer 340 is retrieved by the processor for further processing such as, for example, clipping operations.

In a manner similar to the input buffer 320, the output buffer 340 preferably includes four vertex locations for storing all of the necessary vertex data for the vertices being processed. Accordingly, the output buffer 340 may include four vertex locations for storing all of the necessary vertex data. For some graphics operations, additional vertex locations may be necessary. These may be called "reserved vertex locations." In preferred embodiments, each of the vertex locations in the output buffer 340 includes an array of thirty-two separate thirty-two bit words.

The geometry accelerator also includes a vertex handler 350 for retrieving the vertex data from the output buffer and transmitting such data to the rasterizer stage 204. In some embodiments, the vertex handler 350 is controlled by the processor 330 to retrieve the output vertex data. The processor 330 signals the vertex handler when vertex data for one or more vertices has been processed and placed in the output buffer 340. For some graphics operations, such as clipping, vertex data is partially processed and is sent to the output buffer 340 temporarily. The partially processed vertex data is then retrieved by the processor 330 for further processing before rasterization. In such cases, the processor 330 signals the vertex handler 350 not to retrieve vertex data from the output buffer 340 until the processing is completed. In a like manner, the vertex handler 350 also signals the processor 330 when the vertex handler 350 cannot retrieve vertex data from the output buffer 340, for example, when the rasterizer 204 cannot handle additional rasterization requests In a preferred embodiment, two bits of the address are required to select which of the four individual vertex locations to access within the input buffer 320 and the output buffer 340. In addition, two separate pointers are maintained for addressing the input buffer 320 and the output buffer 340. The addressing (i.e., indexing) of the input and output buffers is the same so that an input buffer vertex location and corresponding output buffer vertex location correspond to the same vertex. Keeping the two buffers in sync in this manner simplifies buffer addressing. In preferred embodiments, a specialized arithmetic logic unit ("ALU") support is provided in the form of increment and decrement instructions to traverse the input and output vertex buffers using modulo arithmetic. If an increment operation (i.e., an operation that adds one to a given pointer) results in a value that is greater than a predetermined maximum, then a "0" value is returned. If a decrement operation (i.e., an operation that subtracts one from a given pointer) results in a value that is less than "0", then the maximum value is returned. With four vertex locations V0, VI, V2, and V3 in each of the input and output buffers, the modulo counter is configured with a maximum value of 3. The counter counts 0, 1, 2, 3, 0, 1, 2, 3, etc. to address these locations.

Performance may be improved by dynamically configuring the buffer addressing. More particularly, the buffer addressing is configured by setting the maximum value on the modulo counter so that fewer than all of the vertex locations may be accessed depending on the processing required. The vertex locations which are accessed may be called "addressable vertex locations." The maximum value on the modulo counter, therefore, may be adjusted according to the graphics primitive being processed. In some cases involving a graphics primitive composed of sub-primitives, processing may be simplified by not using all of the available vertex locations. By setting the maximum value on the modulo counter so that the number of addressable vertex locations is equal to the sub-primitive size, simplified buffer addressing can be achieved.

When processing primitives having sub-primitives that are comprised of consecutive vertices, the buffer addressing can be dynamically configured based upon the sub-primitive size (e.g., 1, 2, 3, or 4). For example, consider the Lstrip in Figure 5D. The

Lstrip can be used as an approximation of a smooth curve drawn by connecting a series of consecutive vertices with lines, where each Line is defined by two vertices. Each subsequent line shares a vertex with the previous line. Here, the sub-primitive size is two. The buffer addressing could be configured to use only two of the vertex locations. Vertex data for A is stored in vertex location V0 and vertex data for B is stored in vertex location

VI when processing the first line. When the next line is processed, vertex data for B is stored in vertex location V0, and vertex data for C is stored in vertex location VI. When the next line is processed, vertex data for C is stored in vertex location V0 and so on. The vertex data for a first vertex of a line is always accessed from vertex location V0, and the vertex data for a second vertex of the line is always accessed from vertex location VI. The modulo counter thus uses a maximum value of one, and counts 0, 1, 0, 1, etc. When the modulo counter counts 0, vertex data for each odd-numbered vertex in the line request is accessed from vertex location V0, and when the modulo counter counts 1 , vertex data for each even-numbered vertex in the line request is accessed from vertex location VI. This simplifies the processing code used to process the primitive, which should access vertex data for corresponding vertices of each sub-primitive from the same vertex locations each time.

In other cases, performance is improved by utilizing all of the vertex locations. In the Lstrip example, performance may be improved by using a maximum value of three, so that all four vertex locations are used and the modulo counter would count 0, 1, 2, 3, 0, 1,

2, 3, etc. The performance may be improved by using all four vertex locations, in this case, because vertex data for vertex C is already stored in the input buffer when the first line (vertices A and B) is being processed. The next line (vertices B and C) can then processed without the need for the processor 330 to wait for vertex data for vertex B and vertex C to be stored in input vertex locations.

Other strip primitives preferably are processed in this same way, except that sub- primitives may share up to two vertices. Consider, for example, the Tstrip and the Qstrip shown in Figure 5D. In the case of a Tstrip, each new vertex in the request stream

(starting with the third vertex) forms a sub-triangle with the previous two vertices: ABC, BCD, CDE, etc. . . . In the case of a Qstrip, every second vertex in the request stream (starting with the fourth vertex) forms a sub-quad with the previous three vertices: ABCD, CDEF, EFGH, etc. This processing can be performed in the same manner as described above for consecutive vertices, but must be done every time a new sub-primitive is formed

(i.e., every vertex for a Tstrip, every other vertex for a Qstrip).

In processing primitives such as triangle fans and polygons that require a base vertex, the buffer addressing can be dynamically configured to one less than the number of vertices (i.e., N-1). This leaves one vertex location out of the modulo arithmetic (buffer N, or V3 in a preferred embodiments), which can be used to hold the base vertex data that is a part of each sub-primitive. In OPENGL™ triangle fan and polygon requests, the buffer pointers are set to address vertex location V3, and the first (base) vertex data is then read into vertex location V3 of the input buffer. Using the increment operation with a maximum value of two to advance to the next location results in 3+1=4, which is greater than the maximum of two. V0 thus becomes the next vertex location. In this manner, vertex location V3 is not an addressable vertex location, because the modulo counter cannot generate an integer that designates vertex location V3. The processor can subsequently access the base vertex data as needed by directly addressing vertex location V3 without using the modulo counter. Otherwise, the two most recent vertices can be accessed by using the modulo arithmetic.

In another embodiment, the output buffer includes additional reserved vertex locations necessary for certain graphics operations. These reserved vertex locations are not addressable by the modulo arithmetic. Instead, these reserved vertex locations may be addressed in a manner similar to the addressing of the vertex location holding the base vertex data.

Figure 3B schematically shows an alternate embodiment of the geometry accelerator stage 202 shown in Figure 2. The geometry accelerator further includes a vertex assembler 310 which receives input vertex data and passes it to the input buffer. The vertex assembler 310 communicates with the processor and receives an address which designates a vertex location in the input buffer 320 in which to store vertex data for a single vertex. The vertex assembler 310 also signals the processor 330 when vertex data is ready to be passed to the input buffer. Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the true scope of the invention. These and other obvious modifications are intended to be covered by the appended claims.

Claims

We claim:

1. A method of processing a polygon having N vertices, where N is at least four, wherein one of the vertices is designated as the base vertex and the vertices other than the base vertex comprise a set of non-base vertices, and N-2 different triangles are defined by the base vertex and two non-base vertices, the method comprising:

A. providing a buffer for storing vertex data, the buffer including a selected number of vertex locations each for storing vertex data for one vertex; and

B. storing the base vertex data in a first vertex location in the buffer, the base vertex data being retrieved from the first vertex location each time one of the N-2 triangles is processed.

2. The method as defined by claim 1 wherein the base vertex data is stored in the first vertex location until each of the N-2 triangles is processed.

3. The method as defined by claim 1 wherein the buffer includes a set of other vertex locations other than the first vertex location, the method further comprising:

C. processing the vertices in the set of non-base vertices in the set of other vertex locations.

4. The method as defined by claim 3 wherein the set of non-base vertices is processed in the set of other vertex locations in a round robin manner.

5. The method as defined by claim 3 wherein the set of non-base vertices includes N-1 vertices and wherein the set of other vertex locations includes fewer than N-1 vertex locations, step C comprising:

C 1. overwriting vertex data for selected vertices in the set of other vertex locations when each of the vertex locations in the set of other vertex locations is filled with vertex data for other vertices.

6. The method as defined by claim 3 wherein a modulo counter generates integers appropriate for selecting vertex locations storing vertex data.

7. The method as defined by claim 1 wherein the polygon is a T-Fan.

8. A graphics accelerator for processing a polygon having N vertices, the graphics accelerator comprising: means for designating one of the vertices to be the base vertex and the vertices other than the base vertex as a set of non-base vertices; means for defining N-2 different triangles, each triangle being defined by the base vertex and two non-base vertices; and a buffer for storing vertex data, the buffer including a selected number of vertex locations each for storing vertex data for one vertex, the select number being less than N, the base vertex data being stored in a first vertex location in the buffer, the base vertex data being retrieved from the first vertex location each time one of the N-2 triangles is processed.

9. The graphics accelerator as defined by claim 8 wherein the base vertex data is stored in the first vertex location until each of the N-2 triangles is processed.

10. The graphics accelerator as defined by claim 8 wherein the buffer includes a set of other vertex locations other than the first vertex location, the graphics accelerator further comprising: a vertex processor that processes each of the vertices in the set of non-base vertices in the set of other vertex locations.

11. The graphics accelerator as defined by claim 10 wherein the set of non-base vertices is processed in the set of other vertex locations in a round robin manner.

12. The graphics accelerator as defined by claim 10 wherein the set of non-base vertices includes N-1 vertices and wherein the set of other vertex locations includes fewer than N-1 vertex locations, the graphics accelerator further comprising: means for overwriting selected vertices in the set of other vertex locations when each of the vertex locations in the set of other vertex locations is filled with vertex data for other vertices.

13. The graphics accelerator as defined by claim 10 further comprising a modulo counter for incrementing use of successive vertex locations in the set of other vertex locations.

14. The graphics accelerator as defined by claim 8 wherein the polygon is a T-Fan.

15. An apparatus for processing computer graphics requests, the apparatus comprising: a set of input vertex registers for storing input vertex data, the set having at least one reserved register and at least one addressable register; and a processor, coupled to the input vertex registers, having a clock capable of a single clock modulo increment operation yielding an integer appropriate for selecting at least one of the addressable registers, the processor actively accessing at least one of the input vertex registers that is either an addressable register designated by an integer generated by the single clock modulo increment operation or that is a reserved register.

16. An apparatus for processing computer graphics requests according to Claim 15, further comprising: a set of output vertex registers for storing processed output vertex data, at least one of the output vertex registers being actively accessed by the processor

17. An apparatus for processing computer graphics requests according to Claim 16, wherein for a number of input vertex registers there are at least an equivalent number of output vertex registers.

18. An apparatus for processing computer graphics requests according to Claim 17, wherein each input vertex register has an associated output vertex register, and vertex data which is removed from an input vertex register for processing is placed in the associated output vertex register after processing.

19. An apparatus for processing computer graphics requests according to Claim 16 further comprising: a vertex assembler capable of receiving the computer graphics requests in the form of input vertex data and passing the input vertex data to the input vertex registers.

20. An apparatus for processing computer graphics requests according to Claim 19, wherein the vertex assembler communicates with the processor and the processor designates an input vertex register in which to store input vertex data for a single vertex.

21. An apparatus for processing computer graphics requests according to Claim 15 further comprising: a vertex assembler capable of receiving the computer graphics requests in the form of input vertex data and passing the input vertex data to the input vertex registers.

22. An apparatus for processing computer graphics requests according to Claim 21, wherein the vertex assembler communicates with the processor and the processor designates an input vertex register in which to store input vertex data for a single vertex.

93249