WO1998028713A1

WO1998028713A1 - Enhanced methods and systems for caching and pipelining of graphics texture data

Info

Publication number: WO1998028713A1
Application number: PCT/US1997/022979
Authority: WO
Inventors: Ming-Te Lin; Warren Tsai; Tzoyao Chan
Original assignee: Cirrus Logic, Inc.
Priority date: 1996-12-20
Filing date: 1997-12-19
Publication date: 1998-07-02

Abstract

Circuits and methods for enhanced caching and pipelining of graphics texture data within a computer controlled graphics display system. The present invention includes: an improved three level texture map memory caching system; a pixel pipeline coupled to the output of a triangle engine for reducing triangle engine stalls, and a tile-in-tile (e.g., tile and subtile) texture map memory management scheme for increasing texture cache hits. These three systems cooperate together to increase graphics throughput within the computer controlled graphics display system. The three level texture map memory caching system includes a first level cache (L1) residing in the texture engine and contains subtiles that are cached in and out on a least recently used (LRU) basis. The second level cache (L2), e.g., in the off-screen RAM, is a larger cache that caches in and out tile length data. The third level cache (L3) resides in the host computer's main memory. In one embodiment, L1 is approximately 2k bytes, L2 is approximately 128k bytes, and main memory is accessible up to 32M bytes. The pixel pipeline coupled to the triangle engine advantageously allows the processing of triangle polygon information (e.g., color) during a texture data fetch interval of the texture engine. Additionally, the tile-in-tile memory addressing system provides a memory management technique that increases the chances of cache memory hits within the texture engine of the present invention by caching specially arraigned tiles and subtiles thereof during texture map data access operations.

Description

ENHANCED METHODS AND SYSTEMS FOR CACHING AND PIPELINING OF

GRAPHICS TEXTURE DATA

FIELD OF THE INVENTION The present invention relates to the field of computer controlled three dimensional (3D) graphics display systems. Specifically, the present invention relates to a system and method for enhancing data throughput in a 3D graphics unit containing a texture map data retrieval subsystem.

BACKGROUND OF THE INVENTION Computer controlled graphics display systems typically provide data and control signals to graphics hardware units (e.g., "graphics subsystems") which contain specialized circuits and encoded procedures for processing graphics instructions at high speeds. The graphics instructions are usually stored in a "display list" within computer memory. The instructions define the rendering of several types of graphic primitives, e.g., individual points, lines, polygons, fills, BLTs (bit block transfers), textures, etc., and graphics commands. Collections of graphics primitives can be used to render a two dimensional image on a display screen of an object that is represented in three dimensional space. Rendering involves translating the above graphics primitives and graphics instructions into raster encoded data that is then loaded into a frame buffer memory for display ("refresh") on the display screen.

Some polygon graphics primitives include specifications of texture data representative of graphic images to be displayed within the polygon. The operation of texture mapping refers to techniques for adding surface detail to areas or surfaces of the polygons displayed on the two dimensional display screen. Since the original graphics primitive is three dimensional, texture mapping often involves maintaining certain perspective attributes with respect to the surface detail added to the primitive. A typical texture map (data) is stored in memory and includes discrete point elements ("texels") which individually reside in a (u, v) texture coordinate space. A texture image is represented in computer memory as a bitmap or other raster-based encoded format. Further, the display screen includes discrete point elements (pixels) which reside in an (x, y) display coordinate space.

Generally, the process of texture mapping occurs by accessing encoded surface detail texels from a memory unit that stores the surface detail (e.g., an image) and transferring the surface detail texels to predetermined points of the graphics primitive (triangle) to be texture mapped. The individual texels of the texture map data are read out of memory and applied to their polygon in particular fashions depending on the placement and perspective of their associated polygon. Thus, color values for pixels in (x, y) display coordinate space are partly determined based on sampled texture map values. The process of texture mapping operates by applying color or visual attributes of texels of the (u, v) texture map to corresponding pixels of the graphics primitive on the display screen. After texture mapping, a version of the texture image is visible on surfaces of the graphics primitive, with the proper perspective, if any. Visual attributes aside from texture can also be added to the graphics primitive based on color attributes associated with the graphics primitive. The process of texture mapping takes a great demand on the memory capacity of the graphics display system because many texture maps are accessed from memory during a typical display screen update cycle. Since the frequency of the screen update cycles is rapid, the individual polygons of the screen (and related texture map data per polygon) need to be accessed and updated at an extremely rapid frequency requiring great data throughput capacities. In view of the above memory demands, high performance graphics hardware units typically contain low latency cache memory units and cache memory controller units for storing and retrieving texture mapped data at high speeds. With texture caches, as a texture-mapped polygon is processed through the graphics unit, an address check is made by the graphics controller as to whether or not the texture map for the polygon is stored in the texture cache. If the requested memory addresses are not present in the texture cache, the cache controller unit of the prior art system stalls while the desired texture data is obtained from external memory. Usually, the period of time (stall) from the cache controller unit sending out the external memory request until the texture data is actually fetched from the external memory is a relatively long period. During this stall period, certain portions of the graphics unit wait for the cache controller to replace the oldest set of cache data with newly fetched data from an external source. The graphics unit can contain a texture engine for performing the above texture mapping functions for a particular triangle primitive and can also contain a triangle engine for determining other attributes (e.g., color) of pixels within the triangle primitive. The triangle and texture engines in many cases operate in parallel on a respective triangle primitive. During the above mentioned stall period, the triangle engine that operates on the respective triangle polygon typically stalls out until the texture map data access cycle of the texture engine completes. It would be advantageous then to provide a system that allows the triangle engine to continue performing useful work during a texture map data access interval caused by a texture map data cache miss. The present invention provides this advantage. Further, since texture map data cache misses require a relatively lengthy data access interval, prior art texture data access systems use software procedures to "look ahead" in the display lists to predetermine which texture data will be needed by the graphics units before the actual polygons are texture mapped. Although this process is useful in that the software prediction can ready the required texture map data from main memory to the graphics unit, it suffers the disadvantage that the software prediction procedures are slow; therefore, the graphics data throughput of this prior art system may not operate well in many high performance applications. It would be advantageous to provide a caching method that reduces the duration of a data access interval so that software prediction could be eliminated but yet texture data throughput still not compromised upon a cache miss. In this manner, an increase in texture map data throughput can be realized. The present invention provides such advantages.

Moreover, since texture map data cache misses require a relatively lengthy data access cycle interval, it would be advantageous to provide a memory management system that manages texture map data ("texture data") in such a manner as to increase the chances that the texture engine avoids cache misses during texture mapping. The present invention provides such advantages. Accordingly, the present invention provides a system and method for providing efficient graphics data throughput in a computer controlled graphics display system. The present invention also provides a system that allows the triangle engine to continuing performing useful work during a texture data access interval caused by a texture data cache miss within the texture engine. The present invention also provides a caching method that reduces the duration of a data access cycle interval. In addition, the present invention provides a memory management system that manages texture data in such a way as to increase the chances that the texture engine avoids cache misses.

These and other advantageous not specifically recited above will become clear within discussions of the present invention to follow.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of a computer controlled graphics display system (with graphics subsystem) in accordance with the present invention.

Figure 2 is a block diagram of circuit elements of the 3D graphics subsystem within the computer controlled graphics display system of the present invention.

Figure 3 is a logical diagram of the three level texture map data caching system in accordance with the present invention.

Figure 4 is a flow diagram of steps performed by the three level texture map data caching system in accordance with the present invention. Figure 5 is a diagram of a texture map data set divided into tiles with each tile further divided into subtiles in accordance with the tile-in-tile data memory management system of the present invention.

Figure 6 is a linear memory mapping of subtiles of a tile in accordance with the present invention. Figure 7 is an exemplary diagram of a subtile in accordance with the present invention.

Figure 8 is a diagram illustrating the "tx" locations for different pixel pitches in linear addressing in accordance with the tile data memory management system of present invention.

Figure 9 is a translation matrix illustrating linear addressing mapping, in (ty, dy, tx, dx), within the present invention's tile data memory management system. Figure 10 is a translation matrix illustrating tile address bit mapping, in (ty, tx, dy, dx), within the present invention's tile data memory management system.

Figure 11 is a translation matrix illustrating tile address bit mapping within the present invention's tile data memory management system

Figure 12 is a translation matrix illustrating a subtile address bitmap in accordance with the present invention's tile-in-tile address bit mapping. ST JMMARY OF THE INVENTION

Circuits and methods are described for enhanced caching and pipelining of graphics texture data within a computer controlled graphics display system. The present invention includes: an improved three level texture map memory caching system; a pixel pipeline coupled to the output of a triangle engine for reducing triangle engine stalls during a texture cache miss (causing a data access interval); and a tile-in-tile (e.g., tile and subtile) texture map memory management scheme for decreasing the number of texture cache misses. These three systems cooperate together in accordance with the present invention to increase graphics throughput within the computer controlled graphics display system. The three level texture map memory caching system includes a first level cache (LI) residing in the texture engine and contains subtiles that are cached in and out on a least recently used (LRU) basis. The second level cache (L2), e.g., in the off-screen RAM, is a larger cache that caches in and out tile length data from a third level cache. The third level cache (L3) resides in the host computer's main memory. In one embodiment, LI is approximately 2k bytes, L2 is approximately 128k bytes, and main memory is accessible up to 32M bytes for L3. The pixel pipeline coupled to the triangle engine advantageously allows the processing of triangle polygon information (e.g., color) during a texture data fetch interval of the texture engine resulting from a texture cache miss. Additionally, the tile-in-tile memory addressing system provides a memory management technique that increases the chances of cache memory hits within the texture engine of the present invention by caching neighbor texture data in specially arraigned tiles and subtiles thereof during texture map data access operations.

Specifically, embodiments of the present invention include a circuit for processing graphics data having: a) a texture circuit for performing texture mapping operations on a polygon; b) a polygon circuit for rendering pixels and color attributes of the polygon; c) a multi-level texture cache system for supplying texture map data to the texture circuit, the multi-level texture cache system having: cl) a first cache of a first size for containing a first portion of recently used texture map data, the first cache contained within the texture circuit; c2) a second cache of a second size for containing a second portion of recently used texture map data, the second cache contained within the graphics subsystem and accessed in response to a first cache miss; and c3) a third cache of a third size for containing a third portion of texture map data, the third cache located within a main memory and accessed in response to a second cache miss; d) a pixel pipeline circuit coupled to an output of the polygon circuit for allowing the polygon circuit to render a particular polygon during a texture data miss interval of the texture circuit; and e) a memory management system for reducing texture cache misses by dividing the texture map data into tiles and wherein each tile is further divided into subtiles containing texture map data. Embodiments of the present invention include the above and wherein the second cache provides texture map data in subtile sizes to the first cache in response to the first cache miss and wherein the third cache provides texture map data in tile sizes to the second cache in response to the second cache miss.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

In the following detailed description of the present invention, circuits and methods for enhanced caching and pipelining of graphics texture data within a computer controlled graphics display system, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be obvious to one skilled in the art that the present invention may be practiced without these specific details or by using alternate elements or processes. In other instances well known processes, procedures, components, and circuits have not been described in detail as not to unnecessarily obscure aspects of the present invention.

NOTATION AND NOMENCLATURE

Some portions of the detailed descriptions which follow are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, logic block, process, etc., is herein, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these physical manipulations take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. Herein, these signals are referred to as bits, values, elements, symbols, characters, terms, numbers, or the like with reference to the present invention.

It should be borne in mind, however, that all of these terms are to be interpreted as referencing physical manipulations and quantities and are merely convenient labels and are to be interpreted further in view of terms commonly used in the art. Unless specifically stated otherwise as apparent from the following discussions, it is understood that throughout discussions of the present invention, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data. The data is represented as physical (electronic) quantities within the computer system's registers and memories and is transformed into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

COMPUTER CONTROLLED GRAPHICS DISPLAY SYSTEM

With reference to Figure 1 , a block diagram is shown of a computer controlled graphics display system 112 used in accordance with the present invention. In general, host computer system 112 used by the an embodiment of the present invention comprises a bus 100 for communicating information, one or more host processors 101 coupled with the bus 100 for processing information and instructions, a computer readable volatile memory unit 102 (e.g. random access memory unit) coupled with the bus 100 for storing information and instructions for the host processor 101, a computer readable non-volatile memory unit 103 (e.g., read only memory unit) coupled with the bus 100 for storing static information and instructions for the host processor 101, a computer readable data storage device 104 such as a magnetic or optical disk and disk drive (e.g., hard drive or floppy diskette) coupled with the bus 100 for storing information and instructions, and a display device 105 coupled to the bus 100 for displaying information to the computer user. The display device 105 utilized with the computer system 112 of the present invention can be a liquid crystal device, cathode ray tube, or other display device suitable for creating graphic images and alphanumeric characters recognizable to the user. The host computer system 1 12 provides data and control signals via bus 100 to a graphics hardware system 108. The graphics hardware system 108 contains a 3D graphics subunit 109 which contains circuitry for executing a series of display instructions found within a display list stored in computer memory. The display list generally contains instructions regarding the rendering of several types of graphic primitives, e.g., individual points, lines, polygons, fills, BLTs (bit block transfers), textures, etc. Many of the polygon display instructions include texture data to be displayed within the polygon. Texture data is stored in computer readable memory units of system 112 in the form of raster based data (e.g., in one form its bit mapped) stored in (u,v) coordinates. The individual components (e.g., "texels") of the texture data are read out of memory and applied within their polygon in particular fashions depending on the placement and perspective of their associated polygon. The process of rendering a polygon with associated texture data is called "texture mapping." Texture mapping requires a great demand on the memory capacity of the graphics system 112 because many texture maps are accessed from memory during a typical screen update. Since screen updates need to be performed rapidly, polygons need to be updated very rapidly and further texture maps need to be accessed and applied in extremely rapid fashion, increasing memory demands. The 3D graphics subunit 109 supplies data and control signals over bus 100" to a frame buffer memory 110 (a local memory unit) which refreshes the display device 105 for rendering images (including graphics images) on display device 105. Components of the graphics subsystem 109 are discussed in more detail below.

GRAPHICS SUBUNIT 109 AND PIXEL PIPELINE CIRCUIT 274 Figure 2 illustrates a graphics processing circuit 200 in accordance with the present invention. Circuit 200 includes elements of graphics subsystem 109 and a peripheral components interconnect with accelerated graphics port (PCI/AGP) bus master unit 220, a PCI/AGP bus interface 214 for bus 100, and portions of main memory unit 102. The graphics subsystem 109 contains a texture engine ("circuit") 264, a polygon engine ("triangle engine") 272 and a pixel pipeline circuit 274 coupled to an output of the triangle engine 272. As described to follow, the pixel pipeline circuit 274 advantageously allows the triangle engine 272 to continue to perform rasterization operations computing pixel colors for a given polygon during a texture access interval of the texture engine 264 whereby texture data is obtained for the given polygon. This interval results upon a texture cache miss. In so doing, the pixel pipeline circuit 274 functions to prevent a triangle engine stall during the texture access interval. A pixel mixer circuit 276 is also coupled to receive data that is output from the texture engine 264 and from the pixel pipeline circuit 274. A data blend logic (DBL) circuit 270 is defined to include the triangle engine 272, pixel pipeline circuit 274 and the pixel mixer circuit 276.

The texture engine 264 of Figure 2 receives polygon vertex data over bus 254 (including a tile address) that corresponds to respective polygon primitives (e.g., triangles) to be rendered. The polygon vertex data includes data objects for each vertex of the polygon. With respect to triangle polygons, each of the three vertexes contains: its own position coordinate values (x, y, z); its own color values (red, green, blue); its own texture map coordinate values (u, v), its own perspective value (w), and other required values including an identification of the texture map data ("texture data") for the polygon, e.g., the tile address. The texture engine 264 is responsible for retrieving the texture map data for the polygon and mapping the texels of the texture data onto the pixels of the polygon. Once the texture engine 264 is given the texture map coordinates (u,v) for each vertex of the polygon, it can go to the texture map data and access the matching texels for placement into the triangle. During this texture mapping process, the texture engine 264 maintains the three dimensional perspective of the surface of the polygon.

A multi-level cache system, described below, is used by circuit 200 to retrieve texture map data that is needed by texture engine 264. Aside from the components of the multi-level cache system of the present invention, a number of well known procedures and circuits can be used to maintain the perspective and perform the texture mapping operations implemented within texture engine 264. One such implementation is described within copending patent application serial number , filed , entitled ENHANCED TEXTURE MAP

DATA FETCHING CIRCUIT AND METHOD, attorney docket number CRUS-0512-VDSK, and assigned to the assignee of the present invention. Mapped texture data is supplied from the texture engine 264 to a pixel mixer circuit 276 over bus 284. It is appreciated that a first level cache memory 266 is contained within the texture engine 264 and this first level cache memory 266 is part of the multi-level cache system of the present invention.

The triangle engine 272 of Figure 2 receives polygon color data over bus 254 and performs well known polygon rendering functions with respect to the position, color, and perspective of the polygon primitive. Bus 254 supplies color information while bus 252 supplies Z-buffer information. Essentially, polygon engine 272 uses interpolation to compute the pixel positions and colors of the pixels within the polygon primitive based on the polygon vertex data. A span walker circuit 262 processes a display list data and forwards the data over bus 282 to the triangle engine 272.

Pixel color information (e.g., attribute data) resulting from the polygon engine 272 is forwarded to the pixel pipeline circuit 274 of Figure 2. The pixel pipeline circuit 274 is a first-in- first-out (FIFO) memory unit that stores discrete pixel data for pixels belonging to individual polygons that have been processed by the triangle engine 272. The size of the pixel pipeline circuit 274 can be varied, but in one embodiment, it is large enough to contain data for approximately 100-200 pixels. The size of the pixel pipeline circuit 274 is designed to be sufficient to store pixel data for as many pixels as can be processed by the triangle engine 272 during a texture cache miss interval (causing a contemporaneous texture data fetch interval) of the texture engine 264. In effect, within the present invention, the size of the pixel pipeline circuit 274 is designed to be at least large enough to accommodate the maximum texture data latency for texture cache misses. In one embodiment, the maximum texture data latency results from a level 2 (e.g., second level) cache miss.

A texture data fetch interval of the texture engine 264 commences when texture data required of the texture engine 264 is not present in its first cache memory 266 and completes when the requested texture data is supplied to the first cache memory 266 for use in the texture mapping operations. In these cases, alternative cache memories within the multi-level cache system of the present invention are accessed in order to obtain the required texture data. In normal graphics operation, the triangle engine 272 and the texture engine 264 process a given polygon primitive ("polygon") in parallel, e.g., they receive the same vertex information over bus 254 for the same polygon and commence processing the polygon in parallel. The pixel pipeline circuit 274 of the present invention effectively allows the triangle engine 272 to continue processing a particular polygon during a texture data fetch interval caused by that polygon within the texture engine 264. In accordance with the present invention, the pixel attributes for the polygon, and other subsequently received and processed polygons, are stored into the pixel pipeline circuit 274 as space permits. After a given texture data fetch interval completes, the processed texture data from the texture engine 264 for that polygon is then forwarded to the pixel mixer circuit 276 to be combined with the pixel attributes for that same polygon that were stored in the pixel pipeline circuit 274.

In this manner and in accordance with the present invention, the triangle engine 272 is allowed to perform useful rendering functions during the texture data fetch interval. These useful functions include generating background color attributes for the pixels of a received polygon. In prior art systems, the triangle engine 272 was forced to stall out during a texture data fetch interval, thus decreasing graphics data throughput in these prior art systems. However, the present invention circuit 200 advantageously increases graphics throughput by using the pixel pipeline circuit 274 to buffer the triangle engine's pixel output. Moreover, by placing another texel address FIFO memory circuit on the output of texture engine 264, as described in the above referenced patent application, graphics data throughput can be further improved by buffering the texture data output by texture engine 264.

The pixel mixer circuit 276 of Figure 2 blends the texture data (texels) from the texture engine 264 with the pixel attributes (e.g., background color) from the pixel pipeline circuit 274 that correspond to a same polygon primitive to form a composite polygon image, e.g., background polygon color and texel images are combined. The data (pixels) of the composite image are then forwarded over buses 294 and 292 (for, respectively, Z-buffer writes and color writes) to a memory controller circuit 250. A number of well known memory controller circuits can be used within the scope of the present invention as circuit 250. The composite image data, in raster encoded format, is eventually stored in a raster encoded frame buffer, that can be situated within unit 240 in one embodiment, and eventually displayed on display screen 105 (Figure 1).

MULTI-LEVEL TEXTURE DATA CACHE SYSTEM The multi-level texture data cache system of the present invention includes a number of differently sized cache memories situated in different locations to improve the texture data accessing latencies. By decreasing texture data latencies, and decreasing the occurrences of cache misses using a tile-in-tile memory management approach (described in the next section), the present invention advantageously eliminates the need to perform time consuming software prediction to ready texture data in advance of the texture mapping processes. Instead, texture data is accessed on a "need to use" basis using a relatively speedy texture data accessing system designed in accordance with the present invention with reduced texture cache misses.

The multi-level texture data cache system of the present invention includes a first level cache memory 266, as discussed above, that is contained within the texture engine 264 as shown in Figure 2. The multi-level texture data cache system also includes a second level texture cache memory 242 which is contained in off screen memory of random access memory unit 240 (local memory) and a third level texture cache memory 210 which is contained in main memory unit 102. In one embodiment, the first cache 266 is a 2k byte sized, 4-way, 16 set cache unit that replaces the least recently used texture data such that first cache 266 maintains the most recently used texture data. The first cache 266 also contains two read ports and one write port. In one embodiment, second cache 242 is a 128k byte sized, 4-way, 16 set cache unit that replaces the least recently used texture data such that second cache 242 maintains the most recently used texture data. The third cache 210 is contained within main memory 102 and its size is variable limited only by the largest addressable size of the main memory (e.g., 32 M bytes in one embodiment). The third cache 210 maintains the entire contents of the texture data used by the graphics subsystem 109 for rendering images. The texture data within third cache 210 is accessed via an L3 texture page table 212.

To supply texture data from the third cache 210 a PCI bus configuration is used in one embodiment of the present invention. Main memory 102 is interfaced to a PCI bus interface 214 which is coupled to bus 100 which is coupled to a PCI bus master unit 220. Unit 220 contains page map logic 216 and is coupled to local memory unit 240 which is coupled to communicate with memory controller circuit 250. In physical addressing mode, texture data is loaded into the second cache 242 of the local memory 240 via the host processor 101 (Figure 1) or via the PCI bus master 220. In virtual addressing modes, the multi-level texture cache system is operative wherein the first cache 210, the second cache 242, and the third cache 266 are used in conjunction. In one embodiment, entries within the first cache 266 are 32 bytes each (8 bytes by four lines). These are subtitles. In one embodiment, entries within the second cache 242 are 2k bytes each (64 bytes x 32 lines). These are titles. Moreover, inside the third cache 210, each entry is 2k bytes (64 bytes x 32 lines). These are pages.

Figure 3 illustrates a logical diagram 310 of the relationships of the cache units within the multi-level texture cache system of the present invention. Third cache 210 unit contains the entire set of texture data used for texture mapping operations. Tile sized memory blocks of 2k bytes each are transferred from third cache 210 to second cache 242 upon a second cache miss. Subtile sized memory blocks of 32 bytes are transferred from second cache 242 to third cache 266 upon a first cache miss. Therefore, texture data is transferred from third cache 210 to first cache 266 through second cache 242 to maintain cache coherency. By providing a multi-level texture cache system, the present invention advantageously increases graphics data throughput from main memory 102 to the texture engine 264. In one embodiment of the present invention, the tiles are 2k bytes each and the subtiles are 32 bytes each. Figure 5 illustrates a texture map 410 divided into tiles 412 and each tile divided into multiple subtiles 450. Figure 4 illustrates a flow diagram 320 of the steps performed within the multi-level texture cache system of the present invention. Refer also to Figure 2 and Figure 1. At step 325, the present invention waits until an unprocessed polygon primitive is received by the texture engine

264 over bus 254. When a new unprocessed polygon is received by the texture engine 264, step 330 is entered. At step 330, the texture engine 264 checks first cache 266 to determine whether or not it contains the texture data required by the current polygon. If so, then at step 335, texture engine 264 accesses first cache memory 266 (LI) to obtain the required texture data. Texture mapping operations are then performed for the current polygon and step 325 is re-entered. This represents a first cache hit. Access into the first cache 266 to supply the requested texture data consumes approximately 1-3 clock cycles and is the fastest data access path for the texture engine

264.

At step 330, if the first cache 266 does not contain the desired texture data, then a first cache miss occurs and step 340 is entered. At step 340, a data access request is generated to the second cache (L2) 242 for the texture data ("subtile") that was not found within the first cache 266. This commences a texture data access interval with respect to the texture engine 264. At step 345, in the local memory unit 240, via memory controller 250, it is determined if the second cache 242 contains the texture data required by the current polygon. If so, then at step 350, a memory access request is generated such that a subtile sized block of memory (e.g., 32 bytes) is accessed, via memory controller 250, from the second cache 242 and supplied to the first cache 266. Access into the second cache 242 to supply the requested texture data consumes approximately 10-20 clock cycles and is the second fastest data access path for the texture engine 264. Step 330 is then immediately entered so that the requested texture data will then be found within the first cache 266. The pending texture data access interval continues until the requested texture data is supplied to the first cache 266 and, therefore, to the texture mapping operations for the current polygon. At step 345 of Figure 4, if the required texture data ("subtile") is not found within the second cache 242, then a second cache miss results, step 355 is entered and the pending texture data access interval continues. At step 355, a data access is requested over the PCI bus 100 to the third cache 210 (L3) of memory unit 102 for a tile sized (e.g., 2k bytes) block of texture data that contains the required subtile. At step 360, the host processor 101 (Figure 1), or the PCI bus master unit 220 (Figure 2), performs a texture data access operation from the third cache 210. At step 360, the texture data tile is then returned over PCI bus 100 and stored into the second cache

242. Access into the third cache 210 to supply the requested texture data consumes approximately

100-200 clock cycles and is the slowest data access path for the texture engine 264. Step 345 is then entered so that the requested texture data will then be found within the second cache 242 (and then eventually forwarded to the first cache 266). The pending texture data access interval continues until the requested texture data is supplied to the first cache 266 and, therefore, to the texture mapping operations for the current polygon. The above process 320 continues for each received polygon primitive.

TILE-LN-TILE MEMORY ADDRESSING SYSTEM

The present invention utilizes a tile-in-tile, e.g., "subtile-in-tile," memory management system 410 (Figure 5) to store and retrieve texture data such that texture data is accessed in "blocks" of a stored texture image (e.g., a localized collection of scan line segments) called tiles and subtiles within the present invention, rather than accessed as entire scan lines of texel data. In this fashion, texture data that is accessed from memory is more likely to represent a particular localized screen "area." This type of texture data accessing system is beneficial because certain graphics rendering processes (e.g., filtering) operate on a pixel and its neighboring pixels simultaneously. Therefore, by accessing a block of texture data, rather than a single line of texture data, the memory management system of the present invention is more likely to efficiently provide the neighboring texture data needed for rendering with reduced cache misses. This is the case because neighboring pixel data is frequently provided with a single memory access cycle. It is appreciated that the memory configuration 410 of Figure 5 is represented as displayable image data (texel data) stored in raster format in computer readable volatile memory units of system 112. Therefore, with respect to each of the texel data access operations performed above, in accordance with this embodiment of the present invention, it is assumed that address translations occur to access and store texture data. Within these translations, a "tile" address is translated into a "linear" (e.g., physical) address which is used to actually address the texture data stored in memory.

Although a number of different addressing techniques can be used within the present invention to realize a tile-in-tile memory addressing system, one particular addressing implementation is shown below as an example. In one embodiment, the present invention logically divides the texture data, as shown in Figure 5, into tile sized blocks 412 (e.g., 2k bytes each). In one embodiment of the present invention, a tile memory block 412 is 64 bytes by 32 lines. The tiles 412 are arranged in a matrix of rows of columns over certain texture data and each tile has a separate tile address represented by (tx, ty) coordinate values. As described in more detail below, the tile (and subtile) addresses are translated into linear (e.g., physical) addresses for memory accesses into RAM 102. Inside of each tile 412 is a collection of subtile sized blocks 450 and, in one embodiment, each tile 412 contains an 8x8 matrix of subtiles 450. In one embodiment, each subtile is 8 bytes per line by four lines or 32 bytes total. It is appreciated that the bytes sizes of the tiles 412 and subtiles 450 within the present invention can vary within the scope of the present invention.

With reference to Figure 5, the address of a tile in the horizontal dimension is shown and is called tx and the address of a tile in the vertical dimension is called ty. The tile is then addressed by coordinate (tx, ty). Within a tile, the address number for a subtile in the horizontal dimensional is called dx while the address number of subtile in the vertical dimension is called dy. The subtile is then addressed by coordinate (tx, ty) of its associated tile and then (dy, dx) within the tile.

As the above texture data accesses are proceeding, address translations occur that translate from tile addresses to linear addresses. For instance, during a cache miss interval, the present invention accesses texture data using tile and subtile addresses, e.g., (tx, ty) and (dx, dy), which are translated into linear addresses so that the texture data can be fetched from the memory 102.

With reference to the first 266, second 242 and third 210 caches described above, subtiles 450 are accessed from the second cache 242 to supply to the first cache 266 and tiles 412 are supplied from the third cache 210 to supply to the second cache 242.

In accordance with the tile-in-tile memory addressing system of the present invention, the following expression is used to determine a linear address based on a tile (tx, ty) and subtile address (dx, dy), (assuming no base address):

Linear Address = (ty*32) * pitch + dy * pitch + tx*64 + dx

= ty * 32 * pitch + dy * pitch + tx * 64 + dx where pitch is the pixel pitch (pixels per scan line) and is expressed as follows:

Pitch = 2^W (where w is from 6 to 12).

Assuming the base address is an address by the unit of pitch * 32, the linear address (considering the base address) is expressed as follows:

Linear Address = (ty + base address bits) * 32 * pitch

+ dy * pitch + tx * 64 + dx.

Figure 6 illustrates the tile address mapping (in hexadecimal addresses) of a particular tile 412 having an 8x8 matrix of subtiles 450. The hexadecimal numbers are example tile addresses in which texel data is stored for the corresponding subtile 450. Within each hexadecimal address number, the right digit is the set number and the left digit is the way number. The "dx" subtile addresses are sequential and numbered along 470 (there are eight discrete dx addresses for the tile shown in Figure 6) and likewise the "dy" subtile addresses are sequential and numbered along 460 (there are eight discrete dy addresses for the tile shown in Figure 6). Two exemplary subtiles are shown in Figure 6 with linear addresses 02 hex and 3E hex. It is appreciated that tile 412 is represented as displayable image data stored in computer readable volatile memory units of system 1 12.

In one embodiment, the following address subdivision is used with respect to a particular tile 412. Each tile is 2k bytes, and stored in a 4 way, 16-set cache memory. Each entry is 8-bytes by 4 lines (Figure 7). The address bits are defined as follows:

Bit Number Description

Bit[2:0] byte bit [2:0] in each subtile in a tile

Bit[4:3] set_bit[l:0]

Bit[5] way_bit [0]

Bit[7:6] line_bit[l:0] in each subtile in a tile

Bit[9:8] set bit [3:2]

Bit[10] way_. _bit[l] Figure 7 illustrates a memory configuration of an exemplary subtile 450 which contains four lines 482-488 and eight bytes 490a-490h per line. It is appreciated that subtile 450 is represented as displayable image data stored in computer readable volatile memory units of system

112. The number of texels represented within a byte (e.g., byte 492) within a given line (e.g., line 482) of subtile 450 depends on the selected pitch.

Figure 8 is an exemplary matrix 510 which illustrates the tx bit location in each different pitch configuration, wherein *1 represents a pitch of 2^ or 64 bytes per scan line, *2 represents a pitch of 2^ or 128 bytes per scan line, *3 represents a pitch of 2& or 256 bytes per scan line, *4 represents a pitch of 2^ or 512 bytes per scan line, *5 represents a pitch of 210 or 1024 bytes per scan line, *6 represents a pitch of 2l 1 or 2048 bytes per scan line, and *7 represents a pitch of 212 or 4098 bytes per scan line. With reference to Figure 8, the numbers indexed across the columns represent the tx bit locations in the linear address, 0 to 23, while the values indexed by row are the pitch values, *1 to *7.

Figure 9 is a translation matrix 550 illustrating the linear (physical) address mapping for one embodiment of the present invention using ty, dy, tx, dx addresses showing tx and dy bit locations. With reference to Figure 9, the values indexed by row are the pitch values, * 1 to *7, and the column values represent the linear address. The blanks represent addresses that are not changed during address translation.

Figure 10 is a translation matrix 560 that illustrates the tile address bit mapping in ty, tx, dy and dx for one embodiment of the present invention. With reference to Figure 10, the values indexed by row are the pitch values, *1 to *7 and the column values represent the linear address. The blanks represent addresses that are not changed during address translation. Figure 11 illustrates a tile address bit mapping translation matrix 570 represented with numerals in one embodiment of the present invention using the ty, dy, tx, and dx assignments of Figure 10. With reference to Figure 11, the values indexed by row are the pitch values, * 1 to *7 and the column values represent the linear address. As above, the blanks represent addresses that are not changed during address translation.

Figure 12 illustrates a subtile address bit mapping translation matrix 590 for one embodiment of the present invention with numerals to represent: way_bit, set_bit, line_bit, and byte_bit. With reference to Figure 12, the values indexed by row are the pitch values, *1 to *7 and the column values represent the linear address. The blanks represent addresses that are not changed during address translation.

It is appreciated that in view of the above tile and subtile configurations and further in view of the above translation matrices, the tile address to linear address translations described above can be readily performed.

CONCLUSION The preferred embodiment of the present invention, circuits and methods for enhanced caching and pipelining of graphics texture data within a computer controlled graphics display system, is thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the below claims.

Claims

CLAIMSWhat is claimed is:

1 . A circuit for processing graphics data comprising: a) a texture circuit for performing texture mapping operations on a polygon; b) a polygon circuit for rendering pixels and color attributes of said polygon; c) a multi-level texture cache system for supplying texture map data to said texture circuit, said multi-level texture cache system comprising: cl) a first cache of a first size for containing a first portion of recently used texture map data, said first cache contained within said texture circuit; c2) a second cache of a second size for containing a second portion of recently used texture map data, said second cache contained within said graphics subsystem and accessed in response to a first cache miss; and c3) a third cache of a third size for containing a third portion of texture map data, said third cache located within a main memory and accessed in response to a second cache miss; d) a pixel pipeline circuit coupled to an output of said polygon circuit for allowing said polygon circuit to render a particular polygon during a texture data miss interval of said texture circuit; and e) a memory management system for reducing texture cache misses by dividing said third portion of said texture map data into tiles and wherein each tile is further divided into subtiles containing texture map data.

2. A circuit as described in Claim 1 wherein said second cache provides texture map data in subtile sizes to said first cache in response to said first cache miss and wherein said third cache provides texture map data in tile sizes to said second cache in response to said second cache miss.

3. A circuit as described in Claim 2 wherein said tiles are each approximately 2k bytes in size and said subtiles are each approximately 32 bytes in size.

4. A circuit as described in Claim 1 wherein said second cache is contained within off screen memory in a random access memory.

5. A circuit as described in Claim 4 further comprising a memory controller circuit coupled between said random access memory and said first cache.

6. A circuit as described in Claim 1 further comprising a pixel mixer circuit coupled to receive pixel attributes from said pixel pipeline circuit and also coupled to receive texel data from said texel circuit wherein said pixel mixer circuit is for mixing said pixel attributes and said texel data for said polygon.

7. A circuit as described in Claim 2 wherein said first size is approximately 2048 bytes, wherein said first cache is a 4 way, 16 set cache memory, wherein said second size is approximately 128k bytes, and wherein said second cache is a 4 way, 16 set cache memory.

8. A circuit as described in Claim 4 wherein said second cache and said third cache are coupled to communicate over a Peripheral Component Interconnect (PCI) bus and coupled bus master circuit.

9. A computer controlled graphics display system comprising: a) main memory coupled to a bus; b) a host processor coupled to said bus; c) a graphics subsystem for processing graphics data, said graphics subsystem comprising: c 1) a texture circuit for performing texture mapping operations on a graphics primitive; c2) a polygon circuit for rendering pixels and color attributes of said graphics primitives; and c3) a pixel pipeline circuit coupled to an output of said polygon circuit for allowing said polygon circuit to render a particular graphics primitive during a texture data miss interval of said texture circuit; and d) a multi-level cache system for supplying texture map data to said texture circuit during said texture mapping operations, said multi-level cache system comprising: dl) a first cache of a first size for containing a first portion of recently used texture map data, said first cache contained within said texture circuit; d2) a second cache of a second size for containing a second portion of recently used texture map data, said second cache contained within said graphics subsystem and accessed upon a first cache miss; and d3) a third cache of a third size for containing a third portion of texture map data, said third cache located within said main memory and accessed upon a second cache miss; and e) a texture map memory management system for reducing cache misses of said multi-level cache system wherein said third portion of said texture map data is divided into tiles and each tile is divided into subtiles.

10. A system as described in Claim 9 wherein said second cache provides texture map data in subtile sizes to said first cache upon a first cache miss and wherein said third cache provides texture map data in tile sizes to said second cache upon a second cache miss.

1 1. A system as described in Claim 10 wherein said second cache is contained within off screen memory of random access memory.

12. A system as described in Claim 10 further comprising a memory controller circuit coupled between said random access memory and said first cache.

13. A system as described in Claim 10 further comprising a pixel mixer circuit coupled to receive pixel attributes from said pixel pipe and also coupled to receive texel data from said texel circuit, said pixel mixer circuit is for mixing pixel attributes and texel data for a respective graphics primitive.

14. A system as described in Claim 10 wherein said first size is approximately 2048 bytes, wherein said first cache is a 4 way, 16 set cache memory, wherein said second size is approximately 128k bytes, and wherein said second cache is a 4 way, 16 set cache memory.

15. A system as described in Claim 10 wherein said tiles are approximately 2k bytes in size and said subtiles are approximately 32 bytes in size.

16. A system as described in Claim 10 wherein said second cache and said third cache are coupled to communicate over a Peripheral Component Interconnect (PCI) bus and coupled bus master circuit.

17. In a computer controlled three dimensional graphics display system having a host processor coupled to a bus, a main memory coupled to said bus and a graphics subsystem, a method for processing graphics data, said method comprising the steps of: a) supplying texture map data for performing texture mapping operations on a polygon primitive, said step a) comprising the steps of: al) accessing a first cache of a first size to supply texture map data to a texture circuit that performs texture mapping operations, said first cache contained within said texture circuit; a2) accessing a second cache of a second size, in response to a first cache miss, to supply texture map data in subtile sizes to said first cache, said second cache contained within off screen memory; and a3) accessing a third cache, in response to a second cache miss, to supply texture map data in tile sizes to said first cache, said third cache located in said main memory; b) reducing stalls within a polygon circuit that renders color attributes of pixels within said polygon primitive by storing pixel information into a pixel pipeline circuit; and c) reducing texture map data cache misses by dividing texture map data stored in said third cache into tiles and further dividing said tiles into multiple subtiles.

18. A method as described in Claim 17 further comprising the step of using a pixel mixer circuit for mixing pixel attributes from said polygon circuit with texel data from said texture circuit with respect to said polygon primitive.

19. A method as described in Claim 17 wherein said first size is approximately 2k bytes, said first cache is a 4 way, 16 set cache memory, said second size is approximately 128k bytes, and said second cache is a 4 way, 16 set cache memory.

20. A method as described in Claim 17 wherein said subtiles comprise approximately four screen lines of information and eight bytes per line.