US20120206466A1 - Data storage address assignment for graphics processing - Google Patents
- Publication number
- US20120206466A1 (U.S. application Ser. No. 13/024,579)
- Authority
- United States (US)
- Prior art keywords
- addresses
- data
- data type
- cache lines
- address
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- This disclosure relates to data storage and, more particularly, to assigning of data storage addresses for graphics processing.
- a device that provides content for visual presentation generally includes a graphics processing unit (GPU).
- the GPU processes and renders pixels that are representative of the content of an image on a display.
- the GPU processes various data types.
- the various types of data are stored in one or more data storage devices.
- the GPU retrieves the data from the one or more storage devices, and processes the data to render pixels on the display.
- a graphics processing unit may process the data of the various data types to render an image for display.
- a processor may store data of the various data types in a storage device, and define segments in which the data of the various data types is stored in the storage device or define segments in which addresses for the data of the various data types are stored in the storage device. Each segment may include a plurality of blocks that are addressable by contiguous addresses.
- the device may also include a common memory cache.
- the common memory cache may store data of the various data types.
- the GPU may be able to quickly retrieve data from the common memory cache.
- the GPU and the processor may be able to quickly retrieve data from the common memory cache.
- the common memory cache may store data for all of the various data types for graphics processing.
- the common memory cache may store data of a first data type, and data of a second data type, where the first and second data types are different data types for graphics processing.
- this disclosure describes a method comprising assigning, with a processing unit, a first contiguous range of addresses for a first data type for graphics processing, and assigning a second contiguous range of addresses for a second data type for graphics processing, wherein the first and second data types are different data types, storing, with the processing unit, at least one of graphics data of the first data type or addresses of the graphics data of the first data type within blocks whose addresses are within the first contiguous range of addresses, storing, with the processing unit, at least one of graphics data of the second data type or addresses of the graphics data of the second data type within blocks whose addresses are within the second contiguous range of addresses, and storing, in a plurality of cache lines of a common memory cache, the graphics data of the first data type and the graphics data of the second data type.
- this disclosure describes an apparatus comprising a common memory cache that includes a plurality of cache lines, and a processing unit configured to assign a first contiguous range of addresses for a first data type for graphics processing, and assign a second contiguous range of addresses for a second data type for graphics processing, wherein the first and second data types are different data types, store at least one of graphics data of the first data type or addresses of the graphics data of the first data type within blocks whose addresses are within the first contiguous range of addresses, store at least one of graphics data of the second data type or addresses of the graphics data of the second data type within blocks whose addresses are within the second contiguous range of addresses, and store, in the plurality of cache lines of the common memory cache, the graphics data of the first data type and the graphics data of the second data type.
- this disclosure describes a computer-readable storage medium comprising instructions that cause one or more processing units to assign a first contiguous range of addresses for a first data type for graphics processing, and assign a second contiguous range of addresses for a second data type for graphics processing, wherein the first and second data types are different data types, store at least one of graphics data of the first data type or addresses of the graphics data of the first data type within blocks whose addresses are within the first contiguous range of addresses, store at least one of graphics data of the second data type or addresses of the graphics data of the second data type within blocks whose addresses are within the second contiguous range of addresses, and store, in a plurality of cache lines of a common memory cache, the graphics data of the first data type and the graphics data of the second data type.
- this disclosure describes an apparatus comprising means for assigning a first contiguous range of addresses for a first data type for graphics processing, and assigning a second contiguous range of addresses for a second data type for graphics processing, wherein the first and second data types are different data types, means for storing at least one of graphics data of the first data type or addresses of the graphics data of the first data type within blocks whose addresses are within the first contiguous range of addresses, means for storing at least one of graphics data of the second data type or addresses of the graphics data of the second data type within blocks whose addresses are within the second contiguous range of addresses, and means for storing, in a plurality of cache lines of a common memory cache, the graphics data of the first data type and the graphics data of the second data type.
- FIG. 1 is a block diagram illustrating a device that may be configured to implement aspects of this disclosure.
- FIG. 2 is a block diagram illustrating some of the components of FIG. 1 in greater detail.
- FIG. 3 is another block diagram illustrating some of the components of FIG. 1 in greater detail.
- FIG. 4 is a flow diagram illustrating an example operation of a device that may be configured to implement aspects of this disclosure.
- FIG. 5 is a flow diagram illustrating an example technique to determine which ones of a plurality of cache lines of a common memory cache are associated with a particular data type.
- FIG. 6 is a flow diagram illustrating an example technique performed by a processing unit.
- aspects of this disclosure may be related to efficient storage of graphics data of various data types for graphics processing. For purposes of illustration, aspects of this disclosure are described in the context where the data is used for graphics processing. However, aspects of this disclosure may be extendable to systems other than graphics processing systems.
- the techniques of this disclosure may be generally applicable to computer graphics systems such as desktop computers and laptop computers that provide video or image content, digital media players, set-top boxes, mobile video reception devices such as mobile telephones, personal digital assistants (PDAs), video gaming consoles that include video displays, mobile video conferencing units, and the like.
- a graphics processing unit (GPU) within a device may process graphics data of various data types to generate viewable content that is displayed on the device.
- the GPU is one example of a processing unit.
- the various data types for graphics processing may include, but are not limited to, texture data, vertex data, instructions, constants, and pixel data.
- the graphics data of the various data types may be stored in a storage device of the device. There may be more data types for graphics processing than the examples provided above.
- the device also includes an input/output memory management unit (IOMMU).
- the IOMMU may provide the GPU with a virtualized address space to storage blocks of the storage device.
- the IOMMU may include a plurality of address blocks. Each address block may store an address for where the graphics data of the various data types is stored in the storage device. Each address block of the IOMMU may be individually accessible by the GPU.
- a processor within the device may fragment the address space of the IOMMU into a plurality of segments.
- the processor may be one example of a processing unit.
- Each segment may include a plurality of address blocks which may be addressable with contiguous addresses.
- the processor may assign each segment to store addresses for where graphics data of a particular data type is stored in the storage device. For example, a first segment of the address space of the IOMMU may include address blocks that are addressable with contiguous addresses 0-15, a second segment of the address space of the IOMMU may include address blocks that are addressable with contiguous addresses 16-31, and so forth.
- the processor may assign address blocks, addressable by contiguous addresses 0-15 of the first segment, to store addresses for where graphics texture data is stored in the storage device.
- the processor may assign address blocks, addressable by contiguous addresses 16-31 of the second segment, to store addresses for where graphics vertex data is stored in the storage device, and so forth.
- the contiguous addresses for the address blocks, e.g., 0-15 and 16-31, are provided for illustration purposes only and are not limiting.
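The per-data-type assignment of contiguous address ranges described above can be sketched as follows. This is an illustrative model only: the segment size of 16 blocks, the data-type names, and the function name are assumptions mirroring the 0-15 and 16-31 example ranges, not details from the disclosure.

```python
# Hypothetical sketch: assign each graphics data type a contiguous range of
# block addresses, one segment per data type, in the order given.
SEGMENT_SIZE = 16  # illustrative segment size, matching the 0-15 example

def assign_segments(data_types):
    """Assign each data type a contiguous range of block addresses."""
    segments = {}
    base = 0
    for data_type in data_types:
        # The range includes base and excludes base + SEGMENT_SIZE.
        segments[data_type] = range(base, base + SEGMENT_SIZE)
        base += SEGMENT_SIZE
    return segments

segments = assign_segments(["texture", "vertex"])
# "texture" is assigned addresses 0-15 and "vertex" addresses 16-31
```

With this scheme, a processing unit needs to track only the range bounds per data type, not the individual block addresses.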
- the IOMMU may appear to be the device that stores the graphics data of the various data types for graphics processing. For example, when the processor or GPU reads or writes data, the processor or GPU reads or writes data as if it is being read from or written to the IOMMU.
- the IOMMU may maintain a map of where the read or written data is actually stored in the storage device. The map of where the data is actually stored in the storage device may be considered as a virtual address space.
- the processor within the device may fragment the storage space of the storage device, rather than the address space of the IOMMU, into a plurality of segments.
- the IOMMU may not be needed, although aspects of this disclosure should not be considered so limited.
- Each segment may include a plurality of storage blocks which may be addressable by contiguous addresses.
- the processor may assign each segment to store graphics data of a particular data type. For example, a first segment of the storage device may include storage blocks which may be addressable by contiguous addresses 0-15, a second segment of the storage device may include storage blocks which may be addressable by contiguous addresses 16-31, and so forth.
- the processor may assign storage blocks, addressable by contiguous addresses 0-15 of the first segment, to store graphics pixel data.
- the processor may assign storage blocks, addressable by contiguous addresses 16-31 of the second segment, to store instructions for graphics processing, and so forth.
- the contiguous addresses for the storage blocks, e.g., 0-15 and 16-31, are provided for illustration purposes only and are not limiting.
- the device may also include a common memory cache.
- the common memory cache may include a plurality of cache lines, where each cache line may be configured to store graphics data for any of the data types for graphics processing.
- the common memory cache may be configured to store texture data, vertex data, instructions, constants, and pixel data within the one or more cache lines of the common memory cache.
- a cache line may be considered as a fixed-size block of memory for storage.
- the common memory cache may store graphics data for quick access by the processor or GPU.
- Each cache line within the common memory cache may include at least two fields.
- a first field may store an address to one of the address blocks of the IOMMU or an address to one of the storage blocks of the storage device.
- the address block of the IOMMU may include the address within the storage device where the graphics data of a data type is stored.
- a second field of the cache line may store the actual graphics data.
- when graphics data of a particular data type is updated in the storage device, the processor may need to invalidate some of the cache lines within the common memory cache that store the graphics data for that data type.
- to invalidate a cache line, the processor may store a null data value in the second field of that cache line. Invalidating the cache lines within the common memory cache may indicate to the GPU that graphics data stored in an invalidated cache line is not current. This may cause the GPU to retrieve the graphics data for the data type from the storage device, rather than from the common memory cache, because the cache lines that store the graphics data do not store current data.
- the GPU may determine which cache lines store graphics data of that data type. To determine which cache lines store graphics data of that data type, the GPU may interrogate the first data field of each of the cache lines to determine whether the first data field of each of the cache lines stores an address that is within the assigned contiguous range of addresses for the address blocks or storage blocks, within the IOMMU or storage device, respectively, for that data type.
- assume, for example, that the graphics data type is texture data for graphics processing.
- assume also that the processor assigned the contiguous IOMMU addresses 0-15 to the address blocks that store the addresses for where the texture data is stored in the storage device.
- in this example, the processor may invalidate each cache line, in the common memory cache, that stores texture data.
- to do so, the GPU may determine whether the first field of each of the cache lines stores an address that is within 0-15. If a cache line stores an address that is within 0-15, the GPU may invalidate that cache line.
- the GPU may invalidate one or more of the cache lines that store graphics data for that particular data type.
- the GPU may not invalidate any of the other cache lines in the common memory cache. For example, the GPU may not invalidate any of the cache lines that do not store data for that particular data type.
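The selective invalidation described above can be modeled in a short sketch. Each cache line has a first field (an IOMMU or storage device address) and a second field (the graphics data); only the lines whose address falls within the invalidated data type's contiguous range receive a null data value. The class name, function name, and use of `None` as the null value are assumptions for this illustration.

```python
# Hypothetical model of selective cache-line invalidation by address range.
class CacheLine:
    def __init__(self, address, data):
        self.address = address  # first field: address of the backing block
        self.data = data        # second field: the cached graphics data

def invalidate_range(cache_lines, addr_range):
    """Invalidate only the cache lines whose address is in addr_range."""
    for line in cache_lines:
        if line.address in addr_range:
            line.data = None  # a null data value marks the line as not current

cache = [CacheLine(3, "tex0"), CacheLine(20, "vtx0"), CacheLine(9, "tex1")]
invalidate_range(cache, range(0, 16))  # texture segment: addresses 0-15
# the lines at addresses 3 and 9 are invalidated; the line at 20 is untouched
```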
- FIG. 1 is a block diagram illustrating a device 10 that may be configured to implement aspects of this disclosure.
- Examples of device 10 include, but are not limited to, mobile wireless telephones, personal digital assistants (PDAs), video gaming consoles that include video displays, mobile video conferencing units, laptop computers, desktop computers, tablet computers, television set-top boxes, digital media players, and the like.
- Device 10 may include processor 12, graphics processing unit (GPU) 14, display 16, display buffer 18, storage device 20, transceiver module 22, user interface 24, common memory cache 26, and input/output memory management unit (IOMMU) 28.
- Processor 12 and GPU 14 may each be examples of a processing unit.
- Device 10 may include additional modules or units not shown in FIG. 1 for purposes of clarity.
- device 10 may include a speaker and a microphone, neither of which are shown in FIG. 1 , to effectuate telephonic communications in examples where device 10 is a mobile wireless telephone.
- the various modules and units shown in device 10 may not be necessary in every example of device 10 .
- user interface 24 and display 16 may be external to device 10 in examples where device 10 is a desktop computer.
- IOMMU 28 may not be necessary in every example, as described in more detail below.
- although processor 12, GPU 14, common memory cache 26, and IOMMU 28 are illustrated as separate units, aspects of this disclosure are not so limited.
- GPU 14, common memory cache 26, and IOMMU 28 may be formed within processor 12, e.g., one processing unit may include processor 12 and GPU 14, as well as common memory cache 26 and IOMMU 28.
- processor 12 may include IOMMU 28 and GPU 14 may include common memory cache 26.
- Different combinations of the configuration of processor 12, GPU 14, common memory cache 26, and IOMMU 28 may be possible, and aspects of this disclosure contemplate the different combinations.
- examples of processor 12 and GPU 14, which may each be considered a processing unit, include, but are not limited to, a digital signal processor (DSP), general purpose microprocessor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), or other equivalent integrated or discrete logic circuitry.
- Storage device 20 may comprise one or more computer-readable storage media.
- Examples of storage device 20 include, but are not limited to, a random access memory (RAM), a read only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer or a processor.
- storage device 20 may include instructions that cause processor 12 and/or GPU 14 to perform the functions ascribed to processor 12 and GPU 14 in this disclosure.
- Transceiver module 22 may include circuitry to allow wireless or wired communication between device 10 and another device or a network. Transceiver module 22 may include modulators, demodulators, amplifiers and other such circuitry for wired or wireless communication.
- Processor 12 may execute one or more applications. Examples of the applications include web browsers, e-mail applications, spreadsheets, video games, or other applications that generate viewable images for presentment.
- the one or more applications may be stored within storage device 20 .
- processor 12 may download the one or more applications via transceiver module 22 .
- Processor 12 may execute the one or more applications based on a selection by a user via user interface 24 . In some examples, processor 12 may execute the one or more applications without user interaction.
- Each of the viewable images generated by processor 12 may be two dimensional (2-D) or three dimensional (3-D) images formed with a plurality of polygons, which may be referred to as primitives.
- Processor 12 may determine the coordinates for the vertices of the polygons.
- one example of a polygon is a triangle, although polygons should not be considered limited to triangles.
- examples in this disclosure are described in the context of the polygons being triangles.
- processor 12 may determine the coordinates for the three vertices for each triangle.
- the coordinates for each vertex for each triangle may comprise an x and y coordinate.
- the coordinates for each vertex for each triangle may comprise an x, y, z, and w coordinate, where the w coordinate is a homogenous coordinate which may be beneficial to identify a vertex that is infinitely far away.
- the determined vertex coordinates for polygons are referred to as vertex data.
- Processor 12 may store the vertex data in storage device 20 .
- Processor 12 may also determine various other attributes for the determined vertices. For example, for each vertex, processor 12 may determine color values, which are referred to as pixel data. Each color value may include three or four components, e.g., red, green, and blue components, or red, green, blue, and transparency factor. Additional color coordinates may be used in some devices. Processor 12 may store the pixel data in storage device 20 .
- Processor 12 may also store a texture image in storage device 20 .
- a texture image may be an image that is applied to the polygons to make the polygons appear more realistic.
- the texture image may generally be a two dimensional array of texture data; however, the texture image may also be a one or three dimensional array of texture data. For purposes of illustration, aspects of this disclosure are described in the context of a two dimensional array of texture data.
- Processor 12 may store texture data within the two dimensional array based on coordinates of the array.
- the coordinates of the array may be (u, v), where the coordinate u is along the x-axis of the two dimensional array, and the coordinate v is along the y-axis of the two dimensional array.
- processor 12 may store the texture data, in storage device 20 , for a texture image that is to be applied to a polygon at locations within the array that correspond to the coordinates of the polygon.
- the vertex data, pixel data, and texture data are examples of different data types for graphics processing which GPU 14 may use to render an image on display 16 .
- GPU 14 may also utilize graphics data of other data types for graphics processing to render the image on display 16 .
- GPU 14 may utilize rendering instructions stored in storage device 20 to render the image on display 16.
- the instructions stored in storage device 20 may be another example of a data type for graphics processing.
- GPU 14 may utilize constants stored in storage device 20 to render the image on display 16.
- the constants stored in storage device 20 may be another example of a data type for graphics processing.
- GPU 14 may implement a graphics pipeline that uses graphics data from the various data types to render the image.
- the graphics pipeline may be implemented via at least some hardware.
- the graphics pipeline may be implemented as software executing on GPU 14 , firmware executing on GPU 14 , one or more hardware units formed on GPU 14 , or a combination thereof.
- the graphics pipeline may include multiple components.
- the graphics pipeline of GPU 14 may include a vertex shader that retrieves the vertex data, transforms the coordinates of the vertices into another coordinate system, and calculates light values for the vertices.
- the graphics pipeline of GPU 14 may also include a primitive assembler that determines the depth (z) of pixels within a polygon.
- the graphics pipeline of GPU 14 may also include an early depth test engine that removes pixels within the polygons from further processing if those pixels are subsequently occluded by other pixels.
- the graphics pipeline of GPU 14 may also include a pixel shader that interpolates color values for pixels within a polygon based on the color values for the vertices of the polygon.
- the various components of the graphics pipeline are provided for illustration purposes and should not be considered limiting. In alternate examples, the graphics pipeline may include more components than those described above. The graphics pipeline may also include fewer components than those described above.
- GPU 14 may output the rendered portions of the image to display buffer 18 .
- Display buffer 18 may temporarily store the rendered image until the entire image is rendered.
- Display buffer 18 may be considered as a frame buffer.
- Display buffer 18 may then transmit the rendered image to be displayed on display 16 .
- GPU 14 may output the rendered portions of the image directly to display 16 for display, rather than temporarily storing the image in display buffer 18 .
- Display 16 may comprise a liquid crystal display (LCD), a cathode ray tube (CRT) display, a plasma display, or another type of display device.
- Storage device 20 may include a plurality of storage blocks. Each storage block may store graphics data of the various data types for graphics processing. Each storage block may be individually addressable. In some examples, processor 12 may fragment the storage space provided by storage device 20 into data storage segments. Each segment may include a plurality of data storage blocks that are each addressable by contiguous memory addresses. Hence, each data storage segment may be defined by a range of contiguous memory addresses.
- Processor 12 may assign each data storage segment for a particular data type. For example, processor 12 may assign a first segment to store graphics data of a first data type, and assign a second segment to store graphics data of a second data type.
- the storage blocks of the first segment may be addressable by contiguous addresses.
- processor 12 may assign a first range of contiguous addresses for the blocks in the first segment. The addresses of each storage block of the first segment may be within the first range of contiguous addresses. Similarly, the addresses of each storage block of the second segment may be within a second range of contiguous addresses assigned by processor 12.
- device 10 may include IOMMU 28 .
- IOMMU 28 may not be necessary in every example of device 10 .
- IOMMU 28 may not be necessary in examples where processor 12 assigns contiguous addresses for the graphics data, of the various data types, stored in storage device 20 .
- aspects of this disclosure should not be considered so limited.
- Device 10 may include IOMMU 28 even in examples where processor 12 assigns contiguous addresses for the graphics data, of the various data types, stored in storage device 20 .
- IOMMU 28 may be implemented as one or more hardware units, software executing on the hardware units, firmware executing on the hardware units, or any combination thereof.
- in examples where IOMMU 28 is one or more hardware units, examples of IOMMU 28 may include, but are not limited to, a DSP, general purpose microprocessor, ASIC, FPGA, or other equivalent integrated or discrete logic circuitry.
- IOMMU 28 may provide GPU 14 with a virtualized address space to storage blocks of storage device 20 such that IOMMU 28 appears, e.g., to GPU 14 , to be the device that stores the graphics data of the various data types.
- IOMMU 28 may be a hardware component responsible for handling accesses to storage device 20 requested by processor 12 and/or GPU 14 .
- IOMMU 28 may include a table or other data structure, e.g., registers, indicating a plurality of address blocks. Each address block may store an address of one of the storage blocks of storage device 20 . Each address block may be individually addressable.
- when processor 12 or GPU 14 desires to read or write data, software executing on processor 12 or GPU 14 calls out an address of one of the address blocks of IOMMU 28 as if processor 12 or GPU 14 is reading from or writing to the address block, which is called out by its address.
- IOMMU 28 may then determine which storage block corresponds to the address stored in the address block of IOMMU 28 .
- IOMMU 28 may then write to or read from the storage block that corresponds to the address stored in the address block of IOMMU 28 .
- storage device address or addresses may refer to an address or addresses of storage blocks within storage device 20 .
- IOMMU address or addresses may refer to an address or addresses of address blocks within IOMMU 28 .
- each address block of IOMMU 28 may store a storage device address.
- IOMMU 28 may map IOMMU addresses to storage device addresses. In this manner, when processor 12 or GPU 14 calls out an IOMMU address, IOMMU 28 may determine the storage device address for the storage block that stores the graphics data based on the mapping by determining a storage device address that corresponds to the IOMMU address, e.g., as indicated by a table, register, matrix or other data structure.
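The IOMMU indirection described above can be modeled as a simple lookup: each IOMMU address block holds a storage device address, and a read through the IOMMU resolves the IOMMU address to the backing storage block. The dictionaries, the example addresses, and the function name below are assumptions for this sketch, not the patent's implementation.

```python
# Hypothetical model of IOMMU address translation: the map plays the role of
# the table of address blocks, each holding a storage device address.
iommu_map = {0: 0x400, 1: 0x7C0, 16: 0x100}  # IOMMU address -> storage device address
storage = {0x400: "texture bytes", 0x7C0: "more texture", 0x100: "vertex bytes"}

def read_through_iommu(iommu_address):
    """Return the data in the storage block mapped by an IOMMU address block."""
    storage_address = iommu_map[iommu_address]  # resolve the virtual address
    return storage[storage_address]             # access the backing storage

# reading IOMMU address 0 returns the data held at storage device address 0x400
```

In this model, a caller never sees the storage device addresses directly; it works entirely in terms of IOMMU addresses, matching the virtualized address space described above.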
- processor 12 may fragment the memory space of storage device 20 into segments. Each segment may include a plurality of storage blocks that are each addressable by contiguous storage device addresses. In some examples, processor 12 may fragment the address space of IOMMU 28 into segments. Each segment may include a plurality of address blocks that are each addressable by contiguous IOMMU addresses.
- processor 12 may assign a first range of contiguous IOMMU addresses to address blocks within IOMMU 28 .
- the first range of contiguous IOMMU addresses are for address blocks that store the addresses for where graphics data of the first data type is stored in storage device 20 .
- Processor 12 may also assign a second range of contiguous IOMMU addresses to address blocks within IOMMU 28 .
- the second range of contiguous IOMMU addresses are for address blocks that store the addresses for where graphics data of the second data type is stored in storage device 20 .
- Processor 12 may similarly assign ranges of contiguous IOMMU addresses for the various data types for graphics processing.
- assigning contiguous storage device addresses for graphics data of each data type, or assigning contiguous IOMMU addresses for graphics data of each data type may be advantageous.
- processor 12 and GPU 14 may not need to track the exact addresses where graphics data for each of the data types is stored in storage device 20 .
- Processor 12 and GPU 14 may only track the range of contiguous storage device addresses or IOMMU addresses.
- Each range of contiguous storage device addresses or IOMMU addresses may be considered as a sub-heap.
- Processor 12 may be considered as an allocator that assigns sub-heaps, e.g., ranges of contiguous storage device addresses or IOMMU addresses, to storage blocks within storage device 20 or address blocks within IOMMU 28 .
- assume, for example, that GPU 14 desires to retrieve texture data for graphics processing. Also, assume that the texture data is not stored in storage blocks that are contiguously addressable, as may happen with conventional heap allocators. In this instance, GPU 14 would need to track the storage device addresses for each piece of the texture data, which may be processing inefficient. In aspects of this disclosure, GPU 14 may not need to track each individual storage device address, and may instead track the range of the storage device addresses, which may promote efficient processing.
- processor 12 and GPU 14 may be less likely to retrieve incorrect data.
- Software executing on GPU 14 , such as a shader program, may cause GPU 14 to retrieve data of a particular data type, e.g., texture data.
- the shader program may be written by application developers using the syntax of the OpenCL version 1.1 specification, as one non-limiting example. However, aspects of this disclosure should not be considered limited to examples where the shader program is written in accordance with the OpenCL version 1.1 specification.
- the software executing on GPU 14 may provide GPU 14 with the addresses from where to retrieve the texture data.
- Processor 12 may provide GPU 14 with the range of contiguous IOMMU addresses for the texture data.
- The software executing on GPU 14 may then ensure that the addresses that it provides to GPU 14 for retrieving the texture data are within the range of contiguous IOMMU addresses.
- the shader program may ensure that GPU 14 does not try to access data that is out of range for the type of data that the shader program requested.
- device 10 may also include common memory cache 26 .
- Although common memory cache 26 is illustrated as being external to processor 12 and GPU 14 , in some examples, common memory cache 26 may be formed within a processing unit, e.g., processor 12 or GPU 14 , or partially formed in both processor 12 and GPU 14 .
- Common memory cache 26 may store graphics data for quick access by processor 12 or GPU 14 .
- The processing units, e.g., processor 12 and GPU 14 , may be able to retrieve graphics data from common memory cache 26 more quickly than from storage device 20 .
- Common memory cache 26 may include a plurality of cache lines, where each cache line may be configured to store graphics data for any of the graphics data types for graphics processing. For instance, common memory cache 26 may store texture data, vertex data, instructions, constants, and pixel data within the one or more cache lines of common memory cache 26 . Common memory cache 26 may promote efficient storage because common memory cache 26 is configured to store graphics data for the various data types, rather than requiring multiple caches that each store graphics data for a single data type. For example, device 10 may not need to include a texture data cache, a pixel data cache, a vertex data cache, an instruction cache, and a constant cache because common memory cache 26 can store texture data, pixel data, vertex data, instructions, and constants in the same cache.
- A cache line may be considered as a fixed-size block of memory for storage.
- Each cache line may include two fields.
- A first field may store one of a storage device address or an IOMMU address, e.g., an address of one of the storage blocks of storage device 20 or an address of one of the address blocks of IOMMU 28 .
- a second field may store the actual graphics data of the various data types.
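As a sketch, the two-field cache line described above might be modeled as follows; the class and field names are illustrative assumptions, not structures from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CacheLine:
    """Two-field cache line: an address field holding a storage device
    or IOMMU address, and a data field holding the graphics data."""
    address: int
    data: Optional[bytes]

    def invalidate(self):
        # Invalidation stores a null value in the second field, signaling
        # that the line no longer holds current data.
        self.data = None

# Illustrative usage mirroring cache line 42A of FIG. 2: address field 1,
# data field holding the second texture data.
line_42a = CacheLine(address=1, data=b"second texture data")
line_42a.invalidate()
```

After invalidation the address field is unchanged; only the data field is nulled, so a later lookup sees a cache miss for that line.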
- When graphics data of a particular data type changes, processor 12 may need to invalidate cache lines within common memory cache 26 that store graphics data of that data type. For example, some of the cache lines within common memory cache 26 may store texture data for a current texture image. When processor 12 stores a new texture image, processor 12 may need to invalidate these cache lines within common memory cache 26 that store texture data because the texture image changed. As another example, some of the cache lines within common memory cache 26 may store vertex data for a current polygon that is being processed. When GPU 14 needs to process another polygon, processor 12 may need to invalidate these cache lines within common memory cache 26 that store vertex data because the vertices of the current polygon are no longer being processed.
- To invalidate a cache line, processor 12 or GPU 14 may store a null data value in the second field of that cache line. Invalidating the cache lines within common memory cache 26 may indicate to processor 12 and GPU 14 that data stored in an invalidated cache line is not current.
- A “cache miss” may occur when processor 12 or GPU 14 retrieves data from an invalidated cache line.
- processor 12 or GPU 14 may retrieve the graphics data from storage device 20 because the invalidated cache lines in common memory cache 26 do not store current data.
- processor 12 or GPU 14 may also store the retrieved graphics data in cache lines of common memory cache 26 for quick subsequent access.
- Processor 12 or GPU 14 may determine which cache lines of common memory cache 26 should be invalidated based on which data type changed. For example, if the texture image changed, GPU 14 may determine which cache lines of common memory cache 26 store texture data. In this example, GPU 14 may invalidate one or more cache lines that store texture data, and may not invalidate any of the cache lines that store data for data types other than texture data.
- In this manner, processor 12 or GPU 14 may invalidate only some of the cache lines of common memory cache 26 , rather than invalidating all of common memory cache 26 every time graphics data for a particular data type changes. Not invalidating all of the cache lines of common memory cache 26 may further promote efficient storage because processor 12 or GPU 14 may not need to retrieve all of the graphics data that was previously stored in common memory cache 26 after each invalidation. Processor 12 or GPU 14 may only need to retrieve graphics data that changed and only store the changed data into common memory cache 26 .
- To invalidate cache lines that store graphics data of a particular data type, processor 12 or GPU 14 may determine which cache lines of common memory cache 26 store graphics data of that data type. For instance, GPU 14 may interrogate the first data field of each of the cache lines to determine whether it stores an address that is within the assigned contiguous range of storage device addresses or IOMMU addresses for that data type. For example, assume the graphics data type is pixel data. Further assume that processor 12 assigned address blocks, within IOMMU 28 , the contiguous IOMMU addresses 16 - 31 . The address blocks at contiguous IOMMU addresses 16 - 31 may each store addresses of storage blocks of storage device 20 where the pixel data is stored, in this example.
- GPU 14 may need to retrieve the pixel data from storage device 20 .
- Processor 12 may invalidate a sufficient number of cache lines, in common memory cache 26 , that store pixel data so that processor 12 can store the retrieved pixel data.
- Processor 12 may determine whether the first field, of each of the cache lines, stores an address that is within 16 - 31 . If a cache line stores an address that is within 16 - 31 , processor 12 may invalidate that cache line.
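A minimal sketch of this range-based invalidation, using the pixel-data assumption above (IOMMU addresses 16 - 31 ); the list-of-pairs cache representation is an illustrative assumption.

```python
def invalidate_by_range(cache_lines, address_range):
    """Invalidate every cache line whose address field (element 0) falls
    within the contiguous range assigned to the changed data type.

    cache_lines: list of [address, data] pairs (first field, second field).
    Returns the number of cache lines invalidated.
    """
    invalidated = 0
    for line in cache_lines:
        if line[0] in address_range:
            line[1] = None  # null data value marks the line as invalid
            invalidated += 1
    return invalidated

# Pixel data assigned IOMMU addresses 16-31, as in the example above.
lines = [[17, "pixel A"], [3, "vertex A"], [20, "pixel B"]]
count = invalidate_by_range(lines, range(16, 32))
```

Only the two pixel-data lines are invalidated; the vertex-data line is untouched, which is the selective behavior the disclosure attributes to contiguous per-type ranges.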
- Because processor 12 may assign contiguous storage device addresses or IOMMU addresses, processor 12 may more easily determine whether a cache line stores graphics data of a particular data type. For example, if the storage device addresses or the IOMMU addresses were not contiguous, then processor 12 would need to track every single storage device address or IOMMU address for that particular data type. Processor 12 would then need to compare the address stored in the first data field of a cache line with every single storage device address or IOMMU address for that particular data type to determine whether the cache line stores graphics data of that particular data type.
- In aspects of this disclosure, processor 12 may compare the address stored in the first field of each cache line to the range of contiguous storage device addresses or IOMMU addresses, rather than to individual storage device addresses or IOMMU addresses, to determine whether a cache line stores graphics data of a particular data type.
- FIG. 2 is a block diagram illustrating some of the components of FIG. 1 in greater detail.
- FIG. 2 illustrates common memory cache 26 and storage device 20 , of FIG. 1 , in greater detail.
- storage device 20 includes twelve storage blocks 0 - 11 (collectively referred to as “storage blocks”).
- Storage device 20 may include more or fewer storage blocks than twelve storage blocks.
- Storage device 20 may be a 4 gigabyte (GB) storage device; however, aspects of this disclosure are not so limited.
- storage blocks 0 - 11 are not ordered consecutively. This is to illustrate that, in some examples, storage blocks 0 - 11 do not necessarily have to be contiguous on storage device 20 , although it may be possible for storage blocks 0 - 11 to be contiguous on storage device 20 .
- Each one of storage blocks 0 - 11 may be individually addressable by its address.
- FIG. 2 illustrates storage device addresses 0 - 11 .
- Storage device addresses 0 - 11 may be fragmented, i.e., divided, into segments where each segment comprises a range of storage device addresses.
- Processor 12 may assign each range of storage device addresses to a particular data type. The range of storage device addresses, for each data type, may be contiguous.
- range of storage device addresses 32 may be contiguous storage device addresses for storage blocks that store texture data, e.g., storage blocks 4 , 8 , and 1 .
- Range of storage device addresses 32 may include storage addresses 0 - 2 .
- Range of storage device addresses 34 may be contiguous storage device addresses for storage blocks that store vertex data, e.g., storage blocks 10 and 5 .
- Range of storage device addresses 34 may include storage addresses 3 and 4 .
- Range of storage device addresses 36 may be contiguous storage device addresses for storage blocks that store instructions, e.g., storage blocks 2 and 9 .
- Range of storage device addresses 36 may include storage addresses 5 and 6 .
- Range of storage device addresses 38 may be contiguous storage device addresses for storage blocks that store constants, e.g., storage blocks 0 and 6 . Range of storage device addresses 38 may include storage addresses 7 and 8 . Range of storage device addresses 40 may be contiguous storage device addresses for storage blocks that store pixel data, e.g., storage blocks 11 , 3 , and 7 . Range of storage device addresses 40 may include storage addresses 9 - 11 . There may be more or fewer texture data, vertex data, instructions, constants, and pixel data than illustrated in FIG. 2 .
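The FIG. 2 layout can be restated as a small lookup table: contiguous storage device addresses (keys) name physically non-contiguous storage blocks (values). This is a sketch of the figure's example only.

```python
# FIG. 2 example: contiguous storage device addresses -> storage blocks.
ADDRESS_TO_BLOCK = {
    0: 4, 1: 8, 2: 1,      # range 32: texture data (addresses 0-2)
    3: 10, 4: 5,           # range 34: vertex data (addresses 3-4)
    5: 2, 6: 9,            # range 36: instructions (addresses 5-6)
    7: 0, 8: 6,            # range 38: constants (addresses 7-8)
    9: 11, 10: 3, 11: 7,   # range 40: pixel data (addresses 9-11)
}

# The address space is contiguous per data type even though the
# underlying storage blocks are scattered.
texture_blocks = [ADDRESS_TO_BLOCK[a] for a in range(0, 3)]
```

The key point the table makes concrete: a processing unit can enumerate all texture data by walking addresses 0 - 2 , without knowing or caring that the blocks themselves are 4 , 8 , and 1 .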
- common memory cache 26 may include cache lines 42 A- 42 F (collectively referred to as “cache lines 42 ”). There may be more or fewer cache lines 42 than illustrated in FIG. 2 .
- common memory cache 26 may be a level 2 (L2) cache.
- Common memory cache 26 may be a 32 kilobyte (KB) 8-way set associative L2 cache with fast range invalidate.
- Each one of cache lines 42 may include an address field 30 A and a data field 30 B.
- Address field 30 A may indicate the storage device address for the graphics data stored in data field 30 B.
- address field 30 A of cache line 42 A indicates that the address for the graphics data stored in data field 30 B is 1.
- Data field 30 B of cache line 42 A stores the second texture data, which corresponds to storage block 8 because the storage device address of storage block 8 is 1 .
- Cache lines 42 of common memory cache 26 that store the same data type may be considered as a set of cache lines 42 .
- cache line 42 B and cache line 42 E each store vertex data.
- cache line 42 B and cache line 42 E may form a set of cache lines that store vertex data.
- cache line 42 C, cache line 42 D, and cache line 42 F each store pixel data.
- cache line 42 C, cache line 42 D, and cache line 42 F may form a set of cache lines that store pixel data.
- A set of cache lines does not imply that only cache lines in that set can store a particular graphics data type. For instance, as illustrated in FIG. 2 , cache lines 42 B and 42 E store vertex data. However, in alternate examples, cache lines 42 B and 42 E may store other graphics data types, e.g., texture data, pixel data, instruction data, and constant data. The phrase “a set of cache lines” is used to indicate a group of cache lines 42 that store similar graphics data types.
- a set of cache lines may not be contiguous.
- cache lines 42 B and 42 E form a set of cache lines, but are not contiguous.
- cache lines 42 C, 42 D, and 42 F form a set of cache lines, but are not contiguous. In alternate examples, it may be possible for a set of cache lines to be contiguous.
- processor 12 may invalidate one of cache lines 42 if the graphics data changes for the type of data that the one of cache lines 42 stores. For example, if the texture image changes, processor 12 may invalidate one or more cache lines of cache lines 42 that store texture data. To determine whether one of cache lines 42 stores texture data, processor 12 may compare address field 30 A of each of cache lines 42 to the contiguous range of storage device addresses assigned to texture data.
- In the example of FIG. 2 , processor 12 assigned the contiguous range of storage device addresses 32 , which includes contiguous storage device addresses 0 - 2 , to texture data.
- processor 12 may compare address field 30 A of each of cache lines 42 to determine whether it is within the contiguous range of storage device addresses 32 .
- Address field 30 A of cache line 42 A is within the contiguous range of storage device addresses 32 , e.g., 1 is within 0-2.
- To invalidate cache line 42 A , processor 12 may replace the second texture data, stored in data field 30 B of cache line 42 A , with a null data value. Processor 12 may then retrieve new texture data from storage device 20 and may store the new texture data in cache line 42 A .
- Processor 12 may not need to store the texture data in cache line 42 A in every example. In some examples, processor 12 may also update address field 30 A of cache line 42 A if the storage device address for the retrieved new texture data is stored at a storage device address that is different than 1.
- FIG. 3 is another block diagram illustrating some of the components of FIG. 1 in greater detail.
- FIG. 3 illustrates common memory cache 26 , IOMMU 28 , and storage device 20 , of FIG. 1 , in greater detail.
- Storage device 20 may be similar to storage device 20 illustrated in FIG. 2 .
- processor 12 may not assign ranges of contiguous storage device addresses.
- The addresses for each one of the storage blocks of storage device 20 are not ordered consecutively.
- the storage device address for a storage block corresponds to the identifier of the storage block.
- the storage device address for storage block 4 of storage device 20 is 4, the storage device address for storage block 8 of storage device 20 is 8, and so forth.
- the storage device address for a storage block is not limited to the identifier of the storage block.
- FIG. 3 illustrates IOMMU 28 in greater detail.
- IOMMU 28 includes twelve address blocks 0 - 11 .
- IOMMU 28 may include more or fewer address blocks than twelve address blocks.
- IOMMU 28 may provide GPU 14 with a virtualized address space to storage blocks 0 - 11 of storage device 20 .
- Each one of address blocks 0 - 11 may be individually addressable by its address.
- FIG. 3 illustrates IOMMU addresses 0 - 11 .
- IOMMU addresses 0 - 11 may be fragmented into segments where each segment comprises a range of IOMMU addresses.
- Processor 12 may assign each range of IOMMU addresses to a particular data type.
- the range of IOMMU addresses, for each data type, may be contiguous.
- Range of IOMMU addresses 44 may be contiguous IOMMU addresses for address blocks that store storage device addresses for storage blocks of storage device 20 that store texture data, e.g., storage device addresses 4 , 8 , and 1 .
- Range of IOMMU addresses 44 may include IOMMU addresses 0 - 2 .
- Range of IOMMU addresses 46 may be contiguous IOMMU addresses for address blocks that store storage device addresses for storage blocks of storage device 20 that store vertex data, e.g., storage device addresses 10 and 5 .
- Range of IOMMU addresses 46 may include IOMMU addresses 3 and 4 .
- Range of IOMMU addresses 48 may be contiguous IOMMU addresses for address blocks that store storage device addresses for storage blocks of storage device 20 that store instructions, e.g., storage device addresses 2 and 9 . Range of IOMMU addresses 48 may include IOMMU addresses 5 and 6 . Range of IOMMU addresses 50 may be contiguous IOMMU addresses for address blocks that store storage device addresses for storage blocks of storage device 20 that store constants, e.g., storage device addresses 0 and 6 . Range of IOMMU addresses 50 may include IOMMU addresses 7 and 8 . Range of IOMMU addresses 52 may be contiguous IOMMU addresses for address blocks that store storage device addresses for storage blocks of storage device 20 that store pixel data, e.g., storage device addresses 11 , 3 , and 7 . Range of IOMMU addresses 52 may include IOMMU addresses 9 - 11 . As with FIG. 2 , there may be more or fewer texture data, vertex data, instructions, constants, and pixel data than illustrated in FIG. 3 .
- common memory cache 26 may include cache lines 42 .
- each one of cache lines 42 may include address field 30 A and data field 30 B.
- Address field 30 A may indicate the IOMMU address for the address block of IOMMU 28 that stores the address for where the graphics data stored in data field 30 B is stored in storage device 20 .
- address field 30 A of cache line 42 A indicates that the address for the address block of IOMMU 28 is 1.
- IOMMU address 1 is for address block 1 of IOMMU 28 .
- Address block 1 of IOMMU 28 stores storage device address 8 .
- Storage device address 8 corresponds to storage block 8 in storage device 20 . As illustrated in FIG. 3 , storage block 8 of storage device 20 stores the second texture data. Data field 30 B of cache line 42 A stores the second texture data, which corresponds to storage block 8 with storage device address 8 . Storage device address 8 corresponds to address block 1 of IOMMU 28 , and the IOMMU address for address block 1 is 1 , which corresponds to address field 30 A of cache line 42 A .
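The two-level lookup just traced can be sketched as follows, using the FIG. 3 texture example. The table and storage contents are illustrative; only "second texture data" at storage block 8 is named in the description, and the other labels are assumptions.

```python
def resolve(iommu_table, storage_device, iommu_address):
    """Resolve an IOMMU address to graphics data: the address block at
    the IOMMU address holds a storage device address, which in turn
    names the storage block holding the data."""
    storage_device_address = iommu_table[iommu_address]
    return storage_device[storage_device_address]

# Range of IOMMU addresses 44 (texture): address blocks 0-2 hold storage
# device addresses 4, 8, and 1, per FIG. 3.
iommu_table = {0: 4, 1: 8, 2: 1}
storage_device = {4: "first texture data",
                  8: "second texture data",
                  1: "third texture data"}
```

Tracing IOMMU address 1 reproduces the paragraph above: address block 1 holds storage device address 8 , and storage block 8 holds the second texture data.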
- Processor 12 may invalidate one of cache lines 42 of FIG. 3 if the graphics data changes for the type of data that the one of cache lines 42 stores. As before, assume that the texture image changes, and that processor 12 may need to invalidate one or more cache lines of cache lines 42 that store texture data. To determine whether one of cache lines 42 stores texture data, processor 12 may compare address field 30 A of each of cache lines 42 to the contiguous range of IOMMU addresses assigned to texture data.
- In this example, processor 12 assigned the contiguous range of IOMMU addresses 44 , which includes contiguous IOMMU addresses 0 - 2 , to texture data.
- processor 12 may compare address field 30 A of each of cache lines 42 to determine whether it is within the contiguous range of IOMMU addresses 44 .
- Address field 30 A of cache line 42 A in FIG. 3 , is within the contiguous range of IOMMU addresses 44 .
- processor 12 may invalidate cache line 42 A.
- FIG. 4 is a flow diagram illustrating an example operation of device 10 that may be configured to implement aspects of this disclosure.
- A processing unit, e.g., processor 12 or GPU 14 , may perform the example operation of FIG. 4 . Storage blocks of storage device 20 and address blocks of IOMMU 28 may be referred to generally as blocks.
- the processing unit may assign a first contiguous range of addresses for a first data type, and assign a second contiguous range of addresses for a second data type ( 54 ).
- the first and second data types may be different data types for graphics processing.
- one of contiguous range of storage device addresses 32 , 34 , 36 , 38 , and 40 may comprise a first contiguous range of addresses.
- Another one of contiguous range of storage device addresses 32 , 34 , 36 , 38 , and 40 may comprise a second contiguous range of addresses.
- each one of contiguous range of storage device addresses 32 , 34 , 36 , 38 , and 40 is assigned for a particular data type.
- The assigned data type for one of contiguous range of storage device addresses 32 , 34 , 36 , 38 , and 40 may comprise the first data type.
- the assigned data type for another one of contiguous range of storage device addresses 32 , 34 , 36 , 38 , and 40 may comprise the second data type.
- one of contiguous range of IOMMU addresses 44 , 46 , 48 , 50 , and 52 may comprise a first contiguous range of addresses.
- Another one of contiguous range of IOMMU addresses 44 , 46 , 48 , 50 , and 52 may comprise a second contiguous range of addresses.
- each one of contiguous range of IOMMU addresses 44 , 46 , 48 , 50 , and 52 is assigned for a particular data type.
- The assigned data type for one of contiguous range of IOMMU addresses 44 , 46 , 48 , 50 , and 52 may comprise the first data type.
- the assigned data type for another one of contiguous range of IOMMU addresses 44 , 46 , 48 , 50 , and 52 may comprise the second data type.
- the processing unit may store graphics data of the first data type or addresses of the graphics data of the first data type within blocks whose addresses are within the first contiguous range of addresses, and store graphics data of the second data type or addresses of the graphics data of the second data type within blocks whose addresses are within the second contiguous range of addresses ( 56 ).
- processor 12 or GPU 14 may store texture data within storage blocks of storage device 20 whose addresses are within contiguous range of storage device addresses 32 .
- processor 12 or GPU 14 may store graphics data of a particular data type within storage blocks of storage device 20 whose addresses are within contiguous range of addresses that are assigned for that particular data type.
- As another example, the processing unit, e.g., processor 12 or GPU 14 , may store addresses for where graphics data of a particular data type is stored in storage device 20 within address blocks of IOMMU 28 whose addresses are within the contiguous range of addresses that are assigned for that particular data type.
- Processor 12 or GPU 14 may store, in some of the cache lines of a plurality of cache lines of common memory cache 26 , the graphics data of the first data type, and store, in some of the cache lines of the plurality of cache lines of common memory cache 26 , the graphics data of the second data type ( 58 ).
- common memory cache 26 includes cache lines 42 .
- cache lines 42 B and 42 E may be considered as a group of cache lines that store graphics data of a particular data type, e.g., vertex data.
- Cache lines 42 C, 42 D, and 42 F may be considered as another group of cache lines that store graphics data of a particular data type, e.g., pixel data.
- FIG. 5 is a flow diagram illustrating an example technique to determine which ones of the plurality of cache lines of common memory cache 26 are associated with a particular data type.
- For example, a processing unit, e.g., processor 12 or GPU 14 , may compare address field 30 A of cache lines 42 to each one of contiguous range of storage device addresses 32 , 34 , 36 , 38 , and 40 of FIG. 2 .
- processor 12 or GPU 14 may compare address field 30 A of cache lines 42 to each one of contiguous range of IOMMU addresses 44 , 46 , 48 , 50 , and 52 of FIG. 3 .
- Processor 12 or GPU 14 may determine which cache lines of cache lines 42 are associated with which data type based on the comparison ( 62 ). For example, processor 12 or GPU 14 may determine that cache line 42 A is associated with texture data, cache lines 42 B and 42 E are associated with vertex data, and cache lines 42 C, 42 D, and 42 F are associated with pixel data, as illustrated in FIGS. 2 and 3 . In the example of FIG. 2 , processor 12 or GPU 14 may determine that cache line 42 A is associated with texture data because address field 30 A of cache line 42 A corresponds to an address for a storage block of storage device 20 that stores texture data.
- Processor 12 or GPU 14 may determine that cache lines 42 B and 42 E are associated with vertex data because address field 30 A of cache lines 42 B and 42 E correspond to addresses for storage blocks of storage device 20 that store vertex data.
- Processor 12 or GPU 14 may determine that cache lines 42 C, 42 D, and 42 F are associated with pixel data because address field 30 A of cache lines 42 C, 42 D, and 42 F correspond to addresses for storage blocks of storage device 20 that store pixel data.
- Processor 12 or GPU 14 may determine that cache line 42 A is associated with texture data because address field 30 A of cache line 42 A corresponds to an address of an address block of IOMMU 28 that stores an address for where texture data is stored in storage device 20 .
- Processor 12 or GPU 14 may determine that cache lines 42 B and 42 E are associated with vertex data because address field 30 A of cache lines 42 B and 42 E correspond to addresses of address blocks of IOMMU 28 that store addresses for where vertex data is stored in storage device 20 .
- Processor 12 or GPU 14 may determine that cache lines 42 C, 42 D, and 42 F are associated with pixel data because address field 30 A of cache lines 42 C, 42 D, and 42 F correspond to addresses of address blocks of IOMMU 28 that store addresses for where pixel data is stored in storage device 20 .
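The FIG. 5 determination might look like the following sketch; the data structures and the example cache-line contents are assumptions, with ranges taken from the FIG. 2 example.

```python
def classify_cache_lines(cache_lines, type_ranges):
    """Group cache line indices by data type by comparing each line's
    address field against the per-type contiguous address ranges.

    cache_lines: list of (address_field, data_field) pairs.
    type_ranges: dict mapping data type -> contiguous range of addresses.
    """
    groups = {data_type: [] for data_type in type_ranges}
    for index, (address, _data) in enumerate(cache_lines):
        for data_type, addresses in type_ranges.items():
            if address in addresses:
                groups[data_type].append(index)
                break
    return groups

# Ranges from the FIG. 2 example; cache line contents are illustrative.
ranges = {"texture": range(0, 3), "vertex": range(3, 5), "pixel": range(9, 12)}
lines = [(1, "tex"), (3, "vtx"), (10, "pix"), (4, "vtx"), (9, "pix")]
groups = classify_cache_lines(lines, ranges)
```

Because each data type owns one contiguous range, membership is a single range test per type, rather than a comparison against every individual address of that type.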
- FIG. 6 is a flow diagram illustrating an example technique performed by a processing unit, e.g., processor 12 or GPU 14 .
- A processing unit, e.g., processor 12 or GPU 14 , may receive a request for graphics data of the first data type or the second data type. The request may include an address for the first data type or the second data type.
- Software executing on GPU 14 , such as the shader program, may generate a request that causes GPU 14 to retrieve graphics data of a particular data type, e.g., graphics data for the first data type or graphics data for the second data type.
- the software executing on GPU 14 may provide GPU 14 with the addresses from where to retrieve the graphics data of the first or second data type in the request.
- Processor 12 or GPU 14 may determine that the address, within the request for the graphics data of the first or second data type, is within the first contiguous range of addresses or the second contiguous range of addresses, respectively ( 66 ). For example, assume the shader program requested texture data, and included a storage device address or an IOMMU address. In this example, GPU 14 may determine whether the storage device address is within contiguous range of storage device addresses 32 , of FIG. 2 , or determine whether the IOMMU address is within contiguous range of IOMMU addresses 44 , of FIG. 3 . By determining whether the address, in the request, is within the contiguous range of addresses, processor 12 or GPU 14 may ensure that processor 12 or GPU 14 does not inadvertently retrieve incorrect data.
- Processor 12 or GPU 14 may then process the request based on the determination ( 68 ). For example, if the request for the graphics data of the first or second data type is within the first contiguous range of addresses or the second contiguous range of addresses, respectively, processor 12 or GPU 14 may process the request. If, however, the request for the graphics data of the first or second data type is not within the first contiguous range of addresses or the second contiguous range of addresses, respectively, processor 12 or GPU 14 may not process the request.
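The FIG. 6 check might be sketched as follows; the `fetch` callable stands in for the actual cache/storage access path, and the names and ranges here are illustrative assumptions.

```python
def process_request(data_type, address, type_ranges, fetch):
    """Process a retrieval request only if the requested address lies
    within the contiguous range assigned to the requested data type;
    otherwise refuse, so incorrect data is not inadvertently retrieved.

    fetch: callable that retrieves data at a validated address.
    Returns the fetched data, or None for an out-of-range request.
    """
    assigned_range = type_ranges.get(data_type, range(0))
    if address in assigned_range:
        return fetch(address)
    return None  # out-of-range request: do not process

# Illustrative ranges (per the FIG. 2 example) and a stand-in fetch path.
type_ranges = {"texture": range(0, 3), "vertex": range(3, 5)}
fetch = lambda address: f"data@{address}"
```

An in-range texture request is served, while a texture request naming a vertex-range address is rejected rather than returning data of the wrong type.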
- the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions or code on an article of manufacture comprising a non-transitory computer-readable medium.
- Computer-readable media may include computer data storage media or communication media including any medium that facilitates transfer of a computer program from one place to another.
- Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
- Such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
- Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- the code may be executed by one or more processors, such as one or more DSPs, general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry.
- the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
- the techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set).
- Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Description
- This disclosure relates to data storage and, more particularly, to assigning of data storage addresses for graphics processing.
- A device that provides content for visual presentation generally includes a graphics processing unit (GPU). The GPU processes and renders pixels that are representative of the content of an image on a display. To render pixels on the display, the GPU processes various data types. The various types of data are stored in one or more data storage devices. The GPU retrieves the data from the one or more storage devices, and processes the data to render pixels on the display.
- In general, this disclosure describes techniques for efficient storage of data of various data types for graphics processing. A graphics processing unit (GPU) may process the data of the various data types to render an image for display. A processor may store data of the various data types in a storage device, and define segments in which the data of the various data types is stored in the storage device or define segments in which addresses for the data of the various data types is stored in the storage device. Each segment may include a plurality of blocks that are addressable by contiguous addresses.
- The device may also include a common memory cache. The common memory cache may store data of the various data types. The GPU may be able to quickly retrieve data from the common memory cache. In some examples, the GPU and the processor may be able to quickly retrieve data from the common memory cache. In some of the example implementations described in this disclosure, the common memory cache may store data for all of the various data types for graphics processing. For example, the common memory cache may store data of a first data type, and data of a second data type, where the first and second data types are different data types for graphics processing.
- In one example, this disclosure describes a method comprising assigning, with a processing unit, a first contiguous range of addresses for a first data type for graphics processing, and assigning a second contiguous range of addresses for a second data type for graphics processing, wherein the first and second data types are different data types, storing, with the processing unit, at least one of graphics data of the first data type or addresses of the graphics data of the first data type within blocks whose addresses are within the first contiguous range of addresses, storing, with the processing unit, at least one of graphics data of the second data type or addresses of the graphics data of the second data type within blocks whose addresses are within the second contiguous range of addresses, and storing, in a plurality of cache lines of a common memory cache, the graphics data of the first data type and the graphics data of the second data type.
- In another example, this disclosure describes an apparatus comprising a common memory cache that includes a plurality of cache lines, and a processing unit configured to assign a first contiguous range of addresses for a first data type for graphics processing, and assign a second contiguous range of addresses for a second data type for graphics processing, wherein the first and second data types are different data types, store at least one of graphics data of the first data type or addresses of the graphics data of the first data type within blocks whose addresses are within the first contiguous range of addresses, store at least one of graphics data of the second data type or addresses of the graphics data of the second data type within blocks whose addresses are within the second contiguous range of addresses, and store, in the plurality of cache lines of the common memory cache, the graphics data of the first data type and the graphics data of the second data type.
- In another example, this disclosure describes a computer-readable storage medium comprising instructions that cause one or more processing units to assign a first contiguous range of addresses for a first data type for graphics processing, and assign a second contiguous range of addresses for a second data type for graphics processing, wherein the first and second data types are different data types, store at least one of graphics data of the first data type or addresses of the graphics data of the first data type within blocks whose addresses are within the first contiguous range of addresses, store at least one of graphics data of the second data type or addresses of the graphics data of the second data type within blocks whose addresses are within the second contiguous range of addresses, and store, in a plurality of cache lines of a common memory cache, the graphics data of the first data type and the graphics data of the second data type.
- In another example, this disclosure describes an apparatus comprising means for assigning a first contiguous range of addresses for a first data type for graphics processing, and assigning a second contiguous range of addresses for a second data type for graphics processing, wherein the first and second data types are different data types, means for storing at least one of graphics data of the first data type or addresses of the graphics data of the first data type within blocks whose addresses are within the first contiguous range of addresses, means for storing at least one of graphics data of the second data type or addresses of the graphics data of the second data type within blocks whose addresses are within the second contiguous range of addresses, and means for storing, in a plurality of cache lines of a common memory cache, the graphics data of the first data type and the graphics data of the second data type.
- The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
- FIG. 1 is a block diagram illustrating a device that may be configured to implement aspects of this disclosure.
- FIG. 2 is a block diagram illustrating some of the components of FIG. 1 in greater detail.
- FIG. 3 is another block diagram illustrating some of the components of FIG. 1 in greater detail.
- FIG. 4 is a flow diagram illustrating an example operation of a device that may be configured to implement aspects of this disclosure.
- FIG. 5 is a flow diagram illustrating an example technique to determine which ones of a plurality of cache lines of a common memory cache are associated with a particular data type.
- FIG. 6 is a flow diagram illustrating an example technique performed by a processing unit.
- Aspects of this disclosure may be related to efficient storage of graphics data of various data types for graphics processing. For purposes of illustration, aspects of this disclosure are described in the context where the data is used for graphics processing. However, aspects of this disclosure may be extendable to systems other than graphics processing systems. The techniques of this disclosure may be generally applicable to computer graphics systems such as desktop computers and laptop computers that provide video or image content, digital media players, set-top boxes, mobile video reception devices such as mobile telephones, personal digital assistants (PDAs), video gaming consoles that include video displays, mobile video conferencing units, and the like.
- A graphics processing unit (GPU), within a device, may process graphics data of various data types to generate viewable content that is displayed on the device. The GPU is one example of a processing unit. The various data types for graphics processing may include, but are not limited to, texture data, vertex data, instructions, constants, and pixel data. The graphics data of the various data types may be stored in a storage device of the device. There may be more data types for graphics processing than the examples provided above.
- In some non-limiting examples, the device also includes an input/output memory management unit (IOMMU). The IOMMU may provide the GPU with a virtualized address space to storage blocks of the storage device. The IOMMU may include a plurality of address blocks. Each address block may store an address for where the graphics data of the various data types is stored in the storage device. Each address block of the IOMMU may be individually accessible by the GPU.
- A processor, within the device, may fragment the address space of the IOMMU into a plurality of segments. The processor may be one example of a processing unit. Each segment may include a plurality of address blocks which may be addressable with contiguous addresses. The processor may assign each segment to store addresses for where graphics data of a particular data type is stored in the storage device. For example, a first segment of the address space of the IOMMU may include address blocks that are addressable with contiguous addresses 0-15, a second segment of the address space of the IOMMU may include address blocks that are addressable with contiguous addresses 16-31, and so forth.
- In this example, the processor may assign address blocks, addressable by contiguous addresses 0-15 of the first segment, to store addresses for where graphics texture data is stored in the storage device. The processor may assign address blocks, addressable by contiguous addresses 16-31 of the second segment, to store addresses for where graphics vertex data is stored in the storage device, and so forth. The contiguous addresses for address blocks, e.g., 0-15 and 16-31, are provided for illustration purposes only, and are not limiting.
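The segment assignment described in this example may be sketched in software as follows. This is an illustrative model only; the class and method names, and the fixed segment size, are assumptions made for illustration and are not part of this disclosure.

```python
# Illustrative sketch: a processor-side allocator assigning contiguous
# address ranges (segments) of an IOMMU address space to data types for
# graphics processing. All names here are assumptions, not taken from
# this disclosure.
class SegmentAllocator:
    def __init__(self, segment_size=16):
        self.segment_size = segment_size
        self.next_address = 0
        self.ranges = {}  # data type -> (first address, last address)

    def assign(self, data_type):
        # Each data type receives the next block of contiguous addresses,
        # e.g., 0-15 for the first type, 16-31 for the second, and so on.
        first = self.next_address
        last = first + self.segment_size - 1
        self.ranges[data_type] = (first, last)
        self.next_address = last + 1
        return first, last

allocator = SegmentAllocator(segment_size=16)
print(allocator.assign("texture"))  # (0, 15)
print(allocator.assign("vertex"))   # (16, 31)
```

Because each data type owns one contiguous range, a single pair of bounds per type is enough to identify every block that belongs to it.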
- From the perspective of the processor and the GPU, the IOMMU may appear to be the device that stores the graphics data of the various data types for graphics processing. For example, when the processor or GPU reads or writes data, the processor or GPU reads or writes data as if it is being read from or written to the IOMMU. The IOMMU may maintain a map of where the read or written data is actually stored in the storage device. The map of where the data is actually stored in the storage device may be considered as a virtual address space.
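The virtual address space maintained by the IOMMU may be sketched as a simple translation table. The structure below is a minimal software model made under assumed names; it is not the actual hardware design described in this disclosure.

```python
# Minimal sketch of the IOMMU's role as a translation layer: the caller
# reads via an IOMMU address, and the IOMMU maps it to the storage device
# address where the data actually resides. Names are assumptions.
class SimpleIOMMU:
    def __init__(self):
        self.table = {}  # IOMMU address -> storage device address

    def map_address(self, iommu_addr, storage_addr):
        self.table[iommu_addr] = storage_addr

    def read(self, storage, iommu_addr):
        # The caller sees only the IOMMU address; the actual storage
        # location is resolved through the mapping.
        return storage[self.table[iommu_addr]]

storage = {100: "texture bytes"}   # storage device address -> data
iommu = SimpleIOMMU()
iommu.map_address(0, 100)          # IOMMU address 0 -> storage address 100
print(iommu.read(storage, 0))      # texture bytes
```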
- In some alternate examples, the processor, within the device, may fragment the storage space of the storage device into a plurality of segments, rather than the address space of the IOMMU. In these examples, the IOMMU may not be needed, although aspects of this disclosure should not be considered so limited. Each segment may include a plurality of storage blocks which may be addressable by contiguous addresses. The processor may assign each segment to store graphics data of a particular data type. For example, a first segment of the storage device may include storage blocks which may be addressable by contiguous addresses 0-15, a second segment of the storage device may include storage blocks which may be addressable by contiguous addresses 16-31, and so forth. In this example, the processor may assign storage blocks, addressable by contiguous addresses 0-15 of the first segment, to store graphics pixel data. The processor may assign storage blocks, addressable by contiguous addresses 16-31 of the second segment, to store instructions for graphics processing, and so forth. The contiguous addresses for storage blocks, e.g., 0-15 and 16-31, are provided for illustration purposes only, and are not limiting.
- The device may also include a common memory cache. The common memory cache may include a plurality of cache lines, where each cache line may be configured to store graphics data for any of the data types for graphics processing. For instance, the common memory cache may be configured to store texture data, vertex data, instructions, constants, and pixel data within the one or more cache lines of the common memory cache. A cache line may be considered as a fixed sized block of memory for storage.
- The common memory cache may store graphics data for quick access by the processor or GPU. Each cache line within the common memory cache may include at least two fields. A first field may store an address to one of the address blocks of the IOMMU or an address to one of the storage blocks of the storage device. In examples where the first field stores an address to one of the address blocks of the IOMMU, the address block of the IOMMU may include the address within the storage device where the graphics data of a data type is stored. A second field of the cache line may store the actual graphics data.
- When graphics data for a data type changes, e.g., is rewritten or erased, the processor may need to invalidate some of the cache lines within the common memory cache that store the graphics data for that data type. To invalidate a cache line, the processor may store a null data value in the second field of that cache line. Invalidating the cache lines within the common memory cache may indicate to the GPU that graphics data stored in an invalidated cache line is not current. This may cause the GPU to retrieve the graphics data for the data type from the storage device, rather than from the common memory cache, because the cache lines that store the graphics data do not store current data.
- To invalidate some of the cache lines that store graphics data of a data type, the GPU may determine which cache lines store graphics data of that data type. To determine which cache lines store graphics data of that data type, the GPU may interrogate the first data field of each of the cache lines to determine whether the first data field of each of the cache lines stores an address that is within the assigned contiguous range of addresses for the address blocks or storage blocks, within the IOMMU or storage device, respectively, for that data type.
- For example, assume the graphics data type is texture data for graphics processing. Further assume that the processor assigned the contiguous addresses 0-15, within the IOMMU, to the address blocks that store the addresses for where the texture data is stored in the storage device. When the texture data changes, the processor may invalidate each cache line, in the common memory cache, that stores texture data. The GPU may determine whether the first field of each of the cache lines stores an address that is within 0-15. If a cache line stores an address that is within 0-15, the GPU may invalidate that cache line. In aspects of this disclosure, the GPU may invalidate one or more of the cache lines that store graphics data for that particular data type. The GPU may not invalidate any of the other cache lines in the common memory cache. For example, the GPU may not invalidate any of the cache lines that do not store data for that particular data type.
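The range check in this example may be sketched as follows. The data layout and helper names are assumptions made for illustration; the disclosed cache is a hardware structure, not this software model.

```python
# Illustrative sketch of invalidating only the cache lines whose address
# field falls within the contiguous range assigned to one data type
# (e.g., IOMMU addresses 0-15 for texture data). Names are assumptions.
cache = [
    {"address": 3,  "data": "texture block"},   # in range 0-15 -> texture
    {"address": 20, "data": "vertex block"},    # in range 16-31 -> vertex
    {"address": 9,  "data": "texture block"},
]

def invalidate_range(cache_lines, first, last):
    # Interrogate the first field of each line; store a null value in the
    # second field of every line whose address lies within [first, last].
    for line in cache_lines:
        if first <= line["address"] <= last:
            line["data"] = None  # invalidated; subsequent reads miss here

invalidate_range(cache, 0, 15)           # texture image changed
print([line["data"] for line in cache])  # [None, 'vertex block', None]
```

Only the texture lines are nulled; the vertex line, whose address lies outside 0-15, is left valid.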
-
FIG. 1 is a block diagram illustrating a device 10 that may be configured to implement aspects of this disclosure. Examples of device 10 include, but are not limited to, mobile wireless telephones, personal digital assistants (PDAs), video gaming consoles that include video displays, mobile video conferencing units, laptop computers, desktop computers, tablet computers, television set-top boxes, digital media players, and the like. Device 10 may include processor 12, graphics processing unit (GPU) 14, display 16, display buffer 18, storage device 20, transceiver module 22, user interface 24, common memory cache 26, and input/output memory management unit (IOMMU) 28. Processor 12 and GPU 14 may each be examples of a processing unit. -
Device 10 may include additional modules or units not shown in FIG. 1 for purposes of clarity. For example, device 10 may include a speaker and a microphone, neither of which are shown in FIG. 1, to effectuate telephonic communications in examples where device 10 is a mobile wireless telephone. Furthermore, the various modules and units shown in device 10 may not be necessary in every example of device 10. For example, user interface 24 and display 16 may be external to device 10 in examples where device 10 is a desktop computer. As another example, IOMMU 28 may not be necessary in every example, as described in more detail below. - Although
processor 12, GPU 14, common memory cache 26, and IOMMU 28 are illustrated as separate units, aspects of this disclosure are not so limited. As one example, GPU 14, common memory cache 26, and IOMMU 28 may be formed within processor 12, e.g., one processing unit may include processor 12 and GPU 14, as well as common memory cache 26 and IOMMU 28. As another example, processor 12 may include IOMMU 28 and GPU 14 may include common memory cache 26. Different combinations of the configuration of processor 12, GPU 14, common memory cache 26, and IOMMU 28 may be possible, and aspects of this disclosure contemplate the different combinations. - Examples of
processor 12 and GPU 14, which may each be considered a processing unit, include, but are not limited to, a digital signal processor (DSP), general purpose microprocessor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), or other equivalent integrated or discrete logic circuitry. Storage device 20 may comprise one or more computer-readable storage media. Examples of storage device 20 include, but are not limited to, a random access memory (RAM), a read only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer or a processor. In some aspects, storage device 20 may include instructions that cause processor 12 and/or GPU 14 to perform the functions ascribed to processor 12 and GPU 14 in this disclosure. - Examples of
user interface 24 include, but are not limited to, a trackball, a mouse, a keyboard, and other types of input devices. User interface 24 may also be a touch screen and may be incorporated as a part of display 16. Transceiver module 22 may include circuitry to allow wireless or wired communication between device 10 and another device or a network. Transceiver module 22 may include modulators, demodulators, amplifiers, and other such circuitry for wired or wireless communication. -
Processor 12 may execute one or more applications. Examples of the applications include web browsers, e-mail applications, spreadsheets, video games, or other applications that generate viewable images for presentment. The one or more applications may be stored within storage device 20. In some instances, processor 12 may download the one or more applications via transceiver module 22. Processor 12 may execute the one or more applications based on a selection by a user via user interface 24. In some examples, processor 12 may execute the one or more applications without user interaction. - Each of the viewable images generated by
processor 12 may be two dimensional (2-D) or three dimensional (3-D) images formed with a plurality of polygons, which may be referred to as primitives. Processor 12 may determine the coordinates for the vertices of the polygons. One example of a polygon is a triangle, although polygons should not be considered limited to triangles. For purposes of illustration, examples in this disclosure are described in the context of the polygons being triangles. For example, processor 12 may determine the coordinates for the three vertices of each triangle. For 2-D images, the coordinates for each vertex of each triangle may comprise an x and y coordinate. For 3-D images, the coordinates for each vertex of each triangle may comprise an x, y, z, and w coordinate, where the w coordinate is a homogeneous coordinate which may be beneficial to identify a vertex that is infinitely far away. The determined vertex coordinates for polygons are referred to as vertex data. Processor 12 may store the vertex data in storage device 20. -
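The vertex data described above may be illustrated with a small sketch. The tuple layout and helper name are assumptions made for illustration, not a format defined by this disclosure.

```python
# Illustrative sketch: vertex data for one triangle of a 3-D image, each
# vertex as (x, y, z, w). The w component is the homogeneous coordinate;
# w = 0 conventionally marks a point at infinity (a direction rather than
# a position). The layout here is an assumption for illustration.
triangle = [
    (0.0, 0.0, 0.0, 1.0),   # ordinary position (w = 1)
    (1.0, 0.0, 0.0, 1.0),
    (0.0, 1.0, 0.0, 1.0),
]

def is_at_infinity(vertex):
    x, y, z, w = vertex
    return w == 0.0

print(any(is_at_infinity(v) for v in triangle))  # False
```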
Processor 12 may also determine various other attributes for the determined vertices. For example, for each vertex, processor 12 may determine color values, which are referred to as pixel data. Each color value may include three or four components, e.g., red, green, and blue components, or red, green, blue, and a transparency factor. Additional color coordinates may be used in some devices. Processor 12 may store the pixel data in storage device 20. -
Processor 12 may also store a texture image in storage device 20. A texture image may be an image that is applied to the polygons to make the polygons appear more realistic. The texture image may generally be a two dimensional array of texture data; however, the texture image may also be a one or three dimensional array of texture data. For purposes of illustration, aspects of this disclosure are described in the context of a two dimensional array of texture data. Processor 12 may store texture data within the two dimensional array based on coordinates of the array. The coordinates of the array may be (u, v), where the coordinate u is along the x-axis of the two dimensional array, and the coordinate v is along the y-axis of the two dimensional array. For example, processor 12 may store the texture data, in storage device 20, for a texture image that is to be applied to a polygon at locations within the array that correspond to the coordinates of the polygon. - The vertex data, pixel data, and texture data are examples of different data types for graphics processing which
GPU 14 may use to render an image on display 16. In addition to the vertex data, pixel data, and texture data, GPU 14 may also utilize graphics data of other data types for graphics processing to render the image on display 16. As one example, GPU 14 may utilize rendering instructions stored in storage device 20 to render the image on display 16. The instructions stored in storage device 20 may be another example of a data type for graphics processing. As another example, GPU 14 may utilize constants stored in storage device 20 to render the image on display 16. The constants stored in storage device 20 may be another example of a data type for graphics processing. -
GPU 14 may implement a graphics pipeline that uses graphics data of the various data types to render the image. The graphics pipeline may be implemented via at least some hardware. For example, the graphics pipeline may be implemented as software executing on GPU 14, firmware executing on GPU 14, one or more hardware units formed on GPU 14, or a combination thereof. The graphics pipeline may include multiple components. For example, the graphics pipeline of GPU 14 may include a vertex shader that retrieves the vertex data, transforms the coordinates of the vertices into another coordinate system, and calculates light values for the vertices. The graphics pipeline of GPU 14 may also include a primitive assembler that determines the depth (z) of pixels within a polygon. The graphics pipeline of GPU 14 may also include an early depth test engine that removes pixels within the polygons from further processing if those pixels are subsequently occluded by other pixels. The graphics pipeline of GPU 14 may also include a pixel shader that interpolates color values for pixels within a polygon based on the color values for the vertices of the polygon. - The various components of the graphics pipeline are provided for illustration purposes and should not be considered limiting. In alternate examples, the graphics pipeline may include more components than those described above. The graphics pipeline may also include fewer components than those described above.
- In some examples, as
GPU 14 renders the image, GPU 14 may output the rendered portions of the image to display buffer 18. Display buffer 18 may temporarily store the rendered image until the entire image is rendered. Display buffer 18 may be considered as a frame buffer. Display buffer 18 may then transmit the rendered image to be displayed on display 16. In some alternate examples, GPU 14 may output the rendered portions of the image directly to display 16 for display, rather than temporarily storing the image in display buffer 18. Display 16 may comprise a liquid crystal display (LCD), a cathode ray tube (CRT) display, a plasma display, or another type of display device. -
Storage device 20 may include a plurality of storage blocks. Each storage block may store graphics data of the various data types for graphics processing. Each storage block may be individually addressable. In some examples, processor 12 may fragment the storage space provided by storage device 20 into data storage segments. Each segment may include a plurality of data storage blocks that are each addressable by contiguous memory addresses. Hence, each data storage segment may be defined by a range of contiguous memory addresses. -
Processor 12 may assign each data storage segment to a particular data type. For example, processor 12 may assign a first segment to store graphics data of a first data type, and assign a second segment to store graphics data of a second data type. In this example, the storage blocks of the first segment may be addressable by contiguous addresses. For example, processor 12 may assign a first range of contiguous addresses for the blocks in the first segment. The addresses of each storage block of the first segment may be within the first range of contiguous addresses. Similarly, the addresses of each storage block of the second segment may be within a second range of contiguous addresses assigned by processor 12. - In some examples,
device 10 may include IOMMU 28. IOMMU 28 may not be necessary in every example of device 10. For example, IOMMU 28 may not be necessary in examples where processor 12 assigns contiguous addresses for the graphics data, of the various data types, stored in storage device 20. However, aspects of this disclosure should not be considered so limited. Device 10 may include IOMMU 28 even in examples where processor 12 assigns contiguous addresses for the graphics data, of the various data types, stored in storage device 20. -
IOMMU 28 may be implemented as one or more hardware units, software executing on the hardware units, firmware executing on the hardware units, or any combination thereof. In examples where IOMMU 28 is one or more hardware units, examples of IOMMU 28 may include, but are not limited to, a DSP, general purpose microprocessor, ASIC, FPGA, or other equivalent integrated or discrete logic circuitry. -
IOMMU 28 may provide GPU 14 with a virtualized address space to storage blocks of storage device 20, such that IOMMU 28 appears, e.g., to GPU 14, to be the device that stores the graphics data of the various data types. For example, IOMMU 28 may be a hardware component responsible for handling accesses to storage device 20 requested by processor 12 and/or GPU 14. For example, IOMMU 28 may include a table or other data structure, e.g., registers, indicating a plurality of address blocks. Each address block may store an address of one of the storage blocks of storage device 20. Each address block may be individually addressable. - When
processor 12 or GPU 14 desires to read or write data, software executing on processor 12 or GPU 14 calls out an address of one of the address blocks of IOMMU 28, as if processor 12 or GPU 14 is reading from or writing to the address block, which is called out by its address. IOMMU 28 may then determine which storage block corresponds to the address stored in the address block of IOMMU 28. IOMMU 28 may then write to or read from the storage block that corresponds to the address stored in the address block of IOMMU 28. - For purposes of clarity, the terms storage device address or addresses may refer to an address or addresses of storage blocks within
storage device 20. Also, the terms IOMMU address or addresses may refer to an address or addresses of address blocks within IOMMU 28. As described above, each address block of IOMMU 28 may store a storage device address. In general, IOMMU 28 may map IOMMU addresses to storage device addresses. In this manner, when processor 12 or GPU 14 calls out an IOMMU address, IOMMU 28 may determine the storage device address for the storage block that stores the graphics data based on the mapping, e.g., by determining the storage device address that corresponds to the IOMMU address, as indicated by a table, register, matrix, or other data structure. - As described above,
processor 12 may fragment the memory space of storage device 20 into segments. Each segment may include a plurality of storage blocks that are each addressable by contiguous storage device addresses. In some examples, processor 12 may fragment the address space of IOMMU 28 into segments. Each segment may include a plurality of address blocks that are each addressable by contiguous IOMMU addresses. - In some examples,
processor 12 may assign a first range of contiguous IOMMU addresses to address blocks within IOMMU 28. In these examples, the first range of contiguous IOMMU addresses is for address blocks that store the addresses for where graphics data of the first data type is stored in storage device 20. Processor 12 may also assign a second range of contiguous IOMMU addresses to address blocks within IOMMU 28. In these examples, the second range of contiguous IOMMU addresses is for address blocks that store the addresses for where graphics data of the second data type is stored in storage device 20. Processor 12 may similarly assign ranges of contiguous IOMMU addresses for the various data types for graphics processing. - In some instances, assigning contiguous storage device addresses for graphics data of each data type, or assigning contiguous IOMMU addresses for graphics data of each data type, may be advantageous. As one example,
processor 12 and GPU 14 may not need to track the exact addresses where graphics data for each of the data types is stored in storage device 20. Processor 12 and GPU 14 may only track the range of contiguous storage device addresses or IOMMU addresses. Each range of contiguous storage device addresses or IOMMU addresses may be considered as a sub-heap. Processor 12 may be considered as an allocator that assigns sub-heaps, e.g., ranges of contiguous storage device addresses or IOMMU addresses, to storage blocks within storage device 20 or address blocks within IOMMU 28. - For example, assume
GPU 14 desires to retrieve texture data for graphics processing. Also, assume that the texture data is not stored in storage blocks that are contiguously addressable, as may happen with conventional heap allocators. In this instance, GPU 14 would need to track the storage device address for each piece of the texture data, which may be processing inefficient. In aspects of this disclosure, GPU 14 may not need to track each individual storage device address, and may instead track the range of the storage device addresses, which may promote efficient processing. - As another example of the potential benefits of assigning contiguous storage device addresses or IOMMU addresses for graphics data of each data type,
processor 12 and GPU 14 may be less likely to retrieve incorrect data. For example, software executing on GPU 14, such as a shader program, may cause GPU 14 to retrieve data of a particular data type, e.g., texture data. The shader program may be written by application developers using the syntax of the OpenCL version 1.1 specification, as one non-limiting example. However, aspects of this disclosure should not be considered limited to examples where the shader program is written in accordance with the OpenCL version 1.1 specification. The software executing on GPU 14 may provide GPU 14 with the addresses from where to retrieve the texture data. In this example, processor 12 may instruct GPU 14 with the range of contiguous IOMMU addresses for the texture data. The software executing on GPU 14 may then ensure that the addresses that it provides to GPU 14 for retrieving the texture data are within the range of contiguous IOMMU addresses. For example, the shader program may ensure that GPU 14 does not try to access data that is out of range for the type of data that the shader program requested. - As illustrated in
FIG. 1, device 10 may also include common memory cache 26. Although common memory cache 26 is illustrated as being external to processor 12 and GPU 14, in some examples, common memory cache 26 may be formed within a processing unit, e.g., processor 12 or GPU 14, or partially formed in both processor 12 and GPU 14. Common memory cache 26 may store graphics data for quick access by processor 12 or GPU 14. For example, the processing units, e.g., processor 12 and GPU 14, may be able to retrieve graphics data from common memory cache 26 more quickly than retrieving graphics data from storage device 20. -
Common memory cache 26 may include a plurality of cache lines, where each cache line may be configured to store graphics data for any of the graphics data types for graphics processing. For instance, common memory cache 26 may store texture data, vertex data, instructions, constants, and pixel data within the one or more cache lines of common memory cache 26. Common memory cache 26 may promote efficient storage because common memory cache 26 is configured to store graphics data for the various data types, rather than multiple caches that each store graphics data for a single data type. For example, device 10 may not need to include a texture data cache, pixel data cache, vertex data cache, instruction cache, and a constant cache, because common memory cache 26 can store texture data, pixel data, vertex data, instructions, and constants in the same cache. - A cache line may be considered as a fixed sized block of memory for storage. Each cache line may include two fields. A first field may store one of a storage device address or an IOMMU address, e.g., an address of one of the storage blocks of
storage device 20 or an address of one of the address blocks of IOMMU 28. A second field may store the actual graphics data of the various data types. - When graphics data for a data type changes, e.g., is rewritten or erased,
processor 12 may need to invalidate cache lines within common memory cache 26 that store graphics data of that data type. For example, some of the cache lines within common memory cache 26 may store texture data for a current texture image. When processor 12 stores a new texture image, processor 12 may need to invalidate these cache lines within common memory cache 26 that store texture data because the texture image changed. As another example, some of the cache lines within common memory cache 26 may store vertex data for a current polygon that is being processed. When GPU 14 needs to process another polygon, processor 12 may need to invalidate these cache lines within common memory cache 26 that store vertex data because the vertices of the current polygon are no longer being processed. - As one example, to invalidate a cache line,
processor 12 or GPU 14 may store a null data value in the second field of that cache line. Invalidating the cache lines within common memory cache 26 may indicate to processor 12 and GPU 14 that data stored in an invalidated cache line is not current. A “cache miss” may occur when processor 12 or GPU 14 attempts to retrieve data from an invalidated cache line. When a cache miss occurs, processor 12 or GPU 14 may retrieve the graphics data from storage device 20 because the invalidated cache lines in common memory cache 26 do not store current data. In some examples, in addition to retrieving the graphics data from storage device 20, processor 12 or GPU 14 may also store the retrieved graphics data in cache lines of common memory cache 26 for quick subsequent access. -
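The two-field cache line and the null-on-invalidate convention described above can be sketched as follows. This is a minimal illustrative sketch; the class and function names are assumptions, not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CacheLine:
    """A cache line with an address field (30A) and a data field (30B)."""
    address: Optional[int] = None  # storage device address or IOMMU address
    data: Optional[str] = None     # graphics data; None models the null value

def read(cache, address, storage_device):
    """Return data for an address, refilling the cache line on a miss."""
    for line in cache:
        if line.address == address and line.data is not None:
            return line.data                # hit: the line holds current data
    data = storage_device[address]          # miss: go to the storage device
    cache.append(CacheLine(address, data))  # refill for quick subsequent access
    return data
```

A line whose data field has been set to `None` produces a cache miss on the next read, so the data is fetched from the storage device and cached again, mirroring the behavior described above.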
Processor 12 or GPU 14 may determine which cache lines of common memory cache 26 should be invalidated based on which data type changed. For example, if the texture image changed, GPU 14 may determine which cache lines of common memory cache 26 store texture data. In this example, GPU 14 may invalidate one or more cache lines that store texture data, and may not invalidate any of the cache lines that store data for data types other than texture data. - In this manner,
processor 12 or GPU 14 may invalidate only some of the cache lines of common memory cache 26, rather than invalidating all of common memory cache 26 every time graphics data for a particular data type changes. Not invalidating all of the cache lines of common memory cache 26 may further promote efficient storage because processor 12 or GPU 14 may not need to retrieve all of the graphics data that was previously stored in common memory cache 26 after each invalidation. Processor 12 or GPU 14 may only need to retrieve the graphics data that changed and store only the changed data into common memory cache 26. - To invalidate cache lines that store graphics data of a data type that changed,
processor 12 or GPU 14 may determine which cache lines of common memory cache 26 store graphics data of that data type. For instance, GPU 14 may interrogate the first data field of each of the cache lines to determine whether the first data field of each of the cache lines stores an address that is within the assigned contiguous range of storage device addresses or IOMMU addresses for that data type. For example, assume the graphics data type is pixel data. Further assume that processor 12 assigned address blocks, within IOMMU 28, the contiguous IOMMU addresses 16-31. Contiguous IOMMU addresses 16-31 may each store addresses of storage blocks of storage device 20 where the pixel data is stored, in this example. - When
GPU 14 needs to access pixel data other than the pixel data stored in common memory cache 26, GPU 14 may need to retrieve the pixel data from storage device 20. To store the newly retrieved pixel data in common memory cache 26, processor 12 may invalidate a sufficient number of cache lines, in common memory cache 26, that store pixel data so that processor 12 can store the retrieved pixel data. Processor 12 may determine whether the first field, of each of the cache lines, stores an address that is within 16-31. If a cache line stores an address that is within 16-31, processor 12 may invalidate that cache line. - Because
processor 12 may assign contiguous storage device addresses or IOMMU addresses, processor 12 may more easily determine whether a cache line stores graphics data of a particular data type. For example, if the storage device addresses or the IOMMU addresses were not contiguous, then processor 12 would need to track every single storage device address or IOMMU address for that particular data type. Processor 12 would then need to compare the address stored in the first data field of a cache line with every single storage device address or IOMMU address for that particular data type to determine whether a cache line stores graphics data of that particular data type. In aspects of this disclosure, processor 12 may compare the address stored in the first field of each cache line to the range of contiguous storage device addresses or IOMMU addresses, not individual storage device addresses or IOMMU addresses, to determine whether a cache line stores graphics data of a particular data type. -
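A sketch of this range-based invalidation, using the contiguous IOMMU address range 16-31 assigned to pixel data in the example above. The function name and the dictionary representation of a cache line are assumptions for illustration only.

```python
def invalidate_in_range(cache_lines, lo, hi):
    """Invalidate every cache line whose address field falls within [lo, hi]."""
    count = 0
    for line in cache_lines:
        if lo <= line["address"] <= hi:  # one comparison against the range,
            line["data"] = None          # not against every individual address
            count += 1
    return count

cache = [
    {"address": 3,  "data": "vertex"},
    {"address": 17, "data": "pixel"},
    {"address": 30, "data": "pixel"},
]
invalidate_in_range(cache, 16, 31)  # invalidates only the two pixel-data lines
```

Because the addresses for a data type are contiguous, two bounds checks per line suffice; with non-contiguous addresses the loop would instead have to consult a per-address membership set.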
FIG. 2 is a block diagram illustrating some of the components of FIG. 1 in greater detail. For example, FIG. 2 illustrates common memory cache 26 and storage device 20, of FIG. 1, in greater detail. As illustrated in FIG. 2, storage device 20 includes twelve storage blocks 0-11 (collectively referred to as “storage blocks”). Storage device 20 may include more or fewer storage blocks than twelve storage blocks. In some examples, storage device 20 may be a 4 gigabyte (GB) storage device; however, aspects of this disclosure are not so limited. - As illustrated in
FIG. 2, storage blocks 0-11 are not ordered consecutively. This is to illustrate that, in some examples, storage blocks 0-11 do not necessarily have to be contiguous on storage device 20, although it may be possible for storage blocks 0-11 to be contiguous on storage device 20. Each one of storage blocks 0-11 may be individually addressable by its address. For example, FIG. 2 illustrates storage device addresses 0-11. Storage device addresses 0-11 may be fragmented, i.e., divided, into segments where each segment comprises a range of storage device addresses. Processor 12 may assign each range of storage device addresses to a particular data type. The range of storage device addresses, for each data type, may be contiguous. - For example, as illustrated in
FIG. 2, range of storage device addresses 32 may be contiguous storage device addresses for storage blocks that store texture data, e.g., storage blocks 4, 8, and 1. Range of storage device addresses 32 may include storage addresses 0-2. Range of storage device addresses 34 may be contiguous storage device addresses for storage blocks that store vertex data, e.g., storage blocks 10 and 5. Range of storage device addresses 34 may include storage addresses 3 and 4. Range of storage device addresses 36 may be contiguous storage device addresses for storage blocks that store instructions, e.g., storage blocks 2 and 9. Range of storage device addresses 36 may include storage addresses 5 and 6. Range of storage device addresses 38 may be contiguous storage device addresses for storage blocks that store constants, e.g., storage blocks 0 and 6. Range of storage device addresses 38 may include storage addresses 7 and 8. Range of storage device addresses 40 may be contiguous storage device addresses for storage blocks that store pixel data, e.g., storage blocks 11, 3, and 7. Range of storage device addresses 40 may include storage addresses 9-11. There may be more or fewer texture data, vertex data, instructions, constants, and pixel data than illustrated in FIG. 2. - As illustrated in
FIG. 2, common memory cache 26 may include cache lines 42A-42F (collectively referred to as “cache lines 42”). There may be more or fewer cache lines 42 than illustrated in FIG. 2. In some examples, common memory cache 26 may be a level 2 (L2) cache. In some examples, common memory cache 26 may be a 32 kilobyte (KB) 8-way set associative L2 cache with fast range invalidate. - Each one of cache lines 42 may include an
address field 30A and a data field 30B. Address field 30A may indicate the storage device address for the graphics data stored in data field 30B. For example, address field 30A of cache line 42A indicates that the address for the graphics data stored in data field 30B is 1. As illustrated in FIG. 2, data field 30B of cache line 42A stores second texture data, which corresponds to storage block 8 because the storage device address of storage block 8 is 1. - Cache lines 42 of
common memory cache 26 that store the same data type may be considered as a set of cache lines 42. For example, as illustrated in FIG. 2, cache line 42B and cache line 42E each store vertex data. In this example, cache line 42B and cache line 42E may form a set of cache lines that store vertex data. As another example, as illustrated in FIG. 2, cache line 42C, cache line 42D, and cache line 42F each store pixel data. In this example, cache line 42C, cache line 42D, and cache line 42F may form a set of cache lines that store pixel data. - It should be understood that “a set of cache lines” does not imply that only cache lines in that set can store a particular graphics data type. For instance, as illustrated in
FIG. 2 ,cache lines cache lines - In some examples, a set of cache lines may not be contiguous. For example, cache lines 42B and 42E form a set of cache lines, but are not contiguous. As another example, cache lines 42C, 42D, and 42F form a set of cache lines, but are not contiguous. In alternate examples, it may be possible for a set of cache lines to be contiguous.
- As described above, in some examples,
processor 12 may invalidate one of cache lines 42 if the graphics data changes for the type of data that the one of cache lines 42 stores. For example, if the texture image changes, processor 12 may invalidate one or more cache lines of cache lines 42 that store texture data. To determine whether one of cache lines 42 stores texture data, processor 12 may compare address field 30A of each of cache lines 42 to the contiguous range of storage device addresses assigned to texture data. - For instance, in the example of
FIG. 2, processor 12 assigned contiguous range of storage device addresses 32, which includes contiguous storage device addresses 0-2, to texture data. In this example, processor 12 may compare address field 30A of each of cache lines 42 to determine whether it is within the contiguous range of storage device addresses 32. Address field 30A of cache line 42A is within the contiguous range of storage device addresses 32, e.g., 1 is within 0-2. In this example, processor 12 may replace the second texture data, stored in data field 30B of cache line 42A, with null data. Processor 12 may then retrieve new texture data from storage device 20 and may store the texture data in cache line 42A. Processor 12 may not need to store the texture data in cache line 42A in every example. In some examples, processor 12 may also update address field 30A of cache line 42A if the retrieved new texture data is stored at a storage device address that is different than 1. -
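The contiguous storage device address ranges of FIG. 2 can be written down as a small lookup table. This is a sketch under assumed names; the dictionary layout is not part of the disclosure.

```python
# Contiguous storage device address ranges assigned per data type in FIG. 2.
RANGES = {
    "texture":      range(0, 3),   # ranges 32: addresses 0-2
    "vertex":       range(3, 5),   # ranges 34: addresses 3-4
    "instructions": range(5, 7),   # ranges 36: addresses 5-6
    "constants":    range(7, 9),   # ranges 38: addresses 7-8
    "pixel":        range(9, 12),  # ranges 40: addresses 9-11
}

def data_type_of(address):
    """Return the data type whose contiguous range contains this address."""
    for data_type, addresses in RANGES.items():
        if address in addresses:
            return data_type
    return None  # address not assigned to any data type
```

For example, `data_type_of(1)` returns `"texture"`, matching cache line 42A in the example above, whose address field holds storage device address 1.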
FIG. 3 is another block diagram illustrating some of the components of FIG. 1 in greater detail. For example, FIG. 3 illustrates common memory cache 26, IOMMU 28, and storage device 20, of FIG. 1, in greater detail. Storage device 20, as illustrated in FIG. 3, may be similar to storage device 20 illustrated in FIG. 2. However, in the example of FIG. 3, processor 12 may not assign ranges of contiguous storage device addresses. For instance, as illustrated in FIG. 3, the addresses for each one of the storage blocks of storage device 20 are not ordered consecutively. For ease of illustration, in the example of FIG. 3, the storage device address for a storage block corresponds to the identifier of the storage block. For example, the storage device address for storage block 4 of storage device 20 is 4, the storage device address for storage block 8 of storage device 20 is 8, and so forth. However, aspects of this disclosure are not so limited. The storage device address for a storage block is not limited to the identifier of the storage block. -
FIG. 3 illustrates IOMMU 28 in greater detail. In the example of FIG. 3, IOMMU 28 includes twelve address blocks 0-11. IOMMU 28 may include more or fewer address blocks than twelve address blocks. In some examples, IOMMU 28 may provide GPU 14 with a virtualized address space to storage blocks 0-11 of storage device 20. - Each one of address blocks 0-11 may be individually addressable by its address. For example,
FIG. 3 illustrates IOMMU addresses 0-11. IOMMU addresses 0-11 may be fragmented into segments where each segment comprises a range of IOMMU addresses. Processor 12 may assign each range of IOMMU addresses to a particular data type. The range of IOMMU addresses, for each data type, may be contiguous. - For example, as illustrated in
FIG. 3, range of IOMMU addresses 44 may be contiguous IOMMU addresses for address blocks that store storage device addresses for storage blocks of storage device 20 that store texture data, e.g., storage device addresses 4, 8, and 1. Range of IOMMU addresses 44 may include IOMMU addresses 0-2. Range of IOMMU addresses 46 may be contiguous IOMMU addresses for address blocks that store storage device addresses for storage blocks of storage device 20 that store vertex data, e.g., storage device addresses 10 and 5. Range of IOMMU addresses 46 may include IOMMU addresses 3 and 4. Range of IOMMU addresses 48 may be contiguous IOMMU addresses for address blocks that store storage device addresses for storage blocks of storage device 20 that store instructions, e.g., storage device addresses 2 and 9. Range of IOMMU addresses 48 may include IOMMU addresses 5 and 6. Range of IOMMU addresses 50 may be contiguous IOMMU addresses for address blocks that store storage device addresses for storage blocks of storage device 20 that store constants, e.g., storage device addresses 0 and 6. Range of IOMMU addresses 50 may include IOMMU addresses 7 and 8. Range of IOMMU addresses 52 may be contiguous IOMMU addresses for address blocks that store storage device addresses for storage blocks of storage device 20 that store pixel data, e.g., storage device addresses 11, 3, and 7. Range of IOMMU addresses 52 may include IOMMU addresses 9-11. As with FIG. 2, there may be more or fewer texture data, vertex data, instructions, constants, and pixel data than illustrated in FIG. 3. - As illustrated in
FIG. 3, similar to FIG. 2, common memory cache 26 may include cache lines 42. Also, similar to FIG. 2, each one of cache lines 42 may include address field 30A and data field 30B. In the example illustrated in FIG. 3, address field 30A may indicate the IOMMU address for the address block of IOMMU 28 that stores the address for where the graphics data stored in data field 30B is stored in storage device 20. For example, address field 30A of cache line 42A, in FIG. 3, indicates that the address for the address block of IOMMU 28 is 1. As illustrated in FIG. 3, IOMMU address 1 is for address block 1 of IOMMU 28. Address block 1 of IOMMU 28 stores storage device address 8. Storage device address 8 corresponds to storage block 8 in storage device 20. As illustrated in FIG. 3, storage block 8 of storage device 20 stores the second texture data. Data field 30B of cache line 42A stores the second texture data, which corresponds to storage block 8 with storage device address 8. Storage device address 8 corresponds to address block 1 of IOMMU 28, and the IOMMU address for address block 1 is 1, which corresponds to address field 30A of cache line 42A. - As above, in some
examples processor 12 may invalidate one of cache lines 42 of FIG. 3 if the graphics data changes for the type of data that the one of cache lines 42 stores. As before, assume that the texture image changes, and processor 12 may need to invalidate one or more cache lines of cache lines 42 that store texture data. To determine whether one of cache lines 42 stores texture data, processor 12 may compare address field 30A of each of cache lines 42 to the contiguous range of IOMMU addresses assigned to texture data. - For instance, similar to the example of
FIG. 2, processor 12 assigned contiguous range of IOMMU addresses 44, which include contiguous IOMMU addresses 0-2, to texture data. In this example, processor 12 may compare address field 30A of each of cache lines 42 to determine whether it is within the contiguous range of IOMMU addresses 44. Address field 30A of cache line 42A, in FIG. 3, is within the contiguous range of IOMMU addresses 44. In this example, processor 12 may invalidate cache line 42A. -
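The indirection walked through above — address field 30A names an IOMMU address block, which holds a storage device address, which names the storage block holding the data — can be sketched as follows. The two dictionaries are assumptions standing in for IOMMU 28 and storage device 20.

```python
# Per the FIG. 3 example: IOMMU address block 1 holds storage device address 8,
# and storage block 8 holds the second texture data.
iommu = {1: 8}                        # IOMMU address -> storage device address
storage = {8: "second texture data"}  # storage device address -> graphics data

def resolve(address_field_30a):
    """Follow the IOMMU indirection from a cache line's address field."""
    storage_device_address = iommu[address_field_30a]
    return storage[storage_device_address]
```

In the FIG. 2 scheme the cache's address field names the storage block directly; in the FIG. 3 scheme the contiguity is provided by the IOMMU address space, so the underlying storage device addresses need not be contiguous at all.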
FIG. 4 is a flow diagram illustrating an example operation of device 10 that may be configured to implement aspects of this disclosure. For purposes of illustration, reference is made to FIGS. 1, 2, and 3. As described above, a processing unit, e.g., processor 12 or GPU 14, may fragment storage space of storage device 20 into segments that each include a plurality of storage blocks, or fragment address space of IOMMU 28 into segments that each include a plurality of address blocks. For purposes of illustration, storage blocks of storage device 20 and address blocks of IOMMU 28 may be referred to generally as blocks. - The processing unit may assign a first contiguous range of addresses for a first data type, and assign a second contiguous range of addresses for a second data type (54). The first and second data types may be different data types for graphics processing. For example, as illustrated in
FIG. 2, one of contiguous range of storage device addresses 32, 34, 36, 38, and 40 may comprise a first contiguous range of addresses. Another one of contiguous range of storage device addresses 32, 34, 36, 38, and 40 may comprise a second contiguous range of addresses. Also, as illustrated in FIG. 2, each one of contiguous range of storage device addresses 32, 34, 36, 38, and 40 is assigned for a particular data type. The assigned data type for one of contiguous range of storage device addresses 32, 34, 36, 38, and 40 may comprise the first data type. The assigned data type for another one of contiguous range of storage device addresses 32, 34, 36, 38, and 40 may comprise the second data type. - As another example, as illustrated in
FIG. 3, one of contiguous range of IOMMU addresses 44, 46, 48, 50, and 52 may comprise a first contiguous range of addresses. Another one of contiguous range of IOMMU addresses 44, 46, 48, 50, and 52 may comprise a second contiguous range of addresses. Also, as illustrated in FIG. 3, each one of contiguous range of IOMMU addresses 44, 46, 48, 50, and 52 is assigned for a particular data type. The assigned data type for one of contiguous range of IOMMU addresses 44, 46, 48, 50, and 52 may comprise the first data type. The assigned data type for another one of contiguous range of IOMMU addresses 44, 46, 48, 50, and 52 may comprise the second data type. - The processing unit, e.g.,
processor 12 or GPU 14, may store graphics data of the first data type or addresses of the graphics data of the first data type within blocks whose addresses are within the first contiguous range of addresses, and store graphics data of the second data type or addresses of the graphics data of the second data type within blocks whose addresses are within the second contiguous range of addresses (56). For example, as illustrated in FIG. 2, processor 12 or GPU 14 may store texture data within storage blocks of storage device 20 whose addresses are within contiguous range of storage device addresses 32. As illustrated in FIG. 2, processor 12 or GPU 14 may store graphics data of a particular data type within storage blocks of storage device 20 whose addresses are within the contiguous range of addresses that are assigned for that particular data type. - As another example, as illustrated in
FIG. 3, the processing unit, e.g., processor 12 or GPU 14, may store addresses for where the texture data is stored in storage device 20 within address blocks of IOMMU 28 whose addresses are within contiguous range of IOMMU addresses 44. As illustrated in FIG. 3, processor 12 or GPU 14 may store addresses for where graphics data of a particular data type is stored in storage device 20 within address blocks of IOMMU 28 whose addresses are within the contiguous range of addresses that are assigned for that particular data type. -
Processor 12 or GPU 14 may store, in some of the cache lines of a plurality of cache lines of common memory cache 26, the graphics data of the first data type, and store, in some other cache lines of the plurality of cache lines of common memory cache 26, the graphics data of the second data type (58). For example, as illustrated in FIGS. 2 and 3, common memory cache 26 includes cache lines 42. Also, as illustrated in FIGS. 2 and 3, cache lines 42B and 42E may be considered as a group of cache lines that store graphics data of a particular data type, e.g., vertex data. Cache lines 42C, 42D, and 42F may be considered as another group of cache lines that store graphics data of a particular data type, e.g., pixel data. -
FIG. 5 is a flow diagram illustrating an example technique to determine which ones of the plurality of cache lines of common memory cache 26 are associated with a particular data type. For purposes of illustration, reference is made to FIGS. 1, 2, and 3. A processing unit, e.g., processor 12 or GPU 14, may compare an address field of each cache line to the contiguous range of addresses (60). For example, processor 12 or GPU 14 may compare address field 30A of cache lines 42 to each one of contiguous range of storage device addresses 32, 34, 36, 38, and 40 of FIG. 2. As another example, processor 12 or GPU 14 may compare address field 30A of cache lines 42 to each one of contiguous range of IOMMU addresses 44, 46, 48, 50, and 52 of FIG. 3. -
Processor 12 or GPU 14 may determine which cache lines of cache lines 42 are associated with which data type based on the comparison (62). For example, processor 12 or GPU 14 may determine that cache line 42A is associated with texture data, cache lines 42B and 42E are associated with vertex data, and cache lines 42C, 42D, and 42F are associated with pixel data, as illustrated in FIGS. 2 and 3. In the example of FIG. 2, processor 12 or GPU 14 may determine that cache line 42A is associated with texture data because address field 30A of cache line 42A corresponds to an address for a storage block of storage device 20 that stores texture data. Processor 12 or GPU 14 may determine that cache lines 42B and 42E are associated with vertex data because address field 30A of cache lines 42B and 42E corresponds to addresses for storage blocks of storage device 20 that store vertex data. Processor 12 or GPU 14 may determine that cache lines 42C, 42D, and 42F are associated with pixel data because address field 30A of cache lines 42C, 42D, and 42F corresponds to addresses for storage blocks of storage device 20 that store pixel data. - In the example of
FIG. 3, processor 12 or GPU 14 may determine that cache line 42A is associated with texture data because address field 30A of cache line 42A corresponds to an address of an address block of IOMMU 28 that stores an address for where texture data is stored in storage device 20. Processor 12 or GPU 14 may determine that cache lines 42B and 42E are associated with vertex data because address field 30A of cache lines 42B and 42E corresponds to addresses of address blocks of IOMMU 28 that store addresses for where vertex data is stored in storage device 20. Processor 12 or GPU 14 may determine that cache lines 42C, 42D, and 42F are associated with pixel data because address field 30A of cache lines 42C, 42D, and 42F corresponds to addresses of address blocks of IOMMU 28 that store addresses for where pixel data is stored in storage device 20. -
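The comparison in steps (60) and (62) amounts to bucketing cache lines by which contiguous range their address field falls in. A sketch under assumed names, using a subset of the FIG. 2 ranges:

```python
def group_by_data_type(cache_lines, ranges):
    """Map each data type to the cache lines whose address field is in its range."""
    groups = {data_type: [] for data_type in ranges}
    for line in cache_lines:
        for data_type, (lo, hi) in ranges.items():
            if lo <= line["address"] <= hi:
                groups[data_type].append(line)
                break
    return groups

# Ranges per FIG. 2; the address values model address field 30A of cache lines 42.
ranges = {"texture": (0, 2), "vertex": (3, 4), "pixel": (9, 11)}
lines = [{"address": 1}, {"address": 3}, {"address": 4}, {"address": 9}]
groups = group_by_data_type(lines, ranges)
# one texture line, two vertex lines, one pixel line
```

The same function covers the FIG. 3 case by passing the IOMMU address ranges instead of the storage device address ranges.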
FIG. 6 is a flow diagram illustrating an example technique performed by a processing unit, e.g., processor 12 or GPU 14. For purposes of illustration, reference is made to FIGS. 1, 2, and 3. A processing unit, e.g., processor 12 or GPU 14, may receive a request for graphics data of the first data type or graphics data of the second data type (64). The request may include an address for the first data type or the second data type. For example, software executing on GPU 14, such as the shader program, may generate a request that causes GPU 14 to retrieve graphics data of a particular data type, e.g., graphics data for the first data type or graphics data for the second data type. The software executing on GPU 14 may provide GPU 14 with the addresses from which to retrieve the graphics data of the first or second data type in the request. -
Processor 12 or GPU 14 may determine whether the address, within the request for the graphics data of the first or second data type, is within the first contiguous range of addresses or the second contiguous range of addresses, respectively (66). For example, assume the shader program requested texture data, and included a storage device address or an IOMMU address. In this example, GPU 14 may determine whether the storage device address is within contiguous range of storage device addresses 32, of FIG. 2, or determine whether the IOMMU address is within contiguous range of IOMMU addresses 44, of FIG. 3. By determining whether the address, in the request, is within the contiguous range of addresses, processor 12 or GPU 14 may ensure that processor 12 or GPU 14 does not inadvertently retrieve incorrect data. -
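The determination in step (66) is a bounds check against the range assigned to the requested data type. A sketch; the exception and function names are assumptions, not part of the disclosure:

```python
class AddressOutOfRangeError(Exception):
    """Raised when a request's address is outside its data type's range."""

def handle_request(address, data_type, ranges):
    """Service a request only if its address is in the range for its data type."""
    lo, hi = ranges[data_type]
    if not lo <= address <= hi:
        # Refuse the request rather than inadvertently retrieve incorrect data.
        raise AddressOutOfRangeError(f"{address} not in {data_type} range")
    return ("fetch", address)  # stand-in for actually retrieving the data
```

A request for texture data at an address inside the texture range is serviced; a request whose address falls outside that range is rejected rather than processed.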
Processor 12 or GPU 14 may then process the request based on the determination (68). For example, if the address in the request for the graphics data of the first or second data type is within the first contiguous range of addresses or the second contiguous range of addresses, respectively, processor 12 or GPU 14 may process the request. If, however, the address in the request for the graphics data of the first or second data type is not within the first contiguous range of addresses or the second contiguous range of addresses, respectively, processor 12 or GPU 14 may not process the request. - In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions or code on an article of manufacture comprising a non-transitory computer-readable medium. Computer-readable media may include computer data storage media or communication media including any medium that facilitates transfer of a computer program from one place to another. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
Combinations of the above should also be included within the scope of computer-readable media.
- The code may be executed by one or more processors, such as one or more DSPs, general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
- The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
- Various examples have been described. These and other examples are within the scope of the following claims.
Claims (38)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/024,579 US9047686B2 (en) | 2011-02-10 | 2011-02-10 | Data storage address assignment for graphics processing |
CN201280008415.2A CN103370728B (en) | 2011-02-10 | 2012-02-10 | The method assigned for the address data memory of graphicprocessing and equipment |
EP12705572.1A EP2673746B1 (en) | 2011-02-10 | 2012-02-10 | Data storage address assignment for graphics processing |
PCT/US2012/024760 WO2012109619A1 (en) | 2011-02-10 | 2012-02-10 | Data storage address assignment for graphics processing |
KR1020137023665A KR101563070B1 (en) | 2011-02-10 | 2012-02-10 | Data storage address assignment for graphics processing |
JP2013553622A JP5694570B2 (en) | 2011-02-10 | 2012-02-10 | Data storage address allocation for graphics processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/024,579 US9047686B2 (en) | 2011-02-10 | 2011-02-10 | Data storage address assignment for graphics processing |
Publications (2)
Publication Number | Publication Date |
---|---|
US20120206466A1 true US20120206466A1 (en) | 2012-08-16 |
US9047686B2 US9047686B2 (en) | 2015-06-02 |
Family
ID=45755552
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/024,579 Active 2033-11-23 US9047686B2 (en) | 2011-02-10 | 2011-02-10 | Data storage address assignment for graphics processing |
Country Status (6)
Country | Link |
---|---|
US (1) | US9047686B2 (en) |
EP (1) | EP2673746B1 (en) |
JP (1) | JP5694570B2 (en) |
KR (1) | KR101563070B1 (en) |
CN (1) | CN103370728B (en) |
WO (1) | WO2012109619A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150106550A1 (en) * | 2013-10-11 | 2015-04-16 | Canon Kabushiki Kaisha | Information processing apparatus, method of controlling the same and storage medium |
US20160012302A1 (en) * | 2013-03-21 | 2016-01-14 | Fuji Xerox Co., Ltd. | Image processing apparatus, image processing method and non-transitory computer readable medium |
US9454221B2 (en) * | 2011-12-07 | 2016-09-27 | Mitsubishi Electric Corporation | Rendering processing device, control device, and remote control device |
US20170083997A1 (en) * | 2015-09-17 | 2017-03-23 | Qualcomm Incorporated | Storing bandwidth-compressed graphics data |
US20190102299A1 (en) * | 2017-10-04 | 2019-04-04 | Intel Corporation | Systems, methods and apparatus for fabric delta merge operations to enhance nvmeof stream writes |
US10417134B2 (en) * | 2016-11-10 | 2019-09-17 | Oracle International Corporation | Cache memory architecture and policies for accelerating graph algorithms |
US10671419B2 (en) * | 2016-02-29 | 2020-06-02 | Red Hat Israel, Ltd. | Multiple input-output memory management units with fine grained device scopes for virtual machines |
US20220308877A1 (en) * | 2021-03-26 | 2022-09-29 | Intel Corporation | High performance constant cache and constant access mechanisms |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9378560B2 (en) | 2011-06-17 | 2016-06-28 | Advanced Micro Devices, Inc. | Real time on-chip texture decompression using shader processors |
CN103077130B (en) * | 2012-12-31 | 2016-03-16 | 上海算芯微电子有限公司 | Information processing method and device |
US9311743B2 (en) * | 2013-10-23 | 2016-04-12 | Qualcomm Incorporated | Selectively merging partially-covered tiles to perform hierarchical z-culling |
CN107003892B (en) * | 2016-12-29 | 2021-10-08 | 深圳前海达闼云端智能科技有限公司 | GPU virtualization method, device and system, electronic equipment and computer program product |
KR101943999B1 (en) | 2017-08-31 | 2019-01-30 | 성균관대학교 산학협력단 | Method of cache Capacity aware Cache Bypassing on GPU Computing |
US10467774B2 (en) * | 2017-11-06 | 2019-11-05 | Qualcomm Incorporated | Memory address flipping to determine data content integrity in GPU sub-system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6778175B2 (en) * | 2002-02-05 | 2004-08-17 | Xgi Technology Inc. | Method of arbitration of memory request for computer graphics system |
US7058755B2 (en) * | 2003-09-09 | 2006-06-06 | Ballard Power Systems Corporation | EEPROM emulation in flash memory |
US20080009317A1 (en) * | 2006-07-04 | 2008-01-10 | Sandisk Il Ltd. | Dual channel smart card data storage |
US20110093663A1 (en) * | 2009-10-15 | 2011-04-21 | Netronome Systems, Inc. | Atomic compare and write memory |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5313577A (en) | 1991-08-21 | 1994-05-17 | Digital Equipment Corporation | Translation of virtual addresses in a computer graphics system |
AU2553592A (en) | 1991-08-21 | 1993-03-16 | Digital Equipment Corporation | Computer graphics system |
US5761720A (en) | 1996-03-15 | 1998-06-02 | Rendition, Inc. | Pixel engine pipeline processor data caching mechanism |
US5987582A (en) | 1996-09-30 | 1999-11-16 | Cirrus Logic, Inc. | Method of obtaining a buffer contiguous memory and building a page table that is accessible by a peripheral graphics device |
US6311258B1 (en) | 1997-04-03 | 2001-10-30 | Canon Kabushiki Kaisha | Data buffer apparatus and method for storing graphical data using data encoders and decoders |
US5933158A (en) | 1997-09-09 | 1999-08-03 | Compaq Computer Corporation | Use of a link bit to fetch entries of a graphic address remapping table |
US5914730A (en) | 1997-09-09 | 1999-06-22 | Compaq Computer Corp. | System and method for invalidating and updating individual GART table entries for accelerated graphics port transaction requests |
US5949436A (en) | 1997-09-30 | 1999-09-07 | Compaq Computer Corporation | Accelerated graphics port multiple entry gart cache allocation system and method |
US6243081B1 (en) | 1998-07-31 | 2001-06-05 | Hewlett-Packard Company | Data structure for efficient retrieval of compressed texture data from a memory system |
US6683615B1 (en) | 1999-06-09 | 2004-01-27 | 3Dlabs Inc., Ltd. | Doubly-virtualized texture memory |
US6457100B1 (en) | 1999-09-15 | 2002-09-24 | International Business Machines Corporation | Scaleable shared-memory multi-processor computer system having repetitive chip structure with efficient busing and coherence controls |
US7760804B2 (en) | 2004-06-21 | 2010-07-20 | Intel Corporation | Efficient use of a render cache |
US20100079454A1 (en) | 2008-09-29 | 2010-04-01 | Legakis Justin S | Single Pass Tessellation |
US20100141664A1 (en) | 2008-12-08 | 2010-06-10 | Rawson Andrew R | Efficient GPU Context Save And Restore For Hosted Graphics |
2011
- 2011-02-10 US US13/024,579 patent/US9047686B2/en active Active
2012
- 2012-02-10 JP JP2013553622A patent/JP5694570B2/en not_active Expired - Fee Related
- 2012-02-10 WO PCT/US2012/024760 patent/WO2012109619A1/en active Application Filing
- 2012-02-10 KR KR1020137023665A patent/KR101563070B1/en active IP Right Grant
- 2012-02-10 EP EP12705572.1A patent/EP2673746B1/en not_active Not-in-force
- 2012-02-10 CN CN201280008415.2A patent/CN103370728B/en not_active Expired - Fee Related
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9454221B2 (en) * | 2011-12-07 | 2016-09-27 | Mitsubishi Electric Corporation | Rendering processing device, control device, and remote control device |
US10095940B2 (en) * | 2013-03-21 | 2018-10-09 | Fuji Xerox Co., Ltd. | Image processing apparatus, image processing method and non-transitory computer readable medium |
US20160012302A1 (en) * | 2013-03-21 | 2016-01-14 | Fuji Xerox Co., Ltd. | Image processing apparatus, image processing method and non-transitory computer readable medium |
CN104571941A (en) * | 2013-10-11 | 2015-04-29 | 佳能株式会社 | Information processing apparatus and method of controlling the same |
US9983834B2 (en) * | 2013-10-11 | 2018-05-29 | Canon Kabushiki Kaisha | Information processing apparatus, method of writing contiguous blocks for secure erase data and writing distributive blocks for non-secure erase data |
US20150106550A1 (en) * | 2013-10-11 | 2015-04-16 | Canon Kabushiki Kaisha | Information processing apparatus, method of controlling the same and storage medium |
US20170083997A1 (en) * | 2015-09-17 | 2017-03-23 | Qualcomm Incorporated | Storing bandwidth-compressed graphics data |
US10621690B2 (en) * | 2015-09-17 | 2020-04-14 | Qualcomm Incorporated | Storing bandwidth-compressed graphics data |
US10671419B2 (en) * | 2016-02-29 | 2020-06-02 | Red Hat Israel, Ltd. | Multiple input-output memory management units with fine grained device scopes for virtual machines |
US10417134B2 (en) * | 2016-11-10 | 2019-09-17 | Oracle International Corporation | Cache memory architecture and policies for accelerating graph algorithms |
US20190102299A1 (en) * | 2017-10-04 | 2019-04-04 | Intel Corporation | Systems, methods and apparatus for fabric delta merge operations to enhance nvmeof stream writes |
US10664396B2 (en) * | 2017-10-04 | 2020-05-26 | Intel Corporation | Systems, methods and apparatus for fabric delta merge operations to enhance NVMeoF stream writes |
US20220308877A1 (en) * | 2021-03-26 | 2022-09-29 | Intel Corporation | High performance constant cache and constant access mechanisms |
Also Published As
Publication number | Publication date |
---|---|
CN103370728A (en) | 2013-10-23 |
KR20130135309A (en) | 2013-12-10 |
US9047686B2 (en) | 2015-06-02 |
JP2014506700A (en) | 2014-03-17 |
CN103370728B (en) | 2016-06-01 |
JP5694570B2 (en) | 2015-04-01 |
EP2673746A1 (en) | 2013-12-18 |
EP2673746B1 (en) | 2015-04-08 |
WO2012109619A1 (en) | 2012-08-16 |
KR101563070B1 (en) | 2015-10-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9047686B2 (en) | Data storage address assignment for graphics processing | |
JP6385614B1 (en) | Hardware-enforced content protection for graphics processing units | |
JP6110044B2 (en) | Conditional page fault control for page residency | |
US9134954B2 (en) | GPU memory buffer pre-fetch and pre-back signaling to avoid page-fault | |
US8493404B2 (en) | Pixel rendering on display | |
US8823724B2 (en) | Sparse texture systems and methods | |
JP2015515662A (en) | Techniques for reducing memory access bandwidth in graphics processing systems based on destination alpha value | |
US10163180B2 (en) | Adaptive memory address scanning based on surface format for graphics processing | |
US10621690B2 (en) | Storing bandwidth-compressed graphics data | |
CN112801855B (en) | Method and device for scheduling rendering task based on graphics primitive and storage medium | |
JP2018523876A (en) | Hardware-enforced content protection for graphics processing units | |
US8860743B2 (en) | Sparse texture systems and methods | |
US8681169B2 (en) | Sparse texture systems and methods | |
KR20080014402A (en) | Method and apparatus for processing computer graphics data | |
US9779471B2 (en) | Transparent pixel format converter | |
KR102657586B1 (en) | Method and Apparatus for Managing graphics data | |
WO2019022881A1 (en) | Deferred batching of incremental constant loads | |
CN117435521B (en) | Texture video memory mapping method, device and medium based on GPU rendering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: QUALCOMM INCORPORATED, CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SHARP, COLIN;PFEFFER, ZACHARY AARON;METZ, EDUARDUS A.;AND OTHERS;SIGNING DATES FROM 20110128 TO 20110204;REEL/FRAME:025786/0615 |
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |