CN113012026B - Graphics processor and method of operation thereof

Publication number
CN113012026B
Authority
CN
China
Prior art keywords
data
pixel
cache
mixer
data type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110308083.0A
Other languages
Chinese (zh)
Other versions
CN113012026A (en
Inventor
郝雯琳
武凤霞
王渊峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Granfei Intelligent Technology Co ltd
Original Assignee
Glenfly Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Glenfly Tech Co Ltd filed Critical Glenfly Tech Co Ltd
Priority to CN202110308083.0A priority Critical patent/CN113012026B/en
Publication of CN113012026A publication Critical patent/CN113012026A/en
Priority to US17/467,280 priority patent/US11645732B2/en
Application granted granted Critical
Publication of CN113012026B publication Critical patent/CN113012026B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/20: Processor architectures; Processor configuration, e.g. pipelining
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/60: Memory management
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)

Abstract

The invention provides a graphics processor and a method of operating the same. The graphics processor includes a pixel shader, an output mixer, and a cache. The pixel shader is configured to output pixel frequency source data. The output mixer is coupled to the pixel shader and is configured to receive the pixel frequency source data. The cache is coupled to the output mixer and is configured to pre-record a pixel level state of a cache line corresponding to a current render target. The cache determines, according to the pixel level state, whether to output pixel data or sample data to the output mixer. The sample data is a multiple of the pixel data. The output mixer updates or maintains the pixel level state. The graphics processor and the operation method thereof can reduce the amount of data transferred on the data bus between the output mixer and the cache.

Description

Graphics processor and method of operation thereof
Technical Field
The present invention relates to a processor, and more particularly, to a graphics processor and a method of operating the same.
Background
In the field of image display, general graphics processors often adopt techniques such as multisampling antialiasing (Multisampling Anti-Aliasing, MSAA) and supersampling antialiasing (SSAA) to eliminate the jagged edges of geometric objects in an image (i.e., geometry aliasing). For example, when a multisampling antialiasing operation is required, the graphics processor samples a plurality of sub-sampling points for each pixel, performs a shading calculation for each sub-sampling point, and synthesizes a final image, thereby eliminating the edge aliasing.
However, in performing the multisampling antialiasing operation, the graphics processor needs to sample (i.e., upsample) a plurality of sub-sampling points for each pixel and perform a shading calculation for each of them. The sampled data therefore grows by a multiple, which strains the data transfer bandwidth between the graphics processor and the cache (or "memory"), and performing the shading calculation on every sub-sampling point wastes the shading resources of the graphics processor.
In view of this, how to effectively reduce the amount of data transferred on the data bus of the cache during multisampling/supersampling, save bandwidth, and/or save the computing resources of the graphics processor is a challenge in the art.
Disclosure of Invention
The invention is directed to a graphics processor and an operation method thereof, which can effectively save the computing resources of an arithmetic logic unit in a graphics controller by determining, according to a pixel level state in a cache, whether to output pixel data or sample data to an output mixer, and by updating or maintaining the pixel level state.
According to an embodiment of the present invention, a graphics processor of the present invention includes a pixel shader, an output mixer, and a cache. The pixel shader is configured to output pixel frequency source data. The output mixer is coupled to the pixel shader and is configured to receive the pixel frequency source data. The cache is coupled to the output mixer and is configured to pre-record pixel level status of a cache line corresponding to a current render target. The cache determines to output pixel data or sample data to the output mixer according to the pixel level state, the sample data is a multiple of the pixel data, and the output mixer updates or maintains the pixel level state.
According to an embodiment of the present invention, a method of operating a graphics processor of the present invention includes the steps of: pre-recording, by a cache, pixel level status of a cache line corresponding to a current render target; outputting pixel frequency source data through a pixel shader; receiving the pixel frequency source data through an output mixer; determining, by the cache, to output pixel data or sample data to the output mixer in accordance with the pixel level state, wherein the sample data is a multiple of the pixel data; and updating or maintaining the pixel level state by the output mixer.
Based on the above, the graphics processor and the operation method thereof of the present invention can determine, by checking the pixel level state, whether the cache outputs pixel data or sample data to the output mixer, so as to effectively save the computing resources of the arithmetic logic unit in the graphics controller.
The present invention may be understood by reference to the following detailed description taken in conjunction with the accompanying drawings, it being noted that, for the sake of clarity and simplicity of the drawing, the various drawings in the present invention depict only a portion of the display device, and the specific elements in the drawings are not necessarily drawn to scale. In addition, the number and size of the components in the drawings are illustrative only and are not intended to limit the scope of the invention.
Drawings
FIG. 1 is a schematic diagram of a graphics processor according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a graphics processor in accordance with another embodiment of the present invention;
FIG. 3 is a schematic diagram of an upsampling module according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a data write of a cache line according to an embodiment of the present invention;
FIG. 5 is a flow chart of a method of operation of a graphics processor in accordance with an embodiment of the present invention;
FIG. 6 is a schematic diagram of a graphics processor in accordance with another embodiment of the invention;
FIG. 7 is a flow chart of a method of operation of a graphics processor in accordance with another embodiment of the invention;
FIG. 8 is a flow chart of a fusion optimization control method of a graphics processor according to an embodiment of the invention.
Symbol description
100, 200, 600: graphics processor;
101, 103-1 to 103-4, 201, 607: pixel data;
102, 202, 602: coverage information;
110, 210, 610: pixel shader;
120, 220, 620: output mixer;
121, 221: color data buffer;
122, 222, 622: test unit;
123, 223: write-back unit;
130, 230, 630: cache;
140, 240: memory;
203: sample mask;
204, 608: sample data;
205: rendering target format;
231: up-sampling module;
231-1: data replication logic;
231-2: write control logic;
232, 232-1 to 232-M, 632: cache line;
432-1 to 432-4: sample;
605: pixel frequency source data;
606: update information;
624: fusion module;
624-1: fusion optimization control module;
S510-S550, S710-S750, S810-S860: steps.
Detailed Description
The present invention will be described in more detail with reference to the drawings, wherein the invention is shown in the drawings.
The same names are used throughout the specification and claims to denote the same component. In the following description and in the claims, certain terms are used to refer to particular elements. Those of skill in the art will appreciate that a hardware manufacturer may refer to the same component by different names. The description and claims distinguish elements by their function rather than by their names. The term "coupled" as used throughout the specification and claims includes any direct or indirect connection. Finally, in the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion, and thus should be interpreted to mean "include, but not limited to".
FIG. 1 is a schematic diagram of a graphics processor according to an embodiment of the invention. As shown in FIG. 1, the graphics processor 100 includes a pixel shader (Pixel Shader) 110, an output mixer (Output manager) 120, a cache (Cache) 130, and a memory (Memory) 140. The output mixer 120 is coupled to the pixel shader 110 and the cache 130. The output mixer 120 may receive pixel data (Pixel data) 101 (i.e., pixel data 101 after rasterization processing) transmitted from the pixel shader 110 and, according to coverage information (Sample coverage information) 102 (the results of the coverage test and the depth and transparency tests), write the pixel data 101 as pixel data 103-1 to 103-4 corresponding to the sub-sampling points. The output mixer 120 may transfer the pixel data 103-1 to 103-4 of the sub-sampling points to the cache 130. The cache 130 is coupled to the memory 140. The cache 130 receives the pixel data 103-1 to 103-4 from the output mixer 120 and stores them in the memory 140. In this specification and its embodiments, the cache 130 is a first level cache (L1 cache), but the invention is not so limited.
Specifically, the output mixer 120 further includes a color data buffer (Color data buffer) 121, a test unit 122, and a write-back unit (Write back unit) 123. The color data buffer 121 receives the pixel data 101, i.e., the pixel rendering result output by the pixel shader 110, and transfers the pixel data 101 to the write-back unit 123. Taking 4x multisampling antialiasing (MSAA) graphics processing as an example (i.e., four sub-sampling points for each pixel), the test unit 122 obtains the coverage information 102 (the results of the coverage test and the depth and transparency tests) of the sub-sampling points and generates a sample mask (not shown in FIG. 1). The write-back unit 123 is coupled to the color data buffer 121 and the test unit 122. The write-back unit 123 receives the pixel data 101 transferred from the color data buffer 121 and the coverage information 102 transferred from the test unit 122, and writes the pixel data 101 into the pixel data 103-1 to 103-4 corresponding to the sub-sampling points according to the coverage information 102.
It should be noted that the write-back unit 123 also generates a corresponding byte mask (not shown in FIG. 1) according to the sample mask, so that, when the cache 130 writes data, the sample data is written into the memory 140 according to the byte mask. For a more detailed description and illustration of the sample mask, please refer to FIG. 2 and Tables 5 and 6; it is not detailed further here.
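As a rough illustration of how a byte mask could be derived from a sample mask, the Python sketch below expands each sample-mask bit into one mask bit per byte of that sample. The function name, its parameters, and the LSB-first bit order are assumptions made for illustration; the description does not specify the encoding.

```python
def sample_mask_to_byte_mask(sample_mask: int, num_samples: int, bytes_per_sample: int) -> int:
    """Expand a per-sample mask into a per-byte mask (hypothetical encoding).

    Assumption: bit i of sample_mask (LSB first) covers sample i, and each
    sample occupies bytes_per_sample consecutive bytes.
    """
    byte_mask = 0
    sample_bits = (1 << bytes_per_sample) - 1  # e.g. 0b1111 for a 4-byte sample
    for i in range(num_samples):
        if (sample_mask >> i) & 1:
            byte_mask |= sample_bits << (i * bytes_per_sample)
    return byte_mask


# 4x MSAA with a 32-bit render target: a 4-bit sample mask becomes a 16-bit byte mask.
mask16 = sample_mask_to_byte_mask(0b1101, num_samples=4, bytes_per_sample=4)
print(f"{mask16:016b}")
```

Under these assumptions the byte mask is four times wider than the sample mask for a 32-bit format, which is part of the bus traffic the later embodiments try to avoid.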
It should be further noted that, for convenience of illustration, the pixel data 103-1 to 103-4 in FIG. 1 is drawn between the output mixer 120 and the cache 130, but those skilled in the art will understand that the pixel data 103-1 to 103-4 represents the amount of data to be transferred between the output mixer 120 and the cache 130 after the graphics processor 100 performs the up-sampling operation of multisampling antialiasing. In other words, when the graphics processor performs the multisampling antialiasing operation, the amount of data to be transferred is multiplied accordingly, which greatly consumes the transfer bandwidth of the data bus.
Finally, when the graphics processor 100 determines that an image fusion (blending) operation is required for the pixel data 103-1 to 103-4, the output mixer 120 further reads the pixel data 103-1 to 103-4 from the memory 140 through the cache 130 to perform the fusion operation. In other words, the up-sampled pixel data 103-1 to 103-4 of 4x multisampling antialiasing wastes a large amount of data bus bandwidth both when it is written to and when it is read from the memory 140, which reduces the transmission efficiency.
FIG. 2 is a schematic diagram of a graphics processor in accordance with another embodiment of the invention. Referring to FIG. 2, a graphics processor 200 includes a pixel shader 210, an output mixer 220, a cache 230, and a memory 240. The output mixer 220 is coupled to the pixel shader 210. The cache 230 is coupled to the output mixer 220. Memory 240 is coupled to cache 230. The output mixer 220 includes a color data buffer 221, a test unit 222, and a write-back unit 223. The Cache 230 includes an upsampling module (upsampling unit) 231 and a Cache line (Cache line) 232. In this embodiment, the graphics processor 200 may include a plurality of controller circuits, a plurality of register circuits, a plurality of logic operation circuits, and the like, to form the respective units, the respective modules, the related functional components, and the like according to the embodiments of the present invention.
In this embodiment, the graphics processor 200 is adapted to perform a graphics processing operation in a multisampling antialiasing mode. The color data buffer 221 receives pixel data 201 from the pixel shader 210 and provides the pixel data 201 to the write back unit 223. The test unit 222 outputs the coverage information 202 to the write-back unit 223, and the write-back unit 223 of the output mixer 220 obtains the sample mask 203 according to the coverage information 202 and outputs the pixel data 201 and the sample mask 203 to the up-sampling module 231 of the cache 230. In the present embodiment, the test unit 222 may be a depth and transparency test unit, but the present invention is not limited thereto.
In the present embodiment, the pixel data 201 is rendering target (Rendering Target, RT) data, and the data size of the pixel data 201 is determined according to the rendering target format. The data size of the pixel data 201 output by the pixel shader 210 each time may be as shown in Table 1 below. In other words, one pixel data described in the present embodiment may be $8 \times 2^{n}$ bits, where n is an integer greater than or equal to 0.
TABLE 1
In the present embodiment, the write-back unit 223 does not copy the pixel data 201, but directly outputs the pixel data 201 and the sample mask 203 of the coverage information 202 to the upsampling module 231 of the cache 230. The upsampling module 231 of the cache 230 of the present embodiment may generate the sample data 204 according to the pixel data 201, the sample mask 203, and the rendering target format, wherein the sample data 204 may include a plurality of copies of the pixel data 201. The up-sampling module 231 of the cache 230 inputs the sample data 204 into the cache line 232 of the cache 230 to await writing to the memory 240.
In the present embodiment, the data size of the sample data 204 is determined according to the multisampling antialiasing mode and the rendering target format. The data size of the sample data 204 is a multiple of the data size of the pixel data 201, where the multiple is equal to the sampling factor of the multisampling antialiasing mode. Referring to Table 2 below, for example, if the rendering target format of the pixel data 201 is "R8G8B8A8-UNORM" shown in Table 1 and the multisampling antialiasing mode is 4x multisampling, the data size of the sample data is 128 bits (i.e., 32 bits multiplied by 4). In contrast to FIG. 1, the write-back unit 223 of the present embodiment outputs the 32-bit pixel data 201 and the 4-bit sample mask 203 to the up-sampling module 231 of the cache 230, instead of outputting 128-bit (or 16-byte) sample data and a 16-bit byte mask to the cache 230. Therefore, the graphics processor 200 of the present embodiment can effectively reduce the amount of data transferred on the data bus between the output mixer 220 and the cache 230 when performing the up-sampling (upsampling) process.
TABLE 2
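The saving described for the "R8G8B8A8-UNORM" example can be checked with a small calculation. The sketch below is illustrative only; the function name and parameters are hypothetical, and it counts just the data and mask bits named in the description (no bus protocol overhead).

```python
def bus_bits_per_pixel(pixel_bits: int, msaa_factor: int, upsample_in_cache: bool) -> int:
    """Bits sent from the output mixer to the cache for one pixel (data plus mask)."""
    if upsample_in_cache:
        # FIG. 2 style: one copy of the pixel data plus one sample-mask bit per sample.
        return pixel_bits + msaa_factor
    # FIG. 1 style: replicated sample data plus one byte-mask bit per byte.
    sample_bits = pixel_bits * msaa_factor
    return sample_bits + sample_bits // 8


fig1 = bus_bits_per_pixel(32, 4, upsample_in_cache=False)  # 128 data bits + 16 mask bits
fig2 = bus_bits_per_pixel(32, 4, upsample_in_cache=True)   # 32 data bits + 4 mask bits
print(fig1, fig2)
```

For this format and 4x multisampling the two cases come to 144 bits and 36 bits per pixel, so the traffic between the output mixer and the cache drops by a factor of four.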
FIG. 3 is a schematic diagram of an upsampling module according to an embodiment of the present invention. FIG. 4 is a diagram illustrating data writing in a cache line according to an embodiment of the present invention. Referring to FIG. 2 to FIG. 4, in the present embodiment, the up-sampling module 231 of the cache 230 includes data replication logic 231-1 and write control logic 231-2. The data replication logic 231-1 may receive the pixel data 201, and the write control logic 231-2 may receive the sample mask 203 and the rendering target format 205. The rendering target format 205 may be provided by the output mixer 220 or by a rendering register (not shown) of the graphics processor 200. In this embodiment, the write control logic 231-2 may control the data replication logic 231-1 to replicate the pixel data 201 according to the sample mask 203 and the rendering target format 205, and to write the copies sequentially into the cache lines 232-1 to 232-M of the cache 230, where M is a positive integer.
For example, taking 4x multisampling antialiasing graphics processing as an example, assume that the data content of the pixel data 201 is "0x3f05221e", the data content of the sample mask 203 is "B'1101", and the rendering target format 205 is "R8G8B8A8-UNORM" (32 bits). As shown in FIG. 4, every 4 bytes of the 16 bytes of the cache line 232-1 correspond to one of the samples 432-1 to 432-4. The samples 432-1, 432-3, and 432-4, whose bits in the sample mask 203 are "1", each have their 4 bytes written with the 32-bit pixel data 201. The 4 bytes of the sample 432-2, whose bit in the sample mask 203 is "0", are not written and keep their original data (marked with a separate symbol in FIG. 4). Thus, the data content of the samples 432-1 to 432-4 is the result of storing the sample data 204 in the cache line 232-1. In other words, the up-sampling module 231 of the cache 230 of the present embodiment only needs to obtain the 32-bit pixel data 201 from the output mixer 220, whereas the cache 130 of FIG. 1 needs to obtain 16-byte (or 128-bit) sample data from the output mixer 120. The graphics processor 200 of the present embodiment can therefore effectively reduce the amount of data transferred on the data bus between the output mixer 220 and the cache 230.
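The masked replication in this example can be sketched in a few lines of Python. This is a minimal software model, not the hardware logic of the up-sampling module 231: the function name, the little-endian byte order, the LSB-first mapping of mask bits to samples 432-1 to 432-4, and the 0xAA fill used as "original data" are all assumptions for illustration.

```python
def upsample_into_cache_line(cache_line: bytearray, pixel_data: int, sample_mask: int,
                             bytes_per_pixel: int = 4, num_samples: int = 4,
                             offset: int = 0) -> None:
    """Replicate pixel_data into the samples selected by sample_mask.

    Assumptions: bit i of sample_mask (LSB first) selects sample i, and the
    pixel is stored little-endian. Unselected samples keep their old bytes.
    """
    pixel_bytes = pixel_data.to_bytes(bytes_per_pixel, "little")
    for i in range(num_samples):
        if (sample_mask >> i) & 1:
            start = offset + i * bytes_per_pixel
            cache_line[start:start + bytes_per_pixel] = pixel_bytes


line = bytearray(b"\xaa" * 16)  # 16 bytes of pre-existing (hypothetical) data
upsample_into_cache_line(line, 0x3F05221E, 0b1101)
samples = [int.from_bytes(line[i * 4:(i + 1) * 4], "little") for i in range(4)]
print([hex(s) for s in samples])
```

With the mask "B'1101", the first, third, and fourth samples receive the pixel value while the second sample keeps its previous contents, matching the behaviour described for samples 432-1 to 432-4.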
FIG. 5 is a flow chart of a method of operation of a graphics processor in accordance with an embodiment of the present invention. Referring to fig. 2 and 5, the operation method of the present embodiment is at least applicable to the graphics processor 200 of fig. 2. In step S510, the pixel shader 210 outputs the pixel data 201. In step S520, the output mixer 220 receives the pixel data 201. In step S530, the output mixer 220 outputs the pixel data 201 and the sample mask 203 corresponding to the pixel data 201. In step S540, the cache 230 receives the pixel data 201 and the sample mask 203, and the cache 230 generates the sample data 204 according to the pixel data 201 and the sample mask 203. In step S550, the cache 230 writes the sample data 204 to the memory 240. Therefore, the operation method of the present embodiment can effectively reduce the data transfer amount of the data bus between the output mixer 220 and the cache 230. However, other component features, technical details and implementation of the graphics processor 200 may be sufficiently taught, suggested and implemented by referring to the above description of the embodiments of fig. 2-4, and thus are not repeated.
FIG. 6 is a schematic diagram of a graphics processor in accordance with another embodiment of the invention. Referring to fig. 6, a graphics processor 600 includes a pixel shader 610, an output mixer 620, and a cache 630. In this embodiment, the graphics processor 600 is adapted to perform a graphics processing operation in a multisampling antialiasing mode. In the present embodiment, the cache 630 is a first level cache, but the present invention is not limited thereto. It should be noted that, in an embodiment, the graphics processor 600 may further include the memory 240 of the embodiment of fig. 2, the output mixer 620 may further include the color data buffer 221 and the write-back unit 223 of the embodiment of fig. 2, and the cache 630 may further include the up-sampling module 231 of the embodiment of fig. 2. In this regard, the graphics processor 600 of the present embodiment may implement the following related data read operations independently, and may also combine and implement the related data write operations of fig. 2 to 5. In other words, in one embodiment, the graphics processor 600 may first generate and store the sample data in the memory according to the embodiments of fig. 2 to 5, and then read the sample data according to the embodiments of fig. 6 to 8.
In this embodiment, the pixel shader 610 outputs pixel frequency source data (Pixel frequency source data) 605 to the output mixer 620. The output mixer 620 is coupled to the pixel shader 610 and receives the pixel frequency source data 605. The cache 630 is coupled to the output mixer 620. The cache 630 pre-records the pixel level state (Pixel plane status) of the cache line 632 corresponding to the current render target. In this embodiment, the output mixer 620 includes a test unit 622 and a fusion module (Blending unit) 624. The fusion module 624 includes a fusion optimization control module 624-1. The test unit 622 may output coverage information 602 (which may be the same as the coverage information 202 described above). The fusion module 624 is coupled to the test unit 622. The fusion optimization control module 624-1 may receive the coverage information 602 and the pixel frequency source data 605. In this embodiment, the fusion optimization control module 624-1 can determine the mixer state data and the coverage degree data according to the coverage information 602 and the pixel frequency source data 605, and can determine, according to the mixer state data and the coverage degree data, whether to output the update information 606 to the cache 630 to update the pixel level state.
It should be noted that, in the present embodiment, the pixel shader 610 operates at a pixel frequency (Pixel frequency), so the mixer state data can be set to 1-bit data. When the mixer state data is a first data type (e.g., "1"), the output mixer 620 operates at the pixel frequency. When the mixer state data is a second data type (e.g., "0"), the output mixer 620 operates at a sample frequency (Sample frequency). In this embodiment, the coverage degree data may also be 1-bit data. When each of the plurality of samples in the coverage information 602 is defined as the same coverage setting, the coverage degree data may be represented as the first data type (e.g., "1"). When the plurality of samples in the coverage information 602 have different coverage settings, the coverage degree data may be represented as the second data type (e.g., "0"). In this embodiment, the pixel level state may likewise be 1-bit data. The pixel level state may be represented by the first data type (e.g., "1") when, for each pixel stored in the cache line 632 of the cache 630, the plurality of samples of that pixel have the same pixel data, and by the second data type (e.g., "0") when the plurality of samples of a pixel stored in the cache line 632 have different pixel data. It is noted that the pixel level state may be stored in at least one of the output mixer 620 and the cache 630, and is determined by the data content currently stored in the cache line 632 of the cache 630. The coverage degree data is determined directly by the current coverage information 602. The mixer state data may be determined, in a controlled manner, by the coverage degree data and the pixel level state. The mixer state data is used to determine whether the output mixer 620 currently operates at the pixel frequency or the sample frequency, and to update the pixel level state.
For example, referring to Table 3 below, Table 3 shows the data content of two pixels (pixel 1, pixel 0) stored in one cache line. In Table 3, samples 0 to 3 of pixel 0 have the same pixel data "0x7e38", and samples 0 to 3 of pixel 1 have the same pixel data "0x850c". Thus, when the cache line 632 of the cache 630 stores the data (Pixel plane) shown in Table 3 below, the current pixel level state recorded for the cache line 632 may be, for example, a data value of "1". In contrast, referring to Table 4 below, Table 4 shows the data content of two other pixels (pixel 1', pixel 0') stored in one cache line 632. In Table 4, samples 0 to 3 of pixel 0' have the same pixel data "0x7e38", whereas sample 1 of pixel 1' holds the pixel data "0x00fb", which differs from the pixel data "0x850c" of the other samples of pixel 1'. Thus, when the cache line 632 of the cache 630 stores the data shown in Table 4 below, the current pixel level state recorded for the cache line 632 may be, for example, a data value of "0".
TABLE 3
TABLE 4
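The meaning of the pixel level state can be illustrated with the example values quoted for Tables 3 and 4. The sketch below derives the state by scanning the samples, purely to show the definition; in the described embodiment the state is pre-recorded and then updated or maintained rather than recomputed, and the function name and list layout here are hypothetical.

```python
def pixel_level_state(pixels: list[list[int]]) -> int:
    """Return 1 if, for every pixel, all of its samples hold the same data; else 0."""
    return int(all(len(set(samples)) == 1 for samples in pixels))


# Each inner list holds samples 0 to 3 of one pixel.
table3 = [[0x7E38] * 4, [0x850C] * 4]                      # pixel 0, pixel 1
table4 = [[0x7E38] * 4, [0x850C, 0x00FB, 0x850C, 0x850C]]  # pixel 0', pixel 1'
print(pixel_level_state(table3), pixel_level_state(table4))
```

A single differing sample in any pixel of the cache line is enough to drop the state to "0".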
For another example, referring to Table 5 below, Table 5 shows two sample masks obtained by the fusion optimization control module 624-1 from the coverage information 602 provided by the test unit 622, where the two sample masks may correspond to two pixels (pixel 1, pixel 0) stored in one cache line 632. In Table 5, the sample mask of pixel 0 has the same data value "0" for samples 0 to 3 of pixel 0 (meaning that none of samples 0 to 3 of pixel 0 is covered), and the sample mask of pixel 1 has the same data value "1" for samples 0 to 3 of pixel 1 (meaning that all of samples 0 to 3 of pixel 1 are covered). Thus, when the fusion optimization control module 624-1 obtains the coverage information shown in Table 5 below, the coverage degree data recorded by the fusion optimization control module 624-1 may be, for example, a data value of "1". In contrast, referring to Table 6 below, Table 6 shows another two sample masks obtained by the fusion optimization control module 624-1 from the coverage information 602 provided by the test unit 622, where the two sample masks may correspond to two pixels (pixel 1', pixel 0') stored in one cache line 632. In Table 6, although the sample mask of pixel 1' has the same data value "1" for samples 0 to 3 of pixel 1' (indicating that all of samples 0 to 3 of pixel 1' are covered), the data value "1" corresponding to sample 2 in the sample mask of pixel 0' differs from the data value "0" corresponding to the other samples (indicating that samples 0, 1, and 3 of pixel 0' are not covered, but sample 2 is covered). Thus, when the fusion optimization control module 624-1 obtains the coverage information shown in Table 6 below, the coverage degree data recorded by the fusion optimization control module 624-1 may be, for example, a data value of "0".
TABLE 5
TABLE 6
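One way to read the coverage degree data is as a check that every pixel's sample mask is either all zeros or all ones. The following sketch encodes that reading; the function name and the LSB-first mask encoding (bit i for sample i) are assumptions, and the mask values are taken from the prose description of Tables 5 and 6.

```python
def coverage_degree(sample_masks: list[int], num_samples: int = 4) -> int:
    """Return 1 if every pixel's samples share one coverage setting; else 0."""
    full = (1 << num_samples) - 1
    return int(all(mask in (0, full) for mask in sample_masks))


table5 = [0b0000, 0b1111]  # pixel 0: no sample covered; pixel 1: all samples covered
table6 = [0b0100, 0b1111]  # pixel 0': only sample 2 covered; pixel 1': all covered
print(coverage_degree(table5), coverage_degree(table6))
```

Note that a pixel with no covered samples still counts as uniform, which is why the Table 5 case yields "1".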
In this embodiment, the pixel shader 610 operates at the pixel frequency, and the output mixer 620 and the cache 630 can set the output mixer 620 to operate at either the pixel frequency or the sample frequency according to the pixel level state. Specifically, in one implementation scenario, if the pixel level state and the coverage degree data are both the first data type (e.g., "1"), the mixer state data is the first data type (e.g., "1"). The cache 630 returns the pixel level state of the first data type, together with the pixel data, to the output mixer 620. At this time, the output mixer 620 operates at the pixel frequency and performs pixel mixing on the pixel data. Then, the output mixer 620 outputs the mixing result to the cache 630 as pixel level data and maintains the pixel level state as the first data type (e.g., "1").
In another implementation scenario, if the pixel level state is the first data type and the coverage degree data is the second data type, the mixer state data is the first data type. The cache 630 returns the pixel level state of the first data type (e.g., "1"), together with the pixel data, to the output mixer 620. At this time, the output mixer 620 operates at the pixel frequency and performs pixel mixing on the pixel data. The output mixer 620 then outputs the mixing result to the cache 630 as pixel level data and updates the pixel level state to the second data type (e.g., "0").
In yet another implementation scenario, if the pixel level state is the second data type (e.g., "0"), the mixer state data is the second data type (e.g., "0"). The cache 630 returns the pixel level state of the second data type (e.g., "0") to the output mixer 620. At this time, the output mixer 620 operates at the sample frequency and performs pixel mixing on the sample data. The output mixer 620 outputs the mixing result to the cache 630 as sample level data and maintains the pixel level state as the second data type (e.g., "0").
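The three scenarios above amount to a small decision table. The sketch below encodes them as a function, assuming "1" and "0" stand for the first and second data types; the function name and the string labels are illustrative only.

```python
PIXEL_FREQ, SAMPLE_FREQ = "pixel frequency", "sample frequency"


def blend_mode(pixel_level_state: int, coverage_degree: int) -> tuple[int, str, int]:
    """Return (mixer state data, operating frequency, next pixel level state)."""
    if pixel_level_state == 1:
        # Scenarios 1 and 2: blend once per pixel; the state survives only
        # if the new coverage is also uniform.
        next_state = 1 if coverage_degree == 1 else 0
        return 1, PIXEL_FREQ, next_state
    # Scenario 3: samples already differ, so blend per sample and keep state 0.
    return 0, SAMPLE_FREQ, 0


for state in (1, 0):
    for coverage in (1, 0):
        print(state, coverage, blend_mode(state, coverage))
```

Once the pixel level state has fallen to "0", none of the three scenarios returns it to "1".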
FIG. 7 is a flow chart of a method of operation of a graphics processor in accordance with another embodiment of the present invention. Referring to fig. 6 and 7, a graphics processor 600 may perform a flow as in the fig. 7 embodiment to optimize a fusion operation. In step S710, the cache 630 may pre-record the pixel level status of the cache line 632 corresponding to the current render target. In step S720, the pixel shader 610 may output the pixel frequency source data 605. In step S730, the output mixer 620 may receive the pixel frequency source data 605. In step S740, the cache 630 may determine to output the pixel data 607 or the sample data 608 to the output mixer 620 according to the pixel level status, wherein the sample data 608 is a multiple of the pixel data 607. In step S750, the output mixer 620 may update or maintain the pixel level state recorded by the cache 630. In other words, in some cases, if the cache 630 determines to output the pixel data 607 to the output mixer 620, the graphics processor 600 of the present embodiment can optimize the fusion operation to reduce the data transmission bandwidth and the operation resources of the arithmetic logic unit due to the transmission of multiple identical data between the output mixer 620 and the cache 630, compared to the case where the cache 130 of fig. 1 necessarily provides only the data readout form of the sample data.
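Step S740 can be pictured as the cache choosing between a single pixel value and the full set of samples. The sketch below is a simplified model under assumed names and a 32-bit pixel format; it only counts the bits returned for one pixel and ignores any real bus protocol.

```python
def read_for_blend(samples: list[int], pixel_level_state: int,
                   bits_per_pixel: int = 32) -> tuple[list[int], int]:
    """Return the data the cache sends back for one pixel and its size in bits."""
    if pixel_level_state == 1:
        data = samples[:1]   # all samples are equal, so one copy is enough
    else:
        data = list(samples)  # samples differ, so every sample must be sent
    return data, len(data) * bits_per_pixel


uniform = [0x3F05221E] * 4
mixed = [0x3F05221E, 0x00000000, 0x3F05221E, 0x3F05221E]
print(read_for_blend(uniform, 1))
print(read_for_blend(mixed, 0))
```

For 4x multisampling this model returns 32 bits in the first case and 128 bits in the second, which is the read-side counterpart of the write-side saving described for FIG. 2.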
FIG. 8 is a flow chart of a fusion optimization control method of a graphics processor according to an embodiment of the invention. Steps S810 to S860 in fig. 8 further describe the operation of steps S750 and S760 in fig. 7. Referring to fig. 6 through 8, in step S810, the fusion optimization control module 624-1 may determine whether the pixel shader 610 is operating at the pixel frequency. If not (indicating operation at the sample frequency), the fusion optimization control module 624-1 performs step S840 to maintain the output mixer 620 operating at the sample frequency. If yes, the fusion optimization control module 624-1 performs step S820. In step S820, the fusion optimization control module 624-1 may determine whether the plurality of samples of each pixel stored in the cache line 632 of the cache 630 have the same pixel data, for example by determining whether the data value of the pixel level state is "1". If not, the fusion optimization control module 624-1 performs step S840 to maintain the output mixer 620 operating at the sample frequency. If yes, the fusion optimization control module 624-1 performs step S830. In step S830, the fusion optimization control module 624-1 determines whether each of the plurality of samples in the coverage information is defined as the same coverage setting, for example by determining whether the data value of the coverage degree data is "1". If yes, the fusion optimization control module 624-1 executes step S850, in which it operates the output mixer 620 at the pixel frequency and maintains the pixel level state at "1" (the first data type). If not, the fusion optimization control module 624-1 executes step S860, in which it operates the output mixer 620 at the pixel frequency and updates (via the update information 606) the pixel level state to "0" (the second data type). Therefore, the graphics processor 600 and the operation method of the present embodiment can effectively reduce the data transmission bandwidth occupied between the output mixer 620 and the cache 630 and the arithmetic logic unit operation resources consumed when the output mixer 620 performs the pixel data fusion operation.
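The branching of fig. 8 can be traced with a minimal Python sketch. The function name `fusion_control`, its parameters, and the returned list of step labels are hypothetical conveniences for following the flow chart; they do not describe the actual circuitry of the fusion optimization control module 624-1.

```python
from typing import List, Tuple


def fusion_control(shader_at_pixel_freq: bool,
                   pixel_level_state: int,
                   coverage_degree: int) -> Tuple[List[str], str]:
    """Return (steps visited, output mixer frequency) for one pass of fig. 8."""
    path = ["S810"]
    if not shader_at_pixel_freq:
        # Pixel shader runs at the sample frequency: keep the mixer there too.
        path.append("S840")
        return path, "sample"
    path.append("S820")
    if pixel_level_state != 1:
        # Samples of a pixel already hold different data.
        path.append("S840")
        return path, "sample"
    path.append("S830")
    # S850 keeps the pixel level state at 1; S860 updates it to 0.
    path.append("S850" if coverage_degree == 1 else "S860")
    return path, "pixel"
```

Both S850 and S860 run the output mixer at the pixel frequency; they differ only in whether the pixel level state is kept at "1" or updated to "0".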
In summary, the graphics processor of the present invention may be provided with the up-sampling module in the cache and/or the fusion optimization control module in the fusion module of the output mixer. Together with the operation method of the embodiments of the present invention, this can effectively reduce the amount of data transmitted over the data bus between the output mixer and the cache and/or save the operation resources of the arithmetic logic unit in the graphics processor.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (20)

1. A graphics processor, comprising:
a pixel shader for outputting pixel frequency source data;
an output mixer coupled to the pixel shader and configured to receive the pixel frequency source data; and
a cache coupled to the output mixer and configured to pre-record a pixel level state of a cache line corresponding to a current render target,
wherein the cache decides to output pixel data or sample data to the output mixer according to the pixel level state, the sample data is a multiple of the pixel data, and the output mixer updates or maintains the pixel level state.
2. The graphics processor of claim 1, wherein the output mixer comprises:
a test unit for outputting coverage information; and
a fusion module coupled to the test unit and comprising a fusion optimization control module for receiving the coverage information and the pixel frequency source data,
wherein the fusion optimization control module determines coverage degree data according to the coverage information and the pixel frequency source data, then determines mixer state data according to the coverage degree data and the pixel level state, and updates the pixel level state, wherein the mixer state data is used for determining whether the output mixer operates at a pixel frequency or a sample frequency.
3. The graphics processor of claim 2, wherein the mixer state data is 1 bit of data and the output mixer operates at the pixel frequency when the mixer state data is a first data type and at the sample frequency when the mixer state data is a second data type.
4. The graphics processor of claim 3, wherein the coverage degree data is 1 bit of data and is of the first data type when each of a plurality of samples in the coverage information is defined as the same coverage setting and of the second data type when the plurality of samples in the coverage information have different coverage settings.
5. The graphics processor of claim 4 wherein said pixel level state is 1 bit of data, said pixel level state being said first data type when a plurality of samples stored in each pixel of said cache line respectively have the same said pixel data, said pixel level state being said second data type when said plurality of samples stored in said each pixel of said cache line have different said pixel data.
6. The graphics processor of claim 5, wherein when the pixel level state and the coverage degree data are both of the first data type, the mixer state data is of the first data type,
wherein the cache returns the pixel level state as the first data type to the output mixer, and the output mixer operates at the pixel frequency, and
wherein the output mixer outputs the mixed result as pixel level data to the cache, and the pixel level state is maintained as the first data type.
7. The graphics processor of claim 5, wherein when the pixel level state is the first data type and the coverage degree data is the second data type, the mixer state data is the first data type,
wherein the cache returns the pixel level state as the first data type to the output mixer, and the output mixer operates at the pixel frequency, and
wherein the output mixer outputs the mixed result as pixel level data to the cache, and the pixel level state is updated to the second data type.
8. The graphics processor of claim 5, wherein when the pixel level state is the second data type, the mixer state data is the second data type,
wherein the cache returns the pixel level state as the second data type to the output mixer, and the output mixer operates at the sample frequency, and
wherein the output mixer outputs the mixed result as sample level data to the cache, and the pixel level state is maintained as the second data type.
9. The graphics processor of claim 1, wherein the multiple is equal to a magnification of a multisampling antialiasing mode.
10. The graphics processor of claim 1, wherein the cache is a first level cache.
11. A method of operation of a graphics processor, comprising:
pre-recording, by a cache, a pixel level state of a cache line corresponding to a current render target;
outputting pixel frequency source data through a pixel shader;
receiving the pixel frequency source data through an output mixer;
determining, by the cache, to output pixel data or sample data to the output mixer in accordance with the pixel level state, wherein the sample data is a multiple of the pixel data; and
updating or maintaining the pixel level state by the output mixer.
12. The method of operation of claim 11, further comprising:
outputting coverage information through a test unit of the output mixer;
receiving the coverage information and the pixel frequency source data through a fusion module of the output mixer;
determining coverage degree data according to the coverage information and the pixel frequency source data through the fusion module; and
determining, by the fusion module, mixer state data from the coverage degree data and the pixel level state, and updating the pixel level state,
wherein the mixer state data is used to determine whether the output mixer is operating at a pixel frequency or a sample frequency.
13. The method of operation of claim 12 wherein the mixer state data is 1 bit of data and the output mixer operates at the pixel frequency when the mixer state data is a first data type and at the sample frequency when the mixer state data is a second data type.
14. The method of operation of claim 13, wherein the coverage degree data is 1 bit of data and is the first data type when each of a plurality of samples in the coverage information is defined as the same coverage setting and is the second data type when the plurality of samples in the coverage information have different coverage settings.
15. The method of operation of claim 14 wherein said pixel level state is 1 bit of data, said pixel level state being said first data type when a plurality of samples stored in each pixel of said cache line respectively have the same said pixel data, said pixel level state being said second data type when said plurality of samples stored in said each pixel of said cache line have different said pixel data.
16. The method of operation of claim 15, wherein the mixer state data is the first data type when the pixel level state and the coverage degree data are both the first data type, wherein the method of operation further comprises:
returning said pixel level state as said first data type to said output mixer through said cache, wherein said output mixer operates at said pixel frequency;
outputting, by the output mixer, the mixed result as pixel level data to the cache; and
maintaining the pixel level state as the first data type.
17. The method of operation of claim 15, wherein when the pixel level state is the first data type and the coverage degree data is the second data type, the mixer state data is the first data type, wherein the method of operation further comprises:
returning said pixel level state as said first data type to said output mixer through said cache, wherein said output mixer operates at said pixel frequency;
outputting, by the output mixer, the mixed result as pixel level data to the cache; and
updating the pixel level state to the second data type.
18. The method of operation of claim 15, wherein when the pixel level state is the second data type, the mixer state data is the second data type, wherein the method of operation further comprises:
returning the pixel level state as the second data type to the output mixer through the cache, wherein the output mixer operates at the sample frequency;
outputting, by the output mixer, the mixed result as sample level data to the cache; and
maintaining the pixel level state as the second data type.
19. The method of operation of claim 11, wherein the multiple is equal to a magnification of a multisampling antialiasing mode.
20. The method of operation of claim 11 wherein the cache is a first level cache.
CN202110308083.0A 2021-03-23 2021-03-23 Graphics processor and method of operation thereof Active CN113012026B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110308083.0A CN113012026B (en) 2021-03-23 2021-03-23 Graphics processor and method of operation thereof
US17/467,280 US11645732B2 (en) 2021-03-23 2021-09-06 Graphics processing unit having pixel shader, output merger, cache, memory and operation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110308083.0A CN113012026B (en) 2021-03-23 2021-03-23 Graphics processor and method of operation thereof

Publications (2)

Publication Number Publication Date
CN113012026A CN113012026A (en) 2021-06-22
CN113012026B true CN113012026B (en) 2023-09-05

Family

ID=76405342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110308083.0A Active CN113012026B (en) 2021-03-23 2021-03-23 Graphics processor and method of operation thereof

Country Status (1)

Country Link
CN (1) CN113012026B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620725A (en) * 2008-07-03 2010-01-06 辉达公司 Hybrid multisample/supersample antialiasing
CN111798365A (en) * 2020-06-12 2020-10-20 完美世界(北京)软件科技发展有限公司 Deep anti-saw data reading method, device, equipment and storage medium
CN112396610A (en) * 2019-08-12 2021-02-23 阿里巴巴集团控股有限公司 Image processing method, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8605087B2 (en) * 2008-07-03 2013-12-10 Nvidia Corporation Hybrid multisample/supersample antialiasing
US9396515B2 (en) * 2013-08-16 2016-07-19 Nvidia Corporation Rendering using multiple render target sample masks
US10242286B2 (en) * 2015-03-25 2019-03-26 Intel Corporation Edge-based coverage mask compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 11th Floor, Building 3, No. 889 Bibo Road, Pudong New Area Pilot Free Trade Zone, Shanghai, 201200

Patentee after: Granfei Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 3/F, Building 2, No. 200, Zhangheng Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 201203

Patentee before: Gryfield Intelligent Technology Co.,Ltd.

Country or region before: China