CN110928610B

CN110928610B - Method, device and computer storage medium for verifying shader function

Info

Publication number: CN110928610B
Application number: CN202010082809.9A
Authority: CN
Inventors: 张斌; 马栋; 刘微
Original assignee: Nanjing Xintong Semiconductor Technology Co Ltd
Current assignee: Nanjing Sietium Semiconductor Co ltd
Priority date: 2020-02-07
Filing date: 2020-02-07
Publication date: 2020-05-19
Anticipated expiration: 2040-02-07
Also published as: CN110928610A

Abstract

Embodiments of the invention disclose a method and a device for verifying a shader function, and a computer storage medium. The method may comprise: running, on pre-prepared original data for shader function operation, a first shader function through a first shader that precedes the fragment shader in a graphics rendering pipeline of a GPU; capturing, by the GPU, the running result of the first shader function after the first shader function has finished running, and transferring the running result to a set cache; simulating the running of the first shader function on the CPU according to the original data to obtain a simulation result of the first shader function; reading, by the CPU, the running result of the first shader function from the set cache; and comparing, by the CPU, the running result of the first shader function with the simulation result of the first shader function to verify the first shader function.

Description

Method, device and computer storage medium for verifying shader function

Technical Field

Embodiments of the invention relate to the technical field of Graphics Processing Units (GPUs), and in particular to a method and a device for verifying shader functions and a computer storage medium.

Background

Currently, in a graphics rendering pipeline, shader functions can be written in a Shader Language (SL) to implement the shader programs for the various types of shaders in the pipeline. Because shader functions run on the GPU, and the storage and compilation environments of the CPU and the GPU differ, verifying a shader function is comparatively complicated.

Generally, current verification schemes for shader functions encapsulate or parse the final output data of the graphics rendering pipeline to determine whether the shader functions used during pipeline processing executed correctly. With such schemes, the output data can be analyzed only after the whole graphics rendering pipeline has been executed, so verification spans many pipeline stages and verification efficiency is low.

Disclosure of Invention

In view of the above, embodiments of the present invention are directed to a method and a device for verifying shader functions and a computer storage medium, which make it convenient to verify a shader function and improve verification flexibility.

The technical scheme of the embodiment of the invention is realized as follows:

In a first aspect, an embodiment of the present invention provides a method for verifying a shader function, including:

running, on pre-prepared original data for shader function operation, a first shader function through a first shader that precedes the fragment shader in a graphics rendering pipeline of a GPU;

capturing, by the GPU, the running result of the first shader function after the first shader function has finished running, and transferring the running result to a set cache;

simulating the running of the first shader function on the CPU according to the original data to obtain a simulation result of the first shader function;

reading, by the CPU, the running result of the first shader function from the set cache;

comparing, by the CPU, the running result of the first shader function with the simulation result of the first shader function to verify the first shader function.

In a second aspect, an embodiment of the present invention provides a device for verifying a shader function, the device including: a memory, a CPU and a GPU; wherein,

the memory is configured to store original data;

the GPU is configured to perform the following steps:

running a first shader function on the original data through a first shader that precedes the fragment shader in the graphics rendering pipeline; and

capturing the running result of the first shader function after the first shader function has finished running, and transferring the running result to a set cache;

the CPU is configured to perform the following steps:

simulating the running of the first shader function according to the original data to obtain a simulation result of the first shader function;

reading the running result of the first shader function from the set cache; and

comparing the running result of the first shader function with the simulation result of the first shader function to verify the first shader function.

In a third aspect, embodiments of the present invention provide a computing device, which includes the device for verifying shader functions described in the second aspect.

In a fourth aspect, an embodiment of the present invention provides a computer storage medium that stores a program for verifying shader functions; when executed by at least one processor, the program implements the steps of the method for verifying shader functions according to the first aspect.

Embodiments of the invention provide a method and a device for verifying a shader function, and a computer storage medium. After the GPU has finished running a first shader function that precedes the fragment shading stage, the running result of the first shader function is transferred to a set cache, so that the CPU, having obtained its own simulation result of the first shader function, reads the running result from the set cache and compares the two to verify the first shader function. Verification of the first shader function therefore does not require the complete graphics rendering pipeline flow to finish, which shortens the pipeline that must be executed for verification and reduces the complexity of performing it. In addition, comparing the CPU's simulation result with the GPU's actual running result reduces the calculation errors and format conversion errors that arise in manual verification, which helps ensure that shader functions are verified accurately.

Drawings

FIG. 1 is a block diagram of a computing device according to an embodiment of the present invention;

FIG. 2 is a block diagram of a GPU according to an embodiment of the present invention;

FIG. 3 is a block diagram illustrating a logic structure of a graphics rendering pipeline according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating a method for verifying shader functions according to an embodiment of the present invention;

FIG. 5 is a flowchart illustrating another method for verifying shader functions according to an embodiment of the present invention;

FIG. 6 is a flowchart illustrating an embodiment of a method for verifying a shader function according to the present invention.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

Referring to FIG. 1, there is shown a computing device 100 capable of implementing embodiments of the present invention. The computing device 100 may include, but is not limited to, the following: wireless devices, mobile or cellular telephones (including so-called smart phones), Personal Digital Assistants (PDAs), video game consoles (including video displays, mobile video game devices, mobile video conferencing units), laptop computers, desktop computers, television set-top boxes, tablet computing devices, electronic book readers, fixed or mobile media players, and the like. In the example of fig. 1, computing device 100 may include a Central Processing Unit (CPU) 102 and a system memory 104 that communicate via an interconnection path that may include a memory bridge 105. The memory bridge 105, which may be, for example, a north bridge chip, is connected to an I/O (input/output) bridge 107 via a bus or other communication path 106, such as a HyperTransport link. I/O bridge 107, which may be, for example, a south bridge chip, receives user input from one or more user input devices 108 (e.g., a keyboard, mouse, trackball, touch screen that can be incorporated as part of display device 110, or other type of input device) and forwards the input to CPU 102 via path 106 and memory bridge 105. Graphics Processing Unit (GPU) 112 is coupled to memory bridge 105 via a bus or other communication path 113 (e.g., PCI Express, Accelerated Graphics Port, or HyperTransport link); in one embodiment, GPU 112 may be a graphics subsystem that delivers pixels to display device 110 (e.g., a conventional CRT or LCD based monitor). System disk 114 is also connected to I/O bridge 107. Switch 116 provides a connection between I/O bridge 107 and other components, such as network adapter 118 and various add-in cards 120 and 121. Other components (not explicitly shown), including USB or other port connections, CD drives, DVD drives, film recording devices, and the like, may also be connected to I/O bridge 107. Communication paths interconnecting the various components in fig. 1 may be implemented using any suitable protocols, such as PCI (Peripheral Component Interconnect), PCI Express, AGP (Accelerated Graphics Port), HyperTransport, or any other bus or point-to-point communication protocol, and connections between different devices may use different protocols as is known in the art.

In one embodiment, GPU 112 includes circuitry optimized for graphics and video processing, including, for example, video output circuitry. In another embodiment, GPU 112 includes circuitry optimized for general purpose processing while preserving the underlying computing architecture. In yet another embodiment, GPU 112 may be integrated with one or more other system elements, such as memory bridge 105, CPU 102, and I/O bridge 107, to form a system on a chip (SoC).

It will be appreciated that the system shown herein is exemplary and that variations and modifications are possible. The connection topology, including the number and arrangement of bridges, the number of CPUs 102, and the number of GPUs 112, may be modified as desired. For example, in some embodiments, system memory 104 is directly connected to CPU 102 rather than through a bridge, and other devices communicate with system memory 104 via memory bridge 105 and CPU 102. In other alternative topologies, GPU 112 is connected to I/O bridge 107 or directly to CPU 102, rather than to memory bridge 105. In still other embodiments, I/O bridge 107 and memory bridge 105 may be integrated onto a single chip. Numerous embodiments may include two or more CPUs 102 and two or more GPUs 112. The particular components shown herein are optional; for example, any number of add-in cards or peripherals may be supported. In some embodiments, switch 116 is eliminated and network adapter 118 and add-in cards 120, 121 are directly connected to I/O bridge 107.

FIG. 2 is a schematic block diagram of a GPU 112 capable of implementing the technical solution of the embodiment of the present invention, in which the graphics memory 204 may be a part of the GPU 112. Thus, GPU 112 may read data from graphics memory 204 and write data to graphics memory 204 without using a bus. In other words, GPU 112 may process data locally using local storage instead of off-chip memory. Such graphics memory 204 may be referred to as on-chip memory. This allows GPU 112 to operate more efficiently, because it does not need to read and write data via a bus, which may experience heavy traffic. In some cases, however, GPU 112 may not include a separate memory, but rather utilize system memory 104 via a bus. Graphics memory 204 may include one or more volatile or non-volatile memories or storage devices, such as Random Access Memory (RAM), Static RAM (SRAM), Dynamic RAM (DRAM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), flash memory, magnetic data media, or optical storage media.

Based on this, GPU112 may be configured to perform various operations related to: generate pixel data from graphics data provided by CPU102 and/or system memory 104 via memory bridge 105 and bus 113, interact with local graphics memory 204 (e.g., a general frame buffer) to store and update pixel data, transfer pixel data to display device 110, and so on.

In operation, CPU102 is the main processor of computing device 100, controlling and coordinating the operation of other system components. Specifically, CPU102 issues commands that control the operation of GPU 112. In some embodiments, CPU102 writes command streams for GPU112 into data structures (not explicitly shown in fig. 1 or 2) that may be located in system memory 104, graphics memory 204, or other storage locations accessible to both CPU102 and GPU 112. A pointer to each data structure is written to a pushbuffer to initiate processing of the command stream in the data structure. GPU112 reads the command stream from one or more pushbuffers and then executes the commands asynchronously with respect to the operation of CPU 102. Execution priority may be specified for each pushbuffer to control scheduling of different pushbuffers.
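The pushbuffer mechanism described above can be pictured with a small toy model. The following Python sketch is only illustrative: the class name `PushbufferQueue`, the convention that a lower number means a higher priority, and the representation of a command stream as a list of strings are all assumptions made here rather than details given by the embodiment, and a real GPU executes the commands asynchronously rather than in a Python loop.

```python
import heapq
from collections import deque


class PushbufferQueue:
    """Toy model: each pushbuffer holds references to command streams."""

    def __init__(self):
        self._heap = []      # entries: (priority, insertion order, pushbuffer)
        self._order = 0

    def add_pushbuffer(self, priority, command_streams):
        """Register a pushbuffer; a lower number means a higher priority (assumed)."""
        heapq.heappush(self._heap, (priority, self._order, deque(command_streams)))
        self._order += 1

    def drain(self):
        """Yield commands pushbuffer by pushbuffer, highest priority first."""
        while self._heap:
            _, _, pushbuffer = heapq.heappop(self._heap)
            while pushbuffer:
                for command in pushbuffer.popleft():
                    yield command
```

In this sketch the producer side plays the role of CPU 102 and `drain` plays the role of the consumer; the insertion counter only keeps pushbuffers of equal priority in arrival order.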

As shown in particular in FIG. 2, the GPU 112 may be connected to an I/O (input/output) unit 205 that communicates with the rest of the computing device 100 via a communication path 113 connected to the memory bridge 105 (or, in an alternative embodiment, directly to the CPU 102). The connection of the GPU 112 to the rest of the computing device 100 may also vary. In some embodiments, GPU 112 may be implemented as an add-in card that may be inserted into an expansion slot of computing device 100. In other embodiments, GPU 112 may be integrated on a single chip with a bus bridge, such as memory bridge 105 or I/O bridge 107. In still other embodiments, some or all of the elements of GPU 112 may be integrated with CPU 102 on a single chip.

In one embodiment, communication path 113 can be a PCI-EXPRESS link in which a dedicated channel is allocated to GPU112 as is known in the art. The I/O unit 205 generates data packets (or other signals) for transmission over the communication path 113 and also receives all incoming data packets (or other signals) from the communication path 113, directing the incoming data packets to the appropriate components of the GPU 112. For example, commands related to processing tasks may be directed to scheduler 207, while commands related to memory operations (e.g., reads or writes to graphics memory 204) may be directed to graphics memory 204.

In GPU112, an array 230 of rendering cores may be included, where array 230 may include C general purpose rendering cores 208, where C > 1. Based on the generic rendering cores 208 in the array 230, the GPU112 is able to concurrently perform a large number of program tasks or computational tasks. For example, each rendering core may be programmed to be able to perform processing tasks related to a wide variety of programs, including, but not limited to, linear and non-linear data transformations, video and/or audio data filtering, modeling operations (e.g., applying laws of physics to determine the position, velocity, and other attributes of objects), graphics rendering operations (e.g., tessellation shader, vertex shader, geometry shader, and/or fragment shader programs), and so forth.

Further, GPU 112 may also include a fixed function processing unit 231, which may include hardware that is hardwired to perform certain functions. Although fixed-function hardware may be configured to perform different functions via, for example, one or more control signals, it typically does not include program memory capable of receiving user-compiled programs. In some examples, fixed function processing unit 231 may include, for example, a processing unit that performs primitive assembly, a processing unit that performs rasterization, and a processing unit that performs fragment operations. The primitive assembly unit restores the vertices shaded by the vertex shader unit into the mesh structure of the graphic, that is, into primitives, according to their original connection relationship, so that the subsequent fragment shader unit can process them; the rasterization operation includes converting the new primitives and outputting fragments to the fragment shader; the fragment operations include, for example, the depth test, the scissor test, and alpha blending or transparency blending, and the pixel data output by these operations can be displayed as graphics data by display device 110. Together, rendering core array 230 and fixed function processing unit 231 can implement the complete logic model of the graphics rendering pipeline.

In addition, rendering core array 230 may receive processing tasks to be performed from scheduler 207. Scheduler 207 may independently schedule the tasks for execution by resources of GPU112, such as one or more rendering cores 208 in rendering core array 230. In one example, scheduler 207 may be a hardware processor. In the example shown in fig. 2, scheduler 207 may be included in GPU 112. In other examples, scheduler 207 may also be a separate unit from CPU102 and GPU 112. Scheduler 207 may also be configured as any processor that receives a stream of commands and/or operations.

Scheduler 207 may process one or more command streams, which contain the operations to be scheduled for execution by GPU 112. Specifically, scheduler 207 may process one or more command streams and schedule the operations in them for execution by rendering core array 230. In operation, CPU 102 may send, through GPU driver 103 included in system memory 104 in fig. 1, a command stream that includes a series of operations to be performed by GPU 112. Scheduler 207 may receive the command stream through I/O unit 205, may process its operations sequentially based on their order in the command stream, and may schedule them for execution by one or more processing units in rendering core array 230.

Based on the above description of fig. 1 and fig. 2, fig. 3 shows an example of the graphics rendering pipeline 80 formed by the structure of the GPU 112 shown in fig. 2. It should be noted that the core of the graphics rendering pipeline 80 is a logic structure formed by cascading the general-purpose rendering cores 208 included in the rendering core array 230 with the fixed function processing unit 231; the scheduler 207, the graphics memory 204, and the I/O unit 205 included in the GPU 112 are all peripheral circuits or devices that support the functions of that logic structure. Accordingly, the graphics rendering pipeline 80 usually includes programmable execution units (indicated by the rounded boxes in fig. 3) and fixed function units (indicated by the square boxes in fig. 3). For example, the functions of the programmable execution units can be performed by the general-purpose rendering cores 208 included in the rendering core array 230, and the functions of the fixed function units can be implemented by the fixed function processing unit 231. As shown in FIG. 3, graphics rendering pipeline 80 includes the following stages in order:

Vertex fetch module 82, shown in the example of FIG. 3 as a fixed-function unit, is generally responsible for supplying graphics data (triangles, lines, and points) to graphics rendering pipeline 80. For example, vertex fetch module 82 may collect vertex data for high-order surfaces, primitives, and the like, and output vertex data and attributes to vertex shader 84.

Vertex shader 84 is a programmable execution unit configured to execute a vertex shader program, lighting and transforming vertex data as specified by the vertex shader program. For example, vertex shader 84 may be programmed to transform vertex data from an object-based coordinate representation (object space) to an alternative coordinate system such as world space or Normalized Device Coordinate (NDC) space. Vertex shader 84 may read the data stored by vertex fetch module 82 for use in processing vertex data.

Primitive assembly module 86, shown in FIG. 3 as a fixed-function unit, is responsible for collecting the vertices output by vertex shader 84 and assembling the vertices into geometric primitives. For example, primitive assembly module 86 may be configured to group every three consecutive vertices into a geometric primitive (i.e., a triangle). In some embodiments, a particular vertex may be repeated for consecutive geometric primitives (e.g., two consecutive triangles in a triangle strip may share two vertices).

Geometry shader 88 is a programmable execution unit configured to execute a geometry shader program that transforms graphics primitives received from primitive assembly module 86 as specified by the geometry shader program. For example, geometry shader 88 may be programmed to subdivide a graphics primitive into one or more new graphics primitives and calculate parameters, such as plane equation coefficients, used to rasterize the new graphics primitives. In some examples, geometry shader 88 is not a necessary shader of graphics rendering pipeline 80, and thus, geometry shader 88 is optional, as represented by the dashed lines in the figure. In some embodiments, geometry shader 88 may also add or delete elements in the geometry stream. Geometry shader 88 outputs parameters and vertices specifying new graphics primitives to clipping and partitioning module 90.

Clipping and partitioning module 90, shown as a fixed function unit in fig. 3, is responsible for clipping and culling the assembled primitives and then dividing them according to tile size.

Rasterization module 92 is typically a fixed function unit responsible for preparing primitives for fragment shader 94. For example, rasterization module 92 may generate fragments for shading by fragment shader 94. In some examples, rasterization module 92 may scan convert the new graphics primitives and output fragments and coverage data to fragment shader 94; in addition, rasterization module 92 may be configured to implement z-culling and other z-based optimizations.

Fragment shader 94 is a programmable execution unit configured to execute a fragment shader program to transform fragments received from rasterization module 92 as specified by the fragment shader program. For example, fragment shader 94 may be programmed to implement operations such as perspective correction, texture mapping, shading, blending, and the like, to produce shaded fragments that are output to output merger module 96.

Output merger module 96, shown in FIG. 3 as a fixed function unit, is generally responsible for performing raster operations such as the stencil test, the z-test, and blending, and for outputting pixel data as processed graphics data for storage in graphics memory 204. The processed graphics data may be stored in graphics memory 204 for display on display device 110 or for further processing by CPU 102 or GPU 112.
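To make the earlier stages of the pipeline described above more concrete, the following Python sketch models three of them in software: a vertex transform to NDC, primitive assembly for triangle lists and triangle strips, and a simple coverage test for rasterization. It is a minimal illustration under stated assumptions, not the hardware behaviour of GPU 112: all function names are hypothetical, the rasterizer works on 2D pixel coordinates and samples pixel centers with inclusive edges (real rasterizers apply a fill rule), and strip winding order is ignored.

```python
def mat_vec4(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_ndc(matrix, position):
    """Vertex stage: object space -> clip space -> NDC (perspective divide)."""
    x, y, z = position
    clip = mat_vec4(matrix, [x, y, z, 1.0])
    if clip[3] == 0.0:
        raise ValueError("w is zero; the vertex cannot be projected")
    return tuple(component / clip[3] for component in clip[:3])


def assemble_triangle_list(vertices):
    """Primitive assembly: every three consecutive vertices form a triangle."""
    usable = len(vertices) - len(vertices) % 3      # ignore an incomplete tail
    return [tuple(vertices[i:i + 3]) for i in range(0, usable, 3)]


def assemble_triangle_strip(vertices):
    """Primitive assembly: consecutive strip triangles share two vertices."""
    return [tuple(vertices[i:i + 3]) for i in range(len(vertices) - 2)]


def edge(a, b, p):
    """Positive when p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(triangle, width, height):
    """Rasterization: pixels whose centers a counter-clockwise triangle covers."""
    a, b, c = triangle
    return [
        (x, y)
        for y in range(height)
        for x in range(width)
        if all(
            edge(p, q, (x + 0.5, y + 0.5)) >= 0
            for p, q in ((a, b), (b, c), (c, a))
        )
    ]
```

For instance, `assemble_triangle_strip([0, 1, 2, 3])` yields two triangles that share vertices 1 and 2, which mirrors the triangle strip case mentioned for primitive assembly module 86.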

For the graphics rendering pipeline 80, the functions of the vertex shader 84, the geometry shader 88, and the fragment shader 94 are implemented by writing the corresponding shader programs in SL for the general-purpose rendering cores 208. When the vertex shader program, the geometry shader program, and the fragment shader program are being written, verifying a shader function contained in a shader program usually requires completing the full logic flow of the graphics rendering pipeline 80 shown in fig. 3 and storing the rendering result in the graphics memory 204, for example in a frame buffer. The rendering result can then be examined only through the fixed output of the graphics rendering pipeline 80, by verification operations such as actually drawing to a graphical interface, analyzing the image and pixel data, and comparing against expectations. The execution pipeline of the conventional verification scheme is therefore long, and no data interaction is possible along the way, much as in black-box testing. This greatly limits the flexibility of data transfer and increases the complexity of result feedback.

In view of this, embodiments of the present invention are intended to describe a technique for verifying shader functions, so that the shader functions of some of the shaders in the graphics rendering pipeline 80 can be verified promptly without completing the full logic flow of the graphics rendering pipeline 80, which shortens the pipeline involved in feedback. Specifically, referring to fig. 4, a method for verifying a shader function according to an embodiment of the present invention is shown, where the method may include:

S401: running, on pre-prepared original data for shader function operation, a first shader function through a first shader that precedes the fragment shader in a graphics rendering pipeline of a GPU;

S402: capturing, by the GPU, the running result of the first shader function after the first shader function has finished running, and transferring the running result to a set cache;

S403: simulating the running of the first shader function on the CPU according to the original data to obtain a simulation result of the first shader function;

S404: reading, by the CPU, the running result of the first shader function from the set cache;

S405: comparing, by the CPU, the running result of the first shader function with the simulation result of the first shader function to verify the first shader function.

It is noted that the first shader includes at least one of a vertex shader and a geometry shader; accordingly, the first shader function includes at least one of a vertex shader function and a geometry shader function.

In implementing the above scheme, the execution order of S401, S402, S403, and S404 is not specifically limited; that is, S401, S402, and S403 may be executed before, after, or simultaneously with S404, and the embodiment of the present invention places no specific limit on the order. From the technical scheme shown in fig. 4, it can be seen that the running result of the first shader function is transferred to the set cache once the first shader function has finished running, so that the CPU, after obtaining the simulation result of the first shader function, reads the running result of the first shader function from the set cache and compares the two to verify the first shader function. Therefore, with the technical scheme shown in fig. 4, verification of the first shader function does not require the complete graphics rendering pipeline flow to finish, which shortens the pipeline that must be executed for verification and thus reduces the complexity of performing it.
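The CPU-side part of this flow (S403 to S405) can be sketched in a few lines of Python. This is a minimal sketch under assumptions that are not taken from the embodiment: `gpu_results` is a plain list standing in for the values read back from the set cache, `cpu_reference` is a hypothetical CPU model of the first shader function, and the comparison uses a relative tolerance, which the source does not specify.

```python
import math


def verify_shader_function(raw_data, gpu_results, cpu_reference, rel_tol=1e-6):
    """Compare captured GPU results with a CPU simulation (S403 to S405).

    gpu_results stands in for the values read back from the set cache;
    cpu_reference is a hypothetical CPU model of the first shader function.
    Returns the list of indices where the two results disagree.
    """
    if len(raw_data) != len(gpu_results):
        raise ValueError("one captured result is expected per input element")
    simulated = [cpu_reference(x) for x in raw_data]                  # S403
    return [
        i
        for i, (got, want) in enumerate(zip(gpu_results, simulated))  # S405
        if not math.isclose(got, want, rel_tol=rel_tol, abs_tol=1e-9)
    ]
```

An empty return value means that every captured result matched its simulated counterpart within the assumed tolerance; a non-empty list points at the inputs worth inspecting.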

For the technical solution shown in fig. 4, in some examples, capturing, by the GPU, the running result of the first shader function after the first shader function has finished running and transferring the running result to the set cache includes:

after the first shader function has finished running, the GPU transfers the running result of the first shader function to a Transform Feedback buffer by using Transform Feedback.

For the above example, it should be noted that Transform Feedback refers to a step in the graphics rendering pipeline that follows processing by vertex shader 84 (and by geometry shader 88, if present) and precedes primitive assembly. It can recapture the vertices that are about to be assembled into primitives (points, line segments, triangles) and then pass some or all of their attributes to a buffer object. For example, Vertex Buffer Objects (VBOs) are typically used to store the vertices used in rendering operations; Transform Feedback recaptures the vertices after shader processing and writes the processed vertices back to a VBO, which avoids transferring the vertex data from graphics memory to main memory and back again. It can be understood that transferring the running result of the first shader function to the Transform Feedback buffer through Transform Feedback does not rely on the fixed output of the pipeline, so a customized output format can be used and data transfer is more flexible: multiple groups of data can be transferred at one time as a stream to increase data throughput, and the amount of data transferred can also be increased in terms of data width. Compared with conventional schemes, the encapsulation and parsing of the pipeline's rendering results can be reduced.
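Once the contents of a Transform Feedback buffer have been read back into main memory (for example with an OpenGL call such as `glGetBufferSubData`), the CPU has to interpret the raw bytes according to the output format that was chosen. The following Python sketch shows one way this might look; the layout (interleaved little-endian 32-bit floats, a fixed number of floats per vertex) and the function name `decode_captured_vertices` are assumptions for illustration, since the embodiment leaves the customized output format open.

```python
import struct


def decode_captured_vertices(raw_bytes, floats_per_vertex):
    """Unpack an interleaved buffer of little-endian float32 values into tuples."""
    stride = 4 * floats_per_vertex                  # bytes per captured vertex
    if stride <= 0 or len(raw_bytes) % stride != 0:
        raise ValueError("buffer size is not a whole number of vertices")
    count = len(raw_bytes) // stride
    flat = struct.unpack("<%df" % (count * floats_per_vertex), raw_bytes)
    return [
        tuple(flat[i:i + floats_per_vertex])
        for i in range(0, len(flat), floats_per_vertex)
    ]
```

Because the format is defined by the test author rather than by the frame buffer, the same decoder can carry several groups of results per vertex simply by increasing `floats_per_vertex`.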

Based on the above description, with Transform Feedback, the vertex shader 84 and/or the geometry shader 88 can be combined into a small rendering pipeline that needs no subsequent processing by the rasterization module 92 and the fragment shader 94, which makes it possible to perform general purpose computing with the powerful computing capability of the GPU. In some examples, the first shader function includes a vertex function that processes vertex data and a general purpose function for general purpose computation. For example, the vertex function can light and transform vertex data, and the general purpose function is used to perform general purpose computation.

Furthermore, in some examples, since geometry shader 88 is an optional shader within graphics rendering pipeline 80, when geometry shader 88 is selected for execution and the first shader function is a geometry shader function, the first shader function includes a subdivision function and a geometry processing function. For example, the subdivision function is used to subdivide the vertices of a graphics primitive into the vertices of one or more new graphics primitives, and the geometry processing function may be used to add or delete elements in the geometry data stream.

For the solution shown in fig. 4, the first shader function is a shader function of the vertex shader 84 or the geometry shader 88, both of which precede the fragment shader 94 in the graphics rendering pipeline 80; in the embodiment of the present invention, a second shader function comprises a fragment shader function. For the second shader function, in some examples, referring to fig. 5, the method may further include:

s501: running a second shader function through a second shader in the GPU, and transmitting a running result of the second shader to a frame buffer (FrameBuffer);

s502: reading the operation result of the second shader function from a frame cache through a CPU;

s503: simulating and operating the second shader function through the CPU according to the original data to obtain a simulation result of the second shader function;

s504: comparing, by the CPU, the run result of the second shader function to the simulated result of the second shader function to validate the second shader function.

It should be noted that the fragment shader is the last programmable shader stage of the graphics rendering pipeline 80, so its run result is stored in the frame buffer once the complete graphics rendering pipeline 80 has finished. The CPU can therefore read the run result of the second shader function from the FrameBuffer and compare it with the simulation result obtained by its own simulation, thereby verifying whether the second shader function is correct.
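Steps S501 to S504 can be sketched on the CPU side as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the frame buffer is assumed to hold 8-bit normalized RGBA values, the fragment shader function `simulate_fragment` is a hypothetical gradient, and the pixels read back from the frame buffer are represented by a plain Python list. Because the floating-point simulation result has to be quantized before it can be compared with 8-bit frame buffer contents, the comparison allows a tolerance of one least significant bit; that tolerance is likewise an assumption.

```python
def simulate_fragment(u, v):
    """Hypothetical fragment shader function: a simple colour gradient."""
    return (u, v, 0.5, 1.0)


def to_unorm8(c):
    """Quantize a float colour channel to an 8-bit normalized value."""
    clamped = min(max(c, 0.0), 1.0)
    return int(clamped * 255.0 + 0.5)


def expected_pixels(width, height):
    """S503: simulate the fragment function at every pixel centre."""
    pixels = []
    for y in range(height):
        for x in range(width):
            u = (x + 0.5) / width
            v = (y + 0.5) / height
            pixels.append(tuple(to_unorm8(c) for c in simulate_fragment(u, v)))
    return pixels


def compare_framebuffer(read_back, expected, tol=1):
    """S504: return indices of pixels that differ by more than `tol`."""
    if len(read_back) != len(expected):
        raise ValueError("pixel counts differ")
    return [
        i
        for i, (got, want) in enumerate(zip(read_back, expected))
        if any(abs(g - w) > tol for g, w in zip(got, want))
    ]


expected = expected_pixels(2, 2)
read_back = list(expected)            # stand-in for S502 (frame buffer read)
read_back[3] = (0, 0, 0, 255)         # inject a wrong pixel for illustration
bad_pixels = compare_framebuffer(read_back, expected)
```

In a real setup the `read_back` list would be filled from the frame buffer (for example by a pixel read-back call of the graphics API) rather than copied from the expected values.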

For the above scheme, taking the common function abs as an example: on the GPU side, the vertex shader 84 receives the pre-prepared original data, performs the calculation on the original data according to the vertex shader function, and transfers the calculation result to a Transform Feedback cache through Transform Feedback.

On the CPU side, the CPU only needs to run the same vertex shader function on the same original data to simulate the vertex shader and obtain a simulation result. After reading the calculation result from the Transform Feedback cache, the CPU compares the simulation result with the calculation result, which completes the verification of the abs function. The output data produced after the complete graphics rendering pipeline 80 has finished therefore does not need to be packaged and analyzed; and because the CPU side and the GPU side use the same implementation of the shader function, verification differences caused by different technical implementations are avoided.
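The CPU side of the abs example can be sketched as below. This is a minimal sketch, assuming that the captured result arrives as a block of little-endian 32-bit floats; the helper names `simulate_abs`, `decode_feedback_cache` and `find_mismatches` are hypothetical, and the bytes standing in for the contents of the Transform Feedback cache are constructed by hand rather than read from a GPU.

```python
import struct


def simulate_abs(values):
    """CPU-side simulation of a vertex shader function that applies abs()."""
    return [abs(v) for v in values]


def decode_feedback_cache(raw, count):
    """Interpret captured bytes as `count` little-endian 32-bit floats."""
    return list(struct.unpack("<%df" % count, raw))


def find_mismatches(run_result, simulation_result, tol=0.0):
    """Return the indices at which run and simulation results differ."""
    if len(run_result) != len(simulation_result):
        raise ValueError("result lengths differ")
    return [
        i
        for i, (g, c) in enumerate(zip(run_result, simulation_result))
        if abs(g - c) > tol
    ]


original = [-1.5, 0.0, 2.25, -8.0]                    # pre-prepared original data
raw = struct.pack("<4f", 1.5, 0.0, 2.25, 8.0)         # stand-in for captured bytes
run_result = decode_feedback_cache(raw, len(original))
mismatches = find_mismatches(run_result, simulate_abs(original))
```

An empty `mismatches` list means the function is verified for this data set. The `tol` parameter is an added convenience for functions whose GPU and CPU results may legitimately differ in the last bits; for abs an exact comparison should suffice.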

For the above technical solution, refer to fig. 6, which shows a block diagram of a specific implementation of the above technical solution, and the specific implementation flow is as follows:

S1: preparing in advance the original data that participates in the shader function operation;

S2: the vertex shader runs a vertex-related shader function or a general-purpose computation function according to the original data, and writes the run result back to a Transform Feedback cache through Transform Feedback; it is understood that, taking OpenGL (Open Graphics Library) as an example, the run results written back in this step are expected to cover about 50% of the shader functions in OpenGL, and they are written back in advance for comparison, thereby closing the verification loop for the vertex shader function.

S3: the geometry shader runs a geometry shader function and writes the run result back to a Transform Feedback cache through Transform Feedback; it is understood that, taking OpenGL as an example, the run results written back in this step are expected to cover about 20% of the shader functions in OpenGL, and they are written back in advance for comparison, thereby closing the verification loop for the geometry shader function.

S4: the fragment shader runs a fragment shader function and writes the run result into a frame buffer through the complete graphics rendering pipeline; it is understood that, taking OpenGL as an example, the fragment shader functions account for about the remaining 30% of the shader functions in OpenGL.

S5: the simulation operation part of the CPU simulates the shader function of the GPU side on the original data and feeds the simulation result back to the comparison part;

S6: the comparison part of the CPU compares the run result read from the Transform Feedback cache or the frame buffer with the simulation result to verify the shader function corresponding to the run result.
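The flow S1 to S6 can be summarized by a small harness such as the one below. It is a sketch under stated assumptions: `READBACK_SOURCE`, `verify_stage`, `fake_gpu` and `cpu_abs` are hypothetical names, and the GPU side is replaced by an ordinary Python callable so that the routing logic (which stage is read back from which buffer) can be shown without a graphics context.

```python
from typing import Callable, Dict, List, Sequence

# Buffer from which each stage's run result is read back (S2 to S4).
READBACK_SOURCE: Dict[str, str] = {
    "vertex": "transform_feedback",
    "geometry": "transform_feedback",
    "fragment": "frame_buffer",
}


def verify_stage(
    stage: str,
    original_data: Sequence[float],
    run_on_gpu: Callable[[str, str, Sequence[float]], Sequence[float]],
    simulate_on_cpu: Callable[[Sequence[float]], Sequence[float]],
) -> bool:
    """Run one shader function on both sides and compare the results."""
    source = READBACK_SOURCE[stage]
    run_result: List[float] = list(run_on_gpu(stage, source, original_data))
    simulation_result: List[float] = list(simulate_on_cpu(original_data))  # S5
    return run_result == simulation_result                                  # S6


def fake_gpu(stage, source, data):
    """Stand-in for the GPU run plus read-back; not a real GPU call."""
    return [abs(x) for x in data]


def cpu_abs(data):
    """CPU simulation of the same shader function."""
    return [abs(x) for x in data]


original_data = [-3.0, 4.0, -0.5]                      # S1
results = {
    stage: verify_stage(stage, original_data, fake_gpu, cpu_abs)
    for stage in READBACK_SOURCE
}
```

Keeping the stage-to-buffer mapping in one table reflects the point of the scheme: vertex and geometry shader functions are checked early through Transform Feedback, and only fragment shader functions require the complete pipeline and the frame buffer.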

Based on the same technical concept as the above technical solution, an embodiment of the present invention provides an apparatus for verifying a shader function, where the apparatus may be a part of the computing apparatus shown in fig. 1, and the apparatus includes: a memory, a CPU and a GPU; wherein,

the memory is used for storing original data;

the GPU configured to perform the steps of:

the CPU configured to perform the steps of:

In some examples, the GPU is configured to perform: after the first shader function has finished running, transferring the run result of the first shader function to a Transform Feedback cache by using Transform Feedback.

In some examples, the GPU is further configured to perform: running a second shader function through a second shader, and transmitting a running result of the second shader to a frame buffer (FrameBuffer);

the CPU is further configured to perform the steps of: reading the run result of the second shader function from the frame buffer;

simulating and operating the second shader function according to the original data to obtain a simulation result of the second shader function; and

comparing the run result of the second shader function to the simulated result of the second shader function to validate the second shader function.

It is understood that in this embodiment, a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and it may be modular or non-modular.

In one or more of the examples above, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media may include computer data storage media or communication media including any medium that facilitates transfer of a computer program from one place to another. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. By way of example, and not limitation, such computer-readable media can comprise a USB flash disk, a removable hard disk, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The code may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. Accordingly, the terms "processor" and "processing unit" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.

The techniques of embodiments of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (i.e., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit, in conjunction with suitable software and/or firmware, or provided by a collection of interoperative hardware units, including one or more processors as described above.

Various aspects of the present invention have been described. These and other embodiments are within the scope of the following claims. It should be noted that: the technical schemes described in the embodiments of the present invention can be combined arbitrarily without conflict.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A method of validating shader functions, comprising:

capturing the operation result of the first shader function after the first shader function is operated through the GPU and transmitting the operation result to a set cache; wherein, the set cache is a Transform feedback cache;

2. The method of claim 1, wherein the first shader comprises at least one of a vertex shader and a geometry shader; accordingly, the first shader function includes at least one of a vertex shader function and a geometry shader function.

3. The method of claim 2, wherein the first shader function comprises a vertex function for processing vertex data and a general purpose function for general purpose computation, corresponding to the first shader function being a vertex shader function.

4. The method of claim 2, wherein the first shader function comprises a subdivision function and a geometry processing function, corresponding to a geometry shader being selected for execution and the first shader function being a geometry shader function.

5. The method of claim 1, wherein capturing, by the GPU, the run result of the first shader function after the first shader function has finished running and transferring the run result to the set cache comprises:

after the first shader function has finished running, transferring, by the GPU, the run result of the first shader function to a Transform Feedback cache by using Transform Feedback.

6. The method according to any one of claims 1 to 5, further comprising:

running a second shader function through a second shader in the GPU, and transmitting a running result of the second shader to a frame buffer (FrameBuffer); wherein the second shader comprises a fragment shader;

reading, by the CPU, the run result of the second shader function from the frame buffer;

simulating and operating the second shader function through the CPU according to the original data to obtain a simulation result of the second shader function;

comparing, by the CPU, the run result of the second shader function to the simulated result of the second shader function to validate the second shader function.

7. An apparatus for validating shader functions, the apparatus comprising: a memory, a CPU and a GPU; wherein,

the memory is used for storing original data;

the GPU configured to perform the steps of:

capturing the run result of the first shader function after the first shader function has finished running and transferring the run result to a set cache; wherein the set cache is a Transform Feedback cache;

the CPU configured to perform the steps of:

simulating the first shader function according to the original data to obtain a simulation result of the first shader function; and

8. The apparatus of claim 7, wherein the GPU is configured to perform: running a second shader function through a second shader, and transmitting the run result of the second shader function to a frame buffer (FrameBuffer);

9. A computing device comprising the apparatus for validating shader functions as claimed in claim 7 or 8.

10. A computer storage medium storing a program of validating shader functions, which when executed by at least one processor implements the steps of the method of validating shader functions of any one of claims 1 to 6.