CN115861518A

CN115861518A - Ray intersection testing using quantized and interval representations

Info

Publication number: CN115861518A
Application number: CN202211113266.8A
Authority: CN
Inventors: C·A·伯恩斯
Original assignee: Apple Inc
Current assignee: Apple Inc
Priority date: 2021-09-24
Filing date: 2022-09-14
Publication date: 2023-03-28
Anticipated expiration: 2042-09-14
Also published as: CN117593439A; KR20240116676A; GB202318608D0; DE102022122793B4; GB2612681A; DE102022122793A1; GB202212912D0; CN115861518B; TWI822330B; KR20230043717A; GB2612681B; TW202314645A; TW202403671A

Abstract

Techniques related to primitive intersection testing for ray tracing in a graphics processor are disclosed. In some embodiments, a graphics processor includes ray intersection circuitry configured to perform intersection tests, the intersection tests including: quantizing the first representation of the primitive to generate a reduced-precision interval representation of the primitive; quantizing the first representation of the ray to generate a reduced-precision interval representation of the ray; and determining an initial intersection result based on the coordinates of the interval representation of the primitive and the coordinates of the interval representation of the ray using an interval algorithm. The initial intersection result may be a conservative result such that a miss indicated by the initial intersection result is guaranteed not to be a hit of the first representation of the primitive and the first representation of the ray. The disclosed techniques may improve performance, reduce power consumption, or both, relative to conventional techniques.

Description

Ray intersection testing using quantized and interval representations

This application claims priority to U.S. Provisional Patent Application No. 63/248,143, filed September 24, 2021, which is incorporated herein by reference in its entirety.

Background

Technical Field

The present disclosure relates generally to graphics processors, and more particularly to primitive intersection testing for ray tracing.

Description of the Related Art

In computer graphics, ray tracing is a rendering technique for generating an image by tracing the path of light as pixels in an image plane and simulating the effects of its encounters with virtual objects. Ray tracing may allow resolution of visibility in three dimensions between any two points in a scene, which is also the source of most of its computational expense. A typical ray tracer samples paths of light through the scene in the reverse direction of light propagation, starting from the camera and propagating into the scene, rather than starting from the light sources (this is sometimes referred to as "backward ray tracing"). Starting from the camera has the benefit of tracing only rays that are visible to the camera. Such a system can model a rasterizer, in which rays simply stop at the first surface and invoke a shader (analogous to a fragment shader) to compute a color. More commonly, secondary effects are also modeled, in which illumination is exchanged between scene elements, such as diffuse inter-reflection and transmission. Shaders that evaluate surface reflectance properties may invoke further intersection queries (e.g., generate new rays) to capture incoming illumination from other surfaces. This recursive process has many formulations, but is commonly referred to as path tracing.

Graphics processors that implement ray tracing generally provide more realistic scenes and lighting effects relative to conventional rasterization systems. However, ray tracing is generally computationally expensive. Improvements to ray tracing techniques may improve realism in graphical scenes, improve performance (e.g., allow more rays to be traced per frame, in more complex scenes, or both), reduce power consumption (which may be particularly important in battery-powered devices), and so forth.

Ray intersection queries may be performed by shaders, dedicated hardware, or a combination of both. Different types of intersection queries may provide different types of information. For example, a "closest hit" query may locate the nearest intersected geometry along a ray and within the parametric interval over which the ray is valid (this may be the most common type of query). An "any hit" query may indicate whether there is any intersected geometry along the ray and within the parametric interval. This type of query may be used for shadow rays, for example, to determine whether a point in the scene has visibility to a light or is occluded. Once intersected geometry has been determined, that geometry may be shaded based on the intersection, and further rays may in turn be generated for intersection testing, e.g., from reflective surfaces.

Motion blur is a phenomenon that occurs when the image being recorded changes during the recording of a single exposure. For example, a photograph of a moving freight train taken with a sufficiently long exposure time may show the train blurred while non-moving objects are not blurred. In a computer graphics context, a graphics processor may simulate motion blur effects for a frame of graphics data. In this context, an animated graphics primitive (e.g., a triangle) may be modeled at a plurality of different positions during an open shutter interval of a virtual camera (also referred to herein as a motion blur interval), and thus may affect pixel values at a plurality of positions in a frame to cause a blur effect.

An accurate time stamp is typically assigned to each ray, for example, within a motion blur interval. When implementing both ray tracing and motion blur, testing ray/primitive intersections can be expensive in terms of processor resources and power consumption.

Drawings

Fig. 1A is a diagram illustrating an overview of exemplary graphics processing operations, according to some embodiments.

Fig. 1B is a block diagram illustrating an exemplary graphics unit, according to some embodiments.

Fig. 2A is a block diagram illustrating an example low precision test circuit, according to some embodiments.

Fig. 2B is a block diagram illustrating an exemplary intersection testing technique, according to some embodiments.

Fig. 3 is a diagram illustrating exemplary interval representations of various values used in an initial intersection test, according to some embodiments.

Fig. 4 is a diagram illustrating an exemplary interpolation circuit configured to generate interval representations of interpolation primitives in a motion blur interval, according to some embodiments.

Fig. 5 is a block diagram illustrating an example shear factor circuit configured to generate shear factor intervals, according to some embodiments.

Fig. 6 is a diagram illustrating an exemplary circuit configured to translate and shear vertices using shear factor intervals, according to some embodiments.

Fig. 7 is a circuit diagram illustrating an exemplary circuit configured to generate initial intersection test results according to some embodiments.

Fig. 8 is a diagram illustrating an exemplary circuit configured to generate a modified interval product, according to some embodiments.

Fig. 9 is a diagram illustrating an exemplary triangle pair and order pair processing circuit, according to some embodiments.

Fig. 10 is a diagram illustrating an exemplary boundary of a quantized primitive representation and a region of deterministic hits, according to some embodiments.

Fig. 11 is a diagram illustrating an example test circuit configured to generate a hit or non-deterministic output, according to some embodiments.

Fig. 12 is a circuit diagram illustrating an example circuit configured to generate initial intersection test results, according to some embodiments.

Fig. 13 is a diagram illustrating an example primitive test sequence according to different orderings, including ordering from the middle, according to some embodiments.

Fig. 14 is a flow diagram illustrating an exemplary method according to some embodiments.

Fig. 15 is a flow diagram illustrating another exemplary method according to some embodiments.

Fig. 16 is a block diagram illustrating an exemplary computing device, according to some embodiments.

Fig. 17 is a diagram illustrating an exemplary application of the disclosed systems and devices according to some embodiments.

FIG. 18 is a block diagram illustrating an exemplary computer-readable medium storing circuit design information according to some embodiments.

Detailed Description

In disclosed embodiments, a lower-precision hardware triangle test is performed first as a filter, and a higher-precision triangle test is performed if the lower-precision test indicates a potential hit. Such a low-precision test may be conservative (e.g., it may generate false hits but should not generate false misses). U.S. Patent Application No. 17/136,542, filed December 29, 2020 and entitled "Primitive Testing for Ray Intersection at Multiple Precisions," is incorporated herein by reference in its entirety. The '542 patent application describes exemplary techniques for testing at different precisions, and how potential errors due to quantization of inputs can be tracked throughout the reduced-precision test to ensure that results are conservative.

The present disclosure uses interval arithmetic to track and bound potential quantization errors for hardware primitive testing that quantizes one or more inputs. In some embodiments, the disclosed techniques may advantageously provide tighter error bounds than embodiments of the '542 patent application. Additionally, in some implementations, the disclosed techniques may use reduced circuit area to perform primitive testing at a particular precision.

In addition, disclosed embodiments, discussed in detail below, generate interpolated spatial coordinate intervals that represent a moving triangle for a conservative intersection test at a given ray time within the motion blur interval. The disclosed techniques also provide efficient encoding and processing techniques for moving and non-moving triangle pairs.

Still further, the disclosed techniques may provide deterministic hit results using intersection tests of lower precision without performing the intersection tests at the original precision (e.g., for "any hit" rays).

Finally, the disclosed traversal ordering techniques for acceleration data structures (e.g., "start from the middle" ordering rather than front-to-back or back-to-front ordering) may improve performance, reduce power consumption, or both, for traversal of certain types of rays.

Graphics processing overview

Referring to FIG. 1A, a flow diagram illustrating an exemplary process flow 100 for processing graphics data is shown. In some implementations, the transformation and lighting process 110 may involve processing lighting information for vertices received from an application based on defined light source locations, reflectivities, and the like, assembling the vertices into polygons (e.g., triangles), and converting the polygons to the correct size and orientation based on locations in three-dimensional space. The clipping process 115 may involve discarding polygons or vertices outside the visible area. The rasterization process 120 may involve defining fragments within each polygon and assigning an initial color value to each fragment, e.g., based on the texture coordinates of the polygon vertices. The fragments may specify attributes of the pixels they overlap, but actual pixel attributes may be determined based on combining multiple fragments (e.g., in a frame buffer), ignoring one or more fragments (e.g., if they are covered by other objects), or both. The shading process 130 may involve changing the pixel components based on lighting, shading, bump mapping, translucency, and the like. The colored pixels may be assembled in a frame buffer 135. Modern GPUs typically include programmable shaders that allow application developers to customize shading and other processing. Thus, in various embodiments, the exemplary elements of fig. 1A may be performed in various orders, performed in parallel, or omitted. Additional processing may also be performed.

Referring now to FIG. 1B, a simplified block diagram of an exemplary graphics unit 150 is shown, according to some embodiments. In the illustrated embodiment, graphics unit 150 includes programmable shader 160, vertex pipe 185, fragment pipe 175, texture processing unit (TPU) 165, image write unit 170, and memory interface 180. In some embodiments, graphics unit 150 is configured to process both vertex data and fragment data using programmable shader 160, which may be configured to process graphics data in parallel using multiple execution pipelines or instances.

In the illustrated embodiment, vertex pipe 185 may include various fixed-function hardware configured to process vertex data. Vertex pipe 185 may be configured to communicate with programmable shader 160 in order to coordinate vertex processing. In the illustrated embodiment, vertex pipe 185 is configured to send processed data to fragment pipe 175 or programmable shader 160 for further processing.

In the illustrated embodiment, fragment pipe 175 may include various fixed function hardware configured to process pixel data. The fragment pipe 175 may be configured to communicate with the programmable shaders 160 in order to coordinate fragment processing. The fragment pipe 175 may be configured to perform rasterization on polygons from either the vertex pipe 185 or the programmable shader 160 to generate fragment data. Vertex pipe 185 and fragment pipe 175 may be coupled to memory interface 180 (coupling not shown) for accessing graphics data.

In the illustrated embodiment, programmable shader 160 is configured to receive vertex data from vertex pipe 185 and fragment data from fragment pipe 175 and TPU 165. Programmable shaders 160 may be configured to perform vertex processing tasks on the vertex data, which may include various transformations and adjustments of the vertex data. In the illustrated embodiment, the programmable shader 160 is also configured to perform fragment processing tasks on the pixel data, such as, for example, texturing and shading processing. The programmable shader 160 may include multiple sets of multiple execution pipelines for processing data in parallel.

In some embodiments, a programmable shader includes pipelines configured to execute one or more different SIMD groups in parallel. Each pipeline may include various stages configured to perform operations (such as fetch, decode, issue, execute, etc.) in a given clock cycle. The concept of a processor "pipeline" is well understood and refers to the concept of dividing the "work" performed by the processor on instructions into multiple stages. In some embodiments, the decode, dispatch, execution (i.e., performance), and retirement of instructions may be examples of different pipeline stages. Many different pipeline architectures are possible, with different orderings of elements or portions. Various pipeline stages perform such steps on instructions during one or more processor clock cycles, and then pass the instructions or operations associated with the instructions to other stages for further processing.

The term "SIMD group" is intended to be construed according to its well-known meaning, which includes a set of threads for which processing hardware processes the same instruction in parallel using different input data for different threads. Various types of computer processors may include a set of pipelines configured to execute SIMD instructions. For example, a graphics processor typically includes a programmable shader core configured to execute instructions for a set of related threads in SIMD fashion. Other examples of names that may be used for SIMD groups include: wavefront, clique, or warp. A SIMD group may be part of a larger thread group that may be split into multiple SIMD groups based on the parallel processing capabilities of the computer. In some embodiments, each thread is assigned to a hardware pipeline that fetches the operands of that thread and performs specified operations in parallel with other pipelines for the set of threads. Note that a processor may have a large number of pipelines, so that multiple separate SIMD groups may also be executed in parallel. In some embodiments, each thread has private operand storage, for example in a register file. Thus, reading a particular register from the register file may provide a version of the register for each thread in the SIMD group.

In some implementations, a plurality of programmable shader units 160 are included in the GPU. In these embodiments, the global control circuitry may assign work to different subportions of the GPU, which in turn may assign work to the shader cores for processing by the shader pipeline.

In the illustrated embodiment, TPU 165 is configured to schedule fragment processing tasks from programmable shader 160. In some implementations, TPU 165 is configured to prefetch texture data and assign initial colors to fragments for further processing by programmable shader 160 (e.g., via memory interface 180). TPU 165 may be configured to provide fragment components in a normalized integer format or a floating-point format, for example. In some embodiments, TPU 165 is configured to provide fragments in groups of four (a "fragment quad") in a 2x2 format to be processed by a group of four execution pipelines in programmable shader 160.

In some embodiments, the image write unit (IWU) 170 is configured to store processed tiles of an image, and may perform operations on a rendered image before transmitting it for display or to memory for storage. In some embodiments, graphics unit 150 is configured to perform tile-based deferred rendering (TBDR). In tile-based rendering, different portions of screen space (e.g., squares or rectangles of pixels) can be processed separately. In various implementations, memory interface 180 may facilitate communication with one or more of a variety of memory hierarchies.

In the illustrated example, graphics unit 150 includes a Ray Intersection Accelerator (RIA) 190, which may include hardware configured to perform various ray intersection operations, as described in detail below.

Interval-based intersection test overview

Fig. 2A is a block diagram illustrating an exemplary quantization circuit and a low-precision intersection test circuit, according to some embodiments. In the illustrated embodiment, the graphics processor includes a test circuit 220.

In some embodiments, the quantization circuit is configured to quantize the ray data and generate an interval representation of the quantized values. In various embodiments, although the upper and lower bounds of a generated interval are represented at a lower precision than the input representation, the interval is guaranteed to contain the original value at the input precision. Note that primitive data may also be stored in a quantized interval format (e.g., in an acceleration data structure).
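The covering guarantee described above can be illustrated with a minimal Python sketch. The function names and the `origin`, `scale`, and `bits` parameters are hypothetical and chosen only for illustration; the sketch assumes the lower bound is rounded down and the upper bound is rounded up on a fixed-point grid, which is one way (not necessarily the disclosed way) to keep the original value inside the interval.

```python
import math


def quantize_to_interval(value, origin, scale, bits):
    """Return integer codes (lo, hi) whose interval covers `value`.

    `origin`, `scale`, and `bits` describe a hypothetical fixed-point grid;
    they are illustrative assumptions, not parameters from the disclosure.
    """
    max_code = (1 << bits) - 1
    offset = (value - origin) / scale
    lo = math.floor(offset)  # lower bound rounded down
    hi = math.ceil(offset)   # upper bound rounded up
    if lo < 0 or hi > max_code:
        raise ValueError("value lies outside the representable range")
    return lo, hi


def dequantize(code, origin, scale):
    """Map an integer code back to the original coordinate space."""
    return origin + code * scale
```

For example, with a 7-bit grid of step 1/128, the value 0.3 maps to codes 38 and 39, and the dequantized bounds 0.296875 and 0.3046875 enclose the original value.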

In the illustrated embodiment, the interval-arithmetic-based low-precision test circuit 220 is configured to generate conservative intersection results by performing interval arithmetic on the interval representations. A conservative intersection result guarantees that a miss signaled by circuit 220 would not be a hit in a higher-precision intersection test (e.g., one operating on values at the input precision prior to quantization). In these implementations, a positive output from circuit 220 indicates a potential hit.

In various implementations, performing an initial intersection test of lower precision may advantageously improve performance, reduce power consumption, or both, relative to conventional techniques. In particular, a miss or deterministic hit generated by the initial test may avoid the need to perform higher precision tests for a given ray and primitive. Thus, both improving the accuracy of the test (e.g., by tightening error bounds) and improving the performance or power consumption of the initial test itself may have technical advantages.
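The filtering role of the initial test can be sketched as follows. This is a minimal control-flow illustration only: the `InitialResult` enumeration and the `initial_test` and `full_precision_test` callables are hypothetical stand-ins, not interfaces taken from the disclosure.

```python
from enum import Enum


class InitialResult(Enum):
    MISS = "miss"                    # guaranteed miss at full precision
    HIT = "hit"                      # hit known without a full-precision test
    POTENTIAL_HIT = "potential hit"  # undecided at low precision


def intersects(ray, primitive, initial_test, full_precision_test):
    """Run the low-precision filter first; fall back only when undecided."""
    result = initial_test(ray, primitive)
    if result is InitialResult.MISS:
        return False
    if result is InitialResult.HIT:
        return True
    # Only potential hits pay for the higher-precision test.
    return full_precision_test(ray, primitive)
```

Under this structure, every miss or certain hit produced by the initial test avoids one invocation of the more expensive test.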

Fig. 2B is a flow diagram illustrating an overall exemplary intersection testing technique, according to some embodiments. In the illustrated embodiment, element 210 converts the ray direction to a lower-precision floating-point interval representation. Element 230 determines shear factors based on a quantization frame transform (for quantization of vertices, as discussed in detail below), and element 244 converts the shear factors to a fixed-point interval representation. Element 242 also generates a fixed-point interval representation of the ray origin based on the quantization frame transform. Element 246 generates a fixed-point interval representation of the ray time. For motion blur processing, element 250 temporally interpolates quantized triangle vertices based on the ray time (this element may be omitted, or may pass quantized triangle vertices through unchanged, when no motion blur operation is performed). Element 260 transforms the vertices according to the shear factors and ray origin, and element 270 evaluates the edge equations to determine whether there is a miss or a potential hit. The various elements of Fig. 2B are explained in further detail below. The specific operations of Fig. 2B are included for illustrative purposes and are not intended to limit the scope of the present disclosure. However, in some embodiments, the disclosed operations may advantageously provide tight intervals using reasonable circuit area and power consumption.

Exemplary quantization interval representation of intersection test values

Fig. 3 is a diagram illustrating exemplary interval representations of various values used in an initial intersection test, according to some embodiments. In the illustrated example, intervals are generated for vertex positions, ray origin, ray direction, ray time, shear factors, and interpolated triangle vertices. Note that these particular interval values are discussed for illustrative purposes and are not intended to limit the scope of the present disclosure. In other embodiments, intervals may be used to represent any of the various values used in determining the initial intersection result.

In the illustrated embodiment, for each quantized vertex position (e.g., for each of the three vertices of a triangle), three respective intervals are determined for the X, Y, and Z dimensions. Similar intervals are determined for the ray origin and ray direction. In some embodiments that support motion blur, upper and lower bounds on ray times are also determined.

In some embodiments that use shearing as part of the ray-triangle intersection test, upper and lower bounds are determined for two shear factors, one for each of the non-dominant coordinate directions of the ray.

In some implementations that support motion blur, the graphics processor determines X, Y, and Z intervals for each vertex of an interpolated triangle that corresponds to a ray time within the motion blur interval. Fig. 4 is discussed in detail below and provides an exemplary technique for generating an interval representation of an interpolated triangle. More detailed techniques for determining the various specific intervals are also discussed below.

As discussed in detail below, the data structure may represent a triangle, a moving triangle, a pair of triangles, a moving pair of triangles, or some combination thereof. In some embodiments, three vertices are used to represent triangles, six vertices are used to represent moving triangles, four vertices are used to represent triangle pairs, and eight vertices are used to represent moving triangle pairs.

In some implementations, the quantized triangle coordinates are stored as unsigned integer values of limited fixed-point precision and are rounded toward zero. These coordinates may correspond to a local coordinate system recorded in the acceleration data structure (ADS), for example, as discussed in the '542 patent application. The quantized values may be N-bit values. In some embodiments, each coordinate value uses a number of bits that facilitates packing within a field of a particular size. As one example, 7-bit values for each quantized coordinate interval bound of a single triangle may be packed into two 64-bit fields (7-bit upper and lower values for x, y, and z for each of three vertices = 126 bits). In other embodiments, fixed-point encodings using various suitable numbers of bits may be utilized. In some embodiments, unsigned values are converted into a new coordinate system, where the values become signed integers. Note that in some cases, only one bound of an interval may be stored, while the other bound may be implicit. This may reduce memory requirements in certain portions of the processor.
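The 126-bit packing arithmetic above can be sketched in Python. The bit layout used here (values laid out from least significant bit upward, with one value straddling the two fields) is an assumption for illustration; the text does not specify a field layout, and the function names are hypothetical.

```python
BITS_PER_VALUE = 7
VALUES_PER_TRIANGLE = 18  # 3 vertices x 3 axes x 2 bounds
FIELD_MASK = (1 << 64) - 1


def pack_triangle(bounds):
    """Pack eighteen 7-bit values (126 bits) into two 64-bit fields."""
    if len(bounds) != VALUES_PER_TRIANGLE:
        raise ValueError("expected 18 interval bounds")
    word = 0
    for index, value in enumerate(bounds):
        if not 0 <= value < (1 << BITS_PER_VALUE):
            raise ValueError("each bound must fit in 7 bits")
        word |= value << (BITS_PER_VALUE * index)
    # Assumed layout: low 64 bits in the first field, remaining 62 in the second.
    return word & FIELD_MASK, word >> 64


def unpack_triangle(low_field, high_field):
    """Recover the eighteen 7-bit values from the two 64-bit fields."""
    word = low_field | (high_field << 64)
    mask = (1 << BITS_PER_VALUE) - 1
    return [(word >> (BITS_PER_VALUE * i)) & mask
            for i in range(VALUES_PER_TRIANGLE)]
```

Two bits of the second field remain unused in this layout, since 18 x 7 = 126.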

In this context, if p is the quantized value of a triangle coordinate, the interval representing that coordinate in the local quantized coordinate space is

$\bar{p} = [p, p + \delta_p]$

where, in some implementations, $\delta_p$ represents one unit of least precision (ULP) in the quantization format. The original coordinate value before quantization is guaranteed to lie within the interval. For an N-bit fixed-point representation, [expression not recoverable from the source text].

Generally, barred quantities discussed herein refer to intervals.

Thus, a given non-moving triangle may be encoded using nine values (three vertices each with three coordinate lower bounds, where the upper bounds are implicitly one ULP greater than the lower bounds).
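The nine-value encoding can be illustrated with a short Python sketch. It assumes unsigned coordinates relative to a hypothetical `origin` and `scale`, truncation toward zero for the stored lower bound, and an implied upper bound one ULP above it; the names are illustrative and not taken from the disclosure.

```python
def encode_triangle(vertices, origin, scale, bits=7):
    """Encode three (x, y, z) vertices as nine lower-bound codes."""
    max_code = (1 << bits) - 1
    codes = []
    for vertex in vertices:
        for axis in range(3):
            offset = (vertex[axis] - origin[axis]) / scale
            if offset < 0:
                raise ValueError("coordinates must not precede the origin")
            code = int(offset)  # truncation toward zero for unsigned offsets
            if code > max_code:
                raise ValueError("coordinate exceeds the representable range")
            codes.append(code)
    return codes


def coordinate_interval(code, origin_axis, scale):
    """Expand a stored lower bound; the upper bound is implied as +1 ULP."""
    return origin_axis + code * scale, origin_axis + (code + 1) * scale
```

Only the nine lower bounds are stored; `coordinate_interval` rebuilds each full interval on demand.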

In some embodiments, moving triangles are stored as two (or more) sets of coordinates, e.g., position p(0) at time t = 0 and position p(1) at time t = 1. This may define linear motion over the normalized time interval [0, 1]. Note that multiple linear motions over sub-intervals may also be used to encode non-linear motion over a larger motion blur interval. In this case, a moving triangle may include more than two sets of coordinates. A moving triangle coordinate at time t may be represented using the interval $\bar{p}(t)$.

In some embodiments, ray times are quantized to lower-precision intervals as part of the low-precision intersection test:

$\bar{t} = [t, t + \delta_t]$

where t is encoded using M bits at sub-interval resolution (e.g., with $\delta_t = 2^{-M}$, where 1.0 is represented implicitly). M may or may not correspond to the number of bits N used to represent the spatial coordinates of the triangle (or the number of bits used to represent ray spatial coordinates). As with the other quantized intervals, the original high-precision value is guaranteed to lie within the low-precision interval. In some embodiments, time is a fourth coordinate axis that is independent of the other coordinates such as x, y, and z.
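A minimal Python sketch of ray time quantization follows. It assumes a step of 2^-M over the normalized range [0, 1] and stores the lower bound, so that a time of exactly 1.0 falls in the last interval; the function name and these conventions are illustrative assumptions.

```python
import math


def quantize_ray_time(t, m_bits):
    """Return (lower, upper) of an M-bit time interval covering t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("ray time must lie in the normalized range [0, 1]")
    steps = 1 << m_bits
    delta_t = 1.0 / steps
    # Clamp so the lower bound never exceeds 1 - delta_t.
    code = min(math.floor(t * steps), steps - 1)
    lower = code * delta_t
    return lower, lower + delta_t
```

With M = 4, a ray time of 0.3 is covered by the interval [0.25, 0.3125].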

Exemplary Interval-based motion blur processing

In some embodiments, interval interpolation circuitry is configured to reconstruct conservative spatial intervals for moving triangle coordinates over the quantized time interval $\bar{t}$ of the ray. Fig. 4 is a diagram illustrating an exemplary interpolation circuit configured to generate interval representations of interpolated primitives in a motion blur interval, according to some embodiments. Circuitry 410 may perform the operations discussed above with reference to element 250 of Fig. 2B.

In the illustrated embodiment, the interpolation circuit 410 is configured to receive an interval representation of ray times and an interval representation of a moving triangle (e.g., x, y, and z intervals for each of six vertices), and to generate an interval representation of an interpolated triangle (e.g., x, y, and z intervals for each of three vertices).

As one example, using the notation $p^0 = p(0)$ and $p^1 = p(1)$, circuitry 410 may determine an interpolated position coordinate interval that is guaranteed to cover the quantized time interval $[t, t + \delta_t]$ for any $t \in [0, 1 - \delta_t]$. The interval is determined as follows:

[interval expression not recoverable from the source text]

where

$z = p^0 (1 - t - \delta_t) + p^1 t$

In various embodiments, this equation may provide a good fit with reasonable performance and circuit area. In addition, the intervals provided by this equation have been determined to be conservative.

In some embodiments, circuitry 410 is configured to determine intervals according to this equation. Note that in other embodiments, other equations may be implemented by the circuitry to determine conservative interpolated triangle intervals; the equations disclosed herein are included for illustrative purposes and are not intended to limit the scope of the present disclosure.
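To show why a bound built from z can be conservative, the following Python sketch computes one possible enclosing interval. The lower bound is the z expression given above. The upper bound is an assumption derived here for illustration (taking the largest value of each term independently, for nonnegative coordinates); it is not claimed to match the disclosed interval expression, and the sketch ignores the outward rounding a fixed-point circuit would need.

```python
def interpolated_interval(p0, p1, t, delta_t, delta_p=1.0):
    """Enclose p(t') = a * (1 - t') + b * t' for all
    a in [p0, p0 + delta_p], b in [p1, p1 + delta_p], t' in [t, t + delta_t].

    Assumes p0, p1 >= 0 and 0 <= t <= 1 - delta_t.
    """
    # Lower bound: the z expression from the text.
    lower = p0 * (1.0 - t - delta_t) + p1 * t
    # Upper bound: assumed here, maximizing each term independently.
    upper = (p0 + delta_p) * (1.0 - t) + (p1 + delta_p) * (t + delta_t)
    return lower, upper
```

Because each term is bounded separately, the interval is wider than strictly necessary but never excludes a true interpolated position under the stated assumptions.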

In various embodiments, at least in a motion blur mode of operation, the interpolated triangle intervals may be tested using the initial low-precision intersection test. Thus, the various primitive inputs discussed below may be for conventional triangles or for interpolated triangles, e.g., depending on whether motion blur is in use. Additionally, although the various techniques discussed herein use interval arithmetic, the interpolated triangle techniques for motion blur disclosed herein may also be used with other quantized representations and techniques (e.g., the techniques of the '542 patent application).

Exemplary shear factor determination

As discussed in the '542 patent application, a shearing technique may be used to implement the intersection test. In the following discussion, the following naming convention is employed:

P    ray origin, floating-point object space

p    ray origin, fixed-point quantization space

D    ray direction, floating-point object space

d    ray direction, quantization space

v    triangle vertex coordinates, fixed-point quantization space

In some embodiments, the transformation to 2D shear space is given by:

To perform these calculations with fixed-point arithmetic, the device may convert the object-space ray quantities P and D into quantization-space quantities p and d according to:

Before proceeding further, the device may determine which axis of the scaled ray direction has the largest magnitude and rotate the axis names so that this dominant axis is in the third position ("z"). In addition, if that direction component is negative, the device may swap the other two axes to preserve handedness. For the following discussion, it is assumed that this renaming has been applied to all Cartesian quantities.
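The axis renaming step can be sketched in Python as follows. The function name and the returned index-tuple convention are illustrative assumptions; the cyclic rotation followed by a conditional swap is one common way to realize the renaming described above.

```python
def rename_axes(direction):
    """Return indices (kx, ky, kz) so that kz is the dominant axis."""
    # Axis whose direction component has the largest magnitude.
    kz = max(range(3), key=lambda axis: abs(direction[axis]))
    # Rotate the remaining axis names cyclically.
    kx = (kz + 1) % 3
    ky = (kx + 1) % 3
    # Swap the other two axes when the dominant component is negative,
    # which preserves the handedness of the renamed coordinate system.
    if direction[kz] < 0.0:
        kx, ky = ky, kx
    return kx, ky, kz
```

For a direction of (0.1, -0.9, 0.2), the y axis is dominant and negative, so the renamed order is (x, z, y), i.e., indices (0, 2, 1).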

Substituting these into equation 1 and simplifying yields:

In the context of the interval techniques disclosed herein, the various values represented in equation 4 are interval representations, as discussed above. Once in 2D shear space, the ray position is translated to the origin of the coordinate system with the ray direction aligned with the z axis, and the device can test the origin against the three directed edges of the 2D triangle represented by the three sheared coordinates $v' \in \{A', B', C'\}$, according to the following:

$u = A'_x B'_y - A'_y B'_x$

$v = B'_x C'_y - B'_y C'_x$

$w = C'_x A'_y - C'_y A'_x$

If u, v, and w all have the same sign, the triangle covers the origin and the ray therefore intersects the triangle, to within numerical precision.
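The sign test on u, v, and w can be evaluated conservatively with interval arithmetic, as in the following Python sketch. Intervals are modeled as `(lo, hi)` tuples and each sheared vertex as a pair of such intervals; the function names and the string results are illustrative assumptions, and the sketch uses ordinary floating point rather than the outward-rounded fixed-point arithmetic a circuit would use.

```python
def interval_mul(a, b):
    """Product of two intervals given as (lo, hi) tuples."""
    candidates = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return min(candidates), max(candidates)


def interval_sub(a, b):
    """Difference a - b of two intervals."""
    return a[0] - b[1], a[1] - b[0]


def edge_interval(p, q):
    """Interval of p_x * q_y - p_y * q_x for 2D interval points p and q."""
    return interval_sub(interval_mul(p[0], q[1]), interval_mul(p[1], q[0]))


def initial_edge_test(a, b, c):
    """Return "miss" only when the edge values cannot share a sign."""
    edges = (edge_interval(a, b), edge_interval(b, c), edge_interval(c, a))
    some_negative = any(hi < 0.0 for _, hi in edges)  # certainly negative
    some_positive = any(lo > 0.0 for lo, _ in edges)  # certainly positive
    if some_negative and some_positive:
        return "miss"
    return "potential hit"
```

A miss is reported only when one edge value is certainly negative and another is certainly positive, so no combination of true values inside the intervals could pass the same-sign condition.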

Fig. 5 is a block diagram illustrating an example shear factor circuit configured to generate shear factor intervals, according to some embodiments. In the illustrated embodiment, the shear factor circuit (which may be included in low-precision test circuit 220) includes down-conversion circuits 510A-510C, subtraction circuits 520A-520B, reciprocal circuit 530, interval product and scale adjustment circuits 540A-540B, and floating-point to fixed-point interval conversion circuits 550A-550B. In some embodiments, the circuit of Fig. 5 implements the functionality of element 230 of Fig. 2B.

In the illustrated embodiment, the down-conversion circuits 510 are configured to convert the x, y, and z direction components (after rotation, such that the dominant axis is the z direction) into reduced-precision floating-point interval representations. In some embodiments, the down-conversion rounds toward negative infinity (RTNI) to generate the lower interval bound and rounds toward positive infinity (RTPI) to generate the upper interval bound.
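Directed rounding during down-conversion can be sketched in Python by truncating the significand of a double. The `significand_bits` parameter and the function name are hypothetical, and the sketch ignores exponent range limits of any real reduced-precision format.

```python
import math


def down_convert(x, significand_bits):
    """Return (lower, upper) reduced-precision bounds that enclose x."""
    if x == 0.0:
        return 0.0, 0.0
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent
    scaled = math.ldexp(mantissa, significand_bits)  # exact power-of-two scale
    shift = exponent - significand_bits
    lower = math.ldexp(float(math.floor(scaled)), shift)  # toward -infinity
    upper = math.ldexp(float(math.ceil(scaled)), shift)   # toward +infinity
    return lower, upper
```

With four significand bits, 0.3 is enclosed by [0.28125, 0.3125], and a value that is already representable, such as 0.5, yields a zero-width interval.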

In the illustrated embodiment, the subtraction circuit 520 is configured to subtract the x and y scale values from the z scale value, respectively, to generate, in an unsigned integer representation, the results of the unsigned divisions S_z/S_x and S_z/S_y. In some embodiments, the scale values are powers of two, such that subtraction of the exponents corresponds to division. These scale factors may be determined based on the quantization frame of the primitive. Generally, a set of quantized values may share common parameters that define a "quantization frame" for those values. In some embodiments, the quantized values are represented as fixed point offsets relative to a common origin and scale factors. Thus, the quantization frame may specify an origin (e.g., in x, y, and z coordinates) and scale factors (e.g., a power-of-two scale factor for each of the x, y, and z dimensions). The quantized primitive intervals discussed herein may be represented using fixed point coordinates interpreted in the context of a quantization frame. Note that in the illustrated example, the output of circuit 520 is not an interval.
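A quantization frame of this kind can be sketched as a small data structure. The class and function names below are hypothetical, and storing each scale factor as a base-2 exponent is an assumption that follows from the power-of-two scale factors mentioned above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QuantizationFrame:
    origin: Tuple[float, float, float]   # shared origin (x, y, z)
    scale_exp: Tuple[int, int, int]      # scale factor per axis is 2**scale_exp[axis]

    def scale(self, axis):
        """Scale factor S for the given axis (0 = x, 1 = y, 2 = z)."""
        return 2.0 ** self.scale_exp[axis]

    def decode(self, axis, fixed):
        """Map a fixed point offset back to a coordinate on the given axis."""
        return self.origin[axis] + fixed * self.scale(axis)


def scale_ratio_exponent(frame, num_axis, den_axis):
    """Exponent of S_num / S_den, obtained by subtracting exponents."""
    return frame.scale_exp[num_axis] - frame.scale_exp[den_axis]
```

Because the scale factors are powers of two, the ratio S_z/S_x never requires a divider: `scale_ratio_exponent(frame, 2, 0)` is a single integer subtraction.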

In the illustrated embodiment, the reciprocal circuit 530 is configured to generate a reciprocal of the down-converted z-direction value.

In the illustrated embodiment, the interval product and scale adjustment circuit 540 is configured to perform an interval product operation on its inputs to generate an output in a reduced precision floating point interval format. In some embodiments, the circuits 540 are configured to clamp their outputs to the range [-1, 1]. In some implementations, circuit 540 also applies the scaling from circuit 520, using an exponent adjustment to multiply by a power of two.

In the illustrated embodiment, the floating-point to fixed-point interval conversion circuit 550 is configured to convert the reduced-precision floating-point interval representation to a fixed point interval representation of the clipping factors D_x·S_z/(D_z·S_x) and D_y·S_z/(D_z·S_y) (which are input to the circuit of fig. 6 discussed below).

Fig. 6 is a diagram illustrating an exemplary circuit configured to translate and clip vertices using a clip factor interval, according to some embodiments. For example, FIG. 6 may use interval arithmetic to implement the operation of equation (4) above. Fig. 6 may implement the operations discussed above with reference to element 260 of fig. 2B. In the illustrated embodiment, the circuitry receives vertex and ray position data as intervals and is configured to perform interval subtraction and multiplication operations to generate translated and clipped vertices using the clipping factor intervals generated by the circuitry of fig. 5. In some embodiments, each output of FIG. 6 is an interval whose lower bound is indicated in FIG. 7 using a minus sign (e.g., a_y−) and whose upper bound is indicated using a plus sign (e.g., a_y+).

Fig. 7 is a block diagram illustrating exemplary circuitry configured to perform an initial reduced-precision intersection test, according to some embodiments. In some embodiments, the circuit of fig. 7 implements the functionality of element 270 of fig. 2B. For example, FIG. 7 may perform the operations of the above equations for u, v, and w based on the outputs of FIG. 6 to generate the intersection result. Note that the circuit of fig. 7 differs from those equations in some respects. First, the circuit performs a comparison rather than a subtraction (e.g., A′_x · B′_y < A′_y · B′_x rather than A′_x · B′_y − A′_y · B′_x), because only the sign is needed. Second, in the illustrated embodiment, the circuit of FIG. 7 performs twice as many multiplications in order to provide a conservative test (e.g., one that considers only the "outer" portion of the edge intervals), because the circuit does not know which way is "out": it may be considering either the clockwise or the counterclockwise face of the triangle. The circuit 710 is configured to generate a modified interval product and is discussed in detail below with reference to fig. 8.

The exemplary AND and OR logic of FIG. 7 provides a result indicating whether the reduced precision test yields a deterministic miss. As shown, the six two-sided edge tests may use 12 multipliers and 6 comparators, all fixed-point. Note that various circuits may be combined or merged, e.g., an adder and a subtractor may be implemented by a single component that performs both operations in parallel, and multipliers and a comparator may be merged to implement a single ab <= cd operation.
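A conservative miss test of this general shape can be sketched in software as follows. This is not the circuit of FIG. 7: for clarity it uses a general four-multiplication interval product, and all names are hypothetical. Each vertex is assumed to be a pair of (lower, upper) intervals for its clipped x and y coordinates.

```python
def interval_mul(p, q):
    """General interval product of p = (lo, hi) and q = (lo, hi)."""
    products = (p[0] * q[0], p[0] * q[1], p[1] * q[0], p[1] * q[1])
    return (min(products), max(products))


def edge_sign(v0, v1):
    """Return -1 or +1 when the sign of v0.x*v1.y - v0.y*v1.x is certain, else 0.

    The sign is found by comparing the two interval products instead of
    subtracting them, since only the sign matters.
    """
    left = interval_mul(v0[0], v1[1])    # v0.x * v1.y
    right = interval_mul(v0[1], v1[0])   # v0.y * v1.x
    if left[1] < right[0]:
        return -1
    if left[0] > right[1]:
        return 1
    return 0


def is_deterministic_miss(a, b, c):
    """True only if two edges certainly disagree in sign, so no winding can hit."""
    signs = (edge_sign(a, b), edge_sign(b, c), edge_sign(c, a))
    return -1 in signs and 1 in signs
```

A result of `False` means only that a hit cannot be ruled out at reduced precision, which is the case that is passed on to a higher precision test.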

As discussed above, if there is a non-deterministic result (potential hit), the processor may perform a higher precision intersection test (e.g., using the original floating point representation).

Exemplary modified interval product

Typically, a signed interval product requires four multipliers, as defined:

In some embodiments, two multipliers are used to implement the interval product. To fully resolve the sign of a sum of interval products, the signs of both endpoints of each interval product must be resolved accurately. This can be accomplished using only two multipliers per interval product unless both interval inputs of the interval product straddle zero. In that case, the hardware may raise an exception, and the intersection test may record a potential hit. Empirical data indicates that such exceptional cases may be rare under typical workloads. Code listing 1 implements the modified signed interval product using only two hardware multipliers.

Fig. 8 illustrates one example of a circuit 810 configured to implement a modified signed interval product, according to some embodiments. In some embodiments, the circuit of fig. 8 is included in the corresponding element 710 of fig. 7. In this embodiment, routing circuit 810 is configured to route operands to the two multipliers based on the signs of the four inputs, e.g., as set forth in code listing 1. In this example, the circuit 810 is also configured to detect the exceptional condition.
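The idea of selecting two multiplications from the signs of the inputs can be sketched as follows. This is an independent illustration based on standard interval arithmetic, not a reproduction of code listing 1; the function names and the use of `None` to signal the exceptional case are assumptions.

```python
def interval_mul_4(a, b):
    """Reference product: four multiplications, then min and max."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def interval_mul_2(a, b):
    """Signed interval product using two multiplications chosen by sign.

    Returns None in the exceptional case where both inputs straddle zero,
    which a caller could treat as a potential hit.
    """
    (al, ah), (bl, bh) = a, b
    if al >= 0:                        # a is non-negative
        if bl >= 0:
            lo, hi = (al, bl), (ah, bh)
        elif bh <= 0:
            lo, hi = (ah, bl), (al, bh)
        else:
            lo, hi = (ah, bl), (ah, bh)
    elif ah <= 0:                      # a is non-positive
        if bl >= 0:
            lo, hi = (al, bh), (ah, bl)
        elif bh <= 0:
            lo, hi = (ah, bh), (al, bl)
        else:
            lo, hi = (al, bh), (al, bl)
    else:                              # a straddles zero
        if bl >= 0:
            lo, hi = (al, bh), (ah, bh)
        elif bh <= 0:
            lo, hi = (ah, bl), (al, bl)
        else:
            return None                # both straddle zero: exceptional case
    return (lo[0] * lo[1], hi[0] * hi[1])
```

In every non-exceptional case the sign pattern alone determines which operand pair produces each endpoint, so no minimum or maximum over four products is needed.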

Exemplary encoding and processing techniques for triangle pairs

Fig. 9 is a diagram illustrating an exemplary triangle pair and sequential pair processing circuit, according to some embodiments. As shown, triangle pair 910 is a set of two triangles that share two vertices (vertex 1 and vertex 2 in the illustrated example). Thus, the two triangles may be defined by four vertices. Given that triangle pairs are common in various models, in some embodiments, the processor is configured to store triangles using a triangle pair data structure with four vertices, which may reduce storage requirements.

In some embodiments, the processor includes sequential pair processing circuitry 920 configured to sequentially perform one or more operations on the triangle pairs, e.g., processing one triangle of a pair before processing the second triangle of the pair. As one example, the operations may generate an initial intersection result, but other circuits may use similar sequential techniques. This may provide efficient processing in implementations where the same triangle pair structure is used for all triangles, but some structures may only have data for a single triangle. In these embodiments, sequential pair processing circuitry 920 may skip the operation for the second triangle in the pair if the data structure indicates that only one triangle is encoded.
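A triangle pair structure with sequential processing might look like the following sketch. The field names, the `has_second` flag, and the vertex order chosen for the second triangle are assumptions for illustration; the source states only that the pair has four vertices, that vertex 1 and vertex 2 are shared, and that the second triangle may be skipped.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[Vertex, Vertex, Vertex]


@dataclass
class TrianglePair:
    vertices: Tuple[Vertex, Vertex, Vertex, Vertex]   # vertices 0..3
    has_second: bool = True    # False when only one triangle is encoded

    def triangles(self) -> Iterator[Triangle]:
        """Yield the first triangle, then the second one if it is encoded."""
        v0, v1, v2, v3 = self.vertices
        yield (v0, v1, v2)
        if self.has_second:
            yield (v3, v2, v1)   # shares vertex 1 and vertex 2 with the first


def process_pair(pair: TrianglePair, op: Callable[[Triangle], object]) -> List[object]:
    """Apply op to each encoded triangle of the pair, one after the other."""
    return [op(tri) for tri in pair.triangles()]
```

Storing four vertices instead of six is where the storage saving comes from, and the flag lets the same structure carry a lone triangle.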

Exemplary deterministic hit detection using lower precision intersection testing

In some implementations, intersection test circuitry operating on quantized inputs may still provide deterministic information about whether a line corresponding to a ray intersects a primitive, which may be useful for certain types of rays. Thus, referring back to the example of fig. 7, a modified comparison circuit (in addition to or in place of the circuit of fig. 7) may be implemented to provide a result indicating either a deterministic hit or a non-deterministic result.

FIG. 10 is a diagram illustrating an exemplary region enclosed by a quantized representation of a two-dimensional triangle primitive (e.g., after clipping). In the illustrated example, edge 1010 shows an exact edge, e.g., as it would be represented at the original precision. The outer boundary 1020 and the inner boundary 1030 illustrate the boundaries of the quantized representation, for example using an interval representation.

As shown, rays falling in the region outside of boundary 1020 are deterministic misses, e.g., as can be detected by the circuitry of FIG. 7. Rays that fall in the region between the boundaries 1020 and 1030 are non-deterministic (e.g., because the exact location of the triangle edge within that region is not known). Rays falling in this region may require higher precision testing.

As shown, a ray falling in the region within the boundary 1030 is a deterministic hit for the line corresponding to the ray. Note that the intersection detected by this test may not accurately indicate where the hit occurred, for example due to quantization. In addition, the intersection detected by this test may only indicate a hit on the line corresponding to the ray, e.g., due to quantization of the interval over which the ray is valid.

However, in some embodiments, even if there are the limitations discussed above, it may be useful to determine deterministic hits in regions within the boundary 1030.

Fig. 11 is a block diagram illustrating an exemplary low precision test circuit 1120 configured to indicate whether there is a hit or whether it cannot be determined whether there is a hit. Fig. 12, discussed in detail below, provides a detailed example of such a circuit. Note that the circuit 1120 may also provide an output indicating whether there is a miss or whether it cannot be determined whether there is a miss (e.g., if the circuits of fig. 7 and 12 are combined).
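The relationship between the two outputs can be sketched with a small function that takes conservative (lower, upper) bounds on the three edge values u, v, and w. The function name and the input format are assumptions; the actual circuits operate on comparisons of interval products rather than on precomputed bounds.

```python
def classify_edges(edge_bounds):
    """Return (deterministic_hit, deterministic_miss) from bounds on u, v, w.

    edge_bounds: three (lower, upper) pairs, each guaranteed to contain the
    exact edge value. A hit here refers to the line corresponding to the ray.
    """
    certainly_positive = [lo > 0 for lo, hi in edge_bounds]
    certainly_negative = [hi < 0 for lo, hi in edge_bounds]
    deterministic_hit = all(certainly_positive) or all(certainly_negative)
    deterministic_miss = any(certainly_positive) and any(certainly_negative)
    return deterministic_hit, deterministic_miss
```

The two outputs can never both be true; when both are false, the result is non-deterministic and corresponds to the band between the inner and outer boundaries of the quantized triangle.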

In some implementations, the processor may skip a higher precision intersection test in certain cases where the output of the circuit 1120 indicates a deterministic hit. In some embodiments, a ray query may terminate in this manner under the following conditions: the ray is an any-hit ray, the triangle is opaque, and the active ray interval completely covers at least one bounding volume that completely encloses the triangle. In some implementations, triangle opacity may be determined based on whether an alpha map is to be tested. Whether an active ray interval completely covers at least one bounding volume that completely encloses a triangle may be determined based on a traversal of the ADS (which allows determining which bounding volumes completely enclose a triangle based on the structure of the ADS) and a slab test circuit configured to test the traversed bounding volumes.

Under these conditions, the processor may record a ray-triangle intersection hit without performing a higher precision test. This may advantageously improve performance, reduce power consumption, or both when processing any-hit rays. Note that the conditions discussed above are included for illustrative purposes; in other embodiments, only a subset of these conditions may be examined, other conditions may be applied, and so forth.
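The decision described above can be summarized in a short sketch. The function name, parameter names, and returned strings are hypothetical, and the full set of conditions is used even though other embodiments may check only a subset.

```python
def next_action(deterministic_hit, deterministic_miss, is_any_hit_query,
                is_opaque, ray_interval_covers_enclosing_volume):
    """Choose what to do after the reduced precision test of one triangle."""
    if deterministic_miss:
        return "discard"
    can_skip = (deterministic_hit
                and is_any_hit_query
                and is_opaque
                and ray_interval_covers_enclosing_volume)
    if can_skip:
        return "record_hit"            # no higher precision test is needed
    return "high_precision_test"       # potential hit: retest at full precision
```

A deterministic hit for a closest-hit query still falls through to the higher precision test, because the reduced precision result does not say exactly where along the ray the hit lies.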

Fig. 12 is a circuit diagram, similar to the diagram of fig. 7, showing a deterministic hit test circuit, according to some embodiments. In the illustrated embodiment, the circuit 710 is configured as described above with reference to fig. 7 and 8. However, the outputs are routed differently to the comparators to provide a result that indicates either a hit or a non-deterministic result. In some embodiments, the comparators, AND gates, and OR gates shown in fig. 12 are included in addition to the circuitry shown in fig. 7, such that the quantized intersection test circuitry outputs two Boolean results for a given test.

The following code listing 2 provides exemplary operations that may be implemented by the circuit of fig. 12 or other similar circuits.

Exemplary traversal techniques to potentially reduce intersection testing

Ray intersection calculations are typically facilitated by an acceleration data structure (ADS). To implement ray intersection queries efficiently, a spatial data structure can reduce the number of ray-surface intersection tests and thereby accelerate the query process. A common category of ADS is the bounding volume hierarchy (BVH), in which surface primitives are enclosed in a hierarchy of geometric proxy volumes (e.g., boxes) that are cheaper to test for intersection. These volumes may be referred to as bounding regions. The graphics processor locates a conservative set of candidate intersecting primitives for a given ray by traversing the data structure and performing proxy intersection tests along the way. A common form of BVH uses 3D axis-aligned bounding boxes (AABBs). Once constructed, an AABB BVH can be used for all ray queries and is a view-independent structure. In some embodiments, these structures are constructed once for each distinct mesh in the scene, in the object's local object space or model space, and rays are transformed from world space into the local space before traversing the BVH. This may allow geometry instancing of a single mesh with many rigid transformations and material properties (similar to instancing in rasterization). Animated geometry typically requires the data structure to be rebuilt (sometimes using a less expensive update operation called "refitting"). For non-real-time use cases in which millions or billions of rays are traced against a single scene in a single frame, the cost of ADS construction is fully amortized to the point of being "free." In a real-time context, however, there is typically a delicate tradeoff between build cost and traversal cost, with more efficient structures typically being more expensive to build.
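A proxy intersection test against an axis-aligned bounding box is commonly implemented with the slab method, sketched below. This is the generic textbook form in ordinary floating point, not the circuitry of this disclosure; the function and parameter names are assumptions, and no outward rounding is applied to keep the test conservative.

```python
def ray_hits_aabb(origin, direction, t_min, t_max, box_min, box_max):
    """Slab test: does the ray segment [t_min, t_max] overlap the box?"""
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the active interval to the part inside this slab.
        t_min = max(t_min, t0)
        t_max = min(t_max, t1)
        if t_min > t_max:
            return False
    return True
```

Each axis narrows the active ray interval, and the box is missed as soon as the interval becomes empty, which is what makes the proxy test much cheaper than a primitive test.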

In some embodiments, the intersection circuit is configured to traverse a BVH ADS that uses 3D axis-aligned boxes as its bounding volumes. The ADS may have a maximum branching factor (e.g., 2, 4, 8, 16, etc.) and a flexible user-defined payload (e.g., the content at the leaves) that does not assume triangle geometry. In some embodiments, a depth-first search is performed, for example, as discussed in U.S. patent application No. 17/103,317, filed November 24, 2020, which is incorporated by reference herein in its entirety.

In some embodiments, RIA 190 is configured to use a modified ordering of the child nodes of a given node for a particular type of depth-first traversal. In some embodiments, the disclosed techniques are applied to secondary rays. A secondary ray is a ray that originates from the intersection between a first (traced) ray and a surface. Many any-hit rays are secondary rays due to the types of effects that are typically implemented with any-hit rays (e.g., shadows). Thus, a secondary ray originates near the intersected surface and is directed away from that surface (and therefore does not intersect that particular surface).

Due to the nature of secondary rays, the present inventors have recognized that front-to-back or back-to-front traversal of the child nodes of an intersected bounding volume may often lead to intersection tests that result in misses. For example, in a front-to-back traversal, the ray may intersect the bounding volume of the primitive that reflected the secondary ray (triggering an intersection test) but will not actually intersect that primitive.

Fig. 13 is a diagram comparing front-to-back ordering of intersected child nodes of an acceleration data structure with an ordering that starts from the middle, according to some embodiments. In the illustrated example, the secondary ray is a reflection based on the intersection of another ray (not shown) with primitive A. As shown, the ray ends at a light source (which may be because ray tracing typically traces rays backward from the camera to the light source to avoid processing irrelevant rays). In this example, the ray is an any-hit ray and intersects primitive C.

Consider the exemplary case where the ray intersects the bounding volume of each illustrated primitive, and the illustrated primitives are all child nodes of a node corresponding to a larger bounding volume. In this example, the traversal circuit may use various orderings of the child nodes to determine which child to search first in a depth-first search.

As shown, using a front-to-back ordering, in which the bounding volume closer to the origin of the ray is traversed first, the intersection tests of primitives A and B result in misses before the hit of primitive C is eventually detected and the query ends (because the ray is an any-hit ray). Given that the ray that generated the exemplary secondary ray intersected primitive A, the miss of primitive A is not surprising.

Using an ordering that starts from the middle advantageously provides faster hit detection, which in this example requires two fewer intersection tests than the front-to-back ordering. As shown, starting from the middle of the ray results in a hit for primitive C, and the query may end without testing primitives D, A, or B.

In some embodiments, various techniques may be used to prioritize one or more middle nodes relative to front/back nodes. As an example, consider a tree-structured ADS with a branching factor N. The intersection circuit may first sort the child nodes whose bounding volumes are intersected into front-to-back order. For M ≤ N intersected children (indexed 0 to M−1), the intersection circuit may then reorder the children according to one of the following sequences, depending on whether M is odd or even.

If M is an odd number and the division refers to integer division (e.g., 3/2=1), then the following is an exemplary reordered sequence of sub-indices:

M/2

M/2+1

M/2-1

M/2+2

M/2-2

...

M/2+M/2＝M-1

M/2-M/2＝0

If M is an even number, the following is an exemplary reordered sequence:

M/2

M/2-1

M/2+1

M/2-2

M/2+2

...

M/2+(M/2-1)＝M-1

M/2-M/2＝0

In some hardware implementations, for a maximum branching factor of N, the circuit may encode a reordered sequence for each value of M from 1 to N in order to quickly determine the middle-first traversal order. As one non-limiting example, if N = 8, the table may include the following sequences:

For M = 1: [0]

For M = 2: [1, 0]

For M = 3: [1, 2, 0]

For M = 4: [2, 1, 3, 0]

For M = 5: [2, 3, 1, 4, 0]

For M = 6: [3, 2, 4, 1, 5, 0]

For M = 7: [3, 4, 2, 5, 1, 6, 0]

For M = 8: [4, 3, 5, 2, 6, 1, 7, 0]

It is noted that the specific sequences discussed herein are included for illustrative purposes and are not intended to limit the scope of the present disclosure. In other embodiments, various orderings may be implemented in which one or more internal child nodes take precedence over front/back nodes.
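The exemplary sequences above follow a simple pattern that can be generated in software, for instance to populate a lookup table. The function name is hypothetical; the rule (start at index M/2 using integer division, then alternate outward, stepping forward first when M is odd and backward first when M is even) is taken from the sequences listed above.

```python
def middle_first_order(m):
    """Return the middle-first ordering of child indices 0..m-1."""
    mid = m // 2
    order = [mid]
    for k in range(1, mid + 1):
        # Odd m: mid+k before mid-k. Even m: mid-k before mid+k.
        pair = (mid + k, mid - k) if m % 2 else (mid - k, mid + k)
        for index in pair:
            if 0 <= index < m:       # for even m, mid + mid is out of range
                order.append(index)
    return order


# Lookup table for a maximum branching factor of N = 8, indexed by M.
ORDER_TABLE = {m: middle_first_order(m) for m in range(1, 9)}
```

Precomputing the table mirrors the hardware approach of encoding one sequence per value of M, so that the ordering is available as soon as the number of intersected children is known.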

In implementations using binary trees (N = 2), the traversal circuit may alternate between back-to-front and front-to-back traversal orders when searching child nodes at different levels of the tree (e.g., front-to-back for odd depths in the tree and back-to-front for even depths in the tree, or vice versa).

Exemplary method

Fig. 14 is a flow diagram illustrating an exemplary method for performing initial intersection testing, according to some embodiments. The method shown in fig. 14 may be used in conjunction with any of the computer circuits, systems, devices, elements, or components disclosed herein, among others. In various embodiments, some of the method elements shown may be performed concurrently, in a different order than shown, or may be omitted. Additional method elements may also be performed as desired.

At 1410, in the illustrated embodiment, the graphics processor quantizes the first representation of the primitive to generate a reduced-precision interval representation of the primitive, wherein the interval representation includes interval values that are guaranteed to cover corresponding values specified by the first representation of the primitive. In some embodiments, the quantization of the first representation of the primitive uses a fixed point quantized representation that is rounded toward zero for the lower bound of the interval, with the upper bound of the interval being the lower bound plus one unit of least precision (ULP).
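The lower-bound-plus-one-ULP scheme can be sketched as follows. The function name and parameters are hypothetical, the scale factor is assumed to be a power of two given by its exponent, and the coordinate is assumed to be no smaller than the frame origin so that rounding toward zero is also rounding down. Floating point rounding in the subtraction itself is ignored here.

```python
def quantize_interval(value, origin, scale_exp):
    """Return fixed point (lower, upper) bounds for one coordinate.

    One ULP of the fixed point representation corresponds to 2**scale_exp
    in the original coordinate space.
    """
    offset = (value - origin) / 2.0 ** scale_exp
    if offset < 0:
        raise ValueError("value is assumed to lie at or above the frame origin")
    lower = int(offset)      # truncation toward zero
    upper = lower + 1        # lower bound plus one ULP
    return lower, upper


def dequantize(fixed, origin, scale_exp):
    """Map a fixed point value back to the original coordinate space."""
    return origin + fixed * 2.0 ** scale_exp
```

Using a fixed one-ULP width means that only the lower bound needs to be stored; the upper bound is implied.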

At 1420, in the illustrated embodiment, the graphics processor quantizes the first representation of the ray to generate a reduced precision interval representation of the ray, where the interval representation includes interval values that are guaranteed to cover corresponding values specified by the first representation of the ray. In some embodiments, the reduced-precision interval representation of the ray includes a quantized ray time represented as an interval. In some embodiments, the circuitry generates a reduced-precision interval representation of the primitive based on the first and second positions of the primitive at different points within the motion-blurred time interval, such that the reduced-precision interval representation of the primitive covers all possible positions of the primitive during the interval representing the quantized-ray time.
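For the motion blur case, the following sketch bounds one coordinate of a moving vertex over a ray time interval. Linear motion between the two given positions is an assumption made here for illustration (the source states only that two positions at different points in the motion blur interval are used), the names are hypothetical, and outward rounding to a reduced-precision format is omitted.

```python
def moving_coordinate_bounds(p_start, p_end, t_lo, t_hi):
    """Bound a linearly moving coordinate over the time interval [t_lo, t_hi].

    p_start and p_end are the coordinate at normalized times 0 and 1.
    For linear motion the extremes occur at the ends of the time interval.
    """
    at_lo = p_start + (p_end - p_start) * t_lo
    at_hi = p_start + (p_end - p_start) * t_hi
    return (min(at_lo, at_hi), max(at_lo, at_hi))
```

A narrower ray time interval gives tighter primitive bounds, which in turn makes a deterministic miss more likely.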

At 1430, in the illustrated embodiment, the graphics processor determines an initial intersection result based on the coordinates of the interval representation of the primitive and the coordinates of the interval representation of the ray using interval arithmetic, wherein a miss indicated by the initial intersection result is guaranteed not to be a hit for the first representation of the primitive and the first representation of the ray.

In some embodiments, in response to an initial intersection result that indicates a potential hit, the graphics processor is configured to perform an intersection test using the first representation of the primitive and the first representation of the ray.

In some embodiments, the clipping factor circuit generates an interval representation of the clipping factors based on the ray direction information and the scale information, and generates clipped vertex intervals based on the quantized representation of the primitive and the interval representation of the clipping factors. In some embodiments, the initial intersection result is based on the clipped vertex intervals. In some embodiments, the clipping factor circuit is configured to use: a first precision to represent a first coordinate of the ray origin, in the coordinate direction that provides a threshold (e.g., greatest) contribution to the ray direction vector (e.g., the axis renamed to the z direction); and a second, higher precision to represent the coordinates of the ray origin in the other directions.

In some embodiments, the first representation of primitives is a representation of a triangle pair comprising at most four vertices of two triangle primitives of the triangle pair, wherein the graphics processor includes circuitry configured to sequentially process triangles in a given triangle pair.

Fig. 15 is a flow diagram illustrating an exemplary method for performing initial intersection testing, according to some embodiments. The method shown in fig. 15 may be used in conjunction with any of the computer circuits, systems, devices, elements, or components disclosed herein, among others. In various embodiments, some of the method elements shown may be performed concurrently, in a different order than shown, or may be omitted. Additional method elements may also be performed as desired.

At 1510, in the illustrated embodiment, the graphics processor performs intersection testing, where the intersection testing operates on reduced precision representations of rays generated by quantizing the initial representations of rays and reduced precision representations of primitives generated by quantizing the initial representations of primitives. In the illustrated embodiment, the intersection test generates a first result for the first ray and the first primitive, wherein the first result indicates that the first ray intersected the first primitive according to their initial representation. In some implementations, the intersection test can also generate a second result for the second ray and the first primitive, where the second result indicates that it cannot be determined whether the second ray intersected the first primitive. The graphics processor may perform an intersection test on the second ray using the second ray and the initial representation of the first primitive. Intersection tests may be performed based on a traversal of an acceleration data structure that includes hierarchically arranged bounding volumes for at least a portion of a graphical scene.

At 1520, in the illustrated embodiment, the graphics processor records the intersection of the first ray with the first primitive based on the first result, without performing an intersection test that uses the initial representations of the first ray and the first primitive. In the illustrated embodiment, the intersection is recorded based on: the first result; a determination that the first primitive is opaque; and a determination that there is at least one bounding volume in the acceleration data structure that encloses the entire first primitive and for which the entire enclosed portion of the first ray is active.

In some embodiments, the graphics processor is configured to record the intersection of the first ray based on the query for the first ray being an any-hit query (and may not record deterministic intersection results based on reduced precision testing for other types of queries).

In some embodiments, the test circuit is further configured to output a result for the first ray and the first primitive that indicates either that, according to their initial representations, the first ray misses the first primitive, or that it cannot be determined whether the first ray misses the first primitive. For example, the processor may include the comparators and logic circuits of both fig. 7 and fig. 12. For the first ray and the first primitive in the example discussed above, this output would indicate that it cannot be determined whether the first ray misses the first primitive, because the other output indicates a deterministic hit.

In some embodiments, the processor uses a traversal order from the middle on at least some types of rays. In some embodiments, the processor is configured to perform the intersection test based on a traversal (e.g., by a traversal circuit) of the acceleration data structure that includes nodes corresponding to the hierarchically arranged bounding volumes. In particular, the processor may perform a depth-first search of the acceleration data structure, and for a set of child nodes of a first node in the acceleration data structure, select a next node for the depth-first search according to an ordering of intersecting bounding regions of the set of child nodes, where the ordering begins with a bounding volume that is closer to a midpoint of the ray being tested than the one or more leading bounding volumes and the one or more trailing bounding volumes.

In some embodiments, prior to determining the ordering, the processor determines the number of nodes in the set of child nodes, wherein the set of child nodes corresponds to nodes that each intersect the ray being tested. For example, once the number of intersected child nodes is determined, the processor may access a lookup table to determine the ordering. In some implementations, the ray being tested is an any-hit ray, and traversal for the ray being tested ends in response to detecting an intersection. In some embodiments, subsequent nodes in the ordering alternate between forward and backward nodes relative to the starting node. As used herein, nodes closer to the "front" of a ray are also closer to the end of the ray, and nodes closer to the "back" of the ray are also closer to the origin of the ray. The exemplary ordering discussed above with reference to fig. 13 is an example of alternating between forward and backward nodes starting from a middle node.

Example apparatus

Referring now to fig. 16, shown is a block diagram illustrating an exemplary embodiment of a device 1600. In some embodiments, elements of device 1600 may be included within a system-on-chip. In some embodiments, device 1600 may be included in a mobile device that may be battery powered. Thus, power consumption of device 1600 may be an important design consideration. In the illustrated embodiment, device 1600 includes a fabric 1610, a compute complex 1620, an input/output (I/O) bridge 1650, a cache/memory controller 1645, a graphics unit 1675, and a display unit 1665. In some embodiments, device 1600 may include other components (not shown) in addition to or in place of those shown, such as video processor encoders and decoders, image processing or recognition elements, computer vision elements, and so forth.

The fabric 1610 may include various interconnects, buses, MUXs, controllers, etc., and may be configured to facilitate communication between the various elements of the device 1600. In some embodiments, portions of the fabric 1610 may be configured to implement a variety of different communication protocols. In other embodiments, the fabric 1610 may implement a single communication protocol, and the elements coupled to the fabric 1610 may internally convert from the single communication protocol to other communication protocols.

In the illustrated embodiment, compute complex 1620 includes a bus interface unit (BIU) 1625, a cache 1630, and cores 1635 and 1640. In various embodiments, compute complex 1620 may include various numbers of processors, processor cores, and caches. For example, compute complex 1620 may include 1, 2, or 4 processor cores, or any other suitable number. In one embodiment, cache 1630 is a set-associative L2 cache. In some embodiments, cores 1635 and 1640 may include internal instruction and data caches. In some embodiments, a coherency unit (not shown) in the fabric 1610, cache 1630, or elsewhere in the device 1600 may be configured to maintain coherency between the various caches of the device 1600. BIU 1625 may be configured to manage communications between compute complex 1620 and other elements of device 1600. Processor cores such as cores 1635 and 1640 may be configured to execute instructions of a particular instruction set architecture (ISA), which may include operating system instructions and user application instructions.

The cache/memory controller 1645 may be configured to manage data transfer between the fabric 1610 and one or more caches and memory. For example, cache/memory controller 1645 may be coupled to an L3 cache, which in turn may be coupled to system memory. In other embodiments, the cache/memory controller 1645 may be coupled to memory directly. In some embodiments, the cache/memory controller 1645 may include one or more internal caches.

As used herein, the term "coupled to" may indicate one or more connections between elements, and a coupling may include intermediate elements. For example, in FIG. 16, graphics unit 1675 may be described as being "coupled" to memory through fabric 1610 and cache/memory controller 1645. In contrast, in the illustrated embodiment of fig. 16, graphics unit 1675 is "directly coupled" to fabric 1610 because there are no intervening elements.

Graphics unit 1675 may include one or more processors, e.g., one or more graphics processing units (GPUs). For example, graphics unit 1675 may receive graphics-oriented instructions, such as Metal instructions. Graphics unit 1675 may execute special-purpose GPU instructions or perform other operations based on the received graphics-oriented instructions. Graphics unit 1675 may generally be configured to process large blocks of data in parallel, and may build images in a frame buffer for output to a display, which may be included in the device or may be a separate device. Graphics unit 1675 may include transform, lighting, triangle, and rendering engines in one or more graphics processing pipelines. Graphics unit 1675 may output pixel information for displaying images. In various embodiments, graphics unit 1675 may include programmable shader circuitry, which may include highly parallel execution cores configured to execute graphics programs, which may include pixel tasks, vertex tasks, and compute tasks (which may or may not be graphics-related).

In some embodiments, graphics unit 1675 includes circuitry 220 that can reduce power consumption, improve performance, or both, relative to a conventional GPU.

Display unit 1665 may be configured to read data from the frame buffer and provide a stream of pixel values for display. In some embodiments, display unit 1665 may be configured as a display pipeline. Additionally, display unit 1665 may be configured to blend multiple frames to produce an output frame. Display unit 1665 may also include one or more interfaces (e.g., embedded DisplayPort (eDP)) for coupling to a user display (e.g., a touchscreen or an external display).

I/O bridge 1650 may include various elements configured to implement, for example, Universal Serial Bus (USB) communications, security, audio, and low-power always-on functions. I/O bridge 1650 may also include interfaces such as pulse-width modulation (PWM), general-purpose input/output (GPIO), Serial Peripheral Interface (SPI), and Inter-Integrated Circuit (I2C). Various types of peripherals and devices can be coupled to device 1600 via I/O bridge 1650.

In some embodiments, device 1600 includes network interface circuitry (not explicitly shown) that can be connected to fabric 1610 or I/O bridge 1650. The network interface circuitry may be configured to communicate via various networks, which may be wired networks, wireless networks, or both. For example, the network interface circuitry may be configured to communicate via a wired local area network, a wireless local area network (e.g., via WiFi), or a wide area network (e.g., the Internet or a virtual private network). In some embodiments, the network interface circuitry is configured to communicate via one or more cellular networks using one or more radio access technologies. In some embodiments, the network interface circuitry is configured to communicate using device-to-device communications (e.g., Bluetooth or WiFi Direct), among others. In various embodiments, the network interface circuitry may connect device 1600 to various types of other devices and networks.

Exemplary applications

Turning now to FIG. 17, various types of systems are shown that may include any of the circuits, devices, or systems described above. System or device 1700, which may incorporate or otherwise utilize one or more of the techniques described herein, may be used in a wide variety of fields. For example, system or device 1700 may be used as part of the hardware of a system such as a desktop computer 1710, a laptop computer 1720, a tablet 1730, a cellular or mobile phone 1740, or a television 1750 (or a set-top box coupled to a television).

Similarly, the disclosed elements may be used in a wearable device 1760, such as a smart watch or health monitoring device. In many embodiments, the smart watch may implement a variety of different functions, e.g., access to email, cellular services, calendars, and health monitoring. A wearable device may also be designed to perform only health monitoring functions, such as monitoring vital signs of the user, performing epidemiological functions such as contact tracing, providing communications to emergency medical services, and the like. Other types of devices are also contemplated, including devices worn on the neck, devices implantable in the human body, and glasses or helmets designed to provide a computer-generated reality experience, such as those based on augmented reality and/or virtual reality.

System or device 1700 may also be used in various other environments. For example, the system or device 1700 may be used in the context of a server computer system (such as a dedicated server) or on shared hardware that implements cloud-based services 1770. Still further, the system or device 1700 may be implemented in a wide range of dedicated everyday devices, including devices 1780 common in the home, such as refrigerators, thermostats, security cameras, and so forth. The interconnection of such devices is commonly referred to as the "internet of things" (IoT). The elements may also be implemented in various modes of transport. For example, the system or device 1700 may be used in various types of control systems, guidance systems, entertainment systems, etc. of a vehicle 1790.

The applications illustrated in FIG. 17 are merely exemplary and are not intended to limit potential future applications of the disclosed systems or devices. Other exemplary applications include, but are not limited to: portable gaming devices, music players, data storage devices, unmanned aerial vehicles, and the like.

Exemplary computer-readable medium

The present disclosure has described various exemplary circuits in detail above. It is intended that the present disclosure cover not only embodiments that include such circuitry, but also computer-readable storage media that include design information specifying such circuitry. Accordingly, the present disclosure is intended to support claims that cover not only an apparatus that includes the disclosed circuitry, but also a storage medium that specifies the circuitry in a format recognized by a manufacturing system configured to produce hardware (e.g., an integrated circuit) that includes the disclosed circuitry. Claims to such a storage medium are intended to cover, for example, an entity that produces a circuit design but does not itself fabricate the design.

Fig. 18 is a block diagram illustrating an exemplary non-transitory computer-readable storage medium storing circuit design information, according to some embodiments. In the illustrated embodiment, the semiconductor manufacturing system 1820 is configured to process design information 1815 stored on a non-transitory computer-readable medium 1810 and manufacture an integrated circuit 1830 based on the design information 1815.

The non-transitory computer-readable storage medium 1810 may include any of a variety of suitable types of memory devices or storage devices. Non-transitory computer-readable storage medium 1810 may be an installation medium, such as a CD-ROM, floppy disk, or tape device; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., a hard disk drive), or optical storage; registers, or other similar types of memory elements, etc. The non-transitory computer-readable storage medium 1810 may include other types of non-transitory memory or combinations thereof. The non-transitory computer-readable storage medium 1810 may include two or more memory media that may reside in different locations, such as in different computer systems connected by a network.

Design information 1815 may be specified using any of a variety of suitable computer languages, including hardware description languages such as, but not limited to: VHDL, Verilog, SystemC, SystemVerilog, RHDL, M, MyHDL, and the like. The design information 1815 may be used by the semiconductor manufacturing system 1820 to fabricate at least a portion of the integrated circuit 1830. The format of the design information 1815 may be recognized by at least one semiconductor manufacturing system 1820. In some embodiments, the design information 1815 may also include one or more cell libraries that specify the synthesis, layout, or both of the integrated circuit 1830. In some embodiments, the design information is specified in whole or in part in the form of a netlist that specifies cell library elements and their connectivity. Design information 1815, taken alone, may or may not include enough information to manufacture the corresponding integrated circuit. For example, design information 1815 may specify the circuit elements to be manufactured, but not their physical layout. In this case, the design information 1815 may need to be combined with layout information to actually manufacture the specified circuit.

In various embodiments, the integrated circuit 1830 may include one or more custom macrocells, such as memory, analog or mixed-signal circuits, and the like. In this case, the design information 1815 may include information related to the included macrocells. Such information may include, but is not limited to, a schematic capture database, mask design data, behavioral models, and device-level or transistor-level netlists. As used herein, mask design data may be formatted in accordance with the Graphic Data System (GDSII) or any other suitable format.

The semiconductor manufacturing system 1820 may include any of a variety of suitable elements configured to fabricate integrated circuits. This may include, for example, elements used to deposit semiconductor material (e.g., on a wafer that may include a mask), remove material, change the shape of the deposited material, modify the material (e.g., by doping the material or using ultraviolet processing to modify the dielectric constant), and so forth. The semiconductor manufacturing system 1820 may also be configured to perform various tests of the manufactured circuits for proper operation.

In various embodiments, the integrated circuit 1830 is configured to operate according to a circuit design specified by the design information 1815, which may include performing any of the functions described herein. For example, the integrated circuit 1830 may include any of the various elements shown in FIGS. 1B, 2, 4-9, 11, 12, and 16. Additionally, the integrated circuit 1830 may be configured to perform various functions described herein in connection with other components. Further, the functionality described herein may be performed by a plurality of connected integrated circuits.

As used herein, a phrase of the form "design information that specifies a design of a circuit configured to …" does not imply that the circuit in question must be fabricated in order for the element to be met. Rather, the phrase indicates that the design information describes a circuit that, upon being fabricated, will be configured to perform the indicated actions or will include the specified components.

***

The present disclosure includes references to "an embodiment" or groups of "embodiments" (e.g., "some embodiments" or "various embodiments"). Embodiments are different implementations or instances of the disclosed concepts. References to "an embodiment," "one embodiment," "a particular embodiment," and the like do not necessarily refer to the same embodiment. A large number of possible embodiments are contemplated, including those specifically disclosed, as well as modifications or alternatives that fall within the spirit or scope of the present disclosure.

This disclosure may discuss potential advantages that may arise from the disclosed embodiments. Not all implementations of these embodiments will necessarily exhibit any or all of the potential advantages. Whether an advantage is realized for a particular implementation depends on many factors, some of which are outside the scope of this disclosure. Indeed, there are a number of reasons why an implementation that falls within the scope of the claims might not exhibit some or all of any disclosed advantages. For example, a particular implementation might include other circuitry outside the scope of the disclosure that, in conjunction with one of the disclosed embodiments, negates or diminishes one or more of the disclosed advantages. Furthermore, suboptimal design execution of a particular implementation (e.g., implementation techniques or tools) could also negate or diminish the disclosed advantages. Even assuming a skilled implementation, realization of the advantages may still depend on other factors, such as the environmental circumstances in which the implementation is deployed. For example, inputs supplied to a particular implementation may prevent one or more problems addressed in this disclosure from arising on a particular occasion, with the result that the benefit of its solution may not be realized. Given the existence of possible factors external to this disclosure, any potential advantages described herein are not to be construed as claim limitations that must be met to demonstrate infringement. Rather, the identification of such potential advantages is intended to illustrate the type(s) of improvement available to designers having the benefit of this disclosure. That such advantages are described permissively (e.g., stating that a particular advantage "may arise") is not intended to convey doubt about whether such advantages can in fact be realized, but rather to recognize the technical reality that realization of such advantages often depends on additional factors.

Unless stated otherwise, embodiments are non-limiting. That is, the disclosed embodiments are not intended to limit the scope of claims that are drafted based on this disclosure, even where only a single example is described with respect to a particular feature. The disclosed embodiments are intended to be illustrative rather than restrictive, absent any statements in the disclosure to the contrary. The application is thus intended to permit claims covering the disclosed embodiments, as well as such alternatives, modifications, and equivalents as would be apparent to a person skilled in the art having the benefit of this disclosure.

For example, features in the present application may be combined in any suitable manner. Accordingly, new claims may be formulated to any such combination of features during the prosecution of the present patent application (or of a patent application claiming priority thereto). In particular, with reference to the appended claims, features of the dependent claims may, where appropriate, be combined with features of other dependent claims, including claims dependent on other independent claims. Similarly, features from respective independent claims may be combined where appropriate.

Thus, although the appended dependent claims may be written such that each dependent claim is dependent on a single other claim, additional dependencies are also envisaged. Any combination of dependent features consistent with the present disclosure is contemplated and may be claimed in this or another patent application. In short, the combination is not limited to those specifically recited in the appended claims.

It is also contemplated that claims drafted in one format or legal type (e.g., device) are intended to support corresponding claims in another format or legal type (e.g., method), where appropriate.

***

Because the present disclosure is a legal document, various terms and phrases may be subject to administrative and judicial interpretation. Notice is hereby given that the following paragraphs, as well as the definitions provided throughout this disclosure, are to be used in determining how to interpret claims that are drafted based on this disclosure.

References to a singular form of an item (i.e., a noun or noun phrase preceded by "a," "an," or "the") are intended to mean "one or more" unless the context clearly dictates otherwise. Thus, reference to "an item" in a claim does not preclude additional instances of that item, unless the context requires otherwise. A "plurality" of items refers to a set of two or more of the items.

The word "may" is used herein in a permissive sense (i.e., having the potential to, being able to), rather than the mandatory sense (i.e., must).

The terms "comprise" and "include," and forms thereof, are open-ended and mean "including, but not limited to."

When the term "or" is used in this disclosure with respect to a list of options, it will generally be understood to be used in the inclusive sense unless the context provides otherwise. Thus, a recitation of "x or y" is equivalent to "x or y, or both," and therefore covers 1) x but not y, 2) y but not x, and 3) both x and y. On the other hand, a phrase such as "either x or y, but not both" makes clear that "or" is being used in the exclusive sense.

A recitation of "w, x, y, or z, or any combination thereof" or "at least one of w, x, y, and z" is intended to cover all possibilities involving a single element up to the total number of elements in the set. For example, given the set [w, x, y, z], these phrases cover any single element of the set (e.g., w but not x, y, or z), any two elements (e.g., w and x but not y or z), any three elements (e.g., w, x, and y but not z), and all four elements. The phrase "at least one of w, x, y, and z" thus refers to at least one element of the set [w, x, y, z], thereby covering all possible combinations in this list of elements. The phrase is not to be interpreted as requiring at least one instance of w, at least one instance of x, at least one instance of y, and at least one instance of z.

In this disclosure, various "labels" may precede nouns or noun phrases. Unless the context provides otherwise, different labels used for a feature (e.g., "first circuit," "second circuit," "particular circuit," "given circuit," etc.) refer to different instances of the feature. In addition, unless stated otherwise, the labels "first," "second," and "third" when applied to a feature do not imply any type of ordering (e.g., spatial, temporal, logical, etc.).

The phrase "based on" is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect the determination. That is, a determination may be based solely on the specified factors or on the specified factors as well as other, unspecified factors. Consider the phrase "determine A based on B." This phrase specifies that B is a factor that is used to determine A or that affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase "based on" is synonymous with the phrase "based at least in part on."

The phrases "in response to" and "responsive to" describe one or more factors that trigger an effect. These phrases do not foreclose the possibility that additional factors may affect or otherwise trigger the effect, either jointly with the specified factors or independently of them. That is, an effect may be solely in response to those factors, or may be in response to the specified factors as well as other, unspecified factors. Consider the phrase "perform A in response to B." This phrase specifies that B is a factor that triggers the performance of A, or that triggers a particular result for A. This phrase does not foreclose that performing A may also be in response to some other factor, such as C. This phrase also does not foreclose that performing A may be jointly in response to B and C. This phrase is also intended to cover an embodiment in which A is performed solely in response to B. As used herein, the phrase "responsive to" is synonymous with the phrase "responsive at least in part to." Similarly, the phrase "in response to" is synonymous with the phrase "at least in part in response to."

***

Within this disclosure, different entities (which may variously be referred to as "units," "circuits," other components, etc.) may be described or claimed as "configured to" perform one or more tasks or operations. This formulation, [entity] configured to [perform one or more tasks], is used herein to refer to structure (i.e., something physical). More specifically, this formulation is used to indicate that the structure is arranged to perform the one or more tasks during operation. A structure can be said to be "configured to" perform some task even if the structure is not currently being operated. Thus, an entity described or recited as being "configured to" perform some task refers to something physical, such as a device, a circuit, or a system having a processor unit and a memory storing program instructions executable to implement the task. This phrase is not used herein to refer to something intangible.

In some cases, various units/circuits/components may be described herein as performing a set of tasks or operations. It should be understood that these entities are "configured to" perform those tasks/operations, even if not specifically noted.

The term "configured to" is not intended to mean "configurable to". For example, an unprogrammed FPGA is not considered to be "configured to" perform a particular function. However, the unprogrammed FPGA may be "configurable" to perform this function. After being properly programmed, the FPGA may then be considered "configured to" perform a particular function.

For purposes of United States patent applications based on this disclosure, reciting in a claim that a structure is "configured to" perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) for that claim element. Should the applicant wish to invoke Section 112(f) during prosecution of a United States patent application based on this disclosure, it will recite claim elements using the "means for [performing a function]" construct.

Different "circuits" may be described in this disclosure. These circuits or "circuitry" constitute hardware that includes various types of circuit elements, such as combinational logic, clocked storage devices (e.g., flip-flops, registers, latches, etc.), finite state machines, memory (e.g., random access memory, embedded dynamic random access memory), programmable logic arrays, and so on. Circuitry may be custom designed or taken from standard libraries. In various implementations, circuitry can, as appropriate, include digital components, analog components, or a combination of both. Certain types of circuits may be commonly referred to as "units" (e.g., a decode unit, an arithmetic logic unit (ALU), a functional unit, a memory management unit (MMU), etc.). Such units also refer to circuits or circuitry.

Accordingly, the disclosed circuits/units/components and other elements shown in the figures and described herein include hardware elements, such as those described in the preceding paragraphs. In many cases, the internal arrangement of hardware elements within a particular circuit may be specified by describing the functionality of that circuit. For example, a particular "decode unit" may be described as performing the function of "processing the opcode of an instruction and routing the instruction to one or more of a plurality of functional units," meaning that the decode unit is "configured to" perform that function. To those skilled in the computer art, this functional specification is sufficient to imply a set of possible structures for the circuit.

In various embodiments, as described in the preceding paragraphs, circuits, units, and other elements may be defined by the functions or operations that they are configured to perform. The arrangement of such circuits/units/components with respect to each other and the manner in which they interact form a microarchitectural definition of the hardware that is ultimately manufactured in an integrated circuit or programmed into an FPGA to form a physical implementation of the microarchitectural definition. Thus, the microarchitectural definition is recognized by those skilled in the art as structure from which many physical implementations may be derived, all of which fall within the broader structure described by the microarchitectural definition. That is, a skilled artisan presented with a microarchitectural definition supplied in accordance with this disclosure may, without undue experimentation and with the application of ordinary skill, implement the structure by coding the description of the circuits/units/components in a hardware description language (HDL) such as Verilog or VHDL. The HDL description is often expressed in a fashion that may appear to be functional. To those skilled in the art, however, this HDL description is the means used to transform the structure of a circuit, unit, or component to the next level of implementation detail. Such an HDL description may take the form of behavioral code (which is typically not synthesizable), register-transfer language (RTL) code (which, in contrast to behavioral code, is typically synthesizable), or structural code (e.g., a netlist specifying logic gates and their connectivity). The HDL description may subsequently be synthesized against a library of cells designed for a given integrated circuit fabrication technology, and may be modified for timing, power, and other reasons to produce a final design database that is transmitted to a foundry to generate masks and ultimately produce the integrated circuit. Some hardware circuits or portions thereof may also be custom designed in a schematic editor and captured into the integrated circuit design along with synthesized circuitry. The integrated circuits may include transistors and other circuit elements (e.g., passive elements such as capacitors, resistors, inductors, etc.), as well as interconnect between the transistors and circuit elements. Some embodiments may implement multiple integrated circuits coupled together to implement the hardware circuits, and/or discrete elements may be used in some embodiments. Alternatively, the HDL design may be synthesized to a programmable logic array such as a field programmable gate array (FPGA) and implemented in the FPGA. This decoupling between the design of a group of circuits and the subsequent low-level implementation of those circuits commonly results in a situation in which the circuit or logic designer never specifies a particular set of structures for the low-level implementation beyond a description of what the circuit is configured to do, because that process is performed at a different stage of the circuit implementation process.

The fact that many different low-level combinations of circuit elements can be used to achieve the same specifications for a circuit results in a large number of equivalent structures for the circuit. As noted, these low-level circuit implementations may vary according to variations in manufacturing techniques, the foundry selected to manufacture the integrated circuit, the cell libraries provided for a particular project, and so forth. In many cases, the choice made to produce these different implementations through different design tools or methods may be arbitrary.

Furthermore, a single implementation of a particular functional specification of a circuit commonly includes a large number of devices (e.g., millions of transistors). The sheer volume of this information makes it impractical to provide a full recitation of the low-level structure used to implement a single embodiment, let alone the vast array of equivalent possible implementations. For this reason, the present disclosure describes the structure of circuits using the functional shorthand commonly employed in the industry.

Claims

1. An apparatus, comprising:

a graphics processor configured to determine whether a ray intersects a primitive in a graphics scene, wherein the graphics processor comprises:

ray intersection circuitry configured to perform intersection tests, the intersection tests comprising:

quantizing a first representation of the primitive to generate a reduced-precision interval representation of the primitive, wherein the interval representation includes interval values that are guaranteed to cover corresponding values specified by the first representation of the primitive;

quantizing the first representation of the ray to generate a reduced-precision interval representation of the ray, wherein the interval representation includes interval values that are guaranteed to cover corresponding values specified by the first representation of the ray; and

determining an initial intersection result based on coordinates of the interval representation of the primitive and coordinates of the interval representation of the ray using an interval algorithm, wherein a miss indicated by the initial intersection result is guaranteed not to be a hit of the first representation of the primitive and the first representation of the ray.
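The conservative property recited above (a reported miss is never a true hit) can be illustrated with a minimal sketch. The sketch assumes a simplified two-dimensional point-in-triangle test standing in for a ray-space intersection test, counter-clockwise vertex order, and integer fixed-point coordinates so that the interval arithmetic is exact. The names `Interval`, `edge`, and `initial_result` are hypothetical and do not come from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of integer fixed-point values (assumed units)."""
    lo: int
    hi: int

    def __sub__(self, other):
        # Smallest possible difference is lo - other.hi; largest is hi - other.lo.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The product range is bounded by the four endpoint products.
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))


def edge(a, b, p):
    """Interval bounding the 2D edge function (b - a) x (p - a)."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def initial_result(tri, p):
    """Report 'miss' only when some edge function is certainly negative.

    Assumes counter-clockwise vertex order. Anything else is a potential
    hit that a full-precision test would still need to confirm.
    """
    a, b, c = tri
    for u, v in ((a, b), (b, c), (c, a)):
        if edge(u, v, p).hi < 0:
            return "miss"
    return "potential hit"


tri = ((Interval(0, 1), Interval(0, 1)),
       (Interval(8, 9), Interval(0, 1)),
       (Interval(0, 1), Interval(8, 9)))
inside = (Interval(2, 3), Interval(2, 3))
outside = (Interval(20, 21), Interval(20, 21))
```

Because every interval covers the value it was derived from, an edge function whose upper bound is negative is negative for every covered input, which is why a reported miss can be trusted while a potential hit cannot.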

2. The apparatus of claim 1, further comprising a shear factor circuit configured to:

generate an interval representation of a shear factor based on ray direction information and scale information; and

generate sheared vertex intervals based on the quantized representation of the primitive and the interval representation of the shear factor;

wherein the initial intersection result is based on the sheared vertex intervals.
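As a rough illustration of the shear step, the sketch below uses one common formulation of ray-space shearing, in which a vertex coordinate is transformed as v_axis - s * v_major with s = d_axis / d_major. This formulation, the treatment of the scale information as a number of fractional bits, and the names `shear_interval`, `imul`, and `sheared_vertex_interval` are assumptions for illustration, not details confirmed by this disclosure.

```python
import math
from fractions import Fraction


def shear_interval(d_axis, d_major, frac_bits):
    """Fixed-point interval (lo, hi), in units of 2**-frac_bits, covering d_axis / d_major.

    Assumes d_major is nonzero. Fractions keep the ratio exact.
    """
    exact = Fraction(d_axis) / Fraction(d_major) * (1 << frac_bits)
    lo = math.floor(exact)  # floor keeps the lower bound valid for negative ratios
    return lo, lo + 1


def imul(a, b):
    """Interval product of two (lo, hi) integer pairs."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return min(p), max(p)


def sheared_vertex_interval(v_axis, v_major, shear, frac_bits):
    """Interval covering v_axis - s * v_major, in units of 2**-frac_bits.

    v_axis and v_major are integer (lo, hi) vertex intervals in whole units.
    """
    scale = 1 << frac_bits
    prod = imul(shear, v_major)
    # Subtracting an interval swaps its bounds.
    return v_axis[0] * scale - prod[1], v_axis[1] * scale - prod[0]


shear = shear_interval(1.0, 4.0, 4)
sheared = sheared_vertex_interval((10, 11), (8, 9), shear, 4)
```

In this example the exact sheared value for a vertex at (10.5, 8.5) with s = 0.25 is 8.375, or 134 in units of 1/16, which falls inside the computed interval.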

3. The apparatus of claim 2, wherein the shear factor circuit is configured to:

represent, using a first precision, a first coordinate of the origin of the ray in a coordinate direction that provides a threshold contribution to a ray direction vector; and

represent the coordinates of the origin of the ray in other directions using a second, greater precision.

4. The apparatus of claim 1, wherein the quantization of the first representation of the primitive uses a fixed-point quantized representation that is rounded toward zero for a lower bound of the interval and uses the lower bound plus one unit of least precision (ULP) for an upper bound of the interval.
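The quantization recited in claim 4 can be sketched as follows. The sketch assumes non-negative inputs (for example, coordinates expressed as offsets from a bounding-box corner), because rounding toward zero yields a valid lower bound only for non-negative values. The function names and the `frac_bits` parameter are hypothetical.

```python
import math


def quantize_to_interval(value, frac_bits):
    """Return integer bounds (lo, hi) in units of 2**-frac_bits that cover value.

    Assumption: value is non-negative, so truncation never exceeds it.
    """
    if value < 0:
        raise ValueError("this sketch assumes non-negative inputs")
    scale = 1 << frac_bits
    lo = math.trunc(value * scale)  # round toward zero gives the lower bound
    hi = lo + 1                     # lower bound plus one ULP gives the upper bound
    return lo, hi


def interval_to_float(bounds, frac_bits):
    """Convert fixed-point bounds back to real-valued bounds."""
    scale = 1 << frac_bits
    return bounds[0] / scale, bounds[1] / scale


bounds = quantize_to_interval(0.3, 4)
lo_f, hi_f = interval_to_float(bounds, 4)
```

With 4 fractional bits, 0.3 maps to the interval [4, 5] in units of 1/16, that is, [0.25, 0.3125]. The interval is always exactly one ULP wide, so only the lower bound needs to be stored.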

5. The apparatus of claim 1, wherein the first representation of the primitive is a representation of a triangle pair that includes at most four vertices for the two triangle primitives of the triangle pair, and wherein the graphics processor comprises circuitry configured to sequentially process the triangles in a given triangle pair.

6. The apparatus of claim 1, wherein the reduced-precision interval representation of the ray comprises a quantized ray time represented as an interval.

7. The apparatus of claim 6, further comprising:

circuitry configured to generate the reduced-precision interval representation of the primitive based on first and second positions of the primitive at different points within a motion blur time interval such that the reduced-precision interval representation of the primitive covers all possible positions of the primitive during the interval representing the quantized ray time.
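One way to picture the motion-blur interval of claim 7 is sketched below, assuming the primitive moves linearly between its first and second positions. Under that assumption each coordinate is monotonic in time, so the positions at the two ends of the quantized ray time interval bound every position in between. The names `lerp` and `motion_interval` are hypothetical, and a hardware implementation would additionally round the bounds outward.

```python
def lerp(a, b, t):
    """Linear interpolation between a (at t = 0) and b (at t = 1)."""
    return a + (b - a) * t


def motion_interval(p0, p1, t_lo, t_hi):
    """Per-coordinate (lo, hi) bounds covering linear motion over [t_lo, t_hi].

    Assumption: motion between p0 and p1 is linear, so the extremes occur
    at the ends of the time interval.
    """
    bounds = []
    for a, b in zip(p0, p1):
        x0, x1 = lerp(a, b, t_lo), lerp(a, b, t_hi)
        bounds.append((min(x0, x1), max(x0, x1)))
    return bounds


vertex_bounds = motion_interval((0.0, 4.0, 1.0), (8.0, 0.0, 1.0), 0.25, 0.5)
```

A wider ray time interval produces wider vertex intervals, which keeps the test conservative at the cost of more potential hits that need a full-precision test.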

8. The apparatus of claim 1, wherein, in response to an initial intersection result that indicates a potential hit, the graphics processor is configured to perform an intersection test using the first representation of the primitive and the first representation of the ray.

9. The apparatus of any of claims 1-8, wherein the apparatus is a computing device, the computing device further comprising:

a central processing unit;

a display; and

a network interface circuit.

10. A method, comprising:

quantizing, by a graphics processor, a first representation of a primitive to generate a reduced-precision interval representation of the primitive, wherein the interval representation comprises interval values that are guaranteed to cover corresponding values specified by the first representation of the primitive;

quantizing, by the graphics processor, a first representation of a ray to generate a reduced-precision interval representation of the ray, wherein the interval representation comprises interval values that are guaranteed to cover corresponding values specified by the first representation of the ray; and

determining, by the graphics processor, an initial intersection result based on coordinates of the interval representation of the primitive and coordinates of the interval representation of the ray using an interval algorithm, wherein a miss indicated by the initial intersection result is guaranteed not to be a hit of the first representation of the primitive and the first representation of the ray.

11. The method of claim 10, further comprising:

generating, by the graphics processor, an interval representation of a shear factor based on ray direction information and scale information; and

generating, by the graphics processor, sheared vertex intervals based on the quantized representation of the primitive and the interval representation of the shear factor;

wherein the initial intersection result is based on the sheared vertex intervals.
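
As a non-limiting illustration of claim 11, the sketch below forms an interval shear factor and applies it to one vertex coordinate. Several details are assumptions rather than content of the claims: the ray direction components are integers, the major-axis component is positive, the "scale information" is modeled as a number of fractional bits `frac_bits`, and intervals are plain `(lo, hi)` tuples. All function names are hypothetical.

```python
def shear_factor_interval(d_axis, d_major, frac_bits):
    """Interval covering d_axis / d_major in units of 2**-frac_bits."""
    if d_major <= 0:
        raise ValueError("sketch assumes a positive major-axis component")
    lo = (d_axis << frac_bits) // d_major  # floor division rounds down
    return (lo, lo + 1)                    # exact ratio lies in [lo, lo + 1)


def interval_mul(a, b):
    """Product of two intervals, bounded by the four corner products."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(products), max(products))


def interval_sub(a, b):
    """Difference a - b of two intervals."""
    return (a[0] - b[1], a[1] - b[0])


def shear_vertex_axis(v_axis, v_major, shear, frac_bits):
    """Interval for v_axis - shear * v_major, in units of 2**-frac_bits."""
    scaled = (v_axis[0] << frac_bits, v_axis[1] << frac_bits)
    return interval_sub(scaled, interval_mul(shear, v_major))
```

For example, with direction ratio 1/3 and four fractional bits the shear interval is `(5, 6)`, which covers the exact value 16/3. The resulting sheared interval is wider than either input, which is the cost of keeping the result conservative.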

12. The method of any of claims 10-11, wherein quantizing the first representation of the primitive uses a fixed-point quantized representation rounded toward zero for a lower bound of the interval and uses the lower bound plus one unit of least precision (ULP) for an upper bound of the interval.
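
As a non-limiting illustration of the quantization in claim 12, the sketch below truncates a value to a fixed-point grid and adds one ULP for the upper bound. It assumes non-negative inputs (for example, coordinates expressed relative to a minimum corner), since truncation toward zero only yields a valid lower bound in that case. The function name and the `frac_bits` parameter are hypothetical.

```python
import math


def quantize_to_interval(value, frac_bits):
    """Return (lo, hi) on a grid of step 2**-frac_bits with lo <= value < hi."""
    if value < 0:
        raise ValueError("sketch assumes non-negative inputs")
    scale = 1 << frac_bits
    lo = math.trunc(value * scale)  # round toward zero: the lower bound
    hi = lo + 1                     # one ULP above the lower bound
    return lo / scale, hi / scale
```

With four fractional bits, 0.3 quantizes to the interval from 0.25 to 0.3125, which covers the original value.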

13. A non-transitory computer-readable storage medium having design information stored thereon, the design information specifying a design of at least a portion of a hardware integrated circuit in a format recognized by a semiconductor manufacturing system configured to use the design information to produce the circuit from the design, wherein the design information specifies that the circuit includes:

quantizing the first representation of the ray to generate a reduced-precision interval representation of the ray, wherein the interval representation includes interval values that are guaranteed to cover corresponding values specified by the first representation of the ray; and

14. The non-transitory computer-readable storage medium of claim 13, wherein the design information further specifies that the circuit comprises:

a shear factor circuit configured to:

wherein the initial intersection result is based on the sheared vertex intervals.

15. The non-transitory computer-readable storage medium of claim 14, wherein the shear factor circuit is configured to:

representing a first coordinate of an origin of the ray in a coordinate direction using a first precision, thereby providing a threshold contribution to a ray direction vector; and

16. The non-transitory computer-readable storage medium of claim 13, wherein the quantizing of the first representation of the primitive uses a fixed-point quantized representation rounded toward zero for a lower bound of the interval and uses the lower bound plus one unit of least precision (ULP) for an upper bound of the interval.

17. The non-transitory computer-readable storage medium of claim 13, wherein the first representation of the primitive is a representation of a triangle pair comprising at most four vertices of two triangle primitives of the triangle pair, wherein the graphics processor comprises circuitry configured to sequentially process triangles in a given triangle pair.

18. The non-transitory computer-readable storage medium of claim 13, wherein the reduced-precision interval representation of the ray comprises a quantized ray time represented as an interval.

19. The non-transitory computer-readable storage medium of claim 18, wherein the design information further specifies that the circuit comprises:

circuitry configured to generate the reduced-precision interval representation of the primitive based on first and second positions of the primitive at different points within a motion blur time interval, such that the reduced-precision interval representation of the primitive covers all possible positions of the primitive during the interval representing the quantized ray time.
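
As a non-limiting illustration of the motion blur coverage in claims 18 and 19, the sketch below bounds one coordinate of a moving vertex over a quantized time interval. It assumes linear motion between a position `p0` at time 0 and a position `p1` at time 1, a time interval `[t_lo, t_hi]` within that range, and an output grid of step `2**-frac_bits`. These choices and the function name are hypothetical and not drawn from the claims. Exact rational arithmetic is used so that the outward rounding is not disturbed by floating-point error.

```python
import math
from fractions import Fraction


def motion_blur_cover(p0, p1, t_lo, t_hi, frac_bits):
    """Grid-aligned interval covering p0 + t * (p1 - p0) for t in [t_lo, t_hi]."""
    scale = 1 << frac_bits
    start, end = Fraction(p0), Fraction(p1)
    # Linear motion reaches its extremes at the ends of the time interval.
    ends = [start + Fraction(t) * (end - start) for t in (t_lo, t_hi)]
    lo = math.floor(min(ends) * scale)  # round the lower bound down
    hi = math.ceil(max(ends) * scale)   # round the upper bound up
    return Fraction(lo, scale), Fraction(hi, scale)
```

Rounding the two bounds outward in opposite directions keeps every position reached during the time interval inside the returned range.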

20. The non-transitory computer-readable storage medium of any one of claims 13-19, wherein, in response to a potential hit initial intersection result, the graphics processor is configured to perform an intersection test using the first representation of the primitive and the first representation of the ray.