CN106708780A

CN106708780A - Low-complexity branch processing circuit for a unified shader array based on the SIMT architecture

Info

Publication number: CN106708780A
Application number: CN201611140108.6A
Authority: CN
Inventors: 牛少平; 田泽; 韩鹏; 韩一鹏; 许宏杰; 张骏; 魏艳艳
Original assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Current assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date: 2016-12-12
Filing date: 2016-12-12
Publication date: 2017-05-24

Abstract

The invention belongs to the technical field of integrated circuits and provides a low-complexity branch processing circuit for a unified shader array based on the SIMT architecture. The circuit comprises a predicate register unit (1), a predicate stack unit (2), and a control unit (3). The circuit can meet the requirements of different numbers of parallel units and different numbers of contexts, and its implementation offers high timing performance and good scalability.

Description

Low-complexity branch processing circuit for a unified shader array based on the SIMT architecture

Technical field

The invention belongs to the technical field of integrated circuits and relates to a low-complexity branch processing circuit for a unified shader array based on the SIMT architecture.

Background art

In a unified-shading graphics processor, the unified shader array performs unified shading of vertices and pixels. Parallel processing in the unified shader array is based on SIMT: there are 16 main parallel execution units, and once an instruction is issued it must be executed simultaneously on all 16 execution units. Programs, however, contain flow-control instructions such as conditional jumps. Because the input data in the 16 parallel units differ, the condition may well evaluate differently on different parallel units, so that the units would need to jump to different targets.
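To make the divergence problem concrete, the following Python sketch evaluates one condition on 16 lanes and packs the per-lane outcomes into a bit mask. The helper names `predicate_mask` and `is_divergent` are hypothetical and not part of the patent; the point is only that a mask that is neither all zeros nor all ones means the lanes disagree on the jump.

```python
NUM_LANES = 16  # main parallel execution units in the shader array

def predicate_mask(values, cond):
    """Pack per-lane condition results: bit i is set if cond(values[i]) holds."""
    mask = 0
    for lane, value in enumerate(values):
        if cond(value):
            mask |= 1 << lane
    return mask

def is_divergent(mask, num_lanes=NUM_LANES):
    """True when some, but not all, lanes take the branch."""
    full = (1 << num_lanes) - 1
    return mask not in (0, full)

data = list(range(NUM_LANES))          # different input data in each lane
m = predicate_mask(data, lambda x: x < 5)
print(f"{m:016b}", is_divergent(m))    # prints 0000000000011111 True
```

A single jump target cannot serve such a mask, which is the situation the branch processing circuit below is designed to handle.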

Summary of the invention

Object of the invention:

The present invention proposes a low-complexity branch processing circuit for a unified shader array based on the SIMT architecture. The circuit meets the requirements of different numbers of parallel units and different numbers of contexts, and its implementation offers high timing performance and good scalability.

Technical solution:

A low-complexity branch processing circuit for a unified shader array based on the SIMT architecture, comprising:

a predicate register unit (1), a predicate stack unit (2), and a control unit (3);

Predicate register unit (1): when the instruction execution module (4) executes a condition judgment instruction, it outputs the condition result and the context number to the predicate register unit (1), which stores the value in the predicate register of that context according to the context number. When the branch processing circuit executes a POP instruction, the control unit (3) reads the value from the predicate stack unit (2) and fills it into the predicate register of the current context in the predicate register unit (1). When the branch processing circuit executes an INV instruction, the control unit (3) bitwise-inverts the old predicate register value, bitwise-ANDs it with the value at the top of the predicate stack, and writes the result into the predicate register of the current context in the predicate register unit (1). The predicate register unit (1) outputs the predicate register value of each context to the control unit (3);
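A minimal behavioral sketch of the predicate register unit (1), assuming one 20-bit register per context and 32 contexts as in the embodiment. The class and method names are hypothetical, and the all-ones reset value is an assumption that the text does not state. It shows the three write paths (condition result, POP, INV) and the read path to the control unit.

```python
WIDTH = 20                 # bits per register = number of parallel units
NUM_CONTEXTS = 32          # one register per context (8 warps x 4 cycles)
FULL = (1 << WIDTH) - 1

class PredicateRegisterUnit:
    """Behavioral sketch of predicate register unit (1)."""

    def __init__(self, num_contexts=NUM_CONTEXTS):
        # Assumption: every lane starts enabled (all-ones reset value).
        self.regs = [FULL] * num_contexts

    def write_condition(self, ctx, result_mask):
        """Store a condition-judgment result from execution module (4)."""
        self.regs[ctx] = result_mask & FULL

    def write_pop(self, ctx, stack_value):
        """POP path: value read from the predicate stack by control unit (3)."""
        self.regs[ctx] = stack_value & FULL

    def write_inv(self, ctx, stack_top):
        """INV path: invert the old value bitwise and AND it with the stack top."""
        self.regs[ctx] = ~self.regs[ctx] & stack_top & FULL

    def read(self, ctx):
        """Output one context's predicate value to control unit (3)."""
        return self.regs[ctx]

pru = PredicateRegisterUnit()
pru.write_condition(3, 0b1010)
pru.write_inv(3, 0b1111)
print(bin(pru.read(3)))  # prints 0b101
```

The final `& FULL` keeps each value confined to the register width, since Python's `~` on an integer produces a negative number.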

Control unit (3): connected to the task scheduling module (5), the IFID module (6), the predicate register unit (1), and the predicate stack unit (2). The control unit (3) receives the branch processing instructions issued by the IFID module (6), namely POP, INV, and PUSH instructions. When executing a POP instruction, it reads the value for the current context from the predicate stack unit (2) and transfers it to the predicate register unit (1). When executing a PUSH instruction, the control unit (3) writes the predicate register value of the current context, received from the predicate register unit (1), into the predicate stack unit (2). When executing an INV instruction, the control unit (3) obtains the value at the top of the stack from the predicate stack unit (2), bitwise-ANDs that value with the bitwise inverse of the predicate register value, and transfers the result back to the predicate register unit (1). When the IFID module (6) issues a non-branch instruction, the control unit (3) bitwise-ANDs the predicate register value from the predicate register unit (1) with the TaskMask from the task scheduling module (5) and transfers the result to the instruction execution module (4);
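The control unit's dispatch can be sketched as follows. This is an illustrative model only. The opcode strings, the `ControlUnit` class, and the representation of each context's stack as a Python list (top at the end) are assumptions. "ADD" is a hypothetical stand-in for any non-branch instruction.

```python
from dataclasses import dataclass

WIDTH = 20                 # number of parallel units (embodiment value)
FULL = (1 << WIDTH) - 1    # all lanes enabled

@dataclass
class ControlUnit:
    """Illustrative model of control unit (3)."""
    pred: list    # predicate register value per context
    stacks: list  # predicate stack per context; top of stack = end of list

    def issue(self, op, ctx, task_mask=FULL):
        """Handle one issued instruction for context `ctx`.

        Returns the execute mask for non-branch instructions, else None.
        """
        if op == "PUSH":
            # Write the current predicate value onto this context's stack.
            self.stacks[ctx].append(self.pred[ctx])
        elif op == "POP":
            # Move the stack-top value back into the predicate register.
            self.pred[ctx] = self.stacks[ctx].pop()
        elif op == "INV":
            # Peek at the stack top (no pop) and compute (~pred) & top.
            top = self.stacks[ctx][-1]
            self.pred[ctx] = ~self.pred[ctx] & top & FULL
        else:
            # Non-branch instruction: predicate AND TaskMask goes to module (4).
            return self.pred[ctx] & task_mask
        return None

cu = ControlUnit(pred=[FULL] * 32, stacks=[[] for _ in range(32)])
cu.pred[0] = 0b0110
print(bin(cu.issue("ADD", 0, task_mask=0b0011)))  # prints 0b10
```

In hardware the same selection is made by multiplexers on the decoded opcode; the model only captures which values are read and written for each instruction type.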

Predicate stack unit (2): receives three types of operations from the control unit (3): POP, PUSH, and INV. For a POP operation, the predicate stack unit (2) reads the data at the top of the stack of the context indicated by the context number input from the control unit (3) and returns it to the control unit (3). For a PUSH operation, the predicate stack unit (2) writes data to the top of the stack of the context indicated by the context number input from the control unit (3). For an INV operation, the predicate stack unit (2) returns the data at the top of the stack of the context indicated by the context number input from the control unit (3) but does not pop it, i.e., the contents of the stack are not affected.
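The three stack operations differ only in whether and how the top entry changes. The following sketch assumes a 32-entry depth per context, as in the embodiment. The class name and the overflow check are hypothetical, since the text does not say how nesting beyond the stack depth is handled.

```python
class PredicateStackUnit:
    """Sketch of predicate stack unit (2): one bounded LIFO per context."""

    def __init__(self, num_contexts=32, depth=32):
        self.depth = depth
        self.stacks = [[] for _ in range(num_contexts)]

    def push(self, ctx, value):
        """PUSH: write `value` to the top of context `ctx`'s stack."""
        if len(self.stacks[ctx]) >= self.depth:
            # Hypothetical guard: the text does not define overflow behavior.
            raise OverflowError(f"context {ctx}: nesting exceeds {self.depth}")
        self.stacks[ctx].append(value)

    def pop(self, ctx):
        """POP: remove the top entry and return it to the control unit."""
        return self.stacks[ctx].pop()

    def peek(self, ctx):
        """INV: return the top entry without removing it."""
        return self.stacks[ctx][-1]

psu = PredicateStackUnit()
psu.push(5, 0b1100)
assert psu.peek(5) == 0b1100 and len(psu.stacks[5]) == 1  # INV leaves stack intact
assert psu.pop(5) == 0b1100 and psu.stacks[5] == []       # POP removes the entry
```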

Beneficial effects:

1. For multiple contexts, an index indicating whether each thread actually executes (the excute mask) is generated according to whether the data is valid (the data mask) and whether the condition succeeded (the predict mask). This ensures that the SIMT shader array always executes issued instructions correctly, and the efficiency of the mechanism is unaffected as the number of parallel units increases;

2. Regarding multiple contexts, the invention is designed using the example of 8 warps, each executing for 4 cycles, for a total of 32 contexts; more contexts can be supported in the same way by adding data mask registers, predict mask registers, predict mask stacks, and so on;
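The 32 contexts follow from 8 warps × 4 cycles. The text does not specify how a (warp, cycle) pair is numbered, so the `context_id` mapping below is a hypothetical, warp-major choice. It shows that each pair gets a unique context number. It also shows that adding warps grows the number of contexts, and therefore the per-context registers and stacks, linearly.

```python
NUM_WARPS = 8
CYCLES_PER_WARP = 4

def num_contexts(warps=NUM_WARPS, cycles=CYCLES_PER_WARP):
    """Contexts = warps x cycles per warp (32 in the embodiment)."""
    return warps * cycles

def context_id(warp, cycle, cycles=CYCLES_PER_WARP):
    """Hypothetical warp-major numbering of a (warp, cycle) pair."""
    if warp < 0 or not (0 <= cycle < cycles):
        raise ValueError("warp or cycle out of range")
    return warp * cycles + cycle

print(num_contexts(), context_id(7, 3))  # prints 32 31
# Doubling the warps doubles the contexts, and with them the per-context
# data mask registers, predicate registers and predicate stacks.
print(num_contexts(warps=16))            # prints 64
```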

3. The design structure of the invention is simple, highly scalable, and efficient to implement in circuitry.

Brief description of the drawings

Fig. 1 is a functional block diagram of the branch processing mechanism described by the present invention.

Detailed description of the embodiments

The technical solution of the present invention is described clearly and completely below with reference to the accompanying drawing and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from the embodiments herein without creative effort fall within the protection scope of the invention.

As shown in Fig. 1, the low-complexity branch processing circuit for a unified shader array based on the SIMT architecture comprises:

a predicate register unit (1), a predicate stack unit (2), and a control unit (3).

In the predicate register unit (1), each context has a corresponding predicate register that stores the predicate value of that context. The number of bits in each predicate register equals the number of parallel units, and the number of predicate registers equals the number of contexts in which the program runs.

In the predicate stack unit (2), each context has a corresponding predicate stack, which is used when the program executes nested branches to pop and push predicate register values. The bit width of each predicate stack equals the number of parallel units, its depth equals the nesting depth of the program, and the number of predicate stacks equals the number of contexts in which the program runs.

The control unit (3) executes the PUSH, POP, and INV instructions that read and write the predicate stack, generates new predicate register values, and bitwise-ANDs PredictMask with TaskMask to produce ExcuteMask.

Embodiment

1. Predicate register unit

The predicate register holds the Predicate Mask. Within one SSC, each cycle corresponds to one 20-bit Predicate Mask. To support 8 warps with each warp running for 4 cycles, the predicate registers need 32 sets of contexts. The Predicate Mask of an SFU is obtained by ORing the Predicate Masks of the 4 SCs in the same SPU.
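The 20-bit mask width is consistent with 4 SPUs, each containing 4 SCs (the 16 main parallel units of the background section) plus 1 SFU. This grouping and the bit ordering below (per SPU, SC bits first, then the SFU bit) are assumptions; the text does not state them. Under those assumptions, the OR rule for SFU bits can be sketched as follows.

```python
NUM_SPU = 4                         # assumed: 4 SPUs per SSC
SC_PER_SPU = 4                      # assumed: 4 SCs per SPU (16 SCs in total)
LANES_PER_SPU = SC_PER_SPU + 1      # 4 SCs + 1 SFU
WIDTH = NUM_SPU * LANES_PER_SPU     # 20-bit Predicate Mask
SC_FIELD = (1 << SC_PER_SPU) - 1

def fill_sfu_bits(mask):
    """Set each SPU's SFU bit to the OR of that SPU's four SC bits.

    Assumed layout: bit spu*5 + k is SC k (k = 0..3), bit spu*5 + 4 is the SFU.
    """
    out = mask & ((1 << WIDTH) - 1)
    for spu in range(NUM_SPU):
        base = spu * LANES_PER_SPU
        sfu_bit = 1 << (base + SC_PER_SPU)
        if (mask >> base) & SC_FIELD:   # any SC in this SPU enabled?
            out |= sfu_bit
        else:
            out &= ~sfu_bit
    return out

# SPU 0 has only SC 1 enabled, so the SPU 0 SFU bit (bit 4) is set as well.
print(f"{fill_sfu_bits(0b0010):020b}")
```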

The predicate register value is affected by the following operations: the results of conditional jump instructions executed by the SCs, and POP and INV operations on the predicate stack.

2. Control unit

This unit is responsible for generating the Excute Mask and for the PUSH, POP, and INV operations on the predicate stack.

Within one SPU, the DataMask of the SCs and the SFU is identical to the TaskMask corresponding to that SPU, and the ExcuteMask is the bitwise AND of DataMask and PredicateMask.

PUSH: read the contents of the predicate register (one cycle, 20 bits in total) and write them to the predicate stack;

POP: read the predicate information from the top of the predicate stack and write it to the predicate register;

INV: read the contents of the current predicate register (call it m) and the predicate information at the top of the predicate stack (call it n); invert m bitwise, AND the result bitwise with n, and write the result ((~m) & n) to the predicate register.
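The three operations combine naturally for an if/else. The trace below sketches one way a compiler might use them: PUSH before the compare, INV at the else, POP at the end. Neither this instruction sequence nor the assumption that lanes disabled before the compare report a false result is stated in the text.

```python
WIDTH = 20
FULL = (1 << WIDTH) - 1

def run_if_else(outer, cond_mask):
    """Trace one if/else level; return (then_mask, else_mask, restored)."""
    stack = []
    pred = outer
    stack.append(pred)               # PUSH: save the enclosing predicate (n)
    pred = cond_mask & pred          # compare: assumed 0 for already-disabled lanes
    then_mask = pred                 # lanes that run the 'then' body (m)
    pred = ~pred & stack[-1] & FULL  # INV: (~m) & n, stack top is not popped
    else_mask = pred                 # lanes that run the 'else' body
    pred = stack.pop()               # POP: restore the enclosing predicate
    return then_mask, else_mask, pred

then_m, else_m, after = run_if_else(outer=0xF0, cond_mask=0xAA)
print(hex(then_m), hex(else_m), hex(after))  # prints 0xa0 0x50 0xf0
```

ANDing with n is what keeps lanes that were already disabled before the branch (here every lane outside bits 4 to 7) disabled in the else body. Inverting m alone would wrongly enable them.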

3. Predicate stack unit

The stack stores Predicate Masks. To support 8 warps with 4 cycles each, 32 sets of stacks are required. For each context, the stack is 20 bits wide and 32 entries deep (supporting 32 levels of nested conditional branches).
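The sizing rules from the structure description give the storage cost directly: register width and stack width equal the number of parallel units, register and stack counts equal the number of contexts, and stack depth equals the nesting depth. The following Python check uses the embodiment's figures; `storage_bits` is a hypothetical helper.

```python
def storage_bits(num_units, num_contexts, nesting_depth):
    """Return (predicate register bits, predicate stack bits)."""
    register_bits = num_units * num_contexts               # one register per context
    stack_bits = num_units * nesting_depth * num_contexts  # one stack per context
    return register_bits, stack_bits

regs, stacks = storage_bits(num_units=20, num_contexts=32, nesting_depth=32)
print(regs, stacks, stacks // 8)  # prints 640 20480 2560
```

That is 640 bits of predicate registers and 20,480 bits (2,560 bytes) of predicate stack. Both grow linearly in the number of contexts, which matches the scalability claim.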

Claims

1. A low-complexity branch processing circuit for a unified shader array based on the SIMT architecture, characterized by comprising:

a predicate register unit (1), a predicate stack unit (2), and a control unit (3);

Predicate register unit (1): when the instruction execution module (4) executes a condition judgment instruction, it outputs the condition result and the context number to the predicate register unit (1), which stores the value in the predicate register of that context according to the context number. When the branch processing circuit executes a POP instruction, the control unit (3) reads the value from the predicate stack unit (2) and fills it into the predicate register of the current context in the predicate register unit (1). When the branch processing circuit executes an INV instruction, the control unit (3) bitwise-inverts the old predicate register value, bitwise-ANDs it with the value at the top of the predicate stack, and writes the result into the predicate register of the current context in the predicate register unit (1). The predicate register unit (1) outputs the predicate register value of each context to the control unit (3);

Control unit (3): connected to the task scheduling module (5), the IFID module (6), the predicate register unit (1), and the predicate stack unit (2). The control unit (3) receives the branch processing instructions issued by the IFID module (6), namely POP, INV, and PUSH instructions. When executing a POP instruction, it reads the value for the current context from the predicate stack unit (2) and transfers it to the predicate register unit (1). When executing a PUSH instruction, the control unit (3) writes the predicate register value of the current context, received from the predicate register unit (1), into the predicate stack unit (2). When executing an INV instruction, the control unit (3) obtains the value at the top of the stack from the predicate stack unit (2), bitwise-ANDs that value with the bitwise inverse of the predicate register value, and transfers the result back to the predicate register unit (1). When the IFID module (6) issues a non-branch instruction, the control unit (3) bitwise-ANDs the predicate register value from the predicate register unit (1) with the TaskMask from the task scheduling module (5) and transfers the result to the instruction execution module (4);

Predicate stack unit (2): receives three types of operations from the control unit (3): POP, PUSH, and INV. For a POP operation, the predicate stack unit (2) reads the data at the top of the stack of the context indicated by the context number input from the control unit (3) and returns it to the control unit (3). For a PUSH operation, the predicate stack unit (2) writes data to the top of the stack of the context indicated by the context number input from the control unit (3). For an INV operation, the predicate stack unit (2) returns the data at the top of the stack of the context indicated by the context number input from the control unit (3) but does not pop it, i.e., the contents of the stack are not affected.