CN113076135B - Logic resource sharing method for special instruction set processor

Info

Publication number: CN113076135B
Application number: CN202110366542.0A
Authority: CN
Inventors: 陈虎; 曹强辉
Original assignee: Godson Guangzhou Technology Co ltd
Current assignee: Godson Guangzhou Technology Co ltd
Priority date: 2021-04-06
Filing date: 2021-04-06
Publication date: 2023-12-26
Anticipated expiration: 2041-04-06
Also published as: CN113076135A

Abstract

A logic resource sharing method for a special-purpose instruction set processor, comprising: step S1: partitioning the optimal transformation set into blocks, that is, partitioning the similarity matrix R by columns; step S2: for each block, scanning the unmasked elements row by row starting from row 1, column 1, and taking the first-occurring maximum element value $\gamma_{i,j}$ as the growth point for further expansion; step S3: finding the first-occurring maximum element value $\gamma_{i,a}$ in row i and the first-occurring maximum element value $\gamma_{j,b}$ in row j, where (i, j) are the coordinates of the growth point, and then masking all element values of rows i and j; step S4: examining the newly found element values $\gamma_{i,a}$ and $\gamma_{j,b}$: if both are 0, continuing to the next step; otherwise, performing masking or repeating this step; step S5: archiving the computing mode group generated in step S4. The method is simple in principle, easy to implement, and improves computing efficiency and resource utilization.

Description

Logic resource sharing method for special instruction set processor

Technical Field

The invention relates to the technical field of instruction set processors, and in particular to a logic resource sharing method for a special-purpose instruction set processor.

Background

Applications typically contain a number of fixed computing modes (Computing Pattern), such as the butterfly operations in the fast discrete cosine transform (Fast Discrete Cosine Transform, FDCT) of FIG. 1. Solidifying these computing modes into dedicated extended instructions can significantly improve computing efficiency. An application-specific instruction-set processor (Application Specific Instruction-set Processor, ASIP) improves processor performance in this way. As shown in FIG. 1, for ease of design, a dedicated extended functional unit (extended Functional Unit, eFU) is typically set aside within the ASIP to implement the extended instruction set (extended Instruction-Set Architecture, eISA).

However, the arithmetic logic resources available in the extended functional unit eFU are generally limited by factors such as area. Because these resources are limited, as many computing modes as possible should be implemented on one extended functional unit eFU to maximize the execution efficiency of the application. That is, the computing modes (or extended instructions) implemented on the same extended functional unit eFU should share resources as much as possible. Therefore, given the area constraint of the extended functional unit eFU and a set of computing modes, a resource sharing policy among the computing modes needs to be designed to maximize computing efficiency and resource utilization.

Practitioners have proposed solutions to improve the computing efficiency and resource utilization of the extended functional unit eFU. The most typical one converts the computing mode of each instruction into a set of paths and compares paths across the sets to find a group of paths that maximizes the performance gain δ under a given area constraint. However, this conventional approach still has some drawbacks:

(1) Path-based resource sharing alone introduces an excessive number of multiplexers (MUXes), which increases control complexity and lowers the operating frequency of the system.

(2) The integrity of the computing mode of an instruction is broken, so the readability and maintainability of the design are poor.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: in view of the technical problems in the prior art, the invention provides a logic resource sharing method for a special-purpose instruction set processor that is simple in principle, easy to implement, and improves computing efficiency and resource utilization.

In order to solve the technical problems, the invention adopts the following technical scheme:

A logic resource sharing method for a special-purpose instruction set processor, comprising the steps of:

step S1: partitioning the optimal transformation set into blocks, that is, partitioning the similarity matrix R by columns;

step S2: for each block, scanning the unmasked elements row by row starting from row 1, column 1, finding the first-occurring maximum element value $\gamma_{i,j}$, and taking the computing modes indicated by the coordinates (i, j) as the growth point for further expansion;

step S3: finding the first-occurring maximum element value $\gamma_{i,a}$ in row i and the first-occurring maximum element value $\gamma_{j,b}$ in row j, where (i, j) are the coordinates of the growth point, and then masking all element values of rows i and j;

step S4: examining the newly found element values $\gamma_{i,a}$ and $\gamma_{j,b}$: if both are 0, proceeding to step S5; otherwise, continuing the expansion in the manner of step S3 on row a, on row b, or on both rows a and b;

step S5: archiving the computing mode group generated in step S4 as $G_{idx}$.

As a further improvement of the method of the invention: in step S2, if the area constraint A is satisfied, the computing modes $S'_i$ and $S'_j$ indicated by the coordinates (i, j) are included in one group and serve as the growth point for further expansion of the group; at the same time, the element value $\gamma_{i,j}$ is masked.

As a further improvement of the method of the invention: in step S3, if $\gamma_{i,a}$ is not 0 and the area constraint A is satisfied, the computing mode indicated by the coordinates (i, a) that has not yet been included in the group is included in the group.

As a further improvement of the method of the invention: in step S3, if $\gamma_{j,b}$ is not 0 and the area constraint A is satisfied, the computing mode indicated by the coordinates (j, b) that has not yet been included in the group is included in the group.

As a further improvement of the method of the invention: in step S4, if the 2 newly found element values $\gamma_{i,a}$ and $\gamma_{j,b}$ are both 0, the growth of the group ends and the method proceeds to step S5.

As a further improvement of the method of the invention: in step S4, if the element value $\gamma_{i,a}$ is not 0, the first-occurring maximum element value $\gamma_{a,c}$ in row a is found, and then all element values of row a are masked; if the element value $\gamma_{j,b}$ is not 0, the first-occurring maximum element value $\gamma_{b,d}$ in row b is found, and then all element values of row b are masked.

As a further improvement of the method of the invention: in step S4, if $\gamma_{a,c}$ is not 0 and the area constraint A is satisfied, the computing mode indicated by the coordinates (a, c) that has not yet been included in the group is included in the group; if $\gamma_{b,d}$ is not 0 and the area constraint A is satisfied, the computing mode indicated by the coordinates (b, d) that has not yet been included in the group is included in the group; at the same time, the element values $\gamma_{a,c}$ and $\gamma_{b,d}$ are masked. Then $\gamma_{i,a} = \gamma_{a,c}$ and $\gamma_{j,b} = \gamma_{b,d}$ are set, and step S4 is repeated.

As a further improvement of the method of the invention: in step S5, if the current block still contains a computing mode that has not been included in any group and that can be included, the index idx is incremented by 1 and steps S2-S4 are repeated.

As a further improvement of the method of the invention: in step S5, if all computing modes in the current block have been included in some group, the method switches to the next block and repeats steps S2-S4.

As a further improvement of the method of the invention: in step S5, if none of the computing modes in the current block that have not been included in any group can be included, these computing modes are merged into the next block and steps S2-S4 are repeated.

Compared with the prior art, the invention has the advantages that:

1. The logic resource sharing method for a special-purpose instruction set processor of the invention is simple in principle, easy to implement, and improves computing efficiency and resource utilization as far as possible: within the optimal transformation set, a grouping of the computing modes $G_1, G_2, \dots, G_x$ is obtained such that the gain produced by the computing modes implemented on the eFU reaches a local or global maximum.

2. The logic resource sharing method for the special instruction set processor realizes the sharing of arithmetic logic resources among computing modes on the premise of keeping the integrity of the computing modes by the following two technical means, thereby keeping the readability and maintainability of the computing modes while compressing the resources: (1) Performing equivalent transformation on the computing modes, and participating in a resource sharing process according to the equivalent transformation result of the computing modes, so that the possibility of sharable resources among a plurality of computing modes is increased, and the computing efficiency and the resource utilization rate are improved to the greatest extent; (2) A condition is defined that 2 equivalent computing modes are directly associated (shared resources), i.e., one equivalent computing mode is directly associated (shared resources) only with another equivalent computing mode with which it shares the most.

Drawings

Fig. 1 is a schematic diagram of a construction flow of ASIP.

FIG. 2 is a schematic flow chart of the method of the present invention.

Fig. 3 is a schematic diagram of the invention in a specific application embodiment.

FIG. 4 is a schematic diagram of equivalent transformation and equivalent computing modes in a specific application embodiment of the present invention.

Detailed Description

The invention will be described in further detail with reference to the drawings and the specific examples.

Given the area constraint A of the extended functional unit eFU, the computing mode set $(S_1, S_2, \dots, S_k)$, the optimal transformation set of the computing mode set, and the similarity matrix R of the computing mode set, assume that:

1) the gains satisfy $\delta(S_1) \ge \delta(S_2) \ge \dots \ge \delta(S_{k-1}) \ge \delta(S_k)$;

2) each arithmetic logic operation contained in the computing modes of the computing mode set and of the optimal transformation set consumes 1 unit of area resource.

Then the aim of the invention is to find, within the optimal transformation set, a grouping of the computing modes such that the gain produced by the computing modes implemented on the given extended functional unit eFU reaches a local or global maximum.

As shown in FIG. 2 and FIG. 3, the logic resource sharing method for a special-purpose instruction set processor of the invention comprises the steps of:

Step S1: partitioning the optimal transformation set into blocks, that is, partitioning the similarity matrix R by columns. Let there be n blocks in total, denoted $Z_1, Z_2, \dots, Z_n$.

Step S2: for each block, scanning the unmasked elements row by row starting from row 1, column 1, finding the first-occurring maximum element value $\gamma_{i,j}$, and taking the computing modes indicated by the coordinates (i, j) as the growth point for further expansion.

Step S3: finding the first-occurring maximum element value $\gamma_{i,a}$ in row i and the first-occurring maximum element value $\gamma_{j,b}$ in row j, where (i, j) are the coordinates of the growth point, and then masking all element values of rows i and j.

Step S4: examining the newly found element values $\gamma_{i,a}$ and $\gamma_{j,b}$: if both are 0, proceeding to step S5; if they are not both 0, continuing the expansion in the manner of step S3 on row a, on row b, or on both rows a and b.

Step S5: archiving the computing mode group generated in step S4, denoted $G_{idx}$.
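The scan in step S2 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `first_max`, the use of 0-based indices, and the representation of masked elements as a set of (row, column) pairs are all assumptions made for the example.

```python
from typing import List, Optional, Set, Tuple

Pos = Tuple[int, int]


def first_max(R: List[List[int]], cols: List[int],
              masked: Set[Pos]) -> Optional[Pos]:
    """Return the 0-based (row, column) of the first-occurring maximum.

    Rows are visited top to bottom and, inside a row, the block's columns
    left to right. The strict '>' keeps the earliest position when several
    elements tie for the maximum. None means every unmasked element is 0.
    """
    best, best_pos = 0, None
    for i in range(len(R)):
        for j in cols:
            if (i, j) not in masked and R[i][j] > best:
                best, best_pos = R[i][j], (i, j)
    return best_pos
```

Using a strict comparison is what makes the result the first occurrence: a later element with the same value never replaces an earlier one.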

In a specific application example, in step S1 above, as shown in FIG. 3, the similarity matrix R contains 10×10 elements; block 1 contains the computing modes $(S'_1, S'_2, \dots, S'_6)$, corresponding to columns 1-6 of the similarity matrix R; block 2 contains the computing modes $(S'_7, S'_8, \dots, S'_{10})$, corresponding to columns 7-10 of the similarity matrix R.
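The column partitioning of this example can be written down as a short sketch. The text does not state how block sizes are chosen, so the sketch simply takes them as an input; `column_blocks` and its parameters are hypothetical names, and column indices are 0-based.

```python
from typing import List


def column_blocks(k: int, sizes: List[int]) -> List[List[int]]:
    """Split the k columns of the similarity matrix into consecutive blocks."""
    if sum(sizes) != k:
        raise ValueError("block sizes must cover all k columns exactly")
    blocks, start = [], 0
    for size in sizes:
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


# The 10x10 example: block 1 holds columns 1-6, block 2 holds columns 7-10
# (stored here as 0-based indices 0-5 and 6-9).
blocks = column_blocks(10, [6, 4])
```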

In a specific application example, in step S2 above, if the area constraint A is satisfied, the computing modes $S'_i$ and $S'_j$ indicated by the coordinates (i, j) are included in one group and serve as the growth point for further expansion of the group; at the same time, the element value $\gamma_{i,j}$ is masked.

In this specific application example, in step S2 above, as shown in FIG. 3, the first-occurring maximum element value is $\gamma_{1,3} = 4$, so the first 2 computing modes included in the group are $S'_1$ and $S'_3$.

In a specific application example, in step S3 above, if $\gamma_{i,a}$ is not 0 and the area constraint A is satisfied, the computing mode indicated by the coordinates (i, a) that has not yet been included in the group is included in the group; if $\gamma_{j,b}$ is not 0 and the area constraint A is satisfied, the computing mode indicated by the coordinates (j, b) that has not yet been included in the group is included in the group.

In this specific application example, in step S3 above, as shown in FIG. 3, $\gamma_{i,a} = \gamma_{1,6} = 2$ and $\gamma_{j,b} = \gamma_{3,6} = 3$, so the computing mode $S'_6$ is newly included in the group.
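Step S3 for this example can be traced with a small sketch. The 6×6 block below is hypothetical: only the three values $\gamma_{1,3} = 4$, $\gamma_{1,6} = 2$ and $\gamma_{3,6} = 3$ are taken from the text, every other entry is set to 0 for illustration and does not reproduce FIG. 3. The helper names are assumptions, indices are 0-based, and the area check is omitted.

```python
from typing import List, Optional, Set, Tuple

Pos = Tuple[int, int]


def row_first_max(R: List[List[int]], row: int, cols: List[int],
                  masked: Set[Pos]) -> Optional[int]:
    """Column of the first-occurring maximum unmasked element in `row`.

    Returns None when every unmasked element of the row is 0.
    """
    best, best_col = 0, None
    for j in cols:
        if (row, j) not in masked and R[row][j] > best:
            best, best_col = R[row][j], j
    return best_col


def mask_row(row: int, cols: List[int], masked: Set[Pos]) -> None:
    """Mask every element of `row` inside the current block."""
    masked.update((row, j) for j in cols)


# Hypothetical block: only the three quoted values are non-zero.
R = [[0] * 6 for _ in range(6)]
R[0][2] = 4  # gamma_{1,3}
R[0][5] = 2  # gamma_{1,6}
R[2][5] = 3  # gamma_{3,6}

cols = list(range(6))
masked: Set[Pos] = {(0, 2)}   # growth point already masked in step S2
group = [0, 2]                # S'_1 and S'_3

a = row_first_max(R, 0, cols, masked)   # search row i
b = row_first_max(R, 2, cols, masked)   # search row j
mask_row(0, cols, masked)
mask_row(2, cols, masked)
for col in (a, b):
    if col is not None and col not in group:
        group.append(col)     # S'_6 joins the group once
```

Both rows point at column 6 (index 5), so the computing mode $S'_6$ is added a single time and the group becomes $(S'_1, S'_3, S'_6)$.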

In a specific application example, step S4 above specifically includes:

if the 2 newly found element values $\gamma_{i,a}$ and $\gamma_{j,b}$ are both 0, the growth of the group ends and the method proceeds to step S5;

if the element value $\gamma_{i,a}$ is not 0, the first-occurring maximum element value $\gamma_{a,c}$ in row a is found, and then all element values of row a are masked;

if the element value $\gamma_{j,b}$ is not 0, the first-occurring maximum element value $\gamma_{b,d}$ in row b is found, and then all element values of row b are masked;

if $\gamma_{a,c}$ is not 0 and the area constraint A is satisfied, the computing mode indicated by the coordinates (a, c) that has not yet been included in the group is included in the group;

if $\gamma_{b,d}$ is not 0 and the area constraint A is satisfied, the computing mode indicated by the coordinates (b, d) that has not yet been included in the group is included in the group; at the same time, the element values $\gamma_{a,c}$ and $\gamma_{b,d}$ are masked. Then $\gamma_{i,a} = \gamma_{a,c}$ and $\gamma_{j,b} = \gamma_{b,d}$ are set, and step S4 is repeated.
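The growth of one group through steps S3 and S4 can be sketched as a loop over a frontier of rows. This is one possible reading of the steps, not the definitive procedure: `grow_group` is a hypothetical name, indices are 0-based, and the area constraint A is abstracted into a caller-supplied predicate `fits(group, candidate)`, because the text does not say how the area of a group with shared resources is computed.

```python
from typing import Callable, List, Optional, Set, Tuple

Pos = Tuple[int, int]


def row_first_max(R: List[List[int]], row: int, cols: List[int],
                  masked: Set[Pos]) -> Optional[int]:
    """Column of the first-occurring maximum unmasked element, or None if all 0."""
    best, best_col = 0, None
    for j in cols:
        if (row, j) not in masked and R[row][j] > best:
            best, best_col = R[row][j], j
    return best_col


def grow_group(R: List[List[int]], cols: List[int], masked: Set[Pos],
               seed: Pos,
               fits: Callable[[List[int], int], bool]) -> List[int]:
    """Grow one group from the growth point `seed` = (i, j)."""
    i, j = seed
    masked.add(seed)                  # step S2 masks the growth point
    group = [i, j]
    frontier = [i, j]                 # rows still to be expanded
    while frontier:
        next_frontier: List[int] = []
        for row in frontier:
            col = row_first_max(R, row, cols, masked)
            masked.update((row, c) for c in cols)   # mask the whole row
            if col is None:           # found value is 0: this branch stops
                continue
            if col not in group and fits(group, col):
                group.append(col)
            if col not in next_frontier:
                next_frontier.append(col)   # expand row `col` next
        frontier = next_frontier
    return group
```

The loop terminates because every visited row is fully masked, so a row can contribute a non-zero value at most once. Expansion continues along a row even when `fits` rejects the candidate, which follows the wording of step S4, where the search in rows a and b depends only on the element value being non-zero.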

In this specific application example, as shown in FIG. 3, all elements in row 6 of block 1 are 0, so the loop of step S4 ends.

In a specific application example, in step S5 above, as shown in FIG. 3, $G_1 = (S'_1, S'_3, S'_6)$ and $G_2 = (S'_4, S'_5)$. Step S5 then distinguishes three cases:

1) If the current block still contains a computing mode that has not been included in any group and that can be included, the index idx is incremented by 1 and steps S2-S4 are repeated.

2) If all computing modes in the current block have been included in some group, the method switches to the next block and repeats steps S2-S4.

3) If none of the computing modes in the current block that have not been included in any group can be included, such as $S'_2$ in FIG. 3, these computing modes are merged into the next block and steps S2-S4 are repeated.
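The three-way decision of step S5 can be sketched as a driver loop over the blocks. The sketch is deliberately abstract: `process_blocks`, `find_seed` and `grow` are hypothetical names, the scan of step S2 and the growth of steps S3-S4 are passed in as callables, and "cannot be included" is interpreted as "no growth point is found, or the grown group adds nothing new", which is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Pos = Tuple[int, int]


def process_blocks(
    blocks: Sequence[Sequence[int]],
    find_seed: Callable[[List[int], Set[int]], Optional[Pos]],
    grow: Callable[[Pos, List[int]], List[int]],
) -> Tuple[List[List[int]], List[int]]:
    """Return the archived groups and the modes left over after the last block."""
    groups: List[List[int]] = []
    grouped: Set[int] = set()
    carry: List[int] = []
    for block in blocks:
        cols = carry + list(block)    # case 3 leftovers join this block
        carry = []
        while True:
            pending = [c for c in cols if c not in grouped]
            if not pending:           # case 2: block done, go to next block
                break
            seed = find_seed(cols, grouped)
            group = grow(seed, cols) if seed is not None else []
            if not set(group) - grouped:
                carry = pending       # case 3: carry leftovers forward
                break
            groups.append(group)      # case 1: archive G_idx, idx += 1
            grouped.update(group)
    return groups, carry
```

Checking that each new group adds at least one ungrouped mode also guarantees that the inner loop ends, whatever the two callables return.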

Further, after all n blocks from step S1 have been processed according to steps S2-S5, the computing mode groups output by step S5 are the final objective of the invention: within the optimal transformation set, a grouping of the computing modes $G_1, G_2, \dots, G_x$ such that the gain produced by the computing modes implemented on the eFU reaches a local or global maximum. In FIG. 3, $G_1 = (S'_1, S'_3, S'_6)$, $G_2 = (S'_4, S'_5)$ and $G_3 = (S'_2, S'_7, S'_9, S'_8)$ form one valid grouping.

In order to make the above description of the present invention more clear and complete, the present invention further provides additional description of some definitions in the course of the above-described method in combination with common general knowledge in the art.

Topological representation of a computing mode. The data flow graph of a computing mode is typically represented by a directed acyclic graph (Directed Acyclic Graph, DAG) G(V, E), where each node in the set V represents an operation and each edge e(u, v) in the set E indicates that the data generated by node u is consumed by node v. For example, the butterfly computing mode ADDBF in FIG. 1 has 4 nodes and 4 edges.
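A computing mode in this representation can be sketched as a small data structure. The class name `ComputingMode` and the butterfly topology below are assumptions: two inputs feeding one adder and one subtractor give the 4 nodes and 4 edges quoted in the text, but the actual structure of ADDBF in FIG. 1 may differ, and treating the inputs as nodes is a choice made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ComputingMode:
    ops: Dict[str, str]                    # node name -> operation it performs
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (u, v): v consumes u

    def is_acyclic(self) -> bool:
        """Kahn's algorithm: a DAG lets every node be removed in turn."""
        indegree = {name: 0 for name in self.ops}
        for _, v in self.edges:
            indegree[v] += 1
        ready = [name for name, deg in indegree.items() if deg == 0]
        removed = 0
        while ready:
            u = ready.pop()
            removed += 1
            for src, dst in self.edges:
                if src == u:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        return removed == len(self.ops)


# Assumed butterfly: sum = x0 + x1, diff = x0 - x1.
addbf = ComputingMode(
    ops={"x0": "input", "x1": "input", "sum": "add", "diff": "sub"},
    edges=[("x0", "sum"), ("x1", "sum"), ("x0", "diff"), ("x1", "diff")],
)
```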

Gain δ of a computing mode. The gain δ(S) of a computing mode S is defined as the ratio of the execution time needed to complete the computation in software to the execution time needed to complete it in hardware (with the computing mode implemented as a dedicated extended instruction). Execution time is measured in system clock cycles.

Area of a computing mode. The area of a computing mode S is defined as the sum of the area resources consumed by the arithmetic logic operations it contains. In this patent, each arithmetic logic operation is assumed to consume 1 unit of area resource.
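Both definitions reduce to one-line computations, sketched below. The function names and the cycle counts in the example are hypothetical; only the ratio and the unit cost per operation come from the definitions above.

```python
from typing import Sequence


def gain(software_cycles: int, hardware_cycles: int) -> float:
    """Gain = software execution time / hardware execution time, in clock cycles."""
    if hardware_cycles <= 0:
        raise ValueError("hardware_cycles must be positive")
    return software_cycles / hardware_cycles


def area(operations: Sequence[str]) -> int:
    """Area = number of arithmetic logic operations, each costing 1 unit."""
    return len(operations)


# Hypothetical numbers: 12 cycles in software, 3 cycles as an extended instruction.
example_gain = gain(12, 3)
example_area = area(["add", "sub"])
```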

Equivalent transformation of a computing mode. If a new node is inserted between the 2 nodes of an edge in the directed acyclic graph G(V, E) of a computing mode S, and the output of the transformed computing mode S' is the same as the output of the original computing mode S for every input combination, this process is called one equivalent transformation. Wherein:

Equivalent transformation rule. In principle, any number of new nodes may be inserted between the 2 nodes of an edge in the directed acyclic graph G(V, E), as long as the output is guaranteed to be the same. For simplicity, this patent allows at most one new node to be inserted between the 2 nodes of an edge, that is, one equivalent transformation per edge.

Equivalent transformers. One transformation means inserting a new node between the 2 nodes u and v of an edge e(u, v) in the directed acyclic graph G(V, E). Each transformation operation is denoted by a transformer symbol, and each kind of transformer corresponds to one arithmetic logic operation (e.g. addition, subtraction, multiplication, shift). After a transformation on the edge e(u, v), the data generated by node u is consumed by the inserted node, and the data generated by the inserted node is consumed by node v. In principle, any number of transformers can be defined; the transformer set predefined in this patent contains 7 kinds of transformers, whose logical meanings are as follows:

1) Bypass

2) Add 0

3) Subtract 0

4) Bitwise AND with 1

5) Bitwise OR with 0

6) Multiply by 1

7) Shift left by 1 bit and shift right by 1 bit
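The 7 transformers can be sketched as identity functions on a data word, together with the insertion of a new node on an edge. The names are hypothetical labels, and two interpretations are assumptions: "bitwise AND with 1" is read as AND with an all-ones word of an assumed width of 32 bits, and the shift pair is assumed not to lose the top bit (Python integers are unbounded, so the sketch cannot overflow).

```python
from typing import Callable, Dict, List, Tuple

WIDTH = 32                         # assumed word width
ALL_ONES = (1 << WIDTH) - 1

TRANSFORMERS: Dict[str, Callable[[int], int]] = {
    "bypass": lambda x: x,
    "add_0": lambda x: x + 0,
    "sub_0": lambda x: x - 0,
    "and_ones": lambda x: x & ALL_ONES,   # identity for 0 <= x <= ALL_ONES
    "or_0": lambda x: x | 0,
    "mul_1": lambda x: x * 1,
    "shl1_shr1": lambda x: (x << 1) >> 1,
}

Edge = Tuple[str, str]


def insert_on_edge(ops: Dict[str, str], edges: List[Edge], edge: Edge,
                   transformer: str) -> Tuple[Dict[str, str], List[Edge]]:
    """Replace edge (u, v) by (u, t) and (t, v), where t is the new node."""
    if edge not in edges:
        raise ValueError("edge is not part of the graph")
    u, v = edge
    t = f"{transformer}@{u}->{v}"          # name of the inserted node
    new_ops = dict(ops)
    new_ops[t] = transformer
    new_edges = [e for e in edges if e != edge] + [(u, t), (t, v)]
    return new_ops, new_edges
```

Because every transformer returns its input unchanged, inserting one on any edge leaves the outputs of the computing mode unchanged while adding one operation node that another computing mode may be able to share.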

Equivalent computing mode. The calculation mode S' after one or more equivalent transformations of one calculation mode S is an equivalent calculation mode of S. Any 2 equivalent calculation modes S', S "of S are functionally equivalent.

As shown in FIG. 4, a computing mode $S_i$ yields the equivalent computing mode $S'_i$ after an equivalent transformation on its edge $e_3$.

Similarity between computing modes. If an edge $e_i(u_i, v_i)$ of a computing mode $S_i$ is identical to an edge $e_j(u_j, v_j)$ of another computing mode $S_j$, that is, the source nodes $u_i$ and $u_j$ correspond to the same arithmetic logic operation and the destination nodes $v_i$ and $v_j$ also correspond to the same arithmetic logic operation, then the computing modes $S_i$ and $S_j$ have similarity, measured by the symbol $\gamma_{i,j}$. As in FIG. 2, the similarity between $S_i$ and $S_j$ is $\gamma_{i,j} = 0$, while the similarity between the equivalent computing mode $S'_i$ of $S_i$ and $S_j$ is $\gamma'_{i,j} = 2$. The similarity of a computing mode S to itself is defined as 0.
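One way to turn this definition into a number is sketched below. The text states when two edges are identical but not how the similarity value is computed from them, so counting the identical edges (as a multiset intersection of operation pairs) is an assumption, as are the function names and the three small computing modes, which are hypothetical and not taken from any figure.

```python
from collections import Counter
from typing import Dict, List, Tuple

Mode = Tuple[Dict[str, str], List[Tuple[str, str]]]   # (ops, edges)


def edge_signatures(mode: Mode) -> Counter:
    """Count edges by (operation of source node, operation of destination node)."""
    ops, edges = mode
    return Counter((ops[u], ops[v]) for u, v in edges)


def similarity(mode_a: Mode, mode_b: Mode) -> int:
    """Number of identical edges; a mode compared with itself gives 0."""
    if mode_a is mode_b:
        return 0
    shared = edge_signatures(mode_a) & edge_signatures(mode_b)
    return sum(shared.values())


# Hypothetical modes: S_i is add -> mul, S_j is add -> shift -> mul.
mode_i: Mode = ({"a": "add", "b": "mul"}, [("a", "b")])
mode_j: Mode = ({"p": "add", "q": "shift", "r": "mul"}, [("p", "q"), ("q", "r")])
# Equivalent mode of S_i with a shift transformer inserted on its only edge.
mode_i_eq: Mode = ({"a": "add", "t": "shift", "b": "mul"}, [("a", "t"), ("t", "b")])
```

In this toy case the similarity rises from 0 for `mode_i` to 2 for `mode_i_eq`, which illustrates how an equivalent transformation can create edges that two computing modes can share.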

Equivalent transformation set of a computing mode S. Given a computing mode S and the edge set $E = (e_1, e_2, \dots, e_m)$ of its directed acyclic graph G(V, E), for any combination $(e_1, e_2, \dots, e_i)$ of i edges selected from the set E (i = 1, 2, …, m), the set of equivalent computing modes of S formed by applying an arbitrary equivalent transformation to each edge of the combination is called the equivalent transformation set of S, denoted $S^*$.
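The size of this set can be made concrete with a short enumeration sketch. With 7 transformers and at most one inserted node per edge, choosing i of the m edges and one transformer for each gives $\sum_{i=1}^{m} \binom{m}{i} 7^i = 8^m - 1$ assignments. The transformer labels and the function name are hypothetical, and the sketch counts assignments rather than checking whether two assignments yield the same hardware.

```python
from itertools import combinations, product
from typing import Dict, Iterator, List, Tuple

Edge = Tuple[str, str]

TRANSFORMERS = ["bypass", "add_0", "sub_0", "and_1", "or_0", "mul_1", "shl1_shr1"]


def equivalent_transformations(edges: List[Edge]) -> Iterator[Dict[Edge, str]]:
    """Yield every assignment of one transformer to each edge of a non-empty combination."""
    for i in range(1, len(edges) + 1):
        for chosen in combinations(edges, i):
            for picks in product(TRANSFORMERS, repeat=i):
                yield dict(zip(chosen, picks))


# A mode with m = 2 edges has 2*7 + 1*49 = 63 = 8**2 - 1 candidate transformations.
count = sum(1 for _ in equivalent_transformations([("a", "b"), ("b", "c")]))
```

The exponential growth in m is one reason a heuristic grouping over a precomputed similarity matrix is attractive compared with exhaustive search.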

Optimal transformation set of a computing mode set. Given a computing mode set $(S_1, S_2, \dots, S_k)$ and the equivalent transformation sets $S_1^*, S_2^*, \dots, S_k^*$ of its computing modes, if a set $(S'_1, S'_2, \dots, S'_k)$, with each $S'_i$ taken from $S_i^*$, makes the gain function $f(S'_1, S'_2, \dots, S'_k)$ take its maximum value, this set is called the optimal transformation set of the computing mode set.

Similarity matrix of a computing mode set. Given a computing mode set $(S_1, S_2, \dots, S_k)$ and its optimal transformation set, with the gains ordered as $\delta(S_1) \ge \delta(S_2) \ge \dots \ge \delta(S_{k-1}) \ge \delta(S_k)$, the matrix formed by the similarities between any 2 computing modes of the optimal transformation set is called the similarity matrix of the set, denoted R.

In formula (4), since the similarity of a computing mode S to itself is defined as 0 and $\gamma_{i,j} = \gamma_{j,i}$, $\gamma_{i,j}$ is set to 0 whenever j ≤ i.
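The resulting strictly upper-triangular layout can be sketched as follows. The function name is hypothetical and the pairwise similarity is passed in as a callable, since its exact computation is not restated here; the string-based similarity in the example is only a stand-in.

```python
from typing import Callable, List, Sequence, TypeVar

M = TypeVar("M")


def similarity_matrix(modes: Sequence[M],
                      similarity: Callable[[M, M], int]) -> List[List[int]]:
    """Build R with R[i][j] = similarity for j > i and 0 for j <= i."""
    k = len(modes)
    R = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):   # only the part above the diagonal is filled
            R[i][j] = similarity(modes[i], modes[j])
    return R


# Stand-in example: modes are strings, similarity is the number of shared letters.
R = similarity_matrix(["abc", "bcd", "xyz"], lambda a, b: len(set(a) & set(b)))
```

Storing each pair once keeps a symmetric similarity from being found twice during the row-by-row scan.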

The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.

Claims

1. A method for sharing logic resources for a special purpose instruction set processor, comprising the steps of:

step S2: for each block, scanning the unmasked elements row by row starting from row 1, column 1, and finding the first-occurring maximum element value $\gamma_{i,j}$ as the growth point for further expansion; if the area constraint A is satisfied, including the computing modes $S'_i$ and $S'_j$ indicated by the coordinates (i, j) in one group and taking $S'_i$ and $S'_j$ as the growth point for further expansion of the group; at the same time, masking the element value $\gamma_{i,j}$;

step S3: finding the first-occurring maximum element value $\gamma_{i,a}$ in row i and the first-occurring maximum element value $\gamma_{j,b}$ in row j, where (i, j) are the coordinates of the growth point, and then masking all element values of rows i and j; if $\gamma_{i,a}$ is not 0 and the area constraint A is satisfied, including in the group the computing mode indicated by the coordinates (i, a) that has not yet been included in the group; if $\gamma_{j,b}$ is not 0 and the area constraint A is satisfied, including in the group the computing mode indicated by the coordinates (j, b) that has not yet been included in the group;

step S4: examining the newly found element values $\gamma_{i,a}$ and $\gamma_{j,b}$: if both are 0, proceeding to step S5; otherwise, performing masking or repeating this step; if the element value $\gamma_{i,a}$ is not 0, finding the first-occurring maximum element value $\gamma_{a,c}$ in row a, and then masking all element values of row a; if the element value $\gamma_{j,b}$ is not 0, finding the first-occurring maximum element value $\gamma_{b,d}$ in row b, and then masking all element values of row b; if $\gamma_{a,c}$ is not 0 and the area constraint A is satisfied, including in the group the computing mode indicated by the coordinates (a, c) that has not yet been included in the group; if $\gamma_{b,d}$ is not 0 and the area constraint A is satisfied, including in the group the computing mode indicated by the coordinates (b, d) that has not yet been included in the group, while masking the element values $\gamma_{a,c}$ and $\gamma_{b,d}$; setting $\gamma_{i,a} = \gamma_{a,c}$ and $\gamma_{j,b} = \gamma_{b,d}$, and repeating step S4;

2. The method according to claim 1, wherein in step S4, if the 2 newly found element values $\gamma_{i,a}$ and $\gamma_{j,b}$ are both 0, the growth of the group ends and the method proceeds to step S5.

3. The method according to claim 1, wherein in step S5, if the current block still contains a computing mode that has not been included in any group and that can be included, the index idx is incremented by 1 and steps S2-S4 are repeated.

4. The method according to claim 1, wherein in step S5, if all computing modes in the current block have been included in some group, the method switches to the next block and repeats steps S2-S4.

5. The method according to claim 1, wherein in step S5, if none of the computing modes in the current block that have not been included in any group can be included, these computing modes are merged into the next block and steps S2-S4 are repeated.