CN113076135B - Logic resource sharing method for special instruction set processor - Google Patents

Info

Publication number
CN113076135B
Authority
CN
China
Prior art keywords
line
gamma
packet
element value
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110366542.0A
Other languages
Chinese (zh)
Other versions
CN113076135A (en)
Inventor
陈虎
曹强辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Godson Guangzhou Technology Co ltd
Original Assignee
Godson Guangzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Godson Guangzhou Technology Co ltd
Priority to CN202110366542.0A
Publication of CN113076135A
Application granted
Publication of CN113076135B
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003: Arrangements for executing specific machine instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

A logic resource sharing method for a special instruction set processor, comprising: step S1: partition the optimal transformation set into blocks, i.e., partition the similarity matrix R by columns; step S2: for each block, scan the unmasked elements row by row, starting from row 1, column 1, and find the first occurrence of the maximum element value γ_{i,j} as the growth point for further expansion; step S3: in row i, indicated by the growth-point coordinates (i, j), find the first occurrence of the maximum element value γ_{i,a}, and in row j find the first occurrence of the maximum element value γ_{j,b}; then mask all element values of rows i and j; step S4: examine the newly found element values γ_{i,a} and γ_{j,b}; if both are 0, proceed to step S5; otherwise, perform masking and repeat this step; step S5: archive the calculation mode group generated in step S4. The method is simple in principle, easy to implement, and improves computing efficiency and resource utilization.

Description

Logic resource sharing method for special instruction set processor
Technical Field
The invention mainly relates to the technical field of instruction set processors, in particular to a logic resource sharing method aiming at a special instruction set processor.
Background
Applications typically contain a number of fixed calculation modes (computing patterns), such as the butterfly operations in the fast discrete cosine transform (FDCT) of FIG. 1. Solidifying these calculation modes into dedicated extended instructions can significantly improve computing efficiency. An application-specific instruction-set processor (ASIP) improves processor performance in this way. As shown in FIG. 1, for ease of design, a dedicated extended functional unit (eFU) is typically set aside within the ASIP to implement the extended instruction-set architecture (eISA).
However, the arithmetic logic resources available in the eFU are generally limited by factors such as area. These limited resources require that as many calculation modes as possible be implemented on one eFU in order to maximize the execution efficiency of the application. That is, resources should be shared as much as possible among the calculation modes (or extended instructions) implemented on the same eFU. Therefore, given an area constraint on the eFU and a set of calculation modes, a resource sharing policy among the calculation modes needs to be designed to maximize computing efficiency and resource utilization.
Practitioners have proposed solutions to improve the computing efficiency and resource utilization of the eFU. The most typical approach converts the calculation mode of an instruction into a set of paths and compares paths across multiple path sets to find the set of paths that maximizes the performance gain δ under a given area constraint. However, this conventional approach still has drawbacks:
(1) Path-based resource sharing alone introduces excessive selectors (MUXes), which increases control complexity and affects the operating frequency of the system.
(2) The integrity of an instruction's calculation mode is broken, so the readability and maintainability of the design are poor.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the technical problems in the prior art, the invention provides a logic resource sharing method for a special instruction set processor that is simple in principle, easy to implement, and able to improve computing efficiency and resource utilization.
In order to solve the technical problems, the invention adopts the following technical scheme:
A logic resource sharing method for a special instruction set processor, comprising the steps of:
step S1: partition the optimal transformation set into blocks, i.e., partition the similarity matrix R by columns;
step S2: for each block, scan the unmasked elements row by row, starting from row 1, column 1, and find the first occurrence of the maximum element value γ_{i,j}; the calculation modes indicated by the coordinates (i, j) are taken as the growth points for further expansion;
step S3: in row i, indicated by the growth-point coordinates (i, j), find the first occurrence of the maximum element value γ_{i,a}, and in row j find the first occurrence of the maximum element value γ_{j,b}; then mask all element values of rows i and j;
step S4: examine the newly found element values γ_{i,a} and γ_{j,b}; if both are 0, proceed to step S5; otherwise, continue expanding in the manner of step S3 on row a, row b, or both;
step S5: archive the calculation mode group generated in step S4 as G_idx.
As a further improvement of the method of the invention: in step S2, if the area constraint A is satisfied, the calculation modes S′_i and S′_j indicated by the coordinates (i, j) are included in a group and serve as the growth points for further expansion of the group; at the same time, the element value γ_{i,j} is masked.
As a further improvement of the method of the invention: in step S3, if γ_{i,a} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (i, a) that is not yet in the group is included in the group.
As a further improvement of the method of the invention: in step S3, if γ_{j,b} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (j, b) that is not yet in the group is included in the group.
As a further improvement of the method of the invention: in step S4, if the 2 newly found element values γ_{i,a} and γ_{j,b} are both 0, the growth process of the group ends and the process proceeds to step S5.
As a further improvement of the method of the invention: in step S4, if the element value γ_{i,a} is not 0, find the first occurrence of the maximum element value γ_{a,c} in row a, then mask all element values of row a; if the element value γ_{j,b} is not 0, find the first occurrence of the maximum element value γ_{b,d} in row b, then mask all element values of row b.
As a further improvement of the method of the invention: in step S4, if γ_{a,c} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (a, c) that is not yet in the group is included in the group; if γ_{b,d} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (b, d) that is not yet in the group is included in the group, and the element values γ_{a,c} and γ_{b,d} are masked. Then let γ_{i,a} = γ_{a,c} and γ_{j,b} = γ_{b,d}, and repeat step S4.
As a further improvement of the method of the invention: in step S5, if the current block still contains a calculation mode that has not been placed in any group and that can be included, the index idx is incremented by 1 and steps S2-S4 are repeated.
As a further improvement of the method of the invention: in step S5, if all calculation modes in the current block are already in some group, switch to the next block and repeat steps S2-S4.
As a further improvement of the method of the invention: in step S5, if none of the calculation modes in the current block that are not yet in a group can be included, these calculation modes are merged into the next block and steps S2-S4 are repeated.
Compared with the prior art, the invention has the advantages that:
1. The logic resource sharing method for a special instruction set processor is simple in principle and easy to implement, and improves computing efficiency and resource utilization to the greatest extent; that is, within the optimal transformation set it obtains a grouping G_1, G_2, …, G_x of the calculation modes such that the gain produced by the calculation modes implemented on the eFU reaches a local or global maximum.
2. The logic resource sharing method for the special instruction set processor realizes the sharing of arithmetic logic resources among computing modes on the premise of keeping the integrity of the computing modes by the following two technical means, thereby keeping the readability and maintainability of the computing modes while compressing the resources: (1) Performing equivalent transformation on the computing modes, and participating in a resource sharing process according to the equivalent transformation result of the computing modes, so that the possibility of sharable resources among a plurality of computing modes is increased, and the computing efficiency and the resource utilization rate are improved to the greatest extent; (2) A condition is defined that 2 equivalent computing modes are directly associated (shared resources), i.e., one equivalent computing mode is directly associated (shared resources) only with another equivalent computing mode with which it shares the most.
Drawings
Fig. 1 is a schematic diagram of a construction flow of ASIP.
FIG. 2 is a schematic flow chart of the method of the present invention.
Fig. 3 is a schematic diagram of the invention in a specific application embodiment.
FIG. 4 is a schematic diagram of equivalent transformation and equivalent computing modes in a specific application embodiment of the present invention.
Detailed Description
The invention will be described in further detail with reference to the drawings and the specific examples.
Given the area constraint A of the extended functional unit eFU, a calculation mode set (S_1, S_2, …, S_k), the optimal transformation set of the calculation mode set, and the similarity matrix R of the calculation mode set, and assuming that:
1) the gains satisfy δ(S_1) ≥ δ(S_2) ≥ … ≥ δ(S_{k-1}) ≥ δ(S_k);
2) each arithmetic logic operation contained in a calculation mode of the calculation mode set or of its optimal transformation set consumes 1 unit of area resource;
then the invention aims to find, within the optimal transformation set, a grouping of the calculation modes such that the gain produced by the calculation modes implemented on the given extended functional unit eFU takes a local or global maximum.
As shown in fig. 2 and 3, a logic resource sharing method for a special instruction set processor of the present invention includes the steps of:
Step S1: partition the optimal transformation set into blocks, i.e., partition the similarity matrix R by columns. Suppose there are n blocks in total, denoted Z_1, Z_2, …, Z_n.
Step S2: for each block, scan the unmasked elements row by row, starting from row 1, column 1, and find the first occurrence of the maximum element value γ_{i,j}; the calculation modes indicated by the coordinates (i, j) are taken as the growth points for further expansion.
Step S3: in row i, indicated by the growth-point coordinates (i, j), find the first occurrence of the maximum element value γ_{i,a}, and in row j find the first occurrence of the maximum element value γ_{j,b}; then mask all element values of rows i and j.
Step S4: examine the newly found element values γ_{i,a} and γ_{j,b}; if both are 0, proceed to step S5; if they are not both 0, continue expanding in the manner of step S3 on row a, row b, or both.
Step S5: archive the calculation mode group generated in step S4, denoted G_idx.
In a specific application example, in step S1 above, as shown in FIG. 3, the similarity matrix R contains 10×10 elements; block 1 contains the calculation modes (S′_1, S′_2, …, S′_6), corresponding to columns 1-6 of the similarity matrix R; block 2 contains the calculation modes (S′_7, S′_8, …, S′_10), corresponding to columns 7-10 of the similarity matrix R.
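The column blocking of this example can be sketched as follows. This is only an illustration: the helper name `split_columns`, the list-of-rows representation, and the placeholder matrix contents are assumptions and do not come from the patent or from FIG. 3.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def split_columns(r: Matrix, bounds: List[Tuple[int, int]]) -> List[Matrix]:
    """Cut matrix r into column blocks; each bound is a 0-based [start, stop) range."""
    return [[row[start:stop] for row in r] for start, stop in bounds]


# Placeholder 10x10 matrix: entry (i, j) encodes its 1-based position as 10*i + j.
R = [[10 * i + j for j in range(1, 11)] for i in range(1, 11)]

# Block 1 = columns 1-6, block 2 = columns 7-10 (1-based, as in the example above).
Z1, Z2 = split_columns(R, [(0, 6), (6, 10)])
```

Each block keeps all rows of R and only a contiguous range of its columns, which matches the statement that blocking is done by columns.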
In a specific application example, in step S2 above, if the area constraint A is satisfied, the calculation modes S′_i and S′_j indicated by the coordinates (i, j) are included in a group and serve as the growth points for further expansion of the group; at the same time, the element value γ_{i,j} is masked.
In this specific application example, in step S2 above, as shown in FIG. 3, the first occurrence of the maximum element value is γ_{1,3} = 4, so the first 2 calculation modes included in the group are S′_1 and S′_3.
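The "first occurrence of the maximum" rule of step S2 can be sketched as a row-major scan that only replaces the current best on a strictly larger value. The function name `first_max`, the 0-based coordinates, and the small 4×4 matrix are hypothetical; only the value γ_{1,3} = 4 echoes the example above.

```python
from typing import List, Optional, Set, Tuple


def first_max(r: List[List[int]], masked: Set[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Row-major scan; return 0-based (i, j) of the first largest unmasked entry."""
    best = None
    for i, row in enumerate(r):
        for j, value in enumerate(row):
            if (i, j) in masked:
                continue
            # Strict '>' keeps the earliest position among equal maxima.
            if best is None or value > r[best[0]][best[1]]:
                best = (i, j)
    return best


# Hypothetical upper-triangular matrix with the value 4 appearing twice.
R = [
    [0, 1, 4, 2],
    [0, 0, 0, 4],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]
seed = first_max(R, set())            # (0, 2), i.e. gamma_{1,3} in 1-based terms
next_seed = first_max(R, {seed})      # (1, 3) once the first maximum is masked
```

Masking is modelled here as a set of coordinates that the scan skips, so the matrix itself is never modified.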
In a specific application example, in step S3 above, if γ_{i,a} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (i, a) that is not yet in the group is included in the group; if γ_{j,b} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (j, b) that is not yet in the group is included in the group.
In this specific application example, in step S3 above, as shown in FIG. 3, γ_{i,a} = γ_{1,6} = 2 and γ_{j,b} = γ_{3,6} = 3, so the calculation mode S′_6 is newly included in the group.
In a specific application example, step S4 above specifically includes:
if the 2 newly found element values γ_{i,a} and γ_{j,b} are both 0, the growth process of the group ends and the process proceeds to step S5;
if the element value γ_{i,a} is not 0, find the first occurrence of the maximum element value γ_{a,c} in row a, then mask all element values of row a;
if the element value γ_{j,b} is not 0, find the first occurrence of the maximum element value γ_{b,d} in row b, then mask all element values of row b;
if γ_{a,c} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (a, c) that is not yet in the group is included in the group;
if γ_{b,d} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (b, d) that is not yet in the group is included in the group, and the element values γ_{a,c} and γ_{b,d} are masked. Then let γ_{i,a} = γ_{a,c} and γ_{j,b} = γ_{b,d}, and repeat step S4.
In this specific application example, as shown in FIG. 3, all elements in row 6 of block 1 are 0, so the loop in step S4 ends.
In a specific application example, in step S5 above, as shown in FIG. 3, G_1 = (S′_1, S′_3, S′_6) and G_2 = (S′_4, S′_5). 1) If the current block still contains a calculation mode that has not been placed in any group and that can be included, the index idx is incremented by 1 and steps S2-S4 are repeated; 2) if all calculation modes in the current block are already in some group, switch to the next block and repeat steps S2-S4; 3) if none of the calculation modes in the current block that are not yet in a group can be included, such as S′_2 in FIG. 3, these calculation modes are merged into the next block and steps S2-S4 are repeated.
Further, after all n blocks from step S1 have been processed according to steps S2-S5, the calculation mode groups output by step S5 are the final goal of the invention: within the optimal transformation set, a grouping G_1, G_2, …, G_x of the calculation modes such that the gain produced by the calculation modes implemented on the eFU reaches a local or global maximum. As in FIG. 3, G_1 = (S′_1, S′_3, S′_6), G_2 = (S′_4, S′_5), and G_3 = (S′_2, S′_7, S′_9, S′_8) form one valid grouping.
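Steps S2-S5 for a single block can be sketched as below. This is a simplified reading, not the patented implementation, and it rests on several assumptions that are not in the source: the block is a square, upper-triangular matrix; the area constraint is a caller-supplied predicate `fits`; a calculation mode already archived in a group is no longer eligible (its column is treated as masked); and a mode rejected by `fits` is not expanded further. The matrix below is hypothetical; only γ_{1,3} = 4, γ_{1,6} = 2, γ_{3,6} = 3 and the all-zero row 6 are taken from the example above.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

Matrix = List[List[int]]


def group_block(
    r: Matrix,
    fits: Callable[[List[int]], bool] = lambda members: True,
) -> Tuple[List[List[int]], List[int]]:
    """Greedy grouping of one square block; returns (groups, leftover), 0-based."""
    n = len(r)
    dead_rows: Set[int] = set()                # rows masked in S3/S4
    dead_cells: Set[Tuple[int, int]] = set()   # single elements masked in S2
    closed: Set[int] = set()                   # modes already archived (assumption)
    groups: List[List[int]] = []

    def visible(i: int, j: int) -> bool:
        return i not in dead_rows and j not in closed and (i, j) not in dead_cells

    def first_max(rows: Iterable[int]) -> Optional[Tuple[int, int]]:
        """First largest visible non-zero element in a row-major scan of `rows`."""
        best = None
        for i in rows:
            for j in range(n):
                if visible(i, j) and r[i][j] > 0:
                    if best is None or r[i][j] > r[best[0]][best[1]]:
                        best = (i, j)
        return best

    while True:
        seed = first_max(range(n))             # S2: growth point
        if seed is None:
            break                              # nothing left that can be included
        dead_cells.add(seed)
        group = list(seed)
        if not fits(group):
            continue                           # pair rejected by the area check
        frontier = list(seed)
        while frontier:                        # S3, then the S4 loop
            hits = [first_max([row]) for row in frontier]
            dead_rows.update(frontier)         # mask rows only after both lookups
            frontier = []
            for hit in hits:
                if hit is None:                # row maximum is 0
                    continue
                col = hit[1]
                if col not in group and fits(group + [col]):
                    group.append(col)
                    frontier.append(col)
        groups.append(group)                   # S5: archive as G_idx
        closed.update(group)

    leftover = [m for m in range(n) if m not in closed]
    return groups, leftover


# Hypothetical block 1 (modes S'1..S'6).
R1 = [
    [0, 0, 4, 1, 0, 2],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0, 3],
    [0, 0, 0, 0, 2, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
groups, leftover = group_block(R1)
labels = [[f"S'{m + 1}" for m in g] for g in groups]
```

With this made-up matrix the sketch yields the groups (S′1, S′3, S′6) and (S′4, S′5) and leaves S′2 over, to be merged into the next block as in case 3) of step S5. Passing a stricter `fits`, for example one that limits the group size, shows how an area check would cut growth short.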
In order to make the above description of the present invention more clear and complete, the present invention further provides additional description of some definitions in the course of the above-described method in combination with common general knowledge in the art.
Topological representation of a calculation mode. The data flow graph of a calculation mode is typically represented by a directed acyclic graph (DAG) G(V, E), in which the nodes in set V represent operations and an edge e(u, v) in set E indicates that the data generated by node u is consumed by node v. For example, the butterfly calculation mode ADDBF in FIG. 1 has 4 nodes and 4 edges.
Gain δ of a calculation mode. The gain δ(S) of a calculation mode S is defined as the ratio of the execution time needed to complete the calculation in software to the execution time needed to complete it in hardware (with the calculation mode implemented as a dedicated extended instruction). The execution time of a calculation mode is measured in system clock cycles.
Area of a calculation mode. The area of a calculation mode S is defined as the sum of the area resources consumed by the arithmetic logic operations it contains. In this patent, each arithmetic logic operation is assumed to consume 1 unit of area resource.
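The two definitions above can be written compactly as follows. The symbols T_sw, T_hw, and Area are introduced here only for illustration and do not appear in the source; the second equality assumes that every node of the DAG is an arithmetic logic operation costing 1 unit.

```latex
\delta(S) = \frac{T_{\mathrm{sw}}(S)}{T_{\mathrm{hw}}(S)},
\qquad
\mathrm{Area}(S) = \sum_{v \in V} 1 = |V|
```

As a purely illustrative instance, a calculation mode that needs 12 clock cycles in software and 3 clock cycles as an extended instruction would have δ(S) = 4.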
Equivalent transformation of a calculation mode. If a new node is inserted between the 2 nodes of an edge in the directed acyclic graph G(V, E) of a calculation mode S, and the transformed calculation mode S′ produces the same output as the original calculation mode S for every input combination, this process is called one equivalent transformation. Specifically:
Equivalent transformation rule. In principle, any number of new nodes may be inserted between the 2 nodes of an edge of the directed acyclic graph G(V, E), as long as the output is guaranteed to remain the same. For simplicity, this patent allows at most one new node to be inserted between the 2 nodes of any one edge, i.e., one equivalent transformation per edge.
Equivalent transformer. One transformation inserts a new node between the 2 nodes u and v of an edge e(u, v) in the directed acyclic graph G(V, E). Each kind of transformer is denoted by its own symbol and corresponds to one arithmetic logic operation (e.g., addition, subtraction, multiplication, shift). After the transformation on edge e(u, v), the data generated by node u is consumed by the inserted node, and the data generated by the inserted node is consumed by node v. In principle, any number of transformers can be defined; the transformer set predefined in this patent contains 7 kinds of transformers, whose logical meanings are as follows:
Bypass
Add 0
Subtract 0
Bitwise AND with 1
Bitwise OR with 0
Multiply by 1
Shift left by 1 bit and shift right by 1 bit
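The seven transformers listed above are all identity operations, which is what makes the inserted node an equivalent transformation. A minimal check in Python, under two assumptions that are not stated in the source: operands are unsigned 8-bit values, and "bitwise AND with 1" is read as an AND with an all-ones mask of the operand width. The dictionary keys are descriptive labels, not the patent's symbols.

```python
WIDTH = 8                       # assumed operand width in bits
ALL_ONES = (1 << WIDTH) - 1     # assumed reading of "AND with 1": all-ones mask

TRANSFORMERS = {
    "bypass": lambda x: x,
    "add 0": lambda x: x + 0,
    "subtract 0": lambda x: x - 0,
    "bitwise AND with all ones": lambda x: x & ALL_ONES,
    "bitwise OR with 0": lambda x: x | 0,
    "multiply by 1": lambda x: x * 1,
    # Python integers do not overflow, so the left shift loses no bits here.
    "shift left 1, then right 1": lambda x: (x << 1) >> 1,
}


def is_identity(f, width: int = WIDTH) -> bool:
    """Check f(x) == x for every unsigned value of the given width."""
    return all(f(x) == x for x in range(1 << width))


all_equivalent = all(is_identity(f) for f in TRANSFORMERS.values())
```

In fixed-width hardware the shift pair only preserves the value if the intermediate result keeps the bit shifted out of the top position, so a real design would need one extra bit or a guarantee that the top bit is 0.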
Equivalent computing mode. The calculation mode S' after one or more equivalent transformations of one calculation mode S is an equivalent calculation mode of S. Any 2 equivalent calculation modes S', S "of S are functionally equivalent.
As shown in FIG. 4, a calculation mode S_i, after an equivalent transformation on its edge e_3, generates the equivalent calculation mode S′_i.
Similarity between calculation modes. If an edge e_i(u_i, v_i) of a calculation mode S_i is identical to an edge e_j(u_j, v_j) of another calculation mode S_j, i.e., the arithmetic logic operations of the source nodes u_i and u_j are the same and the arithmetic logic operations of the destination nodes v_i and v_j are also the same, then the calculation modes S_i and S_j have similarity, which is measured by the symbol γ_{i,j}. As in FIG. 2, the similarity between S_i and S_j is γ_{i,j} = 0, while the similarity between the equivalent calculation mode S′_i of S_i and S_j is γ′_{i,j} = 2. In addition, the similarity of a calculation mode S to itself is defined as 0.
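One way to compute such a similarity is sketched below. The source states when two edges match but does not spell out how matches are counted, so the counting rule here (the size of the multiset intersection of edge signatures) is an assumption, as are the function names and the toy calculation modes, which are not the ones drawn in the figures.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]                      # (source node id, destination node id)
Mode = Tuple[Dict[str, str], List[Edge]]    # (node id -> operation, edge list)


def edge_signature(ops: Dict[str, str], edges: List[Edge]) -> Counter:
    """Multiset of (source operation, destination operation) pairs of a DAG."""
    return Counter((ops[u], ops[v]) for u, v in edges)


def similarity(mode_a: Mode, mode_b: Mode) -> int:
    """Count edge pairs whose source and destination operations both match."""
    sig_a = edge_signature(*mode_a)
    sig_b = edge_signature(*mode_b)
    return sum((sig_a & sig_b).values())    # '&' keeps the minimum count per key


# Toy modes: S_i has one edge mul -> sub; S_j has mul -> add -> sub.
S_i: Mode = ({"a": "mul", "b": "sub"}, [("a", "b")])
S_j: Mode = ({"p": "mul", "q": "add", "r": "sub"}, [("p", "q"), ("q", "r")])

# Equivalent transformation of S_i: insert an "add 0" node t on edge a -> b.
S_i_prime: Mode = ({"a": "mul", "t": "add", "b": "sub"}, [("a", "t"), ("t", "b")])

before = similarity(S_i, S_j)
after = similarity(S_i_prime, S_j)
```

In this toy case the inserted node raises the similarity from 0 to 2, which illustrates why equivalent transformations increase the chance of sharing resources between calculation modes.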
Equivalent transformation set of a calculation mode S. Given a calculation mode S and the edge set E = (e_1, e_2, …, e_m) of its directed acyclic graph G(V, E), the set of equivalent calculation modes of S obtained by applying an arbitrary equivalent transformation to each edge of any combination (e_1, e_2, …, e_i) (i = 1, 2, …, m) of i edges selected from E is denoted S*.
Optimal transformation set of a calculation mode set. Given a calculation mode set (S_1, S_2, …, S_k) and the equivalent transformation sets of its calculation modes, if a set (S′_1, S′_2, …, S′_k), with each S′_i taken from the equivalent transformation set of S_i, makes the gain function f(S′_1, S′_2, …, S′_k) take its maximum value, that set is called the optimal transformation set of the calculation mode set.
Similarity matrix of a calculation mode set. Given a calculation mode set (S_1, S_2, …, S_k) and its optimal transformation set, and assuming the gains satisfy δ(S_1) ≥ δ(S_2) ≥ … ≥ δ(S_{k-1}) ≥ δ(S_k), the matrix formed by the similarities between any 2 calculation modes of the optimal transformation set is called the similarity matrix of the set, denoted R.
In formula (4), since the similarity of a calculation mode S to itself is defined as 0 and γ_{i,j} = γ_{j,i}, γ_{i,j} is set to 0 whenever j ≤ i.
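The resulting upper-triangular layout can be sketched as follows. The function name `similarity_matrix` is hypothetical, and the set-intersection similarity used in the example is only a stand-in for the edge-based similarity defined above.

```python
from typing import Callable, List, Sequence, TypeVar

M = TypeVar("M")


def similarity_matrix(modes: Sequence[M], sim: Callable[[M, M], int]) -> List[List[int]]:
    """Upper-triangular R: R[i][j] = sim(modes[i], modes[j]) for j > i, else 0."""
    k = len(modes)
    return [[sim(modes[i], modes[j]) if j > i else 0 for j in range(k)] for i in range(k)]


# Stand-in modes and similarity: number of shared labels between two sets.
modes = [{"x", "y"}, {"y", "z"}, {"x", "y", "z"}]
R = similarity_matrix(modes, lambda a, b: len(a & b))
```

Because the similarity is symmetric, storing only the entries with j > i loses no information, and the zero diagonal encodes that a calculation mode has similarity 0 with itself.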
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.

Claims (5)

1. A method for sharing logic resources for a special purpose instruction set processor, comprising the steps of:
step S1: partition the optimal transformation set into blocks, i.e., partition the similarity matrix R by columns;
step S2: for each block, scan the unmasked elements row by row, starting from row 1, column 1, and find the first occurrence of the maximum element value γ_{i,j} as the growth point for further expansion; if the area constraint A is satisfied, the calculation modes S′_i and S′_j indicated by the coordinates (i, j) are included in a group and serve as the growth points for further expansion of the group; at the same time, the element value γ_{i,j} is masked;
step S3: in row i, indicated by the growth-point coordinates (i, j), find the first occurrence of the maximum element value γ_{i,a}, and in row j find the first occurrence of the maximum element value γ_{j,b}; then mask all element values of rows i and j; if γ_{i,a} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (i, a) that is not yet in the group is included in the group; if γ_{j,b} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (j, b) that is not yet in the group is included in the group;
step S4: examine the newly found element values γ_{i,a} and γ_{j,b}; if both are 0, proceed to step S5; otherwise, perform masking and repeat this step; if the element value γ_{i,a} is not 0, find the first occurrence of the maximum element value γ_{a,c} in row a, then mask all element values of row a; if the element value γ_{j,b} is not 0, find the first occurrence of the maximum element value γ_{b,d} in row b, then mask all element values of row b; if γ_{a,c} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (a, c) that is not yet in the group is included in the group; if γ_{b,d} is not 0 and the area constraint A is satisfied, the calculation mode indicated by the coordinates (b, d) that is not yet in the group is included in the group, and the element values γ_{a,c} and γ_{b,d} are masked; then let γ_{i,a} = γ_{a,c} and γ_{j,b} = γ_{b,d}, and repeat step S4;
step S5: archive the calculation mode group generated in step S4 as G_idx.
2. The method according to claim 1, wherein in step S4, if the 2 newly found element values γ_{i,a} and γ_{j,b} are both 0, the growth process of the group ends and the process proceeds to step S5.
3. The method according to claim 1, wherein in step S5, if the current block still contains a calculation mode that has not been placed in any group and that can be included, the index idx is incremented by 1 and steps S2-S4 are repeated.
4. The method according to claim 1, wherein in step S5, if all calculation modes in the current block are already in some group, the method switches to the next block and repeats steps S2-S4.
5. The method according to claim 1, wherein in step S5, if none of the calculation modes in the current block that are not yet in a group can be included, these calculation modes are merged into the next block and steps S2-S4 are repeated.
CN202110366542.0A 2021-04-06 2021-04-06 Logic resource sharing method for special instruction set processor Active CN113076135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110366542.0A CN113076135B (en) 2021-04-06 2021-04-06 Logic resource sharing method for special instruction set processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110366542.0A CN113076135B (en) 2021-04-06 2021-04-06 Logic resource sharing method for special instruction set processor

Publications (2)

Publication Number Publication Date
CN113076135A (en) 2021-07-06
CN113076135B (en) 2023-12-26

Family

ID=76615012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110366542.0A Active CN113076135B (en) 2021-04-06 2021-04-06 Logic resource sharing method for special instruction set processor

Country Status (1)

Country Link
CN (1) CN113076135B (en)

Also Published As

Publication number Publication date
CN113076135A (en) 2021-07-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant