CN114492782A - On-chip core compiling and mapping method and device of neural network based on reinforcement learning - Google Patents

On-chip core compiling and mapping method and device of neural network based on reinforcement learning

Info

Publication number
CN114492782A
CN114492782A
Authority
CN
China
Prior art keywords
core
mapping
chip
neural network
neurons
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210407390.9A
Other languages
Chinese (zh)
Other versions
CN114492782B (en)
Inventor
何煜坤
李莹
吕攀
章明
孙世春
邓水光
潘纲
马德
齐勃
金孝飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Lab
Original Assignee
Zhejiang University ZJU
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Lab filed Critical Zhejiang University ZJU
Priority to CN202210407390.9A priority Critical patent/CN114492782B/en
Publication of CN114492782A publication Critical patent/CN114492782A/en
Application granted granted Critical
Publication of CN114492782B publication Critical patent/CN114492782B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839 Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Neurology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and device for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores. The method constructs a mapping revenue from the communication cost, the mapping area, and the utilization inside each core; trains a policy network with the Actor-Critic algorithm to obtain higher mapping revenue, so that the reinforcement learning policy network learns the optimal mapping position for any spiking neural network neuron; and finally uses the trained policy network to deploy the neural network onto the on-chip core array. The communication-distance cost between interconnected neurons is thereby reduced, the computational efficiency of the chip is effectively improved, and the overall power consumption is lowered.

Description

On-chip core compiling and mapping method and device of neural network based on reinforcement learning
Technical Field
The invention relates to the field of high-performance computing, and in particular to a method and device for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores.
Background
Neuroscience research on the human brain has found that billions of interconnected neurons produce highly parallel computing power; spiking neural networks, proposed on this basis, transmit information through spike coding to solve complex tasks. Neuromorphic chips based on spiking neural networks emulate the connectivity of biological neurons through a high-density network-on-chip organization and realize large-scale parallel computation. The inter-core communication of the network-on-chip is highly sensitive to the generation, sending, and receiving of spikes, which places stringent performance constraints on the network deployment result.
Unlike traditional compilation, compiling and mapping a spike-based neural network onto a network-on-chip for inference is generally modeled as a constrained multi-objective programming problem whose goal is to optimize both the total power consumption (communication cost) of spike transmission among all nodes in the chip and the computation time of the application. Owing to the graph structure of a spiking neural network, mainstream mapping algorithms proceed in two stages: 1) partition the original spiking neural network and reconstruct its topology; 2) assign and bind each network partition to a physical node. The core difficulty of compilation is that solving for the optimal mapping scheme under a given optimization objective is an NP-Hard problem: how can an approximate solution be found in a short time under the hardware constraints of the chip?
Compilation environments and frameworks in industry today are diverse, and most are designed around modeling and optimization algorithms specific to in-house chips. LCompiler, the compiler for Intel's Loihi chip, splits the network graph, maps the parts onto individual cores, and generates binary files one by one; its assignment algorithm aims to optimize Loihi's input mapping ratio, iteratively trying neuron assignments until the network fits into a given number of cores. IBM's TrueNorth project chip reached the scale of millions of neurons; its compiling and mapping adopt optimization algorithms similar to those for VLSI routing problems to reduce the total communication cost of the deployed application, and through compiler optimization the maximum number of spikes per tick on a port was reduced from 10000 to 2500.
The above on-chip mapping algorithms for spiking neural networks are based on the following steps:
1. The spiking neural network is abstracted as a computational graph, with neurons or groups of neurons as nodes and synaptic connections as edges.
2. The computational graph is re-partitioned and aggregated into a physical-layer computational graph whose parts can be mapped one-to-one onto the network-on-chip.
3. A loss function measuring the quality of the mapping result is constructed, and an algorithm based on heuristic search (or another algorithm for solving multi-objective programming problems) iteratively adjusts the mapping, attempting to reduce the loss function until convergence and finally obtaining the mapping result.
The scale and spatial characteristics of the mapping problem therefore place high demands on an algorithm's search capability. In recent years, reinforcement learning algorithms have performed well on nonlinear programming problems. Traditional reinforcement learning controls an agent's policy by progressively accumulating data through interaction with the environment and determining the optimal action to take in the current state so as to obtain the highest return on a given task. More recently, deep reinforcement learning has been applied to more complex and diverse tasks, performing well in machine vision, natural language processing, complex systems, and similar problems. In particular, the policy-based Actor-Critic deep reinforcement learning algorithm (Actor denoting a policy-gradient algorithm, Critic denoting a value-based reinforcement learning algorithm such as Q-learning) uses two independent sets of parameters to update the action-value function parameters and the policy parameters respectively.
The mapping result of a neural network on the chip directly determines the spike transmission efficiency among the chip's cores. An overly simple mapping scheme cannot quickly search the enormous solution space for the optimal solution and easily falls into a locally suboptimal solution caused by an unreasonably designed mapping algorithm and cost function, slowing the transmission of inter-core spike packets and incurring additional communication and power-consumption cost.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention reduces the on-chip spike packet transmission distance, the chip mapping area, the number of spikes per port, the computation time, and the chip power consumption, and improves transmission efficiency, through the following technical solution:
A reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores comprises the following steps:
Step S1: acquire the topology and parameter information of a spiking neural network, where the spiking neural network comprises neuron groups and synapses, a neuron group consists of neurons, and neurons are connected through synapses; the topology is a graph structure whose nodes represent neurons and/or neuron groups and whose edges represent synapses, i.e., the connections between neurons and/or neuron groups; the parameter information includes the characteristic data of the neurons and synaptic connections;
Step S2: for the chip on which the spiking neural network is to be deployed, obtain the description information of the chip's network-on-chip computational core matrix, where a computational core is the unit to which neurons are allocated for computation; the description information includes the specification of each computational core, the spatial position of each computational core on the network-on-chip, and the communication connection relations among different computational cores;
Step S3: establish an initial mapping state from the spatial positions of the allocated computational cores on the network-on-chip, the connections of the allocated neurons, the current neuron to be allocated, and the connections between the current neuron to be allocated and other neurons;
Step S4: input the mapping state into the trained reinforcement learning policy neural network, and obtain from the network's output the probability distribution of the mapping action that places the current neuron to be allocated at different computational core spatial positions;
Step S5: select the computational core spatial position with the highest probability in the distribution as the placement position of the current neuron, and fill the neuron into the digital storage space of the corresponding computational core;
Step S6: repeat the above steps until all neurons have been placed, obtaining the complete mapping from the spiking neural network onto the chip's network-on-chip computational core matrix.
Further, in step S4, the training of the reinforcement learning policy neural network comprises the following steps:
Step S4.1: initialize, including determining the number of training samples E, the learning rate r, the return discount coefficient γ, the exploration rate ɛ, and the two neural network structures: the policy network (Actor) and the evaluation network (Critic);
Step S4.2: execute steps S4.3 to S4.7 until the number of training samples E is reached;
Step S4.3: randomly construct the neurons and synapse connections of a spiking neural network, and randomly designate a matrix area containing at least N computational cores;
Step S4.4: obtain the mapping state S from the feature vector formed by the spatial positions of the currently allocated computational cores, the connections of the allocated neurons, the current neuron to be allocated, and the connections between that neuron and other neurons;
Step S4.5: input the mapping state S into the policy neural network to obtain an output mapping action A, place the corresponding neuron at the computational core spatial position designated by the mapping action A, and obtain the post-action mapping state S' and the corresponding mapping revenue R;
Step S4.6: input the mapping state S and the post-action mapping state S' into the evaluation neural network respectively to obtain the temporal-difference (TD) error; according to the overall mapping revenue R in the final state (after mapping is finished), propagate back through each state by error back-propagation (stochastic gradient descent), updating the weights of the policy neural network and of the evaluation neural network respectively, so that as the networks converge the mapping action A moves toward better mapping states;
Step S4.7: replace the mapping state S with the post-action mapping state S', and continue steps S4.4 to S4.6 until all neuron allocations are complete.
Further, in step S4.5, the mapping action A allocates a neuron to an unoccupied computational core spatial position; if not all neurons have been allocated, the mapping revenue of the mapping action is 0; if all allocations are complete, the mapping revenue R of the mapping action is constructed from the internal cost of the computational cores, the communication cost of the computational cores, the geometric area of the region occupied by the computational cores, and the revenue for whether the constraint conditions are satisfied;
the internal cost of a computational core includes the unused digital storage space within the core and the degree of balance among the core's different digital storage areas;
the computational core communication cost includes the number of spike packets sent and received by each computational core per unit time and the length of each spike packet's transmission path on the chip;
for the geometric area of the region occupied by the computational cores, a minimum rectangular closure containing all computational cores used by the mapping is constructed on the computational core grid, and the number of cores (both used and unused) contained in this rectangular closure is the geometric area of the occupied region;
the revenue for the constraint conditions covers requirements, imposed by hardware constraints and/or user-customized preference constraints, that the input layer and output layer of the spiking neural network be placed at designated grid positions.
Further, the mapping revenue R when all allocations are complete is:
R_total(S) = -(C_core(S) + C_comm(S) + C_area(S)) + R_r(S)
where S denotes the mapping state, R_total(S) denotes the mapping revenue when all allocations are complete, C_core(S) denotes the internal cost of all computational cores, C_comm(S) denotes the communication cost of all computational cores, C_area(S) denotes the geometric area of the region occupied by the computational cores, and R_r(S) denotes the revenue for whether the constraint conditions are satisfied; when a neuron does not satisfy a constraint, a corresponding cost r is incurred, and R_r(S) is the sum of all constraint costs.
Further, taking the number of used cores as the internal cost of the computational cores, the Manhattan distance between cores weighted by the communication density of the corresponding neurons as the communication cost, the total area of the rectangular closure of the region formed by the computational cores as the geometric area, and the number of neurons that do not satisfy the constraints as the cost of the given constraints, the mapping revenue function is constructed as:
R_total(S) = -( N + Σ_{i=1}^{N} Σ_{j=1}^{N} D(i, j) · Σ_{k_i ∈ n_i} Σ_{k_j ∈ n_j} w(k_i, k_j) + Area(n_1, ..., n_N) ) - r
where N denotes the total number of computational cores used by the mapping, n_i and n_j denote the sets of neurons deployed on the i-th and j-th computational cores respectively, D(i, j) denotes the spike packet transfer distance between computational cores i and j, w(k_i, k_j) denotes the communication density between neuron k_i of the i-th computational core and neuron k_j of the j-th computational core, Area denotes the area function computing the rectangular closure area of the region formed by the computational cores, and r denotes the number of neurons that do not meet the constraint placement requirements.
Furthermore, a spike packet includes the target computational core spatial position and the spike data information it carries.
Further, the transmission path of a spike packet on the chip refers to the route the packet takes through the computational core channels of the network-on-chip, as determined by a routing algorithm; the chip's on-chip routing algorithm adopts the GXY algorithm, under which the communication cost between any two computational cores with coordinates (x_i, y_i) and (x_j, y_j) is |x_i - x_j| + |y_i - y_j|, and the total communication cost is the sum of the communication costs of all computational cores.
Further, the spatial position of a computational core on the network-on-chip is the three-dimensional vector (x, y, a) formed by concatenating the computational core coordinates (x, y) with the core's address a.
Further, the computational core in step S2 is a hardware unit with digital storage space and a computing function, and the description information further includes the specification of the computational core, where the specification comprises the core's digital storage space and the number of neurons it supports.
A reinforcement-learning-based device for compiling and mapping a neural network onto on-chip cores comprises a memory and one or more processors, the memory storing executable code; when the one or more processors execute the executable code, they implement the reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores.
The advantages and beneficial effects of the invention are as follows:
The invention discloses a method and device for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores. A mapping revenue is constructed from the communication cost, the mapping area, and the utilization inside each core, and a policy network is trained with the Actor-Critic algorithm to obtain higher mapping revenue. The transmission distance of spike packets between on-chip cores is thereby reduced, transmission efficiency is improved, and power consumption is lowered.
Drawings
FIG. 1 is a flow chart of a method in an embodiment of the invention.
Fig. 2 is a schematic diagram of a network-on-chip mapping in an embodiment of the invention.
Fig. 3 is a schematic diagram of inter-core communication cost in an embodiment of the present invention.
Fig. 4 is a schematic diagram of the geometric area penalty of the region occupied by the core in an embodiment of the present invention.
FIG. 5 is a block diagram of an apparatus in an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
The invention improves the solving efficiency of existing mapping algorithms, addressing the problem that the mapping solution space is too large and lacks an efficient allocation strategy. Existing network mapping algorithms usually optimize a loss function with a specific programming algorithm; the solving time is long, local optima are easily reached, and the full space cannot be searched effectively.
The method and device for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores according to the invention first acquire the topology and parameter information of a spiking neural network and the network-on-chip computational core matrix information of the target hardware; then extract and concatenate feature vectors from this information to construct a mapping state, input it into the trained deep policy neural network, and obtain the optimal network-on-chip position to which the current neuron to be allocated should be mapped; then fill the neuron into the corresponding core and update the mapping state; and iterate until the entire mapping from the neural network onto the network-on-chip is complete. As shown in Fig. 1, the method specifically includes the following steps:
Step S1: acquire the topology and parameter information of a spiking neural network, where the spiking neural network comprises neuron groups and synapses, a neuron group consists of neurons, and neurons are connected through synapses; the topology is a graph structure whose nodes represent neurons and/or neuron groups and whose edges represent synapses, i.e., the connections between neurons and/or neuron groups; the parameter information includes the characteristic data of the neurons and synaptic connections.
The topology is illustrated by the spiking neural network in Fig. 2: circular nodes represent neurons, and the arrows between neurons represent synapses. For example, a typical classification task uses a 3-layer spiking neural network with 784 neurons in the first layer (the input layer), 512 neurons in the second layer, and 10 neurons in the third layer (the output layer); every pair of neurons in adjacent layers is connected by a synapse.
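As an illustration of step S1 only, the following minimal sketch builds the computational-graph abstraction of this 784-512-10 example, with neurons as nodes and synapses as edges; the class and function names (`SpikingNetwork`, `add_layer`, `fully_connect`) are hypothetical, chosen here for exposition and not taken from the patent.

```python
from collections import defaultdict

class SpikingNetwork:
    """Computational-graph abstraction: neurons are node ids, synapses are directed edges."""
    def __init__(self):
        self.neurons = []                     # node ids
        self.synapses = defaultdict(set)      # pre-neuron id -> set of post-neuron ids

    def add_layer(self, size):
        start = len(self.neurons)
        layer = list(range(start, start + size))
        self.neurons.extend(layer)
        return layer

    def fully_connect(self, pre_layer, post_layer):
        # One synapse between every pair of neurons in adjacent layers.
        for pre in pre_layer:
            self.synapses[pre].update(post_layer)

# The 3-layer classification example from the description: 784 -> 512 -> 10.
net = SpikingNetwork()
l1 = net.add_layer(784)    # input layer
l2 = net.add_layer(512)    # hidden layer
l3 = net.add_layer(10)     # output layer
net.fully_connect(l1, l2)
net.fully_connect(l2, l3)

print(len(net.neurons))                            # 1306 neurons
print(sum(len(v) for v in net.synapses.values()))  # 784*512 + 512*10 = 406528 synapses
```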
Step S2: acquire the description information of the network-on-chip computational core matrix of the target hardware chip on which the spiking neural network is to be deployed. A computational core is a hardware unit with digital storage space and a computing function, to which neurons are allocated for computation. The description information includes the specification of each computational core (e.g., the size of its on-core memory space and the maximum number of neurons it supports), the spatial position of each computational core on the network-on-chip, and the communication connection relations among different computational cores.
A typical deployment target is: a square 20 × 20 core grid in which each core communicates with its four neighbors (up, down, left, right). Each core can accommodate at most 128 neurons and can connect to at most 16 cores.
The spatial position of a computational core on the network-on-chip is the three-dimensional vector (x, y, a) formed by concatenating the computational core coordinates (x, y) with the core's address a.
Step S3: establish an initial mapping state from the spatial positions of the allocated computational cores on the network-on-chip, the connections of the allocated neurons, the current neuron to be allocated, and the connections between the current neuron to be allocated and other neurons.
Step S4: input the mapping state into the trained reinforcement learning policy neural network, and obtain from the network's output the probability distribution of the mapping action that places the current neuron to be allocated at different computational core spatial positions.
The structure of the reinforcement-learning-based on-chip core compiling and mapping method according to this embodiment is shown in Table 1:
Table 1: structure of the compiling and mapping method
(Table 1 appears only as an image in the original publication.)
π(α | S, θ) denotes the probability distribution over mapping actions obtained from the policy network in mapping state S, where α denotes the mapping action and θ denotes the policy network parameters.
How the above deep policy neural network is obtained is described below.
As shown in Fig. 2, the network is trained with the reinforcement learning Actor-Critic algorithm. Its input is the multidimensional mapping-state vector, obtained by flattening and concatenating the 0-1 matrix of spatial positions of the current core grid, the 0-1 matrix of allocated-neuron connections, and the connection-matrix vector between the current neuron to be allocated and the other neurons. The network is a deep neural network of alternating convolutional, pooling, and fully connected layers that extracts features from the input mapping state. Its final output is a probability distribution over the position coordinates (x, y, a) of the currently selected neuron in the whole computational core space under the mapping action. The probability value of each element in the distribution indicates the score of mapping the neuron to that spatial position: the higher the score, the greater the probability of being selected. Finally, the position with the highest score is selected as the target position of the neuron mapping.
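A minimal PyTorch-style sketch of such a policy network is given below. The layer sizes, the three-channel encoding of the input 0-1 matrices, and the name `MappingPolicyNet` are illustrative assumptions; the patent only specifies alternating convolutional, pooling, and fully connected layers producing a probability distribution over computational core positions.

```python
import torch
import torch.nn as nn

class MappingPolicyNet(nn.Module):
    """Actor: maps a mapping-state tensor to a distribution over core positions (x, y, a)."""
    def __init__(self, grid_h=20, grid_w=20, addr_per_core=1, state_channels=3):
        super().__init__()
        # Alternating convolutional and pooling layers extract mapping-state features.
        self.features = nn.Sequential(
            nn.Conv2d(state_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        flat = 64 * (grid_h // 4) * (grid_w // 4)
        # Fully connected layers score every (x, y, a) position.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 256), nn.ReLU(),
            nn.Linear(256, grid_h * grid_w * addr_per_core),
        )

    def forward(self, state):
        # state: (batch, channels, grid_h, grid_w), built from the flattened 0-1 matrices.
        logits = self.head(self.features(state))
        return torch.softmax(logits, dim=-1)   # probability of mapping to each position

policy = MappingPolicyNet()
probs = policy(torch.zeros(1, 3, 20, 20))      # 20 x 20 core grid from the example above
target = probs.argmax(dim=-1)                  # the highest-scoring position is selected
```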
The policy neural network is trained as a deep network through the following steps:
Step S4.1: initialize, including determining the number of training samples E, the learning rate r, the return discount coefficient γ, the exploration rate ɛ, and the two neural network structures: the policy network (Actor) and the evaluation network (Critic).
Step S4.2: execute steps S4.3 to S4.7 until the number of training samples E is reached.
Step S4.3: randomly construct the neurons and synapse connections of a spiking neural network, and randomly designate a matrix area containing at least N computational cores.
Step S4.4: obtain the mapping state S from the feature vector formed by the spatial positions of the currently allocated computational cores, the connections of the allocated neurons, the current neuron to be allocated, and the connections between that neuron and other neurons.
Step S4.5: input the mapping state S into the policy neural network to obtain an output mapping action A, place the corresponding neuron at the computational core spatial position designated by the mapping action A, and obtain the post-action mapping state S' and the mapping revenue R.
The allocation process is converted into a reinforcement learning problem by defining the mapping state, the mapping action, and the mapping revenue respectively:
the mapping action A allocates a neuron to an unoccupied computational core spatial position; if not all neurons have been allocated, the mapping revenue of the mapping action is 0; if all allocations are complete, the mapping revenue R of the mapping action is constructed from the internal cost of the computational cores, the communication cost of the computational cores, the geometric area of the region occupied by the computational cores, and the revenue for whether the constraint conditions are satisfied;
the internal cost of a computational core includes the unused digital storage space within the core and the degree of balance among the core's different digital storage areas;
the computational core communication cost includes the number of spike packets sent and received by each computational core per unit time and the length of each spike packet's transmission path on the chip;
for the geometric area of the region occupied by the computational cores, a minimum rectangular closure containing all computational cores used by the mapping is constructed on the computational core grid, and the number of cores (both used and unused) contained in this rectangular closure is the geometric area of the occupied region;
the revenue for the constraint conditions covers hardware constraints requiring that the input and output layers (pins) of the spiking neural network be placed at specific grid positions (e.g., in the leftmost and uppermost cores), or user-defined preference constraints.
The mapping revenue R when all allocations are complete is:
R_total(S) = -(C_core(S) + C_comm(S) + C_area(S)) + R_r(S)
where S denotes the mapping state, R_total(S) denotes the mapping revenue when all allocations are complete, C_core(S) denotes the internal cost of all computational cores, C_comm(S) denotes the communication cost of all computational cores, C_area(S) denotes the geometric area of the region occupied by the computational cores, and R_r(S) denotes the revenue for whether the given constraints are satisfied; when a neuron does not satisfy a constraint, a corresponding cost r is incurred, and R_r(S) is the sum of all constraint costs.
Taking the number of used cores as the internal cost of the computational cores, the Manhattan distance between cores weighted by the communication density of the corresponding neurons as the communication cost, the total area of the rectangular closure of the region formed by the computational cores as the geometric area, and the number of neurons that do not satisfy the constraints as the cost of the given constraints, the mapping revenue function is constructed as:
R_total(S) = -( N + Σ_{i=1}^{N} Σ_{j=1}^{N} D(i, j) · Σ_{k_i ∈ n_i} Σ_{k_j ∈ n_j} w(k_i, k_j) + Area(n_1, ..., n_N) ) - r
where N denotes the total number of computational cores used by the mapping, n_i and n_j denote the sets of neurons deployed on the i-th and j-th computational cores respectively, D(i, j) denotes the spike packet transfer distance between computational cores i and j, w(k_i, k_j) denotes the communication density between neuron k_i of the i-th computational core and neuron k_j of the j-th computational core, Area denotes the area function computing the rectangular closure area of the region formed by the computational cores, and r denotes the number of neurons that do not meet the constraint placement requirements.
A spike packet includes the target computational core spatial position (i.e., (x, y, a), serving as a location tag) and the spike data information it carries.
The transmission path of a spike packet on the chip refers to the route the packet takes through the computational core channels of the network-on-chip, as determined by a routing algorithm; routing algorithms include the X-Y routing algorithm and the E-Cube routing algorithm.
In the embodiment of the invention, the chip's on-chip routing algorithm adopts the GXY algorithm: the communication cost between any two computational cores with coordinates (x_i, y_i) and (x_j, y_j) is |x_i - x_j| + |y_i - y_j|, and the total communication cost is the sum of the communication costs of all computational cores.
Specifically, the internal cost of a computational core includes the memory space not effectively utilized within the core and the balance among the core's different memory areas. In a typical case, the internal cost C_core(S) of a computational core is 128 minus the number of neurons actually used by that core; that is, the more fully the neurons are used, the smaller the core's internal cost. For simplicity, a cost of 1 unit is incurred for each computational core used.
The computational core communication cost includes the number of spike packets sent and received by each computational core per unit time and the length of each spike packet's transmission path on the chip. With the GXY on-chip routing algorithm, the communication cost C_comm(S) between any two computational cores (x_i, y_i) and (x_j, y_j) can be defined as |x_i - x_j| + |y_i - y_j|; as shown in Fig. 3, the communication cost between computational cores 1 and 14 is D(1, 14) = 1 + 4 = 5, and the total communication cost is the sum of the communication costs between all computational cores. The geometric area of the region occupied by the computational cores is defined on the core grid by constructing a minimum rectangular closure containing all computational cores used by the mapping; the number of cores (both used and unused) contained in that rectangle is the geometric area C_area(S). As shown in Fig. 4, in that example C_area(S) = 2 × 3 = 6. For the constraint cost, the input-layer neurons of the neural network may, for example, be required to be placed in the cores on the left and upper sides of the grid; when a neuron does not satisfy a constraint, a corresponding cost r is incurred, and R_r(S) is the sum of all constraint costs.
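The cost terms above can be computed directly from the core coordinates. The sketch below is illustrative only: the function names are hypothetical, and the coordinates are chosen merely so that the results reproduce the worked values from Fig. 3 (D(1, 14) = 5) and Fig. 4 (area 2 × 3 = 6).

```python
def core_cost(used):
    # Simplified internal cost: one unit per computational core used.
    return len(used)

def comm_cost(used, density):
    # GXY routing: Manhattan distance between cores, weighted by communication density.
    # density[(i, j)] is the neuron-to-neuron communication density between cores i and j.
    total = 0
    for (i, j), w in density.items():
        (xi, yi), (xj, yj) = used[i], used[j]
        total += (abs(xi - xj) + abs(yi - yj)) * w
    return total

def area_cost(used):
    # Minimum rectangular closure over all used cores; counts used AND unused cores inside it.
    xs = [x for x, _ in used.values()]
    ys = [y for _, y in used.values()]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)

# Cores 1 and 14 placed one column and four rows apart: D(1, 14) = 1 + 4 = 5, as in Fig. 3.
print(comm_cost({1: (0, 0), 14: (1, 4)}, {(1, 14): 1}))   # 5

# Used cores spanning a 2 x 3 rectangular closure: area 6, as in Fig. 4.
print(area_cost({1: (0, 0), 2: (0, 2), 3: (1, 1)}))       # 2 * 3 = 6
```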
Step S4.6: input the mapping state S and the post-action mapping state S' into the evaluation network respectively to obtain the temporal-difference (TD) error, and, according to the overall mapping revenue R, update the weights of the policy neural network and of the evaluation neural network respectively by error back-propagation.
Specifically, from the mapping state S, all reachable post-action mapping states S' and the corresponding mapping revenues R are obtained; according to the overall mapping revenue R in the final state (after mapping is finished), each state is propagated back through by error back-propagation (stochastic gradient descent), the weights of the policy neural network and of the evaluation neural network are updated respectively, and as the networks converge the mapping action A moves toward better mapping states.
Step S4.7: replace the mapping state S with the post-action mapping state S', and continue steps S4.4 to S4.6 until all neuron allocations are complete.
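Steps S4.4 to S4.7 amount to a one-step Actor-Critic update. The sketch below shows the TD-error computation and the two separate weight updates; the environment object `env`, its `reset`/`step` interface, and the discount value are assumptions for exposition, since the patent does not fix these details.

```python
import torch

def actor_critic_episode(env, actor, critic, actor_opt, critic_opt, gamma=0.99):
    """One training episode: place neurons one by one, updating Actor and Critic each step."""
    S = env.reset()                               # initial mapping state (steps S4.3-S4.4)
    done = False
    while not done:
        probs = actor(S)                          # π(α | S, θ), step S4.5
        dist = torch.distributions.Categorical(probs)
        A = dist.sample()                         # sampled mapping action
        S_next, R, done = env.step(A.item())      # place the neuron, observe revenue R

        # Step S4.6: temporal-difference error from the evaluation (Critic) network.
        v = critic(S)
        v_next = torch.zeros_like(v) if done else critic(S_next).detach()
        td_error = R + gamma * v_next - v

        critic_opt.zero_grad()
        (td_error ** 2).mean().backward()         # update evaluation-network weights
        critic_opt.step()

        actor_opt.zero_grad()
        (-dist.log_prob(A) * td_error.detach()).mean().backward()  # update policy weights
        actor_opt.step()

        S = S_next                                # step S4.7: advance the state
```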
Step S5: select the computational core spatial position with the highest probability in the distribution as the placement position of the current neuron, and fill the neuron into the memory space of the corresponding computational core.
Step S6: repeat the above steps until all neurons have been placed, obtaining the complete mapping from the spiking neural network onto the chip's network-on-chip computational cores.
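Putting steps S3 to S6 together, deployment with the trained policy network reduces to a greedy loop over the neurons. The sketch below uses a deliberately simplified state encoding (occupancy grid only) and a per-core capacity check; both, like the helper names, are assumptions made for exposition.

```python
import torch

def encode_state(placement, grid_h=20, grid_w=20):
    # Simplified stand-in for the mapping-state feature vector: a 0-1 occupancy grid.
    # (The patent's state also includes the connection matrices, omitted here.)
    S = torch.zeros(1, 3, grid_h, grid_w)
    for (x, y) in placement.values():
        S[0, 0, x, y] = 1.0
    return S

def map_network(neurons, policy, grid_h=20, grid_w=20, capacity=128):
    """Greedily place each neuron at the policy's highest-probability feasible core."""
    placement = {}                                 # neuron id -> (x, y)
    load = {}                                      # (x, y) -> number of neurons placed
    for n in neurons:
        S = encode_state(placement, grid_h, grid_w)          # steps S3 / S4
        with torch.no_grad():
            probs = policy(S).flatten()                      # distribution over positions
        # Step S5: highest-probability position whose core still has free storage.
        for idx in torch.argsort(probs, descending=True).tolist():
            pos = (idx // grid_w, idx % grid_w)
            if load.get(pos, 0) < capacity:                  # at most 128 neurons per core
                placement[n] = pos
                load[pos] = load.get(pos, 0) + 1
                break
    return placement                               # step S6: the complete mapping
```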
Corresponding to the foregoing embodiments of the method for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores, the invention also provides embodiments of a device for the same.
Referring to Fig. 5, the device for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores according to an embodiment of the invention includes a memory and one or more processors; the memory stores executable code, and when the one or more processors execute the executable code, they implement the method of the above embodiments.
The device embodiment may be applied to any equipment with data processing capability, such as a computer or similar device or apparatus. It may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, as a device in the logical sense it is formed by the processor of the equipment reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, Fig. 5 shows a hardware structure diagram of the equipment on which the device is located; besides the processor, memory, network interface, and non-volatile memory shown in Fig. 5, the equipment may include other hardware according to its actual function, which is not described again here.
The implementation process of the functions and roles of each unit in the above device is specifically the implementation process of the corresponding steps in the above method, and is not repeated here.
Since the device embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the invention. Those of ordinary skill in the art can understand and implement this without inventive effort.
An embodiment of the invention further provides a computer-readable storage medium on which a program is stored; when executed by a processor, the program implements the method for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores in the above embodiments.
The computer-readable storage medium may be an internal storage unit, such as a hard disk or memory, of any equipment with data processing capability described in the foregoing embodiments, or an external storage device of such equipment, such as a plug-in hard disk, Smart Media Card (SMC), SD card, or Flash Card provided on the equipment. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the equipment. It is used to store the computer program and the other programs and data required by the equipment, and may also be used to temporarily store data that has been or will be output.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores, characterized by comprising the following steps:
Step S1: acquiring the topology and parameter information of a spiking neural network, wherein the spiking neural network comprises neuron groups and synapses, a neuron group consists of neurons, and neurons are connected through synapses; the topology is a graph structure whose nodes represent neurons and/or neuron groups and whose edges represent synapses, i.e., the connections between neurons and/or neuron groups; the parameter information includes the characteristic data of the neurons and synaptic connections;
Step S2: for the chip on which the spiking neural network is to be deployed, obtaining the description information of the chip's network-on-chip computational core matrix, wherein a computational core is the unit to which neurons are allocated for computation, and the description information includes the specification of each computational core, the spatial position of each computational core on the network-on-chip, and the communication connection relations among different computational cores;
Step S3: establishing an initial mapping state from the spatial positions of the allocated computational cores on the network-on-chip, the connections of the allocated neurons, the current neuron to be allocated, and the connections between the current neuron to be allocated and other neurons;
Step S4: inputting the mapping state into a trained reinforcement learning policy neural network, and obtaining from the network's output the probability distribution of the mapping action that places the current neuron to be allocated at different computational core spatial positions;
Step S5: selecting the computational core spatial position with the highest probability in the distribution as the placement position of the current neuron, and filling the neuron into the digital storage space of the corresponding computational core;
Step S6: repeating the above steps until all neurons have been placed, obtaining the complete mapping from the spiking neural network onto the chip's network-on-chip computational core matrix.
2. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 1, characterized in that in step S4, the training of the reinforcement learning policy neural network comprises the following steps:
Step S4.1: initializing, including determining the number of training samples;
Step S4.2: executing steps S4.3 to S4.7 until the number of training samples is reached;
Step S4.3: randomly constructing the neurons and synapse connections of a spiking neural network, and randomly designating a matrix area containing at least N computational cores;
Step S4.4: obtaining the mapping state S from the feature vector formed by the spatial positions of the currently allocated computational cores, the connections of the allocated neurons, the current neuron to be allocated, and the connections between that neuron and other neurons;
Step S4.5: inputting the mapping state S into the policy neural network to obtain an output mapping action A, placing the corresponding neuron at the computational core spatial position designated by the mapping action A, and obtaining the post-action mapping state S' and the corresponding mapping revenue R;
Step S4.6: inputting the mapping state S and the post-action mapping state S' into the evaluation neural network respectively to obtain the temporal-difference error, and, according to the overall mapping revenue R, updating the weights of the policy neural network and of the evaluation neural network respectively by error back-propagation;
Step S4.7: replacing the mapping state S with the post-action mapping state S', and continuing steps S4.4 to S4.6 until all neuron allocations are complete.
3. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 2, characterized in that in step S4.5, the mapping action A allocates a neuron to an unoccupied computational core spatial position; if not all neurons have been allocated, the mapping revenue of the mapping action is 0; if all allocations are complete, the mapping revenue R of the mapping action is constructed from the internal cost of the computational cores, the communication cost of the computational cores, the geometric area of the region occupied by the computational cores, and the revenue for whether the constraint conditions are satisfied;
the internal cost of a computational core includes the unused digital storage space within the core and the degree of balance among the core's different digital storage areas;
the computational core communication cost includes the number of spike packets sent and received by each computational core per unit time and the length of each spike packet's transmission path on the chip;
for the geometric area of the region occupied by the computational cores, a minimum rectangular closure containing all computational cores used by the mapping is constructed on the computational core grid, and the number of cores contained in this rectangular closure is the geometric area of the occupied region;
the revenue for the constraint conditions covers requirements, imposed by hardware constraints and/or user-customized preference constraints, that the input layer and output layer of the spiking neural network be placed at designated grid positions.
4. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 3, characterized in that the mapping revenue R when all allocations are complete is:
R_total(S) = -(C_core(S) + C_comm(S) + C_area(S)) + R_r(S)
where S denotes the mapping state, R_total(S) denotes the mapping revenue when all allocations are complete, C_core(S) denotes the internal cost of all computational cores, C_comm(S) denotes the communication cost of all computational cores, C_area(S) denotes the geometric area of the region occupied by the computational cores, and R_r(S) denotes the revenue for whether the constraint conditions are satisfied; when a neuron does not satisfy a constraint, a corresponding cost r is incurred, and R_r(S) is the sum of all constraint costs.
5. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 3, characterized in that the mapping revenue function is constructed as:
R_total(S) = -( N + Σ_{i=1}^{N} Σ_{j=1}^{N} D(i, j) · Σ_{k_i ∈ n_i} Σ_{k_j ∈ n_j} w(k_i, k_j) + Area(n_1, ..., n_N) ) - r
where N denotes the total number of computational cores used by the mapping, n_i and n_j denote the sets of neurons deployed on the i-th and j-th computational cores respectively, D(i, j) denotes the spike packet transfer distance between computational cores i and j, w(k_i, k_j) denotes the communication density between neuron k_i of the i-th computational core and neuron k_j of the j-th computational core, Area denotes the area function computing the rectangular closure area of the region formed by the computational cores, and r denotes the number of neurons that do not meet the constraint placement requirements.
6. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 3, characterized in that a spike packet comprises the target computational core spatial position and the spike data information it carries.
7. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 3, characterized in that the transmission path of a spike packet on the chip refers to the route the packet takes through the computational core channels of the network-on-chip, as determined by a routing algorithm; the chip's on-chip routing algorithm adopts the GXY algorithm, under which the communication cost between any two computational cores with coordinates (x_i, y_i) and (x_j, y_j) is |x_i - x_j| + |y_i - y_j|, and the total communication cost is the sum of the communication costs of all computational cores.
8. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 1, characterized in that the spatial position of a computational core on the network-on-chip is the three-dimensional vector (x, y, a) formed by concatenating the computational core coordinates (x, y) with the core's address a.
9. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 1, characterized in that the computational core in step S2 is a hardware unit with digital storage space and a computing function, and the description information further includes the specification of the computational core, where the specification comprises the core's digital storage space and the number of neurons it supports.
10. A reinforcement-learning-based device for compiling and mapping a neural network onto on-chip cores, characterized by comprising a memory and one or more processors, the memory storing executable code, wherein the one or more processors, when executing the executable code, implement the reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to any one of claims 1-9.
CN202210407390.9A 2022-04-19 2022-04-19 On-chip core compiling and mapping method and device of neural network based on reinforcement learning Active CN114492782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210407390.9A CN114492782B (en) 2022-04-19 2022-04-19 On-chip core compiling and mapping method and device of neural network based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210407390.9A CN114492782B (en) 2022-04-19 2022-04-19 On-chip core compiling and mapping method and device of neural network based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN114492782A 2022-05-13
CN114492782B CN114492782B (en) 2022-09-16

Family

ID=81489437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210407390.9A Active CN114492782B (en) 2022-04-19 2022-04-19 On-chip core compiling and mapping method and device of neural network based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114492782B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372720A (en) * 2015-07-23 2017-02-01 应用智慧研究公司 Methods and systems for implementing deep spiking neural networks
CN110850861A (en) * 2018-07-27 2020-02-28 通用汽车环球科技运作有限责任公司 Attention-based hierarchical lane change depth reinforcement learning
US20200153535A1 (en) * 2018-11-09 2020-05-14 Bluecom Systems and Consulting LLC Reinforcement learning based cognitive anti-jamming communications system and method
CN110070181A (en) * 2019-04-30 2019-07-30 深圳朴生智能科技有限公司 A kind of optimization method of the deep learning for edge calculations equipment
CN113988283A (en) * 2021-10-28 2022-01-28 清华大学 Mapping method and device of logic node, electronic equipment and storage medium
CN114091663A (en) * 2021-11-28 2022-02-25 重庆大学 Lightweight on-chip learning method, system and processor based on impulse neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NURETTIN BÖLÜCÜ ET AL.: "Q-Learning-based Routing Algorithm for 3D Network-on-Chips", 2021 24th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS) *
ZHANG GUOPING ET AL.: "Unsupervised Deep Learning for Computation Offloading and Resource Allocation in Mobile Edge Computing", Journal of Anqing Normal University *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115168281A (en) * 2022-09-09 2022-10-11 之江实验室 Neural network on-chip mapping method and device based on tabu search algorithm
WO2024051388A1 (en) * 2022-09-09 2024-03-14 之江实验室 Neural network on-chip mapping method and device based on tabu search algorithm
CN115392443A (en) * 2022-10-27 2022-11-25 之江实验室 Pulse neural network application representation method and device of brain-like computer operating system
CN115392443B (en) * 2022-10-27 2023-03-10 之江实验室 Pulse neural network application representation method and device of brain-like computer operating system
CN115904394A (en) * 2023-03-02 2023-04-04 之江实验室 Many-core architecture-oriented neural network increment compiling method and device
CN115904394B (en) * 2023-03-02 2023-07-04 之江实验室 Neural network increment compiling method and device for many-core architecture
CN116070682A (en) * 2023-04-06 2023-05-05 浙江大学 SNN model dynamic mapping method and device of neuron computer operating system
CN116070682B (en) * 2023-04-06 2023-08-15 浙江大学 SNN model dynamic mapping method and device of neuron computer operating system
CN117688992A (en) * 2024-02-01 2024-03-12 之江实验室 Resource mapping method and device for neuron computer operating system
CN117688992B (en) * 2024-02-01 2024-06-04 之江实验室 Resource mapping method and device for neuron computer operating system

Also Published As

Publication number Publication date
CN114492782B (en) 2022-09-16

Similar Documents

Publication Publication Date Title
CN114492782B (en) On-chip core compiling and mapping method and device of neural network based on reinforcement learning
WO2021190127A1 (en) Data processing method and data processing device
Bellman et al. Mathematical aspects of scheduling and applications: modern applied mathematics and computer science
CN112084038B (en) Memory allocation method and device of neural network
Ghosh et al. Mapping neural networks onto message-passing multicomputers
CN104463324A (en) Convolution neural network parallel processing method based on large-scale high-performance cluster
US20230236888A1 (en) Memory allocation method, related device, and computer-readable storage medium
CN107239824A (en) Apparatus and method for realizing sparse convolution neutral net accelerator
CN114896937A (en) Integrated circuit layout optimization method based on reinforcement learning
CN115186821B (en) Core particle-oriented neural network inference overhead estimation method and device and electronic equipment
CN112132287A (en) Distributed quantum computing simulation method and device
CN115168281B (en) Neural network on-chip mapping method and device based on tabu search algorithm
CN115421897B (en) Core particle-oriented deep neural network pipeline parallel scheduling method and device
CN112163601A (en) Image classification method, system, computer device and storage medium
CN112084037A (en) Memory allocation method and device of neural network
CN110132282A (en) Unmanned plane paths planning method and device
CN110059793A (en) The gradually modification of production confrontation neural network
CN114026571A (en) Neural network operation reordering for parallel execution
Dazzi et al. 5 parallel prism: A topology for pipelined implementations of convolutional neural networks using computational memory
CN115001978B (en) Cloud tenant virtual network intelligent mapping method based on reinforcement learning model
CN111027669A (en) Method and device for realizing deep neural network on field programmable gate array
von Kirchbach et al. Efficient process-to-node mapping algorithms for stencil computations
US11687831B1 (en) Method, product, and apparatus for a multidimensional processing array for hardware acceleration of convolutional neural network inference
Miller et al. Embedding-based placement of processing element networks on FPGAs for physical model simulation
CN117688992B (en) Resource mapping method and device for neuron computer operating system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant