CN114492782A - On-chip core compiling and mapping method and device of neural network based on reinforcement learning - Google Patents

On-chip core compiling and mapping method and device of neural network based on reinforcement learning

Info

Publication number
CN114492782A
CN114492782A
Authority
CN
China
Prior art keywords
core
mapping
chip
neural network
neurons
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210407390.9A
Other languages
Chinese (zh)
Other versions
CN114492782B (en)
Inventor
何煜坤
李莹
吕攀
章明
孙世春
邓水光
潘纲
马德
齐勃
金孝飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Lab
Original Assignee
Zhejiang University ZJU
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Lab filed Critical Zhejiang University ZJU
Priority to CN202210407390.9A priority Critical patent/CN114492782B/en
Publication of CN114492782A publication Critical patent/CN114492782A/en
Application granted granted Critical
Publication of CN114492782B publication Critical patent/CN114492782B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839 Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Neurology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and device for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores. The method constructs a mapping revenue from the communication cost, the mapping area, and the utilization inside each core; trains a policy network with the Actor-Critic algorithm to obtain higher mapping revenue, so that the reinforcement learning policy network learns the optimal mapping position for any spiking neural network neuron; and finally uses the trained policy network to deploy the neural network onto the on-chip core array. The communication-distance cost between interconnected neurons is thereby reduced, the computational efficiency of the chip is effectively improved, and the overall power consumption is lowered.

Description

On-chip core compiling and mapping method and device of neural network based on reinforcement learning
Technical Field
The invention relates to the field of high-performance computing, and in particular to a method and device for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores.
Background
Neuroscience research on the human brain has found that billions of interconnected neurons produce highly parallel computing power; spiking neural networks, proposed on this basis, transmit information through spike coding to solve complex tasks. Neuromorphic chips based on spiking neural networks emulate the connectivity of biological neurons through a high-density network-on-chip organization and realize large-scale parallel computation. The inter-core communication of the network-on-chip is highly sensitive to the generation, sending, and receiving of spikes, which places stringent performance constraints on the network deployment result.
Unlike traditional compilation, compiling and mapping a spike-based neural network onto a network-on-chip for inference is generally modeled as a constrained multi-objective programming problem whose goal is to optimize both the total power consumption (communication cost) of spike transmission among all nodes in the chip and the computation time of the application. Owing to the graph structure of a spiking neural network, mainstream mapping algorithms proceed in two stages: 1) partition the original spiking neural network and reconstruct its topology; 2) assign and bind each network partition to a physical node. The core difficulty of compilation is that solving for the optimal mapping scheme under a given optimization objective is an NP-Hard problem: how can an approximate solution be found in a short time under the hardware constraints of the chip?
Compilation environments and frameworks in industry today are diverse, and most are designed around modeling and optimization algorithms specific to in-house chips. LCompiler, the compiler for Intel's Loihi chip, splits the network graph, maps the parts onto individual cores, and generates binary files one by one; its assignment algorithm aims to optimize Loihi's input mapping ratio, iteratively trying neuron assignments until the network fits into a given number of cores. IBM's TrueNorth project chip reached the scale of millions of neurons; its compiling and mapping adopt optimization algorithms similar to those for VLSI routing problems to reduce the total communication cost of the deployed application, and through compiler optimization the maximum number of spikes per tick on a port was reduced from 10000 to 2500.
The above on-chip mapping algorithms for spiking neural networks are based on the following steps:
1. The spiking neural network is abstracted as a computational graph, with neurons or groups of neurons as nodes and synaptic connections as edges.
2. The computational graph is re-partitioned and aggregated into a physical-layer computational graph whose parts can be mapped one-to-one onto the network-on-chip.
3. A loss function measuring the quality of the mapping result is constructed, and an algorithm based on heuristic search (or another algorithm for solving multi-objective programming problems) iteratively adjusts the mapping, attempting to reduce the loss function until convergence and finally obtaining the mapping result.
The scale and spatial characteristics of the mapping problem therefore place high demands on an algorithm's search capability. In recent years, reinforcement learning algorithms have performed well on nonlinear programming problems. Traditional reinforcement learning controls an agent's policy by progressively accumulating data through interaction with the environment and determining the optimal action to take in the current state so as to obtain the highest return on a given task. More recently, deep reinforcement learning has been applied to more complex and diverse tasks, performing well in machine vision, natural language processing, complex systems, and similar problems. In particular, the policy-based Actor-Critic deep reinforcement learning algorithm (Actor denoting a policy-gradient algorithm, Critic denoting a value-based reinforcement learning algorithm such as Q-learning) uses two independent sets of parameters to update the action-value function parameters and the policy parameters respectively.
The mapping result of a neural network on the chip directly determines the spike transmission efficiency among the chip's cores. An overly simple mapping scheme cannot quickly search the enormous solution space for the optimal solution and easily falls into a locally suboptimal solution caused by an unreasonably designed mapping algorithm and cost function, slowing the transmission of inter-core spike packets and incurring additional communication and power-consumption cost.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention reduces the on-chip spike packet transmission distance, the chip mapping area, the number of spikes per port, the computation time, and the chip power consumption, and improves transmission efficiency, through the following technical solution:
A reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores comprises the following steps:
Step S1: acquire the topology and parameter information of a spiking neural network, where the spiking neural network comprises neuron groups and synapses, a neuron group consists of neurons, and neurons are connected through synapses; the topology is a graph structure whose nodes represent neurons and/or neuron groups and whose edges represent synapses, i.e., the connections between neurons and/or neuron groups; the parameter information includes the characteristic data of the neurons and synaptic connections;
Step S2: for the chip on which the spiking neural network is to be deployed, obtain the description information of the chip's network-on-chip computational core matrix, where a computational core is the unit to which neurons are allocated for computation; the description information includes the specification of each computational core, the spatial position of each computational core on the network-on-chip, and the communication connection relations among different computational cores;
Step S3: establish an initial mapping state from the spatial positions of the allocated computational cores on the network-on-chip, the connections of the allocated neurons, the current neuron to be allocated, and the connections between the current neuron to be allocated and other neurons;
Step S4: input the mapping state into the trained reinforcement learning policy neural network, and obtain from the network's output the probability distribution of the mapping action that places the current neuron to be allocated at different computational core spatial positions;
Step S5: select the computational core spatial position with the highest probability in the distribution as the placement position of the current neuron, and fill the neuron into the digital storage space of the corresponding computational core;
Step S6: repeat the above steps until all neurons have been placed, obtaining the complete mapping from the spiking neural network onto the chip's network-on-chip computational core matrix.
Further, in step S4, the training of the reinforcement learning policy neural network comprises the following steps:
Step S4.1: initialize, including determining the number of training samples E, the learning rate r, the return discount coefficient γ, the exploration rate ɛ, and the two neural network structures: the policy network (Actor) and the evaluation network (Critic);
Step S4.2: execute steps S4.3 to S4.7 until the number of training samples E is reached;
Step S4.3: randomly construct the neurons and synapse connections of a spiking neural network, and randomly designate a matrix area containing at least N computational cores;
Step S4.4: obtain the mapping state S from the feature vector formed by the spatial positions of the currently allocated computational cores, the connections of the allocated neurons, the current neuron to be allocated, and the connections between that neuron and other neurons;
Step S4.5: input the mapping state S into the policy neural network to obtain an output mapping action A, place the corresponding neuron at the computational core spatial position designated by the mapping action A, and obtain the post-action mapping state S' and the corresponding mapping revenue R;
Step S4.6: input the mapping state S and the post-action mapping state S' into the evaluation neural network respectively to obtain the temporal-difference (TD) error; according to the overall mapping revenue R in the final state (after mapping is finished), propagate back through each state by error back-propagation (stochastic gradient descent), updating the weights of the policy neural network and of the evaluation neural network respectively, so that as the networks converge the mapping action A moves toward better mapping states;
Step S4.7: replace the mapping state S with the post-action mapping state S', and continue steps S4.4 to S4.6 until all neuron allocations are complete.
Further, in step S4.5, the mapping action A allocates a neuron to an unoccupied computational core spatial position; if not all neurons have been allocated, the mapping revenue of the mapping action is 0; if all allocations are complete, the mapping revenue R of the mapping action is constructed from the internal cost of the computational cores, the communication cost of the computational cores, the geometric area of the region occupied by the computational cores, and the revenue for whether the constraint conditions are satisfied;
the internal cost of a computational core includes the unused digital storage space within the core and the degree of balance among the core's different digital storage areas;
the computational core communication cost includes the number of spike packets sent and received by each computational core per unit time and the length of each spike packet's transmission path on the chip;
for the geometric area of the region occupied by the computational cores, a minimum rectangular closure containing all computational cores used by the mapping is constructed on the computational core grid, and the number of cores (both used and unused) contained in this rectangular closure is the geometric area of the occupied region;
the revenue for the constraint conditions covers requirements, imposed by hardware constraints and/or user-customized preference constraints, that the input layer and output layer of the spiking neural network be placed at designated grid positions.
Further, the mapping revenue R when all allocations are complete is:
R_total(S) = -(C_core(S) + C_comm(S) + C_area(S)) + R_r(S)
where S denotes the mapping state, R_total(S) denotes the mapping revenue when all allocations are complete, C_core(S) denotes the internal cost of all computational cores, C_comm(S) denotes the communication cost of all computational cores, C_area(S) denotes the geometric area of the region occupied by the computational cores, and R_r(S) denotes the revenue for whether the constraint conditions are satisfied; when a neuron does not satisfy a constraint, a corresponding cost r is incurred, and R_r(S) is the sum of all constraint costs.
Further, taking the number of used cores as the internal cost of the computational cores, the Manhattan distance between cores weighted by the communication density of the corresponding neurons as the communication cost, the total area of the rectangular closure of the region formed by the computational cores as the geometric area, and the number of neurons that do not satisfy the constraints as the cost of the given constraints, the mapping revenue function is constructed as:
R_total(S) = -( N + Σ_{i=1}^{N} Σ_{j=1}^{N} D(i, j) · Σ_{k_i ∈ n_i} Σ_{k_j ∈ n_j} w(k_i, k_j) + Area(n_1, ..., n_N) ) - r
where N denotes the total number of computational cores used by the mapping, n_i and n_j denote the sets of neurons deployed on the i-th and j-th computational cores respectively, D(i, j) denotes the spike packet transfer distance between computational cores i and j, w(k_i, k_j) denotes the communication density between neuron k_i of the i-th computational core and neuron k_j of the j-th computational core, Area denotes the area function computing the rectangular closure area of the region formed by the computational cores, and r denotes the number of neurons that do not meet the constraint placement requirements.
Furthermore, a spike packet includes the target computational core spatial position and the spike data information it carries.
Further, the transmission path of a spike packet on the chip refers to the route the packet takes through the computational core channels of the network-on-chip, as determined by a routing algorithm; the chip's on-chip routing algorithm adopts the GXY algorithm, under which the communication cost between any two computational cores with coordinates (x_i, y_i) and (x_j, y_j) is |x_i - x_j| + |y_i - y_j|, and the total communication cost is the sum of the communication costs of all computational cores.
Further, the spatial position of a computational core on the network-on-chip is the three-dimensional vector (x, y, a) formed by concatenating the computational core coordinates (x, y) with the core's address a.
Further, the computational core in step S2 is a hardware unit with digital storage space and a computing function, and the description information further includes the specification of the computational core, where the specification comprises the core's digital storage space and the number of neurons it supports.
A reinforcement-learning-based device for compiling and mapping a neural network onto on-chip cores comprises a memory and one or more processors, the memory storing executable code; when the one or more processors execute the executable code, they implement the reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores.
The advantages and beneficial effects of the invention are as follows:
The invention discloses a method and device for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores. A mapping revenue is constructed from the communication cost, the mapping area, and the utilization inside each core, and a policy network is trained with the Actor-Critic algorithm to obtain higher mapping revenue. The transmission distance of spike packets between on-chip cores is thereby reduced, transmission efficiency is improved, and power consumption is lowered.
Drawings
FIG. 1 is a flow chart of a method in an embodiment of the invention.
Fig. 2 is a schematic diagram of a network-on-chip mapping in an embodiment of the invention.
Fig. 3 is a schematic diagram of inter-core communication cost in an embodiment of the present invention.
Fig. 4 is a schematic diagram of the geometric area penalty of the region occupied by the core in an embodiment of the present invention.
FIG. 5 is a block diagram of an apparatus in an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
The invention improves the solving efficiency of existing mapping algorithms, addressing the problem that the mapping solution space is too large and lacks an efficient allocation strategy. Existing network mapping algorithms usually optimize a loss function with a specific programming algorithm; the solving time is long, local optima are easily reached, and the full space cannot be searched effectively.
The method and device for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores according to the invention first acquire the topology and parameter information of a spiking neural network and the network-on-chip computational core matrix information of the target hardware; then extract and concatenate feature vectors from this information to construct a mapping state, input it into the trained deep policy neural network, and obtain the optimal network-on-chip position to which the current neuron to be allocated should be mapped; then fill the neuron into the corresponding core and update the mapping state; and iterate until the entire mapping from the neural network onto the network-on-chip is complete. As shown in Fig. 1, the method specifically includes the following steps:
Step S1: acquire the topology and parameter information of a spiking neural network, where the spiking neural network comprises neuron groups and synapses, a neuron group consists of neurons, and neurons are connected through synapses; the topology is a graph structure whose nodes represent neurons and/or neuron groups and whose edges represent synapses, i.e., the connections between neurons and/or neuron groups; the parameter information includes the characteristic data of the neurons and synaptic connections.
The topology is illustrated by the spiking neural network in Fig. 2: circular nodes represent neurons, and the arrows between neurons represent synapses. For example, a typical classification task uses a 3-layer spiking neural network with 784 neurons in the first layer (the input layer), 512 neurons in the second layer, and 10 neurons in the third layer (the output layer); every pair of neurons in adjacent layers is connected by a synapse.
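As an illustration of step S1 only, the following minimal sketch builds the computational-graph abstraction of this 784-512-10 example, with neurons as nodes and synapses as edges; the class and function names (`SpikingNetwork`, `add_layer`, `fully_connect`) are hypothetical, chosen here for exposition and not taken from the patent.

```python
from collections import defaultdict

class SpikingNetwork:
    """Computational-graph abstraction: neurons are node ids, synapses are directed edges."""
    def __init__(self):
        self.neurons = []                     # node ids
        self.synapses = defaultdict(set)      # pre-neuron id -> set of post-neuron ids

    def add_layer(self, size):
        start = len(self.neurons)
        layer = list(range(start, start + size))
        self.neurons.extend(layer)
        return layer

    def fully_connect(self, pre_layer, post_layer):
        # One synapse between every pair of neurons in adjacent layers.
        for pre in pre_layer:
            self.synapses[pre].update(post_layer)

# The 3-layer classification example from the description: 784 -> 512 -> 10.
net = SpikingNetwork()
l1 = net.add_layer(784)    # input layer
l2 = net.add_layer(512)    # hidden layer
l3 = net.add_layer(10)     # output layer
net.fully_connect(l1, l2)
net.fully_connect(l2, l3)

print(len(net.neurons))                            # 1306 neurons
print(sum(len(v) for v in net.synapses.values()))  # 784*512 + 512*10 = 406528 synapses
```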
Step S2: acquire the description information of the network-on-chip computational core matrix of the target hardware chip on which the spiking neural network is to be deployed. A computational core is a hardware unit with digital storage space and a computing function, to which neurons are allocated for computation. The description information includes the specification of each computational core (e.g., the size of its on-core memory space and the maximum number of neurons it supports), the spatial position of each computational core on the network-on-chip, and the communication connection relations among different computational cores.
A typical deployment target is: a square 20 × 20 core grid in which each core communicates with its four neighbors (up, down, left, right). Each core can accommodate at most 128 neurons and can connect to at most 16 cores.
The spatial position of a computational core on the network-on-chip is the three-dimensional vector (x, y, a) formed by concatenating the computational core coordinates (x, y) with the core's address a.
Step S3: establish an initial mapping state from the spatial positions of the allocated computational cores on the network-on-chip, the connections of the allocated neurons, the current neuron to be allocated, and the connections between the current neuron to be allocated and other neurons.
Step S4: input the mapping state into the trained reinforcement learning policy neural network, and obtain from the network's output the probability distribution of the mapping action that places the current neuron to be allocated at different computational core spatial positions.
The structure of the reinforcement-learning-based on-chip core compiling and mapping method according to this embodiment is shown in Table 1:
Table 1: structure of the compiling and mapping method
(Table 1 appears only as an image in the original publication.)
π(α | S, θ) denotes the probability distribution over mapping actions obtained from the policy network in mapping state S, where α denotes the mapping action and θ denotes the policy network parameters.
How the above deep policy neural network is obtained is described below.
As shown in Fig. 2, the network is trained with the reinforcement learning Actor-Critic algorithm. Its input is the multidimensional mapping-state vector, obtained by flattening and concatenating the 0-1 matrix of spatial positions of the current core grid, the 0-1 matrix of allocated-neuron connections, and the connection-matrix vector between the current neuron to be allocated and the other neurons. The network is a deep neural network of alternating convolutional, pooling, and fully connected layers that extracts features from the input mapping state. Its final output is a probability distribution over the position coordinates (x, y, a) of the currently selected neuron in the whole computational core space under the mapping action. The probability value of each element in the distribution indicates the score of mapping the neuron to that spatial position: the higher the score, the greater the probability of being selected. Finally, the position with the highest score is selected as the target position of the neuron mapping.
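A minimal PyTorch-style sketch of such a policy network is given below. The layer sizes, the three-channel encoding of the input 0-1 matrices, and the name `MappingPolicyNet` are illustrative assumptions; the patent only specifies alternating convolutional, pooling, and fully connected layers producing a probability distribution over computational core positions.

```python
import torch
import torch.nn as nn

class MappingPolicyNet(nn.Module):
    """Actor: maps a mapping-state tensor to a distribution over core positions (x, y, a)."""
    def __init__(self, grid_h=20, grid_w=20, addr_per_core=1, state_channels=3):
        super().__init__()
        # Alternating convolutional and pooling layers extract mapping-state features.
        self.features = nn.Sequential(
            nn.Conv2d(state_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        flat = 64 * (grid_h // 4) * (grid_w // 4)
        # Fully connected layers score every (x, y, a) position.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 256), nn.ReLU(),
            nn.Linear(256, grid_h * grid_w * addr_per_core),
        )

    def forward(self, state):
        # state: (batch, channels, grid_h, grid_w), built from the flattened 0-1 matrices.
        logits = self.head(self.features(state))
        return torch.softmax(logits, dim=-1)   # probability of mapping to each position

policy = MappingPolicyNet()
probs = policy(torch.zeros(1, 3, 20, 20))      # 20 x 20 core grid from the example above
target = probs.argmax(dim=-1)                  # the highest-scoring position is selected
```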
The policy neural network is trained as a deep network through the following steps:
Step S4.1: initialize, including determining the number of training samples E, the learning rate r, the return discount coefficient γ, the exploration rate ɛ, and the two neural network structures: the policy network (Actor) and the evaluation network (Critic).
Step S4.2: execute steps S4.3 to S4.7 until the number of training samples E is reached.
Step S4.3: randomly construct the neurons and synapse connections of a spiking neural network, and randomly designate a matrix area containing at least N computational cores.
Step S4.4: obtain the mapping state S from the feature vector formed by the spatial positions of the currently allocated computational cores, the connections of the allocated neurons, the current neuron to be allocated, and the connections between that neuron and other neurons.
Step S4.5: input the mapping state S into the policy neural network to obtain an output mapping action A, place the corresponding neuron at the computational core spatial position designated by the mapping action A, and obtain the post-action mapping state S' and the mapping revenue R.
The allocation process is converted into a reinforcement learning problem by defining the mapping state, the mapping action, and the mapping revenue respectively:
the mapping action A allocates a neuron to an unoccupied computational core spatial position; if not all neurons have been allocated, the mapping revenue of the mapping action is 0; if all allocations are complete, the mapping revenue R of the mapping action is constructed from the internal cost of the computational cores, the communication cost of the computational cores, the geometric area of the region occupied by the computational cores, and the revenue for whether the constraint conditions are satisfied;
the internal cost of a computational core includes the unused digital storage space within the core and the degree of balance among the core's different digital storage areas;
the computational core communication cost includes the number of spike packets sent and received by each computational core per unit time and the length of each spike packet's transmission path on the chip;
for the geometric area of the region occupied by the computational cores, a minimum rectangular closure containing all computational cores used by the mapping is constructed on the computational core grid, and the number of cores (both used and unused) contained in this rectangular closure is the geometric area of the occupied region;
the revenue for the constraint conditions covers hardware constraints requiring that the input and output layers (pins) of the spiking neural network be placed at specific grid positions (e.g., in the leftmost and uppermost cores), or user-defined preference constraints.
The mapping revenue R when all allocations are complete is:
R_total(S) = -(C_core(S) + C_comm(S) + C_area(S)) + R_r(S)
where S denotes the mapping state, R_total(S) denotes the mapping revenue when all allocations are complete, C_core(S) denotes the internal cost of all computational cores, C_comm(S) denotes the communication cost of all computational cores, C_area(S) denotes the geometric area of the region occupied by the computational cores, and R_r(S) denotes the revenue for whether the given constraints are satisfied; when a neuron does not satisfy a constraint, a corresponding cost r is incurred, and R_r(S) is the sum of all constraint costs.
Taking the number of used cores as the internal cost of the computational cores, the Manhattan distance between cores weighted by the communication density of the corresponding neurons as the communication cost, the total area of the rectangular closure of the region formed by the computational cores as the geometric area, and the number of neurons that do not satisfy the constraints as the cost of the given constraints, the mapping revenue function is constructed as:
R_total(S) = -( N + Σ_{i=1}^{N} Σ_{j=1}^{N} D(i, j) · Σ_{k_i ∈ n_i} Σ_{k_j ∈ n_j} w(k_i, k_j) + Area(n_1, ..., n_N) ) - r
where N denotes the total number of computational cores used by the mapping, n_i and n_j denote the sets of neurons deployed on the i-th and j-th computational cores respectively, D(i, j) denotes the spike packet transfer distance between computational cores i and j, w(k_i, k_j) denotes the communication density between neuron k_i of the i-th computational core and neuron k_j of the j-th computational core, Area denotes the area function computing the rectangular closure area of the region formed by the computational cores, and r denotes the number of neurons that do not meet the constraint placement requirements.
A spike packet includes the target computational core spatial position (i.e., (x, y, a), serving as a location tag) and the spike data information it carries.
The transmission path of a spike packet on the chip refers to the route the packet takes through the computational core channels of the network-on-chip, as determined by a routing algorithm; routing algorithms include the X-Y routing algorithm and the E-Cube routing algorithm.
In the embodiment of the invention, the chip's on-chip routing algorithm adopts the GXY algorithm: the communication cost between any two computational cores with coordinates (x_i, y_i) and (x_j, y_j) is |x_i - x_j| + |y_i - y_j|, and the total communication cost is the sum of the communication costs of all computational cores.
Specifically, the internal cost of a computational core includes the memory space not effectively utilized within the core and the balance among the core's different memory areas. In a typical case, the internal cost C_core(S) of a computational core is 128 minus the number of neurons actually used by that core; that is, the more fully the neurons are used, the smaller the core's internal cost. For simplicity, a cost of 1 unit is incurred for each computational core used.
The computational core communication cost includes the number of spike packets sent and received by each computational core per unit time and the length of each spike packet's transmission path on the chip. With the GXY on-chip routing algorithm, the communication cost C_comm(S) between any two computational cores (x_i, y_i) and (x_j, y_j) can be defined as |x_i - x_j| + |y_i - y_j|; as shown in Fig. 3, the communication cost between computational cores 1 and 14 is D(1, 14) = 1 + 4 = 5, and the total communication cost is the sum of the communication costs between all computational cores. The geometric area of the region occupied by the computational cores is defined on the core grid by constructing a minimum rectangular closure containing all computational cores used by the mapping; the number of cores (both used and unused) contained in that rectangle is the geometric area C_area(S). As shown in Fig. 4, in that example C_area(S) = 2 × 3 = 6. For the constraint cost, the input-layer neurons of the neural network may, for example, be required to be placed in the cores on the left and upper sides of the grid; when a neuron does not satisfy a constraint, a corresponding cost r is incurred, and R_r(S) is the sum of all constraint costs.
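The cost terms above can be computed directly from the core coordinates. The sketch below is illustrative only: the function names are hypothetical, and the coordinates are chosen merely so that the results reproduce the worked values from Fig. 3 (D(1, 14) = 5) and Fig. 4 (area 2 × 3 = 6).

```python
def core_cost(used):
    # Simplified internal cost: one unit per computational core used.
    return len(used)

def comm_cost(used, density):
    # GXY routing: Manhattan distance between cores, weighted by communication density.
    # density[(i, j)] is the neuron-to-neuron communication density between cores i and j.
    total = 0
    for (i, j), w in density.items():
        (xi, yi), (xj, yj) = used[i], used[j]
        total += (abs(xi - xj) + abs(yi - yj)) * w
    return total

def area_cost(used):
    # Minimum rectangular closure over all used cores; counts used AND unused cores inside it.
    xs = [x for x, _ in used.values()]
    ys = [y for _, y in used.values()]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)

# Cores 1 and 14 placed one column and four rows apart: D(1, 14) = 1 + 4 = 5, as in Fig. 3.
print(comm_cost({1: (0, 0), 14: (1, 4)}, {(1, 14): 1}))   # 5

# Used cores spanning a 2 x 3 rectangular closure: area 6, as in Fig. 4.
print(area_cost({1: (0, 0), 2: (0, 2), 3: (1, 1)}))       # 2 * 3 = 6
```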
Step S4.6: input the mapping state S and the post-action mapping state S' into the evaluation network respectively to obtain the temporal-difference (TD) error, and, according to the overall mapping revenue R, update the weights of the policy neural network and of the evaluation neural network respectively by error back-propagation.
Specifically, from the mapping state S, all reachable post-action mapping states S' and the corresponding mapping revenues R are obtained; according to the overall mapping revenue R in the final state (after mapping is finished), each state is propagated back through by error back-propagation (stochastic gradient descent), the weights of the policy neural network and of the evaluation neural network are updated respectively, and as the networks converge the mapping action A moves toward better mapping states.
Step S4.7: replace the mapping state S with the post-action mapping state S', and continue steps S4.4 to S4.6 until all neuron allocations are complete.
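Steps S4.4 to S4.7 amount to a one-step Actor-Critic update. The sketch below shows the TD-error computation and the two separate weight updates; the environment object `env`, its `reset`/`step` interface, and the discount value are assumptions for exposition, since the patent does not fix these details.

```python
import torch

def actor_critic_episode(env, actor, critic, actor_opt, critic_opt, gamma=0.99):
    """One training episode: place neurons one by one, updating Actor and Critic each step."""
    S = env.reset()                               # initial mapping state (steps S4.3-S4.4)
    done = False
    while not done:
        probs = actor(S)                          # π(α | S, θ), step S4.5
        dist = torch.distributions.Categorical(probs)
        A = dist.sample()                         # sampled mapping action
        S_next, R, done = env.step(A.item())      # place the neuron, observe revenue R

        # Step S4.6: temporal-difference error from the evaluation (Critic) network.
        v = critic(S)
        v_next = torch.zeros_like(v) if done else critic(S_next).detach()
        td_error = R + gamma * v_next - v

        critic_opt.zero_grad()
        (td_error ** 2).mean().backward()         # update evaluation-network weights
        critic_opt.step()

        actor_opt.zero_grad()
        (-dist.log_prob(A) * td_error.detach()).mean().backward()  # update policy weights
        actor_opt.step()

        S = S_next                                # step S4.7: advance the state
```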
Step S5: select the computational core spatial position with the highest probability in the distribution as the placement position of the current neuron, and fill the neuron into the memory space of the corresponding computational core.
Step S6: repeat the above steps until all neurons have been placed, obtaining the complete mapping from the spiking neural network onto the chip's network-on-chip computational cores.
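Putting steps S3 to S6 together, deployment with the trained policy network reduces to a greedy loop over the neurons. The sketch below uses a deliberately simplified state encoding (occupancy grid only) and a per-core capacity check; both, like the helper names, are assumptions made for exposition.

```python
import torch

def encode_state(placement, grid_h=20, grid_w=20):
    # Simplified stand-in for the mapping-state feature vector: a 0-1 occupancy grid.
    # (The patent's state also includes the connection matrices, omitted here.)
    S = torch.zeros(1, 3, grid_h, grid_w)
    for (x, y) in placement.values():
        S[0, 0, x, y] = 1.0
    return S

def map_network(neurons, policy, grid_h=20, grid_w=20, capacity=128):
    """Greedily place each neuron at the policy's highest-probability feasible core."""
    placement = {}                                 # neuron id -> (x, y)
    load = {}                                      # (x, y) -> number of neurons placed
    for n in neurons:
        S = encode_state(placement, grid_h, grid_w)          # steps S3 / S4
        with torch.no_grad():
            probs = policy(S).flatten()                      # distribution over positions
        # Step S5: highest-probability position whose core still has free storage.
        for idx in torch.argsort(probs, descending=True).tolist():
            pos = (idx // grid_w, idx % grid_w)
            if load.get(pos, 0) < capacity:                  # at most 128 neurons per core
                placement[n] = pos
                load[pos] = load.get(pos, 0) + 1
                break
    return placement                               # step S6: the complete mapping
```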
Corresponding to the foregoing embodiments of the method for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores, the invention also provides embodiments of a device for the same.
Referring to Fig. 5, the device for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores according to an embodiment of the invention includes a memory and one or more processors; the memory stores executable code, and when the one or more processors execute the executable code, they implement the method of the above embodiments.
The device embodiment may be applied to any equipment with data processing capability, such as a computer or similar device or apparatus. It may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, as a device in the logical sense it is formed by the processor of the equipment reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, Fig. 5 shows a hardware structure diagram of the equipment on which the device is located; besides the processor, memory, network interface, and non-volatile memory shown in Fig. 5, the equipment may include other hardware according to its actual function, which is not described again here.
The implementation process of the functions and roles of each unit in the above device is specifically the implementation process of the corresponding steps in the above method, and is not repeated here.
Since the device embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the invention. Those of ordinary skill in the art can understand and implement this without inventive effort.
An embodiment of the invention further provides a computer-readable storage medium on which a program is stored; when executed by a processor, the program implements the method for reinforcement-learning-based compiling and mapping of a neural network onto on-chip cores in the above embodiments.
The computer-readable storage medium may be an internal storage unit, such as a hard disk or memory, of any equipment with data processing capability described in the foregoing embodiments, or an external storage device of such equipment, such as a plug-in hard disk, Smart Media Card (SMC), SD card, or Flash Card provided on the equipment. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the equipment. It is used to store the computer program and the other programs and data required by the equipment, and may also be used to temporarily store data that has been or will be output.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores, characterized by comprising the following steps:
Step S1: acquiring the topology and parameter information of a spiking neural network, wherein the spiking neural network comprises neuron groups and synapses, a neuron group consists of neurons, and neurons are connected through synapses; the topology is a graph structure whose nodes represent neurons and/or neuron groups and whose edges represent synapses, i.e., the connections between neurons and/or neuron groups; the parameter information includes the characteristic data of the neurons and synaptic connections;
Step S2: for the chip on which the spiking neural network is to be deployed, obtaining the description information of the chip's network-on-chip computational core matrix, wherein a computational core is the unit to which neurons are allocated for computation, and the description information includes the specification of each computational core, the spatial position of each computational core on the network-on-chip, and the communication connection relations among different computational cores;
Step S3: establishing an initial mapping state from the spatial positions of the allocated computational cores on the network-on-chip, the connections of the allocated neurons, the current neuron to be allocated, and the connections between the current neuron to be allocated and other neurons;
Step S4: inputting the mapping state into a trained reinforcement learning policy neural network, and obtaining from the network's output the probability distribution of the mapping action that places the current neuron to be allocated at different computational core spatial positions;
Step S5: selecting the computational core spatial position with the highest probability in the distribution as the placement position of the current neuron, and filling the neuron into the digital storage space of the corresponding computational core;
Step S6: repeating the above steps until all neurons have been placed, obtaining the complete mapping from the spiking neural network onto the chip's network-on-chip computational core matrix.
2. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 1, characterized in that in step S4, the training of the reinforcement learning policy neural network comprises the following steps:
Step S4.1: initializing, including determining the number of training samples;
Step S4.2: executing steps S4.3 to S4.7 until the number of training samples is reached;
Step S4.3: randomly constructing the neurons and synapse connections of a spiking neural network, and randomly designating a matrix area containing at least N computational cores;
Step S4.4: obtaining the mapping state S from the feature vector formed by the spatial positions of the currently allocated computational cores, the connections of the allocated neurons, the current neuron to be allocated, and the connections between that neuron and other neurons;
Step S4.5: inputting the mapping state S into the policy neural network to obtain an output mapping action A, placing the corresponding neuron at the computational core spatial position designated by the mapping action A, and obtaining the post-action mapping state S' and the corresponding mapping revenue R;
Step S4.6: inputting the mapping state S and the post-action mapping state S' into the evaluation neural network respectively to obtain the temporal-difference error, and, according to the overall mapping revenue R, updating the weights of the policy neural network and of the evaluation neural network respectively by error back-propagation;
Step S4.7: replacing the mapping state S with the post-action mapping state S', and continuing steps S4.4 to S4.6 until all neuron allocations are complete.
3. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 2, characterized in that in step S4.5, the mapping action A allocates a neuron to an unoccupied computational core spatial position; if not all neurons have been allocated, the mapping revenue of the mapping action is 0; if all allocations are complete, the mapping revenue R of the mapping action is constructed from the internal cost of the computational cores, the communication cost of the computational cores, the geometric area of the region occupied by the computational cores, and the revenue for whether the constraint conditions are satisfied;
the internal cost of a computational core includes the unused digital storage space within the core and the degree of balance among the core's different digital storage areas;
the computational core communication cost includes the number of spike packets sent and received by each computational core per unit time and the length of each spike packet's transmission path on the chip;
for the geometric area of the region occupied by the computational cores, a minimum rectangular closure containing all computational cores used by the mapping is constructed on the computational core grid, and the number of cores contained in this rectangular closure is the geometric area of the occupied region;
the revenue for the constraint conditions covers requirements, imposed by hardware constraints and/or user-customized preference constraints, that the input layer and output layer of the spiking neural network be placed at designated grid positions.
4. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 3, characterized in that the mapping revenue R when all allocations are complete is:
R_total(S) = -(C_core(S) + C_comm(S) + C_area(S)) + R_r(S)
where S denotes the mapping state, R_total(S) denotes the mapping revenue when all allocations are complete, C_core(S) denotes the internal cost of all computational cores, C_comm(S) denotes the communication cost of all computational cores, C_area(S) denotes the geometric area of the region occupied by the computational cores, and R_r(S) denotes the revenue for whether the constraint conditions are satisfied; when a neuron does not satisfy a constraint, a corresponding cost r is incurred, and R_r(S) is the sum of all constraint costs.
5. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 3, characterized in that the mapping revenue function is constructed as:
R_total(S) = -( N + Σ_{i=1}^{N} Σ_{j=1}^{N} D(i, j) · Σ_{k_i ∈ n_i} Σ_{k_j ∈ n_j} w(k_i, k_j) + Area(n_1, ..., n_N) ) - r
where N denotes the total number of computational cores used by the mapping, n_i and n_j denote the sets of neurons deployed on the i-th and j-th computational cores respectively, D(i, j) denotes the spike packet transfer distance between computational cores i and j, w(k_i, k_j) denotes the communication density between neuron k_i of the i-th computational core and neuron k_j of the j-th computational core, Area denotes the area function computing the rectangular closure area of the region formed by the computational cores, and r denotes the number of neurons that do not meet the constraint placement requirements.
6. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 3, characterized in that a spike packet comprises the target computational core spatial position and the spike data information it carries.
7. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 3, characterized in that the transmission path of a spike packet on the chip refers to the route the packet takes through the computational core channels of the network-on-chip, as determined by a routing algorithm; the chip's on-chip routing algorithm adopts the GXY algorithm, under which the communication cost between any two computational cores with coordinates (x_i, y_i) and (x_j, y_j) is |x_i - x_j| + |y_i - y_j|, and the total communication cost is the sum of the communication costs of all computational cores.
8. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 1, characterized in that the spatial position of a computational core on the network-on-chip is the three-dimensional vector (x, y, a) formed by concatenating the computational core coordinates (x, y) with the core's address a.
9. The reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to claim 1, characterized in that the computational core in step S2 is a hardware unit with digital storage space and a computing function, and the description information further includes the specification of the computational core, where the specification comprises the core's digital storage space and the number of neurons it supports.
10. A reinforcement-learning-based device for compiling and mapping a neural network onto on-chip cores, characterized by comprising a memory and one or more processors, the memory storing executable code, wherein the one or more processors, when executing the executable code, implement the reinforcement-learning-based method for compiling and mapping a neural network onto on-chip cores according to any one of claims 1-9.
CN202210407390.9A 2022-04-19 2022-04-19 On-chip core compiling and mapping method and device of neural network based on reinforcement learning Active CN114492782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210407390.9A CN114492782B (en) 2022-04-19 2022-04-19 On-chip core compiling and mapping method and device of neural network based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210407390.9A CN114492782B (en) 2022-04-19 2022-04-19 On-chip core compiling and mapping method and device of neural network based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN114492782A 2022-05-13
CN114492782B CN114492782B (en) 2022-09-16

Family

ID=81489437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210407390.9A Active CN114492782B (en) 2022-04-19 2022-04-19 On-chip core compiling and mapping method and device of neural network based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114492782B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372720A (en) * 2015-07-23 2017-02-01 应用智慧研究公司 Methods and systems for implementing deep spiking neural networks
CN110850861A (en) * 2018-07-27 2020-02-28 通用汽车环球科技运作有限责任公司 Attention-based hierarchical lane change depth reinforcement learning
US20200153535A1 (en) * 2018-11-09 2020-05-14 Bluecom Systems and Consulting LLC Reinforcement learning based cognitive anti-jamming communications system and method
CN110070181A (en) * 2019-04-30 2019-07-30 深圳朴生智能科技有限公司 A kind of optimization method of the deep learning for edge calculations equipment
CN113988283A (en) * 2021-10-28 2022-01-28 清华大学 Mapping method and device of logic node, electronic equipment and storage medium
CN114091663A (en) * 2021-11-28 2022-02-25 重庆大学 Lightweight on-chip learning method, system and processor based on impulse neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NURETTIN BÖLÜCÜ ET AL.: "Q-Learning-based Routing Algorithm for 3D Network-on-Chips", 2021 24th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS) *
ZHANG GUOPING ET AL.: "Unsupervised Deep Learning for Computation Offloading and Resource Allocation in Mobile Edge Computing", Journal of Anqing Normal University *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115168281A (en) * 2022-09-09 2022-10-11 之江实验室 Neural network on-chip mapping method and device based on tabu search algorithm
WO2024051388A1 (en) * 2022-09-09 2024-03-14 之江实验室 Neural network on-chip mapping method and device based on tabu search algorithm
CN115392443A (en) * 2022-10-27 2022-11-25 之江实验室 Pulse neural network application representation method and device of brain-like computer operating system
CN115392443B (en) * 2022-10-27 2023-03-10 之江实验室 Pulse neural network application representation method and device of brain-like computer operating system
CN115904394A (en) * 2023-03-02 2023-04-04 之江实验室 Many-core architecture-oriented neural network increment compiling method and device
CN115904394B (en) * 2023-03-02 2023-07-04 之江实验室 Neural network increment compiling method and device for many-core architecture
CN116070682A (en) * 2023-04-06 2023-05-05 浙江大学 SNN model dynamic mapping method and device of neuron computer operating system
CN116070682B (en) * 2023-04-06 2023-08-15 浙江大学 SNN model dynamic mapping method and device of neuron computer operating system
CN117688992A (en) * 2024-02-01 2024-03-12 之江实验室 Resource mapping method and device for neuron computer operating system
CN117688992B (en) * 2024-02-01 2024-06-04 之江实验室 Resource mapping method and device for neuron computer operating system

Also Published As

Publication number Publication date
CN114492782B (en) 2022-09-16

Similar Documents

Publication Publication Date Title
CN114492782B (en) On-chip core compiling and mapping method and device of neural network based on reinforcement learning
WO2021190127A1 (en) Data processing method and data processing device
Bellman et al. Mathematical aspects of scheduling and applications: modern applied mathematics and computer science
CN112084038B (en) Memory allocation method and device of neural network
Ghosh et al. Mapping neural networks onto message-passing multicomputers
CN104463324A (en) Convolution neural network parallel processing method based on large-scale high-performance cluster
US20230236888A1 (en) Memory allocation method, related device, and computer-readable storage medium
CN107239824A (en) Apparatus and method for realizing sparse convolution neutral net accelerator
CN114896937A (en) Integrated circuit layout optimization method based on reinforcement learning
CN115186821B (en) Core particle-oriented neural network inference overhead estimation method and device and electronic equipment
CN112132287A (en) Distributed quantum computing simulation method and device
CN115168281B (en) Neural network on-chip mapping method and device based on tabu search algorithm
CN115421897B (en) Core particle-oriented deep neural network pipeline parallel scheduling method and device
CN112163601A (en) Image classification method, system, computer device and storage medium
CN112084037A (en) Memory allocation method and device of neural network
CN110132282A (en) Unmanned plane paths planning method and device
CN110059793A (en) The gradually modification of production confrontation neural network
CN114026571A (en) Neural network operation reordering for parallel execution
Dazzi et al. 5 parallel prism: A topology for pipelined implementations of convolutional neural networks using computational memory
CN115001978B (en) Cloud tenant virtual network intelligent mapping method based on reinforcement learning model
CN111027669A (en) Method and device for realizing deep neural network on field programmable gate array
von Kirchbach et al. Efficient process-to-node mapping algorithms for stencil computations
US11687831B1 (en) Method, product, and apparatus for a multidimensional processing array for hardware acceleration of convolutional neural network inference
Miller et al. Embedding-based placement of processing element networks on FPGAs for physical model simulation
CN117688992B (en) Resource mapping method and device for neuron computer operating system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant