CN114881221A - Mapping scheme optimization method and device, electronic equipment and readable storage medium - Google Patents

Mapping scheme optimization method and device, electronic equipment and readable storage medium

Info

Publication number
CN114881221A
CN114881221A (application CN202210539850.3A)
Authority
CN
China
Prior art keywords
core, mapping scheme, inter-core connection, processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210539850.3A
Other languages
Chinese (zh)
Inventor
张伟豪 (Zhang Weihao)
曲环宇 (Qu Huanyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd filed Critical Beijing Lynxi Technology Co Ltd
Priority to CN202210539850.3A priority Critical patent/CN114881221A/en
Publication of CN114881221A publication Critical patent/CN114881221A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00: Digital computers in general; Data processing equipment in general
    • G06F 15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163: Interprocessor communication
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Neurology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure provides a mapping scheme optimization method and apparatus, an electronic device, and a readable storage medium. The method includes: acquiring an initial first mapping scheme, where the first mapping scheme is used for mapping a first neural network to be executed to a plurality of processing cores of a many-core system, each processing core is used for executing at least one neuron of the first neural network, and the first mapping scheme includes inter-core connections among the processing cores; reconstructing the inter-core connections in the mapping scheme of the (n-1)-th inter-core reconstruction to obtain the mapping scheme of the n-th inter-core reconstruction; judging whether the mapping scheme of the n-th inter-core reconstruction satisfies an inter-core optimization condition; and, when the mapping scheme of the n-th inter-core reconstruction satisfies the preset inter-core optimization condition, determining an optimized second mapping scheme according to the mapping scheme of the n-th inter-core reconstruction. According to the embodiments of the disclosure, inter-core data transfer can be reduced and the routing pressure of the many-core system relieved.

Description

Mapping scheme optimization method and device, electronic equipment and readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a mapping scheme optimization method and apparatus, an electronic device, and a readable storage medium.
Background
The popularity of intelligent applications has made neural network computing increasingly important. However, it is difficult for general-purpose processors to meet the large computation and memory demands of neural networks; for this reason, the related art has developed many-core systems with neural-network acceleration architectures. When a many-core system executes a neural network, the many neurons of the network are distributed spatially over different processing cores, yielding a mapping scheme. Because data transfer between processing cores is carried out by the routing system (network on chip) of the many-core system, and a neural network involves a large amount of data transfer, the volume of data transferred between processing cores is large, and an unoptimized mapping scheme impairs the efficiency with which the many-core system executes the neural network.
Disclosure of Invention
The disclosure provides a mapping scheme optimization method and device based on a many-core system, electronic equipment and a readable storage medium.
In a first aspect, the present disclosure provides a mapping scheme optimization method based on a many-core system, including:
obtaining an initial first mapping scheme, where the first mapping scheme is used for mapping a first neural network to be executed to a plurality of processing cores of the many-core system, each processing core is used for executing at least one neuron of the first neural network, and the first mapping scheme includes inter-core connections among the processing cores;
reconstructing the inter-core connections in the mapping scheme of the (n-1)-th inter-core reconstruction to obtain the mapping scheme of the n-th inter-core reconstruction, where n is an integer greater than or equal to 1 and the mapping scheme of the 0-th inter-core reconstruction is the first mapping scheme;
judging whether the mapping scheme of the n-th inter-core reconstruction satisfies an inter-core optimization condition; and
when the mapping scheme of the n-th inter-core reconstruction satisfies the preset inter-core optimization condition, determining an optimized second mapping scheme according to the mapping scheme of the n-th inter-core reconstruction.
In a second aspect, the present disclosure provides a many-core system-based mapping scheme optimization apparatus, including:
an obtaining module configured to obtain an initial first mapping scheme, where the first mapping scheme is used to map a first neural network to be executed to a plurality of processing cores of the many-core system, each processing core is used to execute at least one neuron of the first neural network, and the first mapping scheme includes inter-core connections among the processing cores;
a reconstruction module configured to reconstruct the inter-core connections in the mapping scheme of the (n-1)-th inter-core reconstruction to obtain the mapping scheme of the n-th inter-core reconstruction, where n is an integer greater than or equal to 1 and the mapping scheme of the 0-th inter-core reconstruction is the first mapping scheme;
a judging module configured to judge whether the mapping scheme of the n-th inter-core reconstruction satisfies an inter-core optimization condition; and
a determining module configured to determine an optimized second mapping scheme according to the mapping scheme of the n-th inter-core reconstruction when that scheme satisfies the preset inter-core optimization condition.
In a third aspect, the present disclosure provides an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor, the one or more computer programs being executable by the at least one processor to enable the at least one processor to perform the above-described many-core system-based mapping scheme optimization method.
In a fourth aspect, the present disclosure provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor/processing core, implements the above-described mapping scheme optimization method based on a many-core system.
According to the embodiments provided by the disclosure, the inter-core connections in the mapping scheme of the (n-1)-th inter-core reconstruction are reconstructed to obtain the mapping scheme of the n-th inter-core reconstruction; whether that scheme satisfies the inter-core optimization condition is then judged; and, when it satisfies the preset inter-core optimization condition, the optimized second mapping scheme is determined from it. The mapping scheme of the neural network is thereby optimized and made more internally clustered, inter-core data transfer is reduced, the routing pressure of the many-core system is relieved, and the efficiency with which the many-core system executes the neural network is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
fig. 1 is a flowchart of a mapping scheme optimization method based on a many-core system according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a mapping scheme obtained by mapping a neural network onto different processing cores of a many-core system in an embodiment of the present disclosure;
fig. 3 is a flowchart of an inter-core reconfiguration optimization method provided in an embodiment of the present disclosure;
fig. 4 is a schematic diagram illustrating a change of a mapping scheme in an inter-core pruning optimization process according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of another method for inter-core reconstruction optimization according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram illustrating a change of a mapping scheme in an inter-core reconstruction process according to an embodiment of the present disclosure;
fig. 7 is a flowchart of a method for optimizing intra-core reconnection according to an embodiment of the present disclosure;
fig. 8 is a flowchart of a method for optimizing intra-core reconnection according to an embodiment of the present disclosure;
fig. 9 is a schematic diagram illustrating a change process of a mapping scheme in an intra-core reconnection optimization process according to an embodiment of the present disclosure;
FIG. 10 is a flowchart of a mapping scheme optimization method provided by an embodiment of the present disclosure;
FIG. 11 is a schematic diagram illustrating a change process of a mapping scheme in two rounds of inter-core reconfiguration optimization and intra-core reconnection optimization processes according to an embodiment of the present disclosure;
fig. 12 is a block diagram of a mapping scheme optimization apparatus based on a many-core system according to an embodiment of the present disclosure;
fig. 13 is a block diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
To facilitate a better understanding of the technical aspects of the present disclosure, exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, wherein various details of the embodiments of the present disclosure are included to facilitate an understanding, and they should be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Current neural network mapping schemes mostly consider factors such as memory and load balancing of the many-core system and pay little attention to routing factors, so the resulting mapping schemes place high pressure on the routing system.
According to the mapping scheme optimization method disclosed by the embodiment of the disclosure, the mapping scheme can be optimized, and the routing pressure is reduced, so that the efficiency of the many-core system in executing the neural network is improved.
The mapping scheme optimization method according to the embodiments of the disclosure may be performed by an electronic device such as a terminal device or a server. The terminal device may be a vehicle-mounted device, User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a wearable device, or the like. Alternatively, the method may be performed by a server.
In some embodiments, a many-core system may include multiple processing units, which may be chips or processing cores. That is, the many-core system includes a plurality of chips, each chip for executing a sub-task to be processed; alternatively, the many-core system includes a plurality of processing cores, each processing core for executing a sub-task to be processed, wherein the sub-task to be processed may be at least one neuron of a neural network. For convenience of explanation, the following examples are described with reference to processing cores.
In some embodiments, for a neural network (including a plurality of neurons) to be executed, a corresponding mapping scheme may be generated by an external device (e.g., a compiler) to map the neural network to all or part of the processing cores or chips of the many-core system, thereby enabling the many-core system to perform processing tasks corresponding to the neural network.
Fig. 1 is a flowchart of a mapping scheme optimization method based on a many-core system according to an embodiment of the present disclosure. Referring to fig. 1, a mapping scheme optimization method based on a many-core system provided in an embodiment of the present disclosure includes:
in step S101, an initial first mapping scheme is obtained.
The first mapping scheme is used for mapping a first neural network to be executed to a plurality of processing cores of a many-core system, each processing core is used for executing at least one neuron of the first neural network, and the first mapping scheme comprises inter-core connection among the processing cores.
In some embodiments, the many-core system includes a plurality of processing cores, and physical parameters such as storage capacity, operation capacity, and the like of the processing cores may be the same or different. The first mapping scheme is obtained by mapping the first neural network to be executed to each processing core of the many-core system. Illustratively, the first mapping scheme is a scheme determined by mapping the first neural network to the many-core system based on physical parameters of the many-core system. In a first mapping scheme, one processing core may map one or more neurons in a first neural network, where information transmission between neurons located in different processing cores needs to occupy routing resources of a many-core system, and information transmission between neurons located in the same processing core does not occupy routing resources of the many-core system. Thus, an intra-core connection in a first neural network is a connection between neurons located in the same processing core, and an inter-core connection in the first neural network is a connection between neurons located in different processing cores.
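The distinction drawn above between intra-core and inter-core connections can be sketched in a few lines; the representation (a neuron-to-core dictionary and an edge list) is illustrative, not taken from the patent:

```python
# Hypothetical representation of a mapping scheme: each neuron is assigned
# to a processing core, and each neuron-to-neuron connection is classified
# by whether its endpoints share a core.
def classify_connections(core_of, connections):
    """Split connections into intra-core ones (same core, no routing cost)
    and inter-core ones (different cores, occupy routing resources)."""
    intra, inter = [], []
    for src, dst in connections:
        (intra if core_of[src] == core_of[dst] else inter).append((src, dst))
    return intra, inter

core_of = {"n1": 0, "n2": 0, "n3": 1}        # neuron -> processing core
connections = [("n1", "n2"), ("n2", "n3")]   # neuron-to-neuron edges
intra, inter = classify_connections(core_of, connections)
```

Only the `inter` list contributes to routing pressure, which is why the optimization below targets inter-core connections.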
In the embodiment of the disclosure, in order to reduce occupation of routing resources of a many-core system, the first mapping scheme may be optimized, and the optimized second mapping scheme may be obtained.
In some embodiments, the first mapping scheme includes not only the processing cores involved in the first neural network and inter-core connections between the processing cores; processing nodes corresponding to the neurons in the first neural network and intranuclear connections between the processing nodes are also included. Each processing node corresponds to at least one neuron.
Fig. 2 is a schematic diagram of a mapping scheme obtained by mapping a neural network onto different processing cores of a many-core system in an embodiment of the present disclosure. In fig. 2, the blocks represent processing cores, the black dots represent processing nodes, the processing nodes correspond to one or a group (layer) of neurons, and the connecting lines between the processing nodes represent data transfer. The mapping scheme shown in fig. 2 includes four processing cores, namely a first processing core 21, a second processing core 22, a third processing core 23 and a fourth processing core 24, wherein each of the first processing core 21, the second processing core 22 and the third processing core 23 includes three processing nodes, and the fourth processing core 24 is provided with four processing nodes. Nine inter-core connections are arranged between the four processing cores, wherein one inter-core connection 201 is arranged between the first processing core 21 and the second processing core 22, two inter-core connections 202A and 202B are arranged between the first processing core 21 and the third processing core 23, one inter-core connection 203 is arranged between the first processing core 21 and the fourth processing core 24, two inter-core connections 204A and 204B are arranged between the second processing core 22 and the third processing core 23, two inter-core connections 205A and 205B are arranged between the second processing core 22 and the fourth processing core 24, and one inter-core connection 206 is arranged between the third processing core 23 and the fourth processing core 24. In the mapping scheme shown in fig. 2, there are many inter-core connections between processing cores and the mapping scheme is complex, so that the mapping scheme needs to be optimized to relieve the routing pressure of the many-core system.
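The inter-core connection count of Fig. 2 can be tallied directly from the core pairs listed above (a small illustrative check, with cores numbered 1 to 4 as in the figure):

```python
from collections import Counter

# The nine inter-core connections of the Fig. 2 mapping, keyed by the
# (unordered) pair of processing cores each one joins.
inter_core = [(1, 2),                  # connection 201
              (1, 3), (1, 3),          # connections 202A, 202B
              (1, 4),                  # connection 203
              (2, 3), (2, 3),          # connections 204A, 204B
              (2, 4), (2, 4),          # connections 205A, 205B
              (3, 4)]                  # connection 206
per_pair = Counter(inter_core)         # connections per core pair
total = sum(per_pair.values())         # total inter-core connections
```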
In step S102, the inter-core connection in the mapping scheme reconstructed between cores at the (n-1) th time is reconstructed, and a mapping scheme reconstructed between cores at the nth time is obtained.
Here n is an integer greater than or equal to 1, and the mapping scheme of the 0-th inter-core reconstruction is the first mapping scheme of step S101; that is, when the mapping scheme is optimized, the scheme used in the first pass is the initial mapping scheme corresponding to the first neural network to be executed.
In some embodiments, the inter-core connections may be reconstructed by: deleting inter-core connections with smaller weights from the mapping scheme of the (n-1)-th inter-core reconstruction; deleting more costly inter-core connections; or deleting longer inter-core connections and correspondingly adding shorter ones; the disclosure does not limit the manner of reconstruction. In this way, after the inter-core connections in the mapping scheme of the (n-1)-th inter-core reconstruction are reconstructed, the mapping scheme of the n-th inter-core reconstruction is obtained.
In step S103, it is determined whether the mapping scheme reconstructed between cores for the nth time satisfies an inter-core optimization condition.
And returning to the step S102 to reconstruct the inter-core connection again under the condition that the mapping scheme reconstructed among the n-th cores does not meet the preset inter-core optimization condition. In case that the mapping scheme reconstructed among the n-th cores satisfies the preset inter-core optimization condition, step S104 is performed.
In step S104, when the mapping scheme reconstructed between the nth cores meets the preset inter-core optimization condition, the optimized second mapping scheme is determined according to the mapping scheme reconstructed between the nth cores.
In some embodiments, the inter-core optimization condition is that the number of cycles of the inter-core optimization reaches a preset number of cycles. Illustratively, when the preset number of cycles is N: if n is less than N in step S103, the method returns to step S102; if n equals N, the mapping scheme of the n-th inter-core reconstruction is determined as the second mapping scheme. Here N is a positive integer.
In some embodiments, the inter-core optimization condition is that the inter-core reconstructed mapping scheme satisfies the routing load requirements. Illustratively, a route load threshold is preset, and when the mapping scheme reconstructed among the cores at the nth time can meet the preset route load threshold, the mapping scheme reconstructed among the cores at the nth time is determined as the second mapping scheme.
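The two stopping criteria above (a preset cycle count, or a routing-load threshold) can be sketched as the iteration loop of steps S102 to S104; `reconstruct` and `route_load` below are placeholder callables standing in for the reconstruction step and the load measurement, not APIs from the patent:

```python
def optimize_inter_core(scheme, reconstruct, route_load,
                        max_cycles=None, load_threshold=None):
    """Iterate steps S102-S104: reconstruct the inter-core connections until
    either the preset cycle count is reached or the routing load of the
    reconstructed scheme drops to the preset threshold."""
    n = 0
    while True:
        n += 1
        scheme = reconstruct(scheme)   # n-th inter-core reconstruction (S102)
        if max_cycles is not None and n >= max_cycles:
            break                      # cycle-count condition met (S103/S104)
        if load_threshold is not None and route_load(scheme) <= load_threshold:
            break                      # routing-load condition met (S103/S104)
    return scheme                      # basis for the second mapping scheme
```

In practice at least one of the two conditions must be supplied, otherwise the loop would not terminate.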
According to the embodiments of the disclosure, the inter-core connections in the mapping scheme of the (n-1)-th inter-core reconstruction are reconstructed to obtain the mapping scheme of the n-th inter-core reconstruction; whether that scheme satisfies the inter-core optimization condition is then judged; and, when it satisfies the preset inter-core optimization condition, the optimized second mapping scheme is determined according to the mapping scheme of the n-th inter-core reconstruction.
In the embodiment of the disclosure, in the optimized second mapping scheme, the inter-core connection of the first neural network is re-determined, and the inter-core connection is reduced, so that the routing pressure of the many-core system is reduced under the condition that the accuracy requirement of the first neural network is met. And mapping the optimized second mapping scheme to the many-core system, and when the many-core system is used for executing the first neural network, the precision requirement of the first neural network can be met, and the routing pressure of the many-core system can be reduced.
Fig. 3 is a flowchart of an inter-core reconfiguration optimization method according to an embodiment of the present disclosure. The method for optimizing the inter-core reconstruction optimizes the mapping scheme in an inter-core pruning mode. As shown in fig. 3, reconstructing inter-core connections in the mapping scheme reconstructed between cores at the n-1 st time in step S102, and obtaining the mapping scheme reconstructed between cores at the n-th time includes:
step S301, determining the connection cost and the connection weight of the connection between the cores in the mapping scheme reconstructed between the cores at the (n-1) th time.
The connection cost indicates a cost of connection between processing cores corresponding to the inter-core connection, that is, a routing resource that needs to be occupied by the inter-core connection when the mapping scheme is used to execute the neural network. The connection weight indicates the weight of the connection between the neurons corresponding to the inter-core connection, i.e. the size of the contribution of this inter-core connection in all inter-core connections involved in the mapping scheme when executing the neural network.
In some embodiments, the connection cost is determined from the data traffic per unit time A of the inter-core connection and the distance D between the corresponding start-point processing core and end-point processing core of the inter-core connection; that is, the connection cost cost(c) is a function of A and D, as shown in equation (1):
cost(c) = f(A, D)    (1)
where cost(c) denotes the connection cost, f() denotes the cost function, A denotes the data traffic per unit time, and D denotes the distance between the start-point processing core and the end-point processing core. The disclosure does not limit the specific form of the cost function.
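As a hedged example, since the patent leaves f unspecified, one simple choice is the product of traffic and distance, with Manhattan distance on a 2D mesh as the (assumed) distance metric, a common layout for networks on chip:

```python
def connection_cost(A, D):
    """One plausible instantiation of cost(c) = f(A, D): the patent does
    not fix f, so a simple product of traffic and distance is assumed."""
    return A * D

def core_distance(src, dst):
    """Manhattan distance between two cores at 2D mesh coordinates
    (the distance metric is likewise an assumption, not from the patent)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])
```

With this choice a connection carrying 4 units of traffic between cores at (0, 0) and (2, 1) costs 4 * 3 = 12.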
Step S302, setting an mth cost threshold and an mth first weight threshold of the internuclear connection, wherein m is greater than or equal to 1 and is an integer.
In some embodiments, the cost threshold Cost_th of the inter-core connection and the first weight threshold W_th of the inter-core connection can be set in each cycle as appropriate.
Step S303, deleting the inter-core connection of which the connection cost is greater than the mth cost threshold and the absolute value of the connection weight is less than the mth first weight threshold, so as to obtain the mth inter-core pruning mapping scheme.
In some embodiments, every inter-core connection whose connection cost is greater than the m-th cost threshold and whose connection weight has an absolute value smaller than the m-th first weight threshold is deleted; that is, connections that are expensive for the routing system yet contribute little to the network's output are removed, yielding the m-th inter-core pruning mapping scheme.
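The pruning criterion of step S303 amounts to a filter over the connection list; the dictionary fields and the identifiers below (echoing the labels of Fig. 4) are illustrative, not from the patent:

```python
def prune_inter_core(connections, cost_th, weight_th):
    """Step S303 as a filter: delete every inter-core connection whose cost
    exceeds the m-th cost threshold AND whose |weight| is below the m-th
    first weight threshold; all other connections are kept."""
    return [c for c in connections
            if not (c["cost"] > cost_th and abs(c["weight"]) < weight_th)]

conns = [{"id": "402A", "cost": 9.0, "weight": 0.01},   # costly and weak: pruned
         {"id": "402B", "cost": 9.0, "weight": 0.80},   # costly but strong: kept
         {"id": "401",  "cost": 1.0, "weight": 0.01}]   # weak but cheap: kept
kept = prune_inter_core(conns, cost_th=5.0, weight_th=0.1)
```

Note that both conditions must hold for a connection to be deleted; a cheap connection survives however small its weight, and a strong connection survives however costly it is.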
And S304, retraining the second neural network corresponding to the mth internuclear pruning mapping scheme by using the training data set to obtain the mth retrained second neural network.
In some embodiments, the loss function employed to train the second neural network with the training data set includes a connection cost regularization term and a connection weight regularization term.
Illustratively, the training loss function Loss is equation (2); the published formula is an image, so the form below is reconstructed from the surrounding definitions:
Loss = Loss' + Σ_{c∈C} cost(c) + Σ_{c∈C} |w(c)|    (2)
where Loss' denotes the original loss function on the training data set, c denotes an inter-core connection, C denotes all inter-core connections in the mapping scheme, the sum over cost(c) is the connection-cost regularization term, and the sum over |w(c)|, with w(c) the connection weight, is the connection-weight regularization term.
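A minimal sketch of evaluating such a regularized loss; the separate coefficients `lam_cost` and `lam_w` for the two regularization terms are assumptions for illustration, not taken from the patent:

```python
def regularized_loss(base_loss, connections, lam_cost=1.0, lam_w=1.0):
    """Hedged sketch of equation (2): the original loss Loss' plus a
    connection-cost regularization term and a connection-weight (L1)
    regularization term summed over all inter-core connections C.
    The coefficients lam_cost and lam_w are illustrative assumptions."""
    cost_term = sum(c["cost"] for c in connections)          # Σ cost(c)
    weight_term = sum(abs(c["weight"]) for c in connections)  # Σ |w(c)|
    return base_loss + lam_cost * cost_term + lam_w * weight_term
```

Training against this loss pushes the weights of costly inter-core connections toward zero, so the subsequent pruning step of S303 removes them with little accuracy loss.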
Step S305, judging whether the precision of the mth retrained second neural network reaches a preset precision threshold, and returning to execute the step S302 under the condition that the precision of the mth retrained second neural network does not reach the preset precision threshold, namely, setting the cost threshold and the first weight threshold of the inter-core connection again; and executing the step S306 under the condition that the precision of the mth retrained second neural network reaches a preset precision threshold value.
And S306, under the condition that the precision of the mth retrained second neural network reaches a preset precision threshold, determining the mapping scheme of the internuclear pruning corresponding to the mth retrained second neural network as the mapping scheme of the nth internuclear reconstruction.
Fig. 4 is a schematic diagram illustrating a change of a mapping scheme in an inter-core pruning optimization process according to an embodiment of the present disclosure. In FIG. 4, the boxes represent processing cores and the black dots represent processing nodes. The mapping scheme shown in fig. 4 includes four processing cores, namely a first processing core 41, a second processing core 42, a third processing core 43, and a fourth processing core 44. In fig. 4 (a), which is a mapping scheme before inter-core pruning, an inter-core connection 401 is arranged between the first processing core 41 and the second processing core 42, two inter-core connections 402A and 402B are arranged between the first processing core 41 and the third processing core 43, an inter-core connection 403 is arranged between the first processing core 41 and the fourth processing core 44, two inter-core connections 404A and 404B are arranged between the second processing core 42 and the third processing core 43, two inter-core connections 405A and 405B are arranged between the second processing core 42 and the fourth processing core 44, and an inter-core connection 406 is arranged between the third processing core 43 and the fourth processing core 44.
Assuming that step S303 determines that the connection cost of the inter-core connection 402A, the inter-core connection 403, and the inter-core connection 405B is greater than the mth cost threshold and the absolute value of the connection weight is smaller than the mth first weight threshold, the inter-core connection 402A, the inter-core connection 403, and the inter-core connection 405B are deleted (i.e., inter-core pruning), and the second neural network obtained after the pruning is retrained. Under the condition that the precision of the mth retrained second neural network reaches the preset precision threshold, the obtained inter-core connections of the mth inter-core pruning mapping scheme include inter-core connections 401, inter-core connections 402B, inter-core connections 404A, inter-core connections 404B, inter-core connections 405A, and inter-core connections 406, as shown in (B) of fig. 4.
Optimizing the mapping scheme by inter-core pruning makes the mapping scheme more internally aggregated: inter-core connections are reduced, so inter-core data transmission is reduced, and the routing pressure of the mapping scheme is lowered.
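As a rough illustration, the pruning criterion just described (a connection is deleted only when its routing cost exceeds a cost threshold AND its weight magnitude falls below a weight threshold) can be sketched as follows. All names, fields, and threshold values here are hypothetical, not taken from the patent:

```python
# Hedged sketch of the inter-core pruning criterion: prune a connection only
# when its routing cost is high AND its weight magnitude is low.
from dataclasses import dataclass


@dataclass
class Connection:
    src_core: int
    dst_core: int
    weight: float
    cost: float  # e.g. routing distance weighted by traffic volume


def prune_inter_core(connections, cost_threshold, weight_threshold):
    """Return the connections that survive inter-core pruning."""
    return [
        c for c in connections
        if not (c.cost > cost_threshold and abs(c.weight) < weight_threshold)
    ]


conns = [
    Connection(1, 3, weight=0.02, cost=5.0),  # high cost, tiny weight -> pruned
    Connection(1, 2, weight=0.80, cost=6.0),  # high cost but large weight -> kept
    Connection(2, 3, weight=0.01, cost=1.0),  # tiny weight but cheap -> kept
]
kept = prune_inter_core(conns, cost_threshold=3.0, weight_threshold=0.1)
```

Note that both conditions must hold for a deletion, mirroring the text: an expensive connection that carries a large weight is preserved, as is a weak but cheap one.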
In some mapping schemes, an inter-core connection may be long, that is, the starting processing core and the terminating processing core of the connection are far apart. The longer an inter-core connection, the greater the routing burden; the shorter the connection, the smaller the burden. Replacing long inter-core connections with shorter ones therefore also reduces inter-core data transfer.
In some embodiments, reconstructing the inter-core connections in the mapping scheme of the (n-1)-th inter-core reconstruction to obtain the mapping scheme of the n-th inter-core reconstruction includes: reconstructing the inter-core connections in the mapping scheme of the (n-1)-th inter-core reconstruction by using the Hebb rule to obtain the mapping scheme of the n-th inter-core reconstruction.
Fig. 5 is a flowchart of another inter-core reconstruction optimization method according to an embodiment of the present disclosure. The inter-core reconstruction optimization method is used for optimizing a mapping scheme in an inter-core reconnection manner. As shown in Fig. 5, reconstructing the inter-core connections in the mapping scheme of the (n-1)-th inter-core reconstruction by using the Hebb rule to obtain the mapping scheme of the n-th inter-core reconstruction includes:
Step S501, determining the processing cores and the processing nodes in each processing core in the mapping scheme of the (n-1)-th inter-core reconstruction.
Wherein each processing node comprises at least one neuron.
Step S502, reconstructing the inter-core connection between any two processing nodes according to a preset connection probability to obtain a third mapping scheme.
Wherein the two processing nodes involved in the reconstructed inter-core connection are located in different processing cores.
In some embodiments, the connection probability is inversely related to the distance between the processing cores where the two processing nodes are located; that is, the closer two processing nodes in different processing cores are, the higher the connection probability, and vice versa. Determining inter-core connections by this connection probability keeps the inter-core connections as short as possible, thereby reducing the routing pressure.
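A minimal sketch of this distance-dependent reconnection probability follows. The patent only states that probability and distance are inversely related; the exponential decay, the Manhattan distance metric, and the decay constant below are all assumptions for illustration:

```python
# Hedged sketch of step S502's connection probability: the chance of
# reconnecting two processing nodes decays with the distance between
# their processing cores on the chip grid.
import math
import random


def manhattan(core_a, core_b):
    """Grid distance between two cores given as (row, col) coordinates."""
    return abs(core_a[0] - core_b[0]) + abs(core_a[1] - core_b[1])


def connect_probability(core_a, core_b, decay=0.5):
    """Assumed exponential decay; the patent only requires an inverse relation."""
    return math.exp(-decay * manhattan(core_a, core_b))


def maybe_reconnect(core_a, core_b, rng=random):
    """Sample whether an inter-core connection is created between two nodes."""
    return rng.random() < connect_probability(core_a, core_b)


# Closer cores get a higher reconnection probability.
p_near = connect_probability((0, 0), (0, 1))  # distance 1
p_far = connect_probability((0, 0), (2, 3))   # distance 5
```

Sampling connections this way biases the reconstructed topology toward short links, which is exactly the routing-pressure reduction the text argues for.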
Step S503, retraining the third neural network corresponding to the third mapping scheme by using the training data set to obtain the retrained third neural network.
In some embodiments, the loss function employed to train the third neural network with the training data set includes a connection cost regularization term and a connection weight regularization term.
The training loss function Loss may be the function shown in formula (2), which is not repeated here.
Step S504, deleting the inter-core connections whose connection weights are smaller than a preset second weight threshold in the retrained third neural network to obtain the mapping scheme of the n-th inter-core reconstruction.
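Steps S503 and S504 can be sketched together: train with a loss that adds a connection-cost term and a connection-weight term to the task loss, then drop connections whose weights fall below the second weight threshold. Formula (2) is not reproduced in this excerpt, so the cost-weighted L1 and plain L1 terms below are assumptions, as are the coefficient names:

```python
# Hedged sketch of the regularized loss (L190-L191) and the weight-threshold
# pruning of step S504. The exact regularizers of formula (2) are assumed.

def regularized_loss(task_loss, weights, costs, lam_cost=1e-3, lam_weight=1e-4):
    """task_loss plus a connection-cost term and a connection-weight term.

    weights: per-connection weights; costs: per-connection routing costs.
    """
    cost_term = sum(c * abs(w) for w, c in zip(weights, costs))
    weight_term = sum(abs(w) for w in weights)
    return task_loss + lam_cost * cost_term + lam_weight * weight_term


def prune_by_weight(weights, threshold):
    """Step S504: zero out connections whose |weight| is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


loss = regularized_loss(task_loss=1.0, weights=[0.5, -0.2], costs=[2.0, 4.0],
                        lam_cost=0.01, lam_weight=0.1)
pruned = prune_by_weight([0.5, -0.05, 0.2], threshold=0.1)
```

The cost-weighted term pushes expensive (long or busy) connections toward zero during retraining, so the subsequent threshold pruning preferentially removes exactly the connections that burden routing.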
Fig. 6 is a schematic diagram illustrating a change of a mapping scheme in an inter-core reconstruction process according to an embodiment of the present disclosure. In fig. 6, the boxes represent processing cores, the black dots represent processing nodes, and the mapping scheme shown in fig. 6 includes six processing cores, i.e., a first processing core 61, a second processing core 62, a third processing core 63, a fourth processing core 64, a fifth processing core 65, and a sixth processing core 66. As shown in fig. 6 (a), in the original mapping scheme before performing the inter-core reconfiguration, an inter-core connection 601 is provided between the first processing core 61 and the second processing core 62, an inter-core connection 602 is provided between the first processing core 61 and the fourth processing core 64, an inter-core connection 603 is provided between the third processing core 63 and the fourth processing core 64, an inter-core connection 604 is provided between the second processing core 62 and the fifth processing core 65, and an inter-core connection 605 is provided between the fifth processing core 65 and the sixth processing core 66, where the first processing core 61 and the fourth processing core 64 are far apart from each other, and the length of the corresponding inter-core connection 602 is long. 
After the inter-core reconfiguration, an inter-core connection 607 is reconfigured between the first processing core 61 and the sixth processing core 66, an inter-core connection 608 is reconfigured between the fourth processing core 64 and the fifth processing core 65, the first processing core 61 and the sixth processing core 66, and the fourth processing core 64 and the fifth processing core 65 are closer to each other, the length of the inter-core connection 607 and the length of the inter-core connection 608 are shorter, and the inter-core connection 607 and the inter-core connection 608 are used for replacing the inter-core connection 602, so that the length of the inter-core connection is shortened, and the routing burden is reduced.
In step S104, the mapping scheme of the n-th inter-core reconstruction may be directly determined as the second mapping scheme. However, since this scheme is obtained merely by reconstructing inter-core connections, which may reduce the precision of the neural network (the inter-core pruning mapping scheme is especially prone to precision loss), some means may be adopted to restore the precision of the neural network.
It should be noted that the first mapping scheme further includes an intra-core connection, i.e., an intra-core connection between processing nodes in each processing core, and each processing node includes at least one neuron. In some embodiments, the mapping scheme reconstructed among the nth cores may be optimized in an intra-core reconnection manner, and the optimized second mapping scheme is determined, so that the capacity of the neural network is improved, and the precision of the neural network is improved.
Fig. 7 is a flowchart of a method for optimizing intra-core reconnection according to an embodiment of the present disclosure. The intra-core reconnection optimization method further optimizes the mapping scheme of the inter-core reconstruction in an intra-core reconnection manner to compensate for the influence of the inter-core reconstruction on the neural network. As shown in Fig. 7, in step S104, determining the optimized second mapping scheme according to the mapping scheme of the n-th inter-core reconstruction includes:
step S701, determining the intra-core connection between the processing nodes in each processing core in the mapping scheme of the (r-1) th intra-core reconnection.
Wherein r is greater than or equal to 1 and r is an integer, and the mapping scheme of the 0th intra-core reconnection is the mapping scheme of the n-th inter-core reconstruction.
In some embodiments, all processing cores in the mapping scheme for the r-1 th intra-core reconnection are determined, and then the processing nodes in each processing core and the intra-core connections between the processing nodes are determined.
Step S702, connecting unconnected processing nodes in at least one processing core to obtain the intermediate mapping scheme of the r-th intra-core reconnection.
In some embodiments, all the unconnected processing nodes in one or each processing core may be connected and part of the intra-core connections then deleted randomly, or the unconnected processing nodes in one or each processing core may be selectively connected, so as to obtain the intermediate mapping scheme of the r-th intra-core reconnection.
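The first variant of step S702 (fully connect the unconnected node pairs inside a core, then randomly delete a portion of the new connections) might be sketched as follows. The function names, the 50% drop ratio, and the fixed random seed are illustrative assumptions:

```python
# Hedged sketch of step S702: enumerate the unconnected node pairs inside
# one processing core, then keep a random subset of them as new intra-core
# connections.
import itertools
import random


def reconnect_intra_core(nodes, existing, drop_ratio=0.5, rng=None):
    """nodes: node ids in one core; existing: set of frozenset node pairs
    already connected. Returns the updated intra-core topology."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    candidates = [
        frozenset(pair) for pair in itertools.combinations(nodes, 2)
        if frozenset(pair) not in existing
    ]
    # Randomly discard roughly `drop_ratio` of the candidate connections.
    added = [p for p in candidates if rng.random() >= drop_ratio]
    return existing | set(added)


core_nodes = [1, 2, 3, 4]
existing = {frozenset({1, 3})}  # the one pre-existing intra-core connection
new_topology = reconnect_intra_core(core_nodes, existing)
```

Since intra-core connections do not traverse the network-on-chip, adding them raises network capacity (and thus recoverable precision) without adding routing traffic, which is the rationale the surrounding text gives.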
Step S703, retraining the fourth neural network corresponding to the intermediate mapping scheme of the r-th intra-core reconnection by using the training data set to obtain the r-th retrained fourth neural network.
In some embodiments, the Loss function employed in retraining may be a raw Loss function, such as Loss' mentioned above.
Step S704, taking the intra-core reconnection mapping scheme corresponding to the r-th retrained fourth neural network as the mapping scheme of the r-th intra-core reconnection.
Step S705, judging whether the r-th retrained fourth neural network meets the intra-core optimization condition; if not, returning to step S702; if yes, executing step S706.
Step S706, under the condition that the mapping scheme of the (r) th intra-core reconnection meets the intra-core optimization condition, determining the mapping scheme of the (r) th intra-core reconnection as the optimized second mapping scheme.
According to the embodiment of the disclosure, the mapping scheme is reconstructed in an intra-core reconnection mode, so that the capacity of the neural network is improved, and the accuracy reduction of the neural network caused by inter-core reconstruction is compensated, so that the routing load is reduced by the mapping scheme, and the accuracy of the neural network is not influenced. In the optimized second mapping scheme, the inter-core connection and the intra-core connection of the first neural network are re-determined, and the inter-core connection is reduced by increasing the intra-core connection, so that the routing pressure of the many-core system is reduced under the condition of meeting the precision requirement of the first neural network. And mapping the optimized second mapping scheme to the many-core system, and when the many-core system is used for executing the first neural network, the precision requirement of the first neural network can be met, and the routing pressure of the many-core system can be reduced.
Fig. 8 is a flowchart of another method for optimizing intra-core reconnection according to an embodiment of the present disclosure. This method likewise optimizes the mapping scheme of the inter-core reconstruction in an intra-core reconnection manner to compensate for the influence of the inter-core reconstruction on the neural network, but differs in how the intra-core reconnection is constructed. As shown in Fig. 8, after the r-th retrained fourth neural network is obtained in step S704, determining the optimized second mapping scheme according to the mapping scheme of the n-th inter-core reconstruction in step S104 further includes:
Step S801, deleting the intra-core connections whose connection weights are smaller than a preset third weight threshold in the (p-1)-th intra-core mapping scheme to obtain the p-th intra-core mapping scheme.
Wherein p is greater than or equal to 1 and p is an integer, and the 0th intra-core mapping scheme is the intra-core reconnection mapping scheme corresponding to the r-th retrained fourth neural network.
Step S802, retraining the fifth neural network corresponding to the p-th intra-core mapping scheme by using the training data set to obtain the p-th retrained fifth neural network.
In some embodiments, the loss function employed in retraining may be the original loss function, such as Loss' mentioned above.
Step S803, judging whether the p-th retrained fifth neural network meets the preset precision condition; if not, returning to step S801; if yes, executing step S804.
Step S804, when the p-th retrained fifth neural network meets the preset precision condition, determining the intra-core mapping scheme corresponding to the p-th retrained fifth neural network as the mapping scheme of the r-th intra-core reconnection. The preset precision condition may be, for example, that the processing precision of the neural network on the test samples reaches a precision threshold; the specific setting of the precision condition is not limited in the present disclosure.
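The S801-S804 loop (prune below-threshold intra-core connections, retrain, repeat until the precision condition holds) can be sketched as a small driver. The `retrain` and `evaluate` callables below are stand-ins for the actual training and test procedures, and the toy numbers are purely illustrative:

```python
# Hedged sketch of the Fig. 8 loop: repeatedly delete weak intra-core
# connections and retrain until the precision condition is met.

def prune_and_retrain(weights, threshold, retrain, evaluate,
                      precision_target, max_rounds=10):
    for _ in range(max_rounds):
        # Step S801: drop connections whose |weight| is below the threshold.
        weights = [w for w in weights if abs(w) >= threshold]
        # Step S802: retrain the pruned network.
        weights = retrain(weights)
        # Steps S803/S804: stop once the precision condition is met.
        if evaluate(weights) >= precision_target:
            return weights
    return weights


# Toy stand-ins: "retraining" slightly boosts the surviving weights, and
# "precision" is modeled as the mean weight magnitude.
result = prune_and_retrain(
    weights=[0.05, 0.4, 0.6, 0.02],
    threshold=0.1,
    retrain=lambda ws: [w * 1.05 for w in ws],
    evaluate=lambda ws: sum(abs(w) for w in ws) / len(ws),
    precision_target=0.5,
)
```

Because pruning starts from the already-trained reconnection scheme and only removes weak weights, each round needs far less retraining than building the intra-core connections from scratch, matching the speed-up claimed below.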
In the embodiment of the present disclosure, the p-th intra-core mapping scheme is determined based on the connection weights on the basis of the (p-1)-th intra-core mapping scheme. Compared with the intra-core reconnection optimization method shown in Fig. 7, the fifth neural network can meet the intra-core optimization condition faster, shortening the training time.
Fig. 9 is a schematic diagram of a change process of a mapping scheme in an intra-core reconnection optimization process according to an embodiment of the present disclosure. In fig. 9, the boxes represent processing cores and the black dots represent processing nodes. The mapping scheme shown in fig. 9 includes four processing cores, namely a first processing core 91, a second processing core 92, a third processing core 93, and a fourth processing core 94. In fig. 9 (a) is a mapping scheme before the intra-core reconnection, in the first processing core 91, an intra-core connection 9101 is provided between the first processing node and the third processing node, in the second processing core 92, an intra-core connection 9201 is provided between the second processing node and the third processing node, in the third processing core 93, an intra-core connection 9301 is provided between the first processing node and the third processing node, in the fourth processing core 94, an intra-core connection 9401 is provided between the first processing node and the fourth processing node, and an intra-core connection 9402 is provided between the third processing node and the fourth processing node.
After the intra-core reconnection optimization, as shown in Fig. 9 (b), an intra-core connection 9102 between the first processing node and the second processing node is added in the first processing core 91, an intra-core connection 9202 between the first processing node and the third processing node is added in the second processing core 92, and an intra-core connection 9403 between the first processing node and the second processing node and an intra-core connection 9404 between the second processing node and the fourth processing node are added in the fourth processing core 94. The intra-core connections 9102, 9202, 9403, and 9404 compensate for the effect of inter-core pruning on the accuracy of the neural network.
In some embodiments, after a round of inter-core reconstruction and intra-core reconnection determines the second mapping scheme, particularly the second mapping scheme determined when the number of cycles of the inter-core optimization reaches a preset number of cycles, the second mapping scheme may be further optimized to further reduce the routing pressure of the many-core system and improve the accuracy of the neural network.
Fig. 10 is a flowchart of a mapping scheme optimization method according to an embodiment of the present disclosure. In this method, after the second mapping scheme is obtained, at least one further round of inter-core reconstruction and intra-core reconnection is performed, which further reduces the routing pressure of the many-core system while improving the precision of the neural network. As shown in Fig. 10, after determining the optimized second mapping scheme, the method further includes:
and S1001, performing the inter-core reconstruction optimization on the mapping scheme after the optimization of the (k-1) th round to obtain the inter-core optimized mapping scheme which meets the inter-core optimization conditions and is subjected to the optimization of the k round.
And k is more than or equal to 2 and is an integer, and the mapping scheme after the 1 st round of optimization is a second mapping scheme.
In step S1001, for the specific manner of performing inter-core reconstruction optimization on the mapping scheme after the (k-1)-th round of optimization, refer to steps S102 to S104, which are not repeated here.
Step S1002, performing intra-core reconnection optimization on the k-th-round inter-core optimized mapping scheme to obtain the k-th-round intra-core optimized mapping scheme that meets the intra-core optimization condition.
In step S1002, for the specific manner of performing intra-core reconnection optimization on the k-th-round inter-core optimized mapping scheme, refer to steps S701 to S706, which are not repeated here.
Step S1003, judging whether the k-th-round intra-core optimized mapping scheme meets a preset target condition; if yes, executing step S1004; if not, returning to step S1001.
Step S1004, when the k-th-round intra-core optimized mapping scheme meets the preset target condition, determining the k-th-round intra-core optimized mapping scheme as the target mapping scheme.
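The outer loop of Fig. 10 alternates the two optimizations until the target condition holds, and can be sketched as a short driver. The two optimization passes and the target test are placeholders for the procedures described earlier; the numeric example treats the scheme as a single routing-pressure score, which is an assumption for illustration only:

```python
# Hedged sketch of the Fig. 10 outer loop: alternate inter-core reconstruction
# (step S1001) and intra-core reconnection (step S1002) until the preset
# target condition is met (steps S1003/S1004).

def optimize_mapping(scheme, inter_core_opt, intra_core_opt, target_met,
                     max_rounds=100):
    for _ in range(max_rounds):
        scheme = inter_core_opt(scheme)   # step S1001
        scheme = intra_core_opt(scheme)   # step S1002
        if target_met(scheme):            # steps S1003/S1004
            return scheme                 # target mapping scheme
    return scheme


# Toy example: "scheme" is a routing-pressure score that each pass lowers.
final = optimize_mapping(
    scheme=10.0,
    inter_core_opt=lambda s: s - 2.0,
    intra_core_opt=lambda s: s - 1.0,
    target_met=lambda s: s <= 1.0,
)
```

The `max_rounds` guard reflects the "preset number of cycles" mentioned earlier: even if the target condition is never met, the loop terminates with the best scheme found so far.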
Fig. 11 is a schematic diagram of a change process of a mapping scheme in two rounds of inter-core reconstruction optimization and intra-core reconnection optimization processes provided by the embodiment of the present disclosure. In fig. 11, the blocks represent processing cores, the black dots represent processing nodes, and the mapping scheme shown in fig. 11 includes four processing cores, i.e., a first processing core 1101, a second processing core 1102, a third processing core 1103, and a fourth processing core 1104.
In the original mapping scheme before the inter-core reconfiguration is not performed, as shown in fig. 11 (a), one inter-core connection 111 is provided between the first processing core 1101 and the second processing core 1102, two inter-core connections 112A and 112B are provided between the first processing core 1101 and the third processing core 1103, one inter-core connection 113 is provided between the first processing core 1101 and the fourth processing core 1104, two inter-core connections 114A and 114B are provided between the second processing core 1102 and the third processing core 1103, two inter-core connections 115A and 115B are provided between the second processing core 1102 and the fourth processing core 1104, and one inter-core connection 116 is provided between the third processing core 1103 and the fourth processing core 1104.
In the first processing core 1101, an intra-core connection 1301 is provided between the first processing node and the third processing node, in the second processing core 1102, an intra-core connection 2301 is provided between the second processing node and the third processing node, in the third processing core 1103, an intra-core connection 3301 is provided between the first processing node and the third processing node, in the fourth processing core 1104, an intra-core connection 4301 is provided between the first processing node and the fourth processing node, and an intra-core connection 4302 is provided between the third processing node and the fourth processing node.
After the first round of inter-core reconfiguration optimization, the inter-core connection 112A, the inter-core connection 113, and the inter-core connection 115B are deleted, and the mapping scheme includes the inter-core connection 111, the inter-core connection 112B, the inter-core connection 114A, the inter-core connection 114B, the inter-core connection 115A, and the inter-core connection 116, as shown in fig. 11 (B).
After the first round of intra-core reconnection optimization, in the first processing core 1101, an intra-core connection 1302 between the first processing node 1201 and the third processing node 1203 is added; in the fourth processing core 1104, an intra-core connection 4303 between the first processing node 4201 and the second processing node 4202, and an intra-core connection 4304 between the second processing node 4202 and the fourth processing node 4204 are added, as shown in Fig. 11 (c).
After the second round of inter-core reconfiguration optimization, the inter-core connection 114A between the second processing core 1102 and the third processing core 1103 is deleted, and the mapping scheme includes an inter-core connection 111, an inter-core connection 112B, an inter-core connection 114B, an inter-core connection 115A, and an inter-core connection 116, as shown in (d) in fig. 11.
After the second round of intra-core reconnection optimization, an intra-core connection 1303 between the second processing node and the third processing node is added in the first processing core 1101, an intra-core connection 2302 between the first processing node and the second processing node is added in the second processing core 1102, and an intra-core connection 3302 between the second processing node and the third processing node is added in the third processing core 1103, as shown in Fig. 11 (e).
The embodiment of the disclosure further optimizes the mapping scheme of the neural network through multi-round inter-core reconstruction and intra-core reconnection, further reduces the routing pressure of the many-core system, and improves the precision of the neural network.
The neural networks (the first to fifth neural networks) provided by the embodiments of the present disclosure are used for performing any one of an image processing task, a speech processing task, a text processing task, and a video processing task; the specific type of task performed by the neural network is not limited in the present disclosure.
It is understood that the above method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from principle and logic; details are omitted here due to space limitations. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
Fig. 12 is a block diagram of a mapping scheme optimization apparatus based on a many-core system according to an embodiment of the present disclosure.
Referring to fig. 12, an embodiment of the present disclosure provides a many-core system-based mapping scheme optimization apparatus, including:
the obtaining module 121 is configured to obtain an initial first mapping scheme, where the first mapping scheme is used to map a first neural network to be executed to a plurality of processing cores of a many-core system, each processing core is used to execute at least one neuron of the first neural network, and the first mapping scheme includes inter-core connections between the processing cores.
The reconstruction module 122 is configured to reconstruct the inter-core connections in the mapping scheme of the (n-1)-th inter-core reconstruction to obtain the mapping scheme of the n-th inter-core reconstruction, where n is greater than or equal to 1 and n is an integer, and the mapping scheme of the 0th inter-core reconstruction is the first mapping scheme.
A judging module 123, configured to judge whether the mapping scheme reconstructed between the n-th kernels meets an inter-kernel optimization condition;
the determining module 124 is configured to determine the optimized second mapping scheme according to the mapping scheme reconstructed among the nth cores when the mapping scheme reconstructed among the nth cores meets the preset inter-core optimization condition.
The mapping scheme optimization device based on the many-core system provided by the embodiment of the disclosure can be used for realizing any one of the mapping scheme optimization methods based on the many-core system provided by the disclosure, and the corresponding technical scheme and the description thereof are referred to the corresponding record of the method part, and are not described again.
According to the embodiment provided by the disclosure, an obtaining module obtains an initial first mapping scheme, a reconstructing module reconstructs the inter-core connection in the mapping scheme reconstructed between the cores for the (n-1) th time, an nth-time inter-core reconstructed mapping scheme is obtained, a judging module judges whether the nth-time inter-core reconstructed mapping scheme meets the inter-core optimization condition, and a determining module determines an optimized second mapping scheme according to the nth-time inter-core reconstructed mapping scheme under the condition that the nth-time inter-core reconstructed mapping scheme meets the preset inter-core optimization condition.
Fig. 13 is a block diagram of an electronic device provided in an embodiment of the present disclosure.
Referring to Fig. 13, an embodiment of the present disclosure provides an electronic device including: at least one processor 1301; and a memory 1302 communicatively coupled to the at least one processor 1301. The memory 1302 stores one or more computer programs executable by the at least one processor 1301, and the one or more computer programs are executed by the at least one processor 1301 to enable the at least one processor 1301 to perform the above mapping scheme optimization method based on the many-core system.
In some embodiments, the electronic device may be a brain-inspired chip, which may adopt a vectorized calculation manner and needs to load parameters such as the weight information of the neural network model from an external memory, for example, a Double Data Rate (DDR) synchronous dynamic random access memory. The embodiments of the present disclosure therefore achieve high operation efficiency in batch processing.
The disclosed embodiments also provide a computer readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor/processing core, implements the above-mentioned mapping scheme optimization method based on a many-core system. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.
The disclosed embodiments also provide a computer program product, which includes computer readable code or a non-transitory computer readable storage medium carrying computer readable code, and when the computer readable code runs in a processor of an electronic device, the processor in the electronic device executes the above mapping scheme optimization method based on a many-core system.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, Random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), Static Random Access Memory (SRAM), flash memory or other memory technology, portable compact disc read-only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer. In addition, communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry that can execute the computer-readable program instructions, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), implements aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry.
The computer program product described herein may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (14)

1. A mapping scheme optimization method based on a many-core system is characterized by comprising the following steps:
obtaining an initial first mapping scheme, wherein the first mapping scheme is used for mapping a first neural network to be executed to a plurality of processing cores of the many-core system, each processing core is used for executing at least one neuron of the first neural network, and the first mapping scheme includes inter-core connection between the processing cores;
reconstructing the inter-core connection in the mapping scheme of the (n-1)th inter-core reconstruction to obtain the mapping scheme of the nth inter-core reconstruction, wherein n is greater than or equal to 1 and is an integer, and the mapping scheme of the 0th inter-core reconstruction is the first mapping scheme;
judging whether the mapping scheme of the nth-time inter-core reconstruction meets the inter-core optimization condition;
and under the condition that the mapping scheme of the nth inter-core reconstruction meets the preset inter-core optimization condition, determining the optimized second mapping scheme according to the mapping scheme of the nth inter-core reconstruction.
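Outside the claim language, the iterative procedure of claim 1 can be sketched as follows. This is an illustrative sketch only: the `reconstruct` and `meets_condition` callables are placeholders (assumptions) standing in for the inter-core reconstruction step and the inter-core optimization condition that the claim leaves abstract.

```python
def inter_core_optimize(first_scheme, reconstruct, meets_condition, max_iter=100):
    """Sketch of claim 1: iterate inter-core reconstruction starting from the
    first mapping scheme (the 0th reconstruction) until the inter-core
    optimization condition is met; that scheme is the basis of the optimized
    second mapping scheme. `max_iter` is a safety bound, not in the claim."""
    scheme = first_scheme  # the 0th inter-core reconstructed mapping scheme
    for _ in range(max_iter):
        scheme = reconstruct(scheme)       # nth inter-core reconstruction
        if meets_condition(scheme):        # inter-core optimization condition
            return scheme
    return scheme
```

With toy stand-ins (a scheme represented as an integer), the loop terminates as soon as the condition holds.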
2. The many-core system-based mapping scheme optimization method of claim 1, wherein reconstructing inter-core connections in the n-1 st inter-core reconstructed mapping scheme to obtain the nth inter-core reconstructed mapping scheme comprises:
determining connection cost and connection weight of each inter-core connection in the mapping scheme of the n-1 th inter-core reconstruction, wherein the connection cost indicates the cost of connection between processing cores corresponding to the inter-core connection, and the connection weight indicates the weight of connection between neurons corresponding to the inter-core connection;
setting an mth cost threshold and an mth first weight threshold of the inter-core connection, wherein m is greater than or equal to 1 and is an integer;
deleting the inter-core connection whose connection cost is greater than the mth cost threshold and whose absolute value of connection weight is less than the mth first weight threshold, to obtain an mth inter-core pruning mapping scheme;
retraining the second neural network corresponding to the mth inter-core pruning mapping scheme by using a training data set to obtain an mth retrained second neural network;
and under the condition that the precision of the mth retrained second neural network reaches a preset precision threshold, determining the inter-core pruning mapping scheme corresponding to the mth retrained second neural network as the mapping scheme for the nth inter-core reconstruction.
3. The many-core system-based mapping scheme optimization method of claim 2, wherein the connection cost of the inter-core connection is determined based on data traffic per unit time of the inter-core connection and a distance between a starting-point processing core and an ending-point processing core corresponding to the inter-core connection.
4. The many-core system-based mapping scheme optimization method of claim 1, wherein reconstructing inter-core connections in the n-1 st inter-core reconstructed mapping scheme to obtain the nth inter-core reconstructed mapping scheme comprises:
and reconstructing the inter-core connection in the mapping scheme of the (n-1)th inter-core reconstruction by utilizing a Hebbian rule to obtain the mapping scheme of the nth inter-core reconstruction.
5. The many-core system-based mapping scheme optimization method according to claim 4, wherein reconstructing the inter-core connection in the mapping scheme of the (n-1)th inter-core reconstruction by using a Hebbian rule to obtain the mapping scheme of the nth inter-core reconstruction comprises:
determining processing cores and processing nodes in each processing core in the mapping scheme of the (n-1)th inter-core reconstruction, wherein each processing node comprises at least one neuron;
reconstructing the inter-core connection of any two processing nodes according to a preset connection probability to obtain a third mapping scheme, wherein the two processing nodes are located in different processing cores;
retraining a third neural network corresponding to the third mapping scheme by using a training data set to obtain a retrained third neural network;
deleting the inter-core connection whose connection weight is smaller than a preset second weight threshold in the retrained third neural network, to obtain the mapping scheme of the nth inter-core reconstruction.
6. The many-core system-based mapping scheme optimization method of claim 5, wherein the connection probability is inversely related to the distance between the processing cores corresponding to two of the processing nodes.
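Claims 5–6 describe proposing new inter-core connections between node pairs with a probability that is inversely related to the distance between their cores. The sketch below is an assumption-laden illustration: the names, the `base_prob / (1 + d)` decay, and the Manhattan metric are all choices of this sketch, since the claims state only that the probability is inversely related to the distance.

```python
import random

def manhattan(a, b):
    # Assumed distance metric between two core grid positions.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def reconnect_hebbian(node_cores, rng=None, base_prob=0.5):
    """Sketch of claims 5-6: propose inter-core connections between pairs of
    processing nodes located in DIFFERENT cores, with probability decaying
    as core distance grows. `node_cores` maps node id -> core position."""
    rng = rng or random.Random(0)
    proposals = []
    nodes = list(node_cores.items())
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            (u, cu), (v, cv) = nodes[i], nodes[j]
            if cu == cv:
                continue  # claim 5: the two nodes must be in different cores
            p = base_prob / (1 + manhattan(cu, cv))  # inverse relation, claim 6
            if rng.random() < p:
                proposals.append((u, v))
    return proposals
```

In the method of claim 5, the proposed connections would then be trained (the Hebbian step) and the weak ones pruned against the second weight threshold.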
7. The many-core system-based mapping scheme optimization method of claim 2, wherein the loss function used to train the second neural network with the training data set comprises a connection cost regularization term and a connection weight regularization term.
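The loss function of claim 7 combines the task loss with a connection-cost regularization term and a connection-weight regularization term. The following is a minimal sketch under stated assumptions: the cost term is taken as cost-weighted L1 on the inter-core weights and the weight term as plain L1, and the coefficients are placeholders; the claim does not fix any of these forms.

```python
def regularized_loss(task_loss, conn_costs, conn_weights, lam_cost=1e-3, lam_w=1e-4):
    """Sketch of claim 7's loss: task loss plus a connection-cost regularizer
    (pushes weights on expensive inter-core connections toward zero, making
    them prunable under claim 2) plus a connection-weight L1 regularizer."""
    cost_term = sum(c * abs(w) for c, w in zip(conn_costs, conn_weights))
    l1_term = sum(abs(w) for w in conn_weights)
    return task_loss + lam_cost * cost_term + lam_w * l1_term
```

The design intent is that retraining under this loss drives high-cost, low-importance connections below the mth first weight threshold, so the pruning of claim 2 removes exactly the connections that are expensive for the routing fabric.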
8. The many-core system-based mapping scheme optimization method of claim 1, wherein the inter-core optimization condition is that the number of inter-core optimization loops reaches a preset number of loops, or that the inter-core reconstructed mapping scheme meets a routing load requirement.
9. The many-core system-based mapping scheme optimization method of claim 1, wherein the first mapping scheme further comprises intra-core connections between processing nodes within each of the processing cores, each of the processing nodes comprising at least one neuron;
wherein the determining an optimized second mapping scheme according to the mapping scheme reconstructed between the nth kernels includes:
determining the intra-core connection between the processing nodes in each processing core in the mapping scheme of the (r-1)th intra-core reconnection, wherein r is greater than or equal to 1 and is an integer, and the mapping scheme of the 0th intra-core reconnection is the mapping scheme of the nth inter-core reconstruction;
connecting unconnected processing nodes in at least one processing core to obtain an intermediate mapping scheme of the rth intra-core reconnection;
retraining a fourth neural network corresponding to the intermediate mapping scheme of the rth intra-core reconnection by using a training data set to obtain an rth retrained fourth neural network, and using the intra-core reconnection mapping scheme corresponding to the rth retrained fourth neural network as the mapping scheme of the rth intra-core reconnection;
and under the condition that the mapping scheme of the rth intra-core reconnection meets the intra-core optimization condition, determining the mapping scheme of the rth intra-core reconnection as the optimized second mapping scheme.
10. The method of claim 9, wherein after obtaining the rth retrained fourth neural network, the determining an optimized second mapping scheme according to the mapping scheme of the nth inter-core reconstruction further comprises:
deleting the intra-core connections whose connection weight is smaller than a preset fifth weight threshold in the (p-1)th intra-core mapping scheme to obtain a pth intra-core mapping scheme, wherein p is greater than or equal to 1 and is an integer, and the 0th intra-core mapping scheme is the intra-core reconnection mapping scheme corresponding to the rth retrained fourth neural network;
retraining a fifth neural network corresponding to the pth intra-core mapping scheme by using a training data set to obtain a pth retrained fifth neural network;
and under the condition that the pth retrained fifth neural network meets a preset precision condition, determining the intra-core mapping scheme corresponding to the pth retrained fifth neural network as the mapping scheme of the rth intra-core reconnection.
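The reconnection step of claim 9 — adding intra-core connections between processing nodes of the same core that are not yet connected — can be sketched as a set operation. The adjacency representation (an undirected edge set of node-id pairs) and the grouping structure are assumptions of this sketch, not part of the claim.

```python
def intra_core_reconnect(adj, core_nodes):
    """Sketch of claim 9's reconnection step: for each processing core, add an
    intra-core connection between every pair of its processing nodes that is
    not already connected. `adj` is a set of (u, v) pairs treated as
    undirected; `core_nodes` maps core id -> list of node ids in that core."""
    new_adj = set(adj)
    for nodes in core_nodes.values():
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                u, v = nodes[i], nodes[j]
                if (u, v) not in new_adj and (v, u) not in new_adj:
                    new_adj.add((u, v))
    return new_adj
```

Per claims 9–10, the densified scheme is then retrained, and intra-core connections falling below the fifth weight threshold are pruned back out before the next retraining pass.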
11. The many-core system-based mapping scheme optimization method of claim 9 or 10, wherein after determining the optimized second mapping scheme, the method further comprises:
performing inter-core reconstruction optimization on the mapping scheme after the (k-1)th round of optimization to obtain a kth-round optimized inter-core mapping scheme meeting the inter-core optimization condition, wherein k is greater than or equal to 2 and is an integer, and the mapping scheme after the 1st round of optimization is the second mapping scheme;
performing intra-core reconnection optimization on the kth-round optimized inter-core mapping scheme to obtain a kth-round optimized intra-core mapping scheme meeting the intra-core optimization condition;
and under the condition that the kth-round optimized intra-core mapping scheme meets a preset target condition, determining the kth-round optimized intra-core mapping scheme as a target mapping scheme.
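Claim 11 alternates the inter-core optimization of claims 1–8 with the intra-core optimization of claims 9–10, round by round, until a target condition holds. The sketch below illustrates only the control flow; every callable is a placeholder (assumption) for the corresponding claimed procedure.

```python
def optimize_mapping(scheme, inter_core_step, intra_core_step,
                     inter_done, intra_done, target_met, max_rounds=10):
    """Sketch of claim 11: each round k first drives the scheme to the
    inter-core optimization condition, then to the intra-core optimization
    condition, and stops when the preset target condition is met.
    `max_rounds` is a safety bound, not part of the claim."""
    for _ in range(max_rounds):
        while not inter_done(scheme):
            scheme = inter_core_step(scheme)   # inter-core reconstruction (claims 1-8)
        while not intra_done(scheme):
            scheme = intra_core_step(scheme)   # intra-core reconnection (claims 9-10)
        if target_met(scheme):                 # preset target condition
            break
    return scheme                              # target mapping scheme
```

With toy integer stand-ins the alternation is easy to trace: the inter-core phase runs until its condition holds, then the intra-core phase, then the target check.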
12. A mapping scheme optimization device based on a many-core system is characterized by comprising:
an obtaining module, configured to obtain an initial first mapping scheme, where the first mapping scheme is used to map a first neural network to be executed to a plurality of processing cores of the many-core system, each processing core is used to execute at least one neuron of the first neural network, and the first mapping scheme includes inter-core connections between the processing cores;
a reconstruction module, configured to reconstruct the inter-core connection in the mapping scheme of the (n-1)th inter-core reconstruction to obtain the mapping scheme of the nth inter-core reconstruction, wherein n is greater than or equal to 1 and is an integer, and the mapping scheme of the 0th inter-core reconstruction is the first mapping scheme;
a judging module, configured to judge whether the mapping scheme of the nth inter-core reconstruction meets the inter-core optimization condition;
and a determining module, configured to determine an optimized second mapping scheme according to the mapping scheme of the nth inter-core reconstruction, under the condition that the mapping scheme of the nth inter-core reconstruction meets the preset inter-core optimization condition.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores one or more computer programs executable by the at least one processor to enable the at least one processor to perform the many-core system-based mapping scheme optimization method of any of claims 1-11.
14. A computer-readable storage medium, having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the many-core system based mapping scheme optimization method of any of claims 1-11.
CN202210539850.3A 2022-05-18 2022-05-18 Mapping scheme optimization method and device, electronic equipment and readable storage medium Pending CN114881221A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210539850.3A CN114881221A (en) 2022-05-18 2022-05-18 Mapping scheme optimization method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN114881221A true CN114881221A (en) 2022-08-09

Family

ID=82676663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210539850.3A Pending CN114881221A (en) 2022-05-18 2022-05-18 Mapping scheme optimization method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114881221A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115099395A (en) * 2022-08-25 2022-09-23 北京灵汐科技有限公司 Neural network construction method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN109063825B (en) Convolutional neural network accelerator
US20210374503A1 (en) Network-centric architecture and algorithms to accelerate distributed training of neural networks
CN105260776B (en) Neural network processor and convolutional neural networks processor
US20210089871A1 (en) Processing system and method for binary weight convolutional neural network
CN107256424B (en) Three-value weight convolution network processing system and method
CN114915630B (en) Task allocation method, network training method and device based on Internet of Things equipment
CN107944545B (en) Computing method and computing device applied to neural network
CN109993275B (en) Signal processing method and device
CN114970814A (en) Processing method and processing device of neural network computation graph
CN116310667B (en) Self-supervision visual characterization learning method combining contrast loss and reconstruction loss
CN114841323A (en) Processing method and processing device of neural network computation graph
CN114626503A (en) Model training method, target detection method, device, electronic device and medium
CN114881221A (en) Mapping scheme optimization method and device, electronic equipment and readable storage medium
CN117811586A (en) Data encoding method and device, data processing system, device and medium
CN113608881A (en) Memory allocation method, device, equipment, readable storage medium and program product
CN115473841A (en) Method and device for determining network path and storage medium
JP7418570B2 (en) Method and apparatus for multirate neural image compression using stackable nested model structures
CN111344719A (en) Data processing method and device based on deep neural network and mobile device
CN111738424B (en) Neural network processing method and device, electronic equipment and storage medium
CN110097184B (en) Information processing method and information processing system
CN115099395B (en) Neural network construction method, device, equipment and medium
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
CN115983372A (en) Neural network training method and device, computing equipment and storage medium
US20220414458A1 (en) Deep learning network device, memory access method and non-volatile storage medium
KR20240036594A (en) Subsum management and reconfigurable systolic flow architectures for in-memory computation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination