CN115883442A

CN115883442A - Method and device for determining data transmission path and electronic equipment

Info

Publication number: CN115883442A
Application number: CN202211510876.1A
Authority: CN
Inventors: 张子豪
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2022-11-29
Filing date: 2022-11-29
Publication date: 2023-03-31

Abstract

The invention discloses a method and a device for determining a data transmission path and electronic equipment, and relates to the technical field of artificial intelligence. The method comprises: acquiring at least one piece of data to be transmitted, wherein the at least one piece of data to be transmitted is data communicated among different processing units contained in a target computing system; and sequentially inputting the at least one piece of data to be transmitted into a target path planning model according to a target calculation sequence, and outputting a target path of each piece of data to be transmitted, wherein the target path planning model is obtained by training based on a preset algorithm, the target path is the path with the shortest link contention time within a preset range, and the link contention time is the waiting time required when the data to be transmitted is transmitted along the links of the current route set. The invention solves the technical problem of communication link contention in network-on-chip-based computing systems in the prior art.

Description

Method and device for determining data transmission path and electronic equipment

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a method and a device for determining a data transmission path and electronic equipment.

Background

Owing to their high performance and low energy consumption, heterogeneous computing systems (HCSs) based on a network-on-chip (NoC) are widely used for processing computing tasks. NoC-based HCSs are composed of processing elements (PEs) with different properties. As more and more cores are integrated on a chip, contention for communication links during on-chip communication between different PEs becomes increasingly serious, which adds unnecessary link waiting time and reduces the computational efficiency of the whole HCS.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiment of the invention provides a method and a device for determining a data transmission path and electronic equipment, which are used for at least solving the technical problem of communication link contention in a computing system based on a network on chip in the prior art.

According to an aspect of an embodiment of the present invention, there is provided a method for determining a data transmission path, including: acquiring at least one piece of data to be transmitted, wherein the at least one piece of data to be transmitted is data communicated among different processing units contained in a target computing system; and sequentially inputting at least one piece of data to be transmitted into a target path planning model according to a target calculation sequence, and outputting a target path of each piece of data to be transmitted, wherein the target path planning model is obtained based on training of a preset algorithm, the target path is a path with the shortest link contention time in a preset range, and the link contention time is the waiting time required by the data to be transmitted along the links of the current route set.

Further, a target path planning model is generated by: acquiring at least one sample data to be transmitted; and training the first neural network and the second neural network of the preset algorithm according to at least one sample data to be transmitted to obtain a target path planning model.

Further, the method for determining the data transmission path further includes: selecting first sample data to be transmitted from the at least one sample data to be transmitted; obtaining the current position of the first sample data to be transmitted in a transmission link to obtain a first router, wherein the first router is used for transmitting the first sample data to be transmitted; determining at least one transmission direction of the first sample data to be transmitted according to the first neural network, wherein the at least one transmission direction indicates how the first sample data to be transmitted is transmitted from the first router to the target router; and recording the at least one transmission direction to obtain a route set of the first sample data to be transmitted.

Further, the method for determining the data transmission path further includes: evaluating the route set according to the second neural network to obtain an evaluation result, wherein the evaluation result is used for representing the link contention time of the route set; and determining a target path of the first sample data to be transmitted based on the evaluation result.

Further, the method for determining the data transmission path further includes: acquiring a link state set occupied by a route in a current route set, wherein the link state set is used for representing the use condition of the link; and calculating the link contention time of the current routing set based on the link state set to obtain an evaluation result.

Further, the method for determining the data transmission path further includes: comparing the link contention time of the current route set with the link contention time of the target route set; updating the target route set under the condition that the link contention time of the current route set is less than the link contention time of the target route set; and generating a target path according to the target route set.

Further, the method for determining the data transmission path further includes: evaluating the at least one transmission direction according to the second neural network to obtain a reward value of the at least one transmission direction, wherein the reward value measures how good or bad the at least one transmission direction is; calculating an advantage value of the at least one transmission direction according to the reward value, wherein the advantage value represents the degree to which the current state of the first sample data to be transmitted is better than a target state, and the target state represents the average value of all states of the first sample data to be transmitted; updating the first neural network and the second neural network based on the advantage value to obtain an updated first neural network and an updated second neural network; and generating the target path planning model when the updated first neural network and the updated second neural network meet preset conditions.

Further, the method for determining the data transmission path further includes: sequencing at least one data to be transmitted based on a preset principle to obtain a sequencing result; and determining the sequence of performing route calculation on at least one data to be transmitted according to the sequencing result to obtain a target calculation sequence.

According to another aspect of the embodiments of the present invention, there is also provided a device for determining a data transmission path, including: the acquisition module is used for acquiring at least one piece of data to be transmitted, wherein the at least one piece of data to be transmitted is data communicated between different processing units contained in a target computing system; the determining module is used for sequentially inputting at least one piece of data to be transmitted into a target path planning model according to a target calculation sequence and outputting a target path of each piece of data to be transmitted, wherein the target path planning model is obtained based on training of a preset algorithm, the target path is a path with shortest link contention time within a preset range, and the link contention time is the waiting time required by the data to be transmitted when the data to be transmitted is transmitted along links of a current route set.

According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the above-mentioned method for determining a data transmission path when running.

According to another aspect of embodiments of the present invention, there is also provided an electronic device, including one or more processors and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method for determining a data transmission path described above.

In the embodiment of the invention, a mode of planning a data transmission path by a reinforcement learning algorithm is adopted, at least one data to be transmitted is firstly obtained, then the at least one data to be transmitted is sequentially input into a target path planning model according to a target calculation sequence, and a target path of each data to be transmitted is output. The target path planning model is obtained based on preset algorithm training, the target path is a path with the shortest link contention time in a preset range, the link contention time is the waiting time needed by the data to be transmitted when the data to be transmitted is transmitted along the link of the current route set, and at least one data to be transmitted is data for communication among different processing units included in the target computing system.

In the process, a data basis is provided for subsequent transmission route calculation by acquiring at least one data to be transmitted; based on the target path planning model, the transmission link of the data to be transmitted can be calculated, so that the transmission link can be planned, the data transmission path with the shortest link contention time can be obtained, the process that the data to be transmitted is transmitted along the transmission link with the shortest link contention time is realized, the communication conflict in the data transmission process is reduced, the waiting time of data transmission is reduced, the calculation efficiency of a calculation system is improved, and the high-speed operation of the calculation system is ensured.

Therefore, through the technical scheme of the invention, the purpose of reducing the communication link contention in the computing system based on the network on chip is achieved, so that the technical effects of reducing the link waiting time and improving the computing efficiency of the computing system are realized, and the technical problem of the communication link contention in the computing system based on the network on chip in the prior art is solved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flowchart of an optional method for determining a data transmission path according to an embodiment of the present invention;

FIG. 2 is a diagram of an optional computing task graph according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an optional architecture of NoC-based HCSs according to an embodiment of the invention;

FIG. 4 is a flowchart of optional route planning based on the advantage actor-critic algorithm according to an embodiment of the present invention;

FIG. 5 is a flowchart of an optional method of determining the operation of a computing task according to an embodiment of the invention;

FIG. 6 is a schematic diagram of an optional device for determining a data transmission path according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of an optional electronic device according to an embodiment of the invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that the related information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) related to the present invention are information and data authorized by the user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or organization, before obtaining the relevant information, an obtaining request needs to be sent to the user or organization through the interface, and after receiving the consent information fed back by the user or organization, the relevant information is obtained.

Example 1

According to an embodiment of the present invention, a method embodiment of a method for determining a data transmission path is provided. It is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, such as a set of computer-executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that presented herein.

Fig. 1 is a flowchart of an alternative data transmission path determining method according to an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:

Step S101, acquiring at least one piece of data to be transmitted, wherein the at least one piece of data to be transmitted is data communicated between different processing units included in the target computing system.

In the above step, the at least one piece of data to be transmitted may be acquired by an application system, a processor, an electronic device, or the like. Optionally, fig. 2 is an optional computing task graph according to an embodiment of the present invention. As shown in fig. 2, the graph includes four circular nodes v_0 to v_3. Each circular node represents a computation task, each computation task needs to be handed to a processing unit (PE) for processing, and the unidirectional edges between the nodes represent the precedence relationships between the tasks. That is, in fig. 2, v_0, v_1, and v_2 are pre-tasks of v_3: task v_3 can begin execution only after v_0, v_1, and v_2 have all been computed and their calculation result data have been transmitted to v_3. Optionally, the data to be transmitted may be this calculation result data.
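The precedence relationship described for fig. 2 can be sketched in a few lines of Python. The dictionary encoding and the helper names `ready_tasks` and `transfers` are assumptions made for illustration only; the source does not prescribe a data structure for the task graph.

```python
# Hypothetical encoding of the fig. 2 task graph: each task maps to its pre-tasks.
predecessors = {
    "v0": [],
    "v1": [],
    "v2": [],
    "v3": ["v0", "v1", "v2"],
}


def ready_tasks(predecessors, finished):
    """Return unfinished tasks whose pre-tasks have all finished."""
    return sorted(
        task
        for task, pres in predecessors.items()
        if task not in finished and all(p in finished for p in pres)
    )


def transfers(predecessors):
    """Each edge (pre-task, task) is one piece of data to be transmitted."""
    return [(p, task) for task, pres in predecessors.items() for p in pres]


print(ready_tasks(predecessors, set()))
print(ready_tasks(predecessors, {"v0", "v1", "v2"}))
print(transfers(predecessors))
```

Under this encoding, v3 only becomes ready once all three pre-tasks are in the finished set, and the three edges into v3 are the three pieces of data whose routes need to be planned.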

It should be noted that, in the above process, by obtaining at least one to-be-transmitted data, a data basis is provided for subsequent transmission route calculation.

Step S102, sequentially inputting the at least one piece of data to be transmitted into a target path planning model according to a target calculation sequence, and outputting a target path of each piece of data to be transmitted, wherein the target path planning model is obtained by training based on a preset algorithm, the target path is the path with the shortest link contention time within a preset range, and the link contention time is the waiting time required when the data to be transmitted is transmitted along the links of the current route set.

In the above step, the preset algorithm is the Advantage Actor-Critic algorithm (A2C), a reinforcement learning algorithm composed of an "Actor" and a "Critic". Reinforcement learning is a branch of machine learning that focuses on training an agent how to act based on the current environment so as to maximize the expected return. The training process of reinforcement learning generally proceeds as follows: at time t, the agent observes the environment and learns that it is in state S_t; the agent then selects an action a_t from the action space A and executes it; the environment then follows the state transition function and moves from state S_t to S_{t+1}. The reward function R(S_t) then gives the value of the new state S_{t+1}, which is used to train the agent to learn the value of this step. This process continues until the current state causes the environment to end (for example, when the environment is a maze, the agent has walked out of the maze) or the developer terminates it early.

Specifically, the Actor and the Critic can be regarded as two neural networks. The Actor takes the state S_t of the agent at time t as input and outputs the action a_t that the agent should perform next in state S_t. The Critic evaluates action a_t, that is, it outputs a value Q(S_t, a_t) that measures the value of taking action a_t in state S_t. The advantage value A(S_t, a_t) is then calculated from Q(S_t, a_t) as A(S_t, a_t) = Q(S_t, a_t) - V(S_t), where V(S_t) is the average value of state S_t, obtained by multiplying the state-action value Q(S_t, a) of each action a that may be taken next in S_t by the probability of taking that action and summing the products.
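The advantage calculation can be illustrated with a small numeric sketch. The Q values and action probabilities below are made-up numbers for a state with two possible actions, and the function names are hypothetical; in the described method these quantities would come from the Critic and the Actor networks.

```python
def state_value(q_values, action_probs):
    """V(S_t): probability-weighted sum of Q(S_t, a) over all actions a."""
    return sum(action_probs[a] * q_values[a] for a in q_values)


def advantage(q_values, action_probs, action):
    """A(S_t, a_t) = Q(S_t, a_t) - V(S_t)."""
    return q_values[action] - state_value(q_values, action_probs)


# Illustrative numbers only: two actions, transmit along X or along Y.
q = {"X": 2.0, "Y": 4.0}
pi = {"X": 0.5, "Y": 0.5}

print(state_value(q, pi))      # 3.0
print(advantage(q, pi, "Y"))   # 1.0
print(advantage(q, pi, "X"))   # -1.0
```

A positive advantage means the chosen action is better than the average action in that state, and a negative advantage means it is worse.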

Optionally, the computation tasks in fig. 2 are mapped to PEs and converted into the schematic diagram of the NoC-based HCS architecture shown in fig. 3, so as to display the routing of data transmission more intuitively. As shown in fig. 3, a circular node is a PE with computing capability. A computing task is allocated to a PE for execution, and after execution finishes, the calculation result data is transmitted to the next computing task (if any) through routers and communication links. A square node is a router connected to a PE and is responsible for the on-chip communication of the NoC. Routers are connected to one another, and each router is connected to its PE, through bidirectional transmission links for data transmission and interaction. Optionally, the link from router m to router n is denoted l_{m,n}.

Optionally, since the routers and the PEs are connected through bidirectional links for data transmission and interaction, link contention may occur. For example, as shown in fig. 3, assume that the three tasks v_0, v_1, and v_2 finish computing at the same time and are about to transmit data to v_3. The data transmission from task v_0 to v_3 needs to pass through links l_{0,1} and l_{1,4}, the data transmission from task v_1 to v_3 also needs to pass through link l_{1,4}, and the same applies to the data transmission from task v_2 to v_3. The three pieces of calculation result data therefore contend for link l_{1,4}, and each can use link l_{1,4} only after the other data transmissions have finished, which reduces the computational efficiency of the entire HCS and adds unnecessary waiting time. Therefore, in the present invention, after the at least one piece of data to be transmitted is acquired, route calculation is performed on each piece of data to be transmitted based on the advantage actor-critic algorithm to obtain a route set for each piece of data to be transmitted, thereby providing a data basis for path planning.
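The contention on link l_{1,4} can be made concrete with a short sketch that counts how many planned transmissions share each link. Links are written as (m, n) tuples for l_{m,n}. Only the route for v_0 to v_3 is given in the text; the routes listed for v_1 and v_2 are hypothetical and are chosen merely so that they also cross l_{1,4}, as the text states.

```python
from collections import Counter

# Routes as lists of links (m, n), standing for l_{m,n}.
routes = {
    ("v0", "v3"): [(0, 1), (1, 4)],   # from the fig. 3 example
    ("v1", "v3"): [(1, 4)],           # hypothetical route
    ("v2", "v3"): [(2, 1), (1, 4)],   # hypothetical route
}


def contended_links(routes):
    """Return every link used by more than one transmission, with its use count."""
    usage = Counter(link for route in routes.values() for link in route)
    return {link: count for link, count in usage.items() if count > 1}


print(contended_links(routes))  # {(1, 4): 3}
```

Any link that appears in this result is one where transmissions would have to wait for each other if they were scheduled at the same time.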

Optionally, the route set may be the set of links through which the calculation result data is transmitted from the current router to the target router. For example, in fig. 3 the data transmission from task v_0 to v_3 needs to pass through links l_{0,1} and l_{1,4}, so links l_{0,1} and l_{1,4} form the route set {l_{0,1}, l_{1,4}}. The preset range may be the number of training rounds set by the developer as required. Taking the transmission from task v_0 to v_3 in fig. 3 as an example, suppose the number of training rounds is set to 10 when route calculation starts. The route set obtained by the first round is {l_{0,1}, l_{1,4}}, which corresponds to one link contention time; the route set obtained by the second round is {l_{0,3}, l_{3,4}}, which also corresponds to a link contention time; and so on until training finishes. The target path is the path with the shortest link contention time among the 10 rounds. Link contention time exists because different PEs require different amounts of time to process different computation tasks and, for a complex task graph, multiple pieces of data may be transmitted in parallel.
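One way to picture how a link contention time could be attached to each candidate route set is sketched below. The cost model is an assumption, not something the source specifies: each link has a hypothetical `busy_until` time, every hop takes a fixed `hop_time`, and the contention time is the total time spent waiting for busy links.

```python
def link_contention_time(route, busy_until, start=0.0, hop_time=1.0):
    """Total waiting time along a route under the assumed busy-until model."""
    now, waited = start, 0.0
    for link in route:
        free_at = busy_until.get(link, 0.0)
        if free_at > now:          # link is occupied: wait until it is released
            waited += free_at - now
            now = free_at
        now += hop_time            # then spend one hop crossing the link
    return waited


def shortest_contention(candidates, busy_until):
    """Pick the candidate route set with the smallest link contention time."""
    return min(candidates, key=lambda route: link_contention_time(route, busy_until))


busy_until = {(1, 4): 3.0}                # assumed: l_{1,4} is occupied until t = 3
candidates = [
    [(0, 1), (1, 4)],                     # route set from the first round
    [(0, 3), (3, 4)],                     # route set from the second round
]

print(link_contention_time(candidates[0], busy_until))  # 2.0
print(link_contention_time(candidates[1], busy_until))  # 0.0
print(shortest_contention(candidates, busy_until))      # [(0, 3), (3, 4)]
```

With these illustrative numbers the second route set avoids the occupied link entirely, so it would be kept as the target path.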

It should be noted that, in the foregoing process, by outputting the target path of each to-be-transmitted data, the to-be-transmitted data can be transmitted along the transmission link with the shortest link contention time, so that communication conflicts in the data transmission process are reduced, the waiting time for data transmission can be reduced, and the calculation efficiency of the computing system is further improved.

Based on the schemes defined in the steps S101 to S102, it can be known that, in the embodiment of the present invention, a data transmission path is planned through a reinforcement learning algorithm, at least one to-be-transmitted data is first obtained, then the at least one to-be-transmitted data is sequentially input into a target path planning model according to a target calculation sequence, and a target path of each to-be-transmitted data is output. The target path planning model is obtained based on preset algorithm training, the target path is a path with the shortest link contention time in a preset range, the link contention time is the waiting time needed by the data to be transmitted when the data to be transmitted is transmitted along the link of the current route set, and at least one data to be transmitted is data for communication among different processing units included in the target computing system.

It is easy to note that, in the above process, by obtaining at least one data to be transmitted, a data basis is provided for the subsequent transmission route calculation; based on the target path planning model, the transmission link of the data to be transmitted can be calculated, so that the transmission link can be planned, the data transmission path with the shortest link contention time can be obtained, the process that the data to be transmitted is transmitted along the transmission link with the shortest link contention time is realized, the communication conflict in the data transmission process is reduced, the waiting time of data transmission is reduced, the calculation efficiency of a calculation system is improved, and the high-speed operation of the calculation system is ensured.

In an alternative embodiment, the target path planning model is generated by: acquiring at least one sample data to be transmitted; and training the first neural network and the second neural network of the preset algorithm according to at least one sample data to be transmitted to obtain a target path planning model.

In an optional embodiment, after obtaining at least one sample data to be transmitted, selecting a first sample data to be transmitted from the at least one sample data to be transmitted, then obtaining a current position of the first sample data to be transmitted in a transmission link, obtaining a first router, then determining at least one transmission direction of the first sample data to be transmitted according to a first neural network, and then recording the at least one transmission direction, so as to obtain a route set of the first sample data to be transmitted. The first router is used for transmitting first sample data to be transmitted, and at least one transmission direction is used for indicating the first sample data to be transmitted to the target router from the first router.

Optionally, although reinforcement learning is widely known as a mature technology in the field of artificial intelligence, due to the complexity of the task graphs currently applied, each task graph usually has tens or even hundreds of data transmissions to be routed, and therefore, it is still difficult to apply reinforcement learning to the routing planning of network-on-chip communication.

Optionally, modeling is performed first: an environment, a training agent, an action space, and a reward are defined so that the reinforcement learning algorithm can be applied to the route planning of network-on-chip communication. Specifically, the reinforcement learning concepts corresponding to the data in the NoC are defined as follows. The environment is defined as the occupancy of the communication links in the NoC; it should be noted that a communication link may be occupied by other data to be transmitted during a certain time period. The training agent is defined as a piece of data m to be transmitted (i.e., sample data to be transmitted). Although data transmission is a continuous process, during route planning the data can be regarded as being transmitted hop by hop between routers, and after the agent reaches a router it needs to take another action to reach the next router. To ensure that the data eventually reaches the destination router, it is specified that the data is never transmitted in the reverse direction, so the training agent has two actions: transmission along the X axis toward the destination router and transmission along the Y axis toward the destination router. The state is defined as the routers that the data has passed through so far. The reward for a state is defined as the inverse of the link contention time spent waiting when data is transmitted along the route corresponding to that state, i.e., the shorter the contention time, the higher the reward.

Optionally, taking the task graph in fig. 2 and the NoC architecture in fig. 3 as an example, assume that task v_0 is to transmit data to v_3, that is, the route from router No. 0 to router No. 4 needs to be calculated. The initial state of the data transmission is S_0 = {0}, i.e., the data stays at router No. 0 at the beginning. At the next time step, the data is transmitted one hop down to router No. 3, and the state is S_1 = {0, 3}; destination router No. 4 has not yet been reached. At the following time step, the data is transmitted one hop to the right and reaches router No. 4; the state is S_2 = {0, 3, 4}, and the data has arrived at the destination router. The state at each time step corresponds to a route; for example, the route represented by state S_2 is that data is transmitted from router No. 0 to router No. 3, and then from router No. 3 to router No. 4.
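The state transitions in this example can be reproduced with a minimal sketch. It assumes that routers are numbered row by row on a mesh with three routers per row, which is consistent with router No. 3 lying below router No. 0 and router No. 4 lying to its right, but the mesh size is not stated in the text. The names `WIDTH`, `step`, and `rollout` are hypothetical.

```python
WIDTH = 3  # assumed number of routers per row (row-major numbering)


def step(router, dest, action, width=WIDTH):
    """Move one hop toward dest along the X axis or the Y axis, never backwards."""
    row, col = divmod(router, width)
    d_row, d_col = divmod(dest, width)
    if action == "X":
        if col == d_col:
            raise ValueError("already aligned with the destination on the X axis")
        col += 1 if d_col > col else -1
    elif action == "Y":
        if row == d_row:
            raise ValueError("already aligned with the destination on the Y axis")
        row += 1 if d_row > row else -1
    else:
        raise ValueError(f"unknown action: {action!r}")
    return row * width + col


def rollout(src, dest, actions):
    """Return the state (routers passed so far, in order) after applying the actions."""
    state = [src]
    for action in actions:
        state.append(step(state[-1], dest, action))
    return state


print(rollout(0, 4, ["Y", "X"]))  # [0, 3, 4], matching S_2 in the example
print(rollout(0, 4, ["X", "Y"]))  # [0, 1, 4], the alternative route
```

The state is kept as an ordered list rather than a set so that the route it represents can be read off directly.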

Optionally, after the reinforcement learning concepts corresponding to the data in the NoC have been defined, a reinforcement learning algorithm may be used for training. Optionally, in the embodiment of the present invention, the advantage actor-critic algorithm (i.e., the preset algorithm) is used for training. Specifically, at time step t, the training agent has reached a router but has not yet reached the final destination, and waits for the "Actor" to tell it in which direction it should transmit at the next time step. The Actor gives an action a_t, the training agent performs action a_t and is transmitted to the next router, and the current state transfers from S_t to S_{t+1}. The "Critic" then gives rewards for states S_t and S_{t+1} according to its current understanding, calculates the advantage value based on the rewards given, and uses the advantage value to update the policies of the actor and the critic so that they can make a smarter choice next time. Optionally, fig. 4 is a flowchart of optional route planning based on the advantage actor-critic algorithm according to an embodiment of the present invention. As shown in fig. 4, the data m to be transmitted and the NoC communication link usage u are given, and the best state S_b is initialized. It is then judged whether the maximum number of training rounds has been reached. If so, S_b is returned; if not, the state of the training agent is initialized to S = S_0. It is then judged whether state S has ended. If so, the current state is compared with the best state S_b, and if S is better, the best state is updated to S_b = S. If state S has not ended, the actor gives action a_t, the agent performs action a_t, the current state transfers from S_t to S_{t+1}, the critic evaluates states S_t and S_{t+1} according to u, the advantage value is calculated, and the policies of the actor and the critic are updated.
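The control flow of fig. 4 can be outlined as a loop that keeps the best state seen so far. The sketch below is not the A2C training itself: the actor is replaced by a uniformly random choice among legal actions, the network updates are reduced to a comment, the mesh is assumed to have three routers per row, and the link usage u is modeled as a hypothetical per-link waiting time. All function names are illustrative.

```python
import random

WIDTH = 3  # assumed routers per row (row-major numbering); not stated in the text


def legal_actions(router, dest, width=WIDTH):
    """Actions that move toward dest without ever going backwards."""
    row, col = divmod(router, width)
    d_row, d_col = divmod(dest, width)
    actions = []
    if col != d_col:
        actions.append("X")
    if row != d_row:
        actions.append("Y")
    return actions


def move(router, dest, action, width=WIDTH):
    """Advance one hop along the chosen axis toward dest."""
    row, col = divmod(router, width)
    d_row, d_col = divmod(dest, width)
    if action == "X":
        col += 1 if d_col > col else -1
    else:
        row += 1 if d_row > row else -1
    return row * width + col


def contention_time(state, usage):
    """Assumed cost model: sum the waiting time u assigns to each link on the route."""
    return sum(usage.get(link, 0.0) for link in zip(state, state[1:]))


def plan_route(src, dest, usage, max_rounds=10, seed=0):
    rng = random.Random(seed)
    best_state, best_time = None, float("inf")  # S_b starts as "no route yet"
    for _ in range(max_rounds):
        state = [src]  # S = S_0
        while state[-1] != dest:
            # Stand-in for the actor: a uniformly random legal action.
            action = rng.choice(legal_actions(state[-1], dest))
            state.append(move(state[-1], dest, action))
            # A real A2C step would have the critic score S_t and S_{t+1}
            # here, compute the advantage, and update both networks.
        elapsed = contention_time(state, usage)
        if elapsed < best_time:  # keep S if it beats S_b
            best_state, best_time = state, elapsed
    return best_state, best_time


usage = {(1, 4): 5.0}  # assumed: waiting 5 time units on l_{1,4}
best_state, best_time = plan_route(0, 4, usage, max_rounds=20)
print(best_state, best_time)
```

Replacing the random choice with a trained policy network, and the comment with the advantage-based updates, would turn this outline into the procedure the flowchart describes.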

Optionally, for a computing task graph that requires multiple data transmissions, each outgoing edge in the task graph represents one data transmission, and the route planning method based on the advantage actor-critic algorithm needs to be called multiple times to calculate a route for each data transmission. An order, namely the target calculation order, therefore first needs to be arranged for the outgoing edges whose routes are to be calculated, so as to determine which outgoing edge's route is calculated first. The specific ordering manner is described in detail later.

Optionally, after obtaining at least one sample data to be transmitted, the data that is first subjected to the routing calculation, that is, the first sample data to be transmitted, is selected, and then the current position of the data in the transmission link is obtained, for example, the sample data to be transmitted stays in the router No. 0 at the beginning, and the current position is the router No. 0, that is, the first router.

Further, at least one transmission direction of the first sample data to be transmitted may be determined according to the first neural network of the preset algorithm. Optionally, the transmission direction may be along the X axis toward the destination router or along the Y axis toward the destination router. Specifically, suppose that the data to be transmitted stays at router No. 0 at the current moment. The initial state is S_0, and the route set in the route planning scheme is empty. The training agent, i.e., the data to be transmitted, then asks the Actor, i.e., the first neural network, which action it should now perform, i.e., whether it should transmit along the X axis or along the Y axis toward the destination router. The Actor selects one action from all the actions that can be performed, say a_0 (transmitting along the X axis toward the destination router), and the training agent executes it. For example, the Actor tells the training agent that the data to be transmitted should be transmitted along the X axis toward the destination router, occupying link l_{0,1} for the transmission.

Further, the at least one transmission direction is recorded to obtain the route set of the first sample data to be transmitted. Specifically, after the training agent, i.e. the data to be transmitted, performs the above action, it arrives at router No. 1, its state changes from S₀ to S₁, and l_{0,1} is added to the route set in the route planning scheme, which becomes {l_{0,1}}. At the next moment, the steps are repeated: the training agent continues to query the Actor, i.e. the first neural network, and this time the Actor tells it that the data to be transmitted should be transmitted along the Y axis toward the destination router, that is, link l_{1,4} is occupied. The training agent performs this action and arrives at router No. 4, its state changes from S₁ to S₂, and l_{1,4} is added to the route set, which becomes {l_{0,1}, l_{1,4}}. Since router No. 4 is the destination of the training agent, S₂ is the terminal state, and the route set {l_{0,1}, l_{1,4}} is the route planned for the data m to be transmitted in this training episode.
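
The walk from router No. 0 to router No. 4 described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: routers are numbered row by row on a mesh of width 3 (so that a Y-axis step from router 1 reaches router 4), only hops that move toward the destination are allowed, and the function name `rollout_route` and the callback `choose_axis` (standing in for the Actor) are hypothetical names that do not appear in the original text.

```python
from typing import Callable, List, Tuple

Link = Tuple[int, int]  # (from_router, to_router); hypothetical representation


def rollout_route(src: int, dst: int, width: int,
                  choose_axis: Callable[[int, int, List[str]], str]) -> List[Link]:
    """Walk from src to dst on a row-major 2D mesh, one hop per step.

    choose_axis plays the role of the Actor: given the current router, the
    destination and the legal axes, it returns "X" or "Y".
    """
    route: List[Link] = []
    cur = src
    while cur != dst:
        cx, cy = cur % width, cur // width
        dx, dy = dst % width, dst // width
        # Assumption: only moves that bring the data closer are legal.
        legal = [axis for axis, differs in (("X", cx != dx), ("Y", cy != dy)) if differs]
        axis = choose_axis(cur, dst, legal)
        if axis not in legal:
            raise ValueError(f"axis {axis!r} is not legal at router {cur}")
        if axis == "X":
            nxt = cur + (1 if dx > cx else -1)
        else:
            nxt = cur + (width if dy > cy else -width)
        route.append((cur, nxt))  # record the occupied link l_{cur,nxt}
        cur = nxt
    return route


def x_first(cur: int, dst: int, legal: List[str]) -> str:
    """Toy stand-in for the Actor: prefer the X axis whenever it is legal."""
    return legal[0]


def y_first(cur: int, dst: int, legal: List[str]) -> str:
    """Toy stand-in for the Actor: prefer the Y axis whenever it is legal."""
    return legal[-1]


print(rollout_route(0, 4, 3, x_first))
```

With the X-first stand-in, the sketch reproduces the route set {l_{0,1}, l_{1,4}} of the example; a trained Actor would replace the toy callbacks.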

In an optional embodiment, after at least one transmission direction is recorded to obtain a route set of the first sample data to be transmitted, the route set is evaluated according to the second neural network to obtain an evaluation result, and then a target path of the first sample data to be transmitted is determined based on the evaluation result. Wherein, the evaluation result is used for characterizing the link contention time of the route set.

In an optional embodiment, in the process of determining the target path of the first sample data to be transmitted based on the evaluation result, the link contention time of the current route set is compared with the link contention time of the target route set, and in the case that the link contention time of the current route set is smaller than the link contention time of the target route set, the target route set is updated, and then the target path is generated according to the target route set.

Optionally, the target route set may be a separately recorded optimal route set. After each new round of training, the model outputs a route set, namely the current route set, and the link contention time of the current route set is compared with that of the separately recorded optimal route set, namely the target route set. If the link contention time of the current route set is less than that of the target route set, the recorded optimal route set is updated, and the target path is then generated from the optimal route set.
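
The bookkeeping described above, in which a separately recorded best route set is replaced only when a strictly smaller link contention time is observed, might look like the following minimal sketch. The class name `BestRouteTracker` and its method `offer` are hypothetical and are introduced only for illustration.

```python
from typing import List, Optional, Tuple

Link = Tuple[int, int]  # (from_router, to_router); hypothetical representation


class BestRouteTracker:
    """Keeps the route set with the smallest contention time seen so far."""

    def __init__(self) -> None:
        self.best_route: Optional[List[Link]] = None
        self.best_time: Optional[int] = None

    def offer(self, route: List[Link], contention_time: int) -> bool:
        """Record the route if it beats the stored one; return True if updated."""
        if self.best_time is None or contention_time < self.best_time:
            self.best_route = list(route)
            self.best_time = contention_time
            return True
        return False


tracker = BestRouteTracker()
tracker.offer([(0, 3), (3, 4)], 7)   # first candidate is always recorded
tracker.offer([(0, 1), (1, 4)], 2)   # smaller contention time: replaces it
tracker.offer([(0, 3), (3, 4)], 5)   # larger contention time: ignored
print(tracker.best_route, tracker.best_time)
```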

In an optional embodiment, in the process of evaluating the route set according to the second neural network to obtain the evaluation result, a link state set occupied by the routes in the current route set is obtained, and then the link contention time of the current route set is calculated based on the link state set to obtain the evaluation result. Wherein the link state set is used for characterizing the use condition of the link.

Optionally, during reinforcement learning, training can proceed only if the link contention time that the transmitted data must wait for is known. Generally, the link contention time is obtained by monitoring clock cycles while the task runs on a real chip or in simulation on a platform such as GEM5. If GEM5 had to be invoked to fully simulate the task graph in every training iteration, a large amount of unnecessary time overhead would be incurred, and the cross-platform interaction overhead would also be a problem. The present invention therefore provides a method for estimating the link contention time which, given a task graph, can be implemented in any programming language.

Specifically, for data m to be transmitted, given a planned route r and the set U of link usage for every communication link in the current NoC, the link contention time that the data must wait for when transmitted along route r is estimated. The time required to transmit the data m depends on the size of m and is denoted T. The end time of the source task of m is denoted t₁, which is the earliest time at which m can start transmission. In the ideal case, the links used by route r are free of contention, so m starts transmission at time t₁ and ends at time t₁ + T. Denoting t₁ + T as t₂, the transmission time interval of m in the ideal case is [t₁, t₂]. In this case, for all links in route r, the time interval [t₁, t₂] is marked in the link usage set U as occupied by data m.

However, in practical scenarios, link contention is unavoidable. At time t₁, the current link usage set U is queried to check whether any communication link needed for transmission along route r is occupied by another data transmission. If a link in route r is occupied by another data transmission, m must wait for a time t_c until all links in r are available. In this case, the transmission time interval of m becomes [t₁ + t_c, t₂ + t_c], where t_c is the estimated link contention time.

Therefore, to calculate t_c, the link usage set U is first queried to determine which links in route r are occupied by other data transmissions within the time interval [t₁, t₂]. Specifically, the state set of the links occupied by all planned routes is obtained, where the state set of the links represents the occupancy of each link over all time periods.

Similarly, when other data occupies a link, U also records a time interval indicating the period during which that data occupies the link. Therefore, among all other data transmissions that occupy links of route r within the time interval [t₁, t₂], the one that ends latest is found, and its transmission end time is denoted t′₂. Subsequently, t′₂ − t₁ is taken as t_c, and it is checked whether all links in route r are available within the time interval [t₁ + t_c, t₂ + t_c]. If so, this t_c is the final estimate; if not, t_c is increased by 1 at a time, and each new time interval [t₁ + t_c, t₂ + t_c] is checked, until all links in route r are available.

Further, after the estimated value of t_c is obtained, the link usage set U is updated: for each link used in route r, the time interval [t₁ + t_c, t₂ + t_c] is marked as occupied by data m, so that the contention time of other routes can be calculated subsequently.
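
The estimation procedure above can be sketched in a few lines. The sketch rests on assumptions that the text does not fix: the link usage set U is modelled as a dictionary from a link to a list of reservations, times are integer clock cycles, T is positive, and reservations are treated as half-open intervals [start, end) so that a transmission may begin in the same cycle in which another one ends, which is what taking t_c = t′₂ − t₁ implies. The names `estimate_contention` and `_overlapping` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Link = Tuple[int, int]                    # (from_router, to_router)
Reservation = Tuple[int, int, Hashable]   # (start, end, owner), half-open [start, end)
Usage = Dict[Link, List[Reservation]]     # the link usage set U


def _overlapping(usage: Usage, route: Sequence[Link],
                 start: int, end: int) -> List[Reservation]:
    """Return every reservation on the route's links that overlaps [start, end)."""
    return [(s, e, who)
            for link in route
            for (s, e, who) in usage.get(link, [])
            if s < end and start < e]


def estimate_contention(usage: Usage, route: Sequence[Link],
                        t1: int, duration: int, msg: Hashable) -> int:
    """Estimate t_c for msg on route, then mark the links as occupied in usage."""
    t2 = t1 + duration
    t_c = 0
    busy = _overlapping(usage, route, t1, t2)
    if busy:
        # First guess: wait until the latest conflicting transmission ends.
        t_c = max(end for _, end, _ in busy) - t1
        # If the shifted window still collides, push it back one cycle at a time.
        while _overlapping(usage, route, t1 + t_c, t2 + t_c):
            t_c += 1
    for link in route:
        usage.setdefault(link, []).append((t1 + t_c, t2 + t_c, msg))
    return t_c


U: Usage = {}
print(estimate_contention(U, [(0, 1), (1, 4)], t1=5, duration=3, msg="m1"))  # no contention
print(estimate_contention(U, [(1, 4)], t1=6, duration=4, msg="m2"))          # waits for m1
```

Because the function updates U in place, calling it for each data transmission in the target calculation order yields contention estimates without invoking a cycle-accurate simulator.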

In an optional embodiment, before the route set is evaluated according to the second neural network to obtain the evaluation result, the at least one transmission direction is evaluated according to the second neural network to obtain a reward value of the at least one transmission direction, and an advantage value of the at least one transmission direction is then calculated according to the reward value. The first neural network and the second neural network are then updated based on the advantage value to obtain an updated first neural network and an updated second neural network, and the target path planning model is generated when the updated first neural network and the updated second neural network meet a preset condition. The reward value is used for measuring the quality of the at least one transmission direction, the advantage value represents how good the current state of the first sample data to be transmitted is compared with the target state, and the target state represents the average of all states of the first sample data to be transmitted.

Optionally, at time t₁ the training agent is in state S(t₁). Different actions performed by the agent may lead to different states at time t, so there are many possible states S(t). The advantage value works as follows: after the action given by the Actor has been performed, the state S(t) at time t is determined, and the reward value of this determined S(t) is compared with the average reward over all possible S(t). That is, the advantage value is the advantage of the currently determined state S(t) relative to the average state at time t. It should be noted that comparing the current state value with the average state value increases the convergence rate of training.

Optionally, in the foregoing process, after the training agent, i.e. the data to be transmitted, executes the action a₀ given by the Actor, the Critic, i.e. the second neural network, evaluates the action a₀ performed by the training agent in state S₀ and gives a reward value. An advantage value is calculated from the reward, and the policies of the Actor and the Critic are updated according to the advantage value, so that they make better decisions at the next moment. Optionally, the preset condition may be that a preset number of training iterations is reached. For example, if the preset number is set to 10 when training starts, the target path planning model is obtained after the policies of the Actor and the Critic have been updated 10 times according to the advantage value.
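
One update step of the kind described above may be illustrated as follows. This is a sketch under explicit assumptions, not the networks of the embodiment: the Actor is replaced by a table of softmax action preferences and the Critic by a table of state values, the advantage is computed in the commonly used temporal-difference form r + γ·V(s′) − V(s), and the reward is taken to be the negative of a contention time. The function names, the discount factor `gamma` and the learning rates are hypothetical.

```python
import math
from typing import Dict, Hashable, List


def td_advantage(reward: float, value_s: float, value_next: float,
                 gamma: float = 0.99, done: bool = False) -> float:
    """Advantage of the reached state relative to the critic's expectation."""
    target = reward + (0.0 if done else gamma * value_next)
    return target - value_s


def softmax(prefs: List[float]) -> List[float]:
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def actor_critic_step(prefs: Dict[Hashable, List[float]],
                      values: Dict[Hashable, float],
                      state: Hashable, action: int, reward: float,
                      next_state: Hashable, done: bool,
                      gamma: float = 0.99,
                      lr_actor: float = 0.1, lr_critic: float = 0.1) -> float:
    """Update the tabular actor and critic in place; return the advantage."""
    adv = td_advantage(reward, values.get(state, 0.0),
                       values.get(next_state, 0.0), gamma, done)
    # Critic: move V(state) toward the bootstrapped target.
    values[state] = values.get(state, 0.0) + lr_critic * adv
    # Actor: policy-gradient step on the softmax preferences (two actions: X, Y).
    p = prefs.setdefault(state, [0.0, 0.0])
    probs = softmax(p)
    for a in range(len(p)):
        grad = (1.0 if a == action else 0.0) - probs[a]
        p[a] += lr_actor * adv * grad
    return adv


prefs: Dict[Hashable, List[float]] = {}
values: Dict[Hashable, float] = {}
# Action 0 (X axis) taken in S0 led to S1 and cost two cycles of contention.
adv = actor_critic_step(prefs, values, "S0", 0, -2.0, "S1", done=False)
print(adv, prefs["S0"], values["S0"])
```

A negative advantage lowers the preference for the action just taken, so the X-axis move in S₀ becomes less likely after this step.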

In an optional embodiment, after the at least one piece of data to be transmitted is obtained, the at least one piece of data to be transmitted is sorted based on a preset principle to obtain a sorting result, and then a sequence of performing routing calculation on the at least one piece of data to be transmitted is determined according to the sorting result to obtain a target calculation sequence.

Optionally, as mentioned above, for a computation task graph that requires multiple data transmissions, each outgoing edge in the task graph represents one data transmission, and the route planning method based on the advantage actor-critic algorithm must be invoked multiple times, once for each data transmission. Therefore, an order, namely the target calculation order, must first be assigned to the outgoing edges whose routes are to be calculated, so as to determine which outgoing edge is routed first. Specifically, an empty queue following the first-in first-out principle is initialized. When the algorithm starts, the outgoing edges of the tasks that can be executed immediately (that is, tasks that do not need to wait for data from any predecessor task) are added to the queue, ordered according to the 'short task priority' principle, namely the preset principle. The outgoing edge at the head of the queue is then taken and a route is calculated for it. After the route calculation for an edge is finished, the edge is deleted from the queue, and it is checked whether any task becomes executable once the data transmission corresponding to that edge is complete. If so, all outgoing edges of that task (ordered according to the 'short task priority' principle) are added to the queue. The route planning for all data to be transmitted is complete when the queue is empty again.
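
The queue-based ordering above can be sketched as follows. Several details are assumptions made for illustration: the task graph is given as a list of (source task, destination task) edges, a task becomes executable once all of its incoming edges have been routed, and the 'short task priority' principle is expressed through a caller-supplied cost function `edge_cost` whose exact meaning (for example task length or data size) is not fixed by the text. The function name `plan_order` is hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]  # (source task, destination task)


def plan_order(edges: List[Edge], edge_cost: Callable[[Edge], float]) -> List[Edge]:
    """Return the order in which routes would be calculated for the edges."""
    out_edges: Dict[Hashable, List[Edge]] = {}
    pending_in: Dict[Hashable, int] = {}  # incoming edges not yet routed
    for src, dst in edges:
        out_edges.setdefault(src, []).append((src, dst))
        pending_in[dst] = pending_in.get(dst, 0) + 1

    # Start with the outgoing edges of tasks that wait for no predecessor,
    # ordered shortest-first (sorted is stable, so ties keep input order).
    initial = [e for e in edges if pending_in.get(e[0], 0) == 0]
    queue: Deque[Edge] = deque(sorted(initial, key=edge_cost))

    order: List[Edge] = []
    while queue:
        edge = queue.popleft()   # head of the FIFO queue
        order.append(edge)       # the route for this edge would be calculated here
        dst = edge[1]
        pending_in[dst] -= 1
        if pending_in[dst] == 0:  # dst has received all its inputs
            queue.extend(sorted(out_edges.get(dst, []), key=edge_cost))
    return order


costs = {("A", "B"): 5, ("A", "C"): 2, ("B", "D"): 1, ("C", "D"): 3}
print(plan_order(list(costs), costs.get))
```

In this toy graph, task A is the only task that can run immediately, so its cheaper edge (A, C) is routed before (A, B); the edges into D follow once B and C have received their data.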

It should be noted that, in the above process, sorting the at least one piece of data to be transmitted plans the calculation order of the data reasonably, which further reduces the waiting time for data transmission and improves the computational efficiency of the computing system.

In some other optional embodiments, the route planning method proposed by the present invention may be applied to computation on a NoC, so as to improve the computation speed of the NoC. Specifically, the embodiment of the present invention uses a NoC (network-on-chip) based on a two-dimensional mesh structure (2D-Mesh).

Optionally, fig. 5 is a flowchart of an optional method for determining how a computation task runs according to an embodiment of the present invention. As shown in fig. 5, for a given task graph and the NoC architecture on which it runs, each task node (computation task) in the task graph is first mapped to a PE of the NoC, where the mapping scheme may be selected by a developer as required and is not limited herein. After the mapping scheme is determined, the route planning method based on the advantage actor-critic algorithm calculates the optimal routing transmission scheme among the PEs to which tasks are mapped, so as to achieve data transmission with minimized link contention time. After the mapping scheme and the route planning result are obtained, a calculation method comprising the mapping scheme and the route planning result is returned.

Optionally, the returned calculation method may also be imported into the GEM5 simulator to simulate the running of the task graph. Specifically, to test the effect of the invention, three common task graphs, namely audiogram, autocor and H.264, are selected as experimental subjects. The detailed parameters of the task graphs are shown in table 1, where the in/out degree represents the maximum in/out degree of the nodes in the task graph.

TABLE 1 task graph parameters

Specifically, a NoC based on the two-dimensional mesh structure (2D-Mesh) is set up on the GEM5 simulator platform as the experimental platform, and experiments are carried out at two sizes, 8 × 8 and 16 × 16, and compared with other route planning methods. The NoC infrastructure used for the experiments is ArSMART, an improved SMART NoC design that supports arbitrary-turn transmission.

Specifically, in terms of algorithm running time, the method provided by the invention takes significantly less time than the current state-of-the-art (SOTA) route planning method. As shown in table 2, the times in table 2 are the time required to run the route planning method once to plan routes for the task graph under fixed allocation.

TABLE 2 Algorithm running time

Specifically, in terms of reducing link contention time, the method provided by the invention achieves results no worse than those of the widely used XY routing algorithm and the current state-of-the-art route planning method. As shown in table 3, the times in table 3 are the contention time spent waiting when data is transmitted along the routes given by each route planning method, measured in clock cycles.

TABLE 3 Contention time List

Example 2

According to an embodiment of the present invention, an embodiment of a device for determining a data transmission path is provided, where fig. 6 is a schematic diagram of an optional device for determining a data transmission path according to an embodiment of the present invention, and as shown in fig. 6, the device includes: an obtaining module 601, configured to obtain at least one piece of data to be transmitted, where the at least one piece of data to be transmitted is data for performing communication between different processing units included in a target computing system; the determining module 602 is configured to sequentially input at least one piece of data to be transmitted into a target path planning model according to a target calculation sequence, and output a target path of each piece of data to be transmitted, where the target path planning model is obtained based on training of a preset algorithm, the target path is a path with the shortest link contention time within a preset range, and the link contention time is a time that needs to be waited when the data to be transmitted is transmitted along a link of a current route set.

It should be noted that the obtaining module 601 and the determining module 602 correspond to steps S101 to S102 in the foregoing embodiment. The examples and application scenarios implemented by the two modules are the same as those of the corresponding steps, but are not limited to the disclosure of embodiment 1.

Optionally, the apparatus for determining the data transmission path further includes: the first acquisition module is used for acquiring at least one sample data to be transmitted; the first training module is used for training a first neural network and a second neural network of a preset algorithm according to at least one sample data to be transmitted to obtain a target path planning model.

Optionally, the apparatus for determining a data transmission path further includes: the second acquisition module is used for selecting first sample data to be transmitted from at least one sample data to be transmitted; the third obtaining module is configured to obtain a current position of the first to-be-transmitted sample data in the transmission link, and obtain a first router, where the first router is configured to transmit the first to-be-transmitted sample data; the first determining module is used for determining at least one transmission direction of the first sample data to be transmitted according to the first neural network, wherein the at least one transmission direction is used for indicating the first sample data to be transmitted from the first router to the target router; and the first recording module is used for recording at least one transmission direction to obtain a route set of the first sample data to be transmitted.

Optionally, the apparatus for determining the data transmission path further includes: the first evaluation module is used for evaluating the route set according to the second neural network to obtain an evaluation result, wherein the evaluation result is used for representing the link contention time of the route set; and the second determination module is used for determining a target path of the first sample data to be transmitted based on the evaluation result.

Optionally, the first evaluation module includes: a fourth obtaining module, configured to obtain a link state set occupied by a route in a current route set, where the link state set is used to characterize a use condition of a link; and the first calculation module is used for calculating the link contention time of the current routing set based on the link state set to obtain an evaluation result.

Optionally, the second determining module includes: the first comparison module is used for comparing the link contention time of the current route set with the link contention time of the target route set; a first updating module, configured to update the target route set if the link contention time of the current route set is less than the link contention time of the target route set; and the first generating module is used for generating a target path according to the target route set.

Optionally, the apparatus for determining a data transmission path further includes: the evaluation module is used for evaluating at least one transmission direction according to the second neural network to obtain a reward value of the at least one transmission direction, wherein the reward value is used for measuring the quality of the at least one transmission direction; the second calculation module is used for calculating an advantage value of the at least one transmission direction according to the reward value, wherein the advantage value represents how good the current state of the first sample data to be transmitted is compared with the target state, and the target state represents the average value of all states of the first sample data to be transmitted; the second updating module is used for updating the first neural network and the second neural network based on the advantage value to obtain an updated first neural network and an updated second neural network; and the second generation module is used for generating the target path planning model under the condition that the updated first neural network and the updated second neural network meet preset conditions.

Optionally, the apparatus for determining a data transmission path further includes: the sorting module is used for sorting at least one data to be transmitted based on a preset principle to obtain a sorting result; and the third determining module is used for determining the sequence of performing route calculation on at least one data to be transmitted according to the sequencing result to obtain a target calculation sequence.

Example 3

According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned method for determining a data transmission path when running.

Example 4

According to another aspect of the embodiments of the present invention, there is also provided an electronic device, wherein fig. 7 is a schematic diagram of an optional electronic device according to the embodiments of the present invention. As shown in fig. 7, the electronic device includes one or more processors, and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to execute the above-mentioned method for determining a data transmission path. When executing the program, the processor implements the following steps: acquiring at least one piece of data to be transmitted, wherein the at least one piece of data to be transmitted is data communicated among different processing units contained in a target computing system; and sequentially inputting the at least one piece of data to be transmitted into a target path planning model according to a target calculation sequence, and outputting a target path of each piece of data to be transmitted, wherein the target path planning model is obtained based on training of a preset algorithm, the target path is a path with the shortest link contention time within a preset range, and the link contention time is the waiting time required by the data to be transmitted along the links of the current route set.

Optionally, the following steps are also implemented when the processor executes the program: generating a target path planning model by: acquiring at least one sample data to be transmitted; and training the first neural network and the second neural network of the preset algorithm according to at least one sample data to be transmitted to obtain a target path planning model.

Optionally, the processor executes the program to further implement the following steps: selecting first sample data to be transmitted from at least one sample data to be transmitted; the method comprises the steps of obtaining the current position of first sample data to be transmitted in a transmission link to obtain a first router, wherein the first router is used for transmitting the first sample data to be transmitted; determining at least one transmission direction of first sample data to be transmitted according to the first neural network, wherein the at least one transmission direction is used for indicating the first sample data to be transmitted from the first router to the target router; and recording at least one transmission direction to obtain a route set of the first sample data to be transmitted.

Optionally, the following steps are also implemented when the processor executes the program: evaluating the route set according to the second neural network to obtain an evaluation result, wherein the evaluation result is used for representing the length of the link contention time of the route set; and determining a target path of the first sample data to be transmitted based on the evaluation result.

Optionally, the following steps are also implemented when the processor executes the program: acquiring a link state set occupied by a route in a current route set, wherein the link state set is used for representing the use condition of the link; and calculating the link contention time of the current routing set based on the link state set to obtain an evaluation result.

Optionally, the processor executes the program to further implement the following steps: comparing the link contention time of the current route set with the link contention time of the target route set; under the condition that the link contention time of the current route set is less than that of the target route set, updating the target route set; and generating a target path according to the target route set.

Optionally, the following steps are also implemented when the processor executes the program: evaluating at least one transmission direction according to the second neural network to obtain a reward value of the at least one transmission direction, wherein the reward value is used for measuring the quality of the at least one transmission direction; calculating an advantage value of the at least one transmission direction according to the reward value, wherein the advantage value represents how good the current state of the first sample data to be transmitted is compared with the target state, and the target state represents the average value of all states of the first sample data to be transmitted; updating the first neural network and the second neural network based on the advantage value to obtain an updated first neural network and an updated second neural network; and generating a target path planning model under the condition that the updated first neural network and the updated second neural network meet preset conditions.

Optionally, the processor executes the program to further implement the following steps: based on a preset principle, sequencing at least one data to be transmitted to obtain a sequencing result; and determining the sequence of the routing calculation of at least one data to be transmitted according to the sequencing result to obtain a target calculation sequence.

The device herein may be a server, a PC, a PAD, a mobile phone, etc.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for determining a data transmission path, comprising:

acquiring at least one piece of data to be transmitted, wherein the at least one piece of data to be transmitted is data communicated among different processing units contained in a target computing system;

and sequentially inputting the at least one piece of data to be transmitted into a target path planning model according to a target calculation sequence, and outputting a target path of each piece of data to be transmitted, wherein the target path planning model is obtained based on training of a preset algorithm, the target path is a path with the shortest link contention time within a preset range, and the link contention time is the waiting time required by the data to be transmitted along the links of the current route set.

2. The method of claim 1, wherein the target path planning model is generated by:

acquiring at least one sample data to be transmitted;

and training the first neural network and the second neural network of the preset algorithm according to the at least one sample data to be transmitted to obtain the target path planning model.

3. The method of claim 2, wherein after obtaining at least one sample data to be transmitted, the method further comprises:

selecting first sample data to be transmitted from the at least one sample data to be transmitted;

obtaining the current position of the first sample data to be transmitted in a transmission link to obtain a first router, wherein the first router is used for transmitting the first sample data to be transmitted;

determining at least one transmission direction of the first sample data to be transmitted according to the first neural network, wherein the at least one transmission direction is used for indicating that the first sample data to be transmitted is transmitted from the first router to a target router;

and recording the at least one transmission direction to obtain a route set of the first sample data to be transmitted.

4. The method according to claim 3, wherein after recording the at least one transmission direction and obtaining the route set of the first sample data to be transmitted, the method further comprises:

evaluating the route set according to the second neural network to obtain an evaluation result, wherein the evaluation result is used for representing the link contention time of the route set;

and determining a target path of the first sample data to be transmitted based on the evaluation result.

5. The method of claim 4, wherein evaluating the set of routes according to the second neural network to obtain an evaluation result comprises:

acquiring a link state set occupied by a route in a current route set, wherein the link state set is used for representing the use condition of the link;

and calculating the link contention time of the current route set based on the link state set to obtain the evaluation result.
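
One possible reading of the calculation in claim 5 is sketched below. It assumes that the link state set can be represented as a mapping from each link to the time at which that link becomes free, and that every hop takes a fixed latency; the names `busy_until` and `hop_latency` are hypothetical and do not appear in the claims. In the claimed method this evaluation is performed by the second neural network, for which the sketch substitutes a direct calculation.

```python
def link_contention_time(route_links, busy_until, start_time=0.0, hop_latency=1.0):
    """Sum the time the data waits for occupied links along a route.

    route_links: ordered list of links (any hashable identifier) used by the route.
    busy_until:  assumed link state set, mapping link -> time the link becomes free.
    """
    now = start_time
    waited = 0.0
    for link in route_links:
        free_at = busy_until.get(link, 0.0)  # links not listed are treated as idle
        wait = max(0.0, free_at - now)       # wait only if the link is still occupied
        waited += wait
        now += wait + hop_latency            # traverse the link after waiting
    return waited
```

With links `("A", "B")` and `("B", "C")` that become free at times 2.0 and 3.5 respectively, data leaving at time 0 waits 2.0 on the first link, arrives at the second link at time 3.0, and waits a further 0.5.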

6. The method according to claim 5, wherein determining a target path for the first sample data to be transmitted based on the evaluation result comprises:

comparing the link contention time for the current set of routes to the link contention time for the target set of routes;

updating the target set of routes if the link contention time for the current set of routes is less than the link contention time for the target set of routes;

and generating the target path according to the target route set.
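
The compare-and-update step in claim 6 amounts to keeping the route set with the smallest link contention time seen so far. The sketch below assumes the candidate route sets are available as a sequence and that `contention_time` is some callable returning the link contention time of a route set; both are hypothetical names used for illustration.

```python
def select_target_route_set(candidates, contention_time):
    """Keep the candidate whose link contention time is strictly smallest."""
    target_route_set = None
    target_time = float("inf")
    for current in candidates:
        current_time = contention_time(current)
        # Update the target route set only when the current one is strictly better.
        if current_time < target_time:
            target_route_set, target_time = current, current_time
    return target_route_set, target_time
```

Because the comparison is strict, a later candidate with the same contention time does not displace the earlier one.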

7. The method of claim 3, wherein before evaluating the set of routes according to the second neural network to obtain an evaluation result, the method further comprises:

evaluating the at least one transmission direction according to the second neural network to obtain a reward value of the at least one transmission direction, wherein the reward value is used for measuring the quality of the at least one transmission direction;

calculating an advantage value of the at least one transmission direction according to the reward value, wherein the advantage value represents the relative goodness of a current state of the first sample data to be transmitted with respect to a target state, and the target state represents an average value of all states of the first sample data to be transmitted;

updating the first neural network and the second neural network based on the advantage value to obtain an updated first neural network and an updated second neural network;

and generating the target path planning model under the condition that the updated first neural network and the updated second neural network meet preset conditions.
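
Claim 7 describes a value that measures how good the current state is relative to the average of all states; in reinforcement learning such a quantity is commonly called an advantage. The claims do not give a formula, so the sketch below uses one simple assumed form: each reward value minus the mean of all reward values. The function name is hypothetical, and the actual preset algorithm may compute this value differently.

```python
def relative_values(rewards):
    """Assumed form: reward of each transmission direction minus the mean reward.

    A positive result marks a direction that is better than average,
    a negative result one that is worse than average.
    """
    if not rewards:
        raise ValueError("at least one reward value is required")
    baseline = sum(rewards) / len(rewards)  # average over all evaluated states
    return [reward - baseline for reward in rewards]
```

Directions with positive values would be reinforced when the first neural network is updated, and directions with negative values would be discouraged.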

8. The method of claim 1, wherein after acquiring the at least one piece of data to be transmitted, the method further comprises:

sorting the at least one piece of data to be transmitted based on a preset principle to obtain a sorting result;

and determining, according to the sorting result, the order in which route calculation is performed on the at least one piece of data to be transmitted, to obtain the target calculation sequence.
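
The ordering step in claim 8 can be sketched as a sort. The claim does not state what the preset principle is, so the example below assumes, purely for illustration, that larger transfers are routed first; the `size` field and the function name are hypothetical.

```python
def target_calculation_sequence(data_items, principle=lambda item: -item["size"]):
    """Order the data to be transmitted by an assumed preset principle.

    The default principle (largest size first) is an illustrative assumption.
    """
    return sorted(data_items, key=principle)
```

Any other principle, such as ordering by deadline or by source processing unit, can be supplied through the `principle` argument.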

9. An apparatus for determining a data transmission path, comprising:

an acquisition module, configured to acquire at least one piece of data to be transmitted, wherein the at least one piece of data to be transmitted is data communicated among different processing units contained in a target computing system;

and a determining module, configured to sequentially input the at least one piece of data to be transmitted into a target path planning model according to a target calculation sequence, and to output a target path of each piece of data to be transmitted, wherein the target path planning model is obtained by training based on a preset algorithm, the target path is the path with the shortest link contention time within a preset range, and the link contention time is the time that the data to be transmitted must wait when transmitted along the links of the current route set.

10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform, when run, the method for determining a data transmission path according to any one of claims 1 to 8.

11. An electronic device, comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for determining a data transmission path according to any one of claims 1 to 8.