CN115956247A - Neural network model optimization method and device - Google Patents

Neural network model optimization method and device

Info

Publication number
CN115956247A
CN115956247A (Application No. CN202080103328.XA)
Authority
CN
China
Prior art keywords: node, neural network, computing, calculation, network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080103328.XA
Other languages
Chinese (zh)
Other versions
CN115956247A8 (en)
Inventor
焦建兵
张卫兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN115956247A
Publication of CN115956247A8
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Feedback Control In General (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a neural network model optimization method and device, relates to the technical field of artificial intelligence, and is used to improve the computing performance of a neural network model and reduce the time needed to execute a computing task. The method comprises the following steps: acquiring a first computation graph of the neural network model; generating a second computation graph according to a preset rule and the first computation graph, where, for the same input data, the time taken by the second computation graph to compute the input data is less than the time taken by the first computation graph; the preset rule includes at least one of a mathematical splitting rule, an instruction fusion rule, an instruction splitting rule, and a hardware fusion rule; and outputting the second computation graph. In this way, the first computation graph of the neural network model is optimized into a second computation graph that has stronger computing performance and needs less computation time when executing a computing task, which increases the computing speed of the neural network model when executing a computing task and reduces the time needed to execute the computing task.

Description

Neural network model optimization method and device
Technical Field
The present application relates to the technical field of Artificial Intelligence (AI), and in particular, to a neural network model optimization method and apparatus.
Background
In a neural network model, the computation process of the model is typically represented by a computation graph. The computation graph of a neural network model is obtained by splitting each neuron in the neural network model into tensor-data-oriented operators. The computation graph can represent the mathematical expression of each operator and the connection relations between operators, that is, the mathematical expressions of the neurons of the neural network model and the connection relations between the neurons.
Because the structure of a neural network is usually complex, the computation graph obtained by mapping the neural network model also has a complex topology and high computational complexity, so the computation time required to execute a computing task is long.
Disclosure of Invention
The application provides a neural network model optimization method and device, and solves the problem that in the prior art, when a calculation task is executed through a neural network model, the required calculation time is long.
To solve the above problem, the present application adopts the following technical solutions:
In a first aspect, a neural network model optimization method is provided, including: acquiring a first computation graph of the neural network model; generating a second computation graph according to a preset rule and the first computation graph, where the time for computing first input data using the second computation graph is less than the time for computing the first input data using the first computation graph, and the preset rule includes at least one of a mathematical fusion rule, a mathematical splitting rule, an instruction fusion rule, an instruction splitting rule, and a hardware fusion rule; and outputting the second computation graph.
Based on the technical scheme, the neural network model optimization method provided by the application can optimize the first calculation graph of the neural network model into the second calculation graph which has stronger calculation performance and requires less calculation time when a calculation task is executed. Therefore, the calculation speed of the neural network model in executing the calculation task is improved, and the time required by the neural network model in executing the calculation task is reduced.
Correspondingly, when a terminal device (hereinafter referred to as a terminal device) configured with the neural network model calls the neural network model to execute a calculation task, the neural network model is optimized by using the neural network model optimization method provided by the embodiment of the application, so that the calculation performance of the terminal device can be improved, and the calculation time of the terminal device can be saved.
With reference to the first aspect, in a possible implementation manner, the mathematical fusion rule is: fusing a plurality of first computing nodes into one second computing node, where the mathematical expression corresponding to the second computing node is a mathematical expression determined by mathematical derivation from the mathematical expressions corresponding to the plurality of first computing nodes, and the time for computing second input data using the plurality of first computing nodes is greater than the time for computing the second input data using the second computing node.
Based on this, after the neural network model optimization device applies mathematical fusion to the first computation graph, the resulting computation graph has fewer computing nodes and a simpler topology, while its computing capability is stronger and the time needed to compute data is shorter. Therefore, when the neural network model optimization device uses the mathematical fusion rule to optimize the computation graph of the neural network model, the computing performance of the computation graph can be improved and the computation time required to execute a computing task can be reduced.
With reference to the first aspect, in a possible implementation manner, the mathematical splitting rule is: splitting one third computing node into a plurality of fourth computing nodes, where the mathematical expression corresponding to the third computing node is a mathematical expression determined by mathematical derivation from the mathematical expressions corresponding to the plurality of fourth computing nodes, and the time for computing third input data using the third computing node is greater than the time for computing the third input data using the plurality of fourth computing nodes.
Based on this, when the neural network model optimization device applies the mathematical splitting rule and splits one computing node into a plurality of computing nodes, the time for the plurality of split computing nodes to execute a computing task is less than the time for the single computing node before splitting to execute the same task. Therefore, optimizing the computation graph of the neural network model with the mathematical splitting rule can improve the computing performance of the computation graph and reduce the time it needs to compute data.
With reference to the first aspect, in a possible implementation manner, the instruction fusion rule is: fusing a plurality of fifth computing nodes into one sixth computing node according to a received node fusion instruction, where the node fusion instruction is used to instruct that the plurality of fifth computing nodes be fused into the sixth computing node, and the time for computing fourth input data using the plurality of fifth computing nodes is greater than the time for computing the fourth input data using the sixth computing node.
Based on this, after the neural network model optimization device fuses the first computation graph according to the instruction fusion rule, the resulting computation graph has fewer computing nodes and a simpler topology, while its computing capability is stronger and the time needed to compute data is shorter. Therefore, when the neural network model optimization device uses the instruction fusion rule to optimize the computation graph of the neural network model, the computing performance of the computation graph can be improved and the computation time required to execute a computing task can be reduced.
Further, the node fusion instruction in the instruction fusion rule may be a manually input instruction. In this case, the neural network model optimization device can fuse nodes in the computation graph of the neural network model according to the manually input instruction, which broadens the application scenarios of the neural network model optimization method.
With reference to the first aspect, in a possible implementation manner, the instruction splitting rule is: splitting one seventh computing node into a plurality of eighth computing nodes according to a received node splitting instruction, where the node splitting instruction is used to instruct that the seventh computing node be split into the plurality of eighth computing nodes, and the time for computing fifth input data using the seventh computing node is greater than the time for computing the fifth input data using the plurality of eighth computing nodes.
Based on this, when the neural network model optimization device applies the instruction splitting rule and splits one computing node into a plurality of computing nodes, the time for the plurality of split computing nodes to execute a computing task is less than the time for the single computing node before splitting to execute the same task. Therefore, optimizing the computation graph of the neural network model with the instruction splitting rule can improve the computing performance of the computation graph and reduce the time it needs to compute data.
Furthermore, the node splitting instruction in the instruction splitting rule may be a manually input instruction. In this case, the neural network model optimization device can split nodes in the computation graph of the neural network model according to the manually input instruction, which broadens the application scenarios of the neural network model optimization method.
With reference to the first aspect, in a possible implementation manner, the hardware fusion rule is: a ninth computing node transmits data to a tenth computing node using a first transmission path, where the time for the ninth computing node to transmit data to the tenth computing node using the first transmission path is shorter than the time for the ninth computing node to transmit data to the tenth computing node using a second transmission path, and the second transmission path is the transmission path used to transmit data from the ninth computing node to the tenth computing node in the first computation graph.
Based on this, by optimizing the data transmission path between nodes, the neural network model can achieve the goals of improving the computing performance of the computation graph of the neural network model and reducing the time the computation graph needs to execute a computing task.
In a second aspect, there is provided a neural network model optimization apparatus, including a communication unit and a processing unit. The communication unit is configured to acquire a first computation graph of the neural network model; the processing unit is configured to generate a second computation graph according to a preset rule and the first computation graph, where the second computation graph computes first input data in less time than the first computation graph, and the preset rule includes at least one of a mathematical fusion rule, a mathematical splitting rule, an instruction fusion rule, an instruction splitting rule, and a hardware fusion rule; and the communication unit is further configured to output the second computation graph.
With reference to the second aspect, in a possible implementation manner, the mathematical fusion rule is: fusing a plurality of first computing nodes into one second computing node, where the mathematical expression corresponding to the second computing node is a mathematical expression determined by mathematical derivation from the mathematical expressions corresponding to the plurality of first computing nodes, and the time for computing second input data using the plurality of first computing nodes is greater than the time for computing the second input data using the second computing node.
With reference to the second aspect, in a possible implementation manner, the mathematical splitting rule is: splitting one third computing node into a plurality of fourth computing nodes, where the mathematical expression corresponding to the third computing node is a mathematical expression determined by mathematical derivation from the mathematical expressions corresponding to the plurality of fourth computing nodes, and the time for computing third input data using the third computing node is greater than the time for computing the third input data using the plurality of fourth computing nodes.
With reference to the second aspect, in a possible implementation manner, the instruction fusion rule is: fusing a plurality of fifth computing nodes into one sixth computing node according to a received node fusion instruction, where the node fusion instruction is used to instruct that the plurality of fifth computing nodes be fused into the sixth computing node, and the time for computing fourth input data using the plurality of fifth computing nodes is greater than the time for computing the fourth input data using the sixth computing node.
With reference to the second aspect, in a possible implementation manner, the instruction splitting rule is: splitting one seventh computing node into a plurality of eighth computing nodes according to a received node splitting instruction, where the node splitting instruction is used to instruct that the seventh computing node be split into the plurality of eighth computing nodes, and the time for computing fifth input data using the seventh computing node is greater than the time for computing the fifth input data using the plurality of eighth computing nodes.
With reference to the second aspect, in a possible implementation manner, the hardware fusion rule is: a ninth computing node transmits data to a tenth computing node using a first transmission path, where the time for the ninth computing node to transmit data to the tenth computing node using the first transmission path is shorter than the time for the ninth computing node to transmit data to the tenth computing node using a second transmission path, and the second transmission path is the transmission path used to transmit data from the ninth computing node to the tenth computing node in the first computation graph.
In a third aspect, the present application provides a neural network model optimization apparatus, including: a processor and a storage medium; the storage medium comprises instructions for execution by a processor to perform the method as described in the first aspect and any one of the possible implementations of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein instructions that, when run on a neural network model optimization device, cause the neural network model optimization device to perform the method as described in the first aspect and any one of the possible implementations of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising instructions which, when run on a neural network model optimization device, cause the neural network model optimization device to perform the method as described in the first aspect and any one of the possible implementations of the first aspect.
It should be appreciated that the description of technical features, solutions, benefits, or similar language in this application does not imply that all of the features and advantages may be realized in any single embodiment. Rather, it is to be understood that the description of a feature or advantage is intended to include the specific features, aspects or advantages in at least one embodiment. Therefore, descriptions of technical features, technical solutions or advantages in this specification do not necessarily refer to the same embodiment. Furthermore, the technical features, technical solutions and advantages described in the present embodiments may also be combined in any suitable manner. One skilled in the relevant art will recognize that an embodiment may be practiced without one or more of the specific features, aspects, or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments.
Drawings
Fig. 1 is a schematic structural diagram of a system architecture according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a convolutional neural network used in embodiments of the present application;
Fig. 3 is a schematic structural diagram of a computation graph of a neural network model according to an embodiment of the present application;
Fig. 4 is a schematic architecture diagram of a prior-art software stack, provided by an embodiment of the present application;
Fig. 5 is a schematic flowchart of a neural network model optimization method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an improved software stack according to an embodiment of the present application;
Fig. 7 is a schematic diagram of node optimization using a mathematical fusion rule according to an embodiment of the present application;
Fig. 8 is a schematic diagram of node optimization using a mathematical splitting rule according to an embodiment of the present application;
Fig. 9a is a schematic flowchart of a computing node executing a computing task in the prior art, provided by an embodiment of the present application;
Fig. 9b is a schematic flowchart of a computing task executed by a computing node optimized using a hardware fusion rule according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a neural network model optimization apparatus according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of another neural network model optimization apparatus according to an embodiment of the present application;
Fig. 12 is a schematic diagram of a hardware structure of a neural network model optimization apparatus according to an embodiment of the present application;
Fig. 13 is a schematic diagram of a hardware structure of another neural network model optimization apparatus according to an embodiment of the present application.
Detailed Description
In the description of this application, "/" means "or" unless otherwise stated; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. Further, "at least one" means one or more, and "a plurality of" means two or more. The terms "first", "second", and the like do not limit the number or the execution order, and they do not indicate that the objects they modify are necessarily different.
It is noted that, in the present application, words such as "exemplary" or "for example" are used to mean exemplary, illustrative, or descriptive. Any embodiment or design described herein as "exemplary" or "such as" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present relevant concepts in a concrete fashion.
The neural network model provided by the present application may be any artificial neural network model, such as a convolutional neural network model, a Back Propagation (BP) neural network model, and the like, which is not specifically limited in the embodiment of the present application.
Fig. 1 is a system architecture 100 according to an embodiment of the present disclosure. In fig. 1, a data acquisition device 160 is used to acquire training data. Taking the target model 101 for image processing as an example, the training data may include training images and corresponding classification results of the training images, where the results of the training images may be manually pre-labeled results. The target model 101 may also be referred to as a target rule 101.
After the training data is collected, data collection device 160 stores the training data in database 130, and training device 120 trains target model/rule 101 based on the training data maintained in database 130.
The following describes how the training device 120 obtains the target model 101 based on the training data: the training device 120 processes an input original image and compares the output image with the original image, until the difference between the image output by the training device 120 and the original image is smaller than a certain threshold, thereby completing the training of the target model 101.
The target model 101 in the embodiment of the present application may specifically be a neural network. It should be noted that, in practical applications, the training data maintained in the database 130 may not necessarily all come from the acquisition of the data acquisition device 160, and may also be received from other devices. It should be noted that, the training device 120 does not necessarily perform the training of the target model 101 based on the training data maintained by the database 130, and may also obtain the training data from the cloud or other places for performing the model training.
The target model 101 obtained by training with the training device 120 may be applied to different systems or devices, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, a vehicle-mounted terminal, and the like, and may also be applied to a server or a cloud.
The training device 120 may generate corresponding object models 101 for different objects or different tasks based on different training data, and the corresponding object models 101 may be used to achieve the above objects or complete the above tasks, thereby providing the user with the desired results.
The target model 101 obtained by training with the training device 120 may be a convolutional neural network (CNN), a deep convolutional neural network (DCNN), a recurrent neural network (RNN), and so on.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the position relationships between the devices, modules, and the like shown in fig. 1, the type of training data, and the type or function of the neural network do not constitute any limitation. For example, in fig. 1, the model converter 110 may be located in the client device 140. As another example, the training data may be text, speech, or other types of data. As another example, the model converter may also have another name, such as a model compiler; any device or apparatus that can perform a function similar to that of the model converter 110 can be understood as a model converter in this application.
The model files of the target model 101 trained by the training device 120 are platform-independent (i.e., they need to be compiled before they can run on a given hardware platform), so if the target model 101 is to be applied to the client device 140, the trained target model 101 needs to be processed by the model converter 110 to compile its model files from the current format into a format supported by the client device.
For example, if the target model 101 is a model developed under the TensorFlow framework, the model file of the target model 101 needs to be input into the model converter 110; the model converter 110 compiles the target model 101 to obtain a model file supported by the client device 140, and the compiled model file is then deployed to the client device 140. Generally, the conversion process performed on the target model 101 by the model converter 110 may also be referred to as compilation.
In order to compile successfully, the developer of a custom operator also needs to provide the model converter 110 with the parameter definition function, the parameter parsing function, the function for deriving the output tensor size (shape), the implementation function, and the call (forward) function of the operator included in each layer of the model.
For another example, when the target model 101 is a model developed under the TensorFlow framework and an operator in some or all layers of the target model 101 is customized by a developer, that is, when the operator does not belong to the operators in the AI software stack of the TensorFlow framework, the developer inputs the model file of the target model 101 into the model converter 110 so that it can be compiled into a model file that can run on the client device; the developer also needs to provide the model converter 110 with contents such as the parameter definition function, the parameter parsing function, the function for deriving the output size (shape), the implementation function, and the call (forward) function of the customized operator.
The structure of the neural network in the embodiment of the present application may be as shown in fig. 2.
As shown in fig. 2, a convolutional neural network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230.
Convolutional layer/pooling layer 220:
Convolutional layer:
the convolutional/pooling layer 220 as shown in fig. 2 may include layers as in examples 221-226, for example: in one implementation, 221 layers are convolutional layers, 222 layers are pooling layers, 223 layers are convolutional layers, 224 layers are pooling layers, 225 layers are convolutional layers, 226 layers are pooling layers; in another implementation, 221, 222 are convolutional layers, 223 is a pooling layer, 224, 225 are convolutional layers, and 226 is a pooling layer. I.e., the output of a convolutional layer may be used as input to a subsequent pooling layer, or may be used as input to another convolutional layer to continue the convolution operation.
The inner working principle of a convolutional layer will be described below by taking convolutional layer 221 as an example.
Convolutional layer 221 may include a number of convolution operators, also called kernels. In image processing, a convolution operator acts as a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix is usually moved across the input image in the horizontal direction one pixel at a time (or two pixels at a time, depending on the value of the stride), thereby extracting a specific feature from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and the weight matrix extends over the entire depth of the input image during the convolution operation. Thus, convolving with a single weight matrix produces a convolved output with a single depth dimension, but in most cases a single weight matrix is not used; instead, a plurality of weight matrices of the same size (rows × columns), i.e., a plurality of matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where this dimension is understood to be determined by the "plurality" described above. Different weight matrices may be used to extract different features from the image, for example, one weight matrix to extract image edge information, another weight matrix to extract a particular color of the image, and yet another weight matrix to blur unwanted noise in the image. The plurality of weight matrices have the same size (rows × columns), so the feature maps extracted by them also have the same size, and the extracted feature maps of the same size are combined to form the output of the convolution operation.
The weight values in these weight matrices need to be obtained through a large amount of training in practical application, and each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 200 can make correct prediction.
When convolutional neural network 200 has multiple convolutional layers, the initial convolutional layers (e.g., 221) tend to extract more general features, which may also be referred to as low-level features; as the depth of convolutional neural network 200 increases, the later convolutional layers (e.g., 226) extract increasingly complex features, such as features with high-level semantics, and features with higher-level semantics are more suitable for the problem to be solved.
A pooling layer:
since it is often desirable to reduce the number of training parameters, it is often desirable to periodically introduce pooling layers after the convolutional layer, either one layer of convolutional layers followed by one pooling layer or multiple layers of convolutional layers followed by one or more pooling layers, as exemplified by 220 in FIG. 2. During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to smaller sized images. The average pooling operator may calculate pixel values in the image over a certain range to produce an average as a result of the average pooling. The max pooling operator may take the pixel with the largest value in a particular range as a result of the max pooling. In addition, just as the size of the weighting matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after the processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel point in the image output by the pooling layer represents an average value or a maximum value of a corresponding sub-region of the image input to the pooling layer.
The neural network layer 230:
after processing by convolutional layer/pooling layer 220, convolutional neural network 200 is not sufficient to output the required output information. Because, as previously described, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, to generate the final output information (required class information or other relevant information), the convolutional neural network 200 needs to generate one or a set of the required number of classes of output using the neural network layer 230. Accordingly, a plurality of hidden layers (231, 232 to 23n shown in fig. 2) and an output layer 240 may be included in the neural network layer 230, and parameters included in the hidden layers may be pre-trained according to related training data of a specific task type, for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.
After the hidden layers in the neural network layer 230, i.e. the last layer of the whole convolutional neural network 200 is the output layer 240, the output layer 240 has a loss function similar to the classification cross entropy, and is specifically used for calculating the prediction error, once the forward propagation (i.e. the propagation from the direction 210 to 240 in fig. 2 is the forward propagation) of the whole convolutional neural network 200 is completed, the backward propagation (i.e. the propagation from the direction 240 to 210 in fig. 2 is the backward propagation) starts to update the weight values and the bias of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200, and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
It should be noted that the convolutional neural network 200 shown in fig. 2 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models.
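To make the structure described above more concrete, the following is a minimal illustrative sketch (not part of the patent) of a convolutional neural network with the layer pattern of fig. 2, written with the Keras API of TensorFlow; the input shape, channel counts, and 10-class output are assumptions chosen only for illustration.
```python
# Minimal illustrative CNN with the layer pattern of fig. 2 (input layer ->
# alternating convolution/pooling layers -> hidden dense layers -> output layer),
# written with the Keras API of TensorFlow. Input shape, channel counts, and the
# 10-class output are illustrative assumptions, not values from the patent.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(224, 224, 3), num_classes=10):
    model = models.Sequential([
        layers.Input(shape=input_shape),                          # input layer 210
        layers.Conv2D(32, 3, activation="relu", padding="same"),  # convolutional layer (e.g., 221)
        layers.MaxPooling2D(2),                                   # pooling layer (e.g., 222)
        layers.Conv2D(64, 3, activation="relu", padding="same"),  # convolutional layer (e.g., 223)
        layers.MaxPooling2D(2),                                   # pooling layer (e.g., 224)
        layers.Flatten(),
        layers.Dense(128, activation="relu"),                     # hidden layer (e.g., 231)
        layers.Dense(num_classes, activation="softmax"),          # output layer 240
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model

model = build_cnn()
model.summary()
```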
In order to facilitate understanding of technical solutions provided in the embodiments of the present application, some terms in the embodiments of the present application are first explained.
1. Neural network model
A neural network model is an information processing system formed by a large number of interconnected processing units (referred to as neurons), and each neuron in the neural network model contains a corresponding mathematical expression. After data is input into a neuron, the neuron runs its mathematical expression, computes the input data, and generates output data. The input data of each neuron is the output data of the previous neuron connected to it; the output data of each neuron is the input data of the next neuron connected to it.
After data is input, the neural network model selects corresponding neurons for the input data according to its own learning and training, computes the input data with those neurons, and determines and outputs a final operation result. Meanwhile, the neural network can continuously learn and evolve during data operation, continuously optimizing its operation process according to feedback on the operation results; the more the neural network model is trained, the more feedback is obtained on its results, and the more accurate its computation results become. The number of neurons in a neural network model is usually fixed, but the mathematical expression in each neuron, or the weight value corresponding to a neuron, may change as the neural network model is continuously trained.
2. Computation graph (Graph)
A computation graph expresses, in an intuitive form, the computation process of the neural network model when a computing task is executed, making that computation process clearer.
When the terminal device calls the neural network model to execute a computing task, the terminal device calls the corresponding neural network model according to the computing task and converts the neural network into a corresponding computation graph. Then, the terminal device further splits the computation graph into individual operators, issues them to the chip in a lower-layer language that the chip can recognize, and the chip runs each operator, thereby executing the computing task according to the neural network model.
An example is shown in fig. 3, which is the structure of a computation graph of a neural network model provided in an embodiment of the present application. Here a, b, and c represent three inputs; node 1, node 2, and node 3 represent the three computing nodes of the computation graph; and the connection relations between the nodes are shown by line segments with arrows, where the direction of each arrow indicates the direction of data transmission. The computation process of the computation graph shown in fig. 3 is as follows: the terminal device feeds 3 pieces of input data, a = 4, b = 6, c = 3, into the computation graph of the neural network model. First, data b = 6 and c = 3 are input into computing node 1, and the computation in computing node 1 is executed, producing output data u = 18. Next, data a = 4 and the output data u = 18 of computing node 1 are input into computing node 2, and the computation in computing node 2 is executed, producing output data v = 22. Finally, the output data v = 22 of computing node 2 is input into computing node 3, producing the final output result j = 66.
It should be noted that fig. 3 is only an exemplary illustration, and computation graphs in actual applications may be more complex.
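As a plain-Python illustration of how the computation graph in fig. 3 evaluates its inputs (not part of the patent), the sketch below assumes that computing node 2 computes v = a + u and computing node 3 computes j = 3 × v, which is consistent with the outputs stated above; only the operator of node 1 (u = b × c) is given explicitly in the text.
```python
# Illustrative evaluation of the computation graph in fig. 3.
# Node 1 computes u = b * c (stated in the text); the operators assumed here for
# node 2 (v = a + u) and node 3 (j = 3 * v) are inferred from the outputs above.

def node1(b, c):
    return b * c          # operator of computing node 1

def node2(a, u):
    return a + u          # assumed operator of computing node 2

def node3(v):
    return 3 * v          # assumed operator of computing node 3

a, b, c = 4, 6, 3
u = node1(b, c)           # u = 18
v = node2(a, u)           # v = 22
j = node3(v)              # j = 66
print(u, v, j)
```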
3. Operator
An operator represents the computation process of each computing node in the computation graph. For example, in fig. 3 above, the mathematical expression computed in node 1, u = b × c, is referred to as the operator of node 1.
4. Software stack combining chip and neural network models
In order to make the chip and the neural network model better combined to better exert the computational performance of the chip and the neural network model, a software stack combining the chip and the neural network model is proposed as shown in fig. 4.
As shown in fig. 4, the software stack includes the following four parts: the system comprises a user program layer, a calculation framework layer, a calculation layer and a chip layer.
The user program layer is the upper-layer language expression of the neural network model, for example, a neural network model expressed in the Python language.
The computation framework layer is used for converting the neural network model expressed by the upper language into a general or specific computation graph expression form.
The computation layer is used for splitting each computation node in the computation graph of the computation framework, converting the computation nodes into a lower layer language which can be identified by the chip, and issuing the converted computation nodes to the chip.
The chip layer is used for operating each issued computing node, and the effect of computing data by using the neural network model is achieved.
The foregoing has outlined rather broadly the subject matter and concepts that are related to the present application.
In order to solve the problems in the prior art that the computation graph of a neural network model has a complex topology and high computational complexity, so that a long computation time is required to execute a computing task, in the neural network model optimization method provided by the embodiments of the present application, the neural network model optimization device first obtains a first computation graph of the neural network model, and then generates a second computation graph according to a preset rule and the first computation graph, where, for the same input data, the time taken by the second computation graph to compute the input data is less than the time taken by the first computation graph; after that, the neural network model optimization device outputs the second computation graph.
Based on the technical scheme, the neural network model optimization method provided by the application can optimize the first computational graph of the neural network model into the second computational graph which is stronger in computational performance and requires less computation time when a computation task is executed. Therefore, the calculation speed of the neural network model in executing the calculation task is improved, and the time required by the neural network model in executing the calculation task is reduced.
Correspondingly, when the terminal equipment calls the neural network model to execute a calculation task, the neural network model is optimized by adopting the neural network model optimization method provided by the embodiment of the application, so that the calculation performance of the terminal equipment can be improved, and the calculation time of the terminal equipment is saved.
Hereinafter, the neural network model optimization method provided by the present application is described in detail. As shown in fig. 5, the neural network model optimization method provided in the embodiment of the present application includes:
S501, the neural network model optimization device obtains a first computation graph of the neural network model.
The first calculation graph is a calculation graph directly generated by the terminal device according to the topological structure of the neural network model. The number of the computing nodes in the first computational graph is the same as or similar to the number of the neurons in the neural network model.
It should be noted that, in a current terminal device (for example, a mobile phone), different neural network models are usually preset for different applications, and when a terminal executes a computing task of different applications, the terminal executes the computing task by calling the neural network model corresponding to the application.
For example, the terminal device configures a neural network model for image processing (denoted as a first neural network model) for the camera application in advance, and configures a voice recognition neural network model (denoted as a second neural network model) for the voice assistant in advance.
After the camera application of the terminal device is opened and a photographing action is performed, the terminal device calls the first neural network model to optimize the captured image and generate the final photograph.
After the voice assistant application of the terminal device is opened and a voice input is detected, the terminal device calls the second neural network model to process the voice input by the user and determine the content of the user's voice input. The terminal device then performs the corresponding operation according to the user's voice input.
The neural network model optimization device described in the embodiments of the present application may be a terminal device, or may be a module or unit in the terminal device, or may be a device integrated in the terminal device.
S502, the neural network model optimization device generates a second computation graph according to a preset rule and the first computation graph.
And the time for calculating the first input data by using the second computation graph is less than the time for calculating the first input data by using the first computation graph.
In a possible implementation manner, the preset rule is used for optimizing the computation graph of the neural network model to obtain the computation graph with better computation performance and less time required for performing the computation task. Therefore, the time required for the second computation graph to compute the input data is shorter than the time required for the first computation graph to compute the input data for the same input data.
Specifically, the neural network model optimization device computes the same input data using the first computation graph and the second computation graph respectively, measures the time taken to compute the input data with each computation graph, and determines whether the time taken by the second computation graph is less than the time taken by the first computation graph.
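A minimal sketch of such a timing comparison is given below; it assumes each computation graph is available as a Python callable, and the names measure, first_graph, and second_graph are hypothetical placeholders rather than interfaces defined by the patent.
```python
# Hedged sketch: compare the time two computation graphs take on the same input data.
# `first_graph` and `second_graph` are assumed to be Python callables that run the
# respective computation graphs; they are hypothetical placeholders, not patent APIs.
import time

def measure(graph, input_data, repeats=10):
    # Average wall-clock time of running `graph` on `input_data` over several repeats.
    start = time.perf_counter()
    for _ in range(repeats):
        graph(input_data)
    return (time.perf_counter() - start) / repeats

def second_graph_is_faster(first_graph, second_graph, input_data):
    # True if the optimized (second) graph computes the same input in less time.
    return measure(second_graph, input_data) < measure(first_graph, input_data)
```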
S503, the neural network model optimization device outputs the second computation graph.
In a possible implementation manner, after the neural network model optimization device outputs the second computation graph, the second computation graph is split into a plurality of corresponding operators, the operators are converted into a lower-layer expression that the chip can understand, and the result is issued to the chip, so that the chip runs the neural network model according to the issued operators.
Based on the technical scheme, the neural network model optimization method provided by the application can optimize the first calculation graph of the neural network model into the second calculation graph which has stronger calculation performance and requires less calculation time when a calculation task is executed. Therefore, the calculation speed of the neural network model in executing the calculation task is improved, and the time required by the neural network model in executing the calculation task is reduced.
Correspondingly, when the terminal device calls the neural network model to execute the calculation task, the neural network model is optimized by adopting the neural network model optimization method provided by the embodiment of the application, so that the calculation performance of the terminal device can be improved, and the calculation time of the terminal device can be saved.
In a possible implementation manner, in combination with the software stack shown in fig. 4, as shown in fig. 6, in this embodiment of the present application, the software stack may be modified into 5 layers, that is, a computational graph optimization layer is added between the computational framework layer and the operator layer shown in fig. 4; the computational graph optimization layer is used for realizing the optimization method of the neural network model described in the embodiment of the application.
Specifically, when the terminal device executes a computing task according to the software stack shown in fig. 6, the following steps may be implemented:
Step 1: after receiving a computing task, the terminal device calls the corresponding user program layer to determine a neural network model for executing the computing task.
It should be noted that, a plurality of neural network models for executing different calculation tasks may be preset in the terminal device; for example, a neural network model for performing image processing, a neural network model for performing speech recognition, a neural network model for performing data processing, and the like. After the terminal receives the computing task, the corresponding neural network model can be selected to execute the computing task according to the type of the computing task.
In one example, if the computational task received by the terminal device is an image processing computational task, the terminal device determines to perform the computational task using a neural network model for image processing.
For another example, if the computing task received by the terminal device is a speech recognition computing task, the terminal device determines to perform the computing task using a neural network model for speech recognition.
In yet another example, if the computing task received by the terminal device is a data processing computing task, the terminal device determines to perform the computing task using a neural network model for data processing.
Step 2: the terminal device calls the computation framework layer to convert the neural network model into the first computation graph.
Step 3: the terminal device instructs the neural network model optimization device to call the computation graph optimization layer to optimize the first computation graph into the second computation graph.
Specifically, the terminal device may instruct the neural network model optimization device to optimize the first computation graph into the second computation graph by executing the neural network model optimization method described in the embodiments of the present application.
Step 4: the terminal device calls the operator layer to split each computing node in the second computation graph, converts each computing node into a lower-layer language that the chip can recognize, and sends it to the chip layer.
Step 5: the terminal device instructs the chip to execute the computing task according to the issued computing nodes.
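The five steps above can be summarized by the following runnable sketch; every function in it is a hypothetical stub standing in for the corresponding software-stack layer, not an API defined by the patent.
```python
# Illustrative, runnable outline of steps 1-5 above. Every function here is a
# hypothetical stub standing in for the corresponding software-stack layer;
# none of these names are APIs defined by the patent.

def select_model(task):
    # Step 1: the user program layer determines a neural network model for the task.
    return {"task": task, "layers": ["conv", "relu", "fc"]}

def build_first_graph(model):
    # Step 2: the computation framework layer converts the model into the first computation graph.
    return {"nodes": list(model["layers"]), "optimized": False}

def optimize_graph(first_graph):
    # Step 3: the computation graph optimization layer produces the second computation graph.
    return {"nodes": list(first_graph["nodes"]), "optimized": True}

def lower_to_chip(second_graph):
    # Step 4: the operator layer splits each computing node and converts it to a chip-level operator.
    return ["op:" + node for node in second_graph["nodes"]]

def run_on_chip(operators):
    # Step 5: the chip executes the issued computing nodes.
    return "executed {} operators".format(len(operators))

def execute_task(task):
    model = select_model(task)
    first_graph = build_first_graph(model)
    second_graph = optimize_graph(first_graph)
    operators = lower_to_chip(second_graph)
    return run_on_chip(operators)

print(execute_task("image processing"))
```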
In yet another possible implementation manner, in combination with the software stack shown in fig. 4, in this embodiment of the present application, a 4-layer structure of the software stack may still be maintained, and the computation graph optimization layer is multiplexed in the computation framework layer, so as to implement the optimization method for a neural network model described in this embodiment of the present application.
In this case, the specific implementation process of the terminal calling the neural network model to perform the calculation task is similar to the above steps 1 to 5. The difference is that the terminal device combines the step 2 and the step 3, and when the terminal device calls the computation framework layer, the contents recorded in the step 2 and the step 3 are sequentially realized.
In another possible implementation manner, in combination with the software stack shown in fig. 4, in this embodiment of the present application, the 4-layer structure of the software stack may still be maintained, and the computation graph optimization layer is multiplexed into the operator layer, so as to implement the neural network model optimization method described in this embodiment of the present application.
In this case, the specific implementation process of the terminal calling the neural network model to perform the calculation task is similar to the above steps 1 to 5. The difference is that the terminal device combines the step 3 and the step 4, and when the terminal device calls the operator layer, the contents recorded in the step 3 and the step 4 are sequentially realized.
In a possible implementation manner, with reference to the foregoing S502, the preset rule described in the embodiment of the present application includes at least one of the following: a mathematical fusion rule, a mathematical split rule, an instruction fusion rule, an instruction split rule, and a hardware fusion rule. The five preset rules are described below.
I. Mathematical fusion rule
The mathematical fusion rule is: fusing a plurality of first computing nodes into one second computing node, where the mathematical expression corresponding to the second computing node is a mathematical expression determined by mathematical derivation from the mathematical expressions corresponding to the plurality of first computing nodes, and the time for computing second input data using the plurality of first computing nodes is greater than the time for computing the second input data using the second computing node.
It should be noted that "the time for computing the second input data using the plurality of first computing nodes is longer than the time for computing the second input data using the second computing node" means that the sum of the time taken by the terminal device to call the plurality of first computing nodes to compute the second input data is greater than the time taken by the terminal device to call the second computing node to compute the second input data.
In a possible implementation manner, the neural network model optimization device fuses a plurality of first computing nodes into one second computing node according to a mathematical fusion rule, which may specifically be implemented as follows:
the neural network model optimization device traverses the calculation nodes in the first calculation graph, and under the condition that mathematical expressions corresponding to a plurality of continuous first calculation nodes can be deduced to be one mathematical expression, the neural network model optimization device fuses the plurality of first calculation nodes into a second calculation node. And the mathematical expression corresponding to the second computing node is a mathematical expression deduced according to the mathematical expressions corresponding to the plurality of first computing nodes.
In a specific implementation manner, the neural network model optimization device is provided with a template for fusing a plurality of mathematical expressions into a mathematical expression. After the neural network model optimization device determines the corresponding mathematical expressions in the first nodes, the mathematical expressions are matched with a mathematical fusion template in the neural network model optimization device, and after the corresponding mathematical fusion template is matched, the fused mathematical expressions corresponding to the mathematical expressions are determined according to the mathematical fusion template.
For example, as shown in fig. 7, the first computation graph includes computing node 1 and computing node 2, where computing node 1 is the upstream node of computing node 2, and data passes through computing node 1 and then computing node 2 in sequence for computation.
The mathematical expression corresponding to the calculation node 1 is shown in the following formula 1:
a×x 1 + b formula 1
Wherein, a and b are fixed parameters of the mathematical expression in the calculation node 1, and the values of a and b are fixed values; x is the number of 1 Is the input data of the computing node 1 (i.e., the data output by the upstream node of the computing node 1).
The mathematical expression corresponding to the calculation node 2 is shown in the following formula 2:
c×x 2 + d formula 2
Wherein c and d are fixed parameters of the mathematical expression in the calculation node 2, and the values of c and d are fixed values; x is the number of 2 Is the input data of the computing node 2 (i.e., the data output by the upstream node of the computing node 1).
The neural network model optimization device deduces the formula 1 and the formula 2 to obtain the following formula 3:
e × x₃ + f        (Formula 3)
where e = a × c and f = b × c + d, and x₃ is the input data of computing node 1 (i.e., the data output by the upstream node of computing node 1); the values of a and b here are the same as those in Formula 1, and the values of c and d are the same as those in Formula 2.
The neural network model optimization device fuses the calculation node 1 and the calculation node 2 into a calculation node 3, and the mathematical expression corresponding to the calculation node 3 is the formula 3.
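As a quick, purely illustrative sanity check of this derivation (the sample values below are arbitrary and not part of this application), one can verify numerically that Formula 3 reproduces the composition of Formula 1 and Formula 2:

```python
# Arbitrary sample values: check that the fused expression e*x + f equals
# applying formula 1 and then formula 2 to the same input x.
a, b, c, d, x = 2.0, 3.0, 4.0, 5.0, 1.5
e, f = a * c, b * c + d                      # fused parameters of formula 3
assert abs((e * x + f) - (c * (a * x + b) + d)) < 1e-9
```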
In this way, the neural network model optimization device fuses computing node 1 and computing node 2 into computing node 3, which improves the computing performance of the neural network model computation graph and reduces the computing time it requires to execute computing tasks. In addition, the number of nodes in the computation graph is reduced, lowering the complexity of the computation graph.
Based on this, after the neural network model optimization device applies mathematical fusion to the first computation graph, the resulting computation graph has fewer computing nodes, a simpler topological structure, stronger computing capability, and needs less time to compute data. Therefore, when the neural network model optimization device optimizes the computation graph of the neural network model using the mathematical fusion rule, the computing performance of the computation graph can be improved and the computing time required to execute computing tasks can be reduced.
II, math splitting rule
The mathematical split rule is: split one third computing node into a plurality of fourth computing nodes, where the mathematical expression corresponding to the third computing node is the mathematical expression determined by mathematically deriving it from the mathematical expressions corresponding to the plurality of fourth computing nodes, and the time for computing third input data using the third computing node is greater than the time for computing the third input data using the plurality of fourth computing nodes.
In the computation graph of a neural network model, a computing node may be in the following situation: the time for that computing node to execute a computing task is longer than the time required, after it is split into a plurality of computing nodes, for those computing nodes to execute the computing task in sequence.
For example, when a mathematical expression in a compute node is too complex, the computational complexity of the mathematical expression may exceed the computational power of the compute node. This results in a reduction in the computational performance of the computing node, which takes a long time to perform a computational task.
As another example, a computing node may be inefficient at computing a complex mathematical expression; when it is split into a plurality of computing nodes, each of which computes only part of the complex expression, the overall computing capability of those nodes is higher.
For such situations, the neural network model optimization device can split a computing node into a plurality of computing nodes through the mathematical split rule, so as to improve the computing performance of the neural network model computation graph and reduce the time it requires to execute computing tasks.
For example, as shown in fig. 8, the mathematical expression corresponding to the calculation node 4 is shown in the following formula 4:
g × x₄ + h²        (Formula 4)
where g and h are fixed parameters of the mathematical expression in computing node 4 and take fixed values, and x₄ is the input data of computing node 4 (i.e., the data output by the upstream node of computing node 4).
For Formula 4, the neural network model optimization device may split it into the following two formulas, Formula 5 and Formula 6:
g × x₅        (Formula 5)
where the value of g is the same as that of g in Formula 4, and x₅ is the input data of computing node 4 (i.e., the data output by the upstream node of computing node 4).
x₆ + h²        (Formula 6)
where the value of h is the same as that of h in Formula 4, and x₆ is the output data determined by the operation of Formula 5 (i.e., the output data of computing node 5 and the input data of computing node 6).
The neural network model optimization device determines that the time for computing node 4 to execute the computing task according to Formula 4 is greater than the sum of the time for computing node 5 to execute the computing task according to Formula 5 and the time for computing node 6 to execute the computing task according to Formula 6. That is, for the same input data, the time for computing node 4 to compute the input data is greater than the time for computing node 5 and computing node 6 to compute the input data in turn.
At this time, the neural network model optimization device splits the calculation node 4 into a calculation node 5 and a calculation node 6, where a mathematical expression corresponding to the calculation node 5 is formula 5, and a mathematical expression corresponding to the calculation node 6 is formula 6.
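A minimal, purely illustrative sketch of this split follows (the function names node4/node5/node6 and the sample values are hypothetical, not taken from this application); it only checks that the two split nodes reproduce the result of the original node:

```python
# Hypothetical sketch: splitting the node of formula 4 into the nodes of
# formulas 5 and 6 and checking that the composition gives the same result.
g, h = 2.0, 3.0              # arbitrary fixed parameters

def node4(x):                # before the split: g*x + h**2
    return g * x + h ** 2

def node5(x):                # formula 5: g*x
    return g * x

def node6(x):                # formula 6: x + h**2
    return x + h ** 2

x = 1.25
assert abs(node4(x) - node6(node5(x))) < 1e-9   # same output after the split
```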
Based on this, when the neural network model optimization device applies the mathematical split rule and splits one computing node into a plurality of computing nodes, the time for the split computing nodes to execute the computing task is less than the time for the single computing node before the split to execute it. Therefore, optimizing the computation graph of the neural network model with the mathematical split rule can improve the computing performance of the computation graph and reduce the time it requires to compute data.
III, instruction fusion rules
The instruction fusion rule is: according to a received node fusion instruction, fuse a plurality of fifth computing nodes into one sixth computing node, where the node fusion instruction is used to indicate that the plurality of fifth computing nodes are to be fused into the one sixth computing node, and the time for computing fourth input data using the plurality of fifth computing nodes is greater than the time for computing the fourth input data using the sixth computing node.
The fusion instruction in the instruction fusion rule may be issued by a worker through a compiler or the like, or may be issued by another device interacting with the neural network model optimization device.
The following description takes as an example the case in which a worker issues a node fusion instruction through a compiler.
The neural network model optimization device first optimizes the first computation graph according to one or more of the mathematical fusion rule, the mathematical split rule, and the hardware fusion rule to obtain a third computation graph. The worker may then manually review the third computation graph to determine whether it contains nodes that can be fused. If the worker determines that such nodes exist, the worker determines how those nodes should be fused and issues a node fusion instruction through the compiler. The compiler issues the node fusion instruction to the neural network model optimization device, which then optimizes the corresponding nodes according to the received instruction.
It should be noted that when a worker issues a node fusion instruction through the compiler, the worker inputs, in the compiler, program code written in an upper-layer (high-level) language corresponding to the node fusion instruction; the compiler compiles this code into a lower-layer language that the neural network model optimization device can recognize and sends it to the neural network model optimization device.
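As a hedged illustration only (the NodeFusionInstruction structure, its field names, and the apply_fusion helper below are assumptions for the sketch, not the interface of this application), a received node fusion instruction and its application to a graph stored as a name-to-node mapping might look like this:

```python
# Illustrative only: one possible shape for a node fusion instruction and a
# routine that applies it to a computation graph stored as {name: node}.
from dataclasses import dataclass
from typing import List

@dataclass
class NodeFusionInstruction:
    source_nodes: List[str]    # names of the fifth computing nodes to fuse
    fused_name: str            # name of the resulting sixth computing node

def apply_fusion(graph, instr, build_fused_node):
    """Replace the nodes listed in the instruction with a single fused node.
    build_fused_node is supplied by the optimizer and returns the new node."""
    parts = [graph.pop(name) for name in instr.source_nodes]
    graph[instr.fused_name] = build_fused_node(parts)
    return graph
```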
Based on this solution, after the neural network model optimization device fuses the first computation graph using the instruction fusion rule, the resulting computation graph has fewer computing nodes, a simpler topological structure, stronger computing capability, and needs less time to compute data. Therefore, when the neural network model optimization device optimizes the computation graph of the neural network model using the instruction fusion rule, the computing performance of the computation graph can be improved and the computing time required to execute computing tasks can be reduced.
Further, the node fusion instruction in the instruction fusion rule may be a manually entered instruction. In that case, the neural network model optimization device can fuse the nodes in the neural network model computation graph according to the manually entered instruction, which broadens the application scenarios of the neural network model optimization method.
IV, instruction splitting rule
The instruction splitting rule is: according to a received node splitting instruction, split one seventh computing node into a plurality of eighth computing nodes, where the node splitting instruction is used to indicate that the one seventh computing node is to be split into the plurality of eighth computing nodes, and the time for computing fifth input data using the seventh computing node is greater than the time for computing the fifth input data using the plurality of eighth computing nodes.
It should be noted that, in contrast to the instruction fusion rule described above, the instruction splitting rule is used to instruct to split one node into multiple nodes.
The specific implementation of the instruction splitting rule is similar to that of the instruction fusion rule, except that the node fusion content is replaced with node splitting content; for details, refer to the description of the instruction fusion rule, which is not repeated here.
Based on this, when the neural network model optimization device applies the instruction splitting rule and splits one computing node into a plurality of computing nodes, the time for the split computing nodes to execute the computing task is less than the time for the single computing node before the split to execute it. Therefore, optimizing the computation graph of the neural network model with the instruction splitting rule can improve the computing performance of the computation graph and reduce the time it requires to compute data.
Furthermore, the node splitting instruction in the instruction splitting rule may be a manually entered instruction. In that case, the neural network model optimization device can split the nodes in the neural network model computation graph according to the manually entered instruction, which broadens the application scenarios of the neural network model optimization method.
V, hardware fusion rule
It should be noted that the hardware fusion rule is: a ninth computing node transmits data to a tenth computing node over a first transmission path, where the time for the ninth computing node to transmit data to the tenth computing node over the first transmission path is shorter than the time for it to transmit the data over a second transmission path, and the second transmission path is the transmission path used for transmitting data from the ninth computing node to the tenth computing node in the first computation graph.
Taking hardware fusion for a storage device as an example, a detailed description is given to a hardware fusion rule:
In the prior art, when a terminal device calls a computing node to execute a computing task, it generally proceeds in an off-chip storage, on-chip computation, off-chip storage manner.
As shown in fig. 9a, when the terminal device invokes two connected computing nodes in the computation graph (computing node 7 and computing node 8, where computing node 7 is an upstream node of computing node 8), the process of executing the computing task is as follows:
Step I: computing node 7 reads first data (corresponding to the input data of computing node 7) from a storage device.
Step II: computing node 7 computes the first data to generate second data (corresponding to the output data of computing node 7, or the input data of computing node 8).
Step III: computing node 7 stores the second data in the storage device.
Step IV: computing node 8 reads the second data from the storage device.
Step V: computing node 8 computes the second data to generate third data.
Step VI: computing node 8 stores the third data in the storage device.
Based on the above process, when the terminal device calls the neural network model to execute a computing task using the prior art, each computing node must perform both a read and a write to the storage device: for example, Steps I and III performed by computing node 7, and Steps IV and VI performed by computing node 8.
When the computation graph contains a large number of computing nodes, the terminal device must carry out a large number of read and write operations. Because the read-write performance of the storage device is limited, a large amount of time is spent reading and writing data when the terminal device calls the neural network model to execute the computing task.
In view of the above situation, in the embodiment of the present application, the computing nodes in the computational graph are improved, so that all or part of the computing nodes in the computational graph may transmit data to each other. Therefore, the number of times of interaction between the computing node and the storage device can be reduced, the computing performance of the computational graph of the neural network model is improved, and the time required for the computational graph of the neural network model to execute the computing task is reduced.
For example, the computing node 7 and the computing node 8 may transmit data to each other, and the terminal device invokes the computing node 7 and the computing node 8 to perform a computing task, as shown in fig. 9 b:
Step VII: computing node 7 reads the first data from the storage device.
Step VIII: computing node 7 computes the first data to generate second data.
Step IX: computing node 7 sends the second data to computing node 8. Accordingly, computing node 8 receives the second data from computing node 7.
Step X: computing node 8 computes the second data to generate third data.
Step XI: computing node 8 stores the third data in the storage device.
Based on this process, after the neural network model optimization device optimizes the computation graph according to the hardware fusion rule, the number of interactions between the computing nodes and the hardware device is reduced, and data can be transmitted between the computing nodes in the computation graph over a more reasonable and faster transmission path. This improves the computing performance of the neural network model computation graph and reduces the time it requires to execute computing tasks.
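To make the contrast between fig. 9a and fig. 9b concrete, the following hedged sketch models the two data paths (the storage dictionary and the compute7/compute8 functions are stand-ins invented for illustration; they are not defined by this application):

```python
# Illustrative contrast between the two data paths: `storage` stands in for
# the off-chip storage device, compute7/compute8 for the two computing nodes.
storage = {"first_data": [1.0, 2.0, 3.0]}

def compute7(data):
    return [v * 2 for v in data]

def compute8(data):
    return [v + 1 for v in data]

def run_prior_art():
    # Fig. 9a: every node reads its input from storage and writes its output back
    first = storage["first_data"]
    storage["second_data"] = compute7(first)      # extra write
    second = storage["second_data"]               # extra read
    storage["third_data"] = compute8(second)
    return storage["third_data"]

def run_hardware_fused():
    # Fig. 9b: node 7 hands its output directly to node 8
    first = storage["first_data"]
    second = compute7(first)                      # no storage round trip
    storage["third_data"] = compute8(second)
    return storage["third_data"]

assert run_prior_art() == run_hardware_fused()    # same result, fewer accesses
```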
It should be noted that, in the hardware fusion rule, hardware fusion of the neural network model computation graph can also be promoted by at least one of the following means: replacing the original low-speed storage device with a high-speed storage device, or increasing the bandwidth with which the neural network model accesses the storage device.
It should be noted that, in the embodiment of the present application, the neural network model optimization device computes the same input data with the operator before optimization and with the operator after optimization, respectively. By comparing the time the pre-optimization operator takes to compute the input data with the time the post-optimization operator takes, the device determines whether the post-optimization operator computes the input data in less time than the pre-optimization operator.
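A minimal sketch of such a before/after timing check is given below, assuming the optimizer can invoke both operator versions on the same input data (the function names and the use of Python's time module are assumptions for the illustration):

```python
# Illustrative timing check: keep the optimized operator only if it is
# actually faster than the original operator on the same input data.
import time

def is_faster(op_before, op_after, input_data, repeats=100):
    def measure(op):
        start = time.perf_counter()
        for _ in range(repeats):
            op(input_data)
        return time.perf_counter() - start
    return measure(op_after) < measure(op_before)
```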
All the schemes in the above embodiments of the present application can be combined without contradiction.
The above-mentioned scheme of the embodiment of the present application is introduced mainly from the perspective of interaction between network elements. It is to be understood that each network element, for example, the neural network model optimizing device, includes at least one of a hardware structure and a software module corresponding to each function in order to realize the functions. Those of skill in the art would readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the neural network model optimization device may be divided into the functional units according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
In one possible design, as shown in fig. 10, the neural network model optimization apparatus 1000 includes: a northbound interface 1001, a southbound interface 1002, and one or more of: a math fusion module 1003, a math split module 1004, a hardware fusion module 1005, an instruction fusion module 1006, and an instruction split module 1007.
The northbound interface 1001 is used to interface with the computing framework layer above. After the computing framework generates the corresponding first computation graph according to the neural network model, the neural network model optimization device 1000 obtains the first computation graph from the computing framework through the northbound interface 1001.
The southbound interface 1002 is used to interface with the operator layer below. After the neural network model optimization device 1000 optimizes the first computation graph and generates the second computation graph, it issues the second computation graph to the operator layer through the southbound interface 1002, so that the operator layer splits the second computation graph and issues each computing node of the second computation graph to the chip.
The mathematical fusion module 1003 is configured to optimize the first computation graph according to the mathematical fusion rule described in the foregoing embodiment.
The math splitting module 1004 is configured to optimize the first computation graph according to the math splitting rule described in the foregoing embodiment.
A hardware fusion module 1005, configured to optimize the first computation graph according to the hardware fusion rule described in the foregoing embodiment.
The instruction fusion module 1006 is configured to optimize the first computation graph according to the instruction fusion rule described in the foregoing embodiment.
The instruction splitting module 1007 is configured to optimize the first computation graph according to the instruction splitting rule described in the foregoing embodiment.
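One way to picture the composition described above (a hedged sketch only; the class and method names are illustrative and not the interface of this application) is:

```python
# Illustrative composition of the optimization apparatus of fig. 10.
class NeuralNetworkModelOptimizer:
    def __init__(self, northbound, southbound, modules):
        self.northbound = northbound   # interface to the computing framework layer
        self.southbound = southbound   # interface to the operator layer
        self.modules = modules         # any subset of the five rule modules

    def optimize(self):
        graph = self.northbound.get_first_computation_graph()
        for module in self.modules:    # e.g. math fusion, math split, hardware fusion
            graph = module.optimize(graph)
        self.southbound.emit_second_computation_graph(graph)
```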
It should be noted that the north interface 1001 and the south interface 1002 may be integrated into a single unit. For example, the northbound interface 1001 and the southbound interface 1002 are implemented integrated in a communication unit.
One or more of the above-mentioned math fusion module 1003, math splitting module 1004, hardware fusion module 1005, instruction fusion module 1006, and instruction splitting module 1007 may also be integrated into one unit. For example, one or more of the math fusion module 1003, the math split module 1004, the hardware fusion module 1005, the instruction fusion module 1006, and the instruction split module 1007 are integrated in a processing unit.
In the case of using integrated units, fig. 11 shows a schematic diagram of still another possible structure of the neural network model optimization device (denoted as a neural network model optimization device 1100) in the foregoing embodiment, where the neural network model optimization device 1100 includes a processing unit 1101, a communication unit 1102, and may further include a storage unit 1103. The schematic structural diagram shown in fig. 11 can be used to illustrate the structure of the neural network model optimization apparatus in the above embodiment.
When the schematic structural diagram shown in fig. 11 is used to illustrate the structure of the neural network model optimization device in the above embodiment, the processing unit 1101 is configured to control and manage the actions of the neural network model optimization device, for example, to control the device to perform the actions performed by the neural network model optimization device in S501, S502, and S503 in fig. 5 and/or other processes described in this embodiment. The processing unit 1101 can communicate with other devices through the communication unit 1102. The storage unit 1103 is used to store program codes and data of the neural network model optimization device.
When the schematic structural diagram shown in fig. 11 is used to illustrate the structure of the neural network model optimization device according to the above embodiment, the neural network model optimization device 1100 may be a neural network model optimization device, or may be a chip in the neural network model optimization device.
When the neural network model optimization device 1100 is the neural network model optimization device itself, the processing unit 1101 may be a processor or a controller, and the communication unit 1102 may be a communication interface, a transceiver circuit, a transceiver device, or the like. The communication interface is a general term and may include one or more interfaces. The storage unit 1103 may be a memory. When the neural network model optimization device 1100 is a chip in the neural network model optimization device, the processing unit 1101 may be a processor or a controller, and the communication unit 1102 may be an input interface and/or an output interface, a pin, a circuit, or the like. The storage unit 1103 may be a storage unit inside the chip (e.g., a register or a cache), or may be a storage unit outside the chip within the neural network model optimization device (e.g., a read-only memory (ROM) or a random access memory (RAM)).
The communication unit may also be referred to as a transceiver unit. The antenna and the control circuit having the transmitting and receiving functions in the neural network model optimization device 1100 may be regarded as the communication unit 1102 of the neural network model optimization device 1100, and the processor having the processing function may be regarded as the processing unit 1101 of the neural network model optimization device 1100. Alternatively, the device in the communication unit 1102 for implementing the receiving function may be regarded as a communication unit, the communication unit is configured to perform the receiving step in the embodiment of the present application, and the communication unit may be a receiver, a receiving circuit, and the like. The device for realizing the transmission function in the communication unit 1102 may be regarded as a transmission unit for performing the steps of transmission in the embodiment of the present application, and the transmission unit may be a transmitter, a transmission circuit, or the like.
If the integrated unit in fig. 11 is implemented in the form of a software functional module and sold or used as a separate product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a neural network model optimization device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The storage medium storing the computer software product includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
The elements of FIG. 11 may also be referred to as modules, and the processing elements may be referred to as processing modules, for example.
An embodiment of the present application further provides a schematic diagram of a hardware structure of a neural network model optimization device (denoted as a neural network model optimization device 1200), referring to fig. 12 or fig. 13, where the neural network model optimization device 1200 includes a processor 1201, and optionally, further includes a memory 1202 connected to the processor 1201.
In a first possible implementation, referring to fig. 12, the neural network model optimization apparatus 1200 further includes a transceiver 1203. The processor 1201, the memory 1202, and the transceiver 1203 are connected by a bus. The transceiver 1203 is used for communication with other devices or communication networks. Optionally, the transceiver 1203 may include a transmitter and a receiver. The means for performing the receiving function in the transceiver 1203 may be regarded as a receiver for performing the receiving step in the embodiment of the present application. The means for implementing the transmitting function in the transceiver 1203 may be regarded as a transmitter for performing the steps of transmitting in the embodiments of the present application.
Based on the first possible implementation manner, the schematic structural diagram shown in fig. 12 may be used to illustrate the structure of the neural network model optimization device or the neural network model optimization device in the foregoing embodiment.
When the schematic structural diagram shown in fig. 12 is used to illustrate the structure of the neural network model optimization device in the above embodiment, the processor 1201 is configured to control and manage the actions of the neural network model optimization device, for example, the processor 1201 is configured to support the neural network model optimization device to perform the actions performed by the neural network model optimization device in S501, S502, and S503 in fig. 5 and/or other processes described in this embodiment. The processor 1201 may communicate with other network entities via the transceiver 1203. The memory 1202 is used for storing program codes and data of the neural network model optimizing device.
In a second possible implementation, the processor 1201 includes logic circuitry and at least one of an input interface and an output interface. The output interface is used for executing the sending action in the corresponding method, and the input interface is used for executing the receiving action in the corresponding method.
Based on the second possible implementation manner, referring to fig. 13, the schematic structural diagram shown in fig. 13 may be used to illustrate the structure of the neural network model optimization device in the above embodiment.
When the schematic structural diagram shown in fig. 13 is used to illustrate the structure of the neural network model optimization device in the above embodiment, the processor 1201 is configured to control and manage the actions of the neural network model optimization device, for example, the processor 1201 is configured to support the neural network model optimization device to perform the actions performed by the neural network model optimization device in S501, S502, and S503 in fig. 5 and/or other processes described in this embodiment. The processor 1201 may communicate with other network entities through at least one of an input interface and an output interface. The memory 1202 is used for storing program codes and data of the neural network model optimizing device.
Fig. 12 and fig. 13 may also illustrate a system chip in the neural network model optimization device. In this case, the actions executed by the neural network model optimization device may be implemented by the system chip; the specific actions executed may be referred to above and are not described here again.
In implementation, the steps of the method provided by this embodiment may be implemented by hardware integrated logic circuits in a processor or instructions in the form of software. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in a processor.
Processors in the present application may include, but are not limited to, at least one of: various computing devices that run software, such as a Central Processing Unit (CPU), a microprocessor, a Digital Signal Processor (DSP), a Microcontroller (MCU), or an artificial intelligence processor, may each include one or more cores for executing software instructions to perform operations or processing. The processor may be a single semiconductor chip, or may be integrated with other circuits to form a semiconductor chip, for example, an SoC (system on chip) with other circuits (such as a codec circuit, a hardware acceleration circuit, or various buses and interface circuits), or may be integrated in the ASIC as a built-in processor of an ASIC, which may be packaged separately or may be packaged with other circuits. The processor may further include necessary hardware accelerators such as Field Programmable Gate Arrays (FPGAs), PLDs (programmable logic devices), or logic circuits implementing dedicated logic operations, in addition to cores for executing software instructions for performing operations or processing.
The memory in the embodiment of the present application may include at least one of the following types: read-only memory (ROM) or other types of static memory devices that may store static information and instructions, random Access Memory (RAM) or other types of dynamic memory devices that may store information and instructions, and Electrically erasable programmable read-only memory (EEPROM). In some scenarios, the memory may also be, but is not limited to, a compact disk-read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Embodiments of the present application also provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform any of the above methods.
Embodiments of the present application also provide a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the methods described above.
Embodiments of the present application further provide a chip, where the chip includes a processor and an interface circuit, where the interface circuit is coupled to the processor, the processor is configured to execute a computer program or instructions to implement the foregoing method, and the interface circuit is configured to communicate with other modules outside the chip.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented using a software program, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the present application are all or partially generated upon loading and execution of computer program instructions on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). Computer-readable storage media can be any available media that can be accessed by a computer or can comprise one or more data storage devices, such as servers, data centers, and the like, that can be integrated with the media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), among others.
While the present application has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
While the present application has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the application. Accordingly, the specification and figures are merely exemplary of the present application as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the present application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Finally, it should be noted that: the above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

  1. A neural network model optimization method is characterized by comprising the following steps:
    acquiring a first computational graph of the neural network model;
    generating a second calculation graph according to a preset rule and the first calculation graph; wherein the time for calculating the first input data using the second computation graph is less than the time for calculating the first input data using the first computation graph; the preset rules comprise at least one of the following: a mathematical fusion rule, a mathematical split rule, an instruction fusion rule, an instruction split rule, and a hardware fusion rule;
    and outputting the second calculation map.
  2. The method of claim 1, wherein the mathematical fusion rule is: fusing a plurality of first computing nodes into a second computing node; wherein, the mathematical expression corresponding to the second computing node is: a mathematical expression determined after mathematical derivation of mathematical expressions corresponding to the plurality of first computing nodes; the time for computing the second input data using the plurality of first compute nodes is greater than the time for computing the second input data using the second compute nodes.
  3. The method of claim 2, wherein the mathematical split rule is: splitting a third computing node into a plurality of fourth computing nodes; wherein, the mathematical expression corresponding to the third computing node is: the mathematical expressions corresponding to the plurality of fourth calculation nodes are subjected to mathematical derivation to determine mathematical expressions; the time for computing the third input data using the third computing node is greater than the time for computing the third input data using the plurality of fourth computing nodes.
  4. The method according to claim 2 or 3, wherein the instruction fusion rule is: according to the received node fusion instruction, fusing a plurality of fifth computing nodes into a sixth computing node; wherein the node fusion instruction is configured to instruct the plurality of fifth computing nodes to be fused into the one sixth computing node; the time to compute the fourth input data using the plurality of fifth compute nodes is greater than the time to compute the fourth input data using the sixth compute node.
  5. The method according to any one of claims 2-4, wherein the instruction splitting rule is configured to: splitting a seventh computing node into a plurality of eighth computing nodes according to the received node splitting instruction; wherein the node splitting instruction is to instruct splitting of the one seventh computing node into the plurality of eighth computing nodes; the time for calculating the fifth input data using the seventh computational node is greater than the time for calculating the fifth input data using the plurality of eighth computational nodes.
  6. The method according to any of claims 2-5, wherein the hardware fusion rule is: the ninth computing node transmits data to the tenth computing node by adopting the first transmission path; the time for the ninth computing node to transmit data to the tenth node by adopting the first transmission path is less than the time for the ninth computing node to transmit data to the tenth node by adopting the second transmission path; the second transmission path is a transmission path for transmitting data from the ninth computation node to the tenth node in the first computation graph.
  7. An apparatus for neural network model optimization, comprising: a communication unit and a processing unit;
    the communication unit is used for acquiring a first calculation graph of the neural network model;
    the processing unit is used for generating a second calculation graph according to a preset rule and the first calculation graph; the second computation graph computes the first input data less time than the first computation graph computes the first input data; the preset rule comprises at least one of the following items: a mathematical fusion rule, a mathematical split rule, an instruction fusion rule, an instruction split rule, and a hardware fusion rule;
    the communication unit is further configured to output the second computation graph.
  8. The apparatus of claim 7, wherein the mathematical fusion rule is: fusing a plurality of first computing nodes into a second computing node; wherein, the mathematical expression corresponding to the second computing node is: the mathematical expressions corresponding to the plurality of first calculation nodes are subjected to mathematical derivation to determine mathematical expressions; and the time for calculating the second input data by using the plurality of first calculation nodes is greater than the time for calculating the second input data by using the second calculation nodes.
  9. The apparatus of claim 8, wherein the mathematical split rule is: splitting a third computing node into a plurality of fourth computing nodes; wherein, the mathematical expression corresponding to the third computing node is: the mathematical expressions corresponding to the plurality of fourth calculation nodes are subjected to mathematical derivation to determine mathematical expressions; the time for calculating the third input data by the third computing node is greater than the time for calculating the third input data by the plurality of fourth computing nodes.
  10. The apparatus according to claim 8 or 9, wherein the instruction fusion rule is: according to the received node fusion instruction, fusing the fifth computing nodes into a sixth computing node; wherein the node fusion instruction is configured to instruct to fuse the plurality of fifth computing nodes into the one sixth computing node; and the time for calculating the fourth input data by utilizing the plurality of fifth calculation nodes is greater than the time for calculating the fourth input data by utilizing the sixth calculation node.
  11. The apparatus of any of claims 8-10, wherein the instruction splitting rule is configured to: splitting a seventh computing node into a plurality of eighth computing nodes according to the received node splitting instruction; wherein the node splitting instruction is to instruct splitting of the seventh computing node into the plurality of eighth computing nodes; and the time for calculating the fifth input data by using the seventh calculation node is greater than the time for calculating the fifth input data by using the plurality of eighth calculation nodes.
  12. The apparatus according to any one of claims 8-11, wherein the hardware fusion rule is: the ninth computing node transmits data to the tenth computing node by adopting the first transmission path; the time for the ninth computing node to transmit data to the tenth node by adopting the first transmission path is less than the time for the ninth computing node to transmit data to the tenth node by adopting the second transmission path; the second transmission path is a transmission path for transmitting data from the ninth computation node to the tenth node in the first computation graph.
  13. An apparatus for neural network model optimization, the apparatus comprising a processor and a storage medium comprising instructions that, when executed by the processor, cause the apparatus to perform the method of any one of claims 1 to 6.
  14. A computer-readable storage medium having instructions stored thereon, which when executed on a computer, cause the computer to perform the method of any one of claims 1 to 6.
CN202080103328.XA 2020-08-26 2020-08-26 Neural network model optimization method and device Pending CN115956247A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/111529 WO2022041015A1 (en) 2020-08-26 2020-08-26 Neural network model optimisation method and apparatus

Publications (2)

Publication Number Publication Date
CN115956247A true CN115956247A (en) 2023-04-11
CN115956247A8 CN115956247A8 (en) 2023-07-11

Family

ID=80352254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080103328.XA Pending CN115956247A (en) 2020-08-26 2020-08-26 Neural network model optimization method and device

Country Status (2)

Country Link
CN (1) CN115956247A (en)
WO (1) WO2022041015A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629330A (en) * 2023-04-24 2023-08-22 北京大学 Operator detection method and device and computer equipment
CN116820524A (en) * 2023-08-22 2023-09-29 腾讯科技(深圳)有限公司 Model updating method, device, computer equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117220734A (en) * 2022-05-31 2023-12-12 华为技术有限公司 Model transmission method and device
US20240005158A1 (en) * 2022-06-30 2024-01-04 Qualcomm Incorporated Model performance linter

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11308399B2 (en) * 2018-01-04 2022-04-19 Jean-Patrice Glafkidès Method for topological optimization of graph-based models
CN110321999B (en) * 2018-03-30 2021-10-01 赛灵思电子科技(北京)有限公司 Neural network computational graph optimization method
CN110766147B (en) * 2018-07-25 2022-10-11 赛灵思公司 Neural network compiler architecture and compiling method
CN110659728B (en) * 2019-09-24 2024-03-05 安徽寒武纪信息科技有限公司 Neural network optimization method, device, computer equipment and storage medium
CN110717584A (en) * 2019-09-30 2020-01-21 上海寒武纪信息科技有限公司 Neural network compiling method, compiler, computer device, and readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629330A (en) * 2023-04-24 2023-08-22 北京大学 Operator detection method and device and computer equipment
CN116629330B (en) * 2023-04-24 2024-04-16 北京大学 Operator detection method and device and computer equipment
CN116820524A (en) * 2023-08-22 2023-09-29 腾讯科技(深圳)有限公司 Model updating method, device, computer equipment and storage medium
CN116820524B (en) * 2023-08-22 2023-11-28 腾讯科技(深圳)有限公司 Model updating method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115956247A8 (en) 2023-07-11
WO2022041015A1 (en) 2022-03-03

Similar Documents

Publication Publication Date Title
CN110175671B (en) Neural network construction method, image processing method and device
EP3933693B1 (en) Object recognition method and device
CN115956247A (en) Neural network model optimization method and device
US20230196117A1 (en) Training method for semi-supervised learning model, image processing method, and device
US11783227B2 (en) Method, apparatus, device and readable medium for transfer learning in machine learning
US20220051056A1 (en) Semantic segmentation network structure generation method and apparatus, device, and storage medium
EP4064130A1 (en) Neural network model update method, and image processing method and device
US20230082597A1 (en) Neural Network Construction Method and System
CN112883149B (en) Natural language processing method and device
CN111797983A (en) Neural network construction method and device
CN113128678A (en) Self-adaptive searching method and device for neural network
EP3889846A1 (en) Deep learning model training method and system
CN113066017A (en) Image enhancement method, model training method and equipment
CN114332666A (en) Image target detection method and system based on lightweight neural network model
CN111931901A (en) Neural network construction method and device
US20240135174A1 (en) Data processing method, and neural network model training method and apparatus
CN113505883A (en) Neural network training method and device
US20240078428A1 (en) Neural network model training method, data processing method, and apparatus
CN111357018A (en) Image segmentation using neural networks
CN116912629B (en) General image text description generation method and related device based on multi-task learning
CN110728319A (en) Image generation method and device and computer storage medium
CN108376283B (en) Pooling device and pooling method for neural network
WO2022127603A1 (en) Model processing method and related device
CN113674383A (en) Method and device for generating text image
CN116343342B (en) Sign language recognition method, system, device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CI02 Correction of invention patent application

Correction item: PCT international application to national stage day

Correct: 2023.02.22

False: 2023.02.20

Number: 15-01

Page: The title page

Volume: 39

Correction item: PCT international application to national stage day

Correct: 2023.02.22

False: 2023.02.20

Number: 15-01

Volume: 39

CI02 Correction of invention patent application