CN113128670B - Neural network model optimization method and device - Google Patents

Neural network model optimization method and device

Info

Publication number
CN113128670B
CN113128670B (application CN202110382904.5A)
Authority
CN
China
Prior art keywords
weight
neural network
operator
layer
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110382904.5A
Other languages
Chinese (zh)
Other versions
CN113128670A (en)
Inventor
杜源
陈凯
杜力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110382904.5A priority Critical patent/CN113128670B/en
Publication of CN113128670A publication Critical patent/CN113128670A/en
Application granted granted Critical
Publication of CN113128670B publication Critical patent/CN113128670B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Feedback Control In General (AREA)

Abstract

The method comprises: obtaining the structure of a neural network model to be optimized, the parameters of the neural network model to be optimized, and the weight parameters corresponding to each operator layer of the neural network model to be optimized; converting the neural network model to be optimized into an initial model according to its structure and parameters; performing operator layer fusion processing on the initial model to obtain an intermediate model; determining a weight adjustment coefficient range according to the weight range of the neural network model that the target hardware accelerator can accommodate and a preset coefficient; performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters; and performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model. The method alleviates the restriction on deploying the neural network model on a hardware accelerator.

Description

Neural network model optimization method and device
Technical Field
The present application relates to the field of Internet technology, and in particular to a neural network model optimization method and device.
Background
Target-object recognition methods based on neural network models achieve extremely high accuracy and are therefore widely used in many fields, such as autonomous driving, intelligent security and intelligent robotics. Accordingly, neural network models are also widely deployed on hardware accelerators adapted to these fields. A hardware accelerator works in cooperation with the central processing unit of its host system and handles specialized workloads at high speed; a hardware accelerator on which a neural network model is deployed is dedicated to running that model. As the application fields of neural network models keep widening, more and more research and development effort is invested in them. To reduce research and development cost, researchers typically build and train a neural network model on general-purpose processors such as CPUs and GPUs, and then adapt it to a hardware accelerator for the specific application field.
The ability of a neural network model to identify target objects accurately comes at the cost of the model's high computational complexity. This requires that the hardware accelerator used to deploy the neural network model have substantial memory and be able to carry out complex computation. In practical applications, however, the hardware used to deploy the neural network model often does not have a large memory space, so the neural network model cannot be applied to a hardware accelerator with a small memory.
Disclosure of Invention
The present application provides a neural network model optimization method and device, which can be used to solve the technical problem in the prior art that a neural network model cannot be adapted to hardware with a smaller memory.
In a first aspect, the present application provides a method for optimizing a neural network model, the method comprising:
acquiring the structure of the neural network model to be optimized, parameters of the neural network model to be optimized and weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
performing operator layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of adjacent operator layers on a structure or a plurality of operator layers which are juxtaposed on a structure into one operator layer through the operator layer fusion processing;
determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient; the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized; the weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range;
according to the weight adjustment coefficient range and the weight parameter, carrying out weight optimization processing on the intermediate model;
and performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model.
With reference to the first aspect, in an implementation manner of the first aspect, performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameter includes:
according to the connection sequence of the operator layers, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
With reference to the first aspect, in an implementation manner of the first aspect, according to the weight adjustment coefficient range and the weight parameter of the target operator layer, performing weight optimization processing on the target operator layer includes:
determining a weight adjustment coefficient of the target operator layer according to the maximum weight value and the minimum weight value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and carrying out weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
With reference to the first aspect, in an implementation manner of the first aspect, the performing weight optimization processing on the target operator layer further includes:
and adding a preset normalization layer after the target operator layer is subjected to weight optimization treatment.
With reference to the first aspect, in an implementation manner of the first aspect, performing an operator layer fusion process on the initial model to obtain an intermediate model, including:
merging a normalization layer connected with a convolution layer into the convolution layer to obtain the intermediate model; the convolution layers are operator layers of one type, the normalization layers are operator layers of another type, and one normalization layer is connected behind each convolution layer.
With reference to the first aspect, in an implementation manner of the first aspect, performing an operator layer fusion process on the initial model to obtain an intermediate model, including:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, merging the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are all convolution layers, and all branch convolution layers are structurally juxtaposed with each other.
In a second aspect, the present application provides an optimization apparatus for a neural network model, the apparatus including:
the acquisition module is used for acquiring the structure of the neural network model to be optimized, parameters of the neural network model to be optimized and weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
the transformation module is used for transforming the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
the processing module is used for carrying out operator layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of adjacent operator layers on a structure or a plurality of operator layers which are juxtaposed on a structure into one operator layer through the operator layer fusion processing; determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient; the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized; the weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range; according to the weight adjustment coefficient range and the weight parameter, carrying out weight optimization processing on the intermediate model; and carrying out quantization processing on the intermediate model with optimized weight to obtain an optimized neural network model.
With reference to the second aspect, in an implementation manner of the second aspect, the processing module is specifically configured to:
according to the connection sequence of the operator layers, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
With reference to the second aspect, in an implementation manner of the second aspect, the processing module is specifically configured to:
determining a weight adjustment coefficient of the target operator layer according to the maximum weight value and the minimum weight value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and carrying out weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
With reference to the second aspect, in an implementation manner of the second aspect, the apparatus further includes an adding module, configured to:
and adding a preset normalization layer after the target operator layer is subjected to weight optimization treatment.
With reference to the second aspect, in an implementation manner of the second aspect, the processing module is specifically configured to:
merging a normalization layer connected with a convolution layer into the convolution layer to obtain the intermediate model; the convolution layers are operator layers of one type, the normalization layers are operator layers of another type, and one normalization layer is connected behind each convolution layer.
With reference to the second aspect, in an implementation manner of the second aspect, the processing module is specifically configured to:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, merging the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are all convolution layers, and all branch convolution layers are structurally juxtaposed with each other.
According to the method, the operator layers are fused to reduce the computational scale of the neural network model, and the neural network model is then weight-optimized according to the weight range that the target hardware accelerator can accommodate, so that the target hardware accelerator is adapted to the neural network model. The method removes the restriction on deploying the neural network model on the hardware accelerator without reducing the accuracy of the neural network model.
Drawings
Fig. 1 is a schematic flow chart of an optimization method of a neural network model according to an embodiment of the present application;
fig. 2 is a flow chart corresponding to a method for performing weight optimization processing on a target operator layer according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an optimizing device of a neural network model according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an optimization method of a neural network model according to an embodiment of the present application. The method provided by the embodiment of the present application comprises the following steps:
step S101, obtaining the structure of the neural network model to be optimized, parameters of the neural network model to be optimized and weight parameters corresponding to each operator layer of the neural network model to be optimized.
The neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values.
It should be noted that the neural network model to be optimized in the embodiment of the present application may be a classical model from an open-source framework, or a neural network model built and trained by the user. The neural network model comprises a plurality of operator layers, and each operator layer is a convolution layer or a normalization layer. The convolution layers provided in the embodiments of the present application may be one or more of the mainstream convolution types, such as ordinary convolution, depthwise convolution, dilated convolution and grouped convolution. The kernel size, stride, padding, number of channels and the like of the convolution layers are not particularly limited.
In one implementation, the entire neural network model to be optimized is traversed, and the structure of the neural network model to be optimized, parameters of the neural network model to be optimized, and weight parameters corresponding to each operator layer of the neural network model to be optimized can be obtained.
Step S102, converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized.
The format corresponding to the initial model is the Open Neural Network Exchange (ONNX) model format.
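For illustration only (this is not part of the claimed method), a model defined and trained in a framework such as PyTorch can be exported to the ONNX format roughly as follows; the network definition, input shape and file name below are hypothetical:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the neural network model to be optimized.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

model = TinyNet()
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)   # assumed input tensor shape

# Export the model structure and parameters as the "initial model" in ONNX format.
torch.onnx.export(model, dummy_input, "initial_model.onnx",
                  opset_version=11, input_names=["input"], output_names=["output"])
```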
And step S103, performing operator layer fusion processing on the initial model to obtain an intermediate model.
And the initial model fuses a plurality of adjacent operator layers on the structure or a plurality of operator layers which are juxtaposed on the structure into one operator layer through operator layer fusion processing.
It should be noted that, step S102 and step S103 are performed synchronously, that is, the method provided in the embodiment of the present application converts the initial model into the format of the open neural network exchange model while performing the operator layer fusion processing on the initial model.
Depending on the structure of the neural network model to be optimized, the embodiment of the present application provides several operator layer fusion processing methods. One method fuses a normalization layer connected to a convolution layer into that convolution layer to obtain the intermediate model. The convolution layers are one type of operator layer, the normalization layers are another type, and one normalization layer is connected behind each convolution layer. The new convolution layer obtained by fusion is no longer followed by a normalization layer.
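A minimal sketch of this conv-BN fusion, assuming the standard batch-normalization folding identity (the variable names are illustrative and not taken from the patent):

```python
import numpy as np

def fuse_conv_bn(conv_w, conv_b, bn_gamma, bn_beta, bn_mean, bn_var, eps=1e-5):
    """Fold a normalization layer that follows a convolution layer into the
    convolution weights and bias, so the fused layer needs no BN afterwards.

    conv_w: (out_ch, in_ch, kh, kw) convolution weights
    conv_b: (out_ch,) convolution bias (zeros if the layer had none)
    """
    std = np.sqrt(bn_var + eps)                    # per-channel standard deviation
    scale = bn_gamma / std                         # per-channel scaling factor
    fused_w = conv_w * scale[:, None, None, None]  # scale each output channel's kernel
    fused_b = (conv_b - bn_mean) * scale + bn_beta # fold mean and offset into the bias
    return fused_w, fused_b
```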
In another operator layer fusion processing method, if the initial model contains a plurality of branch convolution layers with the same convolution kernel, the plurality of branch convolution layers are fused into one convolution layer to obtain the intermediate model. The branch convolution layers are all convolution layers and are structurally parallel to each other. Specifically, if the initial model has multiple branches from the input layer to the output layer, and the multiple branches share the same branching point and junction point, all branch convolution layers are considered structurally parallel to each other. It is then determined whether the convolution kernels of all branch convolution layers are the same; if they are, for example all 3×3 or all 1×1, the branch convolution layers are fused into one convolution layer.
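A sketch of the parallel-branch case, under the assumption that every branch shares the same kernel shape, stride and padding and that the branch outputs are added at the junction point, so that by linearity the branch kernels can simply be summed:

```python
import numpy as np

def fuse_parallel_convs(branch_weights, branch_biases):
    """Fuse structurally parallel convolution branches, whose outputs are added
    at the junction point, into a single convolution layer.

    Valid only when all branches share the same kernel shape, stride and padding;
    by linearity of convolution, the sum of the branch outputs equals one
    convolution whose weights and biases are the element-wise sums of the branches'.
    """
    fused_w = np.sum(np.stack(branch_weights, axis=0), axis=0)
    fused_b = np.sum(np.stack(branch_biases, axis=0), axis=0)
    return fused_w, fused_b
```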
The operator layer fusion processing methods further include fusing a plurality of consecutive normalization layers into one normalization layer if the neural network model to be optimized contains such consecutive normalization layers.
Through the operator layer fusion processing methods, the neural network model to be optimized is structurally simplified while its accuracy is maintained.
After step S103 is executed, the method provided in the embodiment of the present application further needs to perform equivalent transformations on the operator layer types in the initial model according to the operator layer types supported by the target hardware accelerator. For example, if the hardware accelerator only supports 3×3 convolution layers while the initial model contains both 3×3 and other convolution layers, the other convolution layers in the initial model need to be transformed into equivalent ones. For another example, if the hardware accelerator does not support dilated convolution, zeros are inserted at the corresponding positions of the convolution kernel so that the dilated convolution becomes equivalent to an ordinary convolution layer.
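The dilated-convolution equivalence mentioned above can be sketched as follows: zeros are inserted between the original kernel taps, and the enlarged kernel used as an ordinary convolution produces the same result (illustrative sketch only):

```python
import numpy as np

def dilate_kernel_to_dense(weight, dilation):
    """Expand a dilated-convolution kernel into an equivalent ordinary kernel
    by inserting zeros between the original kernel elements.

    weight: (out_ch, in_ch, kh, kw); dilation: int >= 1.
    The returned kernel has spatial size (k - 1) * dilation + 1 and, used with
    dilation 1, produces the same output as the original dilated convolution.
    """
    out_ch, in_ch, kh, kw = weight.shape
    new_kh = (kh - 1) * dilation + 1
    new_kw = (kw - 1) * dilation + 1
    dense = np.zeros((out_ch, in_ch, new_kh, new_kw), dtype=weight.dtype)
    dense[:, :, ::dilation, ::dilation] = weight   # place original taps, zeros elsewhere
    return dense
```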
To allow subsequent steps to obtain the information of the intermediate model directly, the method records, layer by layer, the information related to each operator layer, such as its input tensor information, output tensor information and parameter information, while the operator layer fusion processing and the equivalence processing are performed on the initial model.
Step S104, determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and the preset coefficient.
The target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized. The weight range is determined according to a configuration file of the target hardware accelerator, or according to a preset weight range.
The method provided by the embodiment of the present application represents the weights supported by the hardware accelerator with a bit width of B, and the weight range of the neural network model that the target hardware accelerator can accommodate is determined as follows:
A = [-2^(B-1), 2^(B-1) - 1]    Formula (1)
In Formula (1), A is the weight range, p = 2^(B-1) - 1 is the maximum value in the weight range, and n = -2^(B-1) is the minimum value in the weight range.
The weight adjustment coefficient range is determined by the following method:
D = [n, n + (p - n) * S_th]    Formula (2)
In Formula (2), D is the weight adjustment coefficient range; n = -2^(B-1) is the minimum value in the weight range; p = 2^(B-1) - 1 is the maximum value in the weight range; S_th is the preset coefficient, S_th = 0.8.
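For example, Formulas (1) and (2) can be computed directly from the bit width B and the preset coefficient S_th; the sketch below uses B = 8 purely as an example value:

```python
def weight_range(bit_width):
    """Formula (1): range of weights the target hardware accelerator can accommodate."""
    n = -(2 ** (bit_width - 1))      # minimum value in the weight range
    p = 2 ** (bit_width - 1) - 1     # maximum value in the weight range
    return n, p

def weight_adjustment_range(bit_width, s_th=0.8):
    """Formula (2): weight adjustment coefficient range."""
    n, p = weight_range(bit_width)
    return n, n + (p - n) * s_th

# With an 8-bit accelerator: A = (-128, 127), D = (-128, 76.0)
print(weight_range(8))
print(weight_adjustment_range(8))
```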
Step S105, performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameter.
In one implementation, according to the connection sequence of the operator layers, the weight optimization processing is performed on the target operator layers according to the weight adjustment coefficient range and the weight parameters of the target operator layers.
The target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
It should be noted that the method provided by the embodiment of the present application does not need to perform weight optimization processing on all operator layers in the intermediate model, but only on the target operator layers whose weights fall within the weight adjustment coefficient range. Therefore, in the method provided by the embodiment of the present application, the operator layers are traversed in their connection order from the input layer to the output layer, and it is judged in turn whether the weight corresponding to the current operator layer lies within the weight adjustment coefficient range; if it does, the current operator layer is taken as a target operator layer.
Fig. 2 is a schematic flow chart corresponding to a method for performing weight optimization processing on a target operator layer according to an embodiment of the present application. The weight optimization processing method provided by the embodiment of the application comprises the following steps:
step S201, determining the weight adjustment coefficient of the target operator layer according to the weight maximum value and the weight minimum value of the target operator layer.
In step S101, the weight parameters corresponding to each operator layer have already been acquired, and they are not changed in steps S101 to S104. For each operator layer, the weight maximum value and the weight minimum value are compared, and whichever has the larger absolute value is taken as the weight adjustment coefficient of that operator layer. If the weight adjustment coefficient of an operator layer lies within the weight adjustment coefficient range, the operator layer is determined to be a target operator layer. Correspondingly, the weight maximum value of the operator layer is the weight maximum value of the target operator layer, and the weight minimum value of the operator layer is the weight minimum value of the target operator layer. The target operator layer satisfies the following condition:
n ≤ W_max ≤ n + (p - n) * S_th    Formula (3)
In Formula (3), n is the minimum value in the weight adjustment coefficient range; n + (p - n) * S_th is the maximum value in the weight adjustment coefficient range; p = 2^(B-1) - 1 is the maximum value in the weight range; S_th = 0.8; W_max is the weight adjustment coefficient of the target operator layer.
Step S202, determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator.
In the embodiment of the application, the weight scale factors are determined by adopting the following method:
in the formula (4), scale is a weight Scale factor; p is the maximum value in the weight range; alpha is a coefficient, and the value is 0.95; w (W) max The coefficients are adjusted for the weights of the target operator layers.
And step S203, carrying out weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
According to the method provided by the embodiment of the application, the scale factors are multiplied by the weights of the target operator layers, so that the weights of the target operator layers after optimization are obtained.
The method provided by the embodiment of the application further comprises the step of adding a preset normalization layer after the target operator layer is subjected to weight optimization.
It should be noted that, the preset normalization layer provided in the embodiment of the present application satisfies the following relation:
in the formula (5), BN out Outputting a preset normalization layer; x is the input of a preset normalization layer; μ is the mean, μ=0; sigma is standard deviation, sigma 2 =1; e is a very small amount to prevent zero-removal introduction; gamma is the scale of the pre-set normalization layer,beta is the offset parameter, beta=0.
Step S106, performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model.
In the embodiment of the present application, the quantization processing is performed as follows:
x_q = clip(int(x / s), n, p),  where s = max(abs(x_max), abs(x_min)) / p    Formula (6)
In Formula (6), x_q is the quantized data; x is the data before quantization; x_max and x_min are the maximum and minimum values of the data before quantization; n is the minimum value in the weight range; p is the maximum value in the weight range; s is the quantization step; int denotes rounding, clip denotes truncation, max denotes taking the larger value, and abs denotes taking the absolute value.
After the model has been optimized, if it is currently quantized with a bit width of B, the neural network model is tested with a small number of pictures and compared with the completely unquantized neural network model; if the error between the two is large, the bit width B is insufficient to meet the accuracy requirement. In that case, B can be increased and the optimization performed again. It should be noted that, compared with quantizing directly with a bit width of B, the method provided by the embodiment of the present application achieves a significant improvement in accuracy.
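The bit-width check described above can be sketched as a simple loop: quantize with B bits, compare against the unquantized model on a few test pictures, and increase B if the error is too large. The error metric, tolerance and callables below are illustrative assumptions:

```python
import numpy as np

def choose_bit_width(run_float_model, run_quantized_model, test_images,
                     start_bits=8, max_bits=16, tolerance=1e-2):
    """Increase the quantization bit width B until the mean error between the
    quantized and unquantized models on a few test pictures is acceptable.

    run_float_model and run_quantized_model are assumed callables returning
    output arrays; the tolerance value is purely illustrative.
    """
    for bits in range(start_bits, max_bits + 1):
        errors = []
        for img in test_images:
            ref = run_float_model(img)
            out = run_quantized_model(img, bits)
            errors.append(np.mean(np.abs(ref - out)))
        if np.mean(errors) <= tolerance:
            return bits
    return max_bits
```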
According to the method provided by the embodiment of the present application, the operator layers are fused to reduce the computational scale of the neural network model, and the neural network model is then weight-optimized according to the weight range that the target hardware accelerator can accommodate, so that the target hardware accelerator is adapted to the neural network model. The method removes the restriction on deploying the neural network model on the hardware accelerator without reducing the accuracy of the neural network model.
The following are device embodiments of the present application, which may be used to perform the method embodiments of the present application. For details not disclosed in the device embodiments, please refer to the method embodiments of the present application. Fig. 3 schematically illustrates a structural diagram of an optimization apparatus of a neural network model according to an embodiment of the present application. As shown in Fig. 3, the device has the function of implementing the neural network model optimization described above; this function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include an acquisition module 301, a conversion module 302, and a processing module 303.
The obtaining module 301 is configured to obtain a structure of the neural network model to be optimized, parameters of the neural network model to be optimized, and weight parameters corresponding to each operator layer of the neural network model to be optimized. The neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values.
The conversion module 302 is configured to convert the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized. The format corresponding to the initial model is the format of an open neural network exchange model.
And the processing module 303 is used for performing operator layer fusion processing on the initial model to obtain an intermediate model. And the initial model fuses a plurality of adjacent operator layers on the structure or a plurality of operator layers which are juxtaposed on the structure into one operator layer through operator layer fusion processing. And determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and the preset coefficient. The target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized. The weight range is determined according to a configuration file of the target hardware accelerator, or according to a preset weight range. And carrying out weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameter. And carrying out quantization processing on the intermediate model with optimized weight to obtain an optimized neural network model.
Optionally, the processing module 303 is specifically configured to:
and according to the connection sequence of the operator layers, carrying out weight optimization processing on the target operator layers according to the weight adjustment coefficient range and the weight parameters of the target operator layers. The target operator layer is any operator layer with the weight within the weight adjusting coefficient range in the intermediate model.
Optionally, the processing module 303 is specifically configured to:
and determining the weight adjustment coefficient of the target operator layer according to the weight maximum value and the weight minimum value of the target operator layer.
And determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator.
And performing weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
Optionally, the apparatus further comprises an adding module for:
and adding a preset normalization layer after the target operator layer is subjected to weight optimization treatment.
Optionally, the processing module 303 is specifically configured to:
and fusing the normalization layer connected with the convolution layer into the convolution layer to obtain an intermediate model. The convolution layers are one type of operator layers, the normalization layers are the other type of operator layers, and each convolution layer is connected with one normalization layer.
Optionally, the processing module 303 is specifically configured to:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, the plurality of branch convolution layers are fused into one convolution layer, and an intermediate model is obtained. The plurality of branch convolution layers are all convolution layers, and all branch convolution layers are structurally juxtaposed to each other.
According to the method, the operator layers are fused to reduce the computational scale of the neural network model, and the neural network model is then weight-optimized according to the weight range that the target hardware accelerator can accommodate, so that the target hardware accelerator is adapted to the neural network model. The method removes the restriction on deploying the neural network model on the hardware accelerator without reducing the accuracy of the neural network model.
The invention is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A method for optimizing a neural network model, the method comprising:
acquiring the structure of the neural network model to be optimized, parameters of the neural network model to be optimized and weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
performing operator layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of adjacent operator layers on a structure or a plurality of operator layers which are juxtaposed on a structure into one operator layer through the operator layer fusion processing;
determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient; the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized; the weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range;
according to the weight adjustment coefficient range and the weight parameter, carrying out weight optimization processing on the intermediate model;
and performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model.
2. The method of claim 1, wherein performing a weight optimization process on the intermediate model based on the weight adjustment coefficient range and the weight parameter comprises:
according to the connection sequence of the operator layers, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
3. The method according to claim 2, wherein the weight optimization processing is performed on the target operator layer according to the weight adjustment coefficient range and the weight parameter of the target operator layer, including:
determining a weight adjustment coefficient of the target operator layer according to the maximum weight value and the minimum weight value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and carrying out weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
4. The method according to claim 2, wherein the weight optimization process is performed on the target operator layer, and further comprising:
and adding a preset normalization layer after the target operator layer is subjected to weight optimization treatment.
5. The method of claim 1, wherein performing an operator layer fusion process on the initial model to obtain an intermediate model comprises:
merging a normalization layer connected with a convolution layer into the convolution layer to obtain the intermediate model; the convolution layers are operator layers of one type, the normalization layers are operator layers of another type, and one normalization layer is connected behind each convolution layer.
6. The method of claim 1, wherein performing an operator layer fusion process on the initial model to obtain an intermediate model comprises:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, merging the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are all convolution layers, and all branch convolution layers are structurally juxtaposed with each other.
7. An apparatus for optimizing a neural network model, the apparatus comprising:
the acquisition module is used for acquiring the structure of the neural network model to be optimized, parameters of the neural network model to be optimized and weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
the transformation module is used for transforming the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
the processing module is used for carrying out operator layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of adjacent operator layers on a structure or a plurality of operator layers which are juxtaposed on a structure into one operator layer through the operator layer fusion processing; determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient; the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized; the weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range; according to the weight adjustment coefficient range and the weight parameter, carrying out weight optimization processing on the intermediate model; and carrying out quantization processing on the intermediate model with optimized weight to obtain an optimized neural network model.
8. The apparatus of claim 7, wherein the processing module is specifically configured to:
according to the connection sequence of the operator layers, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
9. The apparatus of claim 7, wherein the processing module is specifically configured to:
determining a weight adjustment coefficient of the target operator layer according to the maximum weight value and the minimum weight value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and carrying out weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
10. The apparatus of claim 7, further comprising an adding module for:
and adding a preset normalization layer after the target operator layer is subjected to weight optimization treatment.
CN202110382904.5A 2021-04-09 2021-04-09 Neural network model optimization method and device Active CN113128670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110382904.5A CN113128670B (en) 2021-04-09 2021-04-09 Neural network model optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110382904.5A CN113128670B (en) 2021-04-09 2021-04-09 Neural network model optimization method and device

Publications (2)

Publication Number Publication Date
CN113128670A CN113128670A (en) 2021-07-16
CN113128670B true CN113128670B (en) 2024-03-19

Family

ID=76775672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110382904.5A Active CN113128670B (en) 2021-04-09 2021-04-09 Neural network model optimization method and device

Country Status (1)

Country Link
CN (1) CN113128670B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103426027A (en) * 2013-07-24 2013-12-04 浙江大学 Intelligent normal pool level optimal selection method based on genetic neural network models
CN110378470A (en) * 2019-07-19 2019-10-25 Oppo广东移动通信有限公司 Optimization method, device and the computer storage medium of neural network model
CN110826692A (en) * 2019-10-24 2020-02-21 腾讯科技(深圳)有限公司 Automatic model compression method, device, equipment and storage medium
CN111310684A (en) * 2020-02-24 2020-06-19 东声(苏州)智能科技有限公司 Model training method and device, electronic equipment and storage medium
CN111602145A (en) * 2018-10-30 2020-08-28 深圳鲲云信息科技有限公司 Optimization method of convolutional neural network and related product
CN112200297A (en) * 2020-09-04 2021-01-08 厦门星宸科技有限公司 Neural network optimization method, device and processor
CN112257840A (en) * 2019-07-22 2021-01-22 华为技术有限公司 Neural network processing method and related equipment
CN112465108A (en) * 2020-11-11 2021-03-09 上海交通大学 Neural network compiling method for storage and calculation integrated platform
CN112541159A (en) * 2020-09-30 2021-03-23 华为技术有限公司 Model training method and related equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11501130B2 (en) * 2016-09-09 2022-11-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
CN107895174B (en) * 2017-11-09 2020-01-07 京东方科技集团股份有限公司 Image classification and conversion method, device and image processing system
CN108259997B (en) * 2018-04-02 2019-08-23 腾讯科技(深圳)有限公司 Image correlation process method and device, intelligent terminal, server, storage medium
US11645493B2 (en) * 2018-05-04 2023-05-09 Microsoft Technology Licensing, Llc Flow for quantized neural networks

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103426027A (en) * 2013-07-24 2013-12-04 浙江大学 Intelligent normal pool level optimal selection method based on genetic neural network models
CN111602145A (en) * 2018-10-30 2020-08-28 深圳鲲云信息科技有限公司 Optimization method of convolutional neural network and related product
CN110378470A (en) * 2019-07-19 2019-10-25 Oppo广东移动通信有限公司 Optimization method, device and the computer storage medium of neural network model
CN112257840A (en) * 2019-07-22 2021-01-22 华为技术有限公司 Neural network processing method and related equipment
CN110826692A (en) * 2019-10-24 2020-02-21 腾讯科技(深圳)有限公司 Automatic model compression method, device, equipment and storage medium
CN111310684A (en) * 2020-02-24 2020-06-19 东声(苏州)智能科技有限公司 Model training method and device, electronic equipment and storage medium
CN112200297A (en) * 2020-09-04 2021-01-08 厦门星宸科技有限公司 Neural network optimization method, device and processor
CN112541159A (en) * 2020-09-30 2021-03-23 华为技术有限公司 Model training method and related equipment
CN112465108A (en) * 2020-11-11 2021-03-09 上海交通大学 Neural network compiling method for storage and calculation integrated platform

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A DNN Optimization Framework with Unlabeled Data for Efficient and Accurate Reconfigurable Hardware Inference; Kai Chen et al.; 2021 IEEE International Symposium on Circuits and Systems (ISCAS); full text *
Improved Deep Neural Network Hardware-Accelerators Based on Non-Volatile-Memory: The Local Gains Technique; Irem Boybat et al.; 2017 IEEE International Conference on Rebooting Computing (ICRC); full text *
Research on Acceleration Techniques for Object Detection Network Algorithms Based on ARM NEON; 邢景; China Masters' Theses Full-text Database; full text *
Optimizing Feedforward Neural Network Structure and Weight Vectors with a Genetic Algorithm; 黎明, 严超华, 刘高航; Journal of Image and Graphics (06); full text *
Research on Optimization and Acceleration Methods for Deep Neural Network Models Oriented to Hardware Implementation; 陈凯; China Masters' Theses Full-text Database; full text *

Also Published As

Publication number Publication date
CN113128670A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
Hand et al. nbodykit: An open-source, massively parallel toolkit for large-scale structure
CN111950225B (en) Chip layout method and device, storage medium and electronic equipment
US8204714B2 (en) Method and computer program product for finding statistical bounds, corresponding parameter corners, and a probability density function of a performance target for a circuit
CN110149238B (en) Method and device for predicting flow
CN110929862B (en) Fixed-point neural network model quantification device and method
CN113095129A (en) Attitude estimation model training method, attitude estimation device and electronic equipment
CN112418427A (en) Method, device, system and equipment for providing deep learning unified reasoning service
WO2021011412A1 (en) Systems and methods for simulating a quantum processor
CN111311480A (en) Image fusion method and device
CN113128670B (en) Neural network model optimization method and device
JP2011186991A (en) Method, program and system for solving ordinary differential equation
CN110046670B (en) Feature vector dimension reduction method and device
Zhang et al. Practical edge kernels for integer-only vision transformers under post-training quantization
CN113723712B (en) Wind power prediction method, system, equipment and medium
Singha et al. LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning
CN110222777B (en) Image feature processing method and device, electronic equipment and storage medium
US20150142630A1 (en) Risk scenario generation
CN113760497A (en) Scheduling task configuration method and device
CN114365151A (en) Neural network model transformation method, device, server and storage medium
CN111310794A (en) Target object classification method and device and electronic equipment
CN117573123B (en) Page generation method and device applied to webpage application and electronic equipment
CN107220429B (en) Automatic selection method and system for optimal device in device modeling
CN111598037B (en) Human body posture predicted value acquisition method, device, server and storage medium
EP4343603A1 (en) System and method for managing geometric designs
Fuchs et al. TorchClim v1. 0: a deep-learning plugin for climate model physics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant