CN113128670B - Neural network model optimization method and device - Google Patents

Neural network model optimization method and device

Info

Publication number
CN113128670B
CN113128670B (application CN202110382904.5A)
Authority
CN
China
Prior art keywords
weight
neural network
operator
layer
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110382904.5A
Other languages
Chinese (zh)
Other versions
CN113128670A (en)
Inventor
杜源
陈凯
杜力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110382904.5A priority Critical patent/CN113128670B/en
Publication of CN113128670A publication Critical patent/CN113128670A/en
Application granted granted Critical
Publication of CN113128670B publication Critical patent/CN113128670B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Feedback Control In General (AREA)

Abstract

The method comprises: obtaining the structure of a neural network model to be optimized, the parameters of the neural network model to be optimized, and the weight parameters corresponding to each operator layer of the neural network model to be optimized; converting the neural network model to be optimized into an initial model according to its structure and parameters; performing operator layer fusion processing on the initial model to obtain an intermediate model; determining a weight adjustment coefficient range according to the weight range of the neural network model that the target hardware accelerator can accommodate and a preset coefficient; performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameters; and performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model. The method alleviates the restriction on deploying the neural network model on a hardware accelerator.

Description

Neural network model optimization method and device
Technical Field
The present application relates to the field of Internet technology, and in particular to a neural network model optimization method and device.
Background
Target-object recognition methods based on neural network models achieve extremely high accuracy and are therefore widely used in many fields, such as autonomous driving, intelligent security and intelligent robotics. Accordingly, neural network models are also widely deployed on hardware accelerators adapted to these fields. A hardware accelerator works in cooperation with the central processing unit of its host system and handles specialized workloads at high speed; a hardware accelerator on which a neural network model is deployed is dedicated to running that model. As the application fields of neural network models keep widening, more and more research and development effort is invested in them. To reduce research and development cost, researchers typically build and train a neural network model on general-purpose processors such as CPUs and GPUs, and then adapt it to a hardware accelerator for the specific application field.
The ability of a neural network model to identify target objects accurately comes at the cost of the model's high computational complexity. This requires that the hardware accelerator used to deploy the neural network model have substantial memory and be able to carry out complex computation. In practical applications, however, the hardware used to deploy the neural network model often does not have a large memory space, so the neural network model cannot be applied to a hardware accelerator with a small memory.
Disclosure of Invention
The present application provides a neural network model optimization method and device, which can be used to solve the technical problem in the prior art that a neural network model cannot be adapted to hardware with a smaller memory.
In a first aspect, the present application provides a method for optimizing a neural network model, the method comprising:
acquiring the structure of the neural network model to be optimized, parameters of the neural network model to be optimized and weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
performing operator layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of adjacent operator layers on a structure or a plurality of operator layers which are juxtaposed on a structure into one operator layer through the operator layer fusion processing;
determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient; the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized; the weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range;
according to the weight adjustment coefficient range and the weight parameter, carrying out weight optimization processing on the intermediate model;
and performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model.
With reference to the first aspect, in an implementation manner of the first aspect, performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameter includes:
according to the connection sequence of the operator layers, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
With reference to the first aspect, in an implementation manner of the first aspect, according to the weight adjustment coefficient range and the weight parameter of the target operator layer, performing weight optimization processing on the target operator layer includes:
determining a weight adjustment coefficient of the target operator layer according to the maximum weight value and the minimum weight value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and carrying out weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
With reference to the first aspect, in an implementation manner of the first aspect, the performing weight optimization processing on the target operator layer further includes:
and adding a preset normalization layer after the target operator layer is subjected to weight optimization treatment.
With reference to the first aspect, in an implementation manner of the first aspect, performing an operator layer fusion process on the initial model to obtain an intermediate model, including:
merging a normalization layer connected with a convolution layer into the convolution layer to obtain the intermediate model; the convolution layers are operator layers of one type, the normalization layers are operator layers of another type, and one normalization layer is connected behind each convolution layer.
With reference to the first aspect, in an implementation manner of the first aspect, performing an operator layer fusion process on the initial model to obtain an intermediate model, including:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, merging the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are all convolution layers, and all branch convolution layers are structurally juxtaposed with each other.
In a second aspect, the present application provides an optimization apparatus for a neural network model, the apparatus including:
the acquisition module is used for acquiring the structure of the neural network model to be optimized, parameters of the neural network model to be optimized and weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
the transformation module is used for transforming the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
the processing module is used for carrying out operator layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of adjacent operator layers on a structure or a plurality of operator layers which are juxtaposed on a structure into one operator layer through the operator layer fusion processing; determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient; the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized; the weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range; according to the weight adjustment coefficient range and the weight parameter, carrying out weight optimization processing on the intermediate model; and carrying out quantization processing on the intermediate model with optimized weight to obtain an optimized neural network model.
With reference to the second aspect, in an implementation manner of the second aspect, the processing module is specifically configured to:
according to the connection sequence of the operator layers, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
With reference to the second aspect, in an implementation manner of the second aspect, the processing module is specifically configured to:
determining a weight adjustment coefficient of the target operator layer according to the maximum weight value and the minimum weight value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and carrying out weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
With reference to the second aspect, in an implementation manner of the second aspect, the apparatus further includes an adding module, configured to:
and adding a preset normalization layer after the target operator layer is subjected to weight optimization treatment.
With reference to the second aspect, in an implementation manner of the second aspect, the processing module is specifically configured to:
merging a normalization layer connected with a convolution layer into the convolution layer to obtain the intermediate model; the convolution layers are operator layers of one type, the normalization layers are operator layers of another type, and one normalization layer is connected behind each convolution layer.
With reference to the second aspect, in an implementation manner of the second aspect, the processing module is specifically configured to:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, merging the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are all convolution layers, and all branch convolution layers are structurally juxtaposed with each other.
According to the method, the operator layers are fused to reduce the computational scale of the neural network model, and the neural network model is then weight-optimized according to the weight range that the target hardware accelerator can accommodate, so that the target hardware accelerator is adapted to the neural network model. The method removes the restriction on deploying the neural network model on the hardware accelerator without reducing the accuracy of the neural network model.
Drawings
Fig. 1 is a schematic flow chart of an optimization method of a neural network model according to an embodiment of the present application;
fig. 2 is a flow chart corresponding to a method for performing weight optimization processing on a target operator layer according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an optimizing device of a neural network model according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an optimization method of a neural network model according to an embodiment of the present application. The method provided by the embodiment of the present application comprises the following steps:
step S101, obtaining the structure of the neural network model to be optimized, parameters of the neural network model to be optimized and weight parameters corresponding to each operator layer of the neural network model to be optimized.
The neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values.
It should be noted that the neural network model to be optimized in the embodiment of the present application may be a classical model from an open-source framework, or a neural network model built and trained by the user. The neural network model comprises a plurality of operator layers, and each operator layer is a convolution layer or a normalization layer. The convolution layers provided in the embodiments of the present application may be one or more of the mainstream convolution types, such as ordinary convolution, depthwise convolution, dilated convolution and grouped convolution. The kernel size, stride, padding, number of channels and the like of the convolution layers are not particularly limited.
In one implementation, the entire neural network model to be optimized is traversed, and the structure of the neural network model to be optimized, parameters of the neural network model to be optimized, and weight parameters corresponding to each operator layer of the neural network model to be optimized can be obtained.
Step S102, converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized.
The format corresponding to the initial model is the Open Neural Network Exchange (ONNX) model format.
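For illustration only (this is not part of the claimed method), a model defined and trained in a framework such as PyTorch can be exported to the ONNX format roughly as follows; the network definition, input shape and file name below are hypothetical:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the neural network model to be optimized.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

model = TinyNet()
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)   # assumed input tensor shape

# Export the model structure and parameters as the "initial model" in ONNX format.
torch.onnx.export(model, dummy_input, "initial_model.onnx",
                  opset_version=11, input_names=["input"], output_names=["output"])
```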
And step S103, performing operator layer fusion processing on the initial model to obtain an intermediate model.
And the initial model fuses a plurality of adjacent operator layers on the structure or a plurality of operator layers which are juxtaposed on the structure into one operator layer through operator layer fusion processing.
It should be noted that, step S102 and step S103 are performed synchronously, that is, the method provided in the embodiment of the present application converts the initial model into the format of the open neural network exchange model while performing the operator layer fusion processing on the initial model.
Depending on the structure of the neural network model to be optimized, the embodiment of the present application provides several operator layer fusion processing methods. One method fuses a normalization layer connected to a convolution layer into that convolution layer to obtain the intermediate model. The convolution layers are one type of operator layer, the normalization layers are another type, and one normalization layer is connected behind each convolution layer. The new convolution layer obtained by fusion is no longer followed by a normalization layer.
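A minimal sketch of this conv-BN fusion, assuming the standard batch-normalization folding identity (the variable names are illustrative and not taken from the patent):

```python
import numpy as np

def fuse_conv_bn(conv_w, conv_b, bn_gamma, bn_beta, bn_mean, bn_var, eps=1e-5):
    """Fold a normalization layer that follows a convolution layer into the
    convolution weights and bias, so the fused layer needs no BN afterwards.

    conv_w: (out_ch, in_ch, kh, kw) convolution weights
    conv_b: (out_ch,) convolution bias (zeros if the layer had none)
    """
    std = np.sqrt(bn_var + eps)                    # per-channel standard deviation
    scale = bn_gamma / std                         # per-channel scaling factor
    fused_w = conv_w * scale[:, None, None, None]  # scale each output channel's kernel
    fused_b = (conv_b - bn_mean) * scale + bn_beta # fold mean and offset into the bias
    return fused_w, fused_b
```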
In another operator layer fusion processing method, if the initial model contains a plurality of branch convolution layers with the same convolution kernel, the plurality of branch convolution layers are fused into one convolution layer to obtain the intermediate model. The branch convolution layers are all convolution layers and are structurally parallel to each other. Specifically, if the initial model has multiple branches from the input layer to the output layer, and the multiple branches share the same branching point and junction point, all branch convolution layers are considered structurally parallel to each other. It is then determined whether the convolution kernels of all branch convolution layers are the same; if they are, for example all 3×3 or all 1×1, the branch convolution layers are fused into one convolution layer.
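A sketch of the parallel-branch case, under the assumption that every branch shares the same kernel shape, stride and padding and that the branch outputs are added at the junction point, so that by linearity the branch kernels can simply be summed:

```python
import numpy as np

def fuse_parallel_convs(branch_weights, branch_biases):
    """Fuse structurally parallel convolution branches, whose outputs are added
    at the junction point, into a single convolution layer.

    Valid only when all branches share the same kernel shape, stride and padding;
    by linearity of convolution, the sum of the branch outputs equals one
    convolution whose weights and biases are the element-wise sums of the branches'.
    """
    fused_w = np.sum(np.stack(branch_weights, axis=0), axis=0)
    fused_b = np.sum(np.stack(branch_biases, axis=0), axis=0)
    return fused_w, fused_b
```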
The operator layer fusion processing methods further include fusing a plurality of consecutive normalization layers into one normalization layer if the neural network model to be optimized contains such consecutive normalization layers.
Through the operator layer fusion processing methods, the neural network model to be optimized is structurally simplified while its accuracy is maintained.
After step S103 is executed, the method provided in the embodiment of the present application further needs to perform equivalent transformations on the operator layer types in the initial model according to the operator layer types supported by the target hardware accelerator. For example, if the hardware accelerator only supports 3×3 convolution layers while the initial model contains both 3×3 and other convolution layers, the other convolution layers in the initial model need to be transformed into equivalent ones. For another example, if the hardware accelerator does not support dilated convolution, zeros are inserted at the corresponding positions of the convolution kernel so that the dilated convolution becomes equivalent to an ordinary convolution layer.
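The dilated-convolution equivalence mentioned above can be sketched as follows: zeros are inserted between the original kernel taps, and the enlarged kernel used as an ordinary convolution produces the same result (illustrative sketch only):

```python
import numpy as np

def dilate_kernel_to_dense(weight, dilation):
    """Expand a dilated-convolution kernel into an equivalent ordinary kernel
    by inserting zeros between the original kernel elements.

    weight: (out_ch, in_ch, kh, kw); dilation: int >= 1.
    The returned kernel has spatial size (k - 1) * dilation + 1 and, used with
    dilation 1, produces the same output as the original dilated convolution.
    """
    out_ch, in_ch, kh, kw = weight.shape
    new_kh = (kh - 1) * dilation + 1
    new_kw = (kw - 1) * dilation + 1
    dense = np.zeros((out_ch, in_ch, new_kh, new_kw), dtype=weight.dtype)
    dense[:, :, ::dilation, ::dilation] = weight   # place original taps, zeros elsewhere
    return dense
```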
To allow subsequent steps to obtain the information of the intermediate model directly, the method records, layer by layer, the information related to each operator layer, such as its input tensor information, output tensor information and parameter information, while the operator layer fusion processing and the equivalence processing are performed on the initial model.
Step S104, determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and the preset coefficient.
The target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized. The weight range is determined according to a configuration file of the target hardware accelerator, or according to a preset weight range.
The method provided by the embodiment of the present application represents the weights supported by the hardware accelerator with a bit width of B, and the weight range of the neural network model that the target hardware accelerator can accommodate is determined as follows:
A = [-2^(B-1), 2^(B-1) - 1]    Formula (1)
In Formula (1), A is the weight range, p = 2^(B-1) - 1 is the maximum value in the weight range, and n = -2^(B-1) is the minimum value in the weight range.
The weight adjustment coefficient range is determined by the following method:
D = [n, n + (p - n) * S_th]    Formula (2)
In Formula (2), D is the weight adjustment coefficient range; n = -2^(B-1) is the minimum value in the weight range; p = 2^(B-1) - 1 is the maximum value in the weight range; S_th is the preset coefficient, S_th = 0.8.
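For example, Formulas (1) and (2) can be computed directly from the bit width B and the preset coefficient S_th; the sketch below uses B = 8 purely as an example value:

```python
def weight_range(bit_width):
    """Formula (1): range of weights the target hardware accelerator can accommodate."""
    n = -(2 ** (bit_width - 1))      # minimum value in the weight range
    p = 2 ** (bit_width - 1) - 1     # maximum value in the weight range
    return n, p

def weight_adjustment_range(bit_width, s_th=0.8):
    """Formula (2): weight adjustment coefficient range."""
    n, p = weight_range(bit_width)
    return n, n + (p - n) * s_th

# With an 8-bit accelerator: A = (-128, 127), D = (-128, 76.0)
print(weight_range(8))
print(weight_adjustment_range(8))
```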
Step S105, performing weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameter.
In one implementation, according to the connection sequence of the operator layers, the weight optimization processing is performed on the target operator layers according to the weight adjustment coefficient range and the weight parameters of the target operator layers.
The target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
It should be noted that the method provided by the embodiment of the present application does not need to perform weight optimization processing on all operator layers in the intermediate model, but only on the target operator layers whose weights fall within the weight adjustment coefficient range. Therefore, in the method provided by the embodiment of the present application, the operator layers are traversed in their connection order from the input layer to the output layer, and it is judged in turn whether the weight corresponding to the current operator layer lies within the weight adjustment coefficient range; if it does, the current operator layer is taken as a target operator layer.
Fig. 2 is a schematic flow chart corresponding to a method for performing weight optimization processing on a target operator layer according to an embodiment of the present application. The weight optimization processing method provided by the embodiment of the application comprises the following steps:
step S201, determining the weight adjustment coefficient of the target operator layer according to the weight maximum value and the weight minimum value of the target operator layer.
In step S101, the weight parameters corresponding to each operator layer have already been acquired, and they are not changed in steps S101 to S104. For each operator layer, the weight maximum value and the weight minimum value are compared, and whichever has the larger absolute value is taken as the weight adjustment coefficient of that operator layer. If the weight adjustment coefficient of an operator layer lies within the weight adjustment coefficient range, the operator layer is determined to be a target operator layer. Correspondingly, the weight maximum value of the operator layer is the weight maximum value of the target operator layer, and the weight minimum value of the operator layer is the weight minimum value of the target operator layer. The target operator layer satisfies the following condition:
n ≤ W_max ≤ n + (p - n) * S_th    Formula (3)
In Formula (3), n is the minimum value in the weight adjustment coefficient range; n + (p - n) * S_th is the maximum value in the weight adjustment coefficient range; p = 2^(B-1) - 1 is the maximum value in the weight range; S_th = 0.8; W_max is the weight adjustment coefficient of the target operator layer.
Step S202, determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator.
In the embodiment of the application, the weight scale factors are determined by adopting the following method:
in the formula (4), scale is a weight Scale factor; p is the maximum value in the weight range; alpha is a coefficient, and the value is 0.95; w (W) max The coefficients are adjusted for the weights of the target operator layers.
And step S203, carrying out weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
According to the method provided by the embodiment of the application, the scale factors are multiplied by the weights of the target operator layers, so that the weights of the target operator layers after optimization are obtained.
The method provided by the embodiment of the application further comprises the step of adding a preset normalization layer after the target operator layer is subjected to weight optimization.
It should be noted that, the preset normalization layer provided in the embodiment of the present application satisfies the following relation:
in the formula (5), BN out Outputting a preset normalization layer; x is the input of a preset normalization layer; μ is the mean, μ=0; sigma is standard deviation, sigma 2 =1; e is a very small amount to prevent zero-removal introduction; gamma is the scale of the pre-set normalization layer,beta is the offset parameter, beta=0.
Step S106, performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model.
In the embodiment of the present application, the quantization processing is performed as follows:
x_q = clip(int(x / s), n, p),  where s = max(abs(x_max), abs(x_min)) / p    Formula (6)
In Formula (6), x_q is the quantized data; x is the data before quantization; x_max and x_min are the maximum and minimum values of the data before quantization; n is the minimum value in the weight range; p is the maximum value in the weight range; s is the quantization step; int denotes rounding, clip denotes truncation, max denotes taking the larger value, and abs denotes taking the absolute value.
After the model has been optimized, if it is currently quantized with a bit width of B, the neural network model is tested with a small number of pictures and compared with the completely unquantized neural network model; if the error between the two is large, the bit width B is insufficient to meet the accuracy requirement. In that case, B can be increased and the optimization performed again. It should be noted that, compared with quantizing directly with a bit width of B, the method provided by the embodiment of the present application achieves a significant improvement in accuracy.
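The bit-width check described above can be sketched as a simple loop: quantize with B bits, compare against the unquantized model on a few test pictures, and increase B if the error is too large. The error metric, tolerance and callables below are illustrative assumptions:

```python
import numpy as np

def choose_bit_width(run_float_model, run_quantized_model, test_images,
                     start_bits=8, max_bits=16, tolerance=1e-2):
    """Increase the quantization bit width B until the mean error between the
    quantized and unquantized models on a few test pictures is acceptable.

    run_float_model and run_quantized_model are assumed callables returning
    output arrays; the tolerance value is purely illustrative.
    """
    for bits in range(start_bits, max_bits + 1):
        errors = []
        for img in test_images:
            ref = run_float_model(img)
            out = run_quantized_model(img, bits)
            errors.append(np.mean(np.abs(ref - out)))
        if np.mean(errors) <= tolerance:
            return bits
    return max_bits
```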
According to the method provided by the embodiment of the present application, the operator layers are fused to reduce the computational scale of the neural network model, and the neural network model is then weight-optimized according to the weight range that the target hardware accelerator can accommodate, so that the target hardware accelerator is adapted to the neural network model. The method removes the restriction on deploying the neural network model on the hardware accelerator without reducing the accuracy of the neural network model.
The following are device embodiments of the present application, which may be used to perform the method embodiments of the present application. For details not disclosed in the device embodiments, please refer to the method embodiments of the present application. Fig. 3 schematically illustrates a structural diagram of an optimization apparatus of a neural network model according to an embodiment of the present application. As shown in Fig. 3, the device has the function of implementing the neural network model optimization described above; this function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include an acquisition module 301, a conversion module 302, and a processing module 303.
The obtaining module 301 is configured to obtain a structure of the neural network model to be optimized, parameters of the neural network model to be optimized, and weight parameters corresponding to each operator layer of the neural network model to be optimized. The neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values.
The conversion module 302 is configured to convert the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized. The format corresponding to the initial model is the format of an open neural network exchange model.
And the processing module 303 is used for performing operator layer fusion processing on the initial model to obtain an intermediate model. And the initial model fuses a plurality of adjacent operator layers on the structure or a plurality of operator layers which are juxtaposed on the structure into one operator layer through operator layer fusion processing. And determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and the preset coefficient. The target hardware accelerator is a hardware accelerator adapted to the neural network model to be optimized. The weight range is determined according to a configuration file of the target hardware accelerator, or according to a preset weight range. And carrying out weight optimization processing on the intermediate model according to the weight adjustment coefficient range and the weight parameter. And carrying out quantization processing on the intermediate model with optimized weight to obtain an optimized neural network model.
Optionally, the processing module 303 is specifically configured to:
and according to the connection sequence of the operator layers, carrying out weight optimization processing on the target operator layers according to the weight adjustment coefficient range and the weight parameters of the target operator layers. The target operator layer is any operator layer with the weight within the weight adjusting coefficient range in the intermediate model.
Optionally, the processing module 303 is specifically configured to:
and determining the weight adjustment coefficient of the target operator layer according to the weight maximum value and the weight minimum value of the target operator layer.
And determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator.
And performing weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
Optionally, the apparatus further comprises an adding module for:
and adding a preset normalization layer after the target operator layer is subjected to weight optimization treatment.
Optionally, the processing module 303 is specifically configured to:
and fusing the normalization layer connected with the convolution layer into the convolution layer to obtain an intermediate model. The convolution layers are one type of operator layers, the normalization layers are the other type of operator layers, and each convolution layer is connected with one normalization layer.
Optionally, the processing module 303 is specifically configured to:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, the plurality of branch convolution layers are fused into one convolution layer, and an intermediate model is obtained. The plurality of branch convolution layers are all convolution layers, and all branch convolution layers are structurally juxtaposed to each other.
According to the method, the operator layers are fused to reduce the computational scale of the neural network model, and the neural network model is then weight-optimized according to the weight range that the target hardware accelerator can accommodate, so that the target hardware accelerator is adapted to the neural network model. The method removes the restriction on deploying the neural network model on the hardware accelerator without reducing the accuracy of the neural network model.
The invention is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A method for optimizing a neural network model, the method comprising:
acquiring the structure of the neural network model to be optimized, parameters of the neural network model to be optimized and weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
converting the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
performing operator layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of adjacent operator layers on a structure or a plurality of operator layers which are juxtaposed on a structure into one operator layer through the operator layer fusion processing;
determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient; the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized; the weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range;
according to the weight adjustment coefficient range and the weight parameter, carrying out weight optimization processing on the intermediate model;
and performing quantization processing on the weight-optimized intermediate model to obtain an optimized neural network model.
2. The method of claim 1, wherein performing a weight optimization process on the intermediate model based on the weight adjustment coefficient range and the weight parameter comprises:
according to the connection sequence of the operator layers, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
3. The method according to claim 2, wherein the weight optimization processing is performed on the target operator layer according to the weight adjustment coefficient range and the weight parameter of the target operator layer, including:
determining a weight adjustment coefficient of the target operator layer according to the maximum weight value and the minimum weight value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and carrying out weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
4. The method according to claim 2, wherein the weight optimization process is performed on the target operator layer, and further comprising:
and adding a preset normalization layer after the target operator layer is subjected to weight optimization treatment.
5. The method of claim 1, wherein performing an operator layer fusion process on the initial model to obtain an intermediate model comprises:
merging a normalization layer connected with a convolution layer into the convolution layer to obtain the intermediate model; the convolution layers are operator layers of one type, the normalization layers are operator layers of another type, and one normalization layer is connected behind each convolution layer.
6. The method of claim 1, wherein performing an operator layer fusion process on the initial model to obtain an intermediate model comprises:
if the initial model comprises a plurality of branch convolution layers with the same convolution kernel, merging the plurality of branch convolution layers into one convolution layer to obtain the intermediate model; the plurality of branch convolution layers are all convolution layers, and all branch convolution layers are structurally juxtaposed with each other.
7. An apparatus for optimizing a neural network model, the apparatus comprising:
the acquisition module is used for acquiring the structure of the neural network model to be optimized, parameters of the neural network model to be optimized and weight parameters corresponding to each operator layer of the neural network model to be optimized; the neural network model to be optimized comprises a plurality of operator layers, and the weight parameters comprise weights, weight maximum values and weight minimum values;
the transformation module is used for transforming the neural network model to be optimized into an initial model according to the structure of the neural network model to be optimized and parameters of the neural network model to be optimized; the format corresponding to the initial model is the format of an open neural network exchange model;
the processing module is used for carrying out operator layer fusion processing on the initial model to obtain an intermediate model; the initial model fuses a plurality of adjacent operator layers on a structure or a plurality of operator layers which are juxtaposed on a structure into one operator layer through the operator layer fusion processing; determining a weight adjustment coefficient range according to the weight range of the neural network model which can be accommodated by the target hardware accelerator and a preset coefficient; the target hardware accelerator is a hardware accelerator matched with the neural network model to be optimized; the weight range is determined according to the configuration file of the target hardware accelerator or according to a preset weight range; according to the weight adjustment coefficient range and the weight parameter, carrying out weight optimization processing on the intermediate model; and carrying out quantization processing on the intermediate model with optimized weight to obtain an optimized neural network model.
8. The apparatus of claim 7, wherein the processing module is specifically configured to:
according to the connection sequence of the operator layers, performing weight optimization processing on the target operator layer according to the weight adjustment coefficient range and the weight parameters of the target operator layer; the target operator layer is any operator layer in the intermediate model whose weight falls within the weight adjustment coefficient range.
9. The apparatus of claim 7, wherein the processing module is specifically configured to:
determining a weight adjustment coefficient of the target operator layer according to the maximum weight value and the minimum weight value of the target operator layer;
determining a weight scale factor according to the weight adjustment coefficient of the target operator layer and the weight range of the neural network model which can be accommodated by the target hardware accelerator;
and carrying out weight optimization processing on the target operator layer according to the scale factors and the weights of the target operator layer.
10. The apparatus of claim 7, further comprising an adding module for:
and adding a preset normalization layer after the target operator layer is subjected to weight optimization treatment.
CN202110382904.5A 2021-04-09 2021-04-09 Neural network model optimization method and device Active CN113128670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110382904.5A CN113128670B (en) 2021-04-09 2021-04-09 Neural network model optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110382904.5A CN113128670B (en) 2021-04-09 2021-04-09 Neural network model optimization method and device

Publications (2)

Publication Number Publication Date
CN113128670A CN113128670A (en) 2021-07-16
CN113128670B true CN113128670B (en) 2024-03-19

Family

ID=76775672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110382904.5A Active CN113128670B (en) 2021-04-09 2021-04-09 Neural network model optimization method and device

Country Status (1)

Country Link
CN (1) CN113128670B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103426027A (en) * 2013-07-24 2013-12-04 浙江大学 Intelligent normal pool level optimal selection method based on genetic neural network models
CN110378470A (en) * 2019-07-19 2019-10-25 Oppo广东移动通信有限公司 Optimization method, device and the computer storage medium of neural network model
CN110826692A (en) * 2019-10-24 2020-02-21 腾讯科技(深圳)有限公司 Automatic model compression method, device, equipment and storage medium
CN111310684A (en) * 2020-02-24 2020-06-19 东声(苏州)智能科技有限公司 Model training method and device, electronic equipment and storage medium
CN111602145A (en) * 2018-10-30 2020-08-28 深圳鲲云信息科技有限公司 Optimization method of convolutional neural network and related product
CN112200297A (en) * 2020-09-04 2021-01-08 厦门星宸科技有限公司 Neural network optimization method, device and processor
CN112257840A (en) * 2019-07-22 2021-01-22 华为技术有限公司 Neural network processing method and related equipment
CN112465108A (en) * 2020-11-11 2021-03-09 上海交通大学 Neural network compiling method for storage and calculation integrated platform
CN112541159A (en) * 2020-09-30 2021-03-23 华为技术有限公司 Model training method and related equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11501130B2 (en) * 2016-09-09 2022-11-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
CN107895174B (en) * 2017-11-09 2020-01-07 京东方科技集团股份有限公司 Image classification and conversion method, device and image processing system
CN108259997B (en) * 2018-04-02 2019-08-23 腾讯科技(深圳)有限公司 Image correlation process method and device, intelligent terminal, server, storage medium
US11645493B2 (en) * 2018-05-04 2023-05-09 Microsoft Technology Licensing, Llc Flow for quantized neural networks

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103426027A (en) * 2013-07-24 2013-12-04 浙江大学 Intelligent normal pool level optimal selection method based on genetic neural network models
CN111602145A (en) * 2018-10-30 2020-08-28 深圳鲲云信息科技有限公司 Optimization method of convolutional neural network and related product
CN110378470A (en) * 2019-07-19 2019-10-25 Oppo广东移动通信有限公司 Optimization method, device and the computer storage medium of neural network model
CN112257840A (en) * 2019-07-22 2021-01-22 华为技术有限公司 Neural network processing method and related equipment
CN110826692A (en) * 2019-10-24 2020-02-21 腾讯科技(深圳)有限公司 Automatic model compression method, device, equipment and storage medium
CN111310684A (en) * 2020-02-24 2020-06-19 东声(苏州)智能科技有限公司 Model training method and device, electronic equipment and storage medium
CN112200297A (en) * 2020-09-04 2021-01-08 厦门星宸科技有限公司 Neural network optimization method, device and processor
CN112541159A (en) * 2020-09-30 2021-03-23 华为技术有限公司 Model training method and related equipment
CN112465108A (en) * 2020-11-11 2021-03-09 上海交通大学 Neural network compiling method for storage and calculation integrated platform

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A DNN Optimization Framework with Unlabeled Data for Efficient and Accurate Reconfigurable Hardware Inference; Kai Chen et al.; 2021 IEEE International Symposium on Circuits and Systems (ISCAS); full text *
Improved Deep Neural Network Hardware-Accelerators Based on Non-Volatile-Memory: The Local Gains Technique; Irem Boybat et al.; 2017 IEEE International Conference on Rebooting Computing (ICRC); full text *
Research on Acceleration Techniques for Object Detection Network Algorithms Based on ARM NEON; 邢景; China Masters' Theses Full-text Database; full text *
Optimizing Feedforward Neural Network Structure and Weight Vectors with a Genetic Algorithm; 黎明, 严超华, 刘高航; Journal of Image and Graphics (06); full text *
Research on Optimization and Acceleration Methods for Deep Neural Network Models Oriented to Hardware Implementation; 陈凯; China Masters' Theses Full-text Database; full text *

Also Published As

Publication number Publication date
CN113128670A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
Hand et al. nbodykit: An open-source, massively parallel toolkit for large-scale structure
CN111950225B (en) Chip layout method and device, storage medium and electronic equipment
US8204714B2 (en) Method and computer program product for finding statistical bounds, corresponding parameter corners, and a probability density function of a performance target for a circuit
CN110149238B (en) Method and device for predicting flow
CN110929862B (en) Fixed-point neural network model quantification device and method
CN113095129A (en) Attitude estimation model training method, attitude estimation device and electronic equipment
CN112418427A (en) Method, device, system and equipment for providing deep learning unified reasoning service
WO2021011412A1 (en) Systems and methods for simulating a quantum processor
CN111311480A (en) Image fusion method and device
CN113128670B (en) Neural network model optimization method and device
JP2011186991A (en) Method, program and system for solving ordinary differential equation
CN110046670B (en) Feature vector dimension reduction method and device
Zhang et al. Practical edge kernels for integer-only vision transformers under post-training quantization
CN113723712B (en) Wind power prediction method, system, equipment and medium
Singha et al. LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning
CN110222777B (en) Image feature processing method and device, electronic equipment and storage medium
US20150142630A1 (en) Risk scenario generation
CN113760497A (en) Scheduling task configuration method and device
CN114365151A (en) Neural network model transformation method, device, server and storage medium
CN111310794A (en) Target object classification method and device and electronic equipment
CN117573123B (en) Page generation method and device applied to webpage application and electronic equipment
CN107220429B (en) Automatic selection method and system for optimal device in device modeling
CN111598037B (en) Human body posture predicted value acquisition method, device, server and storage medium
EP4343603A1 (en) System and method for managing geometric designs
Fuchs et al. TorchClim v1. 0: a deep-learning plugin for climate model physics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant