CN118297099A - Model optimization method, model optimization device and computer storage medium - Google Patents



Publication number
CN118297099A
CN118297099A
Authority
CN
China
Prior art keywords
operator
model
optimizing
neural network
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410263751.6A
Other languages
Chinese (zh)
Inventor
林超
张磊
魏程峰
孙伶君
张海玉
姜晓卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202410263751.6A priority Critical patent/CN118297099A/en
Publication of CN118297099A publication Critical patent/CN118297099A/en
Pending legal-status Critical Current


Landscapes

  • Stored Programmes (AREA)

Abstract

The present application provides a model optimization method, a model optimization device, and a computer storage medium. The model optimization method comprises the following steps: acquiring a neural network model and parsing operator parameters in the neural network model; optimizing an operator template of an operator library using device environment information and/or the operator parameters to determine an optimized operator of the neural network model; and outputting running code according to the optimized operator and compiling a runtime library of the neural network model to obtain the optimized neural network model. With this model optimization method, operators are optimized a second time based on model parameters during model conversion, improving both operator performance and the overall inference performance of the converted model.

Description

Model optimization method, model optimization device and computer storage medium
Technical Field
The present application relates to the field of model conversion and inference, and in particular to a model optimization method, a model optimization apparatus, and a computer storage medium.
Background
Practical application of deep learning technology involves neural network model deployment, which mainly comprises two stages: model conversion and model inference. Model conversion converts a trained model into the model format of a target platform, with the aim of optimizing the model and improving inference performance through techniques such as graph optimization and optimal operator selection. Model inference is the process in which a runtime library loads the converted model and performs computation on specified devices according to the model; inference performance is strongly correlated with operator optimization.
Existing inference-operator optimization relies mainly on optimization engineers hand-writing generic operator code. A large number of operator input parameters must be considered, and different parameter combinations may require substantial repeated coding, so model optimization is difficult and its effect is poor.
Disclosure of Invention
In order to solve the technical problems, the application provides a model optimization method, a model optimization device and a computer storage medium.
In order to solve the technical problems, the application provides a model optimization method, which comprises the following steps:
acquiring a neural network model and analyzing operator parameters in the neural network model;
optimizing an operator template of an operator library by utilizing the equipment environment information and/or the operator parameters to determine an optimized operator of the neural network model;
and outputting an operation code according to the optimization operator, compiling a runtime library of the neural network model, and obtaining the optimized neural network model.
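The three steps above can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation; all names (`optimize_model`, `specialize`, the dictionary layout of the model and operator library) are hypothetical, and the template substitution stands in for the full optimization described later in the description.

```python
def specialize(template: str, device_info: dict, params: dict) -> str:
    # Minimal stand-in for operator-template optimization: substitute each
    # operator parameter into the template's to-be-replaced variables.
    for name, value in params.items():
        template = template.replace("{" + name + "}", str(value))
    return template

def optimize_model(model: dict, op_library: dict, device_info: dict) -> str:
    sources = []
    for layer in model["layers"]:
        template = op_library[layer["op_type"]]  # template matched by operator type
        sources.append(specialize(template, device_info, layer["params"]))
    # The emitted running code would then be compiled into the runtime library.
    return "\n".join(sources)
```

In this sketch each model layer contributes one specialized operator source, mirroring the parse / optimize / emit-and-compile flow of the method.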
Before the operator templates of the operator library are optimized by using the equipment environment information and/or the operator parameters, the model optimization method further comprises the following steps:
analyzing operator types of each model layer of the neural network model based on the operator parameters;
and acquiring a corresponding operator template from the operator library according to the operator type.
The optimizing the operator template of the operator library by using the equipment environment information and/or the operator parameters to determine an optimizing operator of the neural network model comprises the following steps:
And replacing variables to be replaced of the operator template by using constants and variables in the operator parameters to generate the optimization operator.
Wherein replacing the variables to be replaced of the operator templates with constants and variables in the operator parameters comprises:
acquiring constants in an expression in response to the operator parameters including the expression;
Calculating the constants according to the expression, and combining the constants into an optimized constant;
And replacing variables to be replaced of the operator templates by using the optimization constants and the variables.
Wherein the replacing the variables to be replaced of the operator templates with the constants and the variables in the operator parameters to generate the optimization operator comprises the following steps:
Setting the optimization constant as an initial value of the operator template;
operating the operator template based on the initial value and different cache block sizes, and acquiring operation time consumption corresponding to each cache block size;
taking the cache block size with the least running time consumption as an optimized cache block size;
and replacing variables to be replaced of the operator template by using the optimization constant and the variables, and generating the optimization operator according to the size of the optimization cache block.
The optimizing the operator template by utilizing the equipment environment information and/or the operator parameters to determine an optimizing operator of the neural network model comprises the following steps:
analyzing a cycle to be expanded of the neural network model based on the operator parameters;
acquiring the number of registers based on the equipment environment information;
determining the number of the expansion loops to be expanded according to the register output and the register input of the loops to be expanded and the number of the registers;
and optimizing the operator templates according to the unfolding quantity, and determining the optimizing operator.
The optimizing the operator template of the operator library by using the equipment environment information and/or the operator parameters to determine an optimizing operator of the neural network model comprises the following steps:
based on the operator parameters, acquiring operator branches with constant judgment conditions;
acquiring the number of data groups calculated in parallel based on the equipment environment information;
calculating the operator branches according to the data group number, and determining the optimized branches of the operator templates;
and optimizing the operator template according to the optimizing branch, and determining the optimizing operator.
Wherein the running code is source code.
In order to solve the technical problem, the application also provides a model optimizing device, which comprises a memory and a processor coupled with the memory; wherein the memory is configured to store program data and the processor is configured to execute the program data to implement a model optimization method as described above.
In order to solve the above technical problem, the present application further provides a computer storage medium for storing program data, which when executed by a computer, is used to implement the above model optimization method.
Compared with the prior art, the application has the beneficial effects that: the method comprises the steps that a model optimization device obtains a neural network model and analyzes operator parameters in the neural network model; optimizing an operator template of an operator library by utilizing the equipment environment information and/or the operator parameters to determine an optimized operator of the neural network model; and outputting an operation code according to the optimization operator, compiling a runtime library of the neural network model, and obtaining the optimized neural network model. By the model optimization method, when the model is converted, the operator is secondarily optimized based on the model parameters, so that the performance of the operator and the integral reasoning performance of the converted model are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Wherein:
FIG. 1 is a flow chart of an embodiment of a model conversion scheme provided by the present application;
FIG. 2 is a flow chart of another embodiment of a model conversion scheme provided by the present application;
FIG. 3 is a flow chart of an embodiment of a model optimization method provided by the present application;
FIG. 4 is a schematic diagram of model optimization content provided by the present application;
FIG. 5 is a schematic diagram of an embodiment of a model optimization device according to the present application;
FIG. 6 is a schematic view of another embodiment of a model optimizing apparatus provided by the present application;
Fig. 7 is a schematic structural diagram of an embodiment of a computer storage medium according to the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The present application addresses these problems as follows: in the model conversion stage, operators are regenerated for each model layer through optimization strategies such as constant folding, loop unrolling, branch selection optimization, and block size adjustment, improving the running efficiency of the model while keeping the generated code easy for engineers to debug.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a model conversion scheme according to the present application.
As shown in fig. 1, the current model conversion scheme comprises the following steps: inputting the original model, parsing the original model, matching the operator library, outputting the model, and outputting the runtime library. In this scheme, matched operators are individually packaged and recompiled from the operator library into a runtime library. A runtime library is a special computer program library used by the compiler to implement a programming language's built-in functions and to support the runtime (execution) of programs in that language; such libraries typically include basic input/output or memory management support.
The model conversion scheme of fig. 1 only establishes an operator mapping table from the operator library, with no design for the performance of the operators in the library themselves, so the converted model is only usable in scenarios where performance is not a priority.
Therefore, the present application provides another model conversion scheme: an operator generation module is added to the scheme shown in fig. 1, operator source code is dynamically generated according to model parameters when operators are matched, and the result is then compiled into the runtime library. Referring specifically to fig. 2, fig. 2 is a flow chart of another embodiment of the model conversion scheme provided by the present application.
The inputs to the operator generation module in fig. 2 are operator parameters, operator templates, and device environment information; the output is running code. An operator template contains three types of data: variables, constants, and variables to be replaced. It contains two types of loops: conventional loops and loops to be unrolled.
It should be noted that the running code used in the present application can be binary code or source code, preferably source code, because source code is easy for engineers to analyze and adjust, improving debuggability while maintaining optimal performance.
Based on the model conversion scheme shown in fig. 2, the present application proposes a specific model optimization method, referring to fig. 3 and fig. 4 specifically, fig. 3 is a schematic flow chart of an embodiment of the model optimization method provided by the present application, and fig. 4 is a schematic diagram of the model optimization content provided by the present application.
The model optimization method is applied to a model optimization device, wherein the model optimization device can be a server, terminal equipment or a system formed by mutually matching the server and the terminal equipment. Accordingly, each part, for example, each unit, sub-unit, module, and sub-module, included in the model optimization device may be all disposed in the server, may be all disposed in the terminal device, or may be disposed in the server and the terminal device, respectively.
Further, the server may be hardware or software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules, for example, software or software modules for providing a distributed server, or may be implemented as a single software or software module, which is not specifically limited herein.
As shown in fig. 3, the specific steps are as follows:
Step S11: acquiring a neural network model, and analyzing operator parameters in the neural network model.
In the embodiment of the present application, as shown in fig. 2, an original model, such as an image processing model, is input into the model optimization device, which obtains information about each model layer, such as its operator parameters, by parsing the image processing model. The neural network model in the present application may also be another type of original model, such as a sentence query model or an audio analysis model. The following embodiments use an image processing model as an example.
Step S12: and optimizing an operator template of the operator library by using the equipment environment information and/or the operator parameters to determine an optimized operator of the neural network model.
In the embodiment of the application, the model optimizing device optimizes the operator through the model optimizing content shown in fig. 4. The following details the model optimization of each type in fig. 4:
The model optimization device performs data constantization (constant folding) on the operator template. First, the device replaces variables to be replaced in the operator template with values parsed from the image processing model, i.e., operator parameters. If parsing the image processing model yields an expression, the device continues to examine the expression. When multiple constants exist in the expression, they are evaluated according to the operation logic of the expression and merged into a new constant, and this is repeated until no more constants can be merged.
For example, suppose the expression parsed from the image processing model is in_step = (in_w - kernel_w) / stride_w + 1. Assuming in_w is 10, kernel_w is 3, and stride_w is 2, all variables in the expression are operator parameters, so in_step can be calculated in advance, yielding 4 under integer division. The model optimization device can then replace the corresponding variable to be replaced in the operator template with this final result.
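A minimal sketch of this folding step, assuming the parsed expression is available as a string (the helper name `fold_expression` is illustrative, not from the patent): when every variable in the expression is a known operator parameter, the expression collapses to a single optimized constant that can replace the template's to-be-replaced variable, as in the in_step example above.

```python
import ast

def fold_expression(expr: str, params: dict) -> str:
    """Fold an expression into one constant when every variable is a known operator parameter."""
    tree = ast.parse(expr, mode="eval")
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    if names <= params.keys():                    # all variables are known constants
        return str(eval(expr, {}, dict(params)))  # merge them into one optimized constant
    return expr                                   # partially known: leave for runtime

params = {"in_w": 10, "kernel_w": 3, "stride_w": 2}
folded = fold_expression("(in_w - kernel_w) // stride_w + 1", params)
# folded == "4", so the template placeholder becomes the literal 4
```

Expressions that still reference unknown variables are left untouched, matching the rule that folding repeats only until no further constants can be merged.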
Further, as shown in fig. 2, the operator library is input into the model optimization device, which obtains operator templates by parsing it. To obtain an operator template, the device determines the operator type of each model layer by parsing the layer information of the image processing model, and matches against the operator library by operator type to retrieve the corresponding template.
The model optimization device performs loop unrolling on the operator template. The device unrolls only the loops marked to be unrolled; conventional loops involve multi-level nested logic, so forcibly unrolling them could introduce logic errors.
Specifically, loop unrolling in the embodiment of the present application mainly accumulates data in registers, i.e., the locations a computing unit accesses during operation, and the number of registers on the computing unit often limits the maximum unroll length. The model optimization device therefore unrolls the key loops in the operator template according to the acquired device environment information, including device hardware and software conditions, reducing memory reads and accelerating computation.
For example, assume one iteration of a loop occupies 1 output register and 2 input registers, and each input participates in only one computation, so the input registers can be multiplexed across iterations. With 16 registers, the loop can be unrolled (16 - 2) / 1 = 14 times.
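The register-budget rule in the example above can be written as a one-line calculation; a sketch under the stated assumptions (multiplexed inputs, one fresh output register per iteration), with an illustrative function name:

```python
def unroll_factor(total_registers: int, output_regs: int, input_regs: int) -> int:
    # Inputs are multiplexed across iterations, so only the outputs consume
    # fresh registers: (total - inputs) // outputs iterations fit at once.
    return (total_registers - input_regs) // output_regs

assert unroll_factor(16, 1, 2) == 14   # the (16 - 2) / 1 case from the text
```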
The model optimization device performs optimized branch selection on the operator template. Using the obtained model shape and related data, the device detects branches in the code whose judgment conditions are constant, fixes those branches in advance, and thereby reduces branch mispredictions.
For example, to increase computation speed according to device characteristics, the x86 AVX instruction set can compute 8 sets of floating point data simultaneously using SIMD (single instruction, multiple data) instructions. Model shapes often do not satisfy this width exactly, so generic code must branch over remainder cases 1 to 7; since the remainder can be determined from the shape data obtained during conversion, the branch can be fixed in advance.
Here, single instruction, multiple data is a technique that employs one controller to control multiple processors, performing the same operation on each element of a set of data (also called a "data vector") simultaneously, thereby achieving spatial parallelism.
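A hedged illustration of branch fixing for the SIMD remainder case above: with an 8-wide vector unit, the remainder n % 8 is a constant once the tensor shape is known at conversion time, so the generated code can contain a single fixed tail case instead of a runtime switch over cases 1 to 7. The emitted function names (`process_full_vectors`, `process_tail`) are hypothetical stand-ins for the generated operator code.

```python
def emit_tail_code(n: int, width: int = 8) -> str:
    # The remainder is a constant during model conversion,
    # so exactly one tail branch (or none) is emitted.
    remainder = n % width
    lines = [f"process_full_vectors(count={n // width});"]
    if remainder:
        lines.append(f"process_tail(elems={remainder});")
    return "\n".join(lines)
```

For example, a shape of 19 elements emits the full-vector call plus one fixed tail for the 3 leftover elements, with no runtime branch prediction involved.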
The model optimization device adjusts the block size used in the device cache for the operator template. In the prior art, determining the cache block size of an operator template often requires continuous tuning by skilled engineers. In the embodiment of the present application, the model optimization device takes the block-size constant finalized during data constantization as an initial value; it then measures operator running time for different block sizes within a certain range centered on that initial value; finally, it determines the final cache block size according to the running time of each blocking scheme.
For example, if the initial value determined for the operator template during data constantization is 64, operator running time is measured for each of the block sizes [16, 32, 64, 128, 256], and the block size with the least running time is taken as the final blocking scheme.
Step S13: and outputting an operation code according to the optimization operator, compiling a runtime library of the neural network model, and obtaining the optimized neural network model.
In the embodiment of the present application, after the operator generation module shown in fig. 2 has run, the model optimization device outputs the corresponding source code and compiles it into the final runtime library.
In the embodiment of the present application, the model optimization device acquires a neural network model and parses the operator parameters in it; the operator parameters, the operator templates of the operator library, and the device environment information are input into the operator generation module, which optimizes the operator templates using the device environment information and/or the operator parameters to determine the optimized operators of the neural network model; the module then outputs running code according to the optimized operators and compiles the runtime library of the neural network model. With this model optimization method, an operator generation module is introduced during model conversion and operators are optimized a second time based on model parameters, improving operator performance and the overall inference performance of the converted model.
In the embodiment of the present application, through the designed secondary operator-optimization strategies, including parameter constantization, loop unrolling, branch fixing to reduce branch mispredictions, and block size adjustment, the model optimization device achieves operator performance superior to that of a conventionally used operator library.
In the embodiment of the application, the model optimizing device outputs the source codes first and then compiles the source codes into the runtime library, so that the debugging of optimizing personnel is facilitated.
Taking the image processing model as an example, the optimized model is applied as follows: the terminal device feeds the image to be processed into the optimized image processing model; a feature extraction operator extracts image features from the image, an image processing operator transforms those features, and an image reconstruction operator reconstructs the processed image. Each of these operators can be obtained as an optimized operator through the above model optimization method, improving the performance of the image processing model.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
In order to implement the above model optimization method, the present application further provides a model optimization device, and referring to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of the model optimization device provided by the present application.
The model optimizing apparatus 300 of the present embodiment includes a parsing module 31, an optimizing module 32, and a compiling module 33.
The parsing module 31 is configured to obtain a neural network model, and parse operator parameters in the neural network model.
The optimizing module 32 is configured to input the operator parameters, the operator templates of the operator library, and the device environment information into the operator generating module, and optimize the operator templates by using the device environment information and/or the operator parameters to determine an optimizing operator of the neural network model.
And a compiling module 33, configured to output running codes according to the optimizing operator by using the operator generating module, and compile a runtime library of the neural network model.
In order to implement the above model optimization method, the present application further provides another model optimization device, and referring to fig. 6 specifically, fig. 6 is a schematic structural diagram of another embodiment of the model optimization device provided by the present application.
The model optimizing apparatus 400 of the present embodiment includes a processor 41, a memory 42, an input-output device 43, and a bus 44.
The processor 41, the memory 42 and the input/output device 43 are respectively connected to the bus 44, and the memory 42 stores program data, and the processor 41 is configured to execute the program data to implement the model optimization method described in the above embodiment.
In an embodiment of the present application, the processor 41 may also be referred to as a CPU (Central Processing Unit). The processor 41 may be an integrated circuit chip with signal processing capabilities. The processor 41 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor 41 may be any conventional processor or the like.
The present application further provides a computer storage medium, please continue to refer to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of the computer storage medium provided by the present application, in which a computer program 61 is stored in the computer storage medium 600, and the computer program 61 is used to implement the model optimization method of the above embodiment when being executed by a processor.
Embodiments of the present application, when implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the present application.

Claims (10)

1. A model optimization method, characterized in that the model optimization method comprises:
acquiring a neural network model and analyzing operator parameters in the neural network model;
optimizing an operator template of an operator library by utilizing the equipment environment information and/or the operator parameters to determine an optimized operator of the neural network model;
and outputting an operation code according to the optimization operator, compiling a runtime library of the neural network model, and obtaining the optimized neural network model.
2. The model optimization method according to claim 1, wherein,
Before the operator templates of the operator library are optimized by using the equipment environment information and/or the operator parameters, the model optimization method further comprises the following steps:
analyzing operator types of each model layer of the neural network model based on the operator parameters;
and acquiring a corresponding operator template from the operator library according to the operator type.
3. The model optimization method according to claim 1, wherein,
The optimizing the operator template of the operator library by utilizing the equipment environment information and/or the operator parameters to determine an optimizing operator of the neural network model comprises the following steps:
And replacing variables to be replaced of the operator templates by using constants and variables in the operator parameters, and generating the optimization operator.
4. The model optimization method according to claim 3, wherein replacing the variables to be replaced in the operator template with the constants and the variables from the operator parameters comprises:
acquiring the constants in an expression in response to the operator parameters including the expression;
evaluating the expression over the constants and merging them into a single optimization constant; and
replacing the variables to be replaced in the operator template with the optimization constant and the variables.
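The constant-merging step of claim 4 amounts to compile-time constant folding: if every operand of a parameter expression is known, the expression collapses to one optimization constant before template substitution. A minimal sketch, assuming a toy expression grammar and an invented constant table:

```python
import ast
import operator

# Map AST operator nodes to arithmetic; only simple binary ops are handled.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.FloorDiv: operator.floordiv}

def fold(expr, constants):
    """Fold an expression such as 'kh * kw * in_ch' into one constant,
    given a table of known constant values."""
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Name):
            return constants[node.id]  # look up a known constant
        if isinstance(node, ast.Constant):
            return node.value          # literal number in the expression
        raise ValueError("not a constant expression")
    return ev(ast.parse(expr, mode="eval").body)

consts = {"kh": 3, "kw": 3, "in_ch": 16}
print(fold("kh * kw * in_ch", consts))  # folds to the single constant 144
```

The folded value (144 here) is what would replace the template's variable-to-be-replaced, sparing the generated operator a runtime multiplication.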
5. The model optimization method according to claim 4, wherein replacing the variables to be replaced in the operator template with the constants and the variables from the operator parameters to generate the optimized operator comprises:
setting the optimization constant as an initial value of the operator template;
running the operator template with the initial value under different cache block sizes, and acquiring the running time corresponding to each cache block size;
taking the cache block size with the shortest running time as an optimized cache block size; and
replacing the variables to be replaced in the operator template with the optimization constant and the variables, and generating the optimized operator according to the optimized cache block size.
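The cache-block selection of claim 5 is an empirical auto-tuning loop: time the operator under each candidate block size and keep the fastest. In this sketch a blocked summation stands in for a real operator template, and the candidate list is an assumption:

```python
import time

def run_blocked(data, block):
    """Toy 'operator template': process the data in cache-block-sized chunks."""
    total = 0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total

def pick_block_size(data, candidates):
    """Run the operator under each candidate block size; return the fastest."""
    best_block, best_time = None, float("inf")
    for block in candidates:
        t0 = time.perf_counter()
        run_blocked(data, block)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block

data = list(range(100_000))
best = pick_block_size(data, [64, 256, 1024, 4096])
print(best)
```

Which size wins depends on the device's cache hierarchy, which is exactly why the claim ties the choice to measured running time rather than a fixed constant.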
6. The model optimization method according to claim 1, wherein optimizing the operator template of the operator library by using the equipment environment information and/or the operator parameters to determine the optimized operator of the neural network model comprises:
analyzing a loop to be unrolled in the neural network model based on the operator parameters;
acquiring the number of registers based on the equipment environment information;
determining an unroll count for the loop to be unrolled according to the register outputs and register inputs of the loop and the number of registers; and
optimizing the operator template according to the unroll count to determine the optimized operator.
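Claim 6 chooses an unroll count from a register budget: enough copies of the loop body to fill the registers, but no more. The heuristic below (divide available registers by live inputs plus outputs per iteration, round down to a power of two) is an illustrative assumption, not the claimed formula:

```python
def unroll_count(regs_available, regs_in, regs_out, max_unroll=16):
    """Pick an unroll factor so that the unrolled body's live values
    (inputs + outputs per iteration, times the factor) fit in registers."""
    per_iter = regs_in + regs_out
    factor = max(1, regs_available // per_iter)
    # Round down to a power of two, a common choice for unrolled loops.
    while factor & (factor - 1):
        factor &= factor - 1
    return min(factor, max_unroll)

# e.g. 32 architectural registers, 3 inputs + 1 output live per iteration
print(unroll_count(32, 3, 1))
```

With 32 registers and 4 live values per iteration the sketch unrolls 8×; a body with more live values gets a smaller factor, avoiding register spills.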
7. The model optimization method according to claim 1, wherein optimizing the operator template of the operator library by using the equipment environment information and/or the operator parameters to determine the optimized operator of the neural network model comprises:
acquiring, based on the operator parameters, operator branches whose judgment conditions are constant;
acquiring the number of data groups computed in parallel based on the equipment environment information;
evaluating the operator branches according to the number of data groups to determine optimized branches of the operator template; and
optimizing the operator template according to the optimized branches to determine the optimized operator.
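When a branch's judgment condition is a compile-time constant, as in claim 7, the untaken side can be deleted so the generated kernel stays branch-free for data-parallel execution. A minimal sketch over an invented branch representation (the `cond`/`then`/`else` dictionaries are assumptions for illustration):

```python
def resolve_constant_branch(branch, constants):
    """If the branch condition refers to a known constant, return only the
    taken side's operations; otherwise leave the branch in place."""
    name, op, value = branch["cond"]
    if name not in constants:
        return branch  # condition is not constant; cannot resolve
    if op == ">":
        taken = constants[name] > value
    else:  # only '==' is handled in this sketch
        taken = constants[name] == value
    return branch["then"] if taken else branch["else"]

# A padding check whose outcome is fixed once the layer's 'pad' is known:
branch = {"cond": ("pad", ">", 0),
          "then": ["pad_edges", "conv_core"],
          "else": ["conv_core"]}
print(resolve_constant_branch(branch, {"pad": 0}))
```

For `pad = 0` only `conv_core` survives; a template specialized this way issues no per-element branch, which matters when the hardware computes several data groups in lockstep.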
8. The model optimization method according to claim 1, wherein,
The running code is source code.
9. A model optimization device, wherein the model optimization device comprises a memory and a processor coupled to the memory;
Wherein the memory is for storing program data and the processor is for executing the program data to implement the model optimization method of any one of claims 1 to 8.
10. A computer storage medium for storing program data which, when executed by a computer, is adapted to carry out the model optimization method according to any one of claims 1 to 8.
CN202410263751.6A 2024-03-07 2024-03-07 Model optimization method, model optimization device and computer storage medium Pending CN118297099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410263751.6A CN118297099A (en) 2024-03-07 2024-03-07 Model optimization method, model optimization device and computer storage medium


Publications (1)

Publication Number Publication Date
CN118297099A 2024-07-05

Family

ID=91684883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410263751.6A Pending CN118297099A (en) 2024-03-07 2024-03-07 Model optimization method, model optimization device and computer storage medium

Country Status (1)

Country Link
CN (1) CN118297099A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination