CN115906963A - Model conversion method and system for deep learning model inference hardware acceleration - Google Patents


Info

Publication number
CN115906963A
CN115906963A
Authority
CN
China
Prior art keywords: convolution kernel, small, convolution, layer, kernels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211166984.1A
Other languages
Chinese (zh)
Inventor
林广栋
陆俊峰
洪一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 38 Research Institute
Original Assignee
CETC 38 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 38 Research Institute filed Critical CETC 38 Research Institute
Priority to CN202211166984.1A priority Critical patent/CN115906963A/en
Publication of CN115906963A publication Critical patent/CN115906963A/en
Pending legal-status Critical Current


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention provides a model conversion method, system, storage medium and electronic device for deep learning model inference hardware acceleration, and relates to the technical field of deep learning. The method comprises the following steps: replacing an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels; determining the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels; and deploying the converted deep learning model on hardware to obtain the convolution result of the input feature map and hence the final inference result of the model. The large convolution kernel is replaced by small convolution kernels executed in series, and the weights of the small kernels are computed directly from the weights of the large kernel, so that the influence of the replacement on the output is reduced as much as possible, the network model does not need to be retrained, the computation is small, and the model can be deployed directly.

Description

Model conversion method and system for deep learning model inference hardware acceleration
Technical Field
The invention relates to the technical field of deep learning, and in particular to a model conversion method, system, storage medium and electronic device for deep learning model inference hardware acceleration.
Background
To execute deep learning model inference more efficiently, artificial intelligence chips dedicated to deep learning computation have gradually emerged. Such chips contain acceleration cores dedicated to deep learning inference, designed to run model inference with high energy efficiency. These acceleration cores generally perform computations in parallel on many processing elements and exploit data reuse to minimize data movement.
Convolution is one of the most common computations in a deep learning model; according to statistics, convolution accounts for more than 90% of the computation in a convolutional neural network. Limited by the hardware implementation, the number of parallel computations simultaneously supported by any deep learning hardware architecture is bounded. For example, if the hardware is designed to efficiently support convolutions with 3x3 kernels, then once the hardware architecture is fixed it cannot directly support the computation of larger kernels (e.g., 5x5 or 7x7). If a neural network architecture contains a larger convolution kernel, such as one of size 5x5, it cannot be directly deployed on such a hardware architecture.
To deploy a model containing such large convolution kernels on this kind of hardware, the prior art offers two approaches:
(1) One method is to split the 5x5 convolution kernel into 4 kernels of size 3x3, perform the convolutions separately, and then add the results to obtain the final convolution output. The relationship between the weights of the 4 new 3x3 kernels and the weights of the original 5x5 kernel under this split is shown in fig. 1. In this scheme, the new weights of the small kernels are obtained directly from the weights of the original large kernel, and the network does not need retraining. The original 5x5 kernel can be considered split into 4 kernels of size 3x3 executed in parallel; the relationship between the new network architecture and the original one is shown in fig. 2. The split must ensure that every weight of the original kernel is assigned to one of the small hardware-sized kernels; positions of a small kernel not covered by weights of the original kernel are padded with 0 (see the sketch following this discussion).
(2) Another approach is to split the neural network layer with one large convolution kernel into multiple small-kernel layers executed serially. After splitting into serial convolutional layers, the receptive field of each output of the final layer is kept unchanged, so information loss caused by the replacement is avoided. For example, as shown in fig. 3, a 5x5 convolution kernel, whose output receptive field is a 5x5 window, is replaced by two layers of 3x3 kernels. Likewise, a 7x7 kernel, whose output receptive field is a 7x7 window, needs to be replaced with 3 layers of 3x3 kernels. After this replacement, the original large-kernel layer no longer exists and several serially executed small-kernel layers are added, so the network must be retrained to obtain the weights of the replacement layers. That is, this method requires redesigning the network structure and retraining. Training typically requires large amounts of data and takes a long time, so this method cannot achieve rapid deployment.
The prior art therefore faces either a large inference computation cost or the need for retraining, which prevents rapid deployment.
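For illustration of approach (1), the following numpy sketch shows how a 5x5 correlation can be computed exactly as the sum of four zero-padded 3x3 correlations applied at shifted input offsets. The block layout used here is one consistent assignment of the 25 weights, not necessarily the exact assignment of fig. 1:

```python
import numpy as np

def corr2d(x, k):
    """Plain 'valid' 2-D cross-correlation."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))        # input feature map
u = rng.normal(size=(5, 5))        # original 5x5 kernel weights

# Four 3x3 kernels, each holding one block of the 5x5 weights (zero-padded).
k00 = u[0:3, 0:3].copy()                                  # rows 1-3, cols 1-3
k01 = np.zeros((3, 3)); k01[:, 1:3] = u[0:3, 3:5]         # rows 1-3, cols 4-5
k10 = np.zeros((3, 3)); k10[1:3, :] = u[3:5, 0:3]         # rows 4-5, cols 1-3
k11 = np.zeros((3, 3)); k11[1:3, 1:3] = u[3:5, 3:5]       # rows 4-5, cols 4-5

oh, ow = x.shape[0] - 4, x.shape[1] - 4    # output size of the 5x5 conv
parts = [(k00, 0, 0), (k01, 0, 2), (k10, 2, 0), (k11, 2, 2)]
split = sum(corr2d(x[dy:dy + oh + 2, dx:dx + ow + 2], k)
            for k, dy, dx in parts)

assert np.allclose(split, corr2d(x, u))    # same result, no retraining
```

The assert confirms the point made above: the split preserves the output exactly, at the price of four 3x3 convolutions per original 5x5 convolution.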
Disclosure of Invention
Technical problem to be solved
Aiming at the deficiencies of the prior art, the invention provides a model conversion method, system, storage medium and electronic device for deep learning model inference hardware acceleration, solving the technical problems of large computation cost, or of retraining that prevents rapid deployment.
(II) technical scheme
To achieve this purpose, the invention is realized by the following technical solutions:
a model transformation method for deep learning model inference hardware acceleration, comprising:
s1, replacing an original large convolution kernel which cannot be directly deployed in hardware in a deep learning model with a plurality of layers of small convolution kernels;
s2, determining the weight of each layer of small convolution kernels respectively by adopting a genetic algorithm according to the weight difference of the large convolution kernels and each layer of small convolution kernels;
and S3, deploying the converted deep learning model on hardware, and obtaining a convolution result of the input feature map.
Preferably, when the number of rows and the number of columns of the original large convolution kernel are both odd, S2 specifically comprises:
S21, setting the coordinate of the input feature map at the center of the receptive field of the original large convolution kernel to (0,0); the coordinate of the upper-left corner of the receptive field is then $\left(-\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the upper-right corner $\left(-\frac{r-1}{2},\,\frac{c-1}{2}\right)$, the lower-left corner $\left(\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the lower-right corner $\left(\frac{r-1}{2},\,\frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here r and c denote the number of rows and columns of the large convolution kernel, the left half of each coordinate is the row index and the right half the column index; the coordinates of each weight in the large convolution kernel are defined in the same way.
Before replacement with the small convolution kernels, the original convolution result is:

$$\text{result}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u_{j,k}$$

where $x_{j,k}$ is the value of the input feature map at coordinate (j,k) within the receptive field of the original large convolution kernel, and $u_{j,k}$ is the weight at coordinate (j,k) in the large convolution kernel.
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, setting the coordinate of the center of the i-th layer small convolution kernel to (0,0); the coordinate of its upper-left corner is $\left(-\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the upper-right corner $\left(-\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, the lower-left corner $\left(\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the lower-right corner $\left(\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th layer small convolution kernel, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$
Substituting layer by layer into the convolution formula until the feature map within the original receptive field is reached, then merging like terms over all the terms in which each feature-map value within the original receptive field participates and factoring out the feature-map values, the final result can be expressed as:

$$\text{result}'=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u'_{j,k},\qquad u'_{j,k}=\sum_{\substack{s_1+\dots+s_n=j\\ t_1+\dots+t_n=k}}\;\prod_{i=1}^{n} u_{i,s_i,t_i}$$

where $u_{i,s,t}$ is the weight at coordinate (s,t) inside the i-th layer small convolution kernel. For any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row and column indices of the i-th layer weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$; the coefficient of such a term is $\prod_{i=1}^{n} u_{i,s_i,t_i}$, and after merging like terms the coefficient of $x_{j,k}$ is $u'_{j,k}$ as given above.
S23, based on the two formulas in steps S21 and S22, with the goal of minimizing the error between the replaced final convolution result and the original convolution result, and taking the input feature map $x_{j,k}$ as the variable, the difference between the coefficients of the two formulas is expressed as:

$$\text{diff}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}}\left(u_{j,k}-u'_{j,k}\right)^2$$
S24, treating $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the quantities to be solved, with the goal of minimizing diff, the formula in S23 is solved with a genetic algorithm, yielding the weights $u_{i,s,t}$ of each layer of small convolution kernels.
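As a concrete reading of S22 and S23, the following minimal numpy sketch (helper names and the 0-based indexing are illustrative assumptions, not from the patent) computes the merged coefficients $u'_{j,k}$ of n serially stacked kernels, which are the 2-D full convolution of the layers' kernels, and the diff objective:

```python
import numpy as np

def full_conv2d(a, b):
    """2-D 'full' convolution: merged coefficients of correlating
    with kernel b first, then with kernel a."""
    ra, ca = a.shape
    rb, cb = b.shape
    out = np.zeros((ra + rb - 1, ca + cb - 1))
    for p in range(ra):
        for q in range(ca):
            out[p:p + rb, q:q + cb] += a[p, q] * b
    return out

def effective_kernel(kernels):
    """u'_{j,k} for a stack of small kernels (first layer first)."""
    eff = kernels[0]
    for k in kernels[1:]:
        eff = full_conv2d(k, eff)
    return eff

def diff(u_large, kernels):
    """Objective of S23: sum of squared coefficient differences."""
    return float(np.sum((u_large - effective_kernel(kernels)) ** 2))
```

With two 3x3 kernels, `effective_kernel` produces a 5x5 matrix whose entries reproduce the 25 merged coefficients listed later in the detailed description; the sizes match the condition $\sum_i(r_i-1)=r-1$ of S22.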
Preferably, when either the number of rows or the number of columns of the original large convolution kernel is even, S2 specifically comprises:
(1) If the number of rows r is even and the number of columns c is odd:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 1, so that the number of rows of the feature map after this convolution is r-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(2) If the number of columns c is even and the number of rows r is odd:
Among the replacement small convolution kernels, first use a kernel of width 2 and height 1, so that the number of columns of the feature map after this convolution is c-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(3) If the number of rows r and the number of columns c are both even:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 2, so that the feature map after this convolution has r-1 rows and c-1 columns, both odd; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
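As a sketch of the layer-shape bookkeeping above (the greedy choice of per-layer sizes below is an illustrative assumption; any shapes satisfying the stated sum conditions would do), a helper that prepends a 2x1, 1x2 or 2x2 kernel when r or c is even and then fills with odd kernels no larger than the hardware maximum:

```python
def plan_layers(r, c, max_k=3):
    """Return (rows, cols) replacement-kernel shapes whose stacked
    receptive field is exactly r x c. max_k is assumed odd."""
    layers = []
    if r % 2 == 0 and c % 2 == 0:
        layers.append((2, 2)); r, c = r - 1, c - 1
    elif r % 2 == 0:
        layers.append((2, 1)); r -= 1
    elif c % 2 == 0:
        layers.append((1, 2)); c -= 1
    # r and c are now odd: cover the rest with odd kernels <= max_k,
    # keeping sum(r_i - 1) = r - 1 and sum(c_i - 1) = c - 1.
    need_r, need_c = r - 1, c - 1
    while need_r > 0 or need_c > 0:
        kr = min(max_k - 1, need_r) + 1   # odd, since max_k is odd
        kc = min(max_k - 1, need_c) + 1
        layers.append((kr, kc))
        need_r -= kr - 1
        need_c -= kc - 1
    return layers

# e.g. plan_layers(6, 5) -> [(2, 1), (3, 3), (3, 3)]
# e.g. plan_layers(9, 9, max_k=5) -> [(5, 5), (5, 5)]
```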
A model conversion system for deep learning model inference hardware acceleration, comprising:
a replacing module, configured to replace an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels;
a determining module, configured to determine the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels; and
a convolution module, configured to deploy the converted deep learning model on hardware and obtain the convolution result of the input feature map.
Preferably, when the number of rows and the number of columns of the original large convolution kernel are both odd, the determining module specifically includes:
S21, setting the coordinate of the input feature map at the center of the receptive field of the original large convolution kernel to (0,0); the coordinate of the upper-left corner of the receptive field is then $\left(-\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the upper-right corner $\left(-\frac{r-1}{2},\,\frac{c-1}{2}\right)$, the lower-left corner $\left(\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the lower-right corner $\left(\frac{r-1}{2},\,\frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here r and c denote the number of rows and columns of the large convolution kernel, the left half of each coordinate is the row index and the right half the column index; the coordinates of each weight in the large convolution kernel are defined in the same way.
Before replacement with the small convolution kernels, the original convolution result is:

$$\text{result}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u_{j,k}$$

where $x_{j,k}$ is the value of the input feature map at coordinate (j,k) within the receptive field of the original large convolution kernel, and $u_{j,k}$ is the weight at coordinate (j,k) in the large convolution kernel.
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, setting the coordinate of the center of the i-th layer small convolution kernel to (0,0); the coordinate of its upper-left corner is $\left(-\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the upper-right corner $\left(-\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, the lower-left corner $\left(\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the lower-right corner $\left(\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th layer small convolution kernel, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$
Substituting layer by layer into the convolution formula until the feature map within the original receptive field is reached, then merging like terms over all the terms in which each feature-map value within the original receptive field participates and factoring out the feature-map values, the final result can be expressed as:

$$\text{result}'=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u'_{j,k},\qquad u'_{j,k}=\sum_{\substack{s_1+\dots+s_n=j\\ t_1+\dots+t_n=k}}\;\prod_{i=1}^{n} u_{i,s_i,t_i}$$

where $u_{i,s,t}$ is the weight at coordinate (s,t) inside the i-th layer small convolution kernel. For any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row and column indices of the i-th layer weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$; the coefficient of such a term is $\prod_{i=1}^{n} u_{i,s_i,t_i}$, and after merging like terms the coefficient of $x_{j,k}$ is $u'_{j,k}$ as given above.
S23, based on the two formulas in steps S21 and S22, with the goal of minimizing the error between the replaced final convolution result and the original convolution result, and taking the input feature map $x_{j,k}$ as the variable, the difference between the coefficients of the two formulas is expressed as:

$$\text{diff}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}}\left(u_{j,k}-u'_{j,k}\right)^2$$

S24, treating $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the quantities to be solved, with the goal of minimizing diff, the formula in S23 is solved with a genetic algorithm, yielding the weights $u_{i,s,t}$ of each layer of small convolution kernels.
Preferably, when either the number of rows or the number of columns of the original large convolution kernel is even, the determining module is specifically configured to:
(1) If the number of rows r is even and the number of columns c is odd:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 1, so that the number of rows of the feature map after this convolution is r-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(2) If the number of columns c is even and the number of rows r is odd:
Among the replacement small convolution kernels, first use a kernel of width 2 and height 1, so that the number of columns of the feature map after this convolution is c-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(3) If the number of rows r and the number of columns c are both even:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 2, so that the feature map after this convolution has r-1 rows and c-1 columns, both odd; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
A storage medium storing a computer program for model conversion for deep learning model inference hardware acceleration, wherein the computer program causes a computer to execute the model conversion method for deep learning model inference hardware acceleration described above.
An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the model conversion method for deep learning model inference hardware acceleration described above.
(III) advantageous effects
The invention provides a model conversion method, system, storage medium and electronic device for deep learning model inference hardware acceleration. Compared with the prior art, the invention has the following beneficial effects:
The method of the invention comprises: replacing an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels; determining the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels; and deploying the converted deep learning model on hardware to obtain the convolution result of the input feature map and hence the final inference result of the model. The large convolution kernel is replaced by small convolution kernels executed in series, and the weights of the small kernels are computed directly from the weights of the large kernel, so that the influence of the replacement on the output is reduced as much as possible, the network model does not need to be retrained, the computation is small, and the model can be deployed directly.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a diagram illustrating a prior art manner of breaking a 5x5 large convolution kernel into a hardware supported 3x3 small convolution kernel;
FIG. 2 is a schematic diagram of a prior art method of replacing a 5x5 large convolution kernel with 4 3x3 small convolution kernels as shown in FIG. 1;
FIG. 3 is a schematic diagram of a prior art approach for replacing a 5x5 large convolution kernel with a 2-level 3x3 small convolution kernel;
FIG. 4 is a flowchart illustrating a model transformation method for deep learning model inference hardware acceleration according to an embodiment of the present invention;
FIG. 5 is a weight diagram of a convolution kernel of size 5x5 according to an embodiment of the present invention;
FIG. 6 is a diagram of the input feature map within the receptive field of the 5x5 convolution kernel according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the 2 layers of 3x3 convolutional layers according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below. It is to be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
By providing a model conversion method, system, storage medium and electronic device for deep learning model inference hardware acceleration, the embodiments of the present application solve the technical problems of large inference computation, or of retraining that prevents rapid deployment.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
Replacing an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels; determining the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels; and deploying the converted deep learning model on hardware to obtain the convolution result of the input feature map and hence the final inference result of the model. The large convolution kernel is replaced with small kernels of a size suitable for the hardware implementation while the receptive field of each output is kept unchanged; at the same time, for any input, the output after replacement stays as close as possible to the original output.
It should be added that the convolution covered by the method of the embodiment of the present invention includes two-dimensional convolution with arbitrary stride, dilated convolution (also called atrous convolution), and deformable convolution. Although the receptive fields of these convolutions have different shapes in the original input feature map, in the present invention the input values within the receptive field that participate in the convolution are treated uniformly as a two-dimensional matrix; once the coordinates are arranged in this two-dimensional form, the processing is uniform. The method provided by the invention can be applied to deploying computer-vision deep learning models on artificial intelligence chips; one application field is industrial color sorters. In the color-sorter field, the equipment must identify, within a specified number of milliseconds, the type of grain in an image captured by the sorter, for example whether a grain is normal or defective. Such a device obviously places strict requirements on the inference time of the deep learning model. On conventional CPU-based equipment, the inference of a computationally heavy deep learning model cannot be completed within the specified time. On an artificial intelligence chip specially designed for deep learning inference, a large number of hardware units can execute the computations of model inference simultaneously, so inference can be completed efficiently. However, once the hardware architecture of such a chip is fixed, it cannot directly support convolution kernels of arbitrary other sizes.
In summary, the method provided by the embodiment of the invention can be applied to the field of deep learning model inference including color sorter image processing.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example:
In a first aspect, as shown in fig. 4, an embodiment of the present invention provides a model conversion method for deep learning model inference hardware acceleration, comprising:
S1, replacing an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels;
S2, determining the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels;
S3, deploying the converted deep learning model on hardware to obtain the convolution result of the input feature map.
According to the embodiment of the invention, the large convolution kernel is replaced by the plurality of small convolution kernels which are executed in series, the weights of the plurality of small convolution kernels are obtained by directly calculating the weight of the large convolution kernel, the influence of the replacement process on the output is reduced as much as possible, the network model does not need to be retrained, the calculated amount is small, and the network model is directly deployed.
The following will describe the steps of the above technical solution in detail:
In step S1, an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware is replaced with multiple layers of small convolution kernels.
In step S2, the weights of the small convolution kernels of each layer are determined by a genetic algorithm according to the weight difference between the large convolution kernel and the small convolution kernels of each layer.
When the number of rows and the number of columns of the original large convolution kernel are both odd, S2 specifically comprises:
S21, setting the coordinate of the input feature map at the center of the receptive field of the original large convolution kernel to (0,0); the coordinate of the upper-left corner of the receptive field is then $\left(-\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the upper-right corner $\left(-\frac{r-1}{2},\,\frac{c-1}{2}\right)$, the lower-left corner $\left(\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the lower-right corner $\left(\frac{r-1}{2},\,\frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here r and c denote the number of rows and columns of the large convolution kernel, the left half of each coordinate is the row index and the right half the column index; the coordinates of each weight in the large convolution kernel are defined in the same way.
Before replacement with the small convolution kernels, the original convolution result is:

$$\text{result}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u_{j,k}$$

where $x_{j,k}$ is the value of the input feature map at coordinate (j,k) within the receptive field of the original large convolution kernel, and $u_{j,k}$ is the weight at coordinate (j,k) in the large convolution kernel.
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, setting the coordinate of the center of the i-th layer small convolution kernel to (0,0); the coordinate of its upper-left corner is $\left(-\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the upper-right corner $\left(-\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, the lower-left corner $\left(\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the lower-right corner $\left(\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th layer small convolution kernel, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$
Substituting layer by layer into the convolution formula until the feature map within the original receptive field is reached, then merging like terms over all the terms in which each feature-map value within the original receptive field participates and factoring out the feature-map values, the final result can be expressed as:

$$\text{result}'=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u'_{j,k},\qquad u'_{j,k}=\sum_{\substack{s_1+\dots+s_n=j\\ t_1+\dots+t_n=k}}\;\prod_{i=1}^{n} u_{i,s_i,t_i}$$

where $u_{i,s,t}$ is the weight at coordinate (s,t) inside the i-th layer small convolution kernel. For any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row and column indices of the i-th layer weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$; the coefficient of such a term is $\prod_{i=1}^{n} u_{i,s_i,t_i}$, and after merging like terms the coefficient of $x_{j,k}$ is $u'_{j,k}$ as given above.
S23, based on the two formulas in steps S21 and S22, with the goal of minimizing the error between the replaced final convolution result and the original convolution result, and taking the input feature map $x_{j,k}$ as the variable, the difference between the coefficients of the two formulas is expressed as:

$$\text{diff}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}}\left(u_{j,k}-u'_{j,k}\right)^2$$

S24, treating $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the quantities to be solved, with the goal of minimizing diff, the formula in S23 is solved with a genetic algorithm, yielding the weights $u_{i,s,t}$ of each layer of small convolution kernels.
When either the number of rows or the number of columns of the original large convolution kernel is even:
(1) If the number of rows r is even and the number of columns c is odd:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 1, so that the number of rows of the feature map after this convolution is r-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(2) If the number of columns c is even and the number of rows r is odd:
Among the replacement small convolution kernels, first use a kernel of width 2 and height 1, so that the number of columns of the feature map after this convolution is c-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(3) If the number of rows r and the number of columns c are both even:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 2, so that the feature map after this convolution has r-1 rows and c-1 columns, both odd; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
In step S3, the converted deep learning model is deployed on hardware, and the convolution result of the input feature map is obtained.
The model conversion (convolution kernel replacement) process introduced in steps S1 and S2 is performed only once, before deployment, yielding the weights of the replacement convolutional layers. After deployment on hardware, the convolution kernel weights do not need to be computed again at each inference.
The conversion method provided by the embodiment of the invention has the following advantages:
On the one hand, it avoids the retraining that is otherwise required when a large convolution kernel is split into multiple layers of small convolution kernels for deployment on a specific hardware architecture;
on the other hand, it avoids the large computation cost of splitting a large convolution kernel into several parallel small convolution kernels. With this method, the per-layer convolution kernel coefficients only need to be obtained once, before the neural network model is deployed to hardware; they do not need to be obtained again at each inference on the hardware, so each inference requires less computation than the traditional approach.
The detailed demonstration procedure is as follows:
Let the height of the output feature map be H and the width be W; the computation of the original layer is then about H × W × r × c multiply-accumulate operations. After the method provided by the embodiment of the invention is applied, the computation is about

$$H \times W \times \sum_{i=1}^{n} r_i \times c_i$$

which is greatly reduced.
In the embodiment of the invention, for a large convolution kernel that cannot be directly deployed on hardware supporting 3x3 kernels, r and c are at least 5. When r = 5 and c = 5, 2 layers of 3x3 small kernels can be substituted: the original computation is H × W × 25, and the computation after replacement is H × W × (3×3 + 3×3) = H × W × 18, which is clearly smaller. For larger convolution kernels it can be shown by mathematical induction that the computation after replacement is always smaller than that of the original convolutional layer before replacement.
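A throwaway sketch of this count comparison (the feature-map size H × W is an arbitrary example value, not from the patent):

```python
def conv_cost(h, w, layers):
    """Multiply-accumulate count: H * W * sum(r_i * c_i) over the layers."""
    return h * w * sum(r * c for r, c in layers)

H, W = 224, 224
print(conv_cost(H, W, [(5, 5)]))            # original 5x5 layer: H*W*25
print(conv_cost(H, W, [(3, 3), (3, 3)]))    # serial replacement: H*W*18
print(conv_cost(H, W, [(7, 7)]))            # original 7x7 layer: H*W*49
print(conv_cost(H, W, [(3, 3)] * 3))        # serial replacement: H*W*27
```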
Still taking a convolution kernel of size 5x5 as the example, the weights are denoted as shown in fig. 5, and the input feature map corresponding to one of its outputs as shown in fig. 6. For ease of understanding, the row and column indices of both the weights and the input feature map start at 1.
When the original convolution kernel is used, the output is computed as:

$$\text{result}=\sum_{j=1}^{5}\sum_{k=1}^{5} x_{j,k}\,u_{j,k} \tag{1}$$

i.e. the sum of the 25 products $x_{1,1}u_{1,1}+x_{1,2}u_{1,2}+\dots+x_{5,5}u_{5,5}$.
If it is replaced by 2 layers of 3x3 convolutional layers, the weights of the two layers are denoted as shown in fig. 7, where W denotes the first-layer kernel weights and V denotes the second-layer kernel weights. For ease of understanding, the row and column indices of the small convolution kernels after replacement also start at 1.
After replacement with 2 layers of 3x3 convolution, the output at the same position is computed as:

$$\text{result}=\sum_{p=1}^{3}\sum_{q=1}^{3} v_{p,q}\left(\sum_{a=1}^{3}\sum_{b=1}^{3} x_{p+a-1,\,q+b-1}\,w_{a,b}\right) \tag{2}$$
Merging like terms over identical input-feature-map elements, equation (2) can be rewritten as:

$$\text{result}=\sum_{j=1}^{5}\sum_{k=1}^{5} x_{j,k}\,u'_{j,k},\qquad u'_{j,k}=\sum_{\substack{p+a-1=j\\ q+b-1=k}} v_{p,q}\,w_{a,b} \tag{3}$$
the purpose of replacing 2-layer 3x3 convolutional layers is to make the same output positions after replacement as equal as possible to the original 5x5 convolutional kernels, to reduce the impact on accuracy.
The weights are fixed after the 5x5 sized convolution kernel replacement is completed with 2 layers of 3x3 sized convolution kernels. When deploying the network model, one is faced with 25 inputs in the field that may take arbitrary values. I.e. the variables in equation (3) are 25 inputs, while the weights may be considered as invariant coefficients.
In order to keep the output after completion of the convolution kernel replacement as constant as possible as the output before replacement, the coefficients of equation (3) and equation (1) are required to be as equal as possible.
The coefficients in equation (3) are:
u′_{1,1} = v_{1,1}·w_{1,1}
u′_{1,2} = v_{1,1}·w_{1,2} + v_{1,2}·w_{1,1}
u′_{1,3} = v_{1,1}·w_{1,3} + v_{1,2}·w_{1,2} + v_{1,3}·w_{1,1}
u′_{1,4} = v_{1,2}·w_{1,3} + v_{1,3}·w_{1,2}
u′_{1,5} = v_{1,3}·w_{1,3}
u′_{2,1} = v_{1,1}·w_{2,1} + v_{2,1}·w_{1,1}
u′_{2,2} = v_{1,1}·w_{2,2} + v_{1,2}·w_{2,1} + v_{2,1}·w_{1,2} + v_{2,2}·w_{1,1}
u′_{2,3} = v_{1,1}·w_{2,3} + v_{1,2}·w_{2,2} + v_{1,3}·w_{2,1} + v_{2,1}·w_{1,3} + v_{2,2}·w_{1,2} + v_{2,3}·w_{1,1}
u′_{2,4} = v_{1,2}·w_{2,3} + v_{1,3}·w_{2,2} + v_{2,2}·w_{1,3} + v_{2,3}·w_{1,2}
u′_{2,5} = v_{1,3}·w_{2,3} + v_{2,3}·w_{1,3}
u′_{3,1} = v_{1,1}·w_{3,1} + v_{2,1}·w_{2,1} + v_{3,1}·w_{1,1}
u′_{3,2} = v_{1,1}·w_{3,2} + v_{1,2}·w_{3,1} + v_{2,1}·w_{2,2} + v_{2,2}·w_{2,1} + v_{3,1}·w_{1,2} + v_{3,2}·w_{1,1}
u′_{3,3} = v_{1,1}·w_{3,3} + v_{1,2}·w_{3,2} + v_{1,3}·w_{3,1} + v_{2,1}·w_{2,3} + v_{2,2}·w_{2,2} + v_{2,3}·w_{2,1} + v_{3,1}·w_{1,3} + v_{3,2}·w_{1,2} + v_{3,3}·w_{1,1}
u′_{3,4} = v_{1,2}·w_{3,3} + v_{1,3}·w_{3,2} + v_{2,2}·w_{2,3} + v_{2,3}·w_{2,2} + v_{3,2}·w_{1,3} + v_{3,3}·w_{1,2}
u′_{3,5} = v_{1,3}·w_{3,3} + v_{2,3}·w_{2,3} + v_{3,3}·w_{1,3}
u′_{4,1} = v_{2,1}·w_{3,1} + v_{3,1}·w_{2,1}
u′_{4,2} = v_{2,1}·w_{3,2} + v_{2,2}·w_{3,1} + v_{3,1}·w_{2,2} + v_{3,2}·w_{2,1}
u′_{4,3} = v_{2,1}·w_{3,3} + v_{2,2}·w_{3,2} + v_{2,3}·w_{3,1} + v_{3,1}·w_{2,3} + v_{3,2}·w_{2,2} + v_{3,3}·w_{2,1}
u′_{4,4} = v_{2,2}·w_{3,3} + v_{2,3}·w_{3,2} + v_{3,2}·w_{2,3} + v_{3,3}·w_{2,2}
u′_{4,5} = v_{2,3}·w_{3,3} + v_{3,3}·w_{2,3}
u′_{5,1} = v_{3,1}·w_{3,1}
u′_{5,2} = v_{3,1}·w_{3,2} + v_{3,2}·w_{3,1}
u′_{5,3} = v_{3,1}·w_{3,3} + v_{3,2}·w_{3,2} + v_{3,3}·w_{3,1}
u′_{5,4} = v_{3,2}·w_{3,3} + v_{3,3}·w_{3,2}
u′_{5,5} = v_{3,3}·w_{3,3}
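These 25 coefficients are exactly the entries of the 2-D full convolution of V and W, which the following minimal numpy check illustrates (an illustrative sketch; the code uses 0-based indices versus the 1-based indices of the list above):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=(3, 3))   # first-layer kernel W
v = rng.normal(size=(3, 3))   # second-layer kernel V

# u'[j,k] = sum over p+a=j, q+b=k of v[p,q]*w[a,b]  (0-based indices)
u_prime = np.zeros((5, 5))
for p in range(3):
    for q in range(3):
        u_prime[p:p + 3, q:q + 3] += v[p, q] * w

# Spot-check two entries against the list above (1-based in the text):
assert np.isclose(u_prime[0, 0], v[0, 0] * w[0, 0])                      # u'_{1,1}
assert np.isclose(u_prime[0, 1], v[0, 0] * w[0, 1] + v[0, 1] * w[0, 0])  # u'_{1,2}
```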
After the replacement with 2 layers of 3x3 kernels, the output can therefore be rewritten as:

$$\text{result}=\sum_{j=1}^{5}\sum_{k=1}^{5} x_{j,k}\,u'_{j,k}$$
If the output is to be as close as possible to the output before replacement, that is, if the results of equations (1) and (3) are to match, the coefficients of the two expressions should be as close as possible. Describing the difference between the two sets of coefficients with the L2 norm, the goal becomes making the following function as small as possible:

$$\text{diff}=\sum_{j=1}^{5}\sum_{k=1}^{5}\left(u_{j,k}-u'_{j,k}\right)^2 \tag{4}$$
In the above formula, symbols without a prime denote the weights of the 5x5 convolution kernel before replacement, and primed symbols denote the coefficients that correspond to the same inputs after replacement by the 2 layers of 3x3 kernels. The unprimed symbols are fixed weights; the primed symbols are coefficients composed of the weights of the two 3x3 kernels that need to be solved.
Thus, in equation (4), the primed symbols are variable, but their variation is governed by the weights of the two 3x3 convolution kernels. The problem is therefore to find the weights of the two 3x3 kernels, 2×3×3 = 18 variables in all, that minimize equation (4).
After the 18 weights minimizing equation (4) are solved by a genetic algorithm, the weights of the 2 layers of 3x3 kernels that perform the replacement are obtained. These 18 weights make equation (4) as small as possible, i.e. they make the values of equations (3) and (1) as close as possible for any input feature map, and the replacement can then be carried out directly. The purpose of the replacement is to substitute two layers of 3x3 kernels for the original single 5x5 layer so that, for any input, the change in output is as small as possible. No bias or activation function is added to the first of the replacement convolution layers.
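The patent does not spell out the genetic-algorithm configuration (population size, selection, crossover or mutation scheme), so the following is only a minimal sketch of one way to minimize equation (4) over the 18 weights; all hyperparameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def merged_coeffs(v, w):
    """The 5x5 coefficients u' produced by stacking a 3x3 kernel w
    (first layer) and a 3x3 kernel v (second layer)."""
    out = np.zeros((5, 5))
    for p in range(3):
        for q in range(3):
            out[p:p + 3, q:q + 3] += v[p, q] * w
    return out

def diff(u, genome):
    """Equation (4): squared error between original and merged coeffs."""
    w, v = genome[:9].reshape(3, 3), genome[9:].reshape(3, 3)
    return np.sum((u - merged_coeffs(v, w)) ** 2)

def ga_minimize(u, pop_size=200, gens=300, elite=20, sigma=0.05):
    pop = rng.normal(0.0, 0.5, size=(pop_size, 18))
    for _ in range(gens):
        fitness = np.array([diff(u, g) for g in pop])
        parents = pop[np.argsort(fitness)[:elite]]        # elitist selection
        kids = []
        while len(kids) < pop_size - elite:
            a, b = parents[rng.integers(elite, size=2)]
            mask = rng.random(18) < 0.5                   # uniform crossover
            kids.append(np.where(mask, a, b)
                        + rng.normal(0.0, sigma, 18))     # Gaussian mutation
        pop = np.vstack([parents, kids])
    fitness = np.array([diff(u, g) for g in pop])
    best = pop[np.argmin(fitness)]
    return best[:9].reshape(3, 3), best[9:].reshape(3, 3), fitness.min()

u = rng.normal(size=(5, 5))        # original 5x5 weights
w, v, err = ga_minimize(u)         # replacement kernels and residual diff
```

Since a generic 5x5 kernel has 25 degrees of freedom and the two 3x3 kernels only 18, the residual diff is in general nonzero; the genetic algorithm only makes it as small as possible, which is exactly the approximation traded for avoiding retraining.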
Similarly, a convolution kernel of size 7x7 can be replaced by 3 layers of 3x3 convolution kernels in the same manner as described above.
Similarly, if the largest convolution kernel supported by the hardware is not 3x3 but 5x5, the method proposed in the embodiment of the present invention can replace one convolutional layer of size 9x9 with 2 layers of 5x5 kernels. Likewise, one convolutional layer of size 7x7 can be replaced by one 5x5 convolutional layer followed by one 3x3 convolutional layer.
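These replacements are valid because the receptive field of serially stacked stride-1 convolutions composes additively; a one-line check of the candidate stacks mentioned above (an illustrative sketch):

```python
def receptive_field(kernel_sizes):
    # rf of stacked stride-1 convolutions: 1 + sum(k - 1)
    return 1 + sum(k - 1 for k in kernel_sizes)

assert receptive_field([5, 5]) == 9      # 9x9 -> two 5x5 layers
assert receptive_field([5, 3]) == 7      # 7x7 -> one 5x5 + one 3x3
assert receptive_field([3, 3, 3]) == 7   # 7x7 -> three 3x3 layers
```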
Compared with the large-kernel splitting scheme shown in fig. 2, the embodiment of the invention has the advantage of low computation. For example, to decompose a 5x5 kernel into convolutions of size 3x3, the scheme of fig. 2 performs 4 convolutions of size 3x3, i.e. on average 4×3×3 = 36 multiply-accumulate operations per output. With the method of the invention, only 2 convolutions of size 3x3 are needed, i.e. on average 2×3×3 = 18 operations per output, half the amount of the scheme of fig. 2. As another example, to split a 7x7 kernel into kernels of size 3x3, the scheme of fig. 2 needs 9 kernels of size 3x3 and performs 9 convolutions, i.e. 9×3×3 = 81 multiply-accumulate operations per output, while the method of the invention only needs 3 consecutive convolutional layers of size 3x3, i.e. on average 3×3×3 = 27 operations per output, only 1/3 of the scheme of fig. 2.
Compared with the large-kernel replacement scheme shown in fig. 3, the embodiment of the invention has the advantages that the network model does not need to be retrained and the computation is small, so the goal of rapid deployment can be achieved.
In a second aspect, an embodiment of the present invention provides a model conversion system for deep learning model inference hardware acceleration, comprising:
a replacing module, configured to replace an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels;
a determining module, configured to determine the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels; and
a convolution module, configured to deploy the converted deep learning model on hardware and obtain the convolution result of the input feature map.
In an embodiment, when the number of rows and the number of columns of the original large convolution kernel are both odd, the determining module specifically includes:
S21, setting the coordinate of the input feature map at the center of the receptive field of the original large convolution kernel to (0,0); the coordinate of the upper-left corner of the receptive field is then $\left(-\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the upper-right corner $\left(-\frac{r-1}{2},\,\frac{c-1}{2}\right)$, the lower-left corner $\left(\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the lower-right corner $\left(\frac{r-1}{2},\,\frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here r and c denote the number of rows and columns of the large convolution kernel, the left half of each coordinate is the row index and the right half the column index; the coordinates of each weight in the large convolution kernel are defined in the same way.
Before replacement with the small convolution kernels, the original convolution result is:

$$\text{result}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u_{j,k}$$

where $x_{j,k}$ is the value of the input feature map at coordinate (j,k) within the receptive field of the original large convolution kernel, and $u_{j,k}$ is the weight at coordinate (j,k) in the large convolution kernel.
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, the coordinate of the center position of the i-th layer small convolution kernel is (0,0), the coordinate of the upper left corner position is $\left(-\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the upper right corner position is $\left(-\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, the coordinate of the lower left corner position is $\left(\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the lower right corner position is $\left(\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy; wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer small convolution kernel, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$
Substituting the calculation process layer by layer into the convolution calculation formula until all feature-map values in the original receptive field have been substituted, merging like terms over all terms in which the feature-map values of the original receptive field participate, and factoring out the feature-map values respectively, the final calculation result is expressed as:

$$y' = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} \Bigg) x_{j,k}$$

wherein $u_{i,s,t}$ represents the weight at coordinate (s,t) inside the i-th layer small convolution kernel; for any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row index and the column index of the i-th layer small convolution kernel weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$, the coefficient of such a term being $\prod_{i=1}^{n} u_{i,s_i,t_i}$; after merging like terms, the coefficient of $x_{j,k}$ is $\sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i}$.

S23, based on the two formulas in steps S21 and S22, in order to minimize the error between the replaced final convolution result and the original convolution result, the input feature-map values $x_{j,k}$ are taken as the variables, and the difference between the coefficients of the two formulas is expressed as:

$$\mathrm{diff} = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} - u_{j,k} \Bigg)^{2}$$
S24, taking $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the variables to be solved, with minimizing the diff value in the formula of S23 as the objective, the formula of S23 is solved by a genetic algorithm, and the weights $u_{i,s,t}$ of each layer of small convolution kernels are obtained respectively.
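To make S21-S24 concrete, here is a minimal numerical sketch under stated assumptions: stride 1, no bias, and no activation between the stacked small layers, so that their composition is a single linear convolution as the derivation above requires. The function names and genetic-algorithm hyper-parameters are our own choices, not the patent's; a production system might use a dedicated GA library instead of this bare-bones loop.

```python
# Minimal sketch of S21-S24 (our own illustration): fit the weights of n
# stacked small kernels so that their composition approximates one large kernel.
import numpy as np
from scipy.signal import convolve2d

def effective_kernel(small_kernels):
    """Stacking stride-1 conv layers is equivalent to one conv whose kernel is
    the 'full' convolution of the layer kernels, so this array holds exactly
    the merged coefficients of x_{j,k} from S22."""
    eff = small_kernels[0]
    for k in small_kernels[1:]:
        eff = convolve2d(eff, k, mode="full")
    return eff

def diff(flat, shapes, large):
    """Sum of squared coefficient differences (the diff of S23)."""
    kernels, pos = [], 0
    for r, c in shapes:
        kernels.append(flat[pos:pos + r * c].reshape(r, c))
        pos += r * c
    return float(((effective_kernel(kernels) - large) ** 2).sum())

def fit_ga(large, shapes, pop=200, gens=300, sigma=0.05, seed=0):
    """Bare-bones genetic algorithm: truncation selection, uniform crossover,
    Gaussian mutation, one elite kept per generation."""
    rng = np.random.default_rng(seed)
    dim = sum(r * c for r, c in shapes)
    population = rng.normal(0.0, 0.5, size=(pop, dim))
    for _ in range(gens):
        fitness = np.array([diff(ind, shapes, large) for ind in population])
        elite = population[np.argsort(fitness)[:pop // 4]]          # selection
        parents = elite[rng.integers(0, len(elite), size=(pop, 2))]
        mask = rng.random((pop, dim)) < 0.5                          # crossover
        population = np.where(mask, parents[:, 0], parents[:, 1])
        population += rng.normal(0.0, sigma, size=population.shape)  # mutation
        population[0] = elite[0]                                     # elitism
    fitness = np.array([diff(ind, shapes, large) for ind in population])
    return population[np.argmin(fitness)], float(fitness.min())

# Example: replace a 5x5 kernel with two serial 3x3 kernels.
large = np.random.default_rng(1).normal(size=(5, 5))
best, err = fit_ga(large, [(3, 3), (3, 3)])
print("residual diff:", err)
```

Note that a residual generally remains: two 3x3 kernels have 18 free weights but must match 25 coefficients, which is precisely why the weights are fitted by minimizing diff rather than solved exactly.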
In an embodiment, when any one of the number of rows or the number of columns of the original large convolution kernel is an even number, the determining module is specifically configured to:
(1) If the number of rows r is even and the number of columns c is odd:

First, a convolution kernel with height 2 and width 1 is used among the plurality of replacing small convolution kernels, so that the number of rows of the feature map after this convolution operation is r-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm;
(2) If the number of columns c is even and the number of rows r is odd:

First, a convolution kernel with width 2 and height 1 is used among the plurality of replacing small convolution kernels, so that the number of columns of the feature map after this convolution operation is c-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm.
(3) If the number of rows r is even and the number of columns c is also even:

First, a convolution kernel with height 2 and width 2 is used among the plurality of replacing small convolution kernels, so that the feature map after this convolution operation has r-1 rows and c-1 columns, which are both odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm.
In a third aspect, an embodiment of the present invention provides a storage medium storing a computer program for model conversion for deep learning model inference hardware acceleration, wherein the computer program causes a computer to execute the model conversion method for deep learning model inference hardware acceleration as described above.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the model transformation method for deep learning model inference hardware acceleration as described above.
It can be understood that the model conversion system, the storage medium and the electronic device for deep learning model inference hardware acceleration provided by the embodiments of the present invention correspond to the model conversion method for deep learning model inference hardware acceleration provided by the embodiments of the present invention; for the explanation, examples, beneficial effects and the like of the related contents, reference may be made to the corresponding parts of the method, and details are not repeated here.
In summary, compared with the prior art, the method has the following beneficial effects:
1. On one hand, the method solves the problem that the network model needs to be retrained when a large convolution kernel is split into multiple layers of small convolution kernels for deploying a deep learning model on a specific hardware architecture.
2. On the other hand, the method solves the problem of a large calculation amount when a large convolution kernel is split into a plurality of parallel small convolution kernels for deploying a deep learning model on a specific hardware architecture. With the method, the convolution kernel coefficients of each layer only need to be obtained once, before the neural network model is deployed on the hardware; they do not need to be obtained again for each inference on the hardware, so every inference requires less calculation than with the traditional method.
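As a hedged illustration of this "convert once, infer many times" workflow, the fragment below strings together the plan_decomposition and fit_ga sketches given earlier; load_model, conv_layers, kernel_shape, kernel_weights, replace_with_stack and deploy are hypothetical placeholders, not APIs described by the patent.

```python
# Hypothetical offline conversion pass (all model/layer APIs are placeholders):
def convert_model(model, max_hw_kernel=3):
    for layer in model.conv_layers():                # placeholder API
        r, c = layer.kernel_shape
        if r <= max_hw_kernel and c <= max_hw_kernel:
            continue                                 # already deployable as-is
        shapes = plan_decomposition(r, c, k=max_hw_kernel)
        weights, _ = fit_ga(layer.kernel_weights, shapes)
        layer.replace_with_stack(shapes, weights)    # placeholder API
    return model

# Run once before deployment; every later inference on the hardware simply
# reuses the stored small-kernel weights:
#   converted = convert_model(load_model("net.onnx"))   # hypothetical loader
#   deploy(converted)
```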
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Furthermore, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by "comprising a ..." does not, without further limitation, exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A model transformation method for deep learning model inference hardware acceleration, comprising:
s1, replacing an original large convolution kernel which cannot be directly deployed in hardware in a deep learning model with a plurality of layers of small convolution kernels;
s2, respectively determining the weight of each layer of small convolution kernel by adopting a genetic algorithm according to the weight difference of the large convolution kernel and each layer of small convolution kernel;
and S3, deploying the converted deep learning model on hardware, and obtaining a convolution result of the input feature map.
2. The model conversion method for deep learning model inference hardware acceleration as claimed in claim 1, wherein when the number of rows and columns of the original large convolution kernel are both odd, said S2 specifically comprises:
S21, setting the coordinate of the input feature map at the central position of the receptive field of the original large convolution kernel to (0,0), the coordinate of the upper left corner position of the receptive field is $\left(-\frac{r-1}{2}, -\frac{c-1}{2}\right)$, the coordinate of the upper right corner position is $\left(-\frac{r-1}{2}, \frac{c-1}{2}\right)$, the coordinate of the lower left corner position is $\left(\frac{r-1}{2}, -\frac{c-1}{2}\right)$, the coordinate of the lower right corner position is $\left(\frac{r-1}{2}, \frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy; wherein r and c respectively represent the number of rows and the number of columns of the large convolution kernel, the left half of each coordinate is the row index and the right half is the column index; the coordinates of each weight in the large convolution kernel are agreed in the same way;

before replacement with the plurality of small convolution kernels, the original convolution calculation result is:

$$y = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} u_{j,k}\, x_{j,k}$$

wherein $x_{j,k}$ represents the value of the input feature map at coordinate (j,k) in the receptive field of the original large convolution kernel, and $u_{j,k}$ represents the weight at coordinate (j,k) in the large convolution kernel;
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, the coordinate of the center position of the i-th layer small convolution kernel is (0,0), the coordinate of the upper left corner position is $\left(-\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the upper right corner position is $\left(-\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, the coordinate of the lower left corner position is $\left(\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the lower right corner position is $\left(\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy; wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer small convolution kernel, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$
Substituting the calculation process layer by layer into the convolution calculation formula until all feature-map values in the original receptive field have been substituted, merging like terms over all terms in which the feature-map values of the original receptive field participate, and factoring out the feature-map values respectively, the final calculation result is expressed as:

$$y' = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} \Bigg) x_{j,k}$$

wherein $u_{i,s,t}$ represents the weight at coordinate (s,t) inside the i-th layer small convolution kernel; for any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row index and the column index of the i-th layer small convolution kernel weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$, the coefficient of such a term being $\prod_{i=1}^{n} u_{i,s_i,t_i}$; after merging like terms, the coefficient of $x_{j,k}$ is $\sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i}$;

S23, based on the two formulas in steps S21 and S22, in order to minimize the error between the replaced final convolution result and the original convolution result, the input feature-map values $x_{j,k}$ are taken as the variables, and the difference between the coefficients of the two formulas is expressed as:

$$\mathrm{diff} = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} - u_{j,k} \Bigg)^{2}$$

S24, taking $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the variables to be solved, with minimizing the diff value in the formula of S23 as the objective, the formula of S23 is solved by a genetic algorithm, and the weights $u_{i,s,t}$ of each layer of small convolution kernels are obtained respectively.
3. The model transformation method for deep learning model inference hardware acceleration as claimed in claim 1, wherein when any one of the number of rows or the number of columns of the original large convolution kernel is an even number, said S2 specifically includes:
(1) If the number of rows r is even and the number of columns c is odd:

First, a convolution kernel with height 2 and width 1 is used among the plurality of replacing small convolution kernels, so that the number of rows of the feature map after this convolution operation is r-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm;
(2) If the number of columns c is even and the number of rows r is odd:

First, a convolution kernel with width 2 and height 1 is used among the plurality of replacing small convolution kernels, so that the number of columns of the feature map after this convolution operation is c-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm;
(3) If the number of rows r is even and the number of columns c is also even:

First, a convolution kernel with height 2 and width 2 is used among the plurality of replacing small convolution kernels, so that the feature map after this convolution operation has r-1 rows and c-1 columns, which are both odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm.
4. A model transformation system for deep learning model inference hardware acceleration, comprising:
the replacing module is used for replacing an original large convolution kernel which cannot be directly deployed in hardware in the deep learning model by adopting multiple layers of small convolution kernels;
the determining module is used for respectively determining the weight of each layer of small convolution kernels by adopting a genetic algorithm according to the weight difference between the large convolution kernel and each layer of small convolution kernels;
and the convolution module is used for deploying the converted deep learning model on hardware and acquiring a convolution result of the input feature map.
5. The model transformation system for deep learning model inference hardware acceleration of claim 4, wherein when the number of rows and the number of columns of the original large convolution kernel are both odd, the determining module specifically comprises:
S21, setting the coordinate of the input feature map at the central position of the receptive field of the original large convolution kernel to (0,0), the coordinate of the upper left corner position of the receptive field is $\left(-\frac{r-1}{2}, -\frac{c-1}{2}\right)$, the coordinate of the upper right corner position is $\left(-\frac{r-1}{2}, \frac{c-1}{2}\right)$, the coordinate of the lower left corner position is $\left(\frac{r-1}{2}, -\frac{c-1}{2}\right)$, the coordinate of the lower right corner position is $\left(\frac{r-1}{2}, \frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy; wherein r and c respectively represent the number of rows and the number of columns of the large convolution kernel, the left half of each coordinate is the row index and the right half is the column index; the coordinates of each weight in the large convolution kernel are agreed in the same way;

before replacement with the plurality of small convolution kernels, the original convolution calculation result is:

$$y = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} u_{j,k}\, x_{j,k}$$

wherein $x_{j,k}$ represents the value of the input feature map at coordinate (j,k) in the receptive field of the original large convolution kernel, and $u_{j,k}$ represents the weight at coordinate (j,k) in the large convolution kernel;
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, the coordinate of the center position of the i-th layer small convolution kernel is (0,0), the coordinate of the upper left corner position is $\left(-\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the upper right corner position is $\left(-\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, the coordinate of the lower left corner position is $\left(\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the lower right corner position is $\left(\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy; wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer small convolution kernel, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$
Substituting the calculation process layer by layer into the convolution calculation formula until all feature-map values in the original receptive field have been substituted, merging like terms over all terms in which the feature-map values of the original receptive field participate, and factoring out the feature-map values respectively, the final calculation result is expressed as:

$$y' = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} \Bigg) x_{j,k}$$

wherein $u_{i,s,t}$ represents the weight at coordinate (s,t) inside the i-th layer small convolution kernel; for any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row index and the column index of the i-th layer small convolution kernel weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$, the coefficient of such a term being $\prod_{i=1}^{n} u_{i,s_i,t_i}$; after merging like terms, the coefficient of $x_{j,k}$ is $\sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i}$;

S23, based on the two formulas in steps S21 and S22, in order to minimize the error between the replaced final convolution result and the original convolution result, the input feature-map values $x_{j,k}$ are taken as the variables, and the difference between the coefficients of the two formulas is expressed as:

$$\mathrm{diff} = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} - u_{j,k} \Bigg)^{2}$$

S24, taking $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the variables to be solved, with minimizing the diff value in the formula of S23 as the objective, the formula of S23 is solved by a genetic algorithm, and the weights $u_{i,s,t}$ of each layer of small convolution kernels are obtained respectively.
6. The model transformation system for deep learning model inference hardware acceleration of claim 4, wherein, when any one of the number of rows or the number of columns of the original large convolution kernel is an even number, the determining module is specifically configured to:
(1) If the number of rows r is even and the number of columns c is odd:

First, a convolution kernel with height 2 and width 1 is used among the plurality of replacing small convolution kernels, so that the number of rows of the feature map after this convolution operation is r-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm;
(2) If the number of columns c is even and the number of rows r is odd:

First, a convolution kernel with width 2 and height 1 is used among the plurality of replacing small convolution kernels, so that the number of columns of the feature map after this convolution operation is c-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm;
(3) If the number of rows r is even and the number of columns c is also even:

First, a convolution kernel with height 2 and width 2 is used among the plurality of replacing small convolution kernels, so that the feature map after this convolution operation has r-1 rows and c-1 columns, which are both odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm.
7. A storage medium storing a computer program for model conversion for deep learning model inference hardware acceleration, wherein the computer program causes a computer to execute the model conversion method for deep learning model inference hardware acceleration according to any one of claims 1 to 3.
8. An electronic device, characterized by comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the model transformation method for deep learning model inference hardware acceleration of any of claims 1-3.