CN115906963A - Model conversion method and system for deep learning model inference hardware acceleration - Google Patents


Info

Publication number
CN115906963A
CN115906963A
Authority
CN
China
Prior art keywords: convolution kernel, small, convolution, layer, kernels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211166984.1A
Other languages
Chinese (zh)
Inventor
林广栋
陆俊峰
洪一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 38 Research Institute
Original Assignee
CETC 38 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 38 Research Institute filed Critical CETC 38 Research Institute
Priority to CN202211166984.1A priority Critical patent/CN115906963A/en
Publication of CN115906963A publication Critical patent/CN115906963A/en
Pending legal-status Critical Current


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention provides a model conversion method, system, storage medium and electronic device for deep learning model inference hardware acceleration, and relates to the technical field of deep learning. The method comprises the following steps: replacing an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels; determining the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels; and deploying the converted deep learning model on hardware to obtain the convolution result of the input feature map and hence the final inference result of the model. The large convolution kernel is replaced by small convolution kernels executed in series, and the weights of the small kernels are computed directly from the weights of the large kernel, so that the influence of the replacement on the output is reduced as much as possible, the network model does not need to be retrained, the computation is small, and the model can be deployed directly.

Description

Model conversion method and system for deep learning model inference hardware acceleration
Technical Field
The invention relates to the technical field of deep learning, and in particular to a model conversion method, system, storage medium and electronic device for deep learning model inference hardware acceleration.
Background
To execute deep learning model inference more efficiently, artificial intelligence chips dedicated to deep learning computation have gradually emerged. Such chips contain acceleration cores dedicated to deep learning inference, designed to run model inference with high energy efficiency. These acceleration cores generally perform computations in parallel on many processing elements and exploit data reuse to minimize data movement.
Convolution is one of the most common computations in a deep learning model; according to statistics, convolution accounts for more than 90% of the computation in a convolutional neural network. Limited by the hardware implementation, the number of parallel computations simultaneously supported by any deep learning hardware architecture is bounded. For example, if the hardware is designed to efficiently support convolutions with 3x3 kernels, then once the hardware architecture is fixed it cannot directly support the computation of larger kernels (e.g., 5x5 or 7x7). If a neural network architecture contains a larger convolution kernel, such as one of size 5x5, it cannot be directly deployed on such a hardware architecture.
To deploy a model containing such large convolution kernels on this kind of hardware, the prior art offers two approaches:
(1) One method is to split the 5x5 convolution kernel into 4 kernels of size 3x3, perform the convolutions separately, and then add the results to obtain the final convolution output. The relationship between the weights of the 4 new 3x3 kernels and the weights of the original 5x5 kernel under this split is shown in fig. 1. In this scheme, the new weights of the small kernels are obtained directly from the weights of the original large kernel, and the network does not need retraining. The original 5x5 kernel can be considered split into 4 kernels of size 3x3 executed in parallel; the relationship between the new network architecture and the original one is shown in fig. 2. The split must ensure that every weight of the original kernel is assigned to one of the small hardware-sized kernels; positions of a small kernel not covered by weights of the original kernel are padded with 0 (see the sketch following this discussion).
(2) Another approach is to split the neural network layer with one large convolution kernel into multiple small-kernel layers executed serially. After splitting into serial convolutional layers, the receptive field of each output of the final layer is kept unchanged, so information loss caused by the replacement is avoided. For example, as shown in fig. 3, a 5x5 convolution kernel, whose output receptive field is a 5x5 window, is replaced by two layers of 3x3 kernels. Likewise, a 7x7 kernel, whose output receptive field is a 7x7 window, needs to be replaced with 3 layers of 3x3 kernels. After this replacement, the original large-kernel layer no longer exists and several serially executed small-kernel layers are added, so the network must be retrained to obtain the weights of the replacement layers. That is, this method requires redesigning the network structure and retraining. Training typically requires large amounts of data and takes a long time, so this method cannot achieve rapid deployment.
The prior art therefore faces either a large inference computation cost or the need for retraining, which prevents rapid deployment.
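For illustration of approach (1), the following numpy sketch shows how a 5x5 correlation can be computed exactly as the sum of four zero-padded 3x3 correlations applied at shifted input offsets. The block layout used here is one consistent assignment of the 25 weights, not necessarily the exact assignment of fig. 1:

```python
import numpy as np

def corr2d(x, k):
    """Plain 'valid' 2-D cross-correlation."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))        # input feature map
u = rng.normal(size=(5, 5))        # original 5x5 kernel weights

# Four 3x3 kernels, each holding one block of the 5x5 weights (zero-padded).
k00 = u[0:3, 0:3].copy()                                  # rows 1-3, cols 1-3
k01 = np.zeros((3, 3)); k01[:, 1:3] = u[0:3, 3:5]         # rows 1-3, cols 4-5
k10 = np.zeros((3, 3)); k10[1:3, :] = u[3:5, 0:3]         # rows 4-5, cols 1-3
k11 = np.zeros((3, 3)); k11[1:3, 1:3] = u[3:5, 3:5]       # rows 4-5, cols 4-5

oh, ow = x.shape[0] - 4, x.shape[1] - 4    # output size of the 5x5 conv
parts = [(k00, 0, 0), (k01, 0, 2), (k10, 2, 0), (k11, 2, 2)]
split = sum(corr2d(x[dy:dy + oh + 2, dx:dx + ow + 2], k)
            for k, dy, dx in parts)

assert np.allclose(split, corr2d(x, u))    # same result, no retraining
```

The assert confirms the point made above: the split preserves the output exactly, at the price of four 3x3 convolutions per original 5x5 convolution.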
Disclosure of Invention
Technical problem to be solved
Aiming at the deficiencies of the prior art, the invention provides a model conversion method, system, storage medium and electronic device for deep learning model inference hardware acceleration, solving the technical problems of large computation cost, or of retraining that prevents rapid deployment.
(II) technical scheme
To achieve this purpose, the invention is realized by the following technical solutions:
a model transformation method for deep learning model inference hardware acceleration, comprising:
s1, replacing an original large convolution kernel which cannot be directly deployed in hardware in a deep learning model with a plurality of layers of small convolution kernels;
s2, determining the weight of each layer of small convolution kernels respectively by adopting a genetic algorithm according to the weight difference of the large convolution kernels and each layer of small convolution kernels;
and S3, deploying the converted deep learning model on hardware, and obtaining a convolution result of the input feature map.
Preferably, when the number of rows and the number of columns of the original large convolution kernel are both odd, S2 specifically comprises:
S21, setting the coordinate of the input feature map at the center of the receptive field of the original large convolution kernel to (0,0); the coordinate of the upper-left corner of the receptive field is then $\left(-\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the upper-right corner $\left(-\frac{r-1}{2},\,\frac{c-1}{2}\right)$, the lower-left corner $\left(\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the lower-right corner $\left(\frac{r-1}{2},\,\frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here r and c denote the number of rows and columns of the large convolution kernel, the left half of each coordinate is the row index and the right half the column index; the coordinates of each weight in the large convolution kernel are defined in the same way.
Before replacement with the small convolution kernels, the original convolution result is:

$$\text{result}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u_{j,k}$$

where $x_{j,k}$ is the value of the input feature map at coordinate (j,k) within the receptive field of the original large convolution kernel, and $u_{j,k}$ is the weight at coordinate (j,k) in the large convolution kernel.
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, setting the coordinate of the center of the i-th layer small convolution kernel to (0,0); the coordinate of its upper-left corner is $\left(-\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the upper-right corner $\left(-\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, the lower-left corner $\left(\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the lower-right corner $\left(\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th layer small convolution kernel, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$
Substituting layer by layer into the convolution formula until the feature map within the original receptive field is reached, then merging like terms over all the terms in which each feature-map value within the original receptive field participates and factoring out the feature-map values, the final result can be expressed as:

$$\text{result}'=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u'_{j,k},\qquad u'_{j,k}=\sum_{\substack{s_1+\dots+s_n=j\\ t_1+\dots+t_n=k}}\;\prod_{i=1}^{n} u_{i,s_i,t_i}$$

where $u_{i,s,t}$ is the weight at coordinate (s,t) inside the i-th layer small convolution kernel. For any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row and column indices of the i-th layer weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$; the coefficient of such a term is $\prod_{i=1}^{n} u_{i,s_i,t_i}$, and after merging like terms the coefficient of $x_{j,k}$ is $u'_{j,k}$ as given above.
S23, based on the two formulas in steps S21 and S22, with the goal of minimizing the error between the replaced final convolution result and the original convolution result, and taking the input feature map $x_{j,k}$ as the variable, the difference between the coefficients of the two formulas is expressed as:

$$\text{diff}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}}\left(u_{j,k}-u'_{j,k}\right)^2$$
S24, treating $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the quantities to be solved, with the goal of minimizing diff, the formula in S23 is solved with a genetic algorithm, yielding the weights $u_{i,s,t}$ of each layer of small convolution kernels.
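As a concrete reading of S22 and S23, the following minimal numpy sketch (helper names and the 0-based indexing are illustrative assumptions, not from the patent) computes the merged coefficients $u'_{j,k}$ of n serially stacked kernels, which are the 2-D full convolution of the layers' kernels, and the diff objective:

```python
import numpy as np

def full_conv2d(a, b):
    """2-D 'full' convolution: merged coefficients of correlating
    with kernel b first, then with kernel a."""
    ra, ca = a.shape
    rb, cb = b.shape
    out = np.zeros((ra + rb - 1, ca + cb - 1))
    for p in range(ra):
        for q in range(ca):
            out[p:p + rb, q:q + cb] += a[p, q] * b
    return out

def effective_kernel(kernels):
    """u'_{j,k} for a stack of small kernels (first layer first)."""
    eff = kernels[0]
    for k in kernels[1:]:
        eff = full_conv2d(k, eff)
    return eff

def diff(u_large, kernels):
    """Objective of S23: sum of squared coefficient differences."""
    return float(np.sum((u_large - effective_kernel(kernels)) ** 2))
```

With two 3x3 kernels, `effective_kernel` produces a 5x5 matrix whose entries reproduce the 25 merged coefficients listed later in the detailed description; the sizes match the condition $\sum_i(r_i-1)=r-1$ of S22.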
Preferably, when either the number of rows or the number of columns of the original large convolution kernel is even, S2 specifically comprises:
(1) If the number of rows r is even and the number of columns c is odd:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 1, so that the number of rows of the feature map after this convolution is r-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(2) If the number of columns c is even and the number of rows r is odd:
Among the replacement small convolution kernels, first use a kernel of width 2 and height 1, so that the number of columns of the feature map after this convolution is c-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(3) If the number of rows r and the number of columns c are both even:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 2, so that the feature map after this convolution has r-1 rows and c-1 columns, both odd; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
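As a sketch of the layer-shape bookkeeping above (the greedy choice of per-layer sizes below is an illustrative assumption; any shapes satisfying the stated sum conditions would do), a helper that prepends a 2x1, 1x2 or 2x2 kernel when r or c is even and then fills with odd kernels no larger than the hardware maximum:

```python
def plan_layers(r, c, max_k=3):
    """Return (rows, cols) replacement-kernel shapes whose stacked
    receptive field is exactly r x c. max_k is assumed odd."""
    layers = []
    if r % 2 == 0 and c % 2 == 0:
        layers.append((2, 2)); r, c = r - 1, c - 1
    elif r % 2 == 0:
        layers.append((2, 1)); r -= 1
    elif c % 2 == 0:
        layers.append((1, 2)); c -= 1
    # r and c are now odd: cover the rest with odd kernels <= max_k,
    # keeping sum(r_i - 1) = r - 1 and sum(c_i - 1) = c - 1.
    need_r, need_c = r - 1, c - 1
    while need_r > 0 or need_c > 0:
        kr = min(max_k - 1, need_r) + 1   # odd, since max_k is odd
        kc = min(max_k - 1, need_c) + 1
        layers.append((kr, kc))
        need_r -= kr - 1
        need_c -= kc - 1
    return layers

# e.g. plan_layers(6, 5) -> [(2, 1), (3, 3), (3, 3)]
# e.g. plan_layers(9, 9, max_k=5) -> [(5, 5), (5, 5)]
```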
A model conversion system for deep learning model inference hardware acceleration, comprising:
a replacing module, configured to replace an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels;
a determining module, configured to determine the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels; and
a convolution module, configured to deploy the converted deep learning model on hardware and obtain the convolution result of the input feature map.
Preferably, when the number of rows and the number of columns of the original large convolution kernel are both odd, the determining module specifically includes:
S21, setting the coordinate of the input feature map at the center of the receptive field of the original large convolution kernel to (0,0); the coordinate of the upper-left corner of the receptive field is then $\left(-\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the upper-right corner $\left(-\frac{r-1}{2},\,\frac{c-1}{2}\right)$, the lower-left corner $\left(\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the lower-right corner $\left(\frac{r-1}{2},\,\frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here r and c denote the number of rows and columns of the large convolution kernel, the left half of each coordinate is the row index and the right half the column index; the coordinates of each weight in the large convolution kernel are defined in the same way.
Before replacement with the small convolution kernels, the original convolution result is:

$$\text{result}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u_{j,k}$$

where $x_{j,k}$ is the value of the input feature map at coordinate (j,k) within the receptive field of the original large convolution kernel, and $u_{j,k}$ is the weight at coordinate (j,k) in the large convolution kernel.
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, setting the coordinate of the center of the i-th layer small convolution kernel to (0,0); the coordinate of its upper-left corner is $\left(-\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the upper-right corner $\left(-\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, the lower-left corner $\left(\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the lower-right corner $\left(\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th layer small convolution kernel, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$
Substituting layer by layer into the convolution formula until the feature map within the original receptive field is reached, then merging like terms over all the terms in which each feature-map value within the original receptive field participates and factoring out the feature-map values, the final result can be expressed as:

$$\text{result}'=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u'_{j,k},\qquad u'_{j,k}=\sum_{\substack{s_1+\dots+s_n=j\\ t_1+\dots+t_n=k}}\;\prod_{i=1}^{n} u_{i,s_i,t_i}$$

where $u_{i,s,t}$ is the weight at coordinate (s,t) inside the i-th layer small convolution kernel. For any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row and column indices of the i-th layer weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$; the coefficient of such a term is $\prod_{i=1}^{n} u_{i,s_i,t_i}$, and after merging like terms the coefficient of $x_{j,k}$ is $u'_{j,k}$ as given above.
S23, based on the two formulas in steps S21 and S22, with the goal of minimizing the error between the replaced final convolution result and the original convolution result, and taking the input feature map $x_{j,k}$ as the variable, the difference between the coefficients of the two formulas is expressed as:

$$\text{diff}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}}\left(u_{j,k}-u'_{j,k}\right)^2$$

S24, treating $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the quantities to be solved, with the goal of minimizing diff, the formula in S23 is solved with a genetic algorithm, yielding the weights $u_{i,s,t}$ of each layer of small convolution kernels.
Preferably, when either the number of rows or the number of columns of the original large convolution kernel is even, the determining module is specifically configured to:
(1) If the number of rows r is even and the number of columns c is odd:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 1, so that the number of rows of the feature map after this convolution is r-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(2) If the number of columns c is even and the number of rows r is odd:
Among the replacement small convolution kernels, first use a kernel of width 2 and height 1, so that the number of columns of the feature map after this convolution is c-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(3) If the number of rows r and the number of columns c are both even:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 2, so that the feature map after this convolution has r-1 rows and c-1 columns, both odd; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
A storage medium storing a computer program for model conversion for deep learning model inference hardware acceleration, wherein the computer program causes a computer to execute the model conversion method for deep learning model inference hardware acceleration described above.
An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the model conversion method for deep learning model inference hardware acceleration described above.
(III) advantageous effects
The invention provides a model conversion method, system, storage medium and electronic device for deep learning model inference hardware acceleration. Compared with the prior art, the invention has the following beneficial effects:
The method of the invention comprises: replacing an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels; determining the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels; and deploying the converted deep learning model on hardware to obtain the convolution result of the input feature map and hence the final inference result of the model. The large convolution kernel is replaced by small convolution kernels executed in series, and the weights of the small kernels are computed directly from the weights of the large kernel, so that the influence of the replacement on the output is reduced as much as possible, the network model does not need to be retrained, the computation is small, and the model can be deployed directly.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a diagram illustrating a prior art manner of breaking a 5x5 large convolution kernel into a hardware supported 3x3 small convolution kernel;
FIG. 2 is a schematic diagram of a prior art method of replacing a 5x5 large convolution kernel with 4 3x3 small convolution kernels as shown in FIG. 1;
FIG. 3 is a schematic diagram of a prior art approach for replacing a 5x5 large convolution kernel with a 2-level 3x3 small convolution kernel;
FIG. 4 is a flowchart illustrating a model transformation method for deep learning model inference hardware acceleration according to an embodiment of the present invention;
FIG. 5 is a weight diagram of a convolution kernel of size 5x5 according to an embodiment of the present invention;
FIG. 6 is a diagram of the input feature map within the receptive field of the 5x5 convolution kernel according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the 2 layers of 3x3 convolutional layers according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below. It is to be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
By providing a model conversion method, system, storage medium and electronic device for deep learning model inference hardware acceleration, the embodiments of the present application solve the technical problems of large inference computation, or of retraining that prevents rapid deployment.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
Replacing an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels; determining the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels; and deploying the converted deep learning model on hardware to obtain the convolution result of the input feature map and hence the final inference result of the model. The large convolution kernel is replaced with small kernels of a size suitable for the hardware implementation while the receptive field of each output is kept unchanged; at the same time, for any input, the output after replacement stays as close as possible to the original output.
It should be added that the convolution covered by the method of the embodiment of the present invention includes two-dimensional convolution with arbitrary stride, dilated convolution (also called atrous convolution), and deformable convolution. Although the receptive fields of these convolutions have different shapes in the original input feature map, in the present invention the input values within the receptive field that participate in the convolution are treated uniformly as a two-dimensional matrix; once the coordinates are arranged in this two-dimensional form, the processing is uniform. The method provided by the invention can be applied to deploying computer-vision deep learning models on artificial intelligence chips; one application field is industrial color sorters. In the color-sorter field, the equipment must identify, within a specified number of milliseconds, the type of grain in an image captured by the sorter, for example whether a grain is normal or defective. Such a device obviously places strict requirements on the inference time of the deep learning model. On conventional CPU-based equipment, the inference of a computationally heavy deep learning model cannot be completed within the specified time. On an artificial intelligence chip specially designed for deep learning inference, a large number of hardware units can execute the computations of model inference simultaneously, so inference can be completed efficiently. However, once the hardware architecture of such a chip is fixed, it cannot directly support convolution kernels of arbitrary other sizes.
In summary, the method provided by the embodiment of the invention can be applied to the field of deep learning model inference including color sorter image processing.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example:
In a first aspect, as shown in fig. 4, an embodiment of the present invention provides a model conversion method for deep learning model inference hardware acceleration, comprising:
S1, replacing an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels;
S2, determining the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels;
S3, deploying the converted deep learning model on hardware to obtain the convolution result of the input feature map.
According to the embodiment of the invention, the large convolution kernel is replaced by the plurality of small convolution kernels which are executed in series, the weights of the plurality of small convolution kernels are obtained by directly calculating the weight of the large convolution kernel, the influence of the replacement process on the output is reduced as much as possible, the network model does not need to be retrained, the calculated amount is small, and the network model is directly deployed.
The following will describe the steps of the above technical solution in detail:
In step S1, an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware is replaced with multiple layers of small convolution kernels.
In step S2, the weights of the small convolution kernels of each layer are determined by a genetic algorithm according to the weight difference between the large convolution kernel and the small convolution kernels of each layer.
When the number of rows and the number of columns of the original large convolution kernel are both odd, S2 specifically comprises:
S21, setting the coordinate of the input feature map at the center of the receptive field of the original large convolution kernel to (0,0); the coordinate of the upper-left corner of the receptive field is then $\left(-\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the upper-right corner $\left(-\frac{r-1}{2},\,\frac{c-1}{2}\right)$, the lower-left corner $\left(\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the lower-right corner $\left(\frac{r-1}{2},\,\frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here r and c denote the number of rows and columns of the large convolution kernel, the left half of each coordinate is the row index and the right half the column index; the coordinates of each weight in the large convolution kernel are defined in the same way.
Before replacement with the small convolution kernels, the original convolution result is:

$$\text{result}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u_{j,k}$$

where $x_{j,k}$ is the value of the input feature map at coordinate (j,k) within the receptive field of the original large convolution kernel, and $u_{j,k}$ is the weight at coordinate (j,k) in the large convolution kernel.
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, setting the coordinate of the center of the i-th layer small convolution kernel to (0,0); the coordinate of its upper-left corner is $\left(-\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the upper-right corner $\left(-\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, the lower-left corner $\left(\frac{r_i-1}{2},\,-\frac{c_i-1}{2}\right)$, the lower-right corner $\left(\frac{r_i-1}{2},\,\frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th layer small convolution kernel, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$
Substituting layer by layer into the convolution formula until the feature map within the original receptive field is reached, then merging like terms over all the terms in which each feature-map value within the original receptive field participates and factoring out the feature-map values, the final result can be expressed as:

$$\text{result}'=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u'_{j,k},\qquad u'_{j,k}=\sum_{\substack{s_1+\dots+s_n=j\\ t_1+\dots+t_n=k}}\;\prod_{i=1}^{n} u_{i,s_i,t_i}$$

where $u_{i,s,t}$ is the weight at coordinate (s,t) inside the i-th layer small convolution kernel. For any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row and column indices of the i-th layer weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$; the coefficient of such a term is $\prod_{i=1}^{n} u_{i,s_i,t_i}$, and after merging like terms the coefficient of $x_{j,k}$ is $u'_{j,k}$ as given above.
S23, based on the two formulas in steps S21 and S22, with the goal of minimizing the error between the replaced final convolution result and the original convolution result, and taking the input feature map $x_{j,k}$ as the variable, the difference between the coefficients of the two formulas is expressed as:

$$\text{diff}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}}\left(u_{j,k}-u'_{j,k}\right)^2$$

S24, treating $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the quantities to be solved, with the goal of minimizing diff, the formula in S23 is solved with a genetic algorithm, yielding the weights $u_{i,s,t}$ of each layer of small convolution kernels.
When either the number of rows or the number of columns of the original large convolution kernel is even:
(1) If the number of rows r is even and the number of columns c is odd:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 1, so that the number of rows of the feature map after this convolution is r-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-1$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(2) If the number of columns c is even and the number of rows r is odd:
Among the replacement small convolution kernels, first use a kernel of width 2 and height 1, so that the number of columns of the feature map after this convolution is c-1, an odd number; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-1 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
(3) If the number of rows r and the number of columns c are both even:
Among the replacement small convolution kernels, first use a kernel of height 2 and width 2, so that the feature map after this convolution has r-1 rows and c-1 columns, both odd; then use n further layers of small convolution kernels, so that n+1 layers of small kernels replace the original large kernel. Here $r_i$ and $c_i$ denote the number of rows and columns of the i-th of the following n layers of small kernels, both odd, and they must satisfy:

$$\sum_{i=1}^{n}(r_i-1)=r-2 \qquad\text{and}\qquad \sum_{i=1}^{n}(c_i-1)=c-2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weights of each layer of small kernels are obtained through a genetic algorithm.
In step S3, the converted deep learning model is deployed on hardware, and the convolution result of the input feature map is obtained.
The model conversion (convolution kernel replacement) process introduced in steps S1 and S2 is performed only once, before deployment, yielding the weights of the replacement convolutional layers. After deployment on hardware, the convolution kernel weights do not need to be computed again at each inference.
The conversion method provided by the embodiment of the invention has the following advantages:
On the one hand, it avoids the retraining that is otherwise required when a large convolution kernel is split into multiple layers of small convolution kernels for deployment on a specific hardware architecture;
on the other hand, it avoids the large computation cost of splitting a large convolution kernel into several parallel small convolution kernels. With this method, the per-layer convolution kernel coefficients only need to be obtained once, before the neural network model is deployed to hardware; they do not need to be obtained again at each inference on the hardware, so each inference requires less computation than the traditional approach.
The detailed demonstration procedure is as follows:
Let the height of the output feature map be H and the width be W; the computation of the original layer is then about H × W × r × c multiply-accumulate operations. After the method provided by the embodiment of the invention is applied, the computation is about

$$H \times W \times \sum_{i=1}^{n} r_i \times c_i$$

which is greatly reduced.
In the embodiment of the invention, for a large convolution kernel that cannot be directly deployed on hardware supporting 3x3 kernels, r and c are at least 5. When r = 5 and c = 5, 2 layers of 3x3 small kernels can be substituted: the original computation is H × W × 25, and the computation after replacement is H × W × (3×3 + 3×3) = H × W × 18, which is clearly smaller. For larger convolution kernels it can be shown by mathematical induction that the computation after replacement is always smaller than that of the original convolutional layer before replacement.
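A throwaway sketch of this count comparison (the feature-map size H × W is an arbitrary example value, not from the patent):

```python
def conv_cost(h, w, layers):
    """Multiply-accumulate count: H * W * sum(r_i * c_i) over the layers."""
    return h * w * sum(r * c for r, c in layers)

H, W = 224, 224
print(conv_cost(H, W, [(5, 5)]))            # original 5x5 layer: H*W*25
print(conv_cost(H, W, [(3, 3), (3, 3)]))    # serial replacement: H*W*18
print(conv_cost(H, W, [(7, 7)]))            # original 7x7 layer: H*W*49
print(conv_cost(H, W, [(3, 3)] * 3))        # serial replacement: H*W*27
```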
Still taking a convolution kernel of size 5x5 as the example, the weights are denoted as shown in fig. 5, and the input feature map corresponding to one of its outputs as shown in fig. 6. For ease of understanding, the row and column indices of both the weights and the input feature map start at 1.
When the original convolution kernel is used, the output is computed as:

$$\text{result}=\sum_{j=1}^{5}\sum_{k=1}^{5} x_{j,k}\,u_{j,k} \tag{1}$$

i.e. the sum of the 25 products $x_{1,1}u_{1,1}+x_{1,2}u_{1,2}+\dots+x_{5,5}u_{5,5}$.
If it is replaced by 2 layers of 3x3 convolutional layers, the weights of the two layers are denoted as shown in fig. 7, where W denotes the first-layer kernel weights and V denotes the second-layer kernel weights. For ease of understanding, the row and column indices of the small convolution kernels after replacement also start at 1.
After replacement with 2 layers of 3x3 convolution, the output at the same position is computed as:

$$\text{result}=\sum_{p=1}^{3}\sum_{q=1}^{3} v_{p,q}\left(\sum_{a=1}^{3}\sum_{b=1}^{3} x_{p+a-1,\,q+b-1}\,w_{a,b}\right) \tag{2}$$
Merging like terms over identical input-feature-map elements, equation (2) can be rewritten as:

$$\text{result}=\sum_{j=1}^{5}\sum_{k=1}^{5} x_{j,k}\,u'_{j,k},\qquad u'_{j,k}=\sum_{\substack{p+a-1=j\\ q+b-1=k}} v_{p,q}\,w_{a,b} \tag{3}$$
the purpose of replacing 2-layer 3x3 convolutional layers is to make the same output positions after replacement as equal as possible to the original 5x5 convolutional kernels, to reduce the impact on accuracy.
The weights are fixed after the 5x5 sized convolution kernel replacement is completed with 2 layers of 3x3 sized convolution kernels. When deploying the network model, one is faced with 25 inputs in the field that may take arbitrary values. I.e. the variables in equation (3) are 25 inputs, while the weights may be considered as invariant coefficients.
In order to keep the output after completion of the convolution kernel replacement as constant as possible as the output before replacement, the coefficients of equation (3) and equation (1) are required to be as equal as possible.
The coefficients in equation (3) are:
u′_{1,1} = v_{1,1}·w_{1,1}
u′_{1,2} = v_{1,1}·w_{1,2} + v_{1,2}·w_{1,1}
u′_{1,3} = v_{1,1}·w_{1,3} + v_{1,2}·w_{1,2} + v_{1,3}·w_{1,1}
u′_{1,4} = v_{1,2}·w_{1,3} + v_{1,3}·w_{1,2}
u′_{1,5} = v_{1,3}·w_{1,3}
u′_{2,1} = v_{1,1}·w_{2,1} + v_{2,1}·w_{1,1}
u′_{2,2} = v_{1,1}·w_{2,2} + v_{1,2}·w_{2,1} + v_{2,1}·w_{1,2} + v_{2,2}·w_{1,1}
u′_{2,3} = v_{1,1}·w_{2,3} + v_{1,2}·w_{2,2} + v_{1,3}·w_{2,1} + v_{2,1}·w_{1,3} + v_{2,2}·w_{1,2} + v_{2,3}·w_{1,1}
u′_{2,4} = v_{1,2}·w_{2,3} + v_{1,3}·w_{2,2} + v_{2,2}·w_{1,3} + v_{2,3}·w_{1,2}
u′_{2,5} = v_{1,3}·w_{2,3} + v_{2,3}·w_{1,3}
u′_{3,1} = v_{1,1}·w_{3,1} + v_{2,1}·w_{2,1} + v_{3,1}·w_{1,1}
u′_{3,2} = v_{1,1}·w_{3,2} + v_{1,2}·w_{3,1} + v_{2,1}·w_{2,2} + v_{2,2}·w_{2,1} + v_{3,1}·w_{1,2} + v_{3,2}·w_{1,1}
u′_{3,3} = v_{1,1}·w_{3,3} + v_{1,2}·w_{3,2} + v_{1,3}·w_{3,1} + v_{2,1}·w_{2,3} + v_{2,2}·w_{2,2} + v_{2,3}·w_{2,1} + v_{3,1}·w_{1,3} + v_{3,2}·w_{1,2} + v_{3,3}·w_{1,1}
u′_{3,4} = v_{1,2}·w_{3,3} + v_{1,3}·w_{3,2} + v_{2,2}·w_{2,3} + v_{2,3}·w_{2,2} + v_{3,2}·w_{1,3} + v_{3,3}·w_{1,2}
u′_{3,5} = v_{1,3}·w_{3,3} + v_{2,3}·w_{2,3} + v_{3,3}·w_{1,3}
u′_{4,1} = v_{2,1}·w_{3,1} + v_{3,1}·w_{2,1}
u′_{4,2} = v_{2,1}·w_{3,2} + v_{2,2}·w_{3,1} + v_{3,1}·w_{2,2} + v_{3,2}·w_{2,1}
u′_{4,3} = v_{2,1}·w_{3,3} + v_{2,2}·w_{3,2} + v_{2,3}·w_{3,1} + v_{3,1}·w_{2,3} + v_{3,2}·w_{2,2} + v_{3,3}·w_{2,1}
u′_{4,4} = v_{2,2}·w_{3,3} + v_{2,3}·w_{3,2} + v_{3,2}·w_{2,3} + v_{3,3}·w_{2,2}
u′_{4,5} = v_{2,3}·w_{3,3} + v_{3,3}·w_{2,3}
u′_{5,1} = v_{3,1}·w_{3,1}
u′_{5,2} = v_{3,1}·w_{3,2} + v_{3,2}·w_{3,1}
u′_{5,3} = v_{3,1}·w_{3,3} + v_{3,2}·w_{3,2} + v_{3,3}·w_{3,1}
u′_{5,4} = v_{3,2}·w_{3,3} + v_{3,3}·w_{3,2}
u′_{5,5} = v_{3,3}·w_{3,3}
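These 25 coefficients are exactly the entries of the 2-D full convolution of V and W, which the following minimal numpy check illustrates (an illustrative sketch; the code uses 0-based indices versus the 1-based indices of the list above):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=(3, 3))   # first-layer kernel W
v = rng.normal(size=(3, 3))   # second-layer kernel V

# u'[j,k] = sum over p+a=j, q+b=k of v[p,q]*w[a,b]  (0-based indices)
u_prime = np.zeros((5, 5))
for p in range(3):
    for q in range(3):
        u_prime[p:p + 3, q:q + 3] += v[p, q] * w

# Spot-check two entries against the list above (1-based in the text):
assert np.isclose(u_prime[0, 0], v[0, 0] * w[0, 0])                      # u'_{1,1}
assert np.isclose(u_prime[0, 1], v[0, 0] * w[0, 1] + v[0, 1] * w[0, 0])  # u'_{1,2}
```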
After the replacement with 2 layers of 3x3 kernels, the output can therefore be rewritten as:

$$\text{result}=\sum_{j=1}^{5}\sum_{k=1}^{5} x_{j,k}\,u'_{j,k}$$
If the output is to be as close as possible to the output before replacement, that is, if the results of equations (1) and (3) are to match, the coefficients of the two expressions should be as close as possible. Describing the difference between the two sets of coefficients with the L2 norm, the goal becomes making the following function as small as possible:

$$\text{diff}=\sum_{j=1}^{5}\sum_{k=1}^{5}\left(u_{j,k}-u'_{j,k}\right)^2 \tag{4}$$
In the above formula, symbols without a prime denote the weights of the 5x5 convolution kernel before replacement, and primed symbols denote the coefficients that correspond to the same inputs after replacement by the 2 layers of 3x3 kernels. The unprimed symbols are fixed weights; the primed symbols are coefficients composed of the weights of the two 3x3 kernels that need to be solved.
Thus, in equation (4), the primed symbols are variable, but their variation is governed by the weights of the two 3x3 convolution kernels. The problem is therefore to find the weights of the two 3x3 kernels, 2×3×3 = 18 variables in all, that minimize equation (4).
After the 18 weights minimizing equation (4) are solved by a genetic algorithm, the weights of the 2 layers of 3x3 kernels that perform the replacement are obtained. These 18 weights make equation (4) as small as possible, i.e. they make the values of equations (3) and (1) as close as possible for any input feature map, and the replacement can then be carried out directly. The purpose of the replacement is to substitute two layers of 3x3 kernels for the original single 5x5 layer so that, for any input, the change in output is as small as possible. No bias or activation function is added to the first of the replacement convolution layers.
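The patent does not spell out the genetic-algorithm configuration (population size, selection, crossover or mutation scheme), so the following is only a minimal sketch of one way to minimize equation (4) over the 18 weights; all hyperparameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def merged_coeffs(v, w):
    """The 5x5 coefficients u' produced by stacking a 3x3 kernel w
    (first layer) and a 3x3 kernel v (second layer)."""
    out = np.zeros((5, 5))
    for p in range(3):
        for q in range(3):
            out[p:p + 3, q:q + 3] += v[p, q] * w
    return out

def diff(u, genome):
    """Equation (4): squared error between original and merged coeffs."""
    w, v = genome[:9].reshape(3, 3), genome[9:].reshape(3, 3)
    return np.sum((u - merged_coeffs(v, w)) ** 2)

def ga_minimize(u, pop_size=200, gens=300, elite=20, sigma=0.05):
    pop = rng.normal(0.0, 0.5, size=(pop_size, 18))
    for _ in range(gens):
        fitness = np.array([diff(u, g) for g in pop])
        parents = pop[np.argsort(fitness)[:elite]]        # elitist selection
        kids = []
        while len(kids) < pop_size - elite:
            a, b = parents[rng.integers(elite, size=2)]
            mask = rng.random(18) < 0.5                   # uniform crossover
            kids.append(np.where(mask, a, b)
                        + rng.normal(0.0, sigma, 18))     # Gaussian mutation
        pop = np.vstack([parents, kids])
    fitness = np.array([diff(u, g) for g in pop])
    best = pop[np.argmin(fitness)]
    return best[:9].reshape(3, 3), best[9:].reshape(3, 3), fitness.min()

u = rng.normal(size=(5, 5))        # original 5x5 weights
w, v, err = ga_minimize(u)         # replacement kernels and residual diff
```

Since a generic 5x5 kernel has 25 degrees of freedom and the two 3x3 kernels only 18, the residual diff is in general nonzero; the genetic algorithm only makes it as small as possible, which is exactly the approximation traded for avoiding retraining.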
Similarly, a convolution kernel of size 7x7 can be replaced by 3 layers of 3x3 convolution kernels in the same manner as described above.
Similarly, if the largest convolution kernel supported by the hardware is not 3x3 but 5x5, the method proposed in the embodiment of the present invention can replace one convolutional layer of size 9x9 with 2 layers of 5x5 kernels. Likewise, one convolutional layer of size 7x7 can be replaced by one 5x5 convolutional layer followed by one 3x3 convolutional layer.
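These replacements are valid because the receptive field of serially stacked stride-1 convolutions composes additively; a one-line check of the candidate stacks mentioned above (an illustrative sketch):

```python
def receptive_field(kernel_sizes):
    # rf of stacked stride-1 convolutions: 1 + sum(k - 1)
    return 1 + sum(k - 1 for k in kernel_sizes)

assert receptive_field([5, 5]) == 9      # 9x9 -> two 5x5 layers
assert receptive_field([5, 3]) == 7      # 7x7 -> one 5x5 + one 3x3
assert receptive_field([3, 3, 3]) == 7   # 7x7 -> three 3x3 layers
```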
Compared with the large-kernel splitting scheme shown in fig. 2, the embodiment of the invention has the advantage of low computation. For example, to decompose a 5x5 kernel into convolutions of size 3x3, the scheme of fig. 2 performs 4 convolutions of size 3x3, i.e. on average 4×3×3 = 36 multiply-accumulate operations per output. With the method of the invention, only 2 convolutions of size 3x3 are needed, i.e. on average 2×3×3 = 18 operations per output, half the amount of the scheme of fig. 2. As another example, to split a 7x7 kernel into kernels of size 3x3, the scheme of fig. 2 needs 9 kernels of size 3x3 and performs 9 convolutions, i.e. 9×3×3 = 81 multiply-accumulate operations per output, while the method of the invention only needs 3 consecutive convolutional layers of size 3x3, i.e. on average 3×3×3 = 27 operations per output, only 1/3 of the scheme of fig. 2.
Compared with the large-kernel replacement scheme shown in fig. 3, the embodiment of the invention has the advantages that the network model does not need to be retrained and the computation is small, so the goal of rapid deployment can be achieved.
In a second aspect, an embodiment of the present invention provides a model conversion system for deep learning model inference hardware acceleration, comprising:
a replacing module, configured to replace an original large convolution kernel in the deep learning model that cannot be directly deployed on hardware with multiple layers of small convolution kernels;
a determining module, configured to determine the weights of each layer of small convolution kernels with a genetic algorithm according to the weight difference between the large convolution kernel and the layers of small convolution kernels; and
a convolution module, configured to deploy the converted deep learning model on hardware and obtain the convolution result of the input feature map.
In an embodiment, when the number of rows and the number of columns of the original large convolution kernel are both odd, the determining module specifically includes:
S21, setting the coordinate of the input feature map at the center of the receptive field of the original large convolution kernel to (0,0); the coordinate of the upper-left corner of the receptive field is then $\left(-\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the upper-right corner $\left(-\frac{r-1}{2},\,\frac{c-1}{2}\right)$, the lower-left corner $\left(\frac{r-1}{2},\,-\frac{c-1}{2}\right)$, the lower-right corner $\left(\frac{r-1}{2},\,\frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy. Here r and c denote the number of rows and columns of the large convolution kernel, the left half of each coordinate is the row index and the right half the column index; the coordinates of each weight in the large convolution kernel are defined in the same way.
Before replacement with the small convolution kernels, the original convolution result is:

$$\text{result}=\sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}}\;\sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} x_{j,k}\,u_{j,k}$$

where $x_{j,k}$ is the value of the input feature map at coordinate (j,k) within the receptive field of the original large convolution kernel, and $u_{j,k}$ is the weight at coordinate (j,k) in the large convolution kernel.
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, the coordinate of the center position of the i-th layer small convolution kernel is (0,0), the coordinate of the upper left corner position is $\left(-\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the upper right corner position is $\left(-\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, the coordinate of the lower left corner position is $\left(\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the lower right corner position is $\left(\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy; wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer small convolution kernel, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$
Substituting the calculation process layer by layer into the convolution calculation formula until all feature-map values in the original receptive field have been substituted, merging like terms over all terms in which the feature-map values of the original receptive field participate, and factoring out the feature-map values respectively, the final calculation result is expressed as:

$$y' = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} \Bigg) x_{j,k}$$

wherein $u_{i,s,t}$ represents the weight at coordinate (s,t) inside the i-th layer small convolution kernel; for any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row index and the column index of the i-th layer small convolution kernel weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$, the coefficient of such a term being $\prod_{i=1}^{n} u_{i,s_i,t_i}$; after merging like terms, the coefficient of $x_{j,k}$ is $\sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i}$.

S23, based on the two formulas in steps S21 and S22, in order to minimize the error between the replaced final convolution result and the original convolution result, the input feature-map values $x_{j,k}$ are taken as the variables, and the difference between the coefficients of the two formulas is expressed as:

$$\mathrm{diff} = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} - u_{j,k} \Bigg)^{2}$$
S24, taking $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the variables to be solved, with minimizing the diff value in the formula of S23 as the objective, the formula of S23 is solved by a genetic algorithm, and the weights $u_{i,s,t}$ of each layer of small convolution kernels are obtained respectively.
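To make S21-S24 concrete, here is a minimal numerical sketch under stated assumptions: stride 1, no bias, and no activation between the stacked small layers, so that their composition is a single linear convolution as the derivation above requires. The function names and genetic-algorithm hyper-parameters are our own choices, not the patent's; a production system might use a dedicated GA library instead of this bare-bones loop.

```python
# Minimal sketch of S21-S24 (our own illustration): fit the weights of n
# stacked small kernels so that their composition approximates one large kernel.
import numpy as np
from scipy.signal import convolve2d

def effective_kernel(small_kernels):
    """Stacking stride-1 conv layers is equivalent to one conv whose kernel is
    the 'full' convolution of the layer kernels, so this array holds exactly
    the merged coefficients of x_{j,k} from S22."""
    eff = small_kernels[0]
    for k in small_kernels[1:]:
        eff = convolve2d(eff, k, mode="full")
    return eff

def diff(flat, shapes, large):
    """Sum of squared coefficient differences (the diff of S23)."""
    kernels, pos = [], 0
    for r, c in shapes:
        kernels.append(flat[pos:pos + r * c].reshape(r, c))
        pos += r * c
    return float(((effective_kernel(kernels) - large) ** 2).sum())

def fit_ga(large, shapes, pop=200, gens=300, sigma=0.05, seed=0):
    """Bare-bones genetic algorithm: truncation selection, uniform crossover,
    Gaussian mutation, one elite kept per generation."""
    rng = np.random.default_rng(seed)
    dim = sum(r * c for r, c in shapes)
    population = rng.normal(0.0, 0.5, size=(pop, dim))
    for _ in range(gens):
        fitness = np.array([diff(ind, shapes, large) for ind in population])
        elite = population[np.argsort(fitness)[:pop // 4]]          # selection
        parents = elite[rng.integers(0, len(elite), size=(pop, 2))]
        mask = rng.random((pop, dim)) < 0.5                          # crossover
        population = np.where(mask, parents[:, 0], parents[:, 1])
        population += rng.normal(0.0, sigma, size=population.shape)  # mutation
        population[0] = elite[0]                                     # elitism
    fitness = np.array([diff(ind, shapes, large) for ind in population])
    return population[np.argmin(fitness)], float(fitness.min())

# Example: replace a 5x5 kernel with two serial 3x3 kernels.
large = np.random.default_rng(1).normal(size=(5, 5))
best, err = fit_ga(large, [(3, 3), (3, 3)])
print("residual diff:", err)
```

Note that a residual generally remains: two 3x3 kernels have 18 free weights but must match 25 coefficients, which is precisely why the weights are fitted by minimizing diff rather than solved exactly.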
In an embodiment, when any one of the number of rows or the number of columns of the original large convolution kernel is an even number, the determining module is specifically configured to:
(1) If the number of rows r is even and the number of columns c is odd:

First, a convolution kernel with height 2 and width 1 is used among the plurality of replacing small convolution kernels, so that the number of rows of the feature map after this convolution operation is r-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm;
(2) If the number of columns c is even and the number of rows r is odd:

First, a convolution kernel with width 2 and height 1 is used among the plurality of replacing small convolution kernels, so that the number of columns of the feature map after this convolution operation is c-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm.
(3) If the number of rows r is even and the number of columns c is also even:

First, a convolution kernel with height 2 and width 2 is used among the plurality of replacing small convolution kernels, so that the feature map after this convolution operation has r-1 rows and c-1 columns, which are both odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

Then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm.
In a third aspect, an embodiment of the present invention provides a storage medium storing a computer program for model conversion for deep learning model inference hardware acceleration, wherein the computer program causes a computer to execute the model conversion method for deep learning model inference hardware acceleration as described above.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the model transformation method for deep learning model inference hardware acceleration as described above.
It can be understood that the model conversion system, the storage medium and the electronic device for deep learning model inference hardware acceleration provided by the embodiments of the present invention correspond to the model conversion method for deep learning model inference hardware acceleration provided by the embodiments of the present invention; for the explanation, examples, beneficial effects and the like of the related contents, reference may be made to the corresponding parts of the method, and details are not repeated here.
In summary, compared with the prior art, the method has the following beneficial effects:
1. On one hand, the method solves the problem that the network model needs to be retrained when a large convolution kernel is split into multiple layers of small convolution kernels for deploying a deep learning model on a specific hardware architecture.
2. On the other hand, the method solves the problem of a large calculation amount when a large convolution kernel is split into a plurality of parallel small convolution kernels for deploying a deep learning model on a specific hardware architecture. With the method, the convolution kernel coefficients of each layer only need to be obtained once, before the neural network model is deployed on the hardware; they do not need to be obtained again for each inference on the hardware, so every inference requires less calculation than with the traditional method.
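As a hedged illustration of this "convert once, infer many times" workflow, the fragment below strings together the plan_decomposition and fit_ga sketches given earlier; load_model, conv_layers, kernel_shape, kernel_weights, replace_with_stack and deploy are hypothetical placeholders, not APIs described by the patent.

```python
# Hypothetical offline conversion pass (all model/layer APIs are placeholders):
def convert_model(model, max_hw_kernel=3):
    for layer in model.conv_layers():                # placeholder API
        r, c = layer.kernel_shape
        if r <= max_hw_kernel and c <= max_hw_kernel:
            continue                                 # already deployable as-is
        shapes = plan_decomposition(r, c, k=max_hw_kernel)
        weights, _ = fit_ga(layer.kernel_weights, shapes)
        layer.replace_with_stack(shapes, weights)    # placeholder API
    return model

# Run once before deployment; every later inference on the hardware simply
# reuses the stored small-kernel weights:
#   converted = convert_model(load_model("net.onnx"))   # hypothetical loader
#   deploy(converted)
```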
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Furthermore, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by "comprising a ..." does not, without further limitation, exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A model transformation method for deep learning model inference hardware acceleration, comprising:
s1, replacing an original large convolution kernel which cannot be directly deployed in hardware in a deep learning model with a plurality of layers of small convolution kernels;
s2, respectively determining the weight of each layer of small convolution kernel by adopting a genetic algorithm according to the weight difference of the large convolution kernel and each layer of small convolution kernel;
and S3, deploying the converted deep learning model on hardware, and obtaining a convolution result of the input feature map.
2. The model conversion method for deep learning model inference hardware acceleration as claimed in claim 1, wherein when the number of rows and columns of the original large convolution kernel are both odd, said S2 specifically comprises:
S21, setting the coordinate of the input feature map at the central position of the receptive field of the original large convolution kernel to (0,0), the coordinate of the upper left corner position of the receptive field is $\left(-\frac{r-1}{2}, -\frac{c-1}{2}\right)$, the coordinate of the upper right corner position is $\left(-\frac{r-1}{2}, \frac{c-1}{2}\right)$, the coordinate of the lower left corner position is $\left(\frac{r-1}{2}, -\frac{c-1}{2}\right)$, the coordinate of the lower right corner position is $\left(\frac{r-1}{2}, \frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy; wherein r and c respectively represent the number of rows and the number of columns of the large convolution kernel, the left half of each coordinate is the row index and the right half is the column index; the coordinates of each weight in the large convolution kernel are agreed in the same way;

before replacement with the plurality of small convolution kernels, the original convolution calculation result is:

$$y = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} u_{j,k}\, x_{j,k}$$

wherein $x_{j,k}$ represents the value of the input feature map at coordinate (j,k) in the receptive field of the original large convolution kernel, and $u_{j,k}$ represents the weight at coordinate (j,k) in the large convolution kernel;
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, the coordinate of the center position of the i-th layer small convolution kernel is (0,0), the coordinate of the upper left corner position is $\left(-\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the upper right corner position is $\left(-\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, the coordinate of the lower left corner position is $\left(\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the lower right corner position is $\left(\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy; wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer small convolution kernel, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$
Substituting the calculation process layer by layer into the convolution calculation formula until all feature-map values in the original receptive field have been substituted, merging like terms over all terms in which the feature-map values of the original receptive field participate, and factoring out the feature-map values respectively, the final calculation result is expressed as:

$$y' = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} \Bigg) x_{j,k}$$

wherein $u_{i,s,t}$ represents the weight at coordinate (s,t) inside the i-th layer small convolution kernel; for any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row index and the column index of the i-th layer small convolution kernel weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$, the coefficient of such a term being $\prod_{i=1}^{n} u_{i,s_i,t_i}$; after merging like terms, the coefficient of $x_{j,k}$ is $\sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i}$;

S23, based on the two formulas in steps S21 and S22, in order to minimize the error between the replaced final convolution result and the original convolution result, the input feature-map values $x_{j,k}$ are taken as the variables, and the difference between the coefficients of the two formulas is expressed as:

$$\mathrm{diff} = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} - u_{j,k} \Bigg)^{2}$$

S24, taking $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the variables to be solved, with minimizing the diff value in the formula of S23 as the objective, the formula of S23 is solved by a genetic algorithm, and the weights $u_{i,s,t}$ of each layer of small convolution kernels are obtained respectively.
3. The model transformation method for deep learning model inference hardware acceleration as claimed in claim 1, wherein when any one of the number of rows or the number of columns of the original large convolution kernel is an even number, said S2 specifically includes:
(1) If the number of rows r is even and the number of columns c is odd:

First, a convolution kernel with height 2 and width 1 is used among the plurality of replacing small convolution kernels, so that the number of rows of the feature map after this convolution operation is r-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm;
(2) If the number of columns c is even and the number of rows r is odd:

First, a convolution kernel with width 2 and height 1 is used among the plurality of replacing small convolution kernels, so that the number of columns of the feature map after this convolution operation is c-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm;
(3) If the number of rows r is even and the number of columns c is also even:

First, a convolution kernel with height 2 and width 2 is used among the plurality of replacing small convolution kernels, so that the feature map after this convolution operation has r-1 rows and c-1 columns, which are both odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm.
4. A model transformation system for deep learning model inference hardware acceleration, comprising:
the replacing module is used for replacing an original large convolution kernel which cannot be directly deployed in hardware in the deep learning model by adopting multiple layers of small convolution kernels;
the determining module is used for respectively determining the weight of each layer of small convolution kernels by adopting a genetic algorithm according to the weight difference between the large convolution kernel and each layer of small convolution kernels;
and the convolution module is used for deploying the converted deep learning model on hardware and acquiring a convolution result of the input feature map.
5. The model transformation system for deep learning model inference hardware acceleration of claim 4, wherein when the number of rows and the number of columns of the original large convolution kernel are both odd, the determining module specifically comprises:
S21, setting the coordinate of the input feature map at the central position of the receptive field of the original large convolution kernel to (0,0), the coordinate of the upper left corner position of the receptive field is $\left(-\frac{r-1}{2}, -\frac{c-1}{2}\right)$, the coordinate of the upper right corner position is $\left(-\frac{r-1}{2}, \frac{c-1}{2}\right)$, the coordinate of the lower left corner position is $\left(\frac{r-1}{2}, -\frac{c-1}{2}\right)$, the coordinate of the lower right corner position is $\left(\frac{r-1}{2}, \frac{c-1}{2}\right)$, and the coordinates of the other positions follow by analogy; wherein r and c respectively represent the number of rows and the number of columns of the large convolution kernel, the left half of each coordinate is the row index and the right half is the column index; the coordinates of each weight in the large convolution kernel are agreed in the same way;

before replacement with the plurality of small convolution kernels, the original convolution calculation result is:

$$y = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} u_{j,k}\, x_{j,k}$$

wherein $x_{j,k}$ represents the value of the input feature map at coordinate (j,k) in the receptive field of the original large convolution kernel, and $u_{j,k}$ represents the weight at coordinate (j,k) in the large convolution kernel;
S22, supposing the original large convolution kernel is replaced by n layers of small convolution kernels, the coordinate of the center position of the i-th layer small convolution kernel is (0,0), the coordinate of the upper left corner position is $\left(-\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the upper right corner position is $\left(-\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, the coordinate of the lower left corner position is $\left(\frac{r_i-1}{2}, -\frac{c_i-1}{2}\right)$, the coordinate of the lower right corner position is $\left(\frac{r_i-1}{2}, \frac{c_i-1}{2}\right)$, and the coordinates of the other positions follow by analogy; wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer small convolution kernel, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$
Substituting the calculation process layer by layer into the convolution calculation formula until all feature-map values in the original receptive field have been substituted, merging like terms over all terms in which the feature-map values of the original receptive field participate, and factoring out the feature-map values respectively, the final calculation result is expressed as:

$$y' = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} \Bigg) x_{j,k}$$

wherein $u_{i,s,t}$ represents the weight at coordinate (s,t) inside the i-th layer small convolution kernel; for any term in which $x_{j,k}$ participates before merging, $s_i$ and $t_i$ are the row index and the column index of the i-th layer small convolution kernel weight participating in that term and satisfy $\sum_i s_i = j$ and $\sum_i t_i = k$, the coefficient of such a term being $\prod_{i=1}^{n} u_{i,s_i,t_i}$; after merging like terms, the coefficient of $x_{j,k}$ is $\sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i}$;

S23, based on the two formulas in steps S21 and S22, in order to minimize the error between the replaced final convolution result and the original convolution result, the input feature-map values $x_{j,k}$ are taken as the variables, and the difference between the coefficients of the two formulas is expressed as:

$$\mathrm{diff} = \sum_{j=-\frac{r-1}{2}}^{\frac{r-1}{2}} \sum_{k=-\frac{c-1}{2}}^{\frac{c-1}{2}} \Bigg( \sum_{\sum_i s_i = j,\; \sum_i t_i = k} \prod_{i=1}^{n} u_{i,s_i,t_i} - u_{j,k} \Bigg)^{2}$$

S24, taking $u_{j,k}$ in the formula of S23 as constants and all $u_{i,s,t}$ as the variables to be solved, with minimizing the diff value in the formula of S23 as the objective, the formula of S23 is solved by a genetic algorithm, and the weights $u_{i,s,t}$ of each layer of small convolution kernels are obtained respectively.
6. The model transformation system for deep learning model inference hardware acceleration of claim 4, wherein, when any one of the number of rows or the number of columns of the original large convolution kernel is an even number, the determining module is specifically configured to:
(1) If the number of rows r is even and the number of columns c is odd:

First, a convolution kernel with height 2 and width 1 is used among the plurality of replacing small convolution kernels, so that the number of rows of the feature map after this convolution operation is r-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 1$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm;
(2) If the number of columns c is even and the number of rows r is odd:

First, a convolution kernel with width 2 and height 1 is used among the plurality of replacing small convolution kernels, so that the number of columns of the feature map after this convolution operation is c-1, which is odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 1 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm;
(3) If the number of rows r is even and the number of columns c is also even:

First, a convolution kernel with height 2 and width 2 is used among the plurality of replacing small convolution kernels, so that the feature map after this convolution operation has r-1 rows and c-1 columns, which are both odd; then n layers of small convolution kernels are used, so that n+1 layers of small convolution kernels in total replace the original large convolution kernel, wherein $r_i$ and $c_i$ respectively represent the number of rows and the number of columns of the i-th layer of the following n layers of small convolution kernels, are both odd, and should satisfy:

$$\sum_{i=1}^{n}(r_i - 1) = r - 2 \quad\text{and}\quad \sum_{i=1}^{n}(c_i - 1) = c - 2$$

then, according to the weight difference between each layer of small convolution kernels and the original large convolution kernel, the weight of each layer of small convolution kernels is obtained through the genetic algorithm.
7. A storage medium storing a computer program for model conversion for deep learning model inference hardware acceleration, wherein the computer program causes a computer to execute the model conversion method for deep learning model inference hardware acceleration according to any one of claims 1 to 3.
8. An electronic device, characterized by comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the model transformation method for deep learning model inference hardware acceleration of any of claims 1-3.