CN110909874A - Convolution operation optimization method and device of neural network model - Google Patents

Convolution operation optimization method and device of neural network model

Info

Publication number
CN110909874A
Authority
CN
China
Prior art keywords
convolution
sub
feature map
input feature
channels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911155114.2A
Other languages
Chinese (zh)
Inventor
杜渂
邱祥平
陈春东
雷霆
彭明喜
周赵云
陈健
王聚全
杨博
刘冉东
王月
王孟轩
张胜
韩国令
和传志
曹若麟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Di'aisi Information Technology Ltd By Share Ltd
Original Assignee
Di'aisi Information Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Di'aisi Information Technology Ltd By Share Ltd filed Critical Di'aisi Information Technology Ltd By Share Ltd
Priority to CN201911155114.2A priority Critical patent/CN110909874A/en
Priority to CN202310411791.6A priority patent/CN116416561A/en
Publication of CN110909874A publication Critical patent/CN110909874A/en
Pending legal-status Critical Current

Classifications

    • G06V20/46: Scenes; scene-specific elements in video content; extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06N3/0464: Computing arrangements based on biological models; neural networks; convolutional networks [CNN, ConvNet]
    • G06N3/048: Computing arrangements based on biological models; neural networks; activation functions
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • Y02D10/00: Climate change mitigation technologies in information and communication technologies; energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a convolution operation optimization method and device for a neural network model. The method comprises the following steps: splitting the input feature map along the channel dimension to obtain the sub-input feature maps of each group; performing a different group convolution operation on each sub-input feature map, extracting the feature information of the channels contained in each group to obtain different sub-output feature maps; and shuffling and combining the sub-output feature maps of all groups to obtain the output feature map. The invention reduces the running time of the network model while preserving the model's effectiveness.

Description

Convolution operation optimization method and device of neural network model
Technical Field
The invention relates to the technical field of neural network models, in particular to a convolution operation optimization method and device of a neural network model.
Background
In recent years, with the rapid development of deep neural networks, academia and industry have achieved major breakthroughs in deep learning across many fields. However, the size and computational cost of neural network models have become bottlenecks in practical applications, making such models difficult to deploy in scenarios with strict real-time requirements, such as the typical application of online video quality detection.
Reducing the amount of computation in a neural network can effectively reduce its running time. However, doing so may weaken the model's expressive power and thereby degrade its practical effectiveness.
Disclosure of Invention
The invention aims to provide a convolution operation optimization method and device for a neural network model that reduce the running time of the network model while preserving its effectiveness.
The technical scheme provided by the invention is as follows:
A convolution operation optimization method of a neural network model is applied to a convolution operation that extracts feature information from an input feature map to obtain an output feature map, and comprises the following steps: splitting the input feature map along the channel dimension to obtain the sub-input feature maps of each group; performing a different group convolution operation on each sub-input feature map, extracting the feature information of the channels contained in each group to obtain different sub-output feature maps; and shuffling and combining the sub-output feature maps of all groups to obtain the output feature map.
Further, the splitting of the input feature map in the channel dimension to obtain the sub-input feature maps of each group comprises: splitting the input feature map according to its number of channels to obtain a sub-input feature map for each channel.
Further, the group convolution operation comprises: processing the information of each channel of the sub-input feature map separately with a plurality of depthwise separable convolutions, where each depthwise separable convolution is a convolution kernel with a channel count of 1 and the number of depthwise separable convolutions used equals the number of channels of the sub-input feature map; and combining the features of the different channels extracted by the depthwise separable convolutions through a 1 × 1 point convolution to obtain a sub-output feature map.
Further, the group convolution operation comprises: raising the dimension of the sub-input feature map through a 1 × 1 point convolution to obtain a sub-input feature map with an increased number of channels; performing feature extraction on the sub-input feature map with the increased number of channels through a depthwise separable convolution to obtain feature information; and reducing the dimension of the feature information through a 1 × 1 point convolution to obtain a sub-output feature map, where the number of channels of the sub-output feature map equals that of the sub-input feature map.
Further, the raising of the dimension of the sub-input feature map through a 1 × 1 point convolution comprises: raising the dimension through a 1 × 1 point convolution and applying a nonlinear activation function to the dimension-raised convolution result. The feature extraction through a depthwise separable convolution comprises: first extracting features through the depthwise separable convolution, then applying the nonlinear activation function. The reducing of the dimension of the feature information through a 1 × 1 point convolution comprises: reducing the dimension through a 1 × 1 point convolution and applying a linear activation function.
Further, the nonlinear activation function is a ReLU6 activation function.
The invention also provides a convolution operation optimization device of a neural network model, applied to a convolution operation that extracts feature information from an input feature map to obtain an output feature map, comprising: a channel decomposition module for splitting the input feature map in the channel dimension to obtain the sub-input feature maps of each group; a group convolution module for performing a different group convolution operation on each sub-input feature map and extracting the feature information of the channels contained in each group to obtain different sub-output feature maps; and a feature shuffling module for shuffling and combining the sub-output feature maps of all groups to obtain the output feature map.
Further, the group convolution module comprises: a single-channel feature extraction unit for processing the information of each channel of the sub-input feature map separately with a plurality of depthwise separable convolutions, where each depthwise separable convolution is a convolution kernel with a channel count of 1 and the number of depthwise separable convolutions used equals the number of channels of the sub-input feature map; and a feature merging unit for combining the features of the different channels extracted by the depthwise separable convolutions through a 1 × 1 point convolution to obtain a sub-output feature map.
Further, the group convolution module comprises: a dimension-raising unit for raising the dimension of the sub-input feature map through a 1 × 1 point convolution to obtain a sub-input feature map with an increased number of channels; a feature extraction unit for extracting features from the sub-input feature map with the increased number of channels through a depthwise separable convolution to obtain feature information; and a dimension-reduction unit for reducing the dimension of the feature information through a 1 × 1 point convolution to obtain a sub-output feature map, where the number of channels of the sub-output feature map equals that of the sub-input feature map.
Further, the group convolution module further comprises: a first nonlinear activation unit for applying a nonlinear activation function to the dimension-raised convolution result; a second nonlinear activation unit for applying the nonlinear activation function after feature extraction through the depthwise separable convolution; and a linear activation unit for applying a linear activation function after the dimension of the feature information is reduced through the 1 × 1 point convolution.
The convolution operation optimization method and device of the neural network model provided by the invention bring the following beneficial effects:
1. The invention optimizes the computation and parameter counts of the convolution operation by performing channel decomposition of the feature map and channel decomposition of the convolution kernels, thereby optimizing the convolution operations of the neural network model.
2. The invention optimizes the CNN convolution operations in the network while preserving the model's effectiveness, reducing the computation and running time of the network model and meeting real-time requirements in typical application scenarios.
Drawings
The above features, technical features, advantages and implementations of a method and apparatus for optimizing convolution operations of a neural network model will be further described in the following detailed description of preferred embodiments with reference to the accompanying drawings.
FIG. 1 is a flow diagram of one embodiment of a method for convolutional optimization of a neural network model of the present invention;
FIG. 2 is a flow diagram of another embodiment of a method for convolutional optimization of a neural network model of the present invention;
FIG. 3 is a flow diagram of another embodiment of a method for convolutional optimization of a neural network model of the present invention;
FIG. 4 is a schematic structural diagram of an embodiment of an apparatus for optimizing convolution operations of a neural network model according to the present invention;
FIG. 5 is a schematic structural diagram of an apparatus for optimizing convolution operations of a neural network model according to another embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus for optimizing convolution operations of a neural network model according to another embodiment of the present invention;
FIG. 7 is a block diagram of the Group-Inception structure of FIG. 1.
The reference numbers illustrate:
110: channel decomposition module; 120: group convolution module; 130: feature shuffling module; 121: single-channel feature extraction unit; 122: feature merging unit; 123: dimension-raising unit; 124: feature extraction unit; 125: dimension-reduction unit; 126: first nonlinear activation unit; 127: second nonlinear activation unit; 128: linear activation unit.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention and the technical solutions in the prior art, the following description refers to the accompanying drawings. Obviously, the drawings described below are only some examples of the invention; a person skilled in the art can derive other drawings and embodiments from them without inventive effort.
For the sake of simplicity, the drawings only schematically show the parts relevant to the present invention, and they do not represent the actual structure as a product. In addition, in order to make the drawings concise and understandable, components having the same structure or function in some of the drawings are only schematically illustrated or only labeled. In this document, "one" means not only "only one" but also a case of "more than one".
In an embodiment of the present invention, as shown in fig. 1, a convolution operation optimization method for a neural network model, applied to a convolution operation for extracting feature information from an input feature map to obtain an output feature map, includes:
step S100 is to segment the input feature map in the channel dimension to obtain sub-input feature maps of each group.
Specifically, the input feature map may be the original input image of a model (for example, one based on a convolutional neural network), or the input feature information of a particular convolutional layer within the model. The data volume of a feature map is expressed as H (height) × W (width) × C (number of channels), meaning data for C channels with H × W values per channel. Since the invention concerns only the data volumes of the various feature maps and not the specific feature data, the description below is simplified and feature maps are denoted H × W × C.
Splitting the input feature map along the channel dimension means assigning certain channels to a group. For example, an input feature map with 16 channels can be grouped 4 channels at a time into 4 (16/4) groups, each group containing the data of 4 channels and forming one sub-input feature map.
The above process is referred to as channel decomposition of the input feature map.
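The channel decomposition above can be sketched as a split along the channel axis. This is an illustrative sketch only, not part of the patent; the array shapes and the use of NumPy are assumptions:

```python
import numpy as np

# Hypothetical example: a 27 x 27 x 16 (H x W x C) input feature map is
# split along the channel dimension into 4 groups of 4 channels each.
x = np.random.rand(27, 27, 16)
groups = 4
sub_inputs = np.split(x, groups, axis=2)  # one sub-input feature map per group

print(len(sub_inputs))       # 4
print(sub_inputs[0].shape)   # (27, 27, 4)
```

Concatenating the sub-maps back along the channel axis recovers the original feature map, so the split itself loses no information.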
Step S200 performs different group convolution operations on each sub-input feature map, and extracts feature information of channels included in each group to obtain different sub-output feature maps.
Specifically, the group convolution operation refers to a convolution operation for each group. The input of the group convolution operation is a sub-input feature map, and the output is a sub-output feature map, so as to extract feature information from the sub-input feature map, wherein the feature information forms the sub-output feature map.
The group convolution operation may use convolution kernels of the same size as a conventional convolution operation, but the number of kernels per group is only 1/g of the number of conventional convolution kernels (the kernels used in a conventional convolution operation, referred to as conventional convolution kernels), where g is the number of groups. The number of conventional convolution kernels equals the number of channels of the output feature map, while the number of kernels in each group convolution equals the number of output channels divided by the number of groups; this is also referred to as channel decomposition of the convolution kernels.
In order to ensure that the input feature map can be uniformly segmented and the conventional convolution kernel can be uniformly decomposed during channel decomposition, the group number is usually a common divisor of the number of channels of the input feature map and the number of channels of the output feature map.
Computational efficiency is improved since each group convolution only needs to process feature maps of a few channels.
For example, suppose the input feature map is D_F × D_F × m, where D_F is the height and width of the input feature map and m is its number of channels, and the output feature map is D_F × D_F × n, where n is its number of channels. The convolution kernel size is s × s. With conventional convolution, n kernels of size s × s × m are used; the computation for one kernel to convolve the input feature map is s × s × m × D_F × D_F, so the computation for all n kernels is s × s × m × D_F × D_F × n.
With group convolution, suppose the input data is divided into g groups. Each sub-input feature map is D_F × D_F × (m/g) and the corresponding sub-output feature map is D_F × D_F × (n/g). Each group convolution uses n/g kernels of size s × s × (m/g), so the computation for each group convolution is s × s × (m/g) × D_F × D_F × (n/g). There are g group convolutions, so the total computation is s × s × (m/g) × D_F × D_F × (n/g) × g, which is 1/g of the computation of conventional convolution.
In addition, compared with conventional convolution, group convolution also reduces the parameters required for the convolution operation, as the following analysis shows:
Conventional convolution requires m × s × s × n parameters. With group convolution, each group requires (m/g) × s × s × (n/g) parameters; with g groups in total, the overall parameter count is (m/g) × s × s × (n/g) × g, reduced to 1/g of the original.
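The two 1/g reductions above can be checked numerically. The sizes below are arbitrary assumptions for illustration, not values taken from the patent:

```python
# Assumed sizes: D_F = 28, m = 16 input channels, n = 32 output channels,
# s = 3 kernel size, g = 4 groups.
DF, m, n, s, g = 28, 16, 32, 3, 4

# Conventional convolution: n kernels of size s x s x m.
conv_ops = s * s * m * DF * DF * n
conv_params = m * s * s * n

# Group convolution: g groups, each with n/g kernels of size s x s x (m/g).
group_ops = s * s * (m // g) * DF * DF * (n // g) * g
group_params = (m // g) * s * s * (n // g) * g

print(conv_ops // group_ops)        # 4, i.e. computation reduced to 1/g
print(conv_params // group_params)  # 4, i.e. parameters reduced to 1/g
```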
Step S300 shuffles and combines the sub-output feature maps of all the groups to obtain an output feature map.
Specifically, after splitting along the channel dimension, the features of the sub-output feature maps are relatively sparse. If the sub-output feature maps of all groups were simply concatenated to form the output feature map, the feature maps of different groups would remain informationally isolated, reducing expressive capacity. Therefore, after the group convolutions, the feature maps of different groups are mixed by channel shuffling, realizing feature interaction between the groups.
For example, consider the Group-Inception structure shown in FIG. 7, taking an online video detection model as an example. The input video image has three RGB channels, whose pixel values form the input feature map; that is, the input feature map has 3 channels. Splitting and grouping the input feature map by the three RGB channels yields 3 sub-input feature maps: sub-input feature map 1, sub-input feature map 2, and sub-input feature map 3. Each sub-input feature map contains the pixel values of only one channel.
A group convolution is performed on each sub-input feature map, yielding 3 sub-output feature maps, each containing the features of 3 channels (11 denotes the data of channel 1 of the first sub-output feature map, 12 the data of channel 2 of the first, 13 the data of channel 3 of the first, 21 the data of channel 1 of the second, 31 the data of channel 1 of the third, and so on).
The 3 sub-output feature maps are then channel-shuffled and combined, mixing the features of different channels to realize feature interaction between channels and obtain the output feature map. The data of the first 3 channels of the output feature map come from channel 1 of sub-output feature maps 1-3; the middle 3 channels and the last 3 channels are handled analogously.
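A minimal NumPy sketch of the channel shuffle described above. The reshape/transpose formulation is an assumption about implementation, not taken from the patent; the channel labels follow the text (11, 12, 13 for the first sub-output map, and so on):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave the channels of an H x W x C map across `groups` groups."""
    h, w, c = x.shape
    x = x.reshape(h, w, groups, c // groups)  # separate the group axis
    x = x.transpose(0, 1, 3, 2)               # swap group and within-group axes
    return x.reshape(h, w, c)

# Three 3-channel sub-output maps labelled as in the text:
# channels 11,12,13 / 21,22,23 / 31,32,33
subs = [np.full((2, 2, 3), 10 * i) + np.arange(1, 4) for i in (1, 2, 3)]
cascaded = np.concatenate(subs, axis=2)   # simple cascade: 11,12,13,21,...,33
shuffled = channel_shuffle(cascaded, groups=3)
print(shuffled[0, 0].tolist())  # [11, 21, 31, 12, 22, 32, 13, 23, 33]
```

After shuffling, the first three output channels hold channel 1 of each sub-output feature map, matching the behaviour described for the output feature map.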
In this embodiment, group convolution reduces the computation and parameter counts of the convolution operation, optimizing it and making the network model better suited to scenarios with strict real-time requirements.
In another embodiment of the present invention, as shown in fig. 2, a convolution operation optimization method for a neural network model includes:
step S100 is to segment the input feature map in the channel dimension to obtain sub-input feature maps of each group.
Step S200 performs different group convolution operations on each sub-input feature map, and extracts feature information of channels included in each group to obtain different sub-output feature maps.
The group convolution operation on each sub-input feature map specifically comprises the following steps:
step S210, respectively processing information of each channel of the sub-input feature map by using a plurality of depth-separable convolutions, where the depth-separable convolutions are convolution kernels with channels of 1, and the number of the depth-separable convolutions is equal to the number of channels of the sub-input feature map;
step S220 combines the features of the different channels extracted by the depth separable convolution through a 1 × 1 point convolution to obtain a sub-output feature map.
Specifically, a depthwise separable convolution is a convolution kernel with a channel count of 1; each kernel processes the features of only one channel of the sub-input feature map. Because each depthwise separable convolution learns the features of a single channel, the features of different channels are then combined through a 1 × 1 point convolution so that cross-channel features can be learned.
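Steps S210-S220 can be sketched with a naive NumPy implementation. The shapes, "valid" padding, and stride 1 are assumptions for illustration, not specified by the patent:

```python
import numpy as np

def depthwise_separable(x, dw_kernels, pw_kernels):
    """S210: one s x s depthwise kernel per channel; S220: 1 x 1 point conv.

    x:          H x W x m1 sub-input feature map
    dw_kernels: s x s x m1 (channel count 1 per kernel)
    pw_kernels: m1 x n1 (n1 point convolutions of size 1 x 1 x m1)
    """
    h, w, m1 = x.shape
    s = dw_kernels.shape[0]
    oh, ow = h - s + 1, w - s + 1
    dw = np.zeros((oh, ow, m1))
    for c in range(m1):                      # each kernel sees only one channel
        for i in range(oh):
            for j in range(ow):
                dw[i, j, c] = np.sum(x[i:i + s, j:j + s, c] * dw_kernels[:, :, c])
    return dw @ pw_kernels                   # merge channel features: oh x ow x n1

x = np.random.rand(6, 6, 4)
out = depthwise_separable(x, np.random.rand(3, 3, 4), np.random.rand(4, 8))
print(out.shape)  # (4, 4, 8)
```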
For example, suppose the sub-input feature map is D_F × D_F × m1. Then m1 depthwise separable convolutions of size s × s × 1 are required to process the features of all channels of the sub-input feature map, and the computation for all the depthwise separable convolutions is s × s × m1 × D_F × D_F.
If the sub-output feature map has n1 channels, n1 point convolutions of size 1 × 1 × m1 are needed, and the computation for all the point convolutions is m1 × D_F × D_F × n1. The computation of this combination is s × s × m1 × D_F × D_F + m1 × D_F × D_F × n1. Relative to the computation of the conventional convolution method (s × s × m1 × D_F × D_F × n1), the computational cost of this combination is reduced to:
(s × s × m1 × D_F × D_F + m1 × D_F × D_F × n1) / (s × s × m1 × D_F × D_F × n1) = 1/n1 + 1/(s × s)
Furthermore, the depthwise separable convolution also compresses the model by reducing the number of parameters. Continuing the previous example: conventional convolution requires m1 × s × s × n1 parameters, while the depthwise separable combination requires s × s × m1 + n1 × m1 parameters, so the compression ratio of the parameters is:
(s × s × m1 + n1 × m1) / (m1 × s × s × n1) = 1/n1 + 1/(s × s)
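Both ratios above reduce to 1/n1 + 1/s², which can be verified numerically with assumed sizes (not values from the patent):

```python
# Assumed sizes for illustration only.
m1, n1, s, DF = 16, 32, 3, 28

# Computation: depthwise + point convolutions versus conventional convolution.
sep_ops = s * s * m1 * DF * DF + m1 * DF * DF * n1
conv_ops = s * s * m1 * DF * DF * n1

# Parameters: depthwise + point convolutions versus conventional convolution.
sep_params = s * s * m1 + n1 * m1
conv_params = m1 * s * s * n1

expected = 1 / n1 + 1 / (s * s)   # 1/n1 + 1/s^2
print(abs(sep_ops / conv_ops - expected) < 1e-12)        # True
print(abs(sep_params / conv_params - expected) < 1e-12)  # True
```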
step S300 shuffles and combines the sub-output feature maps of all the groups to obtain an output feature map.
In this embodiment, on the basis of group convolution, the group convolution operation is implemented with depthwise separable convolutions and 1 × 1 point convolutions, further reducing the computation and parameter counts and optimizing the convolution operation.
In another embodiment of the present invention, as shown in fig. 3, a convolution operation optimization method for a neural network model includes:
Step S110 splits the input feature map according to its number of channels, obtaining a sub-input feature map for each channel.
Specifically, taking the online video detection model as an example, the input video image has three RGB channels; the pixel values of each channel form a two-dimensional matrix, and the pixel values of the three channels together form the input feature map. In other words, the input feature map has 3 channels. Splitting and grouping the input feature map by the three RGB channels yields 3 sub-input feature maps, each containing the pixel values of only one channel.
Step S200 performs different group convolution operations on each sub-input feature map, and extracts feature information of channels included in each group to obtain different sub-output feature maps.
The group convolution operation on each sub-input feature map specifically comprises the following steps:
step S230 performs dimension increase on the sub-input feature map by 1 × 1 point convolution, and performs nonlinear operation on the dimension-increased convolution result by using a nonlinear activation function to obtain a sub-input feature map with an increased number of channels.
Step S240, firstly, performing feature extraction on the sub-input feature map with the increased number of channels through depth separable convolution, and then performing nonlinear operation through the nonlinear activation function to obtain feature information;
step S250, performing dimension reduction on the feature information through 1 × 1 point convolution, and performing activation processing through a linear activation function to obtain a sub-output feature map, where the number of channels of the sub-output feature map is equal to the number of channels of the sub-input feature map.
Specifically, each sub-input feature map contains the pixel values of only one channel, so its channel count is already small; a 1 × 1 point-by-point convolution therefore raises the dimension, increasing the number of channels of the sub-input feature map. After the point convolution, a nonlinear activation function is applied to increase nonlinear expressive capacity.
Then a 3 × 3 depthwise separable convolution extracts features, greatly reducing the computation in the network. After the depthwise separable convolution, the nonlinear activation function is applied again to increase nonlinear expressive capacity.
Optionally, the nonlinear activation function is the ReLU6 activation function. ReLU6 is a ReLU variant that caps the maximum output at 6 and is suited to processing high-dimensional feature inputs.
Finally, a 1 × 1 point-by-point convolution reduces the dimension. This restores the channel count of the feature map, and also provides a certain channel-shuffling effect by mixing the feature maps extracted by the depthwise separable convolution across channels, improving expressive capacity. The feature dimensionality after this last 1 × 1 point convolution is already minimal; applying ReLU6 again would damage the features and cause a large loss of information, so a linear activation function is used after the final 1 × 1 point convolution.
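A hedged NumPy sketch of the expand / depthwise-extract / project sequence of steps S230-S250. The function names, shapes, "valid" padding, and random weights are assumptions; the patent specifies only the operation order and activations:

```python
import numpy as np

def relu6(x):
    return np.minimum(np.maximum(x, 0.0), 6.0)  # ReLU capped at 6

def expand_extract_project(x, w_up, dw, w_down):
    """S230: 1x1 conv raises channels (+ ReLU6); S240: depthwise conv (+ ReLU6);
    S250: 1x1 conv restores channels (linear activation)."""
    up = relu6(x @ w_up)                       # H x W x c_exp after expansion
    s, c_exp = dw.shape[0], dw.shape[2]
    h, w = up.shape[0], up.shape[1]
    oh, ow = h - s + 1, w - s + 1
    feat = np.zeros((oh, ow, c_exp))
    for ch in range(c_exp):                    # depthwise: one kernel per channel
        for i in range(oh):
            for j in range(ow):
                feat[i, j, ch] = np.sum(up[i:i + s, j:j + s, ch] * dw[:, :, ch])
    return relu6(feat) @ w_down                # project back; no final nonlinearity

x = np.random.rand(5, 5, 1)                    # single-channel sub-input map
out = expand_extract_project(x, np.random.rand(1, 6),
                             np.random.rand(3, 3, 6), np.random.rand(6, 1))
print(out.shape)  # (3, 3, 1): output channels equal the sub-input's
```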
Step S300 shuffles and combines the sub-output feature maps of all the groups to obtain an output feature map.
The method of this embodiment was applied to a damaged-video detection model; after iterative training to convergence, the VFD-SmartNet network was obtained. Its performance was compared with the AlexNet, VGG16, ResNet18, ResNet34, ResNet-like, and DenseNet-like models, as follows:
[Table: performance comparison of VFD-SmartNet against the AlexNet, VGG16, ResNet18, ResNet34, ResNet-like, and DenseNet-like models in parameter count, running speed, recall, and precision; original table image not reproduced]
the data in the table show that the operation speed is related to the parameter quantity, the parameter quantity of the VFD-SmartNet model is obviously reduced, the operation speed is obviously improved, the aim of network acceleration is well achieved, the recall ratio and the precision ratio are also maintained at higher levels, and the model can improve the operation speed on the premise of keeping the accuracy of the model so as to guarantee the requirement of real-time performance.
In another embodiment of the present invention, as shown in fig. 4, an apparatus for optimizing convolution operations of a neural network model includes:
and the channel decomposition module 110 is configured to segment the input feature map in a channel dimension to obtain sub-input feature maps of each group.
Specifically, the input feature map may be an original image input to a model (for example, one based on a convolutional neural network), or the feature information input to a certain convolutional layer within the model. The data volume of a feature map is expressed as H (height) × W (width) × C (number of channels), meaning data for C channels, each channel holding H × W values.
Splitting the input feature map in the channel dimension means grouping certain channels together. For example, if the input feature map is 27 × 27 × 16, its 16 channels can be divided into 4 groups (16/4) of 4 channels each; the data of the 4 channels in each group form one sub-input feature map.
The above process is referred to as channel decomposition of the input feature map.
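The channel decomposition described above can be sketched with numpy (illustrative shapes only, not code from the patent): a 27 × 27 × 16 feature map is split along the channel axis into 4 sub-input feature maps of 4 channels each.

```python
import numpy as np

H, W, C, groups = 27, 27, 16, 4
feature_map = np.zeros((H, W, C))

# Split along the channel axis: each sub-input feature map keeps all
# H x W positions but only C / groups channels.
sub_maps = np.split(feature_map, groups, axis=2)

print(len(sub_maps))      # number of groups
print(sub_maps[0].shape)  # shape of one sub-input feature map
```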
And the grouping convolution module 120 is configured to perform different grouping convolution operations on each sub-input feature map, and extract feature information of channels included in each group to obtain different sub-output feature maps.
Specifically, the group convolution operation refers to a convolution operation for each group. The input of the group convolution operation is a sub-input feature map, and the output is a sub-output feature map, so as to extract feature information from the sub-input feature map, wherein the feature information forms the sub-output feature map.
The group convolution operation may use convolution kernels of the same size as a conventional convolution operation, but the number of kernels per group is only 1/g of the number of conventional kernels (the kernels used in a conventional convolution operation are referred to here as conventional convolution kernels). The number of conventional kernels equals the number of channels of the output feature map, while the number of kernels in each group convolution equals the number of output channels divided by the number of groups; this is also referred to as channel decomposition of the convolution kernels.
In order to ensure that the input feature map can be uniformly segmented and the conventional convolution kernel can be uniformly decomposed during channel decomposition, the group number is usually a common divisor of the number of channels of the input feature map and the number of channels of the output feature map.
Computational efficiency is improved since each group convolution only needs to process feature maps of a few channels.
For example, let the input feature map be D_F × D_F × m, where D_F is the height and width of the input feature map and m is its number of channels, and let the output feature map be D_F × D_F × n, where n is its number of channels. With a convolution kernel size of s × s, conventional convolution uses n kernels of size s × s × m; the amount of computation for each kernel to convolve the input feature map is s × s × m × D_F × D_F, so the total for n kernels is s × s × m × D_F × D_F × n.
With grouped convolution, assuming the input data is divided into g groups, each sub-input feature map is D_F × D_F × (m/g) and the corresponding sub-output feature map is D_F × D_F × (n/g). Each group convolution uses n/g kernels of size s × s × (m/g), so its computation amount is s × s × (m/g) × D_F × D_F × (n/g). With g group convolutions in total, the overall computation is s × s × (m/g) × D_F × D_F × (n/g) × g, i.e. the computation is reduced to 1/g of the conventional convolution.
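The 1/g reduction derived above can be checked with a small counting sketch (pure arithmetic with illustrative sizes, no actual convolution is performed):

```python
def conv_macs(df, m, n, s):
    """Multiply-accumulates for conventional convolution: n kernels of size s x s x m."""
    return s * s * m * df * df * n

def group_conv_macs(df, m, n, s, g):
    """MACs for grouped convolution: g groups, each with n/g kernels of size s x s x (m/g)."""
    return s * s * (m // g) * df * df * (n // g) * g

df, m, n, s, g = 28, 16, 32, 3, 4
print(conv_macs(df, m, n, s) / group_conv_macs(df, m, n, s, g))  # ratio equals g
```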
In addition, compared with conventional convolution, grouped convolution also reduces the number of parameters required for the convolution operation, analyzed as follows:
The parameters required for conventional convolution number m × s × s × n. With grouped convolution, each group convolution requires (m/g) × s × s × (n/g) parameters; with g groups in total, the overall parameter count is (m/g) × s × s × (n/g) × g, reduced to 1/g of the original.
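The parameter-count reduction can likewise be verified by direct counting (weights only, biases omitted; the sizes below are illustrative assumptions):

```python
def conv_params(m, n, s):
    """Conventional convolution: n kernels of size s x s x m."""
    return m * s * s * n

def group_conv_params(m, n, s, g):
    """Grouped convolution: g groups of n/g kernels of size s x s x (m/g)."""
    return (m // g) * s * s * (n // g) * g

m, n, s, g = 16, 32, 3, 4
print(conv_params(m, n, s) // group_conv_params(m, n, s, g))  # reduction factor equals g
```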
And the feature shuffling module 130 is used for shuffling and combining the sub-output feature maps of all the groups to obtain the output feature map.
Specifically, after segmentation along the channel dimension, the features of each sub-output feature map are relatively sparse. If the sub-output feature maps of all groups were simply concatenated to form the output feature map, the feature maps of different groups would remain informationally isolated, reducing the information expression capability. Therefore, after the group convolutions, the feature maps of different groups are mixed by channel shuffling, realizing feature interaction between the groups.
For example, in the Group-Inception structure shown in fig. 7, taking an online video detection model as an example, the input video image has three RGB channels whose pixel values form the input feature map; that is, the number of channels of the input feature map is 3. Splitting and grouping the input feature map by the three RGB channels yields 3 sub-input feature maps: sub-input feature map 1, sub-input feature map 2 and sub-input feature map 3. Each sub-input feature map contains the pixel values of only one channel.
Group convolution is performed on each sub-input feature map separately, yielding 3 sub-output feature maps, each containing features of 3 channels (11 denotes the data of channel 1 of the first sub-output feature map, 12 denotes channel 2 of the first, 13 denotes channel 3 of the first; 21 denotes channel 1 of the second sub-output feature map, 31 denotes channel 1 of the third, and so on).
Channel shuffling and combination are then performed on the 3 sub-output feature maps, mixing the features of different channels, realizing feature interaction between channels, and producing the output feature map. The data of the first 3 channels of the output feature map come from channel 1 of sub-output feature maps 1–3 respectively; the middle 3 channels and the last 3 channels are handled similarly.
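A common way to realize this shuffle (a sketch; the patent does not prescribe a particular implementation) is to reshape the channel axis into (groups, channels_per_group), transpose, and flatten back. The example below uses the 11/21/31 labeling from the text, encoding "group, channel" in each value:

```python
import numpy as np

def channel_shuffle(x, groups):
    """x has shape (H, W, C); interleave channels across groups."""
    h, w, c = x.shape
    x = x.reshape(h, w, groups, c // groups)
    x = x.transpose(0, 1, 3, 2)  # swap the group and per-group channel axes
    return x.reshape(h, w, c)

# A 1 x 1 map with 9 channels from 3 groups of 3; value "gc" means
# channel c of sub-output feature map g.
x = np.array([[[11, 12, 13, 21, 22, 23, 31, 32, 33]]], dtype=float)
print(channel_shuffle(x, groups=3)[0, 0])  # first 3 output channels are 11, 21, 31
```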
In the embodiment, the operation amount and the parameter amount of the convolution operation are reduced and the convolution operation is optimized in a grouping convolution mode, so that the network model is favorably applied to scenes with high real-time performance.
In another embodiment of the present invention, as shown in fig. 5, an apparatus for optimizing convolution operations of a neural network model includes:
and the channel decomposition module 110 is configured to segment the input feature map in a channel dimension to obtain sub-input feature maps of each group.
And the grouping convolution module 120 is configured to perform different grouping convolution operations on each sub-input feature map, and extract feature information of channels included in each group to obtain different sub-output feature maps.
Wherein, the grouping convolution module 120 includes:
a single-channel feature extraction unit 121, configured to separately process the information of each channel of the sub-input feature map using a plurality of depthwise separable convolutions, where a depthwise separable convolution is a convolution kernel whose channel count is 1, and the number of depthwise separable convolutions used equals the number of channels of the sub-input feature map;
and the feature merging unit 122 is configured to merge the features of the different channels extracted by the depth separable convolution through 1 × 1 point convolution to obtain a sub-output feature map.
Specifically, a depthwise separable convolution is a convolution kernel whose channel count is 1; each kernel processes the features of only one channel of the sub-input feature map. Since each depthwise separable convolution learns the features of only one channel, the features of different channels are subsequently combined by a 1 × 1 point convolution, so that cross-channel features can be learned.
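The two-stage operation just described can be sketched in numpy (an illustrative implementation with random weights and "valid" padding, not the patent's code): one s × s kernel per input channel, followed by a 1 × 1 pointwise convolution that mixes the per-channel outputs.

```python
import numpy as np

def depthwise_separable(x, depth_kernels, point_weights):
    """x: (H, W, m1); depth_kernels: (m1, s, s); point_weights: (n1, m1)."""
    h, w, m1 = x.shape
    s = depth_kernels.shape[1]
    oh, ow = h - s + 1, w - s + 1  # "valid" padding for simplicity
    depth_out = np.zeros((oh, ow, m1))
    for c in range(m1):  # each kernel sees only its own channel
        for i in range(oh):
            for j in range(ow):
                depth_out[i, j, c] = np.sum(x[i:i+s, j:j+s, c] * depth_kernels[c])
    # 1 x 1 pointwise convolution = per-pixel matrix multiply over channels
    return depth_out @ point_weights.T

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 4))                       # m1 = 4 input channels
out = depthwise_separable(x,
                          rng.normal(size=(4, 3, 3)),  # one 3x3 kernel per channel
                          rng.normal(size=(6, 4)))     # n1 = 6 output channels
print(out.shape)  # (6, 6, 6)
```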
For example, if the sub-input feature map is D_F × D_F × m1, then m1 depthwise separable convolutions of size s × s × 1 are required to process the features of all channels of the sub-input feature map. The computation amount of all the depthwise separable convolutions is s × s × m1 × D_F × D_F.
If the sub-output feature map has n1 channels, n1 point convolutions of size 1 × 1 × m1 are needed, and the computation amount of all the point convolutions is m1 × D_F × D_F × n1. The total computation of this combination is s × s × m1 × D_F × D_F + m1 × D_F × D_F × n1. Relative to the computation of the conventional convolution method (s × s × m1 × D_F × D_F × n1), the computational cost of this combination is reduced to:
(s × s × m1 × D_F × D_F + m1 × D_F × D_F × n1) / (s × s × m1 × D_F × D_F × n1) = 1/n1 + 1/s²
Furthermore, the depthwise separable convolution also compresses the model by reducing the number of parameters. Continuing the previous example, conventional convolution requires m1 × s × s × n1 parameters, while the depthwise separable combination requires s × s × m1 + n1 × m1 parameters, giving a compression ratio of:
(s × s × m1 + n1 × m1) / (m1 × s × s × n1) = 1/n1 + 1/s²
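Both ratios above reduce to 1/n1 + 1/s², which can be confirmed numerically (pure arithmetic with illustrative sizes):

```python
def cost_ratio(m1, n1, s, df):
    """Computation of depthwise separable + pointwise vs. conventional convolution."""
    separable = s * s * m1 * df * df + m1 * df * df * n1
    conventional = s * s * m1 * df * df * n1
    return separable / conventional

def param_ratio(m1, n1, s):
    """Parameter count of the separable combination vs. conventional convolution."""
    return (s * s * m1 + n1 * m1) / (m1 * s * s * n1)

m1, n1, s, df = 16, 32, 3, 28
print(cost_ratio(m1, n1, s, df))   # equals 1/n1 + 1/s**2
print(param_ratio(m1, n1, s))      # same closed form
```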
and the feature shuffling module 130 is used for shuffling and combining the sub-output feature maps of all the groups to obtain the output feature map.
In this embodiment, on the basis of grouped convolution, the group convolution operation is implemented using depthwise separable convolution and 1 × 1 point convolution, further reducing the computation and parameter amounts of the convolution operation and thereby optimizing it.
In another embodiment of the present invention, as shown in fig. 4 and 6, an apparatus for optimizing convolution operations of a neural network model includes:
and the channel decomposition module 110 is configured to split the input feature map according to the number of channels of the input feature map, so as to obtain a sub-input feature map of each channel.
Specifically, taking the online video detection model as an example, the input video image has three channels of RGB, the pixel values of each channel form a two-dimensional matrix, and the pixel values of the three channels form an input feature map, in other words, the number of channels of the input feature map is 3. And segmenting and grouping the input feature map according to three channels of RGB to obtain 3 sub-input feature maps, wherein each sub-input feature map only comprises a pixel value of one channel.
And the grouping convolution module 120 is configured to perform different grouping convolution operations on each sub-input feature map, and extract feature information of channels included in each group to obtain different sub-output feature maps.
Wherein, the grouping convolution module 120 includes:
a dimension raising unit 123, configured to raise the dimension of the sub-input feature map by 1 × 1 dot convolution;
the first nonlinear activation unit 126 is configured to perform nonlinear operation on the convolution result after the dimension increase by using a nonlinear activation function to obtain a sub-input feature map with an increased channel number;
a feature extraction unit 124, configured to perform feature extraction on the sub-input feature map with the increased number of channels through depth separable convolution;
a second nonlinear activation unit 127, configured to perform a nonlinear operation through the nonlinear activation function after performing feature extraction through depth separable convolution, so as to obtain feature information;
a dimension reduction unit 125, configured to perform dimension reduction on the feature information through a 1 × 1 point convolution;
a linear activation unit 128, configured to perform dimension reduction on the feature information through 1 × 1 point convolution, and then perform activation processing through a linear activation function to obtain a sub-output feature map, where the number of channels of the sub-output feature map is equal to the number of channels of the sub-input feature map.
Specifically, a sub-input feature map contains the pixel values of only one channel, so its channel count is already small; a 1 × 1 point-by-point convolution is therefore used to raise the dimension, increasing the number of channels of the sub-input feature map. After the point convolution, a nonlinear activation function is applied to increase the nonlinear expression capability.
A 3 × 3 depthwise separable convolution is then used to extract features, greatly reducing the amount of computation in the network. After the depthwise separable convolution, a nonlinear activation function is again applied to increase the nonlinear expression capability.
Optionally, the nonlinear activation function is the ReLU6 activation function. ReLU6 is a special ReLU function that limits the maximum output to 6, making it suitable for handling high-dimensional feature inputs.
Finally, a 1 × 1 point-by-point convolution performs dimensionality reduction. On one hand this restores the number of channels of the feature map; on the other hand it also has a certain channel-shuffling effect, mixing the feature maps extracted by the depthwise separable convolution across channels and improving the information expression capability. The dimensionality of the features after this last 1 × 1 point convolution is already minimal; if ReLU6 were applied again, the features would be damaged, causing a large loss of information. A linear activation function is therefore used after the final 1 × 1 point convolution.
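The whole expand → depthwise → project branch described above can be condensed into a numpy sketch (shapes only; weights are random, the depthwise step is simplified to a per-channel 3 × 3 "valid" convolution, and all dimensions are illustrative assumptions): a 1 × 1 expansion with ReLU6, a 3 × 3 depthwise convolution with ReLU6, and a linear 1 × 1 projection back to the original channel count.

```python
import numpy as np

def relu6(x):
    return np.clip(x, 0.0, 6.0)

def pointwise(x, w):  # 1 x 1 convolution: per-pixel channel mixing
    return x @ w.T

def depthwise3x3(x, k):  # k: (channels, 3, 3), one kernel per channel
    h, w, c = x.shape
    out = np.zeros((h - 2, w - 2, c))
    for ch in range(c):
        for i in range(h - 2):
            for j in range(w - 2):
                out[i, j, ch] = np.sum(x[i:i+3, j:j+3, ch] * k[ch])
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=(10, 10, 1))                          # single-channel sub-input
expanded = relu6(pointwise(x, rng.normal(size=(8, 1))))   # raise 1 -> 8 channels, ReLU6
feats = relu6(depthwise3x3(expanded, rng.normal(size=(8, 3, 3))))
out = pointwise(feats, rng.normal(size=(1, 8)))           # linear projection back to 1 channel
print(out.shape)  # (8, 8, 1)
```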
And the feature shuffling module 130 is used for shuffling and combining the sub-output feature maps of all the groups to obtain the output feature map.
The method of the embodiment is applied to a damaged video detection model, iterative training is carried out on the model, and the VFD-SmartNet network is obtained after convergence. Comparing it with AlexNet, VGG16, ResNet18, ResNet34, ResNet-like, DenseNet-like models, the performance is compared as follows:
[Table: performance comparison of VFD-SmartNet with AlexNet, VGG16, ResNet18, ResNet34, ResNet-like and DenseNet-like models in terms of operation speed, recall and precision]
As can be seen from the data in the table, the operation speed of the VFD-SmartNet model is significantly improved, achieving the goal of network acceleration, while recall and precision remain at high levels; the model improves operation speed without sacrificing accuracy, meeting the real-time requirement.
It should be noted that the above embodiments can be freely combined as necessary. The foregoing is only a preferred embodiment of the present invention; those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A convolution operation optimization method of a neural network model is applied to convolution operation for extracting characteristic information from an input characteristic diagram and obtaining an output characteristic diagram, and is characterized by comprising the following steps:
segmenting the input feature map in the channel dimension to obtain sub-input feature maps of each group;
respectively carrying out different group convolution operations on each sub-input feature map, and extracting feature information of channels contained in each group to obtain different sub-output feature maps;
and shuffling and combining the sub-output characteristic graphs of all the groups to obtain the output characteristic graph.
2. The convolution optimization method of claim 1, wherein the slicing the input feature map in channel dimensions to obtain sub-input feature maps for each group comprises:
and segmenting the input feature diagram according to the number of the channels of the input feature diagram to obtain a sub-input feature diagram of each channel.
3. The convolution optimization method of claim 1, wherein the set of convolution operations comprises:
processing information of each channel of the sub-input feature map respectively by adopting a plurality of depth separable convolutions, wherein the depth separable convolutions are convolution kernels with channels being 1, and the number of the adopted depth separable convolutions is equal to the number of the channels of the sub-input feature map;
and combining the features of the different channels extracted by the depth separable convolution through 1 x 1 point convolution to obtain a sub-output feature map.
4. The convolution optimization method of claim 1, wherein the set of convolution operations comprises:
performing dimension increasing on the sub-input feature map through 1 × 1 point convolution to obtain a sub-input feature map with increased channel number;
performing feature extraction on the sub-input feature map with the increased number of channels through depth separable convolution to obtain feature information;
and performing dimensionality reduction on the feature information through 1 × 1 point convolution to obtain a sub-output feature map, wherein the number of channels of the sub-output feature map is equal to that of channels of the sub-input feature map.
5. The convolution optimization method of claim 4, wherein:
the step of performing dimensionality raising on the sub-input feature map through 1 × 1 point convolution comprises the following steps:
performing dimension increasing on the sub-input feature map through 1 × 1 point convolution, and performing nonlinear operation on a convolution result after dimension increasing by adopting a nonlinear activation function;
the feature extraction by depth separable convolution comprises:
firstly, carrying out feature extraction through depth separable convolution, and then carrying out nonlinear operation through the nonlinear activation function;
the dimension reduction of the characteristic information through 1 × 1 point convolution includes:
and reducing the dimension of the characteristic information through 1 multiplied by 1 point convolution, and performing activation processing through a linear activation function.
6. The convolution optimization method of claim 5, wherein:
the nonlinear activation function is a ReLU6 activation function.
7. A convolution operation optimization device of a neural network model is applied to convolution operation for extracting characteristic information from an input characteristic diagram and obtaining an output characteristic diagram, and is characterized by comprising the following steps:
the channel decomposition module is used for segmenting the input feature map in channel dimensions to obtain sub-input feature maps of each group;
the grouping convolution module is used for respectively carrying out different grouping convolution operations on each sub-input feature map, extracting feature information of channels contained in each group and obtaining different sub-output feature maps;
and the characteristic shuffling module is used for shuffling and combining the sub-output characteristic graphs of all the groups to obtain the output characteristic graph.
8. The convolution optimization device of claim 7, wherein the packet convolution module comprises:
a single-channel feature extraction unit, configured to separately process information of each channel of the sub-input feature map by using a plurality of depth-separable convolutions, where the depth-separable convolutions are convolution kernels with a channel of 1, and the number of the depth-separable convolutions used is equal to the number of channels of the sub-input feature map;
and the feature merging unit is used for merging the features of the different channels extracted by the depth separable convolution through 1 x 1 point convolution to obtain a sub-output feature map.
9. The convolution optimization device of claim 7, wherein the packet convolution module comprises:
the dimension increasing unit is used for increasing the dimension of the sub-input feature map through 1 multiplied by 1 point convolution to obtain the sub-input feature map with increased channel number;
the characteristic extraction unit is used for extracting the characteristics of the sub-input characteristic graphs with the increased channel number through depth separable convolution to obtain characteristic information;
and the dimension reduction unit is used for performing dimension reduction on the feature information through 1 multiplied by 1 point convolution to obtain a sub-output feature map, wherein the number of channels of the sub-output feature map is equal to that of the channels of the sub-input feature map.
10. The convolution optimization device of claim 9, wherein the packet convolution module further comprises:
the first nonlinear activation unit is used for carrying out nonlinear operation on the convolution result after the dimension is increased by adopting a nonlinear activation function;
a second nonlinear activation unit for performing a nonlinear operation by a nonlinear activation function after feature extraction by depth separable convolution;
and the linear activation unit is used for performing activation processing through a linear activation function after dimension reduction is performed on the characteristic information through 1 × 1 point convolution.
CN201911155114.2A 2019-11-22 2019-11-22 Convolution operation optimization method and device of neural network model Pending CN110909874A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911155114.2A CN110909874A (en) 2019-11-22 2019-11-22 Convolution operation optimization method and device of neural network model
CN202310411791.6A CN116416561A (en) 2019-11-22 2019-11-22 Video image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911155114.2A CN110909874A (en) 2019-11-22 2019-11-22 Convolution operation optimization method and device of neural network model

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202310411791.6A Division CN116416561A (en) 2019-11-22 2019-11-22 Video image processing method and device

Publications (1)

Publication Number Publication Date
CN110909874A true CN110909874A (en) 2020-03-24

Family

ID=69818785

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310411791.6A Pending CN116416561A (en) 2019-11-22 2019-11-22 Video image processing method and device
CN201911155114.2A Pending CN110909874A (en) 2019-11-22 2019-11-22 Convolution operation optimization method and device of neural network model

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202310411791.6A Pending CN116416561A (en) 2019-11-22 2019-11-22 Video image processing method and device

Country Status (1)

Country Link
CN (2) CN116416561A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111445012A (en) * 2020-04-28 2020-07-24 南京大学 FPGA-based packet convolution hardware accelerator and method thereof
CN111445012B (en) * 2020-04-28 2023-04-18 南京大学 FPGA-based packet convolution hardware accelerator and method thereof
CN111445019A (en) * 2020-04-30 2020-07-24 南京大学 Device and method for realizing channel shuffling operation in packet convolution
CN111738424A (en) * 2020-06-29 2020-10-02 湖南国科微电子股份有限公司 Neural network processing method, neural network processing device, electronic equipment and storage medium
CN111738424B (en) * 2020-06-29 2023-12-26 湖南国科微电子股份有限公司 Neural network processing method and device, electronic equipment and storage medium
CN112288028A (en) * 2020-11-06 2021-01-29 神思电子技术股份有限公司 Image identification method based on stream convolution
CN112418401A (en) * 2020-11-20 2021-02-26 中山大学 Local distributed image identification method for terminal application
CN112363844A (en) * 2021-01-12 2021-02-12 之江实验室 Convolutional neural network vertical segmentation method for image processing
CN113313056A (en) * 2021-06-16 2021-08-27 中国科学技术大学 Compact 3D convolution-based lip language identification method, system, device and storage medium

Also Published As

Publication number Publication date
CN116416561A (en) 2023-07-11

Similar Documents

Publication Publication Date Title
CN110909874A (en) Convolution operation optimization method and device of neural network model
CN107944556B (en) Deep neural network compression method based on block item tensor decomposition
CN109934331B (en) Apparatus and method for performing artificial neural network forward operations
CN107340993B (en) Arithmetic device and method
CN109886391B (en) Neural network compression method based on space forward and backward diagonal convolution
CN110781912A (en) Image classification method based on channel expansion inverse convolution neural network
CN111210432A (en) Image semantic segmentation method based on multi-scale and multi-level attention mechanism
CN111882053B (en) Neural network model compression method based on splicing convolution
CN113111889A (en) Target detection network processing method for edge computing terminal
CN110728354B (en) Image processing method based on improved sliding type grouping convolution neural network
CN110782001B (en) Improved method for using shared convolution kernel based on group convolution neural network
TWI738048B (en) Arithmetic framework system and method for operating floating-to-fixed arithmetic framework
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
Fuketa et al. Image-classifier deep convolutional neural network training by 9-bit dedicated hardware to realize validation accuracy and energy efficiency superior to the half precision floating point format
Qi et al. Learning low resource consumption cnn through pruning and quantization
US20230143985A1 (en) Data feature extraction method and related apparatus
CN116434039A (en) Target detection method based on multiscale split attention mechanism
CN112418388A (en) Method and device for realizing deep convolutional neural network processing
Nguyen et al. Development of an object recognition algorithm based on neural networks with using a hierarchical classifier
CN115375922A (en) Lightweight significance detection method based on multi-scale space attention
CN114492631A (en) Spatial attention calculation method based on channel attention
CN113554104A (en) Image classification method based on deep learning model
CN113313253A (en) Neural network compression method, data processing device and computer equipment
CN110807479A (en) Neural network convolution calculation acceleration method based on Kmeans algorithm
CN110378466A (en) Quantization method and system based on neural network difference

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20200324)