CN115049054A - Channel-adaptive segmented dynamic network pruning method based on feature map response - Google Patents

Channel-adaptive segmented dynamic network pruning method based on feature map response

Info

Publication number
CN115049054A
CN115049054A (application CN202210739093.4A)
Authority
CN
China
Prior art keywords
network
layer
pruning
convolutional neural
segmented
Prior art date
Legal status
Granted
Application number
CN202210739093.4A
Other languages
Chinese (zh)
Other versions
CN115049054B (en)
Inventor
陈琳
尚明生
龚赛君
Current Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Original Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Priority date
Filing date
Publication date
Application filed by Chongqing Institute of Green and Intelligent Technology of CAS
Priority to CN202210739093.4A
Priority claimed from CN202210739093.4A
Publication of CN115049054A
Application granted
Publication of CN115049054B
Legal status: Active (granted)


Classifications

    • G06N 3/082 (Physics; Computing; Computing arrangements based on biological models; Neural networks; Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections)
    • G06V 10/44 (Image or video recognition or understanding; Extraction of image or video features; Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components)
    • G06V 10/48 (Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation)
    • G06V 10/54 (Extraction of image or video features relating to texture)
    • G06V 10/82 (Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a channel-adaptive segmented dynamic network pruning method based on feature map response, belonging to the field of deep learning and comprising the following steps. S1: build a convolutional neural network. S2: segment the network into a front-end shallow network and a back-end deep network. S3: connect a Hough transform and a perceptual hash algorithm in series after each front-end shallow network layer. S4: connect a gray-level co-occurrence matrix and a gray histogram in series after each back-end deep network layer. S5: train the convolutional neural network on a dataset. S6: compute the similarity of the convolution channels. S7: prune according to the pruning rate and the similarity. S8: repeat steps S5-S7 to complete pruning of the whole convolutional neural network segment by segment, and apply the pruned network. The invention decides pruning according to channel importance and does not force all pictures in the dataset to follow the same pruned subnet, thereby accelerating inference segment by segment and reducing operating cost.

Description

Channel-adaptive segmented dynamic network pruning method based on feature map response
Technical Field
The invention relates to a channel-adaptive segmented dynamic network pruning method based on feature map response, belongs to the field of deep learning, and is particularly suited to channel-adaptive dynamic network pruning driven by feature map response.
Background
The advent of deep convolutional neural networks has attracted considerable interest and driven substantial progress in many research areas, such as computer vision and natural language processing. However, the parameters of deep convolutional neural network models (e.g., ResNet-50) typically occupy hundreds of megabytes of storage, and processing an input requires billions of floating-point multiplications. This inevitably poses a significant challenge to deploying network models on resource-limited devices such as mobile phones and embedded systems. Huge storage requirements and expensive computation have therefore become the main obstacles to the practical application of deep convolutional neural networks in complex real-world scenarios. Solving this problem requires compressing the neural network, i.e., reducing the number of parameters of the deep convolutional neural network without significantly degrading model performance. The main research directions for model compression are pruning, knowledge distillation, quantization, and low-rank decomposition. Pruning removes redundant structures from a deep convolutional neural network and is one of the most intuitive routes to a compact network; in general, its impact on network accuracy can be kept very small.
Beyond the common unstructured/structured distinction, neural network pruning is also classified into static and dynamic pruning. Static pruning produces a fixed pruned network once the pruning algorithm has run; its structure does not change during subsequent testing and practical use. Dynamic pruning, by contrast, adaptively selects the network structure according to the input data. Research on neural network pruning emerged around 1990. For example, Hanson proposed an importance-based pruning method [1] that adds a weight decay related to the absolute value of each hidden unit's weights in order to minimize the number of hidden units. The OBD [2] and OBS [3] methods of the early 1990s compute the second derivative of the loss function with respect to the weights, i.e., the Hessian matrix, rank weight importance by this second-order information, and then cut the network according to the pruning rate. The AMC automatic compression method [4], proposed in 2018, uses reinforcement learning to autonomously learn the optimal sparsity of each layer under different practical requirements (guaranteeing accuracy or limiting computation). A pruning algorithm that alternates between training and compression steps has also been proposed [5]; it needs no computation-heavy correlation analysis and reduces the manual setting of hyperparameters.
In practice, the correlation (the similarity between stage feature maps) computed after an individual input passes through a statically pruned network over several iterations is very weak. This forces all data in the dataset to follow the same pruned subnet by default, which limits the representation capability, inference efficiency, and interpretability of the network.
[1] Hanson S J, Pratt L Y. Comparing Biases for Minimal Network Construction with Back-Propagation[C]//Advances in Neural Information Processing Systems 1. 1989.
[2] LeCun Y. Optimal Brain Damage[C]//Advances in Neural Information Processing Systems 2. 1990: 598-605.
[3] Hassibi B. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon[C]//Advances in Neural Information Processing Systems 5. 1992: 164-171.
[4] He Y, et al. AMC: AutoML for Model Compression and Acceleration on Mobile Devices[C]//European Conference on Computer Vision (ECCV). 2018.
[5] Carreira-Perpinan M A, Idelbayev Y. "Learning-Compression" Algorithms for Neural Net Pruning[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
[6] He Y, Liu P, Wang Z, et al. Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019: 4340-4349.
[7] Dong X, Yang Y. Network Pruning via Transformable Architecture Search[C]//Conference on Neural Information Processing Systems (NeurIPS). 2019: 760-771.
Disclosure of Invention
In view of this, the present invention provides a channel-adaptive segmented dynamic network pruning method based on feature map response, which adaptively selects the convolution-kernel channels of each network layer according to the input image, so that all pictures in a dataset no longer follow the same pruned subnet, thereby accelerating inference and reducing operating cost.
To achieve this, the invention provides the following technical scheme:
The channel-adaptive segmented dynamic network pruning method based on feature map response comprises the following steps:
S1: build an n-layer convolutional neural network according to the actual application requirements;
S2: divide the n-layer convolutional neural network into two segmented networks, a front-end shallow network and a back-end deep network, according to feature map scale and network abstraction level; the front m layers form the front-end shallow network and mainly extract contour and texture information of the input image; the remaining n-m layers form the back-end deep network and mainly extract high-level semantic characteristics of the input image;
S3: insert a feature-map information distribution response model, formed by a Hough transform and a perceptual hash algorithm connected in series, between the activation function of each front-end shallow network layer and the next convolutional layer;
S4: insert a feature-map information distribution response model, formed by a gray-level co-occurrence matrix and a gray histogram connected in series, between the activation function of each back-end deep network layer and the next convolutional layer;
S5: starting from the input layer, train the whole convolutional neural network on the dataset, following the pruning order of front-end shallow network first, then back-end deep network;
S6: for each trained segmented network, compute the similarity of the convolution channels from the information-distribution-model output of each layer's feature map;
S7: for each trained segmented network, cut off in turn the low-similarity neuron branches obtained in step S6, according to the set pruning rate and the similarity;
S8: repeat steps S5-S7 to complete the pruning of the whole convolutional neural network segment by segment, and apply the pruned convolutional neural network to practical problems.
Further, the feature map scale in step S2 depends on the sampling operations; the network abstraction level depends on the number of nodes of the convolutional neural network and the connections between them. The split into a front-end shallow network and a back-end deep network is made by judging whether the feature information extracted by a convolutional layer is concrete contour and texture information or abstract high-level semantic characteristics.
Further, the Hough transform in step S3 is a method for extracting straight lines and circles; it suppresses noise interference better than other detection methods, which benefits contour extraction. Implementing the Hough transform in Matlab takes three steps:
(1) apply the hough() function to obtain the Hough matrix;
(2) search for peak points in the Hough matrix with the houghpeaks() function;
(3) recover the contour information of the original image from the previous two results with the houghlines() function.
In application, the Hough transform routine can be written in Matlab and the Matlab file then called from Python; a Python-only sketch is given below.
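As an illustration, the following is a minimal Python/OpenCV sketch of the same three-step pipeline (an assumed alternative to the Matlab route; the Canny thresholds, the probabilistic Hough variant, and all parameter values are our choices, not prescribed by the invention):

```python
import cv2
import numpy as np

def hough_line_contours(gray):
    """Extract straight-line segments (contour cues) from a grayscale map."""
    img = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    edges = cv2.Canny(img, 50, 150)              # edge map fed to the transform
    # Probabilistic Hough: vote in (rho, theta) space, keep peak-supported lines
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=20, maxLineGap=5)
    return [] if lines is None else lines.reshape(-1, 4)  # rows of (x1, y1, x2, y2)
```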
Further, the perceptual hash algorithm in step S3 uses the discrete cosine transform to reduce the image to its low-frequency content; its steps are as follows:
(1) reduce size: to simplify the discrete cosine transform computation, perceptual hashing starts from a small picture;
(2) simplify color: as with average hashing, the image is converted to grayscale, further reducing the computation;
(3) compute the discrete cosine transform: the transform decomposes the picture into frequency components, as shown in formula (1):

$$F_d(u) = c(u)\sum_{i=0}^{R-1} f(i)\cos\left[\frac{(2i+1)\pi}{2R}u\right], \qquad c(u) = \begin{cases}\sqrt{1/R}, & u = 0\\ \sqrt{2/R}, & u \neq 0\end{cases} \tag{1}$$

where $F_d(u)$ is the coefficient after the discrete cosine transform, $R$ is the number of points of the original signal, $f(i)$ is the original signal, and $c(u)$ is a compensation coefficient that makes the discrete cosine transform matrix orthogonal;
(4) reduce the discrete cosine transform: keep only the low-frequency block, which presents the low-frequency information of the image;
(5) compute the mean of the retained discrete cosine transform coefficients;
(6) binarize: set coefficients greater than or equal to the mean to 1 and coefficients less than the mean to 0;
(7) combine the 64 bits to generate the hash value;
(8) compute the fingerprints of the two pictures and their Hamming distance.
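The following is a minimal sketch of the steps above, assuming the common pHash choices of a 32x32 grayscale input and an 8x8 low-frequency DCT block (the patent text does not fix these sizes):

```python
import numpy as np
from PIL import Image
from scipy.fftpack import dct

def phash(path):
    """64-bit perceptual hash of the image at `path`."""
    img = Image.open(path).convert("L").resize((32, 32))   # shrink + grayscale
    pixels = np.asarray(img, dtype=np.float64)
    # 2-D DCT as two orthonormal 1-D passes (rows, then columns), cf. formula (1)
    freq = dct(dct(pixels, axis=0, norm="ortho"), axis=1, norm="ortho")
    low = freq[:8, :8]                                     # low-frequency block
    return (low >= low.mean()).flatten()                   # binarize against mean

def hamming(h1, h2):
    """Hamming distance between two hashes; smaller means more similar."""
    return int(np.count_nonzero(h1 != h2))
```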
Further, the gray-level co-occurrence matrix in step S4 is a common way to express image texture by studying the spatial correlation characteristics of gray levels, and it produces a gray-level map.
The gray histogram in step S4 reflects the relationship between the frequency of occurrence of each gray level in an image and the gray level itself: it is the histogram of a grayscale image with gray level on the abscissa and frequency on the ordinate, and the distribution of the feature map's texture information can be inspected from it. Reference may be made to:
Liu Jianzhuang, Li Wenqing. A two-dimensional Otsu automatic threshold segmentation method for grayscale images[J]. Acta Automatica Sinica, 1993(01): 101-105.
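The following is a minimal sketch of the two texture descriptors above, using scikit-image for the co-occurrence matrix (spelled greycomatrix in versions before 0.19) and NumPy for the gray histogram; the distance and angle settings are illustrative assumptions:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_response(gray_u8):
    """gray_u8: 2-D uint8 image. Returns GLCM contrast and a 256-bin histogram."""
    # Co-occurrence of gray-level pairs at distance 1 along the horizontal direction
    glcm = graycomatrix(gray_u8, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    contrast = graycoprops(glcm, "contrast")[0, 0]
    # Gray histogram: abscissa is the gray level, ordinate its frequency
    hist = np.bincount(gray_u8.ravel(), minlength=256) / gray_u8.size
    return contrast, hist
```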
Further, the concatenated feature-map information distribution response models of steps S3 and S4 must be removed after pruning is completed.
Further, the termination condition of the training in step S5 is: the segmented network under training is determined by the pruning order, and training of that segment is judged complete when, during iterative training, the difference between the feature maps output by the segment's last layer in the current round and in the next round meets the set threshold requirement.
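A small sketch of this termination test; the mean absolute difference as the comparison metric and the threshold value are both illustrative assumptions:

```python
import numpy as np

def segment_converged(fmap_prev, fmap_curr, threshold=1e-3):
    """True once consecutive rounds' last-layer feature maps differ by < threshold."""
    return float(np.mean(np.abs(fmap_curr - fmap_prev))) < threshold
```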
Further, the similarity in step S6 is computed as follows:
S601: determine the distribution type and distribution parameters from the information-distribution-model output of each layer's feature map;
S602: for each convolution kernel of the next layer, generate a grayscale image with the same distribution and parameters as in S601;
S603: convolve the grayscale image with each convolution kernel of the next layer, obtaining for each kernel a new feature map that represents that kernel's feature-extraction capability under the distribution;
S604: compute the image entropy of each new feature map; the entropy corresponds to the similarity: the larger the entropy, the higher the similarity.
The image entropy H is computed as

$$H = -\sum_{i=0}^{255} q_i \log_2 q_i, \qquad q_i = \frac{n_i}{n},$$

where $n_i$ is the number of pixels whose gray value is $i$, $n$ is the total number of image pixels, and $q_i$ is therefore the proportion of pixels with gray value $i$.
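A minimal sketch of this entropy computation; quantizing a real-valued feature map to 256 gray levels before binning is our assumption:

```python
import numpy as np

def image_entropy(fmap):
    """Entropy H = -sum(q_i * log2(q_i)) over the gray-level frequencies q_i."""
    levels = np.round(255 * (fmap - fmap.min()) /
                      (np.ptp(fmap) + 1e-12)).astype(np.uint8)
    q = np.bincount(levels.ravel(), minlength=256) / levels.size
    q = q[q > 0]                       # drop empty bins: 0*log(0) is taken as 0
    return float(-(q * np.log2(q)).sum())
```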
Further, the front-end shallow network and the back-end deep network of step S2 may each be segmented further into several sub-networks according to the multi-scale distribution of output feature map scales, each sub-network being one segmented network; the training and pruning of steps S5-S7 then proceed sub-network by sub-network, first through the front-end shallow network and then through the back-end deep network.
The beneficial effects of the invention are as follows: the invention provides a channel-adaptive segmented dynamic network pruning method based on feature map response and puts forward a framework for the adaptive selection of each network layer's convolution channels, namely the feature-map information distribution response model.
Drawings
To make the purpose and technical scheme of the invention clearer, the following drawing is provided:
fig. 1 is a flow chart of the channel-adaptive segmented dynamic network pruning method based on feature map response.
Detailed Description
In order to make the purpose and technical solution of the present invention more clearly understood, the present invention will be described in detail with reference to the accompanying drawings and examples.
Example:
there are two sets of RGB image data for various objects in the real world-CIFAR-10 and CIFAR-100. Wherein: CIFAR-10 has 10 categories, each category has 5000 training pictures and 1000 verification pictures; CIFAR-100 has 100 categories, each category having 500 training pictures and 100 verification pictures. The convolutional neural network is adopted to classify the objects, and in consideration of network compression, the embodiment provides a channel adaptive segmentation dynamic network pruning method based on characteristic diagram response.
With reference to fig. 1, the method comprises the following steps:
Step one: preprocess the dataset images with several typical data-augmentation methods, including random resizing, cropping, brightness variation, and horizontal flipping, and split the data into a training set and a test set at a ratio of 8:2 (a sketch is given below).
Step two: for better comparison experiments and to show the effect of the invention, this embodiment builds ResNet-32 and ResNet-56 convolutional neural networks for the classification task. Both use 1x1 and 3x3 convolution kernels, and the initial learning rate is set to 0.1.
Step three: divide the ResNet-32 and ResNet-56 convolutional neural networks into a front-end shallow network and a back-end deep network according to feature map scale and network abstraction level. Specifically, the whole convolutional neural network can be mapped onto 4 segmented sub-networks according to the multi-scale distribution of output feature map scales [4X, 8X, 16X, 32X]: the [4X, 8X] sub-networks extract body contour and texture information and belong to the front-end shallow network, while the [16X, 32X] sub-networks extract abstract high-level semantic characteristics and belong to the back-end deep network (see the sketch below).
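An illustrative grouping of layers into the four scale segments; the scale_of callback, which maps a layer to its cumulative downsampling factor, is a hypothetical helper, not part of the patent:

```python
def segment_by_scale(layers, scale_of):
    """Group layers into the four scale segments [4x, 8x, 16x, 32x]."""
    segments = {4: [], 8: [], 16: [], 32: []}
    for layer in layers:
        segments[scale_of(layer)].append(layer)   # scale_of: assumed annotation
    shallow = [segments[4], segments[8]]    # body contours and texture
    deep = [segments[16], segments[32]]     # abstract high-level semantics
    return shallow, deep
```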
Step four: insert a feature-map information distribution response model, formed by a Hough transform and a perceptual hash algorithm connected in series, between the activation function of each front-end shallow network layer and the next convolutional layer.
Step five: insert a feature-map information distribution response model, formed by a gray-level co-occurrence matrix and a gray histogram connected in series, between the activation function of each back-end deep network layer and the next convolutional layer.
Step six: starting from the input layer, train the whole convolutional neural network on the dataset in the pruning order of the front-end shallow [4X, 8X] sub-networks followed by the back-end deep [16X, 32X] sub-networks; 4 segmented networks must be trained in sequence, each for multiple rounds of iteration. The termination condition of training is: the sub-network under training is determined by the pruning order, and training of that sub-network is judged complete when the difference between the feature maps output by its last layer in the current round and in the next round meets the set threshold requirement.
In this embodiment, the original network is trained for at least 150 epochs on CIFAR-10 and for 200 epochs on CIFAR-100, with a batch size of 128 in both cases; the pruning rates are set to 50% and 70%.
Step seven: for each sub-network that has finished training, in turn, compute the similarity of the convolution channels from the information-distribution-model output of each layer's feature map.
Specifically, the method comprises the following steps:
(1) determine the distribution type and distribution parameters from the information-distribution-model output of each layer's feature map;
(2) for each convolution kernel of the next layer, generate a grayscale image with the same distribution and parameters as in step (1);
(3) convolve the grayscale image with each convolution kernel of the next layer, obtaining for each kernel a new feature map that represents that kernel's feature-extraction capability under the distribution;
(4) compute the image entropy of each new feature map; the entropy corresponds to the similarity: the larger the entropy, the higher the similarity.
The image entropy H is computed as

$$H = -\sum_{i=0}^{255} q_i \log_2 q_i, \qquad q_i = \frac{n_i}{n},$$

where $n_i$ is the number of pixels whose gray value is $i$ and $n$ is the total number of image pixels.
Step eight: for each segmented network that has finished training, cut off in turn the low-similarity neuron branches computed in step seven, according to the set pruning rate and the similarity; a sketch of the channel selection is given below.
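A small sketch of this channel selection; returning a boolean keep-mask is our simplification of physically removing the branches:

```python
import numpy as np

def prune_mask(similarities, prune_rate):
    """Boolean keep-mask over channels; False marks the channels to cut."""
    n_cut = int(len(similarities) * prune_rate)
    order = np.argsort(similarities)        # ascending: least similar first
    mask = np.ones(len(similarities), dtype=bool)
    mask[order[:n_cut]] = False
    return mask
```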
Step nine: repeat steps six to eight to complete the pruning of the whole convolutional neural network and remove the redundant parts, then test the pruned convolutional neural network on the test set, obtaining the experimental results shown in Tables 1 and 2.
For better comparison of experimental results, this embodiment introduces two existing pruning methods. FPGM, proposed in reference [6], is a filter pruning algorithm that removes redundant filters based on the geometric median, rather than removing relatively unimportant filters at a manually set pruning rate. Reference [7] proposes TAS, a transformable architecture search model that prunes the network through channel probability distributions and knowledge transfer.
TABLE 1 comparison of the results of different pruning methods on CIFAR-10
[Table 1 data not reproduced in this text version.]
TABLE 2 comparison of the results of different pruning methods on CIFAR-100
[Table 2 data not reproduced in this text version.]
Tables 1 and 2 show that, at similar pruning rates, the method of the invention achieves better accuracy. Raising the pruning rate substantially does reduce accuracy, but the drop is more moderate than with the existing methods, and the result still meets the requirements of most use cases.
The method maintains test accuracy while pruning, demonstrating its effectiveness. Input images are therefore not forced to follow the same pruned subnet, which greatly improves the expressive capability of the convolutional neural network; whichever dataset is classified, the method adaptively finds the convolution-kernel channels required by each input image, which also aids the study of convolutional neural network mechanisms.
The experimental results further confirm the stage-by-stage optimization characteristic of convolutional neural networks and the importance of stage feature maps for conveying input image information. The more classes a dataset has, the better the expected effect; future work may consider dynamic networks extending from large to small categories and continue exploring the inner workings of convolutional neural networks.
Finally, it is noted that the above-mentioned preferred embodiments illustrate rather than limit the invention, and that, although the invention has been described in detail with reference to the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention as defined by the appended claims.

Claims (6)

1. A channel-adaptive segmented dynamic network pruning method based on feature map response, characterized by comprising the following steps:
S1: build an n-layer convolutional neural network according to the actual application requirements;
S2: divide the n-layer convolutional neural network into two segmented networks, a front-end shallow network and a back-end deep network, according to feature map scale and network abstraction level; the front m layers form the front-end shallow network and mainly extract contour and texture information of the input image; the remaining n-m layers form the back-end deep network and mainly extract high-level semantic characteristics of the input image;
S3: insert a feature-map information distribution response model, formed by a Hough transform and a perceptual hash algorithm connected in series, between the activation function of each front-end shallow network layer and the next convolutional layer;
S4: insert a feature-map information distribution response model, formed by a gray-level co-occurrence matrix and a gray histogram connected in series, between the activation function of each back-end deep network layer and the next convolutional layer;
S5: starting from the input layer, train the whole convolutional neural network on the dataset, following the pruning order of front-end shallow network first, then back-end deep network;
S6: for each trained segmented network, compute the similarity of the convolution channels from the information-distribution-model output of each layer's feature map;
S7: for each trained segmented network, cut off in turn the low-similarity neuron branches obtained in step S6, according to the set pruning rate and the similarity;
S8: repeat steps S5-S7 to complete the pruning of the whole convolutional neural network segment by segment, and apply the pruned convolutional neural network to practical problems.
2. The channel-adaptive segmented dynamic network pruning method based on feature map response according to claim 1, characterized in that the feature map scale in step S2 depends on the sampling operations; the network abstraction level depends on the number of nodes of the convolutional neural network and the connections between them; and the split into a front-end shallow network and a back-end deep network is made by judging whether the feature information extracted by a convolutional layer is concrete contour and texture information or abstract high-level semantic characteristics.
3. The channel-adaptive segmented dynamic network pruning method based on feature map response according to claim 1, characterized in that the concatenated feature-map information distribution response models of steps S3 and S4 are removed after pruning is completed.
4. The channel-adaptive segmented dynamic network pruning method based on feature map response according to claim 1, characterized in that the termination condition of the training in step S5 is: the segmented network under training is determined by the pruning order, and training of that segment is judged complete when, during iterative training, the difference between the feature maps output by the segment's last layer in the current round and in the next round meets the set threshold requirement.
5. The channel-adaptive segmented dynamic network pruning method based on feature map response according to claim 1, characterized in that the similarity in step S6 is computed as follows:
S601: determine the distribution type and distribution parameters from the information-distribution-model output of each layer's feature map;
S602: for each convolution kernel of the next layer, generate a grayscale image with the same distribution and parameters as in S601;
S603: convolve the grayscale image with each convolution kernel of the next layer, obtaining for each kernel a new feature map that represents that kernel's feature-extraction capability under the distribution;
S604: compute the image entropy of each new feature map; the entropy corresponds to the similarity: the larger the entropy, the higher the similarity.
6. The channel-adaptive segmented dynamic network pruning method based on feature map response according to claim 1, characterized in that the front-end shallow network and the back-end deep network of step S2 are each segmented further into several sub-networks according to the multi-scale distribution of output feature map scales, each sub-network being one segmented network; the training and pruning of steps S5-S7 then proceed sub-network by sub-network, first through the front-end shallow network and then through the back-end deep network.
CN202210739093.4A, filed 2022-06-12: Channel-adaptive segmented dynamic network pruning method based on feature map response. Status: Active. Granted as CN115049054B (en).

Priority Applications (1)

Application CN202210739093.4A (granted as CN115049054B), priority date 2022-06-12: Channel-adaptive segmented dynamic network pruning method based on feature map response.

Applications Claiming Priority (1)

Application CN202210739093.4A (granted as CN115049054B), priority date 2022-06-12: Channel-adaptive segmented dynamic network pruning method based on feature map response.

Publications (2)

Publication Number Publication Date
CN115049054A (en) 2022-09-13
CN115049054B CN115049054B (en) 2024-07-02


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115063673A * | 2022-07-29 | 2022-09-16 | Alibaba (China) Co., Ltd. | Model compression method, image processing method and device and cloud equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111931914A * | 2020-08-10 | 2020-11-13 | Beijing Institute of Computer Technology and Application | Convolutional neural network channel pruning method based on model fine tuning
US11030528B1 * | 2020-01-20 | 2021-06-08 | Zhejiang University | Convolutional neural network pruning method based on feature map sparsification
CN113052211A * | 2021-03-11 | 2021-06-29 | Tianjin University | Pruning method based on characteristic rank and channel importance
CN113222138A * | 2021-04-25 | 2021-08-06 | Nanjing University | Convolutional neural network compression method combining layer pruning and channel pruning
CN113743591A * | 2021-09-14 | 2021-12-03 | Beijing University of Posts and Telecommunications | Method and system for automatically pruning convolutional neural network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shao Weiping, Wang Xing, Cao Zhaorui, Bai Fan. Design of lightweight convolutional neural network based on MobileNet and YOLOv3[J]. Journal of Computer Applications, no. 1, 10 July 2020 (2020-07-10) *


Similar Documents

Publication Publication Date Title
CN109035149B (en) License plate image motion blur removing method based on deep learning
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
CN106529447B (en) Method for identifying face of thumbnail
CN108830330B (en) Multispectral image classification method based on self-adaptive feature fusion residual error network
CN110660062A (en) Point cloud instance segmentation method and system based on PointNet
CN112257766B (en) Shadow recognition detection method in natural scene based on frequency domain filtering processing
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN109740679B (en) Target identification method based on convolutional neural network and naive Bayes
CN111222457B (en) Detection method for identifying authenticity of video based on depth separable convolution
CN111695513B (en) Facial expression recognition method based on depth residual error network
CN111339924B (en) Polarized SAR image classification method based on superpixel and full convolution network
CN113989890A (en) Face expression recognition method based on multi-channel fusion and lightweight neural network
CN113488060B (en) Voiceprint recognition method and system based on variation information bottleneck
CN112967210B (en) Unmanned aerial vehicle image denoising method based on full convolution twin network
CN110781912A (en) Image classification method based on channel expansion inverse convolution neural network
CN114926680B (en) Malicious software classification method and system based on AlexNet network model
CN111815526B (en) Rain image rainstrip removing method and system based on image filtering and CNN
CN111027570B (en) Image multi-scale feature extraction method based on cellular neural network
CN111860834B (en) Neural network tuning method, system, terminal and storage medium
CN111694977A (en) Vehicle image retrieval method based on data enhancement
CN115761356A (en) Image recognition method and device, electronic equipment and storage medium
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
CN108710836B (en) Lip detection and reading method based on cascade feature extraction
CN113361589A (en) Rare or endangered plant leaf identification method based on transfer learning and knowledge distillation
CN112263224A (en) Medical information processing method based on FPGA edge calculation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant