CN113361702A - Convolutional neural network pruning, reasoning method, device and computer readable medium - Google Patents


Info

Publication number
CN113361702A
CN113361702A
Authority
CN
China
Prior art keywords
convolution
layer
neural network
convolution kernels
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010148557.5A
Other languages
Chinese (zh)
Inventor
涂小兵
薛盛可
李春强
尚云海
劳懋元
张伟丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010148557.5A priority Critical patent/CN113361702A/en
Publication of CN113361702A publication Critical patent/CN113361702A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a pruning and inference scheme for a convolutional neural network. The scheme determines a significance metric value for each group of convolution kernels in the i-th convolutional layer of a convolutional neural network model, prunes at least one group of convolution kernels with the smallest significance metric value in the i-th layer, and updates the feature map and convolution kernels of the (i + 1)-th convolutional layer based on the pruned i-th layer. Because the significance metric value represents the degree to which a group of convolution kernels influences whether the model obtains correct results, pruning the group with the smallest value, together with the associated kernels of the next layer, reduces the amount of computation of the convolutional layers during inference while having little effect on the model's recognition results. Moreover, because the scheme involves only software-level improvements, it lowers the demand on hardware resources and reduces the impact of hardware limitations on the performance of the convolutional neural network.

Description

Convolutional neural network pruning, reasoning method, device and computer readable medium
Technical Field
The present application relates to the field of information technology, and in particular, to a convolutional neural network pruning method, a convolutional neural network reasoning method, a convolutional neural network pruning device, and a convolutional neural network reasoning device.
Background
In recent years, deep learning has achieved great success in artificial intelligence applications such as computer vision, speech recognition, and natural language processing. The convolutional neural network (CNN) is a representative algorithm in these fields, and in computer vision especially it has benefited from ever deeper network structures. At the same time, however, the ever-deepening network structure also requires ever more computing resources for training and inference of network models. Because schemes that reduce the resource overhead of convolutional neural networks at the software level have been lacking, convolutional neural networks have required costly hardware support, which greatly limits their application scenarios and hinders their use in environments with limited hardware resources.
Summary of the application
An object of the present application is to provide a convolutional neural network pruning scheme and an inference scheme based on the pruned convolutional neural network, so as to address the high demand that convolutional neural networks place on computing resources.
The embodiment of the application provides a convolutional neural network pruning method, which comprises the following steps:
determining a significance metric value for each group of convolution kernels in the i-th convolutional layer of a convolutional neural network model, wherein the significance metric value represents the degree to which a group of convolution kernels influences whether the convolutional neural network model obtains correct results, and i is a positive integer;
pruning at least one group of convolution kernels with the minimum significance metric value in the ith convolution layer;
and updating the feature map and convolution kernels of the (i + 1)-th convolutional layer based on the pruned i-th convolutional layer.
The embodiment of the application provides an inference method based on a convolutional neural network, which comprises the following steps:
acquiring a convolutional neural network model, wherein the model is one that has been pruned by the aforementioned convolutional neural network pruning method and then retrained;
and performing inference on input data using the convolutional neural network model to obtain output data.
The embodiment of the present application further provides a convolutional neural network pruning device, including:
a calculation processing module, configured to determine a significance metric value for each group of convolution kernels in the i-th convolutional layer of a convolutional neural network model, wherein the significance metric value represents the degree to which a group of convolution kernels influences whether the model obtains correct results, and i is a positive integer;
and a pruning processing module, configured to prune at least one group of convolution kernels with the smallest significance metric value in the i-th convolutional layer, and to update the feature map and convolution kernels of the (i + 1)-th convolutional layer based on the pruned i-th layer.
The embodiment of the application also provides an inference device based on the convolutional neural network, the device comprising:
a model acquisition module, configured to acquire a convolutional neural network model that has been pruned by the convolutional neural network pruning device and then retrained;
and an inference module, configured to perform inference on input data using the convolutional neural network model to obtain output data.
Some embodiments of the present application also provide a computing device, wherein the device comprises a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform the aforementioned convolutional neural network pruning method.
Still other embodiments of the present application provide a computer readable medium having computer program instructions stored thereon that are executable by a processor to implement the convolutional neural network pruning method.
In the pruning scheme for the convolutional neural network provided in the embodiment of the application, a significance metric value is determined for each group of convolution kernels in the i-th convolutional layer of the convolutional neural network model; at least one group of convolution kernels with the smallest significance metric value in the i-th layer is then pruned, and the feature map and convolution kernels of the (i + 1)-th convolutional layer are updated based on the pruned i-th layer. The pruned model is retrained, after which inference on input data yields output data. Because the significance metric value represents the degree to which a group of convolution kernels influences whether the model obtains correct results, pruning the group with the smallest value, together with the associated kernels of the next layer, reduces the amount of computation of the convolutional layers during inference while having little effect on the model's recognition results. Moreover, because the scheme involves only software-level improvements, it lowers the demand on hardware resources and reduces the impact of hardware limitations on the performance of the convolutional neural network.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a processing flow chart of a convolutional neural network pruning method provided in an embodiment of the present application;
FIG. 2 is a diagram illustrating an association between convolutional layers in a convolutional neural network model according to an embodiment of the present disclosure;
FIG. 3 is a diagram illustrating exemplary convolution kernels in the ith convolutional layer in the embodiment of the present application;
FIG. 4 is a diagram illustrating an update of an i +1 th layer based on a pruning process of the i-th layer according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a convolutional neural network pruning device provided in the present application;
fig. 6 is a schematic structural diagram of a computing device for pruning a convolutional neural network according to an embodiment of the present disclosure;
the same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal and the devices serving the network each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM), in the form of a computer-readable medium. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The embodiment of the application provides a convolutional neural network pruning method. Because the significance metric value represents the degree to which a group of convolution kernels influences whether the convolutional neural network model obtains correct results, pruning the group with the smallest value, together with the associated kernels of the next layer, reduces the amount of computation of the convolutional layers during inference while having little effect on the model's recognition results. Moreover, because the scheme involves only software-level improvements, it lowers the demand on hardware resources and reduces the impact of hardware limitations on the performance of the convolutional neural network.
In an actual scenario, the execution subject of the method may be user equipment, a network device, a device formed by integrating the two through a network, or a program running in any of these devices. The user equipment includes, but is not limited to, terminal devices such as computers, mobile phones, and tablets; the network device includes, but is not limited to, implementations such as a network host, a single network server, a set of multiple network servers, or a cloud-computing-based collection of computers. Here, the cloud is made up of a large number of hosts or web servers based on cloud computing, a type of distributed computing in which one virtual computer consists of a collection of loosely coupled computers.
Fig. 1 shows a processing flow of a convolutional neural network pruning method provided in an embodiment of the present application, where the method at least includes the following processing steps:
step S101, determining significance measurement values of each group of convolution kernels in the ith convolution layer in the convolution neural network model.
A convolutional neural network model may include multiple convolutional layers. The input to a convolutional layer is that layer's feature maps (feature map), each layer containing feature maps for at least one channel. After the feature maps are convolved with the layer's convolution kernels, the result serves as the feature maps of the next convolutional layer, and so on, until the output of the last convolutional layer is fed to the fully connected layer to obtain the recognition result of the network model.
Fig. 2 shows the association between convolutional layers in a convolutional neural network model in an embodiment of the present application. For the i-th convolutional layer, the input is a set of feature maps FM_1 with 4 channels, that is, feature maps FM_101, FM_102, FM_103, and FM_104 corresponding to 4 different channels. The number of channels of a layer's convolution kernels must equal the number of channels of the layer's input feature maps, so each group of convolution kernels in the i-th layer also has 4 channels. In this embodiment, the feature map FM_2 of the (i + 1)-th convolutional layer has 5 channels, FM_201, FM_202, FM_203, FM_204, and FM_205, so the i-th layer has 5 groups of convolution kernels, denoted Filter11, Filter12, Filter13, Filter14, and Filter15; Fig. 3 shows a specific schematic diagram of the convolution kernels in the i-th convolutional layer.
Thus, the convolution kernels of the i-th layer can be represented as a 4 × 5 convolution kernel matrix, where 4 is the number of channels and 5 is the number of groups, 20 kernels in total. The size of each kernel can be set according to the requirements of the actual scenario, for example 3 × 3 or 5 × 5. During convolution, the input feature maps FM_101, FM_102, FM_103, and FM_104 are each convolved with the kernel of the corresponding channel in each group. Taking the first group as an example, the kernels Filter111, Filter112, Filter113, and Filter114 are convolved with FM_101, FM_102, FM_103, and FM_104 respectively; merging the results yields FM_201, the first-channel feature map of the (i + 1)-th layer, and the feature maps of the other channels of the (i + 1)-th layer are obtained in the same way. For example, with a 5 × 5 feature map, a 3 × 3 convolution kernel, and a stride of 1, the calculation result is 3 × 3.
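The shape arithmetic in the example above can be sketched in code. The following is a minimal sketch of a stride-1, no-padding convolution of one feature-map channel with one kernel; the function and variable names are illustrative, not from the patent:

```python
def conv2d_valid(fmap, kernel):
    """Stride-1, no-padding 2D convolution of one feature-map channel
    with one convolution kernel (both given as nested lists)."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(fmap) - kh + 1      # output height
    ow = len(fmap[0]) - kw + 1   # output width
    return [[sum(fmap[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]

fmap = [[1.0] * 5 for _ in range(5)]    # 5 x 5 feature map
kernel = [[1.0] * 3 for _ in range(3)]  # 3 x 3 convolution kernel
out = conv2d_valid(fmap, kernel)
print(len(out), len(out[0]))            # 3 3, matching the text's example
```

In the full model, the per-channel results of one group are summed ("merged") to produce one output-channel feature map, as described above.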
In an actual scenario, activation functions and pooling layers can be inserted between convolutional layers to add nonlinearity and reduce the amount of data processed, improving the processing performance of the convolutional neural network. For example, in the scenario above, the output of the i-th layer's convolution can be passed through an activation function and a pooling layer, and the nonlinearly mapped and pooled result used as the feature map of the (i + 1)-th layer. Since activation and pooling are unrelated to the principle of the pruning scheme, they are only briefly mentioned in this embodiment for convenience of description.
For a group of convolution kernels, the significance metric value represents how much that group influences whether the convolutional neural network model obtains correct results. The larger the value, the greater the group's influence on obtaining correct results and the higher its importance to the model; conversely, the smaller the value, the smaller the influence and the lower the importance.
In an actual scenario, the significance metric of a group of convolution kernels can be evaluated in various ways. In some embodiments of the present application, the L1 norm or L2 norm of a group's weights is used as its significance metric: the weight values of each group of convolution kernels in the i-th layer are obtained, the L1 or L2 norm of each group is computed from those weights, and that norm is taken as the group's significance metric. For example, in this embodiment, the specific weight values of the first group of convolution kernels of the i-th layer are as follows:
[Weight matrices of the first group of convolution kernels; shown as images in the original document.]
When the L1 norm is used as the significance metric of each group of convolution kernels, the L1 norm of the first group is calculated as:
L1 Norm = 3 + 3 + 6 + 6 + 9 = 27
After the L1 norms of the other four groups are calculated in the same way, each group's L1 norm serves as its significance metric.
If the L2 norm is used as the significance metric of each group of convolution kernels, the L2 norm of the first group is calculated as:
[L2 norm calculation; shown as an image in the original document. Per the example in the pruning step below, the resulting value for the first group is 6.245.]
After the L2 norms of the other four groups are calculated in the same way, each group's L2 norm serves as its significance metric.
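Both norms can be computed directly from a group's weights. A minimal sketch follows; the weight values are hypothetical, since the actual matrices appear only as images in the original:

```python
import math

def l1_norm(group):
    """L1 norm: sum of absolute weight values over every kernel in the group."""
    return sum(abs(w) for kernel in group for row in kernel for w in row)

def l2_norm(group):
    """L2 norm: square root of the sum of squared weights over the group."""
    return math.sqrt(sum(w * w for kernel in group for row in kernel for w in row))

# Hypothetical group of two 2 x 2 kernels (one per input channel):
group = [[[1.0, -2.0], [0.0, 3.0]],
         [[2.0, 0.0], [-1.0, 1.0]]]
print(l1_norm(group))  # 10.0
print(l2_norm(group))  # sqrt(1 + 4 + 0 + 9 + 4 + 0 + 1 + 1) = sqrt(20) ≈ 4.472
```

Either norm can then serve as the group's significance metric, as described above.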
In other embodiments of the present application, a preset loss function value may instead be used as the significance metric: the loss function value of each group of convolution kernels in the i-th convolutional layer is calculated according to the preset loss function, and that value is taken as the group's significance metric.
Step S102, pruning at least one group of convolution kernels with the smallest significance metric value in the i-th convolutional layer. After the significance metric value of each group in the i-th layer has been determined, the influence of each group on the convolutional neural network model can be judged from these values, and groups with little influence can be selected for pruning. The convolutional layer then no longer spends computation on those groups during convolution, reducing the demand of the whole network on computing resources.
In some embodiments of the present application, the convolution kernels in the ith convolution layer may be sorted according to the significance metric value, and then at least one group of convolution kernels with the smallest significance metric value may be determined as a target convolution kernel according to the sorting result, and the target convolution kernel may be pruned.
The number of groups pruned in each pass can be set according to the requirements of the actual scenario. For example, a preset value can be chosen based on the original number of groups: when there are few groups, the preset value can be 1, i.e., only the group with the smallest significance metric is pruned each time; when there are many groups, a larger preset value can be used to remove more computation. Taking the scenario shown in Fig. 2 as an example, if each group's L2 norm is used as its significance metric, the 5 computed values are: Filter11 = 6.245, Filter12 = 22.711, Filter13 = 29.821, Filter14 = 5.7, and Filter15 = 31.2. The sorted order is: Filter15 > Filter13 > Filter12 > Filter11 > Filter14. Thus, if one group with the smallest significance metric is pruned, the fourth group Filter14 is removed; if two groups with the smallest significance metrics are pruned, the fourth group Filter14 and the first group Filter11 are removed.
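The selection step is a sort over the per-group metrics. The sketch below reuses the five L2 norms from the example in the text; the helper name is illustrative:

```python
def select_prune_targets(saliency, n):
    """Return the indices of the n groups with the smallest
    significance metric values (the pruning targets)."""
    return sorted(range(len(saliency)), key=lambda k: saliency[k])[:n]

# L2 norms per group of convolution kernels, from the example in the text:
names = ["Filter11", "Filter12", "Filter13", "Filter14", "Filter15"]
saliency = [6.245, 22.711, 29.821, 5.7, 31.2]

targets = [names[k] for k in select_prune_targets(saliency, 2)]
print(targets)  # ['Filter14', 'Filter11'], matching the text
```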
In addition, in other embodiments of the present application, when determining the target convolution kernels from the sorted results, a significance curve of the i-th layer's convolution kernels can be plotted from the sorted metric values; the number of groups to prune is then determined from the steepness of this curve, and the N groups with the smallest significance metric values in the i-th layer are taken as the target convolution kernels, where N is that pruning number. Steepness can be judged from the slope of the significance curve, for example via evaluation values such as the mean, variance, or standard deviation of the slopes: the larger the evaluation value, the steeper the curve and the more groups the convolutional layer can prune; conversely, the flatter the curve, the fewer groups should be pruned.
Step S103, updating the feature map and convolution kernels of the (i + 1)-th convolutional layer based on the pruned i-th convolutional layer. Because adjacent convolutional layers are associated, pruning the convolution kernels of the i-th layer also changes the (i + 1)-th layer: its feature map and convolution kernels can be updated accordingly based on the pruning of the previous layer, so that the amount of computation of the (i + 1)-th layer is reduced as well.
Taking the scenario shown in Fig. 4 as an example, before pruning, convolving the feature maps of the four channels of the i-th layer with the fourth group of convolution kernels produces the feature map FM_204 of the 4th channel in the (i + 1)-th layer. Therefore, after the fourth group Filter14 is pruned in the i-th layer, FM_204 can be pruned accordingly. Further, since FM_204 is consumed during the (i + 1)-th layer's convolution by the kernel of the associated channel (i.e., the 4th channel) in each group of that layer's convolution kernels, those associated-channel kernels can also be pruned. In this embodiment, updating the feature map and convolution kernels of the (i + 1)-th layer based on the pruned i-th layer therefore means pruning the associated feature map and the associated-channel kernels of the (i + 1)-th layer, which expands the pruning range and further reduces the amount of computation without affecting the performance of the convolutional neural network.
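The cross-layer update can be sketched as index removal over nested lists (group → channel → kernel). Note that this sketch physically removes the pruned group and the dangling channel, whereas the zero-masking embodiment described later keeps all shapes and zeroes the weights instead; the function name and label strings are illustrative:

```python
def prune_group(layer_i, layer_i1, g):
    """Remove filter group g from layer i, then drop the now-dangling
    input channel g from every filter group of layer i+1."""
    pruned_i = layer_i[:g] + layer_i[g + 1:]                  # prune group g
    pruned_i1 = [grp[:g] + grp[g + 1:] for grp in layer_i1]   # prune channel g
    return pruned_i, pruned_i1

# Layer i: 5 groups x 4 channels; layer i+1: 5 groups x 5 channels,
# mirroring the Fig. 2/Fig. 4 example (kernel contents elided as labels).
layer_i = [[f"F1{g}{c}" for c in range(1, 5)] for g in range(1, 6)]
layer_i1 = [[f"F2{g}{c}" for c in range(1, 6)] for g in range(1, 6)]

li, li1 = prune_group(layer_i, layer_i1, 3)   # prune Filter14 (index 3)
print(len(li), len(li1[0]))                   # 4 4
```

After the call, layer i has 4 groups left and every group of layer i+1 has lost its 4th-channel kernel (Filter214, Filter224, ..., Filter254 in the figure's labeling).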
In an actual scenario, i is a positive integer indexing the convolutional layers, so every convolutional layer can be traversed and pruned accordingly, completing the optimization of the entire network model.
Because the significance metric value represents the degree to which a group of convolution kernels influences whether the model obtains correct results, pruning the group with the smallest value, together with the associated kernels of the next layer, reduces the amount of computation of the convolutional layers while having little effect on the model's recognition results; and because the scheme involves only software-level improvements, it lowers the demand on hardware resources and reduces the impact of hardware limitations on the performance of the convolutional neural network. In addition, after pruning is completed, the pruned convolutional neural network model can be retrained and its parameters updated so that recognition accuracy is restored.
In some embodiments of the present application, pruning at least one group of convolution kernels with the smallest significance metric value in the i-th layer can be implemented as follows: set the weight values of those groups to zero, and do not update the zeroed weights during training. The weights of those kernels thus remain zero throughout, achieving the pruning effect. For example, for the first group of convolution kernels of the i-th layer, the specific weight values are as follows:
[Weight matrices of the first group of convolution kernels; shown as images in the original document.]
if pruning is performed on the set of convolution kernels, the weight values are updated as follows:
[The same weight matrices with all values set to zero; shown as images in the original document.]
for the scenario shown in fig. 4, the weight values of the convolution kernels of each channel in the ith and fourth sets of convolution kernels (i.e., the convolution kernels labeled "x" in the ith layer) are all set to zero.
When the feature map and the convolution kernel of the i +1 th convolutional layer are updated based on the pruned i th convolutional layer, convolution calculation may be performed based on the pruned i th convolutional layer to obtain a feature map of the i +1 th convolutional layer, where a value of the associated feature map of the i +1 th convolutional layer is zero, and the associated feature map is a feature map of a channel corresponding to the pruned convolution kernel in the i th convolutional layer. Taking the scenario shown in fig. 4 as an example, the pruned convolution kernels in the i-th convolution layer are the fourth set of convolution kernels Filter14, and the feature map of the corresponding channel in the i + 1-th layer is the feature map FM _204 of the 4-th channel, so the associated feature map is FM _ 204. Since the weight values of the convolution kernels of each channel in the ith and fourth sets of convolution kernels are all zero, and the corresponding output results are also zero, all the associated feature maps FM _204 (i.e., feature maps marked with "x" in the map) of the (i + 1) th convolution layer are also zero.
Meanwhile, for the convolution kernels in the (i+1)-th convolutional layer, the weight values of the convolution kernels of the associated channel in each set of convolution kernels of the (i+1)-th layer may be set to zero, and these zero weight values are not updated during training. Taking the scenario shown in fig. 4 as an example, among the convolution kernels of the (i+1)-th layer, the convolution kernel of the 4th channel in each set performs the convolution operation with the associated feature map FM_204, so the associated channel in each set of convolution kernels of the (i+1)-th layer is the 4th channel; the convolution kernels of the associated channel are then Filter214, Filter224, Filter234, Filter244 and Filter254. Since these kernels are convolved with the associated feature map FM_204, whose content is zero, the result of the convolution operation is zero no matter how their weight values change. Therefore, the weight values of the convolution kernels of the associated channel (i.e., the convolution kernels marked with "x" in the (i+1)-th layer in the figure) can be set to zero directly to reduce the amount of calculation.
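The zero-masking across two adjacent layers described above can be sketched as follows. This is an illustration, not the patent's implementation; the array layout (one set of kernels per output channel, stored as a 4-D array) and the function name are assumptions.

```python
import numpy as np

def prune_group(w_i, w_next, group):
    """Zero-mask pruning across two adjacent convolutional layers.

    w_i:    kernels of layer i, shape (sets_i, channels_i, k, k)
    w_next: kernels of layer i+1, shape (sets_next, sets_i, k, k);
            its channel count equals the number of sets in layer i.
    Returns boolean masks marking the weights that must stay zero
    (i.e., be excluded from updates) during retraining.
    """
    w_i[group] = 0.0         # pruned set: its output feature map becomes zero
    w_next[:, group] = 0.0   # associated channel in every set of layer i+1
    mask_i = np.zeros_like(w_i, dtype=bool)
    mask_i[group] = True
    mask_next = np.zeros_like(w_next, dtype=bool)
    mask_next[:, group] = True
    return mask_i, mask_next

# scenario of fig. 4: layer i has 5 sets of 4-channel 3x3 kernels,
# layer i+1 has 5 sets of 5-channel kernels; prune the fourth set (index 3)
rng = np.random.default_rng(0)
w_i = rng.normal(size=(5, 4, 3, 3))
w_next = rng.normal(size=(5, 5, 3, 3))
masks = prune_group(w_i, w_next, 3)
```

During retraining, a framework would use the returned masks to keep the pruned weights at zero, matching the "zero weight values are not updated" behavior described above.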
Furthermore, in some embodiments of the present application, the entire pruning process may adopt a globally greedy pruning strategy. "Global" means that all the convolutional layers are pruned by traversing them once, and only then is the pruned convolutional neural network model retrained to restore its recognition accuracy. "Greedy" means that, immediately after the convolution kernels of one convolutional layer are pruned according to the significance metric values, the weight values of the convolution kernels in the next convolutional layer are updated: the weight values of the convolution kernels of the associated channel in each set of convolution kernels of the next layer are set to zero, so that the zeroed convolution kernels no longer influence the processing of the convolutional neural network model.
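The globally greedy traversal can be sketched as below. This is a hedged illustration under the assumption that each layer's kernels are stored as a (sets, channels, k, k) array and that the channel count of layer i+1 equals the set count of layer i; the L1 norm is used here as the significance metric.

```python
import numpy as np

def globally_greedy_prune(layers, sets_per_layer=1):
    """Walk the layers once ("global"), prune the weakest sets in each
    layer, and immediately zero the associated channel of the next
    layer ("greedy"). Retraining happens once, after the traversal."""
    pruned = []
    for i in range(len(layers) - 1):
        sig = np.abs(layers[i]).sum(axis=(1, 2, 3))   # L1 norm per set
        for g in np.argsort(sig)[:sets_per_layer]:
            layers[i][g] = 0.0          # prune the weakest set in layer i
            layers[i + 1][:, g] = 0.0   # zero the associated channel now
            pruned.append((i, int(g)))
    # after the full traversal, the model would be retrained once
    return pruned
```

The single retraining pass at the end, rather than one per layer, is what distinguishes this global strategy from layer-by-layer fine-tuning.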
By applying the above convolutional neural network pruning method, the complexity of processing a neural network model can be reduced, so that the computation overhead of inference based on the convolutional neural network is reduced. Accordingly, an embodiment of the present application also provides a convolutional-neural-network-based inference method. The method obtains a convolutional neural network model that has been pruned by the above pruning method, so that the convolutional layers in the model are effectively compressed, and that has been retrained after pruning. The pruned and retrained convolutional neural network is then used to perform inference on input data, so that output data can be obtained quickly. For example, in a practical scenario, the input data may be a picture to be recognized and the output data may be the category information of the picture; inference on the picture with the convolutional neural network is then the process of recognizing the picture to obtain its category information. Because the convolutional layers are compressed by pruning, the amount of calculation in the inference process is significantly reduced, which improves the efficiency of inference based on the convolutional neural network.
Based on the same inventive concept, an embodiment of the present application also provides a convolutional neural network pruning device. The method corresponding to the device is the convolutional neural network pruning method of the foregoing embodiments, and it solves the problem on a similar principle.
In the convolutional neural network pruning device provided by the embodiment of the application, the significance metric value represents the degree to which a convolution kernel influences the convolutional neural network model's ability to obtain correct results. After the convolution kernels with the smallest significance metric value and the associated convolution kernels of the next layer are pruned, the amount of calculation in the convolutional layers can be reduced with little impact on the recognition results of the model. Moreover, because the scheme involves improvement only at the software level, it lowers the demand on hardware resources and reduces the impact of hardware limitations on the performance of the convolutional neural network.
In a practical scenario, the convolutional neural network pruning device may be a user device, a network device, or a device formed by integrating a user device and a network device through a network, and may also be a program running in such a device. The user equipment includes but is not limited to various terminal devices such as computers, mobile phones and tablet computers; the network device includes but is not limited to implementations such as a network host, a single network server, a set of multiple network servers, or a cloud-computing-based collection of computers. Here, the cloud is made up of a large number of hosts or network servers based on cloud computing, which is a type of distributed computing: one virtual computer consisting of a collection of loosely coupled computers.
Fig. 5 shows the structure of a convolutional neural network pruning device provided in an embodiment of the present application, which includes a calculation processing module 510 and a pruning processing module 520. The calculation processing module 510 is configured to determine the significance metric value of each set of convolution kernels in the i-th convolutional layer of the convolutional neural network model; the pruning processing module 520 is configured to prune at least one set of convolution kernels with the smallest significance metric value in the i-th convolutional layer, and to update the feature maps and convolution kernels of the (i+1)-th convolutional layer based on the pruned i-th convolutional layer.
A convolutional neural network model may include multiple convolutional layers. For one convolutional layer, its input is a set of feature maps of that layer, containing the feature map of at least one channel. After the feature maps are convolved with the convolution kernels of the layer, the calculation result serves as the feature maps of the next convolutional layer, and so on, until the output of the last convolutional layer is used as the input of the fully connected layer to obtain the recognition result of the network model.
Fig. 2 shows the association between convolutional layers in a convolutional neural network model in an embodiment of the present application. The input of the i-th convolutional layer is a set of feature maps FM_1 whose number of channels is 4, that is, it contains the feature maps of 4 different channels, namely FM_101, FM_102, FM_103 and FM_104. The number of channels of the convolution kernels of a convolutional layer must equal the number of channels of the feature maps input to that layer, so the number of channels of each set of convolution kernels of the i-th convolutional layer is also 4. In the embodiment of the present application, the number of channels of the feature map FM_2 of the (i+1)-th convolutional layer is 5, namely FM_201, FM_202, FM_203, FM_204 and FM_205, so the number of sets of convolution kernels of the i-th convolutional layer is also 5, which can be denoted Filter11, Filter12, Filter13, Filter14 and Filter15. Fig. 3 shows a specific schematic diagram of the convolution kernels in the i-th convolutional layer.
Thus, the convolution kernels of the i-th convolutional layer can be represented as a 4 × 5 convolution kernel matrix, where 4 is the number of channels and 5 is the number of sets of convolution kernels, for a total of 20 convolution kernels. The size of each convolution kernel can be set according to the requirements of the actual scenario, for example 3 × 3 or 5 × 5. During convolution calculation, the input feature maps FM_101, FM_102, FM_103 and FM_104 are each convolved with the convolution kernel of the corresponding channel in each set of convolution kernels. Taking the first set as an example, the convolution kernels Filter111, Filter112, Filter113 and Filter114 are convolved with the feature maps FM_101, FM_102, FM_103 and FM_104 respectively, and after the results are merged, the feature map FM_201 of the first channel of the (i+1)-th convolutional layer is obtained; the feature maps of the other channels of the (i+1)-th convolutional layer are obtained in turn in the same way. Fig. 3 shows the specific process of this convolution calculation: assuming the feature map size is 5 × 5, the convolution kernel size is 3 × 3, and the stride is set to 1, the size of the calculation result is 3 × 3.
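The 3 × 3 result size follows from the standard valid-convolution output formula, output = (input + 2·padding − kernel)/stride + 1. A minimal check (not from the patent):

```python
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    """Spatial size of a convolution output (integer division)."""
    return (in_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(5, 3, stride=1))  # -> 3, matching the example above
```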
In a practical scenario, components such as activation functions and pooling layers can be added between the convolutional layers, so as to introduce nonlinearity, reduce the amount of data to be processed, and improve the processing performance of the convolutional neural network. For example, in the scenario above, the output of the convolution calculation of the i-th convolutional layer may pass through an activation function and a pooling layer, and the result after nonlinear mapping and pooling is output as the feature maps of the (i+1)-th layer. Since the processing of activation functions and pooling layers is not related to the principle of the pruning scheme, it is only described briefly here for convenience.
For a set of convolution kernels, the significance metric value represents the degree to which that set influences the convolutional neural network model's ability to obtain correct results. The larger the significance metric value, the greater the influence of the set of convolution kernels on obtaining correct results, and the more important the set is to the model; conversely, the smaller the significance metric value, the smaller the influence and the lower the importance of the set to the model.
In a practical scenario, the specific evaluation criterion for the significance metric value of a convolution kernel can be implemented in various ways. In some embodiments of the present application, the L1 norm or L2 norm of the convolution kernel weights is used as the significance metric value. When determining the significance metric value of each set of convolution kernels in the i-th convolutional layer of the convolutional neural network model, the weight values of each set of convolution kernels in the i-th convolutional layer are first obtained; the L1 norm or L2 norm of each set is then calculated from those weight values, and that norm is taken as the set's significance metric value. For example, in the embodiment of the present application, the specific weight values of the first set of convolution kernels of the i-th convolutional layer are as follows:
[Equation images in the original patent, not recoverable from this text extraction: the per-channel weight matrices of the first set of convolution kernels used in the norm examples below.]
When the L1 norm is used as the significance metric value of each set of convolution kernels, the L1 norm of the first set of convolution kernels is calculated as:

L1 Norm = 3 + 3 + 6 + 6 + 9 = 27
The L1 norms of the other four sets can be calculated in the same manner, and the L1 norm of each set of convolution kernels can then be used as its significance metric value.
If the L2 norm is used as the significance metric value of each set of convolution kernels, the L2 norm of the first set of convolution kernels is calculated as:
[Equation image in the original patent: the L2 norm calculation for the first set of convolution kernels, i.e., the square root of the sum of the squared weight values.]
The L2 norms of the other four sets can be calculated in the same manner, and the L2 norm of each set of convolution kernels can then be used as its significance metric value.
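The per-set L1/L2 computation can be sketched as follows. This is an illustration, not the patent's code; the array layout and function name are assumptions. The example set is constructed so that its absolute weight values sum to 27, matching the L1 example above.

```python
import numpy as np

def significance(layer_weights, norm="l1"):
    """L1 or L2 norm of each set of convolution kernels.

    layer_weights: shape (sets, channels, k, k); the norm is taken
    over all weights belonging to one set.
    """
    axes = (1, 2, 3)
    if norm == "l1":
        return np.abs(layer_weights).sum(axis=axes)
    return np.sqrt(np.square(layer_weights).sum(axis=axes))

# a set whose absolute weight values are 3, 3, 6, 6, 9 (sum 27),
# as in the L1 example above; the placement of the values is arbitrary
w = np.zeros((1, 4, 3, 3))
w[0, 0, 0, :3] = [3, 3, 6]
w[0, 1, 0, :2] = [6, 9]
print(significance(w, "l1"))  # -> [27.]
```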
In other embodiments of the present application, a preset loss function value may also be used as the significance metric value of the convolution kernels. In that case, when determining the significance metric value of each set of convolution kernels in the i-th convolutional layer of the convolutional neural network model, the loss function value of each set of convolution kernels is calculated according to the preset loss function, and that loss function value is taken as the set's significance metric value.
After the calculation processing module 510 determines the significance metric value of each set of convolution kernels in the i-th convolutional layer, the pruning processing module 520 can determine, based on those values, the degree to which each set influences the convolutional neural network model, and can choose to prune the sets with small influence. As a result, the amount of calculation for those sets is removed when the i-th convolutional layer performs convolution, which lowers the demand of the entire convolutional neural network on computing resources.
In some embodiments of the present application, the pruning processing module may first sort the sets of convolution kernels in the i-th convolutional layer by significance metric value, then determine, from the sorting result, at least one set of convolution kernels with the smallest significance metric value as the target convolution kernels, and perform pruning on the target convolution kernels.
The number of sets of convolution kernels pruned each time can be set according to the requirements of the actual scenario. For example, a preset value can be set according to the original number of sets: when the original number is small, the preset value can be 1, i.e., only the set of convolution kernels with the smallest significance metric value is pruned each time; when the original number is large, the preset value can be larger so as to remove more calculation. Taking the scenario shown in fig. 2 as an example, if the L2 norm of each set of convolution kernels is used as its significance metric value, the 5 significance metric values obtained by calculation are: Filter11 = 6.245, Filter12 = 22.711, Filter13 = 29.821, Filter14 = 5.7 and Filter15 = 31.2. The sorting result is: Filter15 > Filter13 > Filter12 > Filter11 > Filter14. Thus, if one set of convolution kernels with the smallest significance metric value is selected for pruning, the fourth set Filter14 is pruned; if two sets are selected, the fourth set Filter14 and the first set Filter11 are pruned.
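The sort-and-select step can be sketched as follows, reusing the example significance values above (the function name is an assumption):

```python
def select_prune_targets(significance, num_prune):
    """Indices of the num_prune sets with the smallest significance."""
    order = sorted(range(len(significance)), key=lambda g: significance[g])
    return order[:num_prune]

# L2-norm significance values from the text (Filter11 .. Filter15)
sig = [6.245, 22.711, 29.821, 5.7, 31.2]
print(select_prune_targets(sig, 1))  # -> [3]     (Filter14)
print(select_prune_targets(sig, 2))  # -> [3, 0]  (Filter14, Filter11)
```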
In addition, in other embodiments of the present application, when determining at least one set of convolution kernels with the smallest significance metric value as the target convolution kernels according to the sorting result, a significance curve of the convolution kernels in the i-th convolutional layer may first be determined from the sorting result. The number of sets of convolution kernels to be pruned is then determined according to the steepness of the significance curve, and the N sets with the smallest significance metric values in the i-th convolutional layer are determined as the target convolution kernels, where N is the pruning number. When determining the pruning number from the steepness of the significance curve, the steepness can be judged from the slope of the curve. For example, in a practical scenario, the steepness can be evaluated with values such as the mean, variance or standard deviation of the slopes: the larger the evaluation value, the steeper the curve and the more sets of convolution kernels the convolutional layer can prune; conversely, the gentler the curve, the fewer sets it can prune.
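One way to realize the steepness-based choice of N is sketched below. The mapping from steepness to a count (the `scale` and `max_frac` knobs) is an illustrative heuristic of ours; the patent only states that a steeper curve allows pruning more sets.

```python
import numpy as np

def pruning_count(significance, scale=0.1, max_frac=0.5):
    """Choose N from the steepness of the sorted significance curve.

    Steepness is taken here as the mean slope of the sorted values;
    the result is clamped to [1, max_frac * number_of_sets].
    """
    sig = np.sort(np.asarray(significance, dtype=float))
    steepness = np.diff(sig).mean()          # mean slope of the curve
    n = int(round(steepness * scale))
    cap = max(1, int(len(sig) * max_frac))   # never prune most of a layer
    return min(max(n, 1), cap)

sig = [6.245, 22.711, 29.821, 5.7, 31.2]
print(pruning_count(sig))  # -> 1 for these values
```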
After the pruning processing module 520 completes pruning of the convolution kernels with the smallest significance metric value in the i-th convolutional layer, it can update the feature maps and convolution kernels of the (i+1)-th convolutional layer based on the pruned i-th convolutional layer. Because adjacent convolutional layers are correlated, pruning the convolution kernels of the i-th convolutional layer also changes the (i+1)-th convolutional layer accordingly; its feature maps and convolution kernels can be updated based on the pruning of the previous layer, so that the amount of calculation in the (i+1)-th layer is correspondingly reduced as well.
Taking the scenario shown in fig. 4 as an example, before pruning, convolving the feature maps of the four channels of the i-th convolutional layer with the fourth set of convolution kernels outputs the feature map FM_204 of the 4th channel in the (i+1)-th convolutional layer. Therefore, after the fourth set of convolution kernels Filter14 is pruned in the i-th convolutional layer, the feature map FM_204 of the 4th channel in the (i+1)-th convolutional layer can be pruned accordingly. Further, since the feature map FM_204 of the 4th channel is convolved with the convolution kernel of the associated channel (i.e., the 4th channel) in each set of convolution kernels of the (i+1)-th layer when that layer performs convolution, the convolution kernels of the associated channel in each set of the (i+1)-th layer can also be pruned accordingly. Therefore, in the embodiment of the application, updating the feature maps and convolution kernels of the (i+1)-th convolutional layer based on the pruned i-th convolutional layer means pruning the associated feature map and the convolution kernels of the associated channel in the (i+1)-th convolutional layer, which expands the scope of pruning and further reduces the amount of calculation without affecting the performance of the convolutional neural network.
In a practical scenario, i is a positive integer representing the index of a convolutional layer, so each convolutional layer can be traversed and the corresponding pruning processing can be performed on every convolutional layer in the convolutional neural network model, thereby completing the optimization of the whole network model.
Because the significance metric value represents the degree to which a convolution kernel influences the convolutional neural network model's ability to obtain correct results, pruning the convolution kernels with the smallest significance metric value together with the associated convolution kernels of the next layer reduces the amount of calculation in the convolutional layers with little impact on the recognition results of the model. Moreover, because the scheme involves improvement only at the software level, it lowers the demand on hardware resources and reduces the impact of hardware limitations on the performance of the convolutional neural network. In addition, after pruning of the convolutional neural network model is completed, a training module can retrain the pruned model and update its parameters so as to restore the recognition accuracy.
In some embodiments of the present application, the pruning processing module may prune at least one set of convolution kernels with the smallest significance metric value in the i-th convolutional layer as follows: the weight values of the at least one set of convolution kernels with the smallest significance metric value in the i-th convolutional layer are set to zero, and these zero weight values are not updated during training. In other words, the weight values of these convolution kernels remain zero permanently, which achieves the effect of pruning. For example, for the first set of convolution kernels of the i-th convolutional layer, the specific weight values are as follows:
[Equation images in the original patent, not recoverable from this text extraction: the per-channel weight matrices of the first set of convolution kernels.]
If pruning is performed on this set of convolution kernels, the weight values are updated as follows:
[Equation images in the original patent: the same per-channel weight matrices after pruning, with every entry set to zero.]
For the scenario shown in fig. 4, the weight values of the convolution kernels of every channel in the fourth set of convolution kernels of the i-th layer (i.e., the convolution kernels labeled "x" in the i-th layer) are all set to zero.
When the feature maps and convolution kernels of the (i+1)-th convolutional layer are updated based on the pruned i-th convolutional layer, the pruning processing module may perform convolution calculation based on the pruned i-th convolutional layer to obtain the feature maps of the (i+1)-th convolutional layer. Here, the value of the associated feature map of the (i+1)-th convolutional layer is zero, where the associated feature map is the feature map of the channel corresponding to the pruned convolution kernels in the i-th convolutional layer. Taking the scenario shown in fig. 4 as an example, the pruned convolution kernels in the i-th convolutional layer are the fourth set of convolution kernels Filter14, and the feature map of the corresponding channel in the (i+1)-th layer is the feature map FM_204 of the 4th channel, so the associated feature map is FM_204. Since the weight values of the convolution kernels of every channel in the fourth set of convolution kernels of the i-th layer are all zero, the corresponding output is also zero, and the associated feature map FM_204 of the (i+1)-th convolutional layer (i.e., the feature map marked with "x" in the figure) is therefore all zeros.
Meanwhile, for the convolution kernels in the (i+1)-th convolutional layer, the weight values of the convolution kernels of the associated channel in each set of convolution kernels of the (i+1)-th layer may be set to zero, and these zero weight values are not updated during training. Taking the scenario shown in fig. 4 as an example, among the convolution kernels of the (i+1)-th layer, the convolution kernel of the 4th channel in each set performs the convolution operation with the associated feature map FM_204, so the associated channel in each set of convolution kernels of the (i+1)-th layer is the 4th channel; the convolution kernels of the associated channel are then Filter214, Filter224, Filter234, Filter244 and Filter254. Since these kernels are convolved with the associated feature map FM_204, whose content is zero, the result of the convolution operation is zero no matter how their weight values change. Therefore, the weight values of the convolution kernels of the associated channel (i.e., the convolution kernels marked with "x" in the (i+1)-th layer in the figure) can be set to zero directly to reduce the amount of calculation.
Furthermore, in some embodiments of the present application, the entire pruning process may adopt a globally greedy pruning strategy. "Global" means that all the convolutional layers are pruned by traversing them once, and only then is the pruned convolutional neural network model retrained to restore its recognition accuracy. "Greedy" means that, immediately after the convolution kernels of one convolutional layer are pruned according to the significance metric values, the weight values of the convolution kernels in the next convolutional layer are updated: the weight values of the convolution kernels of the associated channel in each set of convolution kernels of the next layer are set to zero, so that the zeroed convolution kernels no longer influence the processing of the convolutional neural network model.
By pruning a convolutional neural network model with the above convolutional neural network pruning device, the complexity of processing the neural network model can be reduced, so that the computation overhead of inference based on the convolutional neural network is reduced. Accordingly, an embodiment of the present application also provides a convolutional-neural-network-based inference device, which may include a model acquisition module and an inference module. The model acquisition module obtains a convolutional neural network model that has been pruned by the convolutional neural network pruning device, so that the convolutional layers in the model are effectively compressed, and that has been retrained after pruning. The inference module uses the pruned and retrained convolutional neural network to perform inference on input data and can obtain output data quickly. For example, in a practical scenario, the input data may be a picture to be recognized and the output data may be the category information of the picture; inference on the picture with the convolutional neural network is then the process of recognizing the picture to obtain its category information. Because the convolutional layers are compressed by pruning, the amount of calculation in the inference process is significantly reduced, which improves the efficiency of inference based on the convolutional neural network.
In addition, part of the present application may be implemented as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide the methods and/or technical solutions of the present application through the operation of that computer. The program instructions invoking the methods of the present application may be stored on fixed or removable recording media, transmitted via a data stream on broadcast or other signal-bearing media, and/or stored in the working memory of a computer device running according to the program instructions. Some embodiments of the present application include a computing device as shown in fig. 6, which includes one or more memories 610 storing computer-readable instructions and a processor 620 for executing them; when the computer-readable instructions are executed by the processor, the device performs the methods and/or technical solutions of the foregoing embodiments of the present application.
Furthermore, some embodiments of the present application also provide a computer readable medium, on which computer program instructions are stored, the computer readable instructions being executable by a processor to implement the methods and/or aspects of the foregoing embodiments of the present application.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In some embodiments, the software programs of the present application may be executed by a processor to implement the above steps or functions. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (20)

1. A convolutional neural network pruning method, wherein the method comprises:
determining significance metric values of each set of convolution kernels in the i-th convolutional layer of a convolutional neural network model, wherein the significance metric value represents the degree to which the convolution kernels influence the convolutional neural network model's ability to obtain correct results, and i is a positive integer;
pruning at least one group of convolution kernels with the minimum significance metric value in the ith convolution layer;
and updating the feature map and the convolution kernels of the (i+1)-th convolutional layer based on the pruned i-th convolutional layer.
2. The method of claim 1, wherein pruning at least one set of convolution kernels with the minimum significance metric value in the i-th convolutional layer comprises:
sorting convolution kernels in the ith convolution layer according to the significance metric value;
determining at least one group of convolution kernels with the minimum significance metric value as target convolution kernels according to the sorting result;
and pruning the target convolution kernel.
3. The method of claim 2, wherein determining at least one set of convolution kernels having a minimum significance metric value as a target convolution kernel according to the ranking results comprises:
determining a significance curve of convolution kernels in the ith convolution layer according to the sorting result;
determining the pruning quantity of convolution kernels to be pruned according to the steepness degree of the significance curve;
and determining N groups of convolution kernels with the minimum significance metric value in the ith convolution layer as target convolution kernels, wherein N is the pruning quantity.
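Claims 2 and 3 leave the exact steepness criterion for the significance curve unspecified. As a non-authoritative sketch, one possible reading is to prune every kernel group whose sorted significance falls below some fraction of the largest value; the `drop_ratio` threshold below is an assumption, not part of the claims:

```python
import numpy as np

def pruning_count(significance, drop_ratio=0.5):
    """Heuristic reading of claim 3: sort the per-group significance values
    (the 'significance curve') and prune every group falling below
    drop_ratio * max(significance). drop_ratio is an assumed parameter;
    the claim only says the count follows the curve's steepness."""
    sig_sorted = np.sort(np.asarray(significance, dtype=float))  # ascending curve
    threshold = drop_ratio * sig_sorted[-1]
    # Number of groups strictly below the threshold = pruning count N
    return int(np.searchsorted(sig_sorted, threshold))

# A curve with a steep jump: the two weakest groups fall below the threshold
sig = np.array([0.1, 0.2, 3.0, 3.5, 4.0])
n = pruning_count(sig, drop_ratio=0.5)  # N == 2
```

A steep curve (a sharp gap between weak and strong groups) then yields a clear cut point, while a flat curve prunes little or nothing.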
4. The method of claim 1, wherein determining a significance metric value for each group of convolution kernels in the ith convolutional layer of the convolutional neural network model comprises:
obtaining the weight values of each group of convolution kernels in the ith convolutional layer of the convolutional neural network model;
calculating the L1 norm or the L2 norm of each group of convolution kernels from the weight values of each group of convolution kernels in the ith convolutional layer; and
determining the L1 norm or the L2 norm as the significance metric value of the convolution kernels.
5. The method of claim 1, wherein determining a significance metric value for each group of convolution kernels in the ith convolutional layer of the convolutional neural network model comprises:
calculating a loss function value for each group of convolution kernels in the ith convolutional layer of the convolutional neural network model according to a preset loss function; and
determining the loss function value as the significance metric value of the convolution kernels.
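The L1/L2-norm metric of claim 4 is straightforward to express concretely. A minimal sketch, assuming the common weight layout (output channels, input channels, kernel height, kernel width) so that each output channel is one "group of convolution kernels":

```python
import numpy as np

def kernel_significance(layer_weights, norm="l1"):
    """Per-group significance as in claim 4: the L1 or L2 norm of each
    group's weights. layer_weights has shape (C_out, C_in, kh, kw);
    each output channel is treated as one group of convolution kernels."""
    flat = layer_weights.reshape(layer_weights.shape[0], -1)
    if norm == "l1":
        return np.abs(flat).sum(axis=1)       # L1 norm per group
    return np.sqrt((flat ** 2).sum(axis=1))   # L2 norm per group

# Example: a layer with 4 output channels, 3 input channels, 3x3 kernels
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3))
sig = kernel_significance(w, "l1")            # one value per kernel group
```

Groups with small norms contribute little to the layer's activations, which is the intuition behind using the norm as the significance metric.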
6. The method of claim 1, wherein pruning at least one group of convolution kernels having the smallest significance metric value in the ith convolutional layer comprises:
setting the weight values of at least one group of convolution kernels having the smallest significance metric value in the ith convolutional layer to zero, and not updating the zeroed weight values during training.
7. The method of claim 6, wherein updating the feature map and convolution kernels of the (i+1)th convolutional layer based on the pruned ith convolutional layer comprises:
performing convolution calculation on the pruned ith convolutional layer to obtain the feature map of the (i+1)th convolutional layer, wherein the values of the associated feature maps of the (i+1)th convolutional layer are zero, the associated feature maps being the feature maps of the channels corresponding to the pruned convolution kernels in the ith convolutional layer; and
zeroing the weight values of the associated channels in each group of convolution kernels of the (i+1)th layer, and not updating the zeroed weight values during training, wherein the associated channels are the channels of the (i+1)th-layer convolution kernels that correspond to the associated feature maps.
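The zeroing scheme of claims 6 and 7 can be sketched with plain arrays: zero the pruned kernel groups in layer i, then zero the matching input channels of layer i+1, whose feature maps would be all zero anyway. The weight layout and function name below are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def prune_and_propagate(w_i, w_next, pruned):
    """Claim 6: zero the weight values of the pruned kernel groups in layer i.
    Claim 7: the feature maps those groups produce are all zero, so the
    corresponding input channels of every layer-(i+1) kernel group are
    zeroed as well. A training loop would additionally mask gradients so
    the zeroed weights are never updated."""
    w_i, w_next = w_i.copy(), w_next.copy()
    w_i[pruned] = 0.0         # pruned groups -> all-zero output feature maps
    w_next[:, pruned] = 0.0   # associated input channels of layer i+1
    return w_i, w_next

rng = np.random.default_rng(1)
wi = rng.normal(size=(4, 3, 3, 3))   # layer i: 4 output channels
wn = rng.normal(size=(5, 4, 3, 3))   # layer i+1 consumes those 4 channels
wi2, wn2 = prune_and_propagate(wi, wn, [1, 3])
```

Keeping the weights zeroed rather than physically removing them preserves the tensor shapes, which matches the claim's description of not updating the zeroed values during retraining.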
8. The method of any one of claims 1 to 7, wherein the method further comprises:
retraining the pruned convolutional neural network model.
9. A convolutional neural network-based inference method, wherein the method comprises:
obtaining a convolutional neural network model, the convolutional neural network model having been pruned and retrained by the method of claim 8; and
performing inference on input data using the convolutional neural network model to obtain output data.
10. A convolutional neural network pruning apparatus, wherein the apparatus comprises:
a calculation processing module configured to determine a significance metric value for each group of convolution kernels in the ith convolutional layer of a convolutional neural network model, wherein the significance metric value represents the degree to which the convolution kernels influence the ability of the convolutional neural network model to obtain correct results, and i is a positive integer; and
a pruning processing module configured to prune at least one group of convolution kernels having the smallest significance metric value in the ith convolutional layer, and to update the feature map and convolution kernels of the (i+1)th convolutional layer based on the pruned ith convolutional layer.
11. The apparatus of claim 10, wherein the pruning processing module is configured to sort the convolution kernels in the ith convolutional layer by significance metric value; determine, from the sorting result, at least one group of convolution kernels having the smallest significance metric value as target convolution kernels; and prune the target convolution kernels.
12. The apparatus of claim 11, wherein the pruning processing module is configured to determine a significance curve of the convolution kernels in the ith convolutional layer from the sorting result; determine, from the steepness of the significance curve, the number of convolution kernels to be pruned; and determine the N groups of convolution kernels having the smallest significance metric values in the ith convolutional layer as the target convolution kernels, where N is the number to be pruned.
13. The apparatus of claim 10, wherein the calculation processing module is configured to obtain the weight values of each group of convolution kernels in the ith convolutional layer of the convolutional neural network model; calculate the L1 norm or the L2 norm of each group of convolution kernels from the weight values of each group of convolution kernels in the ith convolutional layer; and determine the L1 norm or the L2 norm as the significance metric value of the convolution kernels.
14. The apparatus of claim 10, wherein the calculation processing module is configured to calculate a loss function value for each group of convolution kernels in the ith convolutional layer of the convolutional neural network model according to a preset loss function, and determine the loss function value as the significance metric value of the convolution kernels.
15. The apparatus of claim 10, wherein the pruning processing module is configured to set the weight values of at least one group of convolution kernels having the smallest significance metric value in the ith convolutional layer to zero, and not update the zeroed weight values during training.
16. The apparatus of claim 15, wherein the pruning processing module is configured to perform convolution calculation based on the pruned ith convolutional layer to obtain the feature map of the (i+1)th convolutional layer, wherein the values of the associated feature maps of the (i+1)th convolutional layer are zero, the associated feature maps being the feature maps of the channels corresponding to the pruned convolution kernels in the ith convolutional layer; and to zero the weight values of the associated channels in each group of convolution kernels of the (i+1)th layer and not update the zeroed weight values during training, wherein the associated channels are the channels of the (i+1)th-layer convolution kernels that correspond to the associated feature maps.
17. The apparatus of any one of claims 10 to 16, wherein the apparatus further comprises:
a training module configured to retrain the pruned convolutional neural network model.
18. A convolutional neural network-based inference device, wherein the device comprises:
a model acquisition module configured to obtain a convolutional neural network model, the convolutional neural network model having been pruned and retrained by the apparatus of claim 17; and
an inference module configured to perform inference on input data using the convolutional neural network model to obtain output data.
19. A computing device, wherein the device comprises a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform the method of any of claims 1 to 9.
20. A computer readable medium having stored thereon computer program instructions executable by a processor to implement the method of any one of claims 1 to 9.
CN202010148557.5A 2020-03-05 2020-03-05 Convolutional neural network pruning, reasoning method, device and computer readable medium Pending CN113361702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010148557.5A CN113361702A (en) 2020-03-05 2020-03-05 Convolutional neural network pruning, reasoning method, device and computer readable medium


Publications (1)

Publication Number Publication Date
CN113361702A true CN113361702A (en) 2021-09-07

Family

ID=77523780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010148557.5A Pending CN113361702A (en) 2020-03-05 2020-03-05 Convolutional neural network pruning, reasoning method, device and computer readable medium

Country Status (1)

Country Link
CN (1) CN113361702A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115146775A (en) * 2022-07-04 2022-10-04 同方威视技术股份有限公司 Edge device reasoning acceleration method and device and data processing system


Similar Documents

Publication Publication Date Title
CN110263659B (en) Finger vein recognition method and system based on triplet loss and lightweight network
CN109522435B (en) Image retrieval method and device
CN111242180B (en) Image identification method and system based on lightweight convolutional neural network
CN111325664B (en) Style migration method and device, storage medium and electronic equipment
TWI740726B (en) Sorting method, operation method and apparatus of convolutional neural network
CN111754546A (en) Target tracking method, system and storage medium based on multi-feature map fusion
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN110728313B (en) Classification model training method and device for intention classification recognition
CN111782826A (en) Knowledge graph information processing method, device, equipment and storage medium
CN111191739A (en) Wall surface defect detection method based on attention mechanism
CN111832693B (en) Neural network layer operation and model training method, device and equipment
CN112308102B (en) Image similarity calculation method, calculation device, and storage medium
CN110209863B (en) Method and equipment for searching similar pictures
CN110264392B (en) Strong connection graph detection method based on multiple GPUs
CN113869495B (en) Method, device, equipment and readable medium for optimizing convolutional weight layout of neural network
CN111126358A (en) Face detection method, face detection device, storage medium and equipment
CN113361702A (en) Convolutional neural network pruning, reasoning method, device and computer readable medium
AU2021102597A4 (en) Remote sensing image classification method based on pruning compression neural network
CN108681490B (en) Vector processing method, device and equipment for RPC information
CN114241234A (en) Fine-grained image classification method, device, equipment and medium
CN112200310B (en) Intelligent processor, data processing method and storage medium
CN113849679A (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN112906824B (en) Vehicle clustering method, system, device and storage medium
CN110826726B (en) Target processing method, target processing device, target processing apparatus, and medium
CN113298248B (en) Processing method and device for neural network model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40058134

Country of ref document: HK