CN113255912A - Channel pruning method and device for neural network, electronic equipment and storage medium - Google Patents

Channel pruning method and device for neural network, electronic equipment and storage medium

Info

Publication number
CN113255912A
CN113255912A (application CN202110637218.8A)
Authority
CN
China
Prior art keywords
channel
network
neural network
network layer
packet
Prior art date
Legal status
Granted
Application number
CN202110637218.8A
Other languages
Chinese (zh)
Other versions
CN113255912B (en)
Inventor
刘李洋
张士龙
旷章辉
王新江
陈益民
张伟
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN202110637218.8A priority Critical patent/CN113255912B/en
Publication of CN113255912A publication Critical patent/CN113255912A/en
Application granted
Publication of CN113255912B publication Critical patent/CN113255912B/en
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a channel pruning method and device for a neural network, an electronic device and a storage medium. The method comprises the following steps: grouping network layers of a neural network to obtain network layer grouping results of the neural network; grouping the channels of the neural network according to the network layer grouping result to obtain a plurality of channel groups in the neural network; processing the training image through the neural network to obtain a loss function value corresponding to the training image; respectively determining importance values of the plurality of channel groups based on the loss function values corresponding to the training images; and pruning at least one channel group in the plurality of channel groups according to the importance values of the plurality of channel groups to obtain a pruned neural network, wherein the pruned neural network is used for image processing.

Description

Channel pruning method and device for neural network, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a channel pruning method and apparatus for a neural network, an electronic device, and a storage medium.
Background
A deep learning model used for image processing can generally achieve higher image processing accuracy than a non-deep-learning model. However, deep learning models for image processing typically require substantial video memory and computational resources, making them difficult to deploy on edge devices that lack high-end hardware. In addition, such models consume considerable energy and incur serious latency, which may limit the throughput of cloud services. Channel pruning of a neural network used for image processing aims to improve the inference efficiency of the network as much as possible while keeping the loss of image processing accuracy acceptable.
Disclosure of Invention
The present disclosure provides a technical solution for channel pruning of a neural network.
According to an aspect of the present disclosure, there is provided a channel pruning method for a neural network, including:
grouping network layers of a neural network to obtain network layer grouping results of the neural network;
grouping the channels of the neural network according to the network layer grouping result to obtain a plurality of channel groups in the neural network;
processing the training image through the neural network to obtain a loss function value corresponding to the training image;
respectively determining importance values of the plurality of channel groups based on the loss function values corresponding to the training images;
and pruning at least one channel group in the plurality of channel groups according to the importance values of the plurality of channel groups to obtain a pruned neural network, wherein the pruned neural network is used for image processing.
In this way, by grouping network layers of a neural network to obtain a network layer grouping result of the neural network, grouping channels of the neural network according to the network layer grouping result to obtain a plurality of channel groups in the neural network, processing a training image through the neural network to obtain a loss function value corresponding to the training image, respectively determining importance values of the plurality of channel groups based on the loss function value corresponding to the training image, and pruning at least one channel group of the plurality of channel groups according to the importance values of the plurality of channel groups to obtain a pruned neural network used for image processing, the network structure information of the neural network can be fully utilized for channel grouping during channel pruning, and channel pruning is performed with the channel group as the minimum pruning unit. This helps the pruned neural network achieve a higher image processing speed while maintaining image processing accuracy.
In a possible implementation manner, the grouping the network layers of the neural network to obtain the network layer grouping result of the neural network includes:
determining parent network layers of a plurality of to-be-grouped network layers in the neural network;
and grouping the plurality of to-be-grouped network layers according to the parent network layers of the plurality of to-be-grouped network layers to obtain the network layer grouping result of the neural network.
By adopting this implementation, the data transmission relationships between the network layers of the neural network can be accurately determined, which facilitates fast and accurate network layer grouping.
In one possible implementation manner, the grouping the plurality of to-be-grouped network layers according to the parent network layers of the plurality of to-be-grouped network layers includes:
in response to a first to-be-grouped network layer and a second to-be-grouped network layer among the plurality of to-be-grouped network layers having a common parent network layer, dividing the first to-be-grouped network layer and the second to-be-grouped network layer into the same network layer group, wherein the first to-be-grouped network layer and the second to-be-grouped network layer are any two to-be-grouped network layers among the plurality of to-be-grouped network layers;
and/or,
in response to the first to-be-grouped network layer being a network layer of a first preset type and the first to-be-grouped network layer being a parent network layer of the second to-be-grouped network layer, dividing the first to-be-grouped network layer and the second to-be-grouped network layer into the same network layer group.
By adopting this implementation, accurate network layer grouping can be achieved in a neural network with a complex structure.
In one possible implementation, the first preset type of network layer includes a grouped convolutional layer and/or a depthwise separable convolutional layer.
In this implementation, by dividing the first to-be-grouped network layer and the second to-be-grouped network layer into the same network layer group in response to the first to-be-grouped network layer being a grouped convolutional layer and/or a depthwise separable convolutional layer among the plurality of to-be-grouped network layers and being a parent network layer of the second to-be-grouped network layer, accurate network layer grouping can still be achieved in a complex network structure including grouped convolutional layers and/or depthwise separable convolutional layers.
In one possible implementation manner, the determining the importance values of the plurality of channel groups respectively based on the loss function values corresponding to the training images includes:
and for any one of the plurality of channel groups, determining the importance value of the channel group according to the gradient of the loss function value corresponding to the training image on the mask of the channel in the channel group.
In this implementation, for any one of the plurality of channel groups, the importance value of each channel group is determined according to the gradient of the loss function value corresponding to the training image on the mask of the channel in the channel group, so that the importance value of each channel group can be accurately determined. By pruning the channel groups based on the importance values of the respective channel groups thus determined, it is possible to improve the speed of image processing by the neural network while reducing the sacrifice of the accuracy of the neural network.
In a possible implementation manner, the pruning at least one of the plurality of channel groups according to the importance values of the plurality of channel groups includes:
for any one channel group in the plurality of channel groups, determining the video memory occupation amount corresponding to the channel group;
according to the video memory occupation amount, normalizing the importance value of the channel group to obtain a normalized importance value of the channel group;
pruning at least one of the plurality of channel groups according to the normalized importance values of the plurality of channel groups.
By adopting this implementation, channel groups with higher redundancy (i.e., lower importance) and higher video memory occupation (i.e., whose removal yields a larger increase in inference speed, that is, a larger actual acceleration) can be removed, and the inference speed and accuracy of the neural network can be balanced, so that a larger speed-up can be obtained while removing fewer channels, thereby reducing the sacrifice of neural network accuracy during channel pruning.
In a possible implementation manner, the determining, for any one of the channel groups, a video memory occupation amount corresponding to the channel group includes:
and for any one of the channel groups, determining the video memory occupation amount corresponding to the channel group according to at least one of the number of characteristic graphs, the height of the characteristic graphs and the width of the characteristic graphs corresponding to the channels in the channel group.
By adopting this implementation, the video memory occupation amount corresponding to each channel group can be accurately determined.
In one possible implementation, the neural network performs a plurality of iterations;
pruning at least one of the plurality of channel groups according to the normalized importance values of the plurality of channel groups, comprising:
determining cumulative importance values of the plurality of channel groups according to the normalized importance values of the plurality of channel groups over at least two iterations;
pruning at least one channel group of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups.
In this implementation manner, the cumulative importance values of the plurality of channel groups are determined according to the normalized importance values of at least two iterations of the plurality of channel groups, and at least one of the plurality of channel groups is pruned according to the cumulative importance values of the plurality of channel groups, so that channel groups with low importance and high video memory occupation determined according to at least two iterations can be pruned, and the inference speed and accuracy of the neural network for image processing can be better balanced.
In one possible implementation manner, the pruning at least one of the plurality of channel groups according to the accumulated importance values of the plurality of channel groups includes:
and in response to the number of iterations between the current iteration and the last pruning iteration reaching a preset interval number of iterations, pruning at least one channel group of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups, wherein the preset interval number is greater than 0.
By adopting this implementation, the accuracy of image processing by the neural network is maintained during channel pruning of the neural network.
In one possible implementation manner, the pruning at least one of the plurality of channel groups according to the accumulated importance values of the plurality of channel groups includes:
pruning the channel group with the lowest cumulative importance value among the plurality of channel groups.
In this implementation, by pruning only one channel group of the plurality of channel groups having the lowest cumulative importance value at a time, it is possible to help maintain the accuracy of image processing by the neural network.
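Taken together, the cumulative-importance and interval implementations above might look like the following sketch; `history`, `interval`, and `prune_group` are illustrative names introduced here, not terms of this disclosure.

```python
from typing import Callable, Dict, List

def maybe_prune(history: List[Dict[int, float]],
                interval: int,
                prune_group: Callable[[int], None]) -> bool:
    """Prune at most one channel group, and only once the preset interval of
    iterations since the last pruning has elapsed. `history` holds one dict
    of normalized importance values per iteration since the last pruning."""
    if len(history) < interval:
        return False
    cumulative: Dict[int, float] = {}
    for scores in history:                    # sum over iterations
        for gid, s in scores.items():
            cumulative[gid] = cumulative.get(gid, 0.0) + s
    prune_group(min(cumulative, key=cumulative.get))  # lowest cumulative value
    history.clear()                           # restart accumulation
    return True
```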
In a possible implementation manner, the pruning at least one of the plurality of channel groups according to the importance values of the plurality of channel groups includes:
and in response to the computational overhead of the neural network not satisfying a preset condition, pruning at least one channel group of the plurality of channel groups according to the importance values of the plurality of channel groups.
In this implementation manner, in response to that the computational overhead of the neural network does not satisfy the preset condition, at least one channel group of the plurality of channel groups is pruned according to the importance values of the plurality of channel groups, so that a pruned neural network whose computational overhead satisfies the preset condition can be obtained.
In a possible implementation manner, the processing, by the neural network, a training image to obtain a loss function value corresponding to the training image includes:
and processing the training image through a channel which is not pruned in the neural network to obtain a loss function value corresponding to the training image.
In this implementation, after any channel is pruned, the pruned channel may not participate in the calculation during the subsequent fine-tuning and pruning of the neural network, so that the calculation amount of the neural network can be reduced and accesses to the video memory can be reduced, thereby improving the inference speed of image processing by the neural network and accelerating the fine-tuning and pruning of the neural network.
According to an aspect of the present disclosure, there is provided a channel pruning apparatus for a neural network, including:
the network layer grouping module is used for grouping network layers of the neural network to obtain a network layer grouping result of the neural network;
the channel grouping module is used for grouping the channels of the neural network according to the network layer grouping result to obtain a plurality of channel groups in the neural network;
the image processing module is used for processing the training image through the neural network to obtain a loss function value corresponding to the training image;
a determining module, configured to determine importance values of the multiple channel groups respectively based on the loss function values corresponding to the training images;
and the pruning module is used for pruning at least one channel group in the plurality of channel groups according to the importance values of the plurality of channel groups to obtain a pruned neural network, wherein the pruned neural network is used for image processing.
In one possible implementation, the network layer grouping module is configured to:
determining parent network layers of a plurality of to-be-grouped network layers in the neural network;
and grouping the plurality of to-be-grouped network layers according to the parent network layers of the plurality of to-be-grouped network layers to obtain the network layer grouping result of the neural network.
In one possible implementation, the network layer grouping module is configured to:
in response to a first to-be-grouped network layer and a second to-be-grouped network layer among the plurality of to-be-grouped network layers having a common parent network layer, dividing the first to-be-grouped network layer and the second to-be-grouped network layer into the same network layer group, wherein the first to-be-grouped network layer and the second to-be-grouped network layer are any two to-be-grouped network layers among the plurality of to-be-grouped network layers;
and/or,
in response to the first to-be-grouped network layer being a network layer of a first preset type and the first to-be-grouped network layer being a parent network layer of the second to-be-grouped network layer, dividing the first to-be-grouped network layer and the second to-be-grouped network layer into the same network layer group.
In one possible implementation, the first preset type of network layer includes a grouped convolutional layer and/or a depthwise separable convolutional layer.
In one possible implementation, the determining module is configured to:
and for any one of the plurality of channel groups, determining the importance value of the channel group according to the gradient of the loss function value corresponding to the training image on the mask of the channel in the channel group.
In one possible implementation, the pruning module is configured to:
for any one channel group in the plurality of channel groups, determining the video memory occupation amount corresponding to the channel group;
according to the video memory occupation amount, normalizing the importance value of the channel group to obtain a normalized importance value of the channel group;
pruning at least one of the plurality of channel groups according to the normalized importance values of the plurality of channel groups.
In one possible implementation, the pruning module is configured to:
and for any one of the channel groups, determining the video memory occupation amount corresponding to the channel group according to at least one of the number of feature maps, the height of the feature maps, and the width of the feature maps corresponding to the channels in the channel group.
In one possible implementation, the neural network performs a plurality of iterations;
the pruning module is used for:
determining cumulative importance values of the plurality of channel groups according to the normalized importance values of the plurality of channel groups over at least two iterations;
pruning at least one channel group of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups.
In one possible implementation, the pruning module is configured to:
and in response to the number of iterations between the current iteration and the last pruning iteration reaching a preset interval number of iterations, pruning at least one channel group of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups, wherein the preset interval number is greater than 0.
In one possible implementation, the pruning module is configured to:
pruning the channel group with the lowest cumulative importance value among the plurality of channel groups.
In one possible implementation, the pruning module is configured to:
and in response to the computational overhead of the neural network not satisfying a preset condition, pruning at least one channel group of the plurality of channel groups according to the importance values of the plurality of channel groups.
In one possible implementation, the image processing module is configured to:
and processing the training image through a channel which is not pruned in the neural network to obtain a loss function value corresponding to the training image.
According to an aspect of the present disclosure, there is provided an electronic device including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the disclosure, a network layer grouping result of a neural network is obtained by grouping network layers of the neural network; channels of the neural network are grouped according to the network layer grouping result to obtain a plurality of channel groups in the neural network; a training image is processed through the neural network to obtain a loss function value corresponding to the training image; importance values of the plurality of channel groups are respectively determined based on the loss function value corresponding to the training image; and at least one channel group of the plurality of channel groups is pruned according to the importance values of the plurality of channel groups to obtain a pruned neural network used for image processing. In this way, during channel pruning of the neural network, the network structure information of the neural network can be fully utilized for channel grouping, and channel pruning is performed with the channel group as the minimum pruning unit, which helps the pruned neural network achieve a higher image processing speed while maintaining image processing accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a channel pruning method for a neural network provided in an embodiment of the present disclosure.
Fig. 2 is a schematic diagram illustrating grouping of network layers in a channel pruning method for a neural network according to an embodiment of the present disclosure.
Fig. 3 shows another schematic diagram of grouping network layers in the channel pruning method for a neural network provided by the embodiment of the present disclosure.
Fig. 4 shows another schematic diagram of grouping network layers in the channel pruning method of the neural network provided by the embodiment of the present disclosure.
Fig. 5 shows another schematic diagram of grouping network layers in the channel pruning method for a neural network provided by the embodiment of the present disclosure.
Fig. 6 shows another schematic diagram of grouping network layers in the channel pruning method for a neural network provided by the embodiment of the present disclosure.
Fig. 7 shows a block diagram of a channel pruning device of a neural network provided by an embodiment of the present disclosure.
Fig. 8 illustrates a block diagram of an electronic device 800 provided by an embodiment of the disclosure.
Fig. 9 shows a block diagram of an electronic device 1900 provided by an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
The disclosed embodiments provide a channel pruning method and apparatus for a neural network, an electronic device, and a storage medium. A network layer grouping result of the neural network is obtained by grouping network layers of the neural network; channels of the neural network are grouped according to the network layer grouping result to obtain a plurality of channel groups in the neural network; a training image is processed through the neural network to obtain a loss function value corresponding to the training image; importance values of the channel groups are respectively determined based on the loss function value corresponding to the training image; and at least one channel group among the channel groups is pruned according to the importance values of the channel groups to obtain a pruned neural network used for image processing. In this way, during channel pruning of the neural network, the network structure information of the neural network can be fully utilized for channel grouping, and channel pruning is performed with the channel group as the minimum pruning unit, so that the pruned neural network can achieve a higher image processing speed while maintaining image processing accuracy.
The channel pruning method for the neural network provided by the embodiment of the present disclosure is described in detail below with reference to the accompanying drawings.
Fig. 1 shows a flowchart of a channel pruning method for a neural network provided in an embodiment of the present disclosure. In one possible implementation, the channel pruning method of the neural network may be performed by a terminal device or a server or other processing device. The terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device. In some possible implementations, the channel pruning method of the neural network may be implemented by a processor calling computer-readable instructions stored in a memory. As shown in fig. 1, the channel pruning method of the neural network includes steps S11 to S15.
In step S11, network layers of the neural network are grouped, and a network layer grouping result of the neural network is obtained.
In step S12, according to the network layer grouping result, the channels of the neural network are grouped, so as to obtain a plurality of channel groups in the neural network.
In step S13, the training image is processed by the neural network, and a loss function value corresponding to the training image is obtained.
In step S14, importance values of the plurality of channel groups are determined based on the loss function values corresponding to the training images, respectively.
In step S15, at least one channel group of the plurality of channel groups is pruned according to the importance values of the plurality of channel groups, so as to obtain a pruned neural network, where the pruned neural network is used for image processing.
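Before each step is detailed, the following is a minimal sketch of how steps S11 to S15 could compose into a single pruning loop. It is illustrative only: the `steps` object and all of its callables are hypothetical names introduced here, not an API defined by this disclosure.

```python
# Hypothetical skeleton of steps S11-S15; every attribute of `steps` is a
# caller-supplied callable, since the disclosure does not fix any interface.
def channel_pruning(network, train_images, steps):
    layer_groups = steps.group_layers(network)                       # S11
    channel_groups = steps.group_channels(network, layer_groups)     # S12
    while not steps.should_stop(network):
        loss = steps.compute_loss(network, train_images)             # S13
        scores = steps.importance_values(loss, channel_groups)       # S14
        channel_groups = steps.prune_lowest(scores, channel_groups)  # S15
    return network
```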
The neural network in the embodiments of the present disclosure may employ various types of backbone networks, such as ResNet, ResNeXt, MobileNetV2, RegNet, and so forth. The neural network may be used for image classification, or may perform object detection based on images, and so on.
In the embodiment of the present disclosure, some or all network layers of the neural network may be grouped to obtain a network layer grouping result of the neural network. The network layer grouping result may include information on the network layer group to which each network layer in the neural network belongs. For example, the network layer grouping result may include at least one network layer group, and may include information on the network layers belonging to each network layer group. A network layer group may represent a set of network layers. The number of network layers in any network layer group may be one or more than two; that is, any network layer may be grouped individually or belong to the same network layer group as other network layers. In a possible implementation manner, network layers belonging to a second preset type in the neural network may be grouped to obtain the network layer grouping result of the neural network. That is, before grouping the network layers of the neural network, a network layer belonging to the second preset type in the neural network may be regarded as a to-be-grouped network layer. Since channel pruning only affects the dimensionality of the features (i.e., the number of channels), only the convolutional layers and the fully connected layers may be considered when grouping the network layers of the neural network, while other types of network layers are ignored; that is, the second preset type may include convolutional layers and fully connected layers. Of course, in another possible implementation manner, each network layer in the neural network may also be regarded as a to-be-grouped network layer. After the to-be-grouped network layers in the neural network are determined, they can be grouped to obtain the network layer grouping result of the neural network. In the embodiment of the present disclosure, the to-be-grouped network layers of the neural network may be grouped sequentially, that is, only one to-be-grouped network layer is grouped at a time. Of course, more than two to-be-grouped network layers may also be grouped at the same time.
In a possible implementation manner, the grouping the network layers of the neural network to obtain the network layer grouping result of the neural network includes: determining parent network layers of a plurality of to-be-grouped network layers in the neural network; and grouping the plurality of to-be-grouped network layers according to the parent network layers of the plurality of to-be-grouped network layers to obtain the network layer grouping result of the neural network. In this implementation, a computational graph of the neural network may be determined, and the parent network layer of a to-be-grouped network layer in the neural network may be determined from the computational graph. The parent network layer of any to-be-grouped network layer may represent an upper network layer of that to-be-grouped network layer. In one example, the to-be-grouped network layer and its parent network layer may both be network layers of the second preset type, for example, both convolutional layers or fully connected layers. By adopting this implementation, the data transmission relationships between the network layers of the neural network can be accurately determined, which facilitates fast and accurate network layer grouping. A minimal code sketch of this parent-layer determination is given below.
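The following sketch assumes the computational graph is available as a mapping from each layer name to its immediate predecessors; all names here are illustrative, not part of the disclosure.

```python
from typing import Dict, List, Set

def parent_layers(layer: str,
                  predecessors: Dict[str, List[str]],
                  prunable: Set[str]) -> Set[str]:
    """Return the nearest convolutional/fully-connected ancestors of `layer`,
    skipping through other layer types (normalization, activation, pooling).
    `prunable` holds the layers of the second preset type (conv and fc)."""
    parents: Set[str] = set()
    stack = list(predecessors.get(layer, []))
    while stack:
        p = stack.pop()
        if p in prunable:
            parents.add(p)                         # stop at a conv/fc layer
        else:
            stack.extend(predecessors.get(p, []))  # skip BN/ReLU/pooling etc.
    return parents
```

For the residual block of Fig. 2 below, for example, such a search would trace convolutional layers C2 and C5 back through their intermediate normalization, activation, and pooling layers to the common parent C1.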
As an example of this implementation, grouping the plurality of to-be-grouped network layers according to the parent network layers of the plurality of to-be-grouped network layers includes: in response to a first to-be-grouped network layer and a second to-be-grouped network layer among the plurality of to-be-grouped network layers having a common parent network layer, dividing the first to-be-grouped network layer and the second to-be-grouped network layer into the same network layer group, wherein the first to-be-grouped network layer and the second to-be-grouped network layer are any two to-be-grouped network layers among the plurality of to-be-grouped network layers; and/or, in response to the first to-be-grouped network layer being a network layer of a first preset type and the first to-be-grouped network layer being a parent network layer of the second to-be-grouped network layer, dividing the first to-be-grouped network layer and the second to-be-grouped network layer into the same network layer group. In this example, for the first and second to-be-grouped network layers, the two layers may be divided into the same network layer group if either of the following is satisfied: "the first to-be-grouped network layer and the second to-be-grouped network layer have a common parent network layer" (denoted as the first condition), or "the first to-be-grouped network layer is a network layer of the first preset type and is a parent network layer of the second to-be-grouped network layer" (denoted as the second condition). That is, for any two to-be-grouped network layers, if either "the two to-be-grouped network layers have a common parent network layer" or "one of the two to-be-grouped network layers is a network layer of the first preset type and is a parent network layer of the other" is satisfied, the two to-be-grouped network layers are divided into the same network layer group. In a case where a network layer group includes two or more network layers, any network layer in the group satisfies the first condition and/or the second condition with at least one other network layer of the group; that is, any network layer in the group satisfies, with at least one other network layer of the group, a condition for being divided into the same network layer group. For example, if network layer C1 and network layer C2 satisfy the first condition and/or the second condition, and network layer C2 and network layer C3 satisfy the first condition and/or the second condition, then network layers C1, C2, and C3 may be divided into the same network layer group; network layer C1 and network layer C3 may satisfy the first condition and/or the second condition, or may satisfy neither. By adopting this example, accurate network layer grouping can be achieved in a neural network with a complex structure.
As an example of this implementation, the first preset type of network layer includes a grouped convolutional layer and/or a depthwise separable convolutional layer. For example, the first preset type of network layer includes a grouped convolutional layer and a depthwise separable convolutional layer. In this example, by dividing a first to-be-grouped network layer among the plurality of to-be-grouped network layers and a second to-be-grouped network layer into the same network layer group in response to the first to-be-grouped network layer being a grouped convolutional layer and/or a depthwise separable convolutional layer and being a parent network layer of the second to-be-grouped network layer, accurate network layer grouping can still be achieved in a complex network structure including grouped convolutional layers and/or depthwise separable convolutional layers.
Of course, in other examples, the first preset type of network layer may also include other types of network layers, which is not limited herein.
In one example, the to-be-grouped network layers of the neural network may be grouped sequentially. A to-be-grouped network layer may be added to an established network layer group in response to satisfying the condition for joining that group; a new network layer group corresponding to the to-be-grouped network layer may be established in response to the to-be-grouped network layer not satisfying the condition for joining any established network layer group, or no established network layer group existing. The number of established network layer groups may be 0, 1, or 2 or more. In the case where the number of established network layer groups is 0, a network layer group corresponding to the to-be-grouped network layer may be established, i.e., a new network layer group is created. In the case where the number of established network layer groups is 1, it can be determined whether the to-be-grouped network layer satisfies the condition for joining the established network layer group. For example, the to-be-grouped network layer satisfies the condition for joining the established network layer group if any of the following is satisfied: the to-be-grouped network layer has a common parent network layer with any network layer in the established group; any network layer in the established group is a grouped convolutional layer or a depthwise separable convolutional layer and is a parent network layer of the to-be-grouped network layer; or the to-be-grouped network layer is a grouped convolutional layer or a depthwise separable convolutional layer and is a parent network layer of any network layer in the established group. If the to-be-grouped network layer satisfies the condition for joining the established network layer group, it is added to that group; otherwise, a network layer group corresponding to the to-be-grouped network layer may be established. In the case where the number of established network layer groups is 2 or more, it can be determined whether the to-be-grouped network layer satisfies the condition for joining any established network layer group; if it does, it is added to that group, and if it satisfies the condition for none of the established groups, a network layer group corresponding to the to-be-grouped network layer may be established. A sketch of this sequential grouping loop is given below.
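The sketch below reuses `parent_layers` from the earlier sketch; the join condition implements the first and second conditions described above, with all names remaining illustrative.

```python
from typing import Dict, List, Set

def group_layers(to_be_grouped: List[str],
                 predecessors: Dict[str, List[str]],
                 prunable: Set[str],
                 first_preset: Set[str]) -> List[Set[str]]:
    """Sequentially assign to-be-grouped layers to network layer groups.
    `first_preset` holds grouped conv / depthwise separable conv layers."""
    def joinable(a: str, b: str) -> bool:
        pa = parent_layers(a, predecessors, prunable)
        pb = parent_layers(b, predecessors, prunable)
        common_parent = bool(pa & pb)                # first condition
        a_is_parent = a in first_preset and a in pb  # second condition
        b_is_parent = b in first_preset and b in pa
        return common_parent or a_is_parent or b_is_parent

    groups: List[Set[str]] = []                      # established groups
    for layer in to_be_grouped:                      # one layer at a time
        target = next((g for g in groups
                       if any(joinable(layer, m) for m in g)), None)
        if target is None:
            groups.append({layer})                   # establish a new group
        else:
            target.add(layer)
    return groups
```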
In this implementation, when pruning the channels in the neural network, only the input channels of the network layer may be considered. Of course, in other possible implementations, only the output channel of the network layer may be considered, or both the input channel and the output channel may be considered.
In another possible implementation manner, the grouping the network layers of the neural network to obtain the network layer grouping result of the neural network includes: determining child network layers of a plurality of to-be-grouped network layers in the neural network; and grouping the plurality of to-be-grouped network layers according to the child network layers of the plurality of to-be-grouped network layers to obtain the network layer grouping result of the neural network.
Fig. 2 is a schematic diagram illustrating grouping of network layers in the channel pruning method for a neural network according to an embodiment of the present disclosure. Fig. 2 shows the network structure of a residual block of ResNet-50. The network structure shown in Fig. 2 includes convolutional layers (C), normalization layers (B), activation layers (R), and a pooling layer (P). The convolutional layers include C1, C2, C3, C4, C5, and C6; the normalization layers include B1, B2, B3, B4, B5, and B6; the activation layers include R1, R2, R3, and R6; and the pooling layer is P1. Through a depth-first search algorithm, it may be determined that convolutional layer C2 and convolutional layer C5 have a common parent network layer C1; therefore, convolutional layer C2 and convolutional layer C5 may be divided into the same network layer group. Pruning the input channels of convolutional layers C2 and C5 corresponds to pruning the output channels of convolutional layer C1.
Fig. 3 shows another schematic diagram of grouping network layers in the channel pruning method for a neural network provided by the embodiment of the present disclosure. In the residual bottleneck block shown in Fig. 3, the 3x3 convolutional layer conv3 forms a network layer group by itself; convolutional layer conv2 and convolutional layer conv5 have a common parent network layer conv1, so convolutional layer conv2 and convolutional layer conv5 belong to the same network layer group.
Fig. 4 shows another schematic diagram of grouping network layers in the channel pruning method of the neural network provided by the embodiment of the present disclosure. In the example shown in Fig. 4, convolutional layer conv2 and convolutional layer conv5 have a common parent network layer conv1, so convolutional layer conv2 and convolutional layer conv5 belong to the same network layer group. In addition, convolutional layer conv3 is a grouped convolutional layer, so convolutional layer conv4 belongs to the same network layer group as its parent network layer conv3; that is, grouped convolutional layer conv3 and its child network layer conv4 belong to the same network layer group.
Fig. 5 shows another schematic diagram of grouping network layers in the channel pruning method for a neural network provided by the embodiment of the present disclosure. Fig. 5 shows the feature pyramid network structure of RetinaNet. In Fig. 5, feature pyramid layers P3, P4, P5, and P6 have a common parent network layer and belong to the same network layer group; feature pyramid layer P7 and the head networks have a common parent network layer and belong to the same network layer group.
Fig. 6 shows another schematic diagram of grouping network layers in the channel pruning method for a neural network provided by the embodiment of the present disclosure. Fig. 6 shows the network structure of the Faster Region-based Convolutional Neural Network (Faster R-CNN). In Fig. 6, P6 is a pooling layer and does not participate in network layer grouping. The first convolutional layers of the RPN (Region Proposal Network) and of the R-CNN (Region-based Convolutional Neural Network) have a common parent network layer and belong to the same network layer group.
In the embodiments of the present disclosure, the number of channels in any one channel group may be one or more than two. That is, any channel may be grouped individually or may belong to the same channel group as other channels. In the case that any channel group includes more than two channels, the channels in the channel group may include channels belonging to the same network layer, or may include channels belonging to different network layers.
In the embodiments of the present disclosure, a first channel and a second channel may be divided into the same channel group in response to any one of the following being satisfied: the first network layer to which the first channel belongs and the second network layer to which the second channel belongs belong to the same network layer group, and the first channel and the second channel are corresponding channels in the first network layer and the second network layer; the first channel and the second channel belong to the same group in the same grouped convolutional layer; or a user instruction indicates that the first channel and the second channel belong to the same channel group. Here, the first channel and the second channel represent any two channels, the first network layer represents the network layer to which the first channel belongs, and the second network layer represents the network layer to which the second channel belongs. In this case, the numbers of channels of the network layers in a network layer group may be the same, and the corresponding channels may belong to the same channel group. For example, if network layer C1 and network layer C2 belong to the same network layer group, the first channel of network layer C1 and the first channel of network layer C2 may be divided into the same channel group, the second channel of network layer C1 and the second channel of network layer C2 may be divided into the same channel group, and so on. In a grouped convolutional layer, the channels belonging to the same convolution group may belong to the same channel group. For example, if a grouped convolutional layer includes 3 convolution groups, the channels belonging to the first convolution group may be divided into a first channel group, the channels belonging to the second convolution group into a second channel group, and the channels belonging to the third convolution group into a third channel group. In addition, any two channels may be divided into the same channel group according to a user instruction; for example, any two channels belonging to the same network layer may be divided into the same channel group according to a user instruction. Of course, a user instruction may also be used to divide three or more channels into the same channel group, which is not limited herein. A simplified sketch of this channel grouping is given below.
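The following simplified sketch covers the first rule (corresponding channels of layers in the same network layer group are grouped together); the grouped-convolution and user-instruction rules would add further merges in the same way. All names are illustrative.

```python
from typing import Dict, List, Set, Tuple

def group_channels(layer_groups: List[Set[str]],
                   in_channels: Dict[str, int]) -> List[Set[Tuple[str, int]]]:
    """Form channel groups from network layer groups: channel c of every
    layer in a network layer group is pruned or kept together."""
    channel_groups: List[Set[Tuple[str, int]]] = []
    for group in layer_groups:
        members = sorted(group)
        count = in_channels[members[0]]  # layers in a group share the count
        for c in range(count):
            channel_groups.append({(layer, c) for layer in members})
    return channel_groups
```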
In the case where any channel group includes two or more channels, the channels in that channel group are pruned at the same time or retained at the same time. By adopting the embodiments of the present disclosure, channel pruning can be performed on a neural network of any complex structure, for example, on neural networks with structures such as residual connections, grouped convolutional layers, depthwise separable convolutional layers, and feature pyramid networks.
In one example, the set of training images used to train the neural network may be written as $\{(x_n, y_n)\}_{n=1}^{N}$, where $1 \le n \le N$, $N$ represents the number of training images in the training image set, $x_n$ represents the $n$-th training image in the training image set, and $y_n$ denotes the labeled data corresponding to $x_n$.
In this embodiment, for any training image in the training image set, the training image may be input to the neural network, a prediction result corresponding to the training image is obtained through the neural network, and a loss function value corresponding to the training image is determined according to a difference between the prediction result corresponding to the training image and the labeled data corresponding to the training image.
In a possible implementation manner, the processing, by the neural network, a training image to obtain a loss function value corresponding to the training image includes: processing the training image through the channels which are not pruned in the neural network to obtain the loss function value corresponding to the training image. In this implementation, after any channel is pruned, the pruned channel may not participate in the calculation during the subsequent fine-tuning and pruning of the neural network, so that the calculation amount of the neural network can be reduced and accesses to the video memory can be reduced, thereby improving the inference speed of image processing by the neural network and accelerating the fine-tuning and pruning of the neural network.
Of course, in another possible implementation manner, when determining the importance value of the channel group, the training image may also be processed by all channels in the neural network, so as to obtain the loss function value corresponding to the training image, and determine the importance value of the channel group based on the loss function value obtained thereby.
In the embodiment of the present disclosure, corresponding masks may be respectively allocated to the channels of the neural network, and the masks of the channels may be initialized to a first preset value. In the neural network, the mask of the channel in the channel group that is not pruned may be a first preset value, and the mask of the channel in the channel group that is pruned may be a second preset value. For example, the first preset value may be 1, and the second preset value may be 0. In the case where any one channel group includes two or more channels, the mask values of the respective channels in the channel group are the same. In the embodiment of the present disclosure, the importance value of each channel group may be determined according to a change situation of the loss function value caused by removing each channel group.
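As a concrete illustration, the masks described above could be realized in PyTorch as a small module multiplied into the feature maps; this is a minimal sketch under that assumption, not the disclosure's prescribed implementation.

```python
import torch
import torch.nn as nn

class ChannelMask(nn.Module):
    """Per-channel masks, initialized to the first preset value (1);
    pruning a channel group sets its masks to the second preset value (0)."""
    def __init__(self, num_channels: int):
        super().__init__()
        # stored as a Parameter so that gradients of the loss with respect
        # to each mask are available for the importance computation below
        self.mask = nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map
        return x * self.mask.view(1, -1, 1, 1)
```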
In one possible implementation manner, the determining the importance values of the plurality of channel groups respectively based on the loss function values corresponding to the training images includes: and for any one of the plurality of channel groups, determining the importance value of the channel group according to the gradient of the loss function value corresponding to the training image on the mask of the channel in the channel group. In this implementation, for any channel group, the importance value of the channel in the channel group may be determined according to the gradient of the loss function value corresponding to the training image on the mask of the channel in the channel group, and the importance value of the channel group may be determined according to the importance value of the channel in the channel group, where the importance value of the channel group is positively correlated with the importance value of the channel in the channel group. For example, in the case where any channel group includes only one channel, the importance value of the channel may be taken as the importance value of the channel group. In the case that any one channel group includes two or more channels, the importance value of any one channel in the channel group may be used as the importance value of the channel group, or an average value of the importance values of the respective channels in the channel group may be used as the importance value of the channel group, or a sum value of the importance values of the respective channels in the channel group may be used as the importance value of the channel group, which is not limited herein. In this implementation, for any one of the plurality of channel groups, the importance value of each channel group is determined according to the gradient of the loss function value corresponding to the training image on the mask of the channel in the channel group, so that the importance value of each channel group can be accurately determined. By pruning the channel groups based on the importance values of the respective channel groups thus determined, it is possible to improve the speed of image processing by the neural network while reducing the sacrifice of the accuracy of the neural network.
In this implementation, the importance value of any one channel group is positively correlated with the modulus of the gradient of the loss function value corresponding to the training image with respect to the masks of the channels in that channel group. As an example of this implementation, the importance value of the channel group is positively correlated with the square of the gradient. In one example, the importance value of a channel group may be determined from the importance values of the individual channels in the channel group using the chain rule of gradient calculation. For example, the importance value $s_i$ of channel $i$ can be determined using equation 1:

$$s_i = \frac{1}{2N}\sum_{n=1}^{N}\left(\sum_{x \in G(i)} \frac{\partial \mathcal{L}^{(n)}}{\partial m_x}\right)^{2} \quad (1)$$

where $N$ represents the number of training images in the set of training images, $\mathcal{L}^{(n)}$ represents the loss function value corresponding to the $n$-th training image, $m_i$ represents the mask of channel $i$, $G(i)$ represents the set of channels belonging to the same channel group as channel $i$ (so that $i \in G(i)$), and $m_x$ represents the mask of channel $x$.
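The following sketch shows how these gradients might be read off in practice; it assumes the illustrative ChannelMask modules above and, for brevity, scores groups from the gradient of the batch-averaged loss (squaring each image's gradient before averaging, as equation 1 strictly prescribes, would require one backward pass per image):

```python
def group_importance(masks, groups, loss):
    """Approximate equation 1: score each channel group by the squared
    gradient of the loss with respect to its shared mask.

    masks:  list of ChannelMask modules, one per prunable layer
    groups: list of channel groups, each a list of (layer, channel) pairs
    loss:   scalar loss averaged over the training images
    """
    grads = torch.autograd.grad(loss, [m.mask for m in masks],
                                retain_graph=True)
    scores = []
    for group in groups:
        # Chain rule: the gradient with respect to the shared group mask
        # is the sum of the gradients with respect to the masks of the
        # coupled channels in the group.
        g = sum(grads[layer][channel] for layer, channel in group)
        scores.append(0.5 * float(g) ** 2)
    return scores
```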
In another possible implementation manner, for any one of the plurality of channel groups, the importance value of the channel group may be determined according to the change in the loss function value corresponding to the training image when the channel group does not participate in the computation.
In a possible implementation manner, the pruning at least one of the plurality of channel groups according to the importance values of the plurality of channel groups includes: for any one channel group of the plurality of channel groups, determining the video memory occupation amount corresponding to the channel group; normalizing the importance value of the channel group according to the video memory occupation amount to obtain a normalized importance value of the channel group; and pruning at least one channel group of the plurality of channel groups according to the normalized importance values of the plurality of channel groups. The video memory occupation amount corresponding to any channel group can represent the video memory capacity occupied by the feature maps corresponding to the channel group. In this implementation, the normalized importance value of any channel group is positively correlated with the importance value of that channel group and negatively correlated with the video memory occupation amount corresponding to that channel group. For example, the ratio of the importance value of the channel group to the video memory occupation amount corresponding to the channel group can be used as the normalized importance value of the channel group. Since the video memory occupation amount corresponding to a channel group allows the speed gain obtained by removing that channel group from the neural network to be estimated more accurately, this implementation can remove channel groups that have both higher redundancy (i.e., lower importance) and higher video memory occupation (i.e., a larger increase in inference speed, and thus a larger actual acceleration, after removal). This balances the inference speed and the accuracy of the neural network, so that a larger speed increase can be obtained while removing fewer channels, thereby reducing the sacrifice of the accuracy of the neural network in channel pruning.
As an example of this implementation, the determining, for any one of the channel groups, the video memory occupation amount corresponding to the channel group includes: for any one of the channel groups, determining the video memory occupation amount corresponding to the channel group according to at least one of the number, the height, and the width of the feature maps corresponding to the channels in the channel group. In this example, for any channel, the video memory occupation amount corresponding to the channel can be determined according to one, two, or three of the number, height, and width of the feature maps of the channel. For example, for any channel, the product of the number, height, and width of the feature maps of the channel can be determined as the video memory occupation amount corresponding to the channel. The video memory occupation amount corresponding to the channel group can then be obtained from the video memory occupation amounts corresponding to the channels in the channel group, as in the sketch below. In this way, the video memory occupation amount corresponding to each channel group can be determined accurately.
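A sketch of this memory-normalized score, assuming the illustrative group_importance scores above and a hypothetical fmap_shapes table of per-channel feature map shapes:

```python
def normalize_by_memory(scores, groups, fmap_shapes):
    """Divide each channel group's importance by the video memory its
    feature maps occupy.

    fmap_shapes: maps a layer index to (num_maps, height, width) of one
                 channel's feature maps in that layer; a channel's
                 footprint is the product of the three values, and a
                 group's footprint is the sum over its channels.
    """
    normalized = []
    for score, group in zip(scores, groups):
        mem = sum(fmap_shapes[layer][0] * fmap_shapes[layer][1] * fmap_shapes[layer][2]
                  for layer, _channel in group)
        # Positively correlated with importance, negatively correlated
        # with the video memory occupation amount.
        normalized.append(score / mem)
    return normalized
```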
As an example of this implementation, the neural network performs a plurality of iterations; and the pruning at least one of the plurality of channel groups according to the normalized importance values of the plurality of channel groups includes: determining cumulative importance values of the plurality of channel groups according to the normalized importance values of the plurality of channel groups over at least two iterations; and pruning at least one channel group of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups. For example, in the t-th iteration of the neural network, the training image is processed by the neural network to obtain a t-th generation loss function value corresponding to the training image, where t is an integer greater than or equal to 0. According to the t-th generation loss function value corresponding to the training image, the parameters of the neural network can be updated, so that the neural network is fine-tuned and its accuracy can be recovered after pruning. Based on the t-th generation loss function values corresponding to the training images, the t-th generation importance values of the plurality of channel groups can be determined respectively. According to the video memory occupation amount corresponding to a channel group, the t-th generation importance value of the channel group can be normalized to obtain the t-th generation normalized importance value of the channel group. The cumulative importance value of the t-th generation of the channel group may then be determined from the t-th generation normalized importance value of the channel group and the normalized importance values of at least one earlier generation. For example, the cumulative importance value of the t-th generation of the channel group may be determined from the normalized importance values of the q-th through t-th generations of the channel group, where the q-th generation is the generation immediately following the most recent pruning iteration before the t-th generation, i.e., the (q-1)-th generation is the most recent pruning iteration before the t-th generation. Specifically, the cumulative importance value of the t-th generation of the channel group may be determined as the sum of the normalized importance values of the q-th through t-th generations of the channel group. In this example, by determining the cumulative importance values of the plurality of channel groups according to the normalized importance values over at least two iterations, and pruning at least one channel group according to these cumulative importance values, channel groups whose importance is low and whose video memory occupation is high across at least two iterations can be pruned, so that the inference speed and the accuracy of image processing by the neural network can be better balanced.
In one example, the pruning at least one of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups includes: in response to the number of iterations between the current iteration and the most recent pruning iteration reaching a preset number of interval generations, pruning at least one channel group of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups, where the preset number of interval generations is greater than 0. For example, the preset number of interval generations may be denoted d, where d > 0. In this example, channel pruning is not performed in every iteration, so that after each channel pruning, the parameters of the neural network can be updated through subsequent iterations to fine-tune the network, thereby recovering the accuracy of the neural network after channel pruning. In this way, the accuracy of image processing by the neural network is maintained during channel pruning.
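A sketch of this interval-based schedule, building on the illustrative helpers above (the bookkeeping of cumulative scores and the in-place removal of pruned groups are choices of this sketch, not mandated by the text):

```python
def accumulate_and_maybe_prune(t, d, normalized, cumulative, masks, groups):
    """Add this generation's normalized importance values to the running
    totals and, once every d iterations, prune the channel group with
    the lowest cumulative importance value."""
    for k, s in enumerate(normalized):
        cumulative[k] += s
    if t % d == 0 and groups:
        victim = min(range(len(groups)), key=lambda k: cumulative[k])
        with torch.no_grad():
            for layer, channel in groups[victim]:
                masks[layer].mask[channel] = 0.0  # second preset value
        del groups[victim]                    # a pruned group is not revisited
        cumulative[:] = [0.0] * len(groups)   # restart the accumulation
    return cumulative
```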
In another example, channel pruning may also be performed in each iteration of the neural network.
In one example, the pruning at least one of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups includes: pruning the channel group with the lowest cumulative importance value among the plurality of channel groups. In this example, by pruning only the one channel group with the lowest cumulative importance value at a time, the accuracy of image processing by the neural network is maintained.
In another example, pruning may also be performed for more than two channel groups at a time.
In another possible implementation, the importance values of the channel groups may be normalized according to floating point operations (FLOPs).
In a possible implementation manner, the pruning at least one of the plurality of channel groups according to the importance values of the plurality of channel groups includes: and in response to the fact that the calculation cost of the neural network does not meet the preset condition, pruning at least one channel group in the plurality of channel groups according to the importance values of the plurality of channel groups. In one example, if the FLOPs of the neural network reach an expected value, it may be determined that the computational overhead of the neural network meets a preset condition; if the FLOPs of the neural network do not reach an expected value, it can be determined that the computational overhead of the neural network does not meet a preset condition. Of course, other indicators may be used to determine the computational overhead of the neural network, and are not limited herein. In this implementation manner, in response to that the computational overhead of the neural network does not satisfy the preset condition, at least one channel group of the plurality of channel groups is pruned according to the importance values of the plurality of channel groups, so that a pruned neural network whose computational overhead satisfies the preset condition can be obtained.
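For instance, the stopping test on computational overhead might be sketched as a simple FLOPs count over the channels that have not been pruned (the per-layer bookkeeping is an assumption of this sketch):

```python
def conv_flops(alive_in, alive_out, kernel, out_h, out_w):
    """Multiply-accumulate count of one convolutional layer, counting
    only channels whose masks still hold the first preset value."""
    return alive_in * alive_out * kernel * kernel * out_h * out_w

def overhead_meets_target(layer_stats, target_flops):
    """layer_stats: iterable of (alive_in, alive_out, kernel, out_h, out_w)
    tuples. Returns True once the FLOPs of the pruned network reach the
    expected value, i.e. the preset condition of the text is met."""
    return sum(conv_flops(*stats) for stats in layer_stats) <= target_flops
```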
In embodiments of the present disclosure, the pruned neural network may be used for image processing. For example, if the neural network is used for image classification, the image to be classified may be input into the pruned neural network, and image classification is performed on the image to be classified through the pruned neural network to obtain the image category to which it belongs. For another example, if the neural network is used for target detection (e.g., pedestrian detection or vehicle detection), the image to be detected may be input into the pruned neural network, and target detection is performed on the image to be detected through the pruned neural network to obtain the labels of the target objects in the image to be detected. Processing images with the pruned neural network obtained by the embodiments of the present disclosure reduces both the video memory occupation amount and the computational overhead.
The channel pruning method for a neural network provided by the embodiments of the present disclosure is described below through a specific application scenario. In this application scenario, each convolutional layer and each fully connected layer in the neural network can be used as a packet-to-be-grouped network layer. For example, the set of packet-to-be-grouped network layers in the neural network can be written as $\mathcal{L} = \{l_j\}$, $1 \le j \le M$, where $l_j$ represents the $j$-th packet-to-be-grouped network layer and $M$ denotes the number of packet-to-be-grouped network layers. A computational graph $\mathcal{G}$ of the neural network may be determined, and in the computational graph $\mathcal{G}$ the parent network layers $P[l_j]$ of each packet-to-be-grouped network layer $l_j$ may be found by a depth-first search algorithm. The network layer grouping result of the neural network can be recorded as $\mathcal{R} = \{g_k\}$, $1 \le k \le K$, where $g_k$ represents the $k$-th network layer group in the network layer grouping result and $K \le M$.
In the case where no packet-to-be-grouped network layer has yet been grouped, $\mathcal{R}$ may be an empty set. That is to say, $\mathcal{R}$ may be initialized to an empty set. In this application scenario, the packet-to-be-grouped network layers of the neural network may be grouped in sequence. For any packet-to-be-grouped network layer, if it satisfies the condition for joining an established network layer group, it is added to that established network layer group; in response to the packet-to-be-grouped network layer not satisfying the condition for joining any established network layer group, or no established network layer group existing, a network layer group corresponding to the packet-to-be-grouped network layer may be established. For example, if $P[l] \cap P[g] \neq \varnothing$, add $l$ to $g$, where $l$ denotes the packet-to-be-grouped network layer and $P[g]$ represents the set of parent network layers of the network layers in the established network layer group $g$. For another example, in the case that the established network layer group $g$ includes a packet convolutional layer or a depthwise separable convolutional layer $c$, if $c \in P[l]$, add $l$ to $g$, where $l$ denotes the packet-to-be-grouped network layer.
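These two grouping rules might be sketched as follows; the predicate is_group_or_depthwise_conv and the parent map (obtained beforehand by depth-first search over the computational graph) are assumptions of this sketch:

```python
def group_layers(layers, parent, is_group_or_depthwise_conv):
    """Group the packet-to-be-grouped network layers so that coupled
    channels end up in the same network layer group.

    layers: layer ids in traversal order
    parent: maps a layer id to the set of its parent network layers
    """
    groups = []  # the grouping result, initialized to an empty set
    for l in layers:
        placed = False
        for g in groups:
            parents_of_g = set().union(*(parent[x] for x in g))
            # Rule 1: l shares a parent with some layer already in g.
            shares_parent = bool(parent[l] & parents_of_g)
            # Rule 2: g contains a packet (group) convolution or a
            # depthwise separable convolution c that is a parent of l.
            coupled = any(is_group_or_depthwise_conv(c) and c in parent[l]
                          for c in g)
            if shares_parent or coupled:
                g.add(l)        # join the established network layer group
                placed = True
                break
        if not placed:
            groups.append({l})  # establish a new network layer group
    return groups
```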
After grouping each convolutional layer and fully connected layer in the neural network to obtain the network layer grouping result, the channels of each convolutional layer and fully connected layer are grouped according to the network layer grouping result to obtain a plurality of channel groups. Each channel of the neural network may be assigned a corresponding mask, and the mask of each channel may be initialized to 1. In the neural network, the mask of a channel in a channel group that has not been pruned may be 1, the mask of a channel in a channel group that has been pruned may be 0, and in the case where any channel group includes two or more channels, the mask values of the respective channels in the channel group are the same.
In this application scenario, the number of iterations of the neural network may be denoted as t, where t is an integer greater than or equal to 0. In the t-th iteration of the neural network, for any training image in the training image set, the training image may be input to the neural network, a t-th generation prediction result corresponding to the training image is obtained through the neural network, and a t-th generation loss function value corresponding to the training image is determined according to a difference between the t-th generation prediction result corresponding to the training image and the labeled data corresponding to the training image. According to the t-th generation loss function value corresponding to each training image, the parameters of the neural network can be updated, so that the neural network can be finely adjusted.
Based on the t-th generation loss function values corresponding to the training images, the t-th generation importance values of the channel groups can be determined respectively using equation 1 above. According to the video memory occupation amount corresponding to each channel group, the t-th generation importance value of each channel group can be normalized to obtain the t-th generation normalized importance value of each channel group. For any channel group, the cumulative importance value of the t-th generation of the channel group may be determined as the sum of the normalized importance values of the q-th through t-th generations of the channel group, where the q-th generation is the generation immediately following the most recent pruning iteration before the t-th generation, i.e., the (q-1)-th generation is the most recent pruning iteration before the t-th generation. If t % d = 0, that is, if the remainder of dividing t by d is 0, the channel group with the lowest t-th generation cumulative importance value among the plurality of channel groups is pruned. The iteration is repeated until the FLOPs of the neural network reach the expected value.
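Putting the steps of this application scenario together, a minimal end-to-end sketch might look as follows; it reuses the illustrative helpers above, and loader, criterion, flops_fn, and fmap_shapes are assumptions of the sketch (the optimizer is assumed to update only the network weights, not the masks):

```python
def prune_until_target(model, masks, groups, loader, optimizer,
                       criterion, d, flops_fn, target_flops, fmap_shapes):
    """Fine-tune, score, and prune channel groups until the FLOPs of the
    neural network reach the expected value."""
    cumulative = [0.0] * len(groups)
    t = 0
    while flops_fn(model, masks) > target_flops:
        images, labels = next(loader)
        loss = criterion(model(images), labels)         # t-th generation loss
        scores = group_importance(masks, groups, loss)  # equation 1
        scores = normalize_by_memory(scores, groups, fmap_shapes)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                # fine-tune the weights
        cumulative = accumulate_and_maybe_prune(
            t, d, scores, cumulative, masks, groups)
        t += 1
    return model
```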
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the principles and logic; owing to space limitations, the details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides a channel pruning device for a neural network, an electronic device, a computer-readable storage medium, and a program, which can all be used to implement any one of the channel pruning methods for a neural network provided by the present disclosure, and corresponding technical solutions and technical effects can be referred to in corresponding descriptions of the method sections and are not described again.
Fig. 7 shows a block diagram of a channel pruning device of a neural network provided by an embodiment of the present disclosure. As shown in fig. 7, the channel pruning apparatus of the neural network includes:
a network layer grouping module 71, configured to group network layers of a neural network to obtain a network layer grouping result of the neural network;
a channel grouping module 72, configured to group channels of the neural network according to the network layer grouping result to obtain multiple channel groups in the neural network;
the image processing module 73 is configured to process the training image through the neural network to obtain a loss function value corresponding to the training image;
a determining module 74, configured to determine importance values of the multiple channel groups respectively based on the loss function values corresponding to the training images;
a pruning module 75, configured to prune at least one channel group of the multiple channel groups according to the importance values of the multiple channel groups, to obtain a pruned neural network, where the pruned neural network is used for image processing.
In one possible implementation, the network layer grouping module 71 is configured to:
determine a parent network layer of each of a plurality of packet-to-be-grouped network layers in the neural network;
and group the plurality of packet-to-be-grouped network layers according to the parent network layers of the plurality of packet-to-be-grouped network layers to obtain the network layer grouping result of the neural network.
In one possible implementation, the network layer grouping module 71 is configured to:
in response to a first packet-to-be-grouped network layer and a second packet-to-be-grouped network layer among the plurality of packet-to-be-grouped network layers having a common parent network layer, divide the first packet-to-be-grouped network layer and the second packet-to-be-grouped network layer into the same network layer group, where the first packet-to-be-grouped network layer and the second packet-to-be-grouped network layer are any two of the plurality of packet-to-be-grouped network layers;
and/or,
in response to the first packet-to-be-grouped network layer being a network layer of a first preset type and being a parent network layer of the second packet-to-be-grouped network layer, divide the first packet-to-be-grouped network layer and the second packet-to-be-grouped network layer into the same network layer group.
In one possible implementation, the first preset type of network layer includes a packet convolutional layer and/or a depth separable convolutional layer.
In one possible implementation, the determining module 74 is configured to:
and for any one of the plurality of channel groups, determining the importance value of the channel group according to the gradient of the loss function value corresponding to the training image with respect to the masks of the channels in the channel group.
In one possible implementation manner, the pruning module 75 is configured to:
for any one channel group in the plurality of channel groups, determining the video memory occupation amount corresponding to the channel group;
according to the video memory occupation amount, normalizing the importance value of the channel group to obtain a normalized importance value of the channel group;
pruning at least one of the plurality of channel groups according to the normalized importance values of the plurality of channel groups.
In one possible implementation manner, the pruning module 75 is configured to:
and for any one of the channel groups, determining the video memory occupation amount corresponding to the channel group according to at least one of the number, the height, and the width of the feature maps corresponding to the channels in the channel group.
In one possible implementation, the neural network performs a plurality of iterations;
the pruning module 75 is configured to:
determining cumulative importance values for the plurality of channel groups from the normalized importance values for at least two iterations of the plurality of channel groups;
pruning at least one channel group of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups.
In one possible implementation manner, the pruning module 75 is configured to:
and in response to the number of iterations between the current iteration and the most recent pruning iteration reaching a preset number of interval generations, pruning at least one channel group of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups, where the preset number of interval generations is greater than 0.
In one possible implementation manner, the pruning module 75 is configured to:
pruning the channel group with the lowest cumulative importance value among the plurality of channel groups.
In one possible implementation manner, the pruning module 75 is configured to:
and in response to the fact that the calculation cost of the neural network does not meet the preset condition, pruning at least one channel group in the plurality of channel groups according to the importance values of the plurality of channel groups.
In one possible implementation, the determining module 74 is configured to:
and processing the training image through a channel which is not pruned in the neural network to obtain a loss function value corresponding to the training image.
In the embodiments of the present disclosure, the network layers of a neural network are grouped to obtain a network layer grouping result of the neural network, and the channels of the neural network are grouped according to the network layer grouping result to obtain a plurality of channel groups in the neural network. A training image is processed through the neural network to obtain a loss function value corresponding to the training image, importance values of the plurality of channel groups are respectively determined based on the loss function value corresponding to the training image, and at least one channel group of the plurality of channel groups is pruned according to the importance values of the plurality of channel groups to obtain a pruned neural network used for image processing. In this way, in the process of channel pruning of the neural network, the network structure information of the neural network can be fully utilized for channel grouping, and channel pruning is performed with the channel group as the minimum pruning unit, which helps the pruned neural network achieve a higher image processing speed while maintaining image processing accuracy.
In some embodiments, functions or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementations and technical effects thereof may refer to the description of the above method embodiments, which are not described herein again for brevity.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-described method. The computer-readable storage medium may be a non-volatile computer-readable storage medium, or may be a volatile computer-readable storage medium.
Embodiments of the present disclosure also provide a computer program, which includes computer readable code, and when the computer readable code runs in an electronic device, a processor in the electronic device executes the above method.
The disclosed embodiments also provide a computer program product comprising computer readable code or a non-volatile computer readable storage medium carrying computer readable code, which when run in an electronic device, a processor in the electronic device performs the above method.
An embodiment of the present disclosure further provides an electronic device, including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the above-described method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 8 illustrates a block diagram of an electronic device 800 provided by an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like terminal.
Referring to fig. 8, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing state assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (Wi-Fi), a second generation mobile communication technology (2G), a third generation mobile communication technology (3G), a fourth generation mobile communication technology (4G)/long term evolution of universal mobile communication technology (LTE), a fifth generation mobile communication technology (5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 9 shows a block diagram of an electronic device 1900 provided by an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 9, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), a multi-user multi-process computer operating system (Unix™), a free and open-source Unix-like operating system (Linux™), an open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by utilizing the state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK) or the like.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (15)

1. A channel pruning method of a neural network is characterized by comprising the following steps:
grouping network layers of a neural network to obtain network layer grouping results of the neural network;
grouping the channels of the neural network according to the network layer grouping result to obtain a plurality of channel groups in the neural network;
processing the training image through the neural network to obtain a loss function value corresponding to the training image;
respectively determining importance values of the plurality of channel groups based on the loss function values corresponding to the training images;
and pruning at least one channel group in the plurality of channel groups according to the importance values of the plurality of channel groups to obtain a pruned neural network, wherein the pruned neural network is used for image processing.
2. The method of claim 1, wherein the grouping the network layer of the neural network to obtain the network layer grouping result of the neural network comprises:
determining a parent network layer of each of a plurality of packet-to-be-grouped network layers in the neural network;
and grouping the plurality of packet-to-be-grouped network layers according to the parent network layers of the plurality of packet-to-be-grouped network layers to obtain the network layer grouping result of the neural network.
3. The method of claim 2, wherein the grouping the plurality of packet-to-be-grouped network layers according to a parent network layer of the plurality of packet-to-be-grouped network layers comprises:
in response to a first packet-to-be-grouped network layer and a second packet-to-be-grouped network layer among the plurality of packet-to-be-grouped network layers having a common parent network layer, dividing the first packet-to-be-grouped network layer and the second packet-to-be-grouped network layer into the same network layer group, wherein the first packet-to-be-grouped network layer and the second packet-to-be-grouped network layer are any two of the plurality of packet-to-be-grouped network layers;
and/or,
in response to the first packet-to-be-grouped network layer being a network layer of a first preset type and being a parent network layer of the second packet-to-be-grouped network layer, dividing the first packet-to-be-grouped network layer and the second packet-to-be-grouped network layer into the same network layer group.
4. The method of claim 3, wherein the first preset type of network layer comprises a packet convolutional layer and/or a depth separable convolutional layer.
5. The method according to any one of claims 1 to 4, wherein the determining the importance values of the plurality of channel groups respectively based on the loss function values corresponding to the training images comprises:
and for any one of the plurality of channel groups, determining the importance value of the channel group according to the gradient of the loss function value corresponding to the training image with respect to the masks of the channels in the channel group.
6. The method according to any one of claims 1 to 5, wherein said pruning at least one of the plurality of channel groups according to the importance values of the plurality of channel groups comprises:
for any one channel group in the plurality of channel groups, determining the video memory occupation amount corresponding to the channel group;
according to the video memory occupation amount, normalizing the importance value of the channel group to obtain a normalized importance value of the channel group;
pruning at least one of the plurality of channel groups according to the normalized importance values of the plurality of channel groups.
7. The method according to claim 6, wherein the determining, for any one of the plurality of channel groups, a video memory occupation amount corresponding to the channel group comprises:
and for any one of the channel groups, determining the video memory occupation amount corresponding to the channel group according to at least one of the number, the height, and the width of the feature maps corresponding to the channels in the channel group.
8. The method of claim 6 or 7, wherein the neural network performs a plurality of iterations;
pruning at least one of the plurality of channel groups according to the normalized importance values of the plurality of channel groups, comprising:
determining cumulative importance values for the plurality of channel groups from the normalized importance values for at least two iterations of the plurality of channel groups;
pruning at least one channel group of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups.
9. The method of claim 8, wherein pruning at least one of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups comprises:
and in response to the number of iterations between the current iteration and the most recent pruning iteration reaching a preset number of interval generations, pruning at least one channel group of the plurality of channel groups according to the cumulative importance values of the plurality of channel groups, wherein the preset number of interval generations is greater than 0.
10. The method according to claim 8 or 9, wherein said pruning at least one of said plurality of channel groups according to said cumulative importance values of said plurality of channel groups comprises:
pruning the channel group with the lowest cumulative importance value among the plurality of channel groups.
11. The method according to any one of claims 1 to 10, wherein said pruning at least one of the plurality of channel groups according to the importance values of the plurality of channel groups comprises:
and in response to the fact that the calculation cost of the neural network does not meet the preset condition, pruning at least one channel group in the plurality of channel groups according to the importance values of the plurality of channel groups.
12. The method according to any one of claims 1 to 11, wherein the processing, by the neural network, a training image to obtain a corresponding loss function value of the training image comprises:
and processing the training image through a channel which is not pruned in the neural network to obtain a loss function value corresponding to the training image.
13. A channel pruning device for a neural network, comprising:
the network layer grouping module is used for grouping network layers of the neural network to obtain a network layer grouping result of the neural network;
the channel grouping module is used for grouping the channels of the neural network according to the network layer grouping result to obtain a plurality of channel groups in the neural network;
the image processing module is used for processing the training image through the neural network to obtain a loss function value corresponding to the training image;
a determining module, configured to determine importance values of the multiple channel groups respectively based on the loss function values corresponding to the training images;
and the pruning module is used for pruning at least one channel group in the plurality of channel groups according to the importance values of the plurality of channel groups to obtain a pruned neural network, wherein the pruned neural network is used for image processing.
14. An electronic device, comprising:
one or more processors;
a memory for storing executable instructions;
wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the method of any one of claims 1 to 12.
15. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 12.
CN202110637218.8A 2021-06-08 2021-06-08 Channel pruning method and device for neural network, electronic equipment and storage medium Active CN113255912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110637218.8A CN113255912B (en) 2021-06-08 2021-06-08 Channel pruning method and device for neural network, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110637218.8A CN113255912B (en) 2021-06-08 2021-06-08 Channel pruning method and device for neural network, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113255912A true CN113255912A (en) 2021-08-13
CN113255912B CN113255912B (en) 2022-07-12

Family

ID=77187011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110637218.8A Active CN113255912B (en) 2021-06-08 2021-06-08 Channel pruning method and device for neural network, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113255912B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909861A (en) * 2018-09-17 2020-03-24 北京市商汤科技开发有限公司 Neural network optimization method and device, electronic device and storage medium
US20210158166A1 (en) * 2019-10-11 2021-05-27 Qualcomm Incorporated Semi-structured learned threshold pruning for deep neural networks
US20210117776A1 (en) * 2019-10-22 2021-04-22 Baidu Usa Llc Method, electronic device and computer readable medium for information processing for accelerating neural network training
CN111291637A (en) * 2020-01-19 2020-06-16 中国科学院上海微***与信息技术研究所 Face detection method, device and equipment based on convolutional neural network
CN111967585A (en) * 2020-09-25 2020-11-20 深圳市商汤科技有限公司 Network model processing method and device, electronic equipment and storage medium
CN112668630A (en) * 2020-12-24 2021-04-16 华中师范大学 Lightweight image classification method, system and equipment based on model pruning
CN112819157A (en) * 2021-01-29 2021-05-18 商汤集团有限公司 Neural network training method and device and intelligent driving control method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KE ZHANG ET AL: "Compacting Deep Neural Networks for Internet of Things: Methods and Applications", arXiv:2103.11083v1 *
MA TI: "Research and Development of Model Compression and Inference Acceleration Algorithms for Image Super-Resolution Deep Neural Networks", China Excellent Doctoral and Master's Theses Full-text Database (Master), Information Science and Technology Series *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992945A (en) * 2023-09-27 2023-11-03 之江实验室 Image processing method and device based on greedy strategy reverse channel pruning
CN116992945B (en) * 2023-09-27 2024-02-13 之江实验室 Image processing method and device based on greedy strategy reverse channel pruning

Also Published As

Publication number Publication date
CN113255912B (en) 2022-07-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant