CN113302657B - Neural network compression method and device - Google Patents


Info

Publication number
CN113302657B
CN113302657B (application CN201880099411.7A)
Authority
CN
China
Prior art keywords
input feature
feature map
parameter
convolution
convolution kernels
Prior art date
Legal status
Active
Application number
CN201880099411.7A
Other languages
Chinese (zh)
Other versions
CN113302657A (en)
Inventor
纪荣嵘 (Ji Rongrong)
林绍辉 (Lin Shaohui)
李与超 (Li Yuchao)
钟刚 (Zhong Gang)
杨帆 (Yang Fan)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN113302657A
Application granted
Publication of CN113302657B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 - Image coding
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A neural network compression method and device can determine an importance parameter corresponding to a first input feature map according to a first parameter and a second parameter, wherein the importance parameter characterizes the importance degree of the first input feature map, the first parameter represents the sparseness of the first input feature map, and the second parameter represents the number of features contained in the first input feature map. Because both the sparseness of the first input feature map and the number of features it contains are considered when determining the importance parameter, the importance parameter evaluates the importance of the first input feature map more accurately. The number of second convolution kernels corresponding to the first input feature map after the neural network is compressed is then determined according to the importance parameter, so that convolution kernels corresponding to the first input feature map can be prevented from being deleted excessively or erroneously during neural network compression.

Description

Neural network compression method and device
Technical Field
The present application relates to the field of neural networks, and in particular, to a method and apparatus for compressing a neural network.
Background
In recent years, neural networks have shown outstanding performance in application scenarios such as object classification and detection, speech processing, behavior recognition, and image super-resolution, and have received wide attention. Neural networks achieve these performance gains by introducing large numbers of parameters and computations, but this also consumes enormous computing and storage resources, which restricts their use on devices such as mobile phones.
To address this problem, researchers have proposed compressing the neural network by exploiting the redundancy of its parameters, thereby reducing the space occupied by the network model and the amount of computation. For example, for the computation-heavy convolutional layers of some deep neural networks, network pruning techniques can measure the importance degree of an input feature map of a convolutional layer by the sum of the absolute values of the convolution kernels included in the layer and, according to that importance degree, remove the storage space of the convolution kernels corresponding to unimportant input feature maps along with the corresponding convolution computations, thereby reducing the space occupied by the network model, the amount of computation, and the computational cost.
However, at present, when a convolutional layer is compressed, the importance degree of its input feature maps is not measured accurately, so the neural network is not compressed accurately according to the importance of the input feature maps, and the performance of the compressed neural network is greatly reduced.
Disclosure of Invention
The embodiments of the present application provide a neural network compression method and device, which are used to solve the technical problem that the performance of a convolutional neural network is greatly reduced when it is compressed by current methods.
In a first aspect, a neural network compression method is provided. In the method, according to at least one first convolution kernel corresponding to a first input feature map of a neural network model, a first parameter and a second parameter corresponding to the first input feature map are determined, wherein the first parameter represents the sparseness of the first input feature map and the second parameter represents the number of features contained in the first input feature map; an importance parameter corresponding to the first input feature map is determined according to the first parameter and the second parameter, wherein the importance parameter characterizes the importance degree of the first input feature map; and the number of second convolution kernels corresponding to the first input feature map is determined according to the importance parameter, wherein the number of second convolution kernels is smaller than or equal to the number of first convolution kernels.
With this method, both the sparseness of the first input feature map and the number of features it contains are considered when determining the importance parameter, so the importance parameter evaluates the importance of the first input feature map more accurately. The number of second convolution kernels corresponding to the first input feature map can then be determined according to the importance parameter. If that number of second convolution kernels is used as the convolution kernels corresponding to the first input feature map after the neural network is compressed, excessive or erroneous deletion of convolution kernels during compression of the neural network model can be avoided, and thus a large reduction in neural network performance caused by compression can be avoided.
In one possible design, the first parameter is determined by the following formula:

s_c = Σ_{i=1}^{N} ||W_{i,c}||_1

wherein s_c is the first parameter, W_{i,c} is the i-th first convolution kernel corresponding to the first input feature map, c is the index of the first input feature map, ||W_{i,c}||_1 is the L1 norm of W_{i,c}, N is the number of first convolution kernels, N is a positive integer, and 1 ≤ i ≤ N. Determining the first parameter by this formula improves the accuracy of the first parameter.
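As an illustration, the first parameter can be computed as follows. This is a hedged Python sketch assuming s_c is the sum of the L1 norms of the N first convolution kernels W_{i,c}; the exact normalization used in the patent's formula may differ, and `kernel_sparsity` is an illustrative name, not from the patent.

```python
import numpy as np

def kernel_sparsity(kernels):
    # kernels: the N first convolution kernels W_{i,c} for one input
    # feature map c, as an array of shape (N, k, k).
    # Assumed reading: s_c is the sum of the kernels' L1 norms.
    return float(sum(np.abs(W).sum() for W in kernels))

# Three 2x2 kernels for one input feature map
kernels = np.array([[[1.0, -1.0], [0.0, 2.0]],
                    [[0.0, 0.0], [0.5, 0.0]],
                    [[-2.0, 1.0], [1.0, 1.0]]])
s_c = kernel_sparsity(kernels)  # 4.0 + 0.5 + 5.0 = 9.5
```

A feature map whose kernels are all zero would get s_c = 0, i.e. maximal sparseness.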
In one possible design, when determining the second parameter corresponding to the first input feature map according to the at least one first convolution kernel corresponding to the first input feature map of the neural network model, a nearest-neighbor matrix of the at least one first convolution kernel may be determined according to the at least one first convolution kernel, and the second parameter may be determined according to the nearest-neighbor matrix. The value a_{i,j} of the element in the i-th row and j-th column of the nearest-neighbor matrix is determined by the following formula:

a_{i,j} = ||W_{i,c} − W_{j,c}||_2, if W_{j,c} belongs to the k nearest neighbors of W_{i,c};
a_{i,j} = 0, if W_{j,c} does not belong to the k nearest neighbors of W_{i,c};

wherein c is the index of the first input feature map, W_{i,c} is the i-th first convolution kernel corresponding to the first input feature map, W_{j,c} is the j-th first convolution kernel corresponding to the first input feature map, ||W_{i,c} − W_{j,c}||_2 is the L2 norm of (W_{i,c} − W_{j,c}), k is an integer, 1 ≤ k ≤ N, 1 ≤ i ≤ N, 1 ≤ j ≤ N, and N is the number of first convolution kernels. Determining the second parameter in this manner improves the accuracy of the second parameter.
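The nearest-neighbor matrix can be sketched in Python as follows, assuming (as the two-case definition suggests) that entry (i, j) holds the L2 distance between kernels W_{i,c} and W_{j,c} when W_{j,c} is among the k nearest neighbors of W_{i,c}, and 0 otherwise; `knn_distance_matrix` is an illustrative name, not from the patent.

```python
import numpy as np

def knn_distance_matrix(kernels, k):
    # kernels: array of shape (N, kh, kw); k: number of nearest neighbors
    N = len(kernels)
    flat = kernels.reshape(N, -1)
    # pairwise L2 distances between flattened kernels
    d = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    A = np.zeros_like(d)
    for i in range(N):
        order = np.argsort(d[i])                 # nearest first
        nn = [j for j in order if j != i][:k]    # exclude the kernel itself
        A[i, nn] = d[i, nn]                      # keep only kNN distances
    return A

# Three 1x1 kernels with values 0, 1, 10; each keeps its single nearest neighbor
A = knn_distance_matrix(np.array([[[0.0]], [[1.0]], [[10.0]]]), k=1)
```

Note that the matrix is not symmetric in general: W_j may be a nearest neighbor of W_i without the converse holding.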
In one possible design, the second parameter is determined by the following formula:

E_c = − Σ_{i=1}^{N} p_i · log_2(p_i), where p_i = dm_i / Σ_{j=1}^{N} dm_j and dm_i = Σ_{j=1}^{N} a_{i,j}

wherein E_c is the second parameter, a_{i,j} is the element in the i-th row and j-th column of the nearest-neighbor matrix, and log_2 denotes a base-2 logarithmic operation. Determining the second parameter by this formula improves the accuracy of the second parameter.
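A hedged sketch of the entropy computation: the surviving text only states that E_c is derived from the nearest-neighbor matrix using a base-2 logarithm, so this example assumes each kernel's row sum of the matrix is normalized into a probability and E_c is the Shannon entropy of that distribution.

```python
import numpy as np

def kernel_entropy(A):
    # A: the N x N nearest-neighbor matrix for one input feature map
    dm = A.sum(axis=1)            # assumed per-kernel density score
    p = dm / dm.sum()             # normalize to a probability distribution
    p = p[p > 0]                  # convention: 0 * log2(0) = 0
    return float(-(p * np.log2(p)).sum())
```

With four kernels whose rows of A all sum to the same value, the distribution is uniform and the entropy is log2(4) = 2 bits; a distribution concentrated on one kernel gives entropy 0.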
In one possible design, the importance parameter is determined by the following formulas:

V̂_c = sqrt( s_c / (1 + α · E_c) )
V_c = V̂_c / max_{1 ≤ m ≤ M} V̂_m

wherein V̂_c is the importance parameter initial value corresponding to the first input feature map, c is the index of the first input feature map, V_c is the importance parameter, s_c is the first parameter, E_c is the second parameter, α is a preset parameter and a non-negative real number, V̂_m (1 ≤ m ≤ M) are the importance parameter initial values corresponding to all the first input feature maps in the convolution layer where the first input feature map is located, max_{1 ≤ m ≤ M} V̂_m is the maximum of those initial values, and M is the number of first input feature maps in the convolution layer, M being a positive integer.

From the above design, it can be seen that V_c is the result of normalizing V̂_c, so its value ranges between 0 and 1; the magnitude relationship between the importance parameters of different first input feature maps therefore reflects the relative importance of those feature maps.
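The normalization step can be sketched as follows. The combination hat_V_c = sqrt(s_c / (1 + alpha * E_c)) used for the initial value is an assumption; the division by the layer-wide maximum, yielding values in (0, 1], follows the description above.

```python
import numpy as np

def importance_parameters(s, E, alpha=1.0):
    # s, E: first and second parameters for the M input feature maps of a layer
    s = np.asarray(s, dtype=float)
    E = np.asarray(E, dtype=float)
    v_hat = np.sqrt(s / (1.0 + alpha * E))   # assumed initial values
    return v_hat / v_hat.max()               # normalize by the layer maximum

V = importance_parameters(s=[4.0, 1.0], E=[0.0, 0.0])  # initial values 2, 1
```

Because each V_c is divided by the same layer-wide maximum, the ordering of feature maps by importance is preserved, and the most important feature map always gets V_c = 1.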
In addition, according to the above design, the importance parameter corresponding to the first input feature map may be determined from the first convolution kernels corresponding to that feature map. Because those first convolution kernels are related only to the index of the first input feature map, and not to its content, the importance parameter can be determined from the index alone, without first obtaining the feature map itself. The manner of determining the importance parameter is therefore more flexible: the importance parameter may be determined before the first input feature map is obtained, and the output feature map may then be determined once the first input feature map is obtained, which improves image processing efficiency.
In one possible design, when determining the number of second convolution kernels corresponding to the first input feature map according to the importance parameter, the number of second convolution kernels may be determined according to the importance parameter and the number of first convolution kernels.
According to this design, the determined number of second convolution kernels can serve as the number of convolution kernels corresponding to the first input feature map after the neural network is compressed. Compression is therefore not all-or-nothing: rather than either keeping or deleting all of the first convolution kernels corresponding to an input feature map, the number of convolution kernels retained for each input feature map is determined from that feature map's importance parameter, and the convolution kernels of the compressed neural network model are determined according to that number.
In one possible design, the number of second convolution kernels is determined by a formula in which Q_c is the number of second convolution kernels, V_c is the importance parameter, G and T are preset positive integer parameters, ⌊V_c·G⌋ denotes V_c·G rounded down, ⌈V_c·G⌉ denotes V_c·G rounded up, N is the number of first convolution kernels (a positive integer), and ⌈N/G⌉ denotes N/G rounded up; in effect, V_c is quantized onto a grid of G levels and scaled by the number of first convolution kernels to yield Q_c.
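The quantization can be illustrated as below. The exact selection rule between ⌊V_c·G⌋ and ⌈V_c·G⌉ via the preset parameter T is not recoverable from the text, so this sketch simply rounds V_c·G up and caps the result at N, which reproduces the grid behavior the listed variables suggest; it is an illustration, not the patent's formula.

```python
import math

def num_second_kernels(V_c, N, G):
    # Quantize the importance parameter V_c (in [0, 1]) onto G levels and
    # scale to a kernel count between 0 and N. Illustrative only: the
    # patent's choice between floor(V_c*G) and ceil(V_c*G) via the preset
    # parameter T is not reproduced here.
    level = math.ceil(V_c * G)           # grid level in {0, 1, ..., G}
    return min(N, math.ceil(N * level / G))

q = num_second_kernels(V_c=0.5, N=8, G=4)   # level 2 of 4 -> keep 4 kernels
```

A feature map with V_c = 0 keeps no kernels (and is pruned entirely), while the most important map (V_c = 1) keeps all N.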
In one possible design, when the number of second convolution kernels is not zero, the at least one first convolution kernel may further be clustered into Q_c sets of first convolution kernels, where Q_c is the number of second convolution kernels, and the second convolution kernels are then obtained by taking the cluster center of each resulting set as one second convolution kernel.
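The clustering step might look like the following minimal k-means sketch. The patent does not name a specific clustering algorithm; k-means and the deterministic initialization from the first Q_c kernels are chosen here only to keep the example simple and reproducible.

```python
import numpy as np

def cluster_kernels(kernels, Q_c, iters=20):
    # kernels: array of shape (N, kh, kw); returns Q_c cluster centers,
    # which serve as the second convolution kernels.
    N = len(kernels)
    flat = kernels.reshape(N, -1).astype(float)
    centers = flat[:Q_c].copy()              # deterministic initialization
    for _ in range(iters):
        # assign every kernel to its nearest center, then recompute means
        d = np.linalg.norm(flat[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for q in range(Q_c):
            if np.any(labels == q):
                centers[q] = flat[labels == q].mean(axis=0)
    return centers.reshape((Q_c,) + kernels.shape[1:])

# Four 1x1 kernels forming two obvious groups -> centers 0.05 and 10.05
second = cluster_kernels(np.array([[[0.0]], [[10.0]], [[0.1]], [[10.1]]]), Q_c=2)
```

Replacing N kernels with Q_c cluster centers keeps one representative per group of similar kernels, which is consistent with pruning redundant (near-duplicate) kernels first.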
In one possible design, the neural network model may also be trained based on the second convolution kernels corresponding to all of the second input feature maps, a second input feature map being a first input feature map that corresponds to at least one second convolution kernel. Such training further improves the performance of the compressed neural network.
In one possible design, the output feature map of the convolution layer where the first input feature map is located is determined according to all the second input feature maps and all the second convolution kernels corresponding to the second input feature maps, where the second input feature map is the first input feature map corresponding to at least one second convolution kernel.
In a second aspect, the present application provides an apparatus having the functionality to implement the method referred to in the first aspect or any possible design of the first aspect. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the apparatus may include a parameter determination unit and a convolution kernel determination unit. The parameter determining unit may be configured to determine, according to at least one first convolution kernel corresponding to a first input feature map of the neural network model, a first parameter corresponding to the first input feature map, where the first parameter is used to represent a sparseness degree of the first input feature map, and a second parameter, where the second parameter is used to represent a number of features included in the first input feature map; the parameter determining unit is further configured to determine an importance parameter corresponding to the first input feature map according to the first parameter and the second parameter, where the importance parameter characterizes an importance degree of the first input feature map; the convolution kernel determining unit is configured to determine, according to the importance parameter, a number of second convolution kernels corresponding to the first input feature map, where the number of second convolution kernels is smaller than or equal to the number of first convolution kernels.
In one possible design, the first parameter is determined by the following formula:

s_c = Σ_{i=1}^{N} ||W_{i,c}||_1

wherein s_c is the first parameter, W_{i,c} is the i-th first convolution kernel corresponding to the first input feature map, c is the index of the first input feature map, ||W_{i,c}||_1 is the L1 norm of W_{i,c}, N is the number of first convolution kernels, N is a positive integer, and 1 ≤ i ≤ N.
In one possible design, the parameter determining unit may be specifically configured to determine, when determining the second parameter corresponding to the first input feature map according to at least one first convolution kernel corresponding to the first input feature map of the neural network model, a nearest neighbor matrix of the at least one first convolution kernel according to the at least one first convolution kernel;
Determining the second parameter according to the nearest neighbor matrix;
The value a_{i,j} of the element in the i-th row and j-th column of the nearest-neighbor matrix is determined by the following formula:

a_{i,j} = ||W_{i,c} − W_{j,c}||_2, if W_{j,c} belongs to the k nearest neighbors of W_{i,c};
a_{i,j} = 0, if W_{j,c} does not belong to the k nearest neighbors of W_{i,c};

wherein c is the index of the first input feature map, W_{i,c} is the i-th first convolution kernel corresponding to the first input feature map, W_{j,c} is the j-th first convolution kernel corresponding to the first input feature map, ||W_{i,c} − W_{j,c}||_2 is the L2 norm of (W_{i,c} − W_{j,c}), k is an integer, 1 ≤ k ≤ N, 1 ≤ i ≤ N, 1 ≤ j ≤ N, and N is the number of first convolution kernels.
In one possible design, the second parameter is determined by the following formula:

E_c = − Σ_{i=1}^{N} p_i · log_2(p_i), where p_i = dm_i / Σ_{j=1}^{N} dm_j and dm_i = Σ_{j=1}^{N} a_{i,j}

wherein E_c is the second parameter, a_{i,j} is the element in the i-th row and j-th column of the nearest-neighbor matrix, and log_2 denotes a base-2 logarithmic operation.
In one possible design, the importance parameter is determined by the following formulas:

V̂_c = sqrt( s_c / (1 + α · E_c) )
V_c = V̂_c / max_{1 ≤ m ≤ M} V̂_m

wherein V̂_c is the importance parameter initial value corresponding to the first input feature map, c is the index of the first input feature map, V_c is the importance parameter, s_c is the first parameter, E_c is the second parameter, α is a preset parameter and a non-negative real number, V̂_m (1 ≤ m ≤ M) are the importance parameter initial values corresponding to all the first input feature maps in the convolution layer where the first input feature map is located, max_{1 ≤ m ≤ M} V̂_m is the maximum of those initial values, and M is the number of first input feature maps in the convolution layer, M being a positive integer.
In one possible design, the convolution kernel determining unit may be specifically configured to determine the number of second convolution kernels according to the importance parameter and the number of first convolution kernels when determining the number of second convolution kernels corresponding to the first input feature map according to the importance parameter.
In one possible design, the apparatus may further include a training unit configured to train the neural network model according to second convolution kernels corresponding to all of the second input feature maps, the second input feature maps being the first input feature maps corresponding to at least one second convolution kernel.
In one possible design, the apparatus may further include an output feature map determining unit, configured to determine, according to all second input feature maps and second convolution kernels corresponding to all the second input feature maps, an output feature map of a convolution layer where the first input feature map is located, where the second input feature map is the first input feature map corresponding to at least one second convolution kernel.
In a third aspect, embodiments of the present application provide a computer readable storage medium comprising program instructions which, when used on a computer, cause the computer to carry out the functions of the first aspect or any possible design thereof.
In a fourth aspect, embodiments of the application provide a computer program product which, when run on a computer, causes the computer to carry out the functions of the first aspect or any of the possible designs of the first aspect.
In a fifth aspect, embodiments of the present application provide a system comprising the apparatus of the second aspect or any possible design of the second aspect.
In a sixth aspect, embodiments of the present application provide a chip, which may be coupled to a memory, for reading and executing a program or instructions stored in the memory to implement the functions involved in any possible design of the first aspect or the second aspect.
Drawings
FIG. 1A is a schematic diagram of a convolutional layer according to an embodiment of the present application;
FIG. 1B is a schematic diagram of an input feature map according to an embodiment of the present application;
FIG. 2A is a schematic architecture diagram of a neural network model compression system according to an embodiment of the present application;
FIG. 2B is a schematic architecture diagram of another neural network model compression system according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a neural network model compression device according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of a neural network model compression method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another convolutional layer according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of another neural network model compression method according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of another neural network model compression device according to an embodiment of the present application.
Detailed Description
It should be understood that in embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or similar expressions refers to any combination of these items, including a single item or any combination of plural items. For example, "at least one of a, b, or c" may represent: a; b; c; a and b; a and c; b and c; or a, b, and c; where each of a, b, and c may be singular or plural.
Technical terms related to the embodiments of the present application are described below:
1. Neural Networks (NNs) are complex network systems formed by a large number of simple processing units (or neurons) widely interconnected, reflecting many of the fundamental features of human brain function. The function and characteristics of neurons can be modeled by mathematical models, so that neural network models (also referred to herein as network models of neural networks) can be constructed based on the mathematical models of neurons.
2. A convolutional layer (convolutional layer) is used in a neural network model to extract features from an input feature map through convolution operations. It should be understood that the convolutional layer according to the embodiments of the present application is not limited to a convolutional layer in a convolutional neural network; it may also be a convolutional layer in another type of neural network model. The overall operation of a convolutional layer, shown in fig. 1A, is briefly described here. Assume that a certain convolutional layer has n_i input feature maps X_i of size h_i × w_i and that the number of output feature maps is n_{i+1}. The convolution operation can be understood as convolving the n_i input feature maps with n_{i+1} convolution kernel matrices (kernel matrices) F_{i,j}, where F_{i,j} ∈ R^{n_i × k × k} (that is, F_{i,j} belongs to real space and is composed of n_i convolution kernels K), to obtain the final n_{i+1} output feature maps X_{i+1}; each element K of a convolution kernel matrix is a convolution kernel of size k × k, with K ∈ R^{k×k}. It should be appreciated that the n_{i+1} output feature maps X_{i+1} may in turn serve as the input feature maps of another convolutional layer; as shown in fig. 1A, the n_{i+1} input feature maps X_{i+1} may be convolved with n_{i+2} convolution kernel matrices to obtain n_{i+2} output feature maps X_{i+2}.
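The shape bookkeeping above can be made concrete with a toy direct convolution in Python (stride 1, no padding, and cross-correlation rather than flipped convolution, as is conventional in neural networks; this is an illustration of the shapes, not code from the patent).

```python
import numpy as np

def conv_layer(X, F):
    # X: n_i input maps of size h x w; F: n_{i+1} kernel matrices, each
    # holding n_i kernels of size k x k. Returns n_{i+1} output maps.
    n_i, h, w = X.shape
    n_out, n_in, kk, _ = F.shape
    assert n_in == n_i
    out = np.zeros((n_out, h - kk + 1, w - kk + 1))
    for o in range(n_out):
        for c in range(n_i):                      # sum over input maps
            for y in range(out.shape[1]):
                for x in range(out.shape[2]):
                    out[o, y, x] += (X[c, y:y + kk, x:x + kk] * F[o, c]).sum()
    return out

X = np.ones((3, 5, 5))      # n_i = 3 input feature maps of size 5x5
F = np.ones((4, 3, 3, 3))   # n_{i+1} = 4 kernel matrices of 3x3 kernels
Y = conv_layer(X, F)        # 4 output feature maps of size 3x3
```

With all-ones inputs and kernels, every output element sums 3 maps × 9 window entries, illustrating how each output map aggregates contributions from all input maps.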
3. An input feature map is an input image of a convolutional layer. All input feature maps of a convolutional layer are convolved with the layer's convolution kernels and summed to obtain the layer's output feature maps; an output feature map is a further combination of the features of the input feature maps (also called image features, such as color, edges, shape, and semantics). The features extracted from the input feature maps can be used to identify the original image (i.e., the input image of the image-processing process) from which the input feature maps were generated. As shown in fig. 1B, the image labeled (a) is the original image, and the image labeled (b) shows several output feature maps of a certain convolutional layer (e.g., the second layer) of a ResNet-50 neural network obtained from the original image; these output feature maps can serve as input feature maps of the next convolutional layer (e.g., the third layer).
4. A convolution kernel, which is a filter used to extract features of an image. The convolution kernel in the convolution layer is used for carrying out convolution processing on the input feature images of the convolution layer and summing the input feature images to obtain the output feature images of the convolution layer. In the application, a convolution kernel used for carrying out convolution operation on an input feature map in a convolution layer is called a convolution kernel corresponding to the input feature map. It should be understood that the present application is not limited to the convolution kernel corresponding to the input feature map being a two-dimensional (2D) convolution kernel, and the convolution kernel may be a one-dimensional, three-dimensional, or more-dimensional convolution kernel.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings. First, a system provided by the embodiment of the present application is described, then an apparatus for implementing a method provided by the embodiment of the present application is described, and finally a specific implementation manner of the method provided by the embodiment of the present application is described.
As shown in fig. 2A, an embodiment of the present application provides a neural network model compression system 200, where the system 200 may include a neural network model compression device 201 and an image input device 202.
The neural network model compression device 201 may store a neural network model, and compress the stored neural network model by executing the neural network model compression method provided by the embodiment of the present application, where the neural network model includes a convolution layer, and is configured to perform a convolution operation on an input feature map of the convolution layer. The neural network model compression device 201 may be a device having storage and calculation functions, such as a server, a terminal device (such as a mobile phone, a tablet computer, etc.), a computer, a chip, etc.
In one possible design, the neural network model compression device 201 may be further configured to perform image processing on the input image through the neural network model, for example, after compressing the neural network model through the neural network model compression method provided by the embodiment of the present application, the neural network model compression device 201 may further identify the input image using the compressed neural network model, for example, the neural network model compression device 201 may identify a house number included in the input image, or identify a kind of animal in the input image, where the input image may be transmitted by the image input device 202.
In another neural network model compression system 200 as shown in fig. 2B, the neural network model compression device 201 may be used only for compressing the neural network model according to the method provided by the embodiment of the present application, without performing a process such as recognition of an input feature map.
Specifically, the neural network model compression system 200 may further include an image processing device 203, configured to perform image processing, such as image recognition, on the input image transmitted by the image input device 202 according to the neural network model. The neural network model compression device 201 may be specifically configured to compress the neural network model used by the image processing device 203.
In an embodiment of the present application, the image input device 202 shown in fig. 2A and/or fig. 2B may be an image acquisition device, configured to acquire an input image by photographing or the like and send it to the neural network model compression device 201 or the image processing device 203; for example, the image input device 202 may be a photographing device such as a camera. Alternatively, the image input device 202 may be an image storage device that stores the input image in a certain storage space and sends it to the neural network model compression device 201 or the image processing device 203 when the image needs to be processed; in this case the image input device 202 may be a storage device such as a universal serial bus (USB) flash disk, a mobile hard disk, a secure digital (SD) card, a hard disk drive (HDD), or a solid state drive (SSD). It should be understood that the image input device 202 may be a mobile storage device separate from the neural network model compression device 201, or may be any of various memories fixedly connected to the neural network model compression device 201.
Based on the above structure, the neural network model compression system 200 as shown in fig. 2A and/or fig. 2B may be used to process the input image to identify the input image, for example, the neural network model compression system 200 may identify a house number contained in the input image, or identify the kind of animal in the input image. Specifically, the input image may be sent from the image input device 202 to the neural network model compression device 201/the image processing device 203, so that the neural network model compression device 201 processes the input image.
It should be understood that the above structure of the neural network model compression system 200 is merely illustrative, and the present application does not exclude neural network model compression systems 200 having other structures; for example, the neural network model compression device 201 shown in fig. 2A may be integrated with the image input device 202, or the neural network model compression device 201, the image input device 202, and the image processing device 203 shown in fig. 2B may be integrated.
As shown in fig. 3, a neural network model compression device 201 provided by an embodiment of the present application includes a processor 301, a memory 302, and a communication interface 303. The memory 302 is used to store application programs, instructions, and data (for example, to store a neural network model related to the embodiment of the present application). The communication interface 303 may be used to support communication by the neural network model compression device 201, for example, to receive an input image or other messages and data; the communication interface 303 may be a fiber link interface, an ethernet interface, a copper wire interface, or the like. The processor 301 may invoke the application programs and/or instructions in the memory 302 to implement the neural network model compression method provided by the embodiments of the present application.
It should be understood that the processor 301, the memory 302, and the communication interface 303 may be structures separate from each other that are connected to one another by a connection medium; alternatively, the processor 301, the memory 302, and the communication interface 303, or some of them, may be integrated. In the embodiment of the present application, the connection medium between the processor 301, the memory 302, and the communication interface 303 is not limited; they may be connected through a bus or through other connection media.
It should also be understood that the neural network model compression device 201 may be a server, a computer, or a terminal device having the structure shown in fig. 3, or may be a chip or other devices.
In the following, with reference to fig. 4 and taking as the execution subject a neural network model compression device 201 that stores a neural network model, a neural network model compression method according to an embodiment of the present application is described. The method may include the following steps:
S101: according to at least one first convolution kernel corresponding to a first input feature map in a neural network model, determining a first parameter corresponding to the first input feature map and a second parameter, wherein the first parameter is used for representing the sparseness degree of the first input feature map, and the second parameter is used for representing the number of features contained in the first input feature map.
S102: and determining importance parameters corresponding to the first input feature map according to the first parameters and the second parameters, wherein the importance parameters can be used for representing the importance degree of the first input feature map.
S103: and determining the number of second convolution kernels corresponding to the first input feature map according to the importance parameter corresponding to the first input feature map, wherein the number of the second convolution kernels is smaller than or equal to the number of the first convolution kernels.
By adopting the method, both the sparseness of the first input feature map and the number of features contained in the first input feature map are considered when determining the importance parameter corresponding to the first input feature map in the neural network model, so the importance parameter describes the importance degree of the first input feature map more accurately. When the neural network model is compressed, the number of second convolution kernels corresponding to the first input feature map is determined according to this importance parameter, so that the storage space corresponding to the second convolution kernels and the computation corresponding to the second convolution kernels are retained. Excessive or erroneous deletion of the storage space of convolution kernels and of the corresponding computation, which would greatly reduce the performance of the neural network, can thus be avoided during compression of the neural network model.
It should be understood that the first convolution kernel referred to in S101 is a convolution kernel corresponding to the first input feature map before the neural network model is compressed. The first input feature map is one of at least one first input feature map of a certain convolution layer of the neural network model; for example, the images shown at number (b) in fig. 1B may each serve as a first input feature map, and these images may be multiple input feature maps in the same convolution layer. If the neural network model shown in fig. 1A is the neural network model before compression, the convolution kernels that perform convolution operations with the first input feature map $X_c$ may be regarded as the first convolution kernels corresponding to the first input feature map $X_c$.
In the implementation of the step shown in S101, the first parameter is used to represent the sparseness of the first input feature map, where the sparseness may be represented by the size of the receptive field of the first input feature map: the more non-zero elements there are after binarization of the first input feature map, the larger the size of its receptive field and the larger the first parameter; conversely, the fewer non-zero elements there are after binarization, the smaller the size of the receptive field and the smaller the first parameter.
In one possible implementation, the first parameter corresponding to the first input feature map may be determined according to a norm (e.g., L1 norm) of at least one first convolution kernel corresponding to the first input feature map.
Specifically, the first parameter corresponding to the first input feature map may be determined according to formula one:

$$ s_c = \sum_{i=1}^{N} \left\| W_{i,c} \right\|_1 \qquad \text{(formula one)} $$

where $s_c$ is the first parameter corresponding to the first input feature map, $c$ is the index of the first input feature map, $W_{i,c}$ is the $i$-th first convolution kernel among the $N$ first convolution kernels corresponding to the first input feature map, $\|W_{i,c}\|_1$ is the L1 norm of $W_{i,c}$, $N$ is the number of first convolution kernels corresponding to the first input feature map and a positive integer, $1 \le i \le N$, $1 \le c \le C$, and $C$ is the number of all first input feature maps in the convolution layer where the first input feature map is located.
It should be understood that determining the first parameter corresponding to the first input feature map according to formula one is merely illustrative, and the present application does not exclude determining the first parameter by other methods or formulas; for example, in an implementation, the size of the receptive field of the first input feature map may be determined according to a formula other than formula one and used as the first parameter corresponding to the first input feature map.
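As an illustrative sketch of formula one (the function name and array shapes are assumptions, not part of the embodiment), the first parameter can be computed from the stacked first convolution kernels as follows:

```python
import numpy as np

def kernel_sparsity(kernels):
    """First parameter s_c (formula one): the sum of the L1 norms of the
    N first convolution kernels corresponding to one input feature map."""
    kernels = np.asarray(kernels, dtype=float)  # assumed shape (N, Kh, Kw)
    return float(np.abs(kernels).sum())
```

For example, two 3×3 all-ones kernels give $s_c = 18$.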
In the implementation of the step shown in S101, the second parameter corresponding to the first input feature map may represent the number of features contained in the first input feature map, and the density entropy of the at least one first convolution kernel corresponding to the first input feature map may be used as this second parameter: the more features the first input feature map contains (i.e., the richer its features), the smaller the density entropy of the convolution kernels and the smaller the second parameter; conversely, the fewer features the first input feature map contains, the larger the density entropy of the convolution kernels and the larger the second parameter.
Specifically, the nearest neighbor matrix of the at least one first convolution kernel can be determined according to the at least one first convolution kernel corresponding to the first input feature map, and then, the second parameter corresponding to the first input feature map is determined according to the nearest neighbor matrix. The value of the j-th element of the i-th row in the nearest neighbor matrix can be used for representing the distance between the i-th first convolution kernel corresponding to the first input feature map and the j-th first convolution kernel corresponding to the first input feature map.
In practice, the distance between the ith first convolution kernel corresponding to the first input feature map and the jth first convolution kernel corresponding to the first input feature map may be determined in a variety of ways, only one of which is illustrated herein:
The distance may be determined according to formula two:

$$ A^{c}_{i,j} = \begin{cases} \left\| W_{i,c} - W_{j,c} \right\|_2, & W_{j,c} \in \mathrm{kNN}\!\left(W_{i,c}\right) \\[4pt] 0, & W_{j,c} \notin \mathrm{kNN}\!\left(W_{i,c}\right) \end{cases} \qquad \text{(formula two)} $$

where $A^{c}_{i,j}$ is the value of the element in the $i$-th row and $j$-th column of the nearest neighbor matrix, that is, the distance between the $i$-th first convolution kernel corresponding to the first input feature map and the $j$-th first convolution kernel corresponding to the first input feature map; $c$ is the index of the first input feature map; $W_{i,c}$ is the $i$-th first convolution kernel corresponding to the first input feature map and $W_{j,c}$ is the $j$-th; $W_{j,c} \in \mathrm{kNN}(W_{i,c})$ indicates that the $j$-th first convolution kernel $W_{j,c}$ belongs to the $k$ nearest neighbors of the $i$-th first convolution kernel $W_{i,c}$, and $W_{j,c} \notin \mathrm{kNN}(W_{i,c})$ indicates that it does not; $k$ is an integer, $1 \le k \le N$, $1 \le i \le N$, $1 \le j \le N$, $N$ is the number of first convolution kernels, and $\|W_{i,c}-W_{j,c}\|_2$ is the L2 norm of $(W_{i,c}-W_{j,c})$.
It should be understood that the above manner of determining the distance between the $i$-th first convolution kernel and the $j$-th first convolution kernel corresponding to the first input feature map is merely illustrative, and the present application does not exclude determining the value of the distance in other manners; for example, the Manhattan distance between the $i$-th first convolution kernel and the $j$-th first convolution kernel corresponding to the first input feature map may be used as the value of $A^{c}_{i,j}$ when $W_{j,c}$ belongs to the $k$ nearest neighbors of $W_{i,c}$.
According to the nearest neighbor matrix, the second parameter corresponding to the first input feature map may be determined according to formula three:

$$ \mathrm{dm}\!\left(W_{i,c}\right) = \sum_{j=1}^{N} A^{c}_{i,j}, \qquad E_c = -\sum_{i=1}^{N} \frac{\mathrm{dm}\!\left(W_{i,c}\right)}{\sum_{j=1}^{N} \mathrm{dm}\!\left(W_{j,c}\right)} \log_2 \frac{\mathrm{dm}\!\left(W_{i,c}\right)}{\sum_{j=1}^{N} \mathrm{dm}\!\left(W_{j,c}\right)} \qquad \text{(formula three)} $$

where $E_c$ is the second parameter corresponding to the first input feature map and $\log_2$ represents a base-2 logarithmic operation; $\mathrm{dm}(W_{i,c})$ characterizes the density of the $i$-th first convolution kernel, and $E_c$ is the density entropy of the first convolution kernels.
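The computation described by formulas two and three can be sketched as follows. This is a hedged illustration: the nearest-neighbor matrix, the density dm, and the entropy follow the reconstruction above, while the function name, array shapes, and the default k are assumptions:

```python
import numpy as np

def density_entropy(kernels, k=2):
    """Second parameter E_c (formulas two and three): A[i, j] is the L2
    distance between kernels i and j when kernel j is among the k nearest
    neighbors of kernel i (0 otherwise); dm(W_i) = sum_j A[i, j]; E_c is
    the base-2 entropy of the normalized densities."""
    flat = np.asarray(kernels, dtype=float).reshape(len(kernels), -1)
    dist = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    N = len(flat)
    A = np.zeros_like(dist)
    for i in range(N):
        nn = np.argsort(dist[i])[1:k + 1]  # k nearest neighbors, self excluded
        A[i, nn] = dist[i, nn]
    dm = A.sum(axis=1)          # density of each first convolution kernel
    p = dm / dm.sum()           # normalized densities
    return float(-(p * np.log2(p)).sum())
```

For instance, four 1×1 kernels forming two tight pairs (values 0, 1, 10, 11) with k = 1 give the maximal entropy $\log_2 4 = 2$, since all four kernels end up equally "dense".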
In the step shown in S102, if it is determined that the first parameter corresponding to the first input feature map is $s_c$ and the second parameter corresponding to the first input feature map is $E_c$, the importance parameter corresponding to the first input feature map may be determined according to formula four:

$$ \hat{v}_c = \sqrt{\frac{s_c}{1 + \alpha E_c}}, \qquad V_c = \frac{\hat{v}_c - \min\limits_{1 \le m \le M} \hat{v}_m}{\max\limits_{1 \le m \le M} \hat{v}_m - \min\limits_{1 \le m \le M} \hat{v}_m} \qquad \text{(formula four)} $$

where $c$ is the index of the first input feature map, $V_c$ is the importance parameter corresponding to the first input feature map, and $\alpha$ is a preset parameter and a non-negative real number; $\hat{v}_m$ ($1 \le m \le M$) are the initial values of the importance parameters corresponding to all the first input feature maps in the convolution layer where the first input feature map is located, $\max_m \hat{v}_m$ and $\min_m \hat{v}_m$ are the maximum and minimum of these initial values, and $M$ is the number of all the first input feature maps in the convolution layer, $M$ being a positive integer. It should be understood that $\hat{v}_c$ can be determined according to $s_c$ and $E_c$. In practice, $\alpha = 1$ may be taken.
The normalization performed by formula four ensures that the value range of $V_c$ is between 0 and 1.
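Formula four (an initial value followed by min-max normalization over the layer) can be sketched as follows; the function name and the vectorized form are illustrative assumptions:

```python
import numpy as np

def importance(s, E, alpha=1.0):
    """Importance parameters V_c (formula four): initial values
    sqrt(s_c / (1 + alpha * E_c)) for every input feature map of the
    layer, then min-max normalized into [0, 1]."""
    s = np.asarray(s, dtype=float)
    E = np.asarray(E, dtype=float)
    v = np.sqrt(s / (1.0 + alpha * E))   # initial values, one per map
    return (v - v.min()) / (v.max() - v.min())
```

With equal entropies, the normalization leaves only the ordering induced by sparsity, and the least/most sparse maps land exactly on 0 and 1.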
It should be understood that the importance parameter corresponding to the first input feature map may represent the importance degree of the first input feature map: the larger the value of the importance parameter, the more important the first input feature map. For a more important first input feature map, a larger change in the number of its corresponding convolution kernels after compression, compared with the number before compression, causes a larger reduction in the performance of the neural network. Therefore, for a first input feature map with a higher importance parameter, the difference between the number of second convolution kernels corresponding to it after compression and the number of first convolution kernels corresponding to it before compression should be kept as small as possible.
As can be seen from formulas one to four, in the neural network model compression method provided by the embodiment of the present application, the determined importance parameter is related only to the at least one first convolution kernel corresponding to the first input feature map, and the at least one first convolution kernel is related only to the index of the first input feature map. Therefore, the neural network model compression method provided by the embodiment of the present application can be executed before the neural network model compression device 201 obtains the first input feature map on which a convolution operation is to be performed, thereby improving both the efficiency and the flexibility of determining the importance degree corresponding to the first input feature map.
In the step shown in S103, the number of second convolution kernels may be determined according to the importance parameter corresponding to the first input feature map and the number of first convolution kernels corresponding to the first input feature map. In an implementation, the number of the second convolution kernels may be used as the number of convolution kernels corresponding to the first input feature map after the neural network is compressed.
The number of second convolution kernels corresponding to the first input feature map may be determined in a plurality of ways. In one possible implementation, the number of second convolution kernels may be determined according to formula five:

$$ Q_c = \begin{cases} 0, & \left\lfloor V_c G \right\rfloor = 0 \\[4pt] N, & \left\lceil V_c G \right\rceil = G \\[4pt] \left\lceil \dfrac{N}{2^{\,G - \lceil V_c G \rceil + T}} \right\rceil, & \text{otherwise} \end{cases} \qquad \text{(formula five)} $$

where $Q_c$ is the number of second convolution kernels corresponding to the first input feature map, $V_c$ is the importance parameter corresponding to the first input feature map, $G$ and $T$ are preset parameters, $G$ is a positive integer and $T$ is a non-negative integer, $\lfloor V_c G \rfloor$ represents rounding $V_c G$ down, $\lceil V_c G \rceil$ represents rounding $V_c G$ up, $N$ is the number of first convolution kernels corresponding to the first input feature map and a positive integer, and $\lceil \cdot \rceil$ represents rounding up. Illustratively, $T$ and $G$ are generally not greater than $\log_2 N$.
In an implementation, the value of T may be 0, 1, or another value. For example, for networks with fewer parameters or a smaller computation scale, such as ResNet-56 and DenseNet-40, T = 0 may be used; for networks with more parameters or a larger computation scale, such as ResNet-50 and DenseNet-121, T = 1 may be used.
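Formula five, with its zero-kernel and all-kernel boundary cases as reconstructed above, can be sketched as follows (the function name is an assumption):

```python
import math

def num_second_kernels(v, N, G, T):
    """Number Q_c of second convolution kernels (formula five): prune the
    input feature map entirely when floor(v*G) == 0, keep all N first
    convolution kernels when ceil(v*G) == G, and otherwise halve N once
    per remaining quantization step."""
    if math.floor(v * G) == 0:
        return 0
    if math.ceil(v * G) == G:
        return N
    return math.ceil(N / 2 ** (G - math.ceil(v * G) + T))
```

With G = 3, T = 0, and N = 4, the importance values 0.53, 0.97, and 0.02 from the fig. 5 example yield Q_c = 2, 4, and 0 respectively, matching the numbers stated below for that example.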
After the number of second convolution kernels corresponding to the first input feature map is determined, if that number is not zero, the at least one first convolution kernel corresponding to the first input feature map may be clustered according to the number of second convolution kernels to obtain Q_c groups of first convolution kernels, where Q_c is the number of second convolution kernels corresponding to the first input feature map, and the cluster center of each of the Q_c groups of first convolution kernels is then taken as one second convolution kernel. In an implementation, the present application does not limit the clustering method used to cluster the at least one first convolution kernel into the Q_c groups of first convolution kernels; for example, the k-means algorithm or another clustering algorithm may be used to perform this step.
Taking the k-means algorithm as an example, if the N first convolution kernels corresponding to the first input feature map with index c are 2D convolution kernels of size K_h × K_w, then after the first convolution kernels are clustered into Q_c groups according to the k-means algorithm, the cluster centers of the groups of first convolution kernels may be denoted B_c, and the index values of the cluster centers may be denoted I_c ∈ R^N. The index value of a cluster center may be used to indicate the second convolution kernel used to perform the convolution operation with the first input feature map X_c when determining the n-th output feature map of the convolution layer where the first input feature map is located; this second convolution kernel may be denoted B_c(I_c(n)). The convolution operation on the input feature map with index c may thus be performed according to the second convolution kernel B_c(I_c(n)), and the convolution results of the input feature maps in the convolution layer with their corresponding second convolution kernels are summed to obtain the n-th output feature map of the convolution layer where the first input feature map with index c is located.
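The clustering step can be sketched with a plain k-means implementation. The initialization, iteration count, and function name are assumptions (the embodiment does not mandate a particular k-means variant), and the returned index array plays the role of I_c with 0-based indices:

```python
import numpy as np

def cluster_kernels(kernels, q, iters=20, seed=0):
    """Cluster N first convolution kernels into q groups with plain
    k-means and return (B, I): B holds the q cluster centers (the second
    convolution kernels) and I[n] is the 0-based index of the center that
    replaces the n-th first convolution kernel."""
    flat = np.asarray(kernels, dtype=float).reshape(len(kernels), -1)
    rng = np.random.default_rng(seed)
    centers = flat[rng.choice(len(flat), size=q, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(flat[:, None, :] - centers[None, :, :], axis=-1)
        I = d.argmin(axis=1)                    # assign each kernel to a center
        for j in range(q):
            if np.any(I == j):                  # keep empty centers unchanged
                centers[j] = flat[I == j].mean(axis=0)
    return centers.reshape((q,) + np.shape(kernels)[1:]), I
```

For two well-separated groups of kernels, the returned centers are the group means and I assigns each original kernel to its group, mirroring how B_c and I_c are used in formula seven below in the original's numbering.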
After the number of second convolution kernels corresponding to the first input feature map is determined, the neural network model where the first input feature map is located may be trained according to the determined second convolution kernels, and the trained neural network model may be used for image processing of the input image, for example, to identify the input image. Specifically, the neural network model where the first input feature map is located may be trained according to a second convolution kernel corresponding to a second input feature map, where the second input feature map is a first input feature map corresponding to at least one second convolution kernel. During training of the neural network model, the index values of the cluster centers may be kept unchanged, the values of the cluster centers of each group of first convolution kernels may be fine-tuned through training, and the values of the second convolution kernels may be updated according to the fine-tuning of the values of the cluster centers.
It should be understood that the training of the neural network model according to the present application includes both the first training of the neural network model and the retraining of the neural network model. In general, the neural network model in which the first input feature map is located is a neural network model which is trained for the first time, so that after the neural network model compression method provided by the embodiment of the application is executed, the neural network model can be retrained according to the second convolution kernel corresponding to the second input feature map, and parameters in the neural network model are finely adjusted, so that the performance of the neural network is further improved.
After determining the number of second convolution kernels corresponding to the first input feature map, an output feature map of the first input feature map in the convolution layer may be further determined according to all second input feature maps and second convolution kernels corresponding to all second input feature maps, where the second input feature map is the first input feature map corresponding to at least one second convolution kernel.
Next, a process of determining an output feature map of a convolutional layer according to the neural network model compression method provided in an embodiment of the present application will be described with reference to a flowchart shown in fig. 6, where the process may include the following steps:
S201: and determining a first parameter and a second parameter corresponding to each first input feature map according to the first convolution kernel corresponding to each first input feature map in the convolution layer.
S202: and determining importance parameters corresponding to each first input feature map according to the first parameters and the second parameters corresponding to each first input feature map.
S203: and determining the number of second convolution kernels corresponding to each first input feature map according to the importance parameter corresponding to each first input feature map.
S204: when the number of the second convolution kernels corresponding to the first input feature images is not zero, clustering the first convolution kernels corresponding to the first input feature images according to the number of the second convolution kernels corresponding to the first input feature images, and taking the clustering center of each type of first convolution kernels obtained by clustering as one second convolution kernel corresponding to the first input feature images.
S205: determining an output feature map of the convolution layer according to the second input feature maps and the second convolution kernels corresponding to the second input feature maps, where a second input feature map is a first input feature map in the convolution layer corresponding to at least one second convolution kernel.
The following further describes an example with the convolution layer shown in fig. 5. The image shown at number (a) in fig. 5 is one convolution layer of the neural network model before compression: the number of first input feature maps in the convolution layer (with indexes c of 1, 2, and 3 from top to bottom) is 3 (i.e., C = 3), and the number of first convolution kernels corresponding to each input feature map, as well as the number of output feature maps of the convolution layer, is 4 (i.e., N = 4), as shown at number (b) of fig. 5. Suppose the parameters related to formula five are G = 3 and T = 0. If the importance parameter corresponding to the input feature map with index 1 is determined to be 0.53 according to the neural network model compression method provided by the embodiment of the present application, then Q_c = 2, that is, the number of second convolution kernels corresponding to this input feature map is 2; the 4 first convolution kernels corresponding to this input feature map may then be clustered into 2 groups of first convolution kernels, and the cluster center of each group is taken as one second convolution kernel, yielding the 2 second convolution kernels corresponding to the input feature map with index 1 shown at number (c) of fig. 5, that is, the convolution kernels corresponding to this input feature map after compression. If the importance parameter corresponding to the input feature map with index 2 is 0.97, Q_c = 4 may be determined according to formula five, that is, the number of second convolution kernels corresponding to this input feature map is 4; the 4 first convolution kernels corresponding to this input feature map may then be clustered into 4 groups of first convolution kernels, and the cluster center of each group is taken as one second convolution kernel, yielding the 4 second convolution kernels corresponding to the input feature map with index 2 shown at number (c) of fig. 5, that is, the convolution kernels corresponding to this input feature map after compression. If the importance parameter corresponding to the input feature map with index 3 is 0.02, Q_c = 0 may be determined according to formula five, that is, the number of second convolution kernels corresponding to this input feature map is 0; in this case the 4 first convolution kernels corresponding to this input feature map need not be clustered, and, as in the image at number (c) of fig. 5, the number of second convolution kernels corresponding to the input feature map with index 3 is 0 (indicated by a dashed line in fig. 5).
In one possible implementation, when determining the output feature maps of the convolution layer shown in fig. 5 for the first input feature maps with indexes 1, 2, and 3, the output feature maps may be determined according to the first input feature map with index 1 and its 2 corresponding second convolution kernels, and the first input feature map with index 2 and its 4 corresponding second convolution kernels.
If the convolution operation performed on the input feature maps in the convolution layer is expressed as formula six:

$$ Y_n = \sum_{c=1}^{C} W_{n,c} \circledast X_c \qquad \text{(formula six)} $$

where $W_{n,c}$ is the first convolution kernel that performs the convolution operation with the first input feature map $X_c$ (with index $c$) when determining the $n$-th output feature map, and $Y_n$ is the $n$-th output feature map of the convolution layer where the first input feature map $X_c$ is located;
after clustering the first convolution kernels according to the k-means algorithm described above, $W_{n,c}$ may be replaced with $B_c(I_c(n))$, so that formula six may be rewritten as formula seven:

$$ Y_n = \sum_{c=1}^{C} B_c\!\left(I_c(n)\right) \circledast X_c \qquad \text{(formula seven)} $$

where the $i$-th element in $B_c$ is the cluster center of the $i$-th group of first convolution kernels after the first convolution kernels are clustered into $Q_c$ groups, $i$ is a positive integer, $1 \le i \le Q_c$, $I_c$ is the index value of $B_c$, and $\circledast$ represents a convolution operation.
After the first convolution kernels shown at number (a) in fig. 5 are clustered into Q_c groups according to the k-means algorithm, the cluster centers of each group of first convolution kernels may be denoted B_c, and the index values of the cluster centers may be denoted I_c, with I_c ∈ R^N. Specifically, for the input feature map with index 1, whose corresponding Q_c = 2, its cluster centers B_1 and the index value of B_1, for example I_1 = [1, 1, 2, 2], may be determined; when determining the n-th output feature map, the n-th element of I_1 is taken as the value of I_1(n). For the input feature map with index 2, whose corresponding Q_c = 4, its cluster centers B_2 and the index value I_2 = [1, 2, 3, 4] of B_2 may be determined; when determining the n-th output feature map, the n-th element of I_2 is taken as the value of I_2(n). For the input feature map with index 3, whose corresponding Q_c = 0, there is no need to determine its cluster centers B_3 or the index value I_3 of B_3; in other words, B_3 and I_3 do not exist.
Taking the 2nd (n = 2) output feature map corresponding to the first input feature maps with indexes 1, 2, and 3 as an example, formula seven may be expanded into formula eight, and according to formula eight the 2nd output feature map $Y_2$ of the convolution layer shown in fig. 5 may be determined:

$$ Y_2 = B_1\!\left(I_1(2)\right) \circledast X_1 + B_2\!\left(I_2(2)\right) \circledast X_2 = B_1(1) \circledast X_1 + B_2(2) \circledast X_2 \qquad \text{(formula eight)} $$

where $I_1(2) = 1$ may be determined from $I_1 = [1, 1, 2, 2]$ and $I_2(2) = 2$ from $I_2 = [1, 2, 3, 4]$; $X_1$ represents the input feature map with index 1 and $X_2$ the input feature map with index 2.
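The expansion in formula eight can be checked numerically with a toy sketch in which 1×1 kernels make the convolution a simple scaling and indices are 0-based; all names, shapes, and values are illustrative:

```python
import numpy as np

def output_map(n, X, B, I):
    """n-th output feature map of the compressed layer (formula seven):
    sum, over the retained input feature maps, of X_c 'convolved' with
    the second convolution kernel B[c][I[c][n]].  1x1 kernels are used,
    so the convolution reduces to a scalar multiplication; input feature
    maps whose kernel count Q_c is zero are marked with B[c] = None and
    skipped."""
    out = None
    for c, x in enumerate(X):
        if B[c] is None:              # Q_c == 0: this input map was pruned
            continue
        y = B[c][I[c][n]] * x         # 1x1 kernel: convolution = scaling
        out = y if out is None else out + y
    return out
```

With three input maps mirroring the fig. 5 layout (Q = 2, 4, and 0 kernels respectively), the third map contributes nothing to any output, just as the dashed entry in fig. 5 indicates.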
Therefore, the method is more flexible and effective in compressing the neural network model, and can further avoid excessive reduction of the performance of the neural network caused by the compression of the neural network model.
The above description of the solution provided by the embodiment of the present application has been mainly given in terms of the operations performed by the neural network model compression device 201. It will be appreciated that the neural network model compression device 201 may include corresponding hardware structures and/or software modules that perform the respective functions in order to achieve the above-described functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware, computer software, or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution.
Referring to fig. 7, a schematic diagram of an apparatus 700 according to an embodiment of the present application is provided, where the apparatus 700 may be adapted to the system 200 shown in fig. 2A and/or fig. 2B, for implementing the function of the neural network model compression device 201 according to an embodiment of the present application. As shown in fig. 7, the apparatus 700 may include a parameter determination unit 701 and a convolution kernel determination unit 702.
In implementing the neural network model compression method provided by the embodiment of the present application, the parameter determining unit 701 may be configured to determine, according to at least one first convolution kernel corresponding to a first input feature map of a neural network model, a first parameter corresponding to the first input feature map and a second parameter, where the first parameter is used to represent a sparseness degree of the first input feature map, and the second parameter is used to represent a number of features included in the first input feature map, so that the step S101 in the method shown in fig. 4 may be implemented by the parameter determining unit 701.
The parameter determining unit 701 may be further configured to determine, according to the first parameter and the second parameter, an importance parameter corresponding to the first input feature map, where the importance parameter characterizes an importance degree of the first input feature map, so that the step S102 in the method shown in fig. 4 may be implemented by the parameter determining unit 701.
The convolution kernel determining unit 702 may be configured to determine, according to the importance parameter, a number of second convolution kernels corresponding to the first input feature map, where the number of second convolution kernels is less than or equal to the number of first convolution kernels, so that the step shown in S103 in the method shown in fig. 4 may be implemented by the convolution kernel determining unit 702.
For example, the first parameter corresponding to the first input feature map may be determined according to the above formula one.
When determining, according to the at least one first convolution kernel of the neural network model corresponding to the first input feature map, the second parameter corresponding to the first input feature map, the parameter determining unit 701 is specifically configured to determine a nearest neighbor matrix of the at least one first convolution kernel according to the at least one first convolution kernel, and to determine the second parameter according to the nearest neighbor matrix.
The parameter determining unit 701 may determine the value of the element in the i-th row and j-th column of the nearest neighbor matrix according to the above formula two.
For example, after determining the nearest neighbor matrix, the parameter determining unit 701 may determine the second parameter corresponding to the first input feature map according to the above formula three.
For example, after determining the first parameter and the second parameter corresponding to the first input feature map, the parameter determining unit 701 may determine the importance parameter corresponding to the first input feature map according to the above formula four.
When determining the number of second convolution kernels corresponding to the first input feature map according to the importance parameter corresponding to the first input feature map, the convolution kernel determining unit 702 may be specifically configured to determine the number of second convolution kernels according to the importance parameter and the number of first convolution kernels corresponding to the first input feature map.
For example, the convolution kernel determining unit 702 may determine the number of second convolution kernels corresponding to the first input feature map according to the above-described formula five.
Illustratively, when the number of second convolution kernels corresponding to the first input feature map is not zero, the convolution kernel determining unit 702 is further configured to cluster the at least one first convolution kernel corresponding to the first input feature map to obtain Q c groups of first convolution kernels, where Q c is the number of second convolution kernels, and to obtain the second convolution kernels, where each second convolution kernel is the cluster center of one of the obtained groups of first convolution kernels.
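The clustering described above can be sketched as follows. The patent requires only that the first convolution kernels be grouped into Q c clusters and that each second convolution kernel be a cluster center; the plain k-means loop and the deterministic initialization below are assumptions made for illustration.

```python
import numpy as np

def second_kernels(first_kernels, q_c, iters=20):
    """Cluster N first convolution kernels into q_c groups and return
    the q_c cluster centers as the second convolution kernels.
    A plain k-means loop is an illustrative choice; the patent only
    requires that each second kernel be a cluster center.
    """
    n, kh, kw = first_kernels.shape
    flat = first_kernels.reshape(n, -1).astype(float)
    # deterministic init: pick q_c evenly spaced kernels as starting centers
    centers = flat[np.linspace(0, n - 1, q_c).astype(int)].copy()
    for _ in range(iters):
        # assign each kernel to its nearest center (L2 distance)
        d = np.linalg.norm(flat[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its group
        for q in range(q_c):
            if np.any(labels == q):
                centers[q] = flat[labels == q].mean(axis=0)
    return centers.reshape(q_c, kh, kw)
```

For example, four 3x3 kernels split into two well-separated groups collapse to two cluster-center kernels, one per group.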
The apparatus 700 may further comprise a training unit 703, the training unit 703 being operable to train the neural network model based on second convolution kernels corresponding to all of the second input feature maps, the second input feature maps being first input feature maps corresponding to at least one of the second convolution kernels.
The apparatus 700 may further include an output feature map determining unit 704, where the output feature map determining unit 704 may be configured to determine, according to all second input feature maps and second convolution kernels corresponding to all the second input feature maps, an output feature map of a convolution layer where the first input feature map is located, where the second input feature map is the first input feature map corresponding to at least one second convolution kernel.
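The computation performed by the output feature map determining unit 704 can be illustrated with a toy 'valid' correlation. For simplicity this sketch assumes every retained (second) input feature map keeps the same number of second convolution kernels, which is a simplification of the patent's scheme; pruned input maps simply drop out of the sum, and all function names are hypothetical.

```python
import numpy as np

def conv2d_valid(x, k):
    """2-D 'valid' correlation of one feature map with one kernel."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def layer_output(second_inputs, second_kernel_sets):
    """Output feature maps of the convolution layer: the q-th output map
    is the sum, over all retained input maps, of each input map convolved
    with its q-th second convolution kernel."""
    q = len(second_kernel_sets[0])
    return [sum(conv2d_valid(x, ks[j])
                for x, ks in zip(second_inputs, second_kernel_sets))
            for j in range(q)]
```

With two identical 3x3 all-ones input maps and one 2x2 all-ones kernel per map, each output position accumulates 4 from each input, giving a 2x2 map of 8s.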
It should be understood that fig. 7 illustrates only one manner of dividing the apparatus 700 into modules, and the present application does not preclude other division manners. For example, the apparatus 700 may be divided into a processing unit and a storage unit, where the processing unit may have the functions of the parameter determining unit 701 and the convolution kernel determining unit 702, and the storage unit may be used to store the application programs, instructions, and corresponding data required by the processing unit to perform those functions, so that, through the cooperation of the processing unit and the storage unit, the apparatus 700 implements the functions of the neural network model compression device 201 provided by the embodiment of the present application. Optionally, the processing unit may further have the functions of the training unit 703 and the output feature map determining unit 704.
For example, the apparatus 700 and the modules of the apparatus 700 shown in fig. 7 may also be implemented by the neural network model compression device 201 having the structure shown in fig. 3. Specifically, the functions of the parameter determining unit 701 and the convolution kernel determining unit 702 illustrated in fig. 7 may be implemented by the processor 301. Optionally, the functions of the training unit 703 and the output feature map determining unit 704 may also be implemented by the processor 301.
It should be appreciated that in the neural network model compression device 201 shown in fig. 3, the processor 301 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 302 may include a read-only memory and a random access memory, and provides instructions and data to the processor. The memory 302 may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchronous link dynamic random access memory (SLDRAM), and a direct rambus random access memory (DR RAM).

Claims (18)

1. A neural network model compression method, comprising:
Determining a first parameter and a second parameter corresponding to a first input feature map according to at least one first convolution kernel corresponding to the first input feature map of the neural network model, wherein the first parameter is used for representing the sparseness degree of the first input feature map, and the second parameter is used for representing the number of features contained in the first input feature map; the neural network model is used for executing any one or more of classification and detection, voice processing, behavior recognition and image super-resolution processing;
Determining importance parameters corresponding to the first input feature map according to the first parameters and the second parameters, wherein the importance parameters represent the importance degree of the first input feature map;
Determining the number of second convolution kernels corresponding to the first input feature map according to the importance parameter, wherein the number of the second convolution kernels is smaller than or equal to the number of the first convolution kernels;
The importance parameter is determined by the following formula:
[formula rendered as an image in the original publication]
wherein the formula yields V c, the importance parameter, from an importance parameter initial value corresponding to the first input feature map; c is an index of the first input feature map, s c is the first parameter, E c is the second parameter, α is a preset parameter, and α is a non-negative real number;
the formula further involves the importance parameter initial values corresponding to all the first input feature maps in the convolution layer where the first input feature map is located, and the maximum value of those initial values; M is the number of all the first input feature maps in the convolution layer and is a positive integer.
2. The method of claim 1, wherein determining the second parameter corresponding to the first input feature map from at least one first convolution kernel of a neural network model corresponding to the first input feature map comprises:
determining a nearest neighbor matrix of the at least one first convolution kernel according to the at least one first convolution kernel;
Determining the second parameter according to the nearest neighbor matrix;
The value of the element in the i-th row and j-th column of the nearest neighbor matrix is determined by the following formula:
[formula rendered as an image in the original publication; it distinguishes the case in which the j-th first convolution kernel W j,c belongs to the k nearest neighbors of the i-th first convolution kernel W i,c from the case in which it does not]
wherein c is the index of the first input feature map, W i,c is the i-th first convolution kernel corresponding to the first input feature map, W j,c is the j-th first convolution kernel corresponding to the first input feature map, k is an integer, 1 ≤ k ≤ N, 1 ≤ i ≤ N, 1 ≤ j ≤ N, N is the number of the first convolution kernels, and ‖W i,c − W j,c‖ is the L2 norm of (W i,c − W j,c), by which nearness is measured.
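A minimal sketch of the nearest neighbor matrix of claim 2. Since the original formula is rendered only as an image, the binary indicator values below (1 when W j,c is among the k nearest neighbors of W i,c under the L2 distance, 0 otherwise) and the self-inclusion convention are assumptions made for illustration.

```python
import numpy as np

def knn_matrix(kernels, k):
    """Binary nearest-neighbor matrix A: A[i, j] = 1 iff the j-th kernel
    is among the k nearest neighbors of the i-th kernel, measured by the
    L2 norm of (W_i - W_j). Self-distance is zero, so each kernel counts
    itself among its own neighbors here (an assumed convention).
    """
    flat = np.asarray(kernels, dtype=float).reshape(len(kernels), -1)
    # pairwise L2 distances between flattened kernels
    d = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=2)
    a = np.zeros_like(d, dtype=int)
    for i in range(len(flat)):
        nearest = np.argsort(d[i])[:k]  # indices of the k smallest distances
        a[i, nearest] = 1
    return a
```

For three 1x1 kernels with values 0, 1, and 10 and k = 2, the two close kernels pick each other while the distant one pairs with its nearest neighbor.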
3. The method of claim 2, wherein the second parameter is determined by the following formula:
[formula rendered as an image in the original publication]
wherein E c is the second parameter, the remaining terms of the formula are derived from the nearest neighbor matrix, and log 2 represents a base-2 logarithmic operation.
4. The method of claim 1, wherein the first parameter is determined by the following formula:
[formula rendered as an image in the original publication]
wherein s c is the first parameter, W i,c is the i-th first convolution kernel corresponding to the first input feature map, c is the index of the first input feature map, ‖W i,c‖1 is the L1 norm of W i,c, N is the number of the first convolution kernels, N is a positive integer, and 1 ≤ i ≤ N.
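Claim 4 defines the first parameter through the L1 norms ‖W i,c‖1 of the N first convolution kernels acting on the channel. Since the formula itself is rendered only as an image, the mean aggregation below is an assumption made for illustration.

```python
import numpy as np

def sparseness(kernels):
    """First parameter s_c for one input feature map: an aggregate of the
    L1 norms of the N first convolution kernels that consume it. The mean
    is an assumed aggregation; claim 4's formula is given only as an image.
    """
    return float(np.mean([np.abs(k).sum() for k in kernels]))
```

For two 2x2 kernels of all ones and all minus-ones, each L1 norm is 4, so the aggregate is 4.0.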
5. The method of claim 1, wherein determining the number of second convolution kernels corresponding to the first input feature map based on the importance parameter comprises:
and determining the number of the second convolution kernels according to the importance parameter and the number of the first convolution kernels.
6. The method of claim 5, wherein the number of second convolution kernels is determined by the following formula:
[formula rendered as an image in the original publication; it involves rounding V c G down, rounding V c G up, and one further upward rounding]
wherein Q c is the number of the second convolution kernels, V c is the importance parameter, G and T are preset parameters, G and T are positive integers, and N is the number of the first convolution kernels and is a positive integer.
7. The method of claim 5, wherein when the number of second convolution kernels is non-zero, further comprising:
clustering the at least one first convolution kernel to obtain Q c groups of first convolution kernels, wherein Q c is the number of the second convolution kernels;
And obtaining the second convolution kernels, wherein each second convolution kernel is a clustering center of each obtained group of first convolution kernels.
8. The method as recited in claim 7, further comprising:
and training the neural network model according to second convolution kernels corresponding to all second input feature graphs, wherein the second input feature graphs are the first input feature graphs corresponding to at least one second convolution kernel.
9. The method as recited in claim 7, further comprising:
And determining an output characteristic diagram of a convolution layer where the first input characteristic diagram is located according to all the second input characteristic diagrams and all second convolution kernels corresponding to the second input characteristic diagrams, wherein the second input characteristic diagram is the first input characteristic diagram corresponding to at least one second convolution kernel.
10. A neural network model compression device, characterized by comprising:
The parameter determining unit is used for determining a first parameter and a second parameter corresponding to a first input feature map according to at least one first convolution kernel corresponding to the first input feature map of the neural network model, wherein the first parameter is used for representing the sparseness degree of the first input feature map, and the second parameter is used for representing the number of features contained in the first input feature map; the neural network model is used for executing any one or more of classification and detection, voice processing, behavior recognition and image super-resolution processing;
The parameter determining unit is further configured to determine an importance parameter corresponding to the first input feature map according to the first parameter and the second parameter, where the importance parameter characterizes an importance degree of the first input feature map;
The convolution kernel determining unit is used for determining the number of second convolution kernels corresponding to the first input feature map according to the importance parameter, wherein the number of the second convolution kernels is smaller than or equal to the number of the first convolution kernels;
The importance parameter is determined by the following formula:
[formula rendered as an image in the original publication]
wherein the formula yields V c, the importance parameter, from an importance parameter initial value corresponding to the first input feature map; c is an index of the first input feature map, s c is the first parameter, E c is the second parameter, α is a preset parameter, and α is a non-negative real number;
the formula further involves the importance parameter initial values corresponding to all the first input feature maps in the convolution layer where the first input feature map is located, and the maximum value of those initial values; M is the number of all the first input feature maps in the convolution layer and is a positive integer.
11. The apparatus of claim 10, wherein the parameter determination unit configured to determine, based on at least one first convolution kernel of a neural network model corresponding to a first input feature map, a second parameter corresponding to the first input feature map comprises:
The parameter determining unit is specifically configured to determine a nearest neighbor matrix of the at least one first convolution kernel according to the at least one first convolution kernel, and determine the second parameter according to the nearest neighbor matrix;
The value of the element in the i-th row and j-th column of the nearest neighbor matrix is determined by the following formula:
[formula rendered as an image in the original publication; it distinguishes the case in which the j-th first convolution kernel W j,c belongs to the k nearest neighbors of the i-th first convolution kernel W i,c from the case in which it does not]
wherein c is the index of the first input feature map, W i,c is the i-th first convolution kernel corresponding to the first input feature map, W j,c is the j-th first convolution kernel corresponding to the first input feature map, k is an integer, 1 ≤ k ≤ N, 1 ≤ i ≤ N, 1 ≤ j ≤ N, N is the number of the first convolution kernels, and ‖W i,c − W j,c‖ is the L2 norm of (W i,c − W j,c), by which nearness is measured.
12. The apparatus of claim 11, wherein the second parameter is determined by the following formula:
[formula rendered as an image in the original publication]
wherein E c is the second parameter, the remaining terms of the formula are derived from the nearest neighbor matrix, and log 2 represents a base-2 logarithmic operation.
13. The apparatus of claim 10, wherein the first parameter is determined by the following formula:
[formula rendered as an image in the original publication]
wherein s c is the first parameter, W i,c is the i-th first convolution kernel corresponding to the first input feature map, c is the index of the first input feature map, ‖W i,c‖1 is the L1 norm of W i,c, N is the number of the first convolution kernels, N is a positive integer, and 1 ≤ i ≤ N.
14. The apparatus of claim 10, wherein the convolution kernel determination unit configured to determine a number of second convolution kernels corresponding to the first input feature map according to the importance parameter comprises:
The convolution kernel determining unit is specifically configured to determine the number of the second convolution kernels according to the importance parameter and the number of the first convolution kernels.
15. The apparatus of claim 14, wherein the number of second convolution kernels is determined by the following formula:
[formula rendered as an image in the original publication; it involves rounding V c G down, rounding V c G up, and one further upward rounding]
wherein Q c is the number of the second convolution kernels, V c is the importance parameter, G and T are preset parameters, G and T are positive integers, and N is the number of the first convolution kernels and is a positive integer.
16. The apparatus of claim 14, wherein when the number of second convolution kernels is not zero, the convolution kernel determination unit is further to:
clustering the at least one first convolution kernel to obtain Q c groups of first convolution kernels, wherein Q c is the number of the second convolution kernels;
And obtaining the second convolution kernels, wherein each second convolution kernel is a clustering center of each obtained group of first convolution kernels.
17. The apparatus of claim 16, further comprising a training unit:
the training unit is configured to train the neural network model according to second convolution kernels corresponding to all second input feature graphs, where the second input feature graphs are the first input feature graphs corresponding to at least one second convolution kernel.
18. The apparatus of claim 16, further comprising an output feature map determination unit:
The output feature map determining unit is configured to determine, according to all second input feature maps and second convolution kernels corresponding to all second input feature maps, an output feature map of a convolution layer where the first input feature map is located, where the second input feature map is the first input feature map corresponding to at least one second convolution kernel.
CN201880099411.7A 2018-11-16 2018-11-16 Neural network compression method and device Active CN113302657B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/116023 WO2020097936A1 (en) 2018-11-16 2018-11-16 Neural network compressing method and device

Publications (2)

Publication Number Publication Date
CN113302657A CN113302657A (en) 2021-08-24
CN113302657B true CN113302657B (en) 2024-04-26

Family

ID=70731299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880099411.7A Active CN113302657B (en) 2018-11-16 2018-11-16 Neural network compression method and device

Country Status (2)

Country Link
CN (1) CN113302657B (en)
WO (1) WO2020097936A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488563A (en) * 2015-12-16 2016-04-13 重庆大学 Deep learning oriented sparse self-adaptive neural network, algorithm and implementation device
CN107506722A (en) * 2017-08-18 2017-12-22 中国地质大学(武汉) One kind is based on depth sparse convolution neutral net face emotion identification method
CN108108677A (en) * 2017-12-12 2018-06-01 重庆邮电大学 One kind is based on improved CNN facial expression recognizing methods
CN108256544A (en) * 2016-12-29 2018-07-06 深圳光启合众科技有限公司 Picture classification method and device, robot
CN108596988A (en) * 2018-03-09 2018-09-28 西安电子科技大学 A kind of compression algorithm for convolutional neural networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9904874B2 (en) * 2015-11-05 2018-02-27 Microsoft Technology Licensing, Llc Hardware-efficient deep convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488563A (en) * 2015-12-16 2016-04-13 重庆大学 Deep learning oriented sparse self-adaptive neural network, algorithm and implementation device
CN108256544A (en) * 2016-12-29 2018-07-06 深圳光启合众科技有限公司 Picture classification method and device, robot
CN107506722A (en) * 2017-08-18 2017-12-22 中国地质大学(武汉) One kind is based on depth sparse convolution neutral net face emotion identification method
CN108108677A (en) * 2017-12-12 2018-06-01 重庆邮电大学 One kind is based on improved CNN facial expression recognizing methods
CN108596988A (en) * 2018-03-09 2018-09-28 西安电子科技大学 A kind of compression algorithm for convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Channel-Level Acceleration of Deep Face Representations; Adam Polyak et al.; Special Section on Applying Four Ds of Machine Learning to Advance Biometrics; Vol. 3; full text *
Convolutional neural network model compression method based on statistical analysis; Yang Yang et al.; 计算机***应用; full text *

Also Published As

Publication number Publication date
CN113302657A (en) 2021-08-24
WO2020097936A1 (en) 2020-05-22

Similar Documents

Publication Publication Date Title
US11983850B2 (en) Image processing method and apparatus, device, and storage medium
CN110309847B (en) Model compression method and device
CN111382867B (en) Neural network compression method, data processing method and related devices
WO2022042123A1 (en) Image recognition model generation method and apparatus, computer device and storage medium
CN111738244A (en) Image detection method, image detection device, computer equipment and storage medium
CN110659667A (en) Picture classification model training method and system and computer equipment
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN111898703B (en) Multi-label video classification method, model training method, device and medium
US20200184245A1 (en) Improper neural network input detection and handling
CN109919084B (en) Pedestrian re-identification method based on depth multi-index hash
CN113642445B (en) Hyperspectral image classification method based on full convolution neural network
CN111340077A (en) Disparity map acquisition method and device based on attention mechanism
CN110222718A (en) The method and device of image procossing
WO2020062299A1 (en) Neural network processor, data processing method and related device
CN112328715A (en) Visual positioning method, training method of related model, related device and equipment
WO2022179588A1 (en) Data coding method and related device
WO2019128248A1 (en) Signal processing method and apparatus
CN112749576B (en) Image recognition method and device, computing equipment and computer storage medium
CN108388869B (en) Handwritten data classification method and system based on multiple manifold
WO2021057690A1 (en) Neural network building method and device, and image processing method and device
CN116888605A (en) Operation method, training method and device of neural network model
CN113302657B (en) Neural network compression method and device
CN114638823B (en) Full-slice image classification method and device based on attention mechanism sequence model
WO2023115814A1 (en) Fpga hardware architecture, data processing method therefor and storage medium
CN115713769A (en) Training method and device of text detection model, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant