CN110232436A - Pruning method, device, and storage medium for a convolutional neural network - Google Patents
- Publication number
- CN110232436A CN110232436A CN201910380839.5A CN201910380839A CN110232436A CN 110232436 A CN110232436 A CN 110232436A CN 201910380839 A CN201910380839 A CN 201910380839A CN 110232436 A CN110232436 A CN 110232436A
- Authority
- CN
- China
- Prior art keywords
- convolutional
- feature map
- norm
- layer
- neural networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
This application discloses a pruning method, device, and storage medium for a convolutional neural network, belonging to the field of neural network technology within artificial intelligence and computer vision. The convolutional neural network comprises multiple layers, including one or more convolutional layers. The method comprises: obtaining the feature map set output by the n-th convolutional layer of the convolutional neural network, the feature map set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, and n being a positive integer; determining the important feature maps and unimportant feature maps in the feature map set; and pruning the convolutional neural network based on the unimportant feature maps to obtain the input to the layer following the n-th convolutional layer, that input comprising the important feature maps. By distinguishing important feature maps from unimportant ones and passing only the important feature maps on as the next layer's input, resource consumption is reduced when the convolutional neural network executes a deep learning task.
Description
Technical field
This application relates to the field of neural network technology, and in particular to a pruning method, device, and storage medium for a convolutional neural network.
Background technique
Neural networks have become the state-of-the-art technique for deep learning tasks such as computer vision, speech recognition, and natural language processing. Nevertheless, neural network algorithms are compute-intensive and storage-intensive, which makes them difficult to deploy on devices with only limited hardware resources.
To overcome this limitation, deep compression can be used to significantly reduce the computation and storage a neural network requires. For a convolutional neural network with fully connected layers, for example, deep compression can shrink the model size several-fold, even by tens of times. Deep compression of convolutional neural networks (CNNs) includes several different techniques: quantization, pruning, lightweight network design, and knowledge distillation.
Pruning refers to cutting off neurons or synapses after a convolutional neural network has finished training, so that the neural network model is compressed. With this approach, however, the information of the removed neurons or synapses is permanently lost, and the resulting model is fixed.
Summary of the invention
This application provides a pruning method, device, and storage medium for a convolutional neural network that can reduce the computation of the convolutional neural network while avoiding loss of information, and that adapts its pruning to different samples.
In a first aspect, at least one embodiment of this application provides a pruning method for a convolutional neural network. The convolutional neural network comprises multiple layers, including one or more convolutional layers. The method comprises:
obtaining the feature map set output by the n-th convolutional layer of the convolutional neural network, the feature map set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, and n being a positive integer;
determining the important feature maps and unimportant feature maps in the feature map set, where an important feature map has a greater influence on the output of the convolutional neural network than an unimportant feature map;
pruning the convolutional neural network based on the unimportant feature maps to obtain the input to the layer following the n-th convolutional layer, that input comprising the important feature maps.
In the embodiments of this application, while the convolutional neural network is running, the feature maps output by a convolutional layer are divided into important and unimportant ones, and only the important feature maps are output to the next layer of the network. Because the unimportant feature maps have little influence on the output of the convolutional neural network, doing so has little or even no effect on the output while greatly saving computation. Moreover, because this operation is performed while the neural network is in use, it does not alter the structure of the neural network model, so no information is lost. Furthermore, because this application determines whether each feature map is important, and the feature maps a neural network produces differ from sample to sample, the important and unimportant feature maps determined for different samples can differ. The scheme therefore prunes different feature maps for different samples, adapting to each sample.
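As a concrete illustration, the runtime pruning step summarized above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: it uses the L2-norm scoring, mean-based threshold, and zeroing strategy described in the optional implementations below, and the function name and the default coefficient of 0.7 are illustrative assumptions.

```python
import numpy as np

def prune_layer_output(feature_maps, threshold_coeff=0.7):
    """One runtime pruning step: score each feature map by its L2 norm,
    compare against the layer's mean norm scaled by a coefficient, and
    zero out the maps that fall below the threshold before passing the
    set on as the next layer's input."""
    norms = [float(np.sqrt(np.sum(np.square(fm)))) for fm in feature_maps]
    threshold = threshold_coeff * (sum(norms) / len(norms))
    pruned = []
    for fm, norm in zip(feature_maps, norms):
        # Important maps pass through unchanged; unimportant ones are zeroed.
        pruned.append(fm if norm >= threshold else np.zeros_like(fm))
    return pruned
```

Because the scoring runs on the feature maps produced for the current input, a different sample naturally yields a different set of pruned maps, which is the per-sample adaptivity described above.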
Optionally, determining the important feature maps and unimportant feature maps in the feature map set comprises:
computing the L2 norm of each feature map of the n-th convolutional layer;
if the L2 norm of a first feature map is less than a first threshold of the n-th convolutional layer, determining that the first feature map is an unimportant feature map; if the L2 norm of the first feature map is greater than or equal to the first threshold, determining that the first feature map is an important feature map, the first feature map being any feature map in the feature map set.
In this implementation, the importance of a feature map can be judged by comparing its L2 norm with a threshold. Specifically, in a convolution operation, the larger the values of the features in a feature map, the greater its influence on the output of the convolutional neural network, and thus the more important it is; and the L2 norm is the square root of the sum of squares of the features in the map, so the magnitude of the L2 norm can be used to judge the importance of each feature map.
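This L2-norm test can be sketched as follows, assuming feature maps are NumPy arrays; the function names are illustrative, and the threshold is taken as given here (its derivation is described next).

```python
import numpy as np

def l2_norm(feature_map):
    """L2 norm of a feature map: square root of the sum of squared features."""
    return float(np.sqrt(np.sum(np.square(feature_map))))

def split_by_importance(feature_maps, threshold):
    """Partition feature-map indices into important / unimportant by L2 norm."""
    important, unimportant = [], []
    for idx, fm in enumerate(feature_maps):
        (important if l2_norm(fm) >= threshold else unimportant).append(idx)
    return important, unimportant
```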
Optionally, the first threshold of the n-th convolutional layer is the product of the norm mean of the n-th convolutional layer and a threshold coefficient, the norm mean being the mean of the L2 norms of all feature maps of the n-th convolutional layer.
In this implementation, the norm mean serves as the basis of the first threshold, and the importance of each feature map is judged by comparing its norm with the mean. Meanwhile, the threshold coefficient makes the threshold adjustable as needed, so the proportion of feature maps that are pruned can be tuned.
Optionally, before determining the important feature maps and unimportant feature maps in the feature map set, the method further comprises:
determining whether the n-th convolutional layer is a prunable convolutional layer;
if the n-th convolutional layer is a prunable convolutional layer, determining the important feature maps and unimportant feature maps in the feature map set.
In this implementation, before judging whether each individual feature map is important, it is first determined whether the layer's feature maps need pruning at all. This avoids unreasonable pruning when a layer's feature maps are all of comparable importance, improving the accuracy of the pruned convolutional neural network.
Optionally, determining whether the n-th convolutional layer is a prunable convolutional layer comprises:
computing the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer;
if the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer is greater than a second threshold, determining that the n-th convolutional layer is a prunable convolutional layer.
Here, whether a layer's feature maps need pruning is decided via the coefficient of variation. The coefficient of variation indicates the dispersion of a set of data; the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer therefore indicates how dispersed the L2 norms of that layer's feature maps are. By computing it, the dispersion of the feature maps' importance can be determined, and hence whether pruning is needed.
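A sketch of this prunability check, using the usual definition of the coefficient of variation (standard deviation divided by mean); the second threshold of 0.5 here is an illustrative assumption.

```python
def is_prunable(norms, cv_threshold=0.5):
    """Decide whether a layer is prunable: its feature maps' L2 norms must be
    dispersed enough (coefficient of variation above the second threshold)."""
    mean = sum(norms) / len(norms)
    std = (sum((x - mean) ** 2 for x in norms) / len(norms)) ** 0.5
    return (std / mean) > cv_threshold
```

When all norms are similar (low dispersion), the check returns False and the layer is left untouched, which is exactly the "comparable importance" case described above.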
Optionally, pruning the convolutional neural network based on the unimportant feature maps to obtain the input to the layer following the n-th convolutional layer comprises:
setting all features of the unimportant feature maps to 0, and using the zeroed unimportant feature maps together with the important feature maps as the input to the layer following the n-th convolutional layer;
alternatively, filtering the unimportant feature maps out of the feature map set, and using the important feature maps as the input to the layer following the n-th convolutional layer.
This implementation provides two ways of pruning the unimportant feature maps: one sets the unimportant feature maps to 0 and then outputs them to the next layer; the other outputs only the important feature maps to the next layer. Both achieve the pruning of unimportant feature maps and reduce computation.
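Both pruning modes can be sketched as follows, assuming feature maps are NumPy arrays; the function names are illustrative.

```python
import numpy as np

def prune_zero(feature_maps, unimportant_idx):
    """Mode 1: keep every map, but set the unimportant ones to all zeros."""
    out = [fm.copy() for fm in feature_maps]
    for i in unimportant_idx:
        out[i][...] = 0.0
    return out  # same number of maps; unimportant ones zeroed

def prune_drop(feature_maps, important_idx):
    """Mode 2: pass on only the important maps, dropping the rest."""
    return [feature_maps[i] for i in important_idx]
```

Zeroing preserves the tensor shape the next layer expects, while dropping shrinks the set outright; which fits better depends on how the next layer consumes its input.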
Optionally, the method further comprises:
training the convolutional neural network, where the loss function used during training includes an L2,1-norm regularization term, the regularization term being used to learn the sparse features of samples.
In this implementation, the convolutional neural network is trained with an L2,1-norm regularization term added to the loss function. Because the effect of the L2,1-norm regularization term is to learn the sparse features of samples (that is, to learn the importance of features), training with a loss function that includes this term produces a sparse neural network model, from which important features can be selected during subsequent use.
Optionally, the L2,1-norm regularization term comprises the product of the L2,1 norm and a regularization coefficient, the L2,1 norm being the sum of the L2,1 norms of all samples, where the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, the first sample being any one of the samples.
In this implementation, the L2,1-norm regularization term is the sum over the different samples of their L2,1 norms, so that a convolutional neural network trained with a loss function that includes this term learns the sparse features of each sample. When the convolutional neural network is subsequently used, important features can then be selected per sample.
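The L2,1 penalty can be sketched as follows, assuming the feature maps produced for each sample across all convolutional layers have been collected into nested lists; the function name and the regularization coefficient shown are illustrative assumptions. The full training loss would be the task loss plus this penalty.

```python
import numpy as np

def l21_penalty(per_sample_maps, reg_coeff=1e-4):
    """L2,1 regularization term: for each sample, sum the L2 norms of its
    feature maps from all convolutional layers; sum over samples; scale by
    the regularization coefficient."""
    total = 0.0
    for sample_maps in per_sample_maps:      # one entry per sample
        for fm in sample_maps:               # maps from all conv layers
            total += float(np.sqrt(np.sum(np.square(fm))))
    return reg_coeff * total
```

Because the inner sum of norms is itself a sum of L2 norms (an L1 norm over per-map L2 norms), minimizing it pushes entire feature maps toward zero, yielding the map-level sparsity the pruning step exploits.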
In a second aspect, at least one embodiment of this application provides a pruning device for a convolutional neural network. The convolutional neural network comprises multiple layers, including one or more convolutional layers. The device comprises:
a computing unit, configured to obtain the feature map set output by the n-th convolutional layer of the convolutional neural network, the feature map set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, and n being a positive integer;
a determination unit, configured to determine the important feature maps and unimportant feature maps in the feature map set, where an important feature map has a greater influence on the output of the convolutional neural network than an unimportant feature map;
a processing unit, configured to prune the convolutional neural network based on the unimportant feature maps to obtain the input to the layer following the n-th convolutional layer, that input comprising the important feature maps.
Optionally, the determination unit comprises:
a computing subunit, configured to compute the L2 norm of each feature map of the n-th convolutional layer;
a determining subunit, configured to determine that a first feature map is an unimportant feature map if its L2 norm is less than a first threshold of the n-th convolutional layer, and that it is an important feature map if its L2 norm is greater than or equal to the first threshold, the first feature map being any feature map in the feature map set.
Optionally, the first threshold of the n-th convolutional layer is the product of the norm mean of the n-th convolutional layer and a threshold coefficient, the norm mean being the mean of the L2 norms of all feature maps of the n-th convolutional layer.
Optionally, the determining subunit is further configured to determine, before the important and unimportant feature maps in the feature map set are determined, whether the n-th convolutional layer is a prunable convolutional layer, and, if it is, to determine the important feature maps and unimportant feature maps in the feature map set.
Optionally, the computing subunit is further configured to compute the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer; and the determining subunit is configured to determine that the n-th convolutional layer is a prunable convolutional layer if the coefficient of variation is greater than a second threshold.
Optionally, the processing unit is configured to set all features of the unimportant feature maps to 0 and to use the zeroed unimportant feature maps together with the important feature maps as the input to the layer following the n-th convolutional layer; alternatively, the processing unit is configured to filter the unimportant feature maps out of the feature map set and to use the important feature maps as the input to the layer following the n-th convolutional layer.
Optionally, the device further comprises:
a training unit, configured to train the convolutional neural network, where the loss function used during training includes an L2,1-norm regularization term, the regularization term being used to learn the sparse features of samples.
Optionally, the L2,1-norm regularization term comprises the product of the L2,1 norm and a regularization coefficient, the L2,1 norm being the sum of the L2,1 norms of all samples, where the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, the first sample being any one of the samples.
In a third aspect, at least one embodiment of this application provides a pruning device for a convolutional neural network, the pruning device comprising a processor and a memory. The memory is configured to store software programs and modules; the processor implements the method of any possible implementation of the first aspect by running or executing the software programs and/or modules stored in the memory.
Optionally, there are one or more processors and one or more memories.
Optionally, the memory may be integrated with the processor, or the memory and the processor may be separately disposed.
In a specific implementation, the memory may be a non-transitory memory, such as a read-only memory (ROM), and may be integrated with the processor on the same chip or disposed on separate chips. The embodiments of this application place no limitation on the type of memory or on the arrangement of the memory and the processor.
In a fourth aspect, at least one embodiment of this application provides a computer program (product). The computer program (product) comprises computer program code which, when run by a computer, causes the computer to execute the method of any possible implementation of the first aspect.
In a fifth aspect, at least one embodiment of this application provides a computer-readable storage medium for storing program code to be executed by a processor, the program code comprising instructions for implementing the method of any possible implementation of the first aspect.
In a sixth aspect, a chip is provided, comprising a processor configured to call and run instructions stored in a memory, so that a communication device on which the chip is installed executes the method of any possible implementation of the first aspect.
In a seventh aspect, another chip is provided, comprising an input interface, an output interface, a processor, and a memory connected to one another by an internal path, the processor being configured to execute code in the memory; when the code is executed, the processor executes the method of any possible implementation of the first aspect.
Description of the drawings
Fig. 1 is a schematic structural diagram of a convolutional neural network provided by an embodiment of this application;
Fig. 2 is a schematic structural diagram of a processing device provided by an embodiment of this application;
Fig. 3 is a flowchart of a pruning method for a convolutional neural network provided by an embodiment of this application;
Fig. 4 is a flowchart of another pruning method for a convolutional neural network provided by an embodiment of this application;
Fig. 5 is a schematic diagram of a feature map pruning process provided by an embodiment of this application;
Fig. 6 is a flowchart of yet another pruning method for a convolutional neural network provided by an embodiment of this application;
Fig. 7 is a flowchart of a convolutional neural network training method provided by an embodiment of this application;
Fig. 8 is a schematic diagram of a convolutional neural network training process provided by an embodiment of this application;
Fig. 9 is a schematic structural diagram of a pruning device for a convolutional neural network provided by an embodiment of this application.
Specific embodiments
To make the purposes, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the accompanying drawings.
To facilitate understanding of the technical solutions provided by the embodiments of this application, the application scenario is introduced first. The application scenario of this application includes a processing device that can store and run a convolutional neural network.
The processing device involved in this application may include handheld devices, vehicle-mounted devices, wearable devices, computing devices, other devices connected to a wireless modem, cloud devices, terminals, terminal equipment, monitoring devices, servers, and the like. For convenience of description, all of these are referred to as the processing device in this application.
A convolutional neural network comprises multiple layers, including one or more convolutional layers; besides the convolutional layers, the convolutional neural network includes, without limitation, ReLU (rectified linear unit) layers, pooling layers, fully connected layers, and the like. Each convolutional layer comprises multiple filters (also called convolution kernels); each filter is an array, and the numbers in it are called weights or parameters. The role of a convolutional layer is to perform the convolution operation. For example, a filter of the first convolutional layer slides over the input sample with a set stride; at each sliding position, the filter's array is multiplied elementwise with the sample data and the products are summed to give one value. All the values obtained during sliding form a new array, which is called a feature map, and each value in the feature map is one feature of that map. The multiple filters of each convolutional layer produce multiple feature maps by convolution, and these feature maps constitute the layer's feature maps.
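The sliding operation described above can be sketched as follows for a single 2-D input and a single filter (no padding; the function name is illustrative): each position yields one value, and the values form the feature map.

```python
import numpy as np

def conv2d_single(sample, kernel, stride=1):
    """Slide one filter over a 2-D input with the given stride; at each
    position, multiply elementwise and sum to get one value; the collected
    values form the feature map."""
    h = (sample.shape[0] - kernel.shape[0]) // stride + 1
    w = (sample.shape[1] - kernel.shape[1]) // stride + 1
    fmap = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = sample[i * stride:i * stride + kernel.shape[0],
                           j * stride:j * stride + kernel.shape[1]]
            fmap[i, j] = np.sum(patch * kernel)
    return fmap
```

Running this once per filter in a layer yields that layer's set of feature maps, which is exactly the set the pruning method later scores.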
Fig. 1 is a schematic structural diagram of a convolutional neural network. Referring to Fig. 1, the convolutional neural network comprises alternating convolutional and pooling layers, such as convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, ..., followed by fully connected layer 1, fully connected layer 2, and a softmax layer connected after the last pooling layer; the output of the softmax layer is the output of the convolutional neural network.
Fig. 2 is a possible hardware structural diagram of the aforementioned processing device. As shown in Fig. 2, the processing device comprises a processor 10, a memory 20, and a communication interface 30. Those skilled in the art will understand that the structure shown in Fig. 2 does not limit the processing device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently. Specifically:
The processor 10 is the control center of the processing device, connecting the parts of the entire processing device through various interfaces and lines. By running or executing the software programs and/or modules stored in the memory 20 and calling data stored in the memory 20, it performs the various functions of the processing device and processes data, thereby controlling the processing device as a whole. The processor 10 may be a CPU, another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. Notably, the processor may be a processor supporting the advanced RISC machines (ARM) architecture.
The memory 20 may be used to store software programs and modules. The processor 10 executes various function applications and data processing by running the software programs and modules stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system 21, an acquisition module 22, a determination module 23, a processing module 24, and an application program 25 required for at least one function (such as judging the importance of feature maps), and the data storage area may store data created according to use of the device (such as the feature maps output by the convolutional layers). The memory 20 may be a volatile memory or a nonvolatile memory, or may include both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). Correspondingly, the memory 20 may also include a memory controller to provide the processor 10 with access to the memory 20.
The processor 10 performs the following functions by running the acquisition module 22: obtaining the feature map set output by the n-th convolutional layer of the convolutional neural network, the feature map set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, and n being a positive integer. The processor 10 performs the following functions by running the determination module 23: determining the important feature maps and unimportant feature maps in the feature map set, an important feature map having a greater influence on the output of the convolutional neural network than an unimportant feature map. The processor 10 performs the following functions by running the processing module 24: pruning the convolutional neural network based on the unimportant feature maps to obtain the input to the layer following the n-th convolutional layer, that input comprising the important feature maps.
An embodiment of this application also provides a chip comprising a processor configured to call and run instructions stored in a memory, so that a communication device on which the chip is installed executes any of the convolutional neural network pruning methods provided by this application.
An embodiment of this application also provides a chip comprising an input interface, an output interface, a processor, and a memory connected to one another by an internal path, the processor being configured to execute code in the memory; when the code is executed, the processor executes any of the above convolutional neural network pruning methods.
It should be understood that the above processor may be a CPU, another general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. Notably, the processor may be a processor supporting the ARM architecture.
Further, in an optional embodiment, there are one or more of the above processors and one or more memories. Optionally, the memory may be integrated with the processor, or the memory and the processor may be separately disposed. The memory may include a read-only memory and a random access memory and may provide instructions and data to the processor. The memory may also include a nonvolatile random access memory; for example, the memory may store information about the device type. The memory may be a volatile or nonvolatile memory, or may include both. The nonvolatile memory may be a ROM, PROM, EPROM, EEPROM, or flash memory. The volatile memory may be a RAM, used as an external cache. By way of example and not limitation, many forms of RAM are available, such as SRAM, DRAM, SDRAM, DDR SDRAM, ESDRAM, SLDRAM, and DR RAM.
This application provides a computer program which, when executed by a computer, causes a processor or the computer to execute the corresponding steps and/or procedures in any of the convolutional neural network pruning method embodiments provided by this application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described herein are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (such as coaxial cable, optical fiber, or digital subscriber line) or wireless (such as infrared, radio, or microwave) means. The computer-readable storage medium may be any usable medium accessible to the computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk).
Fig. 3 is a flowchart of a pruning method of a convolutional neural network provided by an embodiment of the present application. The method may be executed by the processing device in the aforementioned application scenario. As shown in Fig. 3, the method includes the following steps.
Step S31: obtain the feature map set output by the n-th convolutional layer of the convolutional neural network, where the feature map set includes multiple feature maps, the n-th convolutional layer is any one of the one or more convolutional layers, and n is a positive integer.
As mentioned above, a convolutional layer of a convolutional neural network obtains feature maps through a convolution operation. A convolutional neural network includes one or more convolutional layers; when the network includes multiple convolutional layers, the n-th convolutional layer here may be any layer of the network. Moreover, the pruning method provided by the embodiments of the present application may be performed on some of the convolutional layers of the network or on all of them. When it is performed on some of the convolutional layers, these may be any subset of the convolutional layers in the network: they may be adjacent convolutional layers, or convolutional layers spaced apart from each other.
Step S32: determine the important feature maps and the insignificant feature maps in the feature map set.
In the embodiments of the present application, feature maps are divided into important feature maps and insignificant feature maps to distinguish the importance of each feature map. Important and insignificant feature maps are relative: the importance of an important feature map is greater than that of an insignificant one. An important feature map is a feature map that plays a larger role when the convolutional neural network performs a deep learning task (it has a larger influence on the result accuracy), that is, a feature map with a larger influence on the output result of the network. An insignificant feature map plays a smaller role when the network performs the task (it has a smaller influence on the result accuracy), that is, a feature map with a smaller influence on the output result of the network.
Here, a deep learning task refers to processing data (samples) with a set objective using a convolutional neural network. The objective includes but is not limited to classification, target detection, etc., and the data include but are not limited to text, images, speech, and other data. The deep learning tasks that the convolutional neural networks provided by the present application can perform include but are not limited to various tasks in deep-model computing scenarios, for example computer vision tasks, speech recognition tasks, and natural language processing tasks; these tasks may specifically include image or text classification, target detection, semantic segmentation, and so on. When the convolutional neural network performs one of the above deep learning tasks, the output result is the result of processing the data (samples) according to the set objective, for example the target position in a target detection task, or the image category in an image classification task.
Illustratively, when target detection is performed using a convolutional neural network, an important feature map is a feature map that plays a larger role in the target detection and has a larger influence on the detection result output by the network, while an insignificant feature map plays a smaller role in the target detection and has a smaller influence on the detection result output by the network.
The importance of a feature map depends both on the input sample itself and on the problem the deep learning task is to solve. Therefore, an important feature map may be a feature map that presents the features of the sample data itself, or a feature map whose features are highly correlated with the objective of the deep learning task.
In an embodiment of the present application, when a user takes a photo with a mobile phone, targets such as faces and animals can be captured automatically by target detection, which helps the phone to focus automatically, apply beautification, and so on. However, since the processing capability of a mobile phone is relatively weak, performing target detection with the scheme of the present application can greatly reduce the computation of the convolutional neural network during target detection, improve the quality of the mobile phone product, and bring a better user experience.
Step S33: prune the convolutional neural network based on the insignificant feature maps to obtain the input of the next layer after the n-th convolutional layer, where the input of the next layer includes the important feature maps.
Since the insignificant feature maps play a smaller role when the network performs a deep learning task, discarding them in subsequent computations has little or even no influence on the output result, and in the subsequent computation process no computing resources need to be spent on processing the discarded feature maps, which can greatly save computation.
For example, if the next layer after the n-th convolutional layer is a pooling layer, the important feature maps among the feature maps of the n-th convolutional layer are used as the input of the pooling layer, and the pooling layer completes the pooling operation according to the important feature maps of the n-th convolutional layer only.
In addition, the pruning method provided by the present application can be performed on a convolutional neural network after deep compression or on one that has not been deeply compressed; for example, the above pruning method can be applied to a convolutional neural network that has already undergone a pruning operation, so as to further reduce the computation.
In the embodiments of the present application, steps S31 to S33 may be executed by the aforementioned processing device. During the operation of the convolutional neural network, the processing device controls the feature maps that each convolutional layer outputs to the next layer, thereby controlling the computation of the network. The result of the deep learning task is finally output by the convolutional neural network pruned through the above method steps.
In the embodiments of the present application, during the operation of the convolutional neural network, the important feature maps are distinguished from the insignificant feature maps among the feature maps output by a convolutional layer, and only the important feature maps are output to the next layer of the network. Since the insignificant feature maps have a smaller influence on the output result, doing so not only has little or even no influence on the output result, but also greatly saves computation. Meanwhile, since this scheme is an operation performed while the neural network is in use, it does not affect the structure of the neural network model, so no information is lost from the model. Moreover, since what the present application determines is whether a feature map is important, and the feature maps produced by the neural network differ for different samples, the important and insignificant feature maps determined for different samples can differ; the feature maps that the scheme prunes thus differ from sample to sample, that is, the pruning is differentiated across samples.
Fig. 4 is a flowchart of a pruning method of a convolutional neural network provided by an embodiment of the present application. The method may be executed by the processing device in the aforementioned application scenario. As shown in Fig. 4, the method includes the following steps.
Step S41: obtain the feature map set output by the n-th convolutional layer of the convolutional neural network, where the feature map set includes multiple feature maps, the n-th convolutional layer is any one of the one or more convolutional layers, and n is a positive integer.
Step S42: calculate the L2 norm of each feature map of the n-th convolutional layer.
The L2 norm of a feature map is the square root of the sum of the squares of the features in the feature map. The present application uses the L2 norm of each feature map as a statistic to measure its importance. In a convolution operation, the larger the values of the features in a feature map, the larger its influence on the output result of the convolutional neural network, and the more important it is; since the L2 norm is the square root of the sum of the squares of these features, the importance of each feature map can be judged by the size of its L2 norm.
In this step, the calculation formula of the L2 norm is as follows:
N(m, n, cn) = sqrt( Σ_{hn=1..Hn} Σ_{ωn=1..Wn} F(m, n, cn, hn, ωn)² )    (1)
In formula (1), m denotes the m-th sample, n denotes the n-th convolutional layer, and cn denotes the cn-th feature map of the n-th convolutional layer. N(m, n, cn) denotes the L2 norm of the cn-th feature map of the n-th convolutional layer when the convolutional neural network takes the m-th sample as input. Hn and Wn are respectively the height and width of the feature maps of the n-th convolutional layer, (hn, ωn) denotes the position of a feature in the feature map, F is a feature, and F(m, n, cn, hn, ωn) denotes the feature at position (hn, ωn) of the cn-th feature map of the n-th convolutional layer when the network takes the m-th sample as input.
The L2 norm of each feature map in the n-th convolutional layer is calculated one by one by this formula.
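The per-map L2 norm of formula (1) can be sketched as follows. This is a minimal NumPy illustration with randomly generated stand-in feature maps; the shapes (Cn = 4, Hn = Wn = 8) are assumptions for the example only:

```python
import numpy as np

# Hypothetical feature-map set of the n-th convolutional layer for one
# input sample: Cn feature maps, each of height Hn and width Wn.
rng = np.random.default_rng(0)
Cn, Hn, Wn = 4, 8, 8
feature_maps = rng.normal(size=(Cn, Hn, Wn))

# Formula (1): the L2 norm of the cn-th feature map is the square root
# of the sum of the squared feature values over all positions (hn, wn).
l2_norms = np.sqrt((feature_maps ** 2).sum(axis=(1, 2)))

print(l2_norms.shape)  # (4,) - one norm per feature map
```

The result is one scalar per feature map, which the later steps compare against the layer's first threshold.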
Step S43: if the L2 norm of a first feature map is less than the first threshold of the n-th convolutional layer, determine that the first feature map is an insignificant feature map; if the L2 norm of the first feature map is greater than or equal to the first threshold, determine that the first feature map is an important feature map, where the first feature map is any feature map in the feature map set, and the influence of an important feature map on the output result of the convolutional neural network is greater than that of an insignificant feature map.
That is, for each feature map, its L2 norm is compared with the first threshold to determine whether the feature map is an insignificant feature map.
Since the feature scales of the convolutional layers in a convolutional neural network differ, the first thresholds of different layers differ in size. The first threshold can be designed according to the mean of the L2 norms of each convolutional layer. Here, the feature scale refers to the magnitude of the features; for example, if the feature values of one convolutional layer lie between 100 and 200 while those of another lie between 0.1 and 0.2, then the feature scales of the two convolutional layers are different.
Illustratively, the first threshold of the n-th convolutional layer is the product of the norm mean of the n-th convolutional layer and a threshold coefficient, where the norm mean is the mean of the L2 norms of all feature maps of the n-th convolutional layer.
The threshold coefficient is greater than or equal to 0 and less than 2; that is, its value lies in [0, 2). In the first threshold, the part multiplied by the threshold coefficient is the mean of the L2 norms, and twice the mean of the L2 norms is approximately equal to the maximum L2 norm. Therefore, by choosing the threshold coefficient in [0, 2), various situations can be realized: all feature maps are divided into important feature maps, part of the feature maps are divided into important feature maps, or all feature maps are divided into insignificant feature maps.
Specifically, the threshold coefficient can take any value in the above range as needed: the larger its value, the more insignificant feature maps are obtained, the lower the accuracy, and the smaller the computation; the smaller its value, the fewer insignificant feature maps are obtained, the higher the accuracy, and the larger the computation.
The first threshold can be calculated using the following formula:
T(n) = β · μ(n), where μ(n) = (1 / Cn) · Σ_{cn=1..Cn} N(m, n, cn)    (2)
In formula (2), β denotes the threshold coefficient, μ(n) denotes the mean of the L2 norms of the n-th convolutional layer, and Cn denotes the total number of feature maps of the n-th convolutional layer.
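The thresholding of formula (2) and the partition of step S43 can be sketched together. This is a minimal NumPy illustration; the five example norm values and β = 1.0 are assumptions chosen for the demonstration:

```python
import numpy as np

# Hypothetical L2 norms of the Cn = 5 feature maps of the n-th layer.
l2_norms = np.array([0.5, 3.2, 1.1, 2.8, 0.2])

# Formula (2): the first threshold is the mean of the layer's L2 norms
# multiplied by a threshold coefficient beta chosen in [0, 2).
beta = 1.0
threshold = beta * l2_norms.mean()   # approximately 1.56

# Step S43: maps at or above the threshold are important, the rest are not.
important = l2_norms >= threshold
unimportant = l2_norms < threshold

print(important.sum())  # 2 maps kept as important
```

With β = 0 every map would be kept as important, while a β near 2 would discard almost all of them, matching the trade-off between accuracy and computation described above.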
Step S44: prune the convolutional neural network based on the insignificant feature maps to obtain the input of the next layer after the n-th convolutional layer, where the input of the next layer includes the important feature maps.
Here, pruning a convolutional neural network includes structured pruning and unstructured pruning. What the present application uses is structured pruning, in which a feature map, a layer, or a convolution kernel is removed directly; this is conducive to acceleration on hardware.
The structured pruning provided by the present application includes the following two modes, both of which aim to remove the insignificant feature maps:
In the first mode, the pruning may include: setting all features of the insignificant feature maps to 0, and using the zeroed insignificant feature maps together with the important feature maps as the input of the next layer after the n-th convolutional layer.
In the second mode, the pruning may include: screening the insignificant feature maps out of the feature map set, and using the important feature maps as the input of the next layer after the n-th convolutional layer.
The second mode controls which feature maps are transferred to the next layer and which are not: the present application controls the important feature maps to be transferred to the next layer, while the insignificant feature maps are not. For example, the weights of the connections corresponding to an insignificant feature map can be set to 0 so that the insignificant feature map will not be output to the next layer of the convolutional neural network.
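The two structured pruning modes can be contrasted in a short sketch. This is a minimal NumPy illustration with stand-in feature maps and an assumed importance mask; it is not tied to any particular network:

```python
import numpy as np

rng = np.random.default_rng(1)
feature_maps = rng.normal(size=(5, 8, 8))                 # Cn = 5 maps
important = np.array([False, True, False, True, False])   # assumed mask

# Mode 1: set every feature of each insignificant map to 0 and pass the
# full set (zeroed maps + important maps) to the next layer.
zeroed = feature_maps.copy()
zeroed[~important] = 0.0

# Mode 2: screen the insignificant maps out entirely, so the next layer
# receives only the important maps (a smaller input tensor).
screened = feature_maps[important]

print(zeroed.shape)    # (5, 8, 8) - shape unchanged
print(screened.shape)  # (2, 8, 8) - only important maps remain
```

Mode 1 keeps the tensor shape fixed (convenient when the next layer expects a fixed number of channels), while mode 2 actually shrinks the tensor, which is what reduces the convolution computation of the following layer.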
The next layer of the convolutional neural network here refers to the layer following the n-th convolutional layer in the network; it may be a ReLU layer, a pooling layer, or the like, or another convolutional layer.
Fig. 5 is a schematic diagram of a feature map pruning process provided by an embodiment of the present application. Referring to Fig. 5, Fn denotes the feature maps of the n-th convolutional layer, which includes multiple feature maps Fn. After the L2 norms (Norm) are calculated, the important and insignificant feature maps are determined. The insignificant feature maps are pruned, for example by setting them to 0; after pruning they are the white squares, while the important feature maps are the shaded squares. The feature maps at this point are output to the next layer Lx, which may be a convolutional layer, a ReLU layer, a pooling layer, etc.
In the embodiments of the present application, steps S41 to S44 can be executed for each convolutional layer of the convolutional neural network. This implementation is briefly described below with reference to Fig. 1: after a sample is input to convolutional layer 1, convolutional layer 1 performs a convolution operation on the sample to obtain the feature map set of convolutional layer 1; the method steps S41 to S44 are performed on this feature map set; pooling layer 1 then performs a pooling operation on the important feature maps of convolutional layer 1 output by steps S41 to S44, and outputs the result to convolutional layer 2; convolutional layer 2 performs a convolution operation on the result of pooling layer 1 to obtain the feature map set of convolutional layer 2; the method steps S41 to S44 are performed on this feature map set; pooling layer 2 then performs a pooling operation on the important feature maps of convolutional layer 2 output by steps S41 to S44, and outputs the result to convolutional layer 3; and so on, until the feature maps output by all convolutional layers have been pruned as above. After all convolutional layers and pooling layers finish processing, the result is output to fully connected layer 1, fully connected layer 2, and the softmax layer for processing, and finally the softmax layer outputs the output result of the convolutional neural network. For example, when the convolutional neural network performs face detection, the output result is the position of the face in the image; for another example, when it performs picture classification, the output result is the classification result of the picture.
Taking the pruning method of the convolutional neural network shown in Fig. 4 as an example, the computation of the method provided by the present application is described below.
The computation of the convolution of the (n+1)-th convolutional layer in a standard convolution is: Cn × Cn+1 × Hn × Wn × k × k, where Cn is the number of feature maps output by the n-th convolutional layer, i.e., the number of inputs of the (n+1)-th convolutional layer; Hn × Wn is the size of the feature maps output by the n-th convolutional layer; Cn+1 is the number of feature maps output by the (n+1)-th convolutional layer; and k × k is the size of the convolution kernel (filter).
With the method shown in Fig. 4, the computation of the convolution includes two parts:
the computation of the L2 norms of the n-th convolutional layer: Cn × Hn × Wn;
the computation of the convolution of the (n+1)-th convolutional layer after trimming C insignificant feature maps: (Cn − C) × Cn+1 × Hn × Wn × k × k, where Cn − C is the number of feature maps output by the n-th convolutional layer after pruning, i.e., the number of inputs of the (n+1)-th convolutional layer.
The computation with the method shown in Fig. 4 is therefore Cn × Hn × Wn + (Cn − C) × Cn+1 × Hn × Wn × k × k, which is reduced compared with the computation of the standard convolution.
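The saving can be made concrete with a small arithmetic check. The layer sizes below (Cn = 64, Cn+1 = 128, 32x32 maps, 3x3 kernels, C = 16 maps trimmed) are assumed values for illustration only:

```python
# Hypothetical layer sizes for the comparison above.
Cn, Cn1, Hn, Wn, k = 64, 128, 32, 32, 3
C = 16  # number of insignificant feature maps trimmed

standard = Cn * Cn1 * Hn * Wn * k * k           # standard convolution
norm_cost = Cn * Hn * Wn                        # computing the L2 norms
pruned_conv = (Cn - C) * Cn1 * Hn * Wn * k * k  # convolution after trimming
pruned = norm_cost + pruned_conv

print(standard)  # 75497472
print(pruned)    # 56688640
```

Here the extra cost of the norms (65,536 operations) is negligible next to the convolution itself, so trimming a quarter of the maps removes roughly a quarter of the layer's computation.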
Fig. 6 is a flowchart of a pruning method of a convolutional neural network provided by an embodiment of the present application. As shown in Fig. 6, the main difference between this method and the method provided in Fig. 4 is the way of determining the important and insignificant feature maps. The method includes the following steps:
Step S51: obtain the feature map set output by the n-th convolutional layer of the convolutional neural network, where the feature map set includes multiple feature maps, the n-th convolutional layer is any one of the one or more convolutional layers, and n is a positive integer.
Specifically, step S51 may be identical to step S41.
Step S52: calculate the L2 norm of each feature map of the n-th convolutional layer.
Specifically, step S52 may be identical to step S42.
Step S53: determine whether the n-th convolutional layer is a prunable convolutional layer.
Specifically, this step may include:
First, calculate the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer.
The coefficient of variation is the ratio of the standard deviation to the mean of multiple data values, and is a quantity used to analyze the degree of dispersion of data. For the L2 norms of all feature maps of a convolutional layer, calculating the coefficient of variation makes it possible to analyze the degree of dispersion of these L2 norms. If the coefficient of variation is large, the L2 norms of the feature maps are dispersed, and so is the importance of the feature maps: different feature maps play roles of different sizes in the task subsequently executed by the convolutional neural network, and the computation can then be reduced by pruning the insignificant feature maps. If the coefficient of variation is small, the L2 norms of the feature maps are concentrated, and the importance of the feature maps is similar: different feature maps play comparable roles in the task, and it is then difficult to prune under the premise of ensuring accuracy.
The calculation formula of the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer is as follows:
CV(n) = σ(n) / μ(n), where σ(n) = sqrt( (1 / Cn) · Σ_{cn=1..Cn} ( N(m, n, cn) − μ(n) )² )    (3)
In formula (3), CV(n) denotes the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer.
Second, if the coefficient of variation is greater than a second threshold, determine that the n-th convolutional layer is a prunable convolutional layer.
Since the coefficient of variation expresses the degree of dispersion of the data and is unrelated to the feature scale of each convolutional layer, different convolutional layers can use the same second threshold α.
The value range of the second threshold α can be [0, 2). The second threshold α is also a threshold whose value can be chosen as needed: by adjusting the value of α, the number of insignificant feature maps finally pruned can be changed. For example, the smaller the value of the second threshold α, the more convolutional layers are determined to be prunable, and the more insignificant feature maps may finally be pruned; the larger the value of the second threshold α, the fewer convolutional layers are determined to be prunable, and the fewer insignificant feature maps may finally be pruned.
If the coefficient of variation is less than or equal to the second threshold, determine that the n-th convolutional layer is not a prunable convolutional layer.
If the n-th convolutional layer is not a prunable convolutional layer, the feature maps of this convolutional layer are not pruned; in this case step S54 is not executed, and all feature maps of the n-th convolutional layer are output directly to the next layer of the convolutional neural network as the input of that layer.
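The prunability decision of step S53 can be sketched as a small check on the per-map norms. This is a minimal NumPy illustration; the two example norm vectors and α = 0.5 are assumed values chosen to show both outcomes:

```python
import numpy as np

def layer_is_prunable(l2_norms, alpha):
    """Formula (3): coefficient of variation = standard deviation / mean
    of the layer's L2 norms. The layer is prunable only when this ratio
    exceeds the second threshold alpha, i.e. the norms are dispersed."""
    cv = l2_norms.std() / l2_norms.mean()
    return cv > alpha

dispersed = np.array([0.1, 5.0, 0.2, 4.0])     # importance varies widely
concentrated = np.array([2.0, 2.1, 1.9, 2.0])  # all maps about equal

print(layer_is_prunable(dispersed, alpha=0.5))     # True
print(layer_is_prunable(concentrated, alpha=0.5))  # False
```

A layer whose maps all matter about equally is skipped, which is what protects accuracy: pruning there would discard maps that are no less important than the ones kept.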
Step S54: if the L2 norm of a first feature map is less than the first threshold of the n-th convolutional layer, determine that the first feature map is an insignificant feature map; if the L2 norm of the first feature map is greater than or equal to the first threshold, determine that the first feature map is an important feature map, where the first feature map is any feature map in the feature map set.
Specifically, step S54 may be identical to step S43.
Step S55: prune the convolutional neural network based on the insignificant feature maps to obtain the input of the next layer after the n-th convolutional layer, where the input of the next layer includes the important feature maps.
Specifically, step S55 may be identical to step S44.
Fig. 7 is a flowchart of a convolutional neural network training method provided by an embodiment of the present application. As shown in Fig. 7, this method can be executed before the pruning method of the convolutional neural network provided by any of Figs. 3 to 6, and includes the following steps:
Step S61: obtain training samples. The training samples here include multiple samples.
Step S62: train the convolutional neural network using the samples, where the loss function used in the process of training the convolutional neural network includes an L2,1-norm regularization term, and the regularization term is used to learn sparse features of the samples.
The training method of the convolutional neural network provided by the present application is identical to a conventional training method except for the loss function; the conventional training method may be a back-propagation algorithm or the like.
The regularization term is a penalty term of the loss function; by adding the regularization term, the weights of the neural network can be constrained.
The L2,1-norm regularization term includes the product of the L2,1 norm and a regularization coefficient. The L2,1 norm is the sum of the L2,1 norms of the samples, where the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, and the first sample is any one of the samples.
The L2,1 norm of the first sample is thus the L1 norm of the L2 norms of all feature maps of the convolutional neural network. The effect of L1 regularization is to produce a sparse weight matrix, that is, to produce a sparse model for feature selection. In other words, by adding the regularization term of this L1 norm to the loss function of the convolutional neural network, the convolutional neural network becomes a sparse model. When the convolutional neural network is subsequently used, feature selection can be performed by executing the method shown in any of Figs. 3 to 6.
In the embodiments of the present application, the formula of the loss function can be expressed as follows:
Loss = Ltask + γ · R(ω) + λ · Σ_{m=1..M} N(m)    (4)
In formula (4), Ltask is the task loss function, which takes different forms for different tasks; for example, for image classification, Ltask is the cross-entropy loss function. γ · R(ω) is the conventional regularization term, where γ is a custom parameter and R(ω) is usually an L2 norm, which can prevent overfitting. λ · Σ_{m=1..M} N(m) is the L2,1-norm regularization term newly added by the present application, where λ is the regularization coefficient, Σ_{m=1..M} N(m) is the L2,1 norm, N(m) is the L2,1 norm of the m-th sample, and M is the total number of samples.
Here, N(m) = Σ_{n=1..N} Σ_{cn=1..Cn} N(m, n, cn), where N is the total number of convolutional layers and N(m, n, cn) is the L2 norm of a feature map; its specific calculation may refer to step S42.
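The L2,1 norm of one sample, as defined above, can be sketched as follows. This is a minimal NumPy illustration with two stand-in convolutional layers of random feature maps; the layer shapes and λ = 1e-7 are assumed values:

```python
import numpy as np

def l21_norm(per_layer_maps):
    """N(m) for one sample: the sum, over all convolutional layers and all
    feature maps, of each map's L2 norm - i.e. an L1 norm taken over the
    per-map L2 norms."""
    total = 0.0
    for maps in per_layer_maps:  # maps: (C, H, W) array for one layer
        total += np.sqrt((maps ** 2).sum(axis=(1, 2))).sum()
    return total

rng = np.random.default_rng(2)
sample_maps = [rng.normal(size=(4, 8, 8)), rng.normal(size=(8, 4, 4))]

lam = 1e-7  # regularization coefficient lambda (assumed for illustration)
reg_term = lam * l21_norm(sample_maps)
print(reg_term > 0)  # True
```

During training this term is summed over the samples in a batch and added to the task loss, so gradient descent is pushed toward activations whose per-map L2 norms are sparse, which is what makes the later norm-based pruning effective.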
Fig. 8 is a schematic diagram of a training process of a convolutional neural network provided by an embodiment of the present application. Referring to Fig. 8, during training, the L2 norm of each feature map of each convolutional layer is calculated. In the figure, Fn and Fn+1 respectively denote the feature maps of the n-th and (n+1)-th convolutional layers, and each convolutional layer includes multiple feature maps Fn or Fn+1. The L1 norm of the per-layer L2 norms (the L2,1 norm) is calculated, and the L2,1-norm regularization term is added to the loss function Loss to complete the training of the convolutional neural network. Lx in Fig. 8 may be a convolutional layer, a ReLU layer, a pooling layer, etc.
The accuracy of the method provided by the present application is described below with reference to Tables 1 to 4. As shown in Tables 1, 2, 3, and 4, using the pruning method of the convolutional neural network provided by the present application on the CIFAR-10 data set, a high proportion of feature maps can still be pruned while the accuracy remains essentially unchanged. The method is equally effective even on compressed networks and on lightweight networks.
Table 1: accuracy when random pruning (rand) and minimum-value pruning (min) are used on the trained convolutional neural network VGG16.
In Table 1, pr (10%, 20%, 30%) denotes the pruning ratio in random pruning or minimum-value pruning. λ (0, 1e-6, 1e-7, 1e-8) is the regularization coefficient of the L2,1-norm regularization term newly added to the loss function; λ = 0 corresponds to a conventionally trained convolutional neural network. The three-decimal values in the table (e.g., 0.673) denote accuracy. It can be seen that the accuracy of random pruning or minimum-value pruning performed on a convolutional neural network model trained with the method of the present application is better than that of a conventionally trained convolutional neural network.
Table 2: pruning rate and accuracy when the method of the present application is used to prune VGG16.
In Table 2, thresh (0.5, 0.5, 0.5, 1.0, 1.0, 1.0) denotes the second threshold α and the threshold coefficient β in the method provided by the present application. λ (0, 1e-6, 1e-7, 1e-8) is the regularization coefficient of the L2,1-norm regularization term newly added to the loss function. In the table, the three-decimal values corresponding to pr (e.g., 0.469) denote the pruning rate, and the three-decimal values corresponding to acc (e.g., 0.934) denote accuracy. For example, when the pruning rate is 46.9%, the accuracy reaches 0.934. It can be seen that the method provided by the present application can still maintain a high accuracy under a high pruning rate. In addition, the case λ = 0 corresponds to a conventionally trained convolutional neural network; it can be seen that when the method provided by the present application is used to prune a conventionally trained convolutional neural network, the accuracy is also better than that of random pruning or minimum-value pruning.
Table 3: Pruning rate and accuracy when pruning a compressed VGG16 with the method of the present application
Table 4: Pruning rate and accuracy when pruning MobileNet-V2, a lightweight convolutional neural network, with the method of the present application
Fig. 9 is a block diagram of a pruning apparatus for a convolutional neural network provided by an embodiment of the present application. The pruning apparatus may be implemented, in software, hardware, or a combination of both, as all or part of a processing device. The pruning apparatus for the convolutional neural network may include: a computing unit 701, a determination unit 702, and a processing unit 703.
The computing unit 701 is configured to obtain a feature map set output by the n-th convolutional layer of the convolutional neural network, where the feature map set includes multiple feature maps, the n-th convolutional layer is any one of the one or more convolutional layers, and n is a positive integer. The determination unit 702 is configured to determine important feature maps and unimportant feature maps in the feature map set, where an important feature map has a greater influence on the output result of the convolutional neural network than an unimportant feature map. The processing unit 703 is configured to prune the convolutional neural network based on the unimportant feature maps to obtain the input of the layer following the n-th convolutional layer, where that input includes the important feature maps.
Optionally, the determination unit 702 includes:
a computation subunit 721, configured to calculate the L2 norm of each feature map of the n-th convolutional layer; and
a determination subunit 722, configured to determine that a first feature map is an unimportant feature map if its L2 norm is less than a first threshold of the n-th convolutional layer, and that the first feature map is an important feature map if its L2 norm is greater than or equal to the first threshold, where the first feature map is any feature map in the feature map set.
Optionally, the first threshold of the n-th convolutional layer is the product of the norm mean of the n-th convolutional layer and a threshold coefficient, where the norm mean is the mean of the L2 norms of all feature maps of the n-th convolutional layer.
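A minimal sketch of this thresholding rule, assuming feature maps stored as nested lists (the function name and the example values are illustrative, not from the patent):

```python
import math

def classify_feature_maps(feature_maps, threshold_coeff):
    # First threshold = (mean L2 norm of the layer's feature maps) x coefficient.
    norms = [math.sqrt(sum(v * v for row in fm for v in row)) for fm in feature_maps]
    first_threshold = (sum(norms) / len(norms)) * threshold_coeff
    important, unimportant = [], []
    for fm, n in zip(feature_maps, norms):
        # Norms below the first threshold mark unimportant feature maps.
        (important if n >= first_threshold else unimportant).append(fm)
    return important, unimportant

# Three 1x2 feature maps with L2 norms 5.0, ~0.22 and 10.0;
# with coefficient 0.5 the threshold is ~2.54, so one map is unimportant.
maps = [[[3.0, 4.0]], [[0.1, 0.2]], [[6.0, 8.0]]]
imp, unimp = classify_feature_maps(maps, threshold_coeff=0.5)
```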
Optionally, the determination subunit 722 is further configured to determine, before determining the important and unimportant feature maps in the feature map set, whether the n-th convolutional layer is a prunable convolutional layer, and to determine the important and unimportant feature maps in the feature map set only if the n-th convolutional layer is a prunable convolutional layer.
Optionally, the computation subunit 721 is further configured to calculate the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer; and
the determination subunit 722 is configured to determine that the n-th convolutional layer is a prunable convolutional layer if the coefficient of variation is greater than a second threshold.
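The prunability test above can be sketched as follows; this is an illustrative reconstruction (function name and second-threshold value are assumptions), using the population standard deviation:

```python
import math

def is_prunable(feature_maps, second_threshold=0.5):
    # A layer is prunable when the coefficient of variation (std / mean)
    # of its feature maps' L2 norms exceeds the second threshold: widely
    # spread norms mean some maps contribute far less than others.
    norms = [math.sqrt(sum(v * v for row in fm for v in row)) for fm in feature_maps]
    mean = sum(norms) / len(norms)
    std = math.sqrt(sum((n - mean) ** 2 for n in norms) / len(norms))
    return (std / mean) > second_threshold

# Norms 5.0, ~0.22, 10.0 vary strongly, so this layer is prunable;
# a layer whose maps all have the same norm is not.
prunable = is_prunable([[[3.0, 4.0]], [[0.1, 0.2]], [[6.0, 8.0]]])
```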
Optionally, the processing unit 703 is configured to set all values of the unimportant feature maps to 0 and to use the zeroed unimportant feature maps together with the important feature maps as the input of the layer following the n-th convolutional layer;
alternatively, the processing unit 703 is configured to screen out the unimportant feature maps from the feature map set and to use the important feature maps as the input of the layer following the n-th convolutional layer.
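The two alternatives above can be sketched as follows; the function names and the boolean "important" flags are illustrative assumptions (the zero-out variant preserves the layer's shape, while the screen-out variant shrinks it):

```python
def prune_zero(feature_maps, important_flags):
    # Strategy 1: keep the layer's shape but zero out unimportant feature maps.
    return [fm if keep else [[0.0] * len(row) for row in fm]
            for fm, keep in zip(feature_maps, important_flags)]

def prune_drop(feature_maps, important_flags):
    # Strategy 2: screen out unimportant feature maps entirely.
    return [fm for fm, keep in zip(feature_maps, important_flags) if keep]

maps = [[[1.0, 2.0]], [[3.0, 4.0]]]
zeroed = prune_zero(maps, [False, True])   # same number of maps, first zeroed
dropped = prune_drop(maps, [False, True])  # only the important map remains
```

Zeroing keeps downstream layer shapes unchanged, whereas dropping reduces the actual computation in the next layer.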
Optionally, the apparatus further includes:
a training unit 704, configured to train the convolutional neural network, where the loss function used in training the convolutional neural network includes a regularization term of the L2,1 norm, the regularization term being used to learn sparse features of the samples.
Optionally, the regularization term of the L2,1 norm comprises the product of the L2,1 norm and a regularization coefficient, where the L2,1 norm is the sum of the L2,1 norms of all samples, and the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, the first sample being any one of the samples.
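The per-sample and batch-level L2,1 definitions above can be sketched as follows; the nested-list data layout and function names are assumptions for illustration:

```python
import math

def sample_l21(feature_maps_per_layer):
    # L2,1 norm of one sample: sum of the L2 norms of the feature maps of
    # all convolutional layers produced when the sample is input.
    return sum(math.sqrt(sum(v * v for row in fm for v in row))
               for layer in feature_maps_per_layer for fm in layer)

def batch_l21(batch):
    # L2,1 norm over all samples: sum of the per-sample L2,1 norms.
    return sum(sample_l21(sample) for sample in batch)

# Two samples, each producing one layer with one feature map
# (L2 norms 5.0 and 10.0, so the batch L2,1 norm is 15.0).
batch = [[[[[3.0, 4.0]]]], [[[[6.0, 8.0]]]]]
total = batch_l21(batch)
```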
It should be noted that when the pruning apparatus for a convolutional neural network provided by the above embodiments performs pruning, the division into the above functional units is merely illustrative; in practical applications, the above functions may be assigned to different functional units as needed, that is, the internal structure of the device may be divided into different functional units to complete all or part of the functions described above. In addition, the pruning apparatus for a convolutional neural network provided by the above embodiments and the embodiments of the pruning method for a convolutional neural network belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc.
The foregoing is merely an alternative embodiment of the present application and does not limit it; any modification or substitution readily conceivable by those skilled in the art within the technical scope disclosed by the present application shall fall within its scope of protection. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (20)
1. A pruning method for a convolutional neural network, the convolutional neural network comprising multiple layers, the multiple layers comprising one or more convolutional layers, characterized in that the method comprises:
obtaining a feature map set output by an n-th convolutional layer of the convolutional neural network, the feature map set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, and n being a positive integer;
determining important feature maps and unimportant feature maps in the feature map set, an important feature map having a greater influence on the output result of the convolutional neural network than an unimportant feature map; and
pruning the convolutional neural network based on the unimportant feature maps to obtain an input of the layer following the n-th convolutional layer, the input of the layer following the n-th convolutional layer comprising the important feature maps.
2. The method according to claim 1, wherein determining the important feature maps and unimportant feature maps in the feature map set comprises:
calculating the L2 norm of each feature map of the n-th convolutional layer; and
determining that a first feature map is an unimportant feature map if its L2 norm is less than a first threshold of the n-th convolutional layer, and that the first feature map is an important feature map if its L2 norm is greater than or equal to the first threshold, the first feature map being any feature map in the feature map set.
3. The method according to claim 2, wherein the first threshold of the n-th convolutional layer is the product of a norm mean of the n-th convolutional layer and a threshold coefficient, the norm mean being the mean of the L2 norms of all feature maps of the n-th convolutional layer.
4. The method according to claim 2 or 3, wherein before determining the important feature maps and unimportant feature maps in the feature map set, the method further comprises:
determining whether the n-th convolutional layer is a prunable convolutional layer; and
determining the important feature maps and unimportant feature maps in the feature map set if the n-th convolutional layer is a prunable convolutional layer.
5. The method according to claim 4, wherein determining whether the n-th convolutional layer is a prunable convolutional layer comprises:
calculating the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer; and
determining that the n-th convolutional layer is a prunable convolutional layer if the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer is greater than a second threshold.
6. The method according to any one of claims 1 to 5, wherein pruning the convolutional neural network based on the unimportant feature maps to obtain the input of the layer following the n-th convolutional layer comprises:
setting all values of the unimportant feature maps to 0; and using the zeroed unimportant feature maps together with the important feature maps as the input of the layer following the n-th convolutional layer.
7. The method according to any one of claims 1 to 5, wherein pruning the convolutional neural network based on the unimportant feature maps to obtain the input of the layer following the n-th convolutional layer comprises:
screening out the unimportant feature maps from the feature map set, and using the important feature maps as the input of the layer following the n-th convolutional layer.
8. The method according to any one of claims 1 to 7, further comprising:
training the convolutional neural network, wherein the loss function used in training the convolutional neural network includes a regularization term of the L2,1 norm, the regularization term being used to learn sparse features of the samples.
9. The method according to claim 8, wherein the regularization term of the L2,1 norm comprises the product of the L2,1 norm and a regularization coefficient, the L2,1 norm being the sum of the L2,1 norms of all samples, wherein the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, the first sample being any one of the samples.
10. A pruning apparatus for a convolutional neural network, the convolutional neural network comprising multiple layers, the multiple layers comprising one or more convolutional layers, characterized in that the apparatus comprises:
a computing unit, configured to obtain a feature map set output by an n-th convolutional layer of the convolutional neural network, the feature map set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, and n being a positive integer;
a determination unit, configured to determine important feature maps and unimportant feature maps in the feature map set, an important feature map having a greater influence on the output result of the convolutional neural network than an unimportant feature map; and
a processing unit, configured to prune the convolutional neural network based on the unimportant feature maps to obtain an input of the layer following the n-th convolutional layer, the input of the layer following the n-th convolutional layer comprising the important feature maps.
11. The apparatus according to claim 10, wherein the determination unit comprises:
a computation subunit, configured to calculate the L2 norm of each feature map of the n-th convolutional layer; and
a determination subunit, configured to determine that a first feature map is an unimportant feature map if its L2 norm is less than a first threshold of the n-th convolutional layer, and that the first feature map is an important feature map if its L2 norm is greater than or equal to the first threshold, the first feature map being any feature map in the feature map set.
12. The apparatus according to claim 11, wherein the first threshold of the n-th convolutional layer is the product of a norm mean of the n-th convolutional layer and a threshold coefficient, the norm mean being the mean of the L2 norms of all feature maps of the n-th convolutional layer.
13. The apparatus according to claim 11 or 12, wherein the determination subunit is further configured to determine, before determining the important feature maps and unimportant feature maps in the feature map set, whether the n-th convolutional layer is a prunable convolutional layer, and to determine the important feature maps and unimportant feature maps in the feature map set if the n-th convolutional layer is a prunable convolutional layer.
14. The apparatus according to claim 13, wherein the computation subunit is further configured to calculate the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer; and
the determination subunit is configured to determine that the n-th convolutional layer is a prunable convolutional layer if the coefficient of variation is greater than a second threshold.
15. The apparatus according to any one of claims 10 to 14, wherein the processing unit is configured to set all values of the unimportant feature maps to 0, and to use the zeroed unimportant feature maps together with the important feature maps as the input of the layer following the n-th convolutional layer.
16. The apparatus according to any one of claims 10 to 14, wherein the processing unit is configured to screen out the unimportant feature maps from the feature map set, and to use the important feature maps as the input of the layer following the n-th convolutional layer.
17. The apparatus according to any one of claims 10 to 16, further comprising:
a training unit, configured to train the convolutional neural network, wherein the loss function used in training the convolutional neural network includes a regularization term of the L2,1 norm, the regularization term being used to learn sparse features of the samples.
18. The apparatus according to claim 17, wherein the regularization term of the L2,1 norm comprises the product of the L2,1 norm and a regularization coefficient, the L2,1 norm being the sum of the L2,1 norms of all samples, wherein the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, the first sample being any one of the samples.
19. A pruning apparatus for a convolutional neural network, comprising a processor and a memory, wherein the memory is configured to store a software program and modules, and the processor implements the method according to any one of claims 1 to 9 by running or executing the software program and/or modules stored in the memory.
20. A computer-readable storage medium, configured to store program code to be executed by a processor, the program code comprising instructions for implementing the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910380839.5A CN110232436A (en) | 2019-05-08 | 2019-05-08 | Pruning method, device and the storage medium of convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110232436A true CN110232436A (en) | 2019-09-13 |
Family
ID=67861206
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910380839.5A Pending CN110232436A (en) | 2019-05-08 | 2019-05-08 | Pruning method, device and the storage medium of convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110232436A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111222629A (en) * | 2019-12-31 | 2020-06-02 | 暗物智能科技(广州)有限公司 | Neural network model pruning method and system based on adaptive batch normalization |
CN111930982A (en) * | 2020-07-20 | 2020-11-13 | 南京南瑞信息通信科技有限公司 | Intelligent labeling method for power grid images |
CN112102183A (en) * | 2020-09-02 | 2020-12-18 | 杭州海康威视数字技术股份有限公司 | Sparse processing method, device and equipment |
CN112734036A (en) * | 2021-01-14 | 2021-04-30 | 西安电子科技大学 | Target detection method based on pruning convolutional neural network |
US11003959B1 (en) * | 2019-06-13 | 2021-05-11 | Amazon Technologies, Inc. | Vector norm algorithmic subsystems for improving clustering solutions |
CN115146775A (en) * | 2022-07-04 | 2022-10-04 | 同方威视技术股份有限公司 | Edge device reasoning acceleration method and device and data processing system |
CN116188878A (en) * | 2023-04-25 | 2023-05-30 | 之江实验室 | Image classification method, device and storage medium based on neural network structure fine adjustment |
CN117829241A (en) * | 2024-03-04 | 2024-04-05 | 西北工业大学 | Pruning method of convolutional neural network |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105528638A (en) * | 2016-01-22 | 2016-04-27 | 沈阳工业大学 | Method for grey correlation analysis method to determine number of hidden layer characteristic graphs of convolutional neural network |
CN106548234A (en) * | 2016-11-17 | 2017-03-29 | 北京图森互联科技有限责任公司 | A kind of neural networks pruning method and device |
CN107944555A (en) * | 2017-12-07 | 2018-04-20 | 广州华多网络科技有限公司 | Method, storage device and the terminal that neutral net is compressed and accelerated |
CN109389043A (en) * | 2018-09-10 | 2019-02-26 | 中国人民解放军陆军工程大学 | A kind of crowd density estimation method of unmanned plane picture |
CN109472352A (en) * | 2018-11-29 | 2019-03-15 | 湘潭大学 | A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature |
CN109522949A (en) * | 2018-11-07 | 2019-03-26 | 北京交通大学 | Model of Target Recognition method for building up and device |
CN109598340A (en) * | 2018-11-15 | 2019-04-09 | 北京知道创宇信息技术有限公司 | Method of cutting out, device and the storage medium of convolutional neural networks |
CN109657595A (en) * | 2018-12-12 | 2019-04-19 | 中山大学 | Based on the key feature Region Matching face identification method for stacking hourglass network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110232436A (en) | Pruning method, device and the storage medium of convolutional neural networks | |
US11907760B2 (en) | Systems and methods of memory allocation for neural networks | |
CN109344921B (en) | A kind of image-recognizing method based on deep neural network model, device and equipment | |
CN108416440A (en) | A kind of training method of neural network, object identification method and device | |
CN109583325A (en) | Face samples pictures mask method, device, computer equipment and storage medium | |
US11776257B2 (en) | Systems and methods for enhancing real-time image recognition | |
CN111950656B (en) | Image recognition model generation method and device, computer equipment and storage medium | |
CN106485316A (en) | Neural network model compression method and device | |
CN106709565A (en) | Optimization method and device for neural network | |
US20220076385A1 (en) | Methods and systems for denoising media using contextual information of the media | |
CN109657582A (en) | Recognition methods, device, computer equipment and the storage medium of face mood | |
CN115600650A (en) | Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium | |
CN113657421B (en) | Convolutional neural network compression method and device, and image classification method and device | |
CN109766800B (en) | Construction method of mobile terminal flower recognition model | |
CN110378305A (en) | Tealeaves disease recognition method, equipment, storage medium and device | |
US20240153271A1 (en) | Method and apparatus for selecting cover of video, computer device, and storage medium | |
CN113469233B (en) | Tobacco leaf automatic grading method and system based on deep learning | |
CN108734264A (en) | Deep neural network model compression method and device, storage medium, terminal | |
CN111428854A (en) | Structure searching method and structure searching device | |
CN110147833A (en) | Facial image processing method, apparatus, system and readable storage medium storing program for executing | |
CN110647974A (en) | Network layer operation method and device in deep neural network | |
CN110197107A (en) | Micro- expression recognition method, device, computer equipment and storage medium | |
WO2019091401A1 (en) | Network model compression method and apparatus for deep neural network, and computer device | |
CN114241234A (en) | Fine-grained image classification method, device, equipment and medium | |
CN110874635A (en) | Deep neural network model compression method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||