CN110232436A - Pruning method, device, and storage medium for a convolutional neural network - Google Patents

Pruning method, device, and storage medium for a convolutional neural network

Info

Publication number
CN110232436A
CN110232436A (application CN201910380839.5A)
Authority
CN
China
Prior art keywords
convolutional
feature map
norm
layer
neural networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910380839.5A
Other languages
Chinese (zh)
Inventor
刘传建
王云鹤
韩凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201910380839.5A priority Critical patent/CN110232436A/en
Publication of CN110232436A publication Critical patent/CN110232436A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a pruning method, device, and storage medium for a convolutional neural network, belonging to the field of neural network technology within artificial intelligence and computer vision. The convolutional neural network comprises multiple layers, including one or more convolutional layers. The method comprises: obtaining the set of feature maps output by the n-th convolutional layer of the convolutional neural network, the set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, and n being a positive integer; determining the important feature maps and the unimportant feature maps in the set; and pruning the convolutional neural network based on the unimportant feature maps to obtain the input to the layer following the n-th convolutional layer, that input comprising the important feature maps. By distinguishing important from unimportant feature maps and passing only the important ones to the next layer, resource consumption is reduced when the convolutional neural network performs a deep learning task.

Description

Pruning method, device, and storage medium for a convolutional neural network
Technical field
This application relates to the field of neural network technology, and in particular to a pruning method, device, and storage medium for a convolutional neural network.
Background technique
Neural networks have become the state of the art for deep learning tasks such as computer vision, speech recognition, and natural language processing. Nevertheless, neural network algorithms are compute- and storage-intensive, which makes them difficult to deploy on devices with limited hardware resources.
To overcome this limitation, deep compression can be used to drastically reduce the computation and storage a neural network requires. For example, for a convolutional neural network with fully connected layers, deep compression can shrink the model size severalfold, even by tens of times. Deep compression of convolutional neural networks (Convolutional Neural Networks, CNN) includes several different techniques: quantization, pruning, lightweight network design, and knowledge distillation.
Pruning refers to cutting away neurons or synapses after the convolutional neural network has been trained, so that the model is compressed. However, with this approach the information carried by the removed neurons or synapses is permanently lost, and the resulting model is fixed.
Summary of the invention
This application provides a pruning method, device, and storage medium for a convolutional neural network that can reduce the computation of the network while avoiding the loss of information, and that adapts the pruning to each input sample.
In a first aspect, at least one embodiment of the application provides a pruning method for a convolutional neural network. The convolutional neural network comprises multiple layers, the multiple layers include one or more convolutional layers, and the method comprises:
obtaining the set of feature maps output by the n-th convolutional layer of the convolutional neural network, the set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, and n being a positive integer;
determining the important feature maps and the unimportant feature maps in the set, an important feature map having a greater influence on the output of the convolutional neural network than an unimportant feature map;
pruning the convolutional neural network based on the unimportant feature maps to obtain the input to the layer following the n-th convolutional layer, that input comprising the important feature maps.
In the embodiments of this application, during operation of the convolutional neural network, the feature maps output by a convolutional layer are divided into important and unimportant ones, and only the important feature maps are passed to the next layer. Because the unimportant feature maps have little influence on the network's output, doing so barely affects (or does not affect) the output while saving substantial computation. Moreover, because this operation is performed while the network is being used, it does not alter the structure of the trained neural network model, so no information is lost from the model. Further, since the application judges whether each feature map is important, and different samples cause the network to produce different feature maps, the feature maps identified as important or unimportant can differ from sample to sample; the scheme therefore prunes different feature maps for different samples and is sample-specific.
Optionally, determining the important feature maps and unimportant feature maps in the set comprises:
computing the L2 norm of each feature map of the n-th convolutional layer;
if the L2 norm of a first feature map is less than a first threshold of the n-th convolutional layer, determining that the first feature map is an unimportant feature map; if the L2 norm of the first feature map is greater than or equal to the first threshold, determining that the first feature map is an important feature map, the first feature map being any feature map in the set.
In this implementation, the importance of a feature map is judged by comparing its L2 norm against a threshold. Specifically, in a convolution operation, the larger the values of the features in a feature map, the greater their influence on the output of the convolutional neural network, and hence the more important the map. The L2 norm is the square root of the sum of the squares of the features in the map, so its magnitude can be used to judge each feature map's importance.
Optionally, the first threshold of the n-th convolutional layer is the product of the norm mean of the n-th convolutional layer and a threshold coefficient, the norm mean being the mean of the L2 norms of all feature maps of the n-th convolutional layer.
In this implementation, the norm mean serves as the basis of the first threshold: the importance of each feature map is judged by comparing its norm against the mean. The threshold coefficient makes the threshold adjustable, so the proportion of feature maps that are pruned can be tuned as needed.
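As an illustration only (not the patent's reference implementation), the norm-threshold test above can be sketched in plain Python; the 2×2 map shape, the sample values, and the names `l2_norm` and `split_by_importance` are invented for the example:

```python
import math

def l2_norm(feature_map):
    """L2 norm of a feature map: square root of the sum of squared features."""
    return math.sqrt(sum(v * v for row in feature_map for v in row))

def split_by_importance(feature_maps, threshold_coeff=1.0):
    """Split one layer's feature maps into important / unimportant indices.

    The first threshold is the mean of the maps' L2 norms multiplied by a
    tunable threshold coefficient, as described above.
    """
    norms = [l2_norm(fm) for fm in feature_maps]
    first_threshold = threshold_coeff * sum(norms) / len(norms)
    important = [i for i, n in enumerate(norms) if n >= first_threshold]
    unimportant = [i for i, n in enumerate(norms) if n < first_threshold]
    return important, unimportant

# Two strong maps and one near-zero map: only the weak map falls below the mean.
maps = [
    [[6.0, 8.0], [0.0, 0.0]],   # norm 10.0
    [[0.0, 0.0], [6.0, 8.0]],   # norm 10.0
    [[0.1, 0.0], [0.0, 0.0]],   # norm 0.1
]
print(split_by_importance(maps))  # ([0, 1], [2])
```

Raising `threshold_coeff` above 1.0 would prune a larger fraction of the maps, which is the tuning role the threshold coefficient plays in the claim.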
Optionally, before the important and unimportant feature maps in the set are determined, the method further comprises:
determining whether the n-th convolutional layer is a prunable convolutional layer;
if the n-th convolutional layer is a prunable convolutional layer, determining the important feature maps and unimportant feature maps in the set.
In this implementation, before judging whether each feature map is important, it is first decided whether the layer's feature maps need pruning at all. This avoids unreasonable pruning when a layer's feature maps are all of comparable importance, and improves the accuracy of the pruned convolutional neural network.
Optionally, determining whether the n-th convolutional layer is a prunable convolutional layer comprises:
computing the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer;
if the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer is greater than a second threshold, determining that the n-th convolutional layer is a prunable convolutional layer.
Here, whether a layer's feature maps need pruning is decided using the coefficient of variation. The coefficient of variation measures the dispersion of a set of data; the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer measures how dispersed the norms of that layer's feature maps are. By computing it for one layer's feature maps, the spread of the feature maps' importance can be determined, and hence whether pruning is warranted.
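A minimal sketch of this layer-level check, assuming the usual definition of the coefficient of variation (population standard deviation divided by the mean); the value 0.5 for the second threshold and the function names are invented for the example:

```python
import math

def coefficient_of_variation(values):
    """Standard deviation divided by the mean: the dispersion measure used here."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return std / mean

def layer_is_prunable(feature_map_norms, second_threshold=0.5):
    """A layer is pruned only if its maps' L2 norms are sufficiently dispersed."""
    return coefficient_of_variation(feature_map_norms) > second_threshold

# Norms of comparable size: low dispersion, so leave the layer alone.
print(layer_is_prunable([9.0, 10.0, 11.0]))   # False
# One near-zero norm among large ones: high dispersion, so prune.
print(layer_is_prunable([0.1, 10.0, 10.0]))   # True
```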
Optionally, pruning the convolutional neural network based on the unimportant feature maps to obtain the input to the layer following the n-th convolutional layer comprises:
setting all features of the unimportant feature maps to 0, and taking the zeroed unimportant feature maps together with the important feature maps as the input to the layer following the n-th convolutional layer;
alternatively, pruning the convolutional neural network based on the unimportant feature maps to obtain the input to the layer following the n-th convolutional layer comprises:
filtering the unimportant feature maps out of the set, and taking the important feature maps alone as the input to the layer following the n-th convolutional layer.
This implementation provides two ways of pruning the unimportant feature maps: one zeroes the unimportant feature maps and then passes everything to the next layer; the other passes only the important feature maps to the next layer. Both achieve the pruning of the unimportant feature maps and reduce computation.
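The two modes can be sketched as follows (an illustrative toy, not the patent's implementation; the list-of-lists representation and function names are invented). Note that zeroing preserves the layer's channel count, while filtering changes it:

```python
def prune_by_zeroing(feature_maps, unimportant):
    """Mode 1: keep every map, but set the unimportant maps' features to 0."""
    return [
        [[0.0] * len(fm[0]) for _ in fm] if i in unimportant else fm
        for i, fm in enumerate(feature_maps)
    ]

def prune_by_filtering(feature_maps, unimportant):
    """Mode 2: drop the unimportant maps and pass only the important ones on."""
    return [fm for i, fm in enumerate(feature_maps) if i not in unimportant]

maps = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]
print(prune_by_zeroing(maps, {1}))    # [[[1.0, 2.0]], [[0.0, 0.0]], [[5.0, 6.0]]]
print(prune_by_filtering(maps, {1}))  # [[[1.0, 2.0]], [[5.0, 6.0]]]
```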
Optionally, the method further comprises:
training the convolutional neural network, the loss function used during training including an L2,1-norm regularization term, the regularization term being used to learn the sparse features of the samples.
In this implementation, the convolutional neural network is trained with an L2,1-norm regularization term added to the loss function. Because the effect of the L2,1-norm regularization term is to learn the sparse features of the samples (that is, to learn the importance of features), training with a loss function that includes this term produces a sparse neural network model, from which important features can be selected during subsequent use.
Optionally, the L2,1-norm regularization term comprises the product of the L2,1 norm and a regularization coefficient, the L2,1 norm being the sum of the L2,1 norms of the individual samples, wherein the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, the first sample being any one of the samples.
In this implementation, the L2,1-norm regularization term is the sum of the L2,1 norms of the individual samples, so a convolutional neural network trained with a loss function including this term learns the sparse features of each sample. When the network is later used, important features can then be selected per sample.
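The regularization term as defined in the claim (sum over samples of the sum of the L2 norms of all convolutional layers' feature maps, scaled by a coefficient) can be sketched as follows; the nested-list batch layout, the coefficient values, and the function names are assumptions made for the example:

```python
import math

def l2_norm(feature_map):
    """L2 norm: square root of the sum of squared features in the map."""
    return math.sqrt(sum(v * v for row in feature_map for v in row))

def l21_regularizer(batch_feature_maps, reg_coeff):
    """L2,1 regularization term: for each sample, sum the L2 norms of the
    feature maps of every convolutional layer; sum over all samples; then
    scale by the regularization coefficient."""
    total = 0.0
    for sample_layers in batch_feature_maps:   # one entry per sample
        for layer_maps in sample_layers:       # one entry per conv layer
            total += sum(l2_norm(fm) for fm in layer_maps)
    return reg_coeff * total

# One sample, one convolutional layer with two 1x2 feature maps.
map_a = [[3.0, 4.0]]   # L2 norm 5.0
map_b = [[6.0, 8.0]]   # L2 norm 10.0
batch = [[[map_a, map_b]]]
print(l21_regularizer(batch, reg_coeff=1.0))  # 15.0
```

In training, this value would be added to the task loss (e.g. cross-entropy), with `reg_coeff` trading off sparsity against accuracy.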
In a second aspect, at least one embodiment of the application provides a pruning device for a convolutional neural network. The convolutional neural network comprises multiple layers, the multiple layers include one or more convolutional layers, and the device comprises:
a computing unit, configured to obtain the set of feature maps output by the n-th convolutional layer of the convolutional neural network, the set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, and n being a positive integer;
a determination unit, configured to determine the important feature maps and unimportant feature maps in the set, an important feature map having a greater influence on the output of the convolutional neural network than an unimportant feature map;
a processing unit, configured to prune the convolutional neural network based on the unimportant feature maps to obtain the input to the layer following the n-th convolutional layer, that input comprising the important feature maps.
Optionally, the determination unit comprises:
a computation subunit, configured to compute the L2 norm of each feature map of the n-th convolutional layer;
a determination subunit, configured to determine that a first feature map is an unimportant feature map if its L2 norm is less than the first threshold of the n-th convolutional layer, and that it is an important feature map if its L2 norm is greater than or equal to the first threshold, the first feature map being any feature map in the set.
Optionally, the first threshold of the n-th convolutional layer is the product of the norm mean of the n-th convolutional layer and a threshold coefficient, the norm mean being the mean of the L2 norms of all feature maps of the n-th convolutional layer.
Optionally, the determination subunit is further configured to determine, before the important and unimportant feature maps in the set are determined, whether the n-th convolutional layer is a prunable convolutional layer, and, if the n-th convolutional layer is a prunable convolutional layer, to determine the important feature maps and unimportant feature maps in the set.
Optionally, the computation subunit is further configured to compute the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer;
and the determination subunit is configured to determine that the n-th convolutional layer is a prunable convolutional layer if the coefficient of variation is greater than a second threshold.
Optionally, the processing unit is configured to set all features of the unimportant feature maps to 0, and to take the zeroed unimportant feature maps together with the important feature maps as the input to the layer following the n-th convolutional layer;
alternatively, the processing unit is configured to filter the unimportant feature maps out of the set, and to take the important feature maps as the input to the layer following the n-th convolutional layer.
Optionally, the device further comprises:
a training unit, configured to train the convolutional neural network, the loss function used during training including an L2,1-norm regularization term, the regularization term being used to learn the sparse features of the samples.
Optionally, the L2,1-norm regularization term comprises the product of the L2,1 norm and a regularization coefficient, the L2,1 norm being the sum of the L2,1 norms of the individual samples, wherein the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, the first sample being any one of the samples.
In a third aspect, at least one embodiment of the application provides a pruning device for a convolutional neural network, the pruning device comprising a processor and a memory. The memory is used for storing software programs and modules, and the processor implements the method in any possible embodiment of the first aspect by running or executing the software programs and/or modules stored in the memory.
Optionally, there are one or more processors and one or more memories.
Optionally, the memory may be integrated with the processor, or the memory and the processor may be provided separately.
In a specific implementation, the memory may be non-transitory (non-transitory) memory, such as read-only memory (read only memory, ROM); it may be integrated with the processor on the same chip or provided on separate chips. The embodiments of this application do not limit the type of memory or the way the memory and the processor are arranged.
In a fourth aspect, at least one embodiment of the application provides a computer program (product) comprising computer program code which, when run by a computer, causes the computer to perform the method in any possible embodiment of the first aspect.
In a fifth aspect, at least one embodiment of the application provides a computer-readable storage medium for storing program code to be executed by a processor, the program code implementing the method in any possible embodiment of the first aspect.
In a sixth aspect, a chip is provided, comprising a processor for calling and running instructions stored in a memory, so that a communication device on which the chip is installed performs the method in any possible embodiment of the first aspect.
In a seventh aspect, another chip is provided, comprising an input interface, an output interface, a processor, and a memory connected to one another by an internal path, the processor being configured to execute code in the memory and, when the code is executed, to perform the method in any possible embodiment of the first aspect.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a convolutional neural network provided by an embodiment of this application;
Fig. 2 is a schematic structural diagram of a processing device provided by an embodiment of this application;
Fig. 3 is a flowchart of a pruning method for a convolutional neural network provided by an embodiment of this application;
Fig. 4 is a flowchart of another pruning method for a convolutional neural network provided by an embodiment of this application;
Fig. 5 is a schematic diagram of a feature map pruning process provided by an embodiment of this application;
Fig. 6 is a flowchart of yet another pruning method for a convolutional neural network provided by an embodiment of this application;
Fig. 7 is a flowchart of a convolutional neural network training method provided by an embodiment of this application;
Fig. 8 is a schematic diagram of a training process of a convolutional neural network provided by an embodiment of this application;
Fig. 9 is a schematic structural diagram of a pruning device for a convolutional neural network provided by an embodiment of this application.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the accompanying drawings.
To facilitate understanding of the technical solutions provided by the embodiments of this application, the application scenario of this application is introduced first.
The application scenario of this application includes a processing device; a convolutional neural network can be stored and run on the processing device.
The processing device involved in this application may include a handheld device, an in-vehicle device, a wearable device, a computing device, or another device connected to a wireless modem, as well as a cloud device, a terminal (terminal), terminal equipment (Terminal Equipment), a monitoring device, a server, and so on. For convenience of description, this application refers to all of these simply as the processing device.
A convolutional neural network comprises multiple layers, and these layers include one or more convolutional layers; besides the convolutional layers, the network includes, but is not limited to, ReLU (rectified linear unit) layers, pooling layers, fully connected layers, and so on. Each convolutional layer comprises multiple filters (also called convolution kernels); each filter is an array, and the numbers in it are called weights or parameters. The role of a convolutional layer in a convolutional neural network is to perform the convolution operation. For example, a filter in the first convolutional layer slides over the input sample with a set stride; at each sliding position, the filter array is multiplied element-wise with the sample data and the products are summed to obtain one value. All the values obtained during the sliding form a new array, which is called a feature map (feature map); each value in the feature map is one feature of that map. The multiple filters of each convolutional layer produce multiple feature maps through convolution, and these feature maps constitute that layer's feature maps.
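The sliding-window computation just described can be sketched in pure Python (an illustrative toy, not the patent's implementation; the 3×3 input, the 2×2 filter, and the function name `conv2d_single` are invented for the example):

```python
def conv2d_single(sample, kernel, stride=1):
    """Slide one filter over a 2-D input with the given stride; at each
    position, multiply element-wise and sum to get one feature value.
    The array of all such values is the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(sample) - kh) // stride + 1
    out_w = (len(sample[0]) - kw) // stride + 1
    fmap = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += sample[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        fmap.append(row)
    return fmap

sample = [[1, 0, 2],
          [0, 1, 0],
          [3, 0, 1]]
kernel = [[1, 0],
          [0, 1]]   # one 2x2 filter
print(conv2d_single(sample, kernel))  # [[2.0, 0.0], [0.0, 2.0]]
```

Each filter of a layer would be applied this way, yielding one feature map per filter.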
Fig. 1 is a schematic structural diagram of a convolutional neural network. Referring to Fig. 1, the convolutional neural network comprises alternating convolutional and pooling layers, as shown in Fig. 1: convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, and so on, followed by fully connected layer 1, fully connected layer 2, and a softmax layer connected to the last pooling layer. The output of the softmax layer is the output of the convolutional neural network.
Fig. 2 is a possible schematic hardware diagram of the aforementioned processing device. As shown in Fig. 2, the processing device includes a processor 10, a memory 20, and a communication interface 30. Those skilled in the art will understand that the structure shown in Fig. 2 does not limit the processing device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently. Specifically:
The processor 10 is the control center of the processing device. It connects the parts of the entire processing device through various interfaces and lines, and performs the device's functions and processes data by running or executing the software programs and/or modules stored in the memory 20 and calling the data stored in the memory 20, thereby controlling the processing device as a whole. The processor 10 may be a CPU or another general-purpose processor, a digital signal processor (digital signal processing, DSP), an application-specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor. Notably, the processor may be a processor supporting the advanced RISC machines (advanced RISC machines, ARM) architecture.
The memory 20 may be used to store software programs and modules. The processor 10 performs various functional applications and data processing by running the software programs and modules stored in the memory 20. The memory 20 may mainly comprise a program storage area and a data storage area, where the program storage area may store an operating system 21, an acquisition module 22, a determination module 23, a processing module 24, and an application program 25 required for one or more functions (such as judging the importance of feature maps); the data storage area may store data created according to the use of the UE or the target server (such as the feature maps output by the convolutional layers). The memory 20 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (read-only memory, ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory. The volatile memory may be random access memory (random access memory, RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic random access memory, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM). Correspondingly, the memory 20 may also include a memory controller to provide the processor 10 with access to the memory 20.
The processor 10 performs the following functions by running the acquisition module 22: obtaining the set of feature maps output by the n-th convolutional layer of the convolutional neural network, the set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, and n being a positive integer. The processor 10 performs the following functions by running the determination module 23: determining the important feature maps and unimportant feature maps in the set, an important feature map having a greater influence on the output of the convolutional neural network than an unimportant feature map. The processor 10 performs the following functions by running the processing module 24: pruning the convolutional neural network based on the unimportant feature maps to obtain the input to the layer following the n-th convolutional layer, that input comprising the important feature maps.
An embodiment of this application also provides a chip comprising a processor, the processor being configured to call and run instructions stored in a memory, so that a communication device on which the chip is installed performs any of the convolutional neural network pruning methods provided by this application.
An embodiment of this application also provides a chip comprising an input interface, an output interface, a processor, and a memory connected to one another by an internal path, the processor being configured to execute code in the memory and, when the code is executed, to perform any of the above convolutional neural network pruning methods.
It should be understood that the above processor may be a CPU or another general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor. Notably, the processor may be a processor supporting the ARM architecture.
Further, in an optional embodiment, there are one or more of the above processors and one or more memories. Optionally, the memory may be integrated with the processor, or the memory and the processor may be provided separately. The above memory may include read-only memory and random access memory and provide instructions and data to the processor. The memory may also include non-volatile random access memory; for example, the memory may store information about the device type.
The memory may be volatile or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be ROM, PROM, EPROM, EEPROM, or flash memory. The volatile memory may be RAM, used as an external cache. By way of example and not limitation, many forms of RAM are available, such as SRAM, DRAM, SDRAM, DDR SDRAM, ESDRAM, SLDRAM, and DR RAM.
The present application provides a computer program. When the computer program is executed by a computer, a processor or the computer may be caused to perform the corresponding steps and/or processes in any of the convolutional neural network pruning method embodiments provided by the present application.

The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described herein are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (such as coaxial cable, optical fiber, or digital subscriber line) or wireless (such as infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to the computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk), or the like.
Fig. 3 is a kind of flow chart of the pruning method of convolutional neural networks provided by the embodiments of the present application.This method can be with It is executed by the processing equipment in aforementioned applications scene, as shown in figure 3, this method comprises the following steps.
Step S31: the feature set of graphs of the n-th convolutional layer output of the convolutional neural networks, the feature atlas are obtained Closing includes multiple characteristic patterns, and n-th convolutional layer is any of one or more of convolutional layers, and n is positive integer.
As previously mentioned, the convolutional layer of convolutional neural networks obtains characteristic pattern by convolution operation.Convolutional neural networks include One or more convolutional layers, when convolutional neural networks include multiple convolutional layers, the n-th convolutional layer here can be convolutional Neural Any layer of network.Also, the pruning method of convolutional neural networks provided by the embodiments of the present application, both can be to convolutional Neural net The part convolutional layer of network executes, and can also execute to whole convolutional layers of convolutional neural networks.Wherein, part convolutional layer is held When row, this part convolutional layer is the arbitrary portion convolutional layer in convolutional neural networks, these convolutional layers can be adjacent convolution Layer, is also possible to spaced convolutional layer.
Step S32: the important feature figure and insignificant characteristic pattern in the feature set of graphs are determined.
In the embodiments of the present application, feature maps are divided into important feature maps and unimportant feature maps to distinguish the significance of each feature map. Important and unimportant feature maps are relative: the significance of an important feature map is greater than that of an unimportant one. An important feature map is one that plays a larger role when the convolutional neural network performs a deep learning task (it has a larger effect on result accuracy), that is, a feature map with a larger influence on the output result of the network; an unimportant feature map plays a smaller role (it has a smaller effect on result accuracy), that is, it has a smaller influence on the output result of the network. In short, an important feature map contributes more than an unimportant one when the convolutional neural network performs a deep learning task, and its influence on the output result of the network is greater.

Here, a deep learning task refers to processing data (samples) with a set objective using the convolutional neural network. The objective includes but is not limited to classification, object detection, and the like, and the data includes but is not limited to text, images, speech, and so on. The deep learning tasks that the convolutional neural network provided by the present application can perform include but are not limited to various tasks in deep-model computing scenarios, such as computer vision tasks, speech recognition tasks, and natural language processing tasks; these tasks may specifically include image or text classification, object detection, semantic segmentation, and so on. When performing such a task, the output result of the convolutional neural network is the result of processing the data (samples) with the set objective, for example the target position in an object detection task or the image category in an image classification task.

Illustratively, when object detection is performed using the convolutional neural network, an important feature map is one that plays a larger role in the detection and has a larger influence on the detection result output by the network, while an unimportant feature map plays a smaller role and has a smaller influence on the detection result.

The importance of a feature map depends both on the input sample itself and on the problem the deep learning task is to solve. An important feature map may therefore be either a feature map that captures features of the sample data itself, or a feature map whose features are highly correlated with the objective of the deep learning task.

In an embodiment of the present application, when a user takes a photo with a mobile phone, targets such as faces and animals can be automatically captured through object detection, which helps the phone to focus automatically, apply beautification, and so on. However, since the processing capability of a mobile phone is relatively weak, performing object detection with the scheme of the present application can greatly reduce the computation of the convolutional neural network during detection, improving product quality and bringing a better user experience.
Step S33: prune the convolutional neural network based on the unimportant feature maps to obtain the input of the next layer after the n-th convolutional layer, where the input of that next layer includes the important feature maps.

Since the unimportant feature maps play a small role when the convolutional neural network performs a deep learning task, discarding them in subsequent computation has little or even no influence on the output result, and no computing resources need to be spent on processing the discarded feature maps in later stages, which greatly saves computation.

For example, if the layer after the n-th convolutional layer is a pooling layer, the important feature maps among the feature maps of the n-th convolutional layer are used as the input of the pooling layer, and the pooling layer completes the pooling operation based only on those important feature maps.

In addition, the pruning method provided by the present application may be performed on a convolutional neural network either after or without deep compression; for example, the method may be applied to a convolutional neural network that has already undergone a pruning operation, so as to further reduce computation.

In the embodiments of the present application, steps S31-S33 may be performed by the aforementioned processing device. During the operation of the convolutional neural network, the processing device controls which feature maps each convolutional layer outputs to the next layer, thereby controlling the computation of the network. The result of the deep learning task is ultimately output by the convolutional neural network pruned through the above method steps.

In the embodiments of the present application, during the operation of the convolutional neural network, the important and unimportant feature maps among those output by a convolutional layer are distinguished, and only the important feature maps are passed to the next layer of the network. Since the unimportant feature maps have little influence on the output result of the network, doing so has little or even no effect on the output while greatly saving computation. Meanwhile, because the scheme operates during the use of the neural network and does not change the structure of the neural network model, no model information is lost. Furthermore, because the present application decides whether feature maps are important, and the feature maps produced by the network differ from sample to sample, the important and unimportant feature maps determined for different samples may differ; the feature maps pruned for different samples therefore also differ, that is, the pruning is sample-specific.
Fig. 4 is a kind of flow chart of the pruning method of convolutional neural networks provided by the embodiments of the present application.This method can be with It is executed by the processing equipment in aforementioned applications scene, as shown in figure 4, this method comprises the following steps.
Step S41: the feature set of graphs of the n-th convolutional layer output of the convolutional neural networks, the feature atlas are obtained Closing includes multiple characteristic patterns, and n-th convolutional layer is any of one or more of convolutional layers, and n is positive integer.
Step S42: the L2 norm of each characteristic pattern of n-th convolutional layer is calculated.
The L2 norm of characteristic pattern is square root sum square of each feature in characteristic pattern.The application uses each feature The L2 norm of figure measures the significance level of each characteristic pattern as statistic.In convolution operation, each feature in characteristic pattern Numerical value it is larger, the output results of convolutional neural networks is influenced it is also larger, it is also more important, and L2 norm is these features Quadratic sum root mean square, therefore the importance of each characteristic pattern can be judged by the size of L2 norm.
In this step, the L2 norm is calculated as follows:

N_{n,c_n}^{m} = \sqrt{ \sum_{h_n=1}^{H_n} \sum_{w_n=1}^{W_n} \left( F_{n,c_n}^{m}(h_n, w_n) \right)^{2} }    (1)

In formula (1), m denotes the m-th sample, n denotes the n-th convolutional layer, and c_n denotes the c_n-th feature map of the n-th convolutional layer. N_{n,c_n}^{m} is the L2 norm of the c_n-th feature map of the n-th convolutional layer when the m-th sample is input to the convolutional neural network. H_n and W_n are respectively the height and width of the feature maps of the n-th convolutional layer, (h_n, w_n) denotes the position of a feature in the map, F denotes a feature, and F_{n,c_n}^{m}(h_n, w_n) is the feature at position (h_n, w_n) in the c_n-th feature map of the n-th convolutional layer when the m-th sample is input.

With this formula, the L2 norm of each feature map of the n-th convolutional layer is calculated one by one.
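The per-map statistic of step S42 can be sketched in a few lines of Python (a minimal illustration under the formula above, not the patented implementation; the 2x2 feature maps and their values are hypothetical):

```python
import math

def feature_map_l2_norm(fmap):
    """L2 norm of one Hn x Wn feature map: the square root of the
    sum of squared activations, as in formula (1)."""
    return math.sqrt(sum(v * v for row in fmap for v in row))

# Hypothetical 2x2 feature maps output by an n-th convolutional layer.
fmaps = [
    [[3.0, 4.0], [0.0, 0.0]],   # norm = 5.0
    [[0.1, 0.1], [0.1, 0.1]],   # norm is approximately 0.2
]
norms = [feature_map_l2_norm(f) for f in fmaps]
```

The norm is computed once per map per forward pass, costing on the order of Hn x Wn multiply-adds per map, which is the Cn x Hn x Wn overhead counted in the computation analysis later in the text.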
Step S43: if the L2 norm of a first feature map is less than the first threshold of the n-th convolutional layer, determine that the first feature map is an unimportant feature map; if the L2 norm of the first feature map is greater than or equal to the first threshold, determine that the first feature map is an important feature map, where the first feature map is any feature map in the feature map set. Here, the influence of the important feature maps on the output result of the convolutional neural network is greater than that of the unimportant feature maps.

That is, for each feature map, its L2 norm is compared with the first threshold to determine whether the map is an unimportant feature map.

Since the feature scales of the convolutional layers of a convolutional neural network differ, the first thresholds of different layers differ in size. The first threshold may be designed based on the mean of the L2 norms of each convolutional layer. Here, feature scale refers to the magnitude of the features; for example, if the features of one convolutional layer lie between 100 and 200 while those of another lie between 0.1 and 0.2, the feature scales of the two layers are different.

Illustratively, the first threshold of the n-th convolutional layer is the product of the norm mean of the n-th convolutional layer and a threshold coefficient, where the norm mean is the mean of the L2 norms of all the feature maps of the n-th convolutional layer.

The threshold coefficient is greater than or equal to 0 and less than 2, that is, its value lies in [0, 2). In the first threshold, the part multiplied by the threshold coefficient is the mean of the L2 norms, and twice that mean is approximately equal to the maximum L2 norm. Choosing the threshold coefficient in [0, 2) therefore covers the various cases of classifying all feature maps as important, classifying some of them as important, or classifying all of them as unimportant.

Specifically, the threshold coefficient may take any value in the above range as needed: the larger its value, the more unimportant feature maps are obtained, the lower the accuracy, and the smaller the computation; the smaller its value, the fewer unimportant feature maps are obtained, the higher the accuracy, and the larger the computation.
The first threshold may be calculated using the following formula:

T_n = \beta \mu_n, \quad \mu_n = \frac{1}{C_n} \sum_{c_n=1}^{C_n} N_{n,c_n}^{m}    (2)

In formula (2), β denotes the threshold coefficient, μ_n denotes the mean of the L2 norms of the n-th convolutional layer, and C_n denotes the total number of feature maps of the n-th convolutional layer.
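The first-threshold test of step S43 can be sketched as follows (the per-map L2 norms and the value of beta are hypothetical):

```python
def split_by_threshold(norms, beta):
    """Split feature-map indices into important / unimportant by
    comparing each L2 norm with T_n = beta * mean(norms)."""
    t_n = beta * sum(norms) / len(norms)
    important = [i for i, v in enumerate(norms) if v >= t_n]
    unimportant = [i for i, v in enumerate(norms) if v < t_n]
    return t_n, important, unimportant

# Hypothetical per-map norms of one convolutional layer.
t_n, important, unimportant = split_by_threshold([5.0, 0.2, 3.0, 0.4], beta=0.5)
# mean = 2.15, so T_n = 1.075: maps 0 and 2 are kept, maps 1 and 3 are pruned
```

Raising beta toward 2 prunes more maps (lower accuracy, less computation); lowering it toward 0 prunes fewer, consistent with the trade-off described above.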
Step S44: prune the convolutional neural network based on the unimportant feature maps to obtain the input of the next layer after the n-th convolutional layer, where the input of that next layer includes the important feature maps.

Here, pruning a convolutional neural network includes structured pruning and unstructured pruning. The present application uses structured pruning, which refers to directly removing a feature map, a layer, or a convolution kernel, and is conducive to hardware acceleration.

The structured pruning provided by the present application includes the following two modes, both of which aim to remove the unimportant feature maps:

In the first mode, all features of an unimportant feature map are set to 0, and the zeroed unimportant feature maps together with the important feature maps serve as the input of the next layer after the n-th convolutional layer.

In the second mode, the unimportant feature maps are filtered out of the feature map set, and the important feature maps serve as the input of the next layer after the n-th convolutional layer.

The second mode, in other words, controls which feature maps are passed to the next layer and which are not: the present application passes the important feature maps to the next layer and withholds the unimportant ones. For example, the weights of the connections corresponding to an unimportant feature map may be set to 0, so that the unimportant feature map is not output to the next layer of the convolutional neural network.
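The two structured trim modes can be sketched as follows (a toy illustration in which nested lists stand in for tensors; a real implementation would mask or select channels of the layer's output tensor):

```python
def prune_by_zeroing(fmaps, unimportant):
    """Mode 1: keep every map but overwrite unimportant maps with
    zeros, so the next layer sees a tensor of unchanged shape."""
    return [[[0.0] * len(row) for row in fm] if i in unimportant else fm
            for i, fm in enumerate(fmaps)]

def prune_by_filtering(fmaps, unimportant):
    """Mode 2: pass only the important maps to the next layer."""
    return [fm for i, fm in enumerate(fmaps) if i not in unimportant]

fmaps = [[[3.0, 4.0]], [[0.1, 0.1]]]     # two hypothetical 1x2 maps
zeroed = prune_by_zeroing(fmaps, {1})    # shape kept, map 1 set to 0
kept = prune_by_filtering(fmaps, {1})    # only map 0 remains
```

The first mode preserves tensor shapes (simpler to slot into an existing network), while the second actually shrinks the input of the next layer and thus realizes the computation saving.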
The next layer of the convolutional neural network here refers to the layer after the n-th convolutional layer in the network; it may be a ReLU layer, a pooling layer, or the like, or another convolutional layer.

Fig. 5 is a schematic diagram of a feature map pruning process provided by an embodiment of the present application. Referring to Fig. 5, Fn denotes the feature maps of the n-th convolutional layer, which includes multiple feature maps Fn. After the L2 norms (Norm) are calculated, the important and unimportant feature maps are determined. The unimportant feature maps are pruned, for example set to 0; after pruning they are shown as the white squares, while the important feature maps are the shaded squares. The resulting feature maps are output to the next layer Lx, which may be a convolutional layer, a ReLU layer, a pooling layer, or the like.

In the embodiments of the present application, steps S41-S44 may be performed for each convolutional layer of the convolutional neural network. This implementation is briefly described with reference to Fig. 1: after a sample is input to convolutional layer 1, convolutional layer 1 performs a convolution operation on the sample to obtain its feature map set; the method steps S41-S44 are performed on the feature map set of convolutional layer 1; pooling layer 1 then performs a pooling operation on the important feature maps of convolutional layer 1 output through steps S41-S44 and outputs the result to convolutional layer 2; convolutional layer 2 performs a convolution operation on the result of pooling layer 1 to obtain its feature map set; the method steps S41-S44 are performed on the feature map set of convolutional layer 2; pooling layer 2 then performs a pooling operation on the important feature maps of convolutional layer 2 output through steps S41-S44 and outputs the result to convolutional layer 3; and so on, until the feature maps output by all convolutional layers have undergone the above pruning. After all convolutional and pooling layers have finished processing, the result is output to fully connected layer 1, fully connected layer 2, and the softmax layer for processing, and the softmax layer finally outputs the result of the convolutional neural network. For example, when the network performs face detection, the output result is the position of a face in an image; when it performs picture classification, the output result is the classification result of the picture.
Taking the convolutional neural network pruning method shown in Fig. 4 as an example, the computation of the method provided by the present application is described below.

In a standard convolution, the computation of the convolution at the (n+1)-th convolutional layer is Cn × Cn+1 × Hn × Wn × k × k, where Cn is the number of feature maps output by the n-th convolutional layer, i.e., the number of inputs to the (n+1)-th convolutional layer; Hn × Wn is the size of the feature maps output by the n-th convolutional layer; Cn+1 is the number of feature maps output by the (n+1)-th convolutional layer; and k × k is the size of the convolution kernel (filter).

With the method shown in Fig. 4, the computation of the convolution consists of two parts:

the computation of the L2 norms of the n-th convolutional layer: Cn × Hn × Wn;

the computation of the convolution at the (n+1)-th convolutional layer after C unimportant feature maps are pruned: (Cn − C) × Cn+1 × Hn × Wn × k × k, where Cn − C is the number of feature maps output by the n-th convolutional layer after pruning, i.e., the number of inputs to the (n+1)-th convolutional layer.

The total computation with the method shown in Fig. 4 is therefore Cn × Hn × Wn + (Cn − C) × Cn+1 × Hn × Wn × k × k, which is greatly reduced compared with the computation of the standard convolution.
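The comparison above can be checked numerically; the layer dimensions below are hypothetical, chosen only to show the scale of the saving:

```python
def standard_conv_cost(cn, cn1, hn, wn, k):
    """Multiply count of the standard convolution:
    Cn x Cn+1 x Hn x Wn x k x k."""
    return cn * cn1 * hn * wn * k * k

def pruned_conv_cost(cn, pruned, cn1, hn, wn, k):
    """Cost with the method of Fig. 4: the L2-norm pass (Cn x Hn x Wn)
    plus the convolution over the remaining Cn - C input maps."""
    return cn * hn * wn + (cn - pruned) * cn1 * hn * wn * k * k

base = standard_conv_cost(64, 128, 32, 32, 3)
after = pruned_conv_cost(64, 30, 128, 32, 32, 3)  # 30 of 64 maps pruned
ratio = after / base    # roughly 0.53: nearly half the multiplies removed
```

The norm pass (Cn x Hn x Wn) is negligible next to the convolution itself, so the saving is essentially proportional to the fraction of maps pruned.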
Fig. 6 is a kind of flow chart of the pruning method of convolutional neural networks provided by the embodiments of the present application, as shown in fig. 6, Compared with the method that Fig. 4 is provided, main distinction point is this method, determines the mode of important feature figure and insignificant characteristic pattern not Together, this method comprises the following steps:
Step S51: the feature set of graphs of the n-th convolutional layer output of the convolutional neural networks, the feature atlas are obtained Closing includes multiple characteristic patterns, and n-th convolutional layer is any of one or more of convolutional layers, and n is positive integer.
Specifically, step S51 can be identical as step S41.
Step S52: the L2 norm of each characteristic pattern of n-th convolutional layer is calculated.
Specifically, step S52 can be identical as step S42.
Step S53: determine whether the n-th convolutional layer is a prunable convolutional layer.

Specifically, this step may include the following:

First, calculate the coefficient of variation of the L2 norms of all the feature maps of the n-th convolutional layer.

The coefficient of variation is the ratio of the standard deviation of a set of data to its mean, a quantity used to analyze the degree of dispersion of the data. For the L2 norms of all the feature maps of a convolutional layer, calculating the coefficient of variation shows how dispersed those norms are. A large coefficient of variation indicates that the L2 norms of the feature maps are widely dispersed, the importance of the feature maps varies widely, and different feature maps play roles of different sizes in the subsequent task of the convolutional neural network; in this case, computation can be reduced by pruning the unimportant feature maps. A small coefficient of variation indicates that the L2 norms are concentrated, the importance of the feature maps is similar, and different feature maps play roles of comparable size in the subsequent task; in this case it is difficult to prune while guaranteeing accuracy.
The coefficient of variation of the L2 norms of all the feature maps of the n-th convolutional layer is calculated as follows:

CV_n = \frac{\sigma_n}{\mu_n}, \quad \sigma_n = \sqrt{ \frac{1}{C_n} \sum_{c_n=1}^{C_n} \left( N_{n,c_n}^{m} - \mu_n \right)^{2} }    (3)

In formula (3), CV_n denotes the coefficient of variation of the L2 norms of all the feature maps of the n-th convolutional layer, σ_n denotes the standard deviation of those L2 norms, and μ_n denotes their mean as in formula (2).
Second, if the coefficient of variation is greater than a second threshold, determine that the n-th convolutional layer is a prunable convolutional layer.

Since the coefficient of variation expresses the degree of dispersion of the data and is unrelated to the feature scale of each convolutional layer, different convolutional layers may use the same second threshold α.

The value range of the second threshold α may be [0, 2). The second threshold α here is also a threshold whose value can be set as needed; by adjusting the value of α, the number of unimportant feature maps that are finally pruned can be changed. For example, the smaller the value of the second threshold α, the more convolutional layers are determined to be prunable, and the more unimportant feature maps may finally be pruned; the larger the value of the second threshold α, the fewer convolutional layers are determined to be prunable, and the fewer unimportant feature maps may finally be pruned.

If the coefficient of variation is less than or equal to the second threshold, determine that the n-th convolutional layer is not a prunable convolutional layer.

If the n-th convolutional layer is not a prunable convolutional layer, the feature maps of this layer are not pruned; in this case step S54 is not performed, and all the feature maps of the n-th convolutional layer are directly output to the next layer of the convolutional neural network as the input of that next layer.
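Step S53 can be sketched as follows, with hypothetical norm lists; the population standard deviation is used here, which is one reasonable reading of the coefficient-of-variation definition:

```python
import math

def coefficient_of_variation(norms):
    """CV of a layer's per-map L2 norms: standard deviation over mean."""
    mu = sum(norms) / len(norms)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in norms) / len(norms))
    return sigma / mu

def is_prunable(norms, alpha):
    """The layer is pruned only if its norms are dispersed enough,
    i.e., CV greater than the second threshold alpha."""
    return coefficient_of_variation(norms) > alpha

dispersed = is_prunable([5.0, 0.2, 3.0, 0.4], alpha=0.5)     # True
concentrated = is_prunable([1.0, 1.1, 0.9, 1.0], alpha=0.5)  # False
```

A layer whose maps are all of comparable importance (small CV) is left untouched, which is how this variant protects accuracy compared with the method of Fig. 4.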
Step S54: if the L2 norm of a first feature map is less than the first threshold of the n-th convolutional layer, determine that the first feature map is an unimportant feature map; if the L2 norm of the first feature map is greater than or equal to the first threshold, determine that the first feature map is an important feature map, where the first feature map is any feature map in the feature map set.

Specifically, step S54 may be identical to step S43.

Step S55: prune the convolutional neural network based on the unimportant feature maps to obtain the input of the next layer after the n-th convolutional layer, where the input of that next layer includes the important feature maps.

Specifically, step S55 may be identical to step S44.
Fig. 7 is a kind of flow chart of convolutional neural networks training method provided by the embodiments of the present application, as shown in fig. 7, should Method can execute before the pruning method of the convolutional neural networks provided by any width of Fig. 3-Fig. 6, and this method includes as follows Step:
Step S61: training sample is obtained.
Here training sample includes multiple samples.
Step S62: using convolutional neural networks described in the sample training, in the process of the training convolutional neural networks Used in loss function include L2, the regularization term of 1 norm, the regularization term is used for the sparse features of learning sample.
The training method of convolutional neural networks provided by the present application is identical as conventional training method, and change is only to lose Function, wherein conventional training method can be back-propagation algorithm etc..
Wherein, regularization term is the penalty term of loss function, can be to the weight of neural network by the way that regularization term is added It is constrained.
Wherein, the L2, the regularization term of 1 norm include L2, the product of 1 norm and regularization coefficient, the L2,1 model Number is the L2 of each sample, the sum of 1 norm, wherein the L2 of first sample, 1 norm are to input the volume after the first sample The sum of the L2 norm of characteristic pattern of all convolutional layers of product neural network, the first sample are any one sample in all samples This.
The L1 norm of the L2 norm of all characteristic patterns of the L2,1 norm namely convolutional neural networks of first sample.L1 norm The effect of regularization L1 regularization is to can produce sparse weight matrix, that is, generates a sparse model, is used for feature selecting.? That is, by the regularization term that the L1 norm is added in the loss function of convolutional neural networks so that the convolutional neural networks at For a sparse model.It is subsequent using the product neural network when, can by execute method shown in any width of Fig. 3-Fig. 6 into Row feature selecting.
In the embodiments of the present application, the loss function can be expressed as follows:

Loss = L_{task} + \gamma R(\omega) + \frac{\lambda}{M} \sum_{m=1}^{M} N_m    (4)

In formula (4), L_task is the task loss, which takes different forms for different tasks; for image classification, for example, L_task is the cross-entropy loss. γR(ω) is the conventional regularization term, where γ is a user-defined parameter and R(ω) is usually an L2 norm, which prevents overfitting. The last term is the L2,1-norm regularization term newly added by the present application, where λ is the regularization coefficient, (1/M)ΣN_m is the L2,1 norm, N_m is the L2,1 norm of the m-th sample, and M is the total number of samples.

Here N_m = \sum_{n=1}^{N} \sum_{c_n=1}^{C_n} N_{n,c_n}^{m}, where N is the total number of convolutional layers; the calculation of the L2 norm N_{n,c_n}^{m} of a single feature map is as given in step S42.
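The newly added term of formula (4) can be sketched in plain Python (a toy computation over nested lists standing in for a batch of layer activations; in practice this would be computed on the framework's tensors so that it participates in back-propagation):

```python
import math

def l21_regularization(batch_fmaps, lam):
    """lambda * (1/M) * sum_m N_m, where N_m sums the L2 norms of all
    feature maps of all convolutional layers for sample m - an L1 norm
    over per-map L2 norms that pushes whole maps toward zero."""
    total = 0.0
    for sample in batch_fmaps:           # M samples
        for layer in sample:             # N convolutional layers
            for fmap in layer:           # Cn feature maps of that layer
                total += math.sqrt(sum(v * v for row in fmap for v in row))
    return lam * total / len(batch_fmaps)

# One hypothetical sample, one layer, two 2x2 maps with norms 5 and 0.
penalty = l21_regularization([[[[[3.0, 4.0], [0.0, 0.0]],
                                [[0.0, 0.0], [0.0, 0.0]]]]], lam=1e-3)
```

Because the penalty grows linearly in each map's L2 norm, gradient descent drives whole maps toward zero rather than individual weights, which is what makes the trained network amenable to the map-level pruning of Fig. 3 to Fig. 6.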
Fig. 8 is a schematic diagram of the training process of a convolutional neural network provided by an embodiment of the present application. Referring to Fig. 8, during training the L2 norm of each feature map of each convolutional layer is calculated; in the figure, Fn and Fn+1 respectively denote the feature maps of the n-th and (n+1)-th convolutional layers, and each convolutional layer includes multiple feature maps Fn or Fn+1. Based on the per-layer L2 norms, the L1 norm over them (the L2,1 norm) is calculated, and the L2,1-norm regularization term is added to the loss function Loss to complete the training of the convolutional neural network. In Fig. 8, Lx may be a convolutional layer, a ReLU layer, a pooling layer, or the like.

The accuracy of the method provided by the present application is illustrated below with reference to Tables 1 to 4. As shown in Tables 1, 2, 3, and 4, with the convolutional neural network pruning method provided by the present application, a relatively high proportion of feature maps can still be pruned on the CIFAR-10 data set while the accuracy remains essentially unchanged. The method is equally effective even on compressed networks and lightweight networks.
Table 1: accuracy of random pruning (rand) and pruning by minimum value (min) on the trained convolutional neural network VGG16

In Table 1, pr (10%, 20%, 30%) denotes the pruning ratio of random pruning or pruning by minimum value. λ (0, 1e-6, 1e-7, 1e-8) is the regularization coefficient of the newly added L2,1-norm regularization term in the loss function; λ = 0 corresponds to a conventionally trained convolutional neural network. The three-decimal values in the table (e.g., 0.673) denote accuracy. It can be seen that when random pruning or pruning by minimum value is performed on a convolutional neural network model trained with the method of the present application, the accuracy is better than that of a conventionally trained convolutional neural network.
Table 2: Pruning rate and accuracy when pruning VGG16 with the method of the present application
In Table 2, thresh (0.5, 0.5, 0.5, 1.0, 1.0, 1.0) denotes the two thresholds of the method provided by the present application, namely α and the threshold coefficient β. λ (0, 1e-6, 1e-7, 1e-8) is the regularization coefficient of the L2,1-norm regularization term newly added to the loss function. In the table, the three-decimal values under pr (e.g., 0.469) denote the pruning rate, and the three-decimal values under acc (e.g., 0.934) denote accuracy. For example, at a pruning rate of 46.9%, the accuracy reaches 0.934. It can be seen that the method provided by the present application maintains high accuracy even at a high pruning rate. In addition, λ = 0 corresponds to a conventionally trained convolutional neural network; it can be seen that when a conventionally trained convolutional neural network is pruned with the method provided by the present application, the accuracy is still better than that of random pruning or minimum-value pruning.
Table 3: Pruning rate and accuracy when pruning a compressed VGG16 with the method of the present application
Table 4: Pruning rate and accuracy when pruning MobileNet-V2, a lightweight convolutional neural network, with the method of the present application
Fig. 9 is a block diagram of a pruning apparatus for a convolutional neural network provided by an embodiment of the present application. The pruning apparatus for the convolutional neural network may be implemented as all or part of a processing device in software, hardware, or a combination of both. The pruning apparatus for the convolutional neural network may include a computing unit 701, a determination unit 702, and a processing unit 703.
The computing unit 701 is configured to obtain a feature map set output by the n-th convolutional layer of the convolutional neural network, where the feature map set includes multiple feature maps, the n-th convolutional layer is any one of the one or more convolutional layers, and n is a positive integer. The determination unit 702 is configured to determine important feature maps and unimportant feature maps in the feature map set, where an important feature map has a greater influence on the output result of the convolutional neural network than an unimportant feature map. The processing unit 703 is configured to prune the convolutional neural network based on the unimportant feature maps to obtain the input of the layer following the n-th convolutional layer, where this input includes the important feature maps.
Optionally, the determination unit 702 includes:
a computation subunit 721, configured to calculate the L2 norm of each feature map of the n-th convolutional layer; and
a determination subunit 722, configured to determine that a first feature map is an unimportant feature map if its L2 norm is less than the first threshold of the n-th convolutional layer, and that the first feature map is an important feature map if its L2 norm is greater than or equal to the first threshold, where the first feature map is any feature map in the feature map set.
Optionally, the first threshold of the n-th convolutional layer is the product of the norm mean of the n-th convolutional layer and a threshold coefficient, where the norm mean is the mean of the L2 norms of all feature maps of the n-th convolutional layer.
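The first-threshold rule described above can be sketched as follows (an illustrative NumPy reconstruction; the feature-map shapes and the value of the threshold coefficient β are assumptions made for the example):

```python
import numpy as np

def split_feature_maps(maps, beta=1.0):
    """Split a layer's feature maps into important / unimportant by L2 norm.
    First threshold = (mean L2 norm of all maps in the layer) * beta."""
    norms = np.array([np.linalg.norm(m) for m in maps])  # per-map L2 norms
    threshold = norms.mean() * beta                      # first threshold
    important = [i for i, v in enumerate(norms) if v >= threshold]
    unimportant = [i for i, v in enumerate(norms) if v < threshold]
    return important, unimportant

# Three 4x4 maps with L2 norms 4, 0, and 12; the mean norm is 16/3,
# so with beta = 1.0 only the last map counts as important.
maps = [np.ones((4, 4)), np.zeros((4, 4)), 3 * np.ones((4, 4))]
imp, unimp = split_feature_maps(maps, beta=1.0)
```

Raising β makes the rule stricter (more maps are pruned); lowering it keeps more maps.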
Optionally, the determination subunit 722 is further configured to determine, before determining the important feature maps and unimportant feature maps in the feature map set, whether the n-th convolutional layer is a prunable convolutional layer, and to determine the important feature maps and unimportant feature maps in the feature map set if the n-th convolutional layer is a prunable convolutional layer.
Optionally, the computation subunit 721 is further configured to calculate the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer; and
the determination subunit 722 is configured to determine that the n-th convolutional layer is a prunable convolutional layer if the coefficient of variation is greater than a second threshold.
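The prunable-layer test above might be sketched as follows (illustrative only; the choice of second threshold is an assumption, and the real embodiment may compute the statistics differently):

```python
import numpy as np

def is_prunable(maps, second_threshold=0.5):
    """A layer is treated as prunable when the coefficient of variation
    (std / mean) of its feature-map L2 norms exceeds the second threshold."""
    norms = np.array([np.linalg.norm(m) for m in maps])
    cv = norms.std() / norms.mean()   # coefficient of variation
    return cv > second_threshold

uniform = [np.ones((4, 4)) for _ in range(4)]     # identical norms, cv = 0
mixed = [np.zeros((4, 4)), 5 * np.ones((4, 4))]   # widely spread norms, cv = 1
```

Intuitively, a layer whose feature-map norms are all similar offers no clear candidates to discard, so pruning it risks accuracy; a layer with widely spread norms does.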
Optionally, the processing unit 703 is configured to set all features of the unimportant feature maps to 0, and to use the zeroed unimportant feature maps together with the important feature maps as the input of the layer following the n-th convolutional layer;
alternatively, the processing unit 703 is configured to filter the unimportant feature maps out of the feature map set and use the important feature maps as the input of the layer following the n-th convolutional layer.
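The two pruning variants handled by the processing unit — zeroing the unimportant maps while keeping the tensor shape, or removing them outright — might be sketched as (a hypothetical NumPy illustration; the tensor layout is an assumption):

```python
import numpy as np

def prune_by_zeroing(layer_output, unimportant):
    """Keep the tensor shape; set every unimportant map to all zeros."""
    pruned = layer_output.copy()
    pruned[unimportant] = 0.0
    return pruned  # zeroed maps + important maps feed the next layer

def prune_by_removal(layer_output, unimportant):
    """Drop the unimportant maps; only important maps feed the next layer."""
    keep = [i for i in range(layer_output.shape[0]) if i not in set(unimportant)]
    return layer_output[keep]

# Layer output with three 2x2 feature maps holding constants 1, 2, 3.
out = np.stack([np.full((2, 2), v, dtype=float) for v in (1.0, 2.0, 3.0)])
zeroed = prune_by_zeroing(out, [1])    # shape stays (3, 2, 2), map 1 is zero
removed = prune_by_removal(out, [1])   # shape shrinks to (2, 2, 2)
```

Zeroing keeps the next layer's expected input shape unchanged, while removal actually shrinks the computation passed downstream.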
Optionally, the apparatus further includes:
a training unit 704, configured to train the convolutional neural network, where the loss function used in training the convolutional neural network includes a regularization term based on the L2,1 norm, and the regularization term is used to learn sparse features of the samples.
Optionally, the L2,1-norm regularization term includes the product of the L2,1 norm and a regularization coefficient, where the L2,1 norm is the sum of the L2,1 norms of all samples, the L2,1 norm of a first sample is the sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, and the first sample is any one of the samples.
It should be understood that when the pruning apparatus for a convolutional neural network provided by the above embodiments performs pruning, the division into the above functional units is merely illustrative; in practical applications, the above functions may be allocated to different functional units as needed, that is, the internal structure of the device may be divided into different functional units to complete all or part of the functions described above. In addition, the pruning apparatus for a convolutional neural network provided by the above embodiments and the embodiments of the pruning method for a convolutional neural network belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented in hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely an optional embodiment of the present application and is not intended to limit the present application. Any modification or substitution that can be readily conceived by those skilled in the art within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A pruning method for a convolutional neural network, the convolutional neural network comprising multiple layers, the multiple layers comprising one or more convolutional layers, characterized in that the method comprises:
obtaining a feature map set output by an n-th convolutional layer of the convolutional neural network, the feature map set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, n being a positive integer;
determining important feature maps and unimportant feature maps in the feature map set, an important feature map having a greater influence on an output result of the convolutional neural network than an unimportant feature map; and
pruning the convolutional neural network based on the unimportant feature maps to obtain an input of a layer following the n-th convolutional layer, the input of the layer following the n-th convolutional layer comprising the important feature maps.
2. The method according to claim 1, characterized in that determining the important feature maps and unimportant feature maps in the feature map set comprises:
calculating an L2 norm of each feature map of the n-th convolutional layer; and
if the L2 norm of a first feature map is less than a first threshold of the n-th convolutional layer, determining that the first feature map is an unimportant feature map; if the L2 norm of the first feature map is greater than or equal to the first threshold, determining that the first feature map is an important feature map, the first feature map being any feature map in the feature map set.
3. The method according to claim 2, characterized in that the first threshold of the n-th convolutional layer is a product of a norm mean of the n-th convolutional layer and a threshold coefficient, the norm mean being a mean of the L2 norms of all feature maps of the n-th convolutional layer.
4. The method according to claim 2 or 3, characterized in that, before determining the important feature maps and unimportant feature maps in the feature map set, the method further comprises:
determining whether the n-th convolutional layer is a prunable convolutional layer; and
if the n-th convolutional layer is a prunable convolutional layer, determining the important feature maps and unimportant feature maps in the feature map set.
5. The method according to claim 4, characterized in that determining whether the n-th convolutional layer is a prunable convolutional layer comprises:
calculating a coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer; and
determining that the n-th convolutional layer is a prunable convolutional layer if the coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer is greater than a second threshold.
6. The method according to any one of claims 1 to 5, characterized in that pruning the convolutional neural network based on the unimportant feature maps to obtain the input of the layer following the n-th convolutional layer comprises:
setting all features of the unimportant feature maps to 0; and using the zeroed unimportant feature maps and the important feature maps as the input of the layer following the n-th convolutional layer.
7. The method according to any one of claims 1 to 5, characterized in that pruning the convolutional neural network based on the unimportant feature maps to obtain the input of the layer following the n-th convolutional layer comprises:
filtering the unimportant feature maps out of the feature map set, and using the important feature maps as the input of the layer following the n-th convolutional layer.
8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
training the convolutional neural network, a loss function used in training the convolutional neural network comprising a regularization term based on an L2,1 norm, the regularization term being used to learn sparse features of samples.
9. The method according to claim 8, characterized in that the L2,1-norm regularization term comprises a product of the L2,1 norm and a regularization coefficient, the L2,1 norm being a sum of the L2,1 norms of all samples, the L2,1 norm of a first sample being a sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, and the first sample being any one of the samples.
10. A pruning apparatus for a convolutional neural network, the convolutional neural network comprising multiple layers, the multiple layers comprising one or more convolutional layers, characterized in that the apparatus comprises:
a computing unit, configured to obtain a feature map set output by an n-th convolutional layer of the convolutional neural network, the feature map set comprising multiple feature maps, the n-th convolutional layer being any one of the one or more convolutional layers, n being a positive integer;
a determination unit, configured to determine important feature maps and unimportant feature maps in the feature map set, an important feature map having a greater influence on an output result of the convolutional neural network than an unimportant feature map; and
a processing unit, configured to prune the convolutional neural network based on the unimportant feature maps to obtain an input of a layer following the n-th convolutional layer, the input of the layer following the n-th convolutional layer comprising the important feature maps.
11. The apparatus according to claim 10, characterized in that the determination unit comprises:
a computation subunit, configured to calculate an L2 norm of each feature map of the n-th convolutional layer; and
a determination subunit, configured to determine that a first feature map is an unimportant feature map if the L2 norm of the first feature map is less than a first threshold of the n-th convolutional layer, and to determine that the first feature map is an important feature map if the L2 norm of the first feature map is greater than or equal to the first threshold, the first feature map being any feature map in the feature map set.
12. The apparatus according to claim 11, characterized in that the first threshold of the n-th convolutional layer is a product of a norm mean of the n-th convolutional layer and a threshold coefficient, the norm mean being a mean of the L2 norms of all feature maps of the n-th convolutional layer.
13. The apparatus according to claim 11 or 12, characterized in that the determination subunit is further configured to determine, before determining the important feature maps and unimportant feature maps in the feature map set, whether the n-th convolutional layer is a prunable convolutional layer, and to determine the important feature maps and unimportant feature maps in the feature map set if the n-th convolutional layer is a prunable convolutional layer.
14. The apparatus according to claim 13, characterized in that the computation subunit is further configured to calculate a coefficient of variation of the L2 norms of all feature maps of the n-th convolutional layer; and
the determination subunit is configured to determine that the n-th convolutional layer is a prunable convolutional layer if the coefficient of variation is greater than a second threshold.
15. The apparatus according to any one of claims 10 to 14, characterized in that the processing unit is configured to set all features of the unimportant feature maps to 0, and to use the zeroed unimportant feature maps and the important feature maps as the input of the layer following the n-th convolutional layer.
16. The apparatus according to any one of claims 10 to 14, characterized in that the processing unit is configured to filter the unimportant feature maps out of the feature map set and to use the important feature maps as the input of the layer following the n-th convolutional layer.
17. The apparatus according to any one of claims 10 to 16, characterized in that the apparatus further comprises:
a training unit, configured to train the convolutional neural network, a loss function used in training the convolutional neural network comprising a regularization term based on an L2,1 norm, the regularization term being used to learn sparse features of samples.
18. The apparatus according to claim 17, characterized in that the L2,1-norm regularization term comprises a product of the L2,1 norm and a regularization coefficient, the L2,1 norm being a sum of the L2,1 norms of all samples, the L2,1 norm of a first sample being a sum of the L2 norms of the feature maps of all convolutional layers of the convolutional neural network after the first sample is input, and the first sample being any one of the samples.
19. A pruning apparatus for a convolutional neural network, characterized in that the pruning apparatus comprises a processor and a memory, the memory being configured to store a software program and modules, and the processor implementing the method according to any one of claims 1 to 9 by running or executing the software program and/or modules stored in the memory.
20. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code to be executed by a processor, the program code comprising instructions for implementing the method according to any one of claims 1 to 9.
CN201910380839.5A 2019-05-08 2019-05-08 Pruning method, device and the storage medium of convolutional neural networks Pending CN110232436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910380839.5A CN110232436A (en) 2019-05-08 2019-05-08 Pruning method, device and the storage medium of convolutional neural networks

Publications (1)

Publication Number Publication Date
CN110232436A true CN110232436A (en) 2019-09-13

Family

ID=67861206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910380839.5A Pending CN110232436A (en) 2019-05-08 2019-05-08 Pruning method, device and the storage medium of convolutional neural networks

Country Status (1)

Country Link
CN (1) CN110232436A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528638A (en) * 2016-01-22 2016-04-27 沈阳工业大学 Method for grey correlation analysis method to determine number of hidden layer characteristic graphs of convolutional neural network
CN106548234A (en) * 2016-11-17 2017-03-29 北京图森互联科技有限责任公司 A kind of neural networks pruning method and device
CN107944555A (en) * 2017-12-07 2018-04-20 广州华多网络科技有限公司 Method, storage device and the terminal that neutral net is compressed and accelerated
CN109389043A (en) * 2018-09-10 2019-02-26 中国人民解放军陆军工程大学 A kind of crowd density estimation method of unmanned plane picture
CN109472352A (en) * 2018-11-29 2019-03-15 湘潭大学 A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature
CN109522949A (en) * 2018-11-07 2019-03-26 北京交通大学 Model of Target Recognition method for building up and device
CN109598340A (en) * 2018-11-15 2019-04-09 北京知道创宇信息技术有限公司 Method of cutting out, device and the storage medium of convolutional neural networks
CN109657595A (en) * 2018-12-12 2019-04-19 中山大学 Based on the key feature Region Matching face identification method for stacking hourglass network


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11003959B1 (en) * 2019-06-13 2021-05-11 Amazon Technologies, Inc. Vector norm algorithmic subsystems for improving clustering solutions
CN111222629A (en) * 2019-12-31 2020-06-02 暗物智能科技(广州)有限公司 Neural network model pruning method and system based on adaptive batch normalization
CN111930982A (en) * 2020-07-20 2020-11-13 南京南瑞信息通信科技有限公司 Intelligent labeling method for power grid images
CN112102183A (en) * 2020-09-02 2020-12-18 杭州海康威视数字技术股份有限公司 Sparse processing method, device and equipment
CN112734036A (en) * 2021-01-14 2021-04-30 西安电子科技大学 Target detection method based on pruning convolutional neural network
CN115146775A (en) * 2022-07-04 2022-10-04 同方威视技术股份有限公司 Edge device reasoning acceleration method and device and data processing system
CN116188878A (en) * 2023-04-25 2023-05-30 之江实验室 Image classification method, device and storage medium based on neural network structure fine adjustment
CN117829241A (en) * 2024-03-04 2024-04-05 西北工业大学 Pruning method of convolutional neural network
CN117829241B (en) * 2024-03-04 2024-06-07 西北工业大学 Pruning method of convolutional neural network

Similar Documents

Publication Publication Date Title
CN110232436A (en) Pruning method, device and the storage medium of convolutional neural networks
US11907760B2 (en) Systems and methods of memory allocation for neural networks
CN109344921B (en) A kind of image-recognizing method based on deep neural network model, device and equipment
CN108416440A (en) A kind of training method of neural network, object identification method and device
CN109583325A (en) Face samples pictures mask method, device, computer equipment and storage medium
US11776257B2 (en) Systems and methods for enhancing real-time image recognition
CN111950656B (en) Image recognition model generation method and device, computer equipment and storage medium
CN106485316A (en) Neural network model compression method and device
CN106709565A (en) Optimization method and device for neural network
US20220076385A1 (en) Methods and systems for denoising media using contextual information of the media
CN109657582A (en) Recognition methods, device, computer equipment and the storage medium of face mood
CN115600650A (en) Automatic convolution neural network quantitative pruning method and equipment based on reinforcement learning and storage medium
CN113657421B (en) Convolutional neural network compression method and device, and image classification method and device
CN109766800B (en) Construction method of mobile terminal flower recognition model
CN110378305A (en) Tealeaves disease recognition method, equipment, storage medium and device
US20240153271A1 (en) Method and apparatus for selecting cover of video, computer device, and storage medium
CN113469233B (en) Tobacco leaf automatic grading method and system based on deep learning
CN108734264A (en) Deep neural network model compression method and device, storage medium, terminal
CN111428854A (en) Structure searching method and structure searching device
CN110147833A (en) Facial image processing method, apparatus, system and readable storage medium storing program for executing
CN110647974A (en) Network layer operation method and device in deep neural network
CN110197107A (en) Micro- expression recognition method, device, computer equipment and storage medium
WO2019091401A1 (en) Network model compression method and apparatus for deep neural network, and computer device
CN114241234A (en) Fine-grained image classification method, device, equipment and medium
CN110874635A (en) Deep neural network model compression method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination