CN109086819A - Caffemodel model compression method, system, equipment and medium - Google Patents
- Publication number
- CN109086819A (Application CN201810836366.0A)
- Authority
- CN
- China
- Prior art keywords
- weight matrix
- caffemodel
- model
- matrix
- layers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a caffemodel model compression method, system, equipment and medium. The caffemodel model compression method includes: importing a trained caffemodel model using the caffe framework; obtaining a first weight matrix; generating a mask matrix; training the caffemodel model with a training set, the weight matrix of the fc6 layer and/or fc7 layer after an iteration being a second weight matrix; multiplying each element of the second weight matrix by the corresponding element of the mask matrix to generate a third weight matrix, and setting the weight matrix of the fc6 layer and/or fc7 layer to the third weight matrix; when iteration ends, converting the third weight matrix into the corresponding csr sparse matrix format to generate a compressed weight matrix. The method of the invention achieves the effect of reducing the storage space of the caffemodel model.
Description
Technical field
The present invention relates to the field of algorithms, and in particular to a caffemodel model compression method, system, equipment and medium.
Background art
Pvanet-faster-rcnn (an object detection model) is an algorithm model based on convolutional neural networks for detecting objects in images. The caffemodel (deep learning framework model) file obtained by training a standard Pvanet-faster-rcnn model under the caffe (deep learning framework) framework is 369MB (megabytes, a unit of computer storage) in size. The model is composed of several layers, of which the weight parameters of the fc6 layer (the sixth fully connected layer) and the fc7 layer (the seventh fully connected layer) together account for about 352MB.
When the model is computed on a GPU (graphics processing unit), the 369MB caffemodel model resides in GPU video memory. Because the caffemodel model contains a large number of parameters, it occupies a large amount of GPU video memory; it cannot run on GPU cards whose video memory is scarce, and GPU computing performance declines.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the defect in the prior art that a caffemodel model occupies GPU video memory at runtime and thereby degrades GPU computing performance, by providing a caffemodel model compression method, system, equipment and medium.
The present invention solves the above technical problem through the following technical solutions:
A caffemodel model compression method, the caffemodel model compression method comprising:
importing a trained caffemodel model using the caffe framework, the caffemodel model comprising an fc6 layer and/or an fc7 layer, the weight matrix of the fc6 layer and/or fc7 layer being a first weight matrix;
obtaining the first weight matrix;
setting the elements of the first weight matrix whose absolute value is greater than or equal to a preset threshold to 1, setting the elements of the first weight matrix whose absolute value is less than the preset threshold to 0, and thereby generating a mask matrix, the preset threshold being a positive value;
training the caffemodel model with a training set, the weight matrix of the fc6 layer and/or fc7 layer after an iteration being a second weight matrix;
multiplying each element of the second weight matrix by the corresponding element of the mask matrix to generate a third weight matrix, and setting the weight matrix of the fc6 layer and/or fc7 layer to the third weight matrix;
returning to the step of training the caffemodel model with the training set, in which the weight matrix of the fc6 layer and/or fc7 layer after an iteration is the second weight matrix;
until a preset iteration termination condition is reached, whereupon iteration ends, converting the third weight matrix into the corresponding csr (a compressed storage format for sparse matrices) sparse matrix format to generate a compressed weight matrix, and setting the weight matrix of the caffemodel model to the compressed weight matrix.
Preferably, the step of setting the weight matrix of the caffemodel model to the compressed weight matrix further comprises:
after an iteration, obtaining the training accuracy of the caffemodel model as the iterative training accuracy;
the training accuracy of the caffemodel model before iteration being the original training accuracy, calculating the drop ratio of the iterative training accuracy relative to the original training accuracy, and, if the drop ratio is higher than a preset accuracy ratio, reducing the preset threshold and returning to the step of generating the mask matrix;
the step of converting the third weight matrix into the corresponding sparse matrix format to generate the compressed weight matrix comprising:
once the drop ratio is lower than the preset accuracy ratio, converting the third weight matrix into the corresponding sparse matrix format to generate the compressed weight matrix.
Preferably, the range of the preset accuracy ratio is 0.1%-0.5%.
Preferably, after the step of setting the weight matrix of the caffemodel model to the compressed weight matrix, the method further comprises:
receiving input data with the fc6 layer and/or fc7 layer, and multiplying the input data by the compressed weight matrix to obtain output data.
A caffemodel model compression system, the caffemodel model compression system comprising an import module, a mask generation module, an iteration module, a mask module, a return module and a conversion module;
the import module is configured to import a trained caffemodel model using the caffe framework, the caffemodel model comprising an fc6 layer and/or an fc7 layer, the weight matrix of the fc6 layer and/or fc7 layer being a first weight matrix;
the mask generation module is configured to obtain the first weight matrix, set the elements of the first weight matrix whose absolute value is greater than or equal to a preset threshold to 1, set the elements of the first weight matrix whose absolute value is less than the preset threshold to 0, and thereby generate a mask matrix, the preset threshold being a positive value;
the iteration module is configured to train the caffemodel model with a training set, the weight matrix of the fc6 layer and/or fc7 layer after an iteration being a second weight matrix;
the mask module is configured to multiply each element of the second weight matrix by the corresponding element of the mask matrix to generate a third weight matrix, and to set the weight matrix of the fc6 layer and/or fc7 layer to the third weight matrix;
the return module is configured to return to the step of training the caffemodel model with the training set, in which the weight matrix of the fc6 layer and/or fc7 layer after an iteration is the second weight matrix;
the conversion module is configured, once a preset iteration termination condition is reached and iteration ends, to convert the third weight matrix into the corresponding csr sparse matrix format to generate a compressed weight matrix, and to set the weight matrix of the caffemodel model to the compressed weight matrix.
Preferably, the caffemodel model compression system further comprises an accuracy comparison module; the accuracy comparison module is configured, after an iteration, to obtain the training accuracy of the caffemodel model as the iterative training accuracy, the training accuracy of the caffemodel model before iteration being the original training accuracy;
the accuracy comparison module is further configured to calculate the drop ratio of the iterative training accuracy relative to the original training accuracy and, if the drop ratio is higher than a preset accuracy ratio, to reduce the preset threshold and call the mask generation module;
the conversion module is further configured, once the drop ratio is lower than the preset accuracy ratio, to convert the third weight matrix into the corresponding sparse matrix format to generate the compressed weight matrix.
Preferably, the range of the preset accuracy ratio is 0.1%-0.5%.
Preferably, the caffemodel model compression system further comprises a computation module; the computation module is configured to receive input data with the fc6 layer and/or fc7 layer, and to multiply the input data by the compressed weight matrix to obtain output data.
An electronic device comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the caffemodel model compression method described above when executing the computer program.
A computer-readable storage medium on which a computer program is stored, wherein the steps of the caffemodel model compression method described above are implemented when the computer program is executed by a processor.
The positive effect of the present invention is that: by comparing each element of the weight matrix of the fc6 layer and/or fc7 layer of the caffemodel model with a preset threshold, the present invention generates a mask matrix containing only the elements 0 and 1, and then continues to train the caffemodel model, multiplying each element of the second weight matrix generated after each iteration by the element at the corresponding position of the mask matrix. After the final iteration, the weight matrix of the resulting caffemodel model contains many inactive zero elements, and this weight matrix is converted into the csr sparse matrix format, thereby achieving the effect of reducing the storage space of the caffemodel model.
Brief description of the drawings
Fig. 1 is a flow chart of the caffemodel model compression method of Embodiment 1 of the present invention.
Fig. 2 is a flow chart of the caffemodel model compression method of Embodiment 2 of the present invention.
Fig. 3 is a module schematic diagram of the caffemodel model compression system of Embodiment 3 of the present invention.
Fig. 4 is a module schematic diagram of the caffemodel model compression system of Embodiment 4 of the present invention.
Fig. 5 is a structural schematic diagram of the electronic device of Embodiment 5 of the present invention.
Specific embodiment
The present invention is further illustrated below by way of embodiments, but is not thereby limited to the scope of the described embodiments.
Caffe is an open-source deep learning framework; the present embodiment is implemented on the basis of the caffe deep learning framework and can compress the caffemodel model file obtained after Pvanet-faster-rcnn training. The weights parameters of the fc6 layer and the weights parameters of the fc7 layer in the caffemodel model exist in the form of dense matrices. The files to be prepared before implementation include: the caffemodel model to be compressed, obtained after Pvanet-faster-rcnn training, and the training set used when training that caffemodel model.
Embodiment 1
This embodiment provides a caffemodel model compression method. In the compression process, the fc6 layer and the fc7 layer of the caffemodel model may be given the same compression processing, or the fc6 layer and the fc7 layer may each be compressed separately.
As shown in Fig. 1, the caffemodel model compression method includes:
Step 101: import a trained caffemodel model using the caffe framework; the caffemodel model comprises an fc6 layer and/or an fc7 layer, and the weight matrix of the fc6 layer and/or fc7 layer is a first weight matrix.
Step 102: obtain the first weight matrix.
Step 103: set the elements of the first weight matrix whose absolute value is greater than or equal to a preset threshold to 1, set the elements of the first weight matrix whose absolute value is less than the preset threshold to 0, and generate a mask matrix; the preset threshold is a positive value.
In this embodiment, different preset thresholds may be selected for the fc6 layer and the fc7 layer according to the actual situation.
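The mask generation of step 103 is, in effect, an element-wise magnitude threshold applied to the first weight matrix. A minimal sketch in Python with NumPy (the matrix `w1` and the threshold value are illustrative stand-ins; in the actual method the matrix would be the fc6/fc7 weights read from the imported caffemodel):

```python
import numpy as np

# Stand-in for the first weight matrix of the fc6 or fc7 layer.
rng = np.random.default_rng(0)
w1 = rng.normal(0.0, 0.02, size=(8, 16))

threshold = 0.02  # the preset threshold: a positive value chosen per layer

# Elements with |w| >= threshold become 1, all others become 0.
mask = (np.abs(w1) >= threshold).astype(w1.dtype)
```

The mask is computed once from the imported weights and then held fixed while training continues.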
Step 104: train the caffemodel model with the training set; after an iteration, the weight matrix of the fc6 layer and/or fc7 layer is a second weight matrix.
Step 105: multiply each element of the second weight matrix by the corresponding element of the mask matrix to generate a third weight matrix, and set the weight matrix of the fc6 layer and/or fc7 layer to the third weight matrix.
Step 106: judge whether the preset iteration termination condition is met; if so, iteration ends and step 107 is executed; if not, return to step 104.
Iteration ends once the preset iteration termination condition is reached. The preset iteration termination condition may be identical to the iteration termination condition used when the imported trained caffemodel model was originally trained under the caffe framework.
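Steps 104-106 form a prune-and-retrain loop: after every training iteration the updated (second) weight matrix is multiplied element-wise by the fixed mask, so pruned positions stay at zero while the surviving weights continue to learn. A toy sketch, with a random gradient step standing in for a real caffe solver iteration (`n_iters` and the update rule are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(0.0, 0.02, size=(8, 16))        # first weight matrix
mask = (np.abs(w) >= 0.02).astype(w.dtype)      # mask matrix from step 103

n_iters = 5  # stands in for the preset iteration termination condition
for _ in range(n_iters):
    grad = rng.normal(0.0, 0.01, size=w.shape)  # stand-in for a solver step
    w2 = w - 0.1 * grad                         # second weight matrix after training
    w = w2 * mask                               # third weight matrix: re-apply the mask
```

Because the mask is re-applied after every iteration, the zero pattern chosen in step 103 is preserved through retraining.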
Step 107: convert the third weight matrix into the corresponding csr sparse matrix format to generate a compressed weight matrix, and set the weight matrix of the caffemodel model to the compressed weight matrix.
The space occupied by the caffemodel model is thereby greatly reduced, so that the caffemodel model occupies less GPU video memory in subsequent computation, which improves GPU computing performance.
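Step 107's conversion to csr format stores only the nonzero entries plus row/column index arrays, which is where the space saving comes from once most elements have been masked to zero. A sketch using SciPy (the patent does not specify an implementation library; `scipy.sparse` and the matrix contents are used here purely for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(2)
w3 = rng.normal(0.0, 0.02, size=(64, 64))
w3[np.abs(w3) < 0.02] = 0.0        # third weight matrix: mostly zero elements

compressed = csr_matrix(w3)        # compressed weight matrix in csr format

# csr keeps three arrays: data (nonzero values), indices (their columns),
# and indptr (row offsets). Conversion back to dense is lossless.
dense_again = compressed.toarray()
```

The storage cost of the csr form grows with the number of nonzeros rather than with the full matrix size, so the more aggressively the mask zeroes the fc6/fc7 weights, the smaller the compressed model.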
Embodiment 2
This embodiment provides a caffemodel model compression method. Compared with Embodiment 1, the difference is that before step 107 the caffemodel model compression method further comprises:
Step 107-1: after an iteration, obtain the training accuracy of the caffemodel model as the iterative training accuracy; the training accuracy of the caffemodel model before iteration is the original training accuracy.
Step 107-2: calculate the drop ratio of the iterative training accuracy relative to the original training accuracy.
Step 107-3: judge whether the drop ratio is higher than a preset accuracy ratio; if so, execute step 107-4; if not, execute step 107. The range of the preset accuracy ratio may be set to 0.1%-0.5%.
Step 107-4: reduce the preset threshold and return to step 103.
Step 107 comprises:
once the drop ratio is lower than the preset accuracy ratio, converting the third weight matrix into the corresponding sparse matrix format to generate the compressed weight matrix.
The preset accuracy ratio in this embodiment is chosen as 0.5%.
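Steps 107-1 to 107-4 tune the preset threshold against the accuracy drop: if pruning at the current threshold costs more than the preset accuracy ratio (here 0.5%), the threshold is lowered and the mask regenerated. A toy sketch in which an invented accuracy function stands in for retraining and evaluating the real model (the numbers `0.90`, `0.02` and the halving schedule are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.normal(0.0, 0.02, size=(32, 32))   # stand-in fc-layer weights

original_accuracy = 0.90                    # training accuracy before iteration (invented)
max_drop_ratio = 0.005                      # preset accuracy ratio: 0.5%

def toy_accuracy(pruned_fraction):
    # Invented stand-in for retraining + evaluation: accuracy degrades
    # as a larger fraction of the weights is pruned away.
    return original_accuracy * (1.0 - 0.02 * pruned_fraction)

threshold = 0.04                            # initial preset threshold
while True:
    mask = (np.abs(w) >= threshold).astype(w.dtype)  # step 103: regenerate the mask
    pruned_fraction = 1.0 - float(mask.mean())
    iterative_accuracy = toy_accuracy(pruned_fraction)
    drop_ratio = (original_accuracy - iterative_accuracy) / original_accuracy
    if drop_ratio <= max_drop_ratio:
        break                               # acceptable drop: proceed to csr conversion
    threshold *= 0.5                        # step 107-4: reduce the preset threshold
```

The loop trades compression for accuracy: a lower threshold prunes fewer weights, so the drop ratio eventually falls below the preset accuracy ratio.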
In a practical application, this method was used to compress the caffemodel model of a Pvanet-faster-rcnn that can detect 20 kinds of objects. The initial size of the caffemodel model was 369MB; when the caffemodel model was compressed to 37MB with this method, the drop ratio of the model accuracy was only 0.36%. A new model is thereby obtained whose accuracy is comparable to the original training accuracy of the initial caffemodel model but which occupies far less space.
Preferably, after the step of setting the weight matrix of the caffemodel model to the compressed weight matrix, the method further comprises:
receiving input data with the fc6 layer and/or fc7 layer, and multiplying the input data by the compressed weight matrix to obtain output data.
When the model performs forward computation, the input data of the fc6 layer is multiplied by the weight matrix of the fc6 layer to generate that layer's output data, and the input data of the fc7 layer is multiplied by the weight matrix of the fc7 layer to generate that layer's output data. Compared with before compression, the space occupied by the compressed caffemodel model is smaller, so that in the forward computation of the model the GPU video memory occupied by the model is reduced and the running performance of the GPU is greatly improved.
Embodiment 3
This embodiment provides a caffemodel model compression system. As shown in Fig. 3, the caffemodel model compression system comprises an import module 201, a mask generation module 202, an iteration module 203, a mask module 204, a return module 205 and a conversion module 206.
The import module 201 is configured to import a trained caffemodel model using the caffe framework; the caffemodel model comprises an fc6 layer and/or an fc7 layer, and the weight matrix of the fc6 layer and/or fc7 layer is a first weight matrix.
The mask generation module 202 is configured to obtain the first weight matrix, set the elements of the first weight matrix whose absolute value is greater than or equal to a preset threshold to 1, set the elements whose absolute value is less than the preset threshold to 0, and generate a mask matrix; the preset threshold is a positive value.
In the present embodiment, fc6 layers can select different values from the preset threshold of fc7 layer choosing respectively according to the actual situation.
The iteration module 203 is configured to train the caffemodel model with the training set; after an iteration, the weight matrix of the fc6 layer and/or fc7 layer is a second weight matrix.
The mask module 204 is configured to multiply each element of the second weight matrix by the corresponding element of the mask matrix to generate a third weight matrix, and to set the weight matrix of the fc6 layer and/or fc7 layer to the third weight matrix.
The return module 205 is configured to call the iteration module 203.
The conversion module 206 is configured, once the preset iteration termination condition is reached and iteration ends, to convert the third weight matrix into the corresponding csr sparse matrix format to generate a compressed weight matrix, and to set the weight matrix of the caffemodel model to the compressed weight matrix. The preset iteration termination condition may be identical to the iteration termination condition used when the imported trained caffemodel model was originally trained under the caffe framework.
The space occupied by the caffemodel model is thereby greatly reduced, so that the caffemodel model occupies less GPU video memory in subsequent computation, which improves GPU computing performance.
Embodiment 4
This embodiment provides a caffemodel model compression system. Compared with Embodiment 3, the difference is that, as shown in Fig. 4, the caffemodel model compression system further comprises an accuracy comparison module 207. The accuracy comparison module 207 is configured, after an iteration, to obtain the training accuracy of the caffemodel model as the iterative training accuracy; the training accuracy of the caffemodel model before iteration is the original training accuracy.
The accuracy comparison module 207 is further configured to calculate the drop ratio of the iterative training accuracy relative to the original training accuracy and, if the drop ratio is higher than a preset accuracy ratio, to reduce the preset threshold and call the mask generation module. The range of the preset accuracy ratio may be set to 0.1%-0.5%.
The conversion module 206 is further configured, once the drop ratio is lower than the preset accuracy ratio, to convert the third weight matrix into the corresponding sparse matrix format to generate the compressed weight matrix.
The preset accuracy ratio in this embodiment is chosen as 0.5%.
In a practical application, this method was used to compress the caffemodel model of a Pvanet-faster-rcnn that can detect 20 kinds of objects. The initial size of the caffemodel model was 369MB; when the caffemodel model was compressed to 37MB with this method, the drop ratio of the model accuracy was only 0.36%. A new model is thereby obtained whose accuracy is comparable to the original training accuracy of the initial caffemodel model but which occupies far less space.
Preferably, the caffemodel model compression system further comprises a computation module; the computation module is configured to receive input data with the fc6 layer and/or fc7 layer and to multiply the input data by the compressed weight matrix to obtain output data.
When the model performs forward computation, the input data of the fc6 layer is multiplied by the weight matrix of the fc6 layer to generate that layer's output data, and the input data of the fc7 layer is multiplied by the weight matrix of the fc7 layer to generate that layer's output data. Compared with before compression, the space occupied by the compressed caffemodel model is smaller, so that in the forward computation of the model the GPU video memory occupied by the model is reduced and the running performance of the GPU is greatly improved.
Embodiment 5
Fig. 5 is a structural schematic diagram of the electronic device provided in this embodiment. The electronic device comprises a memory, a processor and a computer program stored on the memory and runnable on the processor; when executing the program, the processor implements the caffemodel model compression method of Embodiment 1. The electronic device 30 shown in Fig. 5 is only an example and should not impose any restriction on the functions and scope of use of the embodiments of the invention.
As shown in Fig. 5, the electronic device 30 may take the form of a general-purpose computing device; for example, it may be a server device. The components of the electronic device 30 may include, but are not limited to: the at least one processor 31 mentioned above, the at least one memory 32 mentioned above, and a bus 33 connecting different system components (including the memory 32 and the processor 31).
The bus 33 includes a data bus, an address bus and a control bus.
The memory 32 may include volatile memory, such as a random access memory (RAM) 321 and/or a cache memory 322, and may further include a read-only memory (ROM) 323.
The memory 32 may also include a program/utility 325 having a set (at least one) of program modules 324; such program modules 324 include, but are not limited to: an operating system, one or more application programs, other program modules and program data, each of which, or some combination of which, may include an implementation of a network environment.
The processor 31 executes various functional applications and data processing, for example the caffemodel model compression method provided by Embodiment 1 of the present invention, by running the computer program stored in the memory 32.
The electronic device 30 may also communicate with one or more external devices 34 (such as a keyboard, a pointing device, etc.). Such communication may be carried out through an input/output (I/O) interface 35. Moreover, the electronic device 30 may also communicate through a network adapter 36 with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network, for example the internet). As shown in the figure, the network adapter 36 communicates with the other modules of the electronic device 30 through the bus 33. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the electronic device 30, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives and data backup storage systems, etc.
It should be noted that, although several units/modules or sub-units/modules of the electronic device are mentioned in the detailed description above, such a division is merely exemplary and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more units/modules described above may be embodied in one unit/module; conversely, the features and functions of one unit/module described above may be further divided and embodied by multiple units/modules.
Embodiment 6
This embodiment provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the caffemodel model compression method provided by Embodiment 1 is implemented.
More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the above.
In a possible embodiment, the present invention may also be implemented in the form of a program product comprising program code; when the program product is run on a terminal device, the program code causes the terminal device to execute the steps of the caffemodel model compression method described in Embodiment 1.
The program code for executing the present invention may be written in any combination of one or more programming languages; the program code may be executed entirely on a user device, partly on a user device, as a standalone software package, partly on a user device and partly on a remote device, or entirely on a remote device.
Although specific embodiments of the present invention have been described above, those skilled in the art will understand that these are only examples and that the protection scope of the present invention is defined by the appended claims. Those skilled in the art may make various changes and modifications to these embodiments without departing from the principle and substance of the present invention, but all such changes and modifications fall within the protection scope of the present invention.
Claims (10)
1. A caffemodel model compression method, characterized in that the caffemodel model compression method comprises:
importing a trained caffemodel model using the caffe framework, the caffemodel model comprising an fc6 layer and/or an fc7 layer, the weight matrix of the fc6 layer and/or fc7 layer being a first weight matrix;
obtaining the first weight matrix;
setting the elements of the first weight matrix whose absolute value is greater than or equal to a preset threshold to 1, setting the elements of the first weight matrix whose absolute value is less than the preset threshold to 0, and generating a mask matrix, the preset threshold being a positive value;
training the caffemodel model with a training set, the weight matrix of the fc6 layer and/or fc7 layer after an iteration being a second weight matrix;
multiplying each element of the second weight matrix by the corresponding element of the mask matrix to generate a third weight matrix, and setting the weight matrix of the fc6 layer and/or fc7 layer to the third weight matrix;
returning to the step of training the caffemodel model with the training set, in which the weight matrix of the fc6 layer and/or fc7 layer after an iteration is the second weight matrix;
until a preset iteration termination condition is reached, whereupon iteration ends, converting the third weight matrix into the corresponding csr sparse matrix format to generate a compressed weight matrix, and setting the weight matrix of the caffemodel model to the compressed weight matrix.
2. The caffemodel model compression method as claimed in claim 1, characterized in that the step of setting the weight matrix of the caffemodel model to the compressed weight matrix further comprises:
after an iteration, obtaining the training accuracy of the caffemodel model as the iterative training accuracy;
the training accuracy of the caffemodel model before iteration being the original training accuracy, calculating the drop ratio of the iterative training accuracy relative to the original training accuracy, and, if the drop ratio is higher than a preset accuracy ratio, reducing the preset threshold and returning to the step of generating the mask matrix;
the step of converting the third weight matrix into the corresponding sparse matrix format to generate the compressed weight matrix comprising:
once the drop ratio is lower than the preset accuracy ratio, converting the third weight matrix into the corresponding sparse matrix format to generate the compressed weight matrix.
3. The caffemodel model compression method as claimed in claim 2, characterized in that the range of the preset accuracy ratio is 0.1%-0.5%.
4. The caffemodel model compression method of claim 1, further comprising, after the step of setting the weight matrix of the caffemodel model to the compressed weight matrix:
Receiving input data with the fc6 layer and/or fc7 layer, and performing a multiplication operation on the input data and the compressed weight matrix to obtain output data.
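For the forward pass in claim 4, keeping the pruned fc6/fc7 weights in CSR format lets the multiplication skip the zero entries entirely. A small scipy illustration; the 2x2 matrix and input values are made up for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Compressed weight matrix in CSR format (zeros are not stored).
compressed_weight = csr_matrix(np.array([[0.5, 0.0], [0.0, -0.8]]))
input_data = np.array([2.0, 3.0])

# Sparse matrix-vector product: only the stored entries contribute.
output_data = compressed_weight.dot(input_data)
print(output_data)  # ~[1.0, -2.4]
```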
5. A caffemodel model compression system, wherein the caffemodel model compression system comprises an import module, a mask generation module, an iteration module, a mask module, a return module and a conversion module;
The import module is configured to import a caffemodel model trained with the caffe framework, the caffemodel model comprising an fc6 layer and/or fc7 layer, the weight matrix of the fc6 layer and/or fc7 layer being a first weight matrix;
The mask generation module is configured to obtain the first weight matrix, set the elements of the first weight matrix whose absolute value is greater than or equal to a preset threshold to 1, set the elements of the first weight matrix whose absolute value is less than the preset threshold to 0, and then generate a mask matrix, the preset threshold being a positive value;
The iteration module is configured to train the caffemodel model using a training set, the weight matrix of the fc6 layer and/or fc7 layer after the iteration being a second weight matrix;
The mask module is configured to multiply each element of the second weight matrix by the corresponding element of the mask matrix to generate a third weight matrix, and to set the weight matrix of the fc6 layer and/or fc7 layer to the third weight matrix;
The return module is configured to return to the step of training the caffemodel model using the training set, wherein the weight matrix of the fc6 layer and/or fc7 layer after the iteration is the second weight matrix;
The conversion module is configured to, once a preset iteration termination condition is reached and the iteration ends, convert the third weight matrix into the corresponding CSR sparse matrix format to generate a compressed weight matrix, and to set the weight matrix of the caffemodel model to the compressed weight matrix.
6. The caffemodel model compression system of claim 5, wherein the caffemodel model compression system further comprises a precision comparison module, configured to, after the iteration, obtain the training accuracy of the caffemodel model as the iterative training accuracy, and to take the training accuracy of the caffemodel model before the iteration as the original training accuracy;
The precision comparison module is further configured to calculate the decrease ratio of the iterative training accuracy relative to the original training accuracy, and, if the decrease ratio is higher than a preset accuracy ratio, to reduce the preset threshold and invoke the mask generation module;
The conversion module is further configured to, once the decrease ratio is lower than the preset accuracy ratio, convert the third weight matrix into the corresponding sparse matrix format to generate the compressed weight matrix.
7. The caffemodel model compression system of claim 6, wherein the range of the preset accuracy ratio is 0.1%-0.5%.
8. The caffemodel model compression system of claim 5, wherein the caffemodel model compression system further comprises a computing module, configured to receive input data with the fc6 layer and/or fc7 layer, and to perform a multiplication operation on the input data and the compressed weight matrix to obtain output data.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the caffemodel model compression method of any one of claims 1-4.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the caffemodel model compression method of any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810836366.0A CN109086819B (en) | 2018-07-26 | 2018-07-26 | Method, system, equipment and medium for compressing caffemodel model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810836366.0A CN109086819B (en) | 2018-07-26 | 2018-07-26 | Method, system, equipment and medium for compressing caffemodel model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109086819A true CN109086819A (en) | 2018-12-25 |
CN109086819B CN109086819B (en) | 2023-12-05 |
Family
ID=64830821
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810836366.0A Active CN109086819B (en) | 2018-07-26 | 2018-07-26 | Method, system, equipment and medium for compressing caffemodel model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109086819B (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6898359B2 (en) * | 2016-06-14 | 2021-07-07 | Tartan AI Ltd. | Accelerator for deep neural networks |
CN106503729A (en) * | 2016-09-29 | 2017-03-15 | Tianjin University | A kind of generation method of the image convolution feature based on top layer weights |
CN108229681A (en) * | 2017-12-28 | 2018-06-29 | Zhengzhou Yunhai Information Technology Co., Ltd. | A kind of neural network model compression method, system, device and readable storage medium storing program for executing |
2018-07-26 | CN | CN201810836366.0A patent CN109086819B (en) | Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184369A (en) * | 2015-09-08 | 2015-12-23 | Hangzhou Langhe Technology Co., Ltd. | Deep learning model matrix compression method and device |
US20180046914A1 (en) * | 2016-08-12 | 2018-02-15 | Beijing Deephi Intelligence Technology Co., Ltd. | Compression method for deep neural networks with load balance |
CN107610192A (en) * | 2017-09-30 | 2018-01-19 | 西安电子科技大学 | Adaptive observation compressed sensing image reconstructing method based on deep learning |
CN108090498A (en) * | 2017-12-28 | 2018-05-29 | 广东工业大学 | A kind of fiber recognition method and device based on deep learning |
Non-Patent Citations (2)
Title |
---|
JULIEN NYAMBAL et al.: "Automated Parking Space Detection Using Convolutional Neural Networks", 2017 Pattern Recognition Association of South Africa and Robotics and Mechatronics * |
SONG HAN et al.: "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding", arXiv * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109711367A (en) * | 2018-12-29 | 2019-05-03 | Beijing Zhongke Cambricon Technology Co., Ltd. | Operation method, device and related product |
CN109740746A (en) * | 2018-12-29 | 2019-05-10 | Beijing Zhongke Cambricon Technology Co., Ltd. | Operation method, device and related product |
CN109740746B (en) * | 2018-12-29 | 2020-01-31 | Cambricon Technologies Corporation Limited | Operation method, device and related product |
CN113360188A (en) * | 2021-05-18 | 2021-09-07 | China University of Petroleum (Beijing) | Parallel processing method and device for optimizing sparse matrix-vector multiplication |
CN113360188B (en) * | 2021-05-18 | 2023-10-31 | China University of Petroleum (Beijing) | Parallel processing method and device for optimizing sparse matrix-vector multiplication |
Also Published As
Publication number | Publication date |
---|---|
CN109086819B (en) | 2023-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111445418B (en) | Image defogging processing method and device and computer equipment | |
CN109086819A (en) | Caffemodel model compression method, system, equipment and medium | |
CN110969198A (en) | Distributed training method, device, equipment and storage medium for deep learning model | |
CN107256424A (en) | Three value weight convolutional network processing systems and method | |
CN109558310A (en) | Method for generating test case and device | |
CN114138231B (en) | Method, circuit and SOC for executing matrix multiplication operation | |
CN107506284A (en) | Log processing method and device | |
CN116778148A (en) | Target detection method, target detection device, electronic equipment and storage medium | |
CN110389840A (en) | Load consumption method for early warning, device, computer equipment and storage medium | |
CN111694692B (en) | Data storage erasure method, device and equipment and readable storage medium | |
CN108961268A (en) | A kind of notable figure calculation method and relevant apparatus | |
CN108629410A (en) | Based on principal component analysis dimensionality reduction and/or rise the Processing with Neural Network method tieed up | |
CN110245706B (en) | Lightweight target detection method for embedded application | |
CN110276413B (en) | Model compression method and device | |
CN115617636A (en) | Distributed performance test system | |
CN109117945B (en) | Processor and processing method thereof, chip packaging structure and electronic device | |
CN113139490B (en) | Image feature matching method and device, computer equipment and storage medium | |
CN112580772B (en) | Compression method and device for convolutional neural network | |
CN104320659A (en) | Background modeling method, device and apparatus | |
CN114297022A (en) | Cloud environment anomaly detection method and device, electronic equipment and storage medium | |
CN114819096A (en) | Model training method and device, electronic equipment and storage medium | |
CN106851283B (en) | A kind of method and device of the image adaptive compressed sensing sampling based on standard deviation | |
JPWO2021038840A5 (en) | ||
CN116798052B (en) | Training method and device of text recognition model, storage medium and electronic equipment | |
CN108629409A (en) | A kind of Processing with Neural Network system reducing IO expenses based on principal component analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||