CN105184369A - Deep learning model matrix compression method and device - Google Patents

Deep learning model matrix compression method and device

Info

Publication number: CN105184369A
Application number: CN201510566639.0A (filed by Hangzhou Langhe Technology Co Ltd)
Authority: CN (China)
Prior art keywords: matrix, linear layer, weight matrix, deep learning model
Priority/filing date: 2015-09-08
Publication date: 2015-12-23
Legal status: Pending
Other languages: Chinese (zh)
Inventors: Chen Haibo (陈海波), Li Xiaoyan (李晓燕)
Original and current assignee: Hangzhou Langhe Technology Co Ltd

Abstract

The invention provides a deep learning model matrix compression method and device. The last linear layer of a deep learning model connects M hidden nodes and N classification nodes and corresponds to a weight matrix W. The method comprises: step S101, calculating a value K according to the absolute values of the elements of the weight matrix W; and step S102, decomposing the last linear layer into a first linear layer and a second linear layer, wherein the weight matrix of the first linear layer is an M×K matrix P, the weight matrix of the second linear layer is a K×N matrix Q, the output of the first linear layer is the input of the second linear layer, and M×N > K×(M+N), so that the weight matrix W is compressed.

Description

Matrix compression method and apparatus for a deep learning model
Technical field
Embodiments of the present invention relate to the field of computers, and more specifically, embodiments of the present invention relate to a matrix compression method and apparatus for a deep learning model.
Background art
This section is intended to provide background or context for the embodiments of the present invention stated in the claims. Nothing described herein is admitted to be prior art merely by its inclusion in this section.
Deep learning models are applied more and more widely in data analysis. Through the linear layers and non-linear layers between the nodes of different layers, a deep learning model maps and computes on data, and during this processing the model is trained, revised, and updated, ultimately improving the accuracy of classification or prediction. In a deep learning model, the last linear layer is usually the processing layer connecting the hidden nodes and the output classification nodes. Suppose the number of hidden nodes is M and the number of classification nodes is N; then the weight matrix of the last linear layer is M×N. In practice N is a large number, so the parameter count of the deep learning model grows with N and the amount of computation is very large.
To reduce the amount of computation, model compression methods based on the SVD matrix decomposition technique have been adopted. In such a method, a singular value decomposition (SVD) is usually applied to the M×N weight matrix of the last linear layer, approximating it as the product of an M×K matrix and a K×N matrix, i.e., splitting it into a two-linear-layer structure. In a deep learning model, the weight matrix represents the degree of feedback influence along the edges between nodes, and many elements of a typical weight matrix are close to 0, reflecting weak feedback relationships between many nodes. However, a model compression method based on SVD matrix decomposition treats these near-zero elements in the same way as elements with strong feedback relationships, so the amount of computation remains very large. Moreover, such a method does not consider how to select K so as to optimize the result while reducing computation.
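For concreteness, the prior-art baseline can be sketched in a few lines of Python (an illustrative sketch using NumPy, not code from the patent; the sizes M, N, and K below are made up). Note that K must be chosen by hand here, which is precisely the limitation discussed above:

```python
import numpy as np

def svd_compress(W: np.ndarray, K: int):
    """Prior-art SVD compression: approximate the M x N weight matrix W
    by the product of an M x K factor P and a K x N factor Q."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    P = U[:, :K] * s[:K]   # M x K: left singular vectors scaled by singular values
    Q = Vt[:K, :]          # K x N: truncated right singular vectors
    return P, Q

# Example: M = 2048 hidden nodes, N = 10000 classification nodes.
W = np.random.randn(2048, 10000).astype(np.float32)
P, Q = svd_compress(W, K=256)
# The split replaces M*N = 20.48M parameters with K*(M+N) ~= 3.08M.
print(P.shape, Q.shape, np.linalg.norm(W - P @ Q))
```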
Summary of the invention
In prior-art deep learning models, model compression methods treat matrix elements close to 0 in the same way as matrix elements with strong feedback relationships, so the amount of computation remains very large. Moreover, these model compression methods do not consider how to select K so as to optimize the result while reducing computation.
In the prior art, therefore, matrix compression for deep learning models is computationally expensive and inefficient, and the resulting model accuracy is low, which makes it a very troublesome process.
For this reason, an improved matrix compression method and apparatus for a deep learning model is greatly needed, in order to compress matrices effectively, reduce computation, improve the efficiency of data processing, and optimize the deep learning model, thereby improving its accuracy and reliability.
In this context, embodiments of the present invention are expected to provide a matrix compression method and apparatus for a deep learning model.
In a first aspect of embodiments of the present invention, a matrix compression method for a deep learning model is provided, wherein the last linear layer of the deep learning model connects M hidden nodes and N classification nodes, and the weight matrix of the last linear layer is

$$W = \begin{pmatrix} w_{11} & \cdots & w_{1N} \\ \vdots & \ddots & \vdots \\ w_{M1} & \cdots & w_{MN} \end{pmatrix}.$$

The method comprises: step S101: calculating a value K according to the absolute values of the elements of the weight matrix W; and step S102: decomposing the last linear layer into a first linear layer and a second linear layer, wherein the weight matrix of the first linear layer is the M×K matrix

$$P = \begin{pmatrix} p_{11} & \cdots & p_{1K} \\ \vdots & \ddots & \vdots \\ p_{M1} & \cdots & p_{MK} \end{pmatrix},$$

the weight matrix of the second linear layer is the K×N matrix

$$Q = \begin{pmatrix} q_{11} & \cdots & q_{1N} \\ \vdots & \ddots & \vdots \\ q_{K1} & \cdots & q_{KN} \end{pmatrix},$$

the output of the first linear layer is the input of the second linear layer, and M×N > K×(M+N), so that the weight matrix W is compressed.
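The condition M×N > K×(M+N) is what guarantees that the split actually saves parameters, since the two factored layers hold K×(M+N) weights in place of the original M×N. A one-line check makes this concrete (an illustrative sketch; the example sizes are made up):

```python
def compresses(M: int, N: int, K: int) -> bool:
    # The factored layers hold K*(M+N) weights in place of the original M*N.
    return M * N > K * (M + N)

# With M = 2048 hidden nodes and N = 10000 classification nodes,
# any K below M*N/(M+N) (about 1700) shrinks the layer:
assert compresses(2048, 10000, 256)
assert not compresses(2048, 10000, 1700)
```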
According to a second aspect of the present invention, a matrix compression device for a deep learning model is provided, wherein the last linear layer of the deep learning model connects M hidden nodes and N classification nodes and has the M×N weight matrix W defined above. The device comprises: a K-value computing module adapted to calculate a value K according to the absolute values of the elements of the weight matrix W; and a compression module adapted to decompose the last linear layer into a first linear layer and a second linear layer, wherein the weight matrix of the first linear layer is the M×K matrix P, the weight matrix of the second linear layer is the K×N matrix Q, the output of the first linear layer is the input of the second linear layer, and M×N > K×(M+N), so that the weight matrix W is compressed.
According to the matrix compression device for a deep learning model in the above embodiment of the present invention, the K-value computing module comprises: an element statistics module adapted to count the number $N_{sparse}$ of elements $w_{ij}$ of the weight matrix W whose absolute values are less than a preset sparse truncation factor $w_{block}$; and a K-value computing submodule adapted to calculate the value K according to $N_{sparse}$.
According to the matrix compression device for a deep learning model in any of the above embodiments of the present invention, the device further comprises a sparse-truncation-factor computing module adapted to calculate the sparse truncation factor by the equation $w_{block} = k_1 \cdot \sigma^2 + \mu$, where $k_1$ is a positive constant, $\sigma^2$ is the variance of the elements of the weight matrix, and $\mu$ is the mean of the elements of the weight matrix.
According to the matrix compression device for a deep learning model in any of the above embodiments of the present invention, the K-value computing submodule is adapted to calculate the value K by the equation $K = k_2 \cdot (M \cdot N - N_{sparse}) / (M + N)$, where $k_2$ is a positive constant.
According to the matrix compression device for a deep learning model in any of the above embodiments of the present invention, the device further comprises a first optimization module adapted to optimize the matrix P and the matrix Q.
According to the matrix compression device for a deep learning model in any of the above embodiments of the present invention, the first optimization module is adapted to optimize the matrices P and Q with a stochastic gradient descent algorithm, and comprises: an assignment module adapted to assign the elements of the matrices P and Q random starting values obeying a Gaussian distribution with mean 0 and variance 1; a $W_{pred}$ computing module adapted to calculate $W_{pred} = P \cdot Q$; a loss-function-error computing module adapted to calculate the loss function error Loss according to $W_{pred}$ and $w_{ij}$; a gradient-direction computing module adapted to calculate the gradient directions of the matrices P and Q according to the loss function error Loss; and an update module adapted to update the matrices P and Q according to their gradient directions.
According to the matrix compression device for a deep learning model in any of the above embodiments of the present invention, the loss-function-error computing module is adapted to calculate the loss function error as Loss = Loss1 + λ·Loss2, where λ is a positive constant.
According to the matrix compression device for a deep learning model in any of the above embodiments of the present invention, the gradient-direction computing module is adapted to calculate the gradient direction $\nabla_{p_{ik}} \mathrm{Loss}$ of the matrix P piecewise over three cases of the predicted entry $w_{ij}^{pred}$: $|w_{ij}^{pred}| < w_{block}$; $w_{ij}^{pred} \ge w_{block}$; and $w_{ij}^{pred} \le -w_{block}$; and to calculate the gradient direction $\nabla_{q_{kj}} \mathrm{Loss}$ of the matrix Q piecewise over the same three cases.
According to the matrix compression device for a deep learning model in any of the above embodiments of the present invention, the update module is adapted to update the matrices P and Q respectively as $p_{ik} \leftarrow p_{ik} + lr \cdot \nabla_{p_{ik}} \mathrm{Loss}$ and $q_{kj} \leftarrow q_{kj} + lr \cdot \nabla_{q_{kj}} \mathrm{Loss}$, where lr is a constant.
According to the matrix compression device for a deep learning model in any of the above embodiments of the present invention, the device further comprises: an iteration module adapted to trigger the $W_{pred}$ computing module to calculate $W_{pred} = P \cdot Q$, trigger the loss-function-error computing module to calculate the loss function error Loss according to $W_{pred}$ and $w_{ij}$, trigger the gradient-direction computing module to calculate the gradient directions of the matrices P and Q according to the loss function error Loss, and trigger the update module to update the matrices P and Q according to those gradient directions; when the loss function error Loss is less than a preset error convergence value, the iteration module stops iterating, yielding the optimized weight matrix P′ of the first linear layer and the optimized weight matrix Q′ of the second linear layer; and a decomposing module adapted to decompose the weight matrix W into the first linear layer and the second linear layer.
According to the matrix compression device for a deep learning model in any of the above embodiments of the present invention, the device further comprises a second optimization module adapted to iteratively optimize the weight matrices P′ and Q′ through the reverse (back-propagation) process of deep neural network model training, to obtain a second-optimized weight matrix P″ and a second-optimized weight matrix Q″ as the weight matrices of the first linear layer and the second linear layer, respectively.
According to the matrix compression method and apparatus for a deep learning model of embodiments of the present invention, calculating the value K from the absolute values of the elements of the weight matrix W allows the number of expanded intermediate nodes, i.e. the value K, to be selected adaptively according to the intrinsic characteristics of W rather than chosen manually, thereby significantly optimizing the parameters of the deep learning model and improving its reliability and accuracy; and by calculating the sparse truncation factor $w_{block}$ of W and counting the number $N_{sparse}$ of elements $w_{ij}$ whose absolute values are less than $w_{block}$, the calculation of K takes the sparsity of W into account, effectively compressing the deep learning model, reducing computation, and improving operational efficiency while ensuring the accuracy and reliability of the deep learning model.
Brief description of the drawings
The above and other objects, features, and advantages of exemplary embodiments of the present invention will become easy to understand by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present invention are shown by way of example and not by way of limitation, in which:
Fig. 1 schematically shows a flow chart of a matrix compression method for a deep learning model according to an embodiment of the present invention;
Fig. 2 schematically shows a schematic diagram of a matrix compression device for a deep learning model according to an embodiment of the present invention;
Fig. 3 schematically shows a schematic diagram of a matrix compression device for a deep learning model according to another embodiment of the present invention; and
Fig. 4 schematically shows a program product for matrix compression for a deep learning model according to another embodiment of the present invention.
In the drawings, identical or corresponding reference numerals denote identical or corresponding parts.
Embodiments
The principles and spirit of the present invention are described below with reference to several illustrative embodiments. It should be understood that these embodiments are provided only to enable those skilled in the art to better understand and implement the present invention, and not to limit the scope of the invention in any way. Rather, they are provided so that this disclosure will be thorough and complete and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will know that embodiments of the present invention can be implemented as a system, device, apparatus, method, or computer program product. Accordingly, the present disclosure can be embodied in the form of entirely hardware, entirely software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.
According to embodiments of the present invention, a matrix compression method and apparatus for a deep learning model are proposed.
Herein, it is to be understood that any number of elements in the drawings is illustrative rather than limiting, and any naming is used only for distinction and carries no limitation whatsoever.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the present invention.
summary of the invention
The inventors have found that, in a deep learning model, the last linear layer connects M hidden nodes and N classification nodes and has weight matrix W. Calculating a value K from the absolute values of the elements of the weight matrix W allows the number of expanded intermediate nodes, i.e. the value K, to be selected adaptively according to the intrinsic characteristics of W rather than chosen manually, thereby significantly optimizing the parameters of the deep learning model and improving its reliability and accuracy. Further, by calculating the sparse truncation factor $w_{block}$ of W and counting the number $N_{sparse}$ of elements $w_{ij}$ of W whose absolute values are less than $w_{block}$, the calculation of K takes the sparsity of W into account, effectively compressing the deep learning model, reducing computation, and improving operational efficiency while ensuring the accuracy and reliability of the deep learning model.
Having described the basic principles of the present invention, various non-limiting embodiments of the present invention are introduced in detail below.
illustrative methods
The matrix compression method for a deep learning model according to an exemplary embodiment of the present invention is described below with reference to Fig. 1.
Fig. 1 schematically shows a flow chart of a matrix compression method 100 for a deep learning model according to an embodiment of the present invention, wherein the last linear layer of the deep learning model connects M hidden nodes and N classification nodes and has the M×N weight matrix W defined above.
As shown in Fig. 1, the method 100 may comprise step S101: calculating a value K according to the absolute values of the elements of the weight matrix W.
By calculating the value K from the absolute values of the elements of the weight matrix W, the number of expanded intermediate nodes, i.e. the value K, can be selected adaptively according to the intrinsic characteristics of W rather than chosen manually, thereby significantly optimizing the parameters of the deep learning model and improving its reliability and accuracy.
In some possible embodiments, step S101 comprises: step S101a: counting the number $N_{sparse}$ of elements $w_{ij}$ of the weight matrix W whose absolute values are less than a preset sparse truncation factor $w_{block}$; and step S101b: calculating the value K according to $N_{sparse}$.
As mentioned above, many elements of the weight matrix of a deep learning model are small, even close to 0, reflecting weak feedback relationships between many nodes. According to an exemplary embodiment of the present invention, the method 100 calculates K by counting the number $N_{sparse}$ of elements $w_{ij}$ of W whose absolute values are less than the preset sparse truncation factor $w_{block}$. In general, the more elements of W have absolute values close to 0, the sparser W is, so this processing takes the sparsity of W into account, effectively compressing the deep learning model, reducing computation, and improving operational efficiency while ensuring the accuracy and reliability of the deep learning model.
In some possible embodiments, step S101a comprises calculating the sparse truncation factor by the equation $w_{block} = k_1 \cdot \sigma^2 + \mu$, where $k_1$ is a positive constant, $\sigma^2$ is the variance of the elements of the weight matrix, and $\mu$ is the mean of the elements of the weight matrix. The variance $\sigma^2$ of W measures the dispersion of its elements, and the mean $\mu$ reflects their central tendency; calculating K from the variance and mean of W therefore selects the number of expanded intermediate nodes adaptively and more accurately according to the intrinsic characteristics of W, further optimizing the parameters of the deep learning model and improving its reliability and accuracy. Alternatively, the sparse truncation factor $w_{block}$ may be preset from other parameters of the weight matrix W, or a preset constant chosen from experimental results may be used as $w_{block}$.
In some possible embodiments, step S101b comprises calculating the value K by the equation $K = k_2 \cdot (M \cdot N - N_{sparse}) / (M + N)$, where $k_2$ is a positive constant. Optionally, K may also be calculated from $N_{sparse}$ by other formulas; the technical principle of the present invention is satisfied to some extent as long as $N_{sparse}$ and K are inversely related: the larger $N_{sparse}$ is, the more elements of the weight matrix have absolute values close to 0 and the comparatively sparser the matrix is, so the value K can be smaller and the weight matrix can be compressed more while respecting its sparsity. Both steps are sketched below.
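Putting steps S101a and S101b together gives a short routine (a sketch assuming NumPy; the positive constants k1 and k2 are left open by the text, so the defaults below are placeholders):

```python
import numpy as np

def adaptive_K(W: np.ndarray, k1: float = 1.0, k2: float = 1.0) -> int:
    """Steps S101a/S101b: choose K adaptively from the sparsity of W."""
    M, N = W.shape
    # Step S101a: sparse truncation factor w_block = k1 * variance + mean,
    # then count the elements whose absolute value falls below it.
    w_block = k1 * W.var() + W.mean()
    N_sparse = int(np.count_nonzero(np.abs(W) < w_block))
    # Step S101b: K = k2 * (M*N - N_sparse) / (M + N); the sparser W is
    # (the larger N_sparse), the smaller K becomes.
    K = int(k2 * (M * N - N_sparse) / (M + N))
    return max(K, 1)
```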
As shown in Fig. 1, the method 100 further comprises step S102: decomposing the last linear layer into a first linear layer and a second linear layer, wherein the weight matrix of the first linear layer is the M×K matrix P, the weight matrix of the second linear layer is the K×N matrix Q, the output of the first linear layer is the input of the second linear layer, and M×N > K×(M+N), so that the weight matrix W is compressed.
In some possible embodiments, the method 100 further comprises step S103: optimizing the matrix P and the matrix Q.
In some possible embodiments, step S103 comprises optimizing the matrices P and Q with a stochastic gradient descent algorithm, comprising:
Step S103a: assigning the elements of the matrices P and Q random starting values obeying a Gaussian distribution with mean 0 and variance 1;
Step S103b: calculating $W_{pred} = P \cdot Q$;
Step S103c: calculating the loss function error Loss according to $W_{pred}$ and $w_{ij}$;
Step S103d: calculating the gradient directions of the matrices P and Q according to the loss function error Loss; and
Step S103e: updating the matrices P and Q according to their gradient directions.
In some possible embodiments, step S103c comprises calculating the loss function error as Loss = Loss1 + λ·Loss2, where λ is a positive constant.
In some possible embodiments, step S103d comprises calculating the gradient direction $\nabla_{p_{ik}} \mathrm{Loss}$ of the matrix P piecewise over three cases of the predicted entry $w_{ij}^{pred}$: $|w_{ij}^{pred}| < w_{block}$; $w_{ij}^{pred} \ge w_{block}$; and $w_{ij}^{pred} \le -w_{block}$; and calculating the gradient direction $\nabla_{q_{kj}} \mathrm{Loss}$ of the matrix Q piecewise over the same three cases.
In some possible embodiments, step S103e comprises updating the matrices P and Q respectively as:

$$p_{ik} \leftarrow p_{ik} + lr \cdot \nabla_{p_{ik}} \mathrm{Loss};$$

$$q_{kj} \leftarrow q_{kj} + lr \cdot \nabla_{q_{kj}} \mathrm{Loss};$$

where lr is a constant.
In some possible embodiments, the method 100 further comprises: repeating steps S103b through S103e; when the loss function error Loss is less than a preset error convergence value, stopping the iteration to obtain the optimized weight matrix P′ of the first linear layer and the optimized weight matrix Q′ of the second linear layer; and decomposing the weight matrix W into the first linear layer and the second linear layer. Optionally, the error convergence value can be set as needed, based on factors such as the allowable computation time, the computation budget, and the required accuracy. A sketch of this optimization loop follows.
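The sketch below covers steps S103a through S103e and the iteration (assuming NumPy). The patent's exact expressions for Loss1, Loss2, and the piecewise gradients are not reproduced in this text, so plausible stand-ins are substituted and labeled as such: Loss1 is taken as the squared reconstruction error on entries at or above the truncation factor, and Loss2 as a penalty on predicted entries that leave the (-w_block, w_block) band at positions where W itself is near zero. The learning rate lr, the weight lam, and the convergence value eps are placeholders, and the update uses the conventional descent sign:

```python
import numpy as np

def sgd_factorize(W, K, w_block, lam=0.1, lr=1e-3, eps=1e-3, max_iter=10000):
    """Sketch of steps S103a-S103e with assumed forms for Loss1 and Loss2."""
    M, N = W.shape
    rng = np.random.default_rng(0)
    # Step S103a: Gaussian(0, 1) random starting values for P and Q.
    P = rng.standard_normal((M, K))
    Q = rng.standard_normal((K, N))
    strong = np.abs(W) >= w_block            # entries with strong feedback
    for _ in range(max_iter):
        W_pred = P @ Q                       # step S103b
        # Step S103c (assumed forms): Loss = Loss1 + lam * Loss2.
        diff = np.where(strong, W_pred - W, 0.0)
        band = np.where(~strong & (np.abs(W_pred) >= w_block),
                        W_pred - np.sign(W_pred) * w_block, 0.0)
        loss = np.sum(diff ** 2) + lam * np.sum(band ** 2)
        if loss < eps:                       # preset error convergence value
            break
        # Step S103d: gradients of the assumed loss with respect to P and Q.
        G = 2.0 * (diff + lam * band)        # d(loss)/d(W_pred)
        grad_P, grad_Q = G @ Q.T, P.T @ G
        # Step S103e: update P and Q along their gradient directions.
        P -= lr * grad_P
        Q -= lr * grad_Q
    return P, Q                              # the optimized P' and Q'
```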
In some possible embodiments, the method 100 further comprises step S104: iteratively optimizing the weight matrices P′ and Q′ through the reverse (back-propagation) process of deep neural network model training, to obtain a second-optimized weight matrix P″ and a second-optimized weight matrix Q″ as the weight matrices of the first linear layer and the second linear layer, respectively. Iteratively optimizing P′ and Q′ within the full model in this way further improves the accuracy and reliability of the deep learning model, as sketched below.
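Step S104 retrains the factored layers inside the full network. A minimal sketch of how the split layer could be wired up for that fine-tuning (assuming PyTorch and an nn.Sequential model, neither of which the patent names; the original layer's bias handling is also simplified here):

```python
import torch
import torch.nn as nn

def split_last_layer(model: nn.Sequential, P: torch.Tensor, Q: torch.Tensor):
    """Replace the model's last M x N linear layer with two stacked linear
    layers initialized from the optimized factors P' (M x K) and Q' (K x N),
    so that ordinary training can then produce P'' and Q''."""
    M, K = P.shape
    _, N = Q.shape
    first = nn.Linear(M, K, bias=False)   # first linear layer
    second = nn.Linear(K, N)              # second linear layer
    with torch.no_grad():
        first.weight.copy_(P.T)           # nn.Linear stores weight as (out, in)
        second.weight.copy_(Q.T)
    # The output of the first layer feeds the second, as in step S102.
    return nn.Sequential(*list(model.children())[:-1], first, second)

# Fine-tuning then proceeds with the usual training loop on the new model.
```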
Thus, according to the matrix compression method for a deep learning model of embodiments of the present invention, calculating the value K from the absolute values of the elements of the weight matrix W allows the number of expanded intermediate nodes, i.e. the value K, to be selected adaptively from the intrinsic characteristics of W rather than chosen manually, thereby significantly optimizing the parameters of the deep learning model and improving its reliability and accuracy; and by counting the number $N_{sparse}$ of elements $w_{ij}$ of W whose absolute values are less than the preset sparse truncation factor $w_{block}$, the calculation of K takes the sparsity of W into account, effectively compressing the deep learning model, reducing computation, and improving operational efficiency while ensuring the accuracy and reliability of the deep learning model.
example devices
After the method describing exemplary embodiment of the invention, next, with reference to figure 2, the matrix compression device for degree of deep learning model of exemplary embodiment of the invention is described.
Fig. 2 schematically shows the schematic diagram of the matrix compression 200 for degree of deep learning model according to embodiment of the present invention, last layer of linear layer of wherein said degree of deep learning model connects M hidden node and N number of class node, the weight matrix of described last layer of linear layer W = w 11 w 1 N ... w M 1 w M N .
As shown in Figure 2, this device 200 can comprise: K value computing module 201 and compression module 202.
In device 200, described K value computing module 201 is suitable for: according to the absolute value of the element of described weight matrix W, calculating K value.
Calculating K value is carried out by the absolute value of the element according to weight matrix W, device 200 can select according to the unique characteristics of weight matrix W the number expanding intermediate node adaptively, i.e. K value, but not the number of artificial selection intermediate node, thus significantly can optimize the parameter of degree of deep learning model, improve reliability and the accuracy of degree of deep learning model.
In the embodiment that some are possible, described K value computing module 201 comprises: element statistical module 201a, is suitable for the element w adding up described weight matrix W ijabsolute value be less than default sparse guillotine factor w blockelement number N sparse; K value calculating sub module 201b, is suitable for according to N sparse, calculating K value.
As previously mentioned, in degree of deep learning model, in usual weight matrix, a lot of element value is less, even close to 0, reflects feedback relationship not strong between a lot of node.According to exemplary embodiment of the present invention, device 200 is by the element w by element statistical module 201a statistical weight matrix W ijabsolute value be less than default sparse guillotine factor w blockelement number N sparse, and carry out calculating K value by K value calculating sub module 201b, consider the openness of weight matrix W, effectively have compressed degree of deep learning model, decrease operand, improve operation efficiency, ensure accuracy and the reliability of degree of deep learning model simultaneously.
In the embodiment that some are possible, wherein element statistical module 201a is suitable for: by equation w block=k 1* σ 2+ μ calculates described sparse guillotine factor w block, wherein k 1for positive constant.In the embodiment that some are possible, wherein, σ 2for the variance of the element of described weight matrix, μ is the average of the element of described weight matrix; The variances sigma of weight matrix W 2be the parameter of the dispersion degree weighing element in weight matrix W, average μ is the parameter of the central tendency of element in reflection weight matrix W, and therefore, element statistical module 201a is by the variances sigma according to weight matrix W 2carry out calculating K value with average μ, according to the number of the unique characteristics of weight matrix W adaptively selected expansion intermediate node more accurately, thus the parameter of degree of deep learning model can be optimized further, improve reliability and the accuracy of degree of deep learning model.Alternatively, sparse guillotine factor w blockalso can preset according to other parameter values of weight matrix W, can also experimentally result select the constant preset as sparse guillotine factor w block.
In the embodiment that some are possible, wherein K value calculating sub module 201b is suitable for: by equation K=k 2* (M*N-N sparse)/(M+N), calculating K value, wherein k 2for positive constant; Optionally, also by other formula, N can be used sparsecarry out calculating K value, as long as N sparsebecome inverse change relation with K value, just can meet know-why of the present invention to a certain extent.
As shown in Figure 2, device 200 also comprises: compression module 202, is suitable for described last layer of linear layer to be decomposed into the first linear layer and the second linear layer, and the weight matrix of wherein said first linear layer is the matrix of M*K P = p 11 p 1 K ... p M 1 p M K , The weight matrix of described second linear layer is the matrix of K*N Q = q 11 q 1 N ... q K 1 q K N , The output of described first linear layer is the input of described second linear layer, and M*N > K* (M+N), so that described weight matrix W is compressed.
In the embodiment that some are possible, device 200 also comprises: first optimizes module 203, is suitable for being optimized process to described matrix P and described matrix Q.
In the embodiment that some are possible, first optimizes module 203 is suitable for: adopt stochastic gradient descent algorithm to be optimized process to described matrix P and described matrix Q, comprising:
Assignment module 203a, be suitable for the random starting values element of described matrix P and described matrix Q being given to Gaussian distributed, the average of the random starting values of the element of wherein said matrix P and described matrix Q is 0 and variance is 1;
W predictioncomputing module 203b, is suitable for calculating W prediction=P*Q, and
Loss function error calculating module 203c, is suitable for according to W predictionand w ij, counting loss function error Loss;
Gradient direction information computing module 203d, is suitable for, according to described loss function error Loss, calculating the Gradient direction information of described matrix P and described matrix Q;
Update module 203e, is suitable for the Gradient direction information according to described matrix P and described matrix Q, upgrades described matrix P and described matrix Q.
In the embodiment that some are possible, loss function error calculating module 203c is suitable for: by following equation counting loss function error Loss:
Loss=Loss1+ λ Loss2, λ are positive constant,
In the embodiment that some are possible, Gradient direction information computing module 203d is suitable for: the Gradient direction information being calculated described matrix P by following equation:
and | w ij predicts| < w block;
and w ij predicts>=w block;
and w ij predicts≤-w block;
The Gradient direction information of described matrix Q is calculated by following equation:
and | w ij predicts| < w block;
and w ij predicts>=w block;
and w ij predicts≤-w block.
In the embodiment that some are possible, update module 203e is suitable for: be updated to respectively by described matrix P and described matrix Q:
p i k = p i k + l r * &dtri; p i k L o s s ;
q k j = p k j + l r * &dtri; p k j L o s s ; Wherein lr is constant.
In the embodiment that some are possible, device 200 also comprises: iteration module 204, is suitable for triggering described W predictioncomputing module calculates W prediction=P*Q, and trigger described loss function error calculating module according to W predictionand w ijcounting loss function error Loss, trigger described Gradient direction information computing module according to described loss function error Loss calculate described matrix P and described matrix Q Gradient direction information, trigger described update module and upgrade described matrix P and described matrix Q according to the Gradient direction information of described matrix P and described matrix Q; When described loss function error Loss is less than the error convergence value preset, described iteration module stops iteration, obtains the weight matrix Q ' of the weight matrix P ' of described first linear layer after optimizing and described second linear layer after optimizing; And decomposing module 205, be suitable for described weight matrix W to be decomposed into described first linear layer and described second linear layer.Alternatively, error convergence value can as required, based on computing time, calculated amount, the accuracy requirement that may allow etc. because usually setting.
In the embodiment that some are possible, device 200 also comprises: second optimizes module 206, the inverse process being suitable for being trained by degree of depth Model of Neural Network carries out iteration optimization process to described weight matrix P ' and described weight matrix Q ', to obtain the weight matrix P after the second optimization " and second optimize after weight matrix Q ", respectively as the weight matrix of described first linear layer and described second linear layer.By inverse process, iteration optimization process is carried out to weight matrix P ' and described weight matrix Q ', further increases accuracy and the reliability of degree of deep learning model.
Like this, according to the matrix compression device for degree of deep learning model of embodiment of the present invention, calculating K value is carried out by the absolute value of the element according to weight matrix W, the number expanding intermediate node can be selected adaptively according to the unique characteristics of weight matrix W, i.e. K value, but not the number of artificial selection intermediate node, thus significantly can optimize the parameter of degree of deep learning model, improve reliability and the accuracy of degree of deep learning model; And pass through the element w of statistical weight matrix W ijabsolute value be less than default sparse guillotine factor w blockelement number N sparse, carry out calculating K value, consider the openness of weight matrix W, effectively have compressed degree of deep learning model, decrease operand, improve operation efficiency, ensure accuracy and the reliability of degree of deep learning model simultaneously.
example devices
Having described the method and device of exemplary embodiments of the present invention, a matrix compression device for a deep learning model according to another exemplary embodiment of the present invention is introduced next.
Those skilled in the art will appreciate that aspects of the present invention may be implemented as a system, method, or program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit", "module", or "system".
In some possible embodiments, a matrix compression device for a deep learning model according to the present invention may comprise at least one processing unit and at least one storage unit. The storage unit stores program code which, when executed by the processing unit, causes the processing unit to perform the steps of the matrix compression method for a deep learning model according to the various exemplary embodiments described in the "illustrative methods" section of this specification. For example, the processing unit may perform step S101 as shown in Fig. 1: calculating a value K according to the absolute values of the elements of the weight matrix W; and step S102: decomposing the last linear layer into a first linear layer and a second linear layer, wherein the weight matrix of the first linear layer is the M×K matrix P, the weight matrix of the second linear layer is the K×N matrix Q, the output of the first linear layer is the input of the second linear layer, and M×N > K×(M+N), so that the weight matrix W is compressed.
The matrix compression device 10 for a deep learning model according to this embodiment of the present invention is described below with reference to Fig. 3. The device 10 shown in Fig. 3 is only an example and should not impose any limitation on the functions or scope of use of embodiments of the present invention.
As shown in Fig. 3, the matrix compression device 10 for a deep learning model takes the form of a general-purpose computing device. Its components may include, but are not limited to: the at least one processing unit 16 mentioned above, the at least one storage unit 28 mentioned above, and a bus 18 connecting the different system components (including the storage unit 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor bus, or a local bus using any of a variety of bus architectures.
The storage unit 28 may include computer-readable storage media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32, and may further include read-only memory (ROM) 34.
The storage unit 28 may also include a program/utility 40 having a set (at least one) of program modules 42; such program modules 42 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment.
The matrix compression device 10 for a deep learning model may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the device 10, and/or with any device (such as a router, a modem, etc.) that enables the device 10 to communicate with one or more other computing devices. Such communication can take place through input/output (I/O) interfaces 22. Furthermore, the device 10 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the device 10 through the bus 18. It should be understood that, although not shown, other hardware and/or software modules may be used in conjunction with the device 10, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
exemplary program product
In some possible embodiments, aspects of the present invention may also be embodied in the form of a program product comprising program code which, when the program product runs on a system, causes the system to perform the steps of the matrix compression method for a deep learning model according to the various exemplary embodiments described in the "illustrative methods" section of this specification. For example, the system may perform step S101 as shown in Fig. 1: calculating a value K according to the absolute values of the elements of the weight matrix W; and step S102: decomposing the last linear layer into a first linear layer and a second linear layer, wherein the weight matrix of the first linear layer is the M×K matrix P, the weight matrix of the second linear layer is the K×N matrix Q, the output of the first linear layer is the input of the second linear layer, and M×N > K×(M+N), so that the weight matrix W is compressed.
The program product may employ any combination of one or more computer-readable storage media. A computer-readable storage medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As shown in Fig. 4, a program product 90 for matrix compression for a deep learning model according to an embodiment of the present invention is described. It may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium containing or storing a program that can be used by, or in connection with, an instruction execution system, apparatus, or device.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A readable signal medium may also be any computer-readable storage medium other than a readable storage medium, which can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code contained on a computer-readable storage medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical cable, RF, and the like, or any suitable combination of the foregoing.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that although several devices or sub-devices for matrix compression for a deep learning model are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more of the devices described above may be embodied in one device; conversely, the features and functions of one device described above may be further divided and embodied by multiple devices.
In addition, although the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
Although the spirit and principles of the present invention have been described with reference to several embodiments, it should be understood that the invention is not limited to the disclosed embodiments, and the division into aspects does not mean that features in those aspects cannot be combined to advantage; that division is only for convenience of expression. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (12)

1. A matrix compression method for a deep learning model, wherein the last linear layer of the deep learning model connects M hidden nodes and N classification nodes, and the weight matrix of the last linear layer is

$$W = \begin{pmatrix} w_{11} & \cdots & w_{1N} \\ \vdots & \ddots & \vdots \\ w_{M1} & \cdots & w_{MN} \end{pmatrix},$$

the method comprising:

Step S101: calculating a value K according to the absolute values of the elements of the weight matrix W; and

Step S102: decomposing the last linear layer into a first linear layer and a second linear layer, wherein the weight matrix of the first linear layer is the M×K matrix

$$P = \begin{pmatrix} p_{11} & \cdots & p_{1K} \\ \vdots & \ddots & \vdots \\ p_{M1} & \cdots & p_{MK} \end{pmatrix},$$

the weight matrix of the second linear layer is the K×N matrix

$$Q = \begin{pmatrix} q_{11} & \cdots & q_{1N} \\ \vdots & \ddots & \vdots \\ q_{K1} & \cdots & q_{KN} \end{pmatrix},$$

the output of the first linear layer is the input of the second linear layer, and M×N > K×(M+N), so that the weight matrix W is compressed.

2. The method according to claim 1, wherein step S101 comprises:

Step S101a: counting the number $N_{sparse}$ of elements $w_{ij}$ of the weight matrix W whose absolute values are less than a preset sparse truncation factor $w_{block}$; and

Step S101b: calculating the value K according to $N_{sparse}$.

3. The method according to claim 2, wherein step S101a comprises:

calculating the sparse truncation factor by the equation $w_{block} = k_1 \cdot \sigma^2 + \mu$, where $k_1$ is a positive constant, $\sigma^2$ is the variance of the elements of the weight matrix, and $\mu$ is the mean of the elements of the weight matrix.

4. The method according to claim 3, wherein step S101b comprises:

calculating the value K by the equation $K = k_2 \cdot (M \cdot N - N_{sparse}) / (M + N)$, where $k_2$ is a positive constant.

5. The method according to claim 4, further comprising:

Step S103: optimizing the matrix P and the matrix Q.

6. The method according to claim 5, wherein step S103 comprises optimizing the matrices P and Q with a stochastic gradient descent algorithm, comprising:

Step S103a: assigning the elements of the matrices P and Q random starting values obeying a Gaussian distribution with mean 0 and variance 1;

Step S103b: calculating $W_{pred} = P \cdot Q$;

Step S103c: calculating the loss function error Loss according to $W_{pred}$ and $w_{ij}$;

Step S103d: calculating the gradient directions of the matrices P and Q according to the loss function error Loss; and

Step S103e: updating the matrices P and Q according to their gradient directions.

7. The method according to claim 6, wherein step S103c comprises:

calculating the loss function error as Loss = Loss1 + λ·Loss2, where λ is a positive constant.

8. The method according to claim 7, wherein step S103d comprises:

calculating the gradient direction $\nabla_{p_{ik}} \mathrm{Loss}$ of the matrix P piecewise over three cases of the predicted entry $w_{ij}^{pred}$: $|w_{ij}^{pred}| < w_{block}$; $w_{ij}^{pred} \ge w_{block}$; and $w_{ij}^{pred} \le -w_{block}$; and

calculating the gradient direction $\nabla_{q_{kj}} \mathrm{Loss}$ of the matrix Q piecewise over the same three cases.

9. The method according to claim 8, wherein step S103e comprises:

updating the matrices P and Q respectively as $p_{ik} \leftarrow p_{ik} + lr \cdot \nabla_{p_{ik}} \mathrm{Loss}$ and $q_{kj} \leftarrow q_{kj} + lr \cdot \nabla_{q_{kj}} \mathrm{Loss}$, where lr is a constant.

10. The method according to claim 6, further comprising:

repeating steps S103b through S103e;

when the loss function error Loss is less than a preset error convergence value, stopping the iteration to obtain the optimized weight matrix P′ of the first linear layer and the optimized weight matrix Q′ of the second linear layer; and

decomposing the weight matrix W into the first linear layer and the second linear layer.

11. The method according to claim 10, further comprising:

Step S104: iteratively optimizing the weight matrices P′ and Q′ through the reverse (back-propagation) process of deep neural network model training, to obtain a second-optimized weight matrix P″ and a second-optimized weight matrix Q″ as the weight matrices of the first linear layer and the second linear layer, respectively.

12. A matrix compression device for a deep learning model, wherein the last linear layer of the deep learning model connects M hidden nodes and N classification nodes, and the weight matrix of the last linear layer is

$$W = \begin{pmatrix} w_{11} & \cdots & w_{1N} \\ \vdots & \ddots & \vdots \\ w_{M1} & \cdots & w_{MN} \end{pmatrix},$$

the device comprising:

a K-value computing module adapted to calculate a value K according to the absolute values of the elements of the weight matrix W; and

a compression module adapted to decompose the last linear layer into a first linear layer and a second linear layer, wherein the weight matrix of the first linear layer is the M×K matrix P, the weight matrix of the second linear layer is the K×N matrix Q, the output of the first linear layer is the input of the second linear layer, and M×N > K×(M+N), so that the weight matrix W is compressed.
CN201510566639.0A 2015-09-08 2015-09-08 Deep learning model matrix compression method and device Pending CN105184369A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510566639.0A CN105184369A (en) 2015-09-08 2015-09-08 Deep learning model matrix compression method and device

Publications (1)

Publication Number Publication Date
CN105184369A true CN105184369A (en) 2015-12-23

Family

ID=54906432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510566639.0A Pending CN105184369A (en) Deep learning model matrix compression method and device

Country Status (1)

Country Link
CN (1) CN105184369A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810503A (en) * 2013-12-26 2014-05-21 西北工业大学 Depth study based method for detecting salient regions in natural image

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10878319B2 (en) * 2016-02-03 2020-12-29 Google Llc Compressed recurrent neural network models
CN107038476A (en) * 2016-02-03 2017-08-11 谷歌公司 Compressed recurrent neural networks model
US11948062B2 (en) 2016-02-03 2024-04-02 Google Llc Compressed recurrent neural network models
CN107145940A (en) * 2016-03-01 2017-09-08 谷歌公司 The recurrent neural networks model of compression
CN107145940B (en) * 2016-03-01 2021-02-12 谷歌有限责任公司 Compressed recurrent neural network model
WO2017185257A1 (en) * 2016-04-27 2017-11-02 北京中科寒武纪科技有限公司 Device and method for performing adam gradient descent training algorithm
CN107239825A (en) * 2016-08-22 2017-10-10 北京深鉴智能科技有限公司 Consider the deep neural network compression method of load balancing
CN107239825B (en) * 2016-08-22 2021-04-09 赛灵思电子科技(北京)有限公司 Deep neural network compression method considering load balance
CN108628807A (en) * 2017-03-20 2018-10-09 北京百度网讯科技有限公司 Processing method, device, equipment and the computer readable storage medium of floating-point matrix number
CN108628807B (en) * 2017-03-20 2022-11-25 北京百度网讯科技有限公司 Processing method, device and equipment of floating-point number matrix and computer readable storage medium
US11210608B2 (en) 2018-05-29 2021-12-28 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating model, method and apparatus for recognizing information
CN108764487A (en) * 2018-05-29 2018-11-06 北京百度网讯科技有限公司 For generating the method and apparatus of model, the method and apparatus of information for identification
CN109086819A (en) * 2018-07-26 2018-12-25 北京京东尚科信息技术有限公司 Caffemodel model compression method, system, equipment and medium
CN109086819B (en) * 2018-07-26 2023-12-05 北京京东尚科信息技术有限公司 Method, system, equipment and medium for compressing caffemul model
CN110119745A (en) * 2019-04-03 2019-08-13 平安科技(深圳)有限公司 Compression method, device, computer equipment and the storage medium of deep learning model
CN110119745B (en) * 2019-04-03 2024-05-10 平安科技(深圳)有限公司 Compression method, compression device, computer equipment and storage medium of deep learning model
CN112308197A (en) * 2019-07-26 2021-02-02 杭州海康威视数字技术股份有限公司 Convolutional neural network compression method and device and electronic equipment
CN112308197B (en) * 2019-07-26 2024-04-09 杭州海康威视数字技术股份有限公司 Compression method and device of convolutional neural network and electronic equipment

Similar Documents

Publication Publication Date Title
CN105184369A (en) Deep learning model matrix compression method and device
CN110880036B (en) Neural network compression method, device, computer equipment and storage medium
CN110366734B (en) Optimizing neural network architecture
US20210342549A1 (en) Method for training semantic analysis model, electronic device and storage medium
CN110023963A (en) Use Processing with Neural Network text sequence
CN111523640B (en) Training method and device for neural network model
US20220083868A1 (en) Neural network training method and apparatus, and electronic device
CN110674279A (en) Question-answer processing method, device, equipment and storage medium based on artificial intelligence
CN105719001A (en) Large-Scale Classification In Neural Networks Using Hashing
Lavrov et al. Mathematical models for reducing functional networks to ensure the reliability and cybersecurity of ergatic control systems
CN105761102A (en) Method for predicting user commodity purchasing behavior and device thereof
CN111737406B (en) Text retrieval method, device and equipment and training method of text retrieval model
US20220130495A1 (en) Method and Device for Determining Correlation Between Drug and Target, and Electronic Device
CN115427968A (en) Robust artificial intelligence reasoning in edge computing devices
CN114897173A (en) Method and device for determining PageRank based on variational quantum line
CN115329876A (en) Equipment fault processing method and device, computer equipment and storage medium
CN112256886A (en) Probability calculation method and device in map, computer equipment and storage medium
CN116684330A (en) Traffic prediction method, device, equipment and storage medium based on artificial intelligence
US20200265330A1 (en) Sparse modeling for optimizing sensor placement
CN107168760B (en) The variable detection method and device of language
CN111090740B (en) Knowledge graph generation method for dialogue system
CN111797220B (en) Dialog generation method, apparatus, computer device and storage medium
CN104933052A (en) Data true value estimation method and data true value estimation device
CN110348581B (en) User feature optimizing method, device, medium and electronic equipment in user feature group
EP4290419A1 (en) Quantum computer operating system, quantum computer, and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
  Inventors after: Chen Haibo; Li Xiaoyan; Liu Dong; Nie Yuan
  Inventors before: Chen Haibo; Li Xiaoyan
COR Change of bibliographic data
RJ01 Rejection of invention patent application after publication
  Application publication date: 2015-12-23