CN112784969A - Convolutional neural network accelerated learning method based on sampling - Google Patents
Convolutional neural network accelerated learning method based on sampling
- Publication number: CN112784969A (application CN202110136925.9A)
- Authority: CN (China)
- Prior art keywords: matrix, convolution kernel, vector, sampling, convolution
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N 3/047 — Computing arrangements based on biological models; neural networks; architecture; probabilistic or stochastic networks
- G06N 3/084 — Computing arrangements based on biological models; neural networks; learning methods; backpropagation, e.g. using gradient descent
Abstract
The invention discloses a sampling-based convolutional neural network accelerated learning method, belonging to the technical field of convolutional neural networks. In the method, only a sampled subset of convolution kernel vectors is multiplied with the input data in the forward propagation stage; the remaining vectors are ignored and not computed. The backpropagation stage likewise updates only the convolution kernel vectors that participated in the forward computation. Compared with existing convolutional network learning methods that compute the complete matrix multiplication, the method therefore effectively reduces the amount of computation in both forward and backward propagation; at the same time, because only the meaningful weights in the network are computed and updated in each iteration, the convergence of the network is accelerated. In practical use the method requires no adjustment to the macrostructure of the convolutional network, does not affect its local feature extraction property, and, compared with hardware-based convolution acceleration methods, is easier to apply and less costly.
Description
Technical Field
The invention belongs to the technical field of convolutional neural networks, and particularly relates to a sampling-based convolutional neural network accelerated learning method.
Background
Convolutional Neural Networks (CNNs) are among the first successful deep models. They have remained at the frontier of commercial deep-learning applications and have attracted wide attention in image detection and segmentation, object recognition, speech processing, and other fields.
The convolution operation is the process of sliding different convolution kernels over an input picture and performing a fixed computation at each position. Specifically, at each sliding position, the elements of the convolution kernel are multiplied element-wise with the covered elements of the input picture and then summed. The result is then passed through a nonlinear activation function, most commonly the rectified linear unit (ReLU). This calculation principle gives the convolutional network its ability to extract local features. The main structure of a convolutional neural network stacks several convolutional layers as a feature extractor, followed by a fully-connected layer as a classifier. To give the network better feature extraction capability, many convolutional layers are stacked, so the parameter scale of a convolutional neural network grows greatly with network depth. The computation for forward feature extraction and backward error propagation is generally on the order of millions to hundreds of millions of operations, and the convolution operation of the convolutional layers consumes the bulk of those resources. Accelerating the convolution operation is therefore the key to improving the computational efficiency of a convolutional neural network model.
Commonly used neural network training frameworks, such as Caffe and TensorFlow, expand the input data and the convolution kernels into two-dimensional arrays, thereby converting the convolution operation into a matrix multiplication. In convolutional network learning, each convolutional layer performs three matrix multiplications in total: one during forward propagation, and one each during backpropagation to compute the output gradient matrix and the gradient matrix of the convolution kernel. This consumes a large amount of computational resources. In fact, not all neural units in a convolutional network contribute meaningfully: units with larger feature values have a larger influence on subsequent network layers, and the commonly used ReLU activation sets negative feature values directly to 0. It is therefore worthwhile to design a new convolutional network learning method that computes only the values of the more meaningful neural units in the original output instead of the complete matrix multiplication, and omits the computation of the remaining neural units, thereby reducing computational cost and accelerating convolutional network training. Moreover, such a learning method improves the method implementation itself and is more practical than convolution acceleration methods based on hardware devices.
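As a concrete illustration of this expansion, below is a minimal numpy sketch under simplifying assumptions (single channel, a single kernel, stride 1, no padding); the function name `im2col` and all variable names are illustrative, not from the patent:

```python
import numpy as np

def im2col(x, kh, kw):
    """Expand a single-channel image into rows: one row per kernel position."""
    ih, iw = x.shape
    oh, ow = ih - kh + 1, iw - kw + 1          # valid convolution, stride 1
    cols = np.empty((oh * ow, kh * kw))
    for r in range(oh):
        for c in range(ow):
            cols[r * ow + c] = x[r:r + kh, c:c + kw].ravel()
    return cols

# One 3x3 kernel over a 4x4 image: the convolution equals the matrix
# product of the expanded input X (N x d) with the flattened kernel W (n x d).
img = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3))
X = im2col(img, 3, 3)                  # shape (4, 9)
W = kernel.reshape(1, -1)              # shape (1, 9)
Y = X @ W.T                            # shape (4, 1): the convolution output

# Reference: direct sliding-window computation
ref = np.array([[(img[r:r + 3, c:c + 3] * kernel).sum() for c in range(2)]
                for r in range(2)]).reshape(-1, 1)
assert np.allclose(Y, ref)
```

This is exactly the conversion the frameworks perform, after which forward propagation is a single matrix product.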
Disclosure of Invention
In practical feature extraction scenarios, the parameter scale of a convolutional network is large, the computational cost is high, and the computation process contains redundancy, so training a convolutional network model to extract features is slow, while acceleration methods based on hardware devices are difficult to apply in practice. To address these problems, the invention provides a sampling-based convolutional neural network accelerated learning method.
The technical scheme of the invention is as follows:
a convolutional neural network accelerated learning method based on sampling comprises the following steps:
step 1: in the forward propagation stage, an output characteristic diagram is obtained by utilizing probability sampling calculation;
step 2: in the backpropagation stage, only the gradient values corresponding to the neurons that participated in the forward-propagation calculation are retained; the remaining gradient values are ignored and set to 0, yielding a pruned gradient matrix, and the pruned gradient matrix is used to compute and update the convolution kernel parameters;
and step 3: and (5) repeatedly executing the steps 1 and 2 until the network training stopping condition is reached.
Further, according to the sampling-based convolutional neural network accelerated learning method, in step 1, the method for obtaining the output feature map by probability sampling in the forward propagation stage is as follows: first, expand the input feature map and the convolution kernel into two-dimensional matrices; for each input vector in the expanded matrix, obtain a corresponding candidate convolution kernel vector number set V by probability sampling; the input vector is multiplied only with the vectors numbered in V, the calculation results fill the corresponding positions of the output feature map, and the remaining positions of the output feature map are set to 0.
Further, according to the sampling-based convolutional neural network accelerated learning method, the step 1 specifically comprises the following steps:
step 1.1: expand the input feature map into a two-dimensional matrix X according to the dimensions of the convolution kernel, expand the convolution kernel into a two-dimensional matrix W, convert the convolution operation into the product of the expanded matrices X and W, and compute the absolute value sum s_t of each column of elements in the matrix W;
step 1.2: according to the column absolute value sums s_t of the matrix W, construct a conditional probability distribution P(j|x_i) for each vector x_i in X;
step 1.3: sample τ times from the probability distribution P(j|x_i), obtaining one convolution kernel vector number per sample; the τ samples form a convolution kernel vector candidate set V_pre;
step 1.4: screen the elements of V_pre according to preset conditions; the retained elements form the final candidate convolution kernel vector number set V;
step 1.5: the vector x_i takes inner products only with the vectors numbered in V; the results fill the corresponding positions of the output feature map, and the remaining positions of the output feature map are set to 0.
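The claimed forward pass can be sketched end to end for a single input row. This is an illustrative numpy sketch, not the patent's implementation: it samples with `numpy.random.Generator.choice` rather than an O(1) alias table, and all names and parameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_forward_row(x, W, tau, theta):
    """Approximate y = x @ W.T by sampling candidate kernel rows (steps 1.1-1.5)."""
    n, d = W.shape
    s = np.abs(W).sum(axis=0)                       # column absolute sums s_t
    p_t = np.abs(x) * s                             # P(t|x) ~ |x_t * s_t|
    p_t = p_t / p_t.sum()
    omega = np.zeros(n)
    candidates = set()
    for _ in range(tau):                            # step 1.3: two-stage sampling
        t = rng.choice(d, p=p_t)
        col = np.abs(W[:, t])
        j = rng.choice(n, p=col / col.sum())        # P(j|t) ~ |W_jt|
        omega[j] += np.sign(x[t] * W[j, t])         # step 1.4: result weights
        candidates.add(j)
    V = sorted(candidates, key=lambda j: -omega[j])[:theta]
    y = np.zeros(n)                                 # step 1.5: partial inner products
    for j in V:
        y[j] = x @ W[j]
    return y

x = rng.standard_normal(9)
W = rng.standard_normal((8, 9))
y = sampled_forward_row(x, W, tau=32, theta=4)
assert np.count_nonzero(y) <= 4                     # only theta outputs computed
```

Every nonzero entry of `y` equals the exact inner product for that kernel row; the rest are set to 0 rather than computed.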
Further, according to the sampling-based convolutional neural network accelerated learning method, the conditional probability distribution P(j|x_i) means that, for a vector x_i in X, the probability of drawing the number of the j-th convolution kernel vector w_j from among all convolution kernel vector numbers is proportional to the absolute value of the inner product of x_i and w_j:

P(j|x_i) ∝ |x_i·w_j^T|, i ∈ [1, N], j ∈ [1, n]   (2)

where i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; N is the number of rows of the input two-dimensional matrix X; and n is the number of rows of the convolution kernel matrix, i.e. the number of convolution kernel vectors in the matrix.
Further, according to the sampling-based convolutional neural network accelerated learning method, step 1.2 specifically includes: according to the column absolute value sums s_t of the matrix W, construct a polynomial distribution P(j|t) for each column of the convolution kernel matrix W, and construct a polynomial distribution P(t|x_i) for each vector x_i in X; the conditional probability distribution P(j|x_i) is then obtained by combining the two, P(j|x_i) = Σ_{t=1}^{d} P(t|x_i)·P(j|t). Here i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; t is the column number of the matrix W; and d = d_x = d_w, where d_x is the dimension of the vector x and d_w is the dimension of the convolution kernel vector w.
Further, according to the sampling-based convolutional neural network accelerated learning method, the polynomial distributions P(j|t) and P(t|x_i) are expressed by equation (3) and equation (4), respectively:

P(j|t) ~ PN([|W_1t|, ..., |W_nt|]), j ∈ [1, n], t ∈ [1, d]   (3)

P(t|x_i) ~ PN([|x_i1·s_1|, ..., |x_id·s_d|]), t ∈ [1, d]   (4)

where PN denotes a polynomial distribution. In equation (3), each distribution P(j|t) stores the probability of selecting each row number j of the matrix W given that the t-th column of the convolution kernel matrix W has been selected; since the matrix W has d columns, d such distributions are constructed. In equation (4), the distribution P(t|x_i) stores, for the input vector x_i, the probability of selecting each column number t of the matrix W.
Further, according to the sampling-based convolutional neural network accelerated learning method, the method for obtaining one convolution kernel vector number per sample in step 1.3 is as follows: sample from P(t|x_i) to obtain a column number t_chosen of the convolution kernel matrix W; find the t_chosen-th distribution P(j|t = t_chosen) among the conditional probability distributions P(j|t), and sample from it to obtain a number j_chosen; j_chosen is the convolution kernel vector number obtained in this sample.
Further, according to the sampling-based convolutional neural network accelerated learning method, step 1.4 specifically includes: first, compute a result weight, denoted ω_j, for the convolution kernel vector number j obtained in each sample; then sort the elements of the set V_pre by their weights ω_j and retain the θ vector numbers with the largest weights as the candidate convolution kernel vector number set V. Here ω_j is accumulated as ω_j = ω_j + sgn(x_it·W_jt), where sgn() denotes the sign function: sgn(x_it·W_jt) = 1 when x_it·W_jt > 0; sgn(x_it·W_jt) = 0 when x_it·W_jt = 0; and sgn(x_it·W_jt) = -1 when x_it·W_jt < 0.
Further, according to the sampling-based convolutional neural network accelerated learning method, θ denotes the number of candidate convolution kernel vector numbers finally retained, and θ ≤ n, where n is the number of rows of the two-dimensional matrix W, i.e. the number of convolution kernel vectors in the matrix.
The convolutional neural network accelerated learning method based on sampling, wherein the step 2 comprises the following steps:
step 2.1: for the input gradient matrix ∂L/∂Y, ignore the unneeded gradient values by setting them to 0, pruning ∂L/∂Y to obtain the pruned input gradient matrix ∂L/∂Y';
step 2.2: multiply the transpose of ∂L/∂Y' with the input matrix X to obtain the convolution kernel gradient matrix ∂L/∂W;
step 2.3: multiply ∂L/∂Y' with the convolution kernel matrix W to obtain the output gradient matrix ∂L/∂X;
step 2.4: update the matrix W according to the convolution kernel gradient matrix ∂L/∂W, and pass the output gradient matrix ∂L/∂X backward as the input gradient matrix of the previous layer in the network.
Compared with the prior art, the sampling-based convolutional neural network accelerated learning method provided by the invention has the following beneficial effects. The method uses the principle of probability sampling to reduce the amount of computation without impairing the network's feature extraction capability, thereby speeding up the construction of the convolutional network, improving the computational efficiency of feature extraction with a convolutional neural network model, and meeting the need for fast feature extraction in practical applications. Specifically, in the forward propagation stage only a sampled subset of convolution kernel vectors is multiplied with the input data, and the remaining vectors are ignored and not computed. The backpropagation stage likewise updates only the convolution kernel vectors that participated in the forward computation. Compared with existing convolutional network learning methods that compute the complete matrix multiplication, the method therefore effectively reduces the computation in both forward and backward propagation; at the same time, because only the meaningful weights in the network are computed and updated in each iteration, the convergence of the network is accelerated. In practical use the method requires no change to the macrostructure of the convolutional network, does not affect its local feature extraction property, and, compared with hardware-based convolution acceleration methods, is easier to apply and less costly.
Drawings
FIG. 1 is a schematic diagram of a sample-based convolutional layer feature extraction process provided by the present invention;
FIG. 2 is a schematic flow chart of a sample-based convolutional neural network accelerated learning method provided in the present invention;
FIG. 3 is a schematic flow chart of forward propagation provided by the present invention;
FIG. 4 is a block diagram of the constructed probability distribution P (j | x) provided by the present inventioni) A schematic flow diagram of (a);
FIG. 5 is a schematic diagram of a sampling detailed flow in step 1.3 provided by the present invention;
FIG. 6 is a graph according to set VpreScreening to obtain a flow schematic diagram of a final convolution kernel vector set V;
FIG. 7 is a flowchart illustrating a back propagation and update process according to the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the embodiments and the accompanying drawings, and it is obvious that the described embodiments are one preferred embodiment of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of the sampling-based convolutional layer feature extraction process according to the present invention. When a convolutional network is used for feature extraction, multiple convolutional layers are stacked to improve the network's feature extraction capability. Taking image feature extraction as an example, the input of each convolutional layer is the original picture or the feature map produced by the previous layer, and its output is the feature map produced by the convolution operation. The invention aims to reduce the computation of the convolution operation and improve the computational efficiency of feature extraction with a convolutional neural network model without affecting the feature extraction quality of the convolutional network, so the input and output dimensions are the same as defined in the conventional convolution operation. The definitions of the symbolic variables labeled in fig. 1 are given in table 1.
TABLE 1 meaning table of symbolic variables referred to in FIG. 1
An embodiment of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in fig. 2, a schematic flow chart of the sample-based convolutional neural network accelerated learning method provided by the present invention includes steps 1, 2, and 3:
step 1: and in the forward propagation stage, the probability sampling calculation is utilized to obtain an output characteristic diagram.
Firstly, the input feature map and the convolution kernel are expanded into two-dimensional matrices. For each input vector in the expanded input matrix, probability sampling yields a corresponding candidate convolution kernel vector number set V. The input vector is multiplied only with the vectors numbered in V; the calculation results fill the corresponding positions of the output feature map, and the remaining positions of the output feature map are set to 0.
The specific workflow of the forward propagation stage, as shown in fig. 3, includes step 1.1, step 1.2, step 1.3, step 1.4, and step 1.5:
step 1.1: expand the input feature map into a two-dimensional matrix X according to the dimensions of the convolution kernel, expand the convolution kernel into a two-dimensional matrix W, and compute the absolute value sum of each column of elements in the matrix W.
Specifically, as shown in fig. 1, the input data are expanded as follows. According to the definition of the conventional convolution operation, as the convolution kernel slides over the original image, at each sliding position the elements of the convolution kernel are multiplied element-wise with the covered input feature pixels and then summed; the convolution operation can therefore be converted into a matrix product. As shown in fig. 1, the convolution kernel is a four-dimensional tensor of size kn × kh × kw × kc, which is expanded into a two-dimensional matrix denoted W. The dimension of W is n × d_w: each row of the matrix represents one convolution kernel vector w, whose dimension is d_w = kh × kw × kc, and n is the number of matrix rows, i.e. the number of convolution kernel vectors, so n = kn. Let the input feature map be a four-dimensional tensor of size in × ih × iw × ic; it is expanded according to the dimensions of the convolution kernel into a two-dimensional matrix X of size N × d_x. Each vector x in the matrix X is the region covered by the convolution kernel at one sliding position over the original image, so the dimension d_x of the vector x equals the dimension d_w of the convolution kernel vector w; let d_x = d_w = d. N denotes the number of positions the convolution kernel slides over the input feature map; by the definition of the convolution operation, N = in × oh × ow. The convolution operation is thus converted into the product of the expanded two-dimensional matrices X and W.
The absolute value sum s_t of the elements of each column of the matrix W is calculated as:

s_t = Σ_{j=1}^{n} |W_jt|, t ∈ [1, d]   (1)

where t is the column number of the matrix W and W_jt is the element in the j-th row and t-th column of the convolution kernel matrix W. Computing this value prepares for constructing the probability distributions in the subsequent steps.
Step 1.2: according to the sum of absolute values s of each column element of the matrix WtFor each vector X in XiConstructing a conditional probability distribution P (j | x)i)。
P(j|xi)∝xiwj T,i∈[1,N],j∈[1,n] (2)
Wherein i is a row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W. The meaning of the probability distribution is to a vector X in XiExtracting the vector number from all the convolution kernel vectors to the jth convolution kernel vector wjIs proportional to the input vector xiAnd wjInner product x ofiwj TThe absolute value of (a).
Specifically, since the conditional probability distribution P (j | x) is directly constructedi)∝xiwj TIs difficult, therefore according to Step 1.2 by constructing two polynomial distributions P (j | t) (step 1.2.1) and P (t | x)i) (step 1.2.2) to obtain P (j | x)i)。
The specific workflow for constructing the conditional probability distribution P(j|x_i), shown in fig. 4, includes step 1.2.1 and step 1.2.2:
step 1.2.1: a polynomial distribution P(j|t) is constructed for each column of the convolution kernel matrix W:

P(j|t) ~ PN([|W_1t|, ..., |W_nt|]), j ∈ [1, n], t ∈ [1, d]   (3)

Each distribution P(j|t) stores the probability of selecting each row number j of the matrix W given that the t-th column of the matrix W has been selected. Since the matrix W has d columns, d such distributions are constructed. PN denotes a polynomial distribution. In this distribution, taking j = 5 as an example, the specific probability value is calculated as P(j = 5|t) = |W_5t| / Σ_{j=1}^{n} |W_jt|.
Step 1.2.2: for each vector x in xiConstructing a polynomial distribution P (t | x)i)。
P(t|xi)~PN([|xi1s1|,...,|xidsd|]),t∈[1,d] (4)
Wherein s in each termtThe sum of absolute values of the column elements of the convolution kernel matrix calculated in step 1.1. Specifically, for example, t takes a value of 3(3 ∈ [1, d ]]) Has a probability of
Step 1.3: according to the probability distribution P (j | x)i) Sampling for tau times to obtain a convolution kernel vector number each time, and obtaining a convolution kernel vector candidate set V after sampling for tau timespre. Tau represents the sampling times, and the specific value is customized by a technician according to the experimental effect.
According to the probability distribution P (j | x)i) The specific workflow of sampling is shown in fig. 5. Comprises the steps of 1.3.1, 1.3.2 and 1.3.3:
step 1.3.1: according to P (t | x)i) Sampling to obtain a column number t of a convolution kernel matrix Wchosen。
Step 1.3.2: find the tth probability distribution in the set of probability distributions constructed in step 1.2.1chosenA probability distribution P (j | t ═ t)chosen) Sampling from the distribution to obtain a number jchosen,jchosenI.e. the number of the convolution kernel vector obtained in the sampling.
Specifically, for one sample, assume that P (t | x) is first consideredi) If t is extracted as 3 (step 1.3.1), a probability distribution P is found (j | t as 3), and j as 5 is extracted from the probability distribution (step 1.3.2). The finally extracted convolution kernel vector is numbered 5. The significance of this result is that the input vector x is compared to the other non-decimated convolution kernel vectorsiWith the extracted convolution kernel vector w5Doing the inner product may result in a larger eigenvalue.
Step 1.3.3: repeating 1.3.1 and 1.3.2 times, and obtaining a convolution kernel vector candidate set V after extracting for tau timespre。
Step 1.4: vpreAnd screening to obtain a final candidate convolution kernel vector number set V.
According to the set VpreThe specific workflow of the final convolution kernel vector set V is obtained by screening, and as shown in fig. 6, the specific workflow includes step 1.4.1 and step 1.4.2:
step 1.4.1: calculating a result weight of the convolution kernel serial number j obtained by sampling each time and recording the result weight as omegaj,ωj=ωj+sgn(xitWjt)。
Wherein sgn () represents a sign function when xitWjtWhen > 0, sgn (x)itWjt) 1 is ═ 1; when x isitWjtWhen 0, sgn (x)itWjt) 0; when x isitWjtWhen < 0, sgn (x)itWjt)=-1。
In particular, since the objective is to construct a probability distribution P (j | x)i)∝xiwj TThe inner product xiwj TIs divided into positive and negative values, and two probability distributions P (t | x) are constructedi) The sum P (j | t) is proportional to the inner product absolute value size. Weights of sampling results to make inner product negativeThe weight is reduced, and the result weight omega is constructedj。
Step 1.4.2: will be set VpreAccording to their weight ωjAnd sorting, and keeping the theta vectors with the largest weight as a final set V.
Specifically, θ represents the number of candidate convolution kernel vector numbers that are finally retained, and θ is less than or equal to n. When all convolution kernel vectors are selected, θ is n, and the amount of calculation at this time is the same as that of the conventional convolution method. The specific value of theta is customized by a technician according to the experimental effect.
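A minimal sketch of steps 1.4.1–1.4.2, with a hypothetical set of draws (j, t) standing in for the output of step 1.3 (all names and values are illustrative):

```python
import numpy as np

def screen_candidates(x, W, V_pre, theta):
    """Step 1.4: accumulate sign weights omega_j and keep the top-theta numbers."""
    omega = {}
    for j, t in V_pre:                      # each draw carries its column t
        omega[j] = omega.get(j, 0) + np.sign(x[t] * W[j, t])
    ranked = sorted(omega, key=lambda j: -omega[j])
    return ranked[:theta]

x = np.array([1.0, -2.0, 0.5])
W = np.array([[ 1.0,  1.0,  1.0],
              [-1.0,  2.0, -1.0],
              [ 0.5, -0.5,  0.5]])
# Hypothetical draws (j, t) as produced in step 1.3:
V_pre = [(0, 0), (0, 1), (1, 1), (1, 1), (2, 0)]
V = screen_candidates(x, W, V_pre, theta=2)
# keeps j=2 (omega=1) and j=0 (omega=0); j=1 (omega=-2) is dropped
assert V == [2, 0]
```

Draws whose sampled entries x_it·W_jt have negative sign lower the weight of their row number, so kernels drawn mainly through negative contributions are screened out.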
Step 1.5: vector xiOnly the inner product is made with these extracted vectors in the set V, and the result is filled into the position corresponding to the output feature map Y, and the rest positions are set to 0.
The specific flow of step 1 has been described with reference to specific embodiments. The construction and sampling of the polynomial distributions in step 1.2 can be further accelerated with existing methods: for example, using the alias sampling method (Alias Sample), constructing P(t|x_i) for each input vector takes O(d) time and each sample takes O(1) time. The existing convolution method must compute the product of x_i with all n convolution kernel vectors, whereas the invention only needs to compute the inner products of x_i with the θ selected convolution kernel vectors, so the forward propagation time complexity of the improved convolution operation is O(N·θ·d); the forward propagation of the existing convolution method computes the complete matrix multiplication with time complexity O(N·n·d). Since θ ≤ n, the invention achieves acceleration. In addition, because the threshold θ is set by the practitioner according to experimental results, the trade-off between speed and accuracy can be adjusted to actual requirements.
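The alias method mentioned above can be sketched as follows. This is a generic implementation of Vose's variant, not code from the patent: building the table is O(n) preprocessing and each draw is O(1).

```python
import random

def build_alias_table(probs):
    """Vose's alias method: O(n) preprocessing for O(1) categorical sampling."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob, alias = [1.0] * n, [0] * n          # leftovers default to prob 1.0
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l      # s keeps its mass, borrows from l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias

def alias_draw(prob, alias, rng):
    i = rng.randrange(len(prob))              # pick a bucket uniformly
    return i if rng.random() < prob[i] else alias[i]

rng = random.Random(0)
prob, alias = build_alias_table([0.5, 0.3, 0.2])
counts = [0, 0, 0]
for _ in range(100_000):
    counts[alias_draw(prob, alias, rng)] += 1
freqs = [c / 100_000 for c in counts]         # close to [0.5, 0.3, 0.2]
```

With one alias table per distribution P(t|x_i), each of the τ draws in step 1.3 costs constant time.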
Step 2: the back propagation stage only retains the corresponding gradient values of the neurons participating in the calculation during the forward propagation, and the rest gradient values are ignored to be set to 0. And calculating and updating the convolution kernel parameters by using the pruned gradient matrix.
Specifically, in the existing definition, the forward propagation stage of network training is a process of obtaining final output by giving input data and calculating backward layer by the network; in the back propagation stage, the error between the output and the target value is calculated, and the error is propagated from the last layer of the network layer by layer forward. For the convolutional layer, a convolutional kernel gradient matrix and an output gradient matrix are obtained through calculation according to the input gradient matrix in the back propagation stage, wherein the convolutional kernel gradient matrix is used for updating the convolutional kernel tensor of the current layer, and the output gradient matrix is transmitted to the previous layer and serves as the input gradient matrix of the previous layer. The invention does not change the definition of the prior convolution network to the back propagation process, and only reduces the calculation amount of the convolution kernel gradient matrix and the output gradient matrix in the process.
As shown in fig. 7, the specific workflow of the back propagation and update process in the present invention includes step 2.1, step 2.2, step 2.3, and step 2.4:
According to the definition of back-propagation,the same dimension as the output signature Y. According to the procedure of step 1, for each vector X in the input two-dimensional matrix XiThe convolution kernel vector obtained by sampling is only multiplied, and the result is filled in the position corresponding to the output characteristic diagram Y, so that only the positions actually participate in the calculation of forward propagation. Thus only retainingThe gradient values corresponding to these positions are set to 0, and the rest of the gradient values are set to 0, so as to obtain the gradient values
Step 2.2: because the pruned input gradient matrix ∇Y′ retains only the gradient values of the positions that participated in the forward-propagation calculation, multiplying ∇Y′ᵀ by X yields a convolution kernel gradient matrix ∇W in which, for each vector x_i in X, only the convolution kernel vectors sampled in the corresponding forward-propagation stage receive gradient values; the gradient values of the remaining convolution kernel vectors are 0.
Step 2.3: the pruned input gradient matrix ∇Y′ is multiplied by the convolution kernel matrix W to obtain the output gradient matrix ∇X = ∇Y′W.
Step 2.4: from a convolution kernel gradient matrixUpdating the matrix W, outputting the gradient matrixThe input gradient matrix continues to propagate backward as the previous layer in the network.
The update operation is the same as in existing convolutional networks, and different update strategies are supported, for example the stochastic gradient descent rule W ← W − η∇W, where η is the learning rate set by the technician according to existing learning-rate setting strategies.
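Steps 2.1 to 2.4 can be sketched in NumPy as follows. This is a minimal illustration on already-expanded two-dimensional matrices; the function name, the boolean `mask` encoding of which output positions were computed in the forward pass, and the learning-rate value are illustrative assumptions, not part of the claimed method.

```python
import numpy as np

def backward_sampled(grad_Y, X, W, mask, lr=0.01):
    """Sketch of pruned back propagation (steps 2.1-2.4).

    grad_Y : (N, n) input gradient matrix, same shape as output feature map Y
    X      : (N, d) expanded input matrix, one row per patch
    W      : (n, d) expanded convolution kernel matrix, one row per kernel vector
    mask   : (N, n) boolean, True where (i, j) was computed in the forward pass
    """
    # Step 2.1: prune -- keep only gradients of positions used in forward pass.
    grad_Y_pruned = np.where(mask, grad_Y, 0.0)
    # Step 2.2: convolution kernel gradient matrix  grad_W = grad_Y'^T X.
    grad_W = grad_Y_pruned.T @ X          # shape (n, d), same as W
    # Step 2.3: output gradient matrix  grad_X = grad_Y' W, passed backward.
    grad_X = grad_Y_pruned @ W            # shape (N, d), same as X
    # Step 2.4: stochastic-gradient-descent update of the kernel matrix.
    W_new = W - lr * grad_W
    return grad_X, W_new
```

Because the pruned rows of `grad_Y` are zero, the kernel columns that were never sampled for a given patch contribute nothing to `grad_W`, which is exactly the calculation saving described above.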
Step 3: steps 1 and 2 are repeated until the network training stopping condition is reached. The stopping condition is the same as in the existing convolutional network learning process. This completes the learning process of the convolutional network.
According to the technical scheme, the sampling-based convolutional neural network accelerated learning method provided by the embodiments of the application uses the principle of probability sampling to select and update only the more meaningful weight calculations in the network. This reduces the amount of calculation without impairing the network's feature extraction capability, thereby speeding up the construction of the convolutional network, improving the computational efficiency of feature extraction with a convolutional neural network model, and meeting the demand for fast feature extraction in practical applications. Compared with hardware-based convolution acceleration methods, the method is also easier to apply and more cost-effective.
The technical solutions of the present invention have been described with reference to the accompanying drawings and the embodiments, and the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.
Claims (10)
1. A convolutional neural network accelerated learning method based on sampling is characterized by comprising the following steps:
step 1: in the forward propagation stage, an output feature map is obtained by probability sampling calculation;
step 2: in the backward propagation stage, only the gradient values corresponding to the neurons that participated in the calculation during forward propagation are retained, and the remaining gradient values are ignored and set to 0, thereby pruning the gradient matrix; the pruned gradient matrix is used to calculate and update the convolution kernel parameters;
and step 3: repeatedly executing steps 1 and 2 until the network training stopping condition is reached.
2. The sampling-based convolutional neural network accelerated learning method of claim 1, wherein in step 1 the forward propagation stage obtains the output feature map by probability sampling as follows: first, the input feature map and the convolution kernels are expanded into two-dimensional matrices; for each input vector in the expanded two-dimensional matrix, probability sampling is used to obtain a corresponding candidate convolution kernel vector number set V; the input vector is multiplied only with the convolution kernel vectors whose numbers are in the set V, the calculation results are filled into the corresponding positions of the output feature map, and the remaining positions of the output feature map are set to 0.
3. The sample-based convolutional neural network accelerated learning method of claim 2, wherein said step 1 specifically comprises the steps of:
step 1.1: expanding the input feature map into a two-dimensional matrix X according to the dimensions of the convolution kernel, expanding the convolution kernel into a two-dimensional matrix W, converting the convolution operation into the product of the expanded matrices X and W, and calculating the sum of absolute values s_t of the elements in each column t of the matrix W;
step 1.2: according to the column absolute-value sums s_t of the matrix W, constructing a conditional probability distribution P(j|x_i) for each vector x_i in X;
step 1.3: sampling from the probability distribution P(j|x_i), each sample yielding one convolution kernel vector number, the τ samples together forming a candidate convolution kernel vector number set V_pre;
step 1.4: screening the elements of V_pre according to preset conditions, the screened elements forming the final candidate convolution kernel vector number set V;
step 1.5: the vector x_i performs inner products only with the convolution kernel vectors whose numbers are in the set V; the results are filled into the corresponding positions of the output feature map Y, and the remaining positions are set to 0.
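Steps 1.1 to 1.5 can be sketched as follows. This is a minimal NumPy illustration assuming the input feature map and convolution kernels are already expanded into the two-dimensional matrices X and W; the function name `forward_sampled`, the seeded random generator, the storage of (j, t) pairs, and the tie-breaking in the screening step are illustrative choices, not part of the claims.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility (illustrative)

def forward_sampled(X, W, tau, theta):
    """Sketch of forward propagation by sampling (steps 1.1-1.5).

    X: (N, d) expanded input feature map, one row x_i per patch.
    W: (n, d) expanded convolution kernels, one row w_j per kernel vector.
    """
    N, d = X.shape
    n = W.shape[0]
    s = np.abs(W).sum(axis=0)                 # step 1.1: column sums s_t
    Y = np.zeros((N, n))
    for i in range(N):
        p_t = np.abs(X[i] * s)                # step 1.2: P(t|x_i) ~ |x_it s_t|
        p_t = p_t / p_t.sum()
        V_pre = []                            # step 1.3: tau two-stage samples
        for _ in range(tau):
            t = rng.choice(d, p=p_t)          # column number from P(t|x_i)
            col = np.abs(W[:, t])
            j = rng.choice(n, p=col / col.sum())  # row number from P(j|t)
            V_pre.append((j, t))
        omega = np.zeros(n)                   # step 1.4: sign-weight screening
        for j, t in V_pre:
            omega[j] += np.sign(X[i, t] * W[j, t])
        sampled = sorted({j for j, _ in V_pre}, key=lambda j: -omega[j])
        V = sampled[:theta]                   # keep the theta largest weights
        for j in V:                           # step 1.5: inner products for V only
            Y[i, j] = X[i] @ W[j]
    return Y
```

Each row of Y thus has at most θ nonzero entries, and every nonzero entry equals the exact inner product x_i w_j^T of the full convolution.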
4. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein the conditional probability distribution P(j|x_i) gives, for a vector x_i in X, the probability of drawing the number of the j-th convolution kernel vector w_j from among all convolution kernel vectors, this probability being proportional to the inner product x_i w_j^T of the input vector x_i and w_j, as expressed by the following formula:
P(j|x_i) ∝ x_i w_j^T, i ∈ [1, N], j ∈ [1, n] (2)
wherein i is a row number of the input two-dimensional matrix X; j is a row number of the convolution kernel matrix W; N is the number of rows of the input two-dimensional matrix X; and n is the number of rows of the convolution kernel matrix, i.e. the number of convolution kernel vectors in the matrix.
5. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein said step 1.2 comprises:
according to the column absolute-value sums s_t of the matrix W, constructing one multinomial distribution P(j|t) for each column t of the convolution kernel matrix W, and constructing a multinomial distribution P(t|x_i) for each vector x_i in X; and constructing the conditional probability distribution P(j|x_i) according to P(j|x_i) = Σ_{t=1}^{d} P(j|t)P(t|x_i); wherein i is a row number of the input two-dimensional matrix X; j is a row number of the convolution kernel matrix W; t is a column number of the matrix W; d = d_x = d_w, where d_x is the dimension of the vector x and d_w is the dimension of the convolution kernel vector w.
6. The sampling-based convolutional neural network accelerated learning method of claim 5, wherein the multinomial distributions P(j|t) and P(t|x_i) are expressed by equation (3) and equation (4), respectively:
P(j|t) ~ PN([|W_1t|, ..., |W_nt|]), j ∈ [1, n], t ∈ [1, d] (3)
P(t|x_i) ~ PN([|x_i1 s_1|, ..., |x_id s_d|]), t ∈ [1, d] (4)
wherein PN denotes a multinomial distribution; in equation (3), each distribution P(j|t) stores the probabilities of selecting the different row numbers j of the matrix W on the premise that the t-th column of the matrix W has been selected, and d such distributions are constructed because the matrix W has d columns; in equation (4), the distribution P(t|x_i) represents the probabilities with which the input vector x_i selects the different column numbers t of the matrix W.
7. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein the method for obtaining one convolution kernel vector number in each sample in step 1.3 is as follows: sampling a column number t_chosen of the convolution kernel matrix W according to P(t|x_i); finding the t_chosen-th conditional probability distribution P(j|t = t_chosen) and sampling a number j_chosen from this distribution; j_chosen is the convolution kernel vector number obtained in this sample.
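Under the distributions of equations (3) and (4), the two-stage sampling of claim 7 has a closed-form marginal. The small sketch below (the helper name is illustrative) computes P(j|x_i) = Σ_t P(j|t)P(t|x_i) exactly and, for non-negative inputs and kernels, recovers the inner-product proportionality of formula (2).

```python
import numpy as np

def marginal_kernel_distribution(x, W):
    """Exact marginal P(j|x) implied by the two-stage sampling of claims 5-7.

    P(t|x) ~ |x_t s_t| and P(j|t) ~ |W_jt|, so the marginal is proportional
    to sum_t |W_jt||x_t|; for non-negative x and W this equals the inner
    product x w_j^T of formula (2) up to normalization.
    """
    s = np.abs(W).sum(axis=0)                        # column absolute sums s_t
    p_t = np.abs(x * s)
    p_t = p_t / p_t.sum()                            # P(t|x), equation (4)
    p_j_given_t = np.abs(W) / np.abs(W).sum(axis=0)  # P(j|t), equation (3)
    return p_j_given_t @ p_t                         # P(j|x), length n
```

This is why the hierarchical construction can replace direct sampling from formula (2): it never forms the N×n inner-product matrix, yet draws kernel numbers with the intended probabilities.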
8. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein said step 1.4 specifically comprises: first, for the convolution kernel vector number j obtained in each sample, calculating a result weight denoted ω_j; then sorting the elements of the set V_pre according to their weights ω_j and retaining the θ vector numbers with the largest weights as the candidate convolution kernel vector number set V; wherein the weight is accumulated as ω_j = ω_j + sgn(x_it W_jt), sgn() denoting the sign function: sgn(x_it W_jt) = 1 when x_it W_jt > 0, sgn(x_it W_jt) = 0 when x_it W_jt = 0, and sgn(x_it W_jt) = −1 when x_it W_jt < 0.
9. The method of claim 8, wherein θ represents the number of candidate convolutional kernel vector numbers to be finally retained, and θ ≦ n, where n represents the number of matrix rows of the two-dimensional matrix W, i.e., the number of convolutional kernel vectors in the matrix.
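The screening of step 1.4 (claims 8 and 9) can be sketched in isolation as follows; the function name and the (j, t) pair bookkeeping are illustrative assumptions, since the claims describe only the sampled numbers and their accumulated sign weights.

```python
import numpy as np

def screen_candidates(samples, x, W, theta):
    """Keep the theta kernel numbers with the largest sign weights (claim 8).

    samples: list of (j, t) pairs from step 1.3 -- kernel-vector number j
    drawn via column t. Accumulates omega_j += sgn(x_t * W_jt), then sorts
    the sampled numbers by weight and returns the top theta of them.
    """
    omega = {}
    for j, t in samples:
        omega[j] = omega.get(j, 0.0) + np.sign(x[t] * W[j, t])
    ranked = sorted(omega, key=lambda j: omega[j], reverse=True)
    return ranked[:theta]
```

A kernel number drawn repeatedly with consistent signs accumulates a large weight, so the screening favors kernels whose contribution to the inner product is both frequent and sign-consistent.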
10. The sample-based convolutional neural network accelerated learning method of claim 1, wherein said step 2 comprises the steps of:
step 2.1: for the input gradient matrix ∇Y, ignoring the unneeded gradient values by setting them to 0, thereby pruning the input gradient matrix ∇Y to obtain the pruned input gradient matrix ∇Y′.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110136925.9A CN112784969B (en) | 2021-02-01 | Convolutional neural network acceleration learning method for image feature extraction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112784969A true CN112784969A (en) | 2021-05-11 |
CN112784969B CN112784969B (en) | 2024-05-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |