CN112784969A - Convolutional neural network accelerated learning method based on sampling

Info

Publication number: CN112784969A (application CN202110136925.9A; granted publication CN112784969B)
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: matrix, convolution kernel, vector, sampling, convolution
Inventors: 杨晓春, 张宇杰, 许婧楠, 王斌
Applicant and assignee: Northeastern University China
Legal status: Active (granted)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a sampling-based convolutional neural network accelerated learning method, belonging to the technical field of convolutional neural networks. In the forward propagation stage of the method, only a sampled subset of the convolution kernel vectors is multiplied with the input data; the remaining vectors are ignored and not computed. The backward propagation stage likewise updates only the convolution kernel vectors that participated in the forward computation. Compared with existing convolutional network learning methods that compute the complete matrix product, the method therefore effectively reduces the amount of computation in both forward and backward propagation. At the same time, because only the meaningful weights in the network are computed and updated in each iteration, the convergence of the network is accelerated. In practical application, the sampling-based convolutional neural network accelerated learning method requires no adjustment to the macrostructure of the convolutional network, does not affect the network's local feature extraction property, and is easier to apply and cheaper than hardware-based convolution acceleration methods.

Description

Convolutional neural network accelerated learning method based on sampling
Technical Field
The invention belongs to the technical field of convolutional neural networks, and particularly relates to a sampling-based convolutional neural network accelerated learning method.
Background
Convolutional Neural Networks (CNNs) are among the earliest successful deep models. They have long stood at the forefront of commercial applications of deep learning and have attracted wide attention in fields such as image detection and segmentation, object recognition, and speech processing.
The convolution operation is the process of sliding different convolution kernels over an input picture and performing a fixed computation at each position. Specifically, at each sliding position the elements of the convolution kernel are multiplied one-to-one with the covered elements of the input picture and then summed. The result is then passed through an activation function that introduces nonlinearity, most commonly linear rectification, i.e. the ReLU function. This computation principle gives the convolution network its ability to extract local features. The main structure of a convolutional neural network stacks several convolutional layers as a feature extractor and finally attaches a fully-connected layer as a classifier. To give the convolutional network better feature extraction capability, many convolutional layers are stacked, so the parameter scale of the network grows greatly as the network deepens. The computation for forward feature extraction and backward error propagation is generally on the order of millions to hundreds of millions of operations, and the convolution operation of the convolutional layers consumes the largest share of the computational resources. Therefore, accelerating the convolution operation is the key to improving the computational efficiency of convolutional neural network models.
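The sliding-window computation described above can be sketched in a few lines of Python. This is an illustrative toy for a single-channel input and kernel, not code from the patent:

```python
import numpy as np

def conv2d_relu(image, kernel):
    """Slide the kernel over the image; at each position multiply the covered
    patch element-wise with the kernel and sum, then apply ReLU."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            # one-to-one products of the covered patch and the kernel, summed
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return np.maximum(out, 0.0)  # ReLU sets negative feature values to 0
```

The direct form makes the local feature extraction property visible: each output value depends only on the patch the kernel currently covers.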
Commonly used neural network training frameworks, such as Caffe and TensorFlow, unfold the input data and convolution kernels into two-dimensional arrays, thereby converting the convolution operation into a matrix multiplication. During convolutional network learning, each convolutional layer must perform three matrix products in total: one in forward propagation, and one each in backward propagation to compute the output gradient matrix and the gradient matrix of the convolution kernel. This consumes a large amount of computational resources. In fact, not all neural units in a convolution network contribute meaningfully to the computation: units with larger feature values have a larger influence on subsequent network layers, and the commonly used ReLU activation function directly sets negative feature values to 0. It is therefore necessary to design a new convolutional network learning method that computes only the values of the more meaningful neural units in the original output rather than the complete matrix multiplication, and omits the computation of the remaining units, so as to reduce the computational cost and accelerate convolutional network training. Moreover, such a learning method improves the method itself, making it more practical than convolution acceleration methods based on hardware devices.
Disclosure of Invention
In practical feature extraction scenarios, the parameter scale of a convolutional network is large, the computational cost is high, and the computation contains redundancy, so training a convolutional network model to extract features is slow, while acceleration methods based on hardware devices are difficult to apply in practice. Aiming at these problems, the invention provides a sampling-based convolutional neural network accelerated learning method.
The technical scheme of the invention is as follows:
a convolutional neural network accelerated learning method based on sampling comprises the following steps:
step 1: in the forward propagation stage, an output feature map is obtained by probability sampling calculation;
step 2: in the backward propagation stage, only the gradient values corresponding to the neurons that participated in the computation during forward propagation are retained, and the remaining gradient values are ignored and set to 0; the gradient matrix is thereby pruned, and the pruned gradient matrix is used to calculate and update the convolution kernel parameters;
step 3: steps 1 and 2 are executed repeatedly until the network training stopping condition is reached.
Further, according to the convolutional neural network accelerated learning method based on sampling, in step 1, the method for obtaining the output feature map by utilizing probability sampling calculation in the forward propagation stage comprises the following steps: firstly, unfolding an input characteristic diagram and a convolution kernel into a two-dimensional matrix; obtaining a corresponding candidate convolution kernel vector number set V by utilizing probability sampling for each input vector in the expanded two-dimensional matrix; the input vector is only multiplied by the vectors in the set V, the calculation result is filled in the corresponding position of the output characteristic diagram, and the rest positions of the output characteristic diagram are set to be 0.
Further, according to the sampling-based convolutional neural network accelerated learning method, the step 1 specifically comprises the following steps:
step 1.1: expanding the input feature map into a two-dimensional matrix X according to the dimensions of the convolution kernel, expanding the convolution kernel into a two-dimensional matrix W, converting the convolution operation into the product of the expanded two-dimensional matrices X and W, and calculating the sum of absolute values s_t of each column of elements in the matrix W;
step 1.2: according to the column sums of absolute values s_t of the matrix W, constructing a conditional probability distribution P(j|x_i) for each vector x_i in X;
step 1.3: sampling from the probability distribution P(j|x_i) τ times, obtaining one convolution kernel vector number per sample, and after the τ samples obtaining a candidate convolution kernel vector set V_pre;
step 1.4: screening the elements of V_pre according to preset conditions, the screened elements forming the final candidate convolution kernel vector number set V;
step 1.5: the vector x_i performs inner products only with the vectors in the set V, the results are filled into the corresponding positions of the output feature map Y, and the remaining positions are set to 0.
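The forward pass of steps 1.1 to 1.5 amounts to a sampled matrix product. A minimal Python sketch (function and variable names are ours, and the candidate sets are assumed to be given):

```python
import numpy as np

def sampled_forward(X, W, candidate_sets):
    """X: expanded input matrix (N x d); W: expanded kernel matrix (n x d);
    candidate_sets[i]: set V of kernel-vector numbers sampled for row i.
    Each x_i is multiplied only by the kernel vectors in its set V;
    all other positions of the output feature map are set to 0."""
    N, n = X.shape[0], W.shape[0]
    Y = np.zeros((N, n))
    for i, V in enumerate(candidate_sets):
        for j in V:
            Y[i, j] = X[i] @ W[j]  # inner product only for sampled vectors
    return Y
```

With θ-element candidate sets this does N·θ inner products instead of the N·n of the full product X·W^T.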
Further, according to the sampling-based convolutional neural network accelerated learning method, the conditional probability distribution P(j|x_i) means that, for a vector x_i in X, the probability of extracting the number of the j-th convolution kernel vector w_j from among all the convolution kernel vectors is proportional to the absolute value of the inner product x_i·w_j^T of the input vector x_i and w_j, with the formula:

P(j|x_i) ∝ |x_i·w_j^T|, i ∈ [1, N], j ∈ [1, n]   (2)

where i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; N is the number of rows of the input two-dimensional matrix X; and n is the number of rows of the convolution kernel matrix, i.e. the number of convolution kernel vectors in the matrix.
Further, according to the sampling-based convolutional neural network accelerated learning method, the step 1.2 specifically includes: according to the sum of absolute values s_t of each column of elements of the matrix W, constructing a polynomial distribution P(j|t) for each column of the convolution kernel matrix W, and constructing a polynomial distribution P(t|x_i) for each vector x_i in X; then constructing the conditional probability distribution P(j|x_i) according to

P(j|x_i) = Σ_{t=1}^{d} P(t|x_i)·P(j|t) ∝ Σ_{t=1}^{d} |x_it·W_jt|

where i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; t is the column number of the matrix W; d = d_x = d_w, where d_x is the dimension of the vector x and d_w is the dimension of the convolution kernel vector w.
Further, according to the sampling-based convolutional neural network accelerated learning method, the polynomial distributions P(j|t) and P(t|x_i) are expressed by equation (3) and equation (4), respectively:

P(j|t) ~ PN([|W_1t|, ..., |W_nt|]), j ∈ [1, n], t ∈ [1, d]   (3)

P(t|x_i) ~ PN([|x_i1·s_1|, ..., |x_id·s_d|]), t ∈ [1, d]   (4)

where PN denotes a polynomial distribution. In equation (3), each distribution P(j|t) stores the probability of selecting each row number j of the matrix W given that the t-th column of the convolution kernel matrix W has been selected; since the matrix W has d columns, d such distributions are constructed. In equation (4), the distribution P(t|x_i) represents the probability of the input vector x_i selecting each column number t of the matrix W.
Further, according to the sampling-based convolutional neural network accelerated learning method, the method for obtaining one convolution kernel vector number per sample in step 1.3 is as follows: sample from P(t|x_i) to obtain a column number t_chosen of the convolution kernel matrix W; find the t_chosen-th probability distribution P(j|t=t_chosen) among the conditional probability distributions P(j|t), and sample from that distribution to obtain a number j_chosen; j_chosen is the convolution kernel vector number obtained in this sample.
Further, according to the sampling-based convolutional neural network accelerated learning method, the step 1.4 specifically includes: first, for each convolution kernel vector number j obtained by sampling, computing a result weight recorded as ω_j; then sorting the set V_pre according to the weights ω_j and retaining the θ vectors with the largest weights as the candidate convolution kernel vector number set V. Here ω_j is accumulated as ω_j = ω_j + sgn(x_it·W_jt), where sgn() denotes the sign function: sgn(x_it·W_jt) = 1 when x_it·W_jt > 0; sgn(x_it·W_jt) = 0 when x_it·W_jt = 0; and sgn(x_it·W_jt) = -1 when x_it·W_jt < 0.
Further, according to the sampling-based convolutional neural network accelerated learning method, θ denotes the number of candidate convolution kernel vector numbers finally retained, and θ ≤ n, where n denotes the number of rows of the two-dimensional matrix W, i.e. the number of convolution kernel vectors in the matrix.
The sampling-based convolutional neural network accelerated learning method, wherein the step 2 comprises the following steps:
step 2.1: for the input gradient matrix ∂L/∂Y, ignoring the unneeded gradient values by setting them to 0, i.e. pruning the input gradient matrix ∂L/∂Y to obtain the pruned input gradient matrix (∂L/∂Y)′;
step 2.2: calculating the gradient matrix of the convolution kernel, ∂L/∂W = ((∂L/∂Y)′)^T·X;
step 2.3: calculating the output gradient matrix ∂L/∂X = (∂L/∂Y)′·W;
step 2.4: updating the matrix W according to the convolution kernel gradient matrix ∂L/∂W, and passing the output gradient matrix ∂L/∂X backward as the input gradient matrix of the previous layer in the network.
Compared with the prior art, the sampling-based convolutional neural network accelerated learning method provided by the invention has the following beneficial effects. The method uses the probability sampling principle to reduce the amount of computation without affecting the network's feature extraction capability, thereby accelerating the construction of the convolutional network, improving the computational efficiency of feature extraction with a convolutional neural network model, and meeting the need for fast feature extraction in practical applications. Specifically, in the forward propagation stage only a sampled subset of the convolution kernel vectors is multiplied with the input data; the remaining vectors are ignored and not computed. The backward propagation stage likewise updates only the convolution kernel vectors that participated in the forward computation. Compared with existing convolutional network learning methods that compute the complete matrix product, the method therefore effectively reduces the amount of computation in both forward and backward propagation; at the same time, because only the meaningful weights in the network are computed and updated in each iteration, the convergence of the network is accelerated. In practical application the method requires no adjustment to the macrostructure of the convolutional network, does not affect the network's local feature extraction property, and is easier to apply and cheaper than hardware-based convolution acceleration methods.
Drawings
FIG. 1 is a schematic diagram of a sample-based convolutional layer feature extraction process provided by the present invention;
FIG. 2 is a schematic flow chart of a sample-based convolutional neural network accelerated learning method provided in the present invention;
FIG. 3 is a schematic flow chart of forward propagation provided by the present invention;
FIG. 4 is a schematic flow chart of constructing the probability distribution P(j|x_i) provided by the present invention;
FIG. 5 is a schematic diagram of a sampling detailed flow in step 1.3 provided by the present invention;
FIG. 6 is a schematic flow chart of screening the set V_pre to obtain the final convolution kernel vector set V;
FIG. 7 is a flowchart illustrating a back propagation and update process according to the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the embodiments and the accompanying drawings, and it is obvious that the described embodiments are one preferred embodiment of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of a sample-based convolutional layer feature extraction process according to the present invention. When the convolution network is used for feature extraction, a plurality of convolution layers are stacked to improve the feature extraction capability of the network. Taking image feature extraction as an example, the input of each convolution layer is an original picture or a feature map obtained through a previous layer, and the output of each convolution layer is an output feature map obtained through convolution operation. The invention aims to reduce the calculation amount of convolution operation without influencing the characteristic extraction effect of the convolution network and improve the calculation efficiency of characteristic extraction by using a convolution neural network model, so the input and output dimensionality is the same as the definition in the conventional convolution operation. The definitions of the symbolic variables labeled in fig. 1 are shown in table 1.
TABLE 1 Meanings of the symbolic variables referred to in FIG. 1

    in: batch size, i.e. the number of input feature maps
    ih, iw, ic: height, width and channel count of the input feature map
    kn: number of convolution kernels (n = kn)
    kh, kw, kc: height, width and channel count of a convolution kernel
    oh, ow: height and width of the output feature map
    n: number of rows of the expanded kernel matrix W, i.e. the number of convolution kernel vectors
    d: dimension of each expanded vector, d = kh × kw × kc
    N: number of rows of the expanded input matrix X, N = in × oh × ow
An embodiment of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in fig. 2, a schematic flow chart of the sample-based convolutional neural network accelerated learning method provided by the present invention includes steps 1, 2, and 3:
step 1: in the forward propagation stage, the output feature map is obtained by probability sampling calculation.
First, the input feature map and the convolution kernel are unfolded into two-dimensional matrices. For each input vector in the expanded input two-dimensional matrix, probability sampling is performed to obtain a corresponding candidate convolution kernel vector number set V. The input vector is then multiplied only by the vectors in the set V, the computed results are filled into the corresponding positions of the output feature map, and the remaining positions of the output feature map are set to 0.
The specific workflow of the forward propagation stage, as shown in fig. 3, includes step 1.1, step 1.2, step 1.3, step 1.4, and step 1.5:
step 1.1: expand the input feature map into a two-dimensional matrix X according to the dimensions of the convolution kernel, expand the convolution kernel into a two-dimensional matrix W, and calculate the sum of absolute values of each column of elements in the matrix W.
Specifically, as shown in fig. 1, the input data is expanded as follows. According to the definition of the conventional convolution operation, as the convolution kernel slides over the original image, at each sliding position the elements of the convolution kernel are multiplied one-to-one with the covered input feature pixels and then summed; the convolution operation can therefore be converted into a matrix product. As shown in fig. 1, the convolution kernel is a four-dimensional tensor of size kn × kh × kw × kc, which is expanded into a two-dimensional matrix denoted W. The dimension of W is n × d_w: each row of the matrix is a convolution kernel vector w whose dimension is d_w = kh × kw × kc, and n denotes the number of matrix rows, i.e. the number of convolution kernel vectors, so n = kn. Let the input feature map be a four-dimensional tensor of size in × ih × iw × ic; it is expanded according to the dimensions of the convolution kernel into a two-dimensional matrix X of size N × d_x. Each vector x in the matrix X is the region covered by the convolution kernel at one sliding position, so the dimension d_x of the vector x equals the dimension d_w of the convolution kernel vector w; let d_x = d_w = d. N denotes how many times the convolution kernel slides over the input feature map, and by the definition of the convolution operation N = in × oh × ow. The convolution operation is thus converted into the product of the expanded two-dimensional matrices X and W.
The sum of absolute values s_t of the elements of each column of the matrix W is calculated as follows:

s_t = Σ_{j=1}^{n} |W_jt|, t ∈ [1, d]   (1)

where t is the column number of the matrix W, and W_jt is the element in the j-th row and t-th column of the convolution kernel matrix W. Computing this value prepares for the construction of the probability distributions in the subsequent steps.
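As a quick numeric illustration of equation (1) with toy values (the matrix entries are made up, not from the patent), the column sums s_t reduce to one vectorized call:

```python
import numpy as np

# Toy expanded kernel matrix W with n = 2 kernel vectors and d = 3 columns
W = np.array([[1.0, -2.0,  0.5],
              [0.0,  3.0, -1.0]])

# Equation (1): s_t = sum over rows j of |W_jt|, one value per column t
s = np.abs(W).sum(axis=0)
print(s)  # [1.  5.  1.5]
```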
Step 1.2: according to the column sums of absolute values s_t of the matrix W, construct a conditional probability distribution P(j|x_i) for each vector x_i in X:

P(j|x_i) ∝ |x_i·w_j^T|, i ∈ [1, N], j ∈ [1, n]   (2)

where i is the row number of the input two-dimensional matrix X and j is the row number of the convolution kernel matrix W. The meaning of this probability distribution is that, for a vector x_i in X, the probability of extracting the number of the j-th convolution kernel vector w_j from among all convolution kernel vectors is proportional to the absolute value of the inner product x_i·w_j^T of the input vector x_i and w_j.
Specifically, since directly constructing the conditional probability distribution P(j|x_i) ∝ |x_i·w_j^T| is difficult, step 1.2 obtains P(j|x_i) by constructing two polynomial distributions, P(j|t) (step 1.2.1) and P(t|x_i) (step 1.2.2), according to

P(j|x_i) = Σ_{t=1}^{d} P(t|x_i)·P(j|t) ∝ Σ_{t=1}^{d} |x_it·W_jt|

The specific workflow of constructing the conditional probability distribution P(j|x_i), shown in fig. 4, includes step 1.2.1 and step 1.2.2:
Step 1.2.1: construct a polynomial distribution P(j|t) for each column of the convolution kernel matrix W:

P(j|t) ~ PN([|W_1t|, ..., |W_nt|]), j ∈ [1, n], t ∈ [1, d]   (3)

Each distribution P(j|t) stores the probability of selecting each row number j of the matrix W given that the t-th column of the matrix W has been selected; since the matrix W has d columns, d such distributions are constructed. PN denotes a polynomial distribution. In this distribution, taking j = 5 as an example, the specific probability value is calculated as

P(j=5|t) = |W_5t| / Σ_{j=1}^{n} |W_jt| = |W_5t| / s_t

Step 1.2.2: construct a polynomial distribution P(t|x_i) for each vector x_i in X:

P(t|x_i) ~ PN([|x_i1·s_1|, ..., |x_id·s_d|]), t ∈ [1, d]   (4)

where each term s_t is the sum of absolute values of the t-th column of the convolution kernel matrix computed in step 1.1. Specifically, taking t = 3 (3 ∈ [1, d]) as an example, the probability is

P(t=3|x_i) = |x_i3·s_3| / Σ_{t=1}^{d} |x_it·s_t|
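Both families of distributions of equations (3) and (4) can be built with simple normalizations. An illustrative Python sketch (function and variable names are ours):

```python
import numpy as np

def build_distributions(W, x):
    """Return P(j|t) for every column t of W (stored as the columns of a
    matrix, eq. (3)) and P(t|x) for one input vector x (eq. (4))."""
    absW = np.abs(W)              # |W_jt|
    s = absW.sum(axis=0)          # s_t from equation (1)
    P_j_given_t = absW / s        # each column t normalized by s_t
    weights = np.abs(x * s)       # |x_t * s_t|
    P_t_given_x = weights / weights.sum()
    return P_j_given_t, P_t_given_x
```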
Step 1.3: according to the probability distribution P(j|x_i), sample τ times, obtaining one convolution kernel vector number each time; after the τ samples, the convolution kernel vector candidate set V_pre is obtained. τ denotes the number of samples; its specific value is set by the technician according to the experimental effect.
The specific workflow of sampling according to the probability distribution P(j|x_i), shown in fig. 5, includes steps 1.3.1, 1.3.2 and 1.3.3:
Step 1.3.1: sample from P(t|x_i) to obtain a column number t_chosen of the convolution kernel matrix W.
Step 1.3.2: find the t_chosen-th probability distribution P(j|t=t_chosen) in the set of probability distributions constructed in step 1.2.1, and sample from this distribution to obtain a number j_chosen; j_chosen is the convolution kernel vector number obtained in this sample.
Specifically, for one sample, suppose that t = 3 is first drawn from P(t|x_i) (step 1.3.1); the probability distribution P(j|t=3) is then found, and j = 5 is drawn from it (step 1.3.2). The convolution kernel vector number finally extracted is thus 5. The significance of this result is that, compared with the other, non-extracted convolution kernel vectors, the inner product of the input vector x_i with the extracted convolution kernel vector w_5 is likely to yield a larger feature value.
Step 1.3.3: repeat steps 1.3.1 and 1.3.2; after τ extractions, the convolution kernel vector candidate set V_pre is obtained.
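The two-stage draw of steps 1.3.1 to 1.3.3 can be sketched as follows (illustrative Python; we record the (j, t) pair of each draw, an assumption on our part, so that a later screening step can evaluate sgn(x_it·W_jt)):

```python
import numpy as np

def sample_candidates(P_j_given_t, P_t_given_x, tau, rng):
    """Step 1.3.1: draw a column number t from P(t|x_i).
    Step 1.3.2: draw a kernel-vector number j from P(j|t).
    Step 1.3.3: repeat tau times to build the candidate set V_pre."""
    n, d = P_j_given_t.shape
    V_pre = []
    for _ in range(tau):
        t = rng.choice(d, p=P_t_given_x)
        j = rng.choice(n, p=P_j_given_t[:, t])
        V_pre.append((j, t))
    return V_pre
```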
Step 1.4: vpreAnd screening to obtain a final candidate convolution kernel vector number set V.
According to the set VpreThe specific workflow of the final convolution kernel vector set V is obtained by screening, and as shown in fig. 6, the specific workflow includes step 1.4.1 and step 1.4.2:
step 1.4.1: calculating a result weight of the convolution kernel serial number j obtained by sampling each time and recording the result weight as omegaj,ωj=ωj+sgn(xitWjt)。
Wherein sgn () represents a sign function when xitWjtWhen > 0, sgn (x)itWjt) 1 is ═ 1; when x isitWjtWhen 0, sgn (x)itWjt) 0; when x isitWjtWhen < 0, sgn (x)itWjt)=-1。
In particular, since the objective is to construct a probability distribution P (j | x)i)∝xiwj TThe inner product xiwj TIs divided into positive and negative values, and two probability distributions P (t | x) are constructedi) The sum P (j | t) is proportional to the inner product absolute value size. Weights of sampling results to make inner product negativeThe weight is reduced, and the result weight omega is constructedj
Step 1.4.2: will be set VpreAccording to their weight ωjAnd sorting, and keeping the theta vectors with the largest weight as a final set V.
Specifically, θ represents the number of candidate convolution kernel vector numbers that are finally retained, and θ is less than or equal to n. When all convolution kernel vectors are selected, θ is n, and the amount of calculation at this time is the same as that of the conventional convolution method. The specific value of theta is customized by a technician according to the experimental effect.
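The screening of steps 1.4.1 and 1.4.2 can be sketched as below (illustrative Python; we assume V_pre holds the (j, t) pair of each draw, and the function name is ours):

```python
import numpy as np

def screen_candidates(V_pre, x, W, theta):
    """Step 1.4.1: accumulate omega_j = omega_j + sgn(x_t * W_jt) per draw.
    Step 1.4.2: keep the theta kernel-vector numbers with largest weights."""
    omega = {}
    for j, t in V_pre:
        omega[j] = omega.get(j, 0.0) + np.sign(x[t] * W[j, t])
    ranked = sorted(omega, key=omega.get, reverse=True)
    return ranked[:theta]
```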
Step 1.5: the vector x_i performs inner products only with the extracted vectors in the set V; the results are filled into the positions corresponding to the output feature map Y, and the remaining positions are set to 0.
The specific flow of step 1 has been described with reference to specific embodiments. The construction of and sampling from the polynomial distributions in step 1.2 can be further accelerated with existing methods: for example, using the alias sampling method (Alias Sample), constructing P(t|x_i) for each input vector takes O(d) time and each sample takes O(1) time. The existing convolution method must compute the product of x_i with all n convolution kernel vectors, whereas the invention only needs to compute the inner products of x_i with the θ sampled convolution kernel vectors, so the forward propagation time complexity of the improved convolution operation is O(N·θ·d), while the forward propagation of the existing convolution method must compute the complete matrix product with time complexity O(N·n·d). Since θ ≤ n, the invention has a beneficial accelerating effect. In addition, because the threshold θ is set by the technician according to the experimental effect, the balance between speed and accuracy can be adjusted according to actual requirements.
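The alias method mentioned above (Walker/Vose) indeed gives O(d) table construction and O(1) per draw. A generic sketch of that technique, not code from the patent:

```python
import numpy as np

class AliasSampler:
    """Vose's alias method: O(n) table construction, O(1) per sample."""
    def __init__(self, probs, rng=None):
        p = np.asarray(probs, dtype=float)
        n = len(p)
        scaled = p * n / p.sum()          # scale so the average cell is 1
        self.prob = np.ones(n)            # unassigned cells stay at 1
        self.alias = np.zeros(n, dtype=int)
        small = [i for i in range(n) if scaled[i] < 1.0]
        large = [i for i in range(n) if scaled[i] >= 1.0]
        while small and large:
            s, l = small.pop(), large.pop()
            self.prob[s], self.alias[s] = scaled[s], l
            scaled[l] -= 1.0 - scaled[s]  # donate mass to fill cell s
            (small if scaled[l] < 1.0 else large).append(l)
        self.rng = rng or np.random.default_rng()

    def draw(self):
        i = self.rng.integers(len(self.prob))
        return int(i if self.rng.random() < self.prob[i] else self.alias[i])
```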
Step 2: the backward propagation stage retains only the gradient values corresponding to the neurons that participated in the computation during forward propagation; the remaining gradient values are ignored and set to 0. The pruned gradient matrix is then used to calculate and update the convolution kernel parameters.
Specifically, in the existing definition, the forward propagation stage of network training is a process of obtaining final output by giving input data and calculating backward layer by the network; in the back propagation stage, the error between the output and the target value is calculated, and the error is propagated from the last layer of the network layer by layer forward. For the convolutional layer, a convolutional kernel gradient matrix and an output gradient matrix are obtained through calculation according to the input gradient matrix in the back propagation stage, wherein the convolutional kernel gradient matrix is used for updating the convolutional kernel tensor of the current layer, and the output gradient matrix is transmitted to the previous layer and serves as the input gradient matrix of the previous layer. The invention does not change the definition of the prior convolution network to the back propagation process, and only reduces the calculation amount of the convolution kernel gradient matrix and the output gradient matrix in the process.
As shown in fig. 7, the specific workflow of the back propagation and update process in the present invention includes step 2.1, step 2.2, step 2.3, and step 2.4:
Step 2.1: for the input gradient matrix ∂L/∂Y, set the unneeded gradient values to 0 to obtain the pruned input gradient matrix (∂L/∂Y)′. By the definition of back propagation, ∂L/∂Y has the same dimensions as the output feature map Y. According to the procedure of step 1, each vector x_i in the input two-dimensional matrix X is multiplied only with the convolution kernel vectors obtained by sampling, and the results are filled into the corresponding positions of the output feature map Y, so only those positions actually participate in the forward-propagation calculation. Accordingly, (∂L/∂Y)′ retains only the gradient values corresponding to those positions, and the remaining gradient values are set to 0.
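The pruning of step 2.1 can be sketched as follows (a minimal NumPy illustration, not the patent's implementation; the boolean mask recording which output positions were computed during forward propagation is an assumed bookkeeping structure):

```python
import numpy as np

def prune_input_gradient(dY, computed_mask):
    """Keep only the gradient values at output positions that participated
    in the sampled forward pass; set all other gradient values to 0."""
    assert dY.shape == computed_mask.shape
    return np.where(computed_mask, dY, 0.0)

# N = 4 output rows, n = 3 convolution kernel vectors
dY = np.arange(12, dtype=float).reshape(4, 3)
mask = np.zeros((4, 3), dtype=bool)
mask[0, 1] = mask[2, 0] = True   # positions filled during forward propagation
dY_pruned = prune_input_gradient(dY, mask)
```

Only the two masked entries survive; every other gradient value becomes 0 and thus contributes nothing to the subsequent matrix products.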
Step 2.2: calculate the convolution kernel gradient matrix ∂L/∂W = (∂L/∂Y)′ᵀ·X. Because the pruned input gradient matrix (∂L/∂Y)′ retains only the gradient values of the positions involved in the forward-propagation calculation, multiplying (∂L/∂Y)′ᵀ by X means that for each vector x_i in X, gradient values are computed only for the convolution kernel vectors sampled in the corresponding forward-propagation stage; the gradient values of the remaining convolution kernel vectors are 0.
Step 2.3: compute the output gradient matrix ∂L/∂X. Specifically, the pruned input gradient matrix (∂L/∂Y)′ is multiplied by the convolution kernel matrix W to obtain the output gradient matrix ∂L/∂X = (∂L/∂Y)′·W.
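Under the im2col convention of step 1 (Y = X·Wᵀ, with X of shape N×d and W of shape n×d), steps 2.2 and 2.3 reduce to two matrix products of the pruned gradient. A sketch under that assumption, with all names hypothetical:

```python
import numpy as np

def conv_backward_pruned(dY_pruned, X, W):
    """Steps 2.2/2.3 sketch under the convention Y = X @ W.T.
    dY_pruned : (N, n) pruned input gradient matrix
    X         : (N, d) unfolded input feature map
    W         : (n, d) unfolded convolution kernel matrix
    """
    dW = dY_pruned.T @ X   # convolution kernel gradient matrix, shape (n, d)
    dX = dY_pruned @ W     # output gradient matrix, shape (N, d)
    return dW, dX

dY_pruned = np.array([[1.0, 0.0], [0.0, 2.0]])
X = np.array([[1.0, 2.0], [3.0, 4.0]])
W = np.array([[5.0, 6.0], [7.0, 8.0]])
dW, dX = conv_backward_pruned(dY_pruned, X, W)
```

Since (∂L/∂Y)′ is zero outside the sampled positions, row j of dW accumulates contributions only from the input vectors that actually sampled convolution kernel vector j, which is where the computational saving comes from.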
Step 2.4: update the matrix W from the convolution kernel gradient matrix ∂L/∂W; the output gradient matrix ∂L/∂X continues to propagate backward as the input gradient matrix of the previous layer in the network.
The update operation is the same as in an existing convolutional network and supports different update strategies, for example the stochastic gradient descent rule W = W − η·∂L/∂W, where η is the learning rate set by the technician according to an existing learning-rate setting strategy.
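The stochastic gradient descent rule above can be sketched as follows (illustrative values; the learning rate η is a placeholder chosen for the example):

```python
import numpy as np

def sgd_update(W, dW, eta=0.01):
    """Stochastic gradient descent step: W = W - eta * dL/dW."""
    return W - eta * dW

W = np.array([[1.0, 2.0], [3.0, 4.0]])
dW = np.array([[10.0, 0.0], [0.0, 10.0]])
W_new = sgd_update(W, dW, eta=0.1)
```

Any other update strategy (momentum, Adam, etc.) could be substituted here without affecting the sampling scheme, since the pruned gradient matrix ∂L/∂W is an ordinary gradient as far as the optimizer is concerned.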
Step 3: repeat steps 1 and 2 until the network training stopping condition is reached. The stopping condition is the same as in an existing convolutional network learning process. This completes the learning process of the convolutional network.
According to the above technical scheme, the sampling-based convolutional neural network accelerated learning method provided by the embodiments of the application selects and updates only the more meaningful weight calculations in the network through a probability sampling principle. The amount of calculation can be reduced without impairing the network's feature extraction capability, thereby speeding up the construction of the convolutional network, improving the computational efficiency of extracting features with a convolutional neural network model, and meeting the need for rapid feature extraction in practical applications. Compared with hardware-based convolution acceleration methods, the method is also easier to apply and more cost-effective.
The technical solutions of the present invention have been described with reference to the accompanying drawings and the embodiments, and the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (10)

1. A convolutional neural network accelerated learning method based on sampling is characterized by comprising the following steps:
step 1: in the forward propagation stage, an output characteristic diagram is obtained by utilizing probability sampling calculation;
step 2: in the back propagation stage, retaining only the gradient values corresponding to the neurons that participated in the forward-propagation calculation and setting the remaining gradient values to 0, thereby pruning the gradient matrix; and calculating and updating the convolution kernel parameters using the pruned gradient matrix;
and step 3: and (5) repeatedly executing the steps 1 and 2 until the network training stopping condition is reached.
2. The sample-based convolutional neural network accelerated learning method of claim 1, wherein the forward propagation stage uses probabilistic sampling to obtain the output feature map in step 1 by: firstly, unfolding an input characteristic diagram and a convolution kernel into a two-dimensional matrix; obtaining a corresponding candidate convolution kernel vector number set V by utilizing probability sampling for each input vector in the expanded two-dimensional matrix; the input vector is only multiplied by the vectors in the set V, the calculation result is filled in the corresponding position of the output characteristic diagram, and the rest positions of the output characteristic diagram are set to be 0.
3. The sample-based convolutional neural network accelerated learning method of claim 2, wherein said step 1 specifically comprises the steps of:
step 1.1: expanding the input feature map into a two-dimensional matrix X according to the dimensions of the convolution kernel, expanding the convolution kernel into a two-dimensional matrix W, converting the convolution operation into the product of the expanded two-dimensional matrices X and W, and calculating the sum of absolute values s_t of the elements in each column t of the matrix W;
step 1.2: according to the sums of absolute values s_t of the column elements of the matrix W, constructing a conditional probability distribution P(j|x_i) for each vector x_i in X;
step 1.3: sampling according to the probability distribution P(j|x_i), each sample yielding one convolution kernel vector number; sampling τ times to obtain a candidate convolution kernel vector number set V_pre;
Step 1.4: according to preset conditions to VpreScreening the elements in the vector set, and forming a final candidate convolution kernel vector number set V by the screened elements;
step 1.5: the vector x_i performs an inner product only with the vectors in the set V, the results are filled into the corresponding positions of the output feature map Y, and the remaining positions are set to 0.
4. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein the conditional probability distribution P(j|x_i) is, for a vector x_i in X, the probability of selecting the j-th convolution kernel vector w_j from among all convolution kernel vectors, which is proportional to the inner product x_i·w_jᵀ of the input vector x_i and w_j, according to the formula:
P(j|x_i) ∝ x_i·w_jᵀ, i∈[1,N], j∈[1,n] (2)
wherein i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; N is the number of rows of the input two-dimensional matrix X; and n is the number of rows of the convolution kernel matrix, i.e. the number of convolution kernel vectors in the matrix.
5. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein said step 1.2 comprises:
according to the sums of absolute values s_t of the column elements of the matrix W, constructing one polynomial distribution P(j|t) for each column of the convolution kernel matrix W, and constructing a polynomial distribution P(t|x_i) for each vector x_i in X; and constructing the conditional probability distribution P(j|x_i) according to
P(j|x_i) = Σ_{t=1}^{d} P(t|x_i)·P(j|t);
wherein i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; t is the column number of the matrix W; d = d_x = d_w, d_x being the dimension of the vector x and d_w being the dimension of the convolution kernel vector w.
6. The sampling-based convolutional neural network accelerated learning method of claim 5, wherein the polynomial distributions P(j|t) and P(t|x_i) are expressed by equation (3) and equation (4), respectively:
P(j|t) ~ PN([|W_1t|, ..., |W_nt|]), j∈[1,n], t∈[1,d] (3)
P(t|x_i) ~ PN([|x_i1·s_1|, ..., |x_id·s_d|]), t∈[1,d] (4)
wherein PN represents a polynomial distribution; in equation (3), each distribution P(j|t) stores the probabilities of selecting different row numbers j of the matrix W on the premise that the t-th column of the matrix W is selected, and d such distributions are constructed because the matrix W has d columns; in equation (4), the distribution P(t|x_i) represents the probabilities of the input vector x_i selecting different column numbers t of the matrix W.
7. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein the method for obtaining a convolution kernel vector number in each sample in step 1.3 is as follows: sampling according to P(t|x_i) to obtain a column number t_chosen of the convolution kernel matrix W; finding the t_chosen-th probability distribution P(j|t=t_chosen) among the conditional probability distributions P(j|t), and sampling from that distribution to obtain a number j_chosen, which is the convolution kernel vector number obtained in this sample.
8. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein said step 1.4 specifically comprises: first calculating a result weight, denoted ω_j, for the convolution kernel vector number j obtained in each sample; then sorting the elements of the set V_pre according to their weights ω_j and retaining the θ vectors with the largest weights as the candidate convolution kernel vector number set V; wherein ω_j = ω_j + sgn(x_it·W_jt), and sgn() represents the sign function: when x_it·W_jt > 0, sgn(x_it·W_jt) = 1; when x_it·W_jt = 0, sgn(x_it·W_jt) = 0; and when x_it·W_jt < 0, sgn(x_it·W_jt) = −1.
9. The method of claim 8, wherein θ represents the number of candidate convolutional kernel vector numbers to be finally retained, and θ ≦ n, where n represents the number of matrix rows of the two-dimensional matrix W, i.e., the number of convolutional kernel vectors in the matrix.
10. The sampling-based convolutional neural network accelerated learning method of claim 1, wherein said step 2 comprises the steps of:
step 2.1: for the input gradient matrix ∂L/∂Y, setting the unneeded gradient values to 0, i.e. pruning the input gradient matrix ∂L/∂Y to obtain a pruned input gradient matrix (∂L/∂Y)′;
step 2.2: calculating the convolution kernel gradient matrix ∂L/∂W;
step 2.3: calculating the output gradient matrix ∂L/∂X;
step 2.4: updating the matrix W from the convolution kernel gradient matrix ∂L/∂W; the output gradient matrix ∂L/∂X continues to propagate backward as the input gradient matrix of the previous layer in the network.
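The two-stage sampling of claims 5 to 7 and the weight-based screening of claims 8 and 9 can be sketched together as follows (a simplified NumPy illustration under the stated distributions, not the claimed implementation; all variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_candidate_set(x_i, W, tau, theta):
    """For one input vector x_i (shape (d,)) and kernel matrix W (shape (n, d)):
    sample tau convolution kernel vector numbers via P(t|x_i) then P(j|t),
    accumulate sign weights omega_j, and keep the theta highest-weight numbers."""
    n, d = W.shape
    col_abs = np.abs(W)
    s = col_abs.sum(axis=0)          # s_t: sum of absolute values of column t of W
    p_t = np.abs(x_i) * s            # P(t|x_i) ∝ |x_it * s_t|   (equation (4))
    p_t = p_t / p_t.sum()
    p_j_given_t = col_abs / s        # column t holds P(j|t) ∝ |W_jt|  (equation (3))
    omega = {}
    for _ in range(tau):
        t = rng.choice(d, p=p_t)                  # t_chosen
        j = rng.choice(n, p=p_j_given_t[:, t])    # j_chosen
        omega[j] = omega.get(j, 0) + np.sign(x_i[t] * W[j, t])  # claim 8 weight
    ranked = sorted(omega, key=omega.get, reverse=True)
    return ranked[:theta]            # candidate convolution kernel vector numbers V

x = np.array([1.0, -2.0, 0.5])
W = np.array([[1.0, 0.0, 2.0], [-1.0, 3.0, 0.5]])
V = sample_candidate_set(x, W, tau=20, theta=1)
```

In the forward pass, x would then take inner products only with the rows of W whose numbers appear in V, and the remaining positions of the output feature map would be set to 0.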
CN202110136925.9A 2021-02-01 Convolutional neural network acceleration learning method for image feature extraction Active CN112784969B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110136925.9A CN112784969B (en) 2021-02-01 Convolutional neural network acceleration learning method for image feature extraction


Publications (2)

Publication Number Publication Date
CN112784969A true CN112784969A (en) 2021-05-11
CN112784969B CN112784969B (en) 2024-05-14


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871136A (en) * 2017-03-22 2018-04-03 中山大学 The image-recognizing method of convolutional neural networks based on openness random pool
US20180315399A1 (en) * 2017-04-28 2018-11-01 Intel Corporation Instructions and logic to perform floating-point and integer operations for machine learning
US10140421B1 (en) * 2017-05-25 2018-11-27 Enlitic, Inc. Medical scan annotator system
CN109612708A (en) * 2018-12-28 2019-04-12 东北大学 Based on the power transformer on-line detecting system and method for improving convolutional neural networks
US20190114544A1 (en) * 2017-10-16 2019-04-18 Illumina, Inc. Semi-Supervised Learning for Training an Ensemble of Deep Convolutional Neural Networks
CN109948029A (en) * 2019-01-25 2019-06-28 南京邮电大学 Based on the adaptive depth hashing image searching method of neural network
CN111428188A (en) * 2020-03-30 2020-07-17 南京大学 Convolution operation method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUAN WANG et al.: "Structured Probabilistic Pruning for Convolution Neural Network Acceleration", arXiv, 10 September 2018 (2018-09-10), pages 1-13 *
HAN Tao: "Convolutional Neural Network Model Optimization Methods under Resource Constraints", China Master's Theses Full-text Database, Information Science and Technology, 15 February 2019 (2019-02-15), pages 140-91 *

Similar Documents

Publication Publication Date Title
CN110188685B (en) Target counting method and system based on double-attention multi-scale cascade network
CN110020682B (en) Attention mechanism relation comparison network model method based on small sample learning
CN110288030B (en) Image identification method, device and equipment based on lightweight network model
CN106991440B (en) Image classification method of convolutional neural network based on spatial pyramid
Chen et al. A new knowledge distillation for incremental object detection
CN108510012A (en) A kind of target rapid detection method based on Analysis On Multi-scale Features figure
CN111209861A (en) Dynamic gesture action recognition method based on deep learning
CN109740679B (en) Target identification method based on convolutional neural network and naive Bayes
CN110334584B (en) Gesture recognition method based on regional full convolution network
CN111062410B (en) Star information bridge weather prediction method based on deep learning
CN112070768A (en) Anchor-Free based real-time instance segmentation method
CN111368935A (en) SAR time-sensitive target sample augmentation method based on generation countermeasure network
CN113449612A (en) Three-dimensional target point cloud identification method based on sub-flow sparse convolution
CN115797808A (en) Unmanned aerial vehicle inspection defect image identification method, system, device and medium
CN114282646B (en) Optical power prediction method and system based on two-stage feature extraction and BiLSTM improvement
CN109145738B (en) Dynamic video segmentation method based on weighted non-convex regularization and iterative re-constrained low-rank representation
CN114005046A (en) Remote sensing scene classification method based on Gabor filter and covariance pooling
CN114299578A (en) Dynamic human face generation method based on facial emotion analysis
CN112862094A (en) DRBM (distributed resource management protocol) fast adaptation method based on meta-learning
CN112784969A (en) Convolutional neural network accelerated learning method based on sampling
CN116433980A (en) Image classification method, device, equipment and medium of impulse neural network structure
CN111985488A (en) Target detection segmentation method and system based on offline Gaussian model
CN112784969B (en) Convolutional neural network acceleration learning method for image feature extraction
CN116363423A (en) Knowledge distillation method, device and storage medium for small sample learning
Du et al. AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant