CN112784969A - Convolutional neural network accelerated learning method based on sampling - Google Patents
Convolutional neural network accelerated learning method based on sampling
- Publication number: CN112784969A (application CN202110136925.9A)
- Authority: CN (China)
- Prior art keywords: matrix, convolution kernel, vector, sampling, convolution
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N 3/047 — Computing arrangements based on biological models; neural networks; architecture; probabilistic or stochastic networks
- G06N 3/084 — Computing arrangements based on biological models; neural networks; learning methods; backpropagation, e.g. using gradient descent
Abstract
The invention discloses a sampling-based convolutional neural network accelerated learning method, belonging to the technical field of convolutional neural networks. In the method, only a sampled subset of convolution kernel vectors is multiplied with the input data in the forward propagation stage; the remaining vectors are ignored and not computed. The backpropagation stage likewise updates only the convolution kernel vectors that participated in the forward computation. Compared with existing convolutional network learning methods that compute the complete matrix multiplication, the method therefore effectively reduces the amount of computation in both forward and backward propagation; at the same time, because only the meaningful weights in the network are computed and updated in each iteration, the convergence of the network is accelerated. In practical use the method requires no adjustment to the macrostructure of the convolutional network, does not affect its local feature extraction property, and, compared with hardware-based convolution acceleration methods, is easier to apply and less costly.
Description
Technical Field
The invention belongs to the technical field of convolutional neural networks, and particularly relates to a sampling-based convolutional neural network accelerated learning method.
Background
Convolutional Neural Networks (CNNs) are among the first successful deep models. They have remained at the frontier of commercial deep-learning applications and have attracted wide attention in image detection and segmentation, object recognition, speech processing, and other fields.
The convolution operation is the process of sliding different convolution kernels over an input picture and performing a fixed computation at each position. Specifically, at each sliding position, the elements of the convolution kernel are multiplied element-wise with the covered elements of the input picture and then summed. The result is then passed through a nonlinear activation function, most commonly the rectified linear unit (ReLU). This calculation principle gives the convolutional network its ability to extract local features. The main structure of a convolutional neural network stacks several convolutional layers as a feature extractor, followed by a fully-connected layer as a classifier. To give the network better feature extraction capability, many convolutional layers are stacked, so the parameter scale of a convolutional neural network grows greatly with network depth. The computation for forward feature extraction and backward error propagation is generally on the order of millions to hundreds of millions of operations, and the convolution operation of the convolutional layers consumes the bulk of those resources. Accelerating the convolution operation is therefore the key to improving the computational efficiency of a convolutional neural network model.
Commonly used neural network training frameworks, such as Caffe and TensorFlow, expand the input data and the convolution kernels into two-dimensional arrays, thereby converting the convolution operation into a matrix multiplication. In convolutional network learning, each convolutional layer performs three matrix multiplications in total: one during forward propagation, and one each during backpropagation to compute the output gradient matrix and the gradient matrix of the convolution kernel. This consumes a large amount of computational resources. In fact, not all neural units in a convolutional network contribute meaningfully: units with larger feature values have a larger influence on subsequent network layers, and the commonly used ReLU activation sets negative feature values directly to 0. It is therefore worthwhile to design a new convolutional network learning method that computes only the values of the more meaningful neural units in the original output instead of the complete matrix multiplication, and omits the computation of the remaining neural units, thereby reducing computational cost and accelerating convolutional network training. Moreover, such a learning method improves the method implementation itself and is more practical than convolution acceleration methods based on hardware devices.
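As a concrete illustration of this expansion, below is a minimal numpy sketch under simplifying assumptions (single channel, a single kernel, stride 1, no padding); the function name `im2col` and all variable names are illustrative, not from the patent:

```python
import numpy as np

def im2col(x, kh, kw):
    """Expand a single-channel image into rows: one row per kernel position."""
    ih, iw = x.shape
    oh, ow = ih - kh + 1, iw - kw + 1          # valid convolution, stride 1
    cols = np.empty((oh * ow, kh * kw))
    for r in range(oh):
        for c in range(ow):
            cols[r * ow + c] = x[r:r + kh, c:c + kw].ravel()
    return cols

# One 3x3 kernel over a 4x4 image: the convolution equals the matrix
# product of the expanded input X (N x d) with the flattened kernel W (n x d).
img = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3))
X = im2col(img, 3, 3)                  # shape (4, 9)
W = kernel.reshape(1, -1)              # shape (1, 9)
Y = X @ W.T                            # shape (4, 1): the convolution output

# Reference: direct sliding-window computation
ref = np.array([[(img[r:r + 3, c:c + 3] * kernel).sum() for c in range(2)]
                for r in range(2)]).reshape(-1, 1)
assert np.allclose(Y, ref)
```

This is exactly the conversion the frameworks perform, after which forward propagation is a single matrix product.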
Disclosure of Invention
In practical feature extraction scenarios, the parameter scale of a convolutional network is large, the computational cost is high, and the computation process contains redundancy, so training a convolutional network model to extract features is slow, while acceleration methods based on hardware devices are difficult to apply in practice. To address these problems, the invention provides a sampling-based convolutional neural network accelerated learning method.
The technical scheme of the invention is as follows:
a convolutional neural network accelerated learning method based on sampling comprises the following steps:
step 1: in the forward propagation stage, an output characteristic diagram is obtained by utilizing probability sampling calculation;
step 2: in the backpropagation stage, only the gradient values corresponding to the neurons that participated in the forward-propagation calculation are retained; the remaining gradient values are ignored and set to 0, yielding a pruned gradient matrix, and the pruned gradient matrix is used to compute and update the convolution kernel parameters;
and step 3: and (5) repeatedly executing the steps 1 and 2 until the network training stopping condition is reached.
Further, according to the sampling-based convolutional neural network accelerated learning method, in step 1, the method for obtaining the output feature map by probability sampling in the forward propagation stage is as follows: first, expand the input feature map and the convolution kernel into two-dimensional matrices; for each input vector in the expanded matrix, obtain a corresponding candidate convolution kernel vector number set V by probability sampling; the input vector is multiplied only with the vectors numbered in V, the calculation results fill the corresponding positions of the output feature map, and the remaining positions of the output feature map are set to 0.
Further, according to the sampling-based convolutional neural network accelerated learning method, the step 1 specifically comprises the following steps:
step 1.1: expand the input feature map into a two-dimensional matrix X according to the dimensions of the convolution kernel, expand the convolution kernel into a two-dimensional matrix W, convert the convolution operation into the product of the expanded matrices X and W, and compute the absolute value sum s_t of each column of elements in the matrix W;
step 1.2: according to the column absolute value sums s_t of the matrix W, construct a conditional probability distribution P(j|x_i) for each vector x_i in X;
step 1.3: sample τ times from the probability distribution P(j|x_i), obtaining one convolution kernel vector number per sample; the τ samples form a convolution kernel vector candidate set V_pre;
step 1.4: screen the elements of V_pre according to preset conditions; the retained elements form the final candidate convolution kernel vector number set V;
step 1.5: the vector x_i takes inner products only with the vectors numbered in V; the results fill the corresponding positions of the output feature map, and the remaining positions of the output feature map are set to 0.
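The claimed forward pass can be sketched end to end for a single input row. This is an illustrative numpy sketch, not the patent's implementation: it samples with `numpy.random.Generator.choice` rather than an O(1) alias table, and all names and parameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_forward_row(x, W, tau, theta):
    """Approximate y = x @ W.T by sampling candidate kernel rows (steps 1.1-1.5)."""
    n, d = W.shape
    s = np.abs(W).sum(axis=0)                       # column absolute sums s_t
    p_t = np.abs(x) * s                             # P(t|x) ~ |x_t * s_t|
    p_t = p_t / p_t.sum()
    omega = np.zeros(n)
    candidates = set()
    for _ in range(tau):                            # step 1.3: two-stage sampling
        t = rng.choice(d, p=p_t)
        col = np.abs(W[:, t])
        j = rng.choice(n, p=col / col.sum())        # P(j|t) ~ |W_jt|
        omega[j] += np.sign(x[t] * W[j, t])         # step 1.4: result weights
        candidates.add(j)
    V = sorted(candidates, key=lambda j: -omega[j])[:theta]
    y = np.zeros(n)                                 # step 1.5: partial inner products
    for j in V:
        y[j] = x @ W[j]
    return y

x = rng.standard_normal(9)
W = rng.standard_normal((8, 9))
y = sampled_forward_row(x, W, tau=32, theta=4)
assert np.count_nonzero(y) <= 4                     # only theta outputs computed
```

Every nonzero entry of `y` equals the exact inner product for that kernel row; the rest are set to 0 rather than computed.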
Further, according to the sampling-based convolutional neural network accelerated learning method, the conditional probability distribution P(j|x_i) means that, for a vector x_i in X, the probability of drawing the number of the j-th convolution kernel vector w_j from among all convolution kernel vector numbers is proportional to the absolute value of the inner product of x_i and w_j:

P(j|x_i) ∝ |x_i·w_j^T|, i ∈ [1, N], j ∈ [1, n]   (2)

where i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; N is the number of rows of the input two-dimensional matrix X; and n is the number of rows of the convolution kernel matrix, i.e. the number of convolution kernel vectors in the matrix.
Further, according to the sampling-based convolutional neural network accelerated learning method, step 1.2 specifically includes: according to the column absolute value sums s_t of the matrix W, construct a polynomial distribution P(j|t) for each column of the convolution kernel matrix W, and construct a polynomial distribution P(t|x_i) for each vector x_i in X; the conditional probability distribution P(j|x_i) is then obtained by combining the two, P(j|x_i) = Σ_{t=1}^{d} P(t|x_i)·P(j|t). Here i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; t is the column number of the matrix W; and d = d_x = d_w, where d_x is the dimension of the vector x and d_w is the dimension of the convolution kernel vector w.
Further, according to the sampling-based convolutional neural network accelerated learning method, the polynomial distributions P(j|t) and P(t|x_i) are expressed by equation (3) and equation (4), respectively:

P(j|t) ~ PN([|W_1t|, ..., |W_nt|]), j ∈ [1, n], t ∈ [1, d]   (3)

P(t|x_i) ~ PN([|x_i1·s_1|, ..., |x_id·s_d|]), t ∈ [1, d]   (4)

where PN denotes a polynomial distribution. In equation (3), each distribution P(j|t) stores the probability of selecting each row number j of the matrix W given that the t-th column of the convolution kernel matrix W has been selected; since the matrix W has d columns, d such distributions are constructed. In equation (4), the distribution P(t|x_i) stores, for the input vector x_i, the probability of selecting each column number t of the matrix W.
Further, according to the sampling-based convolutional neural network accelerated learning method, the method for obtaining one convolution kernel vector number per sample in step 1.3 is as follows: sample from P(t|x_i) to obtain a column number t_chosen of the convolution kernel matrix W; find the t_chosen-th distribution P(j|t = t_chosen) among the conditional probability distributions P(j|t), and sample from it to obtain a number j_chosen; j_chosen is the convolution kernel vector number obtained in this sample.
Further, according to the sampling-based convolutional neural network accelerated learning method, step 1.4 specifically includes: first, compute a result weight, denoted ω_j, for the convolution kernel vector number j obtained in each sample; then sort the elements of the set V_pre by their weights ω_j and retain the θ vector numbers with the largest weights as the candidate convolution kernel vector number set V. Here ω_j is accumulated as ω_j = ω_j + sgn(x_it·W_jt), where sgn() denotes the sign function: sgn(x_it·W_jt) = 1 when x_it·W_jt > 0; sgn(x_it·W_jt) = 0 when x_it·W_jt = 0; and sgn(x_it·W_jt) = -1 when x_it·W_jt < 0.
Further, according to the sampling-based convolutional neural network accelerated learning method, θ denotes the number of candidate convolution kernel vector numbers finally retained, and θ ≤ n, where n is the number of rows of the two-dimensional matrix W, i.e. the number of convolution kernel vectors in the matrix.
The convolutional neural network accelerated learning method based on sampling, wherein the step 2 comprises the following steps:
step 2.1: for the input gradient matrix ∂L/∂Y, ignore the unneeded gradient values by setting them to 0, pruning ∂L/∂Y to obtain the pruned input gradient matrix ∂L/∂Y';
step 2.2: multiply the transpose of ∂L/∂Y' with the input matrix X to obtain the convolution kernel gradient matrix ∂L/∂W;
step 2.3: multiply ∂L/∂Y' with the convolution kernel matrix W to obtain the output gradient matrix ∂L/∂X;
step 2.4: update the matrix W according to the convolution kernel gradient matrix ∂L/∂W, and pass the output gradient matrix ∂L/∂X backward as the input gradient matrix of the previous layer in the network.
Compared with the prior art, the sampling-based convolutional neural network accelerated learning method provided by the invention has the following beneficial effects. The method uses the principle of probability sampling to reduce the amount of computation without impairing the network's feature extraction capability, thereby speeding up the construction of the convolutional network, improving the computational efficiency of feature extraction with a convolutional neural network model, and meeting the need for fast feature extraction in practical applications. Specifically, in the forward propagation stage only a sampled subset of convolution kernel vectors is multiplied with the input data, and the remaining vectors are ignored and not computed. The backpropagation stage likewise updates only the convolution kernel vectors that participated in the forward computation. Compared with existing convolutional network learning methods that compute the complete matrix multiplication, the method therefore effectively reduces the computation in both forward and backward propagation; at the same time, because only the meaningful weights in the network are computed and updated in each iteration, the convergence of the network is accelerated. In practical use the method requires no change to the macrostructure of the convolutional network, does not affect its local feature extraction property, and, compared with hardware-based convolution acceleration methods, is easier to apply and less costly.
Drawings
FIG. 1 is a schematic diagram of a sample-based convolutional layer feature extraction process provided by the present invention;
FIG. 2 is a schematic flow chart of a sample-based convolutional neural network accelerated learning method provided in the present invention;
FIG. 3 is a schematic flow chart of forward propagation provided by the present invention;
FIG. 4 is a block diagram of the constructed probability distribution P (j | x) provided by the present inventioni) A schematic flow diagram of (a);
FIG. 5 is a schematic diagram of a sampling detailed flow in step 1.3 provided by the present invention;
FIG. 6 is a graph according to set VpreScreening to obtain a flow schematic diagram of a final convolution kernel vector set V;
FIG. 7 is a flowchart illustrating a back propagation and update process according to the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the embodiments and the accompanying drawings, and it is obvious that the described embodiments are one preferred embodiment of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of the sampling-based convolutional layer feature extraction process according to the present invention. When a convolutional network is used for feature extraction, multiple convolutional layers are stacked to improve the network's feature extraction capability. Taking image feature extraction as an example, the input of each convolutional layer is the original picture or the feature map produced by the previous layer, and its output is the feature map produced by the convolution operation. The invention aims to reduce the computation of the convolution operation and improve the computational efficiency of feature extraction with a convolutional neural network model without affecting the feature extraction quality of the convolutional network, so the input and output dimensions are the same as defined in the conventional convolution operation. The definitions of the symbolic variables labeled in fig. 1 are given in table 1.
TABLE 1 meaning table of symbolic variables referred to in FIG. 1
An embodiment of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in fig. 2, a schematic flow chart of the sample-based convolutional neural network accelerated learning method provided by the present invention includes steps 1, 2, and 3:
step 1: and in the forward propagation stage, the probability sampling calculation is utilized to obtain an output characteristic diagram.
Firstly, the input feature map and the convolution kernel are expanded into two-dimensional matrices. For each input vector in the expanded input matrix, probability sampling yields a corresponding candidate convolution kernel vector number set V. The input vector is multiplied only with the vectors numbered in V; the calculation results fill the corresponding positions of the output feature map, and the remaining positions of the output feature map are set to 0.
The specific workflow of the forward propagation stage, as shown in fig. 3, includes step 1.1, step 1.2, step 1.3, step 1.4, and step 1.5:
step 1.1: expand the input feature map into a two-dimensional matrix X according to the dimensions of the convolution kernel, expand the convolution kernel into a two-dimensional matrix W, and compute the absolute value sum of each column of elements in the matrix W.
Specifically, as shown in fig. 1, the input data are expanded as follows. According to the definition of the conventional convolution operation, as the convolution kernel slides over the original image, at each sliding position the elements of the convolution kernel are multiplied element-wise with the covered input feature pixels and then summed; the convolution operation can therefore be converted into a matrix product. As shown in fig. 1, the convolution kernel is a four-dimensional tensor of size kn × kh × kw × kc, which is expanded into a two-dimensional matrix denoted W. The dimension of W is n × d_w: each row of the matrix represents one convolution kernel vector w, whose dimension is d_w = kh × kw × kc, and n is the number of matrix rows, i.e. the number of convolution kernel vectors, so n = kn. Let the input feature map be a four-dimensional tensor of size in × ih × iw × ic; it is expanded according to the dimensions of the convolution kernel into a two-dimensional matrix X of size N × d_x. Each vector x in the matrix X is the region covered by the convolution kernel at one sliding position over the original image, so the dimension d_x of the vector x equals the dimension d_w of the convolution kernel vector w; let d_x = d_w = d. N denotes the number of positions the convolution kernel slides over the input feature map; by the definition of the convolution operation, N = in × oh × ow. The convolution operation is thus converted into the product of the expanded two-dimensional matrices X and W.
The absolute value sum s_t of the elements of each column of the matrix W is calculated as:

s_t = Σ_{j=1}^{n} |W_jt|, t ∈ [1, d]   (1)

where t is the column number of the matrix W and W_jt is the element in the j-th row and t-th column of the convolution kernel matrix W. Computing this value prepares for constructing the probability distributions in the subsequent steps.
Step 1.2: according to the sum of absolute values s of each column element of the matrix WtFor each vector X in XiConstructing a conditional probability distribution P (j | x)i)。
P(j|xi)∝xiwj T,i∈[1,N],j∈[1,n] (2)
Wherein i is a row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W. The meaning of the probability distribution is to a vector X in XiExtracting the vector number from all the convolution kernel vectors to the jth convolution kernel vector wjIs proportional to the input vector xiAnd wjInner product x ofiwj TThe absolute value of (a).
Specifically, since the conditional probability distribution P (j | x) is directly constructedi)∝xiwj TIs difficult, therefore according to Step 1.2 by constructing two polynomial distributions P (j | t) (step 1.2.1) and P (t | x)i) (step 1.2.2) to obtain P (j | x)i)。
The specific workflow for constructing the conditional probability distribution P(j|x_i), shown in fig. 4, includes step 1.2.1 and step 1.2.2:
step 1.2.1: a polynomial distribution P(j|t) is constructed for each column of the convolution kernel matrix W:

P(j|t) ~ PN([|W_1t|, ..., |W_nt|]), j ∈ [1, n], t ∈ [1, d]   (3)

Each distribution P(j|t) stores the probability of selecting each row number j of the matrix W given that the t-th column of the matrix W has been selected. Since the matrix W has d columns, d such distributions are constructed. PN denotes a polynomial distribution. In this distribution, taking j = 5 as an example, the specific probability value is calculated as P(j = 5|t) = |W_5t| / Σ_{j=1}^{n} |W_jt|.
Step 1.2.2: for each vector x in xiConstructing a polynomial distribution P (t | x)i)。
P(t|xi)~PN([|xi1s1|,...,|xidsd|]),t∈[1,d] (4)
Wherein s in each termtThe sum of absolute values of the column elements of the convolution kernel matrix calculated in step 1.1. Specifically, for example, t takes a value of 3(3 ∈ [1, d ]]) Has a probability of
Step 1.3: according to the probability distribution P (j | x)i) Sampling for tau times to obtain a convolution kernel vector number each time, and obtaining a convolution kernel vector candidate set V after sampling for tau timespre. Tau represents the sampling times, and the specific value is customized by a technician according to the experimental effect.
According to the probability distribution P (j | x)i) The specific workflow of sampling is shown in fig. 5. Comprises the steps of 1.3.1, 1.3.2 and 1.3.3:
step 1.3.1: according to P (t | x)i) Sampling to obtain a column number t of a convolution kernel matrix Wchosen。
Step 1.3.2: find the tth probability distribution in the set of probability distributions constructed in step 1.2.1chosenA probability distribution P (j | t ═ t)chosen) Sampling from the distribution to obtain a number jchosen,jchosenI.e. the number of the convolution kernel vector obtained in the sampling.
Specifically, for one sample, assume that P (t | x) is first consideredi) If t is extracted as 3 (step 1.3.1), a probability distribution P is found (j | t as 3), and j as 5 is extracted from the probability distribution (step 1.3.2). The finally extracted convolution kernel vector is numbered 5. The significance of this result is that the input vector x is compared to the other non-decimated convolution kernel vectorsiWith the extracted convolution kernel vector w5Doing the inner product may result in a larger eigenvalue.
Step 1.3.3: repeating 1.3.1 and 1.3.2 times, and obtaining a convolution kernel vector candidate set V after extracting for tau timespre。
Step 1.4: vpreAnd screening to obtain a final candidate convolution kernel vector number set V.
According to the set VpreThe specific workflow of the final convolution kernel vector set V is obtained by screening, and as shown in fig. 6, the specific workflow includes step 1.4.1 and step 1.4.2:
step 1.4.1: calculating a result weight of the convolution kernel serial number j obtained by sampling each time and recording the result weight as omegaj,ωj=ωj+sgn(xitWjt)。
Wherein sgn () represents a sign function when xitWjtWhen > 0, sgn (x)itWjt) 1 is ═ 1; when x isitWjtWhen 0, sgn (x)itWjt) 0; when x isitWjtWhen < 0, sgn (x)itWjt)=-1。
In particular, since the objective is to construct a probability distribution P (j | x)i)∝xiwj TThe inner product xiwj TIs divided into positive and negative values, and two probability distributions P (t | x) are constructedi) The sum P (j | t) is proportional to the inner product absolute value size. Weights of sampling results to make inner product negativeThe weight is reduced, and the result weight omega is constructedj。
Step 1.4.2: will be set VpreAccording to their weight ωjAnd sorting, and keeping the theta vectors with the largest weight as a final set V.
Specifically, θ represents the number of candidate convolution kernel vector numbers that are finally retained, and θ is less than or equal to n. When all convolution kernel vectors are selected, θ is n, and the amount of calculation at this time is the same as that of the conventional convolution method. The specific value of theta is customized by a technician according to the experimental effect.
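A minimal sketch of steps 1.4.1–1.4.2, with a hypothetical set of draws (j, t) standing in for the output of step 1.3 (all names and values are illustrative):

```python
import numpy as np

def screen_candidates(x, W, V_pre, theta):
    """Step 1.4: accumulate sign weights omega_j and keep the top-theta numbers."""
    omega = {}
    for j, t in V_pre:                      # each draw carries its column t
        omega[j] = omega.get(j, 0) + np.sign(x[t] * W[j, t])
    ranked = sorted(omega, key=lambda j: -omega[j])
    return ranked[:theta]

x = np.array([1.0, -2.0, 0.5])
W = np.array([[ 1.0,  1.0,  1.0],
              [-1.0,  2.0, -1.0],
              [ 0.5, -0.5,  0.5]])
# Hypothetical draws (j, t) as produced in step 1.3:
V_pre = [(0, 0), (0, 1), (1, 1), (1, 1), (2, 0)]
V = screen_candidates(x, W, V_pre, theta=2)
# keeps j=2 (omega=1) and j=0 (omega=0); j=1 (omega=-2) is dropped
assert V == [2, 0]
```

Draws whose sampled entries x_it·W_jt have negative sign lower the weight of their row number, so kernels drawn mainly through negative contributions are screened out.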
Step 1.5: vector xiOnly the inner product is made with these extracted vectors in the set V, and the result is filled into the position corresponding to the output feature map Y, and the rest positions are set to 0.
The specific flow of step 1 has been described with reference to specific embodiments. The construction and sampling of the polynomial distributions in step 1.2 can be further accelerated with existing methods: for example, using the alias sampling method (Alias Sample), constructing P(t|x_i) for each input vector takes O(d) time and each sample takes O(1) time. The existing convolution method must compute the product of x_i with all n convolution kernel vectors, whereas the invention only needs to compute the inner products of x_i with the θ selected convolution kernel vectors, so the forward propagation time complexity of the improved convolution operation is O(N·θ·d); the forward propagation of the existing convolution method computes the complete matrix multiplication with time complexity O(N·n·d). Since θ ≤ n, the invention achieves acceleration. In addition, because the threshold θ is set by the practitioner according to experimental results, the trade-off between speed and accuracy can be adjusted to actual requirements.
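The alias method mentioned above can be sketched as follows. This is a generic implementation of Vose's variant, not code from the patent: building the table is O(n) preprocessing and each draw is O(1).

```python
import random

def build_alias_table(probs):
    """Vose's alias method: O(n) preprocessing for O(1) categorical sampling."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob, alias = [1.0] * n, [0] * n          # leftovers default to prob 1.0
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l      # s keeps its mass, borrows from l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias

def alias_draw(prob, alias, rng):
    i = rng.randrange(len(prob))              # pick a bucket uniformly
    return i if rng.random() < prob[i] else alias[i]

rng = random.Random(0)
prob, alias = build_alias_table([0.5, 0.3, 0.2])
counts = [0, 0, 0]
for _ in range(100_000):
    counts[alias_draw(prob, alias, rng)] += 1
freqs = [c / 100_000 for c in counts]         # close to [0.5, 0.3, 0.2]
```

With one alias table per distribution P(t|x_i), each of the τ draws in step 1.3 costs constant time.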
Step 2: the back propagation stage only retains the corresponding gradient values of the neurons participating in the calculation during the forward propagation, and the rest gradient values are ignored to be set to 0. And calculating and updating the convolution kernel parameters by using the pruned gradient matrix.
Specifically, in the existing definition, the forward propagation stage of network training is a process of obtaining final output by giving input data and calculating backward layer by the network; in the back propagation stage, the error between the output and the target value is calculated, and the error is propagated from the last layer of the network layer by layer forward. For the convolutional layer, a convolutional kernel gradient matrix and an output gradient matrix are obtained through calculation according to the input gradient matrix in the back propagation stage, wherein the convolutional kernel gradient matrix is used for updating the convolutional kernel tensor of the current layer, and the output gradient matrix is transmitted to the previous layer and serves as the input gradient matrix of the previous layer. The invention does not change the definition of the prior convolution network to the back propagation process, and only reduces the calculation amount of the convolution kernel gradient matrix and the output gradient matrix in the process.
As shown in fig. 7, the specific workflow of the back propagation and update process in the present invention includes step 2.1, step 2.2, step 2.3, and step 2.4:
According to the definition of back-propagation,the same dimension as the output signature Y. According to the procedure of step 1, for each vector X in the input two-dimensional matrix XiThe convolution kernel vector obtained by sampling is only multiplied, and the result is filled in the position corresponding to the output characteristic diagram Y, so that only the positions actually participate in the calculation of forward propagation. Thus only retainingThe gradient values corresponding to these positions are set to 0, and the rest of the gradient values are set to 0, so as to obtain the gradient values
Step 2.2: because the pruned input gradient matrix ∇Y′ retains only the gradient values of the positions that participated in the forward-propagation calculation, multiplying ∇Y′ᵀ by X yields a convolution kernel gradient matrix ∇W in which, for each vector x_i in X, only the convolution kernel vectors sampled in the corresponding forward-propagation stage receive gradient values; the gradient values of the remaining convolution kernel vectors are 0.
Step 2.3: the pruned input gradient matrix ∇Y′ is multiplied by the convolution kernel matrix W to obtain the output gradient matrix ∇X = ∇Y′W.
Step 2.4: from a convolution kernel gradient matrixUpdating the matrix W, outputting the gradient matrixThe input gradient matrix continues to propagate backward as the previous layer in the network.
The update operation is the same as in existing convolutional networks, and different update strategies are supported, for example the stochastic gradient descent rule W ← W − η∇W, where η is the learning rate set by the technician according to existing learning-rate setting strategies.
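Steps 2.1 to 2.4 can be sketched in NumPy as follows. This is a minimal illustration on already-expanded two-dimensional matrices; the function name, the boolean `mask` encoding of which output positions were computed in the forward pass, and the learning-rate value are illustrative assumptions, not part of the claimed method.

```python
import numpy as np

def backward_sampled(grad_Y, X, W, mask, lr=0.01):
    """Sketch of pruned back propagation (steps 2.1-2.4).

    grad_Y : (N, n) input gradient matrix, same shape as output feature map Y
    X      : (N, d) expanded input matrix, one row per patch
    W      : (n, d) expanded convolution kernel matrix, one row per kernel vector
    mask   : (N, n) boolean, True where (i, j) was computed in the forward pass
    """
    # Step 2.1: prune -- keep only gradients of positions used in forward pass.
    grad_Y_pruned = np.where(mask, grad_Y, 0.0)
    # Step 2.2: convolution kernel gradient matrix  grad_W = grad_Y'^T X.
    grad_W = grad_Y_pruned.T @ X          # shape (n, d), same as W
    # Step 2.3: output gradient matrix  grad_X = grad_Y' W, passed backward.
    grad_X = grad_Y_pruned @ W            # shape (N, d), same as X
    # Step 2.4: stochastic-gradient-descent update of the kernel matrix.
    W_new = W - lr * grad_W
    return grad_X, W_new
```

Because the pruned rows of `grad_Y` are zero, the kernel columns that were never sampled for a given patch contribute nothing to `grad_W`, which is exactly the calculation saving described above.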
Step 3: steps 1 and 2 are repeated until the network training stopping condition is reached. The stopping condition is the same as in the existing convolutional network learning process. This completes the learning process of the convolutional network.
According to the technical scheme, the sampling-based convolutional neural network accelerated learning method provided by the embodiments of the application uses the principle of probability sampling to select and update only the more meaningful weight calculations in the network. This reduces the amount of calculation without impairing the network's feature extraction capability, thereby speeding up the construction of the convolutional network, improving the computational efficiency of feature extraction with a convolutional neural network model, and meeting the demand for fast feature extraction in practical applications. Compared with hardware-based convolution acceleration methods, the method is also easier to apply and more cost-effective.
The technical solutions of the present invention have been described with reference to the accompanying drawings and the embodiments, and the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.
Claims (10)
1. A convolutional neural network accelerated learning method based on sampling is characterized by comprising the following steps:
step 1: in the forward propagation stage, an output feature map is obtained by probability sampling calculation;
step 2: in the backward propagation stage, only the gradient values corresponding to the neurons that participated in the calculation during forward propagation are retained, and the remaining gradient values are ignored and set to 0, thereby pruning the gradient matrix; the pruned gradient matrix is used to calculate and update the convolution kernel parameters;
and step 3: repeatedly executing steps 1 and 2 until the network training stopping condition is reached.
2. The sampling-based convolutional neural network accelerated learning method of claim 1, wherein in step 1 the forward propagation stage obtains the output feature map by probability sampling as follows: first, the input feature map and the convolution kernels are expanded into two-dimensional matrices; for each input vector in the expanded two-dimensional matrix, probability sampling is used to obtain a corresponding candidate convolution kernel vector number set V; the input vector is multiplied only with the convolution kernel vectors whose numbers are in the set V, the calculation results are filled into the corresponding positions of the output feature map, and the remaining positions of the output feature map are set to 0.
3. The sample-based convolutional neural network accelerated learning method of claim 2, wherein said step 1 specifically comprises the steps of:
step 1.1: expanding the input feature map into a two-dimensional matrix X according to the dimensions of the convolution kernel, expanding the convolution kernel into a two-dimensional matrix W, converting the convolution operation into the product of the expanded matrices X and W, and calculating the sum of absolute values s_t of the elements in each column t of the matrix W;
step 1.2: according to the column absolute-value sums s_t of the matrix W, constructing a conditional probability distribution P(j|x_i) for each vector x_i in X;
step 1.3: sampling from the probability distribution P(j|x_i), each sample yielding one convolution kernel vector number, the τ samples together forming a candidate convolution kernel vector number set V_pre;
step 1.4: screening the elements of V_pre according to preset conditions, the screened elements forming the final candidate convolution kernel vector number set V;
step 1.5: the vector x_i performs inner products only with the convolution kernel vectors whose numbers are in the set V; the results are filled into the corresponding positions of the output feature map Y, and the remaining positions are set to 0.
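Steps 1.1 to 1.5 can be sketched as follows. This is a minimal NumPy illustration assuming the input feature map and convolution kernels are already expanded into the two-dimensional matrices X and W; the function name `forward_sampled`, the seeded random generator, the storage of (j, t) pairs, and the tie-breaking in the screening step are illustrative choices, not part of the claims.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility (illustrative)

def forward_sampled(X, W, tau, theta):
    """Sketch of forward propagation by sampling (steps 1.1-1.5).

    X: (N, d) expanded input feature map, one row x_i per patch.
    W: (n, d) expanded convolution kernels, one row w_j per kernel vector.
    """
    N, d = X.shape
    n = W.shape[0]
    s = np.abs(W).sum(axis=0)                 # step 1.1: column sums s_t
    Y = np.zeros((N, n))
    for i in range(N):
        p_t = np.abs(X[i] * s)                # step 1.2: P(t|x_i) ~ |x_it s_t|
        p_t = p_t / p_t.sum()
        V_pre = []                            # step 1.3: tau two-stage samples
        for _ in range(tau):
            t = rng.choice(d, p=p_t)          # column number from P(t|x_i)
            col = np.abs(W[:, t])
            j = rng.choice(n, p=col / col.sum())  # row number from P(j|t)
            V_pre.append((j, t))
        omega = np.zeros(n)                   # step 1.4: sign-weight screening
        for j, t in V_pre:
            omega[j] += np.sign(X[i, t] * W[j, t])
        sampled = sorted({j for j, _ in V_pre}, key=lambda j: -omega[j])
        V = sampled[:theta]                   # keep the theta largest weights
        for j in V:                           # step 1.5: inner products for V only
            Y[i, j] = X[i] @ W[j]
    return Y
```

Each row of Y thus has at most θ nonzero entries, and every nonzero entry equals the exact inner product x_i w_j^T of the full convolution.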
4. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein the conditional probability distribution P(j|x_i) gives, for a vector x_i in X, the probability of drawing the number of the j-th convolution kernel vector w_j from among all convolution kernel vectors, this probability being proportional to the inner product x_i w_j^T of the input vector x_i and w_j, as expressed by the following formula:
P(j|x_i) ∝ x_i w_j^T, i ∈ [1, N], j ∈ [1, n] (2)
wherein i is a row number of the input two-dimensional matrix X; j is a row number of the convolution kernel matrix W; N is the number of rows of the input two-dimensional matrix X; and n is the number of rows of the convolution kernel matrix, i.e. the number of convolution kernel vectors in the matrix.
5. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein said step 1.2 comprises:
according to the column absolute-value sums s_t of the matrix W, constructing one multinomial distribution P(j|t) for each column t of the convolution kernel matrix W, and constructing a multinomial distribution P(t|x_i) for each vector x_i in X; and constructing the conditional probability distribution P(j|x_i) according to P(j|x_i) = Σ_{t=1}^{d} P(j|t)P(t|x_i); wherein i is a row number of the input two-dimensional matrix X; j is a row number of the convolution kernel matrix W; t is a column number of the matrix W; d = d_x = d_w, where d_x is the dimension of the vector x and d_w is the dimension of the convolution kernel vector w.
6. The sampling-based convolutional neural network accelerated learning method of claim 5, wherein the multinomial distributions P(j|t) and P(t|x_i) are expressed by equation (3) and equation (4), respectively:
P(j|t) ~ PN([|W_1t|, ..., |W_nt|]), j ∈ [1, n], t ∈ [1, d] (3)
P(t|x_i) ~ PN([|x_i1 s_1|, ..., |x_id s_d|]), t ∈ [1, d] (4)
wherein PN denotes a multinomial distribution; in equation (3), each distribution P(j|t) stores the probabilities of selecting the different row numbers j of the matrix W on the premise that the t-th column of the matrix W has been selected, and d such distributions are constructed because the matrix W has d columns; in equation (4), the distribution P(t|x_i) represents the probabilities with which the input vector x_i selects the different column numbers t of the matrix W.
7. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein the method for obtaining one convolution kernel vector number in each sample in step 1.3 is as follows: sampling a column number t_chosen of the convolution kernel matrix W according to P(t|x_i); finding the t_chosen-th conditional probability distribution P(j|t = t_chosen) and sampling a number j_chosen from this distribution; j_chosen is the convolution kernel vector number obtained in this sample.
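Under the distributions of equations (3) and (4), the two-stage sampling of claim 7 has a closed-form marginal. The small sketch below (the helper name is illustrative) computes P(j|x_i) = Σ_t P(j|t)P(t|x_i) exactly and, for non-negative inputs and kernels, recovers the inner-product proportionality of formula (2).

```python
import numpy as np

def marginal_kernel_distribution(x, W):
    """Exact marginal P(j|x) implied by the two-stage sampling of claims 5-7.

    P(t|x) ~ |x_t s_t| and P(j|t) ~ |W_jt|, so the marginal is proportional
    to sum_t |W_jt||x_t|; for non-negative x and W this equals the inner
    product x w_j^T of formula (2) up to normalization.
    """
    s = np.abs(W).sum(axis=0)                        # column absolute sums s_t
    p_t = np.abs(x * s)
    p_t = p_t / p_t.sum()                            # P(t|x), equation (4)
    p_j_given_t = np.abs(W) / np.abs(W).sum(axis=0)  # P(j|t), equation (3)
    return p_j_given_t @ p_t                         # P(j|x), length n
```

This is why the hierarchical construction can replace direct sampling from formula (2): it never forms the N×n inner-product matrix, yet draws kernel numbers with the intended probabilities.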
8. The sampling-based convolutional neural network accelerated learning method of claim 3, wherein said step 1.4 specifically comprises: first, for the convolution kernel vector number j obtained in each sample, calculating a result weight denoted ω_j; then sorting the elements of the set V_pre according to their weights ω_j and retaining the θ vector numbers with the largest weights as the candidate convolution kernel vector number set V; wherein the weight is accumulated as ω_j = ω_j + sgn(x_it W_jt), sgn() denoting the sign function: sgn(x_it W_jt) = 1 when x_it W_jt > 0, sgn(x_it W_jt) = 0 when x_it W_jt = 0, and sgn(x_it W_jt) = −1 when x_it W_jt < 0.
9. The method of claim 8, wherein θ represents the number of candidate convolutional kernel vector numbers to be finally retained, and θ ≦ n, where n represents the number of matrix rows of the two-dimensional matrix W, i.e., the number of convolutional kernel vectors in the matrix.
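The screening of step 1.4 (claims 8 and 9) can be sketched in isolation as follows; the function name and the (j, t) pair bookkeeping are illustrative assumptions, since the claims describe only the sampled numbers and their accumulated sign weights.

```python
import numpy as np

def screen_candidates(samples, x, W, theta):
    """Keep the theta kernel numbers with the largest sign weights (claim 8).

    samples: list of (j, t) pairs from step 1.3 -- kernel-vector number j
    drawn via column t. Accumulates omega_j += sgn(x_t * W_jt), then sorts
    the sampled numbers by weight and returns the top theta of them.
    """
    omega = {}
    for j, t in samples:
        omega[j] = omega.get(j, 0.0) + np.sign(x[t] * W[j, t])
    ranked = sorted(omega, key=lambda j: omega[j], reverse=True)
    return ranked[:theta]
```

A kernel number drawn repeatedly with consistent signs accumulates a large weight, so the screening favors kernels whose contribution to the inner product is both frequent and sign-consistent.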
10. The sample-based convolutional neural network accelerated learning method of claim 1, wherein said step 2 comprises the steps of:
step 2.1: for the input gradient matrix ∇Y, ignoring the unneeded gradient values by setting them to 0, thereby pruning the input gradient matrix ∇Y to obtain the pruned input gradient matrix ∇Y′.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110136925.9A CN112784969B (en) | 2021-02-01 | Convolutional neural network acceleration learning method for image feature extraction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112784969A true CN112784969A (en) | 2021-05-11 |
CN112784969B CN112784969B (en) | 2024-05-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |