CN109858618B - Convolutional neural unit block, neural network formed by convolutional neural unit block and image classification method - Google Patents

Convolutional neural unit block, neural network formed by convolutional neural unit block and image classification method

Info

Publication number
CN109858618B
CN109858618B
Authority
CN
China
Prior art keywords
convolution
convolution kernel
convolutional neural
kernel
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910170897.5A
Other languages
Chinese (zh)
Other versions
CN109858618A (en)
Inventor
李帅
朱策
张铁
郑龙飞
高艳博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201910170897.5A priority Critical patent/CN109858618B/en
Publication of CN109858618A publication Critical patent/CN109858618A/en
Application granted granted Critical
Publication of CN109858618B publication Critical patent/CN109858618B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural unit block, a neural network formed from it, and an image classification method. The convolutional neural unit block comprises n convolution kernels in different directions and one m × m convolution kernel, which are stacked and then followed by a 1 × 1 convolution kernels; it also includes a skip connection with identity transform, where a equals the number of channels of the input feature map. Decomposing the convolution kernel reduces the original parameter count while preserving a large receptive field; the method adopts diagonal convolution to directly capture the correlation of the original feature map in the diagonal direction, enhancing adaptability to spatial transformations.

Description

Convolutional neural unit block, neural network formed by convolutional neural unit block and image classification method
Technical Field
The invention relates to the technical field of construction and application of a neural network, in particular to a convolutional neural unit block, a constructed neural network and an image classification method.
Background
In the field of computer vision, the convolutional neural network is a common tool. Convolutional neural networks are generally formed by stacking successive convolutional layers, activation layers, and pooling layers. A convolutional layer is composed of a number of convolutional neural units; it receives the feature map output by the previous layer, computes with its own convolutional neural units, and outputs a feature map whose channel count equals the number of convolutional neural units in the layer. An activation layer is composed of a rectified linear unit; generally one activation layer contains only one rectified linear unit and applies a nonlinear mapping to the feature map output by the previous layer. A pooling layer likewise applies a pooling mapping to the feature map output by the previous layer.
Existing neural networks generally fall into two kinds. The first is the Resnet network: Resnet is a network framework proposed in 2016, which adds identity-mapping skip connections between the originally directly stacked convolutional layers, so that the network only needs to fit the residual of the original mapping; before this, networks implemented nonlinear transformations by stacking convolutional, activation, and pooling layers. The second is the third version of Inception: in Inception v3, part of the structure decomposes a 3 × 3 convolution kernel into a stack of successive 3 × 1 and 1 × 3 convolution kernels. This part of the structure is similar to the structure of the present invention.
However, because existing convolutional neural networks have many layers, their model parameters are huge. To reduce the number of parameters, convolutional neural networks mainly adopt 3 × 3 and 1 × 1 convolution kernels, which makes the direct receptive field of a convolutional layer small. If convolution kernels with a large receptive field are adopted instead, the parameter count and the computation increase, and the adaptability to spatial transformations remains poor.
Disclosure of Invention
The invention provides a convolutional neural unit block capable of reducing the number of parameters, a neural network formed by the convolutional neural unit block and an image classification method.
The technical scheme adopted by the invention is as follows: a convolutional neural unit block comprises n convolution kernels in different directions and one m × m convolution kernel, which are stacked and then followed by a 1 × 1 convolution kernels; the block also includes a skip connection with identity transform, where a equals the number of channels of the input feature map.
Further, the convolution kernels in different directions include a b × 1 left-diagonal convolution kernel and a 1 × b right-diagonal convolution kernel; the left-diagonal convolution kernel is equivalent to a b × b convolution kernel in which all positions except the left diagonal are kept at 0, and the right-diagonal convolution kernel is equivalent to a b × b convolution kernel in which all positions except the right diagonal are kept at 0.
Further, each of the n convolution kernels in different directions includes a stack of directional convolutions in two perpendicular directions.
A neural network adopting the convolutional neural unit blocks comprises, connected in sequence: an m × m convolution kernel, c convolutional neural unit blocks, a convolution kernel with stride d, e convolutional neural unit blocks, a residual block with stride d, e convolutional neural unit blocks, a pooling layer with stride f and receptive field f × f, and a fully connected layer.
An image classification method using a neural network, comprising the steps of:
step 1: constructing a neural network;
step 2: training the neural network constructed in the step 1;
Step 3: perform data enhancement on the test-set pictures and input them into the neural network trained in step 2 to complete the classification of the pictures.
The invention has the beneficial effects that:
(1) the convolution kernel is decomposed, so that the number of original parameters is reduced while the convolution kernel is ensured to have a large receptive field;
(2) by adopting diagonal convolution, the correlation of the original feature map in the diagonal direction is obtained directly, enhancing the adaptability to spatial transformation;
(3) applied to image classification, the method achieves higher accuracy while reducing the number of parameters that must be stored.
Drawings
FIG. 1 is a diagonal convolution kernel employed in the present invention, a being a left diagonal convolution kernel and b being a right diagonal convolution kernel.
Fig. 2 is a block structure of a convolutional neural unit employed in an embodiment of the present invention.
FIG. 3 is another convolutional neural unit block structure employed in an embodiment of the present invention.
FIG. 4 is a further convolutional neural unit block structure employed in an embodiment of the present invention.
FIG. 5 is a convolutional neural unit block structure, improved from the resnet bottleneck, employed in an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
A convolutional neural unit block comprises n convolution kernels in different directions and one m × m convolution kernel, which are stacked and then followed by a 1 × 1 convolution kernels; the block also includes a skip connection with identity transform, where a equals the number of channels of the input feature map.
The convolution kernels in different directions include a b × 1 left-diagonal convolution kernel and a 1 × b right-diagonal convolution kernel; the left-diagonal convolution kernel is equivalent to a b × b convolution kernel in which all positions except the left diagonal are kept at 0, and the right-diagonal convolution kernel is equivalent to a b × b convolution kernel in which all positions except the right diagonal are kept at 0.
Each of the n convolution kernels in different directions comprises a stack of directional convolutions in two perpendicular directions.
This convolutional neural unit differs from a common convolutional neural unit: the receptive field of a common unit is generally square, whereas the unit proposed by the invention is a diagonal convolution unit; as shown in fig. 1, a is the left-diagonal convolution kernel and b is the right-diagonal convolution kernel. Such a kernel can extract the correlation of the previous layer's feature map in the diagonal direction. One diagonal convolutional neural unit occupies g × 1 × h parameters, where g is the size of a convolution kernel and h is the number of convolution kernels; this matches the parameter count of a normal g × 1 × h or 1 × g × h convolution kernel. In fig. 1, a is a 5 × 1 left-diagonal convolution kernel; it operates in the same manner as a normal convolution kernel and is equivalent to a 5 × 5 normal convolution kernel in which all positions except the left diagonal are kept at 0, and b is analogous.
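The diagonal kernels described above can be sketched in a few lines of numpy. This is an illustrative reconstruction, not code from the patent (the parameter vector `p` and the helper names are hypothetical); it shows that a b × b diagonal kernel stores only b free parameters, the same cost as a b × 1 or 1 × b kernel:

```python
import numpy as np

def left_diag_kernel(params):
    """Embed b free parameters on the main (left) diagonal of a b x b kernel;
    every off-diagonal position is fixed at 0, as in fig. 1(a)."""
    return np.diag(params)

def right_diag_kernel(params):
    """Embed b free parameters on the anti- (right) diagonal of a b x b kernel."""
    return np.fliplr(np.diag(params))

p = np.array([1., 2., 3., 4., 5.])   # g = 5 free parameters
L = left_diag_kernel(p)
R = right_diag_kernel(p)
# Only 5 parameters are stored although the receptive field is 5 x 5.
assert L.shape == (5, 5) and np.count_nonzero(L) == 5
assert R.shape == (5, 5) and np.count_nonzero(R) == 5
assert L[0, 0] == 1 and L[4, 4] == 5   # left diagonal: top-left to bottom-right
assert R[0, 4] == 1 and R[4, 0] == 5   # right diagonal: top-right to bottom-left
```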
FIG. 2 shows one configuration of the convolutional neural unit block of the invention, an improvement made on one block of the original resnet. The original resnet block stacks two 3 × 3 convolution kernels plus a skip connection with identity transform. In fig. 2 the first layer is instead replaced by four large-scale convolution kernels in different directions plus one 3 × 3 convolution kernel, so that correlations in different directions can be obtained, while the large-scale kernels enlarge the receptive field and gather more information. The branch outputs are then stacked together, a 1 × 1 convolution kernel compresses the dimension to obtain a feature map with the same dimension as the input, and this is added to the input. To reduce the number of convolution-kernel parameters, the number of convolution kernels in each of the 5 branches of the first layer equals one half of the number of input channels; after stacking, the number of 1 × 1 convolution kernels equals the number of channels of the input feature map, ensuring that the input and output feature maps have the same channel count and can be added directly. Since the two 3 × 3 convolution kernels of the original resnet block correspond to a 5 × 5 receptive field, the scale of the four directional kernels of the first layer is set to 5 × 1, and the remaining branch is a single 3 × 3 convolution kernel.
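As a check on the channel bookkeeping of the fig. 2 block, the numpy sketch below runs one block forward with random weights. It is an illustrative reconstruction only: the function names, the 'same'-padding choice, and the exact four directions (left-diagonal, right-diagonal, vertical, horizontal) are assumptions, not taken from the patent. With n input channels, the five half-width branches stack to 2.5n channels, the 1 × 1 convolution restores n, and the identity skip connection can then be added:

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 'same'-padded 2-D correlation. x: (H, W, Cin), w: (kh, kw, Cin, Cout)."""
    kh, kw, cin, cout = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, kh - 1 - ph), (pw, kw - 1 - pw), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, cout))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.tensordot(xp[i:i + kh, j:j + kw], w,
                                     axes=([0, 1, 2], [0, 1, 2]))
    return out

def fig2_block(x, rng):
    """One forward pass of the fig. 2 unit block with random weights."""
    n = x.shape[-1]
    half = n // 2                       # each branch uses n/2 kernels
    # left- and right-diagonal 5x5 kernels: zeros off the diagonal
    ldiag = np.zeros((5, 5, n, half))
    rdiag = np.zeros((5, 5, n, half))
    for i in range(5):
        ldiag[i, i] = rng.standard_normal((n, half))
        rdiag[i, 4 - i] = rng.standard_normal((n, half))
    branches = [
        conv2d_same(x, ldiag),
        conv2d_same(x, rdiag),
        conv2d_same(x, rng.standard_normal((5, 1, n, half))),  # vertical
        conv2d_same(x, rng.standard_normal((1, 5, n, half))),  # horizontal
        conv2d_same(x, rng.standard_normal((3, 3, n, half))),  # plain 3x3
    ]
    y = np.concatenate(branches, axis=-1)                      # 2.5n channels
    y = conv2d_same(y, rng.standard_normal((1, 1, y.shape[-1], n)))  # back to n
    return x + y                                               # identity skip

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
y = fig2_block(x, rng)
assert y.shape == x.shape   # channel counts line up, so the skip add works
```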
FIG. 3 shows another convolutional neural unit block structure, proposed on the basis of FIG. 2: the last branch of the first layer is expanded into a stack of two 3 × 3 convolution kernels. This structure better extracts other forms of correlation in the feature map; the parameter settings are the same as in FIG. 2.
FIG. 4 shows another convolutional neural unit block proposed by the invention, designed to better extract correlations in the four different directions by stacking convolution kernels in different directions. The first layer splits into five branches: four branches in different directions, and one 3 × 3 convolution kernel to extract spatial correlation beyond the straight directions. Each directional branch becomes a stack of convolutions in two perpendicular directions so as to extract information from different angles: the first branch is a stack of a 1 × 3 horizontal convolution kernel and a 3 × 1 vertical convolution kernel; the second branch is a stack of a 3 × 1 vertical convolution kernel and a 1 × 3 horizontal convolution kernel; the third branch is a stack of a 1 × 3 right-diagonal convolution kernel and a 3 × 1 left-diagonal convolution kernel; the fourth branch is a stack of a 1 × 3 left-diagonal convolution kernel and a 3 × 1 right-diagonal convolution kernel. Since the first layer uses 3 × 1 and 1 × 3 convolution kernels and the second layer is a 3 × 3 convolution kernel, the same receptive field as before is ensured. In parameter count, this structure matches the convolutional neural unit shown in FIG. 2: the number of channels of the first-layer convolution kernels is set to half that of the input, and the second layer's output channel count is set to match the input feature map, making it convenient to add the input feature map to the result.
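The receptive-field claims above can be checked with the standard formula for stacked stride-1 convolutions, r = 1 + Σ(kᵢ − 1); the helper below is a generic illustration, not from the patent:

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field (along one axis) of stacked stride-1 convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)

# Two 3-tap layers (e.g. a 3x1 branch followed by the 3x3 layer, measured
# along one axis) see 5 pixels, matching the 5x5 receptive field of the
# original two stacked 3x3 resnet kernels.
assert stacked_receptive_field([3, 3]) == 5
assert stacked_receptive_field([3, 3, 3]) == 7
```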
FIG. 5 shows an improvement on the bottleneck of the resnet structure, which originally stacks three layers of 1 × 1, 3 × 3, and 1 × 1 convolution kernels and adds a skip connection from the input directly to the output. The architecture of fig. 5 optimizes the middle layer: the 3 × 3 convolution kernel is decomposed into four directional convolution kernels plus one 3 × 3 convolution, so as to obtain feature correlations beyond the straight directions. To reduce the number of parameters, the number of convolution kernels in the first and second layers is halved overall, further lowering the parameter count.
A neural network is designed from the four neural unit blocks above; it comprises, connected in sequence: an m × m convolution kernel, c convolutional neural unit blocks, a convolution kernel with stride d, e convolutional neural unit blocks, a residual block with stride d, e convolutional neural unit blocks, a pooling layer with stride f and receptive field f × f, and a fully connected layer.
The neural network designed by the invention can be used for image classification, and comprises the following steps:
Step 1: construct a neural network; the network can be built with Python using TensorFlow or Keras;
step 2: training the neural network constructed in the step 1;
Step 3: perform data enhancement on the test-set pictures and input them into the neural network trained in step 2 to complete the classification of the pictures.
The convolutional neural unit block designed by the invention can replace any resnet block. Taking resnet32 as an example, image classification proceeds as follows:
s1: after the image is input, a feature map with 16 channels and the same size as the original image is obtained by a layer of convolution kernel with 16 channels and 3 multiplied by 3 receptive field; the parameter number is consistent with the content network parameter number, and the channel number is set to be 16 through experience.
S2: the feature map passes through 10 blocks as shown in fig. 2, yielding a feature map with 16 channels and unchanged scale.
S3: instead of a pooling layer, a convolution kernel with stride 2 achieves the same purpose as pooling; meanwhile, to retain sufficient feature information, the number of channels of the convolution kernel is doubled to 32. This step does not use the convolutional neural unit designed by the invention; it uses the same residual block as resnet32.
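The stride-2 convolution halves the spatial size just as a pooling layer would. Under the usual padding assumption for a 3 × 3 kernel (pad = 1, which the patent does not state explicitly), the output size works out as follows; the helper name is illustrative:

```python
def conv_out_size(size, k=3, stride=2, pad=1):
    """Spatial output size of a convolution: floor((size + 2*pad - k) / stride) + 1."""
    return (size + 2 * pad - k) // stride + 1

assert conv_out_size(32) == 16   # a 32x32 cifar10 map is halved to 16x16
assert conv_out_size(16) == 8    # and halved again by the next stride-2 block
```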
S4: a feature map with 32 channels is obtained by passing through 9 convolutional neural unit blocks as shown in fig. 2.
S5: a resnet residual block with stride 2 is applied, while the number of channels is doubled again to 64.
S6: a feature map with 64 channels is obtained by passing through 9 convolutional neural unit blocks as shown in fig. 2.
S7: a pooling layer with stride 8 and an 8 × 8 receptive field is applied.
S8: the fully connected layer then completes the classification prediction.
The parameter count of this convolutional neural network is reduced by about one eighteenth compared with the conventional resnet32, while the classification effect is similar to that of resnet32. For example, comparing the convolutional neural block of fig. 2 with a resnet block, and assuming the number of input channels is n, one resnet block requires 2 × 3 × 3 × n = 18n parameters; the first layer of the convolutional neural block of fig. 2 requires 4 × 5 × n × 0.5 + 3 × 3 × n × 0.5 = 14.5n parameters, and the second layer requires 1 × 1 × 2.5n = 2.5n, 17n in total. Compared with the original resnet block, the parameter count is thus reduced by one eighteenth.
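The per-input-channel accounting above can be replayed in a few lines. The function names are illustrative, and the counts follow the text's own bookkeeping (which omits the common factor of output channels from both sides of the comparison):

```python
def resnet_block_params(n):
    """Two stacked 3x3 kernels, counted per input channel as in the text."""
    return 2 * 3 * 3 * n

def fig2_block_params(n):
    first = 4 * 5 * n * 0.5 + 3 * 3 * n * 0.5  # four 5-tap directional branches + one 3x3, half width
    second = 1 * 1 * 2.5 * n                   # 1x1 kernels over the 2.5n stacked channels
    return first + second

n = 16
assert resnet_block_params(n) == 18 * n
assert fig2_block_params(n) == 17 * n   # 14.5n + 2.5n, i.e. 17/18 of the resnet block
```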
The convolutional neural network obtained by this design is used for image classification. An experiment is performed with the cifar10 data set, using stochastic gradient descent (SGD) as the training strategy: the initial learning rate is 0.1 and training runs for 250 rounds; at round 81 the learning rate is changed to 0.01, at round 121 to 0.001, and at round 181 to 0.0001. During training a momentum parameter of 0.9 is set. The loss function is the cross-entropy loss. Meanwhile, to reduce overfitting, data enhancement is applied to the cifar10 pictures, including random horizontal flipping and small horizontal and vertical translations. Finally, 93.09% accuracy is obtained on the cifar10 data set, 0.5% higher than the accuracy of resnet32, while the number of parameters that must be stored is reduced.
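The step schedule described above can be written as a simple piecewise function; this is a sketch, assuming 1-based round numbering:

```python
def learning_rate(epoch):
    """Learning rate for a given training round under the described step schedule."""
    if epoch < 81:
        return 0.1
    if epoch < 121:
        return 0.01
    if epoch < 181:
        return 0.001
    return 0.0001

assert learning_rate(1) == 0.1
assert learning_rate(81) == 0.01     # changed at round 81
assert learning_rate(121) == 0.001   # changed at round 121
assert learning_rate(250) == 0.0001  # after round 181, through round 250
```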
The diagonal convolution proposed by the invention directly obtains the correlation of the original feature map in the diagonal direction, and the neural network built from the designed convolutional neural unit blocks has improved adaptability to spatial transformations. Square convolution kernels can be decomposed into convolution kernels in different directions, reducing the number of parameters while preserving a large receptive field.

Claims (4)

1. An image classification method based on a convolutional neural unit block, characterized in that the convolutional neural unit block comprises n convolution kernels in different directions and one m × m convolution kernel, which are stacked and then followed by a 1 × 1 convolution kernels; the block also comprises a skip connection with identity transform, where a equals the number of channels of the input feature map; the convolutional neural unit block replaces the unit block in a classification network for image classification; the convolution kernels in different directions comprise a b × 1 left-diagonal convolution kernel and a 1 × b right-diagonal convolution kernel; the left-diagonal convolution kernel is equivalent to a b × b convolution kernel in which all positions except the left diagonal are kept at 0, and the right-diagonal convolution kernel is equivalent to a b × b convolution kernel in which all positions except the right diagonal are kept at 0.
2. The method of claim 1, wherein each of the n convolution kernels in different directions comprises a stack of convolutions in two perpendicular directions.
3. The image classification method based on the convolutional neural unit blocks as claimed in claim 1, wherein the neural network comprises, connected in sequence: an m × m convolution kernel, c convolutional neural unit blocks, a convolution kernel with stride d, e convolutional neural unit blocks, a residual block with stride d, e convolutional neural unit blocks, a pooling layer with stride f and receptive field f × f, and a fully connected layer.
4. The method for classifying neural network images based on convolutional neural unit blocks as claimed in claim 3, comprising the steps of:
step 1: constructing a neural network;
step 2: training the neural network constructed in the step 1;
Step 3: perform data enhancement on the test-set pictures and input them into the neural network trained in step 2 to complete the classification of the pictures.
CN201910170897.5A 2019-03-07 2019-03-07 Convolutional neural unit block, neural network formed by convolutional neural unit block and image classification method Active CN109858618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910170897.5A CN109858618B (en) 2019-03-07 2019-03-07 Convolutional neural unit block, neural network formed by convolutional neural unit block and image classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910170897.5A CN109858618B (en) 2019-03-07 2019-03-07 Convolutional neural unit block, neural network formed by convolutional neural unit block and image classification method

Publications (2)

Publication Number Publication Date
CN109858618A CN109858618A (en) 2019-06-07
CN109858618B true CN109858618B (en) 2020-04-14

Family

ID=66900186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910170897.5A Active CN109858618B (en) 2019-03-07 2019-03-07 Convolutional neural unit block, neural network formed by convolutional neural unit block and image classification method

Country Status (1)

Country Link
CN (1) CN109858618B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796250A (en) * 2019-10-11 2020-02-14 浪潮电子信息产业股份有限公司 Convolution processing method and system applied to convolutional neural network and related components
CN112785663B (en) * 2021-03-17 2024-05-10 西北工业大学 Image classification network compression method based on convolution kernel of arbitrary shape
CN114677568B (en) * 2022-05-30 2022-08-23 山东极视角科技有限公司 Linear target detection method, module and system based on neural network

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105374007B (en) * 2015-12-02 2019-01-01 华侨大学 Merge the pencil drawing generation method and device of skeleton stroke and textural characteristics
CN108701249B (en) * 2016-01-25 2023-04-14 渊慧科技有限公司 Generating images using neural networks
CN105957059B (en) * 2016-04-20 2019-03-01 广州视源电子科技股份有限公司 Electronic component missing detection method and system
US10339445B2 (en) * 2016-10-10 2019-07-02 Gyrfalcon Technology Inc. Implementation of ResNet in a CNN based digital integrated circuit
CN106710589B (en) * 2016-12-28 2019-07-30 百度在线网络技术(北京)有限公司 Speech Feature Extraction and device based on artificial intelligence
CN107145939B (en) * 2017-06-21 2020-11-24 北京图森智途科技有限公司 Computer vision processing method and device of low-computing-capacity processing equipment
CN108520275A (en) * 2017-06-28 2018-09-11 浙江大学 A kind of regular system of link information based on adjacency matrix, figure Feature Extraction System, figure categorizing system and method
CN108062551A (en) * 2017-06-28 2018-05-22 浙江大学 A kind of figure Feature Extraction System based on adjacency matrix, figure categorizing system and method
CN108305261A (en) * 2017-08-11 2018-07-20 腾讯科技(深圳)有限公司 Picture segmentation method, apparatus, storage medium and computer equipment
CN107830846B (en) * 2017-09-30 2020-04-10 杭州艾航科技有限公司 Method for measuring angle of communication tower antenna by using unmanned aerial vehicle and convolutional neural network
CN108985443B (en) * 2018-07-04 2022-03-29 北京旷视科技有限公司 Action recognition method and neural network generation method and device thereof, and electronic equipment

Also Published As

Publication number Publication date
CN109858618A (en) 2019-06-07

Similar Documents

Publication Publication Date Title
US10769757B2 (en) Image processing apparatuses and methods, image processing systems and training methods
CN109858618B (en) Convolutional neural unit block, neural network formed by convolutional neural unit block and image classification method
Juefei-Xu et al. Local binary convolutional neural networks
CN106462724B (en) Method and system based on normalized images verification face-image
US20210192701A1 (en) Image processing method and apparatus, device, and storage medium
CN113159143B (en) Infrared and visible light image fusion method and device based on jump connection convolution layer
US11216913B2 (en) Convolutional neural network processor, image processing method and electronic device
CN110097178A (en) It is a kind of paid attention to based on entropy neural network model compression and accelerated method
CN110020639B (en) Video feature extraction method and related equipment
Ni et al. Semantic representation for visual reasoning
US20200057921A1 (en) Image classification and conversion method and device, image processor and training method therefor, and medium
CN111553246A (en) Chinese character style migration method and system based on multi-task antagonistic learning network
CN112184554A (en) Remote sensing image fusion method based on residual mixed expansion convolution
CN110874636B (en) Neural network model compression method and device and computer equipment
WO2022022001A1 (en) Method for compressing style transfer network, and style transfer method, apparatus and system
CN109886391B (en) Neural network compression method based on space forward and backward diagonal convolution
CN106991355A (en) The face identification method of the analytical type dictionary learning model kept based on topology
CN107679572A (en) A kind of image discriminating method, storage device and mobile terminal
CN113128527B (en) Image scene classification method based on converter model and convolutional neural network
CN111476835B (en) Unsupervised depth prediction method, system and device for consistency of multi-view images
CN110852367B (en) Image classification method, computer device, and storage medium
CN112307982A (en) Human behavior recognition method based on staggered attention-enhancing network
US20230316699A1 (en) Image semantic segmentation algorithm and system based on multi-channel deep weighted aggregation
Liu et al. RB-Net: Training highly accurate and efficient binary neural networks with reshaped point-wise convolution and balanced activation
CN113378721B (en) Symmetrical and local discrimination-based face correction method and system for generating countermeasure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant