CN110458841A

CN110458841A - A method of improving image segmentation operating rate

Info

Publication number: CN110458841A
Application number: CN201910535642.4A
Authority: CN
Inventors: 张烨; 樊一超; 郭艺玲
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2019-06-20
Filing date: 2019-06-20
Publication date: 2019-11-15
Anticipated expiration: 2039-06-20
Also published as: CN110458841B

Abstract

A method for improving the operating speed of image segmentation, comprising: step 1, designing multi-scale dilated convolution kernels; step 2, designing a channel convolution network; step 3, designing the fully convolutional connection and deconvolution networks. Through the deconvolution and fully convolutional operations, the network of the invention can accept images of arbitrary size and perform semantic analysis on every pixel of the image, achieving fast image segmentation and fast, accurate localization of image features.

Description

A method of improving image segmentation operating rate

Technical field

The present invention relates to a method for improving the speed of image segmentation.

Technical background

In recent years, with the rapid development of computer science and technology, computer-based image processing and image object detection have also developed at an unprecedented pace. Deep learning, which extracts key target features by learning from massive amounts of digital image features, has surpassed humans in object detection and has brought the industry one pleasant surprise after another. With the resurgence of neural networks, video image methods based on convolutional neural networks have become the mainstream technology for image segmentation and recognition, using means such as template matching, edge feature extraction and histograms of gradients to recognize images accurately. Although neural-network-based image feature detection can recognize features of targets in complex scenes effectively, and far outperforms traditional methods, it still has shortcomings: (1) its robustness to noise is weak; (2) using Dropout to solve over-fitting improves the convolutional neural network model and its parameters, but slightly lowers accuracy; (3) introducing deformable convolution and separable convolution structures improves the generalization of the model and strengthens its feature extraction ability, but recognition of targets in complex scenes remains poor; (4) a newer class of image segmentation method, End-to-End, directly predicts the class of each pixel and achieves pixel-level localization of target objects, but the models have many parameters, are slow, and segment coarsely. In short, traditional detection methods and video image methods are cumbersome to operate, have low recognition accuracy, are slow, and segment coarsely.

Summary of the invention

To overcome the above deficiencies of the prior art, the present invention provides a fully convolutional method, aimed at the sample problem, for improving the operating speed of image segmentation. The invention uses a deep learning framework and optimizes and improves the convolutional neural network: channel convolution reduces the parameter count of the model, and multi-scale dilated convolution enriches the image features and solves the problem of the small receptive field of traditional networks.

To achieve the above object, the invention adopts the following technical scheme:

A method for improving the operating speed of image segmentation, comprising the following steps:

Step 1: design multi-scale dilated convolution kernels;

To solve the problem posed by enlarging the receptive field through traditional convolution and max pooling, the invention uses a dilated convolution kernel: a sampling rate (rate) is added to the traditional convolution kernel, which makes the original kernel "fluffy" (spread out).

This enlarges the receptive field while keeping the original amount of computation, so that the information used for image segmentation is sufficiently accurate. The receptive field size of a dilated convolution kernel is then calculated as

In the formula, F is the receptive field size of the current layer; rate is the sampling rate of the dilated convolution kernel, i.e. the spacing between samples. The rate of a traditional convolution kernel can be regarded as 1, and the sampling rate of the dilated convolution is taken as 2. The receptive field of traditional convolution is calculated as

In the formula, F_{i-1} is the receptive field size of the previous layer; k_i is the size of the convolution or pooling kernel of layer i; N is the total number of convolution layers; s_i is the convolution stride of the kernel of layer i.
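The two receptive-field formulas referred to above are not reproduced in this text. As a minimal sketch, the Python below assumes the commonly used forms: an effective kernel extent of k + (k - 1)(rate - 1) for a dilated kernel, and a layer-by-layer recursion in which each layer adds (k_i - 1) times the product of the strides of the preceding layers. The function names and these exact expressions are assumptions for illustration and may differ from the patent's own formulas.

```python
def dilated_kernel_size(k, rate):
    """Effective extent of a k-wide kernel whose taps are `rate` apart (assumed standard form)."""
    return k + (k - 1) * (rate - 1)


def receptive_field(layers):
    """Receptive field after a stack of layers.

    `layers` is a list of (kernel_size, stride, rate) tuples, first layer first.
    Uses the usual recursion F_i = F_{i-1} + (k_eff_i - 1) * jump, where
    jump is the product of the strides of all earlier layers.
    """
    field, jump = 1, 1
    for k, stride, rate in layers:
        field += (dilated_kernel_size(k, rate) - 1) * jump
        jump *= stride
    return field


if __name__ == "__main__":
    plain = [(3, 1, 1), (3, 1, 1)]     # two traditional 3x3 layers
    dilated = [(3, 1, 1), (3, 1, 2)]   # second layer uses rate = 2
    print(receptive_field(plain), receptive_field(dilated))
```

Under these assumptions, swapping the second traditional 3 × 3 layer for a rate-2 layer widens the receptive field from 5 to 7 while each kernel still holds nine weights, which is the effect the text describes.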

Drawing on the idea of multi-scale image transformation, multi-scale dilated convolution is designed by diversifying the sampling rate and the kernel size, so that feature extraction can adapt to targets of different sizes. Multi-scale dilated convolution is calculated as

In the formula, y[i] is the convolution sum at the i-th stride position; K is the convolution kernel; k is the coordinate position of a parameter within the kernel, k ∈ K; w[k] is the kernel weight; rate is the sampling rate, which can take the values 1, 2 and 3.
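The multi-scale formula itself is not reproduced in this text. A minimal one-dimensional sketch, assuming the widely used form y[i] = Σ_k x[i + rate·k]·w[k] (the input signal x and the function name are assumptions, not taken from the patent), is:

```python
import numpy as np


def dilated_conv1d(x, w, rate=1):
    """y[i] = sum_k x[i + rate * k] * w[k], evaluated only where all taps fit inside x."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    span = rate * (len(w) - 1)          # distance between first and last tap
    out_len = len(x) - span
    if out_len <= 0:
        raise ValueError("input too short for this kernel and rate")
    y = np.zeros(out_len)
    for k, weight in enumerate(w):
        # Tap k reads the input shifted by rate * k samples.
        y += weight * x[rate * k: rate * k + out_len]
    return y


if __name__ == "__main__":
    signal = np.arange(10)
    kernel = [1.0, 1.0, 1.0]
    # Same three weights, three sampling rates: the multi-scale idea from the text.
    for rate in (1, 2, 3):
        print(rate, dilated_conv1d(signal, kernel, rate))
```

The same three weights cover 3, 5 and 7 input samples at rates 1, 2 and 3, so outputs computed at several rates respond to structures of different sizes.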

Step 2: design the channel convolution network;

Because traditional convolution is always a dimension-raising operation, channel convolution can be used from the outset to reduce the dimensionality of the feature convolution. First, the traditional convolution is replaced by a two-layer convolution, similar to the group operation in ResNet. Without affecting accuracy, this new structure shortens computation time to about 1/8 of the original and reduces the parameter count to about 1/9 of the original. It is well suited to mobile terminals, enables real-time target detection, and compresses the model markedly.

For traditional convolution, assume that the input has M feature channels, that the width and height of the convolution kernel are both D_k, and that there are N convolution kernels. Each time the convolution slides to a position there are then N sets of M·D_k·D_k parameters. With the sliding stride set to s, the image size after sliding is calculated as

In the formula, h' and w' are the height and width after convolution, and pad is the padding added at the width and height borders. Each point of the h'·w' output therefore corresponds to N sets of M·D_k·D_k parameters, so the total parameter count is N·M·D_k·D_k·h'·w' (6)
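The output-size formula is not reproduced in this text. A minimal sketch, assuming the usual relation h' = ⌊(h - D_k + 2·pad)/s⌋ + 1 (and likewise for w'), with the input height h and width w as assumed symbols, shows how the total in formula (6) can be evaluated:

```python
def conv_output_size(size, kernel, pad, stride):
    """Output length along one axis for a sliding window (assumed standard relation)."""
    return (size - kernel + 2 * pad) // stride + 1


def traditional_conv_count(h, w, m, n, d_k, pad=0, stride=1):
    """Formula (6): N * M * D_k * D_k * h' * w'."""
    h_out = conv_output_size(h, d_k, pad, stride)
    w_out = conv_output_size(w, d_k, pad, stride)
    return n * m * d_k * d_k * h_out * w_out


if __name__ == "__main__":
    # Hypothetical layer: 32x32 input, M = 16 channels, N = 32 kernels of 3x3.
    print(conv_output_size(32, 3, pad=1, stride=1))
    print(traditional_conv_count(32, 32, m=16, n=32, d_k=3, pad=1, stride=1))
```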

With the improved channel convolution, the convolution is split into two steps:

1) Use D_k·D_k·M convolution kernels to convolve each of the M channels separately. Sliding with the same stride s, the size after convolution is h', w', so the parameter count of this step is D_k·D_k·M·h'·w' (7)

2) Use 1·1·N convolution kernels to raise the dimension and extract features. With a stride of 1, features are extracted again from the feature map obtained above: each of the original M channel features is processed by N convolution kernels, so the total parameter count is M·N·h'·w'·1·1 (8)

Combining the two steps, the final parameter count of channel convolution is D_k·D_k·M·h'·w' + M·N·h'·w' (9)
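The two steps above can be sketched in NumPy as follows. The function names, the absence of padding and the [channels, height, width] layout are assumptions made for illustration; the patent does not give an implementation.

```python
import numpy as np


def depthwise_conv(x, dw, stride=1):
    """Step 1: one D_k x D_k filter per input channel.

    x: [M, H, W] feature map; dw: [M, D_k, D_k] filters; no padding.
    """
    m, h, w = x.shape
    k = dw.shape[1]
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.zeros((m, out_h, out_w))
    for c in range(m):
        for i in range(out_h):
            for j in range(out_w):
                patch = x[c, i * stride:i * stride + k, j * stride:j * stride + k]
                out[c, i, j] = np.sum(patch * dw[c])
    return out


def pointwise_conv(x, pw):
    """Step 2: N kernels of size 1 x 1 mix the M channels (stride 1).

    x: [M, H, W]; pw: [N, M]; returns [N, H, W].
    """
    return np.einsum("nm,mhw->nhw", pw, x)


def channel_conv(x, dw, pw, stride=1):
    """Depthwise step followed by the 1 x 1 dimension-raising step."""
    return pointwise_conv(depthwise_conv(x, dw, stride), pw)


if __name__ == "__main__":
    x = np.ones((2, 4, 4))        # M = 2 channels
    dw = np.ones((2, 3, 3))       # one 3x3 filter per channel
    pw = np.ones((3, 2))          # N = 3 output channels
    print(channel_conv(x, dw, pw).shape)
```

The spatial filtering (step 1) never mixes channels and the channel mixing (step 2) never looks at neighbouring pixels, which is why the two counts in (7) and (8) add rather than multiply.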

As stated above, the ratio of the parameter count of the improved channel convolution to that of the traditional convolution kernel is (D_k·D_k·M·h'·w' + M·N·h'·w') / (N·M·D_k·D_k·h'·w') = 1/N + 1/D_k² (10)

Analysis of formula (10) shows that, with a 3 × 3 convolution kernel and a large number of kernels N, the channel convolution reduces the parameter count to about 1/9 of the original.
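As a quick numerical check, dividing the count in (9) by the count in (6) gives 1/N + 1/D_k², so for a 3 × 3 kernel the ratio approaches 1/9 as N grows. The sketch below evaluates both counts exactly; the function names and the example layer sizes are hypothetical.

```python
from fractions import Fraction


def traditional_count(m, n, d_k, h_out, w_out):
    """Formula (6): N * M * D_k * D_k * h' * w'."""
    return n * m * d_k * d_k * h_out * w_out


def channel_conv_count(m, n, d_k, h_out, w_out):
    """Formula (9): per-channel D_k x D_k step (7) plus the 1 x 1 step (8)."""
    return d_k * d_k * m * h_out * w_out + m * n * h_out * w_out


def count_ratio(m, n, d_k, h_out, w_out):
    """Channel-convolution count divided by the traditional count, as an exact fraction."""
    return Fraction(channel_conv_count(m, n, d_k, h_out, w_out),
                    traditional_count(m, n, d_k, h_out, w_out))


if __name__ == "__main__":
    # Hypothetical layer sizes; only N varies.
    for n in (8, 64, 512):
        ratio = count_ratio(m=16, n=n, d_k=3, h_out=32, w_out=32)
        print(n, ratio, float(ratio))
```

The ratio does not depend on M, h' or w', since they cancel; only the number of kernels N and the kernel size D_k matter.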

Step 3: design the fully convolutional connection and deconvolution networks;

The final layer of a traditional network structure has a fixed size, so input pictures must be converted to a fixed size in advance, which hinders obtaining the vehicle-length coordinates of logistics vehicles. Moreover, the fully connected layers of a traditional network lose exact spatial position coordinates, which distorts the spatial information of the image and prevents effective, accurate localization of the target. To solve this loss of information, the invention uses a fully convolutional connection to locate the position coordinates of features in the picture accurately.

The fully connected layers of a traditional network convert the [b, c, h, w] output of the preceding convolutional part to [b, c·h·w], i.e. [b, 4096], and then to [b, cls], where b is the batch size and cls is the number of classes. The fully convolutional network instead appends 1 × 1 convolutions and has no fully connected layer; hence the name fully convolutional network. Full convolution is calculated as

In the formula, 1 ≤ n ≤ N; y_n[i][j] is the value at position (i, j) after convolution with the n-th convolution kernel; s_i is the horizontal convolution stride; s_j is the vertical convolution stride; k_n is the n-th convolution kernel; D_k is the width and height of the convolution kernel, the kernel size corresponding to D_k·D_k in step 2; δ_i and δ_j are positions within the kernel, with 0 ≤ δ_i, δ_j ≤ D_k; and this layer has N different kinds of convolution kernel in total. The sliding convolution of the kernels can be converted into a multiplication of two matrices. The image pixels and the result of the convolution can be expressed as

where the left matrix has dimensions [N, M·D_k·D_k], the right matrix has dimensions [M·D_k·D_k, w'·h'], and the result of the convolution has dimensions [N, w'·h']. In the right matrix, I denotes the image (img), and its subscripts are the image width and image height in that order, i.e. I_wh.
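The matrix form described above can be sketched in NumPy. The matrices themselves are not reproduced in this text, so the patch-unrolling order, the lack of padding and the function names below are assumptions; only the three matrix shapes are taken from the description.

```python
import numpy as np


def im2col(img, k, stride=1):
    """Unroll every k x k patch of img ([M, H, W]) into one column.

    Returns the [M*k*k, h_out*w_out] patch matrix and the output size.
    """
    m, h, w = img.shape
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    cols = np.empty((m * k * k, h_out * w_out))
    col = 0
    for i in range(h_out):
        for j in range(w_out):
            patch = img[:, i * stride:i * stride + k, j * stride:j * stride + k]
            cols[:, col] = patch.reshape(-1)
            col += 1
    return cols, h_out, w_out


def conv_as_matmul(img, kernels, stride=1):
    """Sliding convolution written as one matrix product.

    kernels: [N, M, k, k]. The weight matrix is [N, M*k*k], the patch matrix
    is [M*k*k, h'*w'], and their product is [N, h'*w'].
    """
    n, m, k, _ = kernels.shape
    cols, h_out, w_out = im2col(img, k, stride)
    weights = kernels.reshape(n, m * k * k)
    return (weights @ cols).reshape(n, h_out, w_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(2, 5, 5))          # M = 2 channels
    kernels = rng.normal(size=(3, 2, 3, 3))   # N = 3 kernels of 3x3
    print(conv_as_matmul(img, kernels).shape)
```

Because no layer in this formulation has a fixed input length, the same weight matrix applies to any image size; only the number of columns of the patch matrix changes.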

Finally, a deconvolution operation converts [N, w'·h'] back to the image size of the input. In this way the specific semantic information represented by each pixel can be identified accurately and the loss of spatial information is avoided. Deconvolution is equivalent to the inverse operation of convolution, i.e.

In the formula, the weight corresponding to each of the convolution kernels k₁, ..., k_N changes from its original value to a new value; this weight, whose magnitude has been adjusted by training, carries the semantic information features of the image.
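The deconvolution formula is not reproduced in this text. One common way to realize such an operation is a transposed convolution, in which each value of the [N, h', w'] map scatters a kernel-weighted patch back onto a larger grid. The sketch below assumes this interpretation, no padding, and hypothetical function names; it is not necessarily the patent's exact operation.

```python
import numpy as np


def conv_transpose(y, kernels, stride=1):
    """Transposed convolution (assumed form of the deconvolution step).

    y: [N, h_out, w_out] feature map; kernels: [N, M, k, k].
    Returns [M, H, W] with H = (h_out - 1) * stride + k, and likewise for W.
    """
    n, m, k, _ = kernels.shape
    _, h_out, w_out = y.shape
    h = (h_out - 1) * stride + k
    w = (w_out - 1) * stride + k
    x = np.zeros((m, h, w))
    for i in range(h_out):
        for j in range(w_out):
            # Weighted sum over the N kernels gives one [M, k, k] patch,
            # which is accumulated at the window this output came from.
            patch = np.tensordot(y[:, i, j], kernels, axes=(0, 0))
            x[:, i * stride:i * stride + k, j * stride:j * stride + k] += patch
    return x


if __name__ == "__main__":
    y = np.ones((1, 2, 2))
    kernels = np.ones((1, 1, 3, 3))
    print(conv_transpose(y, kernels).shape)
```

With no padding, the output size (h' - 1)·s + D_k equals the input size of the forward convolution whenever that convolution tiled the input exactly, so the map returns to the original image grid.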

Therefore, a network built from deconvolution and fully convolutional operations can handle images of arbitrary size and perform semantic analysis on every pixel of the image, achieving fast image segmentation and fast, accurate localization of image features.

The advantages of the invention are:

Aimed at the sample problem, the invention improves the operating speed of image segmentation with a fully convolutional method. Its most prominent feature is a lightweight treatment of the image: while segmentation accuracy is maintained, the segmentation efficiency of the model is improved and its parameter count is reduced through channel convolution. Multi-scale dilated convolution kernels further enlarge the receptive field of the model in a reasonable and simple way and strengthen its generalization. The algorithm can be widely applied to image localization and recognition, for example vehicle recognition in logistics parks.

Description of the drawings

Fig. 1 is a schematic diagram of the convolution operation of an existing traditional convolution kernel;

Fig. 2 is a schematic diagram of the convolution operation of the improved dilated convolution kernel of the invention;

Figs. 3a to 3c show the multi-scale dilated convolution kernels of the invention: Fig. 3a is the dilated kernel with sampling rate 1, Fig. 3b is the dilated kernel with sampling rate 2, and Fig. 3c is the dilated kernel with sampling rate 3;

Fig. 4 shows the existing convolution mode;

Fig. 5 shows the channel convolution mode of the invention;

Fig. 6 shows the channel convolution structure of the invention;

Fig. 7 shows the design structure of the fully convolutional network of the invention;

Fig. 8 is a schematic diagram of the fully convolutional matrix calculation process of the invention.

Note: in Fig. 6, DW is the channel convolution group, i.e. a regular arrangement of channel convolution kernels; BN is the batch normalization operation, which addresses the change in the data distribution of intermediate layers during training; Conv is the convolution layer operation; ReLU is the rectified linear unit, an activation function.

Note: in Fig. 8, k₁, ..., k_N number the convolution kernels, and the weight entries shown are the position weights of the n-th convolution kernel.

Specific embodiment

To overcome the above deficiencies of the prior art, the present invention provides a fully convolutional image segmentation method aimed at the sample problem. It uses a deep learning framework and optimizes and improves the convolutional neural network: channel convolution reduces the parameter count of the model, and multi-scale dilated convolution enriches the image features and solves the problem of the small receptive field of traditional networks.

Step 1: design multi-scale dilated convolution kernels;

Step 2: design the channel convolution network;

Step 3: design the fully convolutional connection and deconvolution networks;

To verify the superiority of the invention, the following network model is built with logistics park vehicles as the example, and a controlled experiment is carried out:

First, the network is built. Images of four types of logistics vehicle (cargo vehicles, dragon wagons, dumpers and tank trucks) are collected from a logistics park and divided into a training set of 8 000 images, 2 000 per class, and a test set of 4 000 images, 1 000 per class. The parameter configuration of the network architecture is shown in Table 1 below.

In Table 1, k is the kernel size; s is the stride; p is the padding size; DW is the channel convolution group, i.e. a regular arrangement of channel convolution kernels. Residual summation is used to help gradient propagation in a large network; the activation and batch normalization (BN) operations of each layer help to speed up training; ReLU is the rectified linear unit, an activation function.

Table 1: Parameter design of the network architecture

The computer used in this example is configured with a Gigabyte NVIDIA GTX 1080 Ti graphics card with 11 GB of video memory, running at 1 607 MHz.

Finally, the test performance of the network of this example is compared with that of the traditional network; the results are shown in Table 2.

Table 2: Performance comparison of lightweight segmentation models

In Table 2, the evaluation index MPA denotes mean pixel accuracy; MA denotes the ratio of the foreground area to the label area (mean accuracy); MIOU denotes the mean intersection over union, i.e. the ratio of the correctly predicted region to the union of the predicted area and the label area. The unit M·pic⁻¹ denotes the memory occupied by training one picture, in megabytes (M); the unit ms·iter⁻¹ denotes the time required per iteration, in milliseconds (ms). With channel convolution, the video memory occupied is reduced by 51%, training speed rises by 78% and test speed rises by 79%. Every evaluation index for segmentation and localization improves substantially, with MIOU improving the most.
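MIOU as described above can be computed from a confusion matrix. The sketch below is a minimal illustration; the function name, the integer label encoding and the choice to skip classes absent from both prediction and label are assumptions, not details given in the patent.

```python
import numpy as np


def mean_iou(label, pred, num_classes):
    """Mean over classes of (correctly predicted pixels) / (predicted area ∪ label area)."""
    label = np.asarray(label).ravel()
    pred = np.asarray(pred).ravel()
    # conf[a, b] counts pixels whose label is class a and prediction is class b.
    conf = np.bincount(num_classes * label + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    correct = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - correct
    valid = union > 0                      # ignore classes that never occur
    return float(np.mean(correct[valid] / union[valid]))


if __name__ == "__main__":
    label = [[0, 0], [1, 1]]
    pred = [[0, 1], [1, 1]]
    print(mean_iou(label, pred, num_classes=2))
```

In the small example, class 0 has one correct pixel out of a union of two and class 1 has two correct pixels out of a union of three, so the mean is (1/2 + 2/3) / 2.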

This example verifies that the improved method does improve the test performance of the model, i.e. it improves the operating speed of image segmentation.

The advantages of this scheme are:

The content described in the embodiments of this specification merely enumerates forms in which the inventive concept may be realized. The scope of protection of the invention should not be construed as limited to the specific forms stated in the embodiments; it also extends to equivalent technical means that a person skilled in the art could conceive of on the basis of the inventive concept.

Claims

1. A method for improving the operating speed of image segmentation, comprising the following steps:

Step 1: design multi-scale dilated convolution kernels;

To solve the problem posed by enlarging the receptive field through traditional convolution and max pooling, a dilated convolution kernel is used: a sampling rate (rate) is added to the traditional convolution kernel, which makes the original kernel "fluffy" (spread out);

This enlarges the receptive field while keeping the original amount of computation, so that the information used for image segmentation is sufficiently accurate; the receptive field size of a dilated convolution kernel is then calculated as

In the formula, F is the receptive field size of the current layer; rate is the sampling rate of the dilated convolution kernel, i.e. the spacing between samples; the rate of a traditional convolution kernel can be regarded as 1, and the sampling rate of the dilated convolution is taken as 2; the receptive field of traditional convolution is calculated as

In the formula, F_{i-1} is the receptive field size of the previous layer; k_i is the size of the convolution or pooling kernel of layer i; N is the total number of convolution layers; s_i is the convolution stride of the kernel of layer i;

Drawing on the idea of multi-scale image transformation, multi-scale dilated convolution is designed by diversifying the sampling rate and the kernel size, so that feature extraction can adapt to targets of different sizes; multi-scale dilated convolution is calculated as

In the formula, y[i] is the convolution sum at the i-th stride position; K is the convolution kernel; k is the coordinate position of a parameter within the kernel, k ∈ K; w[k] is the kernel weight; rate is the sampling rate, which can take the values 1, 2 and 3;

Step 2: design the channel convolution network;

Because traditional convolution is always a dimension-raising operation, channel convolution is used from the outset to reduce the dimensionality of the feature convolution; first, the traditional convolution is replaced by a two-layer convolution, similar to the group operation in ResNet; without affecting accuracy, this new structure shortens computation time to about 1/8 of the original and reduces the parameter count to about 1/9 of the original, is well suited to mobile terminals, enables real-time target detection, and compresses the model markedly;

For traditional convolution, assume that the input has M feature channels, that the width and height of the convolution kernel are both D_k, and that there are N convolution kernels; each time the convolution slides to a position there are then N sets of M·D_k·D_k parameters, and the sliding stride is set to s; the image size after sliding is calculated as

In the formula, h' and w' are the height and width after convolution, and pad is the padding added at the width and height borders; each point of the h'·w' output therefore corresponds to N sets of M·D_k·D_k parameters, so the total parameter count is

N·M·D_k·D_k·h'·w' (6)

1) Use D_k·D_k·M convolution kernels to convolve each of the M channels separately; sliding with the same stride s, the size after convolution is h', w', so the parameter count of this step is

D_k·D_k·M·h'·w' (7)

2) Use 1·1·N convolution kernels to raise the dimension and extract features; with a stride of 1, features are extracted again from the feature map obtained above; each of the original M channel features is processed by N convolution kernels, so the total parameter count is

M·N·h'·w'·1·1 (8)

Combining the two steps, the final parameter count of channel convolution is

D_k·D_k·M·h'·w'+M·N·h'·w' (9)

Analysis of formula (10) shows that the channel convolution operation reduces the parameter count;

Step 3: design the fully convolutional connection and deconvolution networks;

The final layer of a traditional network structure has a fixed size, so input pictures must be converted to a fixed size in advance, which hinders obtaining the vehicle-length coordinates of logistics vehicles; moreover, the fully connected layers of a traditional network lose exact spatial position coordinates, which distorts the spatial information of the image and prevents effective, accurate localization of the target; to solve this loss of information, a fully convolutional connection is used to locate the position coordinates of features in the picture accurately;

The fully connected layers of a traditional network convert the [b, c, h, w] output of the preceding convolutional part to [b, c·h·w], i.e. [b, 4096], and then to [b, cls], where b is the batch size and cls is the number of classes; the fully convolutional network instead appends 1 × 1 convolutions and has no fully connected layer, hence the name fully convolutional network; full convolution is calculated as

In the formula, 1 ≤ n ≤ N; y_n[i][j] is the value at position (i, j) after convolution with the n-th convolution kernel; s_i is the horizontal convolution stride; s_j is the vertical convolution stride; k_n is the n-th convolution kernel; D_k is the width and height of the convolution kernel, the kernel size corresponding to D_k·D_k in step 2; δ_i and δ_j are positions within the kernel, with 0 ≤ δ_i, δ_j ≤ D_k; this layer has N different kinds of convolution kernel in total; the sliding convolution of the kernels can be converted into a multiplication of two matrices; the image pixels and the result of the convolution can be expressed as

where the left matrix has dimensions [N, M·D_k·D_k], the right matrix has dimensions [M·D_k·D_k, w'·h'], and the result of the convolution has dimensions [N, w'·h']; in the right matrix, I denotes the image (img), and its subscripts are the image width and image height in that order, i.e. I_wh;

Finally, a deconvolution operation converts [N, w'·h'] back to the image size of the input, so that the specific semantic information represented by each pixel can be identified accurately and the loss of spatial information is avoided; deconvolution is equivalent to the inverse operation of convolution, i.e.

In the formula, the weight corresponding to each of the convolution kernels k₁, ..., k_N changes from its original value to a new value; this weight, whose magnitude has been adjusted by training, carries the semantic information features of the image.