CN109215034A

CN109215034A - A kind of Weakly supervised image, semantic dividing method for covering pond based on spatial pyramid

Info

Publication number: CN109215034A
Application number: CN201810739297.1A
Authority: CN
Inventors: 朱策; 段昶; 文宏雕; 徐榕健
Original assignee: Chengdu Map Technology Co Ltd
Current assignee: Chengdu Map Technology Co Ltd
Priority date: 2018-07-06
Filing date: 2018-07-06
Publication date: 2019-01-15
Anticipated expiration: 2038-07-06
Also published as: CN109215034B

Abstract

The invention discloses a kind of Weakly supervised image, semantic dividing methods that pond is covered based on spatial pyramid, comprising the following steps: selectes a convolutional neural networks H, handles input picture X by convolutional neural networks H, obtain characteristic of division figure；Spatial pyramid pond module is established according to characteristic of division figure, then carries out spatial pyramid cover, obtains output characteristic pattern；Classification is calculated according to output characteristic pattern and activates vector sum class probability vector, is then established competitive spatial pyramid and is covered pond loss function；Pond loss function training convolutional neural networks H is covered according to competitive spatial pyramid and extracts segmentation characteristic pattern.The present invention realizes richer local feature, the provincial characteristics more perfect and target sizes of excavation and the more robust Weakly supervised image, semantic parted pattern of posture, the extractability for improving local semantic information strengthens localized target or the recognition capability at position in Weakly supervised semantic segmentation.

Description

A kind of Weakly supervised image, semantic dividing method for covering pond based on spatial pyramid

Technical field

The invention belongs to technical field of computer vision, and in particular to a kind of weak prison that pond is covered based on spatial pyramid Superintend and direct image, semantic dividing method.

Background technique

Image, semantic is divided as basic Computer Vision Task, and target is to be divided all pixels in image Class.Since it can be fully understood by image in Pixel-level, so all for other visual tasks such as image classification and target identification There is booster action.But because the production of Pixel-level label data needs to expend great effort, the image, semantic of full monitor mode Segmentation is difficult to fast implement extensive extension, and the Weakly supervised image, semantic dividing method for then relying on image class label is extensive Research.

In past computer vision research, there are many pyramid model application.Famous SIFT algorithm passes through La Pula This pyramid extracts key point Expressive Features, becomes a kind of detective operators with Scale invariant property.Later, pyramid model It is successfully applied to convolutional neural networks.(spatial pyramid pooling, SPP) be used to roll up in spatial pyramid pond Local features are extracted in product neural network tail portion, and SPP obtains preferably effect in image classification and target identification.Cavity Spatial pyramid (atrous spatial pyramid pooling, ASPP) module is based on multiple dispersion ratio (dilated Rate empty convolution (dilated convolution)) obtains pyramid feature, and is applied to image, semantic and divides.Golden word Characteristic pattern is sliced into different spaces and explores part and whole semantic relation by tower scene cut model, by entirety and part The semantic information in region compares and combines to obtain more robust segmentation result.

There are no relatively successfully applications in Weakly supervised semantic segmentation for pyramid model.On the one hand, with global pool knot The sorter network of tail is only capable of obtaining target-recognition region due to learning in a manner of Weakly supervised, lacks mentioning for local region information Ability is taken, the shortcomings that pyramid model obviously can make up global pool due to the characteristic of its own.On the other hand, pyramid mould Although type has the characteristics of multi-scale information and local message merge, but how preferably on the basis of grasping global information The problem that time important semantic information is excavated in subregion is fully solved not yet.

Summary of the invention

It is an object of the invention to: above-mentioned deficiency in the prior art is solved, is provided a kind of based on spatial pyramid cover The Weakly supervised image, semantic dividing method in pond, improves the extractability of local semantic information, strengthens Weakly supervised semantic point Cut the recognition capability at middle localized target or position.

To achieve the goals above, the technical solution adopted by the present invention are as follows:

A kind of Weakly supervised image, semantic dividing method for covering pond based on spatial pyramid, comprising the following steps:

Step 1: selecting a convolutional neural networks H, handles input picture X by convolutional neural networks H, is classified Characteristic pattern；

Step 2: establishing spatial pyramid pond module according to characteristic of division figure, then carries out spatial pyramid cover, obtains To output characteristic pattern；

Step 3: calculating classification according to output characteristic pattern and activate vector sum class probability vector, then establishes competitive empty Between pyramid cover pond loss function；

Step 4: it covers pond loss function training convolutional neural networks H according to competitive spatial pyramid and extracts and divide Cut characteristic pattern.

Further, the step of input picture X being handled by convolutional neural networks H in above-mentioned steps one specifically:

Step 101: selecting a convolutional neural networks H, map to obtain segmentation characteristic pattern by the convolutional neural networks HWherein, C is target category quantity, and W is weight parameter, For real number field；

Step 102: characteristic pattern dimensionality reduction will be divided by 1 × 1 convolution, obtain characteristic of division figureIt calculates public Formula are as follows:

F_cls=F_seg×W+b

Wherein, b is biasing,

Further, above-mentioned step two specifically:

Step 201: average pond being carried out to all characteristic of division figures, while specifying total pyramid number and Chi Huahe big It is small, obtain corresponding pyramid pond characteristic pattern P_i, wherein i is pyramid serial number, and value range arrives total pyramid number between 1 Between mesh, i-th of pyramid pond characteristic pattern is calculatedCalculation formula are as follows:

Wherein, k_iFor i-th of pyramidal pond core size, x is characterized figure abscissa, and y is characterized figure ordinate, △ k_x For pond core abscissa, △ k_yFor pond core ordinate, c is semantic classes quantity；

Step 202: each pyramid pond characteristic pattern being generated and covers tensorRegion is concurrently set to cover Lid ratio and cover quantization parameter；

Step 203: according to the output characteristic pattern for covering tensor computation spatial pyramid cover pondMeter Calculate formula are as follows:

O_i=P_i⊙M_i

Wherein, ⊙ indicates hadamard product.

Further, in above-mentioned steps 202, remember that i-th of pyramid covers tensor and beRemember jth eka-gold Word tower covers tensor, the region that the different classes of pyramid covers tensor covers position and covers quantization Coefficient is consistent:

The ratio that the cover region generated at random accounts for the pyramid pond number of regions is τ,

Wherein, the value range of τ is [0,1], and the value of τ is bigger, and the region for indicating to cover is more,It indicates (1-τ)H_iThe value that W is obtained is rounded downwards, and I () indicates that indicator function, I () take 1 when input condition meets, other situations Under take 0, i.e.,The position of satisfaction is to be retained region, and other situations are the region covered；

The cover quantization parameter ρ is for inhibiting signal；

Wherein, the value range of ρ is [0,1].

Further, classification is calculated in above-mentioned steps three activate vector sum class probability vector specifically:

Step 301: calculating classification and activate vector o_i, i.e., the output characteristic pattern that spatial pyramid covers pond is classified as one Value, calculation formula are as follows:

Step 302: class probability vector being calculated by Softmax function, obtains the probability of each semantic classes to be determined Value, calculation formula are as follows:

Further, in above-mentioned steps three, establish competitive spatial pyramid cover pond loss function specifically include with Lower step:

Step 303: calculating the Classification Loss l of each pyramid grade_i, calculation formula are as follows:

Wherein,1 other places are taken to take 0, T for image, semantic class label vector and at the corresponding semantic classes target of appearance For transposition symbol；

Step 304: the 0th grade of global pool information being fully retained, other pyramids for having carried out information cover etc. are grading Mobile state competition, calculation formula are as follows:

Wherein, l_clsFor loss function of always classifying, N is pyramid number of levels, l₀For the damage of global pool pyramid grade It loses；

Further, above-mentioned step four specifically: error is calculated with loss function of always classifying and is calculated by backpropagation Method adjusts network parameter after the opposite variation of the loss function is less than 5%, directly by F_segIt takes out and is taken in classification dimension Prediction segmentation figure can be obtained in argmax

Y=argmax (F_seg)

Wherein, argmax F_segThe 3rd dimension.

By adopting the above-described technical solution, the beneficial effects of the present invention are:

It is more robust that the present invention realizes richer local feature, the provincial characteristics more perfect and target sizes of excavation and posture Weakly supervised image, semantic parted pattern.

The present invention obtains finer segmentation result to preferably excavate local message, to spatial pyramid pond module Increase cover mechanism, in addition to the diversity of extension feature figure, identification region can also be inhibited and then encourage identification more times The study in region；

The case where misplacing just present invention is alternatively directed to different pyramidal cover regions, proposes that auxiliary space pyramid is covered The competitive spatial pyramid of pondization training covers pond loss function, reduces the risk of network training failure；

Global pool information of the invention make network to the scale of target will not tetchiness, and spatial pyramid is covered Gai Chihua more has versatility in other visual tasks.

Detailed description of the invention

Fig. 1 is the Weakly supervised image, semantic parted pattern schematic diagram of the invention that pond is covered based on spatial pyramid.

Fig. 2 is that spatial pyramid of the invention covers pond structural schematic diagram.

Fig. 3 is that the Weakly supervised image, semantic of the invention for covering pond based on spatial pyramid divides flow diagram.

Fig. 4 is segmentation result contrast schematic diagram of the invention.

Specific embodiment

Referring to attached drawing 1-4, specific description is done to embodiments of the present invention.

F_cls=F_seg×W+b

Wherein, b is biasing,

Further, above-mentioned step two specifically:

O_i=P_i⊙M_i

Wherein, ⊙ indicates hadamard product.

The cover quantization parameter ρ is for inhibiting signal；

Wherein, the value range of ρ is [0,1].

Y=argmax (F_seg)

Wherein, argmax F_segThe 3rd dimension.

The present embodiment verifies the validity of the method for the invention by Comparative result that image, semantic is divided.

Step 1 chooses DeepLab v2 network as depth characteristic extraction model, input pictureSource In PASCAL VOC semantic segmentation data set, the segmentation characteristic pattern handled by H

Step 2, segmentation characteristic pattern obtain characteristic of division figure by 1 × 1 convolution

Characteristic of division figure is carried out spatial pyramid pond by step 3, and pond core size is respectively 40,20,10,8 and 5, Obtain corresponding pyramid pond characteristic pattern With

Step 4 generates and covers tensor.It is 0 that ratio τ is covered in 0th grade of global pool region, and other pyramid hierarchical regions are covered Lid ratio τ and cover quantization parameter ρ is respectively 0.25 and 0.0.Tensor is so covered to meet:

The output characteristic pattern that each pyramid grade covers pond is calculated in step 5

Step 6 calculates classification activation vector:

Step 7: calculating class probability vector

Step 8, the intersection entropy loss for calculating each hierarchical pyramid,

Step 9 calculates competitive spatial pyramid cover pond loss function,

Step 10 repeats step 1 to step 9 progress network training to the more image samples of convolutional neural networks input, leads to Competitive spatial pyramid is crossed to cover pond loss function reverse propagated error and then update network.When test, one is inputted newly Image and extract segmentation characteristic patternPrediction segmentation figure Y is obtained after seeking argmax again.

It is noted that the setting of ratio τ and cover quantization parameter ρ that input image resolution, region are covered specifically are being flowed It is an example in journey, choosing also in the protected range of this patent in other situations.With average friendship and than (mIoU) As evaluation index, spatial pyramid cover pondization can by PASCAL VOC (Everingham, M., Van Gool, L., Williams,C.K.I.,Winn,J.and Zisserman, A.International Journal of Computer Vision, 88 (2), 303-338,2010) test set performance comparison such as the following table 1:

Table 1

Spatial pyramid cover pond is obvious to the promotion of performance, and accuracy rate alreadys exceed PASCAL VOC submission The full monitor model FCN-8s of list.Spatial pyramid cover pondization and global pool metaplasia at prediction segmentation result comparison such as Fig. 4. Multiple groups test comparative descriptions spatial pyramid and cover pondization with Small object identification, semantic relation correction and fringe region optimization Ability.In conclusion it is to have for the improvement that Weakly supervised image, semantic is divided that spatial pyramid proposed by the present invention, which covers pond, Effect.

Claims

1. a kind of Weakly supervised image, semantic dividing method for covering pond based on spatial pyramid, comprising the following steps:

Step 1: selecting a convolutional neural networks H, handles input picture X by convolutional neural networks H, obtains characteristic of division Figure；

Step 2: establishing spatial pyramid pond module according to characteristic of division figure, then carries out spatial pyramid cover, obtains defeated Characteristic pattern out；

Step 3: calculating classification according to output characteristic pattern and activate vector sum class probability vector, then establishes competitive space gold Word tower covers pond loss function；

Step 4: covering pond loss function training convolutional neural networks H according to competitive spatial pyramid and extracts segmentation spy Sign figure.

2. a kind of Weakly supervised image, semantic dividing method that pond is covered based on spatial pyramid according to claim 1, It is characterized in that the step of handling input picture X by convolutional neural networks H in the step 1 specifically:

Step 102: characteristic pattern dimensionality reduction will be divided by 1 × 1 convolution, obtain characteristic of division figureCalculation formula Are as follows:

F_cls=F_seg×W+b

Wherein, b is biasing,

3. a kind of Weakly supervised image, semantic dividing method that pond is covered based on spatial pyramid according to claim 1, It is characterized in that the step two specifically:

Step 201: average pond being carried out to all characteristic of division figures, while specifying total pyramid number and pond core size, is obtained To corresponding pyramid pond characteristic pattern P_i, wherein i be pyramid serial number, value range between 1 to total pyramid number it Between, calculate i-th of pyramid pond characteristic patternCalculation formula are as follows:

Wherein, k_iFor i-th of pyramidal pond core size, x is characterized figure abscissa, and y is characterized figure ordinate, △ k_xFor pond Change core abscissa, △ k_yFor pond core ordinate, c is semantic classes quantity；

Step 202: each pyramid pond characteristic pattern being generated and covers tensorIt concurrently sets region and covers ratio Example and cover quantization parameter；

Step 203: according to the output characteristic pattern for covering tensor computation spatial pyramid cover pondIt calculates public Formula are as follows:

O_i=P_i⊙M_i

Wherein, ⊙ indicates hadamard product.

4. a kind of Weakly supervised image, semantic dividing method that pond is covered based on spatial pyramid according to claim 3, It is characterized in that remembering that i-th of pyramid covers tensor and be in the step 202Remember that jth pyramid-like is covered Tensor isIt is consistent with quantization parameter is covered that position is covered in the region that the different classes of pyramid covers tensor:

Wherein, the value range of τ is [0,1], and the value of τ is bigger, and the region for indicating to cover is more,It indicates (1- τ) H_iThe value that W is obtained is rounded downwards, and I () indicates that indicator function, I () take 1 when input condition meets, take 0 in other situations, I.e.The position of satisfaction is to be retained region, and other situations are the region covered；

The cover quantization parameter ρ is for inhibiting signal；

Wherein, the value range of ρ is [0,1].

5. a kind of Weakly supervised image, semantic dividing method that pond is covered based on spatial pyramid according to claim 1, It is characterized in that calculating classification in the step 3 activates vector sum class probability vector specifically:

Step 301: calculating classification and activate vector o_i, i.e., the output characteristic pattern that spatial pyramid covers pond is classified as a value, counted Calculate formula are as follows:

Step 302: class probability vector being calculated by Softmax function, obtains the probability value of each semantic classes to be determined, is counted Calculate formula are as follows:

6. a kind of Weakly supervised image, semantic dividing method that pond is covered based on spatial pyramid according to claim 1, It is characterized in that in the step 3, establish competitive spatial pyramid cover pond loss function specifically includes the following steps:

Wherein,Taking 1 other places to take 0, T for image, semantic class label vector and at the corresponding semantic classes target of appearance is to turn Set symbol；

Step 304: the 0th grade of global pool information being fully retained, other pyramid grades for having carried out information cover are moved State competition, calculation formula are as follows:

Wherein, l_clsFor loss function of always classifying, N is pyramid number of levels, l₀For the loss of global pool pyramid grade；

7. a kind of Weakly supervised image, semantic dividing method that pond is covered based on spatial pyramid according to claim 1, It is characterized in that the step four specifically: calculate error with loss function of always classifying and adjust net by back-propagation algorithm Network parameter is after the opposite variation of the loss function is less than 5%, directly by F_segIt takes out and takes argmax that can obtain in classification dimension To prediction segmentation figure

Y=argmax (F_seg)

Wherein, argmax F_segThe 3rd dimension.