CN111695569A - Image pixel level classification method based on multi-segmentation-map fusion - Google Patents

Image pixel level classification method based on multi-segmentation-map fusion

Info

Publication number
CN111695569A
Authority
CN
China
Prior art keywords
segmentation
pixel
mask
consensus
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010397565.3A
Other languages
Chinese (zh)
Other versions
CN111695569B (en)
Inventor
姚莉 (Yao Li)
乔昂 (Qiao Ang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202010397565.3A priority Critical patent/CN111695569B/en
Publication of CN111695569A publication Critical patent/CN111695569A/en
Application granted granted Critical
Publication of CN111695569B publication Critical patent/CN111695569B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image pixel-level classification method based on multi-segmentation-map fusion. The method comprises three main steps. First, a guidance mechanism is introduced among multiple segmentation maps, in which a higher-precision segmentation map guides a lower-precision segmentation map to improve its precision. Second, a consensus mechanism is introduced: edge-region pixels that may produce classification conflicts across the maps reach a classification consensus through a negotiation strategy. Finally, a fusion strategy based on a fully convolutional neural network combines the two mechanisms effectively to obtain the final output. The invention effectively resolves pixel classification conflicts in edge regions and fuses multiple segmentation maps into a finer-grained pixel classification result. The method can be used in combination with a variety of supervised learning techniques, including but not limited to deep neural networks, random forests, and support vector machines. It effectively overcomes the insufficient attention that existing methods pay to conflicting pixels and yields a higher-quality fused segmentation result.

Description

Image pixel level classification method based on multi-segmentation-map fusion
Technical Field
The invention relates to an image processing and analyzing technology, and belongs to the technical field of image content understanding.
Background
Image content understanding is an important research goal in the field of computer vision. As computer vision technology develops, image content understanding is moving toward ever finer granularity. Segmentation, i.e., pixel-level classification of an image, is one of the important means of understanding image content, and how to achieve finer-grained classification on top of existing techniques is a focus of current research. This inevitably raises the problem of pixel classification conflicts between different existing techniques, which generally occur at the edges of the segmented content. Existing methods still offer no effective solution to the following conflict problems:
1) Classification conflicts between different segmentations of the same foreground object.
2) Classification conflicts between different segmentations of the same background content.
3) Classification conflicts at the junction of a foreground object and background content.
Disclosure of Invention
The invention aims to solve three problems: classification conflicts between different segmentations of the same foreground object, classification conflicts between different segmentations of the same background content, and classification conflicts at the junction of foreground objects and background content.
To this end, the invention adopts the following method: an image pixel-level classification method based on multi-segmentation-map fusion, comprising the following steps:
(1) A guidance mechanism is introduced among the multiple segmentation maps. According to the quality with which each part segments its content, an attention mechanism is used to provide high-precision content to the low-precision parts so that they attend to it. Taking the fusion of two segmentation maps as an example: if the input to be fused consists of a foreground-object segmentation map and a background-content segmentation map, the background-content segmentation places higher demands on local and global semantics than the foreground-object segmentation, so its classification precision along edges is slightly lower; the classification result of the foreground segmentation in the edge region can then be used as attention to supplement the semantic information of the background segmentation in the corresponding region. The multi-map case can be extrapolated accordingly.
(2) A consensus mechanism is introduced among the multiple segmentation maps; classification conflicts that may exist in the edge region are resolved by learning a consensus mask during the learning stage of the supervised learning method. Taking the two-map scenario as an example, the two segmentation maps are coded 0 and 1 respectively. The consensus mask is a binary mask, initialized to all zeros: if the value at position (i, j) is 0, the two parties agree that the pixel at the corresponding position of the input image is classified as in the map coded 0; if the value is 1, as in the map coded 1. During learning, guided by the corresponding term of the loss function, the two parts negotiate continuously to reach a more reasonable consensus mask. The multi-map case can be extrapolated accordingly.
(3) A fusion strategy based on a fully convolutional neural network effectively combines the two mechanisms to obtain the final output. The two-segmentation-map application scenario is taken as the example; the multi-map scenario can be deduced from it.
As an improvement of the present invention, the fusion strategy is implemented by the following sub-steps:
(3.1) completing initialization: size-registering all object segmentation blocks obtained in the previous step against the size of the original input image;
(3.2) removing repeated segmentation blocks: if different segmentation blocks of the same object do not overlap completely, the overlapping part is kept; the pixel-attribution problem of overlapping regions between different objects is solved in subsequent sub-steps;
(3.3) adjusting the contour range: the segmentation blocks of all object classes are input into a fully convolutional neural network with an encoder-decoder structure, and the contour range of each object is adjusted through learning;
(3.4) merging the masks: the object masks obtained in the previous step are merged, and the attribution of pixels in the edge overlap region is decided according to the consensus mask.
Advantageous effects:
(1) By introducing a guidance mechanism between different segmentation maps, the invention resolves pixel classification conflicts between different segmentations of the same foreground object and of the same background content.
(2) By introducing a consensus mechanism between different segmentations, the invention resolves classification conflicts for edge-region pixels between image foreground objects and background content.
(3) The method can be used together with various supervised learning methods, including but not limited to deep neural networks, random forests, and support vector machines; combined with the corresponding loss functions and the final fusion strategy, it improves pixel-level classification precision and significantly improves segmentation quality.
Drawings
FIG. 1 is a flow diagram of an overall multi-partition inter-image pixel classification conflict resolution scheme;
FIG. 2 is a schematic diagram of an attention-based multi-partition guidance mechanism;
FIG. 3 is a schematic diagram of a multi-partition consensus mechanism based on consensus masks;
FIG. 4 is a flow diagram of a full convolutional neural network based multi-partition fusion strategy;
Detailed Description
The following examples are intended to illustrate the present invention, but are not intended to limit the scope of the present invention.
The process for resolving pixel conflicts among multiple segmentation maps is described below in conjunction with the accompanying drawings.
FIG. 1 shows the overall flow of the invention's solution to pixel classification conflicts between multiple segmentation maps; the method includes the following steps:
(1) a guidance mechanism is introduced: according to the segmentation quality of each part, an attention mechanism provides high-precision content to the low-precision part so that the latter attends to it, supplementing the semantic information of the corresponding region and improving the segmentation precision of the low-precision part;
(2) a consensus mechanism is introduced: during the learning stage of the supervised learning method, guided by the corresponding term of the loss function, the edge-region pixels are continuously negotiated over to learn a consensus mask that resolves possible classification conflicts in the edge region;
(3) the results of the two mechanisms are integrated using a fusion strategy based on a fully convolutional neural network to obtain the final output; a minimal sketch of how the three steps chain together is given below.
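Purely as an orientation aid, the following hedged Python sketch shows one way the three steps could compose. The patent specifies the mechanisms (detailed in the sections below) but no concrete API, so every function here is a placeholder supplied by the caller:

```python
from typing import Callable
import numpy as np

SegMap = np.ndarray  # a 2-D per-pixel label or confidence map

def fuse_segmentations(
    in_fo: SegMap,
    in_bc: SegMap,
    guide: Callable[[SegMap, SegMap], SegMap],
    learn_consensus: Callable[[SegMap, SegMap], SegMap],
    fcn_fuse: Callable[[SegMap, SegMap, SegMap], SegMap],
) -> SegMap:
    """Illustrative composition of the three steps described above."""
    guided_bc = guide(in_fo, in_bc)            # (1) attention-based guidance
    mask = learn_consensus(in_fo, guided_bc)   # (2) consensus mask over edge conflicts
    return fcn_fuse(in_fo, guided_bc, mask)    # (3) FCN-based fusion of both mechanisms
```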
The invention will be further described below using two segmentation maps; the multiple-segmentation-map case can be derived from them.
(1) The guidance mechanism, in which a higher-precision segmentation map guides a lower-precision segmentation map to improve its precision:
the attention-based guidance mechanism is shown in fig. 2, and for two segmentation graphs, we assume that the input to be fused is a segmentation graph mentor of a foreground objectfoGraph in separated from background contentbcThe background content segmentation has higher requirements on local and global semantics compared with the foreground object segmentation, so that the classification precision of the edge part is slightly low, and the classification result of the foreground object segmentation in the edge area can be used as attention to supplement semantic information of the background content segmentation in the corresponding area. If define outbcFor the guided output of background content parts, the relationship between them can be formalized as
Figure BDA0002488233250000031
Wherein
Figure BDA0002488233250000032
And
Figure BDA0002488233250000033
pixel-by-pixel multiplication and pixel-by-pixel addition operations, respectively, rescale (·) is used for registration of the inter-segmentation map dimensions, and norm (·) is a normalization operation, which is inversely related to the number of segmentation maps.
Defining p and g as the prediction output and the annotation output respectively, the guidance loss term in this scenario is

[loss formula rendered as an image in the original patent]
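Both formulas above survive only as images in the source, so they cannot be transcribed exactly. The sketch below is therefore just one plausible NumPy reading of the prose, assuming out_bc = in_bc ⊕ (norm(rescale(in_fo)) ⊗ in_bc); the function names and the division-by-map-count form of norm(·) are assumptions:

```python
import numpy as np
from scipy.ndimage import zoom

def rescale(seg, target_shape):
    """Size registration: resample a segmentation map to the target shape."""
    factors = (target_shape[0] / seg.shape[0], target_shape[1] / seg.shape[1])
    return zoom(seg, factors, order=1)  # bilinear resampling

def norm(seg, num_maps):
    """Assumed normalization, inversely related to the number of maps."""
    return seg / num_maps

def guide(in_fo, in_bc, num_maps=2):
    """Hypothetical guided output out_bc: the registered, normalized foreground
    map acts as pixel-wise attention supplementing the background map."""
    attention = norm(rescale(in_fo, in_bc.shape), num_maps)
    return in_bc + attention * in_bc  # reading ⊕ and ⊗ as element-wise add/multiply
```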
(2) The consensus mechanism resolves possible classification conflicts in the edge region by learning a consensus mask during the learning stage of the supervised learning method; the consensus mechanism based on the consensus mask is shown in FIG. 3:
(2.1) initializing consensus masks:
the two parts are coded as 0 and 1, respectively. The common identification mask is a binary mask, which is initially a 0-value mask, and if the (i, j) position value is 0, the two parties achieve that the classification of the pixel at the corresponding position of the input image is the same as that of the 0-code division image, and if the value is 1, the opposite is true. And carrying out size registration on the two segmentation images, and adjusting the two segmentation images to be the same size.
(2.2) calculating a foreground mask and a background mask:
and carrying out size registration on the input segmentation graph, and adjusting the input segmentation graph to the same size.
For foreground objects, different partitions of the same generic object are merged into the same mask and truncated using a learned threshold. Then merging each class of mask after cutting, using class number to carry out regularization on mask pixel point values, and calculating to obtain a binary foreground mask.
For background content, the pixel values of the background content portion are assigned the same encoding values as in the initialization phase, and the pixel values of the non-background portion are assigned the opposite values, thereby generating a background content mask.
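A hedged sketch of this mask computation; the learned threshold is taken as a given scalar, and all names are illustrative:

```python
import numpy as np

def foreground_mask(segments_by_class, threshold):
    """segments_by_class: {class_id: [same-class segment maps in [0, 1]]}.
    Merge same-class segments, truncate with the learned threshold, merge the
    truncated per-class masks, normalize by the class count, and binarize."""
    num_classes = len(segments_by_class)
    h, w = next(iter(segments_by_class.values()))[0].shape
    merged = np.zeros((h, w), dtype=np.float64)
    for segments in segments_by_class.values():
        class_mask = np.clip(np.sum(segments, axis=0), 0.0, 1.0)  # merge same-class blocks
        merged += (class_mask >= threshold).astype(np.float64)    # truncate by threshold
    return ((merged / num_classes) > 0).astype(np.uint8)          # normalize, binarize

def background_mask(is_background, bg_code=1):
    """Background pixels keep the code assigned at initialization; all other
    pixels receive the opposite value."""
    return np.where(is_background, bg_code, 1 - bg_code).astype(np.uint8)
```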
(2.3) consensus learning:
During the supervised learning process, the relevant loss term between the two is continuously reduced, and consensus learning is achieved.
The loss term is defined as follows: f and b denote the two input segmentation maps, and N represents the number of segmentation maps iteratively input during learning. The consensus loss term in this scenario is

[formula rendered as an image in the original patent]
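Since the consensus loss formula is likewise only an image, the following is no more than one speculative reading consistent with the definitions of f, b, and N; the squared-error form is an assumption:

```python
import numpy as np

def consensus_loss(f, b, consensus_mask, ground_truth, n_maps=2):
    """Speculative consensus loss: mean squared disagreement between the
    consensus-fused map and the annotation, averaged over the number of maps."""
    fused = np.where(consensus_mask == 0, f, b)
    return float(np.mean((fused - ground_truth) ** 2) / n_maps)
```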
(3) The fusion strategy based on a fully convolutional neural network is shown in FIG. 4; it effectively combines the two mechanisms to obtain the final output. Taking the two-segmentation-map application scenario as the example (the multi-map scenario can be deduced accordingly), the fusion strategy is implemented by the following sub-steps, with a sketch after the list:
(3.1) Complete initialization: size-register all object segmentation blocks obtained in the previous step against the size of the original input image.
(3.2) Remove repeated segmentation blocks: if different segmentation blocks of the same object do not overlap completely, the overlapping part is kept; the pixel-attribution problem of overlapping regions between different objects is solved in the subsequent sub-steps.
(3.3) Adjust the contour range: the segmentation blocks of all object classes are input into a fully convolutional neural network with an encoder-decoder structure, and the contour range of each object is adjusted through learning.
(3.4) Merge the masks: the object masks obtained in the previous step are merged, and the attribution of pixels in edge overlap regions is decided according to the consensus mask.
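For concreteness, here is a minimal PyTorch sketch of an encoder-decoder fully convolutional network in the spirit of sub-step (3.3), together with a consensus-guided merge for sub-step (3.4). The patent does not specify depth, channel widths, or training details, so everything below is illustrative:

```python
import torch
import torch.nn as nn

class ContourFCN(nn.Module):
    """Tiny encoder-decoder fully convolutional network for refining contours
    (illustrative; the real network's architecture is not given in the text)."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, num_classes, 1),  # per-pixel class scores
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def merge_masks(mask0, mask1, consensus):
    """(3.4) Union of two object masks; in their overlap, follow the consensus
    mask (0 -> keep mask0's label, 1 -> keep mask1's label)."""
    overlap = (mask0 > 0) & (mask1 > 0)
    merged = torch.where(mask0 > 0, mask0, mask1)
    return torch.where(overlap & (consensus == 1), mask1, merged)
```

In practice such a network would be trained jointly with the guidance and consensus loss terms described in the preceding sections.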

Claims (4)

1. An image pixel-level classification method based on multi-segmentation-map fusion, characterized by comprising the following steps:
(1) introducing a guidance mechanism: according to the segmentation quality of each part, an attention mechanism provides high-precision content to the low-precision part so that the latter attends to it, supplementing the semantic information of the corresponding region and improving the segmentation precision of the low-precision part;
(2) introducing a consensus mechanism: during the learning stage of the supervised learning method, guided by the corresponding term of the loss function, the edge-region pixels are continuously negotiated over to learn a consensus mask that resolves possible classification conflicts in the edge region;
(3) integrating the results of the two mechanisms using a fusion strategy based on a fully convolutional neural network to obtain the final output.
2. The image pixel-level classification method based on multi-segmentation-map fusion according to claim 1, wherein the guidance mechanism in step (1) is implemented by the following sub-steps:
(1.1) defining the input to be fused as the foreground-object segmentation map in_fo and the background-content segmentation map in_bc, and defining out_bc as the guided output of the background-content part, the relationship between them being

[formula rendered as an image in the original patent]

wherein ⊗ and ⊕ denote pixel-by-pixel multiplication and pixel-by-pixel addition respectively, rescale(·) is used for size registration between segmentation maps, and norm(·) is a normalization operation inversely related to the number of segmentation maps;

(1.2) using the following loss term with the corresponding supervised learning method to measure the guidance effect: defining p and g as the prediction output and the annotation output, the guidance loss term in this scenario is

[loss formula rendered as an image in the original patent]
3. The image pixel-level classification method based on multi-segmentation-map fusion according to claim 1, wherein the consensus mechanism in step (2) is implemented by the following sub-steps:
(2.1) initializing the consensus mask: the two parts are coded 0 and 1 respectively; the consensus mask is a binary mask, initially all zeros; if the value at position (i, j) is 0, the two parties agree that the pixel at the corresponding position of the input image is classified as in the segmentation map coded 0, and if the value is 1, the opposite holds; the two segmentation maps are size-registered and adjusted to the same size;
(2.2) calculating a foreground mask and a background mask: the input segmentation maps are size-registered and adjusted to the same size; for foreground objects, different segmentation blocks of objects of the same class are merged into the same mask and truncated using a learned threshold, the truncated per-class masks are then merged, the mask pixel values are normalized by the number of classes, and a binary foreground mask is computed; for background content, the pixels of the background-content part are assigned the same code value as in the initialization stage and the pixels of the non-background part the opposite value, generating the background-content mask;
(2.3) consensus learning: during supervised learning, the relevant loss term between the two is continuously reduced to achieve consensus learning; the loss term is defined as follows: f and b denote the two input segmentation maps, N represents the number of segmentation maps iteratively input during learning, and the consensus loss term in this scenario is

[formula rendered as an image in the original patent]
4. The image pixel-level classification method based on multi-segmentation-map fusion according to claim 1, wherein the fusion strategy in step (3) is implemented by the following sub-steps:
(3.1) completing initialization: size-registering all object segmentation blocks obtained in the previous step against the size of the original input image;
(3.2) removing repeated segmentation blocks: if different segmentation blocks of the same object do not overlap completely, the overlapping part is kept; the pixel-attribution problem of overlapping regions between different objects is solved in the subsequent sub-steps;
(3.3) adjusting the contour range: the segmentation blocks of all object classes are input into a fully convolutional neural network with an encoder-decoder structure, and the contour range of each object is fine-tuned through learning;
(3.4) merging the masks: the object masks obtained in the previous step are merged, and the attribution of pixels in the edge overlap region is decided according to the consensus mask.
CN202010397565.3A 2020-05-12 2020-05-12 Image pixel level classification method based on multi-segmentation-map fusion Active CN111695569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010397565.3A CN111695569B (en) 2020-05-12 2020-05-12 Image pixel level classification method based on multi-segmentation-map fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010397565.3A CN111695569B (en) 2020-05-12 2020-05-12 Image pixel level classification method based on multi-segmentation-map fusion

Publications (2)

Publication Number Publication Date
CN111695569A true CN111695569A (en) 2020-09-22
CN111695569B CN111695569B (en) 2023-04-18

Family

ID=72477703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010397565.3A Active CN111695569B (en) 2020-05-12 2020-05-12 Image pixel level classification method based on multi-segmentation-map fusion

Country Status (1)

Country Link
CN (1) CN111695569B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116453246A (en) * 2023-06-12 2023-07-18 深圳市众联视讯科技有限公司 Intelligent door lock capable of identifying objects outside door and alarming and identification alarming method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780500A (en) * 2016-12-09 2017-05-31 深圳市唯特视科技有限公司 A kind of image partition method of use regression algorithm
CN109685067A (en) * 2018-12-26 2019-04-26 江西理工大学 A kind of image, semantic dividing method based on region and depth residual error network
CN110047077A (en) * 2019-04-17 2019-07-23 湘潭大学 A kind of image processing method for ether mill common recognition mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780500A (en) * 2016-12-09 2017-05-31 深圳市唯特视科技有限公司 A kind of image partition method of use regression algorithm
CN109685067A (en) * 2018-12-26 2019-04-26 江西理工大学 A kind of image, semantic dividing method based on region and depth residual error network
CN110047077A (en) * 2019-04-17 2019-07-23 湘潭大学 A kind of image processing method for ether mill common recognition mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HOU Xiaogang et al.: "Fast image segmentation algorithm based on superpixel multi-feature fusion", Acta Electronica Sinica (《电子学报》) *
WANG Shupeng et al.: "Multi-exposure image fusion algorithm based on adaptive segmentation", Journal of Computer Applications (《计算机应用》) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116453246A (en) * 2023-06-12 2023-07-18 深圳市众联视讯科技有限公司 Intelligent door lock capable of identifying objects outside door and alarming and identification alarming method
CN116453246B (en) * 2023-06-12 2024-02-02 深圳市众联视讯科技有限公司 Intelligent door lock capable of identifying objects outside door and alarming and identification alarming method

Also Published As

Publication number Publication date
CN111695569B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN110322495B (en) Scene text segmentation method based on weak supervised deep learning
Batsos et al. CBMV: A coalesced bidirectional matching volume for disparity estimation
Liu et al. Local similarity pattern and cost self-reassembling for deep stereo matching networks
CN111583097A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
US20240029272A1 (en) Matting network training method and matting method
Nandi et al. Traffic sign detection based on color segmentation of obscure image candidates: a comprehensive study
CN111507337A (en) License plate recognition method based on hybrid neural network
KR20180067909A (en) Apparatus and method for segmenting image
Michalak et al. Fast Binarization of Unevenly Illuminated Document Images Based on Background Estimation for Optical Character Recognition Purposes.
CN115565071A (en) Hyperspectral image transform network training and classifying method
CN111695569B (en) Image pixel level classification method based on multi-segmentation-map fusion
Yang et al. Study of detection method on real-time and high precision driver seatbelt
Chen et al. Pgnet: Panoptic parsing guided deep stereo matching
Sun et al. TSINIT: a two-stage Inpainting network for incomplete text
Zhao et al. Traffic signs and markings recognition based on lightweight convolutional neural network
CN110880011B (en) Image segmentation method, device and non-transitory computer readable medium thereof
Fröhlich et al. As time goes by—anytime semantic segmentation with iterative context forests
CN111914947A (en) Image instance segmentation method, device and equipment based on feature fusion and storage medium
CN116228795A (en) Ultrahigh resolution medical image segmentation method based on weak supervised learning
Khan et al. A robust light-weight fused-feature encoder-decoder model for monocular facial depth estimation from single images trained on synthetic data
CN114627139A (en) Unsupervised image segmentation method, unsupervised image segmentation device and unsupervised image segmentation equipment based on pixel feature learning
Tsai et al. Real-time automatic multilevel color video thresholding using a novel class-variance criterion
Ke et al. Subject-aware image outpainting
Vasam et al. Instance Segmentation on Real time Object Detection using Mask R-CNN
CN114463187B (en) Image semantic segmentation method and system based on aggregation edge features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant