CN112580645A - Unet semantic segmentation method based on convolutional sparse coding - Google Patents
- Publication number
- CN112580645A (application CN202011445030.5A)
- Authority
- CN
- China
- Prior art keywords
- unet
- model
- csc
- network
- segmentation
- Prior art date
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/467—Encoded features or binary features, e.g. local binary patterns [LBP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/513—Sparse representations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a Unet semantic segmentation method based on convolutional sparse coding. Convolutional sparse coding is combined with the encoding network of the Unet model to form the encoder of a CSC-Unet model, which captures the global information of an image; convolutional sparse coding is likewise combined with the decoding network of Unet to form the decoder of the CSC-Unet model, which captures the position information of the image; and skip connections combine the global information with the position information to produce accurate, fine-grained segmentation. The method first preprocesses the training pictures and labels in the data set (e.g., cropping and data augmentation) and then reads them into the CSC-Unet segmentation model for training. After training is finished, the test samples and labels in the data set are read into the CSC-Unet segmentation model and the saved best weights are loaded into the model, so that the model performs accurate semantic segmentation.
Description
Technical Field
The invention relates to the field of image semantic segmentation, and in particular to a Unet semantic segmentation method based on convolutional sparse coding.
Background
Over the past decades, the field of sparse and redundant representations has advanced dramatically, maturing into a discipline with broad influence. The sparse model rests on the idea that natural signals can be described as a linear combination of only a few members, or atoms, of a dictionary. Sparse models have become central to signal and image processing and machine learning, producing state-of-the-art results across a variety of tasks and many different domains. In recent years, two descendants of the sparse model, Convolutional Sparse Coding (CSC) and Multi-Layer Convolutional Sparse Coding (ML-CSC), have achieved significant results in fields ranging from signal and image processing to machine learning. Semantic segmentation, meanwhile, is one of the key tasks in computer vision: more and more application scenarios need to infer knowledge or semantics from imagery (a concrete-to-abstract process), making segmentation increasingly important for scene understanding. On this basis, the invention provides a Unet semantic segmentation method based on convolutional sparse coding that better captures the semantic and representational information of an image, allowing more accurate semantic segmentation.
Disclosure of Invention
The invention aims to obtain a model with a more accurate segmentation effect by combining convolutional sparse coding, which has proven effective in many different fields, with the Unet segmentation model.
In order to achieve the above object, the present invention provides a Unet semantic segmentation method based on convolutional sparse coding, comprising the following steps:
S1: reading the training samples and labels in a data set into the CSC-Unet semantic segmentation network, and preprocessing the training pictures and labels according to actual needs, e.g., cropping and normalization;
S2: combining convolutional sparse coding with the encoding network in Unet to form the encoder of the CSC-Unet model, so as to obtain the global information of the image;
S3: combining convolutional sparse coding with the decoding network in Unet to form the decoder of the CSC-Unet model, so as to obtain the position information of the image, and using skip connections to combine the global information with the position information to generate accurate and fine segmentation;
S4: post-processing the result produced by the CSC-Unet model to obtain a visualized semantic segmentation map.
As a preferred technical solution of the present invention, S1 and S4 are the preprocessing and post-processing of the image, respectively, and S2 and S3 form the proposed CSC-Unet semantic segmentation method by combining convolutional sparse coding with the Unet segmentation model.
As a preferred embodiment of the present invention, step S1, the data preprocessing part, can consist of the following steps:
S1.1, preprocessing the data, e.g., normalization, standardization, cropping and data augmentation; this benefits the training of a deep network, accelerates convergence, avoids overfitting and enhances the generalization capability of the model;
S1.2, reading the preprocessed data into the network: the training samples and labels in the data set are read into the convolutional neural network in batches of size batch-size.
As a preferred embodiment of the present invention, step S2, the design process of the encoder of the CSC-Unet model, can consist of the following steps:
S2.1, designing a two-layer convolutional sparse coding network to form an ML-CSC module: let the original signal X satisfy a two-layer convolutional sparse model, which can be expressed as X = D1Γ1, Γ1 = D2Γ2, where D1 and D2 are convolutional dictionaries and Γ1, Γ2 are the corresponding sparse representations;
S2.2, solving the ML-CSC problem: finding Γ1 and Γ2 can be seen as a Deep Coding Problem (DCP): solve for the sparsest Γ1 and Γ2 under the conditions ||Y − D1Γ1||2 ≤ ε and Γ1 = D2Γ2, where Y is the original signal X mixed with noise E, i.e. Y = X + E with ||E||2 ≤ ε. Solving the deep coding problem layer by layer with the Layered Basis Pursuit (LBP) algorithm yields Γ̂1 = argmin_Γ1 ½||Y − D1Γ1||2² + λ1||Γ1||1 and Γ̂2 = argmin_Γ2 ½||Γ̂1 − D2Γ2||2² + λ2||Γ2||1, where λ1 and λ2 control the sparsity of each layer;
S2.3, solving the LBP problem: an approximate solution of the LBP problem can be found with the Multi-Layer Iterative Soft-Thresholding Algorithm (ML-ISTA), which iterates Γk ← Tλk(Γk + Dk^T(Γ(k−1) − DkΓk)) with Γ0 = Y, where t is the number of iterations and Tλ is the soft-thresholding operator; if the representation coefficients are further assumed to be non-negative, the approximate solution can be written as Γk = ReLU(Wk ∗ Γ(k−1) + bk), where Wk = Dk^T is a convolution operation; when t = 0, i.e. no iteration is performed, the ML-CSC module is equivalent to two convolution operations with convolution coefficients W1 and W2; when the iteration number t = 1, one refinement step is added, and the number of learnable parameters is not increased in the iterative process;
S2.4, combining the ML-CSC module with the encoding end of a traditional Unet segmentation network.
As a preferred embodiment of the present invention, step S3, the design process of the decoder of the CSC-Unet model and the skip connections, can consist of the following steps:
S3.1, combining the ML-CSC module with the decoding end of a traditional Unet segmentation network;
S3.2, using skip connections to combine the global information with the position information, so as to generate accurate and fine segmentation;
S3.3, selecting the negative log-likelihood loss (NLLLoss) as the loss function, applying a log-softmax activation to the network output, and selecting Adam, an adaptive-learning-rate optimization method, as the optimizer.
As a preferred technical solution of the present invention, in S4 the result produced by the CSC-Unet model is post-processed to obtain a visualized, accurate semantic segmentation map; the process can consist of the following steps:
S4.1, comparing the model output with the ground-truth labels to compute a confusion matrix, from which index measurements such as the mean intersection over union (mIoU), pixel accuracy and mean pixel accuracy are derived to evaluate the network performance;
S4.2, saving the prediction results of the model as pictures, so that the accuracy of the segmentation can be assessed visually.
The beneficial effects of the invention are: compared with prior-art segmentation methods, the encoding end better captures the global information of the original image and the decoding end better captures its position information; combining the captured global and position information finally yields a more accurate semantic segmentation of the image.
Drawings
FIG. 1 is a simplified flowchart of the Unet semantic segmentation method based on convolutional sparse coding according to the present invention;
figs. 2-3 show the model structure of the Unet semantic segmentation method based on convolutional sparse coding of the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more readily understood by those skilled in the art and the scope of the invention more clearly defined.
Embodiment: referring to figs. 1-3, the present invention provides the following technical solution.
As shown in fig. 1, the Unet semantic segmentation method based on convolutional sparse coding of the present invention includes the following steps:
S1: the data preprocessing part can consist of the following steps:
S1.1, preprocessing the data, e.g., normalization, standardization, cropping and data augmentation. This benefits the training of a deep network and accelerates convergence, while also avoiding overfitting and enhancing the generalization capability of the model.
S1.2, reading the preprocessed data into the network: the training samples and labels in the data set are read into the convolutional neural network in batches of size batch-size.
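The preprocessing of S1.1-S1.2 can be sketched as follows. This is a minimal illustration only; the crop size, mean and standard deviation below are placeholder values, not parameters taken from the patent.

```python
import numpy as np

def preprocess(image, crop_size=256, mean=0.5, std=0.25):
    """Illustrative preprocessing: center-crop, normalize, standardize.

    crop_size, mean and std are assumed placeholder values.
    """
    h, w = image.shape[:2]
    top = max((h - crop_size) // 2, 0)
    left = max((w - crop_size) // 2, 0)
    crop = image[top:top + crop_size, left:left + crop_size]
    crop = crop.astype(np.float32) / 255.0   # normalize to [0, 1]
    return (crop - mean) / std               # standardize

img = np.random.randint(0, 256, size=(300, 300, 3), dtype=np.uint8)
out = preprocess(img)
print(out.shape)  # (256, 256, 3)
```

In practice such transforms would be applied per batch before the samples are read into the network.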
S2: the design process of the encoder of the CSC-Unet model can consist of the following steps:
S2.1, designing a two-layer convolutional sparse coding network to form an ML-CSC module: let the original signal X satisfy a two-layer convolutional sparse model, which can be expressed as X = D1Γ1, Γ1 = D2Γ2, where D1 and D2 are convolutional dictionaries and Γ1, Γ2 are the corresponding sparse representations.
S2.2, solving the ML-CSC problem: finding Γ1 and Γ2 can be seen as a Deep Coding Problem (DCP): solve for the sparsest Γ1 and Γ2 under the conditions ||Y − D1Γ1||2 ≤ ε and Γ1 = D2Γ2, where Y is the original signal X mixed with noise E, i.e. Y = X + E with ||E||2 ≤ ε. Solving the deep coding problem layer by layer with the Layered Basis Pursuit (LBP) algorithm yields Γ̂1 = argmin_Γ1 ½||Y − D1Γ1||2² + λ1||Γ1||1 and Γ̂2 = argmin_Γ2 ½||Γ̂1 − D2Γ2||2² + λ2||Γ2||1, where λ1 and λ2 control the sparsity of each layer.
S2.3, solving the LBP problem: an approximate solution of the LBP problem can be found with the Multi-Layer Iterative Soft-Thresholding Algorithm (ML-ISTA), which iterates Γk ← Tλk(Γk + Dk^T(Γ(k−1) − DkΓk)) with Γ0 = Y, where t is the number of iterations and Tλ is the soft-thresholding operator. If the representation coefficients are further assumed to be non-negative, the approximate solution can be written as Γk = ReLU(Wk ∗ Γ(k−1) + bk), where Wk = Dk^T is a convolution operation. When t = 0, i.e. no iteration is performed, the ML-CSC module is equivalent to two convolution operations with convolution coefficients W1 and W2; when the iteration number t = 1, one refinement step is added, as shown in fig. 2, and the number of learnable parameters is not increased in the iterative process.
S2.4, combining the ML-CSC module with the encoding end of the traditional Unet segmentation network, as shown in figure 3.
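The ML-CSC module of S2.1-S2.3 can be sketched numerically as follows. This is an illustrative toy under stated assumptions: plain matrix dictionaries D1 and D2 stand in for the patent's convolutional dictionaries, the thresholds lam1/lam2 play the role of the ReLU biases, and all dimensions are arbitrary.

```python
import numpy as np

def relu_threshold(v, lam):
    # non-negative soft threshold: a ReLU whose bias acts as the threshold
    return np.maximum(v - lam, 0.0)

def ml_csc_forward(Y, D1, D2, lam1=0.1, lam2=0.1, t=1):
    """Two-layer ML-CSC module approximately solved by ML-ISTA.

    Matrix products stand in for convolutions; with t = 0 this reduces
    to two plain 'convolution + ReLU' steps with weights W_k = D_k^T.
    """
    # t = 0 pass: the module equals two convolution operations
    G1 = relu_threshold(D1.T @ Y, lam1)
    G2 = relu_threshold(D2.T @ G1, lam2)
    # t >= 1: ML-ISTA refinement; reuses D1, D2, so no new parameters
    for _ in range(t):
        G1 = relu_threshold(G1 + D1.T @ (Y - D1 @ G1), lam1)
        G2 = relu_threshold(G2 + D2.T @ (G1 - D2 @ G2), lam2)
    return G1, G2

rng = np.random.default_rng(0)
D1 = rng.standard_normal((64, 32)) / 8.0   # signal dim 64 -> code dim 32
D2 = rng.standard_normal((32, 16)) / 6.0   # code dim 32 -> code dim 16
Y = rng.standard_normal((64, 1))
G1, G2 = ml_csc_forward(Y, D1, D2, t=1)
print(G1.shape, G2.shape)  # (32, 1) (16, 1)
```

The key design point mirrored here is that the t = 1 refinement loop reuses the same dictionaries, so iterating adds computation but no learnable parameters.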
S3: the design process of the decoder of the CSC-Unet model and the skip connections can consist of the following steps:
S3.1, combining the ML-CSC module with the decoding end of the traditional Unet segmentation network, as shown in figure 3;
S3.2, using skip connections to combine the global information with the position information, so as to generate accurate and fine segmentation;
S3.3, selecting the negative log-likelihood loss (NLLLoss) as the loss function, applying a log-softmax activation to the network output, and selecting Adam, an adaptive-learning-rate optimization method, as the optimizer.
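The training objective of S3.3 (log-softmax followed by a negative log-likelihood over pixels) can be sketched in numpy as follows; shapes and random inputs are illustrative assumptions.

```python
import numpy as np

def log_softmax(logits, axis=1):
    # numerically stable log-softmax over the class axis
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def nll_loss(logits, targets):
    """Mean negative log-likelihood over all pixels.

    logits: (N, C, H, W) class scores; targets: (N, H, W) integer labels.
    """
    logp = log_softmax(logits, axis=1)
    # pick the log-probability of the target class at each pixel
    picked = np.take_along_axis(logp, targets[:, None, :, :], axis=1)
    return -picked.mean()

rng = np.random.default_rng(1)
logits = rng.standard_normal((2, 3, 4, 4))     # 2 images, 3 classes, 4x4
targets = rng.integers(0, 3, size=(2, 4, 4))
loss = nll_loss(logits, targets)
print(loss > 0)  # True
```

This mirrors the usual log-softmax + NLLLoss pairing; in a real training loop the gradient of this loss would be handled by the framework's autograd.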
S4: post-processing the result produced by the CSC-Unet model to obtain a visualized, accurate semantic segmentation map. The process can consist of the following steps:
S4.1, comparing the model output with the ground-truth labels to compute a confusion matrix, from which index measurements such as the mean intersection over union (mIoU), pixel accuracy and mean pixel accuracy are derived to evaluate the network performance;
S4.2, saving the prediction results of the model as pictures, so that the accuracy of the segmentation can be assessed visually.
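The metrics of S4.1 can be sketched as follows; the tiny 3x3 label maps are made-up examples, not data from the patent.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Confusion matrix with rows = ground truth, cols = prediction."""
    idx = gt.ravel() * num_classes + pred.ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_metrics(pred, gt, num_classes):
    cm = confusion_matrix(pred, gt, num_classes)
    tp = np.diag(cm).astype(np.float64)
    pixel_acc = tp.sum() / cm.sum()
    # union of class c = pixels predicted as c + pixels labeled c - overlap
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    iou = tp / np.maximum(union, 1)   # guard against empty classes
    return pixel_acc, iou.mean()

gt = np.array([[0, 0, 1], [1, 1, 2], [2, 2, 2]])
pred = np.array([[0, 1, 1], [1, 1, 2], [2, 2, 0]])
acc, miou = segmentation_metrics(pred, gt, num_classes=3)
print(round(acc, 3))  # 0.778
```

Per-class IoU here is [1/3, 3/4, 3/4], giving an mIoU of about 0.611; mean pixel accuracy would average the per-class recalls the same way.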
In this method, the training pictures and labels in the data set are preprocessed (e.g., cropping and data augmentation) and read into the CSC-Unet segmentation model for training; after all training is finished, the test samples and labels in the data set are read into the network and the saved best weights are loaded into the model, achieving accurate semantic segmentation. The invention is highly beneficial to research on geographic information systems, autonomous vehicle driving, medical image analysis, robotics, image search engines and the like.
The above-mentioned embodiments express only several implementations of the present invention, and their description, while specific and detailed, should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention.
Claims (6)
1. A Unet semantic segmentation method based on convolutional sparse coding, characterized by comprising the following steps:
S1: reading the training samples and labels in a data set into the CSC-Unet semantic segmentation network, and preprocessing the training pictures and labels according to actual needs, e.g., cropping and normalization;
S2: combining convolutional sparse coding with the encoding network in Unet to form the encoder of the CSC-Unet model, so as to obtain the global information of the image;
S3: combining convolutional sparse coding with the decoding network in Unet to form the decoder of the CSC-Unet model, so as to obtain the position information of the image, and using skip connections to combine the global information with the position information to generate accurate and fine segmentation;
S4: post-processing the result produced by the CSC-Unet model to obtain a visualized semantic segmentation map.
2. The Unet semantic segmentation method based on convolutional sparse coding as claimed in claim 1, wherein: S1 and S4 are the preprocessing and post-processing of the image, respectively, and S2 and S3 form the proposed CSC-Unet semantic segmentation method by combining convolutional sparse coding with the Unet segmentation model.
3. The Unet semantic segmentation method based on convolutional sparse coding as claimed in claim 1, wherein step S1, the data preprocessing part, can consist of the following steps:
S1.1, preprocessing the data, e.g., normalization, standardization, cropping and data augmentation; this benefits the training of a deep network, accelerates convergence, avoids overfitting and enhances the generalization capability of the model;
S1.2, reading the preprocessed data into the network: the training samples and labels in the data set are read into the convolutional neural network in batches of size batch-size.
4. The Unet semantic segmentation method based on convolutional sparse coding as claimed in claim 1, wherein step S2, the design process of the encoder of the CSC-Unet model, can consist of the following steps:
S2.1, designing a two-layer convolutional sparse coding network to form an ML-CSC module: let the original signal X satisfy a two-layer convolutional sparse model, which can be expressed as X = D1Γ1, Γ1 = D2Γ2, where D1 and D2 are convolutional dictionaries and Γ1, Γ2 are the corresponding sparse representations;
S2.2, solving the ML-CSC problem: finding Γ1 and Γ2 can be seen as a Deep Coding Problem (DCP): solve for the sparsest Γ1 and Γ2 under the conditions ||Y − D1Γ1||2 ≤ ε and Γ1 = D2Γ2, where Y is the original signal X mixed with noise E, i.e. Y = X + E with ||E||2 ≤ ε. Solving the deep coding problem layer by layer with the Layered Basis Pursuit (LBP) algorithm yields Γ̂1 = argmin_Γ1 ½||Y − D1Γ1||2² + λ1||Γ1||1 and Γ̂2 = argmin_Γ2 ½||Γ̂1 − D2Γ2||2² + λ2||Γ2||1, where λ1 and λ2 control the sparsity of each layer;
S2.3, solving the LBP problem: an approximate solution of the LBP problem can be found with the Multi-Layer Iterative Soft-Thresholding Algorithm (ML-ISTA), which iterates Γk ← Tλk(Γk + Dk^T(Γ(k−1) − DkΓk)) with Γ0 = Y, where t is the number of iterations and Tλ is the soft-thresholding operator; if the representation coefficients are further assumed to be non-negative, the approximate solution can be written as Γk = ReLU(Wk ∗ Γ(k−1) + bk), where Wk = Dk^T is a convolution operation; when t = 0, i.e. no iteration is performed, the ML-CSC module is equivalent to two convolution operations with convolution coefficients W1 and W2; when the iteration number t = 1, one refinement step is added, and the number of learnable parameters is not increased in the iterative process;
S2.4, combining the ML-CSC module with the encoding end of a traditional Unet segmentation network.
5. The Unet semantic segmentation method based on convolutional sparse coding as claimed in claim 1, wherein step S3, the design process of the decoder of the CSC-Unet model and the skip connections, can consist of the following steps:
S3.1, combining the ML-CSC module with the decoding end of a traditional Unet segmentation network;
S3.2, using skip connections to combine the global information with the position information, so as to generate accurate and fine segmentation;
S3.3, selecting the negative log-likelihood loss (NLLLoss) as the loss function, applying a log-softmax activation to the network output, and selecting Adam, an adaptive-learning-rate optimization method, as the optimizer.
6. The Unet semantic segmentation method based on convolutional sparse coding as claimed in claim 1, wherein: in S4, the result produced by the CSC-Unet model is post-processed to obtain a visualized, accurate semantic segmentation map; the process can consist of the following steps:
S4.1, comparing the model output with the ground-truth labels to compute a confusion matrix, from which index measurements such as the mean intersection over union (mIoU), pixel accuracy and mean pixel accuracy are derived to evaluate the network performance;
S4.2, saving the prediction results of the model as pictures, so that the accuracy of the segmentation can be assessed visually.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011445030.5A CN112580645B (en) | 2020-12-08 | 2020-12-08 | Unet semantic segmentation method based on convolution sparse coding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011445030.5A CN112580645B (en) | 2020-12-08 | 2020-12-08 | Unet semantic segmentation method based on convolution sparse coding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112580645A true CN112580645A (en) | 2021-03-30 |
CN112580645B CN112580645B (en) | 2024-05-03 |
Family
ID=75130881
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011445030.5A Active CN112580645B (en) | 2020-12-08 | 2020-12-08 | Unet semantic segmentation method based on convolution sparse coding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112580645B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113344811A (en) * | 2021-05-31 | 2021-09-03 | 西南大学 | Multilayer convolution sparse coding weighted recursive denoising deep neural network and method |
CN113409322A (en) * | 2021-06-18 | 2021-09-17 | 中国石油大学(华东) | Deep learning training sample enhancement method for semantic segmentation of remote sensing image |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670392A (en) * | 2018-09-04 | 2019-04-23 | 中国人民解放军陆军工程大学 | Road image semantic segmentation method based on hybrid automatic encoder |
CN111210435A (en) * | 2019-12-24 | 2020-05-29 | 重庆邮电大学 | Image semantic segmentation method based on local and global feature enhancement module |
WO2020215236A1 (en) * | 2019-04-24 | 2020-10-29 | 哈尔滨工业大学(深圳) | Image semantic segmentation method and system |
CN111951249A (en) * | 2020-08-13 | 2020-11-17 | 浙江理工大学 | Mobile phone light guide plate defect visual detection method based on multitask learning network |
Non-Patent Citations (1)
Title |
---|
Guo Aixin; Yin Baoqun; Li Yun: "Small-scale pedestrian detection based on deep convolutional neural networks", Information Technology and Network Security, no. 07 *
Also Published As
Publication number | Publication date |
---|---|
CN112580645B (en) | 2024-05-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||