CN114612477B - Lightweight image segmentation method, system, medium, terminal and application


Info

Publication number
CN114612477B
CN114612477B (application CN202210208581.2A)
Authority
CN
China
Prior art keywords
convolution
network
image segmentation
training
net
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210208581.2A
Other languages
Chinese (zh)
Other versions
CN114612477A (en)
Inventor
朱烨
胡伟
魏敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology filed Critical Chengdu University of Information Technology
Priority to CN202210208581.2A
Publication of CN114612477A
Application granted
Publication of CN114612477B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image processing and discloses a lightweight image segmentation method, system, medium, terminal and application. A training data set is acquired, and the images and real label maps in the training data set are cut; an improved U-Net coding and decoding network is constructed and trained with the images in the cut training data set and the real label maps; lightweight image segmentation is then performed with the trained improved U-Net coding and decoding network. The invention improves the feature extraction capability, makes the network lighter, enhances training performance and improves inference efficiency; image segmentation performance is improved while segmentation efficiency is also taken into account.

Description

Lightweight image segmentation method, system, medium, terminal and application
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a lightweight image segmentation method, system, medium, terminal and application.
Background
At present, most human use of received information depends on the processing of image information. Image segmentation is an important image preprocessing method and a key link in extracting the value of an image. Image segmentation refers to decomposing an image into sets of mutually non-overlapping regions. It is widely applied in fields such as scene object segmentation, autonomous driving, remote sensing image analysis and medical image analysis.
Traditional segmentation algorithms such as threshold segmentation, edge detection segmentation and region segmentation are time-consuming and can only extract low-level features such as color, shape and texture, so their segmentation effect on images with complex and varied features is poor.
In recent years, with the rapid iteration of computer software and hardware, deep learning has developed rapidly in computer vision, and image segmentation using deep learning has become a research hotspot. The U-Net network model is a fully convolutional neural network with an encoder-decoder structure: the encoding network extracts target feature information, while the decoding network recovers the detail information of the image and up-samples the feature maps with deconvolution until the size of the input picture is restored. Skip connections between corresponding stages of the encoding and decoding parts reuse low-level feature information and better restore image detail information.
However, with rapid social development, more and more tasks involve increasingly complex scenes, which place ever higher demands on segmentation algorithms. Complex backgrounds and diverse target types lead to inaccurate segmentation results, and deep learning algorithms often require computing a large number of parameters, making real-time performance difficult to achieve in environments with weak hardware.
Through the above analysis, the problems and defects of the prior art are as follows: existing segmentation methods perform poorly in complex scenes, network models are prone to mis-segmentation and missed segmentation, the network parameter count and computation cost are large, and training and inference efficiency is low.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a lightweight image segmentation method, system, medium, terminal and application, and in particular a lightweight image segmentation method based on U-Net.
The invention is realized in such a way that the light-weight image segmentation method based on U-Net comprises the following steps:
Step one, acquiring a training data set, and cutting and applying data enhancement to the images and real label map in the training data set;
Step two, constructing an improved U-Net coding and decoding network, and training the constructed improved U-Net coding and decoding network with the images in the cut training data set and the real label map;
Step three, performing lightweight image segmentation with the trained improved U-Net coding and decoding network.
Further, the cutting and data enhancement of the images and the real label map in the training data set in step one includes:
The images and real label map in the training data set are cut into 256×256 tiles in a sliding-window manner with a step size of 256 pixels. To alleviate class imbalance in the data set, targets of under-represented classes are over-sampled during cutting, increasing their proportion. To enhance the generalization and robustness of the network through data enhancement, the cut images are rotated by 90, 180 and 270 degrees with the OpenCV library, and images within the same batch are enhanced online using the PyTorch deep learning framework.
Further, in step two, the improved U-Net coding and decoding network includes:
The encoding end, which replaces the encoding part of the original U-Net network with an EfficientNetV2-S network so that EfficientNetV2-S serves as the U-Net encoder; the encoder outputs five feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 scales, and encoding is performed with a multi-scale convolution fusion module;
And the decoding end, which decodes the five feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 scales output by the encoder using a convolution structure re-parameterization method combined with a channel attention mechanism, obtaining the segmentation result.
Further, the encoding with the multi-scale convolution fusion module includes:
a multi-scale convolution module and a downsampling module;
The multi-scale convolution module, which extracts features with convolutions of different kernel sizes and splices and fuses the feature maps obtained by the multi-scale convolutions;
and the downsampling module, which passes the fused feature information to the deep network.
Further, the multi-scale convolution module consists of three parallel convolutions with kernel sizes of 3×3, 5×5 and 7×7, a concatenation of the three branch features, and one 3×3 convolution for fusion.
Further, the downsampling module includes:
a max-pooling downsampling unit, which changes the feature map dimension through a 1×1 convolution;
a convolution downsampling unit, which downsamples with a 3×3 convolution of stride 2;
and a fusion unit, which fuses the outputs of the max-pooling downsampling unit and the convolution downsampling unit by element-wise summation, followed by a SiLU function for nonlinear activation.
Further, decoding the five feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 scales output by the encoder using the convolution structure re-parameterization method combined with the channel attention mechanism includes:
taking the 1/32 feature map as the input of the decoding end, and splicing and fusing the 1/2, 1/4, 1/8 and 1/16 feature maps with the decoding end as skip-connection feature information;
expanding the feature map to twice its size by deconvolution, gradually restoring it to the original image resolution, and splicing it with the encoding features output by each layer of the encoding end; the output layer then maps the feature map to a specific number of classes for pixel-level class prediction, yielding the segmentation result.
Further, the decoding end includes:
RepVGG modules and RepVGG-SE modules;
RepVGG module for enhancing feature extraction capability and avoiding gradient vanishing problem;
RepVGG-SE module for improving model segmentation performance, strengthening useful features, and inhibiting invalid information.
Further, the RepVGG module includes:
The information flow is constructed as y = b(x) + g(x) + f(x); if the dimensions of x and f(x) do not match, y = g(x) + f(x). Here b(x), g(x) and f(x) denote x passed through a same-layer identity branch containing a BN layer, a 1×1 convolution branch and a 3×3 convolution branch, respectively. At inference time, the trained model is converted: the convolution layer and BN layer of each branch are fused, and the multiple branch structures are merged by linear combination, so that the network branch architecture is equivalent to y = h(x), where h(x) is realized by a single 3×3 convolution layer followed by a ReLU activation layer.
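As a minimal PyTorch sketch of this training-time information flow (module and parameter names here are illustrative assumptions, not taken from the patent):

```python
import torch.nn as nn

class RepVGGBlock(nn.Module):
    """Training-time multi-branch block: y = b(x) + g(x) + f(x), then ReLU."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # f(x): 3x3 convolution branch with BN
        self.f = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False),
                               nn.BatchNorm2d(out_ch))
        # g(x): 1x1 convolution branch with BN
        self.g = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride, 0, bias=False),
                               nn.BatchNorm2d(out_ch))
        # b(x): identity branch (BN only), present only when dimensions match
        self.b = nn.BatchNorm2d(out_ch) if in_ch == out_ch and stride == 1 else None
        self.act = nn.ReLU()

    def forward(self, x):
        y = self.f(x) + self.g(x)
        if self.b is not None:
            y = y + self.b(x)
        return self.act(y)
```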
Further, the RepVGG-SE module includes:
Based on RepVGG modules, the SE channel attention mechanism is incorporated in the identity branch structure.
Further, in step two, training the constructed improved U-Net coding and decoding network with the images in the cut training data set and the real label map includes:
Using a training strategy combining warmup and cosine annealing, the images in the cut training data set are input into the constructed improved U-Net coding and decoding network to obtain a segmentation result map of the corresponding training image; the obtained segmentation result map is compared with the real label map and a loss value is calculated; whether the loss value has converged is judged, and if not, the model parameters are updated by back propagation; when the loss value converges, the trained improved U-Net coding and decoding network is obtained.
Further, the loss function is as follows:
$$L_{all} = \lambda L_{SCE} + \mu L_{Dice}$$
where $L_{CE}$ denotes the cross entropy loss, $L_{SCE}$ the cross entropy loss with label smoothing, $L_{Dice}$ the Dice loss and $L_{all}$ the total loss function; N is the total number of samples and M the number of classes; $y_{ic}$ indicates the real class of sample i, taken as 1 if the real class of sample i is c and 0 otherwise; $y'_{ic}$ is the label-smoothed class of sample i; $p_{ic}$ is the predicted probability that sample i belongs to class c; $\varepsilon$ is the smoothing hyper-parameter; $|X \cap Y|$ is the intersection of sets X and Y, and $|X|$ and $|Y|$ are the numbers of elements of X and Y. For the segmentation task, X denotes the Ground Truth segmented image, Y the predicted segmented image, and λ and μ are weighting coefficients.
Further, the lightweight image segmentation method based on U-Net includes: cutting the training data with a step size of 256 and performing data enhancement; inputting the images in the cut training data set into the constructed improved U-Net coding and decoding network to obtain a segmentation result map of the corresponding training image; comparing the obtained segmentation result map with the real label map and calculating a loss value; judging whether the loss value has converged, and if not, updating the model parameters by back propagation; when the loss value converges, obtaining the trained improved U-Net coding and decoding network; and inferring the images to be segmented with the trained improved U-Net coding and decoding network, thereby obtaining the segmentation result map of each image.
Another object of the present invention is to provide a lightweight image segmentation system comprising:
The training data set image and real label map cutting processing module, used for acquiring a training data set and cutting the images in the training data set and the real label map;
The improved U-Net coding and decoding network training module, used for constructing an improved U-Net coding and decoding network and training the constructed improved U-Net coding and decoding network with the images in the cut training data set and the real label map;
And the lightweight image segmentation module is used for carrying out lightweight image segmentation by utilizing the trained improved U-Net coding and decoding network.
Another object of the present invention is to provide an information data processing terminal for implementing the U-Net based lightweight image segmentation method.
Another object of the present invention is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the U-Net based lightweight image segmentation method.
A further object of the present invention is to provide the application of the U-Net-based lightweight image segmentation method to scene object segmentation, automatic driving, remote sensing image analysis and image segmentation in the field of medical image analysis.
In combination with the above technical solution and the technical problems to be solved, the advantages and positive effects of the claimed solution are analyzed from the following aspects:
First, in view of the technical problems in the prior art and the difficulty of solving them, the technical problems solved by the solution of the invention, and the creative technical effects brought about after solving them, are analyzed in detail below in close combination with the claimed solution and the results and data obtained during research and development. The specific description is as follows:
The U-Net network model performs poorly in complex scenes, is prone to mis-segmentation and missed segmentation, has a large parameter count and computation cost, and trains and infers inefficiently. Its segmentation performance in complex scenes is limited because its feature extraction capability is limited: target feature information cannot be fully extracted in complex scenes, the rich spatial texture and background information of the images increase the difficulty of feature extraction, and targets of variable scale, differing form and size, and complex edge structure aggravate the difficulty of correct classification. To address inefficient training and inference, the parameter count and computation of the network model must be reduced so that the network becomes lighter.

The invention is based on deep learning and takes images as input. An EfficientNetV2-S network is used as the U-Net encoding network, enhancing the feature extraction capability. The decoding part uses the convolution structure re-parameterization method: a multi-branch structure is adopted during training, avoiding the gradient vanishing problem while letting the network capture and exploit spatial information of feature maps at different scales; compared with a single-path architecture, the branch structure further fuses detail and semantic information, the extracted features are more expressive, and training performance is improved. Combined with a channel attention mechanism, effective feature extraction is strengthened and useless information is suppressed. The multi-scale convolution module yields feature maps containing more feature information at different scales, and the downsampling module passes this feature information to the deep network, improving the extraction of target features of different scales and better combining context information.

Depthwise separable convolution is used extensively in EfficientNetV2-S, reducing the overall parameter count and computation of the network; meanwhile, in the shallow Fused-MBConv modules the depthwise separable convolution is replaced by ordinary 3×3 convolution, which significantly improves network speed. The decoding network runs from 256 channels down to a final 24 channels; compared with the 1024 channels of the original U-Net network, this large reduction in channels is the key to greatly reducing parameters and computation. After training, the branch structures are fused into a single ordinary 3×3 convolution, further reducing the parameter count and computation at inference time and fully exploiting low-level optimization and acceleration of 3×3 convolution, so the running speed during inference is further improved. Experimental results show that image segmentation performance is improved alongside segmentation efficiency.
Secondly, the technical scheme is regarded as a whole or from the perspective of products, and the technical scheme to be protected has the following technical effects and advantages:
The invention improves image segmentation performance while taking segmentation efficiency into account. Based on deep learning, it takes images as input and uses EfficientNetV2-S as the encoding network, improving feature extraction capability while remaining lighter; with the convolution structure re-parameterization method, the multi-branch structure used during training is fused into a single-path structure, improving inference efficiency, while a channel attention mechanism within the branches strengthens useful features and suppresses invalid information; a multi-scale convolution module in the network encoding part fuses multi-scale features, and a downsampling module passes shallow information to the deep network, enhancing the extraction of target features at different scales and better combining context information.
Drawings
Fig. 1 is a schematic diagram of a light-weight image segmentation method based on U-Net according to an embodiment of the present invention.
Fig. 2 is a flowchart of a light-weight image segmentation method based on U-Net according to an embodiment of the present invention.
Fig. 3 is an overall structure diagram of a U-Net network provided by an embodiment of the present invention.
Fig. 4 is an overall structure diagram of a deep learning network according to an embodiment of the present invention.
Fig. 5 is a block diagram of RepVGG modules provided in an embodiment of the present invention.
Fig. 6 is a diagram of a SE channel attention module according to an embodiment of the present invention.
FIG. 7 is a diagram of RepVGG-SE modules in accordance with an embodiment of the present invention.
Fig. 8 is a block diagram of a multi-scale convolution module according to an embodiment of the present invention.
Fig. 9 is a block diagram of a downsampling module according to an embodiment of the present invention.
Fig. 10 is a graph of training strategy learning rate variation provided by an embodiment of the present invention.
Fig. 11 is a view of cut images and real labels provided by an embodiment of the present invention.
Fig. 12 is a graph of image segmentation results provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
1. Explanation of the embodiments. So that those skilled in the art can fully understand how the invention may be embodied, this section presents an illustrative embodiment of the claimed solution.
As shown in fig. 1, the principle of the U-Net-based lightweight image segmentation method provided by the embodiment of the invention includes: cutting the training data with a step size of 256 and performing data enhancement; inputting the images in the cut training data set into the constructed improved U-Net coding and decoding network to obtain a segmentation result map of the corresponding training image; comparing the obtained segmentation result map with the real label map and calculating a loss value; judging whether the loss value has converged, and if not, updating the model parameters by back propagation; when the loss value converges, obtaining the trained improved U-Net coding and decoding network.
As shown in fig. 2, the light-weight image segmentation method based on U-Net provided by the embodiment of the invention includes:
S101, acquiring a training data set, and cutting the images in the training data set and the real label map;
S102, constructing an improved U-Net coding and decoding network, and training the constructed improved U-Net coding and decoding network with the images in the cut training data set and the real label map;
S103, performing lightweight image segmentation with the trained improved U-Net coding and decoding network.
The cutting processing of the images and the real label map in the training data set provided by the embodiment of the invention includes:
The images and real label map in the training data set are cut into 256×256 tiles in a sliding-window manner with a step size of 256 pixels. To alleviate class imbalance in the data set, targets of under-represented classes are over-sampled during cutting, increasing their proportion. To enhance the generalization and robustness of the network through data enhancement, the cut images are rotated by 90, 180 and 270 degrees with the OpenCV library, and images within the same batch are enhanced online using the PyTorch deep learning framework.
The improved U-Net coding and decoding network provided by the embodiment of the invention comprises the following components:
The encoding end, which replaces the encoding part of the original U-Net network with an EfficientNetV2-S network so that EfficientNetV2-S serves as the U-Net encoder; the encoder outputs five feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 scales, and encoding is performed with a multi-scale convolution fusion module;
And the decoding end, which decodes the five feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 scales output by the encoder using a convolution structure re-parameterization method combined with a channel attention mechanism, obtaining the segmentation result.
The encoding with the multi-scale convolution fusion module provided by the embodiment of the invention includes:
a multi-scale convolution module and a downsampling module;
The multi-scale convolution module, which extracts features with convolutions of different kernel sizes and splices and fuses the feature maps obtained by the multi-scale convolutions;
and the downsampling module, which passes the fused feature information to the deep network.
The multi-scale convolution module provided by the embodiment of the invention consists of three parallel convolutions with kernel sizes of 3×3, 5×5 and 7×7, a concatenation of the three branch features, and one 3×3 convolution for fusion.
The downsampling module provided by the embodiment of the invention comprises the following components:
a max-pooling downsampling unit, which changes the feature map dimension through a 1×1 convolution;
a convolution downsampling unit, which downsamples with a 3×3 convolution of stride 2;
and a fusion unit, which fuses the outputs of the max-pooling downsampling unit and the convolution downsampling unit by element-wise summation, followed by a SiLU function for nonlinear activation.
Decoding the five feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 scales output by the encoder using the convolution structure re-parameterization method combined with the channel attention mechanism, as provided by the embodiment of the invention, includes:
taking the 1/32 feature map as the input of the decoding end, and splicing and fusing the 1/2, 1/4, 1/8 and 1/16 feature maps with the decoding end as skip-connection feature information;
expanding the feature map to twice its size by deconvolution, gradually restoring it to the original image resolution, and splicing it with the encoding features output by each layer of the encoding end; the output layer then maps the feature map to a specific number of classes for pixel-level class prediction, yielding the segmentation result.
The decoding end provided by the embodiment of the invention comprises:
RepVGG modules and RepVGG-SE modules;
RepVGG module for enhancing feature extraction capability and avoiding gradient vanishing problem;
RepVGG-SE modules for enhancing the useful features.
The RepVGG module provided by the embodiment of the invention comprises:
The information flow is constructed as y = b(x) + g(x) + f(x); if the dimensions of x and f(x) do not match, y = g(x) + f(x). Here b(x), g(x) and f(x) denote x passed through a same-layer identity branch containing a BN layer, a 1×1 convolution branch and a 3×3 convolution branch, respectively. At inference time, the trained model is converted: the convolution layer and BN layer of each branch are fused, and the multiple branch structures are merged by linear combination, so that the network branch architecture is equivalent to y = h(x), where h(x) is realized by a single 3×3 convolution layer followed by a ReLU activation layer.
The RepVGG-SE module provided by the embodiment of the invention comprises:
Based on RepVGG modules, the SE channel attention mechanism is incorporated in the identity branch structure.
Training the constructed improved U-Net coding and decoding network with the images in the cut training data set and the real label map, as provided by the embodiment of the invention, includes:
Using a training strategy combining warmup and cosine annealing, the images in the cut training data set are input into the constructed improved U-Net coding and decoding network to obtain a segmentation result map of the corresponding training image; the obtained segmentation result map is compared with the real label map and a loss value is calculated; whether the loss value has converged is judged, and if not, the model parameters are updated by back propagation; when the loss value converges, the trained improved U-Net coding and decoding network is obtained.
The loss function provided by the embodiment of the invention is as follows:
$$L_{all} = \lambda L_{SCE} + \mu L_{Dice}$$
where $L_{CE}$ denotes the cross entropy loss, $L_{SCE}$ the cross entropy loss with label smoothing, $L_{Dice}$ the Dice loss and $L_{all}$ the total loss function; N is the total number of samples and M the number of classes; $y_{ic}$ indicates the real class of sample i, taken as 1 if the real class of sample i is c and 0 otherwise; $y'_{ic}$ is the label-smoothed class of sample i; $p_{ic}$ is the predicted probability that sample i belongs to class c; $\varepsilon$ is the smoothing hyper-parameter; $|X \cap Y|$ is the intersection of sets X and Y, and $|X|$ and $|Y|$ are the numbers of elements of X and Y. For the segmentation task, X denotes the Ground Truth segmented image, Y the predicted segmented image, and λ and μ are weighting coefficients.
The present invention also provides a lightweight image segmentation system comprising: a training data set image and real label map cutting processing module, used for acquiring a training data set and cutting the images in the training data set and the real label map;
an improved U-Net coding and decoding network training module, used for constructing an improved U-Net coding and decoding network and training the constructed improved U-Net coding and decoding network with the images in the cut training data set and the real label map;
And the lightweight image segmentation module is used for carrying out lightweight image segmentation by utilizing the trained improved U-Net coding and decoding network.
2. Application example. To demonstrate the inventive step and technical value of the solution of the present invention, this section provides an application example of the claimed solution in a specific product or related technology.
The invention discloses a lightweight image segmentation method based on U-Net, which constructs an improved U-Net coding and decoding network based on deep learning; the network takes images as input and outputs the segmentation result map of the corresponding image. Applied to remote sensing image segmentation, the method comprises the following steps:
s1, preparing an AI classification and recognition competition of a training dataset 'CCF satellite image', cutting a dataset image and a real label image into 256-256 sizes in a sliding window mode by taking 256 pixels as step sizes, performing oversampling cutting on targets with less categories in the data and adopting a data enhancement mode, specifically, performing 90-degree, 180-degree and 270-degree turnover on the cut images through an OpenCV library, and performing online enhancement by using a PyTorch deep learning frame to enable images of the same batch to be randomly and horizontally turned, as shown in FIG. 11;
S2, construct the network based on U-Net, which, as shown in fig. 3, consists of an encoding part and a decoding part. In the encoding network, two 3×3 convolution layers extract features before each downsampling, each convolution is followed by a ReLU activation function, and a 2×2 max pooling operation reduces the feature dimension and enlarges the receptive field. With each downsampling, the image size is halved and the dimension doubled; repeating this operation fully extracts the high-level features of the image and filters unnecessary information. The decoding part up-samples with deconvolution and then gradually restores the detail information of the image with two 3×3 convolution layers, finally restoring the feature map to the size of the input picture. With each upsampling, the image size is doubled and the dimension halved. Skip connections between corresponding stages of the encoding and decoding parts reuse low-level feature information and better restore image detail information.
The improved U-Net network model, as shown in fig. 4, is divided overall into encoding and decoding. The encoding part is replaced by an EfficientNetV2-S network composed of Fused-MBConv and MBConv modules. The encoder has five outputs, namely feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 scales; the 1/32 feature map serves as the input of the decoding end, and the other four feature maps are spliced and fused with the decoding end as skip-connection encoding features. The decoding part expands the feature map to twice its size by deconvolution, gradually restores it to the original resolution, splices it with the four encoding features of the encoding part to fuse more feature and detail information, and finally maps the feature map to a specific number of classes through the output layer for pixel-level class prediction, obtaining the segmentation result. The coding network is deepened by introducing RepVGG-SE modules. The RepVGG-SE module is a multi-branch structure: the multiple branches are kept during training, and compared with a single-path architecture the branch structure further fuses detail and semantic information, the extracted features are more expressive and training performance improves; at inference time, the convolution structure is converted by re-parameterization into a 3×3 convolution plus SEBlock, improving inference speed and saving memory. The SE channel attention mechanism in the identity branch structure can establish dependencies along the channel dimension, enhancing the expression of useful information and suppressing invalid features. When the input and output dimensions do not match, the RepVGG module has no identity branch, only the 1×1 and 3×3 convolution branches. The multi-scale convolution module extracts features with convolutions of different kernel sizes and then splices and fuses the resulting feature maps, and the downsampling module injects shallow information into the deep network, so that multi-scale feature information is better fused with each layer of the network while detail information lost in the deep network is compensated.
The encoding part replaces the encoding part of the original U-Net network with an EfficientNetV2-S network. The encoder has five outputs, namely feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 scales; the 1/32 feature map serves as the input of the decoding end, and the other four feature maps are spliced and fused with the decoding end as skip-connection encoding features;
In the coding network, the multi-scale convolution module extracts features with convolutions of different kernel sizes, as shown in fig. 8, then splices and fuses the feature maps obtained by the multi-scale convolutions, and injects shallow information into the deep network through the downsampling module, as shown in fig. 9.
In fig. 8, the multi-scale convolution module consists of three parallel convolutions with kernel sizes of 3×3, 5×5 and 7×7, a concatenation of the three branch features, and one 3×3 convolution for fusion.
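A minimal PyTorch sketch of this module follows; the per-branch channel counts are an assumption, since the patent only fixes the kernel sizes:

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Three parallel branches with 3x3, 5x5 and 7x7 kernels ("same" padding).
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.branch7 = nn.Conv2d(channels, channels, 7, padding=3)
        # One 3x3 convolution fuses the concatenated branch features.
        self.fuse = nn.Conv2d(3 * channels, channels, 3, padding=1)

    def forward(self, x):
        feats = torch.cat([self.branch3(x), self.branch5(x), self.branch7(x)], dim=1)
        return self.fuse(feats)
```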
In fig. 9, the downsampling module comprises two branches: one branch performs max-pooling downsampling and changes the feature map dimension through a 1×1 convolution, the other downsamples with a 3×3 convolution of stride 2; finally the two branches are fused by element-wise summation and passed through a SiLU function for nonlinear activation.
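A minimal PyTorch sketch of this two-branch downsampling module, assuming a 2×2 pooling window so that both branches halve the spatial size:

```python
import torch.nn as nn

class DownsampleModule(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Branch 1: max-pooling downsampling, then 1x1 conv to change dimension.
        self.pool_branch = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
        )
        # Branch 2: 3x3 convolution with stride 2.
        self.conv_branch = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.act = nn.SiLU()  # nonlinear activation after element-wise summation

    def forward(self, x):
        return self.act(self.pool_branch(x) + self.conv_branch(x))
```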
The decoding part expands the feature map to twice its size by deconvolution, gradually restores it to the original resolution, splices it with the encoding features output by each layer of the encoding part, and finally maps the feature map to a specific number of classes through the output layer for pixel-level class prediction, obtaining the segmentation result;
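A minimal PyTorch sketch of one such decoding step; the channel counts are illustrative, and the refinement convolution stands in for the RepVGG-style blocks described below:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        # Deconvolution doubles the spatial size of the feature map.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        # Refine the concatenated features (placeholder for a RepVGG-style block).
        self.refine = nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x, skip):
        x = self.up(x)                   # 2x upsampling by deconvolution
        x = torch.cat([x, skip], dim=1)  # splice with the skip-connection feature
        return self.refine(x)
```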
The coding network is deepened by introducing RepVGG modules and RepVGG-SE modules, as shown in FIG. 5 and FIG. 7 (the latter combining an SE channel attention mechanism in the identity branch structure on the basis of the RepVGG module).
In fig. 5, the information flow is constructed as y = b(x) + g(x) + f(x); if the dimensions of x and f(x) do not match, y = g(x) + f(x), where b(x), g(x) and f(x) denote x passed through an identity branch containing a BN layer, a 1×1 convolution branch and a 3×3 convolution branch, respectively. At inference time, the trained model is converted: the convolution layer and BN layer of each branch are fused, and the multiple branch structures are merged by linear combination. A 1×1 convolution corresponds to a degenerate 3×3 convolution, so zero-padding the 1×1 kernel yields a 3×3 kernel; the identity branch is a special 1×1 convolution whose kernel is the identity matrix. After this transformation the RepVGG module contains one 3×3 convolution kernel, two 1×1 convolution kernels and three bias parameters. The three biases are combined into one directly by addition, and a new convolution kernel is obtained by adding the 1×1 kernel parameters to the center point of the 3×3 kernel, so that all branch features and final offsets are assigned to one new biased 3×3 convolution. The network branch architecture is thus equivalent to y = h(x), where h(x) is only the final biased 3×3 convolution, followed by a ReLU activation layer.
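A minimal sketch of this fusion arithmetic, assuming bias-free convolutions followed by BatchNorm2d layers as in the training-time block; the identity branch is passed in already expressed as an equivalent fused 3×3 kernel and bias:

```python
import torch.nn.functional as F

def fuse_conv_bn(weight, bn):
    """Fold a BatchNorm2d layer into the preceding bias-free convolution."""
    std = (bn.running_var + bn.eps).sqrt()
    scale = (bn.weight / std).reshape(-1, 1, 1, 1)
    return weight * scale, bn.bias - bn.running_mean * bn.weight / std

def merge_branches(w3, b3, w1, b1, w_id=None, b_id=None):
    """Combine the fused 3x3, 1x1 and identity branches into one 3x3 kernel."""
    # Zero-pad the 1x1 kernels so they sit at the center of a 3x3 kernel.
    w = w3 + F.pad(w1, [1, 1, 1, 1])
    b = b3 + b1
    # Identity branch (already expressed as a fused 3x3 identity-matrix kernel),
    # present only when the input and output dimensions match.
    if w_id is not None:
        w, b = w + w_id, b + b_id
    return w, b
```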
The two modules are multi-branch structures: the branch structure is kept during training and, at inference time, converted by re-parameterization into a 3×3 convolution plus SEBlock. The SE channel attention mechanism in the identity branch structure, as shown in fig. 6, comprises: an SEBlock divided into the Squeeze and Excitation stages. First, a global average pooling operation on the feature map yields the globally compressed feature of the current feature map; the result then passes through two fully connected layers, the first reducing the dimension and the second restoring it to the original number of channels, the purpose being to add more nonlinear processing and fit the complex correlations between channels; finally a Sigmoid layer generates per-channel attention weights between 0 and 1, which are multiplied channel-wise with the original feature map, enhancing the most informative features and suppressing useless ones.
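A minimal PyTorch sketch of such an SEBlock, with an assumed channel-reduction ratio r = 16:

```python
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)      # global average pooling (Squeeze)
        self.excite = nn.Sequential(                # Excitation: two FC layers
            nn.Linear(channels, channels // r),     # dimension reduction
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),     # restore to original channel count
            nn.Sigmoid(),                           # per-channel weights in (0, 1)
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(n, c)).view(n, c, 1, 1)
        return x * w                                # reweight the original feature map
```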
The invention can thus establish dependencies along the channel dimension, enhancing the expression of useful information and suppressing invalid features. When the input and output dimensions do not match, the RepVGG module has no identity branch, only the 1×1 and 3×3 convolution branches.
S3, inputting the cut pictures into the segmentation model;
S4, the model outputs a segmentation result map of the current picture;
S5, comparing the segmentation result map with the real label image and calculating a loss value;
The loss function is built from the cross entropy loss function with label smoothing and the Dice loss function, alleviating overfitting and class imbalance; the two are then weighted together into the final loss function:
$$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\,\log p_{ic}$$

$$y'_{ic} = (1-\varepsilon)\,y_{ic} + \frac{\varepsilon}{M}, \qquad L_{SCE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y'_{ic}\,\log p_{ic}$$

$$L_{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$

$$L_{all} = \lambda L_{SCE} + \mu L_{Dice}$$

where $L_{CE}$ is the cross entropy loss, $L_{SCE}$ the cross entropy loss with label smoothing, $L_{Dice}$ the Dice loss and $L_{all}$ the total loss function; N represents the total number of samples and M the number of classes; $y_{ic}$ represents the real class of sample i, taken as 1 if the real class of sample i is c and 0 otherwise; $y'_{ic}$ represents the label-smoothed class of sample i; $p_{ic}$ represents the predicted probability that sample i belongs to class c; $\varepsilon$ represents the smoothing hyper-parameter, taken as 0.1; $|X \cap Y|$ represents the intersection of sets X and Y, and $|X|$ and $|Y|$ the numbers of their elements; for the segmentation task, X represents the Ground Truth segmented image and Y the predicted segmented image; λ and μ are weighting coefficients, both taken as 0.5 here.
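A minimal PyTorch sketch of this weighted loss, using the built-in label-smoothing option of cross entropy (available in recent PyTorch, 1.10+) and a soft Dice term; the small constant in the Dice denominator is an added numerical-stability assumption:

```python
import torch.nn.functional as F

def total_loss(logits, target, num_classes, eps=0.1, lam=0.5, mu=0.5):
    # Label-smoothed cross entropy with epsilon = 0.1.
    l_sce = F.cross_entropy(logits, target, label_smoothing=eps)
    # Soft Dice loss, averaged over classes.
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    union = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    l_dice = 1.0 - (2.0 * inter / (union + 1e-6)).mean()
    # Weighted combination with lambda = mu = 0.5.
    return lam * l_sce + mu * l_dice
```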
And S6, updating the model parameters by back propagation when the loss value has not converged; when the loss value converges, the segmentation model is trained.
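A minimal sketch of this training loop, reusing the total_loss sketch above; model, loader, optimizer and scheduler are assumed to be constructed elsewhere:

```python
def train(model, loader, optimizer, scheduler, num_classes, epochs=100):
    model.train()
    for epoch in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            logits = model(images)                  # segmentation result map
            loss = total_loss(logits, labels, num_classes)
            loss.backward()                         # update parameters by back propagation
            optimizer.step()
        scheduler.step()                            # warmup + cosine annealing schedule
```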
The model training strategy uses a combination of warmup and cosine annealing as shown in figure 10.
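A minimal sketch of such a schedule; the warmup length, total epoch count and the linear ramp are illustrative assumptions, since the patent only shows the learning-rate curve of fig. 10:

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def warmup_cosine(optimizer, warmup_epochs=5, total_epochs=100):
    def factor(epoch):
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs               # linear warmup
        progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))    # cosine annealing
    return LambdaLR(optimizer, lr_lambda=factor)
```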
3. Evidence of the effect of the embodiments. The embodiments of the invention show clear advantages in research, development and use, as described below in combination with data and charts from the testing process.
Table 1. Comparison of various indices of the model results
Table 2. Model lightweight comparison
In the above implementation, the model was trained and tested on an NVIDIA GeForce 2070S (8 GB) graphics card. In FIG. 12, from left to right, are the remote sensing image, the real label image, and the segmentation result images of the FCN-8s, SegNet, DeeplabV3+, U-Net, UNet++, U2-Net and Attention U-Net models, followed by the segmentation result of the invention. Combining FIG. 12 with Table 1, the evaluation indices of the invention are the highest among all compared methods; mis-segmentation and missed segmentation of each target are better remedied; targets of different scales are better identified and completely segmented, closer to the label image; and the segmentation details are finer than those of the other networks. As Table 2 shows, the parameters and computation of the method are greatly reduced compared with the original U-Net network: the parameter count drops by 9.64M, the computation by nearly 88%, and training and testing time are also greatly reduced, by more than 50%; among all the compared methods, the invention is also advanced. The results of Table 2 combined with FIG. 12 show that the method of the invention achieves higher segmentation performance while being lighter. The effect of the method provided by the invention is evident.
It should be noted that the embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or special purpose design hardware. Those of ordinary skill in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such as provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The device of the present invention and its modules may be implemented by hardware circuitry, such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, etc., or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., as well as software executed by various types of processors, or by a combination of the above hardware circuitry and software, such as firmware.
The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of the invention is not limited thereto, but any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention will be apparent to those skilled in the art within the scope of the present invention.

Claims (8)

1. A lightweight image segmentation method, characterized in that the lightweight image segmentation method comprises:
Step one, acquiring a training data set, and cutting the images in the training data set and the real label map;
Step two, constructing an improved U-Net coding and decoding network, and training the constructed improved U-Net coding and decoding network with the images in the cut training data set and the real label map;
Step three, performing lightweight image segmentation with the trained improved U-Net coding and decoding network;
the cutting processing of the images and the real label map in the training data set comprises:
cutting the images and the real label map in the training data set into 256×256 sizes;
The improved U-Net codec network comprises:
an encoding end, which replaces the encoding part of the original U-Net network with an EfficientNetV2-S network so that EfficientNetV2-S serves as the U-Net encoder, the encoder outputting five feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 scales, and encoding being performed with a multi-scale convolution fusion module;
a decoding end, which decodes the five feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 scales output by the encoder using a convolution structure re-parameterization method combined with a channel attention mechanism, to obtain a segmentation result;
wherein decoding the five feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 scales output by the encoder using the convolution structure re-parameterization method combined with the channel attention mechanism includes:
taking the 1/32 feature map as the input of the decoding end, and splicing and fusing the 1/2, 1/4, 1/8 and 1/16 feature maps with the decoding end as skip-connection feature information;
expanding the feature map to twice its size by deconvolution, gradually restoring it to the original image resolution, and splicing it with the encoding features output by each layer of the encoding end; mapping the feature map to a specific number of classes with the output layer for pixel-level class prediction, to obtain the segmentation result;
the decoding end comprises:
RepVGG modules, RepVGG-SE modules, multi-scale convolution modules and downsampling modules;
RepVGG module for enhancing feature extraction capability and avoiding gradient disappearance;
RepVGG-SE modules for enhancing the useful features;
The RepVGG module includes:
constructing an information flow as y = b(x) + g(x) + f(x); if the dimensions of x and f(x) do not match, y = g(x) + f(x), wherein b(x), g(x) and f(x) denote x passed through a same-layer identity branch containing a BN layer, a 1×1 convolution branch and a 3×3 convolution branch, respectively; at inference time, converting from the trained model by fusing the convolution layer and BN layer of each branch and merging the multiple branch structures by linear combination, so that the network branch architecture is equivalent to y = h(x), where h(x) is realized by a single 3×3 convolution layer followed by ReLU activation;
The RepVGG-SE module includes:
Based on RepVGG modules, the SE channel attention mechanism is incorporated in the identity branch structure.
2. The method of lightweight image segmentation as set forth in claim 1, wherein the encoding with a multi-scale convolution fusion module comprises:
a multi-scale convolution module and a downsampling module;
The multi-scale convolution module is used for extracting features by convolutions with different convolution kernel sizes and splicing and fusing feature graphs obtained by the multi-scale convolutions;
the downsampling module is used for transmitting the fused characteristic information to a deep network;
the multi-scale convolution module consists of three parallel convolutions with kernel sizes of 3×3, 5×5 and 7×7, a concatenation of the three branch features, and one 3×3 convolution for fusion;
the downsampling module includes:
a max-pooling downsampling unit for changing the feature map dimension through a 1×1 convolution;
a convolution downsampling unit for downsampling with a 3×3 convolution of stride 2;
and a fusion unit for fusing the max-pooling downsampling unit and the convolution downsampling unit by element-wise summation, followed by a SiLU function for nonlinear activation.
3. The lightweight image segmentation method as set forth in claim 1, wherein training the constructed improved U-Net coding and decoding network using the images in the cut training dataset and the real label map comprises: inputting the images in the cut training dataset into the constructed improved U-Net coding and decoding network using a training strategy combining warmup and cosine annealing, to obtain a segmentation result map of the corresponding training image; comparing the obtained segmentation result map with the real label map and calculating a loss value; judging whether the loss value has converged, and if not, updating the model parameters by back propagation; when the loss value converges, obtaining the trained improved U-Net coding and decoding network;
the loss function is as follows:
$$L_{all} = \lambda L_{SCE} + \mu L_{Dice}$$
wherein $L_{CE}$ represents the cross entropy loss, $L_{SCE}$ the cross entropy loss with label smoothing, $L_{Dice}$ the Dice loss and $L_{all}$ the total loss function; N represents the total number of samples and M the number of classes; $y_{ic}$ represents the real class of sample i, taken as 1 if the real class of sample i is c and 0 otherwise; $y'_{ic}$ represents the label-smoothed class of sample i; $p_{ic}$ represents the predicted probability that sample i belongs to class c; $\varepsilon$ represents the smoothing hyper-parameter; $|X \cap Y|$ represents the intersection of sets X and Y, and $|X|$ and $|Y|$ the numbers of elements of X and Y; for the segmentation task, X represents the Ground Truth segmented image, Y the predicted segmented image, and λ and μ represent weighting coefficients.
4. The lightweight image segmentation method as set forth in claim 1, further comprising: cutting the training data with a step size of 256 and performing data enhancement; inputting the images in the cut training data set into the constructed improved U-Net coding and decoding network to obtain a segmentation result map of the corresponding training image; comparing the obtained segmentation result map with the real label map and calculating a loss value; judging whether the loss value has converged, and if not, updating the model parameters by back propagation; when the loss value converges, obtaining the trained improved U-Net coding and decoding network; and inferring the images to be segmented with the trained improved U-Net coding and decoding network, thereby obtaining the segmentation result map of the corresponding images.
5. A lightweight image segmentation system that implements the lightweight image segmentation method as set forth in claim 1, comprising:
a training data set image and real label map cutting processing module, for acquiring a training data set and cutting the images in the training data set and the real label map;
an improved U-Net coding and decoding network training module, for constructing an improved U-Net coding and decoding network and training the constructed improved U-Net coding and decoding network with the images in the cut training data set and the real label map;
And the lightweight image segmentation module is used for carrying out lightweight image segmentation by utilizing the trained improved U-Net coding and decoding network.
6. An information data processing terminal for implementing the lightweight image segmentation method as set forth in any one of claims 1-4.
7. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the lightweight image segmentation method as set forth in any one of claims 1-4.
8. Use of the lightweight image segmentation method as set forth in any one of claims 1-4 for scene-object segmentation, autonomous driving, remote sensing image analysis, and image segmentation in the field of medical image analysis.
CN202210208581.2A 2022-03-03 2022-03-03 Lightweight image segmentation method, system, medium, terminal and application Active CN114612477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210208581.2A CN114612477B (en) 2022-03-03 2022-03-03 Lightweight image segmentation method, system, medium, terminal and application

Publications (2)

Publication Number Publication Date
CN114612477A (en) 2022-06-10
CN114612477B (en) 2024-07-05

Family

ID=81860997

Country Status (1)

Country Link
CN (1) CN114612477B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024092590A1 (en) * 2022-11-03 2024-05-10 华为技术有限公司 Image processing method and apparatus, model training method and apparatus, and terminal device
CN115829962B (en) * 2022-11-25 2024-04-16 江南大学 Medical image segmentation device, training method, and medical image segmentation method
CN116596920B (en) * 2023-07-12 2023-11-07 国网江西省电力有限公司电力科学研究院 Real-time zero measurement method and system for long-string porcelain insulator unmanned aerial vehicle
CN117574789B (en) * 2024-01-17 2024-05-10 广东工业大学 Method for improving depth measurement range of phase contrast optical coherence tomography
CN117975173B (en) * 2024-04-02 2024-06-21 华侨大学 Child evil dictionary picture identification method and device based on light-weight visual converter

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951235A (en) * 2020-07-31 2020-11-17 湘潭大学 Skin image processing method based on deep learning
CN112183258A (en) * 2020-09-16 2021-01-05 太原理工大学 Remote sensing image road segmentation method based on context information and attention mechanism
CN112927240A (en) * 2021-03-08 2021-06-08 重庆邮电大学 CT image segmentation method based on improved AU-Net network
CN113850825A (en) * 2021-09-27 2021-12-28 太原理工大学 Remote sensing image road segmentation method based on context information and multi-scale feature fusion
CN113888550A (en) * 2021-09-27 2022-01-04 太原理工大学 Remote sensing image road segmentation method combining super-resolution and attention mechanism
CN113989271A (en) * 2021-11-25 2022-01-28 江苏科技大学 Paint image segmentation system and method based on double-attention mechanism and U-net network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113299374B (en) * 2021-06-03 2023-08-29 广东财经大学 Thyroid nodule ultrasonic image automatic segmentation system based on deep learning
CN113850821A (en) * 2021-09-17 2021-12-28 武汉兰丁智能医学股份有限公司 Attention mechanism and multi-scale fusion leukocyte segmentation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Recurrent Residual U-Net with EfficientNet Encoder for Medical Image Segmentation; Nahian Siddique et al.; SPIE; 2021-12-31; pp. 1-9 *
Semantic Segmentation of Remote Sensing Images Combining UNET and FPN; Wang Xi et al.; Chinese Journal of Liquid Crystals and Displays; 2021-03-31; pp. 475-483 *
High-Resolution Remote Sensing Satellite Image Segmentation Based on W-Net; Fan Zizhu et al.; Journal of South China University of Technology (Natural Science Edition); 2020-12-31; pp. 114-124 *

Similar Documents

Publication Publication Date Title
CN114612477B (en) Lightweight image segmentation method, system, medium, terminal and application
Liu et al. Super-resolution-based change detection network with stacked attention module for images with different resolutions
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
Deng et al. Attention-gate-based encoder–decoder network for automatical building extraction
Hazirbas et al. Fusenet: Incorporating depth into semantic segmentation via fusion-based cnn architecture
CN111402130B (en) Data processing method and data processing device
CN114255361A (en) Neural network model training method, image processing method and device
CN113191489B (en) Training method of binary neural network model, image processing method and device
Cui et al. MTSCD-Net: A network based on multi-task learning for semantic change detection of bitemporal remote sensing images
CN115908772A (en) Target detection method and system based on Transformer and fusion attention mechanism
CN114241274A (en) Small target detection method based on super-resolution multi-scale feature fusion
Kang et al. ASF-YOLO: A novel YOLO model with attentional scale sequence fusion for cell instance segmentation
Han et al. L-Net: lightweight and fast object detector-based ShuffleNetV2
Xia et al. Pedestrian detection algorithm based on multi-scale feature extraction and attention feature fusion
Kang et al. YOLO-FA: Type-1 fuzzy attention based YOLO detector for vehicle detection
CN113066018A (en) Image enhancement method and related device
Zhang et al. Dense haze removal based on dynamic collaborative inference learning for remote sensing images
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN115577768A (en) Semi-supervised model training method and device
Sun et al. Two-stage deep regression enhanced depth estimation from a single RGB image
Kan et al. A GAN-based input-size flexibility model for single image dehazing
Cen et al. An improved ship classification method based on YOLOv7 model with attention mechanism
Wang et al. [Retracted] A Novel Attention‐Based Lightweight Network for Multiscale Object Detection in Underwater Images
Fan et al. EGFNet: Efficient guided feature fusion network for skin cancer lesion segmentation
CN112990041B (en) Remote sensing image building extraction method based on improved U-net

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant