CN117132774A - Multi-scale polyp segmentation method and system based on PVT - Google Patents

Multi-scale polyp segmentation method and system based on PVT

Info

Publication number
CN117132774A
Authority
CN
China
Prior art keywords
map
prediction
convolution
graph
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311097260.0A
Other languages
Chinese (zh)
Other versions
CN117132774B (en)
Inventor
Zhang Zhaohui
Yang Chaorong
He Zheyuan
Wang Wei
Liu Chenguang
Huang Lina
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Normal University
Original Assignee
Hebei Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Normal University filed Critical Hebei Normal University
Priority to CN202311097260.0A priority Critical patent/CN117132774B/en
Publication of CN117132774A publication Critical patent/CN117132774A/en
Application granted granted Critical
Publication of CN117132774B publication Critical patent/CN117132774B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a PVT-based multi-scale polyp segmentation method and system, relating to the technical field of deep learning medical image semantic segmentation. The application comprises the following steps: obtaining a colorectal endoscopy image to be detected and performing image preprocessing; performing multi-scale feature extraction on the preprocessed image with a PVTv2 backbone network; fusing the original feature maps of different scales generated by the PVT step by step with a parallel Sobel edge decoder to obtain a global prediction map; performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated-convolution attention module; using the global prediction map to guide, stage by stage, the gradual generation of a multi-stage prediction map; and comparing the global prediction map and the multi-stage prediction maps with the truth map to obtain the prediction loss, the final-stage prediction map being the final polyp segmentation prediction map. The method can accurately identify and segment polyps in colorectal endoscopy images, providing effective help for doctors to make correct diagnoses.

Description

Multi-scale polyp segmentation method and system based on PVT
Technical Field
The application belongs to the technical field of deep learning medical image semantic segmentation, and particularly relates to a PVT-based multi-scale polyp segmentation method and system.
Background
Colorectal cancer is a common malignancy, and its early detection and treatment are of great importance for improving patient survival. Because colorectal cancer shows no typical symptoms in the early stages of its development, screening for colorectal cancer has become increasingly important, and one of the screening means is colorectal endoscopy. In colorectal endoscopy images, polyps are similar in color to the surrounding normal tissue, variable in shape and different in size; small polyps may even adhere to one another, and polyp boundaries are often ambiguous. This poses numerous challenges for polyp segmentation of colorectal endoscopy imaging results. Traditional medical image segmentation methods, such as threshold segmentation and region growing, often require manual labeling assisted by a professional doctor; affected by illumination conditions, doctor experience, subjective factors and the like, the segmentation is time-consuming and labor-intensive and exhibits large errors and instability. How to achieve automatic segmentation of colorectal endoscopy images, and thus obtain polyp segmentation results with clearer boundaries more efficiently, has become one of the hot spots of medical image segmentation.
In recent years, deep learning techniques have been widely applied to medical image segmentation, and polyp segmentation methods based on convolutional neural networks have been widely used for colorectal endoscopy. Convolutional neural networks for polyp segmentation have two main typical architectures: the U-shaped structure based on U-Net, and the PraNet architecture. U-Net uses an encoder-decoder architecture that combines low-level and high-level features via skip connections to effectively preserve spatially local information, but it is susceptible to noise and occlusion. PraNet first uses a Parallel Partial Decoder (PPD) to aggregate high-level features and generate a global map that roughly locates polyps, then uses a Reverse Attention (RA) module to progressively refine regions and boundaries; however, owing to the limitations of convolutional neural networks themselves, the segmentation accuracy and robustness of the model remain problematic. There is therefore a need to improve existing models so as to raise polyp segmentation performance on colorectal endoscopy imaging results.
Recently, the successful use of Transformers in the field of Natural Language Processing (NLP) inspired computer vision researchers, leading to applications and developments of Transformers in computer vision research tasks. Since Transformer-based networks are good at capturing long-range dependencies of image objects through global self-attention, applying Transformers to the polyp segmentation task can be considered, namely: in a polyp segmentation task, a Transformer is used to learn the dependency relationships between different regions in a colorectal endoscopy image, and this information is exploited to improve the segmentation performance and robustness of the model; in addition, advanced optimization algorithms can accelerate the training process of the model and improve its convergence rate. By applying these techniques, the segmentation of polyps in colorectal endoscopy images can be further improved, providing more accurate and reliable diagnostic results for clinicians.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides a PVT-based multi-scale polyp segmentation method and system, which effectively solve the problem that the prior art cannot accurately identify polyp regions, further improve the accuracy and semantic integrity of polyp segmentation boundaries in colorectal endoscopy images, and realize accurate, rapid and automatic polyp segmentation.
In order to achieve the above object, the present application provides the following solutions:
A PVT-based multi-scale polyp segmentation method, comprising the following steps:
S1, acquiring a colorectal endoscopy image to be detected, and preprocessing the image;
S2, performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
S3, fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
S4, performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated-convolution attention module;
S5, using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, and gradually generating a multi-stage prediction map;
S6, comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map.
Preferably, in the step S1, the method for preprocessing the colorectal endoscopy image comprises:
enhancing the colorectal endoscopy image data with random rotation, vertical flipping, horizontal flipping and normalization, finally uniformly cropping the image to 352×352, and scaling it with a {0.75, 1, 1.25} multi-scale strategy.
Preferably, in the step S2, the method for performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with a PVTv2 backbone network comprises:
judging whether the preprocessed colorectal endoscopy image input into the PVTv2 backbone network is a 3-channel image; if so, it is fed directly into the network for feature extraction; if not, a 1×1 convolution is used once to adjust the number of image channels to 3;
four-stage feature extraction is performed using a pre-trained PVTv2-B2 model.
Preferably, in the step S3, the method for fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map comprises:
S31: in the first branch, compressing the feature map channels with a 1×1 convolution;
S32: in the second branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×3 and 3×1 asymmetric convolutions and a 3×3 convolution with dilation rate 3;
S33: in the third branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×5 and 5×1 asymmetric convolutions and a 3×3 convolution with dilation rate 5;
S34: in the fourth branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×7 and 7×1 asymmetric convolutions and a 3×3 convolution with dilation rate 7;
S35: concatenating the compressed feature map of the first branch with the feature-extracted feature maps of the second, third and fourth branches along the channel dimension, then compressing the feature map channels with a 1×1 convolution;
S36: adding the compressed concatenated feature map pixel by pixel to the original feature map whose channels were compressed by a 1×1 convolution, passing the result through a ReLU nonlinear activation function, and then feeding it into the Sobel operation;
S37: adding the feature maps gradient-sharpened by the Sobel operator pixel by pixel, applying a 1×1 convolution, and using a bilinear-interpolation upsampling operation to generate the initial polyp segmentation global prediction map.
Preferably, in the step S4, the method for performing multi-receptive-field feature extraction on the original feature maps with the multi-scale parallel dilated-convolution attention module comprises:
S41: channel-compressing the original feature maps of the four levels of the PVT encoder with one 1×1 convolution, obtaining multi-channel feature maps whose channel count is a fixed fraction of the original channel count;
S42: uniformly grouping the channels of the channel-compressed feature map and feeding the groups into four branches for processing, namely: performing feature extraction on the corresponding branches with 3×3 convolutions with dilation rates of 1, 3, 5 and 7, and then concatenating the processing results of the four branches;
S43: applying a 1×1 convolution to the channel-concatenated feature map, then sequentially applying batch normalization (BN) and the nonlinear ReLU activation function to obtain the processed feature map;
S44: feeding the processed feature map into a CBAM module for further attention weighting, obtaining a more discriminative feature map.
Preferably, in the step S5, the method for using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction and gradually generate the multi-stage prediction map comprises:
S51: spatially downsampling the global prediction map so that its resolution matches that of the stage-four PVT feature map; feeding it into the RA module for a reverse-attention operation to generate an attention map; multiplying this map element by element with the stage-four PVT feature map, reducing the feature dimension with three 3×3 convolutions, and adding the result pixel by pixel to the prediction map of the previous stage to generate the prediction map of the current stage;
S52: sending the prediction map of the current stage to the next stage and performing the same operation as in S51, guiding the generation of the final-stage feature map.
Preferably, in the step S6, the method for comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map, comprises:
applying a bilinear-interpolation spatial upsampling operation to the global prediction map and the multi-stage prediction maps, resizing all prediction maps to the size of the truth map corresponding to the input image, and calculating the mixed loss of the weighted BCE and the weighted IOU;
the weighted BCE loss is defined as:
$L_{wBCE} = -\dfrac{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y)\log P(x,y) + (1-G(x,y))\log(1-P(x,y))\,]}{\sum_{(x,y)} \omega(x,y)}$
where G denotes the truth map, P the prediction map, and (x, y) any pixel position in the image; the corresponding weighting coefficient $\omega(x,y) = 1 + \gamma\,\lvert \mathrm{AvgPool}(G)(x,y) - G(x,y) \rvert$ represents the importance of pixel (x, y), with $\gamma$ set to 5;
the weighted IOU loss is defined as:
$L_{wIOU} = 1 - \dfrac{\sum_{(x,y)} \omega(x,y)\,G(x,y)\,P(x,y)}{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y) + P(x,y) - G(x,y)\,P(x,y)\,]}$
combining the weighted BCE loss and the weighted IOU loss, the mixed loss of the prediction map relative to the truth map is:
$L_{seg} = L_{wBCE} + L_{wIOU}$
the application also provides a PVT-based multi-scale polyp segmentation system, which comprises: the device comprises a preprocessing module, a first feature extraction module, a fusion module, a second feature extraction module, a guiding module and a prediction module;
the pretreatment module is used for acquiring a colorectal mirror image to be detected and carrying out pretreatment on the colorectal mirror image;
the first feature extraction module is used for carrying out multi-scale feature extraction on the preprocessed colorectal mirror image by using a PVTv2 backbone network to obtain original feature images with different scales;
the fusion module is used for carrying out step-by-step fusion on the original feature images by using a parallel Sobel edge decoder to obtain a global prediction image;
the second feature extraction module is used for extracting features of multiple receptive fields from the original feature map by using a multi-scale parallel cavity convolution attention module;
the guiding module is used for gradually guiding the original feature map after the multi-receptive field feature extraction by using the global prediction map, and gradually generating a multi-stage prediction map;
the prediction module is used for comparing the global prediction graph and the multi-stage prediction graph with a truth value graph, calculating loss, and obtaining a final stage of prediction graph which is a final polyp segmentation prediction graph.
Compared with the prior art, the application has the beneficial effects that:
1. The application adopts PVTv2 as the backbone network in place of the ResNet backbone in PraNet, giving the network better global feature extraction capability.
2. The application provides a parallel Sobel edge decoder that fuses feature maps at different scales, improving the segmentation of polyps of different sizes.
3. The application also provides a multi-scale parallel dilated-convolution attention module, which performs feature extraction on feature maps of different scales under multiple receptive fields and uses CBAM to re-weight the feature maps and extract the regions of interest.
4. The application also trains the model with a new loss function that combines weighted BCE loss and weighted IOU loss, mitigating the influence of the imbalanced distribution of positive and negative samples and further improving the segmentation accuracy and robustness of the model.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the embodiments are briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an implementation of a PVT-based multi-scale polyp segmentation method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a PraNet network model structure;
fig. 3 is a schematic diagram of a network model structure of a PVT-based multi-scale polyp segmentation method constructed in an embodiment of the present application;
FIG. 4 is a block diagram of a parallel Sobel edge decoder according to the present application;
FIG. 5 is a schematic diagram of the RFB operation in the parallel Sobel edge decoder module of the present application;
FIG. 6 is a schematic diagram of the multi-scale parallel dilated-convolution attention module of the present application;
fig. 7 is a visual comparison of experimental results of the PVT-based multi-scale polyp segmentation method in an embodiment.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description.
Example 1
Fig. 1 is a schematic flow chart of an implementation of the PVT-based multi-scale polyp segmentation method according to an embodiment of the present application. As shown in fig. 1, the PVT-based multi-scale polyp segmentation method comprises the following steps:
Step S1, obtaining colorectal endoscopy images to be detected from the Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS and CVC-T datasets, and preprocessing the images.
In practice, the preprocessing may include random rotation, vertical flipping, horizontal flipping, normalization and similar processing of the colorectal endoscopy image to be detected, turning it into an image that meets the detection requirements. The image is then uniformly resized to 352 rows × 352 columns and scaled with a {0.75, 1, 1.25} multi-scale strategy. These preprocessing techniques provide more reliable input data for the neural network model so that the network can handle polyps of different sizes, thereby improving polyp segmentation in colorectal endoscopy.
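By way of illustration, a minimal sketch of such a preprocessing pipeline in PyTorch/torchvision follows; the rotation range and the ImageNet normalization statistics are assumptions, since the embodiment does not specify them, and the ground-truth masks must receive the same geometric transforms.

```python
import random
import torchvision.transforms as T

# Training-time augmentation as described above: random rotation, vertical
# and horizontal flips, normalization, and a uniform 352 x 352 input size.
# (The same geometric transforms must also be applied to the ground-truth
# mask; that bookkeeping is omitted here for brevity.)
train_transform = T.Compose([
    T.RandomRotation(degrees=90),            # rotation range is an assumption
    T.RandomVerticalFlip(p=0.5),
    T.RandomHorizontalFlip(p=0.5),
    T.Resize((352, 352)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],  # assumed ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])

def multiscale_size(base=352, scales=(0.75, 1.0, 1.25)):
    """Pick a training size from the {0.75, 1, 1.25} multi-scale strategy,
    rounded to a multiple of 32 so every PVT stage resolution stays integral."""
    return int(round(base * random.choice(scales) / 32) * 32)
```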
And S2, performing multi-scale feature extraction on the preprocessed image by using a PVTv2 backbone network.
Referring to fig. 2, which shows the PraNet network model structure: PraNet is a parallel reverse-attention network that can accurately segment polyps from colorectal endoscopy images. The network first uses a Parallel Partial Decoder (PPD) to aggregate high-level features and generate an initial global prediction map that guides the subsequent steps; it then establishes the relationship between the target region and its boundary with a reverse-attention module, fully exploiting the complementarity between edges and regions. However, the limited receptive field of the PraNet backbone network captures only local information while ignoring spatial context and global information. The application improves on this; fig. 3 is a schematic diagram of the network model structure of the PVT-based multi-scale polyp segmentation method constructed in the embodiment of the application.
In the embodiment of the application, the PVTv2 backbone network is used for multi-scale feature extraction. PVTv2 is one of the most advanced pre-trained models currently available; owing to its Pyramid Vision Transformer architecture it provides more accurate and robust feature extraction from input images, and it performs well in a variety of visual tasks including image classification, object detection and segmentation. Using the PVTv2 backbone network allows polyps of different resolutions in colorectal endoscopy images to be handled better.
PVTv2 tokenizes images with overlapping patch embedding in order to model local continuity information: the patch window is enlarged, adjacent windows overlap by half, and the feature map is zero-padded to preserve resolution. To keep PVTv2 at the same linear complexity as a CNN, overlapping patch embedding is implemented with zero-padded convolution. The specific operation of multi-scale feature extraction with the PVTv2 backbone network is as follows:
S201, first check whether the preprocessed colorectal endoscopy image input to the network is a 3-channel image.
S202, if it is a 3-channel image, it is fed directly into the network for feature extraction; if not, channel adjustment is performed with one 1×1 convolution so that the number of image channels becomes 3.
S203, the model performs four-stage feature extraction with a pre-trained PVTv2-B2 model. Across the four stages the numbers of PVTv2 basic-unit layers are 3, 4, 6 and 3, respectively, and the generated four levels of original feature maps X1, X2, X3, X4 have dimensions (i.e., number of channels × number of rows × number of columns) of: 64×(H/4)×(W/4), 128×(H/8)×(W/8), 320×(H/16)×(W/16), 512×(H/32)×(W/32).
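As a sketch of S201 to S203, the four-stage extraction could be written as follows in PyTorch; the use of timm and the model name 'pvt_v2_b2' are assumptions about the implementation, and any PVTv2-B2 checkpoint that exposes the four stage outputs would serve equally.

```python
import torch
import torch.nn as nn
import timm  # assumes a timm version whose PVT-v2 models support features_only

class PVTv2Backbone(nn.Module):
    """Four-stage PVTv2-B2 feature extractor (strides 4/8/16/32)."""
    def __init__(self, in_ch=3):
        super().__init__()
        # S201/S202: if the input is not 3-channel, a 1x1 conv maps it to 3
        self.adapt = nn.Conv2d(in_ch, 3, 1) if in_ch != 3 else nn.Identity()
        # S203: pre-trained PVTv2-B2, returning the feature map of each stage
        self.encoder = timm.create_model('pvt_v2_b2', pretrained=True,
                                         features_only=True)

    def forward(self, x):
        return self.encoder(self.adapt(x))  # [X1, X2, X3, X4]

if __name__ == "__main__":
    feats = PVTv2Backbone()(torch.randn(1, 3, 352, 352))
    for f in feats:
        print(tuple(f.shape))
    # expected: (1,64,88,88) (1,128,44,44) (1,320,22,22) (1,512,11,11)
```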
Step S3, fusing the original feature maps of different scales generated by the PVT step by step with a parallel Sobel edge decoder to obtain a global prediction map. The parallel Sobel edge decoder module of the present application, shown in fig. 4, operates as follows:
as shown in FIG. 4, X in the figure 1 、X 2 、X 3 、X 4 Is the original feature map of four levels of the PVT stage. We perform RFB operations on it in parallel, the specific operation of which is shown in fig. 5, in X 1 For the purposes of illustration, X 1 A convolution operation of four branches is required. The operation is as follows:
S301, in the first branch, the number of channels of the feature map is compressed with a 1×1 convolution; for convenience of calculation, the number of channels of all features is compressed to 32.
S302, in the second branch, the number of channels of the feature map is first compressed with a 1×1 convolution, and then feature extraction is performed once each with 1×3 and 3×1 asymmetric convolutions and a 3×3 convolution with dilation rate 3.
S303, in the third branch, the number of channels of the feature map is first compressed with a 1×1 convolution, and then feature extraction is performed once each with 1×5 and 5×1 asymmetric convolutions and a 3×3 convolution with dilation rate 5.
S304, in the fourth branch, the number of channels of the feature map is first compressed with a 1×1 convolution, and then feature extraction is performed once each with 1×7 and 7×1 asymmetric convolutions and a 3×3 convolution with dilation rate 7.
S305, the feature maps of the four branches are concatenated along the channel dimension, and the number of channels of the concatenated feature map is then compressed with a 1×1 convolution.
S306, this feature map is added pixel by pixel to the original feature map whose channels were compressed with a 1×1 convolution, passed through the nonlinear ReLU activation function, and then fed into the Sobel operation.
X2, X3 and X4 undergo the same RFB operations as above.
S307, the four processed feature maps are gradient-sharpened with the Sobel operator, added pixel by pixel, passed through a 1×1 convolution, and spatially upsampled with bilinear interpolation to generate the initial polyp segmentation global prediction map.
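The following sketch shows one way to realize an RFB branch block and the fixed-kernel Sobel operation of S301 to S307 in PyTorch; the 32-channel width follows the text, while the padding choices and the residual 1×1 projection are my reading of fig. 4 and fig. 5 rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RFB(nn.Module):
    """Four-branch receptive-field block per S301-S306 (a sketch)."""
    def __init__(self, in_ch, out_ch=32):
        super().__init__()
        def branch(k, d):  # 1x1 squeeze -> 1xk and kx1 asymmetric -> dilated 3x3
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1),
                nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2)),
                nn.Conv2d(out_ch, out_ch, (k, 1), padding=(k // 2, 0)),
                nn.Conv2d(out_ch, out_ch, 3, padding=d, dilation=d))
        self.b1 = nn.Conv2d(in_ch, out_ch, 1)            # S301
        self.b2, self.b3, self.b4 = branch(3, 3), branch(5, 5), branch(7, 7)
        self.fuse = nn.Conv2d(4 * out_ch, out_ch, 1)     # S305
        self.res = nn.Conv2d(in_ch, out_ch, 1)           # compressed input path

    def forward(self, x):                                # S306
        y = torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], 1)
        return F.relu(self.fuse(y) + self.res(x))

def sobel(x):
    """Per-channel Sobel gradient magnitude (the edge-sharpening step)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3).repeat(x.size(1), 1, 1, 1)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx, padding=1, groups=x.size(1))
    gy = F.conv2d(x, ky, padding=1, groups=x.size(1))
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)
```

In the decoder, each level would pass through its own RFB, the four outputs would be brought to a common resolution, Sobel-sharpened, summed pixel by pixel and reduced to one channel by a 1×1 convolution before the bilinear upsampling that yields the global prediction map.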
Step S4, performing multi-receptive-field feature extraction on the original feature maps with the multi-scale parallel dilated-convolution attention module. The module of the application, shown in fig. 6, operates as follows:
S401, the original feature maps of the four levels of the PVT encoder are each sent into the multi-scale parallel dilated-convolution attention module for further feature extraction; channel compression is first performed with a 1×1 convolution so that the number of channels of the processed feature map becomes a fixed fraction of the number of channels of the original feature map.
S402, the processed feature map is divided uniformly into four groups along the channel dimension, and the four groups are sent into four branches for processing, i.e., feature extraction with 3×3 convolutions with dilation rates of 1, 3, 5 and 7 respectively; the processing results are then concatenated along the channel dimension.
S403, a 1×1 convolution is applied to the channel-concatenated feature map, followed in turn by batch normalization (BN) and the nonlinear ReLU activation function.
S404, the feature map obtained in S403 is input into a CBAM module to further enhance its attention. The CBAM module consists mainly of two parts: channel attention and spatial attention. The channel attention module assigns globally meaningful importance across the channels of the feature map, reducing redundant computation; the spatial attention module attends to spatially local information to different degrees, retaining more effective local spatial features. The feature map passed through the CBAM module is further optimized into a more discriminative feature map. Finally, feature maps of four dimensions (i.e., number of channels × number of rows × number of columns) of 64×(H/4)×(W/4), 128×(H/8)×(W/8), 320×(H/16)×(W/16), 512×(H/32)×(W/32) are generated.
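A sketch of S401 to S404 follows; the channel-compression ratio (1/4 here), the CBAM internals and the final 1×1 expansion back to the original channel count are assumptions where the text leaves these open.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    """Minimal CBAM: channel attention followed by spatial attention."""
    def __init__(self, ch, r=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
                                 nn.Conv2d(ch // r, ch, 1))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) +
                           self.mlp(F.adaptive_max_pool2d(x, 1)))
        x = x * ca                                   # channel re-weighting
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)))
        return x * sa                                # spatial re-weighting

class MSDilatedAttention(nn.Module):
    """Multi-scale parallel dilated-convolution attention module (S401-S404)."""
    def __init__(self, in_ch, reduction=4):          # reduction ratio is assumed
        super().__init__()
        mid = in_ch // reduction
        assert mid % 4 == 0, "compressed channels must split into four groups"
        self.squeeze = nn.Conv2d(in_ch, mid, 1)      # S401: channel compression
        g = mid // 4
        self.branches = nn.ModuleList(               # S402: dilations 1, 3, 5, 7
            nn.Conv2d(g, g, 3, padding=d, dilation=d) for d in (1, 3, 5, 7))
        self.fuse = nn.Sequential(nn.Conv2d(mid, mid, 1),   # S403: 1x1 + BN + ReLU
                                  nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.cbam = CBAM(mid)                        # S404
        # restore the original channel count so the stage dims quoted in the
        # text (64/128/320/512) are preserved -- an assumption
        self.expand = nn.Conv2d(mid, in_ch, 1)

    def forward(self, x):
        chunks = torch.chunk(self.squeeze(x), 4, dim=1)  # uniform channel groups
        y = torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)
        return self.expand(self.cbam(self.fuse(y)))
```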
Step S5, using the global prediction map to guide, stage by stage, the gradual generation of a multi-stage prediction map. The operation is the same as in PraNet, as follows:
S501, the global prediction map is spatially downsampled so that its resolution matches that of the stage-four PVT feature map; it is then sent into the RA module for a reverse-attention operation to generate an attention map; this map is multiplied element by element with the stage-four PVT feature map, the feature dimension is reduced with three 3×3 convolutions, and the result is added pixel by pixel to the prediction map of the previous stage to generate the prediction map of the current stage.
S502, the prediction map of the current stage is sent to the next stage, and the same operation as in S501 is performed, guiding the generation of the final-stage feature map.
The global prediction map generated by the parallel Sobel edge decoder guides the four feature maps generated in S4 stage by stage, generating the prediction maps step by step; in terms of concrete sizes:
First, guided prediction is performed with the global prediction map from S3 and the 512×(H/32)×(W/32) feature map from S4: the global prediction map is reduced to the size of the S4 feature map by a spatial downsampling operation and fused with it to generate a prediction map of size 1×(H/32)×(W/32). This guided prediction combines global and local information so as to segment polyps of colorectal endoscopy images more accurately.
Then, the prediction map just generated guides the 320×(H/16)×(W/16) feature map from S4: it is first expanded to the size of the S4 feature map by a spatial upsampling operation and then fused with it to generate a prediction map of size 1×(H/16)×(W/16); the subsequent stages are guided in the same order, successively generating two further polyp segmentation prediction maps of sizes 1×(H/8)×(W/8) and 1×(H/4)×(W/4).
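One guidance stage (S501) could be sketched as follows; the 64-channel width of the three 3×3 convolutions is an assumption, and feat denotes the corresponding stage's feature map from step S4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RAStage(nn.Module):
    """One reverse-attention guidance stage (S501), PraNet-style."""
    def __init__(self, ch):
        super().__init__()
        self.reduce = nn.Sequential(          # three 3x3 convs (widths assumed)
            nn.Conv2d(ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, feat, prev_pred):
        # resample the previous stage's prediction to this stage's resolution
        p = F.interpolate(prev_pred, size=feat.shape[2:], mode='bilinear',
                          align_corners=False)
        att = 1.0 - torch.sigmoid(p)          # reverse-attention map
        return self.reduce(feat * att) + p    # refine, then add pixel by pixel
```

Chaining four such stages from the stage-four features down to the stage-one features yields the 1×(H/32)×(W/32), 1×(H/16)×(W/16), 1×(H/8)×(W/8) and 1×(H/4)×(W/4) prediction maps described above.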
Step S6, comparing the generated global prediction map and multi-stage prediction maps with the truth map and calculating the prediction loss; the final-stage prediction map is the final polyp segmentation prediction map. The specific operation is as follows:
S601, a bilinear-interpolation spatial upsampling operation is applied to the five prediction maps, resizing them to the size of the truth map corresponding to the input image, and the mixed loss L_seg composed of the weighted BCE and the weighted IOU is calculated.
The weighted BCE loss is calculated as:
$L_{wBCE} = -\dfrac{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y)\log P(x,y) + (1-G(x,y))\log(1-P(x,y))\,]}{\sum_{(x,y)} \omega(x,y)}$
where G denotes the truth map, P the prediction map, and (x, y) an arbitrary pixel position in the image.
The weighting coefficient ω(x, y) represents the importance of pixel (x, y) and is calculated as:
$\omega(x,y) = 1 + \gamma\,\lvert \mathrm{AvgPool}(G)(x,y) - G(x,y) \rvert$
In the concrete calculation, $\gamma$ is set to 5.
The weighted IOU loss is calculated as:
$L_{wIOU} = 1 - \dfrac{\sum_{(x,y)} \omega(x,y)\,G(x,y)\,P(x,y)}{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y) + P(x,y) - G(x,y)\,P(x,y)\,]}$
The mixed loss function of the final prediction map P relative to the truth map G is:
$L_{seg} = L_{wIOU} + L_{wBCE}$
where $L_{wIOU}$ and $L_{wBCE}$ are the weighted IOU loss and the weighted BCE loss, respectively.
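In code, this mixed loss corresponds to the PraNet-style "structure loss"; the sketch below assumes that reading, including the 31×31 averaging window inside ω(x, y), which the text does not state.

```python
import torch
import torch.nn.functional as F

def structure_loss(pred, mask, gamma=5):
    """Weighted BCE + weighted IOU, per the formulas above.

    pred: raw logits of one prediction map, shape (B, 1, H, W)
    mask: ground-truth map with values in {0, 1}, same shape
    """
    # omega(x, y) = 1 + gamma * |AvgPool(G) - G|; window size is an assumption
    weit = 1 + gamma * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    prob = torch.sigmoid(pred)
    inter = (prob * mask * weit).sum(dim=(2, 3))
    union = ((prob + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

# total training loss: sum of the structure losses of all five upsampled maps
```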
To demonstrate the effectiveness of the method, it is trained using the Kvasir-SEG and CVC-ClinicDB datasets as training sets, then tested on the Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS and CVC-T datasets, and the test results are compared with mainstream polyp segmentation algorithms from the prior art. In the specific experiments, the model is trained on the PyTorch 1.8 deep learning framework using one NVIDIA RTX 2080 Ti graphics card with 11 GB of video memory; the input image size is set to 352×352 with an initial learning rate of 1e-4, the AdamW optimizer is used, the batch size is set to 6, and the total number of training rounds is 100 epochs. The test results on the Kvasir-SEG and CVC-ClinicDB datasets are shown in Table 1:
TABLE 1: Test results of the application and 7 other methods for polyp segmentation on the Kvasir-SEG and CVC-ClinicDB datasets
The experimental results in CVC-ClinicDB, CVC-ColonDB, ETIS data sets are shown in Table 2:
TABLE 2: Test results of the application and 7 other methods for polyp segmentation on the CVC-ClinicDB, CVC-ColonDB and ETIS datasets
Of the 7 evaluation metrics, the first 2 are metrics commonly used in semantic segmentation tasks; for the first 6 metrics, values closer to 1 indicate better segmentation, while the 7th metric is non-negative and values closer to 0 are better.
Fig. 7 is a visual comparison of experimental results of the method of the present application and other methods; the results show that the polyp segmentation method of the application obtains segmentation results with more accurate boundaries and more complete semantic structure.
Example two
The application also provides a PVT-based multi-scale polyp segmentation system, which comprises: a preprocessing module, a first feature extraction module, a fusion module, a second feature extraction module, a guiding module and a prediction module;
the preprocessing module is used for acquiring a colorectal endoscopy image to be detected and preprocessing it;
the first feature extraction module is used for performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
the fusion module is used for fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
the second feature extraction module is used for performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated-convolution attention module;
the guiding module is used for using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, gradually generating a multi-stage prediction map;
the prediction module is used for comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map.
The above embodiments are merely illustrative of the preferred embodiments of the present application, and the scope of the present application is not limited thereto, but various modifications and improvements made by those skilled in the art to which the present application pertains are made without departing from the spirit of the present application, and all modifications and improvements fall within the scope of the present application as defined in the appended claims.

Claims (8)

1. A PVT-based multi-scale polyp segmentation method, characterized by comprising the following steps:
S1, acquiring a colorectal endoscopy image to be detected, and preprocessing the image;
S2, performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
S3, fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
S4, performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated-convolution attention module;
S5, using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, and gradually generating a multi-stage prediction map;
S6, comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map.
2. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S1, the method of preprocessing the colorectal endoscopy image comprises:
enhancing the colorectal endoscopy image data with random rotation, vertical flipping, horizontal flipping and normalization, finally uniformly cropping the image to 352×352, and scaling it with a {0.75, 1, 1.25} multi-scale strategy.
3. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S2, the method of performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with the PVTv2 backbone network comprises:
judging whether the preprocessed colorectal endoscopy image input into the PVTv2 backbone network is a 3-channel image; if so, it is fed directly into the network for feature extraction; if not, a 1×1 convolution is used once to adjust the number of image channels to 3;
four-stage feature extraction is performed using a pre-trained PVTv2-B2 model.
4. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S3, the method of fusing the original feature maps step by step with the parallel Sobel edge decoder to obtain the global prediction map comprises:
S31: in the first branch, compressing the feature map channels with a 1×1 convolution;
S32: in the second branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×3 and 3×1 asymmetric convolutions and a 3×3 convolution with dilation rate 3;
S33: in the third branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×5 and 5×1 asymmetric convolutions and a 3×3 convolution with dilation rate 5;
S34: in the fourth branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×7 and 7×1 asymmetric convolutions and a 3×3 convolution with dilation rate 7;
S35: concatenating the compressed feature map of the first branch with the feature-extracted feature maps of the second, third and fourth branches along the channel dimension, then compressing the feature map channels with a 1×1 convolution;
S36: adding the compressed concatenated feature map pixel by pixel to the original feature map whose channels were compressed by a 1×1 convolution, passing the result through a ReLU nonlinear activation function, and then feeding it into the Sobel operation;
S37: adding the feature maps gradient-sharpened by the Sobel operator pixel by pixel, applying a 1×1 convolution, and using a bilinear-interpolation upsampling operation to generate the initial polyp segmentation global prediction map.
5. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S4, the method of performing multi-receptive-field feature extraction on the original feature maps with the multi-scale parallel dilated-convolution attention module comprises:
S41: channel-compressing the original feature maps of the four levels of the PVT encoder with one 1×1 convolution, obtaining multi-channel feature maps whose channel count is a fixed fraction of the original channel count;
S42: uniformly grouping the channels of the channel-compressed feature map and feeding the groups into four branches for processing, namely: performing feature extraction on the corresponding branches with 3×3 convolutions with dilation rates of 1, 3, 5 and 7, and then concatenating the processing results of the four branches;
S43: applying a 1×1 convolution to the channel-concatenated feature map, then sequentially applying batch normalization (BN) and the nonlinear ReLU activation function to obtain the processed feature map;
S44: feeding the processed feature map into a CBAM module for further attention weighting, obtaining a more discriminative feature map.
6. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S5, the method of using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction and gradually generate the multi-stage prediction map comprises:
S51: spatially downsampling the global prediction map so that its resolution matches that of the stage-four PVT feature map; feeding it into the RA module for a reverse-attention operation to generate an attention map; multiplying this map element by element with the stage-four PVT feature map, reducing the feature dimension with three 3×3 convolutions, and adding the result pixel by pixel to the prediction map of the previous stage to generate the prediction map of the current stage;
S52: sending the prediction map of the current stage to the next stage and performing the same operation as in S51, guiding the generation of the final-stage feature map.
7. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S6, the method of comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map, comprises:
applying a bilinear-interpolation spatial upsampling operation to the global prediction map and the multi-stage prediction maps, resizing all prediction maps to the size of the truth map corresponding to the input image, and calculating the mixed loss of the weighted BCE and the weighted IOU;
the weighted BCE loss is defined as:
$L_{wBCE} = -\dfrac{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y)\log P(x,y) + (1-G(x,y))\log(1-P(x,y))\,]}{\sum_{(x,y)} \omega(x,y)}$
where G denotes the truth map, P the prediction map, and (x, y) any pixel position in the image; the corresponding weighting coefficient $\omega(x,y) = 1 + \gamma\,\lvert \mathrm{AvgPool}(G)(x,y) - G(x,y) \rvert$ represents the importance of pixel (x, y), with $\gamma$ set to 5;
the weighted IOU loss is defined as:
$L_{wIOU} = 1 - \dfrac{\sum_{(x,y)} \omega(x,y)\,G(x,y)\,P(x,y)}{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y) + P(x,y) - G(x,y)\,P(x,y)\,]}$
combining the weighted BCE loss and the weighted IOU loss, the mixed loss of the prediction map relative to the truth map is:
$L_{seg} = L_{wBCE} + L_{wIOU}$.
8. A PVT-based multi-scale polyp segmentation system, characterized by comprising: a preprocessing module, a first feature extraction module, a fusion module, a second feature extraction module, a guiding module and a prediction module;
the preprocessing module is used for acquiring a colorectal endoscopy image to be detected and preprocessing it;
the first feature extraction module is used for performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
the fusion module is used for fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
the second feature extraction module is used for performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated-convolution attention module;
the guiding module is used for using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, gradually generating a multi-stage prediction map;
the prediction module is used for comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map.
CN202311097260.0A 2023-08-29 2023-08-29 Multi-scale polyp segmentation method and system based on PVT Active CN117132774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311097260.0A CN117132774B (en) 2023-08-29 2023-08-29 Multi-scale polyp segmentation method and system based on PVT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311097260.0A CN117132774B (en) 2023-08-29 2023-08-29 Multi-scale polyp segmentation method and system based on PVT

Publications (2)

Publication Number Publication Date
CN117132774A true CN117132774A (en) 2023-11-28
CN117132774B CN117132774B (en) 2024-03-01

Family

ID=88859454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311097260.0A Active CN117132774B (en) 2023-08-29 2023-08-29 Multi-scale polyp segmentation method and system based on PVT

Country Status (1)

Country Link
CN (1) CN117132774B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117338556A (en) * 2023-12-06 2024-01-05 四川大学华西医院 Gastrointestinal endoscopy pressing system
CN117392157A (en) * 2023-12-13 2024-01-12 长春理工大学 Edge-aware protective cultivation straw coverage rate detection method
CN117853432A (en) * 2023-12-26 2024-04-09 北京长木谷医疗科技股份有限公司 Hybrid model-based osteoarthropathy identification method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220138548A1 (en) * 2022-01-18 2022-05-05 Intel Corporation Analog hardware implementation of activation functions
CN114820635A (en) * 2022-04-21 2022-07-29 重庆理工大学 Polyp segmentation method combining attention U-shaped network and multi-scale feature fusion
CN115331024A (en) * 2022-08-22 2022-11-11 浙江工业大学 Intestinal polyp detection method based on deep supervision and gradual learning
CN115601330A (en) * 2022-10-20 2023-01-13 湖北工业大学(Cn) Colonic polyp segmentation method based on multi-scale space reverse attention mechanism
CN115841495A (en) * 2022-12-19 2023-03-24 安徽大学 Polyp segmentation method and system based on double-boundary guiding attention exploration
CN115965596A (en) * 2022-12-26 2023-04-14 深圳英美达医疗技术有限公司 Blood vessel identification method and device, electronic equipment and readable storage medium
CN116630245A (en) * 2023-05-05 2023-08-22 浙江工业大学 Polyp segmentation method based on saliency map guidance and uncertainty semantic enhancement

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220138548A1 (en) * 2022-01-18 2022-05-05 Intel Corporation Analog hardware implementation of activation functions
CN114820635A (en) * 2022-04-21 2022-07-29 重庆理工大学 Polyp segmentation method combining attention U-shaped network and multi-scale feature fusion
CN115331024A (en) * 2022-08-22 2022-11-11 浙江工业大学 Intestinal polyp detection method based on deep supervision and gradual learning
CN115601330A (en) * 2022-10-20 2023-01-13 湖北工业大学(Cn) Colonic polyp segmentation method based on multi-scale space reverse attention mechanism
CN115841495A (en) * 2022-12-19 2023-03-24 安徽大学 Polyp segmentation method and system based on double-boundary guiding attention exploration
CN115965596A (en) * 2022-12-26 2023-04-14 深圳英美达医疗技术有限公司 Blood vessel identification method and device, electronic equipment and readable storage medium
CN116630245A (en) * 2023-05-05 2023-08-22 浙江工业大学 Polyp segmentation method based on saliency map guidance and uncertainty semantic enhancement

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
VITTORINO MANDUJANO-CORNEJO et al.: "Polyp2Seg: Improved Polyp Segmentation with Vision Transformer", MIUA 2022: Medical Image Understanding and Analysis, vol. 3413, pages 519-534 *
ZHAOJIAN YAO et al.: "Object localization and edge refinement network for salient object detection", Expert Systems with Applications, vol. 213, pages 1-18 *
FU Huanyu: "Research on pulmonary nodule detection in CT images based on convolutional neural networks", China Masters' Theses Full-text Database (Medicine & Health Sciences), no. 06, pages 072-113 *
LEI Ying: "Research on medical aided diagnosis methods based on retinal images", China Masters' Theses Full-text Database (Medicine & Health Sciences), no. 11, pages 073-8 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117338556A (en) * 2023-12-06 2024-01-05 四川大学华西医院 Gastrointestinal endoscopy pressing system
CN117338556B (en) * 2023-12-06 2024-03-29 四川大学华西医院 Gastrointestinal endoscopy pressing system
CN117392157A (en) * 2023-12-13 2024-01-12 长春理工大学 Edge-aware protective cultivation straw coverage rate detection method
CN117392157B (en) * 2023-12-13 2024-03-19 长春理工大学 Edge-aware protective cultivation straw coverage rate detection method
CN117853432A (en) * 2023-12-26 2024-04-09 北京长木谷医疗科技股份有限公司 Hybrid model-based osteoarthropathy identification method and device

Also Published As

Publication number Publication date
CN117132774B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
CN117132774B (en) Multi-scale polyp segmentation method and system based on PVT
CN111784671B (en) Pathological image focus region detection method based on multi-scale deep learning
CN108268870B Multi-scale feature fusion ultrasonic image semantic segmentation method based on adversarial learning
CN109191476B (en) Novel biomedical image automatic segmentation method based on U-net network structure
CN113674253A Rectal cancer CT image automatic segmentation method based on U-Transformer
CN115661144B (en) Adaptive medical image segmentation method based on deformable U-Net
CN112712528B (en) Intestinal tract focus segmentation method combining multi-scale U-shaped residual error encoder and integral reverse attention mechanism
CN115409733A (en) Low-dose CT image noise reduction method based on image enhancement and diffusion model
CN111951288A (en) Skin cancer lesion segmentation method based on deep learning
CN115170582A (en) Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism
CN113610859B (en) Automatic thyroid nodule segmentation method based on ultrasonic image
CN110991254B (en) Ultrasonic image video classification prediction method and system
CN115471470A (en) Esophageal cancer CT image segmentation method
CN116579982A (en) Pneumonia CT image segmentation method, device and equipment
CN111916206A (en) CT image auxiliary diagnosis system based on cascade connection
CN115272170A (en) Prostate MRI (magnetic resonance imaging) image segmentation method and system based on self-adaptive multi-scale transform optimization
CN116757986A (en) Infrared and visible light image fusion method and device
CN115100165A (en) Colorectal cancer T staging method and system based on tumor region CT image
CN117522891A (en) 3D medical image segmentation system and method
CN117726814A (en) Retinal vessel segmentation method based on cross attention and double branch pooling fusion
CN116958736A (en) RGB-D significance target detection method based on cross-modal edge guidance
CN116188396A (en) Image segmentation method, device, equipment and medium
CN113239978B (en) Method and device for correlation of medical image preprocessing model and analysis model
CN118229712A (en) Liver tumor image segmentation system based on enhanced multidimensional feature perception
Zhang Application of Deep Learning in Super-resolution Processing of Face Images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant