CN117132774A - Multi-scale polyp segmentation method and system based on PVT - Google Patents

Multi-scale polyp segmentation method and system based on PVT

Info

Publication number
CN117132774A
Authority
CN
China
Prior art keywords
map
prediction
convolution
graph
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311097260.0A
Other languages
Chinese (zh)
Other versions
CN117132774B (en)
Inventor
Zhang Zhaohui
Yang Chaorong
He Zheyuan
Wang Wei
Liu Chenguang
Huang Lina
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Normal University
Original Assignee
Hebei Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Normal University filed Critical Hebei Normal University
Priority to CN202311097260.0A priority Critical patent/CN117132774B/en
Publication of CN117132774A publication Critical patent/CN117132774A/en
Application granted granted Critical
Publication of CN117132774B publication Critical patent/CN117132774B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a PVT-based multi-scale polyp segmentation method and system, relating to the technical field of deep learning medical image semantic segmentation. The application comprises the following steps: obtaining a colorectal endoscopy image to be detected and performing image preprocessing; performing multi-scale feature extraction on the preprocessed image with a PVTv2 backbone network; fusing the original feature maps of different scales generated by the PVT step by step with a parallel Sobel edge decoder to obtain a global prediction map; performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated-convolution attention module; using the global prediction map to guide, stage by stage, the gradual generation of a multi-stage prediction map; and comparing the global prediction map and the multi-stage prediction maps with the truth map to obtain the prediction loss, the final-stage prediction map being the final polyp segmentation prediction map. The method can accurately identify and segment polyps in colorectal endoscopy images, providing effective help for doctors to make correct diagnoses.

Description

Multi-scale polyp segmentation method and system based on PVT
Technical Field
The application belongs to the technical field of deep learning medical image semantic segmentation, and particularly relates to a PVT-based multi-scale polyp segmentation method and system.
Background
Colorectal cancer is a common malignancy, and its early detection and treatment are of great importance for improving patient survival. Because colorectal cancer shows no typical symptoms in the early stages of its development, screening for colorectal cancer has become increasingly important, and one of the screening means is colorectal endoscopy. In colorectal endoscopy images, polyps are similar in color to the surrounding normal tissue, variable in shape and different in size; small polyps may even adhere to one another, and polyp boundaries are often ambiguous. This poses numerous challenges for polyp segmentation of colorectal endoscopy imaging results. Traditional medical image segmentation methods, such as threshold segmentation and region growing, often require manual labeling assisted by a professional doctor; affected by illumination conditions, doctor experience, subjective factors and the like, the segmentation is time-consuming and labor-intensive and exhibits large errors and instability. How to achieve automatic segmentation of colorectal endoscopy images, and thus obtain polyp segmentation results with clearer boundaries more efficiently, has become one of the hot spots of medical image segmentation.
In recent years, deep learning techniques have been widely applied to medical image segmentation, and polyp segmentation methods based on convolutional neural networks have been widely used for colorectal endoscopy. Convolutional neural networks for polyp segmentation have two main typical architectures: the U-shaped structure based on U-Net, and the PraNet architecture. U-Net uses an encoder-decoder architecture that combines low-level and high-level features via skip connections to effectively preserve spatially local information, but it is susceptible to noise and occlusion. PraNet first uses a Parallel Partial Decoder (PPD) to aggregate high-level features and generate a global map that roughly locates polyps, then uses a Reverse Attention (RA) module to progressively refine regions and boundaries; however, owing to the limitations of convolutional neural networks themselves, the segmentation accuracy and robustness of the model remain problematic. There is therefore a need to improve existing models so as to raise polyp segmentation performance on colorectal endoscopy imaging results.
Recently, the successful use of Transformers in the field of Natural Language Processing (NLP) inspired computer vision researchers, leading to applications and developments of Transformers in computer vision research tasks. Since Transformer-based networks are good at capturing long-range dependencies of image objects through global self-attention, applying Transformers to the polyp segmentation task can be considered, namely: in a polyp segmentation task, a Transformer is used to learn the dependency relationships between different regions in a colorectal endoscopy image, and this information is exploited to improve the segmentation performance and robustness of the model; in addition, advanced optimization algorithms can accelerate the training process of the model and improve its convergence rate. By applying these techniques, the segmentation of polyps in colorectal endoscopy images can be further improved, providing more accurate and reliable diagnostic results for clinicians.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides a PVT-based multi-scale polyp segmentation method and system, which effectively solve the problem that the prior art cannot accurately identify polyp regions, further improve the accuracy and semantic integrity of polyp segmentation boundaries in colorectal endoscopy images, and realize accurate, rapid and automatic polyp segmentation.
In order to achieve the above object, the present application provides the following solutions:
A PVT-based multi-scale polyp segmentation method, comprising the following steps:
S1, acquiring a colorectal endoscopy image to be detected, and preprocessing the image;
S2, performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
S3, fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
S4, performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated-convolution attention module;
S5, using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, and gradually generating a multi-stage prediction map;
S6, comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map.
Preferably, in the step S1, the method for preprocessing the colorectal endoscopy image comprises:
enhancing the colorectal endoscopy image data with random rotation, vertical flipping, horizontal flipping and normalization, finally uniformly cropping the image to 352×352, and scaling it with a {0.75, 1, 1.25} multi-scale strategy.
Preferably, in the step S2, the method for performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with a PVTv2 backbone network comprises:
judging whether the preprocessed colorectal endoscopy image input into the PVTv2 backbone network is a 3-channel image; if so, it is fed directly into the network for feature extraction; if not, a 1×1 convolution is used once to adjust the number of image channels to 3;
four-stage feature extraction is performed using a pre-trained PVTv2-B2 model.
Preferably, in the step S3, the method for fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map comprises:
S31: in the first branch, compressing the feature map channels with a 1×1 convolution;
S32: in the second branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×3 and 3×1 asymmetric convolutions and a 3×3 convolution with dilation rate 3;
S33: in the third branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×5 and 5×1 asymmetric convolutions and a 3×3 convolution with dilation rate 5;
S34: in the fourth branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×7 and 7×1 asymmetric convolutions and a 3×3 convolution with dilation rate 7;
S35: concatenating the compressed feature map of the first branch with the feature-extracted feature maps of the second, third and fourth branches along the channel dimension, then compressing the feature map channels with a 1×1 convolution;
S36: adding the compressed concatenated feature map pixel by pixel to the original feature map whose channels were compressed by a 1×1 convolution, passing the result through a ReLU nonlinear activation function, and then feeding it into the Sobel operation;
S37: adding the feature maps gradient-sharpened by the Sobel operator pixel by pixel, applying a 1×1 convolution, and using a bilinear-interpolation upsampling operation to generate the initial polyp segmentation global prediction map.
Preferably, in the step S4, the method for performing multi-receptive-field feature extraction on the original feature maps with the multi-scale parallel dilated-convolution attention module comprises:
S41: channel-compressing the original feature maps of the four levels of the PVT encoder with one 1×1 convolution, obtaining multi-channel feature maps whose channel count is a fixed fraction of the original channel count;
S42: uniformly grouping the channels of the channel-compressed feature map and feeding the groups into four branches for processing, namely: performing feature extraction on the corresponding branches with 3×3 convolutions with dilation rates of 1, 3, 5 and 7, and then concatenating the processing results of the four branches;
S43: applying a 1×1 convolution to the channel-concatenated feature map, then sequentially applying batch normalization (BN) and the nonlinear ReLU activation function to obtain the processed feature map;
S44: feeding the processed feature map into a CBAM module for further attention weighting, obtaining a more discriminative feature map.
Preferably, in the step S5, the method for using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction and gradually generate the multi-stage prediction map comprises:
S51: spatially downsampling the global prediction map so that its resolution matches that of the stage-four PVT feature map; feeding it into the RA module for a reverse-attention operation to generate an attention map; multiplying this map element by element with the stage-four PVT feature map, reducing the feature dimension with three 3×3 convolutions, and adding the result pixel by pixel to the prediction map of the previous stage to generate the prediction map of the current stage;
S52: sending the prediction map of the current stage to the next stage and performing the same operation as in S51, guiding the generation of the final-stage feature map.
Preferably, in the step S6, the method for comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map, comprises:
applying a bilinear-interpolation spatial upsampling operation to the global prediction map and the multi-stage prediction maps, resizing all prediction maps to the size of the truth map corresponding to the input image, and calculating the mixed loss of the weighted BCE and the weighted IOU;
the weighted BCE loss is defined as:
$L_{wBCE} = -\dfrac{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y)\log P(x,y) + (1-G(x,y))\log(1-P(x,y))\,]}{\sum_{(x,y)} \omega(x,y)}$
where G denotes the truth map, P the prediction map, and (x, y) any pixel position in the image; the corresponding weighting coefficient $\omega(x,y) = 1 + \gamma\,\lvert \mathrm{AvgPool}(G)(x,y) - G(x,y) \rvert$ represents the importance of pixel (x, y), with $\gamma$ set to 5;
the weighted IOU loss is defined as:
$L_{wIOU} = 1 - \dfrac{\sum_{(x,y)} \omega(x,y)\,G(x,y)\,P(x,y)}{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y) + P(x,y) - G(x,y)\,P(x,y)\,]}$
combining the weighted BCE loss and the weighted IOU loss, the mixed loss of the prediction map relative to the truth map is:
$L_{seg} = L_{wBCE} + L_{wIOU}$
the application also provides a PVT-based multi-scale polyp segmentation system, which comprises: the device comprises a preprocessing module, a first feature extraction module, a fusion module, a second feature extraction module, a guiding module and a prediction module;
the pretreatment module is used for acquiring a colorectal mirror image to be detected and carrying out pretreatment on the colorectal mirror image;
the first feature extraction module is used for carrying out multi-scale feature extraction on the preprocessed colorectal mirror image by using a PVTv2 backbone network to obtain original feature images with different scales;
the fusion module is used for carrying out step-by-step fusion on the original feature images by using a parallel Sobel edge decoder to obtain a global prediction image;
the second feature extraction module is used for extracting features of multiple receptive fields from the original feature map by using a multi-scale parallel cavity convolution attention module;
the guiding module is used for gradually guiding the original feature map after the multi-receptive field feature extraction by using the global prediction map, and gradually generating a multi-stage prediction map;
the prediction module is used for comparing the global prediction graph and the multi-stage prediction graph with a truth value graph, calculating loss, and obtaining a final stage of prediction graph which is a final polyp segmentation prediction graph.
Compared with the prior art, the application has the beneficial effects that:
1. The application adopts PVTv2 as the backbone network in place of the ResNet backbone in PraNet, giving the network better global feature extraction capability.
2. The application provides a parallel Sobel edge decoder that fuses feature maps at different scales, improving the segmentation of polyps of different sizes.
3. The application also provides a multi-scale parallel dilated-convolution attention module, which performs feature extraction on feature maps of different scales under multiple receptive fields and uses CBAM to re-weight the feature maps and extract the regions of interest.
4. The application also trains the model with a new loss function that combines weighted BCE loss and weighted IOU loss, mitigating the influence of the imbalanced distribution of positive and negative samples and further improving the segmentation accuracy and robustness of the model.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the embodiments are briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an implementation of a PVT-based multi-scale polyp segmentation method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a PraNet network model structure;
fig. 3 is a schematic diagram of a network model structure of a PVT-based multi-scale polyp segmentation method constructed in an embodiment of the present application;
FIG. 4 is a block diagram of a parallel Sobel edge decoder according to the present application;
FIG. 5 is a schematic diagram of the RFB operation in the parallel Sobel edge decoder module of the present application;
FIG. 6 is a schematic diagram of the multi-scale parallel dilated-convolution attention module of the present application;
fig. 7 is a visual comparison of experimental results of the PVT-based multi-scale polyp segmentation method in an embodiment.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description.
Example 1
Fig. 1 is a schematic flow chart of an implementation of the PVT-based multi-scale polyp segmentation method according to an embodiment of the present application. As shown in fig. 1, the PVT-based multi-scale polyp segmentation method comprises the following steps:
Step S1, obtaining colorectal endoscopy images to be detected from the Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS and CVC-T datasets, and preprocessing the images.
In practice, the preprocessing may include random rotation, vertical flipping, horizontal flipping, normalization and similar processing of the colorectal endoscopy image to be detected, turning it into an image that meets the detection requirements. The image is then uniformly resized to 352 rows × 352 columns and scaled with a {0.75, 1, 1.25} multi-scale strategy. These preprocessing techniques provide more reliable input data for the neural network model so that the network can handle polyps of different sizes, thereby improving polyp segmentation in colorectal endoscopy.
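By way of illustration, a minimal sketch of such a preprocessing pipeline in PyTorch/torchvision follows; the rotation range and the ImageNet normalization statistics are assumptions, since the embodiment does not specify them, and the ground-truth masks must receive the same geometric transforms.

```python
import random
import torchvision.transforms as T

# Training-time augmentation as described above: random rotation, vertical
# and horizontal flips, normalization, and a uniform 352 x 352 input size.
# (The same geometric transforms must also be applied to the ground-truth
# mask; that bookkeeping is omitted here for brevity.)
train_transform = T.Compose([
    T.RandomRotation(degrees=90),            # rotation range is an assumption
    T.RandomVerticalFlip(p=0.5),
    T.RandomHorizontalFlip(p=0.5),
    T.Resize((352, 352)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],  # assumed ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])

def multiscale_size(base=352, scales=(0.75, 1.0, 1.25)):
    """Pick a training size from the {0.75, 1, 1.25} multi-scale strategy,
    rounded to a multiple of 32 so every PVT stage resolution stays integral."""
    return int(round(base * random.choice(scales) / 32) * 32)
```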
And S2, performing multi-scale feature extraction on the preprocessed image by using a PVTv2 backbone network.
Referring to fig. 2, which shows the PraNet network model structure: PraNet is a parallel reverse-attention network that can accurately segment polyps from colorectal endoscopy images. The network first uses a Parallel Partial Decoder (PPD) to aggregate high-level features and generate an initial global prediction map that guides the subsequent steps; it then establishes the relationship between the target region and its boundary with a reverse-attention module, fully exploiting the complementarity between edges and regions. However, the limited receptive field of the PraNet backbone network captures only local information while ignoring spatial context and global information. The application improves on this; fig. 3 is a schematic diagram of the network model structure of the PVT-based multi-scale polyp segmentation method constructed in the embodiment of the application.
In the embodiment of the application, the PVTv2 backbone network is used for multi-scale feature extraction. PVTv2 is one of the most advanced pre-trained models currently available; owing to its Pyramid Vision Transformer architecture it provides more accurate and robust feature extraction from input images, and it performs well in a variety of visual tasks including image classification, object detection and segmentation. Using the PVTv2 backbone network allows polyps of different resolutions in colorectal endoscopy images to be handled better.
PVTv2 tokenizes images with overlapping patch embedding in order to model local continuity information: the patch window is enlarged, adjacent windows overlap by half, and the feature map is zero-padded to preserve resolution. To keep PVTv2 at the same linear complexity as a CNN, overlapping patch embedding is implemented with zero-padded convolution. The specific operation of multi-scale feature extraction with the PVTv2 backbone network is as follows:
S201, first check whether the preprocessed colorectal endoscopy image input to the network is a 3-channel image.
S202, if it is a 3-channel image, it is fed directly into the network for feature extraction; if not, channel adjustment is performed with one 1×1 convolution so that the number of image channels becomes 3.
S203, the model performs four-stage feature extraction with a pre-trained PVTv2-B2 model. Across the four stages the numbers of PVTv2 basic-unit layers are 3, 4, 6 and 3, respectively, and the generated four levels of original feature maps X1, X2, X3, X4 have dimensions (i.e., number of channels × number of rows × number of columns) of: 64×(H/4)×(W/4), 128×(H/8)×(W/8), 320×(H/16)×(W/16), 512×(H/32)×(W/32).
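As a sketch of S201 to S203, the four-stage extraction could be written as follows in PyTorch; the use of timm and the model name 'pvt_v2_b2' are assumptions about the implementation, and any PVTv2-B2 checkpoint that exposes the four stage outputs would serve equally.

```python
import torch
import torch.nn as nn
import timm  # assumes a timm version whose PVT-v2 models support features_only

class PVTv2Backbone(nn.Module):
    """Four-stage PVTv2-B2 feature extractor (strides 4/8/16/32)."""
    def __init__(self, in_ch=3):
        super().__init__()
        # S201/S202: if the input is not 3-channel, a 1x1 conv maps it to 3
        self.adapt = nn.Conv2d(in_ch, 3, 1) if in_ch != 3 else nn.Identity()
        # S203: pre-trained PVTv2-B2, returning the feature map of each stage
        self.encoder = timm.create_model('pvt_v2_b2', pretrained=True,
                                         features_only=True)

    def forward(self, x):
        return self.encoder(self.adapt(x))  # [X1, X2, X3, X4]

if __name__ == "__main__":
    feats = PVTv2Backbone()(torch.randn(1, 3, 352, 352))
    for f in feats:
        print(tuple(f.shape))
    # expected: (1,64,88,88) (1,128,44,44) (1,320,22,22) (1,512,11,11)
```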
Step S3, fusing the original feature maps of different scales generated by the PVT step by step with a parallel Sobel edge decoder to obtain a global prediction map. The parallel Sobel edge decoder module of the present application, shown in fig. 4, operates as follows:
as shown in FIG. 4, X in the figure 1 、X 2 、X 3 、X 4 Is the original feature map of four levels of the PVT stage. We perform RFB operations on it in parallel, the specific operation of which is shown in fig. 5, in X 1 For the purposes of illustration, X 1 A convolution operation of four branches is required. The operation is as follows:
S301, in the first branch, the number of channels of the feature map is compressed with a 1×1 convolution; for convenience of calculation, the number of channels of all features is compressed to 32.
S302, in the second branch, the number of channels of the feature map is first compressed with a 1×1 convolution, and then feature extraction is performed once each with 1×3 and 3×1 asymmetric convolutions and a 3×3 convolution with dilation rate 3.
S303, in the third branch, the number of channels of the feature map is first compressed with a 1×1 convolution, and then feature extraction is performed once each with 1×5 and 5×1 asymmetric convolutions and a 3×3 convolution with dilation rate 5.
S304, in the fourth branch, the number of channels of the feature map is first compressed with a 1×1 convolution, and then feature extraction is performed once each with 1×7 and 7×1 asymmetric convolutions and a 3×3 convolution with dilation rate 7.
S305, the feature maps of the four branches are concatenated along the channel dimension, and the number of channels of the concatenated feature map is then compressed with a 1×1 convolution.
S306, this feature map is added pixel by pixel to the original feature map whose channels were compressed with a 1×1 convolution, passed through the nonlinear ReLU activation function, and then fed into the Sobel operation.
X2, X3 and X4 undergo the same RFB operations as above.
S307, the four processed feature maps are gradient-sharpened with the Sobel operator, added pixel by pixel, passed through a 1×1 convolution, and spatially upsampled with bilinear interpolation to generate the initial polyp segmentation global prediction map.
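The following sketch shows one way to realize an RFB branch block and the fixed-kernel Sobel operation of S301 to S307 in PyTorch; the 32-channel width follows the text, while the padding choices and the residual 1×1 projection are my reading of fig. 4 and fig. 5 rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RFB(nn.Module):
    """Four-branch receptive-field block per S301-S306 (a sketch)."""
    def __init__(self, in_ch, out_ch=32):
        super().__init__()
        def branch(k, d):  # 1x1 squeeze -> 1xk and kx1 asymmetric -> dilated 3x3
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1),
                nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2)),
                nn.Conv2d(out_ch, out_ch, (k, 1), padding=(k // 2, 0)),
                nn.Conv2d(out_ch, out_ch, 3, padding=d, dilation=d))
        self.b1 = nn.Conv2d(in_ch, out_ch, 1)            # S301
        self.b2, self.b3, self.b4 = branch(3, 3), branch(5, 5), branch(7, 7)
        self.fuse = nn.Conv2d(4 * out_ch, out_ch, 1)     # S305
        self.res = nn.Conv2d(in_ch, out_ch, 1)           # compressed input path

    def forward(self, x):                                # S306
        y = torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], 1)
        return F.relu(self.fuse(y) + self.res(x))

def sobel(x):
    """Per-channel Sobel gradient magnitude (the edge-sharpening step)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3).repeat(x.size(1), 1, 1, 1)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx, padding=1, groups=x.size(1))
    gy = F.conv2d(x, ky, padding=1, groups=x.size(1))
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)
```

In the decoder, each level would pass through its own RFB, the four outputs would be brought to a common resolution, Sobel-sharpened, summed pixel by pixel and reduced to one channel by a 1×1 convolution before the bilinear upsampling that yields the global prediction map.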
Step S4, performing multi-receptive-field feature extraction on the original feature maps with the multi-scale parallel dilated-convolution attention module. The module of the application, shown in fig. 6, operates as follows:
S401, the original feature maps of the four levels of the PVT encoder are each sent into the multi-scale parallel dilated-convolution attention module for further feature extraction; channel compression is first performed with a 1×1 convolution so that the number of channels of the processed feature map becomes a fixed fraction of the number of channels of the original feature map.
S402, the processed feature map is divided uniformly into four groups along the channel dimension, and the four groups are sent into four branches for processing, i.e., feature extraction with 3×3 convolutions with dilation rates of 1, 3, 5 and 7 respectively; the processing results are then concatenated along the channel dimension.
S403, a 1×1 convolution is applied to the channel-concatenated feature map, followed in turn by batch normalization (BN) and the nonlinear ReLU activation function.
S404, the feature map obtained in S403 is input into a CBAM module to further enhance its attention. The CBAM module consists mainly of two parts: channel attention and spatial attention. The channel attention module assigns globally meaningful importance across the channels of the feature map, reducing redundant computation; the spatial attention module attends to spatially local information to different degrees, retaining more effective local spatial features. The feature map passed through the CBAM module is further optimized into a more discriminative feature map. Finally, feature maps of four dimensions (i.e., number of channels × number of rows × number of columns) of 64×(H/4)×(W/4), 128×(H/8)×(W/8), 320×(H/16)×(W/16), 512×(H/32)×(W/32) are generated.
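A sketch of S401 to S404 follows; the channel-compression ratio (1/4 here), the CBAM internals and the final 1×1 expansion back to the original channel count are assumptions where the text leaves these open.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    """Minimal CBAM: channel attention followed by spatial attention."""
    def __init__(self, ch, r=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
                                 nn.Conv2d(ch // r, ch, 1))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) +
                           self.mlp(F.adaptive_max_pool2d(x, 1)))
        x = x * ca                                   # channel re-weighting
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)))
        return x * sa                                # spatial re-weighting

class MSDilatedAttention(nn.Module):
    """Multi-scale parallel dilated-convolution attention module (S401-S404)."""
    def __init__(self, in_ch, reduction=4):          # reduction ratio is assumed
        super().__init__()
        mid = in_ch // reduction
        assert mid % 4 == 0, "compressed channels must split into four groups"
        self.squeeze = nn.Conv2d(in_ch, mid, 1)      # S401: channel compression
        g = mid // 4
        self.branches = nn.ModuleList(               # S402: dilations 1, 3, 5, 7
            nn.Conv2d(g, g, 3, padding=d, dilation=d) for d in (1, 3, 5, 7))
        self.fuse = nn.Sequential(nn.Conv2d(mid, mid, 1),   # S403: 1x1 + BN + ReLU
                                  nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.cbam = CBAM(mid)                        # S404
        # restore the original channel count so the stage dims quoted in the
        # text (64/128/320/512) are preserved -- an assumption
        self.expand = nn.Conv2d(mid, in_ch, 1)

    def forward(self, x):
        chunks = torch.chunk(self.squeeze(x), 4, dim=1)  # uniform channel groups
        y = torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)
        return self.expand(self.cbam(self.fuse(y)))
```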
Step S5, using the global prediction map to guide, stage by stage, the gradual generation of a multi-stage prediction map. The operation is the same as in PraNet, as follows:
S501, the global prediction map is spatially downsampled so that its resolution matches that of the stage-four PVT feature map; it is then sent into the RA module for a reverse-attention operation to generate an attention map; this map is multiplied element by element with the stage-four PVT feature map, the feature dimension is reduced with three 3×3 convolutions, and the result is added pixel by pixel to the prediction map of the previous stage to generate the prediction map of the current stage.
S502, the prediction map of the current stage is sent to the next stage, and the same operation as in S501 is performed, guiding the generation of the final-stage feature map.
The global prediction map generated by the parallel Sobel edge decoder guides the four feature maps generated in S4 stage by stage, generating the prediction maps step by step; in terms of concrete sizes:
First, guided prediction is performed with the global prediction map from S3 and the 512×(H/32)×(W/32) feature map from S4: the global prediction map is reduced to the size of the S4 feature map by a spatial downsampling operation and fused with it to generate a prediction map of size 1×(H/32)×(W/32). This guided prediction combines global and local information so as to segment polyps of colorectal endoscopy images more accurately.
Then, the prediction map just generated guides the 320×(H/16)×(W/16) feature map from S4: it is first expanded to the size of the S4 feature map by a spatial upsampling operation and then fused with it to generate a prediction map of size 1×(H/16)×(W/16); the subsequent stages are guided in the same order, successively generating two further polyp segmentation prediction maps of sizes 1×(H/8)×(W/8) and 1×(H/4)×(W/4).
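One guidance stage (S501) could be sketched as follows; the 64-channel width of the three 3×3 convolutions is an assumption, and feat denotes the corresponding stage's feature map from step S4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RAStage(nn.Module):
    """One reverse-attention guidance stage (S501), PraNet-style."""
    def __init__(self, ch):
        super().__init__()
        self.reduce = nn.Sequential(          # three 3x3 convs (widths assumed)
            nn.Conv2d(ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, feat, prev_pred):
        # resample the previous stage's prediction to this stage's resolution
        p = F.interpolate(prev_pred, size=feat.shape[2:], mode='bilinear',
                          align_corners=False)
        att = 1.0 - torch.sigmoid(p)          # reverse-attention map
        return self.reduce(feat * att) + p    # refine, then add pixel by pixel
```

Chaining four such stages from the stage-four features down to the stage-one features yields the 1×(H/32)×(W/32), 1×(H/16)×(W/16), 1×(H/8)×(W/8) and 1×(H/4)×(W/4) prediction maps described above.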
Step S6, comparing the generated global prediction map and multi-stage prediction maps with the truth map and calculating the prediction loss; the final-stage prediction map is the final polyp segmentation prediction map. The specific operation is as follows:
S601, a bilinear-interpolation spatial upsampling operation is applied to the five prediction maps, resizing them to the size of the truth map corresponding to the input image, and the mixed loss L_seg composed of the weighted BCE and the weighted IOU is calculated.
The weighted BCE loss is calculated as:
$L_{wBCE} = -\dfrac{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y)\log P(x,y) + (1-G(x,y))\log(1-P(x,y))\,]}{\sum_{(x,y)} \omega(x,y)}$
where G denotes the truth map, P the prediction map, and (x, y) an arbitrary pixel position in the image.
The weighting coefficient ω(x, y) represents the importance of pixel (x, y) and is calculated as:
$\omega(x,y) = 1 + \gamma\,\lvert \mathrm{AvgPool}(G)(x,y) - G(x,y) \rvert$
In the concrete calculation, $\gamma$ is set to 5.
The weighted IOU loss is calculated as:
$L_{wIOU} = 1 - \dfrac{\sum_{(x,y)} \omega(x,y)\,G(x,y)\,P(x,y)}{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y) + P(x,y) - G(x,y)\,P(x,y)\,]}$
The mixed loss function of the final prediction map P relative to the truth map G is:
$L_{seg} = L_{wIOU} + L_{wBCE}$
where $L_{wIOU}$ and $L_{wBCE}$ are the weighted IOU loss and the weighted BCE loss, respectively.
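In code, this mixed loss corresponds to the PraNet-style "structure loss"; the sketch below assumes that reading, including the 31×31 averaging window inside ω(x, y), which the text does not state.

```python
import torch
import torch.nn.functional as F

def structure_loss(pred, mask, gamma=5):
    """Weighted BCE + weighted IOU, per the formulas above.

    pred: raw logits of one prediction map, shape (B, 1, H, W)
    mask: ground-truth map with values in {0, 1}, same shape
    """
    # omega(x, y) = 1 + gamma * |AvgPool(G) - G|; window size is an assumption
    weit = 1 + gamma * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    prob = torch.sigmoid(pred)
    inter = (prob * mask * weit).sum(dim=(2, 3))
    union = ((prob + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

# total training loss: sum of the structure losses of all five upsampled maps
```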
To demonstrate the effectiveness of the method, it is trained using the Kvasir-SEG and CVC-ClinicDB datasets as training sets, then tested on the Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS and CVC-T datasets, and the test results are compared with mainstream polyp segmentation algorithms from the prior art. In the specific experiments, the model is trained on the PyTorch 1.8 deep learning framework using one NVIDIA RTX 2080 Ti graphics card with 11 GB of video memory; the input image size is set to 352×352 with an initial learning rate of 1e-4, the AdamW optimizer is used, the batch size is set to 6, and the total number of training rounds is 100 epochs. The test results on the Kvasir-SEG and CVC-ClinicDB datasets are shown in Table 1:
TABLE 1: Test results of the application and 7 other methods for polyp segmentation on the Kvasir-SEG and CVC-ClinicDB datasets
The experimental results in CVC-ClinicDB, CVC-ColonDB, ETIS data sets are shown in Table 2:
TABLE 2: Test results of the application and 7 other methods for polyp segmentation on the CVC-ClinicDB, CVC-ColonDB and ETIS datasets
Of the 7 evaluation metrics, the first 2 are metrics commonly used in semantic segmentation tasks; for the first 6 metrics, values closer to 1 indicate better segmentation, while the 7th metric is non-negative and values closer to 0 are better.
Fig. 7 is a visual comparison of experimental results of the method of the present application and other methods; the results show that the polyp segmentation method of the application obtains segmentation results with more accurate boundaries and more complete semantic structure.
Example two
The application also provides a PVT-based multi-scale polyp segmentation system, which comprises: a preprocessing module, a first feature extraction module, a fusion module, a second feature extraction module, a guiding module and a prediction module;
the preprocessing module is used for acquiring a colorectal endoscopy image to be detected and preprocessing it;
the first feature extraction module is used for performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
the fusion module is used for fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
the second feature extraction module is used for performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated-convolution attention module;
the guiding module is used for using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, gradually generating a multi-stage prediction map;
the prediction module is used for comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map.
The above embodiments are merely illustrative of the preferred embodiments of the present application, and the scope of the present application is not limited thereto, but various modifications and improvements made by those skilled in the art to which the present application pertains are made without departing from the spirit of the present application, and all modifications and improvements fall within the scope of the present application as defined in the appended claims.

Claims (8)

1. A PVT-based multi-scale polyp segmentation method, characterized by comprising the following steps:
S1, acquiring a colorectal endoscopy image to be detected, and preprocessing the image;
S2, performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
S3, fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
S4, performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated-convolution attention module;
S5, using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, and gradually generating a multi-stage prediction map;
S6, comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map.
2. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S1, the method of preprocessing the colorectal endoscopy image comprises:
enhancing the colorectal endoscopy image data with random rotation, vertical flipping, horizontal flipping and normalization, finally uniformly cropping the image to 352×352, and scaling it with a {0.75, 1, 1.25} multi-scale strategy.
3. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S2, the method of performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with the PVTv2 backbone network comprises:
judging whether the preprocessed colorectal endoscopy image input into the PVTv2 backbone network is a 3-channel image; if so, it is fed directly into the network for feature extraction; if not, a 1×1 convolution is used once to adjust the number of image channels to 3;
four-stage feature extraction is performed using a pre-trained PVTv2-B2 model.
4. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S3, the method of fusing the original feature maps step by step with the parallel Sobel edge decoder to obtain the global prediction map comprises:
S31: in the first branch, compressing the feature map channels with a 1×1 convolution;
S32: in the second branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×3 and 3×1 asymmetric convolutions and a 3×3 convolution with dilation rate 3;
S33: in the third branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×5 and 5×1 asymmetric convolutions and a 3×3 convolution with dilation rate 5;
S34: in the fourth branch, first compressing the feature map channels with a 1×1 convolution, then performing feature extraction once each with 1×7 and 7×1 asymmetric convolutions and a 3×3 convolution with dilation rate 7;
S35: concatenating the compressed feature map of the first branch with the feature-extracted feature maps of the second, third and fourth branches along the channel dimension, then compressing the feature map channels with a 1×1 convolution;
S36: adding the compressed concatenated feature map pixel by pixel to the original feature map whose channels were compressed by a 1×1 convolution, passing the result through a ReLU nonlinear activation function, and then feeding it into the Sobel operation;
S37: adding the feature maps gradient-sharpened by the Sobel operator pixel by pixel, applying a 1×1 convolution, and using a bilinear-interpolation upsampling operation to generate the initial polyp segmentation global prediction map.
5. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S4, the method of performing multi-receptive-field feature extraction on the original feature maps with the multi-scale parallel dilated-convolution attention module comprises:
S41: channel-compressing the original feature maps of the four levels of the PVT encoder with one 1×1 convolution, obtaining multi-channel feature maps whose channel count is a fixed fraction of the original channel count;
S42: uniformly grouping the channels of the channel-compressed feature map and feeding the groups into four branches for processing, namely: performing feature extraction on the corresponding branches with 3×3 convolutions with dilation rates of 1, 3, 5 and 7, and then concatenating the processing results of the four branches;
S43: applying a 1×1 convolution to the channel-concatenated feature map, then sequentially applying batch normalization (BN) and the nonlinear ReLU activation function to obtain the processed feature map;
S44: feeding the processed feature map into a CBAM module for further attention weighting, obtaining a more discriminative feature map.
6. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S5, the method of using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction and gradually generate the multi-stage prediction map comprises:
S51: spatially downsampling the global prediction map so that its resolution matches that of the stage-four PVT feature map; feeding it into the RA module for a reverse-attention operation to generate an attention map; multiplying this map element by element with the stage-four PVT feature map, reducing the feature dimension with three 3×3 convolutions, and adding the result pixel by pixel to the prediction map of the previous stage to generate the prediction map of the current stage;
S52: sending the prediction map of the current stage to the next stage and performing the same operation as in S51, guiding the generation of the final-stage feature map.
7. The PVT-based multi-scale polyp segmentation method according to claim 1, wherein in the step S6, the method of comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map, comprises:
applying a bilinear-interpolation spatial upsampling operation to the global prediction map and the multi-stage prediction maps, resizing all prediction maps to the size of the truth map corresponding to the input image, and calculating the mixed loss of the weighted BCE and the weighted IOU;
the weighted BCE loss is defined as:
$L_{wBCE} = -\dfrac{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y)\log P(x,y) + (1-G(x,y))\log(1-P(x,y))\,]}{\sum_{(x,y)} \omega(x,y)}$
where G denotes the truth map, P the prediction map, and (x, y) any pixel position in the image; the corresponding weighting coefficient $\omega(x,y) = 1 + \gamma\,\lvert \mathrm{AvgPool}(G)(x,y) - G(x,y) \rvert$ represents the importance of pixel (x, y), with $\gamma$ set to 5;
the weighted IOU loss is defined as:
$L_{wIOU} = 1 - \dfrac{\sum_{(x,y)} \omega(x,y)\,G(x,y)\,P(x,y)}{\sum_{(x,y)} \omega(x,y)\,[\,G(x,y) + P(x,y) - G(x,y)\,P(x,y)\,]}$
combining the weighted BCE loss and the weighted IOU loss, the mixed loss of the prediction map relative to the truth map is:
$L_{seg} = L_{wBCE} + L_{wIOU}$.
8. A PVT-based multi-scale polyp segmentation system, characterized by comprising: a preprocessing module, a first feature extraction module, a fusion module, a second feature extraction module, a guiding module and a prediction module;
the preprocessing module is used for acquiring a colorectal endoscopy image to be detected and preprocessing it;
the first feature extraction module is used for performing multi-scale feature extraction on the preprocessed colorectal endoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
the fusion module is used for fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
the second feature extraction module is used for performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated-convolution attention module;
the guiding module is used for using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, gradually generating a multi-stage prediction map;
the prediction module is used for comparing the global prediction map and the multi-stage prediction maps with the truth map and calculating the loss, the resulting final-stage prediction map being the final polyp segmentation prediction map.
CN202311097260.0A 2023-08-29 2023-08-29 Multi-scale polyp segmentation method and system based on PVT Active CN117132774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311097260.0A CN117132774B (en) 2023-08-29 2023-08-29 Multi-scale polyp segmentation method and system based on PVT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311097260.0A CN117132774B (en) 2023-08-29 2023-08-29 Multi-scale polyp segmentation method and system based on PVT

Publications (2)

Publication Number Publication Date
CN117132774A true CN117132774A (en) 2023-11-28
CN117132774B CN117132774B (en) 2024-03-01

Family

ID=88859454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311097260.0A Active CN117132774B (en) 2023-08-29 2023-08-29 Multi-scale polyp segmentation method and system based on PVT

Country Status (1)

Country Link
CN (1) CN117132774B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117338556A (en) * 2023-12-06 2024-01-05 四川大学华西医院 Gastrointestinal endoscopy pressing system
CN117392157A (en) * 2023-12-13 2024-01-12 长春理工大学 Edge-aware protective cultivation straw coverage rate detection method
CN117853432A (en) * 2023-12-26 2024-04-09 北京长木谷医疗科技股份有限公司 Hybrid model-based osteoarthropathy identification method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220138548A1 (en) * 2022-01-18 2022-05-05 Intel Corporation Analog hardware implementation of activation functions
CN114820635A (en) * 2022-04-21 2022-07-29 重庆理工大学 Polyp segmentation method combining attention U-shaped network and multi-scale feature fusion
CN115331024A (en) * 2022-08-22 2022-11-11 浙江工业大学 Intestinal polyp detection method based on deep supervision and gradual learning
CN115601330A (en) * 2022-10-20 2023-01-13 湖北工业大学(Cn) Colonic polyp segmentation method based on multi-scale space reverse attention mechanism
CN115841495A (en) * 2022-12-19 2023-03-24 安徽大学 Polyp segmentation method and system based on double-boundary guiding attention exploration
CN115965596A (en) * 2022-12-26 2023-04-14 深圳英美达医疗技术有限公司 Blood vessel identification method and device, electronic equipment and readable storage medium
CN116630245A (en) * 2023-05-05 2023-08-22 浙江工业大学 Polyp segmentation method based on saliency map guidance and uncertainty semantic enhancement

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220138548A1 (en) * 2022-01-18 2022-05-05 Intel Corporation Analog hardware implementation of activation functions
CN114820635A (en) * 2022-04-21 2022-07-29 重庆理工大学 Polyp segmentation method combining attention U-shaped network and multi-scale feature fusion
CN115331024A (en) * 2022-08-22 2022-11-11 浙江工业大学 Intestinal polyp detection method based on deep supervision and gradual learning
CN115601330A (en) * 2022-10-20 2023-01-13 湖北工业大学(Cn) Colonic polyp segmentation method based on multi-scale space reverse attention mechanism
CN115841495A (en) * 2022-12-19 2023-03-24 安徽大学 Polyp segmentation method and system based on double-boundary guiding attention exploration
CN115965596A (en) * 2022-12-26 2023-04-14 深圳英美达医疗技术有限公司 Blood vessel identification method and device, electronic equipment and readable storage medium
CN116630245A (en) * 2023-05-05 2023-08-22 浙江工业大学 Polyp segmentation method based on saliency map guidance and uncertainty semantic enhancement

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
VITTORINO MANDUJANO-CORNEJO et al.: "Polyp2Seg: Improved Polyp Segmentation with Vision Transformer", MIUA 2022: Medical Image Understanding and Analysis, vol. 3413, pages 519-534 *
ZHAOJIAN YAO et al.: "Object localization and edge refinement network for salient object detection", Expert Systems with Applications, vol. 213, pages 1-18 *
FU Huanyu: "Research on pulmonary nodule detection in CT images based on convolutional neural networks", China Masters' Theses Full-text Database (Medicine & Health Sciences), no. 06, pages 072-113 *
LEI Ying: "Research on medical aided diagnosis methods based on retinal images", China Masters' Theses Full-text Database (Medicine & Health Sciences), no. 11, pages 073-8 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117338556A (en) * 2023-12-06 2024-01-05 四川大学华西医院 Gastrointestinal endoscopy pressing system
CN117338556B (en) * 2023-12-06 2024-03-29 四川大学华西医院 Gastrointestinal endoscopy pressing system
CN117392157A (en) * 2023-12-13 2024-01-12 长春理工大学 Edge-aware protective cultivation straw coverage rate detection method
CN117392157B (en) * 2023-12-13 2024-03-19 长春理工大学 Edge-aware protective cultivation straw coverage rate detection method
CN117853432A (en) * 2023-12-26 2024-04-09 北京长木谷医疗科技股份有限公司 Hybrid model-based osteoarthropathy identification method and device

Also Published As

Publication number Publication date
CN117132774B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
CN117132774B (en) Multi-scale polyp segmentation method and system based on PVT
CN111784671B (en) Pathological image focus region detection method based on multi-scale deep learning
CN108268870B Multi-scale feature fusion ultrasonic image semantic segmentation method based on adversarial learning
CN109191476B (en) Novel biomedical image automatic segmentation method based on U-net network structure
CN113674253A Rectal cancer CT image automatic segmentation method based on U-Transformer
CN115661144B (en) Adaptive medical image segmentation method based on deformable U-Net
CN112712528B (en) Intestinal tract focus segmentation method combining multi-scale U-shaped residual error encoder and integral reverse attention mechanism
CN115409733A (en) Low-dose CT image noise reduction method based on image enhancement and diffusion model
CN111951288A (en) Skin cancer lesion segmentation method based on deep learning
CN115170582A (en) Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism
CN113610859B (en) Automatic thyroid nodule segmentation method based on ultrasonic image
CN110991254B (en) Ultrasonic image video classification prediction method and system
CN115471470A (en) Esophageal cancer CT image segmentation method
CN116579982A (en) Pneumonia CT image segmentation method, device and equipment
CN111916206A (en) CT image auxiliary diagnosis system based on cascade connection
CN115272170A (en) Prostate MRI (magnetic resonance imaging) image segmentation method and system based on self-adaptive multi-scale transform optimization
CN116757986A (en) Infrared and visible light image fusion method and device
CN115100165A (en) Colorectal cancer T staging method and system based on tumor region CT image
CN117522891A (en) 3D medical image segmentation system and method
CN117726814A (en) Retinal vessel segmentation method based on cross attention and double branch pooling fusion
CN116958736A (en) RGB-D significance target detection method based on cross-modal edge guidance
CN116188396A (en) Image segmentation method, device, equipment and medium
CN113239978B (en) Method and device for correlation of medical image preprocessing model and analysis model
CN118229712A (en) Liver tumor image segmentation system based on enhanced multidimensional feature perception
Zhang Application of Deep Learning in Super-resolution Processing of Face Images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant