CN116485811A - Stomach pathological section gland segmentation method based on Swin-Unet model - Google Patents


Info

Publication number
CN116485811A
CN116485811A (application CN202310314667.8A)
Authority
CN
China
Prior art keywords
image
module
json
file
npz
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310314667.8A
Other languages
Chinese (zh)
Inventor
潘景山
周贺虎
李娜
葛菁
王迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology, Shandong Computer Science Center National Super Computing Center in Jinan filed Critical Qilu University of Technology
Priority to CN202310314667.8A priority Critical patent/CN116485811A/en
Publication of CN116485811A publication Critical patent/CN116485811A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30092Stomach; Gastric
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)

Abstract

A gastric pathological section gland segmentation method based on the Swin-Unet model, belonging to the technical field of image processing. Segmentation accuracy is improved through a novel connection module between the encoder and the decoder. Dilated (atrous) convolution is applied in the connection module, and the weights of the different dilation rates are adjusted by an attention mechanism, so that the receptive field of the model is controlled more finely. By designing a post-processing module, the model produces clearer boundaries in its prediction results.

Description

Stomach pathological section gland segmentation method based on Swin-Unet model
Technical Field
The invention relates to the technical field of image processing, in particular to a stomach pathological section gland segmentation method based on a Swin-Unet model.
Background
Gastric pathological section segmentation is a medical image processing technique that aims to divide gastric pathological section images into different regions so that doctors can diagnose and treat diseases more effectively. Traditional segmentation methods require manual selection of features and parameters, are inefficient, and are susceptible to subjective factors. In recent years, the development of deep learning has made image segmentation based on convolutional neural networks a research hotspot. The Swin-Unet network is a widely used image segmentation network with an encoder-decoder structure similar to an autoencoder, capable of feature extraction and pixel-level classification. For gland segmentation of gastric pathological sections, the Swin-Unet network can effectively extract gland features and segment gland regions for specialists to review when analyzing a patient's condition or proposing a treatment plan. However, it uses only skip connections for feature fusion between the encoder and decoder and pays little attention to the sharpness of boundaries, whereas the clarity of segmentation boundaries is often exactly what doctors rely on.
Disclosure of Invention
In order to overcome the shortcomings of the above technology, the invention provides a method that generates predicted images whose boundaries are clearer than those of the original prediction results.
The technical solution adopted to overcome the above technical problem is as follows:
a stomach pathological section gland segmentation method based on a Swin-Unet model comprises the following steps:
a) Obtaining gastric pathology images and the json annotation files corresponding to the gastric pathology images;
b) Preprocessing the gastric pathology images to obtain npz files;
c) Dividing all npz files into a training set and a test set;
d) Establishing an image segmentation network, inputting the training set into the image segmentation network, and outputting a gland segmentation result map of the gastric pathological section;
e) Training the image segmentation network to obtain an optimized image segmentation network;
f) Inputting the test set into the optimized image segmentation network, and outputting the final gland segmentation result image output_final of the gastric pathological section.
Further, in step a), the gastric pathology images are obtained from the stomach-section category of a pathology digital slide cloud annotation platform.
Further, step b) comprises the steps of:
b-1) replacing the dataset module in the Swin-Unet model with a mydataset module, wherein the mydataset module comprises an image.py file, a json.py file and an npz.py file written in Python; the image.py file comprises an image segmentation function, an image screening function, an annotation information visualization function and an image format conversion function; the json.py file comprises an annotation information cleaning function, an annotation segmentation function, an annotation screening function and an annotation format conversion function; and the npz.py file comprises an npz generation function and a file name generation function;
b-2) inputting the acquired json annotation files into the annotation information cleaning function in the json.py file, renaming the files and removing "3dh" from the file names, so as to obtain json annotation files from which unnecessary annotation information has been screened out and whose format has been unified;
b-3) inputting the acquired gastric pathology images into the image segmentation function in the image.py file, cutting each gastric pathology image into 512 × 512 crops with the openslide module in Python, keeping the stride equal to the crop size, and saving the cropped images in the image folder of the current project directory; b-4) inputting the cleaned and formatted json annotation files into the annotation segmentation function in the json.py file, cutting the json annotation files into 512 × 512 json annotation files with the stride kept equal to the crop size, and saving the cut json annotation files in the annotation folder of the current project directory;
b-5) traversing all images in the image folder with the image screening function in the image.py file, screening out the corresponding images according to the pixel coordinate information in the json annotation file of each image by calling the numpy, json and os modules in Python, and saving them in the select_image folder of the current project directory; traversing all json annotation files in the annotation folder with the annotation screening function in the json.py file, screening out the json annotation files in which the length of a shape is smaller than 4 by calling the numpy, json and os modules in Python, and saving each json annotation file after screening in the select_json folder of the current project directory; b-6) in the annotation information visualization function of the image.py file, drawing the annotation information on the screened images according to the points in each annotation by calling the PIL, os, numpy and json modules in Python, and saving the results in the visual_json folder of the current project directory; b-7) in the image format conversion function of the image.py file, traversing all images in the select_image folder by calling the PIL and os modules in Python, converting them to png format, and saving them in the tif_png folder of the current project directory;
b-8) in the annotation format conversion function of the json.py file, traversing all json annotation files in the select_json folder by calling the json and cv2 modules in Python, converting each json annotation file into a png image whose background pixels have the RGB value (0, 0, 0) and whose remaining pixels have the RGB value (0, 0, 1), and saving the json annotation files converted to png format in the json_png folder of the current project directory;
b-9) passing the path of the tif_png folder and the path of the json_png folder to the npz generation function of the npz.py file, wherein the npz generation function pairs each image with its corresponding json annotation file by calling the numpy, cv2 and os modules in Python and combines them into npz files used for model training; the first array of each npz file is named image and the second array is named label, and all npz files are saved in the npz folder of the current project directory.
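As a small illustration of step b-9), the following Python sketch pairs each cropped image with its label mask and saves both arrays, named image and label, in one .npz file. The make_npz helper name and the assumption that image and mask crops share the same file names are illustrative, not part of the patented method.

```python
import os
import cv2
import numpy as np

def make_npz(tif_png_dir, json_png_dir, npz_dir):
    """Pair each cropped image with its label mask and save both arrays in one .npz file."""
    os.makedirs(npz_dir, exist_ok=True)
    for name in sorted(os.listdir(tif_png_dir)):
        img_path = os.path.join(tif_png_dir, name)
        lbl_path = os.path.join(json_png_dir, name)
        if not os.path.exists(lbl_path):
            continue  # skip crops without a matching annotation
        image = cv2.imread(img_path)                       # H x W x 3 crop
        label = cv2.imread(lbl_path).max(axis=2)           # collapse the RGB mask: 0 = background, 1 = gland
        out = os.path.join(npz_dir, os.path.splitext(name)[0] + ".npz")
        np.savez(out, image=image, label=label)            # arrays named "image" and "label"

# make_npz("tif_png", "json_png", "npz")
```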
Further, step c) comprises the steps of:
c-1) dividing all npz files into a training set and a test set at a ratio of 9:1;
c-2) saving the training set data in the train_npz folder under the current project directory, and saving the test set data in the test_npz folder under the current project directory;
c-3) passing the path of the train_npz folder and the path of the test_npz folder to the file name generation function of the npz.py file, wherein the file name generation function, by calling the os module in Python, extracts the file names of all files in the train_npz folder and saves them in the train.txt file of the current project directory, and extracts the file names of all files in the test_npz folder and saves them in the test.txt file of the current project directory.
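A brief sketch of steps c-1) to c-3), assuming all .npz crops sit in a single npz folder; the split_dataset helper name and the fixed random seed are illustrative assumptions.

```python
import os
import random
import shutil

def split_dataset(npz_dir="npz", train_dir="train_npz", test_dir="test_npz", ratio=0.9, seed=0):
    """Shuffle the npz files, copy 90% to train_npz and 10% to test_npz, and write the name lists."""
    files = sorted(os.listdir(npz_dir))
    random.Random(seed).shuffle(files)
    cut = int(len(files) * ratio)
    for folder, subset in ((train_dir, files[:cut]), (test_dir, files[cut:])):
        os.makedirs(folder, exist_ok=True)
        for name in subset:
            shutil.copy(os.path.join(npz_dir, name), os.path.join(folder, name))
    # write train.txt / test.txt with the file names (without extension), one per line
    for txt, folder in (("train.txt", train_dir), ("test.txt", test_dir)):
        with open(txt, "w") as f:
            for name in sorted(os.listdir(folder)):
                f.write(os.path.splitext(name)[0] + "\n")

# split_dataset()
```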
Further, step d) comprises the steps of:
d-1) the image segmentation network consists of an image downsampling unit, a bottleneck unit, an image upsampling unit, an attention connection unit and a post-processing unit;
d-2) the image downsampling unit consists of a first downsampling module, a second downsampling module and a third downsampling module in sequence, the first downsampling module consists of a Patch Partition layer and a Linear Embedding layer in sequence, the second downsampling module consists of a Patch Merging layer, and the third downsampling module consists of a Patch Merging layer; the first array (the image array) of an npz file of the training set is input into the first downsampling module and reduced to H/4 × W/4 × C to form the image downsample, where H is the image height, W is the image width and C is the embedding dimension; the image downsample is input into a swin-transformer network, which outputs the image downsample1;
d-3) the attention connection unit consists of a dilated convolution module and a multi-head attention mechanism in sequence; the image downsample is input into the attention connection unit, which outputs the image downsample_1;
d-4) the image downsample1 is input into the second downsampling module and reduced to H/8 × W/8 × 2C to form the image downsample₁; the image downsample₁ is input into a swin-transformer network, which outputs the image downsample2;
d-5) the image downsample₁ is input into the attention connection unit, which outputs the image downsample_2;
d-6) the image downsample2 is input into the third downsampling module and reduced to H/16 × W/16 × 4C to form the image downsample₂; the image downsample₂ is input into a swin-transformer network, which outputs the image downsample3;
d-7) the image downsample₂ is input into the attention connection unit, which outputs the image downsample_3;
d-8) the bottleneck unit consists of two identical swin-transformer networks in sequence; the image downsample3 is input into a Patch Merging layer and reduced to H/32 × W/32 × 8C to form the image downsample₃; the image downsample₃ is input into the bottleneck unit, which outputs the image downsample4;
d-9) the image downsample4 is input into a Patch Expanding layer and expanded to H/16 × W/16 × 4C to form the image upsample₁; the image upsample₁ is added to the image downsample_3, the sum is input into a swin-transformer network, and the image upsample3 is output;
d-10) the image upsampling unit consists of a third upsampling module, a second upsampling module and a first upsampling module in sequence, each consisting of a Patch Expanding layer; the image upsample3 is input into the third upsampling module and expanded to H/8 × W/8 × 2C to form the image upsample₂; the image upsample₂ is added to the image downsample_2, the sum is input into a swin-transformer network, and the image upsample2 is output;
d-11) the image upsample2 is input into the second upsampling module and expanded to H/4 × W/4 × C to form the image upsample₃; the image upsample₃ is added to the image downsample_1, the sum is input into a swin-transformer network, and the image upsample1 is output;
d-12) the image upsample1 is input into the first upsampling module and expanded to H × W × C to form the image upsample₄; the image upsample₄ is input into a Linear Projection layer, which changes the image upsample₄ to H × W × CLASS to form the image output, where CLASS is the number of segmentation classes;
d-13) the post-processing unit consists of an image reading module and a conditional random field (CRF) module, the image reading module is built from the cv2 and numpy modules in Python, and the CRF module is built from the pydensecrf module in Python; the image reading module of the post-processing unit reads the image output, and the CRF module then performs inference on it to obtain the gland segmentation result map of the gastric pathological section.
Preferably, C in step d-2) has a value of 96.
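As a concrete illustration of the post-processing in step d-13), the following sketch shows how the pydensecrf package is typically used to refine a two-class prediction. The pairwise parameters (sxy, srgb, compat) and the number of inference iterations are assumed example values, not values specified by the patent.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(rgb_image, prob_map, iters=5):
    """Refine a 2-class probability map with a dense CRF so gland boundaries follow image edges."""
    h, w = prob_map.shape[-2:]
    d = dcrf.DenseCRF2D(w, h, 2)                        # width, height, number of classes
    d.setUnaryEnergy(unary_from_softmax(prob_map))       # prob_map: (2, H, W), channels sum to 1
    d.addPairwiseGaussian(sxy=3, compat=3)               # smoothness term
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(rgb_image), compat=10)  # appearance term
    q = d.inference(iters)
    return np.argmax(np.array(q), axis=0).reshape(h, w).astype(np.uint8)

# refined = crf_refine(rgb_crop, prob_map)  # prob_map produced by the segmentation network
```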
Further, step e) comprises the steps of:
e-1) calculating the total loss by the formula loss = a × CrossEntropyLoss + b × DiceLoss, wherein a and b are weights, a + b = 1, and CrossEntropyLoss and DiceLoss are the loss functions in the Swin-Unet model;
e-2) training the image segmentation network with the total loss using an SGD optimizer to obtain the optimized image segmentation network.
Preferably, a has a value of 0.4 and b has a value of 0.6.
Further, in step e-2), the momentum of the SGD optimizer is set to 0.9, the weight decay weight_decay is set to 0.0001, and the learning rate lr is set to the base learning rate 0.01; when the image segmentation network is trained, the DataLoader iterator in Python is used to iterate over the data, with its parameters set to batch_size = 12, shuffle = True, num_workers = 0 and pin_memory = True; 180 epochs are trained, and starting from the halfway point of training one model weight result is saved every ten epochs; when training ends, the model parameters are saved to obtain the optimized image segmentation network.
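A hedged PyTorch sketch of the training procedure in e-1) and e-2): the NpzDataset class, the Dice implementation and the stand-in model are illustrative assumptions (the patent's own dataset module and Swin-Unet network are described elsewhere in the text), while the hyperparameters (a = 0.4, b = 0.6, lr 0.01, momentum 0.9, weight decay 0.0001, batch size 12, 180 epochs) are the ones quoted above.

```python
import os
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

class NpzDataset(Dataset):
    """Reads the image/label arrays produced in step b-9) using the name lists from step c-3)."""
    def __init__(self, npz_dir, list_file):
        with open(list_file) as f:
            self.names = [line.strip() for line in f if line.strip()]
        self.npz_dir = npz_dir
    def __len__(self):
        return len(self.names)
    def __getitem__(self, i):
        data = np.load(os.path.join(self.npz_dir, self.names[i] + ".npz"))
        image = torch.from_numpy(data["image"]).permute(2, 0, 1).float() / 255.0
        label = torch.from_numpy(data["label"]).long()
        return {"image": image, "label": label}

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss on the gland channel."""
    prob = torch.softmax(logits, dim=1)[:, 1]            # (N, H, W) gland probability
    target = (target > 0).float()
    inter = (prob * target).sum(dim=(1, 2))
    union = prob.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Conv2d(3, 2, kernel_size=1).to(device)        # stand-in for the Swin-Unet network of step d)
a, b = 0.4, 0.6
ce_loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0001)
loader = DataLoader(NpzDataset("train_npz", "train.txt"),
                    batch_size=12, shuffle=True, num_workers=0, pin_memory=True)

for epoch in range(180):
    for batch in loader:
        image, label = batch["image"].to(device), batch["label"].to(device)
        logits = model(image)
        loss = a * ce_loss(logits, label) + b * dice_loss(logits, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if epoch + 1 >= 90 and (epoch + 1) % 10 == 0:        # checkpoint every ten epochs from the halfway point
        torch.save(model.state_dict(), f"epoch_{epoch + 1}.pth")
```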
The beneficial effects of the invention are as follows: the accuracy of model segmentation is improved; dilated convolution is applied in the connection module, and the weights of the different dilation rates are adjusted by the attention mechanism, so that the receptive field of the model is controlled more finely; and by designing the post-processing module, the model produces clearer boundaries in its prediction results.
Detailed Description
The present invention will be further described below.
A stomach pathological section gland segmentation method based on a Swin-Unet model comprises the following steps:
a) Obtaining gastric pathology images and the corresponding json annotation files.
b) Preprocessing the gastric pathology images to obtain npz files.
c) Dividing all npz files into a training set and a test set.
d) Establishing an image segmentation network, inputting the training set into the image segmentation network, and outputting a gland segmentation result map of the gastric pathological section.
e) Training the image segmentation network to obtain an optimized image segmentation network.
f) Inputting the test set into the optimized image segmentation network, and outputting the final gland segmentation result image output_final of the gastric pathological section.
The skip connections between the encoder and the decoder in the original Swin-Unet model are replaced with a new connection module (the attention connection module), which noticeably improves the segmentation accuracy of the model. Dilated convolution is used in the attention connection module to enlarge the model's receptive field. The attention mechanism helps the model focus on the more important features and thereby improves accuracy; when dilated convolution is used, the weights of the different dilation rates are adjusted by the attention mechanism, so that the receptive field of the model is controlled more finely. The attention connection module helps the decoder fuse shallow features better as the model upsamples and gradually restores the feature-map size, preserving as much information from the original feature maps as possible. A post-processing module for the prediction results is also designed, which further refines the model's predictions so that their boundaries are clearer, greatly improving the readability of the prediction results. A sketch of one possible attention connection module is given below.
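The patent text does not give code for the attention connection module; the following PyTorch sketch is only one plausible realization consistent with the description above (parallel dilated convolutions whose per-rate weights come from a multi-head attention step). The dilation rates (1, 2, 4), the global pooling of each branch, the head count and the residual addition are assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class AttentionConnection(nn.Module):
    """Parallel dilated convolutions whose branch weights are produced by multi-head attention (illustrative)."""
    def __init__(self, channels, dilations=(1, 2, 4), num_heads=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):                                     # x: (N, C, H, W) encoder feature map
        feats = [branch(x) for branch in self.branches]        # one feature map per dilation rate
        tokens = torch.stack([f.mean(dim=(2, 3)) for f in feats], dim=1)   # (N, B, C) branch descriptors
        attended, _ = self.attn(tokens, tokens, tokens)        # let the branches attend to each other
        weights = torch.softmax(attended.mean(dim=2), dim=1)   # (N, B): one weight per dilation rate
        out = sum(w.view(-1, 1, 1, 1) * f for w, f in zip(weights.unbind(dim=1), feats))
        return x + out                                         # residual: keep the original skip features

# skip = AttentionConnection(96)(encoder_feature)  # would replace a plain skip connection
```

Weighting whole branches rather than individual pixels keeps the module cheap while still letting attention decide how much each receptive-field size contributes at a given skip connection.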
In one embodiment of the invention, in step a) the gastric pathology images are obtained from the stomach-section category of a pathology digital slide cloud annotation platform. Since pathological section images have extremely high resolution (around 80000 × 70000 pixels) and their format cannot be fed into the model directly for training, the images require further cropping and format conversion.
In one embodiment of the invention, step b) comprises the steps of:
b-1) The dataset module in the Swin-Unet model is replaced with a mydataset module, wherein the mydataset module comprises an image.py file, a json.py file and an npz.py file written in Python; the image.py file comprises an image segmentation function, an image screening function, an annotation information visualization function and an image format conversion function; the json.py file comprises an annotation information cleaning function, an annotation segmentation function, an annotation screening function and an annotation format conversion function; and the npz.py file comprises an npz generation function and a file name generation function.
b-2) The obtained json annotation files are input into the annotation information cleaning function in the json.py file; the files are renamed and "3dh" is removed from the file names, and the json annotation files are cleaned to obtain json annotation files from which unnecessary annotation information has been screened out and whose format has been unified.
b-3) The acquired gastric pathology images are input into the image segmentation function in the image.py file, each gastric pathology image is cut into 512 × 512 crops with the openslide module in Python, the stride is kept equal to the crop size, and the cropped images are saved in the image folder of the current project directory (a cropping sketch is given below). b-4) The cleaned and formatted json annotation files are input into the annotation segmentation function in the json.py file, cut into 512 × 512 json annotation files with the stride kept equal to the crop size, and the cut json annotation files are saved in the annotation folder of the current project directory.
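Before continuing with step b-5), the whole-slide cropping of step b-3) can be sketched with the openslide package as follows; reading at level 0, the crop naming scheme and saving the crops directly as png are assumptions made only for illustration.

```python
import os
import openslide

def tile_slide(slide_path, out_dir="image", size=512):
    """Cut a whole-slide image into non-overlapping size x size crops (stride equal to the crop size)."""
    os.makedirs(out_dir, exist_ok=True)
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.dimensions                 # level-0 resolution, e.g. around 80000 x 70000
    base = os.path.splitext(os.path.basename(slide_path))[0]
    for y in range(0, height - size + 1, size):
        for x in range(0, width - size + 1, size):
            crop = slide.read_region((x, y), 0, (size, size)).convert("RGB")
            crop.save(os.path.join(out_dir, f"{base}_{x}_{y}.png"))  # png here; the method converts formats later
    slide.close()

# tile_slide("slide.tif")
```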
b-5) Because the cropped images and annotations still contain some blank regions (containing no information), the data in the image folder and the annotation folder need to be screened further. All images in the image folder are therefore traversed with the image screening function in the image.py file; the numpy, json and os modules in Python are called to screen out the corresponding images according to the pixel coordinate information in each image's json annotation file, and these are saved in the select_image folder of the current project directory. All json annotation files in the annotation folder are traversed with the annotation screening function in the json.py file; the numpy, json and os modules in Python are called to screen out the json annotation files in which the length of a shape is smaller than 4, and each json annotation file after screening is saved in the select_json folder of the current project directory.
b-6) In the annotation information visualization function of the image.py file, the PIL, os, numpy and json modules in Python are called to draw the annotation information on the screened images according to the points in each annotation, and the results are saved in the visual_json folder of the current project directory, which makes it convenient to inspect and evaluate the segmentation effect of the model.
b-7) In the image format conversion function of the image.py file, all images in the select_image folder are traversed by calling the PIL and os modules in Python, converted to png format, and saved in the tif_png folder of the current project directory.
b-8) In the annotation format conversion function of the json.py file, all json annotation files in the select_json folder are traversed by calling the json and cv2 modules in Python and converted into png images whose background pixels have the RGB value (0, 0, 0) and whose remaining pixels have the RGB value (0, 0, 1); the json annotation files converted to png format are saved in the json_png folder of the current project directory, as sketched below.
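The annotation-to-mask conversion of step b-8) can be sketched as follows, assuming labelme-style json files whose shapes entries carry polygon points; the exact json schema is an assumption for illustration.

```python
import json
import cv2
import numpy as np

def json_to_mask(json_path, out_path, size=512):
    """Rasterize the annotated gland polygons into a png label image."""
    mask = np.zeros((size, size, 3), dtype=np.uint8)           # background pixels stay (0, 0, 0)
    with open(json_path) as f:
        annotation = json.load(f)
    for shape in annotation.get("shapes", []):
        points = np.array(shape["points"], dtype=np.int32)
        cv2.fillPoly(mask, [points], (0, 0, 1))                # gland pixels set to (0, 0, 1)
    cv2.imwrite(out_path, mask)

# json_to_mask("select_json/crop_0_0.json", "json_png/crop_0_0.png")
```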
b-9) The path of the tif_png folder and the path of the json_png folder are passed to the npz generation function of the npz.py file; the npz generation function pairs each image with its corresponding json annotation file by calling the numpy, cv2 and os modules in Python and combines them into npz files used for model training; the first array of each npz file is named image and the second array is named label, and all npz files are saved in the npz folder of the current project directory.
In one embodiment of the invention, step c) comprises the steps of:
c-1) All npz files are divided into a training set and a test set at a ratio of 9:1.
c-2) The training set data are saved in the train_npz folder under the current project directory, and the test set data are saved in the test_npz folder under the current project directory.
c-3) The path of the train_npz folder and the path of the test_npz folder are passed to the file name generation function of the npz.py file; by calling the os module in Python, the file name generation function extracts the file names of all files in the train_npz folder and saves them in the train.txt file of the current project directory, and extracts the file names of all files in the test_npz folder and saves them in the test.txt file of the current project directory.
In one embodiment of the invention, step d) comprises the steps of:
d-1) The image segmentation network consists of an image downsampling unit, a bottleneck unit, an image upsampling unit, an attention connection unit and a post-processing unit.
d-2) The image downsampling unit consists of a first downsampling module, a second downsampling module and a third downsampling module in sequence; the first downsampling module consists of a Patch Partition layer and a Linear Embedding layer in sequence, the second downsampling module consists of a Patch Merging layer, and the third downsampling module consists of a Patch Merging layer. The first array (the image array) of an npz file of the training set is input into the first downsampling module and reduced to H/4 × W/4 × C to form the image downsample, where H is the image height, W is the image width and C is the embedding dimension; the image downsample is input into a swin-transformer network, which outputs the image downsample1. The swin-transformer network is composed of a LayerNorm layer, a multi-head attention module, a shortcut (residual) connection and an MLP with ReLU nonlinearity in sequence; its structure is prior art and is not repeated here. Further preferably, C has a value of 96.
d-3) The attention connection unit consists of a dilated convolution module and a multi-head attention mechanism in sequence; the image downsample is input into the attention connection unit, which outputs the image downsample_1.
d-4) The image downsample1 is input into the second downsampling module and reduced to H/8 × W/8 × 2C to form the image downsample₁; the image downsample₁ is input into a swin-transformer network, which outputs the image downsample2.
d-5) The image downsample₁ is input into the attention connection unit, which outputs the image downsample_2.
d-6) The image downsample2 is input into the third downsampling module and reduced to H/16 × W/16 × 4C to form the image downsample₂; the image downsample₂ is input into a swin-transformer network, which outputs the image downsample3.
d-7) The image downsample₂ is input into the attention connection unit, which outputs the image downsample_3.
d-8) The bottleneck unit consists of two identical swin-transformer networks in sequence; the image downsample3 is input into a Patch Merging layer and reduced to H/32 × W/32 × 8C to form the image downsample₃; the image downsample₃ is input into the bottleneck unit, which outputs the image downsample4.
d-9) The image downsample4 is input into a Patch Expanding layer and expanded to H/16 × W/16 × 4C to form the image upsample₁; the image upsample₁ is added to the image downsample_3, the sum is input into a swin-transformer network, and the image upsample3 is output.
d-10) The image upsampling unit consists of a third upsampling module, a second upsampling module and a first upsampling module in sequence, each consisting of a Patch Expanding layer; the image upsample3 is input into the third upsampling module and expanded to H/8 × W/8 × 2C to form the image upsample₂; the image upsample₂ is added to the image downsample_2, the sum is input into a swin-transformer network, and the image upsample2 is output.
d-11) The image upsample2 is input into the second upsampling module and expanded to H/4 × W/4 × C to form the image upsample₃; the image upsample₃ is added to the image downsample_1, the sum is input into a swin-transformer network, and the image upsample1 is output.
d-12) The image upsample1 is input into the first upsampling module and expanded to H × W × C to form the image upsample₄; the image upsample₄ is input into a Linear Projection layer, which changes the image upsample₄ to H × W × CLASS to form the image output, where CLASS is the number of segmentation classes.
d-13) The post-processing unit consists of an image reading module and a conditional random field (CRF) module; the image reading module is built from the cv2 and numpy modules in Python, and the CRF module is built from the pydensecrf module in Python. The image reading module of the post-processing unit reads the image output, and the CRF module then performs inference on it to obtain the gland segmentation result map of the gastric pathological section.
In one embodiment of the invention, step e) comprises the steps of:
e-1) The total loss is calculated by the formula loss = a × CrossEntropyLoss + b × DiceLoss, where a and b are weights, a + b = 1, and CrossEntropyLoss and DiceLoss are the loss functions in the Swin-Unet model. Further preferably, a has a value of 0.4 and b has a value of 0.6.
e-2) The image segmentation network is trained with the total loss using an SGD optimizer to obtain the optimized image segmentation network. The momentum of the SGD optimizer is set to 0.9, the weight decay weight_decay is set to 0.0001, and the learning rate lr is set to the base learning rate 0.01. When the image segmentation network is trained, the DataLoader iterator in Python is used to iterate over the data, with its parameters set to batch_size = 12, shuffle = True, num_workers = 0 and pin_memory = True. 180 epochs are trained, and starting from the halfway point of training one model weight result is saved every ten epochs; when training ends, the model parameters are saved to obtain the optimized image segmentation network. Finally, it should be noted that the foregoing description is only a preferred embodiment of the present invention and the present invention is not limited thereto: although the present invention has been described in detail with reference to the foregoing embodiment, those skilled in the art may still modify the technical solution described in the foregoing embodiment or replace some of its technical features with equivalents. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (9)

1. A stomach pathological section gland segmentation method based on the Swin-Unet model, characterized by comprising the following steps:
a) obtaining gastric pathology images and the json annotation files corresponding to the gastric pathology images;
b) preprocessing the gastric pathology images to obtain npz files;
c) dividing all npz files into a training set and a test set;
d) establishing an image segmentation network, inputting the training set into the image segmentation network, and outputting a gland segmentation result map of the gastric pathological section;
e) training the image segmentation network to obtain an optimized image segmentation network;
f) inputting the test set into the optimized image segmentation network, and outputting the final gland segmentation result image output_final of the gastric pathological section.
2. The stomach pathological section gland segmentation method based on the Swin-Unet model according to claim 1, characterized in that: in step a), the gastric pathology images are obtained from the stomach-section category of a pathology digital slide cloud annotation platform.
3. The stomach pathological section gland segmentation method based on the Swin-Unet model according to claim 1, characterized in that step b) comprises the following steps:
b-1) replacing the dataset module in the Swin-Unet model with a mydataset module, wherein the mydataset module comprises an image.py file, a json.py file and an npz.py file written in Python; the image.py file comprises an image segmentation function, an image screening function, an annotation information visualization function and an image format conversion function; the json.py file comprises an annotation information cleaning function, an annotation segmentation function, an annotation screening function and an annotation format conversion function; and the npz.py file comprises an npz generation function and a file name generation function;
b-2) inputting the acquired json annotation files into the annotation information cleaning function in the json.py file, renaming the files and removing "3dh" from the file names, so as to obtain json annotation files from which unnecessary annotation information has been screened out and whose format has been unified;
b-3) inputting the acquired gastric pathology images into the image segmentation function in the image.py file, cutting each gastric pathology image into 512 × 512 crops with the openslide module in Python, keeping the stride equal to the crop size, and saving the cropped images in the image folder of the current project directory; b-4) inputting the cleaned and formatted json annotation files into the annotation segmentation function in the json.py file, cutting the json annotation files into 512 × 512 json annotation files with the stride kept equal to the crop size, and saving the cut json annotation files in the annotation folder of the current project directory;
b-5) traversing all images in the image folder with the image screening function in the image.py file, screening out the corresponding images according to the pixel coordinate information in the json annotation file of each image by calling the numpy, json and os modules in Python, and saving them in the select_image folder of the current project directory; traversing all json annotation files in the annotation folder with the annotation screening function in the json.py file, screening out the json annotation files in which the length of a shape is smaller than 4 by calling the numpy, json and os modules in Python, and saving each json annotation file after screening in the select_json folder of the current project directory; b-6) in the annotation information visualization function of the image.py file, drawing the annotation information on the screened images according to the points in each annotation by calling the PIL, os, numpy and json modules in Python, and saving the results in the visual_json folder of the current project directory; b-7) in the image format conversion function of the image.py file, traversing all images in the select_image folder by calling the PIL and os modules in Python, converting them to png format, and saving them in the tif_png folder of the current project directory;
b-8) in the annotation format conversion function of the json.py file, traversing all json annotation files in the select_json folder by calling the json and cv2 modules in Python, converting each json annotation file into a png image whose background pixels have the RGB value (0, 0, 0) and whose remaining pixels have the RGB value (0, 0, 1), and saving the json annotation files converted to png format in the json_png folder of the current project directory;
b-9) passing the path of the tif_png folder and the path of the json_png folder to the npz generation function of the npz.py file, wherein the npz generation function pairs each image with its corresponding json annotation file by calling the numpy, cv2 and os modules in Python and combines them into npz files used for model training; the first array of each npz file is named image and the second array is named label, and all npz files are saved in the npz folder of the current project directory.
4. The stomach pathological section gland segmentation method based on the Swin-Unet model according to claim 3, characterized in that step c) comprises the following steps:
c-1) dividing all npz files into a training set and a test set at a ratio of 9:1;
c-2) saving the training set data in the train_npz folder under the current project directory, and saving the test set data in the test_npz folder under the current project directory;
c-3) passing the path of the train_npz folder and the path of the test_npz folder to the file name generation function of the npz.py file, wherein the file name generation function, by calling the os module in Python, extracts the file names of all files in the train_npz folder and saves them in the train.txt file of the current project directory, and extracts the file names of all files in the test_npz folder and saves them in the test.txt file of the current project directory.
5. The stomach pathological section gland segmentation method based on the Swin-Unet model according to claim 4, characterized in that step d) comprises the following steps:
d-1) the image segmentation network consists of an image downsampling unit, a bottleneck unit, an image upsampling unit, an attention connection unit and a post-processing unit;
d-2) the image downsampling unit consists of a first downsampling module, a second downsampling module and a third downsampling module in sequence, the first downsampling module consists of a Patch Partition layer and a Linear Embedding layer in sequence, the second downsampling module consists of a Patch Merging layer, and the third downsampling module consists of a Patch Merging layer; the first array (the image array) of an npz file of the training set is input into the first downsampling module and reduced to H/4 × W/4 × C to form the image downsample, where H is the image height, W is the image width and C is the embedding dimension; the image downsample is input into a swin-transformer network, which outputs the image downsample1;
d-3) the attention connection unit consists of a dilated convolution module and a multi-head attention mechanism in sequence; the image downsample is input into the attention connection unit, which outputs the image downsample_1;
d-4) the image downsample1 is input into the second downsampling module and reduced to H/8 × W/8 × 2C to form the image downsample₁; the image downsample₁ is input into a swin-transformer network, which outputs the image downsample2;
d-5) the image downsample₁ is input into the attention connection unit, which outputs the image downsample_2;
d-6) the image downsample2 is input into the third downsampling module and reduced to H/16 × W/16 × 4C to form the image downsample₂; the image downsample₂ is input into a swin-transformer network, which outputs the image downsample3;
d-7) the image downsample₂ is input into the attention connection unit, which outputs the image downsample_3;
d-8) the bottleneck unit consists of two identical swin-transformer networks in sequence; the image downsample3 is input into a Patch Merging layer and reduced to H/32 × W/32 × 8C to form the image downsample₃; the image downsample₃ is input into the bottleneck unit, which outputs the image downsample4;
d-9) the image downsample4 is input into a Patch Expanding layer and expanded to H/16 × W/16 × 4C to form the image upsample₁; the image upsample₁ is added to the image downsample_3, the sum is input into a swin-transformer network, and the image upsample3 is output;
d-10) the image upsampling unit consists of a third upsampling module, a second upsampling module and a first upsampling module in sequence, each consisting of a Patch Expanding layer; the image upsample3 is input into the third upsampling module and expanded to H/8 × W/8 × 2C to form the image upsample₂; the image upsample₂ is added to the image downsample_2, the sum is input into a swin-transformer network, and the image upsample2 is output;
d-11) the image upsample2 is input into the second upsampling module and expanded to H/4 × W/4 × C to form the image upsample₃; the image upsample₃ is added to the image downsample_1, the sum is input into a swin-transformer network, and the image upsample1 is output;
d-12) the image upsample1 is input into the first upsampling module and expanded to H × W × C to form the image upsample₄; the image upsample₄ is input into a Linear Projection layer, which changes the image upsample₄ to H × W × CLASS to form the image output, where CLASS is the number of segmentation classes;
d-13) the post-processing unit consists of an image reading module and a conditional random field (CRF) module, the image reading module is built from the cv2 and numpy modules in Python, and the CRF module is built from the pydensecrf module in Python; the image reading module of the post-processing unit reads the image output, and the CRF module then performs inference on it to obtain the gland segmentation result map of the gastric pathological section.
6. The stomach pathological section gland segmentation method based on the Swin-Unet model according to claim 1, characterized in that: in step d-2), the value of C is 96.
7. The stomach pathological section gland segmentation method based on the Swin-Unet model according to claim 1, characterized in that step e) comprises the following steps:
e-1) calculating the total loss by the formula loss = a × CrossEntropyLoss + b × DiceLoss, wherein a and b are weights, a + b = 1, and CrossEntropyLoss and DiceLoss are the loss functions in the Swin-Unet model;
e-2) training the image segmentation network with the total loss using an SGD optimizer to obtain the optimized image segmentation network.
8. The stomach pathological section gland segmentation method based on the Swin-Unet model according to claim 7, characterized in that: a has a value of 0.4 and b has a value of 0.6.
9. The stomach pathological section gland segmentation method based on the Swin-Unet model according to claim 7, characterized in that: in step e-2), the momentum of the SGD optimizer is set to 0.9, the weight decay weight_decay is set to 0.0001, and the learning rate lr is set to the base learning rate 0.01; when the image segmentation network is trained, the DataLoader iterator in Python is used to iterate over the data, with its parameters set to batch_size = 12, shuffle = True, num_workers = 0 and pin_memory = True; 180 epochs are trained, and starting from the halfway point of training one model weight result is saved every ten epochs; when training ends, the model parameters are saved to obtain the optimized image segmentation network.
CN202310314667.8A 2023-03-29 2023-03-29 Stomach pathological section gland segmentation method based on Swin-Unet model Pending CN116485811A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310314667.8A CN116485811A (en) 2023-03-29 2023-03-29 Stomach pathological section gland segmentation method based on Swin-Unet model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310314667.8A CN116485811A (en) 2023-03-29 2023-03-29 Stomach pathological section gland segmentation method based on Swin-Unet model

Publications (1)

Publication Number Publication Date
CN116485811A true CN116485811A (en) 2023-07-25

Family

ID=87211080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310314667.8A Pending CN116485811A (en) 2023-03-29 2023-03-29 Stomach pathological section gland segmentation method based on Swin-Unet model

Country Status (1)

Country Link
CN (1) CN116485811A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196972A (en) * 2023-08-25 2023-12-08 山东浪潮科学研究院有限公司 Improved Transformer-based document artifact removal method


Similar Documents

Publication Publication Date Title
CN111598892B (en) Cell image segmentation method based on Res2-uneXt network structure
CN110782462B (en) Semantic segmentation method based on double-flow feature fusion
CN111784671B (en) Pathological image focus region detection method based on multi-scale deep learning
KR20200084434A (en) Machine Learning Method for Restoring Super-Resolution Image
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN110728682A (en) Semantic segmentation method based on residual pyramid pooling neural network
Sood et al. An application of generative adversarial networks for super resolution medical imaging
CN116797787B (en) Remote sensing image semantic segmentation method based on cross-modal fusion and graph neural network
CN116485811A (en) Stomach pathological section gland segmentation method based on Swin-Unet model
CN113256494B (en) Text image super-resolution method
CN112927253A (en) Rock core FIB-SEM image segmentation method based on convolutional neural network
CN111914654A (en) Text layout analysis method, device, equipment and medium
CN114596318A (en) Breast cancer magnetic resonance imaging focus segmentation method based on Transformer
CN114266957A (en) Hyperspectral image super-resolution restoration method based on multi-degradation mode data augmentation
Wu et al. Multi-focus image fusion: Transformer and shallow feature attention matters
CN116977387B (en) Deformable medical image registration method based on deformation field fusion
Zuo et al. Gradient-guided single image super-resolution based on joint trilateral feature filtering
CN114708353B (en) Image reconstruction method and device, electronic equipment and storage medium
CN111080516A (en) Super-resolution image reconstruction method based on self-sampling enhancement
CN114331922B (en) Multi-scale self-calibration method and system for restoring turbulence degraded image by aerodynamic optical effect
Schirrmacher et al. SR 2: Super-resolution with structure-aware reconstruction
CN113205005B (en) Low-illumination low-resolution face image reconstruction method
CN114898096A (en) Segmentation and annotation method and system for figure image
CN113947102A (en) Backbone two-path image semantic segmentation method for scene understanding of mobile robot in complex environment
CN112464733A (en) High-resolution optical remote sensing image ground feature classification method based on bidirectional feature fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination