CN115641344A - Method for segmenting optic disc image in fundus image

Info

Publication number: CN115641344A
Application number: CN202211269330.1A
Authority: CN (China)
Prior art keywords: image, segmentation, processing module, network, processing
Legal status: Pending
Other languages: Chinese (zh)
Inventors: Gao Yang (高阳), Song Chongchong (宋宠宠), Wang Defeng (王德峰), Ning Xiaolin (宁晓琳), Liu Yuchen (刘禹辰), Liu Zhanyi (刘展易)
Original and current assignee: Hangzhou Innovation Research Institute of Beihang University
Application filed by Hangzhou Innovation Research Institute of Beihang University
Priority to CN202211269330.1A
Publication of CN115641344A

Landscapes

  • Image Processing (AREA)

Abstract

The invention relates to a method for segmenting an optic disc image in a fundus image, which comprises the following steps: S1, for a fundus image to be segmented, obtaining a fundus image with enhanced data quality based on a generative adversarial network; S2, performing primary image segmentation on the fundus image with enhanced data quality based on a first image segmentation network to obtain a first binary image; S3, cropping, resampling and classifying the fundus image to be segmented based on the first binary image and an image classification network to obtain a parent optic disc image; S4, for the parent optic disc image, obtaining a fine optic disc image based on a second image segmentation network. The target style data set used to pre-train the generative adversarial network is an image set obtained based on a second public data set, and the training data used to pre-train the first image segmentation network are likewise an image set obtained based on the second public data set. The segmentation method has better generalization performance and segmentation precision.

Description

Method for segmenting optic disc image in fundus image
Technical Field
The invention relates to the technical field of medical image segmentation, and in particular to a method for segmenting an optic disc image in a fundus image.
Background
Medical image segmentation is a computer-aided diagnosis task whose aim is to accurately identify and segment organ regions, tissue regions, lesion regions, tumor regions and the like at the pixel level; an accurate segmentation result is an important basis for the diagnosis and treatment of some diseases. In particular, accurate optic disc and optic cup segmentation results are one of the main criteria for the clinical screening and diagnosis of glaucoma, and fundus blood vessel segmentation results based on the optic disc image, together with their geometric and morphological parameters, are important parameters for studying neurodegenerative diseases such as Alzheimer's disease. Therefore, precise segmentation of the optic disc image in the fundus image is of great significance for clinical medical diagnosis and research. With the improvement of artificial intelligence algorithms, image segmentation methods based on deep learning can use training samples to learn the characteristics of color fundus images and perform end-to-end image segmentation. A basic assumption of deep learning is that data samples are independent and identically distributed, that is, the data distributions of the training samples and the test samples should be as similar as possible, so that the generalization performance of the model can approach the ideal state. In practice, however, different acquisition devices and acquisition modes can cause fundus images to exhibit relatively large differences in resolution, size, contrast, sharpness, etc.; such differences between data sets are referred to as data domain migration.
Generally, an image segmentation network based on deep learning is trained on a single data set only, and can achieve good inference performance on the corresponding test set, but does not have particularly good generalization performance across all data sets. Because of the data domain migration phenomenon, especially when the image to be segmented differs markedly from the training data set images or has poor contrast, the inference capability of the image segmentation network degrades rapidly, so that it cannot produce an effective segmentation result. In addition, the regional characteristics (e.g., luminance characteristics) of pathological regions of a patient's fundus are extremely similar to those of the optic disc image, which also interferes with the inference ability of the image segmentation network and causes it to incorrectly segment these pathological regions as optic disc regions; this is very disadvantageous for subsequent parameter calculation and extraction of the relevant regions.
Disclosure of Invention
Technical problem to be solved
In view of the above disadvantages and shortcomings of the prior art, the present invention provides a method for segmenting an optic disc image in a fundus image, which solves the technical problems that existing image segmentation networks suffer reduced inference performance when the data domain migration phenomenon is present, and that the segmentation precision for fundus images with pathological regions is low.
(II) technical scheme
In order to achieve the purpose, the invention adopts the main technical scheme that:
the embodiment of the invention provides a method for segmenting a video disc image in an eye fundus image, which comprises the following steps:
s1, performing data quality enhancement processing on a fundus image to be segmented on the basis of a generation countermeasure network to obtain a fundus image with enhanced data quality;
s2, performing primary image segmentation on the fundus image with enhanced data quality based on a first image segmentation network to obtain a first binary image containing at least one connected domain, wherein the connected domain is used for marking the position information of the optic disc image;
s3, based on the position information of each connected domain in the first binary image, cutting the region corresponding to the connected domain in the fundus image to be segmented to obtain one or more first images to be selected, wherein at least one image in the first images to be selected comprises a complete optic disc image; resampling the first image to be selected according to a second preset size to obtain a second image to be selected corresponding to the first image to be selected; classifying the second image to be selected according to whether the second image to be selected contains the complete video disc image or not based on an image classification network, and taking the first image to be selected corresponding to the second image to be selected containing the complete video disc image as a parent video disc image based on a classification result;
s4, performing secondary image segmentation on the parent video disc image based on a second image segmentation network, and extracting a fine video disc image from the parent video disc image based on a secondary image segmentation result;
the training data used for pre-training the generated confrontation network comprises an original style data set and a target style data set, wherein the original style data set is an image set obtained from a first public data set through preprocessing, and the target style data set is an image set obtained from a second public data set through preprocessing; the training data used for pre-training the first image segmentation network is an image set obtained by obtaining from a second public data set and performing pre-processing.
According to the segmentation method provided by the embodiment of the invention, data quality enhancement is first performed on the fundus image to be segmented based on the generative adversarial network in S1, so that the data style of the image to be segmented is migrated to a data style similar to the second public data set. Because the first image segmentation network in S2 is trained on images acquired from the second public data set, it has better inference capability on the data-quality-enhanced fundus image of the same data style; this weakens the influence of the data domain migration phenomenon in the fundus image to be segmented on the first image segmentation network, improves the cross-data-style generalization performance of the segmentation method, and improves its robustness.
In addition, the segmentation method provided by the embodiment of the invention obtains the first binary image based on the first image segmentation network in S2. Theoretically, the connected domains in the first binary image should mark only the position information of the optic disc image; however, because existing image segmentation networks have poor ability to distinguish the optic disc image from pathological images, a connected domain may also mark the position information of a pathological image. Therefore, in S3, a small-sized first image to be selected corresponding to each connected domain is first cropped out; some of these first images to be selected contain the optic disc image, and some contain pathological-area images. The first images to be selected are resampled to obtain corresponding second images to be selected of a uniform size for input to the image classification network, which reduces the influence of non-uniform image sizes on the classification network's inference performance. The image classification network then classifies the second images to be selected according to whether they contain a complete optic disc image, and, based on the classification result, the first image to be selected corresponding to the second image to be selected containing the complete optic disc image is taken as the parent optic disc image. Since the parent optic disc image is far smaller than the fundus image to be segmented, the area that the second image segmentation network must analyze is reduced and the interference of pathological areas on the second image segmentation network is eliminated. The parent optic disc image is then input into the second image segmentation network in S4 for fine segmentation, achieving the goal of improving the segmentation precision of the segmentation method.
Among steps S1-S4, S2 relies on S1 to overcome the data domain migration phenomenon in the fundus image to be segmented and finds the areas where the optic disc image may exist; S3 crops, resamples and classifies the areas found in S2, finds the small-sized parent optic disc image containing the complete optic disc image, and sends it to the second image segmentation network for secondary image segmentation to obtain the fine optic disc image. Steps S1-S4 are tightly interlinked, so that good inference performance is achieved even on fundus images exhibiting the data domain migration phenomenon, particularly those with pathological regions; this ensures the generalization performance and segmentation precision of the segmentation method and gives it better robustness.
Optionally, in S4, the second image segmentation network is a pre-constructed W-shaped segmentation network with adaptive weight parameters obtained through a first pre-training process,
the W-shaped segmentation network comprises a down-sampling branch, a first up-sampling branch and a second up-sampling branch, wherein the down-sampling branch is used for acquiring feature maps of at least one dimension of the parent optic disc image, the first up-sampling branch is used for extracting boundary information from the feature maps of the at least one dimension, and the second up-sampling branch is used for extracting region information from the feature maps of the at least one dimension and fusing it with the boundary information extracted at the corresponding dimension to obtain a secondary image segmentation result.
Optionally, the W-shaped segmentation network includes 15 processing modules, where the 1st to 4th processing modules connected in sequence form the down-sampling branch, the 5th to 9th processing modules connected in sequence form the first up-sampling branch, and the 10th to 15th processing modules connected in sequence form the second up-sampling branch; the output end of the 4th processing module is connected to the input end of the 5th processing module and to the input end of the 10th processing module;
wherein:
the 1st to 4th processing modules are all down-sampling modules;
the 5th processing module is a DAC module;
the 6th processing module is an RMP module;
the 7th processing module is an up-sampling module;
the 8th processing module is an up-sampling module, and its input end is connected with the output ends of the 3rd and 7th processing modules through a skip connection module;
the 9th processing module is an up-sampling module, and its input end is connected with the output ends of the 2nd and 8th processing modules through a skip connection module;
the 10th processing module is a DAC module;
the 11th processing module is an RMP module;
the 12th processing module is an up-sampling module, and its input end is connected with the output ends of the 6th and 11th processing modules through a skip connection module;
the 13th processing module is an up-sampling module, and its input end is connected with the output ends of the 12th, 3rd and 7th processing modules through a skip connection module;
the 14th processing module is an up-sampling module, and its input end is connected with the output ends of the 13th, 2nd and 8th processing modules through a skip connection module;
the 15th processing module is an up-sampling module, its input end is connected with the output ends of the 14th, 1st and 9th processing modules through a skip connection module, and the 15th processing module is used for outputting the secondary image segmentation result.
Optionally, the first pre-training process comprises:
acquiring first training data, and, based on the first training data, training the W-shaped segmentation network to be trained using a boundary-based loss function for the first up-sampling branch and a region-based loss function for the second up-sampling branch, to obtain the W-shaped segmentation network with adaptive weight parameters;
the first training data comprises a plurality of first images containing optic disc images, and each first image carries annotation information marking the area of the optic disc image on that first image.
Optionally, training the W-shaped segmentation network to be trained using the boundary-based loss function for the first up-sampling branch and the region-based loss function for the second up-sampling branch includes:
connecting an up-sampling module to the output end of the 9th processing module, for outputting the boundary segmentation result obtained after processing by the first up-sampling branch;
inputting a first image from the first training data into the W-shaped segmentation network to be trained to obtain a boundary segmentation result and a secondary image segmentation result;
calculating the boundary loss according to the boundary-based loss function from the boundary segmentation result and the annotation information contained in the first image; calculating the region loss according to the region-based loss function from the secondary image segmentation result and the annotation information contained in the first image;
performing a weighted combination of the boundary loss and the region loss to obtain an overall loss, and adjusting the weight parameters of the W-shaped segmentation network based on the overall loss to obtain the W-shaped segmentation network with adaptive weight parameters;
the boundary-based penalty function is:
Figure 241909DEST_PATH_IMAGE001
where BL is a boundary loss, G is a region corresponding to the label information in the first image,
Figure 810294DEST_PATH_IMAGE002
a value at point q representing a positive result in the boundary segmentation results, the
Figure 931572DEST_PATH_IMAGE003
According to the formula (2), the method can be obtained,
the formula (2) is
Figure 220602DEST_PATH_IMAGE004
In the formula (I), the compound is shown in the specification,
Figure 369823DEST_PATH_IMAGE005
and the shortest distance from the point q to the edge of the area corresponding to the marking information is shown.
Optionally, S1 further includes performing background removal processing on the fundus image to be segmented before inputting the fundus image to be segmented into the generative adversarial network;
the background removing process comprises the following steps:
converting the image to be subjected to background removal processing into a gray image; based on thresholding, updating the gray value of pixels whose gray value is greater than 5 to 255 and the gray value of pixels whose gray value is less than 5 to 0, to obtain a second binary image; extracting an intermediate image of the corresponding region in the image to be processed based on the region information of the largest connected domain in the second binary image; and resampling the intermediate image according to a first preset size to obtain the background-removed image.
Optionally, the first public data set is the IDRiD data set, and the second public data set is the REFUGE data set;
in S1, the generative adversarial network is a Cycle-GAN network with adaptive weight parameters obtained through a second pre-training process;
the second pre-training process comprises: acquiring second training data, and training the Cycle-GAN network to be trained on the basis of the second training data to obtain the Cycle-GAN network with adaptive weight parameters;
the second training data comprises an original style data set and a target style data set, wherein,
the original style data set comprises a plurality of images obtained from the IDRiD data set and subjected to background removal processing; the target style data set comprises a plurality of images obtained from the Validation data set of the REFUGE data set and subjected to background removal processing.
Optionally, in S2, the first image segmentation network is a U-Net network with adaptive weight parameters obtained through a third pre-training process;
the third pre-training process comprises: acquiring third training data, and training the U-Net network to be trained based on the third training data to obtain the U-Net network with adaptive weight parameters;
the obtaining third training data comprises: respectively acquiring a plurality of original images from the Train data set and the Validation data set of the REFUGE data set, wherein the original images carry annotation information marking the optic disc image area; performing data expansion processing on the original images to obtain original training data; and performing background removal processing on the images in the original training data to obtain the third training data;
the data expansion processing comprises at least one of image rotation processing, random image color transformation processing and random image noise addition processing.
Optionally, in S3, cropping the region corresponding to each connected domain in the fundus image to be segmented based on the position information of the connected domains in the first binary image to obtain one or more first images to be selected includes:
for each connected domain in the first binary image, calculating the geometric center of the connected domain; for the fundus image to be segmented, determining a first square target area centered on the position corresponding to the geometric center in the fundus image to be segmented, with side length equal to 2/5 of the length of the short side of the fundus image to be segmented; and cropping the fundus image to be segmented based on the first square target area to obtain one or more first images to be selected.
Optionally, in S3, the image classification network is a ResNet50 network with adaptive weight parameters obtained through a fourth pre-training process;
the fourth pre-training process comprises: acquiring fourth training data, and training the ResNet50 network to be trained on the basis of the fourth training data to obtain the ResNet50 network with adaptive weight parameters;
the fourth training data comprises a positive example set and a negative example set, and the obtaining fourth training data comprises: cropping the images in the original training data, taking the cropped images that contain a complete optic disc image as the positive example set, and taking the cropped images that contain both optic disc and non-optic disc image content, or no optic disc image at all, as the negative example set;
the cropping processing includes:
scaling the image to be cropped to a size of 128 pixels × 128 pixels; sliding a cropping frame of 32 pixels × 32 pixels over the 128-pixel × 128-pixel image with a step size of 8 pixels to determine a plurality of cropping areas; mapping the cropping areas back onto the image to be cropped through the inverse scaling transform and cropping to obtain a plurality of sub-images; and resampling the sub-images according to the second preset size to obtain the cropped images.
(III) advantageous effects
In the embodiment of the invention, data quality enhancement is first performed on the fundus image to be segmented based on the generative adversarial network in S1, so that the data style of the image to be segmented is migrated to a data style similar to the second public data set. Because the first image segmentation network in S2 is trained on images acquired from the second public data set, it has better inference capability on the data-quality-enhanced fundus image of the same data style; this weakens the influence of the data domain migration phenomenon in the fundus image to be segmented on the first image segmentation network, improves the cross-data-style generalization performance of the segmentation method, and further improves its robustness.
The segmentation method provided by the embodiment of the invention also crops, resamples and classifies the areas where the optic disc image may exist, based on steps S2 and S3, to obtain the small-sized parent optic disc image containing the complete optic disc image, and then inputs the parent optic disc image into the second image segmentation network for fine segmentation in S4, thereby eliminating the interference of pathological areas and effectively improving the segmentation precision of the segmentation method.
Among steps S1-S4, S2 relies on S1 to overcome the data domain migration phenomenon in the fundus image to be segmented and finds the areas where the optic disc image may exist; S3 crops, resamples and classifies the areas found in S2, determines the parent optic disc image containing the optic disc image, and sends it to the second image segmentation network for secondary image segmentation to obtain the fine optic disc image. Steps S1-S4 are tightly interlinked, so that good inference performance is achieved even on fundus images exhibiting the data domain migration phenomenon, particularly those with pathological regions; this ensures the generalization performance and segmentation precision of the segmentation method and gives it better robustness.
In addition, the segmentation method of the embodiment of the invention also provides a W-shaped segmentation network for fine segmentation, in which the first up-sampling branch extracts boundary information from feature maps of multiple dimensions in the down-sampling branch, and the second up-sampling branch extracts region information from the feature maps of those dimensions and fuses it with the boundary information extracted at the same dimension, so that an optic disc image with clearer and smoother edges is obtained, further improving the segmentation precision of the segmentation method.
Drawings
Fig. 1 is a schematic flowchart of a method for segmenting an optic disc image in a fundus image according to an embodiment;
FIG. 2 is a schematic diagram of a model framework of a Cycle-GAN network in an embodiment;
FIG. 3 is a schematic diagram of a model framework of a W-shaped segmentation network in an embodiment.
Detailed Description
In order to better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Example one
As shown in fig. 1, the present embodiment provides a method for segmenting an optic disc image in a fundus image, used to extract the optic disc image from the fundus image; the method of this embodiment can be implemented on any computer device and includes:
S1, performing data quality enhancement processing on the fundus image to be segmented based on a generative adversarial network to obtain a fundus image with enhanced data quality.
The generative adversarial network in this step is used to migrate the data style of the fundus image to be segmented so as to improve the inference performance of the first image segmentation network in step S2; at the same time, it must not change the overall structure of the fundus image so much as to distort it. Preferably, the generative adversarial network may be a Cycle-GAN network (cycle-consistent generative adversarial network).
As shown in fig. 2, the model framework of the Cycle-GAN network has two generators, GA and GB, and two discriminators, DA and DB. The generator GA turns a true A-domain image with the original data style from the original style data set into a fake B-domain image with the target data style, and the generator GB regenerates the fake B-domain image into a fake A-domain image with the original data style; the discriminator DA judges whether the fake B-domain image generated by the generator GA is an image from the real target style data set, and the discriminator DB judges whether the fake A-domain image generated by the generator GB is an image from the real original style data set. This process is iterated: as iteration proceeds, the two discriminators become better and better at recognizing real images, and the quality of the images produced by the two generators becomes good enough to confuse the discriminators' judgment. Through this adversarial iteration, the goal of migrating the data style of the fundus image to be segmented is achieved.
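To make the adversarial iteration concrete, the following is a minimal PyTorch sketch of one training step of such a Cycle-GAN-style loop. The toy generator and discriminator definitions, the least-squares adversarial loss, the cycle-consistency weight lambda_cyc = 10.0 and the learning rate are illustrative assumptions, not the patent's configuration (the identity loss often used with Cycle-GAN is omitted for brevity).

```python
import torch
import torch.nn as nn

def make_generator():  # toy stand-in for a real image-to-image generator
    return nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

def make_discriminator():  # toy PatchGAN-style stand-in
    return nn.Sequential(nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                         nn.Conv2d(32, 1, 4, stride=2, padding=1))

G_A2B, G_B2A = make_generator(), make_generator()   # GA and GB in the text
D_A, D_B = make_discriminator(), make_discriminator()
adv = nn.MSELoss()      # least-squares GAN loss (assumed)
cyc = nn.L1Loss()       # cycle-consistency loss
lambda_cyc = 10.0       # assumed weight

opt_G = torch.optim.Adam(list(G_A2B.parameters()) + list(G_B2A.parameters()), lr=2e-4)
opt_D = torch.optim.Adam(list(D_A.parameters()) + list(D_B.parameters()), lr=2e-4)

def train_step(real_A, real_B):
    # Generators: produce fake images that fool the discriminators while the
    # cycle A -> B -> A (and B -> A -> B) reconstructs the original input.
    fake_B, fake_A = G_A2B(real_A), G_B2A(real_B)
    pred_B, pred_A = D_B(fake_B), D_A(fake_A)
    loss_G = (adv(pred_B, torch.ones_like(pred_B)) +
              adv(pred_A, torch.ones_like(pred_A)) +
              lambda_cyc * (cyc(G_B2A(fake_B), real_A) + cyc(G_A2B(fake_A), real_B)))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

    # Discriminators: label real images 1 and generated images 0.
    rB, fB = D_B(real_B), D_B(fake_B.detach())
    rA, fA = D_A(real_A), D_A(fake_A.detach())
    loss_D = (adv(rB, torch.ones_like(rB)) + adv(fB, torch.zeros_like(fB)) +
              adv(rA, torch.ones_like(rA)) + adv(fA, torch.zeros_like(fA)))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
```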
S2, performing primary image segmentation on the fundus image with enhanced data quality based on a first image segmentation network to obtain a first binary image containing at least one connected domain, wherein the connected domain is used for marking the position information of the optic disc image.
Theoretically, the number of connected domains in the first binary image should be exactly 1, corresponding to the position of the optic disc image. However, because the characteristics of pathological regions of a patient's fundus are extremely similar to those of the optic disc image, the first image segmentation network has poor ability to distinguish the optic disc image from pathological images, so the first binary image may contain more than one connected domain, with the extra connected domains marking the position information of pathological images. Therefore, the following step S3 performs cropping and classification to obtain the parent optic disc image containing the complete optic disc image.
It should be noted that the first image segmentation network of this step essentially provides a rough segmentation result for the subsequent second image segmentation network, so it does not need to pursue high segmentation accuracy; instead, it needs extremely strong robustness so that it can process images of widely varying quality.
S3, based on the position information of each connected domain in the first binary image, cropping the region corresponding to the connected domain in the fundus image to be segmented to obtain one or more first images to be selected, wherein at least one of the first images to be selected comprises a complete optic disc image; resampling the first image to be selected according to a second preset size to obtain a second image to be selected corresponding to the first image to be selected; and, based on the image classification network, classifying the second image to be selected according to whether it contains a complete optic disc image, and taking the first image to be selected corresponding to the second image to be selected containing the complete optic disc image as the parent optic disc image based on the classification result.
It should be noted that, because there may be more than one connected domain in the first binary image described in step S2, at least one of the one or more small-sized first images to be selected obtained by cropping in this step contains a complete optic disc image, while others may contain pathological-area images. The first images to be selected are therefore resampled to obtain corresponding second images to be selected of a uniform size for input to the image classification network, which reduces the influence of non-uniform image sizes on the classification network's inference performance. The image classification network then classifies the second images to be selected according to whether they contain a complete optic disc image, and the first image to be selected corresponding to the second image to be selected containing the complete optic disc image is taken as the parent optic disc image for the secondary image segmentation in step S4.
The combination of steps S2 and S3 can eliminate some pathological areas of the fundus image to be segmented in advance before input into the second image segmentation network in step S4 for secondary image segmentation, which reduces the interference of pathological areas on the second image segmentation network and improves the segmentation precision of the segmentation method.
Preferably, the image classification network may be a ResNet50 network; the ResNet50 network contains a number of residual units connected across layers and can better distinguish the features of the optic disc region from those of pathological regions, so as to accurately identify the images containing a complete optic disc image from among the second images to be selected.
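As a concrete sketch, the candidate classifier could be set up with torchvision as follows; treating index 1 as the "contains a complete optic disc" class is an assumption of this sketch.

```python
import torch
import torch.nn as nn
from torchvision import models

# Binary head: class 1 = crop contains a complete optic disc (assumed encoding).
classifier = models.resnet50(weights=None)
classifier.fc = nn.Linear(classifier.fc.in_features, 2)

@torch.no_grad()
def contains_complete_disc(second_candidate: torch.Tensor) -> bool:
    """second_candidate: (3, 224, 224) float tensor, a resampled second image to be selected."""
    classifier.eval()
    logits = classifier(second_candidate.unsqueeze(0))
    return logits.argmax(dim=1).item() == 1
```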
S4, performing secondary image segmentation on the parent optic disc image based on the second image segmentation network, and extracting a fine optic disc image from the parent optic disc image based on the secondary image segmentation result.
Without loss of generality, the neural networks used in each step of this embodiment are obtained through pre-training, and the training data used in each network's pre-training process affect the generalization performance of the segmentation method. The segmentation method therefore configures the training data of the neural networks as follows:
the training data used for pre-training the generated confrontation network comprises an original style data set and a target style data set, wherein the original style data set is an image set obtained from a first public data set through preprocessing, and the target style data set is an image set obtained from a second public data set through preprocessing; the training data used for pre-training the first image segmentation network is an image set obtained from a second public data set and subjected to pre-processing.
Based on the setting mode of the training data used by the generation countermeasure network and the first image segmentation network, the fundus image to be segmented in the step S1 is subjected to data quality enhancement based on the generation countermeasure network, so that the data style of the image to be segmented is transferred to the data style similar to the second public data set, and then the first image segmentation network in the step S2 is obtained by image training obtained based on the second public data set, so that the fundus image with the same data style and enhanced data quality has better reasoning capability, the influence of the data domain transfer phenomenon existing in the fundus image to be segmented on the segmentation method is weakened, the generalization performance of the segmentation method across the data styles is improved, and the robustness of the segmentation method is improved.
In the steps of S1-S4, S2 overcomes the data field migration phenomenon existing in the fundus image to be segmented based on S1, and finds out the area where the optic disc image possibly exists, S3 performs cutting, resampling and classification based on the area where the optic disc image possibly exists found in S2, and finds out the small-size mother optic disc image containing the complete optic disc image, and then sends the small-size mother optic disc image into a second image segmentation network to perform secondary image segmentation, so as to obtain the fine optic disc image. The loops of S1-S4 are buckled, so that a better reasoning performance can be generated for fundus images with a data domain migration phenomenon, particularly a pathologic region, the generalization performance and the segmentation precision of the segmentation method are ensured, and the segmentation method has better robustness.
Example two
In step S4 of the first embodiment, the inference performance of the second image segmentation network used for secondary image segmentation directly affects the final segmentation accuracy of the segmentation method, so the segmentation method of this embodiment provides a W-shaped segmentation network with better segmentation accuracy, in order to obtain an optic disc image with sharper and smoother edges.
The W-shaped segmentation network comprises a down-sampling branch, a first up-sampling branch and a second up-sampling branch, wherein the down-sampling branch is used for obtaining feature maps of at least one dimension of the parent optic disc image, the first up-sampling branch is used for extracting boundary information from the feature maps of the at least one dimension, and the second up-sampling branch is used for extracting region information from the feature maps of the at least one dimension and fusing it with the boundary information extracted at the same dimension to obtain a secondary image segmentation result.
As shown in fig. 3, in a preferred embodiment of the present invention, the W-shaped segmentation network includes 15 processing modules, wherein the 1st to 4th processing modules connected in sequence form the down-sampling branch, the 5th to 9th processing modules connected in sequence form the first up-sampling branch, and the 10th to 15th processing modules connected in sequence form the second up-sampling branch; the output end of the 4th processing module is connected to the input end of the 5th processing module and to the input end of the 10th processing module;
wherein:
the 1st to 4th processing modules are all down-sampling modules;
the 5th processing module is a DAC module;
the 6th processing module is an RMP module;
the 7th processing module is an up-sampling module;
the 8th processing module is an up-sampling module, and its input end is connected with the output ends of the 3rd and 7th processing modules through a skip connection module;
the 9th processing module is an up-sampling module, and its input end is connected with the output ends of the 2nd and 8th processing modules through a skip connection module;
the 10th processing module is a DAC module;
the 11th processing module is an RMP module;
the 12th processing module is an up-sampling module, and its input end is connected with the output ends of the 6th and 11th processing modules through a skip connection module;
the 13th processing module is an up-sampling module, and its input end is connected with the output ends of the 12th, 3rd and 7th processing modules through a skip connection module;
the 14th processing module is an up-sampling module, and its input end is connected with the output ends of the 13th, 2nd and 8th processing modules through a skip connection module;
the 15th processing module is an up-sampling module, its input end is connected with the output ends of the 14th, 1st and 9th processing modules through a skip connection module, and the 15th processing module is used for outputting the secondary image segmentation result.
In addition, in order to facilitate training the W-shaped segmentation network to be trained with the boundary-based loss function for the first up-sampling branch, an up-sampling module is connected to the output end of the 9th processing module to output the boundary segmentation result obtained after processing by the first up-sampling branch.
In the W-shaped segmentation network with 15 processing modules, the specific structures and connections of the down-sampling module, the DAC module (dense atrous convolution module), the RMP module (residual multi-kernel pooling module), the up-sampling module and the skip connection module are all implemented using the prior art. In particular, the output end of the up-sampling module that outputs the boundary segmentation result carries a softmax layer for outputting the boundary segmentation result obtained after processing by the first up-sampling branch; the output end of the up-sampling module of the 15th processing module also carries a softmax layer for outputting the secondary image segmentation result of the W-shaped segmentation network; and the down-sampling module of the 1st processing module, which receives the input image and generates a multi-channel feature map, is typically a convolution module.
In the W-shaped segmentation network with 15 processing modules, the feature map containing boundary features extracted by each up-sampling module in the first up-sampling branch is transmitted to the second up-sampling branch through a skip connection module and is input, together with the feature map containing region features extracted by the second up-sampling branch, into the corresponding up-sampling module. This fuses the boundary features and region features of the optic disc image, makes the edges of the optic disc region in the W-shaped segmentation network's segmentation result clearer and smoother, and improves the segmentation precision of the segmentation method.
It should be noted that the DAC module has 4 cascaded branches in which the dilation rate of the atrous (dilated) convolutions gradually increases, giving the branches receptive fields of 3, 7, 9 and 19, respectively; at the end of each branch, a 1 × 1 convolution is used for feature channel adjustment. With this structure, a large dilation rate enlarges the receptive field of the network at almost no additional computational cost, so that the extracted features are more comprehensive, while convolutions with a smaller dilation rate extract finer objects; by combining convolutions with different dilation rates, the DAC module can extract the features of objects of different sizes. The RMP module, in turn, relies on receptive fields formed by several effective fields of view to detect objects of different sizes, the size of the receptive field roughly determining how much context information is used. Whereas an ordinary max-pooling operation uses only a single pooling kernel, the RMP module extracts global context information using pooling operations of 4 different sizes (2 × 2, 3 × 3, 5 × 5 and 6 × 6). In the W-shaped segmentation network with 15 processing modules provided by this segmentation method, before the first and second up-sampling branches up-sample the high-dimensional feature map, the DAC and RMP modules extract the characteristics of objects of different sizes and the global context information from the high-dimensional feature map, which alleviates the loss of spatial information caused by large size differences among objects in medical images and gives the W-shaped segmentation network better inference performance.
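The following PyTorch sketch shows one plausible reading of the DAC and RMP modules as described above (branch receptive fields 3, 7, 9 and 19; pooling windows 2 × 2, 3 × 3, 5 × 5 and 6 × 6). The channel counts, residual summation and activation placement are assumptions rather than the patent's exact specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DAC(nn.Module):
    """Dense atrous convolution block; the 4 branches have receptive fields 3, 7, 9, 19."""
    def __init__(self, ch):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1, dilation=1),
                                nn.Conv2d(ch, ch, 1))                      # RF 3
        self.b2 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=3, dilation=3),
                                nn.Conv2d(ch, ch, 1))                      # RF 7
        self.b3 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1, dilation=1),
                                nn.Conv2d(ch, ch, 3, padding=3, dilation=3),
                                nn.Conv2d(ch, ch, 1))                      # RF 9
        self.b4 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1, dilation=1),
                                nn.Conv2d(ch, ch, 3, padding=3, dilation=3),
                                nn.Conv2d(ch, ch, 3, padding=5, dilation=5),
                                nn.Conv2d(ch, ch, 1))                      # RF 19
    def forward(self, x):
        # Sum all branch outputs onto the input feature map (residual combination).
        return (x + F.relu(self.b1(x)) + F.relu(self.b2(x))
                  + F.relu(self.b3(x)) + F.relu(self.b4(x)))

class RMP(nn.Module):
    """Residual multi-kernel pooling with 2x2, 3x3, 5x5 and 6x6 pooling windows."""
    def __init__(self, ch):
        super().__init__()
        self.pools = nn.ModuleList(nn.MaxPool2d(k, stride=k) for k in (2, 3, 5, 6))
        self.convs = nn.ModuleList(nn.Conv2d(ch, 1, 1) for _ in range(4))
    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x]
        for pool, conv in zip(self.pools, self.convs):
            # Pool at one scale, compress to a single channel, upsample back, concatenate.
            feats.append(F.interpolate(conv(pool(x)), size=(h, w),
                                       mode="bilinear", align_corners=False))
        return torch.cat(feats, dim=1)   # ch + 4 output channels
```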
Based on the design idea that the W-shaped segmentation network extracts the boundary features and region features of the optic disc image separately and then fuses them, the pre-training of the W-shaped segmentation network uses the boundary-based loss function for the first up-sampling branch and the region-based loss function for the second up-sampling branch. The W-shaped segmentation network with adaptive weight parameters obtained in this way can better extract and fuse the boundary and region features of the optic disc image, yielding an optic disc image with clearer and smoother edges and further improving the segmentation accuracy of the segmentation method.
The above process of training the W-shaped segmentation network to be trained with the boundary-based loss function for the first up-sampling branch and the region-based loss function for the second up-sampling branch can be carried out with reference to the steps described in the third embodiment.
EXAMPLE III
In order to better understand the training process of each neural network in the first and second embodiments, the present embodiment will be described in detail with reference to specific neural networks in steps S1 to S4.
This embodiment presents the pre-training processes of the neural networks involved in steps S1-S4 of the first and second embodiments.
In this embodiment, the first public data set is the IDRiD data set and the second public data set is the REFUGE data set. Apart from the requirement that the image sources of the second training data and the third training data correspond strictly, the image sources of the first training data and the fourth training data may be chosen according to actual requirements and may be the same as or different from the source of the third training data. That is, the images of the original style data set in the second training data come from the IDRiD data set, while the images of the target style data set in the second training data, of the first training data, of the third training data and of the fourth training data come from the REFUGE data set.
For the pre-training processes of the neural networks involved in S2-S4, the training data are obtained based on the original training data acquired by the following process.
The process of acquiring the original training data includes: respectively acquiring a plurality of original images from the Train data set and the Validation data set of the REFUGE data set, wherein the original images carry annotation information marking the optic disc image area, and performing data expansion processing on the original images to obtain the original training data. The data expansion processing comprises at least one of image rotation processing, random image color transformation processing and random noise addition processing, applied to the original image and/or its annotation information; the processing of the annotation information must correspond to the processing of the original image it belongs to, so that the expanded image and its annotation information still correspond. Specifically, image rotation rotates the original image and its annotation information by the same angle, whereas random color transformation and random noise addition operate only on the original image, leaving the annotation information unchanged.
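A minimal sketch of one such expansion pass with OpenCV and NumPy is shown below; the rotation range, color-jitter magnitudes and noise level are assumed values chosen only for illustration.

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)

def expand(image: np.ndarray, annotation: np.ndarray):
    """One data expansion pass: rotation applied to image AND annotation;
    color jitter and noise applied to the image only."""
    h, w = image.shape[:2]
    angle = rng.uniform(-30, 30)                 # assumed rotation range
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    image = cv2.warpAffine(image, M, (w, h))
    annotation = cv2.warpAffine(annotation, M, (w, h), flags=cv2.INTER_NEAREST)

    gain = rng.uniform(0.8, 1.2, size=3)         # random per-channel color transform
    bias = rng.uniform(-20, 20, size=3)
    image = np.clip(image.astype(np.float64) * gain + bias, 0, 255).astype(np.uint8)

    noise = rng.normal(0, 5, image.shape)        # additive Gaussian noise, image only
    image = np.clip(image + noise, 0, 255).astype(np.uint8)
    return image, annotation
```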
The pre-training processes of the neural networks involved in steps S1-S4 are as follows:
the generation of the countermeasure network involved in S1 is a Cycle-GAN network with adapted weight parameters obtained through a second pre-training process.
The second pre-training process comprises: acquiring second training data, and training the Cycle-GAN network to be trained on the basis of the second training data to obtain the Cycle-GAN network with adaptive weight parameters;
the second training data includes a raw style data set, also sometimes referred to as an a-domain data set or an X-domain data set, and a target style data set, also referred to as a B-domain data set or a Y-domain data set.
The original style data set comprises a plurality of images which are obtained from the IDRiD data set and subjected to background removing processing; the target style dataset includes a plurality of images obtained from a validity dataset of the REFUSE dataset and subjected to background removal processing.
The background removing process comprises the following steps:
converting the image to be subjected to background removal processing into a gray image; based on thresholding, updating the gray value of pixels whose gray value is greater than 5 to 255 and the gray value of pixels whose gray value is less than 5 to 0, to obtain a second binary image; extracting an intermediate image of the corresponding region in the image to be processed based on the region information of the largest connected domain in the second binary image; and resampling the intermediate image according to a first preset size to obtain the background-removed image. The background removal processing removes the interference of the background, patient information and the like from the images in the IDRiD data set and the Validation data set of the REFUGE data set. Specifically, the first preset size may be 256 pixels × 256 pixels.
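An OpenCV sketch of this background removal procedure is given below; it assumes 8-bit BGR input and uses cv2.threshold's strictly-greater-than rule for pixels exactly equal to 5, a detail the text leaves open.

```python
import numpy as np
import cv2

FIRST_PRESET_SIZE = (256, 256)   # the first preset size from the text

def remove_background(img: np.ndarray) -> np.ndarray:
    """Crop a fundus photograph to its largest bright connected domain and resample."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Pixels with gray value > 5 become 255, the rest become 0 (second binary image).
    _, binary = cv2.threshold(gray, 5, 255, cv2.THRESH_BINARY)
    # Locate the largest connected domain (label 0 is the background).
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    x, y = stats[largest, cv2.CC_STAT_LEFT], stats[largest, cv2.CC_STAT_TOP]
    w, h = stats[largest, cv2.CC_STAT_WIDTH], stats[largest, cv2.CC_STAT_HEIGHT]
    intermediate = img[y:y + h, x:x + w]         # intermediate image of that region
    return cv2.resize(intermediate, FIRST_PRESET_SIZE, interpolation=cv2.INTER_AREA)
```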
It should be noted that the overall image quality of the IDRiD data set is poor, with low definition and low contrast, whereas the overall quality of the images in the Validation data set of the REFUGE data set is good, with high resolution and clear optic disc outlines. The overall data style of the original style data set built from the IDRiD data set is therefore biased toward low definition and low contrast, while that of the target style data set built from the Validation data set of the REFUGE data set is biased toward high definition and high contrast. A Cycle-GAN network trained on this original style data set and target style data set can, while performing data quality enhancement on the fundus image to be segmented, both migrate its data style to the target data style and improve its image quality, which further improves the segmentation precision of the U-Net network in step S2.
The first image segmentation network involved in the S2 is a U-Net network with adaptive weight parameters obtained through a third pre-training process;
the third pre-training process comprises: and acquiring third training data, and training the U-Net network to be trained based on the third training data to obtain the U-Net network with adaptive weight parameters. The third training data includes a plurality of images obtained by performing background removal processing on images in the original training data.
The image classification network involved in S3 is a ResNet50 network with adaptive weight parameters obtained through a fourth pre-training process.
The fourth pre-training process comprises: acquiring fourth training data, and training the ResNet50 network to be trained on the basis of the fourth training data to obtain the ResNet50 network with adaptive weight parameters;
the fourth training data comprises a positive example set and a negative example set, and the obtaining fourth training data comprises: cropping the images in the original training data, taking the cropped images that contain a complete optic disc image as the positive example set, and taking the cropped images that contain both optic disc and non-optic disc image content as the negative example set;
the cropping processing includes:
scaling the image to be cropped to a size of 128 pixels × 128 pixels; sliding a cropping frame of 32 pixels × 32 pixels over the 128-pixel × 128-pixel image with a step size of 8 pixels to determine a plurality of cropping areas; mapping the cropping areas back onto the image to be cropped through the inverse scaling transform and cropping to obtain a plurality of sub-images; and resampling the sub-images according to the second preset size to obtain the cropped images. Specifically, the second preset size may be 224 pixels × 224 pixels. Determining the cropping areas on the 128-pixel × 128-pixel scaled image and mapping them back onto the image to be cropped through the inverse scaling transform reduces the amount of computation required by the cropping processing.
The second preset size is also the size of the images in the fourth training data. When the image classification network obtained from the fourth training data later classifies the second images to be selected, the first images to be selected are resampled according to the second preset size so that the second images to be selected match the size of the images in the fourth training data; this avoids a data domain migration phenomenon caused by a size mismatch between the training images and the images to be classified, which would degrade the inference performance of the image classification network.
The second image segmentation network involved in S4 is the W-shaped segmentation network with 15 processing modules and adaptive weight parameters, obtained through the first pre-training process.
The first pre-training process comprises: and acquiring first training data, training the W-shaped segmentation network to be trained by adopting a loss function based on a boundary for the first up-sampling branch and a loss function based on a region for the second up-sampling branch based on the first training data, and acquiring the W-shaped segmentation network with adaptive weight parameters.
Specifically, acquiring the first training data includes: for each image contained in the original training data, determining a second square target area centered on the geometric center of the image's annotation information, with side length equal to 2/5 of the length of the short side of the image; cropping the image based on the second square target area; and forming the first training data from the plurality of first images obtained by cropping. Given the image sizes and optic disc sizes in the REFUGE data set, a square area whose side length is 2/5 of the short side of the image is sufficient to contain the complete optic disc image, which is why this side length is used to obtain first images containing the complete optic disc image.
Specifically, the training of the W-shaped segmentation network to be trained by using the boundary-based loss function for the first upsampling branch and using the region-based loss function for the second upsampling branch includes:
the output end of the 9th processing module is connected with an up-sampling module, which is used for outputting the boundary segmentation result obtained after processing by the first up-sampling branch;
inputting a first image in first training data into a W-shaped segmentation network to be trained to obtain a boundary segmentation result and a secondary image segmentation result;
calculating boundary loss according to a loss function based on the boundary segmentation result and the labeling information contained in the first image; calculating the region loss according to a region-based loss function based on the result of the secondary image segmentation and the labeling information contained in the first image;
performing weighting operation based on the boundary loss and the regional loss to obtain an overall loss, and adjusting the weight parameters of the W-shaped segmentation network based on the overall loss to obtain the W-shaped segmentation network with adaptive weight parameters;
The boundary-based loss function is:

$$BL = \sum_{q \in \Omega} \phi_G(q)\, s_\theta(q) \tag{1}$$

where $BL$ is the boundary loss, $G$ is the region corresponding to the annotation information in the first image, $\Omega$ is the image domain, $s_\theta(q)$ is the value at point $q$ of the positive-class result in the boundary segmentation results, and $\phi_G(q)$ is obtained according to formula (2), the formula (2) being

$$\phi_G(q) = \begin{cases} -D_G(q), & q \in G \\ D_G(q), & q \notin G \end{cases} \tag{2}$$

where $D_G(q)$ denotes the shortest distance from point $q$ to the edge of the region corresponding to the annotation information.
In addition, the region-based loss function follows the prior art and is not described further here.
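For illustration, a minimal PyTorch-style sketch of this boundary loss and the weighted overall loss, assuming the distance map φ_G is precomputed from the annotation mask and a Dice loss stands in for the region-based loss; the weight `alpha` is an assumption, not a value given in the patent:

```python
import torch
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask):
    """phi_G of formula (2): negative inside the annotated region G,
    positive outside, with magnitude D_G(q), the distance to the edge."""
    mask = mask.astype(bool)
    phi = distance_transform_edt(~mask) - distance_transform_edt(mask)
    return torch.as_tensor(phi, dtype=torch.float32)

def boundary_loss(probs, phi):
    """Formula (1): BL = sum_q phi_G(q) * s_theta(q), averaged over pixels
    here; probs is the positive-class output of the first branch."""
    return (probs * phi).mean()

def overall_loss(boundary_probs, region_probs, target, phi, alpha=0.5):
    """Weighted combination of the boundary loss (first branch) and a
    region-based Dice loss (second branch); alpha is an assumed weight."""
    bl = boundary_loss(boundary_probs, phi)
    inter = (region_probs * target).sum()
    dice = 1.0 - 2.0 * inter / (region_probs.sum() + target.sum() + 1e-6)
    return alpha * bl + (1.0 - alpha) * dice
```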
Example four
Based on the neural networks with adaptive weight parameters obtained by the training processes in the third embodiment, this embodiment describes the segmentation method of the present invention with reference to its specific steps.
The present embodiment provides a method for segmenting an optic disc image in a fundus image, for extracting the optic disc image from a fundus image to be segmented, the method comprising the following steps:
a1, obtaining a fundus image to be segmented.
And A2, performing background removal processing on the fundus image to be segmented.
And A3, inputting the fundus image subjected to background removal processing into a Cycle-GAN network with adaptive weight parameters to obtain a fundus image with enhanced data quality.
And A4, inputting the fundus image with enhanced data quality into a U-Net network with adaptive weight parameters to obtain a first binary image, wherein the first binary image comprises at least one connected domain.
And A5, calculating the geometric center of each connected domain in the first binary image based on the first binary image.
And A6, for the fundus image to be segmented, determining a first square target area centered on the position corresponding to the geometric center in the fundus image to be segmented, with a side length equal to 2/5 of the short side of the fundus image to be segmented; cropping the fundus image to be segmented based on the first square target area to obtain one or more first images to be selected; and resampling each first image to be selected to 224 pixels × 224 pixels to obtain the corresponding second image to be selected.
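A minimal sketch of steps A5 and A6, assuming OpenCV connected-component analysis on the first binary image and a simple proportional mapping of centers between image sizes (names and the mapping are illustrative assumptions):

```python
import cv2
import numpy as np

def candidate_crops(fundus, binary, ratio=2 / 5, out_size=224):
    """A5: geometric centers of the connected domains in the first binary
    image; A6: crop a square (side = 2/5 of the fundus short side) around
    each center and resample it to 224x224."""
    n, _, _, centroids = cv2.connectedComponentsWithStats(binary.astype(np.uint8))
    side = int(min(fundus.shape[:2]) * ratio)
    half = side // 2
    firsts, seconds = [], []
    for cx, cy in centroids[1:]:  # label 0 is the background
        # map the binary-image center onto the fundus image coordinates
        cy = int(cy * fundus.shape[0] / binary.shape[0])
        cx = int(cx * fundus.shape[1] / binary.shape[1])
        y0 = min(max(cy - half, 0), fundus.shape[0] - side)
        x0 = min(max(cx - half, 0), fundus.shape[1] - side)
        first = fundus[y0:y0 + side, x0:x0 + side]
        firsts.append(first)
        seconds.append(cv2.resize(first, (out_size, out_size)))
    return firsts, seconds
```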
And A7, inputting the second image to be selected into a ResNet50 network with adaptive weight parameters for classification, and taking the first image to be selected corresponding to the second image to be selected containing the complete optic disc image as the parent optic disc image according to the classification result.
And A8, inputting the parent optic disc image into the W-shaped segmentation network with adaptive weight parameters to perform secondary image segmentation, obtaining the secondary image segmentation result.
And A9, extracting a fine optic disc image from the corresponding area of the parent optic disc image based on the secondary image segmentation result.
It should be noted that the secondary image segmentation result is generally a third binary image indicating the area where the optic disc image is located; the fine optic disc image is obtained by extracting the corresponding area of the parent optic disc image according to the area indicated in the third binary image.
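For illustration, a minimal sketch of this extraction step (step A9), assuming the third binary image is a NumPy array aligned with the parent optic disc image and the extraction uses the bounding box of the indicated area:

```python
import numpy as np

def extract_fine_disc(parent_img, third_binary):
    """A9: crop the parent optic disc image to the bounding box of the
    optic disc area indicated in the third binary image."""
    ys, xs = np.nonzero(third_binary)
    if ys.size == 0:
        return None  # no optic disc area indicated
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    return parent_img[y0:y1, x0:x1]
```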
Since the systems/apparatus described in the above embodiments of the present invention are used to implement the methods of the above embodiments, a person skilled in the art can understand their specific structure and modifications based on the methods described above, and a detailed description is therefore omitted here. All systems/apparatus adopted by the methods of the above embodiments are within the intended scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third and the like is for convenience only and does not denote any order. These words are to be understood as part of the name of the component.
Furthermore, it should be noted that in the description of the present specification, the description of the term "one embodiment", "some embodiments", "examples", "specific examples" or "some examples", etc., means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, the claims should be construed to include preferred embodiments and all changes and modifications that fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention should also include such modifications and variations.

Claims (10)

1. A method for segmenting an optic disc image in a fundus image, comprising:
s1, performing data quality enhancement processing on a fundus image to be segmented based on a generative adversarial network to obtain a fundus image with enhanced data quality;
s2, performing primary image segmentation on the fundus image with enhanced data quality based on a first image segmentation network to obtain a first binary image containing at least one connected domain, wherein the connected domain is used for marking the position information of the optic disc image;
s3, based on the position information of each connected domain in the first binary image, cropping the region corresponding to the connected domain in the fundus image to be segmented to obtain one or more first images to be selected, wherein at least one of the first images to be selected comprises a complete optic disc image; resampling the first image to be selected according to a second preset size to obtain a second image to be selected corresponding to the first image to be selected; classifying the second image to be selected, based on an image classification network, according to whether it contains the complete optic disc image, and taking the first image to be selected corresponding to a second image to be selected containing the complete optic disc image as the parent optic disc image based on the classification result;
s4, performing secondary image segmentation on the parent optic disc image based on a second image segmentation network, and extracting a fine optic disc image from the parent optic disc image based on the secondary image segmentation result;
the training data used for pre-training the generative adversarial network comprises an original style data set and a target style data set, wherein the original style data set is an image set obtained from a first public data set through preprocessing, and the target style data set is an image set obtained from a second public data set through preprocessing; the training data used for pre-training the first image segmentation network is an image set obtained from the second public data set and subjected to preprocessing.
2. The segmentation method according to claim 1, wherein in S4, the second image segmentation network is a W-shaped segmentation network with adaptive weight parameters, which is constructed in advance and obtained through a first pre-training process,
the W-shaped segmentation network comprises a down-sampling branch, a first up-sampling branch and a second up-sampling branch, wherein the down-sampling branch is used for obtaining a feature map of at least one dimension of the parent optic disc image, the first up-sampling branch is used for extracting boundary information from the feature map of the at least one dimension, and the second up-sampling branch is used for extracting region information from the feature map of the at least one dimension and fusing it with the boundary information extracted at the corresponding dimension to obtain the secondary image segmentation result.
3. The segmentation method according to claim 2, wherein the W-shaped segmentation network comprises 15 processing modules, wherein the 1st to 4th processing modules connected in sequence form the down-sampling branch, the 5th to 9th processing modules connected in sequence form the first up-sampling branch, and the 10th to 15th processing modules connected in sequence form the second up-sampling branch; the output end of the 4th processing module is respectively connected with the input end of the 5th processing module and the input end of the 10th processing module;
wherein,
the 1st to 4th processing modules are all down-sampling modules;
the 5 th processing module is a DAC module;
the 6 th processing module is an RMP module;
the 7 th processing module is an up-sampling module;
the 8 th processing module is an up-sampling module, and the input end of the 8 th processing module is connected with the output ends of the 3 rd processing module and the 7 th processing module through a jump connection module;
the 9 th processing module is an up-sampling module, and the input end of the 9 th processing module is connected with the output ends of the 2 nd processing module and the 8 th processing module through a jump connection module;
the 10 th processing module is a DAC module;
the 11 th processing module is an RMP module;
the 12 th processing module is an up-sampling module, and the input end of the 12 th processing module is connected with the output ends of the 6 th processing module and the 11 th processing module through a jump connection module;
the 13 th processing module is an up-sampling module, and the input end of the 13 th processing module is respectively connected with the output ends of the 12 th processing module, the 3 rd processing module and the 7 th processing module through a jump connection module;
the 14 th processing module is an up-sampling module, and the input end of the 14 th processing module is respectively connected with the output ends of the 13 th processing module, the 2 nd processing module and the 8 th processing module through a jump connection module;
the 15 th processing module is an up-sampling module, the input end of the 15 th processing module is respectively connected with the output ends of the 14 th processing module, the 1 st processing module and the 9 th processing module through a jump connection module, and the 15 th processing module is used for outputting a secondary image segmentation result.
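For illustration of the module wiring recited above, a minimal PyTorch skeleton under simplifying assumptions: a fixed channel width, skip features resized to a common scale before concatenation, and plain convolution blocks standing in for the DAC modules (5, 10) and RMP modules (6, 11); all layer definitions are illustrative rather than the patented implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C = 32  # fixed channel width keeps the skip concatenations simple

def conv(cin):
    return nn.Sequential(nn.Conv2d(cin, C, 3, padding=1),
                         nn.BatchNorm2d(C), nn.ReLU(inplace=True))

def fuse(size, *feats):
    """Resize every feature map to `size` and concatenate along channels."""
    feats = [F.interpolate(f, size=size, mode='bilinear', align_corners=False)
             for f in feats]
    return torch.cat(feats, dim=1)

class WNet(nn.Module):
    """Skeleton of the 15-module wiring of claim 3. Modules 5/10 (DAC) and
    6/11 (RMP) are stand-in convolution blocks here."""
    def __init__(self):
        super().__init__()
        self.m1, self.m2, self.m3, self.m4 = conv(3), conv(C), conv(C), conv(C)
        self.pool = nn.MaxPool2d(2)
        self.m5, self.m6, self.m7 = conv(C), conv(C), conv(C)
        self.m8, self.m9 = conv(2 * C), conv(2 * C)
        self.m10, self.m11 = conv(C), conv(C)
        self.m12 = conv(2 * C)
        self.m13, self.m14 = conv(3 * C), conv(3 * C)
        self.m15 = nn.Sequential(conv(3 * C), nn.Conv2d(C, 1, 1))

    def forward(self, x):
        sz = lambda f: f.shape[2:]
        f1 = self.m1(x)
        f2 = self.m2(self.pool(f1))
        f3 = self.m3(self.pool(f2))
        f4 = self.m4(self.pool(f3))
        b = self.pool(f4)                      # module 4 output feeds 5 and 10
        f5 = self.m5(b)
        f6 = self.m6(f5)
        f7 = self.m7(F.interpolate(f6, size=sz(f4), mode='bilinear',
                                   align_corners=False))
        f8 = self.m8(fuse(sz(f3), f3, f7))     # skips: modules 3 and 7
        f9 = self.m9(fuse(sz(f2), f2, f8))     # skips: modules 2 and 8
        f10 = self.m10(b)
        f11 = self.m11(f10)
        f12 = self.m12(fuse(sz(f4), f6, f11))  # skips: modules 6 and 11
        f13 = self.m13(fuse(sz(f3), f12, f3, f7))
        f14 = self.m14(fuse(sz(f2), f13, f2, f8))
        out = self.m15(fuse(sz(f1), f14, f1, f9))
        return torch.sigmoid(out)              # secondary segmentation result
```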
4. The segmentation method according to claim 3, wherein the first pre-training process comprises:
acquiring first training data and, based on the first training data, training the W-shaped segmentation network to be trained using a boundary-based loss function for the first up-sampling branch and a region-based loss function for the second up-sampling branch, to obtain the W-shaped segmentation network with adaptive weight parameters;

the first training data comprises a plurality of first images containing optic disc images, and each first image comprises annotation information marking the area of the optic disc image on the first image.
5. The segmentation method according to claim 4, wherein training the W-shaped segmentation network to be trained using the boundary-based loss function for the first up-sampling branch and the region-based loss function for the second up-sampling branch comprises:

connecting an up-sampling module to the output end of the 9th processing module to output the boundary segmentation result produced by the first up-sampling branch;

inputting a first image from the first training data into the W-shaped segmentation network to be trained to obtain a boundary segmentation result and a secondary image segmentation result;

calculating the boundary loss according to the boundary-based loss function, from the boundary segmentation result and the annotation information contained in the first image; calculating the region loss according to the region-based loss function, from the secondary image segmentation result and the annotation information contained in the first image;

computing a weighted sum of the boundary loss and the region loss to obtain the overall loss, and adjusting the weight parameters of the W-shaped segmentation network based on the overall loss to obtain the W-shaped segmentation network with adaptive weight parameters;
the boundary-based loss function is:

$$BL = \sum_{q \in \Omega} \phi_G(q)\, s_\theta(q) \tag{1}$$

where $BL$ is the boundary loss, $G$ is the region corresponding to the annotation information in the first image, $\Omega$ is the image domain, $s_\theta(q)$ is the value at point $q$ of the positive-class result in the boundary segmentation results, and $\phi_G(q)$ is obtained according to formula (2), the formula (2) being

$$\phi_G(q) = \begin{cases} -D_G(q), & q \in G \\ D_G(q), & q \notin G \end{cases} \tag{2}$$

where $D_G(q)$ denotes the shortest distance from point $q$ to the edge of the region corresponding to the annotation information.
6. The segmentation method according to any one of claims 1 to 5, wherein S1 further comprises, before inputting the fundus image to be segmented into the generative adversarial network, performing background removal processing on the fundus image to be segmented;
the background removing process comprises the following steps:
converting an image to be subjected to background removal processing into a gray image; based on a threshold value method, updating the gray value of the pixel point with the gray value larger than 5 in the gray image to 255, and updating the gray value of the pixel point with the gray value smaller than 5 to 0 to obtain a second binary image; and extracting an intermediate image of a corresponding region in the image to be subjected to background removal processing based on the region information of the largest connected domain in the second binary image, and resampling the intermediate image according to a first preset size to obtain the image subjected to background removal processing.
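For illustration, a minimal sketch of this background-removal procedure, assuming OpenCV and an illustrative first preset size of 512 × 512 pixels (the preset size is an assumption for the example):

```python
import cv2
import numpy as np

def remove_background(img, thresh=5, preset=(512, 512)):
    """Threshold the gray image, keep the largest connected domain
    (the fundus area), crop it out, and resample to the preset size."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    # stats[0] is the background; pick the largest foreground domain
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    x, y, w, h = stats[largest, :4]
    return cv2.resize(img[y:y + h, x:x + w], preset)
```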
7. The segmentation method according to claim 6, wherein the first public dataset is the IDRiD dataset and the second public dataset is the REFUGE dataset;
in S1, the generative adversarial network is a Cycle-GAN network with adaptive weight parameters obtained through a second pre-training process;
the second pre-training process comprises: acquiring second training data, and training the Cycle-GAN network to be trained on the basis of the second training data to obtain the Cycle-GAN network with adaptive weight parameters;
the second training data comprises a raw style data set and a target style data set, wherein,
the original style data set comprises a plurality of images obtained from the IDRiD data set and subjected to background removal processing; the target style data set comprises a plurality of images obtained from the Validation data set of the REFUGE data set and subjected to background removal processing.
8. The segmentation method according to claim 7, wherein in S2, the first image segmentation network is a U-Net network with adaptive weight parameters obtained through a third pre-training process;
the third pre-training process comprises: acquiring third training data, and training the U-Net network to be trained based on the third training data to obtain the U-Net network with adaptive weight parameters;
the obtaining third training data comprises: acquiring a plurality of original images respectively from the Train data set and the Validation data set of the REFUGE data set, wherein the original images comprise annotation information marking the optic disc image area; performing data expansion processing on the original images to obtain original training data; and performing background removal processing on the images in the original training data to obtain the third training data;
the data expansion processing comprises at least one of image rotation processing, image color random transformation processing and image random noise adding processing.
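A minimal sketch of such data expansion processing, assuming NumPy/OpenCV and a color (H × W × 3) input; the rotation range, color gains, and noise level are illustrative choices, not values from the patent:

```python
import cv2
import numpy as np

def expand(image, rng=np.random.default_rng()):
    """Apply the three expansion operations of claim 8: image rotation,
    random color transformation, and random noise addition."""
    out = []
    # image rotation processing
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), float(rng.uniform(-30, 30)), 1.0)
    out.append(cv2.warpAffine(image, m, (w, h)))
    # image color random transformation (per-channel gain)
    gains = rng.uniform(0.8, 1.2, size=3)
    out.append(np.clip(image * gains, 0, 255).astype(np.uint8))
    # image random noise adding (Gaussian)
    noise = rng.normal(0, 10, image.shape)
    out.append(np.clip(image + noise, 0, 255).astype(np.uint8))
    return out
```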
9. The segmentation method according to claim 8, wherein in S3, cropping the region corresponding to the connected domain in the fundus image to be segmented based on the position information of each connected domain in the first binary image to obtain one or more first images to be selected comprises:
for each connected domain in the first binary image, calculating the geometric center of the connected domain; and, for the fundus image to be segmented, determining a first square target area centered on the position corresponding to the geometric center, with a side length equal to 2/5 of the short side of the fundus image to be segmented, and cropping the fundus image to be segmented based on the first square target area to obtain one or more first images to be selected.
10. The segmentation method according to claim 9, wherein in S3, the image classification network is a ResNet50 network with adaptive weight parameters obtained through a fourth pre-training process;
the fourth pre-training process comprises: acquiring fourth training data, and training the ResNet50 network to be trained on the basis of the fourth training data to obtain the ResNet50 network with adaptive weight parameters;
the fourth training data comprises a positive example set and a negative example set, and the obtaining fourth training data comprises: cropping the images in the original training data, taking the cropped images containing the complete optic disc image as the positive example set, and taking the cropped images partially containing the optic disc image or containing no optic disc image as the negative example set;
the clipping process includes:
scaling the image to be cropped to 128 pixels × 128 pixels; sliding a 32 pixel × 32 pixel cropping frame over the scaled image with a stride of 8 pixels to determine a plurality of cropping regions; mapping the cropping regions back onto the image to be cropped through the inverse scaling transformation and cropping to obtain a plurality of sub-images; and resampling the sub-images to the second preset size to obtain the cropped images.
CN202211269330.1A 2022-10-17 2022-10-17 Method for segmenting optic disc image in fundus image Pending CN115641344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211269330.1A CN115641344A (en) 2022-10-17 2022-10-17 Method for segmenting optic disc image in fundus image

Publications (1)

Publication Number Publication Date
CN115641344A true CN115641344A (en) 2023-01-24

Family

ID=84943983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211269330.1A Pending CN115641344A (en) 2022-10-17 2022-10-17 Method for segmenting optic disc image in fundus image

Country Status (1)

Country Link
CN (1) CN115641344A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994100A (en) * 2023-09-28 2023-11-03 北京鹰瞳科技发展股份有限公司 Model training method and device, electronic equipment and storage medium
CN116994100B (en) * 2023-09-28 2023-12-22 北京鹰瞳科技发展股份有限公司 Model training method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109034208B (en) High-low resolution combined cervical cell slice image classification system
CN112465828B (en) Image semantic segmentation method and device, electronic equipment and storage medium
CN110473243B (en) Tooth segmentation method and device based on depth contour perception and computer equipment
EP4002268A1 (en) Medical image processing method, image processing method, and device
CN111445478A (en) Intracranial aneurysm region automatic detection system and detection method for CTA image
CN112200773A (en) Large intestine polyp detection method based on encoder and decoder of cavity convolution
CN114219943A (en) CT image organ-at-risk segmentation system based on deep learning
Ganesh et al. A novel context aware joint segmentation and classification framework for glaucoma detection
CN112699885A (en) Semantic segmentation training data augmentation method and system based on antagonism generation network GAN
CN115331012B (en) Joint generation type image instance segmentation method and system based on zero sample learning
CN113076861A (en) Bird fine-granularity identification method based on second-order features
CN115641344A (en) Method for segmenting optic disc image in fundus image
CN109858498A (en) A kind of feature extracting method for caryogram cataract image
CN115984550A (en) Automatic segmentation method for eye iris pigmented spot texture
CN115775226A (en) Transformer-based medical image classification method
CN113570540A (en) Image tampering blind evidence obtaining method based on detection-segmentation architecture
CN113989814A (en) Image generation method and device, computer equipment and storage medium
CN117315210B (en) Image blurring method based on stereoscopic imaging and related device
CN111062347A (en) Traffic element segmentation method in automatic driving, electronic device and storage medium
CN112446292B (en) 2D image salient object detection method and system
CN117593317A (en) Retina blood vessel image segmentation method based on multi-scale dilation convolution residual error network
CN117217997A (en) Remote sensing image super-resolution method based on context perception edge enhancement
CN113255704B (en) Pixel difference convolution edge detection method based on local binary pattern
CN115100731A (en) Quality evaluation model training method and device, electronic equipment and storage medium
CN111553925B (en) FCN-based end-to-end crop image segmentation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination