CN108764342B - Semantic segmentation method for optic discs and optic cups in fundus image - Google Patents

Semantic segmentation method for optic discs and optic cups in fundus image

Info

Publication number
CN108764342B
Authority
CN
China
Prior art keywords
network, layer, unit, series, convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810534400.9A
Other languages
Chinese (zh)
Other versions
CN108764342A (en)
Inventor
刘少鹏
贾西平
关立南
林智勇
高维奇
欧阳佳
梁杰鹏
廖秀秀
马震远
洪佳明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority to CN201810534400.9A priority Critical patent/CN108764342B/en
Publication of CN108764342A publication Critical patent/CN108764342A/en
Application granted granted Critical
Publication of CN108764342B publication Critical patent/CN108764342B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention provides a semantic segmentation method for optic discs and optic cups in a fundus image, which comprises the following steps: preprocessing any fundus image to obtain fundus image data; obtaining standard segmentation image data about the optic disc and the optic cup in the preprocessed fundus image; initializing parameters of the constructed semantic segmentation network, generator network and discriminator network; inputting the fundus image data into the semantic segmentation network to generate first generated sample data, and inputting the standard segmentation image data into the generator network to generate second generated sample data; inputting the first generated sample data, the second generated sample data and the original sample data into the discriminator network for processing and training; and inputting fundus image data obtained after any fundus image is preprocessed into the trained semantic segmentation network for semantic segmentation to generate the expected segmented image data.

Description

Semantic segmentation method for optic discs and optic cups in fundus image
Technical Field
The invention relates to the technical field of image processing, and in particular to a semantic segmentation method for optic discs and optic cups in a fundus image.
Background
Glaucoma is a leading cause of irreversible blindness; it is characterized by a progressive loss of optic nerve axons that currently cannot be recovered. However, early detection can significantly slow down or even stop the progression of glaucomatous optic neuropathy, so early screening for glaucoma is clinically significant. Glaucoma typically manifests as a specific, abnormal appearance of the optic nerve head: optic disc cupping and loss of the neuroretinal rim, usually reflected as an increased cup-to-disc ratio (CDR). The CDR is considered one of the important indicators for assessing the degree of progression of glaucoma and glaucomatous optic neuropathy in patients. The cup-to-disc ratio is the ratio between the optic cup and the optic disc in the fundus image and is an important index for early glaucoma screening: the larger the CDR value, the higher the probability of glaucoma. The key to calculating the CDR index is therefore how to accurately segment the optic disc and optic cup regions of the fundus image.
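As an illustrative aside (not part of the patent's method), the following is a minimal sketch of how a vertical CDR value could be computed once binary optic-disc and optic-cup masks are available; the NumPy representation, the function name and the use of vertical diameters are assumptions.

```python
import numpy as np

def vertical_cdr(disc_mask: np.ndarray, cup_mask: np.ndarray) -> float:
    """Vertical cup-to-disc ratio from binary 2D masks (assumed convention)."""
    disc_rows = np.where(disc_mask.any(axis=1))[0]   # rows containing optic-disc pixels
    cup_rows = np.where(cup_mask.any(axis=1))[0]     # rows containing optic-cup pixels
    if disc_rows.size == 0 or cup_rows.size == 0:
        raise ValueError("empty segmentation mask")
    disc_height = disc_rows.max() - disc_rows.min() + 1   # vertical disc diameter (pixels)
    cup_height = cup_rows.max() - cup_rows.min() + 1      # vertical cup diameter (pixels)
    return cup_height / disc_height
```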
Existing methods for semantic segmentation of the optic disc and optic cup in fundus images use computer vision techniques and deep learning techniques. Semantic segmentation classifies every pixel of an image. Traditional computer vision techniques combine image brightness, color and contrast enhancement, Graph Cut, edge detection, morphology and other methods to process and analyze the fundus image, extract effective feature information, and detect the optic disc and the optic cup separately. Because such methods rely heavily on manual experience and process only small amounts of data, the generalization ability of the model is poor, the disc and cup segmentation quality needs to be improved, and the practical value for wide deployment is limited.
Deep learning can automatically extract image features without manual intervention and is well suited to tasks such as image semantic segmentation; analyzing glaucoma medical images with deep learning is therefore a research hotspot. A fundus image is input into a fully convolutional semantic segmentation network such as the U-Net model, which computes and outputs segmentation results for the optic disc and optic cup of the fundus image, and the network parameters are trained by backpropagation. However, directly applying an existing fully convolutional semantic segmentation network to disc and cup segmentation of fundus images ignores the spatial and positional relationship between the optic disc and the optic cup, so there is high-order inconsistency between the output result and the real fundus image. In addition, real annotated samples of optic disc and optic cup segmentation images are scarce, because such samples must be annotated by expert physicians who study glaucoma; since the annotation is done manually by these experts, the number of labeled samples is very limited, which poses a serious obstacle to deep learning with an existing fully convolutional semantic segmentation network such as the U-Net model.
In summary, how to construct a deep learning-based semantic segmentation model for the optic disc and optic cup in fundus images, and how to further optimize the segmentation result, is a problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to provide a semantic segmentation method for the optic disc and optic cup in a fundus image so as to address the above problems.
The invention provides a semantic segmentation method for optic discs and optic cups in a fundus image, which comprises the following steps:
preprocessing any fundus image to obtain fundus image data x;
obtaining standard segmentation image data y about the optic disc and the optic cup in the preprocessed fundus image;
initializing parameters of the constructed semantic segmentation network, the generator network and the discriminator network;
inputting the fundus image data x into the semantic segmentation network for semantic segmentation, generating segmented image data y', thereby constituting first generated sample data (x, y');
inputting the standard segmentation image data y into the generator network for processing, generating fundus image data x', thereby constituting second generated sample data (x', y);
inputting the first generated sample data (x, y'), the second generated sample data (x', y) and the original sample data (x, y) into the discriminator network for processing, and judging and outputting true/false results for the first generated sample data (x, y') and the second generated sample data (x', y) based on the original sample data (x, y), wherein, according to the true/false result obtained each time, the parameters of the discriminator network are updated using an optimization algorithm and the parameters of the semantic segmentation network and the generator network are updated, so that generative adversarial network training proceeds until Nash equilibrium is reached and training is complete;
and inputting fundus image data obtained after any fundus image is preprocessed into the trained semantic segmentation network for semantic segmentation, generating the expected segmented image data.
Wherein the preprocessing comprises a cropping process. Alternatively, the preprocessing comprises a cropping process, a rotation process, and a color contrast enhancement process.
Wherein, at initialization, parameter values of the semantic segmentation network, the generator network and the discriminator network are predetermined or random.
The constructed semantic segmentation network adopts a fully convolutional network (FCN) oriented to the semantic segmentation task. The FCN comprises 2 first convolutional layer units, 3 second convolutional layer units and 1 third convolutional layer unit connected in series in sequence, the third convolutional layer unit being used to realize end-to-end mapping. Each first convolutional layer unit comprises a first convolutional layer, an excitation operation unit ReLU connected in series to its output, a second convolutional layer, an excitation operation unit ReLU connected in series to its output, and a maximum pooling layer MaxPool2d. Each second convolutional layer unit comprises 3 convolutional layers connected in series, an excitation operation unit ReLU connected in series to the output of each convolutional layer, and a maximum pooling layer MaxPool2d connected in series at the end. The third convolutional layer unit comprises a convolutional layer, an excitation operation unit ReLU, a convolutional layer, an excitation operation unit ReLU and a convolutional layer connected in series in sequence. The 2 first convolutional layer units are referred to in sequence as the first and second layer blocks, the 3 second convolutional layer units as the third, fourth and fifth layer blocks, and the third convolutional layer unit as the sixth layer block. The output of the sixth layer block is upsampled by a factor of 2, convolved, and fused with the output of the fourth layer block to obtain a first result; the first result is upsampled by a factor of 2, convolved, and fused with the output of the third layer block to obtain a second result; finally, the second result is upsampled by a factor of 8 to obtain the segmented image data.
Wherein, the structure of the constructed generator network is as follows: the network is 62 layers deep and comprises four different kinds of network units, namely: 1) a network unit comprising a convolutional network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN, a rectified linear unit ReLU, a convolutional network and a batch normalization unit BN connected in series; 3) a network unit comprising a deconvolution network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 4) a network unit comprising a deconvolution network and an activation unit Tanh connected in series. The first to third layer blocks, connected in series, are all network units of type 1): the convolutional network in the first layer block consists of 64 filters of 7 × 3, that in the second layer block of 128 filters of 3 × 64, and that in the third layer block of 256 filters of 3 × 128, used to generate 256 feature maps. The fourth layer block is a residual network unit of type 2), in which the two convolutional networks each consist of 256 filters of 3 × 256; 9 such fourth layer blocks are connected in series. The fifth and sixth layer blocks, connected in series, are network units of type 3): the deconvolution network in the fifth layer block consists of 128 filters of 3 × 256, and that in the sixth layer block of 64 filters of 3 × 128. The last layer block is a network unit of type 4) comprising a deconvolution network and an activation unit Tanh connected in series; its deconvolution network uses 3 filters of 7 × 64 and realizes end-to-end mapping to reconstruct the output result. All of the above layer blocks are connected in series.
Wherein, the structure of the constructed discriminator network is as follows: the network is 12 layers deep and comprises three different kinds of network units, namely: 1) a network unit comprising a convolutional network and an excitation operation unit LeakyReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN and an excitation operation unit LeakyReLU connected in series; 3) a convolutional network. The first layer block of the discriminator network is a network unit of type 1) and consists of 64 filters of 4 × 6. The second to fourth layer blocks are all network units of type 2), with filter specifications of 128 filters of 4 × 64, 256 filters of 4 × 128 and 512 filters of 4 × 256 in turn. The last layer is a convolutional network consisting of 1 filter of 4 × 512, used to realize end-to-end mapping and reconstruct the output result.
According to the semantic segmentation method for the optic disc and the optic cup in the fundus image, only a small number (even as few as one) of real standard segmentation images of the optic disc and the optic cup need to be utilized, which alleviates the problem that labeled standard samples are too few; through deep learning training, the optic disc and the optic cup in the fundus image are accurately semantically segmented, the high-order consistency of the segmentation result is guaranteed, and the segmentation accuracy of the optic disc and the optic cup is improved.
Drawings
FIG. 1 is an exemplary diagram of an eye fundus map;
FIG. 2 is a standard segmentation chart for the fundus image shown in FIG. 1 for the optic disc and optic cup;
FIG. 3 is a schematic flow chart of a semantic segmentation method for discs and cups in fundus images according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a semantic segmentation network provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention and the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings and the specific embodiments of the present invention.
Fig. 3 is a schematic flow chart of a semantic segmentation method for optic discs and optic cups in fundus images according to a first embodiment of the present invention. As shown in Fig. 3, the semantic segmentation method for the optic disc and the optic cup in the fundus image comprises the following steps:
step 1: any one fundus image is preprocessed to obtain fundus image data x.
Wherein the preprocessing includes a cropping process, for example cropping the large original fundus photograph into a fundus image of the desired small size, such as a 256 × 256 fundus image in JPG format.
Wherein the preprocessing may also include rotation and color contrast enhancement in addition to cropping. Rotation increases the number of training samples, for example by rotating the fundus image by 90 degrees or by several other angles. Similarly, color contrast enhancement also increases the number of training samples.
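For illustration, a minimal preprocessing sketch along the lines described above; the library choice (Pillow), the center-crop strategy, the rotation angles and the contrast factor are assumptions rather than values fixed by the patent.

```python
from PIL import Image, ImageEnhance

def preprocess(path: str, size: int = 256, angles=(0, 90, 180, 270), contrast: float = 1.5):
    """Crop/resize a fundus photograph to size x size and augment it by rotation and
    color-contrast enhancement, yielding several training samples per image."""
    img = Image.open(path).convert("RGB")
    # Center-crop to a square region before resizing (cropping strategy is an assumption).
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((size, size))
    samples = []
    for angle in angles:                                   # rotation augmentation
        rotated = img.rotate(angle)
        samples.append(rotated)
        samples.append(ImageEnhance.Contrast(rotated).enhance(contrast))  # contrast augmentation
    return samples
```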
Step 2: standard segmentation image data y on the optic disc and optic cup in the pre-processed fundus image is obtained.
A real standard segmentation image of the optic disc and optic cup can be made manually by a specialist in glaucoma for the preprocessed fundus image, from which the technician obtains the corresponding standard segmentation image data y. Note that, as mentioned in the background, the number of annotated samples is very limited, because real segmented-image samples of the optic disc and optic cup must be annotated manually by expert physicians who study glaucoma. The invention aims to achieve accurate semantic segmentation of the optic disc and the optic cup in the fundus image through deep learning training using only a small number (even as few as one) of real standard segmentation images of the optic disc and the optic cup.
Step 3: parameters of the constructed semantic segmentation network, the generator network and the discriminator network are initialized.
The parameter values for the semantic segmentation network, the generator network and the discriminator network may be predetermined or, alternatively, random at initialization.
How to construct the semantic segmentation network, the generator network, and the discriminator network will be described in detail later.
Step 4: the fundus image data x is input into the semantic segmentation network and semantically segmented to generate segmented image data y', thereby constituting the first generated sample data (x, y').
In the first generated sample data (x, y'), x is the fundus image data obtained by preprocessing a fundus image, and y' is the segmented image data of the optic disc and optic cup obtained by semantic segmentation.
Step 5: the standard segmentation image data y is input into the generator network and processed to generate fundus image data x', thereby constituting the second generated sample data (x', y).
In the second generated sample data (x', y), x' is the fundus image data generated by inputting the standard segmentation image data y of the optic disc and optic cup into the generator network, and y is that standard segmentation image data itself.
Step 6: the first generated sample data (x, y'), the second generated sample data (x', y) and the original sample data (x, y) are input into the discriminator network for processing; true/false results for the first and second generated sample data are judged and output based on the original sample data (x, y); according to the true/false result obtained each time, the parameters of the discriminator network are updated using an optimization algorithm, and the parameters of the semantic segmentation network and the generator network are updated, so that generative adversarial network training proceeds until Nash equilibrium is reached and training is complete.
In step 6, the optimization algorithm is the Adam algorithm, and training follows the generative adversarial network (GAN, Generative Adversarial Networks) paradigm: the semantic segmentation network, the generator network and the discriminator network are trained against one another until Nash equilibrium is reached, at which point training is complete. For the true/false result output by the discriminator network, 1 is typically used to indicate true and 0 to indicate false; other agreed numerical representations may also be used, the above being a non-limiting example.
In brief, the networks are continuously optimized by having the discriminator network judge whether its input is real data or generated sample data. During training, one side is fixed while the other is updated, and the two sides iterate alternately, each trying to maximize the other's error, until Nash equilibrium is reached. At that point, the segmented image data y' generated by the semantic segmentation network differs little or not at all from the real standard segmentation image data y, the fundus image data x' generated by the generator network differs little or not at all from the real fundus image data x, and the discriminator network can no longer correctly distinguish generated sample data from real data.
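For illustration only, a simplified sketch of one alternating training iteration with the Adam optimizer, assuming the semantic segmentation network S, generator G and discriminator D are constructed as described below and that the losses follow the weighting α introduced later in this description; the learning rates, the binary cross-entropy formulation of the adversarial terms and all variable names are assumptions.

```python
import torch
import torch.nn.functional as F

# S, G, D are assumed to be the semantic segmentation, generator and discriminator networks;
# x, y are batches of fundus images and standard segmentation maps.
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_sg = torch.optim.Adam(list(S.parameters()) + list(G.parameters()), lr=2e-4, betas=(0.5, 0.999))

def train_step(x, y, alpha=0.5):
    y_fake, x_fake = S(x), G(y)
    # 1) Update the discriminator: real pairs -> 1, generated pairs -> 0 (generators fixed).
    real = D(x, y)
    fake1, fake2 = D(x, y_fake.detach()), D(x_fake.detach(), y)
    loss_d = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) \
        + alpha * F.binary_cross_entropy_with_logits(fake1, torch.zeros_like(fake1)) \
        + (1 - alpha) * F.binary_cross_entropy_with_logits(fake2, torch.zeros_like(fake2))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) Update S and G: fool the discriminator and stay close to the real images (L1 term).
    fake1, fake2 = D(x, y_fake), D(x_fake, y)
    loss_sg = alpha * F.binary_cross_entropy_with_logits(fake1, torch.ones_like(fake1)) \
        + (1 - alpha) * F.binary_cross_entropy_with_logits(fake2, torch.ones_like(fake2)) \
        + F.l1_loss(y_fake, y) + F.l1_loss(x_fake, x)
    opt_sg.zero_grad(); loss_sg.backward(); opt_sg.step()
    return loss_d.item(), loss_sg.item()
```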
Step 7: fundus image data obtained by preprocessing any fundus image is input into the trained semantic segmentation network for semantic segmentation, generating the expected segmented image data.
Thanks to the adversarial training, the semantic segmentation network's ability to segment accurately is greatly improved: fundus image data obtained by preprocessing any fundus image can be input into the semantic segmentation network, which generates the expected, realistic segmented image data, from which the CDR index can be calculated and used as one of the important bases for early glaucoma screening.
The specific construction of the semantic segmentation network, generator network and discriminator network used by the present invention is described below.
The constructed semantic segmentation network adopts a fully convolutional network (FCN) oriented to the semantic segmentation task, as shown in the schematic diagram of the convolutional network framework in FIG. 4. The FCN comprises 2 first convolutional layer units, 3 second convolutional layer units and 1 third convolutional layer unit connected in series in sequence, the third convolutional layer unit being used to realize end-to-end mapping. Each first convolutional layer unit comprises a first convolutional layer, an excitation operation unit ReLU connected in series to its output, a second convolutional layer, an excitation operation unit ReLU connected in series to its output, and a maximum pooling layer MaxPool2d. Each second convolutional layer unit comprises 3 convolutional layers connected in series, an excitation operation unit ReLU connected in series to the output of each convolutional layer, and a maximum pooling layer MaxPool2d connected in series at the end. The third convolutional layer unit comprises a convolutional layer, an excitation operation unit ReLU, a convolutional layer, an excitation operation unit ReLU and a convolutional layer connected in series in sequence. Here, the 2 first convolutional layer units are referred to in sequence as the first and second layer blocks, the 3 second convolutional layer units as the third, fourth and fifth layer blocks, and the third convolutional layer unit as the sixth layer block. The output of the sixth layer block is upsampled by a factor of 2, convolved, and fused with the output of the fourth layer block to obtain a first result; the first result is upsampled by a factor of 2, convolved, and fused with the output of the third layer block to obtain a second result; finally, the second result is upsampled by a factor of 8 to obtain the segmented image data.
The size of the segmented image obtained after 8 times of upsampling is the same as that of the original fundus image.
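As an illustrative sketch only, the segmentation network just described can be arranged in PyTorch as follows (two 2-convolution units, three 3-convolution units, a final convolution unit, FCN-style fusion with the third and fourth layer blocks, then 8× upsampling); the channel widths, the 1×1 score convolutions and the interpolation mode are assumptions, since the patent text does not fix them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch, n_convs):
    """n_convs x (Conv3x3 + ReLU) followed by MaxPool2d, as in the first/second layer units."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class FCNSegNet(nn.Module):
    """Minimal FCN-8s-style sketch of the semantic segmentation network."""
    def __init__(self, n_classes=3):            # e.g. background / optic disc / optic cup (assumed)
        super().__init__()
        self.block1 = conv_block(3,   64, 2)     # first convolutional layer unit
        self.block2 = conv_block(64, 128, 2)     # first convolutional layer unit
        self.block3 = conv_block(128, 256, 3)    # second convolutional layer unit (skip at 1/8)
        self.block4 = conv_block(256, 512, 3)    # second convolutional layer unit (skip at 1/16)
        self.block5 = conv_block(512, 512, 3)    # second convolutional layer unit
        self.block6 = nn.Sequential(             # third convolutional layer unit: conv-ReLU-conv-ReLU-conv
            nn.Conv2d(512, 1024, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(1024, 1024, 1), nn.ReLU(inplace=True),
            nn.Conv2d(1024, n_classes, 1))
        self.score4 = nn.Conv2d(512, n_classes, 1)   # convolution applied to the fourth-block output
        self.score3 = nn.Conv2d(256, n_classes, 1)   # convolution applied to the third-block output

    def forward(self, x):
        s3 = self.block3(self.block2(self.block1(x)))          # 1/8 resolution
        s4 = self.block4(s3)                                    # 1/16 resolution
        out = self.block6(self.block5(s4))                      # 1/32 resolution
        out = F.interpolate(out, scale_factor=2) + self.score4(s4)   # fuse with fourth layer block
        out = F.interpolate(out, scale_factor=2) + self.score3(s3)   # fuse with third layer block
        return F.interpolate(out, scale_factor=8)               # 8x upsampling back to input size
```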
Wherein, the structure of the constructed generator network is as follows: the network is 62 layers deep and comprises four different kinds of network units, namely: 1) a network unit comprising a convolutional network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN, a rectified linear unit ReLU, a convolutional network and a batch normalization unit BN connected in series; 3) a network unit comprising a deconvolution network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 4) a network unit comprising a deconvolution network and an activation unit Tanh connected in series. The first to third layer blocks, connected in series, are all network units of type 1): the convolutional network in the first layer block consists of 64 filters of 7 × 3, that in the second layer block of 128 filters of 3 × 64, and that in the third layer block of 256 filters of 3 × 128, used to generate 256 feature maps. The fourth layer block is a residual network unit of type 2), in which the two convolutional networks each consist of 256 filters of 3 × 256; 9 such fourth layer blocks are connected in series. The fifth and sixth layer blocks, connected in series, are network units of type 3): the deconvolution network in the fifth layer block consists of 128 filters of 3 × 256, and that in the sixth layer block of 64 filters of 3 × 128. The last layer block is a network unit of type 4) comprising a deconvolution network and an activation unit Tanh connected in series; its deconvolution network uses 3 filters of 7 × 64 and realizes end-to-end mapping to reconstruct the output result. All of the above layer blocks are connected in series.
Here, each convolutional network, each batch normalization unit BN, each rectified linear unit ReLU, each deconvolution network and the activation unit Tanh is counted as one layer, giving 62 layers in total, all connected in series. The structure of the generator network is shown schematically in the following table.
Exemplary generator network architecture:
where, Conv represents a convolutional network, ConvTran represents a deconvolution network, BatchNorm2d represents a batch normalization unit, ReLU represents a modified linear unit, Tanh represents an active unit, IN # represents the number of input channels (for example, IN3 represents that the number of input channels is 3), OUT # represents the number of output channels (for example, OUT64 represents that the number of output channels is 64), K # represents a filter size, S # represents a step size, and P # represents a filling number.
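A hedged PyTorch-style sketch of the generator blocks listed above (three convolutional units, nine residual units, two deconvolution units and a final deconvolution with Tanh) is given below; the strides, paddings and output paddings are assumptions needed to make the tensor shapes work out, not values stated in the text.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-BN-ReLU-Conv-BN unit with a skip connection (network unit type 2)."""
    def __init__(self, ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    """Sketch of the 62-layer generator: 3 convolutional units, 9 residual units,
    2 deconvolution units and a final deconvolution + Tanh."""
    def __init__(self):
        super().__init__()
        def down(i, o, k, s, p):
            return nn.Sequential(nn.Conv2d(i, o, k, s, p), nn.BatchNorm2d(o), nn.ReLU(inplace=True))
        def up(i, o, k, s, p, op):
            return nn.Sequential(nn.ConvTranspose2d(i, o, k, s, p, output_padding=op),
                                 nn.BatchNorm2d(o), nn.ReLU(inplace=True))
        self.net = nn.Sequential(
            down(3, 64, 7, 1, 3),                      # first layer block: 64 filters of 7x3
            down(64, 128, 3, 2, 1),                    # second layer block: 128 filters of 3x64
            down(128, 256, 3, 2, 1),                   # third layer block: 256 filters of 3x128
            *[ResidualBlock(256) for _ in range(9)],   # nine residual fourth layer blocks in series
            up(256, 128, 3, 2, 1, 1),                  # fifth layer block: 128 filters of 3x256
            up(128, 64, 3, 2, 1, 1),                   # sixth layer block: 64 filters of 3x128
            nn.ConvTranspose2d(64, 3, 7, 1, 3),        # last layer block: 3 filters of 7x64
            nn.Tanh())
    def forward(self, y):
        return self.net(y)                             # segmentation map in, fundus image out
```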
Wherein, the structure of the constructed discriminator network is as follows: the network is 12 layers deep and comprises three different kinds of network units, namely: 1) a network unit comprising a convolutional network and an excitation operation unit LeakyReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN and an excitation operation unit LeakyReLU connected in series; 3) a convolutional network. The first layer block of the discriminator network is a network unit of type 1) and consists of 64 filters of 4 × 6. The second to fourth layer blocks are all network units of type 2), with filter specifications of 128 filters of 4 × 64, 256 filters of 4 × 128 and 512 filters of 4 × 256 in turn. The last layer block is a convolutional network consisting of 1 filter of 4 × 512, used to realize end-to-end mapping and reconstruct the output result. All of the above layer blocks are connected in series.
Here, each convolutional network, each batch normalization unit BN and each LeakyReLU unit is counted as one layer, giving 12 layers in total, all connected in series. The structure of the discriminator network is shown schematically in the following table.
Exemplary arbiter network architecture:
where Conv denotes a convolutional network, leakyreu denotes a modified linear unit, BatchNorm2d denotes a batch normalization unit, IN # denotes the number of input channels (e.g., IN6 denotes the number of input channels is 6), OUT # denotes the number of output channels (e.g., OUT64 denotes the number of output channels is 64), K # denotes the filter size, S # denotes the step size, and P # denotes the number of fills.
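A hedged PyTorch-style sketch of the 12-layer discriminator described above is shown below; it takes the concatenated (fundus image, segmentation map) pair as a 6-channel input, and the strides and paddings are assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of the 12-layer discriminator: Conv+LeakyReLU, three Conv+BN+LeakyReLU blocks,
    and a final single-filter convolution producing a real/fake score map."""
    def __init__(self):
        super().__init__()
        def block(i, o, bn=True):
            layers = [nn.Conv2d(i, o, 4, stride=2, padding=1)]
            if bn:
                layers.append(nn.BatchNorm2d(o))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.net = nn.Sequential(
            *block(6, 64, bn=False),                     # first layer block: 64 filters of 4x6, no BN
            *block(64, 128),                             # second layer block: 128 filters of 4x64
            *block(128, 256),                            # third layer block: 256 filters of 4x128
            *block(256, 512),                            # fourth layer block: 512 filters of 4x256
            nn.Conv2d(512, 1, 4, stride=1, padding=1))   # last layer: single 4x512 filter
    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))        # judge the (image, segmentation) pair
```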
Here, the batch normalization unit BN (BatchNorm2d) performs the normalization operation, preventing vanishing or exploding gradients.
In the generator network, deconvolution operations are introduced in order to exploit the deep features of the optic disc and optic cup segmentation maps and generate a high-resolution fundus image. The ReLU activation function is chosen for the rectified linear units because its piecewise-linear form makes gradients easier to compute and avoids the vanishing-gradient problem that activation functions such as Tanh suffer in their saturation regions. The activation unit of the last layer uses a Tanh activation function instead of ReLU, mainly because Tanh has better output expressiveness, making the generated image smoother and more realistic.
In the discriminator network, the excitation operation units use the LeakyReLU activation function instead of ReLU: when the input is negative, a small non-zero gradient is given, which avoids the problem of neurons that can never be activated. A batch normalization unit BN (BatchNorm2d) is introduced to perform normalization, zero-centering the input of each layer so that every layer receives input samples obeying the same distribution; this counters the internal covariate shift that arises when training deep network parameters and effectively alleviates vanishing and exploding gradients during backpropagation.
In the training and learning process of the generative adversarial network, the discriminator network D ultimately makes the semantic segmentation network S and the generator network G learn the joint distribution P(x, y) of the fundus image data x and its standard optic disc and optic cup segmentation map y. The adversarial loss L_GAN(S, G, D) is defined as follows:
L_GAN(S, G, D) = E_{(x,y)~P(x,y)}[log D(x, y)] + α·E_{x~P(x)}[log(1 - D(x, S(x)))] + (1 - α)·E_{y~P(y)}[log(1 - D(G(y), y))]
where α ∈ (0, 1) reflects the relative importance of the semantic segmentation network S and the generator network G in the adversarial learning; E denotes the mathematical expectation, subscripted by the distribution of the random variable; and D(x, y) takes the value 0 or 1, i.e. D(x, y) denotes a true/false judgement, where 1 denotes true and 0 denotes false.
To ensure the quality of the images generated by the semantic segmentation network S and the generator network G, the L_1(S, G) loss function is also taken into account:
L_1(S, G) = E_{(x,y)~P(x,y)}[ ||x - x'||_1 + ||y - y'||_1 ],  where x' = G(y) and y' = S(x)
Here E denotes the mathematical expectation, subscripted by the distribution of the random variable;
||x - x'||_1 represents the difference between the real fundus image and the generated fundus image, and ||y - y'||_1 represents the difference between the standard segmentation map and the generated segmentation map. The specific calculation is as follows: for the pixels at the same position in the two images (the fundus image and the generated fundus image, or the standard segmentation map and the generated segmentation map), compute the difference of the gray values, take its absolute value, and sum the absolute values to obtain the result.
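As a small illustration of the calculation just described (same-position gray-value difference, absolute value, summation), assuming the two images are given as NumPy arrays:

```python
import numpy as np

def l1_difference(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of absolute pixel-wise gray-value differences between two same-sized images."""
    return float(np.abs(a.astype(np.float64) - b.astype(np.float64)).sum())
```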
Thus, the final loss function L(S, G, D) is defined as follows:
L(S, G, D) = L_GAN(S, G, D) + L_1(S, G)
the global optimization objective is as follows:
min_{S, G} max_{D} L(S, G, D)
In adversarial learning, the goal of the discriminator network D is to maximize the adversarial loss L_GAN, whereas the goal of the semantic segmentation network S and the generator network G is to minimize the adversarial loss L_GAN and the L_1 loss.
The invention has the following advantages and effects:
According to the semantic segmentation method for the optic disc and the optic cup in the fundus image, only a small number (even as few as one) of real standard segmentation images of the optic disc and the optic cup need to be utilized, which alleviates the problem that labeled standard samples are too few; through deep learning training, the optic disc and the optic cup in the fundus image are accurately semantically segmented, the high-order consistency of the segmentation result is guaranteed, and the segmentation accuracy of the optic disc and the optic cup is improved.
Table 1 shows the MIoU comparison between the present invention and previously known segmentation models for the optic disc and optic cup in fundus images.
TABLE 1
Model           MIoU of Disc   MIoU of Cup   MIoU of Disc & Cup
The invention   0.741          0.787         0.764
U-Net           0.729          0.758         0.743
U-Net+GANs      0.758          0.767         0.762
M-Net           0.746          0.753         0.749
The "MIoU of Disc" column compares optic disc segmentation performance, the "MIoU of Cup" column compares optic cup segmentation performance, and the "MIoU of Disc & Cup" column compares segmentation performance over both the optic disc and the optic cup.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (6)

1. A semantic segmentation method for optic discs and optic cups in a fundus image, comprising:
preprocessing any fundus image to obtain fundus image data x;
obtaining standard segmentation image data y about the optic disc and the optic cup in the preprocessed fundus image;
initializing parameters of the constructed semantic segmentation network, the generator network and the discriminator network;
inputting the fundus image data x into the semantic segmentation network for semantic segmentation, generating segmented image data y', thereby constituting first generated sample data (x, y');
inputting the standard segmentation image data y into the generator network for processing, generating fundus image data x', thereby constituting second generated sample data (x', y);
inputting the first generated sample data (x, y'), the second generated sample data (x', y) and the original sample data (x, y) into the discriminator network for processing, and judging and outputting true/false results for the first generated sample data (x, y') and the second generated sample data (x', y) based on the original sample data (x, y), wherein, according to the true/false result obtained each time, the parameters of the discriminator network are updated using an optimization algorithm and the parameters of the semantic segmentation network and the generator network are updated, so that generative adversarial network training proceeds until Nash equilibrium is reached and training is complete;
and inputting fundus image data obtained after any fundus image is preprocessed into the trained semantic segmentation network for semantic segmentation, generating the expected segmented image data.
2. The method of claim 1, wherein the pre-processing comprises a cropping process.
3. The method of claim 1, wherein the pre-processing comprises a cropping process, a rotation process, and a color contrast enhancement process.
4. The method according to claim 1, wherein the constructed semantic segmentation network adopts a fully convolutional network (FCN) oriented to the semantic segmentation task, the FCN comprising 2 first convolutional layer units, 3 second convolutional layer units and 1 third convolutional layer unit connected in series in sequence, the third convolutional layer unit being used to realize end-to-end mapping; each first convolutional layer unit comprises a first convolutional layer, an excitation operation unit ReLU connected in series to its output, a second convolutional layer, an excitation operation unit ReLU connected in series to its output, and a maximum pooling layer MaxPool2d; each second convolutional layer unit comprises 3 convolutional layers connected in series, an excitation operation unit ReLU connected in series to the output of each convolutional layer, and a maximum pooling layer MaxPool2d connected in series at the end; the third convolutional layer unit comprises a convolutional layer, an excitation operation unit ReLU, a convolutional layer, an excitation operation unit ReLU and a convolutional layer connected in series in sequence; the 2 first convolutional layer units are referred to in sequence as a first layer block and a second layer block, the 3 second convolutional layer units as a third, a fourth and a fifth layer block, and the third convolutional layer unit as a sixth layer block; the output of the sixth layer block is upsampled by a factor of 2, convolved, and fused with the output of the fourth layer block to obtain a first result; the first result is upsampled by a factor of 2, convolved, and fused with the output of the third layer block to obtain a second result; and finally the second result is upsampled by a factor of 8 to obtain segmented image data.
5. The method of claim 1, wherein the structure of the constructed generator network is as follows: the network is 62 layers deep and comprises four different kinds of network units, namely: 1) a network unit comprising a convolutional network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN, a rectified linear unit ReLU, a convolutional network and a batch normalization unit BN connected in series; 3) a network unit comprising a deconvolution network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 4) a network unit comprising a deconvolution network and an activation unit Tanh connected in series; the first to third layer blocks, connected in series, are all network units of type 1), the convolutional network in the first layer block consisting of 64 filters of 7 × 3, that in the second layer block of 128 filters of 3 × 64, and that in the third layer block of 256 filters of 3 × 128, used to generate 256 feature maps; the fourth layer block is a residual network unit of type 2), in which the two convolutional networks each consist of 256 filters of 3 × 256, and 9 such fourth layer blocks are connected in series; the fifth and sixth layer blocks, connected in series, are network units of type 3), the deconvolution network in the fifth layer block consisting of 128 filters of 3 × 256 and that in the sixth layer block of 64 filters of 3 × 128; the last layer block is a network unit of type 4) comprising a deconvolution network and an activation unit Tanh connected in series, the deconvolution network using 3 filters of 7 × 64 and realizing end-to-end mapping to reconstruct the output result; and all of the above layer blocks are connected in series.
6. The method of claim 1, wherein the structure of the constructed discriminator network is as follows: the network is 12 layers deep and comprises three different kinds of network units, namely: 1) a network unit comprising a convolutional network and an excitation operation unit LeakyReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN and an excitation operation unit LeakyReLU connected in series; 3) a convolutional network; the first layer block of the discriminator network is a network unit of type 1) and consists of 64 filters of 4 × 6; the second to fourth layer blocks are all network units of type 2), with filter specifications of 128 filters of 4 × 64, 256 filters of 4 × 128 and 512 filters of 4 × 256 in turn; and the last layer is a convolutional network consisting of 1 filter of 4 × 512, used to realize end-to-end mapping and reconstruct the output result.
CN201810534400.9A 2018-05-29 2018-05-29 Semantic segmentation method for optic discs and optic cups in fundus image Active CN108764342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810534400.9A CN108764342B (en) 2018-05-29 2018-05-29 Semantic segmentation method for optic discs and optic cups in fundus image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810534400.9A CN108764342B (en) 2018-05-29 2018-05-29 Semantic segmentation method for optic discs and optic cups in fundus image

Publications (2)

Publication Number Publication Date
CN108764342A CN108764342A (en) 2018-11-06
CN108764342B true CN108764342B (en) 2021-05-14

Family

ID=64003642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810534400.9A Active CN108764342B (en) 2018-05-29 2018-05-29 Semantic segmentation method for optic discs and optic cups in fundus image

Country Status (1)

Country Link
CN (1) CN108764342B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109215039B (en) * 2018-11-09 2022-02-01 浙江大学常州工业技术研究院 Method for processing fundus picture based on neural network
CN109615632B (en) * 2018-11-09 2023-07-21 广东技术师范学院 Fundus image optic disc and optic cup segmentation method based on semi-supervision condition generation type countermeasure network
CN109829894B (en) * 2019-01-09 2022-04-26 平安科技(深圳)有限公司 Segmentation model training method, OCT image segmentation method, device, equipment and medium
CN109933526B (en) * 2019-03-06 2023-01-20 颐保医疗科技(上海)有限公司 Picture testing method for AI identification of traditional Chinese medicinal materials
CN110021052B (en) * 2019-04-11 2023-05-30 北京百度网讯科技有限公司 Method and apparatus for generating fundus image generation model
CN110992309B (en) * 2019-11-07 2023-08-18 吉林大学 Fundus image segmentation method based on deep information transfer network
CN111784687A (en) * 2020-07-22 2020-10-16 上海理工大学 Glaucoma fundus image detection method based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106408562A (en) * 2016-09-22 2017-02-15 华南理工大学 Fundus image retinal vessel segmentation method and system based on deep learning
CN107256550A (en) * 2017-06-06 2017-10-17 电子科技大学 A kind of retinal image segmentation method based on efficient CNN CRF networks
CN107610141A (en) * 2017-09-05 2018-01-19 华南理工大学 A kind of remote sensing images semantic segmentation method based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014031086A1 (en) * 2012-08-24 2014-02-27 Agency For Science, Technology And Research Methods and systems for automatic location of optic structures in an image of an eye, and for automatic retina cup-to-disc ratio computation
US20150104102A1 (en) * 2013-10-11 2015-04-16 Universidade De Coimbra Semantic segmentation method with second-order pooling

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106408562A (en) * 2016-09-22 2017-02-15 华南理工大学 Fundus image retinal vessel segmentation method and system based on deep learning
CN107256550A (en) * 2017-06-06 2017-10-17 电子科技大学 A kind of retinal image segmentation method based on efficient CNN CRF networks
CN107610141A (en) * 2017-09-05 2018-01-19 华南理工大学 A kind of remote sensing images semantic segmentation method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Joint Optic Disc and Cup Segmentation Using Fully Convolutional and Adversarial Networks; Sharath M. Shankaranarayana et al.; Springer International Publishing AG; 2017-12-31; pp. 168-175 *
Research on deep learning-based classification and lesion detection methods for diabetic retinopathy; 张德彪; China Masters' Theses Full-text Database, Medicine & Health Sciences; 2018-02-15; pp. 1-43 *

Also Published As

Publication number Publication date
CN108764342A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108764342B (en) Semantic segmentation method for optic discs and optic cups in fundus image
CN110889853B (en) Tumor segmentation method based on residual error-attention deep neural network
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
CN109615632B (en) Fundus image optic disc and optic cup segmentation method based on semi-supervision condition generation type countermeasure network
CN110889852B (en) Liver segmentation method based on residual error-attention deep neural network
CN112132817B (en) Retina blood vessel segmentation method for fundus image based on mixed attention mechanism
CN111815574B (en) Fundus retina blood vessel image segmentation method based on rough set neural network
CN109345538A (en) A kind of Segmentation Method of Retinal Blood Vessels based on convolutional neural networks
CN109191476A (en) The automatic segmentation of Biomedical Image based on U-net network structure
CN112508864B (en) Retinal vessel image segmentation method based on improved UNet +
CN108095683A (en) The method and apparatus of processing eye fundus image based on deep learning
CN107657612A (en) Suitable for full-automatic the retinal vessel analysis method and system of intelligent and portable equipment
CN112258488A (en) Medical image focus segmentation method
CN109919938B (en) Method for obtaining optic disc segmentation atlas of glaucoma
CN109816666B (en) Symmetrical full convolution neural network model construction method, fundus image blood vessel segmentation device, computer equipment and storage medium
CN109658423B (en) Automatic optic disk cup segmentation method for color fundus picture
JP2019192215A (en) 3d quantitative analysis of retinal layers with deep learning
CN115496771A (en) Brain tumor segmentation method based on brain three-dimensional MRI image design
CN109919915A (en) Retinal fundus images abnormal area detection method and equipment based on deep learning
CN112884788B (en) Cup optic disk segmentation method and imaging method based on rich context network
CN113793348B (en) Retinal blood vessel segmentation method and device
CN112017185A (en) Focus segmentation method, device and storage medium
CN113012163A (en) Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network
CN110610480B (en) MCASPP neural network eyeground image optic cup optic disc segmentation model based on Attention mechanism
CN115035127A (en) Retinal vessel segmentation method based on generative confrontation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant