CN108764342B - Semantic segmentation method for optic discs and optic cups in fundus image - Google Patents

Semantic segmentation method for optic discs and optic cups in fundus image

Info

Publication number
CN108764342B
Authority
CN
China
Prior art keywords
network, layer, unit, series, convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810534400.9A
Other languages
Chinese (zh)
Other versions
CN108764342A (en)
Inventor
刘少鹏
贾西平
关立南
林智勇
高维奇
欧阳佳
梁杰鹏
廖秀秀
马震远
洪佳明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority to CN201810534400.9A priority Critical patent/CN108764342B/en
Publication of CN108764342A publication Critical patent/CN108764342A/en
Application granted granted Critical
Publication of CN108764342B publication Critical patent/CN108764342B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention provides a semantic segmentation method for optic discs and optic cups in a fundus image, which comprises the following steps: preprocessing any fundus image to obtain fundus image data; obtaining standard segmentation image data about the optic disc and the optic cup in the preprocessed fundus image; initializing parameters of the constructed semantic segmentation network, generator network and discriminator network; inputting the fundus image data into the semantic segmentation network to generate first generated sample data, and inputting the standard segmentation image data into the generator network to generate second generated sample data; inputting the first generated sample data, the second generated sample data and the original sample data into the discriminator network for processing and training; and inputting fundus image data obtained after any fundus image is preprocessed into the trained semantic segmentation network for semantic segmentation to generate the expected segmented image data.

Description

Semantic segmentation method for optic discs and optic cups in fundus image
Technical Field
The invention relates to the technical field of image processing, and in particular to a semantic segmentation method for optic discs and optic cups in a fundus image.
Background
Glaucoma is a leading cause of irreversible blindness; it is characterized by a progressive loss of optic nerve axons that currently cannot be recovered. However, early detection can significantly slow down or even stop the progression of glaucomatous optic neuropathy, so early screening for glaucoma is clinically significant. Glaucoma typically manifests as a specific, abnormal appearance of the optic nerve head: optic disc cupping and loss of the neuroretinal rim, usually reflected as an increased cup-to-disc ratio (CDR). The CDR is considered one of the important indicators for assessing the degree of progression of glaucoma and glaucomatous optic neuropathy in patients. The cup-to-disc ratio is the ratio between the optic cup and the optic disc in the fundus image and is an important index for early glaucoma screening: the larger the CDR value, the higher the probability of glaucoma. The key to calculating the CDR index is therefore how to accurately segment the optic disc and optic cup regions of the fundus image.
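As an illustrative aside (not part of the patent's method), the following is a minimal sketch of how a vertical CDR value could be computed once binary optic-disc and optic-cup masks are available; the NumPy representation, the function name and the use of vertical diameters are assumptions.

```python
import numpy as np

def vertical_cdr(disc_mask: np.ndarray, cup_mask: np.ndarray) -> float:
    """Vertical cup-to-disc ratio from binary 2D masks (assumed convention)."""
    disc_rows = np.where(disc_mask.any(axis=1))[0]   # rows containing optic-disc pixels
    cup_rows = np.where(cup_mask.any(axis=1))[0]     # rows containing optic-cup pixels
    if disc_rows.size == 0 or cup_rows.size == 0:
        raise ValueError("empty segmentation mask")
    disc_height = disc_rows.max() - disc_rows.min() + 1   # vertical disc diameter (pixels)
    cup_height = cup_rows.max() - cup_rows.min() + 1      # vertical cup diameter (pixels)
    return cup_height / disc_height
```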
Existing methods for semantic segmentation of the optic disc and optic cup in fundus images use computer vision techniques and deep learning techniques. Semantic segmentation classifies every pixel of an image. Traditional computer vision techniques combine image brightness, color and contrast enhancement, Graph Cut, edge detection, morphology and other methods to process and analyze the fundus image, extract effective feature information, and detect the optic disc and the optic cup separately. Because such methods rely heavily on manual experience and process only small amounts of data, the generalization ability of the model is poor, the disc and cup segmentation quality needs to be improved, and the practical value for wide deployment is limited.
Deep learning can automatically extract image features without manual intervention and is well suited to tasks such as image semantic segmentation; analyzing glaucoma medical images with deep learning is therefore a research hotspot. A fundus image is input into a fully convolutional semantic segmentation network such as the U-Net model, which computes and outputs segmentation results for the optic disc and optic cup of the fundus image, and the network parameters are trained by backpropagation. However, directly applying an existing fully convolutional semantic segmentation network to disc and cup segmentation of fundus images ignores the spatial and positional relationship between the optic disc and the optic cup, so there is high-order inconsistency between the output result and the real fundus image. In addition, real annotated samples of optic disc and optic cup segmentation images are scarce, because such samples must be annotated by expert physicians who study glaucoma; since the annotation is done manually by these experts, the number of labeled samples is very limited, which poses a serious obstacle to deep learning with an existing fully convolutional semantic segmentation network such as the U-Net model.
In summary, how to construct a deep learning-based semantic segmentation model for the optic disc and optic cup in fundus images, and how to further optimize the segmentation result, is a problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to provide a semantic segmentation method for the optic disc and optic cup in a fundus image so as to address the above problems.
The invention provides a semantic segmentation method for optic discs and optic cups in a fundus image, which comprises the following steps:
preprocessing any fundus image to obtain fundus image data x;
obtaining standard segmentation image data y about the optic disc and the optic cup in the preprocessed fundus image;
initializing parameters of the constructed semantic segmentation network, the generator network and the discriminator network;
inputting the fundus image data x into the semantic segmentation network for semantic segmentation, generating segmented image data y', thereby constituting first generated sample data (x, y');
inputting the standard segmentation image data y into the generator network for processing, generating fundus image data x', thereby constituting second generated sample data (x', y);
inputting the first generated sample data (x, y'), the second generated sample data (x', y) and the original sample data (x, y) into the discriminator network for processing, and judging and outputting true/false results for the first generated sample data (x, y') and the second generated sample data (x', y) based on the original sample data (x, y), wherein, according to the true/false result obtained each time, the parameters of the discriminator network are updated using an optimization algorithm and the parameters of the semantic segmentation network and the generator network are updated, so that generative adversarial network training proceeds until Nash equilibrium is reached and training is complete;
and inputting fundus image data obtained after any fundus image is preprocessed into the trained semantic segmentation network for semantic segmentation, generating the expected segmented image data.
Wherein the preprocessing comprises a cropping process. Alternatively, the preprocessing comprises a cropping process, a rotation process, and a color contrast enhancement process.
Wherein, at initialization, parameter values of the semantic segmentation network, the generator network and the discriminator network are predetermined or random.
The constructed semantic segmentation network adopts a fully convolutional network (FCN) oriented to the semantic segmentation task. The FCN comprises 2 first convolutional layer units, 3 second convolutional layer units and 1 third convolutional layer unit connected in series in sequence, the third convolutional layer unit being used to realize end-to-end mapping. Each first convolutional layer unit comprises a first convolutional layer, an excitation operation unit ReLU connected in series to its output, a second convolutional layer, an excitation operation unit ReLU connected in series to its output, and a maximum pooling layer MaxPool2d. Each second convolutional layer unit comprises 3 convolutional layers connected in series, an excitation operation unit ReLU connected in series to the output of each convolutional layer, and a maximum pooling layer MaxPool2d connected in series at the end. The third convolutional layer unit comprises a convolutional layer, an excitation operation unit ReLU, a convolutional layer, an excitation operation unit ReLU and a convolutional layer connected in series in sequence. The 2 first convolutional layer units are referred to in sequence as the first and second layer blocks, the 3 second convolutional layer units as the third, fourth and fifth layer blocks, and the third convolutional layer unit as the sixth layer block. The output of the sixth layer block is upsampled by a factor of 2, convolved, and fused with the output of the fourth layer block to obtain a first result; the first result is upsampled by a factor of 2, convolved, and fused with the output of the third layer block to obtain a second result; finally, the second result is upsampled by a factor of 8 to obtain the segmented image data.
Wherein, the structure of the constructed generator network is as follows: the network is 62 layers deep and comprises four different kinds of network units, namely: 1) a network unit comprising a convolutional network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN, a rectified linear unit ReLU, a convolutional network and a batch normalization unit BN connected in series; 3) a network unit comprising a deconvolution network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 4) a network unit comprising a deconvolution network and an activation unit Tanh connected in series. The first to third layer blocks, connected in series, are all network units of type 1): the convolutional network in the first layer block consists of 64 filters of 7 × 3, that in the second layer block of 128 filters of 3 × 64, and that in the third layer block of 256 filters of 3 × 128, used to generate 256 feature maps. The fourth layer block is a residual network unit of type 2), in which the two convolutional networks each consist of 256 filters of 3 × 256; 9 such fourth layer blocks are connected in series. The fifth and sixth layer blocks, connected in series, are network units of type 3): the deconvolution network in the fifth layer block consists of 128 filters of 3 × 256, and that in the sixth layer block of 64 filters of 3 × 128. The last layer block is a network unit of type 4) comprising a deconvolution network and an activation unit Tanh connected in series; its deconvolution network uses 3 filters of 7 × 64 and realizes end-to-end mapping to reconstruct the output result. All of the above layer blocks are connected in series.
Wherein, the structure of the constructed discriminator network is as follows: the network is 12 layers deep and comprises three different kinds of network units, namely: 1) a network unit comprising a convolutional network and an excitation operation unit LeakyReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN and an excitation operation unit LeakyReLU connected in series; 3) a convolutional network. The first layer block of the discriminator network is a network unit of type 1) and consists of 64 filters of 4 × 6. The second to fourth layer blocks are all network units of type 2), with filter specifications of 128 filters of 4 × 64, 256 filters of 4 × 128 and 512 filters of 4 × 256 in turn. The last layer is a convolutional network consisting of 1 filter of 4 × 512, used to realize end-to-end mapping and reconstruct the output result.
According to the semantic segmentation method for the optic disc and the optic cup in the fundus image, only a small number (even as few as one) of real standard segmentation images of the optic disc and the optic cup need to be utilized, which alleviates the problem that labeled standard samples are too few; through deep learning training, the optic disc and the optic cup in the fundus image are accurately semantically segmented, the high-order consistency of the segmentation result is guaranteed, and the segmentation accuracy of the optic disc and the optic cup is improved.
Drawings
FIG. 1 is an exemplary diagram of an eye fundus map;
FIG. 2 is a standard segmentation chart for the fundus image shown in FIG. 1 for the optic disc and optic cup;
FIG. 3 is a schematic flow chart of a semantic segmentation method for discs and cups in fundus images according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a semantic segmentation network provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention and the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings and the specific embodiments of the present invention.
Fig. 3 is a schematic flow chart of a semantic segmentation method for optic discs and optic cups in fundus images according to a first embodiment of the present invention. As shown in Fig. 3, the semantic segmentation method for the optic disc and the optic cup in the fundus image comprises the following steps:
step 1: any one fundus image is preprocessed to obtain fundus image data x.
Wherein the preprocessing includes a cropping process, for example cropping the large original fundus photograph into a fundus image of the desired small size, such as a 256 × 256 fundus image in JPG format.
Wherein the preprocessing may also include rotation and color contrast enhancement in addition to cropping. Rotation increases the number of training samples, for example by rotating the fundus image by 90 degrees or by several other angles. Similarly, color contrast enhancement also increases the number of training samples.
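For illustration, a minimal preprocessing sketch along the lines described above; the library choice (Pillow), the center-crop strategy, the rotation angles and the contrast factor are assumptions rather than values fixed by the patent.

```python
from PIL import Image, ImageEnhance

def preprocess(path: str, size: int = 256, angles=(0, 90, 180, 270), contrast: float = 1.5):
    """Crop/resize a fundus photograph to size x size and augment it by rotation and
    color-contrast enhancement, yielding several training samples per image."""
    img = Image.open(path).convert("RGB")
    # Center-crop to a square region before resizing (cropping strategy is an assumption).
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((size, size))
    samples = []
    for angle in angles:                                   # rotation augmentation
        rotated = img.rotate(angle)
        samples.append(rotated)
        samples.append(ImageEnhance.Contrast(rotated).enhance(contrast))  # contrast augmentation
    return samples
```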
Step 2: standard segmentation image data y on the optic disc and optic cup in the pre-processed fundus image is obtained.
A real standard segmentation image of the optic disc and optic cup can be made manually by a specialist in glaucoma for the preprocessed fundus image, from which the technician obtains the corresponding standard segmentation image data y. Note that, as mentioned in the background, the number of annotated samples is very limited, because real segmented-image samples of the optic disc and optic cup must be annotated manually by expert physicians who study glaucoma. The invention aims to achieve accurate semantic segmentation of the optic disc and the optic cup in the fundus image through deep learning training using only a small number (even as few as one) of real standard segmentation images of the optic disc and the optic cup.
Step 3: parameters of the constructed semantic segmentation network, the generator network and the discriminator network are initialized.
The parameter values for the semantic segmentation network, the generator network and the discriminator network may be predetermined or, alternatively, random at initialization.
How to construct the semantic segmentation network, the generator network, and the discriminator network will be described in detail later.
Step 4: the fundus image data x is input into the semantic segmentation network and semantically segmented to generate segmented image data y', thereby constituting the first generated sample data (x, y').
In the first generated sample data (x, y'), x is the fundus image data obtained by preprocessing a fundus image, and y' is the segmented image data of the optic disc and optic cup obtained by semantic segmentation.
Step 5: the standard segmentation image data y is input into the generator network and processed to generate fundus image data x', thereby constituting the second generated sample data (x', y).
In the second generated sample data (x', y), x' is the fundus image data generated by inputting the standard segmentation image data y of the optic disc and optic cup into the generator network, and y is that standard segmentation image data itself.
Step 6: the first generated sample data (x, y'), the second generated sample data (x', y) and the original sample data (x, y) are input into the discriminator network for processing; true/false results for the first and second generated sample data are judged and output based on the original sample data (x, y); according to the true/false result obtained each time, the parameters of the discriminator network are updated using an optimization algorithm, and the parameters of the semantic segmentation network and the generator network are updated, so that generative adversarial network training proceeds until Nash equilibrium is reached and training is complete.
In step 6, the optimization algorithm is the Adam algorithm, and training follows the generative adversarial network (GAN, Generative Adversarial Networks) paradigm: the semantic segmentation network, the generator network and the discriminator network are trained against one another until Nash equilibrium is reached, at which point training is complete. For the true/false result output by the discriminator network, 1 is typically used to indicate true and 0 to indicate false; other agreed numerical representations may also be used, the above being a non-limiting example.
In brief, the networks are continuously optimized by having the discriminator network judge whether its input is real data or generated sample data. During training, one side is fixed while the other is updated, and the two sides iterate alternately, each trying to maximize the other's error, until Nash equilibrium is reached. At that point, the segmented image data y' generated by the semantic segmentation network differs little or not at all from the real standard segmentation image data y, the fundus image data x' generated by the generator network differs little or not at all from the real fundus image data x, and the discriminator network can no longer correctly distinguish generated sample data from real data.
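For illustration only, a simplified sketch of one alternating training iteration with the Adam optimizer, assuming the semantic segmentation network S, generator G and discriminator D are constructed as described below and that the losses follow the weighting α introduced later in this description; the learning rates, the binary cross-entropy formulation of the adversarial terms and all variable names are assumptions.

```python
import torch
import torch.nn.functional as F

# S, G, D are assumed to be the semantic segmentation, generator and discriminator networks;
# x, y are batches of fundus images and standard segmentation maps.
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_sg = torch.optim.Adam(list(S.parameters()) + list(G.parameters()), lr=2e-4, betas=(0.5, 0.999))

def train_step(x, y, alpha=0.5):
    y_fake, x_fake = S(x), G(y)
    # 1) Update the discriminator: real pairs -> 1, generated pairs -> 0 (generators fixed).
    real = D(x, y)
    fake1, fake2 = D(x, y_fake.detach()), D(x_fake.detach(), y)
    loss_d = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) \
        + alpha * F.binary_cross_entropy_with_logits(fake1, torch.zeros_like(fake1)) \
        + (1 - alpha) * F.binary_cross_entropy_with_logits(fake2, torch.zeros_like(fake2))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) Update S and G: fool the discriminator and stay close to the real images (L1 term).
    fake1, fake2 = D(x, y_fake), D(x_fake, y)
    loss_sg = alpha * F.binary_cross_entropy_with_logits(fake1, torch.ones_like(fake1)) \
        + (1 - alpha) * F.binary_cross_entropy_with_logits(fake2, torch.ones_like(fake2)) \
        + F.l1_loss(y_fake, y) + F.l1_loss(x_fake, x)
    opt_sg.zero_grad(); loss_sg.backward(); opt_sg.step()
    return loss_d.item(), loss_sg.item()
```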
Step 7: fundus image data obtained by preprocessing any fundus image is input into the trained semantic segmentation network for semantic segmentation, generating the expected segmented image data.
Thanks to the adversarial training, the semantic segmentation network's ability to segment accurately is greatly improved: fundus image data obtained by preprocessing any fundus image can be input into the semantic segmentation network, which generates the expected, realistic segmented image data, from which the CDR index can be calculated and used as one of the important bases for early glaucoma screening.
The specific construction of the semantic segmentation network, generator network and discriminator network used by the present invention is described below.
The constructed semantic segmentation network adopts a fully convolutional network (FCN) oriented to the semantic segmentation task, as shown in the schematic diagram of the convolutional network framework in FIG. 4. The FCN comprises 2 first convolutional layer units, 3 second convolutional layer units and 1 third convolutional layer unit connected in series in sequence, the third convolutional layer unit being used to realize end-to-end mapping. Each first convolutional layer unit comprises a first convolutional layer, an excitation operation unit ReLU connected in series to its output, a second convolutional layer, an excitation operation unit ReLU connected in series to its output, and a maximum pooling layer MaxPool2d. Each second convolutional layer unit comprises 3 convolutional layers connected in series, an excitation operation unit ReLU connected in series to the output of each convolutional layer, and a maximum pooling layer MaxPool2d connected in series at the end. The third convolutional layer unit comprises a convolutional layer, an excitation operation unit ReLU, a convolutional layer, an excitation operation unit ReLU and a convolutional layer connected in series in sequence. Here, the 2 first convolutional layer units are referred to in sequence as the first and second layer blocks, the 3 second convolutional layer units as the third, fourth and fifth layer blocks, and the third convolutional layer unit as the sixth layer block. The output of the sixth layer block is upsampled by a factor of 2, convolved, and fused with the output of the fourth layer block to obtain a first result; the first result is upsampled by a factor of 2, convolved, and fused with the output of the third layer block to obtain a second result; finally, the second result is upsampled by a factor of 8 to obtain the segmented image data.
The size of the segmented image obtained after 8 times of upsampling is the same as that of the original fundus image.
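As an illustrative sketch only, the segmentation network just described can be arranged in PyTorch as follows (two 2-convolution units, three 3-convolution units, a final convolution unit, FCN-style fusion with the third and fourth layer blocks, then 8× upsampling); the channel widths, the 1×1 score convolutions and the interpolation mode are assumptions, since the patent text does not fix them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch, n_convs):
    """n_convs x (Conv3x3 + ReLU) followed by MaxPool2d, as in the first/second layer units."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class FCNSegNet(nn.Module):
    """Minimal FCN-8s-style sketch of the semantic segmentation network."""
    def __init__(self, n_classes=3):            # e.g. background / optic disc / optic cup (assumed)
        super().__init__()
        self.block1 = conv_block(3,   64, 2)     # first convolutional layer unit
        self.block2 = conv_block(64, 128, 2)     # first convolutional layer unit
        self.block3 = conv_block(128, 256, 3)    # second convolutional layer unit (skip at 1/8)
        self.block4 = conv_block(256, 512, 3)    # second convolutional layer unit (skip at 1/16)
        self.block5 = conv_block(512, 512, 3)    # second convolutional layer unit
        self.block6 = nn.Sequential(             # third convolutional layer unit: conv-ReLU-conv-ReLU-conv
            nn.Conv2d(512, 1024, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(1024, 1024, 1), nn.ReLU(inplace=True),
            nn.Conv2d(1024, n_classes, 1))
        self.score4 = nn.Conv2d(512, n_classes, 1)   # convolution applied to the fourth-block output
        self.score3 = nn.Conv2d(256, n_classes, 1)   # convolution applied to the third-block output

    def forward(self, x):
        s3 = self.block3(self.block2(self.block1(x)))          # 1/8 resolution
        s4 = self.block4(s3)                                    # 1/16 resolution
        out = self.block6(self.block5(s4))                      # 1/32 resolution
        out = F.interpolate(out, scale_factor=2) + self.score4(s4)   # fuse with fourth layer block
        out = F.interpolate(out, scale_factor=2) + self.score3(s3)   # fuse with third layer block
        return F.interpolate(out, scale_factor=8)               # 8x upsampling back to input size
```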
Wherein, the structure of the constructed generator network is as follows: the network is 62 layers deep and comprises four different kinds of network units, namely: 1) a network unit comprising a convolutional network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN, a rectified linear unit ReLU, a convolutional network and a batch normalization unit BN connected in series; 3) a network unit comprising a deconvolution network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 4) a network unit comprising a deconvolution network and an activation unit Tanh connected in series. The first to third layer blocks, connected in series, are all network units of type 1): the convolutional network in the first layer block consists of 64 filters of 7 × 3, that in the second layer block of 128 filters of 3 × 64, and that in the third layer block of 256 filters of 3 × 128, used to generate 256 feature maps. The fourth layer block is a residual network unit of type 2), in which the two convolutional networks each consist of 256 filters of 3 × 256; 9 such fourth layer blocks are connected in series. The fifth and sixth layer blocks, connected in series, are network units of type 3): the deconvolution network in the fifth layer block consists of 128 filters of 3 × 256, and that in the sixth layer block of 64 filters of 3 × 128. The last layer block is a network unit of type 4) comprising a deconvolution network and an activation unit Tanh connected in series; its deconvolution network uses 3 filters of 7 × 64 and realizes end-to-end mapping to reconstruct the output result. All of the above layer blocks are connected in series.
Here, each convolutional network, each batch normalization unit BN, each rectified linear unit ReLU, each deconvolution network and the activation unit Tanh is counted as one layer, giving 62 layers in total, all connected in series. The structure of the generator network is shown schematically in the following table.
Exemplary generator network architecture:
where, Conv represents a convolutional network, ConvTran represents a deconvolution network, BatchNorm2d represents a batch normalization unit, ReLU represents a modified linear unit, Tanh represents an active unit, IN # represents the number of input channels (for example, IN3 represents that the number of input channels is 3), OUT # represents the number of output channels (for example, OUT64 represents that the number of output channels is 64), K # represents a filter size, S # represents a step size, and P # represents a filling number.
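A hedged PyTorch-style sketch of the generator blocks listed above (three convolutional units, nine residual units, two deconvolution units and a final deconvolution with Tanh) is given below; the strides, paddings and output paddings are assumptions needed to make the tensor shapes work out, not values stated in the text.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-BN-ReLU-Conv-BN unit with a skip connection (network unit type 2)."""
    def __init__(self, ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    """Sketch of the 62-layer generator: 3 convolutional units, 9 residual units,
    2 deconvolution units and a final deconvolution + Tanh."""
    def __init__(self):
        super().__init__()
        def down(i, o, k, s, p):
            return nn.Sequential(nn.Conv2d(i, o, k, s, p), nn.BatchNorm2d(o), nn.ReLU(inplace=True))
        def up(i, o, k, s, p, op):
            return nn.Sequential(nn.ConvTranspose2d(i, o, k, s, p, output_padding=op),
                                 nn.BatchNorm2d(o), nn.ReLU(inplace=True))
        self.net = nn.Sequential(
            down(3, 64, 7, 1, 3),                      # first layer block: 64 filters of 7x3
            down(64, 128, 3, 2, 1),                    # second layer block: 128 filters of 3x64
            down(128, 256, 3, 2, 1),                   # third layer block: 256 filters of 3x128
            *[ResidualBlock(256) for _ in range(9)],   # nine residual fourth layer blocks in series
            up(256, 128, 3, 2, 1, 1),                  # fifth layer block: 128 filters of 3x256
            up(128, 64, 3, 2, 1, 1),                   # sixth layer block: 64 filters of 3x128
            nn.ConvTranspose2d(64, 3, 7, 1, 3),        # last layer block: 3 filters of 7x64
            nn.Tanh())
    def forward(self, y):
        return self.net(y)                             # segmentation map in, fundus image out
```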
Wherein, the structure of the constructed discriminator network is as follows: the network is 12 layers deep and comprises three different kinds of network units, namely: 1) a network unit comprising a convolutional network and an excitation operation unit LeakyReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN and an excitation operation unit LeakyReLU connected in series; 3) a convolutional network. The first layer block of the discriminator network is a network unit of type 1) and consists of 64 filters of 4 × 6. The second to fourth layer blocks are all network units of type 2), with filter specifications of 128 filters of 4 × 64, 256 filters of 4 × 128 and 512 filters of 4 × 256 in turn. The last layer block is a convolutional network consisting of 1 filter of 4 × 512, used to realize end-to-end mapping and reconstruct the output result. All of the above layer blocks are connected in series.
Here, each convolutional network, each batch normalization unit BN and each LeakyReLU unit is counted as one layer, giving 12 layers in total, all connected in series. The structure of the discriminator network is shown schematically in the following table.
Exemplary arbiter network architecture:
where Conv denotes a convolutional network, leakyreu denotes a modified linear unit, BatchNorm2d denotes a batch normalization unit, IN # denotes the number of input channels (e.g., IN6 denotes the number of input channels is 6), OUT # denotes the number of output channels (e.g., OUT64 denotes the number of output channels is 64), K # denotes the filter size, S # denotes the step size, and P # denotes the number of fills.
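A hedged PyTorch-style sketch of the 12-layer discriminator described above is shown below; it takes the concatenated (fundus image, segmentation map) pair as a 6-channel input, and the strides and paddings are assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of the 12-layer discriminator: Conv+LeakyReLU, three Conv+BN+LeakyReLU blocks,
    and a final single-filter convolution producing a real/fake score map."""
    def __init__(self):
        super().__init__()
        def block(i, o, bn=True):
            layers = [nn.Conv2d(i, o, 4, stride=2, padding=1)]
            if bn:
                layers.append(nn.BatchNorm2d(o))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.net = nn.Sequential(
            *block(6, 64, bn=False),                     # first layer block: 64 filters of 4x6, no BN
            *block(64, 128),                             # second layer block: 128 filters of 4x64
            *block(128, 256),                            # third layer block: 256 filters of 4x128
            *block(256, 512),                            # fourth layer block: 512 filters of 4x256
            nn.Conv2d(512, 1, 4, stride=1, padding=1))   # last layer: single 4x512 filter
    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))        # judge the (image, segmentation) pair
```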
Here, the batch normalization unit BN (BatchNorm2d) performs the normalization operation, preventing vanishing or exploding gradients.
In the generator network, deconvolution operations are introduced in order to exploit the deep features of the optic disc and optic cup segmentation maps and generate a high-resolution fundus image. The ReLU activation function is chosen for the rectified linear units because its piecewise-linear form makes gradients easier to compute and avoids the vanishing-gradient problem that activation functions such as Tanh suffer in their saturation regions. The activation unit of the last layer uses a Tanh activation function instead of ReLU, mainly because Tanh has better output expressiveness, making the generated image smoother and more realistic.
In the discriminator network, the excitation operation units use the LeakyReLU activation function instead of ReLU: when the input is negative, a small non-zero gradient is given, which avoids the problem of neurons that can never be activated. A batch normalization unit BN (BatchNorm2d) is introduced to perform normalization, zero-centering the input of each layer so that every layer receives input samples obeying the same distribution; this counters the internal covariate shift that arises when training deep network parameters and effectively alleviates vanishing and exploding gradients during backpropagation.
In the training and learning process of the generative adversarial network, the discriminator network D ultimately makes the semantic segmentation network S and the generator network G learn the joint distribution P(x, y) of the fundus image data x and its standard optic disc and optic cup segmentation map y. The adversarial loss L_GAN(S, G, D) is defined as follows:
L_GAN(S, G, D) = E_{(x,y)~P(x,y)}[log D(x, y)] + α·E_{x~P(x)}[log(1 - D(x, S(x)))] + (1 - α)·E_{y~P(y)}[log(1 - D(G(y), y))]
where α ∈ (0, 1) reflects the relative importance of the semantic segmentation network S and the generator network G in the adversarial learning; E denotes the mathematical expectation, subscripted by the distribution of the random variable; and D(x, y) takes the value 0 or 1, i.e. D(x, y) denotes a true/false judgement, where 1 denotes true and 0 denotes false.
To ensure the quality of the images generated by the semantic segmentation network S and the generator network G, the L_1(S, G) loss function is also taken into account:
L_1(S, G) = E_{(x,y)~P(x,y)}[ ||x - x'||_1 + ||y - y'||_1 ],  where x' = G(y) and y' = S(x)
Here E denotes the mathematical expectation, subscripted by the distribution of the random variable;
||x - x'||_1 represents the difference between the real fundus image and the generated fundus image, and ||y - y'||_1 represents the difference between the standard segmentation map and the generated segmentation map. The specific calculation is as follows: for the pixels at the same position in the two images (the fundus image and the generated fundus image, or the standard segmentation map and the generated segmentation map), compute the difference of the gray values, take its absolute value, and sum the absolute values to obtain the result.
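As a small illustration of the calculation just described (same-position gray-value difference, absolute value, summation), assuming the two images are given as NumPy arrays:

```python
import numpy as np

def l1_difference(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of absolute pixel-wise gray-value differences between two same-sized images."""
    return float(np.abs(a.astype(np.float64) - b.astype(np.float64)).sum())
```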
Thus, the final loss function L(S, G, D) is defined as follows:
L(S, G, D) = L_GAN(S, G, D) + L_1(S, G)
the global optimization objective is as follows:
min_{S, G} max_{D} L(S, G, D)
In adversarial learning, the goal of the discriminator network D is to maximize the adversarial loss L_GAN, whereas the goal of the semantic segmentation network S and the generator network G is to minimize the adversarial loss L_GAN and the L_1 loss.
The invention has the following advantages and effects:
According to the semantic segmentation method for the optic disc and the optic cup in the fundus image, only a small number (even as few as one) of real standard segmentation images of the optic disc and the optic cup need to be utilized, which alleviates the problem that labeled standard samples are too few; through deep learning training, the optic disc and the optic cup in the fundus image are accurately semantically segmented, the high-order consistency of the segmentation result is guaranteed, and the segmentation accuracy of the optic disc and the optic cup is improved.
Table 1 shows the MIoU comparison between the present invention and previously known segmentation models for the optic disc and optic cup in fundus images.
TABLE 1
Model           MIoU of Disc   MIoU of Cup   MIoU of Disc & Cup
The invention   0.741          0.787         0.764
U-Net           0.729          0.758         0.743
U-Net+GANs      0.758          0.767         0.762
M-Net           0.746          0.753         0.749
The "MIoU of Disc" column compares optic disc segmentation performance, the "MIoU of Cup" column compares optic cup segmentation performance, and the "MIoU of Disc & Cup" column compares segmentation performance over both the optic disc and the optic cup.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (6)

1. A semantic segmentation method for optic discs and optic cups in a fundus image, comprising:
preprocessing any fundus image to obtain fundus image data x;
obtaining standard segmentation image data y about the optic disc and the optic cup in the preprocessed fundus image;
initializing parameters of the constructed semantic segmentation network, the generator network and the discriminator network;
inputting the fundus image data x into the semantic segmentation network for semantic segmentation, generating segmented image data y', thereby constituting first generated sample data (x, y');
inputting the standard segmentation image data y into the generator network for processing, generating fundus image data x', thereby constituting second generated sample data (x', y);
inputting the first generated sample data (x, y'), the second generated sample data (x', y) and the original sample data (x, y) into the discriminator network for processing, and judging and outputting true/false results for the first generated sample data (x, y') and the second generated sample data (x', y) based on the original sample data (x, y), wherein, according to the true/false result obtained each time, the parameters of the discriminator network are updated using an optimization algorithm and the parameters of the semantic segmentation network and the generator network are updated, so that generative adversarial network training proceeds until Nash equilibrium is reached and training is complete;
and inputting fundus image data obtained after any fundus image is preprocessed into the trained semantic segmentation network for semantic segmentation, generating the expected segmented image data.
2. The method of claim 1, wherein the pre-processing comprises a cropping process.
3. The method of claim 1, wherein the pre-processing comprises a cropping process, a rotation process, and a color contrast enhancement process.
4. The method according to claim 1, wherein the constructed semantic segmentation network adopts a fully convolutional network (FCN) oriented to the semantic segmentation task, the FCN comprising 2 first convolutional layer units, 3 second convolutional layer units and 1 third convolutional layer unit connected in series in sequence, the third convolutional layer unit being used to realize end-to-end mapping; each first convolutional layer unit comprises a first convolutional layer, an excitation operation unit ReLU connected in series to its output, a second convolutional layer, an excitation operation unit ReLU connected in series to its output, and a maximum pooling layer MaxPool2d; each second convolutional layer unit comprises 3 convolutional layers connected in series, an excitation operation unit ReLU connected in series to the output of each convolutional layer, and a maximum pooling layer MaxPool2d connected in series at the end; the third convolutional layer unit comprises a convolutional layer, an excitation operation unit ReLU, a convolutional layer, an excitation operation unit ReLU and a convolutional layer connected in series in sequence; the 2 first convolutional layer units are referred to in sequence as a first layer block and a second layer block, the 3 second convolutional layer units as a third, a fourth and a fifth layer block, and the third convolutional layer unit as a sixth layer block; the output of the sixth layer block is upsampled by a factor of 2, convolved, and fused with the output of the fourth layer block to obtain a first result; the first result is upsampled by a factor of 2, convolved, and fused with the output of the third layer block to obtain a second result; and finally the second result is upsampled by a factor of 8 to obtain segmented image data.
5. The method of claim 1, wherein the structure of the constructed generator network is as follows: the network is 62 layers deep and comprises four different kinds of network units, namely: 1) a network unit comprising a convolutional network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN, a rectified linear unit ReLU, a convolutional network and a batch normalization unit BN connected in series; 3) a network unit comprising a deconvolution network, a batch normalization unit BN and a rectified linear unit ReLU connected in series; 4) a network unit comprising a deconvolution network and an activation unit Tanh connected in series; the first to third layer blocks, connected in series, are all network units of type 1), the convolutional network in the first layer block consisting of 64 filters of 7 × 3, that in the second layer block of 128 filters of 3 × 64, and that in the third layer block of 256 filters of 3 × 128, used to generate 256 feature maps; the fourth layer block is a residual network unit of type 2), in which the two convolutional networks each consist of 256 filters of 3 × 256, and 9 such fourth layer blocks are connected in series; the fifth and sixth layer blocks, connected in series, are network units of type 3), the deconvolution network in the fifth layer block consisting of 128 filters of 3 × 256 and that in the sixth layer block of 64 filters of 3 × 128; the last layer block is a network unit of type 4) comprising a deconvolution network and an activation unit Tanh connected in series, the deconvolution network using 3 filters of 7 × 64 and realizing end-to-end mapping to reconstruct the output result; and all of the above layer blocks are connected in series.
6. The method of claim 1, wherein the structure of the constructed discriminator network is as follows: the network is 12 layers deep and comprises three different kinds of network units, namely: 1) a network unit comprising a convolutional network and an excitation operation unit LeakyReLU connected in series; 2) a network unit comprising a convolutional network, a batch normalization unit BN and an excitation operation unit LeakyReLU connected in series; 3) a convolutional network; the first layer block of the discriminator network is a network unit of type 1) and consists of 64 filters of 4 × 6; the second to fourth layer blocks are all network units of type 2), with filter specifications of 128 filters of 4 × 64, 256 filters of 4 × 128 and 512 filters of 4 × 256 in turn; and the last layer is a convolutional network consisting of 1 filter of 4 × 512, used to realize end-to-end mapping and reconstruct the output result.
CN201810534400.9A 2018-05-29 2018-05-29 Semantic segmentation method for optic discs and optic cups in fundus image Active CN108764342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810534400.9A CN108764342B (en) 2018-05-29 2018-05-29 Semantic segmentation method for optic discs and optic cups in fundus image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810534400.9A CN108764342B (en) 2018-05-29 2018-05-29 Semantic segmentation method for optic discs and optic cups in fundus image

Publications (2)

Publication Number Publication Date
CN108764342A CN108764342A (en) 2018-11-06
CN108764342B true CN108764342B (en) 2021-05-14

Family

ID=64003642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810534400.9A Active CN108764342B (en) 2018-05-29 2018-05-29 Semantic segmentation method for optic discs and optic cups in fundus image

Country Status (1)

Country Link
CN (1) CN108764342B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109215039B (en) * 2018-11-09 2022-02-01 浙江大学常州工业技术研究院 Method for processing fundus picture based on neural network
CN109615632B (en) * 2018-11-09 2023-07-21 广东技术师范学院 Fundus image optic disc and optic cup segmentation method based on semi-supervision condition generation type countermeasure network
CN109829894B (en) * 2019-01-09 2022-04-26 平安科技(深圳)有限公司 Segmentation model training method, OCT image segmentation method, device, equipment and medium
CN109933526B (en) * 2019-03-06 2023-01-20 颐保医疗科技(上海)有限公司 Picture testing method for AI identification of traditional Chinese medicinal materials
CN110021052B (en) * 2019-04-11 2023-05-30 北京百度网讯科技有限公司 Method and apparatus for generating fundus image generation model
CN110992309B (en) * 2019-11-07 2023-08-18 吉林大学 Fundus image segmentation method based on deep information transfer network
CN111784687A (en) * 2020-07-22 2020-10-16 上海理工大学 Glaucoma fundus image detection method based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106408562A (en) * 2016-09-22 2017-02-15 华南理工大学 Fundus image retinal vessel segmentation method and system based on deep learning
CN107256550A (en) * 2017-06-06 2017-10-17 电子科技大学 A kind of retinal image segmentation method based on efficient CNN CRF networks
CN107610141A (en) * 2017-09-05 2018-01-19 华南理工大学 A kind of remote sensing images semantic segmentation method based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014031086A1 (en) * 2012-08-24 2014-02-27 Agency For Science, Technology And Research Methods and systems for automatic location of optic structures in an image of an eye, and for automatic retina cup-to-disc ratio computation
US20150104102A1 (en) * 2013-10-11 2015-04-16 Universidade De Coimbra Semantic segmentation method with second-order pooling

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106408562A (en) * 2016-09-22 2017-02-15 华南理工大学 Fundus image retinal vessel segmentation method and system based on deep learning
CN107256550A (en) * 2017-06-06 2017-10-17 电子科技大学 A kind of retinal image segmentation method based on efficient CNN CRF networks
CN107610141A (en) * 2017-09-05 2018-01-19 华南理工大学 A kind of remote sensing images semantic segmentation method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Joint Optic Disc and Cup Segmentation Using Fully Convolutional and Adversarial Networks; Sharath M. Shankaranarayana et al.; Springer International Publishing AG; 2017-12-31; pp. 168-175 *
Research on deep learning-based classification and lesion detection methods for diabetic retinopathy; 张德彪; China Masters' Theses Full-text Database, Medicine & Health Sciences; 2018-02-15; pp. 1-43 *

Also Published As

Publication number Publication date
CN108764342A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108764342B (en) Semantic segmentation method for optic discs and optic cups in fundus image
CN110889853B (en) Tumor segmentation method based on residual error-attention deep neural network
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
CN109615632B (en) Fundus image optic disc and optic cup segmentation method based on semi-supervision condition generation type countermeasure network
CN110889852B (en) Liver segmentation method based on residual error-attention deep neural network
CN112132817B (en) Retina blood vessel segmentation method for fundus image based on mixed attention mechanism
CN111815574B (en) Fundus retina blood vessel image segmentation method based on rough set neural network
CN109345538A (en) A kind of Segmentation Method of Retinal Blood Vessels based on convolutional neural networks
CN109191476A (en) The automatic segmentation of Biomedical Image based on U-net network structure
CN112508864B (en) Retinal vessel image segmentation method based on improved UNet +
CN108095683A (en) The method and apparatus of processing eye fundus image based on deep learning
CN107657612A (en) Suitable for full-automatic the retinal vessel analysis method and system of intelligent and portable equipment
CN112258488A (en) Medical image focus segmentation method
CN109919938B (en) Method for obtaining optic disc segmentation atlas of glaucoma
CN109816666B (en) Symmetrical full convolution neural network model construction method, fundus image blood vessel segmentation device, computer equipment and storage medium
CN109658423B (en) Automatic optic disk cup segmentation method for color fundus picture
JP2019192215A (en) 3d quantitative analysis of retinal layers with deep learning
CN115496771A (en) Brain tumor segmentation method based on brain three-dimensional MRI image design
CN109919915A (en) Retinal fundus images abnormal area detection method and equipment based on deep learning
CN112884788B (en) Cup optic disk segmentation method and imaging method based on rich context network
CN113793348B (en) Retinal blood vessel segmentation method and device
CN112017185A (en) Focus segmentation method, device and storage medium
CN113012163A (en) Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network
CN110610480B (en) MCASPP neural network eyeground image optic cup optic disc segmentation model based on Attention mechanism
CN115035127A (en) Retinal vessel segmentation method based on generative confrontation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant