CN112580580A - Pathological myopia identification method based on data enhancement and model fusion - Google Patents

Pathological myopia identification method based on data enhancement and model fusion

Info

Publication number
CN112580580A
CN112580580A (application CN202011578831.9A)
Authority
CN
China
Prior art keywords
model
data
data set
random
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011578831.9A
Other languages
Chinese (zh)
Inventor
张晓云
崔建峰
黄建玉
吴万庆
熊飞兵
韦程琳
蒋明哲
陈浩
徐飞翔
刘琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University of Technology
Original Assignee
Xiamen University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University of Technology filed Critical Xiamen University of Technology
Priority to CN202011578831.9A priority Critical patent/CN112580580A/en
Publication of CN112580580A publication Critical patent/CN112580580A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pathological myopia identification method based on data enhancement and model fusion. A fundus image to be identified is first processed algorithmically to improve data quality, then fed into a deep learning model for identification, and the identification result is output. The deep learning model comprises primary models and a secondary model: the outputs of the primary models are connected to a hard voter model, which serves as the classifier of the secondary-model framework, so that the outputs of the primary models become the inputs of the secondary model; for each sample, the secondary model takes the mode of the primary models' predictions as the final identification result. The method identifies fundus images automatically on the basis of several classical convolutional neural networks and builds the model with a model fusion strategy together with multiple data enhancement modes, which mitigates incomplete model training under small samples, avoids overfitting, improves the model's generalization effect, and raises the model's expressive and generalization capability to a certain extent.

Description

Pathological myopia identification method based on data enhancement and model fusion
Technical Field
The invention belongs to the technical field of image processing, relates to the application of artificial intelligence technology in the field of medical imaging, and particularly relates to a pathological myopia identification method based on data enhancement and model fusion.
Background
Pathological myopia (PM) refers to myopia with a refractive error exceeding -8.00 D. As the myopia deepens, it is often accompanied by changes in the posterior pole of the eye, including scleral thinning, choroidal atrophy and thinning, and elongation of the ocular axis, and may be accompanied by complications such as amblyopia, glaucoma, cataract, vitreous opacity, and retinal detachment.
Complications associated with pathological myopia are one of the leading causes of visual impairment and blindness today, a problem that is particularly acute in East Asia. Pathological myopia can cause a variety of lesions in the macula, the retinal periphery, and the optic nerve, resulting in visual impairment, and deformations of the eyeball structure, including posterior scleral staphyloma, accelerate the progression of these diseases. Epidemiological investigations of pathological myopia are numerous, although they differ in how they define it. Surveys of Asian populations report a prevalence of pathological myopia of 0.9-3.1%, and a survey of the Australian population reports 1.2%; in addition, studies have shown that pathological myopia has become the leading cause of blindness and low vision in 7% of the European population. Owing to environmental factors and lifestyle changes, the incidence of high myopia and pathological myopia has also increased year by year. The problem of visual impairment caused by complications of pathological myopia is therefore likely to become more severe in the coming decades.
The number of ophthalmologists in China falls far short of that in developed countries, while domestic demand from doctors and patients is enormous and, with the aging of society in recent years, still growing. The principal contradiction in the medical field today is between people's ever-higher demands for medical care and increasingly strained medical resources. On the one hand, seeing a doctor remains difficult and expensive, chiefly because excellent physicians are scarce and take a long time to train; on the other hand, people's health problems are gradually worsening, health is highly valued, and demand for medical services keeps rising. This is a social problem, but artificial intelligence technology can bring an opportunity to the medical industry.
In the past 20 years, with the development of imaging technologies such as optical coherence tomography (OCT), frequency-domain OCT, and three-dimensional magnetic resonance imaging, we have gained deeper insight into the complications associated with high myopia. For example, with OCT the optic nerve, the macula, and recently described lesions such as myopic traction maculopathy and dome-shaped maculopathy can be assessed at high resolution. The advent of new therapeutic approaches, including anti-neovascular drug therapy and vitrectomy, has also improved the prognosis of some complications of high myopia.
In recent years, with the development of deep learning, its practice and application in different fields has gradually expanded, and the medical field is no exception; the application of machine-learning-based artificial intelligence in ophthalmology is one such area. Many eye diseases are diagnosed largely on the basis of auxiliary ophthalmic examinations, most of which are imaging examinations. Ocular images are fine-grained and complex and carry a large amount of information, so diagnoses are often limited by the physician's knowledge and clinical experience, are strongly subjective, and consume time and labor. Applying computer-based machine learning artificial intelligence in ophthalmology can greatly improve the efficiency of diagnosing eye diseases in clinical work and reduce the burden on ophthalmologists.
Convolutional neural networks (CNNs), which are data-driven and extract relevant features automatically, outperform traditional methods in image recognition, as do architectures such as FCNs (fully convolutional networks); deep learning techniques are therefore a natural choice for medical image processing. Research using convolutional neural networks for optic disc recognition is growing, and it has achieved recognition results superior to earlier methods.
Although fundus image recognition based on convolutional neural networks outperforms conventional methods to some extent, problems remain. For example, medical image data are large in volume but few in samples, yet the data set is what model training depends on most: the quality of the data set determines the quality of the model, so preprocessing the data can effectively improve the training effect. For another example, as deep learning in image recognition develops, excellent neural network models keep being proposed; each has unique advantages, but none is guaranteed to perform well on every kind of data, so a model's expressive capability may fall short, that is, the model struggles to recognize some rare pathological pictures. For these two problems, many researchers in the field have proposed optimized neural network models from different angles and obtained effective results. However, improving an existing model still involves many uncontrollable factors: when a researcher optimizes a model (for example, widening or deepening it), the effectiveness of the change cannot be judged directly and can only be observed from the training results. Such methods are demanding in computation and time, require long development cycles, are often not readily tractable, and their practical applicability remains questionable.
Disclosure of Invention
The aim of the invention is to provide a pathological myopia recognition method based on data enhancement and model fusion that recognizes fundus images automatically on the basis of several classical convolutional neural networks and builds a new deep convolutional neural network model with a model fusion strategy and multiple data enhancement modes. This mitigates incomplete model training, overfitting, and similar conditions under small samples, can to a certain extent improve the model's generalization effect, expressive capability, and picture-recognition accuracy, and provides a new idea for the recognition and prognosis research of pathological myopia.
In order to achieve the above purpose, the solution of the invention is:
a pathological myopia identification method based on data enhancement and model fusion comprises the following steps:
step 1, acquiring a fundus image to be identified;
step 2, the fundus image to be identified is sent into a deep learning model for identification, and an identification result is output;
the training method of the deep learning model comprises the following steps:
step a, determining a data set and dividing it into a training set, a verification set, and a test set at a ratio of 7:2:1;
b, performing data enhancement on the data of the training set by adopting a plurality of data enhancement modes to obtain a corresponding enhanced data set;
step c, sending the data sets enhanced in step b into AlexNet, GoogLeNet, VGG-16, and ResNet-50 respectively for training; measuring how well each data enhancement mode lets each network learn according to the network's accuracy on the verification set; saving, for each network, the model with the highest verification accuracy and recording the corresponding data enhancement strategy; and taking the AlexNet, GoogLeNet, VGG-16, and ResNet-50 so obtained as the primary models of the deep learning model;
and step d, recording all recognition results of the primary models on the verification set and taking them as the input data set of a secondary model, namely the hard voting model, which judges the primary models' recognition results again by taking their mode to obtain the final recognition result.
In step 2, all the primary models use a learning rate of 0.001, the loss function adopts the logistic loss function, the optimizer adopts SGD with momentum, batch_size is set to 10, each epoch is 30 batches, and each model is trained for 30 epochs and saved as a primary model;
the convolutional layers, fully connected layers, BN layers, and other layers of all the models are frozen, and the output layers are connected into a hard voting model, thereby constructing the deep learning model after result fusion.
In step b, data enhancement is performed on the data set, specifically in the following modes:
(1) randomly flipping the picture;
(2) adding random Gaussian white noise to the data set;
(3) adding random brightness, saturation, and contrast to the data;
(4) randomly cropping the pictures in equal proportion;
(5) randomly changing the sharpness of the picture;
(6) randomly stretching the picture;
(7) superimposing random rotation, random white noise, and random color change on the original data set;
(8) superimposing rotation, cropping, and stretching operations on the original data set;
(9) superimposing random flipping, random white noise, random brightness, saturation, and contrast, and adding random stretching and random sharpness on the original data set;
(10) superimposing random flipping and random sharpness change on the original data set;
(11) superimposing random cropping, random horizontal flipping, and Gaussian blur on the original data set;
(12) superimposing all of the above methods on the original data set.
In step b, the data set after data enhancement is also preprocessed: the data are first normalized and a data reader is then defined.
The concrete content of the step c is as follows:
step c1, training on all data sets with VGG-16 as a data set filter: each data set is trained for 30 epochs, each epoch traverses all data in the training set once, and verification is performed after each epoch to obtain a verification result; completing 30 epochs in this way yields a trained model on each data set; the four data sets with the highest verification accuracy are selected as alternative data sets, and the model corresponding to the data set with the highest verification accuracy is taken as the optimal VGG-16 model;
step c2, training AlexNet, GoogLeNet, and ResNet-50 in turn on the 4 alternative data sets, and screening out the model trained on the optimal enhanced data set for each;
and step c3, combining the optimal VGG-16 model obtained in step c1 and the optimal AlexNet, GoogLeNet, and ResNet-50 models obtained in step c2 into the primary models.
After the above scheme is adopted, the invention offers the following improvements:
(1) 12 data enhancement modes are designed on the iChallenge-PM public data set, covering almost all operations commonly used for data enhancement today; in the data-preprocessing stage they can effectively improve the quality of the data set.
(2) The AlexNet, VGG-16, GoogLeNet, and ResNet-50 models are given the same optimizer, loss function, learning rate, and other parameters, the aim being to observe, with variables controlled, how differently enhanced data sets affect each model's feature learning. On this basis, the model with the highest accuracy after training on the 12 data sets is obtained as a primary model.
(3) The prediction results of the primary models are taken as the input data of the secondary model: the output layers of the primary models are all connected to the secondary model, namely the hard voter model, completing construction of the fusion model, which is trained again to form the final fused model.
(4) With a model trained in this way, very high accuracy can be achieved even without transfer learning; data enhancement effectively expands the scale of the data set, which weakens overfitting during training and effectively improves the accuracy and generalization of the model in recognizing pictures.
Drawings
FIG. 1 is a schematic diagram of the evolution of the Inception structure;
FIG. 2 is a schematic diagram of an optimizer;
FIG. 3 is a fusion model logic diagram;
FIG. 4 is a schematic view of fundus images of high and pathological myopia;
FIG. 5 is a raw data set picture;
FIG. 6 is a left-right flipped picture of an original data set;
FIG. 7 is a 90-degree rotated picture of the original data set;
FIGS. 8 and 9 are the pictures before and after Gaussian white noise is added;
fig. 10 and 11 are pictures before and after brightness adjustment;
fig. 12 and 13 are pictures before and after the saturation adjustment;
FIGS. 14 and 15 are pictures before and after equal-proportion cropping;
FIGS. 16 and 17 are pictures before and after changing the sharpness;
FIGS. 18 and 19 are pictures before and after changing Gaussian white noise and randomly flipping;
FIGS. 20 and 21 are pictures before and after rotation, cropping, and stretching;
FIGS. 22 and 23 are pictures before and after mixing all the methods;
FIGS. 24 and 25 are pictures before and after augmentation with a third-party library;
FIG. 26 is a graph of experimental results (acc);
FIG. 27 is a graph of the results of the experiment (loss);
FIG. 28 is a graph of experimental results (loss distribution of each model);
fig. 29 shows model-corresponding picture parameters.
Detailed Description
The technical solutions and advantages of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
First, the experimental framework:
1. Introduction to prior technical solutions:
① AlexNet: the AlexNet network proposed by Krizhevsky et al., which was the first to use 5 convolutional layers and 3 fully connected layers to classify 1000 categories of pictures, has become the seminal work that made a significant breakthrough for deep learning in the field of image classification. Compared with traditional convolutional neural networks, AlexNet employs a series of methods to improve deep convolutional networks: a ReLU nonlinear activation function accelerates network training, multi-GPU convolution overcomes the then-current limitation of insufficient graphics-card resources, and a Dropout random-deactivation strategy reduces overfitting in the fully connected layers. AlexNet also proposed local response normalization, overlapping pooling, data enhancement, and other strategies to improve the model's classification and generalization capability.
VGG: simony et al proposed VGG networks that replaced the 5 x 5 and 7 x 7 filters with 3 x 3 filters. The receptive field of a plurality of serially connected small convolutional layers may be the same size as the receptive field of one large convolutional layer. For example, the field of 23 x 3 convolutional layers in series is the same as the field of 15 x 5 convolutional layer, and their convolution effects are comparable. But a plurality of small convolution layers connected in series have fewer parameters and more nonlinear transformation, and have stronger learning capability and better effect on characteristics. In addition, the VGG also increases the network structure to 16 or 19 layers, and as the number of layers of the network increases, the feature representation capability of the network is stronger, and the classification effect of the model is better. The VGG network has a very simple structure and a good effect, so that the VGG network is still widely applied to tasks of image classification, detection, segmentation, super-resolution, image stylization and the like in the field of computer vision.
③ GoogLeNet: the Inception series proposed by Google made significant contributions to the development of deep convolutional neural network structures. The largest contribution of Inception-V1 (GoogLeNet) is connecting several convolutions of different sizes in parallel, which widens the network and lets the convolution blocks acquire information at different receptive fields. The structure also makes full use of 1 x 1 convolution kernels to reduce network parameters while improving the utilization efficiency of computing resources. Inception-V2 proposed an excellent regularization method, Batch Normalization, which batch-normalizes the data before each convolution; it is now a standard component of deep convolutional networks and solves the training problem of deep networks well. The evolution of the Inception structure is shown in FIG. 1.
④ ResNet: the deep residual network ResNet proposed by Kaiming He et al. raises the network depth to 152 layers, later extended to 1000 layers, while still improving accuracy. In theory, the deeper the network, the higher the accuracy should be, but He et al. found experimentally that beyond a certain depth, blindly increasing depth degrades the network, mainly because gradient explosion and gradient vanishing in very deep networks prevent normal training and hurt performance. Inspired by Highway Networks and similar work, they proposed the residual structure, which adds a skip connection between the input and output of a convolution block so that the input can be passed directly to the output. The residual structure essentially learns an identity mapping and lets the stacked nonlinear layers learn another mapping F(x) = H(x) - x. In fact, if a network could reach the desired result simply by setting parameters by hand, it can easily be trained to converge to that result, so the added residual structure does not degrade the network's overall performance. ResNet's residual module reduces the training difficulty of deep networks, solves the degradation problem well, and exploits the depth potential of convolutional networks to the utmost; its performance on the ImageNet classification task was ultimately the first to surpass the human level.
⑤ Voter model: the voting mechanism (Voting) is a combination strategy for classification problems in ensemble learning. The basic idea is to select the class output most often among all the machine learning algorithms. Classification algorithms output one of two things: class labels directly, or class probabilities. Voting with class labels is called hard (majority) voting, and voting with class probabilities is called soft voting. Hard voting selects the label output most often; if label counts are equal, selection follows ascending order. Soft voting selects the class using the class probabilities output by each algorithm; when weights are supplied, the weighted average of each class's probabilities is taken and the class with the larger value is selected. The present embodiment employs a hard voting mechanism.
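As an illustrative sketch (not code from the patent), the hard-voting rule just described, including the ascending-order tie-break, can be written in a few lines of Python; the function name and the (n_models, n_samples) array layout are assumptions:

import numpy as np

def hard_vote(predictions):
    """Majority vote over per-model class labels.

    predictions: int array of shape (n_models, n_samples).
    Returns the per-sample mode; on a tie, argmax returns the first
    (smallest) label, matching the ascending-order rule above.
    """
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # count the votes each class receives for every sample
    votes = np.apply_along_axis(
        lambda col: np.bincount(col, minlength=n_classes), 0, predictions)
    return votes.argmax(axis=0)

# four primary models voting on three samples
preds = [[1, 0, 1],   # AlexNet
         [1, 0, 0],   # GoogLeNet
         [0, 0, 1],   # VGG-16
         [1, 1, 1]]   # ResNet-50
print(hard_vote(preds))  # -> [1 0 1]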
2. Introduction to points of innovation
AlexNet, GoogLeNet, VGG-16, and ResNet-50 are used as the primary models. The learning_rate is set to 0.001, and each model is trained for 30 rounds using the optimizer and loss function described below. After training, the version of each model with the highest accuracy is selected and the corresponding model parameters are saved. The convolutional layers of all the primary learners are then frozen, so that once data enter a primary learner they can only propagate forward, not backward.
Model fusion is a main strategy in model ensembling and is a layered ensemble framework. Taking two layers as an example, the first layer consists of several base learners (the primary learners in this embodiment) whose input is the original training set (here, the data sets after data enhancement); the second-layer model is then trained using the outputs of the first-layer base learners as its training set, yielding a complete fusion model. As shown in Table 1, processes 1-3 train the first-level models, i.e., the primary learners. Processes 5-9 use the trained models to process the data in the validation set, and this prediction result serves as the training set of the secondary learner. Process 11 trains the secondary learner with the primary models' predictions, giving the final trained model.
This design makes the model easy to extend: the hard voting model can be replaced by other secondary learners according to the data set.
TABLE 1 hard voting model logic diagram
[Table 1 is reproduced as an image in the original publication and is not shown here.]
Introduction of other components in the model:
an optimizer: momentum, SGD was used. The principle can be understood as that difficulties are encountered in crossing ravines, i.e. the curve of the surface in one dimension is much steeper than in the other, which is common near the local optimum. In these cases, the SGD oscillates on the slope of the valley, but slowly progresses along the valley bottom toward a local optimum direction, as shown in fig. 2. Momentum is one method that helps to accelerate the SGD and suppress oscillations in the relevant direction, as shown in fig. 2 b. It does this by adding to the current update vector the fraction γ of the update vector at the past time step, γ typically set to 0.9 or similar.
v_t = γ·v_{t-1} + η·∇_θ J(θ),    θ = θ - v_t    (1)
In essence, using momentum is like pushing a ball down a hill: the ball accumulates momentum as it descends, becoming faster and faster en route (until it reaches terminal velocity if there is air resistance, i.e., γ < 1). The same thing happens in the parameter updates: the momentum term grows for dimensions whose gradients point in the same direction and shrinks the updates for dimensions whose gradients change direction. In this way faster convergence and less oscillation are obtained.
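A minimal sketch of this update rule (formula (1) above); the function name and default values other than γ = 0.9 are illustrative assumptions:

import numpy as np

def momentum_step(theta, velocity, grad, lr=0.001, gamma=0.9):
    """One SGD-with-momentum update: v_t = gamma*v_{t-1} + lr*grad,
    then theta = theta - v_t; gamma = 0.9 as quoted above."""
    velocity = gamma * velocity + lr * grad
    theta = theta - velocity
    return theta, velocity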
Loss function:
the logistic loss can be calculated by equation (2):
loss=-Labels*log(sigma(X))-(1-Labels)*log(1-sigma(X)) (2)
where
sigma(X) = 1 / (1 + exp(-X))    (3)
Substituting (3) into (2) gives formula (4):
loss = X - X*Labels + log(1 + exp(-X))    (4)
For computational stability, to prevent exp(-X) from overflowing when X < 0, the loss is calculated using equation (5):
loss=max(X,0)-X*Labels+log(1+exp(-|X|)) (5)
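A minimal NumPy sketch of equation (5); np.log1p is used for the log(1 + ·) term, and the helper name is an assumption:

import numpy as np

def logistic_loss(x, labels):
    """Numerically stable sigmoid cross-entropy on logits x,
    i.e. loss = max(X, 0) - X*Labels + log(1 + exp(-|X|)),
    which equals equations (2) and (4) but cannot overflow in exp()."""
    x = np.asarray(x, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    return np.maximum(x, 0) - x * labels + np.log1p(np.exp(-np.abs(x)))

print(logistic_loss([2.0, -3.0], [1, 0]))  # small losses for correct logits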
Second, processing of the experimental data:
1. Introduction to prior technical solutions:
Commonly used data enhancement modes include: flipping (horizontal + vertical), noise, random rotation, random flipping, random brightness, random contrast, random saturation, cropping, scaling/stretching, and blurring. These methods can improve the quality of a data set to a certain extent so that the features in its pictures are more easily learned by a machine, but how to combine them for the best effect is a practical problem. This research combines 12 data enhancement modes to mine the mode that most effectively improves data set quality; as a sound foundation for subsequent research, this is important for the early stage of deep learning model training.
2. Introduction to points of innovation
This example investigates 12 data enhancement modes and, by combining them and evaluating the results, determines the most effective mode, which then serves as the basis of subsequent study as the most effective way of enhancing this open data set (iChallenge-PM). Analysis of the results of training the VGG-16 model on the 12 data sets shows that four data sets, namely
PALM-Training1600-overturn-dim-imgaug2;
PALM-Training3200-overturn-noise-color-crop-deform-dim;
PALM-Training1600-overturn-crop-deform;
and PALM-Training800-color (data sets are named as public-data-set abbreviation (PALM) - training-set identifier and quantity (e.g., Training1600) - combined data enhancement modes), achieve recognition accuracy above 95%. This embodiment takes 95% as the threshold; data sets of this quality let the model effectively improve its recognition capability, as shown in Table 2.
TABLE 2
[Table 2 is reproduced as an image in the original publication and is not shown here.]
Experimental procedures and results:
A. Introduction to the experimental environment
Hardware environment: 4-core CPU, 32 GB RAM, V100 GPU with 16 GB of video memory, 100 GB disk
Environment configuration: Python 3.7; framework: PaddlePaddle 1.8.0
B. Data set selection
The training set of the iChallenge-PM challenge data set contains 400 jpg images and the validation set contains 400 jpg images; no test set is provided. Therefore the training set and the verification set of the original data set are merged and re-divided in the ratio 7:2:1 into a new training set, verification set, and test set. The new training set serves as the original training set and, together with the 12 data sets after data enhancement, forms the training data of this experiment.
C. Evaluation index
The main reference index is the prediction accuracy of the models; indices such as recall and sensitivity are not considered for now, because in medical image processing recognition accuracy is the most important.
D. Primary model training procedure and results
First, with VGG-16 as a data set filter, training is carried out on all the data sets. In FIG. 26 the horizontal axis is the training step number and the vertical axis is the accuracy during training (train/acc); in FIG. 27 the horizontal axis is the training step number and the vertical axis is the loss during training (train/loss); FIG. 28 is another version of FIG. 26 in which the horizontal axis shows training time; FIG. 29 gives the numerical indices of VGG-16 trained on the 13 data sets. Each data set is trained for 30 epochs, each epoch traversing all data in the data set once, forming a trained model on each data set. The 13 models are then used to make predictions on the test set, giving the results shown in Table 3, whose analysis shows that the enhanced data sets are on the whole more accurate than the original data set. Among them, PALM-Training1600-overturn-dim-imgaug2, PALM-Training3200-overturn-noise-color-crop-deform-dim, PALM-Training1600-overturn-crop-deform, and PALM-Training800-color reach an average precision above 95 percent.
Therefore, taking these 4 data sets as the alternative data sets, GoogLeNet, AlexNet, and ResNet-50 are trained on them in turn, each model for 30 rounds per data set, and the trained models are saved and used to predict on the test set, giving the results shown in Table 4. Table 4 shows that different models differ in expressive capability and in the data set that suits them: GoogLeNet and ResNet-50 both perform best on the PALM-Training3200-overturn-noise-color-crop-deform-dim data set, while AlexNet and VGG-16 both score highest on the PALM-Training1600-overturn-dim-imgaug2 data set. The 4 models with the highest accuracy are taken as the primary models.
TABLE 4 Training results of VGG-16, AlexNet, GoogLeNet, and ResNet-50 on the above 4 data sets
[Table 4 is reproduced as an image in the original publication and is not shown here.]
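The selection procedure just described (train each network for 30 rounds on each candidate data set and keep the best by validation accuracy) can be sketched as follows; train_one_epoch, evaluate, and save_params are hypothetical helpers, not functions from the patent or from PaddlePaddle:

def select_best(model_fn, datasets, val_reader, epochs=30):
    """Return the data set on which a freshly trained model scores the
    highest validation accuracy, saving that model's parameters."""
    best_name, best_acc = None, 0.0
    for name, train_reader in datasets.items():
        model = model_fn()                 # fresh, untrained network
        for _ in range(epochs):
            train_one_epoch(model, train_reader)
        acc = evaluate(model, val_reader)
        if acc > best_acc:
            best_name, best_acc = name, acc
            save_params(model, name)       # keep the strongest version
    return best_name, best_acc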
E. Fusion model training procedure and results
FIG. 3 shows the logic diagram of the fusion model. Alex_result, Google_result, ResNet_result, and Vgg_result are the outputs of the primary models; these results are taken as the training-data features of the secondary model, while the labels serve as the labels of the new data set, establishing the training set of the secondary model. A hard voter model is connected behind the primary models as the classifier, forming the secondary-model framework. Within the secondary model, the mode (i.e., majority rule) of the prediction results Alex_result, Google_result, ResNet_result, and Vgg_result is counted for each sample as the final prediction. After 30 rounds of training the model is saved and verified on the test set; the final accuracy of the fusion model is 97.25%.
Specific examples are as follows:
The data set adopted in this example is the pathological myopia portion of the iChallenge-PM challenge data set, which provides 800 annotated retinal fundus images. The experimental aim is to classify PM and non-PM (including HM: high myopia, and normal) fundus images. FIG. 4 gives an example and explanation of the data: A is a highly myopic fundus, and B is a pathologically myopic fundus with large areas of atrophy visible.
The data set acquisition method comprises the following steps:
and acquiring a complete data set from https:// ai, basic, com/broad/interaction, and then importing the data set into a server or a PC (Win, Linux and Mac) for decompression. And acquiring paths and picture names of all picture data in the data set through the script, and storing the paths and the picture names as a research basis for subsequent data enhancement.
Description of the drawings: data set naming rule examples
PALM-dataset name;
Training-Training set 800-800 pictures;
OVERURN-random rotation (keywords)
Data enhancement strategy:
1. Randomly flipping/rotating the picture (including but not limited to 0, 90, 270, and 360 degrees)
2. Adding random white Gaussian noise to a data set
3. Adding random brightness, saturation, contrast to data
4. Random equal proportion cutting of picture
5. Randomly changing sharpness for pictures
6. Randomly stretching pictures
7. Superimposing random rotation, random white noise and random color variation on the original data set
8. Superimposing rotation, cropping, and stretching operations on the original data set
9. Superimposing random flip, random white noise, random brightness, saturation, contrast, adding random stretch, and random sharpness on the original data set.
10. A random flip and a random change in sharpness are superimposed on the original data set.
11. Superimposing random cropping, random horizontal flipping, and Gaussian blur on the original data set (this operation is based on a third-party library: imgaug)
12. All the above methods are superimposed on the original data set.
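A sketch of strategies 1-3 above using PIL and NumPy; the parameter ranges (noise sigma, enhancement factors, probabilities) are illustrative assumptions, not values given in the patent:

import random
import numpy as np
from PIL import Image, ImageEnhance

def augment(img: Image.Image) -> Image.Image:
    """Sketch of strategies 1-3: random flip/rotation, Gaussian
    white noise, and random brightness/saturation/contrast."""
    # 1. random flip and rotation by a multiple of 90 degrees
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    img = img.rotate(random.choice([0, 90, 270]))
    # 2. additive Gaussian white noise
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, 10.0, arr.shape)
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    # 3. random brightness, saturation (Color), and contrast
    for enhancer in (ImageEnhance.Brightness, ImageEnhance.Color,
                     ImageEnhance.Contrast):
        img = enhancer(img).enhance(random.uniform(0.8, 1.2))
    return img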
These form the basis for subsequent studies; see FIGS. 4-25.
Data preprocessing:
1. Before any data enter the deep learning network model for training, they are normalized: the picture size is scaled to 224 x 224, the picture format is transposed from [H, W, C] to [C, H, W], and the data range is adjusted to [-1.0, 1.0].
2. A data reader is defined as the control that governs how much data the deep learning neural network learns from during training. Based on the data set, the paths and names of all data are first read from the storage directory, the data are shuffled, and positive and negative samples are divided by the first letter of the picture name (file names beginning with H denote high myopia and those beginning with N denote normal vision; high-myopia and normal-vision samples are both non-pathological, belong to the negative samples, and are labeled 0, while file names beginning with P denote pathological myopia, belong to the positive samples, and are labeled 1). Each picture and its corresponding label are stored in a temporary buffer; a batch_size is set, and when the number of samples in the buffer reaches batch_size, storage pauses and the samples are fed into the deep learning model for training. The data readers of the training, verification, and test sets share this structure, and the details can be adjusted to the corresponding business scenario.
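A minimal sketch of the normalization and batched reader described above; the directory layout, function names, and the RGB conversion are assumptions:

import os
import random
import numpy as np
from PIL import Image

def preprocess(path):
    """Scale to 224 x 224, transpose [H, W, C] -> [C, H, W],
    rescale pixel values to [-1.0, 1.0], as described above."""
    img = Image.open(path).convert('RGB').resize((224, 224))
    arr = np.asarray(img, dtype=np.float32).transpose((2, 0, 1))
    return arr / 127.5 - 1.0

def data_reader(data_dir, batch_size=10):
    """Yield shuffled (images, labels) batches; names starting with
    'H' or 'N' are negative (label 0), 'P' positive (label 1)."""
    names = [n for n in os.listdir(data_dir) if n[:1] in 'HNP']
    random.shuffle(names)
    batch_imgs, batch_labels = [], []
    for name in names:
        batch_imgs.append(preprocess(os.path.join(data_dir, name)))
        batch_labels.append(1 if name.startswith('P') else 0)
        if len(batch_imgs) == batch_size:
            yield np.stack(batch_imgs), np.array(batch_labels)
            batch_imgs, batch_labels = [], []  # any leftover partial batch is dropped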
Building a model:
1. All the primary models use existing classical deep convolutional neural network models, respectively:
AlexNet: contains 5 convolutional layers, 3 pooling layers, 3 fully connected layers, and 2 dropout layers with the dropout rate set to 0.5.
VGG-16 standard structure: contains 5 vgg_blocks; the number of convolutional layers and of output channels in each block is specified by conv_arch.
GoogLeNet standard structure: comprises convolutional layers, pooling layers, and Inception modules (the parameters of the convolutional and pooling layers differ).
ResNet-50 standard structure: comprises convolutional layers, pooling layers, BatchNorm layers, and residual-block layers (convolution and pooling parameters vary; ResNet-50 contains multiple modules, with the 2nd to 5th modules containing 3, 4, 6, and 3 residual blocks respectively).
2. All the models use a learning rate of 0.001, the loss function adopts the logistic loss function, the optimizer adopts SGD with momentum, batch_size is set to 10, each epoch is 30 batches, and each model is trained for 30 epochs and saved as a primary learner.
3. The convolutional layers, fully connected layers, BN layers, and other layers of all the models are frozen, and the output layers are connected into the hard voting model to form the fusion model, i.e., the deep neural network model finally formed by the invention.
4. After the hard voter model is connected, when the fusion model judges the results obtained by each primary learner, it votes on the recognition results according to majority rule and thereby reaches the final judgment: the data is judged to be pathological myopia or non-pathological myopia.
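Putting the pieces together, a sketch of the fused classifier: frozen primary models feed the hard voter (reusing the hard_vote helper sketched earlier). That each entry of primary_models is a callable returning one class label per image is an assumption, e.g. wrapped inference functions for the saved AlexNet, GoogLeNet, VGG-16, and ResNet-50:

import numpy as np

class FusionModel:
    """Hard-voting fusion over frozen primary models (a sketch)."""

    def __init__(self, primary_models):
        # weights are assumed frozen upstream; only inference happens here
        self.primary_models = primary_models

    def predict(self, images):
        # (n_models, n_samples) matrix of predicted labels
        preds = np.stack([m(images) for m in self.primary_models])
        # majority rule: 1 = pathological myopia, 0 = non-pathological
        return hard_vote(preds)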
In conclusion, the invention provides a pathological myopia identification method based on data enhancement and model fusion, characterized in that:
(1) The invention designs 12 data enhancement modes, obtains through result verification the optimal enhancement mode for each model, and the enhanced data sets produced by the optimal modes greatly improve the training effect over the original data set. It is well known that the ceiling of artificial intelligence is the human work behind it, that is, the quality of the data set, which greatly influences what a model learns; the quality of the data set is therefore important to all subsequent artificial intelligence research. The invention focuses on this point and obtains the most effective enhancement mode for this data set through combined experiments. Moreover, the approach is universal: the same method can still be used to find the optimal data enhancement mode when enhancing other data sets;
(2) The method further screens the recognition results of the primary models through model fusion, so that the result does not excessively "believe" any single model. This has a generalizing effect, and the simple principle of majority rule makes the recognition result more objective.
The above embodiments only illustrate the technical idea of the present invention and do not thereby limit its protection scope; any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.

Claims (5)

1. A pathological myopia identification method based on data enhancement and model fusion is characterized by comprising the following steps:
step 1, acquiring a fundus image to be identified;
step 2, the fundus image to be identified is sent into a deep learning model for identification, and an identification result is output;
the training method of the deep learning model comprises the following steps:
step a, determining a data set and dividing it into a training set, a verification set, and a test set at a ratio of 7:2:1;
b, performing data enhancement on the data of the training set by adopting a plurality of data enhancement modes to obtain a corresponding enhanced data set;
step c, sending the data sets enhanced in step b into AlexNet, GoogLeNet, VGG-16, and ResNet-50 respectively for training; measuring how well each data enhancement mode lets each network learn according to the network's accuracy on the verification set; saving, for each network, the model with the highest verification accuracy and recording the corresponding data enhancement strategy; and taking the AlexNet, GoogLeNet, VGG-16, and ResNet-50 so obtained as the primary models of the deep learning model;
and step d, recording all recognition results of the primary models on the verification set and taking them as the input data set of the secondary model, namely the hard voting model, which judges the primary models' recognition results again by taking their mode to obtain the final recognition result.
2. The pathological myopia identification method based on data enhancement and model fusion according to claim 1, wherein: in step 2, all the primary models use a learning rate of 0.001, the loss function adopts the logistic loss function, the optimizer adopts SGD with momentum, batch_size is set to 10, each epoch is 30 batches, and each model is trained for 30 epochs and saved as a primary model;
the convolutional layers, fully connected layers, BN layers, and other layers of all the models are frozen, and the output layers are connected into a hard voting model, thereby constructing the deep learning model after result fusion.
3. The pathological myopia identification method based on data enhancement and model fusion according to claim 1, wherein: in step b, data enhancement is performed on the data set, specifically in the following modes:
(1) randomly flipping the picture;
(2) adding random Gaussian white noise to the data set;
(3) adding random brightness, saturation, and contrast to the data;
(4) randomly cropping the pictures in equal proportion;
(5) randomly changing the sharpness of the picture;
(6) randomly stretching the picture;
(7) superimposing random rotation, random white noise, and random color change on the original data set;
(8) superimposing rotation, cropping, and stretching operations on the original data set;
(9) superimposing random flipping, random white noise, random brightness, saturation, and contrast, and adding random stretching and random sharpness on the original data set;
(10) superimposing random flipping and random sharpness change on the original data set;
(11) superimposing random cropping, random horizontal flipping, and Gaussian blur on the original data set;
(12) superimposing all of the above methods on the original data set.
4. The pathological myopia identification method based on data enhancement and model fusion according to claim 1, wherein: in step b, the data set after data enhancement is also preprocessed: the data are first normalized and a data reader is then defined.
5. The pathological myopia identification method based on data enhancement and model fusion according to claim 1, wherein: the concrete content of step c is as follows:
step c1, training on all data sets with VGG-16 as a data set filter: each data set is trained for 30 epochs, each epoch traverses all data in the training set once, and verification is performed after each epoch to obtain a verification result; completing 30 epochs in this way yields a trained model on each data set; the four data sets with the highest verification accuracy are selected as alternative data sets, and the model corresponding to the data set with the highest verification accuracy is taken as the optimal VGG-16 model;
step c2, training AlexNet, GoogLeNet, and ResNet-50 in turn on the 4 alternative data sets, and screening out the model trained on the optimal enhanced data set for each;
and step c3, combining the optimal VGG-16 model obtained in step c1 and the optimal AlexNet, GoogLeNet, and ResNet-50 models obtained in step c2 into the primary models.
CN202011578831.9A 2020-12-28 2020-12-28 Pathological myopia identification method based on data enhancement and model fusion Pending CN112580580A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011578831.9A CN112580580A (en) 2020-12-28 2020-12-28 Pathological myopia identification method based on data enhancement and model fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011578831.9A CN112580580A (en) 2020-12-28 2020-12-28 Pathological myopia identification method based on data enhancement and model fusion

Publications (1)

Publication Number Publication Date
CN112580580A true CN112580580A (en) 2021-03-30

Family

ID=75140304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011578831.9A Pending CN112580580A (en) 2020-12-28 2020-12-28 Pathological myopia identification method based on data enhancement and model fusion

Country Status (1)

Country Link
CN (1) CN112580580A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496481A (en) * 2021-05-20 2021-10-12 北京交通大学 Auxiliary detection method for chest X-Ray image with few samples
CN113763336A (en) * 2021-08-24 2021-12-07 北京鹰瞳科技发展股份有限公司 Image multi-task identification method and electronic equipment
CN113837231A (en) * 2021-08-30 2021-12-24 厦门大学 Image description method based on data enhancement of mixed samples and labels
CN115082459A (en) * 2022-08-18 2022-09-20 北京鹰瞳科技发展股份有限公司 Method for training detection model for diopter detection and related product
CN116188294A (en) * 2022-12-22 2023-05-30 东莞理工学院 Data enhancement method, system, intelligent terminal and medium for medical image

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network
CN108416288A (en) * 2018-03-04 2018-08-17 南京理工大学 The first visual angle interactive action recognition methods based on overall situation and partial situation's network integration
WO2019196268A1 (en) * 2018-04-13 2019-10-17 博众精工科技股份有限公司 Diabetic retina image classification method and system based on deep learning
CN111462068A (en) * 2020-03-30 2020-07-28 华南理工大学 Bolt and nut detection method based on transfer learning
CN111476713A (en) * 2020-03-26 2020-07-31 中南大学 Intelligent weather image identification method and system based on multi-depth convolution neural network fusion
CN111696101A (en) * 2020-06-18 2020-09-22 中国农业大学 Light-weight solanaceae disease identification method based on SE-Inception
CN111862066A (en) * 2020-07-28 2020-10-30 平安科技(深圳)有限公司 Brain tumor image segmentation method, device, equipment and medium based on deep learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network
CN108416288A (en) * 2018-03-04 2018-08-17 南京理工大学 The first visual angle interactive action recognition methods based on overall situation and partial situation's network integration
WO2019196268A1 (en) * 2018-04-13 2019-10-17 博众精工科技股份有限公司 Diabetic retina image classification method and system based on deep learning
CN111476713A (en) * 2020-03-26 2020-07-31 中南大学 Intelligent weather image identification method and system based on multi-depth convolution neural network fusion
CN111462068A (en) * 2020-03-30 2020-07-28 华南理工大学 Bolt and nut detection method based on transfer learning
CN111696101A (en) * 2020-06-18 2020-09-22 中国农业大学 Light-weight solanaceae disease identification method based on SE-Inception
CN111862066A (en) * 2020-07-28 2020-10-30 平安科技(深圳)有限公司 Brain tumor image segmentation method, device, equipment and medium based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG Huining (ed.): "Digital Inkjet Printing Technology and Equipment Application for Ceramic Wall and Floor Tiles", Beijing: China Building Materials Industry Press, page 378 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496481A (en) * 2021-05-20 2021-10-12 北京交通大学 Auxiliary detection method for chest X-Ray image with few samples
CN113496481B (en) * 2021-05-20 2023-11-07 北京交通大学 Auxiliary detection method for X-Ray image of breast with few samples
CN113763336A (en) * 2021-08-24 2021-12-07 北京鹰瞳科技发展股份有限公司 Image multi-task identification method and electronic equipment
CN113763336B (en) * 2021-08-24 2024-06-28 北京鹰瞳科技发展股份有限公司 Image multitasking identification method and electronic equipment
CN113837231A (en) * 2021-08-30 2021-12-24 厦门大学 Image description method based on data enhancement of mixed samples and labels
CN113837231B (en) * 2021-08-30 2024-02-27 厦门大学 Image description method based on data enhancement of mixed sample and label
CN115082459A (en) * 2022-08-18 2022-09-20 北京鹰瞳科技发展股份有限公司 Method for training detection model for diopter detection and related product
CN116188294A (en) * 2022-12-22 2023-05-30 东莞理工学院 Data enhancement method, system, intelligent terminal and medium for medical image
CN116188294B (en) * 2022-12-22 2023-09-19 东莞理工学院 Data enhancement method, system, intelligent terminal and medium for medical image

Similar Documents

Publication Publication Date Title
EP3674968B1 (en) Image classification method, server and computer readable storage medium
CN112580580A (en) Pathological myopia identification method based on data enhancement and model fusion
CN108021916B (en) Deep learning diabetic retinopathy sorting technique based on attention mechanism
Chen et al. Automatic feature learning for glaucoma detection based on deep learning
CN112132817B (en) Retina blood vessel segmentation method for fundus image based on mixed attention mechanism
US20210035689A1 (en) Modeling method and apparatus for diagnosing ophthalmic disease based on artificial intelligence, and storage medium
CN109166126A (en) A method of paint crackle is divided on ICGA image based on condition production confrontation network
CN109345538A (en) A kind of Segmentation Method of Retinal Blood Vessels based on convolutional neural networks
Yang et al. Efficacy for differentiating nonglaucomatous versus glaucomatous optic neuropathy using deep learning systems
CN112101424B (en) Method, device and equipment for generating retinopathy identification model
Firke et al. Convolutional neural network for diabetic retinopathy detection
Subramanian et al. Classification of retinal oct images using deep learning
Lyu et al. Deep tessellated retinal image detection using Convolutional Neural Networks
Sharma et al. Harnessing the Strength of ResNet50 to Improve the Ocular Disease Recognition
CN111863241B (en) Fundus imaging classification system based on integrated deep learning
Ali et al. Cataract disease detection used deep convolution neural network
CN115641309A (en) Method and device for identifying age of eye ground color photo of residual error network model and storage medium
CN115423828A (en) Retina blood vessel image segmentation method based on MRNet
Latha et al. Diagnosis of diabetic retinopathy and glaucoma from retinal images using deep convolution neural network
Santos et al. Generating photorealistic images of people's eyes with strabismus using Deep Convolutional Generative Adversarial Networks
Ameri et al. Segmentation of Hard Exudates in Retina Fundus Images Using BCDU-Net
Nguyen et al. Cataract Detection using Hybrid CNN Model on Retinal Fundus Images
Jose Classification of EYE Diseases Using Multi-Model CNN
Ali et al. Classifying Three Stages of Cataract Disease using CNN
Li et al. A Pyramid Spatial Attention Network For Fovea Localization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination