CN112580580A - Pathological myopia identification method based on data enhancement and model fusion - Google Patents

Pathological myopia identification method based on data enhancement and model fusion

Info

Publication number
CN112580580A
CN112580580A (application CN202011578831.9A)
Authority
CN
China
Prior art keywords
model
data
data set
random
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011578831.9A
Other languages
Chinese (zh)
Inventor
张晓云
崔建峰
黄建玉
吴万庆
熊飞兵
韦程琳
蒋明哲
陈浩
徐飞翔
刘琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University of Technology
Original Assignee
Xiamen University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University of Technology filed Critical Xiamen University of Technology
Priority to CN202011578831.9A priority Critical patent/CN112580580A/en
Publication of CN112580580A publication Critical patent/CN112580580A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pathological myopia identification method based on data enhancement and model fusion. A fundus image to be identified is first processed algorithmically to improve data quality, then fed into a deep learning model for identification, and the identification result is output. The deep learning model comprises primary models and a secondary model: the outputs of the primary models are connected to a hard voter model, which serves as the classifier of the secondary-model framework, so that the outputs of the primary models become the inputs of the secondary model; for each sample, the secondary model takes the mode of the primary models' predictions as the final identification result. The method identifies fundus images automatically on the basis of several classical convolutional neural networks and builds the model with a model fusion strategy together with multiple data enhancement modes, which mitigates incomplete model training under small samples, avoids overfitting, improves the model's generalization effect, and raises the model's expressive and generalization capability to a certain extent.

Description

Pathological myopia identification method based on data enhancement and model fusion
Technical Field
The invention belongs to the technical field of image processing, relates to the application of artificial intelligence technology in the field of medical imaging, and particularly relates to a pathological myopia identification method based on data enhancement and model fusion.
Background
Pathological myopia (PM) refers to myopia with a refractive error exceeding -8.00 D. As the myopia deepens, it is often accompanied by changes in the posterior pole of the eye, including scleral thinning, choroidal atrophy and thinning, and elongation of the ocular axis, and may be accompanied by complications such as amblyopia, glaucoma, cataract, vitreous opacity, and retinal detachment.
Complications associated with pathological myopia are one of the leading causes of visual impairment and blindness today, a problem that is particularly acute in East Asia. Pathological myopia can cause a variety of lesions in the macula, the retinal periphery, and the optic nerve, resulting in visual impairment, and deformations of the eyeball structure, including posterior scleral staphyloma, accelerate the progression of these diseases. Epidemiological investigations of pathological myopia are numerous, although they differ in how they define it. Surveys of Asian populations report a prevalence of pathological myopia of 0.9-3.1%, and a survey of the Australian population reports 1.2%; in addition, studies have shown that pathological myopia has become the leading cause of blindness and low vision in 7% of the European population. Owing to environmental factors and lifestyle changes, the incidence of high myopia and pathological myopia has also increased year by year. The problem of visual impairment caused by complications of pathological myopia is therefore likely to become more severe in the coming decades.
The number of ophthalmologists in China falls far short of that in developed countries, while domestic demand from doctors and patients is enormous and, with the aging of society in recent years, still growing. The principal contradiction in the medical field today is between people's ever-higher demands for medical care and increasingly strained medical resources. On the one hand, seeing a doctor remains difficult and expensive, chiefly because excellent physicians are scarce and take a long time to train; on the other hand, people's health problems are gradually worsening, health is highly valued, and demand for medical services keeps rising. This is a social problem, but artificial intelligence technology can bring an opportunity to the medical industry.
In the past 20 years, with the development of imaging technologies such as optical coherence tomography (OCT), frequency-domain OCT, and three-dimensional magnetic resonance imaging, we have gained deeper insight into the complications associated with high myopia. For example, with OCT the optic nerve, the macula, and recently described lesions such as myopic traction maculopathy and dome-shaped maculopathy can be assessed at high resolution. The advent of new therapeutic approaches, including anti-neovascular drug therapy and vitrectomy, has also improved the prognosis of some complications of high myopia.
In recent years, with the development of deep learning, its practice and application in different fields has gradually expanded, and the medical field is no exception; the application of machine-learning-based artificial intelligence in ophthalmology is one such area. Many eye diseases are diagnosed largely on the basis of auxiliary ophthalmic examinations, most of which are imaging examinations. Ocular images are fine-grained and complex and carry a large amount of information, so diagnoses are often limited by the physician's knowledge and clinical experience, are strongly subjective, and consume time and labor. Applying computer-based machine learning artificial intelligence in ophthalmology can greatly improve the efficiency of diagnosing eye diseases in clinical work and reduce the burden on ophthalmologists.
Convolutional neural networks (CNNs), which are data-driven and extract relevant features automatically, outperform traditional methods in image recognition, as do architectures such as FCNs (fully convolutional networks); deep learning techniques are therefore a natural choice for medical image processing. Research using convolutional neural networks for optic disc recognition is growing, and it has achieved recognition results superior to earlier methods.
Although fundus image recognition based on convolutional neural networks outperforms conventional methods to some extent, problems remain. For example, medical image data are large in volume but few in samples, yet the data set is what model training depends on most: the quality of the data set determines the quality of the model, so preprocessing the data can effectively improve the training effect. For another example, as deep learning in image recognition develops, excellent neural network models keep being proposed; each has unique advantages, but none is guaranteed to perform well on every kind of data, so a model's expressive capability may fall short, that is, the model struggles to recognize some rare pathological pictures. For these two problems, many researchers in the field have proposed optimized neural network models from different angles and obtained effective results. However, improving an existing model still involves many uncontrollable factors: when a researcher optimizes a model (for example, widening or deepening it), the effectiveness of the change cannot be judged directly and can only be observed from the training results. Such methods are demanding in computation and time, require long development cycles, are often not readily tractable, and their practical applicability remains questionable.
Disclosure of Invention
The aim of the invention is to provide a pathological myopia recognition method based on data enhancement and model fusion that recognizes fundus images automatically on the basis of several classical convolutional neural networks and builds a new deep convolutional neural network model with a model fusion strategy and multiple data enhancement modes. This mitigates incomplete model training, overfitting, and similar conditions under small samples, can to a certain extent improve the model's generalization effect, expressive capability, and picture-recognition accuracy, and provides a new idea for the recognition and prognosis research of pathological myopia.
In order to achieve the above purpose, the solution of the invention is:
a pathological myopia identification method based on data enhancement and model fusion comprises the following steps:
step 1, acquiring a fundus image to be identified;
step 2, the fundus image to be identified is sent into a deep learning model for identification, and an identification result is output;
the training method of the deep learning model comprises the following steps:
step a, determining a data set and dividing it into a training set, a verification set, and a test set at a ratio of 7:2:1;
b, performing data enhancement on the data of the training set by adopting a plurality of data enhancement modes to obtain a corresponding enhanced data set;
step c, sending the data sets enhanced in step b into AlexNet, GoogLeNet, VGG-16, and ResNet-50 respectively for training; measuring how well each data enhancement mode lets each network learn according to the network's accuracy on the verification set; saving, for each network, the model with the highest verification accuracy and recording the corresponding data enhancement strategy; and taking the AlexNet, GoogLeNet, VGG-16, and ResNet-50 so obtained as the primary models of the deep learning model;
and step d, recording all recognition results of the primary models on the verification set and taking them as the input data set of a secondary model, namely the hard voting model, which judges the primary models' recognition results again by taking their mode to obtain the final recognition result.
In step 2, all the primary models use a learning rate of 0.001, the loss function adopts the logistic loss function, the optimizer adopts SGD with momentum, batch_size is set to 10, each epoch is 30 batches, and each model is trained for 30 epochs and saved as a primary model;
the convolutional layers, fully connected layers, BN layers, and other layers of all the models are frozen, and the output layers are connected into a hard voting model, thereby constructing the deep learning model after result fusion.
In step b, data enhancement is performed on the data set, specifically in the following modes:
(1) randomly flipping the picture;
(2) adding random Gaussian white noise to the data set;
(3) adding random brightness, saturation, and contrast to the data;
(4) randomly cropping the pictures in equal proportion;
(5) randomly changing the sharpness of the picture;
(6) randomly stretching the picture;
(7) superimposing random rotation, random white noise, and random color change on the original data set;
(8) superimposing rotation, cropping, and stretching operations on the original data set;
(9) superimposing random flipping, random white noise, random brightness, saturation, and contrast, and adding random stretching and random sharpness on the original data set;
(10) superimposing random flipping and random sharpness change on the original data set;
(11) superimposing random cropping, random horizontal flipping, and Gaussian blur on the original data set;
(12) superimposing all of the above methods on the original data set.
In step b, the data set after data enhancement is also preprocessed: the data are first normalized and a data reader is then defined.
The concrete content of the step c is as follows:
step c1, training on all data sets with VGG-16 as a data set filter: each data set is trained for 30 epochs, each epoch traverses all data in the training set once, and verification is performed after each epoch to obtain a verification result; completing 30 epochs in this way yields a trained model on each data set; the four data sets with the highest verification accuracy are selected as alternative data sets, and the model corresponding to the data set with the highest verification accuracy is taken as the optimal VGG-16 model;
step c2, training AlexNet, GoogLeNet, and ResNet-50 in turn on the 4 alternative data sets, and screening out the model trained on the optimal enhanced data set for each;
and step c3, combining the optimal VGG-16 model obtained in step c1 and the optimal AlexNet, GoogLeNet, and ResNet-50 models obtained in step c2 into the primary models.
After the above scheme is adopted, the invention offers the following improvements:
(1) 12 data enhancement modes are designed on the iChallenge-PM public data set, covering almost all operations commonly used for data enhancement today; in the data-preprocessing stage they can effectively improve the quality of the data set.
(2) The AlexNet, VGG-16, GoogLeNet, and ResNet-50 models are given the same optimizer, loss function, learning rate, and other parameters, the aim being to observe, with variables controlled, how differently enhanced data sets affect each model's feature learning. On this basis, the model with the highest accuracy after training on the 12 data sets is obtained as a primary model.
(3) The prediction results of the primary models are taken as the input data of the secondary model: the output layers of the primary models are all connected to the secondary model, namely the hard voter model, completing construction of the fusion model, which is trained again to form the final fused model.
(4) With a model trained in this way, very high accuracy can be achieved even without transfer learning; data enhancement effectively expands the scale of the data set, which weakens overfitting during training and effectively improves the accuracy and generalization of the model in recognizing pictures.
Drawings
FIG. 1 is a schematic diagram of the evolution of the Inception structure;
FIG. 2 is a schematic diagram of an optimizer;
FIG. 3 is a fusion model logic diagram;
FIG. 4 is a schematic view of fundus images of high and pathological myopia;
FIG. 5 is a raw data set picture;
FIG. 6 is a left-right flipped picture of an original data set;
FIG. 7 is a 90-degree rotated picture of the original data set;
FIGS. 8 and 9 are the pictures before and after Gaussian white noise is added;
fig. 10 and 11 are pictures before and after brightness adjustment;
fig. 12 and 13 are pictures before and after the saturation adjustment;
FIGS. 14 and 15 are pictures before and after equal-proportion cropping;
FIGS. 16 and 17 are pictures before and after changing the sharpness;
FIGS. 18 and 19 are pictures before and after changing Gaussian white noise and randomly flipping;
FIGS. 20 and 21 are pictures before and after rotation, cropping, and stretching;
FIGS. 22 and 23 are pictures before and after mixing all the methods;
FIGS. 24 and 25 are pictures before and after augmentation with a third-party library;
FIG. 26 is a graph of experimental results (acc);
FIG. 27 is a graph of the results of the experiment (loss);
FIG. 28 is a graph of experimental results (loss distribution of each model);
fig. 29 shows model-corresponding picture parameters.
Detailed Description
The technical solutions and advantages of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
First, the experimental framework:
1. Introduction to prior technical solutions:
① AlexNet: the AlexNet network proposed by Krizhevsky et al., which was the first to use 5 convolutional layers and 3 fully connected layers to classify 1000 categories of pictures, has become the seminal work that made a significant breakthrough for deep learning in the field of image classification. Compared with traditional convolutional neural networks, AlexNet employs a series of methods to improve deep convolutional networks: a ReLU nonlinear activation function accelerates network training, multi-GPU convolution overcomes the then-current limitation of insufficient graphics-card resources, and a Dropout random-deactivation strategy reduces overfitting in the fully connected layers. AlexNet also proposed local response normalization, overlapping pooling, data enhancement, and other strategies to improve the model's classification and generalization capability.
VGG: simony et al proposed VGG networks that replaced the 5 x 5 and 7 x 7 filters with 3 x 3 filters. The receptive field of a plurality of serially connected small convolutional layers may be the same size as the receptive field of one large convolutional layer. For example, the field of 23 x 3 convolutional layers in series is the same as the field of 15 x 5 convolutional layer, and their convolution effects are comparable. But a plurality of small convolution layers connected in series have fewer parameters and more nonlinear transformation, and have stronger learning capability and better effect on characteristics. In addition, the VGG also increases the network structure to 16 or 19 layers, and as the number of layers of the network increases, the feature representation capability of the network is stronger, and the classification effect of the model is better. The VGG network has a very simple structure and a good effect, so that the VGG network is still widely applied to tasks of image classification, detection, segmentation, super-resolution, image stylization and the like in the field of computer vision.
③ GoogLeNet: the Inception series proposed by Google made significant contributions to the development of deep convolutional neural network structures. The largest contribution of Inception-V1 (GoogLeNet) is connecting several convolutions of different sizes in parallel, which widens the network and lets the convolution blocks acquire information at different receptive fields. The structure also makes full use of 1 x 1 convolution kernels to reduce network parameters while improving the utilization efficiency of computing resources. Inception-V2 proposed an excellent regularization method, Batch Normalization, which batch-normalizes the data before each convolution; it is now a standard component of deep convolutional networks and solves the training problem of deep networks well. The evolution of the Inception structure is shown in FIG. 1.
④ ResNet: the deep residual network ResNet proposed by Kaiming He et al. raises the network depth to 152 layers, later extended to 1000 layers, while still improving accuracy. In theory, the deeper the network, the higher the accuracy should be, but He et al. found experimentally that beyond a certain depth, blindly increasing depth degrades the network, mainly because gradient explosion and gradient vanishing in very deep networks prevent normal training and hurt performance. Inspired by Highway Networks and similar work, they proposed the residual structure, which adds a skip connection between the input and output of a convolution block so that the input can be passed directly to the output. The residual structure essentially learns an identity mapping and lets the stacked nonlinear layers learn another mapping F(x) = H(x) - x. In fact, if a network could reach the desired result simply by setting parameters by hand, it can easily be trained to converge to that result, so the added residual structure does not degrade the network's overall performance. ResNet's residual module reduces the training difficulty of deep networks, solves the degradation problem well, and exploits the depth potential of convolutional networks to the utmost; its performance on the ImageNet classification task was ultimately the first to surpass the human level.
⑤ Voter model: the voting mechanism (Voting) is a combination strategy for classification problems in ensemble learning. The basic idea is to select the class output most often among all the machine learning algorithms. Classification algorithms output one of two things: class labels directly, or class probabilities. Voting with class labels is called hard (majority) voting, and voting with class probabilities is called soft voting. Hard voting selects the label output most often; if label counts are equal, selection follows ascending order. Soft voting selects the class using the class probabilities output by each algorithm; when weights are supplied, the weighted average of each class's probabilities is taken and the class with the larger value is selected. The present embodiment employs a hard voting mechanism.
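As an illustrative sketch (not code from the patent), the hard-voting rule just described, including the ascending-order tie-break, can be written in a few lines of Python; the function name and the (n_models, n_samples) array layout are assumptions:

import numpy as np

def hard_vote(predictions):
    """Majority vote over per-model class labels.

    predictions: int array of shape (n_models, n_samples).
    Returns the per-sample mode; on a tie, argmax returns the first
    (smallest) label, matching the ascending-order rule above.
    """
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # count the votes each class receives for every sample
    votes = np.apply_along_axis(
        lambda col: np.bincount(col, minlength=n_classes), 0, predictions)
    return votes.argmax(axis=0)

# four primary models voting on three samples
preds = [[1, 0, 1],   # AlexNet
         [1, 0, 0],   # GoogLeNet
         [0, 0, 1],   # VGG-16
         [1, 1, 1]]   # ResNet-50
print(hard_vote(preds))  # -> [1 0 1]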
2. Introduction to points of innovation
AlexNet, GoogLeNet, VGG-16, and ResNet-50 are used as the primary models. The learning_rate is set to 0.001, and each model is trained for 30 rounds using the optimizer and loss function described below. After training, the version of each model with the highest accuracy is selected and the corresponding model parameters are saved. The convolutional layers of all the primary learners are then frozen, so that once data enter a primary learner they can only propagate forward, not backward.
Model fusion is a main strategy in model ensembling and is a layered ensemble framework. Taking two layers as an example, the first layer consists of several base learners (the primary learners in this embodiment) whose input is the original training set (here, the data sets after data enhancement); the second-layer model is then trained using the outputs of the first-layer base learners as its training set, yielding a complete fusion model. As shown in Table 1, processes 1-3 train the first-level models, i.e., the primary learners. Processes 5-9 use the trained models to process the data in the validation set, and this prediction result serves as the training set of the secondary learner. Process 11 trains the secondary learner with the primary models' predictions, giving the final trained model.
This design makes the model easy to extend: the hard voting model can be replaced by other secondary learners according to the data set.
TABLE 1 hard voting model logic diagram
[Table 1 is reproduced as an image in the original publication and is not shown here.]
Introduction of other components in the model:
an optimizer: momentum, SGD was used. The principle can be understood as that difficulties are encountered in crossing ravines, i.e. the curve of the surface in one dimension is much steeper than in the other, which is common near the local optimum. In these cases, the SGD oscillates on the slope of the valley, but slowly progresses along the valley bottom toward a local optimum direction, as shown in fig. 2. Momentum is one method that helps to accelerate the SGD and suppress oscillations in the relevant direction, as shown in fig. 2 b. It does this by adding to the current update vector the fraction γ of the update vector at the past time step, γ typically set to 0.9 or similar.
v_t = γ·v_{t-1} + η·∇_θ J(θ),    θ = θ - v_t    (1)
In essence, using momentum is like pushing a ball down a hill: the ball accumulates momentum as it descends, becoming faster and faster en route (until it reaches terminal velocity if there is air resistance, i.e., γ < 1). The same thing happens in the parameter updates: the momentum term grows for dimensions whose gradients point in the same direction and shrinks the updates for dimensions whose gradients change direction. In this way faster convergence and less oscillation are obtained.
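A minimal sketch of this update rule (formula (1) above); the function name and default values other than γ = 0.9 are illustrative assumptions:

import numpy as np

def momentum_step(theta, velocity, grad, lr=0.001, gamma=0.9):
    """One SGD-with-momentum update: v_t = gamma*v_{t-1} + lr*grad,
    then theta = theta - v_t; gamma = 0.9 as quoted above."""
    velocity = gamma * velocity + lr * grad
    theta = theta - velocity
    return theta, velocity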
Loss function:
the logistic loss can be calculated by equation (2):
loss=-Labels*log(sigma(X))-(1-Labels)*log(1-sigma(X)) (2)
where
sigma(X) = 1 / (1 + exp(-X))    (3)
Substituting (3) into (2) gives formula (4):
loss = X - X*Labels + log(1 + exp(-X))    (4)
For computational stability, to prevent exp(-X) from overflowing when X < 0, the loss is calculated using equation (5):
loss=max(X,0)-X*Labels+log(1+exp(-|X|)) (5)
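A minimal NumPy sketch of equation (5); np.log1p is used for the log(1 + ·) term, and the helper name is an assumption:

import numpy as np

def logistic_loss(x, labels):
    """Numerically stable sigmoid cross-entropy on logits x,
    i.e. loss = max(X, 0) - X*Labels + log(1 + exp(-|X|)),
    which equals equations (2) and (4) but cannot overflow in exp()."""
    x = np.asarray(x, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    return np.maximum(x, 0) - x * labels + np.log1p(np.exp(-np.abs(x)))

print(logistic_loss([2.0, -3.0], [1, 0]))  # small losses for correct logits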
Second, processing of the experimental data:
1. Introduction to prior technical solutions:
Commonly used data enhancement modes include: flipping (horizontal + vertical), noise, random rotation, random flipping, random brightness, random contrast, random saturation, cropping, scaling/stretching, and blurring. These methods can improve the quality of a data set to a certain extent so that the features in its pictures are more easily learned by a machine, but how to combine them for the best effect is a practical problem. This research combines 12 data enhancement modes to mine the mode that most effectively improves data set quality; as a sound foundation for subsequent research, this is important for the early stage of deep learning model training.
2. Introduction to points of innovation
This example investigates 12 data enhancement modes and, by combining them and evaluating the results, determines the most effective mode, which then serves as the basis of subsequent study as the most effective way of enhancing this open data set (iChallenge-PM). Analysis of the results of training the VGG-16 model on the 12 data sets shows that four data sets, namely
PALM-Training1600-overturn-dim-imgaug2;
PALM-Training3200-overturn-noise-color-crop-deform-dim;
PALM-Training1600-overturn-crop-deform;
and PALM-Training800-color (data sets are named as public-data-set abbreviation (PALM) - training-set identifier and quantity (e.g., Training1600) - combined data enhancement modes), achieve recognition accuracy above 95%. This embodiment takes 95% as the threshold; data sets of this quality let the model effectively improve its recognition capability, as shown in Table 2.
TABLE 2
[Table 2 is reproduced as an image in the original publication and is not shown here.]
Experimental procedures and results:
A. Introduction to the experimental environment
Hardware environment: 4-core CPU, 32 GB RAM, V100 GPU with 16 GB of video memory, 100 GB disk
Environment configuration: Python 3.7; framework: PaddlePaddle 1.8.0
B. Data set selection
The training set of the iChallenge-PM challenge data set contains 400 jpg images and the validation set contains 400 jpg images; no test set is provided. Therefore the training set and the verification set of the original data set are merged and re-divided in the ratio 7:2:1 into a new training set, verification set, and test set. The new training set serves as the original training set and, together with the 12 data sets after data enhancement, forms the training data of this experiment.
C. Evaluation index
The main reference index is the prediction accuracy of the models; indices such as recall and sensitivity are not considered for now, because in medical image processing recognition accuracy is the most important.
D. Primary model training procedure and results
First, with VGG-16 as a data set filter, training is carried out on all the data sets. In FIG. 26 the horizontal axis is the training step number and the vertical axis is the accuracy during training (train/acc); in FIG. 27 the horizontal axis is the training step number and the vertical axis is the loss during training (train/loss); FIG. 28 is another version of FIG. 26 in which the horizontal axis shows training time; FIG. 29 gives the numerical indices of VGG-16 trained on the 13 data sets. Each data set is trained for 30 epochs, each epoch traversing all data in the data set once, forming a trained model on each data set. The 13 models are then used to make predictions on the test set, giving the results shown in Table 3, whose analysis shows that the enhanced data sets are on the whole more accurate than the original data set. Among them, PALM-Training1600-overturn-dim-imgaug2, PALM-Training3200-overturn-noise-color-crop-deform-dim, PALM-Training1600-overturn-crop-deform, and PALM-Training800-color reach an average precision above 95 percent.
Therefore, taking these 4 data sets as the alternative data sets, GoogLeNet, AlexNet, and ResNet-50 are trained on them in turn, each model for 30 rounds per data set, and the trained models are saved and used to predict on the test set, giving the results shown in Table 4. Table 4 shows that different models differ in expressive capability and in the data set that suits them: GoogLeNet and ResNet-50 both perform best on the PALM-Training3200-overturn-noise-color-crop-deform-dim data set, while AlexNet and VGG-16 both score highest on the PALM-Training1600-overturn-dim-imgaug2 data set. The 4 models with the highest accuracy are taken as the primary models.
TABLE 4 Training results of VGG-16, AlexNet, GoogLeNet, and ResNet-50 on the above 4 data sets
[Table 4 is reproduced as an image in the original publication and is not shown here.]
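The selection procedure just described (train each network for 30 rounds on each candidate data set and keep the best by validation accuracy) can be sketched as follows; train_one_epoch, evaluate, and save_params are hypothetical helpers, not functions from the patent or from PaddlePaddle:

def select_best(model_fn, datasets, val_reader, epochs=30):
    """Return the data set on which a freshly trained model scores the
    highest validation accuracy, saving that model's parameters."""
    best_name, best_acc = None, 0.0
    for name, train_reader in datasets.items():
        model = model_fn()                 # fresh, untrained network
        for _ in range(epochs):
            train_one_epoch(model, train_reader)
        acc = evaluate(model, val_reader)
        if acc > best_acc:
            best_name, best_acc = name, acc
            save_params(model, name)       # keep the strongest version
    return best_name, best_acc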
E. Fusion model training procedure and results
FIG. 3 shows the logic diagram of the fusion model. Alex_result, Google_result, ResNet_result, and Vgg_result are the outputs of the primary models; these results are taken as the training-data features of the secondary model, while the labels serve as the labels of the new data set, establishing the training set of the secondary model. A hard voter model is connected behind the primary models as the classifier, forming the secondary-model framework. Within the secondary model, the mode (i.e., majority rule) of the prediction results Alex_result, Google_result, ResNet_result, and Vgg_result is counted for each sample as the final prediction. After 30 rounds of training the model is saved and verified on the test set; the final accuracy of the fusion model is 97.25%.
Specific examples are as follows:
The data set adopted in this example is the pathological myopia portion of the iChallenge-PM challenge data set, which provides 800 annotated retinal fundus images. The experimental aim is to classify PM and non-PM (including HM: high myopia, and normal) fundus images. FIG. 4 gives an example and explanation of the data: A is a highly myopic fundus, and B is a pathologically myopic fundus with large areas of atrophy visible.
The data set acquisition method comprises the following steps:
and acquiring a complete data set from https:// ai, basic, com/broad/interaction, and then importing the data set into a server or a PC (Win, Linux and Mac) for decompression. And acquiring paths and picture names of all picture data in the data set through the script, and storing the paths and the picture names as a research basis for subsequent data enhancement.
Description of the drawings: data set naming rule examples
PALM-dataset name;
Training-Training set 800-800 pictures;
OVERURN-random rotation (keywords)
Data enhancement strategy:
1. Randomly flipping/rotating the picture (including but not limited to 0, 90, 270, and 360 degrees)
2. Adding random white Gaussian noise to a data set
3. Adding random brightness, saturation, contrast to data
4. Random equal proportion cutting of picture
5. Randomly changing sharpness for pictures
6. Randomly stretching pictures
7. Superimposing random rotation, random white noise and random color variation on the original data set
8. Superimposing rotation, cropping, and stretching operations on the original data set
9. Superimposing random flip, random white noise, random brightness, saturation, contrast, adding random stretch, and random sharpness on the original data set.
10. A random flip and a random change in sharpness are superimposed on the original data set.
11. Superimposing random cropping, random horizontal flipping, and Gaussian blur on the original data set (this operation is based on a third-party library: imgaug)
12. All the above methods are superimposed on the original data set.
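A sketch of strategies 1-3 above using PIL and NumPy; the parameter ranges (noise sigma, enhancement factors, probabilities) are illustrative assumptions, not values given in the patent:

import random
import numpy as np
from PIL import Image, ImageEnhance

def augment(img: Image.Image) -> Image.Image:
    """Sketch of strategies 1-3: random flip/rotation, Gaussian
    white noise, and random brightness/saturation/contrast."""
    # 1. random flip and rotation by a multiple of 90 degrees
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    img = img.rotate(random.choice([0, 90, 270]))
    # 2. additive Gaussian white noise
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, 10.0, arr.shape)
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    # 3. random brightness, saturation (Color), and contrast
    for enhancer in (ImageEnhance.Brightness, ImageEnhance.Color,
                     ImageEnhance.Contrast):
        img = enhancer(img).enhance(random.uniform(0.8, 1.2))
    return img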
These form the basis for subsequent studies; see FIGS. 4-25.
Data preprocessing:
1. Before any data enter the deep learning network model for training, they are normalized: the picture size is scaled to 224 x 224, the picture format is transposed from [H, W, C] to [C, H, W], and the data range is adjusted to [-1.0, 1.0].
2. A data reader is defined as the control that governs how much data the deep learning neural network learns from during training. Based on the data set, the paths and names of all data are first read from the storage directory, the data are shuffled, and positive and negative samples are divided by the first letter of the picture name (file names beginning with H denote high myopia and those beginning with N denote normal vision; high-myopia and normal-vision samples are both non-pathological, belong to the negative samples, and are labeled 0, while file names beginning with P denote pathological myopia, belong to the positive samples, and are labeled 1). Each picture and its corresponding label are stored in a temporary buffer; a batch_size is set, and when the number of samples in the buffer reaches batch_size, storage pauses and the samples are fed into the deep learning model for training. The data readers of the training, verification, and test sets share this structure, and the details can be adjusted to the corresponding business scenario.
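A minimal sketch of the normalization and batched reader described above; the directory layout, function names, and the RGB conversion are assumptions:

import os
import random
import numpy as np
from PIL import Image

def preprocess(path):
    """Scale to 224 x 224, transpose [H, W, C] -> [C, H, W],
    rescale pixel values to [-1.0, 1.0], as described above."""
    img = Image.open(path).convert('RGB').resize((224, 224))
    arr = np.asarray(img, dtype=np.float32).transpose((2, 0, 1))
    return arr / 127.5 - 1.0

def data_reader(data_dir, batch_size=10):
    """Yield shuffled (images, labels) batches; names starting with
    'H' or 'N' are negative (label 0), 'P' positive (label 1)."""
    names = [n for n in os.listdir(data_dir) if n[:1] in 'HNP']
    random.shuffle(names)
    batch_imgs, batch_labels = [], []
    for name in names:
        batch_imgs.append(preprocess(os.path.join(data_dir, name)))
        batch_labels.append(1 if name.startswith('P') else 0)
        if len(batch_imgs) == batch_size:
            yield np.stack(batch_imgs), np.array(batch_labels)
            batch_imgs, batch_labels = [], []  # any leftover partial batch is dropped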
Building a model:
1. All the primary models use existing classical deep convolutional neural network models, respectively:
AlexNet: contains 5 convolutional layers, 3 pooling layers, 3 fully connected layers, and 2 dropout layers with the dropout rate set to 0.5.
VGG-16 standard structure: contains 5 vgg_blocks; the number of convolutional layers and of output channels in each block is specified by conv_arch.
GoogLeNet standard structure: comprises convolutional layers, pooling layers, and Inception modules (the parameters of the convolutional and pooling layers differ).
ResNet-50 standard structure: comprises convolutional layers, pooling layers, BatchNorm layers, and residual-block layers (convolution and pooling parameters vary; ResNet-50 contains multiple modules, with the 2nd to 5th modules containing 3, 4, 6, and 3 residual blocks respectively).
2. All the models use a learning rate of 0.001, the loss function adopts the logistic loss function, the optimizer adopts SGD with momentum, batch_size is set to 10, each epoch is 30 batches, and each model is trained for 30 epochs and saved as a primary learner.
3. The convolutional layers, fully connected layers, BN layers, and other layers of all the models are frozen, and the output layers are connected into the hard voting model to form the fusion model, i.e., the deep neural network model finally formed by the invention.
4. After the hard voter model is connected, when the fusion model judges the results obtained by each primary learner, it votes on the recognition results according to majority rule and thereby reaches the final judgment: the data is judged to be pathological myopia or non-pathological myopia.
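Putting the pieces together, a sketch of the fused classifier: frozen primary models feed the hard voter (reusing the hard_vote helper sketched earlier). That each entry of primary_models is a callable returning one class label per image is an assumption, e.g. wrapped inference functions for the saved AlexNet, GoogLeNet, VGG-16, and ResNet-50:

import numpy as np

class FusionModel:
    """Hard-voting fusion over frozen primary models (a sketch)."""

    def __init__(self, primary_models):
        # weights are assumed frozen upstream; only inference happens here
        self.primary_models = primary_models

    def predict(self, images):
        # (n_models, n_samples) matrix of predicted labels
        preds = np.stack([m(images) for m in self.primary_models])
        # majority rule: 1 = pathological myopia, 0 = non-pathological
        return hard_vote(preds)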
In conclusion, the invention provides a pathological myopia identification method based on data enhancement and model fusion, characterized in that:
(1) The invention designs 12 data enhancement modes, obtains through result verification the optimal enhancement mode for each model, and the enhanced data sets produced by the optimal modes greatly improve the training effect over the original data set. It is well known that the ceiling of artificial intelligence is the human work behind it, that is, the quality of the data set, which greatly influences what a model learns; the quality of the data set is therefore important to all subsequent artificial intelligence research. The invention focuses on this point and obtains the most effective enhancement mode for this data set through combined experiments. Moreover, the approach is universal: the same method can still be used to find the optimal data enhancement mode when enhancing other data sets;
(2) The method further screens the recognition results of the primary models through model fusion, so that the result does not excessively "believe" any single model. This has a generalizing effect, and the simple principle of majority rule makes the recognition result more objective.
The above embodiments only illustrate the technical idea of the present invention and do not thereby limit its protection scope; any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.

Claims (5)

1. A pathological myopia identification method based on data enhancement and model fusion is characterized by comprising the following steps:
step 1, acquiring a fundus image to be identified;
step 2, the fundus image to be identified is sent into a deep learning model for identification, and an identification result is output;
the training method of the deep learning model comprises the following steps:
step a, determining a data set and dividing it into a training set, a verification set, and a test set at a ratio of 7:2:1;
b, performing data enhancement on the data of the training set by adopting a plurality of data enhancement modes to obtain a corresponding enhanced data set;
step c, sending the data sets enhanced in step b into AlexNet, GoogLeNet, VGG-16, and ResNet-50 respectively for training; measuring how well each data enhancement mode lets each network learn according to the network's accuracy on the verification set; saving, for each network, the model with the highest verification accuracy and recording the corresponding data enhancement strategy; and taking the AlexNet, GoogLeNet, VGG-16, and ResNet-50 so obtained as the primary models of the deep learning model;
and step d, recording all recognition results of the primary models on the verification set and taking them as the input data set of the secondary model, namely the hard voting model, which judges the primary models' recognition results again by taking their mode to obtain the final recognition result.
2. The pathological myopia identification method based on data enhancement and model fusion according to claim 1, wherein: in step 2, all the primary models use a learning rate of 0.001, the loss function adopts the logistic loss function, the optimizer adopts SGD with momentum, batch_size is set to 10, each epoch is 30 batches, and each model is trained for 30 epochs and saved as a primary model;
the convolutional layers, fully connected layers, BN layers, and other layers of all the models are frozen, and the output layers are connected into a hard voting model, thereby constructing the deep learning model after result fusion.
3. The pathological myopia identification method based on data enhancement and model fusion according to claim 1, wherein: in step b, data enhancement is performed on the data set, specifically in the following modes:
(1) randomly flipping the picture;
(2) adding random Gaussian white noise to the data set;
(3) adding random brightness, saturation, and contrast to the data;
(4) randomly cropping the pictures in equal proportion;
(5) randomly changing the sharpness of the picture;
(6) randomly stretching the picture;
(7) superimposing random rotation, random white noise, and random color change on the original data set;
(8) superimposing rotation, cropping, and stretching operations on the original data set;
(9) superimposing random flipping, random white noise, random brightness, saturation, and contrast, and adding random stretching and random sharpness on the original data set;
(10) superimposing random flipping and random sharpness change on the original data set;
(11) superimposing random cropping, random horizontal flipping, and Gaussian blur on the original data set;
(12) superimposing all of the above methods on the original data set.
4. The pathological myopia identification method based on data enhancement and model fusion according to claim 1, wherein: in step b, the data set after data enhancement is also preprocessed: the data are first normalized and a data reader is then defined.
5. The pathological myopia identification method based on data enhancement and model fusion according to claim 1, wherein: the concrete content of step c is as follows:
step c1, training on all data sets with VGG-16 as a data set filter: each data set is trained for 30 epochs, each epoch traverses all data in the training set once, and verification is performed after each epoch to obtain a verification result; completing 30 epochs in this way yields a trained model on each data set; the four data sets with the highest verification accuracy are selected as alternative data sets, and the model corresponding to the data set with the highest verification accuracy is taken as the optimal VGG-16 model;
step c2, training AlexNet, GoogLeNet, and ResNet-50 in turn on the 4 alternative data sets, and screening out the model trained on the optimal enhanced data set for each;
and step c3, combining the optimal VGG-16 model obtained in step c1 and the optimal AlexNet, GoogLeNet, and ResNet-50 models obtained in step c2 into the primary models.
CN202011578831.9A 2020-12-28 2020-12-28 Pathological myopia identification method based on data enhancement and model fusion Pending CN112580580A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011578831.9A CN112580580A (en) 2020-12-28 2020-12-28 Pathological myopia identification method based on data enhancement and model fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011578831.9A CN112580580A (en) 2020-12-28 2020-12-28 Pathological myopia identification method based on data enhancement and model fusion

Publications (1)

Publication Number Publication Date
CN112580580A true CN112580580A (en) 2021-03-30

Family

ID=75140304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011578831.9A Pending CN112580580A (en) 2020-12-28 2020-12-28 Pathological myopia identification method based on data enhancement and model fusion

Country Status (1)

Country Link
CN (1) CN112580580A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496481A (en) * 2021-05-20 2021-10-12 北京交通大学 Auxiliary detection method for chest X-Ray image with few samples
CN113763336A (en) * 2021-08-24 2021-12-07 北京鹰瞳科技发展股份有限公司 Image multi-task identification method and electronic equipment
CN113837231A (en) * 2021-08-30 2021-12-24 厦门大学 Image description method based on data enhancement of mixed samples and labels
CN115082459A (en) * 2022-08-18 2022-09-20 北京鹰瞳科技发展股份有限公司 Method for training detection model for diopter detection and related product
CN116188294A (en) * 2022-12-22 2023-05-30 东莞理工学院 Data enhancement method, system, intelligent terminal and medium for medical image

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network
CN108416288A (en) * 2018-03-04 2018-08-17 南京理工大学 The first visual angle interactive action recognition methods based on overall situation and partial situation's network integration
WO2019196268A1 (en) * 2018-04-13 2019-10-17 博众精工科技股份有限公司 Diabetic retina image classification method and system based on deep learning
CN111462068A (en) * 2020-03-30 2020-07-28 华南理工大学 Bolt and nut detection method based on transfer learning
CN111476713A (en) * 2020-03-26 2020-07-31 中南大学 Intelligent weather image identification method and system based on multi-depth convolution neural network fusion
CN111696101A (en) * 2020-06-18 2020-09-22 中国农业大学 Light-weight solanaceae disease identification method based on SE-Inception
CN111862066A (en) * 2020-07-28 2020-10-30 平安科技(深圳)有限公司 Brain tumor image segmentation method, device, equipment and medium based on deep learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network
CN108416288A (en) * 2018-03-04 2018-08-17 南京理工大学 The first visual angle interactive action recognition methods based on overall situation and partial situation's network integration
WO2019196268A1 (en) * 2018-04-13 2019-10-17 博众精工科技股份有限公司 Diabetic retina image classification method and system based on deep learning
CN111476713A (en) * 2020-03-26 2020-07-31 中南大学 Intelligent weather image identification method and system based on multi-depth convolution neural network fusion
CN111462068A (en) * 2020-03-30 2020-07-28 华南理工大学 Bolt and nut detection method based on transfer learning
CN111696101A (en) * 2020-06-18 2020-09-22 中国农业大学 Light-weight solanaceae disease identification method based on SE-Inception
CN111862066A (en) * 2020-07-28 2020-10-30 平安科技(深圳)有限公司 Brain tumor image segmentation method, device, equipment and medium based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG Huining (ed.): "Digital Inkjet Printing Technology and Equipment Application for Ceramic Wall and Floor Tiles", Beijing: China Building Materials Industry Press, page 378 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496481A (en) * 2021-05-20 2021-10-12 北京交通大学 Auxiliary detection method for chest X-Ray image with few samples
CN113496481B (en) * 2021-05-20 2023-11-07 北京交通大学 Auxiliary detection method for X-Ray image of breast with few samples
CN113763336A (en) * 2021-08-24 2021-12-07 北京鹰瞳科技发展股份有限公司 Image multi-task identification method and electronic equipment
CN113763336B (en) * 2021-08-24 2024-06-28 北京鹰瞳科技发展股份有限公司 Image multitasking identification method and electronic equipment
CN113837231A (en) * 2021-08-30 2021-12-24 厦门大学 Image description method based on data enhancement of mixed samples and labels
CN113837231B (en) * 2021-08-30 2024-02-27 厦门大学 Image description method based on data enhancement of mixed sample and label
CN115082459A (en) * 2022-08-18 2022-09-20 北京鹰瞳科技发展股份有限公司 Method for training detection model for diopter detection and related product
CN116188294A (en) * 2022-12-22 2023-05-30 东莞理工学院 Data enhancement method, system, intelligent terminal and medium for medical image
CN116188294B (en) * 2022-12-22 2023-09-19 东莞理工学院 Data enhancement method, system, intelligent terminal and medium for medical image

Similar Documents

Publication Publication Date Title
EP3674968B1 (en) Image classification method, server and computer readable storage medium
CN112580580A (en) Pathological myopia identification method based on data enhancement and model fusion
CN108021916B (en) Deep learning diabetic retinopathy sorting technique based on attention mechanism
Chen et al. Automatic feature learning for glaucoma detection based on deep learning
CN112132817B (en) Retina blood vessel segmentation method for fundus image based on mixed attention mechanism
US20210035689A1 (en) Modeling method and apparatus for diagnosing ophthalmic disease based on artificial intelligence, and storage medium
CN109166126A (en) A method of paint crackle is divided on ICGA image based on condition production confrontation network
CN109345538A (en) A kind of Segmentation Method of Retinal Blood Vessels based on convolutional neural networks
Yang et al. Efficacy for differentiating nonglaucomatous versus glaucomatous optic neuropathy using deep learning systems
CN112101424B (en) Method, device and equipment for generating retinopathy identification model
Firke et al. Convolutional neural network for diabetic retinopathy detection
Subramanian et al. Classification of retinal oct images using deep learning
Lyu et al. Deep tessellated retinal image detection using Convolutional Neural Networks
Sharma et al. Harnessing the Strength of ResNet50 to Improve the Ocular Disease Recognition
CN111863241B (en) Fundus imaging classification system based on integrated deep learning
Ali et al. Cataract disease detection used deep convolution neural network
CN115641309A (en) Method and device for identifying age of eye ground color photo of residual error network model and storage medium
CN115423828A (en) Retina blood vessel image segmentation method based on MRNet
Latha et al. Diagnosis of diabetic retinopathy and glaucoma from retinal images using deep convolution neural network
Santos et al. Generating photorealistic images of people's eyes with strabismus using Deep Convolutional Generative Adversarial Networks
Ameri et al. Segmentation of Hard Exudates in Retina Fundus Images Using BCDU-Net
Nguyen et al. Cataract Detection using Hybrid CNN Model on Retinal Fundus Images
Jose Classification of EYE Diseases Using Multi-Model CNN
Ali et al. Classifying Three Stages of Cataract Disease using CNN
Li et al. A Pyramid Spatial Attention Network For Fovea Localization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination