CN112766376A - Multi-label eye fundus image identification method based on GACNN - Google Patents


Info

Publication number
CN112766376A
CN112766376A
Authority
CN
China
Prior art keywords: attention, image, label, layer, node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110075947.9A
Other languages
Chinese (zh)
Inventor
胡敏
朱润笋
黄宏程
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202110075947.9A
Publication of CN112766376A
Legal status: Pending

Classifications

    • G06F18/24: Pattern recognition; analysing; classification techniques
    • G06T7/0012: Image analysis; biomedical image inspection
    • G06T2207/20081: Training; learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30041: Eye; retina; ophthalmic
    • G06V2201/03: Recognition of patterns in medical or anatomical images


Abstract

The invention relates to the technical field of image processing, and in particular to a multi-label fundus image identification method based on a GACNN (graph attention convolutional neural network). The method comprises: acquiring an original fundus image and preprocessing it; and constructing a GACNN model and training it with the preprocessed, labeled fundus images. The GACNN model comprises a convolutional neural network, a graph attention network, and a fusion layer. The convolutional neural network extracts image features; the graph attention network models the relations among the fundus multi-labels, treating each label of a fundus image as one of a group of interdependent nodes and training on historical data to obtain a multi-label classifier; the fusion layer fuses the features obtained by the convolutional neural network and the graph attention network to produce the final classification result. An original fundus image to be examined is input into the trained GACNN model, which outputs the labeled recognition result. By fully considering the correlation among labels when identifying multiple labels in a fundus image, the invention improves the accuracy of fundus image recognition.

Description

Multi-label eye fundus image identification method based on GACNN
Technical Field
The invention relates to the technical field of image processing, and in particular to a multi-label fundus image identification method based on a GACNN (graph attention convolutional neural network).
Background
The fundus image is the main basis on which ophthalmologists diagnose fundus diseases, so fundus image processing is of great significance. As the highly myopic population keeps growing, and high myopia can lead to fundus lesions and blindness, screening places great pressure on ophthalmologists; processing fundus images with computer technology can effectively help relieve that pressure. Research on fundus image recognition generally follows either conventional image-processing methods or deep-learning methods. Classification methods based on conventional image processing require hand-designed features, but in images of myopic fundus disease several diseases often coexist and their feature expressions are interleaved, making it very difficult to design features by hand that identify the diseases. With the breakthroughs of deep learning in image processing, more and more studies process fundus images with deep-learning techniques. Deep learning avoids the errors introduced by hand-designed features: one only needs to pair each fundus image with the labels of the corresponding fundus diseases, build a data set, and input it into a convolutional neural network for training, finally obtaining a model for diagnosing myopic fundus disease.
When deep learning is used to classify fundus diseases, one fundus image may correspond to several fundus-disease labels, which makes this a multi-label image classification problem. Current multi-label image classification methods fall mainly into problem-transformation methods and algorithm-adaptation methods.
Some multi-label classification algorithms learn and classify labeled data sets well, but in fundus images the characteristics of fundus diseases are subtle and different diseases are correlated: if macular degeneration occurs, choroidal atrophy and other conditions are likely to occur as well. Because of these factors, existing multi-label classification algorithms cannot achieve sufficiently accurate classification results.
Disclosure of Invention
In order to improve the identification effect of the fundus image, the invention provides a multi-label fundus image identification method based on GACNN, which specifically comprises the following steps:
acquiring an original fundus image and preprocessing the fundus image;
constructing a GACNN model, and training by utilizing the preprocessed labeled original fundus image;
and inputting the original fundus image to be detected into the trained GACNN model, and outputting the identification result with the label.
Further, the preprocessing of the acquired original fundus image includes:
normalizing the original fundus pictures, processing all pictures to 224 × 224 pixels;
performing image enhancement on the original fundus pictures with histogram equalization, so that the optic disc and blood vessels in the original fundus pictures are highlighted.
Further, the GACNN model comprises a convolutional neural network, a graph attention network and a fusion layer; the convolutional neural network is used for extracting image features; the graph attention network is used for modeling the relations among the fundus multi-labels, each label of a fundus image is regarded as one of a group of interdependent nodes, and historical data is used for training to obtain a multi-label classifier; and the fusion layer fuses the features obtained by the convolutional neural network and the graph attention network to obtain the final classification result.
Further, the convolutional neural network comprises 5 convolution blocks connected through max pooling layers; before each block's output is max-pooled into the next block, the block's feature maps are globally max-pooled to obtain a feature vector, and the feature vectors obtained from all blocks are concatenated to obtain the image features.
Further, the graph attention network comprises a plurality of graph attention layers, and the process of obtaining the multi-label classifier by training the graph attention network comprises:
expressing the features of the fundus labels as F-dimensional word embedding vectors and combining the word embedding vectors of all nodes as the input of a graph attention layer;
in the graph attention layer, linearly transforming the features input into the layer through a shared matrix of fixed size, applying a self-attention mechanism to the nodes, and calculating the correlation coefficient between nodes;
normalizing the correlation coefficients of all adjacent points of the current node with a SoftMax function to obtain the attention coefficients;
updating the feature vector of the current node according to the obtained attention coefficients and taking it as the input of the next graph attention layer, repeating until the features have passed through all graph attention layers, wherein the dimension of the node features output by the last graph attention layer equals the dimension of the feature vector output by the convolutional neural network;
the node features are mapped to a set of interdependent multi-label classifiers.
Further, the graph attention network has 3 graph attention layers. The size of the shared matrix in the first graph attention layer is F′ × F, where F is the dimension of the word embedding vectors representing the fundus label features input into the layer; the size of the shared matrix in the second graph attention layer is F″ × F′; and the size of the shared matrix in the third graph attention layer is D × F″, where D is the dimension of the feature vector output by the convolutional neural network.
Further, F′ in the shared matrix of the first graph attention layer is D/2, and F″ in the shared matrix of the second graph attention layer is 3D/4.
Further, updating the feature vector of the current node according to the obtained attention coefficients is expressed as:

$$\vec{h}'_i = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W \vec{h}_j\Big)$$

$$H' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}$$

where $\vec{h}'_N$ denotes the updated feature vector of the N-th label node; N is the total number of labels to be identified; $\vec{h}'_i$ is the new feature vector of node i after fusing the features of its adjacent nodes; σ(·) is the activation function; $\alpha_{ij}$ is the attention coefficient of node j with respect to node i; $\vec{h}_j$ is the feature vector of node j; W is the feature transformation matrix; and $N_i$ is the set of nodes adjacent to node i.
Further, the attention coefficient $\alpha_{ij}$ is expressed as:

$$e_{ij} = a(W\vec{h}_i, W\vec{h}_j)$$

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})}$$

where a(·) denotes the attention mechanism, $e_{ij}$ the correlation coefficient from node j to node i, and $N_i$ the set of nodes adjacent to node i.
Further, the fusion layer fuses the features obtained by the convolutional neural network and the graph attention network as:

y = f(Gx);

where y is the vector of per-label prediction scores; f is the sigmoid function; G is the matrix formed by the label feature vectors after graph attention network processing; and x is the feature vector of the image.
Through the GACNN model, the invention fully considers the relevance among labels when identifying the labels in a fundus image, and extracting features with a convolutional neural network avoids the complexity of hand-designed features, thereby improving the accuracy of fundus image recognition.
Drawings
FIG. 1 is a model diagram of a multi-label fundus image recognition method based on GACNN according to the present invention;
FIG. 2 is a schematic diagram of global max pooling in a convolutional neural network according to the present invention;
FIG. 3 is a schematic diagram of obtaining the attention coefficient according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a multi-label fundus image identification method based on GACNN, which specifically comprises the following steps:
acquiring an original fundus image and preprocessing the fundus image;
constructing a GACNN model, and training by utilizing the preprocessed labeled original fundus image;
and inputting the original fundus image to be detected into the trained GACNN model, and outputting the identification result with the label.
In a specific implementation, the processing divides into three parts, namely data preprocessing, classification model construction, and classification model training:
(I) Data preprocessing
A high-quality fundus image is crucial for detecting fundus lesions, but because acquisition is constrained by the imaging environment and the imaging device, fundus images are often of poor quality and may contain background noise. In this embodiment the data set is therefore preprocessed before being input into the model for training. The preprocessing steps include denoising, normalization and image enhancement: denoising removes the noise introduced by irrelevant factors such as the background; normalization processes all images identically to the 224 × 224 specification; and image enhancement applies histogram equalization to highlight regions of interest such as the optic disc and blood vessels.
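As a rough illustration of the normalization and enhancement steps above, here is a pure-Python sketch on a grayscale image stored as nested lists; the nearest-neighbour resize and the 8-bit assumption are simplifications, and a real pipeline would use OpenCV or PIL instead:

```python
# Sketch of the preprocessing described above: normalize every image to a
# fixed size (224x224 in the patent) and apply histogram equalization.
# Images are 2-D lists of integer gray levels, purely for illustration.

def resize_nearest(img, out_h=224, out_w=224):
    """Nearest-neighbour resize of a 2-D grayscale image (list of rows)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

def equalize_histogram(img, levels=256):
    """Classic histogram equalization: map each gray level through the
    normalized cumulative histogram so intensities spread over [0, levels-1]."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    cdf, running = [0] * levels, 0
    for g in range(levels):
        running += hist[g]
        cdf[g] = running
    cdf_min = next(c for c in cdf if c > 0)   # first non-empty bin
    lut = [round((cdf[g] - cdf_min) / max(total - cdf_min, 1) * (levels - 1))
           for g in range(levels)]
    return [[lut[v] for v in row] for row in img]
```

In practice the equalization would be applied per channel (or with CLAHE) before the resized image is fed to the network.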
(II) Constructing a classification model
The invention uses deep learning to classify myopic fundus diseases in fundus images; because several diseases coexist in the highly myopic population, the model must identify multiple myopic fundus diseases in one image. Analysis of the data set's disease labels shows co-occurrence relations among diseases: a fundus image with macular degeneration is likely to also show macular hemorrhage, choroidal atrophy and similar conditions, while retinal detachment and other lesions rarely co-occur with it. Considering these relations among fundus disease labels, the invention proposes the multi-layer-feature-fusion GACNN classification model shown in FIG. 1.
The GACNN model shown in FIG. 1 is mainly divided into two parts, namely image feature extraction and label relation modeling:
A VGG16 model with multi-layer feature fusion extracts the image features. As the figure shows, the model divides VGG16 into 5 convolution blocks; before each block's output is max-pooled into the next block, the block's feature maps are globally max-pooled into one feature vector (the calculation is illustrated in FIG. 2). The 5 blocks thus yield 5 feature vectors of different scales, representing fundus image features ranging from shallow contour features to deep semantic features. Finally the image features are fused by the concat method to obtain the D-dimensional image feature X.
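The pool-then-concatenate step above can be sketched as follows; feature maps are nested lists purely for illustration (a real implementation would use PyTorch tensors and the VGG16 backbone):

```python
# Sketch of the multi-scale feature extraction described above: each
# convolution block yields a stack of C feature maps; global max pooling
# turns every C x H x W stack into a length-C vector, and the per-block
# vectors are concatenated (the "concat" fusion) into the image feature X.

def global_max_pool(feature_maps):
    """Reduce each H x W feature map to its single maximum activation."""
    return [max(max(row) for row in fmap) for fmap in feature_maps]

def fuse_block_features(blocks):
    """Globally max-pool every block's feature maps and concatenate."""
    fused = []
    for fmaps in blocks:
        fused.extend(global_max_pool(fmaps))
    return fused
```

The length of the fused vector is the sum of the blocks' channel counts, which is the D referred to in the text.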
A graph attention network (GAT) is introduced to model the relations among the fundus multi-labels: each label of the fundus image is regarded as one of a group of interdependent nodes, and the trained node features, which fuse the influence of the other nodes, are finally mapped to a group of interdependent multi-label classifiers. The graph attention network computes as follows.

The features of the fundus labels are expressed as F-dimensional word embedding vectors, and the word embedding vectors of the nodes are combined as the input of the graph attention network (GAT):

$$H = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}, \quad \vec{h}_i \in \mathbb{R}^F$$

where N is the number of nodes.
To give each node higher-level expressive power, the input features must be transformed into higher-level features: the input feature vectors are linearly transformed by a shared matrix W of size F′ × F, and self-attention is then applied to the nodes to calculate the correlation coefficients:

$$e_{ij} = a(W\vec{h}_i, W\vec{h}_j)$$

This coefficient represents the importance of node j to node i, where j ∈ $N_i$ and $N_i$ is the set of nodes adjacent to node i, and a(·) denotes the attention mechanism; in this embodiment $e_{ij}$ is computed with a single-layer feed-forward neural network. After the correlation coefficients $e_{ij}$ of all adjacent nodes j of node i are computed, they are normalized over the neighbourhood of node i with a SoftMax function, giving the attention coefficient $\alpha_{ij}$:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})}$$

Once the attention coefficients of all adjacent points of node i are known, the new feature vector of node i is:

$$\vec{h}'_i = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W \vec{h}_j\Big)$$

yielding a set of output vectors:

$$H' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}, \quad \vec{h}'_i \in \mathbb{R}^{F'}$$

The obtained vectors have dimension F′ and serve as the input of the next graph attention layer, whose linear shared matrix W has size F″ × F′; the W matrix of the last layer has size D × F″. Stacking these layers gives a three-layer GAT network whose linear matrices expand the feature dimension of each node to D (matching the dimension of the feature vector obtained from the convolutional neural network). Finally, these node features are mapped to a group of interdependent multi-label classifiers W.
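The per-layer computation above can be sketched in plain Python without a framework. The attention vector `a`, the LeakyReLU slope and the ELU output activation are illustrative assumptions: the text only specifies a single-layer feed-forward attention and an activation σ:

```python
# Minimal sketch of one graph attention layer: linearly transform each node
# feature with the shared matrix W, score every neighbour pair with a
# single-layer feed-forward attention a(.), softmax-normalize the scores
# into alpha_ij, then aggregate the neighbours' transformed features.
import math

def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(H, W, a, adj):
    """H: node features; W: F' x F shared matrix; a: attention vector of
    length 2F'; adj[i]: neighbours of node i (including i itself)."""
    Wh = [matvec(W, h) for h in H]
    out = []
    for i in range(len(H)):
        # e_ij = a([Wh_i || Wh_j]) through a single-layer network
        e = {j: leaky_relu(sum(av * v for av, v in zip(a, Wh[i] + Wh[j])))
             for j in adj[i]}
        denom = sum(math.exp(v) for v in e.values())
        alpha = {j: math.exp(v) / denom for j, v in e.items()}  # softmax
        # h'_i = sigma(sum_j alpha_ij * Wh_j); sigma taken as ELU here
        agg = [sum(alpha[j] * Wh[j][k] for j in adj[i])
               for k in range(len(Wh[i]))]
        out.append([v if v > 0 else math.exp(v) - 1 for v in agg])
    return out
```

Stacking three such layers with output widths D/2, 3D/4 and D reproduces the dimension schedule described above.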
Preferably, in this embodiment, F′ in the shared matrix of the first graph attention layer is D/2, and F″ in the shared matrix of the second graph attention layer is 3D/4.
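Under this schedule the three shared-matrix shapes follow directly from the CNN feature dimension D and the embedding size F; a small helper (the function name is illustrative) makes the bookkeeping explicit:

```python
# The layer-size schedule above: with F' = D/2 and F'' = 3D/4, the three
# shared matrices have shapes (D/2 x F), (3D/4 x D/2) and (D x 3D/4),
# so each GAT layer widens the node features until they reach D.
def gat_matrix_shapes(D, F):
    f1, f2 = D // 2, 3 * D // 4
    return [(f1, F), (f2, f1), (D, f2)]
```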
It can be seen that the classifier W is a matrix of size C × D, where C represents the number of labels to be classified, and the prediction score of each label is expressed as:
y=f(Gx);
Calculating the classification result this way takes into account image features such as contour and semantics while combining the correlations among the labels, so the method is well suited to fundus image classification.
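A minimal sketch of the fusion step y = f(Gx), assuming G holds one D-dimensional row per label; the 0.5 decision threshold is an illustrative choice, not stated in the text:

```python
# Fusion of the GAT classifier matrix G with the image feature x: each
# row of G scores one label, and the sigmoid f turns the score into a
# per-label probability for the multi-label output.
import math

def predict_labels(G, x, threshold=0.5):
    """Return (scores, binary predictions) for the multi-label output."""
    scores = [1.0 / (1.0 + math.exp(-sum(g * v for g, v in zip(row, x))))
              for row in G]
    return scores, [1 if s >= threshold else 0 for s in scores]
```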
(III) Classification model training
The model is implemented with the PyTorch deep learning framework. So that the model can be trained more fully, an improved cross-entropy loss function is proposed as the model's loss function, expressed as:
$$\mathrm{loss} = -\sum_{i=1}^{n}\Big[y_i \log\big(a^3\big) + (1 - y_i)\log\big((1 - a)^3\big)\Big]$$

where n is the number of labels to be classified, i indexes the current label, $y_i$ is the actual probability of label i (1 if the image carries label i, otherwise 0), a is the predicted probability of label i, and loss is the loss value. Since a and (1 - a) both lie between 0 and 1, raising them to the third power inside the original cross-entropy loss increases the loss value; the model therefore needs more training to reach the same loss value and is trained more fully. The training set is input into the model and the model parameters are trained and updated continuously until the loss value is minimized.
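Reading the lost formula as cubing inside the logarithm (an assumption, since the published equation image is missing), the modified loss is exactly three times the ordinary cross entropy, because log(a³) = 3·log(a); a short sketch makes the relation checkable:

```python
# Ordinary cross entropy versus the reconstructed "cubed" variant: the
# cube inside the log scales every term by 3, enlarging the loss value
# as the text describes. The reconstruction itself is an assumption.
import math

def cross_entropy(y_true, y_pred):
    return -sum(y * math.log(a) + (1 - y) * math.log(1 - a)
                for y, a in zip(y_true, y_pred))

def cubed_cross_entropy(y_true, y_pred):
    return -sum(y * math.log(a ** 3) + (1 - y) * math.log((1 - a) ** 3)
                for y, a in zip(y_true, y_pred))
```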
Data to be predicted is input into the classification model; the convolutional neural network extracts features from the input, and the extracted features are multiplied by the interdependent multi-label classifier to obtain the labels of the input data.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A multi-label fundus image identification method based on GACNN is characterized by comprising the following steps:
acquiring an original fundus image and preprocessing the fundus image;
constructing a GACNN model, and training by utilizing the preprocessed labeled original fundus image;
and inputting the original fundus image to be detected into the trained GACNN model, and outputting the identification result with the label.
2. The method according to claim 1, wherein preprocessing the acquired original fundus image comprises:
normalizing the original fundus pictures, processing all pictures to 224 × 224 pixels;
performing image enhancement on the original fundus pictures with histogram equalization, so that the optic disc and blood vessels in the original fundus pictures are highlighted.
3. The GACNN-based multi-label fundus image identification method according to claim 1, wherein the GACNN model comprises a convolutional neural network, a graph attention network and a fusion layer; the convolutional neural network is used for extracting image features; the graph attention network is used for modeling the relations among the fundus multi-labels, each label of a fundus image is regarded as one of a group of interdependent nodes, and historical data is used for training to obtain a multi-label classifier; and the fusion layer fuses the features obtained by the convolutional neural network and the graph attention network to obtain the final classification result.
4. The method as claimed in claim 3, wherein the convolutional neural network comprises 5 convolutional blocks, the convolutional blocks are connected through a max pooling layer, before each convolutional block is max pooled to the next convolutional block, feature maps obtained from the convolutional blocks are globally max pooled to obtain a feature vector, and the feature vectors obtained from each convolutional block are spliced to obtain image features.
5. The GACNN-based multi-label fundus image identification method according to claim 3, wherein the graph attention network comprises a plurality of graph attention layers, and the process of obtaining the multi-label classifier by training the graph attention network comprises:
expressing the features of the fundus labels as F-dimensional word embedding vectors and combining the word embedding vectors of all nodes as the input of a graph attention layer;
in the graph attention layer, linearly transforming the features input into the layer through a shared matrix of fixed size, applying a self-attention mechanism to the nodes, and calculating the correlation coefficient between nodes;
normalizing the correlation coefficients of all adjacent points of the current node with a SoftMax function to obtain the attention coefficients;
updating the feature vector of the current node according to the obtained attention coefficients and taking it as the input of the next graph attention layer, repeating until the features have passed through all graph attention layers, wherein the dimension of the node features output by the last graph attention layer equals the dimension of the feature vector output by the convolutional neural network;
the node features are mapped to a set of interdependent multi-label classifiers.
6. The GACNN-based multi-label fundus image identification method according to claim 5, wherein the graph attention network has 3 graph attention layers; the size of the shared matrix in the first graph attention layer is F′ × F, where F is the dimension of the word embedding vectors representing the fundus label features input into the layer; the size of the shared matrix in the second graph attention layer is F″ × F′; and the size of the shared matrix in the third graph attention layer is D × F″, where D is the dimension of the feature vector output by the convolutional neural network.
7. The method according to claim 6, wherein F′ in the shared matrix of the first graph attention layer is D/2 and F″ in the shared matrix of the second graph attention layer is 3D/4.
8. The method according to claim 5, wherein updating the feature vector of the current node according to the obtained attention coefficients is expressed as:

$$\vec{h}'_i = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W \vec{h}_j\Big)$$

$$H' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}$$

where $\vec{h}'_N$ denotes the updated feature vector of the N-th label node; N is the total number of labels to be identified; $\vec{h}'_i$ is the new feature vector of node i after fusing the features of its adjacent nodes; σ(·) is the activation function; $\alpha_{ij}$ is the attention coefficient of node j with respect to node i; $\vec{h}_j$ is the feature vector of node j; W is the feature transformation matrix; and $N_i$ is the set of nodes adjacent to node i.
9. The GACNN-based multi-label fundus image identification method according to claim 5 or 8, wherein the attention coefficient $\alpha_{ij}$ is expressed as:

$$e_{ij} = a(W\vec{h}_i, W\vec{h}_j)$$

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})}$$

where a(·) denotes the attention mechanism, $e_{ij}$ the correlation coefficient from node j to node i, and $N_i$ the set of nodes adjacent to node i.
10. The GACNN-based multi-label fundus image identification method according to claim 3, wherein the fusion layer fuses the features obtained by the convolutional neural network and the graph attention network as:

y = f(Gx);

where y is the vector of per-label prediction scores; f is the sigmoid function; G is the matrix formed by the label feature vectors after graph attention network processing; and x is the feature vector of the image.
CN202110075947.9A 2021-01-20 2021-01-20 Multi-label eye fundus image identification method based on GACNN Pending CN112766376A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110075947.9A CN112766376A (en) 2021-01-20 2021-01-20 Multi-label eye fundus image identification method based on GACNN


Publications (1)

Publication Number Publication Date
CN112766376A 2021-05-07

Family

ID=75703579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110075947.9A Pending CN112766376A (en) 2021-01-20 2021-01-20 Multi-label eye fundus image identification method based on GACNN

Country Status (1)

Country Link
CN (1) CN112766376A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8533232B1 (en) * 2007-03-30 2013-09-10 Google Inc. Method and system for defining relationships among labels
CN111209398A (en) * 2019-12-30 2020-05-29 北京航空航天大学 Text classification method and system based on graph convolution neural network
CN111402246A (en) * 2020-03-20 2020-07-10 北京工业大学 Eye ground image classification method based on combined network
CN111582409A (en) * 2020-06-29 2020-08-25 腾讯科技(深圳)有限公司 Training method of image label classification network, image label classification method and device
CN112070054A (en) * 2020-09-17 2020-12-11 福州大学 Vehicle-mounted laser point cloud marking classification method based on graph structure and attention mechanism

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113325956A (en) * 2021-06-29 2021-08-31 华南理工大学 Eye movement control system based on neural network and implementation method
CN113627466A (en) * 2021-06-30 2021-11-09 北京三快在线科技有限公司 Image tag identification method and device, electronic equipment and readable storage medium
CN113627466B (en) * 2021-06-30 2023-06-13 北京三快在线科技有限公司 Image tag identification method and device, electronic equipment and readable storage medium
CN113611568A (en) * 2021-09-06 2021-11-05 辽宁石油化工大学 Vacuum circuit breaker based on genetic deep convolutional network
CN114612389A (en) * 2022-02-21 2022-06-10 浙江大学 Fundus image quality evaluation method and device based on multi-source multi-scale feature fusion
CN114612389B (en) * 2022-02-21 2022-09-06 浙江大学 Fundus image quality evaluation method and device based on multi-source multi-scale feature fusion
US11842490B2 (en) 2022-02-21 2023-12-12 Zhejiang University Fundus image quality evaluation method and device based on multi-source and multi-scale feature fusion
CN117237749A (en) * 2023-09-15 2023-12-15 上海谱希和光基因科技有限公司 Eye axis length prediction method, system and equipment
CN117893839A (en) * 2024-03-15 2024-04-16 华东交通大学 Multi-label classification method and system based on graph attention mechanism
CN117893839B (en) * 2024-03-15 2024-06-07 华东交通大学 Multi-label classification method and system based on graph attention mechanism

Similar Documents

Publication Publication Date Title
CN112766376A (en) Multi-label eye fundus image identification method based on GACNN
CN107506761B (en) Brain image segmentation method and system based on saliency-learning convolutional neural network
Yadav et al. Lung-GANs: unsupervised representation learning for lung disease classification using chest CT and X-ray images
WO2019200747A1 (en) Method and device for segmenting proximal femur, computer apparatus, and storage medium
CN111461232A (en) Magnetic resonance image classification method based on multi-strategy batch-mode active learning
Alqudah et al. Segmented and non-segmented skin lesions classification using transfer learning and adaptive moment learning rate technique using pretrained convolutional neural network
CN109977955B (en) Cervical precancerous lesion identification method based on deep learning
CN110110668B (en) Gait recognition method based on feedback weight convolutional neural network and capsule neural network
CN115841607A (en) Brain network structure and similarity joint learning method based on graph attention network
CN114494195A (en) Small-sample parallel Siamese method with attention mechanism for fundus image classification
CN113610118A (en) Fundus image classification method, device, equipment and medium based on multitask course learning
CN115601346A (en) Multi-level classification method for knee joint cartilage injury by multi-modal MRI based on deep learning
CN110992309B (en) Fundus image segmentation method based on deep information transfer network
Thangavel et al. EAD-DNN: Early Alzheimer's disease prediction using deep neural networks
CN115239993A (en) Human body alopecia type and stage identification system based on cross-domain semi-supervised learning
Pradhan et al. Diabetic retinopathy detection on retinal fundus images using convolutional neural network
CN117010971B (en) Intelligent health risk providing method and system based on portrait identification
CN116797817A (en) Autism prediction technique based on self-supervised graph convolution model
Zou et al. Deep learning and its application in diabetic retinopathy screening
CN112613405B (en) Method for recognizing actions at any visual angle
CN112419313B (en) Multi-view classification method for heart disease ultrasound
CN116012903A (en) Automatic labeling method and system for facial expressions
CN115423790A (en) Anterior chamber angle image grading method based on visual text fusion
CN117523626A (en) Pseudo RGB-D face recognition method
CN114168780A (en) Multimodal data processing method, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210507)