CN111127431A - Dry eye disease grading evaluation system based on regional self-adaptive multitask neural network - Google Patents
- Publication number: CN111127431A
- Application number: CN201911349411.0A
- Authority: CN (China)
- Prior art keywords: network, convolution, training, dry eye, regional
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06T7/0012 - Biomedical image inspection
- G06N3/045 - Combinations of networks
- G06N3/08 - Learning methods
- G06T5/20 - Image enhancement or restoration using local operators
- G06T5/40 - Image enhancement or restoration using histogram techniques
- G06T5/70 - Denoising; Smoothing
- G06T7/11 - Region-based segmentation
- G16H50/20 - ICT specially adapted for computer-aided medical diagnosis
- G06T2207/20028 - Bilateral filtering
- G06T2207/20081 - Training; Learning
- G06T2207/20084 - Artificial neural networks [ANN]
- G06T2207/30041 - Eye; Retina; Ophthalmic
Abstract
The invention discloses a dry eye grading evaluation system based on a regional self-adaptive multitask neural network. The system comprises a computer memory, a computer processor, and a computer program stored in the memory and executable on the processor. A trained dry eye grading evaluation model is stored in the memory and comprises a VGG16 network, a region recommendation network, and a U-shaped full convolution network. When executing the program, the processor performs the following steps: acquire the original infrared eyelid plate image to be analyzed, convert it to a single-channel grayscale image, and apply bilateral filtering; then feed the processed image into the dry eye grading evaluation model to obtain a dry eye grading result, an eyelid plate localization result, and an acinus segmentation result. With this system, infrared photographs of the eyelid plates can be analyzed automatically, effectively assisting the grading diagnosis of dry eye disease.
Description
Technical Field
The invention belongs to the field of medical image analysis and machine learning, and particularly relates to a xerophthalmia grading evaluation system based on a regional adaptive multitask neural network.
Background
Dry eye is a multifactorial disorder of the ocular surface characterized by loss of tear-film homeostasis and accompanied by ocular surface symptoms; its pathogenesis involves tear-film instability, tear hyperosmolarity, ocular surface inflammation and damage, and neurosensory abnormalities. Multiple studies have shown that the prevalence of dry eye ranges between 5% and 50%. Traditional dry eye diagnosis and evaluation can objectively diagnose and grade dry eye according to indexes such as the Ocular Surface Disease Index (OSDI), non-invasive tear break-up time (NIBUT), tear meniscus height, and meibomian gland morphology. However, this approach has drawbacks: it involves many evaluation indexes, the calculations are cumbersome, and diagnostic efficiency is low.
Chinese patent publication No. CN107644676A discloses a convenient dry eye diagnosis and treatment system in which a camera captures eye images and sends them through an acquisition module to a CPU. The CPU processes the images to obtain the blink count within the monitoring period and stores it through a storage module; the count is then uploaded via a communication module to a server, which receives and stores the uploaded data together with the user information, subjective symptom questionnaires, dry eye diagnosis information, and doctor information uploaded by the client.
Chinese patent publication No. CN106510615A discloses a dry eye analysis system that implements functions such as tear film break-up time examination, tear meniscus height measurement, meibomian gland image acquisition and enhancement, and lipid layer analysis, providing a basis for clinical dry eye diagnosis.
At present, research has only quantitatively analyzed the size, tortuosity, and number of meibomian glands; hidden characteristics such as the relative positions, sizes, and shapes of the glands are often ignored because they are difficult to extract and quantify. One reason is that researchers have not found a direct connection between these features and the diagnostic outcome; a more important reason is that the semantic feature information contained in the images may far exceed low-dimensional texture features such as size, morphology, and number, yet this semantic information is hard to mine and cannot be analyzed quantitatively.
Therefore, there is an urgent need for a deep-learning-based dry eye grading system to replace manual grading: one that automatically extracts high-level semantic features with a neural network model and mines the hidden information in eyelid plate images, thereby reducing diagnostic difficulty, improving efficiency, and improving the accuracy of dry eye grading diagnosis.
Disclosure of Invention
To overcome the shortcomings of existing dry eye grading, which requires multiple detection aids and suffers from low efficiency and low accuracy, the invention discloses a dry eye grading evaluation system based on a regional self-adaptive multitask neural network that automatically analyzes infrared photographs of the eyelid plates and effectively assists the grading diagnosis of dry eye disease.
A regional adaptive multitask neural network based dry eye grading assessment system comprises a computer memory, a computer processor, and a computer program stored in the memory and executable on the processor. A trained dry eye grading evaluation model is stored in the memory and comprises a VGG16 network, a region recommendation network, an FCN (fully convolutional network), and a U-shaped full convolution network (U-Net). The VGG16 network computes a feature map from the infrared eyelid plate image; the region recommendation network computes recommended regions from the feature map and passes them to both the FCN and the U-Net; the FCN produces the dry eye grading result and the eyelid plate localization result from the recommended regions; and the U-Net produces the acinus segmentation result from the eyelid plate localization result and the recommended regions.
the computer processor, when executing the computer program, performs the steps of:
acquiring an original eyelid plate infrared image to be detected, performing gray level preprocessing, and performing bilateral filtering processing on a single-channel gray level image obtained after preprocessing;
and inputting the processed image into a dry eye classification evaluation model to obtain a dry eye classification result, an eyelid plate positioning result and an acinus segmentation result.
The system of the invention extracts the dry eye infrared eyelid plate image characteristics by using the convolutional neural network, thereby realizing dry eye classification.
The construction and training process of the xerophthalmia grading evaluation model is as follows:
(1) Preprocess the original infrared eyelid plate images: apply bilateral filtering to smooth each image, then adaptive histogram equalization, and finally pixel normalization and standardization; divide the processed sample images into a training set, a verification set, and a test set.
(2) Construct the VGG16 network, which consists of 13 convolutional layers, 5 pooling layers, 13 ReLU activation functions, and two fully connected layers. The network is built with the PyTorch framework and trained in mini-batches; one pass of the full training set through the model counts as one epoch. After each epoch, the verification set is run through the model and its loss is computed; the loss function is cross-entropy loss. A maximum number of epochs is set, training stops once the training and verification losses converge, and the model is saved as the pre-trained classification model. The output of the last convolution before the two fully connected layers is saved as the feature map.
(3) Construct the region recommendation network from the feature map obtained in step (2) and the pre-trained classification model. The network consists of 7 convolutional layers, 1 batch normalization layer, 5 ReLU activation functions, 1 max pooling layer, 1 global pooling layer, and two Softmax heads. It is built with the PyTorch framework and trained in the same way as the VGG16 network.
(4) Train the region recommendation network from step (3) on the feature map from step (2), and pass the resulting recommended regions to the FCN and the U-Net respectively.
(5) Apply global average pooling and two fully connected layers to the recommended regions from step (4) to obtain an FCN that separately classifies the upper and lower eyelid plates and localizes the eyelid plates; the eyelid plate localization result is passed to the U-shaped full convolution network. The FCN is built with the PyTorch framework and trained in the same way as the VGG16 network.
(6) Within the eyelid plate localization result from step (5), train the U-shaped full convolution network on the recommended regions from step (4) to obtain the acinus segmentation result. The U-shaped full convolution network consists of 8 convolutional layers, 4 upsampling layers, 8 ReLU activation functions, and 1 global pooling layer; it is built with the PyTorch framework and trained in the same way as the VGG16 network.
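The training regime described in step (2), that is, mini-batch training, a validation pass after each epoch, cross-entropy loss, and stopping once the validation loss converges, might be sketched in PyTorch as follows; the tiny model, synthetic tensors, and patience threshold are illustrative assumptions, not the patent's actual configuration.

```python
import torch
from torch import nn

# Illustrative stand-ins: a tiny 4-class classifier, synthetic tensors, and a
# patience threshold are assumptions, not the patent's actual configuration.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss_fn = nn.CrossEntropyLoss()                      # cross-entropy, as in step (2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x_train, y_train = torch.randn(64, 16), torch.randint(0, 4, (64,))
x_val, y_val = torch.randn(16, 16), torch.randint(0, 4, (16,))

max_epochs, patience, best_val, bad_epochs = 50, 3, float("inf"), 0
best_state = None
for epoch in range(max_epochs):                      # one full pass = one epoch
    model.train()
    for i in range(0, len(x_train), 16):             # mini-batches of 16
        optimizer.zero_grad()
        loss = loss_fn(model(x_train[i:i + 16]), y_train[i:i + 16])
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():                            # validation loss per epoch
        val_loss = loss_fn(model(x_val), y_val).item()
    if val_loss < best_val - 1e-4:
        best_val, bad_epochs = val_loss, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        bad_epochs += 1
    if bad_epochs >= patience:                       # stop once the loss converges
        break
```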
In step (1), the original infrared eyelid plate images comprise samples at four dry eye levels: none, mild, moderate, and severe, with equal numbers of upper and lower eyelid plates. Each sample carries three labels: an eyelid plate region localization label (the image coordinates of the rectangular bounding box of the eyelid plate in the image), a dry eye grading label ([0, 1, 2, 3], representing no dry eye, mild, moderate, and severe respectively), and an eyelid gland acinus segmentation label.
In step (4), a small feature map is extracted from each recommended region with a RoI Align (Region of Interest Align) layer and used as the updated recommended region, which is then passed to the FCN and the U-Net respectively.
When training the VGG16 network, the region recommendation network, the FCN, and the U-shaped full convolution network, the Adam algorithm is used for optimization, and the learning rate of each training round is less than or equal to that of the previous round.
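The rule that each round's learning rate never exceeds the previous round's, combined with the Adam optimizer, could be expressed with a step scheduler as in this sketch; the step size and decay factor are assumptions.

```python
import torch

params = [torch.nn.Parameter(torch.zeros(4))]
optimizer = torch.optim.Adam(params, lr=1e-3)
# StepLR multiplies the learning rate by gamma every step_size epochs, so
# each round's rate is <= the previous round's; the values are illustrative.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

rates = []
for epoch in range(30):
    optimizer.step()        # (training work would happen here)
    scheduler.step()
    rates.append(optimizer.param_groups[0]["lr"])
```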
Except for the region recommendation network, which is initialized from the parameters of the pre-trained VGG16 model, all networks are initialized as follows: convolutional layer weights are initialized with a random orthogonal matrix, regularized with L2 regularization, and their biases are initialized to 0; fully connected layer weights are initialized from a random normal distribution, regularized with L2 regularization, and their biases are initialized to 0.
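A minimal sketch of the described initialization scheme; PyTorch expresses L2 regularization through the optimizer's weight_decay term, and the layer sizes and standard deviation here are arbitrary assumptions.

```python
import torch
from torch import nn

def init_weights(module: nn.Module) -> None:
    # Convolution weights: random orthogonal matrix; biases zero.
    if isinstance(module, nn.Conv2d):
        nn.init.orthogonal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    # Fully connected weights: random normal distribution; biases zero.
    elif isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        nn.init.zeros_(module.bias)

net = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                    nn.Flatten(), nn.Linear(64 * 8 * 8, 4))
net.apply(init_weights)

# L2 regularization is applied through the optimizer's weight_decay term.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=1e-4)
```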
In the VGG16 network, the first convolutional layer uses 3 × 3 kernels with stride 1, padding 1, and 64 channels; all subsequent convolutional layers use the same kernel size, stride, and padding as the first layer. The max pooling kernel size is 2 × 2 with stride 2 and padding 0; the global pooling uses average pooling, and all subsequent pooling layers use the same configuration as the first pooling layer.
In the region recommendation network, the first three convolutional layers use 3 × 3 kernels with 512 channels, stride 1, and padding 0, each followed by ReLU activation before feeding the next layer. The result is then passed through global average pooling and sent to a classification branch and a regression branch respectively. In the classification branch, the first convolutional layer uses a 1 × 1 kernel with padding 0 and stride 1; the second uses an 8 × 8 kernel with padding 4 and stride 8; a four-class Softmax finally yields the dry eye grading result. The regression branch has the same structure as the classification branch, except that a two-class Softmax finally decides whether the current region is an eyelid region.
In the U-shaped full convolution network, each stage performs one upsampling followed by two convolutions; the convolutional layers use 3 × 3 kernels with stride 2, padding 1, and 256 channels. This stage is repeated 3 more times, with channel counts of 128, 64, and 32 in turn and the other parameters unchanged. Finally, global pooling with a 1 × 1 kernel and 2 channels yields the acinus segmentation result.
Compared with the prior art, the invention has the following beneficial effects:
1. The system analyzes infrared eyelid plate images with a convolutional neural network and extracts image features automatically; compared with traditional methods that require hand-designed features, diagnosis is efficient and fast.
2. By adopting a mask region convolutional neural network, the eyelid plate bounding box is produced by neural network regression; its size need not be preset and is generated automatically from the actual image, so a single model can be used even though the upper and lower eyelid plate regions differ greatly.
3. RoI Align (Region of Interest Align) aligns the extracted features with the input, realizing the mapping from the feature map to the proposed region and improving the accuracy of eyelid plate region detection.
4. The U-shaped convolution network performs semantic segmentation, delineating the eyelid plate acini more clearly within the generated eyelid plate region and thereby assisting diagnosis.
Drawings
FIG. 1 is a block diagram of a dry eye grading assessment model in the system of the present invention;
FIG. 2 is a schematic diagram of a VGG16 network;
fig. 3 is a schematic structural diagram of a regional recommendation network RPN;
FIG. 4 is a schematic structural diagram of a U-shaped full convolution network U-Net.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
The system of the present invention comprises a computer memory having a trained dry eye grading assessment model stored therein, a computer processor, and a computer program stored in the computer memory and executable on the computer processor. The computer processor when executing the computer program performs the steps of: acquiring an original eyelid plate infrared image to be detected, performing gray level preprocessing, and performing bilateral filtering processing on a single-channel gray level image obtained after preprocessing; and inputting the processed image into a dry eye classification evaluation model to obtain a dry eye classification result, an eyelid plate positioning result and an acinus segmentation result.
As shown in fig. 1, the dry eye grading evaluation model includes a VGG16 network, a region recommendation network, an FCN, and a U-shaped full convolution network. The VGG16 network computes a feature map from the infrared eyelid plate image; the region recommendation network computes recommended regions from the feature map and passes them to the FCN and the U-shaped full convolution network respectively; the FCN performs dry eye grading and eyelid plate localization from the recommended regions; and the U-shaped full convolution network obtains the acinus segmentation result from the eyelid plate localization result and the recommended regions.
The detailed steps of the grading evaluation of the dry eye disease by using the system of the invention are as follows:
Step 1: preprocess the original infrared eyelid plate image: downsample the original image to 224 × 224 and convert it to a single-channel grayscale image.
Step 3: enhance contrast with adaptive histogram equalization and normalize the data to [0, 1].
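A dependency-free sketch of the contrast-enhancement and normalization step; plain global histogram equalization stands in for the adaptive variant here, and in practice OpenCV's CLAHE and bilateral filter would likely be used instead.

```python
import numpy as np

def preprocess(img_rgb: np.ndarray) -> np.ndarray:
    """Grayscale -> histogram equalization -> values in [0, 1].

    Global equalization is a simplified stand-in for the adaptive variant.
    """
    gray = img_rgb.mean(axis=2).astype(np.uint8)        # single-channel grayscale
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1)  # equalized mapping
    eq = cdf[gray]                                      # contrast-enhanced image
    return eq.astype(np.float32)                        # already in [0, 1]

img = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)
out = preprocess(img)
```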
Step 5.1: pre-train a classification model with VGG16 and retain the computed feature map. Input a batch of 224 × 224 × 1 eyelid plate grayscale images and apply batch normalization.
Step 5.2: apply a 3 × 3 convolution with 64 channels, a ReLU activation, and a max pooling operation, downsampling the image to 112 × 112 × 64.
Step 5.3: apply 3 × 3 convolutions with 128 channels twice in succession, a ReLU activation, and max pooling, downsampling the image to 56 × 56 × 128.
Step 5.4: apply 3 × 3 convolutions with 256 channels three times in succession, a ReLU activation, and max pooling, downsampling the image to 28 × 28 × 256.
Step 5.5: apply 3 × 3 convolutions with 512 channels three times in succession, a ReLU activation, and max pooling, downsampling the image to 14 × 14 × 512.
Step 5.6: repeat step 5.5 once; the resulting 7 × 7 × 512 output is retained as the final feature map.
Step 5.7: pass the feature map from step 5.6 through fully connected layers (FC) of 512 and 64 dimensions in turn to obtain the dry eye grading result (no dry eye, mild, moderate, or severe).
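The shape arithmetic of steps 5.1 through 5.6 can be checked with a minimal VGG16-style feature extractor; this sketch covers only the 13 convolutional layers and 5 pooling stages, not the fully connected head.

```python
import torch
from torch import nn

def block(in_ch, out_ch, n_convs):
    # n_convs 3x3 convolutions (stride 1, padding 1) with ReLU, then 2x2 max pooling.
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return layers

# VGG16-style backbone: 13 conv layers in 5 blocks, halving the map each time.
features = nn.Sequential(*block(1, 64, 2), *block(64, 128, 2),
                         *block(128, 256, 3), *block(256, 512, 3),
                         *block(512, 512, 3))

x = torch.randn(1, 1, 224, 224)          # one 224x224 grayscale eyelid image
fmap = features(x)                       # the retained 7x7x512 feature map
```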
Step 6: construct the region recommendation network from the feature map obtained in step 5 and the pre-trained classification model, as shown in fig. 3:
step 6.1: the feature of 5 x 512 was obtained by first convolving once with the convolution kernel of 3 x 512 and then convolving twice with the convolution kernel of 1 x 512.
Step 6.2: pass the result through global average pooling into a classification branch and a regression branch respectively.
Step 6.3: in the classification branch, pass the result of step 6.2 through two fully connected layers (FC) of 512 and 64 dimensions to obtain the dry eye grading result (no dry eye, mild, moderate, or severe).
Step 6.4: in the regression branch, likewise pass the result of step 6.2 through two fully connected layers of 512 and 64 dimensions to obtain the bounding box regression result (eyelid region or not).
Step 7: train the region recommendation network from step 6 on the feature map from step 5; extract a small feature map from each generated recommended region with the RoI Align layer as the new recommended region, and pass it to the FCN and the U-Net respectively.
Step 9: within the eyelid plate localization region obtained in step 8, train a U-Net model on the recommended regions from step 7 to obtain the acinus segmentation. The trained dry eye severity grading model, together with the number and size of the acini, assists the grading diagnosis of dry eye. The structure of the U-shaped convolutional neural network is shown in fig. 4:
Step 9.1: the feature region (7 x 7 x 512) corresponding to the eyelid plate region passed in at step 8 is located in the feature map and fed into the U-Net network as input.
Step 9.2: one upsampling (2 x 2) is performed first, followed by two convolutions (3 x 3);
Step 9.3: step 9.2 is repeated three more times;
Step 9.4: finally, one more convolution (1 x 1) is applied to obtain the acinar segmentation result.
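The decoder steps above can be sketched in PyTorch (a minimal sketch assuming the 7 x 7 x 512 input of step 9.1, the channel counts 256/128/64/32 given later in the embodiment, and nearest-neighbour upsampling, which the text does not specify):

```python
import torch
import torch.nn as nn

def up_block(cin, cout):
    # step 9.2: one 2x2 upsampling followed by two 3x3 convolutions
    return nn.Sequential(
        nn.Upsample(scale_factor=2),
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(),
    )

decoder = nn.Sequential(
    up_block(512, 256),   # 7x7   -> 14x14
    up_block(256, 128),   # 14x14 -> 28x28   (step 9.3: repeat three more times)
    up_block(128, 64),    # 28x28 -> 56x56
    up_block(64, 32),     # 56x56 -> 112x112
    nn.Conv2d(32, 2, 1),  # final 1x1 convolution -> 2-channel acinus mask
)

mask = decoder(torch.randn(1, 512, 7, 7))
```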
The localization labels and grading labels used in steps 6 and 8 must first be generated from the annotations. A bounding-box regression label is generated from the annotated bounding box, whose center coordinates are (x, y) and whose width and height are (w, h). Sliding windows of several sizes and aspect ratios are slid over the image, the sizes including [128, 256] and the aspect ratios [2, 1, 0.5]. When the intersection over union (IoU) of a sliding window and the annotated bounding box is greater than 0.75, the sliding window at that position is taken as a labeled sliding window, with center coordinates (x_a, y_a) and width and height (w_a, h_a). The relative position between the labeled window and the bounding box is computed, in the standard box parameterization, as:

t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a)
The model regresses to the localization label through the fully connected layers. The grading labels [0, 1, 2, 3] represent no dry eye, mild, moderate, and severe, respectively.
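The label generation above can be worked through numerically; the box parameterization used below is the standard one the text describes, and the specific numbers are illustrative only:

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter)

def corners(b):
    """(cx, cy, w, h) -> (x1, y1, x2, y2)."""
    return (b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2)

def regression_target(gt, anchor):
    """gt/anchor as (cx, cy, w, h) -> (t_x, t_y, t_w, t_h)."""
    (x, y, w, h), (xa, ya, wa, ha) = gt, anchor
    return ((x - xa) / wa, (y - ya) / ha, np.log(w / wa), np.log(h / ha))

gt = (100.0, 100.0, 120.0, 60.0)     # annotated box in center/size form
anchor = (104.0, 98.0, 128.0, 64.0)  # sliding window: size 128, aspect ratio 2

overlap = iou(corners(gt), corners(anchor))     # > 0.75 -> labeled window
tx, ty, tw, th = regression_target(gt, anchor)  # regression label
```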
In steps 6 and 8, the loss function for bounding-box localization is the smooth L1 loss, and the loss function for grading is the negative log-likelihood loss (Negative Log Loss).
In step 9, the labels are the manually delineated (green) acinar boundaries, and the cross-entropy loss (Cross Entropy Loss) is used.
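A hedged sketch of the three losses named above, using PyTorch's functional API (the tensors are toy values, not the patent's data):

```python
import torch
import torch.nn.functional as F

# box regression (steps 6 and 8): smooth L1 on (t_x, t_y, t_w, t_h) offsets
pred_t = torch.tensor([[0.1, -0.2, 0.05, 0.0]])
true_t = torch.zeros(1, 4)
reg_loss = F.smooth_l1_loss(pred_t, true_t)

# grading (steps 6 and 8): negative log-likelihood over log-probabilities
log_probs = F.log_softmax(torch.tensor([[2.0, 0.5, 0.1, 0.1]]), dim=1)
cls_loss = F.nll_loss(log_probs, torch.tensor([0]))  # 0 = no dry eye

# acinar segmentation (step 9): per-pixel cross-entropy on a 2-class mask
logits = torch.randn(1, 2, 8, 8)         # (N, classes, H, W)
target = torch.randint(0, 2, (1, 8, 8))  # per-pixel labels
seg_loss = F.cross_entropy(logits, target)
```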
The structure of the upsampling part of the U-shaped network is shown in figure 4. An input module of size W x H x T passes through two paths. In one path, it undergoes convolution - batch norm - ReLU activation - convolution - batch norm - ReLU - max pooling, producing a feature output of half the spatial size. In the other path, the module features undergo global pooling and then a fully connected layer to obtain a feature vector of length T; these values are used as weights and multiplied channel-wise, pixel by pixel, with the input features to give the final module output. This channel weighting of the module output highlights the features of some channels, achieving a feature-enhancement effect.
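The channel-weighting path reads like a squeeze-and-excitation style gate; a minimal sketch (the sigmoid gate and the single fully connected layer are our assumptions; the text specifies only global pooling, a length-T fully connected output, and channel-wise multiplication):

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    def __init__(self, t):
        super().__init__()
        # fully connected layer producing a length-T weight vector
        self.fc = nn.Sequential(nn.Linear(t, t), nn.Sigmoid())

    def forward(self, x):                # x: (N, T, H, W)
        w = x.mean(dim=(2, 3))           # global pooling -> (N, T)
        w = self.fc(w)                   # per-channel weights
        return x * w[:, :, None, None]   # channel-wise, pixel-by-pixel multiply

gate = ChannelGate(16)
out = gate(torch.randn(2, 16, 8, 8))
```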
Example:
The eyelid plate infrared images used in this embodiment are divided into upper and lower eyelid plates and cover four dry eye grades: no dry eye, mild, moderate, and severe dry eye. There are 11584 eyelid plate infrared image samples, with equal numbers of upper and lower eyelid plates: 2545 non-dry-eye samples, 3623 mild, 3242 moderate, and 2174 severe. From the positive and negative samples, 7823 are randomly selected as the training set, 1180 as the validation set, and 1181 as the test set. The preprocessing and enhancement of the eyelid plate images and the training and testing of the model are described in detail below.
S1, eyelid plate image preprocessing.
S1-1: the image is downsampled to 224 x 224 to avoid memory overflow from oversized images during model training;
S1-2: the image is converted to a grayscale image;
S2, eyelid plate image enhancement.
S2-1: a bilateral filter is applied to the image, with a filter kernel size of 8, a color-space standard deviation of 75, and a coordinate-space standard deviation of 75.
S3, construction and training of VGG16; the specific structure is shown in FIG. 2.
S3-1: the model consists of 13 convolutional layers, 5 pooling layers, 13 ReLU activation functions, and two fully connected layers;
S3-2: the convolution kernel size in the first convolutional layer is 3 x 3, the stride is 1, the padding is 1, and the number of channels is 64. The convolution kernels of all subsequent convolutional layers use the same parameters as the first layer. The max-pooling kernel size is 2 x 2, with stride 2 and padding 0. Global pooling uses mean pooling, and all subsequent pooling layers use the same parameters as the first pooling layer.
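The first convolution-plus-pooling stage of S3-2 as a PyTorch sketch (the single input channel follows from the grayscale preprocessing; this is an illustration of the stated parameters, not the full 13-layer network):

```python
import torch
import torch.nn as nn

# first VGG16 stage with the S3-2 parameters: 3x3 kernel, stride 1,
# padding 1, 64 channels, then 2x2 max pooling with stride 2, padding 0
block = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
)

out = block(torch.randn(1, 1, 224, 224))  # 224x224 grayscale input
```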
S3-3: all parameter weights in the convolutional layers are initialized with random orthogonal matrices, the weight regularization is L2 regularization, and the biases are initialized to 0. In the fully connected layers, the weights are initialized from a random normal distribution, the weight regularization is L2 regularization, and the biases are initialized to 0.
S3-4: the network is built with the PyTorch framework and trained in batches: the training-set and validation-set generators use a batch size of 4, and one pass of the entire training set through the model counts as one epoch. After each epoch, the validation set is run through the model and the validation loss is computed; the loss function is cross-entropy loss. The optimizer is Adam with lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, and amsgrad=False. The maximum number of training epochs is 100; training stops once the validation and training losses converge, and the model is saved as a .dat file as the final pre-training result. The output of the last convolution before the two fully connected layers is saved as the feature map.
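The S3-4 training configuration can be sketched as follows (the network and data are stand-ins, not the patent's VGG16 or images; only the optimizer hyperparameters, the batch size of 4, the cross-entropy loss, and the .dat saving come from the text):

```python
import torch
import torch.nn as nn

net = nn.Linear(10, 4)  # stand-in for the real network
# Adam with the hyperparameters listed in S3-4
opt = torch.optim.Adam(net.parameters(), lr=0.001, betas=(0.9, 0.999),
                       eps=1e-08, weight_decay=0, amsgrad=False)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(4, 10), torch.tensor([0, 1, 2, 3])  # one batch of 4
for epoch in range(3):  # the patent allows up to 100 epochs
    opt.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    loss_val = loss.item()
    opt.step()

torch.save(net.state_dict(), "pretrain.dat")  # saved as a .dat file
```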
S4, a region recommendation network is constructed from the feature map obtained in step S3 and the pre-trained classification model, as shown in FIG. 2:
S4-1: the network consists of 7 convolutional layers, 1 batch-normalization layer, 5 ReLU activation functions, 1 max-pooling layer, 1 global pooling layer, and two Softmax layers.
S4-2: the convolution kernel size in the first three convolutional layers is 3 x 3, with 512 channels, stride 1, and padding 0; after each convolution, the result is activated by ReLU and fed into the next layer.
S4-3: the result is then passed through global average pooling (Global Average Pooling) and fed into a classification branch (Classification) and a regression branch (Regression), respectively;
S4-4: in the classification branch (cls), the first convolutional layer uses a 1 x 1 kernel with padding 0 and stride 1; the second uses an 8 x 8 kernel with padding 4 and stride 8; finally, a four-way Softmax yields the dry eye grading result (no dry eye, mild, moderate, or severe);
S4-5: the bounding-box regression branch (reg) follows the same steps as S4-4, except that the final Softmax is binary, indicating whether the current region is an eyelid plate region;
S4-6: all parameter weights in the convolutional layers are initialized with random orthogonal matrices, the weight regularization is L2 regularization, and the biases are initialized to 0.
S4-7: the network is built with the PyTorch framework and trained in batches: the training-set and validation-set generators use a batch size of 4, and one pass of the entire training set through the model counts as one epoch. After each epoch, the validation set is run through the model and the validation loss is computed; the loss function for cls is the negative log-likelihood loss (Negative Log Loss), and the loss for reg is the smooth L1 loss. The optimizer is Adam with lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, and amsgrad=False. The maximum number of training epochs is 100.
S5, the feature map trained in step S3 and the region recommendation network trained in step S4 are used for training through RoI Align (Region of Interest Align): the RoI (Region of Interest) represented by floating-point numbers is first scaled to the granularity of the feature map, the scaled RoI is then divided into bins, and the exact value at each sampling position is computed by bilinear interpolation (performing two successive linear interpolations); for unit grid spacing, with Q11, Q21, Q12, Q22 the four surrounding grid points, the standard formula is:

f(x, y) = f(Q11)(x2 - x)(y2 - y) + f(Q21)(x - x1)(y2 - y) + f(Q12)(x2 - x)(y - y1) + f(Q22)(x - x1)(y - y1)
Average aggregation (Average Aggregation) is then applied to the sampled values to give the new RoI.
S6, the feature map trained in step S3 and the RoI obtained in step S5 are passed into the FCN and the U-Net, respectively.
S7, this yields a model that can detect the eyelid plate region, independently judge whether the upper and lower eyelid plates have dry eye, grade the dry eye severity, and localize the eyelid plate; the eyelid plate localization result is passed into the U-Net.
S7-1: the features are stretched into a one-dimensional vector and passed through two fully connected layers, each containing 1024 neurons.
S7-2: finally, the classification and bounding-box regression results are output separately.
S7-3: the network is built with the PyTorch framework and trained in batches: the training-set and validation-set generators use a batch size of 4, and one pass of the entire training set through the model counts as one epoch. After each epoch, the validation set is run through the model and the validation loss is computed; the loss function for cls is the negative log-likelihood loss (Negative Log Loss), and the loss for reg is the smooth L1 loss. The optimizer is Adam with lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, and amsgrad=False. The maximum number of training epochs is 100.
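Steps S7-1/S7-2 as a PyTorch sketch (the 7 x 7 x 512 RoI input and the four-dimensional regression output are our assumptions; the text specifies only the flattening and the two 1024-neuron fully connected layers):

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    def __init__(self, in_dim=7 * 7 * 512):
        super().__init__()
        # S7-1: two fully connected layers of 1024 neurons each
        self.fc = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                nn.Linear(1024, 1024), nn.ReLU())
        self.cls = nn.Linear(1024, 4)  # four dry eye grades
        self.reg = nn.Linear(1024, 4)  # (t_x, t_y, t_w, t_h) box offsets

    def forward(self, roi_feat):
        h = self.fc(roi_feat.flatten(1))  # stretch to a 1-D vector
        return self.cls(h), self.reg(h)   # S7-2: two separate outputs

head = DetectionHead()
scores, boxes = head(torch.randn(3, 512, 7, 7))
```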
S8, within the eyelid plate localization region obtained by the training in step S7, a U-Net model is trained on the region of interest obtained in step S6 to obtain the acinar segmentation. The structure of the U-shaped convolutional neural network is shown in FIG. 4:
S8-1: the network consists of 8 convolutional layers, 4 upsampling layers, 8 ReLU activation functions, and 1 global pooling layer.
S8-2: the first layer performs one upsampling and then two convolutions on the image; the convolution kernel size is 3 x 3, the stride is 2, the padding is 1, and the number of channels is 256. Each convolution is followed by ReLU activation.
S8-3: step S8-2 is repeated three more times, with the channel count becoming 128, 64, and 32 in turn. The remaining parameters are the same as in S8-2.
S8-4: finally, global pooling is applied to obtain the two-channel output, where the convolution kernel is 1 x 1 and the number of channels is 2.
S8-5: the network is built with the PyTorch framework and trained in batches: the training-set and validation-set generators use a batch size of 4, and one pass of the entire training set through the model counts as one epoch. After each epoch, the validation set is run through the model and the validation loss is computed; the segmentation loss function is cross-entropy loss. The optimizer is Adam with lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, and amsgrad=False. The maximum number of training epochs is 60; training stops once the validation and training losses converge, and the model is saved as a .dat file as the final training result.
S9, the models are loaded, the preprocessed eyelid plate infrared test-set samples are input for analysis, and the recognition results are compared with the labels to obtain the model's recognition accuracy.
Through the above steps, the region recommendation network for dry eye grading can be constructed, trained, and tested.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. A regional adaptive multitask neural network based dry eye grading assessment system comprising a computer memory, a computer processor, and a computer program stored in said computer memory and executable on said computer processor, characterized in that:
the computer memory stores a trained dry eye grading evaluation model, the model comprising a VGG16 network, a region recommendation network, and a U-shaped fully convolutional network; the VGG16 network computes a feature map from an eyelid plate infrared image; the region recommendation network computes recommended regions from the feature map and passes them into the FCN network and the U-shaped fully convolutional network respectively; the FCN network obtains the dry eye grading result and the eyelid plate localization result from the recommended regions; and the U-shaped fully convolutional network obtains the acinar segmentation result from the eyelid plate localization result and the recommended regions;
the computer processor, when executing the computer program, performs the steps of:
acquiring the original eyelid plate infrared image to be examined, performing grayscale preprocessing, and applying bilateral filtering to the resulting single-channel grayscale image;
and inputting the processed image into the dry eye grading evaluation model to obtain the dry eye grading result, the eyelid plate localization result, and the acinar segmentation result.
2. The regional adaptive multitask neural network-based dry eye grading assessment system according to claim 1, wherein the dry eye grading assessment model is constructed and trained as follows:
(1) preprocessing the original eyelid plate infrared images, including bilateral filtering, image smoothing, adaptive histogram equalization, and pixel normalization and standardization; the processed sample images are divided into a training set, a validation set, and a test set;
(2) constructing the VGG16 network: the network consists of 13 convolutional layers, 5 pooling layers, 13 ReLU activation functions, and two fully connected layers; the network is built with the PyTorch framework and trained in batches, one pass of the entire training set through the model counting as one epoch; after each epoch, the validation set is run through the model and the validation loss is computed, the loss function being cross-entropy loss; a maximum number of epochs is set, training stops once the validation and training losses converge, and the model is saved as the pre-trained classification model; at the same time, the output of the last convolution before the two fully connected layers is saved as the feature map;
(3) constructing the region recommendation network from the feature map obtained in step (2) and the pre-trained classification model; the network consists of 7 convolutional layers, 1 batch-normalization layer, 5 ReLU activation functions, 1 max-pooling layer, 1 global pooling layer, and two Softmax layers; the network is built with the PyTorch framework and trained in the same way as the VGG16 network;
(4) training by using the characteristic diagram obtained in the step (2) and the area recommendation network obtained in the step (3) to obtain recommendation areas, and respectively transmitting the recommendation areas into the FCN network and the U-shaped full convolution network;
(5) using the recommended regions trained in step (4), the FCN obtains the dry eye grading result and the eyelid plate localization result through global average pooling and two fully connected layers, and passes the eyelid plate localization result into the U-shaped fully convolutional network; the FCN is built with the PyTorch framework and trained in the same way as the VGG16 network;
(6) within the eyelid plate localization result obtained in step (5), the U-shaped fully convolutional network is trained on the recommended regions obtained in step (4) to obtain the acinar segmentation result; the U-shaped fully convolutional network consists of 8 convolutional layers, 4 upsampling layers, 8 ReLU activation functions, and 1 global pooling layer; it is built with the PyTorch framework and trained in the same way as the VGG16 network.
3. The regional adaptive multitask neural network-based dry eye grading assessment system according to claim 2, wherein the original eyelid plate infrared images comprise eyelid plate infrared image samples of four dry eye grades (no dry eye, mild, moderate, and severe), with equal numbers of upper and lower eyelid plates; each sample carries three labels: an eyelid plate region localization label, a dry eye grading label, and an eyelid gland acinus segmentation label, wherein the eyelid plate region localization label consists of the image coordinates of the rectangular bounding box of the eyelid plate in the image, and the dry eye grading labels [0, 1, 2, 3] represent no dry eye, mild, moderate, and severe, respectively.
4. The regional adaptive multitask neural network-based dry eye grading assessment system according to claim 2, wherein in step (4), for each obtained recommended region, a small feature map is extracted from the recommended region by an RoI Align layer to serve as the latest recommended region, which is then passed into the FCN and the U-Net respectively.
5. The regional adaptive multitask neural network-based dry eye grading assessment system according to claim 2, wherein the Adam algorithm is used as the optimization algorithm when training the VGG16 network, the region recommendation network, the FCN network, and the U-shaped fully convolutional network.
6. The regional adaptive multitask neural network-based dry eye condition grading assessment system according to claim 2, wherein when the VGG16 network, the regional recommendation network, the FCN network and the U-type full convolution network are trained, the learning rate of each training is less than or equal to the learning rate of the previous training.
7. The regional adaptive multitask neural network-based dry eye grading assessment system according to claim 2, wherein, except for the region recommendation network, which is initialized with pre-trained VGG16 model parameters, in the remaining networks the convolutional-layer weights are initialized with random orthogonal matrices, the weight regularization is L2 regularization, and the biases are initialized to 0; in the fully connected layers, the weights are initialized from a random normal distribution, the weight regularization is L2 regularization, and the biases are initialized to 0.
8. The regional adaptive multitask neural network-based dry eye grading assessment system according to claim 2, wherein in the VGG16 network the convolution kernel size in the first convolutional layer is 3 x 3, the stride is 1, the padding is 1, and the number of channels is 64; the convolution kernels of all subsequent convolutional layers use the same parameters as the first layer; the max-pooling kernel size is 2 x 2, with stride 2 and padding 0; global pooling uses mean pooling, and all subsequent pooling layers use the same parameters as the first pooling layer.
9. The regional adaptive multitask neural network-based dry eye grading assessment system according to claim 2, wherein in the region recommendation network the convolution kernel size in the first three convolutional layers is 3 x 3, with 512 channels, stride 1, and padding 0, each convolution being activated by ReLU before being fed into the next layer; the result is then passed through global average pooling into a classification branch and a regression branch;
in the classification branch, the first convolutional layer uses a 1 x 1 kernel with padding 0 and stride 1, the second uses an 8 x 8 kernel with padding 4 and stride 8, and a four-way Softmax finally yields the dry eye grading result; in the regression branch, a binary Softmax finally indicates whether the current region is an eyelid plate region, the remaining structure being the same as the classification branch.
10. The regional adaptive multitask neural network-based dry eye grading assessment system according to claim 2, wherein in the U-shaped fully convolutional network one upsampling is performed first, followed by two convolutions, the convolution kernel size of the convolutional layers being 3 x 3, the stride 2, the padding 1, and the number of channels 256; this step is repeated 3 times, with the channel count becoming 128, 64, and 32 in turn and the other parameters unchanged; finally, the acinar segmentation result is obtained through global pooling, with a 1 x 1 convolution kernel and 2 channels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911349411.0A CN111127431A (en) | 2019-12-24 | 2019-12-24 | Dry eye disease grading evaluation system based on regional self-adaptive multitask neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911349411.0A CN111127431A (en) | 2019-12-24 | 2019-12-24 | Dry eye disease grading evaluation system based on regional self-adaptive multitask neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111127431A true CN111127431A (en) | 2020-05-08 |
Family
ID=70501793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911349411.0A Pending CN111127431A (en) | 2019-12-24 | 2019-12-24 | Dry eye disease grading evaluation system based on regional self-adaptive multitask neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111127431A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111640219A (en) * | 2020-06-04 | 2020-09-08 | 许昌开普电气研究院有限公司 | Inspection robot control system and method based on overhead line |
CN111784704A (en) * | 2020-06-24 | 2020-10-16 | 中国人民解放军空军军医大学 | MRI coxitis disease segmentation and classification automatic quantitative grading sequential method |
CN111870279A (en) * | 2020-07-31 | 2020-11-03 | 西安电子科技大学 | Method, system and application for segmenting left ventricular myocardium of ultrasonic image |
CN111899879A (en) * | 2020-07-31 | 2020-11-06 | 罗雄彪 | Automatic eye table disease screening method and system and block chain |
CN112233066A (en) * | 2020-09-16 | 2021-01-15 | 南京理工大学 | Eye bulbar conjunctiva image quality evaluation method based on gradient activation map |
CN112837805A (en) * | 2021-01-12 | 2021-05-25 | 浙江大学 | Deep learning-based eyelid topological morphology feature extraction method |
CN112885456A (en) * | 2021-01-20 | 2021-06-01 | 武汉爱尔眼科医院有限公司 | Meibomian gland quantitative analysis based on deep learning and application thereof in MGD diagnosis and treatment |
CN112914497A (en) * | 2021-01-19 | 2021-06-08 | 北京大学第三医院(北京大学第三临床医学院) | Dry eye mechanical examination device and use method |
CN116128825A (en) * | 2022-12-30 | 2023-05-16 | 杭州又拍云科技有限公司 | Meibomian gland morphology analysis method based on deep learning |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170119243A1 (en) * | 2015-10-30 | 2017-05-04 | The Charles Stark Draper Laboratory, Inc. | System and method for classifying and quantifying age-related macular degeneration |
CN108346154A (en) * | 2018-01-30 | 2018-07-31 | 浙江大学 | The method for building up of Lung neoplasm segmenting device based on Mask-RCNN neural networks |
US20190065897A1 (en) * | 2017-08-28 | 2019-02-28 | Boe Technology Group Co., Ltd. | Medical image analysis method, medical image analysis system and storage medium |
CN109599166A (en) * | 2018-11-28 | 2019-04-09 | 武汉大学人民医院(湖北省人民医院) | Meibomian gland infrared image based on deep learning assists in identifying system and method |
CN109886946A (en) * | 2019-02-18 | 2019-06-14 | 广州视源电子科技股份有限公司 | The Weakly supervised classification method of early-stage senile maculopathy based on deep learning |
CN109934823A (en) * | 2019-03-25 | 2019-06-25 | 天津工业大学 | A kind of DR eye fundus image macular edema stage division based on deep learning |
CN110363782A (en) * | 2019-06-13 | 2019-10-22 | 平安科技(深圳)有限公司 | A kind of area recognizing method based on limb recognition algorithm, device and electronic equipment |
CN110490860A (en) * | 2019-08-21 | 2019-11-22 | 北京大恒普信医疗技术有限公司 | Diabetic retinopathy recognition methods, device and electronic equipment |
CN110555845A (en) * | 2019-09-27 | 2019-12-10 | 上海鹰瞳医疗科技有限公司 | Fundus OCT image identification method and equipment |
CN110599448A (en) * | 2019-07-31 | 2019-12-20 | 浙江工业大学 | Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network |
- 2019-12-24 CN CN201911349411.0A patent/CN111127431A/en active Pending
Non-Patent Citations (5)
Title |
---|
KAIMING HE ET AL.: "Mask R-CNN", arXiv.org, pages 78-79 *
WENZHE WANG ET AL.: "Nodule-Plus R-CNN and Deep Self-Paced Active Learning for 3D Instance Segmentation of Pulmonary Nodules" * |
YUMING JIANG ET AL.: "Optic disc and cup segmentation with blood vessel removal from fundus images for glaucoma detection" * |
DING Guangyu et al.: "Application and prospects of artificial intelligence technology in liver tumors" *
YANG Lujing et al.: "Biofabrication and Clinical Evaluation of Peripheral Nerve Defect Repair Materials", Xidian University Press, pages 292-293 *
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111640219A (en) * | 2020-06-04 | 2020-09-08 | 许昌开普电气研究院有限公司 | Inspection robot control system and method based on overhead line |
CN111784704B (en) * | 2020-06-24 | 2023-11-24 | 中国人民解放军空军军医大学 | MRI hip joint inflammation segmentation and classification automatic quantitative classification sequential method |
CN111784704A (en) * | 2020-06-24 | 2020-10-16 | 中国人民解放军空军军医大学 | MRI coxitis disease segmentation and classification automatic quantitative grading sequential method |
CN111870279A (en) * | 2020-07-31 | 2020-11-03 | 西安电子科技大学 | Method, system and application for segmenting left ventricular myocardium of ultrasonic image |
CN111899879A (en) * | 2020-07-31 | 2020-11-06 | 罗雄彪 | Automatic eye table disease screening method and system and block chain |
CN111870279B (en) * | 2020-07-31 | 2022-01-28 | 西安电子科技大学 | Method, system and application for segmenting left ventricular myocardium of ultrasonic image |
CN112233066A (en) * | 2020-09-16 | 2021-01-15 | 南京理工大学 | Eye bulbar conjunctiva image quality evaluation method based on gradient activation map |
CN112233066B (en) * | 2020-09-16 | 2022-09-27 | 南京理工大学 | Eye bulbar conjunctiva image quality evaluation method based on gradient activation map |
CN112837805A (en) * | 2021-01-12 | 2021-05-25 | 浙江大学 | Deep learning-based eyelid topological morphology feature extraction method |
CN112837805B (en) * | 2021-01-12 | 2024-03-29 | 浙江大学 | Eyelid topological morphology feature extraction method based on deep learning |
CN112914497A (en) * | 2021-01-19 | 2021-06-08 | 北京大学第三医院(北京大学第三临床医学院) | Dry eye mechanical examination device and use method |
CN112914497B (en) * | 2021-01-19 | 2022-12-09 | 北京大学第三医院(北京大学第三临床医学院) | Xerophthalmia machine inspection device and using method |
CN112885456A (en) * | 2021-01-20 | 2021-06-01 | 武汉爱尔眼科医院有限公司 | Meibomian gland quantitative analysis based on deep learning and application thereof in MGD diagnosis and treatment |
CN112885456B (en) * | 2021-01-20 | 2022-08-16 | 武汉爱尔眼科医院有限公司 | Meibomian gland quantitative analysis based on deep learning and application thereof in MGD diagnosis and treatment |
CN116128825A (en) * | 2022-12-30 | 2023-05-16 | 杭州又拍云科技有限公司 | Meibomian gland morphology analysis method based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111127431A (en) | Dry eye disease grading evaluation system based on regional self-adaptive multitask neural network | |
Li et al. | Automatic detection of diabetic retinopathy in retinal fundus photographs based on deep learning algorithm | |
CN109300121B (en) | A kind of construction method of cardiovascular disease diagnosis model, system and the diagnostic device | |
CN110728312B (en) | Dry eye grading system based on regional self-adaptive attention network | |
Melinscak et al. | Retinal Vessel Segmentation using Deep Neural Networks. | |
CN111259982A (en) | Premature infant retina image classification method and device based on attention mechanism | |
US20170270653A1 (en) | Retinal image quality assessment, error identification and automatic quality correction | |
TWI687937B (en) | Method for establishing a chromosome abnormality detection model, chromosome abnormality detection system, and chromosome abnormality detection method | |
WO2020087838A1 (en) | Blood vessel wall plaque recognition device, system and method, and storage medium | |
CN111009324B (en) | Auxiliary diagnosis system and method for mild cognitive impairment through multi-feature analysis of brain network | |
CN111612856B (en) | Retina neovascularization detection method and imaging method for color fundus image | |
CN112465905A (en) | Characteristic brain region positioning method of magnetic resonance imaging data based on deep learning | |
CN112633386A (en) | SACVAEGAN-based hyperspectral image classification method | |
CN109191434A (en) | Image detection system and detection method for cell differentiation | |
CN114287878A (en) | Diabetic retinopathy focus image identification method based on attention model | |
CN110956628A (en) | Image grade classification method and device, computer equipment and storage medium | |
CN113782184A (en) | Stroke auxiliary evaluation system based on facial key points and feature pre-learning | |
Parmar et al. | Detecting diabetic retinopathy from retinal images using cuda deep neural network | |
CN115131503A (en) | Health monitoring method and system for iris three-dimensional recognition | |
Cao et al. | Attentional mechanisms and improved residual networks for diabetic retinopathy severity classification | |
KR20220112269A (en) | Neural Network Processing of OCT Data to Generate Predictions of Supervised Atrophy Growth Rates | |
Zhang et al. | Blood vessel segmentation in fundus images based on improved loss function | |
Kasani et al. | HEp-2 Cell Classification Using an Ensemble of Convolutional Neural Networks | |
Sengar et al. | An efficient artificial intelligence-based approach for diagnosis of media haze disease | |
CN113705480A (en) | Gesture recognition method, device and medium based on gesture recognition neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||