CN111080630B - Fundus image detection apparatus, method, device, and storage medium - Google Patents

Fundus image detection apparatus, method, device, and storage medium

Info

Publication number
CN111080630B
CN111080630B (application CN201911327024.7A)
Authority
CN
China
Prior art keywords
image
feature vector
fundus
feature
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911327024.7A
Other languages
Chinese (zh)
Other versions
CN111080630A (en)
Inventor
余双
马锴
郑冶枫
边成
龚丽君
初春燕
刘含若
王宁利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Healthcare Shenzhen Co Ltd
Original Assignee
Tencent Healthcare Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Healthcare Shenzhen Co Ltd filed Critical Tencent Healthcare Shenzhen Co Ltd
Priority to CN201911327024.7A priority Critical patent/CN111080630B/en
Publication of CN111080630A publication Critical patent/CN111080630A/en
Application granted granted Critical
Publication of CN111080630B publication Critical patent/CN111080630B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 7/0012: Biomedical image inspection (G06T 7/00 Image analysis; G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (G06F 18/00 Pattern recognition; G06F 18/24 Classification techniques)
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/66: Analysis of geometric attributes of image moments or centre of gravity (G06T 7/60 Analysis of geometric attributes)
    • G06N 3/045: Combinations of networks (G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/048: Activation functions
    • G06N 3/084: Backpropagation, e.g. using gradient descent (G06N 3/08 Learning methods)
    • G06T 2207/20076: Probabilistic image processing (G06T 2207/20 Special algorithmic details)
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30041: Eye; Retina; Ophthalmic (G06T 2207/30 Subject of image; G06T 2207/30004 Biomedical image processing)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Radiology & Medical Imaging (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • General Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The application discloses a fundus image detection apparatus, method, device, and storage medium, belonging to the field of image technology. In the method, a fundus image detection device acquires, in response to an image detection instruction, a group of fundus images to be detected, the group of fundus images comprising a first image corresponding to the left eye and a second image corresponding to the right eye; the first image and the second image are input into an image classification model; feature extraction is performed on the first image and the second image by the image classification model to obtain a first feature vector and a second feature vector; image classification is then performed based on the first feature vector, the second feature vector, and a third feature vector used to indicate the difference between the first feature vector and the second feature vector; and a label corresponding to the group of fundus images is output. Image detection can thus be performed based on the image features of the left-eye image and the right-eye image and the difference between them, the label corresponding to the images is obtained, and the accuracy of the fundus image detection result is improved.

Description

Fundus image detection apparatus, method, device, and storage medium
Technical Field
The present disclosure relates to the field of image technologies, and in particular, to a fundus image detection apparatus, method, device, and storage medium.
Background
With the development of artificial intelligence, image detection technology based on artificial intelligence has been widely applied in many areas of daily life. For example, in the field of clinical medicine, detection of a fundus image can be achieved based on image detection technology: a computer device may segment the optic cup and optic disc in the fundus image, calculate the cup-to-disc ratio based on the segmentation results, and further determine whether the fundus image is a glaucoma fundus image.
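For illustration, the following is a minimal Python sketch of how such a cup-to-disc ratio could be computed from binary optic cup and optic disc segmentation masks; the function name, the mask inputs and the vertical-extent measure are assumptions for illustration and are not part of this disclosure.

```python
import numpy as np

def vertical_cup_to_disc_ratio(cup_mask, disc_mask):
    """Compute a vertical cup-to-disc ratio from binary segmentation masks
    (1 inside the cup / disc, 0 elsewhere)."""
    def vertical_extent(mask):
        rows = np.where(mask.any(axis=1))[0]   # rows that contain the structure
        return 0 if rows.size == 0 else int(rows[-1] - rows[0] + 1)

    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("empty optic disc mask")
    return vertical_extent(cup_mask) / disc_height
```

A screening rule would then compare this ratio against a threshold for each eye separately; as discussed next, such a single-eye rule can miss some cases.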
In the above image detection process, the cup-to-disc ratio is generally calculated from the fundus image of a single eye. However, in some glaucoma cases the cup-to-disc ratio of a single eye still falls within the normal range; performing glaucoma detection from only the single-eye fundus image in such cases may lead to inaccurate detection results.
Disclosure of Invention
The embodiments of the application provide a fundus image detection apparatus, method, device, and storage medium, which can improve the accuracy of fundus image detection results. The technical scheme is as follows:
in one aspect, there is provided a fundus image detecting apparatus for:
in response to an image detection instruction, acquiring a group of fundus images to be detected, wherein the group of fundus images comprises a first image corresponding to a left eye and a second image corresponding to a right eye;
inputting the first image and the second image into an image classification model;
extracting features of the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector;
based on the first feature vector and the second feature vector, a third feature vector is obtained, wherein the third feature vector is used for indicating the difference between the first feature vector and the second feature vector;
and classifying the group of fundus images based on the first feature vector, the second feature vector and the third feature vector, and outputting a label corresponding to the group of fundus images.
In one aspect, there is provided a fundus image detection method, the method comprising:
in response to an image detection instruction, acquiring a group of fundus images to be detected, wherein the group of fundus images comprises a first image corresponding to a left eye and a second image corresponding to a right eye;
inputting the first image and the second image into an image classification model;
extracting features of the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector;
based on the first feature vector and the second feature vector, a third feature vector is obtained, wherein the third feature vector is used for indicating the difference between the first feature vector and the second feature vector;
and classifying the group of fundus images based on the first feature vector, the second feature vector and the third feature vector, and outputting a label corresponding to the group of fundus images.
In one possible implementation manner, the obtaining a third feature vector based on the first feature vector and the second feature vector includes:
obtaining a difference vector of the first feature vector and the second feature vector;
and taking the absolute value of each value in the difference vector to obtain the third feature vector.
In one possible implementation manner, the image classifying the set of fundus images based on the first feature vector, the second feature vector and the third feature vector, and outputting a label corresponding to the set of fundus images includes:
splicing the first feature vector, the second feature vector and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images;
inputting the fourth feature vector into a first full connection layer, and mapping the fourth feature vector into a two-dimensional vector by the first full connection layer;
and taking the label indicated by the two-dimensional vector as the label corresponding to the group of fundus images.
In one possible implementation manner, after the feature extraction is performed on the first image and the second image by the image classification model to obtain a first feature vector and a second feature vector, the method further includes:
inputting the first feature vector into a second full-connection layer, and determining a left eye label corresponding to the first image based on an output result of the second full-connection layer;
and inputting the second feature vector into a third full-connection layer, and determining a right eye label corresponding to the second image based on an output result of the third full-connection layer.
In one aspect, there is provided a fundus image detecting apparatus, the apparatus comprising:
the image acquisition module is used for responding to the image detection instruction and acquiring a group of fundus images to be detected, wherein the group of fundus images comprise a first image corresponding to the left eye and a second image corresponding to the right eye;
an input module for inputting the first image and the second image into an image classification model;
the first vector acquisition module is used for extracting the characteristics of the first image and the second image through the image classification model to obtain a first characteristic vector and a second characteristic vector;
a second vector obtaining module, configured to obtain a third feature vector based on the first feature vector and the second feature vector, where the third feature vector is used to indicate a difference between the first feature vector and the second feature vector;
and the image classification module is used for carrying out image classification on the group of fundus images based on the first feature vector, the second feature vector and the third feature vector and outputting labels corresponding to the group of fundus images.
In one possible implementation, the image acquisition module is configured to:
acquiring a first fundus image of the left eye and a second fundus image of the right eye;
the following steps are performed on any fundus image of the first fundus image and the second fundus image:
inputting the fundus image into an image segmentation model, and calculating, by the image segmentation model, the probability that each pixel point in the fundus image belongs to the optic disc to obtain a probability matrix, wherein the larger the value of an element in the probability matrix, the larger the probability that the position of the element belongs to the optic disc;
based on the numerical values of the elements in the probability matrix, an image of a target area is acquired from the fundus image, and the center of the image of the target area coincides with the center of the optic disc.
In one possible implementation, the image acquisition module is configured to:
based on a probability threshold, performing binarization processing on the probability matrix to obtain a binarization matrix corresponding to the fundus image;
determining the center of the optic disc and the diameter of the optic disc based on the binarization matrix;
the target area is determined based on the disc center and the disc diameter, the center of the target area coinciding with the disc center.
In one possible implementation, the first vector acquisition module is configured to:
and respectively extracting features of the first image and the second image through a first feature extractor and a second feature extractor in the image classification model to obtain a first feature matrix corresponding to the first image and a second feature matrix corresponding to the second image, and respectively carrying out global pooling processing on the first feature matrix and the second feature matrix to obtain the first feature vector and the second feature vector.
In one possible implementation, the first feature extractor and the second feature extractor have the same parameters.
In one possible implementation, the second vector acquisition module is configured to:
obtaining a difference vector of the first feature vector and the second feature vector;
and taking the absolute value of each value in the difference vector to obtain the third feature vector.
In one possible implementation, the image classification module is configured to:
splicing the first feature vector, the second feature vector and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images;
inputting the fourth feature vector into a first full connection layer, and mapping the fourth feature vector into a two-dimensional vector by the first full connection layer;
and taking the label indicated by the two-dimensional vector as the label corresponding to the group of fundus images.
In one possible implementation, the method further includes:
the left eye label determining module is used for inputting the first feature vector into a second full-connection layer and determining a left eye label corresponding to the first image based on an output result of the second full-connection layer;
the right eye label determining module is configured to input the second feature vector into a third full-connection layer, and determine a right eye label corresponding to the second image based on an output result of the third full-connection layer.
In one aspect, a computer-readable storage medium is provided, in which at least one program code is stored, the at least one program code being loaded and executed by a processor to perform the operations performed by the fundus image detection method.
According to the technical scheme provided by the embodiments of the application, a fundus image detection device acquires, in response to an image detection instruction, a group of fundus images to be detected, the group of fundus images comprising a first image corresponding to the left eye and a second image corresponding to the right eye; the first image and the second image are input into an image classification model; feature extraction is performed on the first image and the second image by the image classification model to obtain a first feature vector and a second feature vector; a third feature vector is obtained based on the first feature vector and the second feature vector, the third feature vector being used to indicate the difference between the first feature vector and the second feature vector; image classification is performed on the group of fundus images based on the first feature vector, the second feature vector and the third feature vector; and a label corresponding to the group of fundus images is output. Image detection can thus be performed based on the image features of the left-eye image and the right-eye image and the difference between them, the label corresponding to the images is obtained, and the accuracy of the fundus image detection result is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an image detection system according to an embodiment of the present application;
fig. 2 is a flowchart of a fundus image detection method provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a fundus image detection process provided in an embodiment of the present application;
FIG. 4 is a flowchart of an image classification model training method according to an embodiment of the present application;
fig. 5 is a schematic structural view of a fundus image detecting apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Computer Vision (CV) is the science of studying how to make machines "see"; more specifically, it refers to using cameras and computer devices, instead of human eyes, to perform machine vision tasks such as recognition and measurement on a target, and further performing graphics processing so that the computer device processes the image into one that is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and the like, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
The scheme provided by the embodiments of the application mainly relates to the image processing and image recognition techniques in computer vision: the fundus images are detected through image processing and image recognition, so as to determine whether the detected fundus images are glaucoma fundus images.
Fig. 1 is a schematic diagram of an image detection system provided in an embodiment of the present application, referring to fig. 1, the image detection system 100 includes: a terminal 110 and an image detection platform 140.
The terminal 110 is connected to the image detection platform 140 through a wireless network or a wired network. The terminal 110 may be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop portable computer. The terminal 110 installs and runs an application program supporting image detection. The application may be a detection-type application or the like. The terminal 110 is an example of a terminal used by a user, and a user account is logged into the application running in the terminal 110.
The terminal 110 is connected to the image detection platform 140 through a wireless network or a wired network.
The image detection platform 140 includes at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. The image detection platform 140 is used to provide background services for applications that support image detection. Optionally, the image detection platform 140 performs primary detection and the terminal 110 performs secondary detection; alternatively, the image detection platform 140 performs secondary detection and the terminal 110 performs primary detection; alternatively, the image detection platform 140 or the terminal 110 may each undertake the detection work alone.
Optionally, the image detection platform 140 includes: an access server, an image detection server and a database. The access server is used to provide access services for the terminal 110. The image detection server is used for providing background services related to image detection. The image detection server may be one or more. When the image detection servers are multiple, there are at least two image detection servers for providing different services, and/or there are at least two image detection servers for providing the same service, such as providing the same service in a load balancing manner, which is not limited in the embodiments of the present application. The image detection server may be provided with an image classification model.
Terminal 110 may refer broadly to one of a plurality of terminals, with the present embodiment being illustrated only by terminal 110.
Those skilled in the art will recognize that the number of terminals may be greater or lesser. For example, the number of the terminals may be only one, or the number of the terminals may be tens or hundreds, or more, and the image detection system may further include other terminals. The number of terminals and the device type are not limited in the embodiment of the present application.
Fig. 2 is a flowchart of a fundus image detection method provided in an embodiment of the present application. The method may be applied to the above terminal or server, and both the terminal and the server may be regarded as a computer device. In this embodiment of the present application, the fundus image detection device may be either of the above terminal or server; therefore, the embodiment is described with the computer device as the execution body. Referring to fig. 2, this embodiment may specifically include the following steps:
201. The computer device acquires a first fundus image of a left eye and a second fundus image of a right eye.
In one possible implementation, the computer device may acquire the first fundus image and the second fundus image to be detected when receiving the image detection instruction. The first fundus image and the second fundus image may be images stored in a computer device, images captured by the computer device in a video, or images acquired in real time by the computer device with an image acquisition function, and the embodiment of the application is not limited as to which image is specifically adopted.
202. The computer device acquires probability matrices corresponding to the first fundus image and the second fundus image, respectively.
In one possible implementation, the computer device may obtain a probability matrix corresponding to each fundus image based on an image segmentation model. The computer device may perform the following steps on either of the first fundus image and the second fundus image: inputting the fundus image into the image segmentation model, and calculating, by the image segmentation model, the probability that each pixel point in the fundus image belongs to the optic disc to obtain a probability matrix, wherein the larger the value of an element in the probability matrix, the larger the probability that the position of the element belongs to the optic disc. The image segmentation model may be a model trained on multiple groups of sample images; one group of sample images may include a left-eye fundus image and a right-eye fundus image, and labeling information may be added to each fundus image to indicate the region where the optic disc is located and the contour of that region. The computer device may train the image segmentation model based on the multiple groups of sample images, adjusting the parameters in the image segmentation model so that it can identify the region where the optic disc is located in a fundus image.
Taking the probability matrix corresponding to the first fundus image as an example for explanation, in one possible implementation manner, the process specifically may include the following steps:
Step one, the computer device inputs the first fundus image into the image segmentation model.
After the first fundus image and the second fundus image are input into the image segmentation model by the computer device, the image segmentation model can preprocess the first fundus image, and convert the first fundus image into a digital matrix composed of a plurality of pixel points so that the computer device can carry out subsequent operation processes.
Step two, the computer device performs feature extraction on the first fundus image through the image segmentation model to obtain a feature map corresponding to the first fundus image.
In one possible implementation manner, the image segmentation model may include a plurality of convolution layers, the computer device may sequentially perform convolution operations on the digital matrix corresponding to the first fundus image and each convolution layer to extract image features, and the computer device may generate an intermediate feature map based on the operation result output by each convolution layer, and use the intermediate feature map obtained based on the last convolution layer as the feature map of the first fundus image. The specific number of convolution layers in the image segmentation model may be set by a developer, which is not limited in the embodiment of the present application.
Specifically, taking one of the convolution layers as an example to describe the above convolution operation, one convolution layer may include one or more convolution kernels, and each convolution kernel corresponds to a scanning window whose size is the same as that of the convolution kernel. During the convolution operation, the scanning window may slide over the intermediate feature map according to a target step size, sequentially scanning each area of the intermediate feature map, where the target step size may be set by a developer. Taking one convolution kernel as an example, when its scanning window slides to any area of the intermediate feature map, the computer device reads the value corresponding to each feature point in that area, performs a dot-multiplication of the convolution kernel with those values, accumulates the products, and takes the accumulated result as one feature point. The scanning window of the convolution kernel then slides to the next area of the intermediate feature map according to the target step size and the convolution operation is performed again, outputting another feature point, until all areas of the feature map have been scanned; all the output feature points form a new intermediate feature map, which serves as the input of the next convolution layer.
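As an illustrative sketch only, the sliding-window convolution described above can be written as follows (single channel, no padding; all names are assumptions):

```python
import numpy as np

def conv2d_single_channel(feature_map, kernel, stride=1):
    """Slide the kernel's scanning window over the feature map with the given target
    step size; at each position, dot-multiply the window contents with the kernel and
    accumulate the products into one output feature point."""
    kh, kw = kernel.shape
    h, w = feature_map.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)
    return out   # the new intermediate feature map (single channel, no padding)
```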
Step three, the computer device acquires a probability matrix corresponding to the first fundus image based on the feature map.
In one possible implementation, the computer device may upsample or deconvolve the feature map to obtain a target matrix of the same size as the first fundus image. The image segmentation model may include a plurality of transposed convolution layers for upsampling; the computer device may sequentially perform a transposed convolution operation on the feature map with each transposed convolution layer to expand its size, and may obtain the target matrix based on the output result of the last transposed convolution layer. The specific number of transposed convolution layers in the image segmentation model may be set by a developer, which is not limited in the embodiment of the present application.
It should be noted that the foregoing description of the upsampling process is merely an exemplary illustration of an upsampling method, and the embodiments of the present application do not limit what upsampling method is specifically used.
The process of obtaining the probability matrix corresponding to the second fundus image is the same as the process of obtaining the probability matrix corresponding to the first fundus image, and will not be described herein.
In this embodiment of the application, after the computer device obtains the target matrix, each pixel point in the first fundus image may be classified based on the target matrix. In one possible implementation, the image segmentation model may include a sigmoid function, and the computer device may convert each element in the target matrix into a value in (0, 1) based on the sigmoid function, that is, convert the target matrix into the probability matrix, where each value in the probability matrix may be used to indicate the probability that the corresponding pixel point belongs to the optic disc.
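A minimal PyTorch-style sketch of such an upsampling-plus-sigmoid segmentation head is given below; the channel counts, the use of two transposed convolution layers, and the 4x downsampling assumption are illustrative only.

```python
import torch
import torch.nn as nn

class DiscSegmentationHead(nn.Module):
    """Upsample the feature map back to the input resolution with transposed
    convolution layers, then map each element to a (0, 1) optic-disc probability."""
    def __init__(self, in_channels: int = 64):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_channels, 32, kernel_size=2, stride=2),  # 2x upsampling
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, kernel_size=2, stride=2),            # 2x upsampling
        )

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        target_matrix = self.up(feature_map)    # same spatial size as the fundus image
        return torch.sigmoid(target_matrix)     # probability matrix with values in (0, 1)

# e.g. a 128x128 feature map (input image downsampled 4x) is restored to 512x512:
# probs = DiscSegmentationHead()(torch.randn(1, 64, 128, 128))  # -> (1, 1, 512, 512)
```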
203. The computer device performs optic disc segmentation on the first fundus image and the second fundus image respectively to obtain a first image and a second image.
Wherein the first image and the second image include disc information for a left eye and disc information for a right eye, respectively.
In the embodiment of the application, the computer device may acquire an image of a target area from the fundus image based on the numerical values of the elements in the probability matrix, where the center of the image of the target area coincides with the center of the optic disc. In one possible implementation manner, first, the computer device may perform binarization processing on the probability matrix based on a probability threshold to obtain a binarization matrix corresponding to the fundus image, for example, the computer device may compare a value of each element in the probability matrix with the probability threshold, assign an element with a value greater than the probability threshold to a first value, and assign an element with a value less than the probability threshold to a second value, where the probability threshold, the first value, and the second value may all be set by a developer. Then, the computer device may determine the center of the optic disc and the diameter of the optic disc based on the binarization matrix, for example, the computer device may perform connected domain analysis based on the binarization matrix, obtain a region where the element assigned as the first value is located, obtain at least one candidate region, and use the candidate region with the largest area as the region where the optic disc is located, and the computer device may determine the center of the region where the optic disc is located as the center of the optic disc, and determine the diameter of the optic disc based on the size of the region where the optic disc is located. Finally, the computer device may determine the target area based on the disc center and the disc diameter, the center of the target area coinciding with the disc center, for example, the computer device may determine square areas with the disc center as the center and N disc diameter lengths as side lengths as the target area, where N is greater than 0, a specific value thereof may be set by a developer, and the computer device may acquire an image of the target area from a fundus image, specifically, an image of the target area acquired from a first fundus image as a first image and an image of the target area acquired from a second fundus image as a second image.
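The following sketch illustrates this binarization, connected-domain analysis, and square cropping using SciPy; the probability threshold of 0.5, N = 2, and all helper names are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def crop_disc_region(fundus_image, prob_matrix, prob_threshold=0.5, n=2.0):
    """Binarize the probability matrix, keep the largest connected region as the
    optic disc, and crop a square of side n * disc_diameter centred on the disc."""
    binary = (prob_matrix > prob_threshold).astype(np.uint8)       # first value 1, second value 0
    labels, num = ndimage.label(binary)                            # candidate regions
    if num == 0:
        raise ValueError("no optic disc region found")
    sizes = ndimage.sum(binary, labels, index=range(1, num + 1))   # area of each candidate
    disc_label = int(np.argmax(sizes)) + 1                         # largest-area candidate
    ys, xs = np.where(labels == disc_label)
    cy, cx = int(ys.mean()), int(xs.mean())                        # optic disc center
    diameter = max(ys.max() - ys.min(), xs.max() - xs.min()) + 1   # optic disc diameter estimate
    half = int(n * diameter / 2)
    y0, y1 = max(cy - half, 0), min(cy + half, fundus_image.shape[0])
    x0, x1 = max(cx - half, 0), min(cx + half, fundus_image.shape[1])
    return fundus_image[y0:y1, x0:x1]                              # image of the target area
```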
It should be noted that the above description of acquiring the image of the target area based on the probability matrix is merely an exemplary illustration, and the embodiment of the present application does not limit what method is specifically adopted to acquire the image of the target area.
It should be noted that steps 201, 202 and 203 are the steps of acquiring a group of fundus images to be detected in response to an image detection instruction, where the group of fundus images includes a first image corresponding to the left eye and a second image corresponding to the right eye. The above description of acquiring an image of a target area from a fundus image based on an image segmentation model is merely exemplary, and the embodiment of the present application does not limit which image segmentation model is applied or how the image of the target area is acquired. In the embodiment of the application, the subsequent detection steps are performed on a partial image containing the optic disc information acquired from the fundus image; the computer device does not need to process other areas of the image, which avoids the interference of a large amount of irrelevant data and improves detection efficiency and accuracy. Of course, the first image and the second image may also be images of the left-eye and right-eye optic disc regions acquired by a fundus camera or the like, which is not limited in the embodiment of the present application.
204. The computer device inputs the first image and the second image into an image classification model.
In an embodiment of the present application, the image classification model may include a first feature extractor and a second feature extractor for extracting image features of the first image and the second image, respectively. The first feature extractor and the second feature extractor have the same parameters, that is, the weight parameters of the two feature extractors are shared, so that the computer device extracts features of the same dimension from the first image and the second image.
In the embodiment of the present application, each feature extractor may be constructed based on a deep neural network; for example, each feature extractor may be a VGG (Visual Geometry Group Network) model, a ResNet (Residual Neural Network) model, or the like. The structure of each feature extractor is not limited in the embodiment of the present application.
After the computer device inputs the first image and the second image into the first feature extractor and the second feature extractor, respectively, each feature extractor may pre-process the input image, and convert the input image into a digital matrix composed of a plurality of pixel values, so that the computer device performs a subsequent feature extraction step.
205. And the computer equipment performs feature extraction on the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector.
The computer equipment can respectively extract the characteristics of the first image and the second image through a first characteristic extractor and a second characteristic extractor in the image classification model to obtain a first characteristic matrix corresponding to the first image and a second characteristic matrix corresponding to the second image, and respectively perform global pooling processing on the first characteristic matrix and the second characteristic matrix to obtain the first characteristic vector and the second characteristic vector.
Taking the first feature vector corresponding to the first image as an example, in one possible implementation, the first feature extractor may include a plurality of convolution layers; the computer device may sequentially perform convolution operations on the first image with each of the convolution layers, and generate a first feature matrix corresponding to the first image based on the output result of the last convolution layer in the first feature extractor. The convolution process is the same as the convolution process in step 202 and is not described here. In one possible implementation, the image classification model may further include a global pooling layer, where the size of the scanning window of the global pooling layer is the same as the size of the first feature matrix; for example, if the size of the first feature matrix is A×B×C, the size of the scanning window may be A×B, where A, B and C are positive integers whose specific values are not limited in this embodiment. The computer device may perform global pooling processing on the first feature matrix through the global pooling layer, for example global average pooling: the computer device may take the average value of the elements in each scanning window as one element of the first feature vector, obtaining a first feature vector of size 1×1×C. Of course, the computer device may also obtain the first feature vector through other pooling methods, which is not limited in the embodiment of the present application. The process of obtaining the second feature vector is the same as the process of obtaining the first feature vector and is not described here.
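A minimal PyTorch sketch of this shared-weight feature extraction followed by global average pooling is shown below; the ResNet-18 backbone, input size, and channel count are assumptions, since the embodiment leaves the extractor architecture open.

```python
import torch
import torch.nn as nn
from torchvision import models

class SharedFeatureExtractor(nn.Module):
    """A single backbone applied to both eyes, so the first and second feature
    extractors share the same weight parameters."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # convolution layers only
        self.pool = nn.AdaptiveAvgPool2d(1)                             # global average pooling

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feature_matrix = self.features(image)        # A x B x C feature matrix per image
        return self.pool(feature_matrix).flatten(1)  # 1 x 1 x C, flattened to a C-dim feature vector

extractor = SharedFeatureExtractor()
first_vec = extractor(torch.randn(1, 3, 224, 224))   # first image  -> first feature vector
second_vec = extractor(torch.randn(1, 3, 224, 224))  # second image -> second feature vector
```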
It should be noted that the above description of acquiring the first feature vector is merely an exemplary illustration, and the embodiment of the present application does not limit what image feature extraction method is specifically adopted to acquire the first feature vector.
206. The computer device obtains a third feature vector based on the first feature vector and the second feature vector.
In one possible implementation, the computer device may obtain a difference vector between the first feature vector and the second feature vector, and take an absolute value of each value in the difference vector to obtain the third feature vector, where the third feature vector may be used to indicate a difference between the first feature vector and the second feature vector, that is, may be used to indicate a difference between the first image and the second image.
It should be noted that the above manner of representing the binocular difference based on the third feature vector is merely an example of one way to express the difference between the two eyes; the computer device may also determine the difference between the binocular images in other manners, which is not limited in the embodiments of the present application.
207. The computer device performs image classification on the group of fundus images based on the first feature vector, the second feature vector and the third feature vector, and outputs a label corresponding to the group of fundus images.
In one possible implementation, the computer device may splice the first feature vector, the second feature vector and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images, and input the fourth feature vector into the first fully connected layer, which maps the fourth feature vector into a two-dimensional vector. For example, the computer device may perform a convolution operation on the fourth feature vector based on at least one weight parameter in the fully connected layer to convert it into a two-dimensional vector, and the computer device may use the label indicated by the two-dimensional vector as the label corresponding to the group of fundus images. In the embodiment of the application, one label may correspond to one two-dimensional vector; the label may indicate either a glaucoma fundus image or a non-glaucoma fundus image, and the correspondence between labels and two-dimensional vectors may be set by a developer.
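The following sketch illustrates steps 206 and 207 together: the absolute-difference third feature vector, the splicing of the three vectors, and the first fully connected layer mapping to a two-dimensional vector. The 512-dimensional feature size and the label mapping are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BinocularClassifierHead(nn.Module):
    """Fuse the left-eye, right-eye and difference feature vectors and map the
    spliced result to a two-dimensional vector that indicates the label."""
    def __init__(self, feature_dim: int = 512):
        super().__init__()
        self.fc1 = nn.Linear(3 * feature_dim, 2)   # first fully connected layer

    def forward(self, first_vec: torch.Tensor, second_vec: torch.Tensor) -> torch.Tensor:
        third_vec = torch.abs(first_vec - second_vec)                       # step 206
        fourth_vec = torch.cat([first_vec, second_vec, third_vec], dim=1)   # splice (step 207)
        return self.fc1(fourth_vec)                                         # two-dimensional vector

# Illustrative usage with dummy feature vectors.
first_vec, second_vec = torch.randn(1, 512), torch.randn(1, 512)
logits = BinocularClassifierHead()(first_vec, second_vec)
label = "glaucoma fundus image" if logits.argmax(dim=1).item() == 1 else "non-glaucoma fundus image"
```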
In the embodiment of the application, the feature vectors corresponding to the left eye and the right eye and the feature vector indicating the difference between the left eye and the right eye are spliced, and the prediction is made based on the spliced vector; in the fundus image detection process, the features of both eyes and the feature difference between the two eyes are thus fused, which improves the accuracy of fundus image detection. Of course, the above description of fusing binocular features through feature vector splicing is merely an example of one binocular feature fusion manner, and the embodiment of the present application does not limit which binocular feature fusion manner is specifically adopted.
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.
According to the technical scheme provided by the embodiments of the application, a group of fundus images to be detected is acquired in response to an image detection instruction, the group of fundus images comprising a first image corresponding to the left eye and a second image corresponding to the right eye; the first image and the second image are input into an image classification model; feature extraction is performed on the first image and the second image by the image classification model to obtain a first feature vector and a second feature vector; a third feature vector is obtained based on the first feature vector and the second feature vector, the third feature vector being used to indicate the difference between the first feature vector and the second feature vector; image classification is performed on the group of fundus images based on the first feature vector, the second feature vector and the third feature vector; and a label corresponding to the group of fundus images is output. In the fundus image detection process, image detection is thus performed based on the image features and the image differences of the left-eye image and the right-eye image, the labels corresponding to the images are obtained, and the accuracy of the fundus image detection result is improved.
In glaucoma detection, by applying the above fundus image detection method, medical staff can input a patient's binocular fundus images into the image segmentation model and the image classification model; the fundus images are segmented by the image segmentation model, and the segmented images are identified and classified to determine whether the patient suffers from glaucoma. In the embodiment of the application, an important clinical characteristic of glaucoma, namely the difference between the cup-to-disc ratios of the two eyes, is fused into the deep learning model, so that the image classification model can make predictions based on the images of both eyes. This avoids missing potential glaucoma patients whose single-eye cup-to-disc ratio falls within the normal range but whose cup-to-disc ratios differ greatly between the two eyes, and improves the sensitivity and accuracy of the deep learning model at the case level of glaucoma detection. The fundus image detection method can be applied in various medical institutions to assist medical staff in diagnosis, improving diagnosis efficiency and accuracy.
In this embodiment of the present application, after the computer device obtains the first feature vector and the second feature vector, the computer device may further perform label prediction on the first image and the second image based on the two feature vectors; that is, in addition to the binocular image prediction task, there are also a left eye image prediction task and a right eye image prediction task. In one possible implementation, the computer device may input the first feature vector into a second full-connection layer, determine a left-eye label corresponding to the first image based on the output result of the second full-connection layer, input the second feature vector into a third full-connection layer, and determine a right-eye label corresponding to the second image based on the output result of the third full-connection layer.
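A small sketch of these auxiliary single-eye prediction heads (the second and third fully connected layers) is given below, again assuming 512-dimensional feature vectors.

```python
import torch
import torch.nn as nn

class MonocularHeads(nn.Module):
    """Second and third fully connected layers predicting a left-eye label and a
    right-eye label from the single-eye feature vectors."""
    def __init__(self, feature_dim: int = 512):
        super().__init__()
        self.fc_left = nn.Linear(feature_dim, 2)    # second fully connected layer
        self.fc_right = nn.Linear(feature_dim, 2)   # third fully connected layer

    def forward(self, first_vec: torch.Tensor, second_vec: torch.Tensor):
        return self.fc_left(first_vec), self.fc_right(second_vec)

left_logits, right_logits = MonocularHeads()(torch.randn(1, 512), torch.randn(1, 512))
left_label, right_label = left_logits.argmax(dim=1), right_logits.argmax(dim=1)
```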
Referring to fig. 3, fig. 3 is a schematic diagram of a fundus image detection process provided in the embodiment of the present application. The computer device may input a first fundus image 301 of the left eye and a second fundus image 302 of the right eye into the image segmentation model, which identifies the region of interest in each fundus image, obtaining a first image 303 containing the left-eye optic disc information and a second image 304 containing the right-eye optic disc information. The computer device may input the first image 303 and the second image 304 into the image classification model 305, obtain the first feature vector corresponding to the left eye and the second feature vector corresponding to the right eye based on the first feature extractor, the second feature extractor and other operation layers in the image classification model 305, take the absolute value of the difference between the feature vectors corresponding to the left eye and the right eye to obtain the third feature vector, splice the first feature vector, the second feature vector and the third feature vector to obtain the fourth feature vector, and predict, through the first fully connected layer, a label indicating whether the group of fundus images is glaucoma based on the fourth feature vector. In addition, the computer device may input the features extracted for the left eye and the right eye into separate fully connected layers to predict whether each single eye is glaucoma, that is, input the first feature vector and the second feature vector into the second fully connected layer and the third fully connected layer respectively to obtain a left-eye label and a right-eye label.
In the embodiment of the application, separately predicting the labels corresponding to the left eye and the right eye provides auxiliary judgment, which can improve the accuracy and comprehensiveness of the detection result and locate which specific fundus image is the glaucoma fundus image. Of course, when fundus image detection is performed, the left eye image prediction task, the right eye image prediction task and the binocular image prediction task may be combined, and the computer device may execute at least one of these tasks to implement fundus image detection.
The above embodiments mainly describe a process in which the computer device performs fundus image detection based on an image classification model, which is required to be trained before fundus image detection is performed. Fig. 4 is a flowchart of an image classification model training method provided in an embodiment of the present application, and referring to fig. 4, the method may specifically include the following steps:
401. the computer device initializes the various parameters in the image classification model.
The computer device randomly assigns values to the parameters in each convolution layer, global pooling layer and fully connected layer of the image classification model to achieve parameter initialization. In one possible implementation, the computer device may initialize the parameters of the image classification model using a Gaussian distribution with a variance of 0.01 and a mean of 0; the specific method for initializing the model parameters is not limited in the embodiment of the present application.
402. The computer device inputs the training data set into the image classification model.
The training data set may include multiple groups of optic disc images, and one group of optic disc images may include a left-eye optic disc image and a right-eye optic disc image from the same patient.
In one possible implementation, the left eye optic disc image and the right eye optic disc image may be acquired based on an image segmentation model, the computer device may input the left eye fundus image and the right eye fundus image from the same patient into the image segmentation model, identify a region of interest, i.e., a region in which the optic disc is located, and the computer device may transform the region of interest, e.g., crop, translate, rotate, etc., to obtain multiple sets of optic disc images based on a set of fundus images, improving the diversity of data.
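For illustration, such crop/translate/rotate transformations could be expressed with torchvision transforms as below; the specific transforms and parameter values are assumptions, as the embodiment does not fix them.

```python
import torchvision.transforms as T

# Illustrative augmentation for the cropped optic-disc regions; the embodiment only names
# cropping, translation and rotation, so the ranges chosen here are assumptions.
augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.9, 1.0)),           # random crop
    T.RandomAffine(degrees=15, translate=(0.05, 0.05)),   # rotation and translation
    T.ToTensor(),
])
```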
Of course, labeling information may be added to each optic disc image, for example indicating whether the optic disc image is a glaucoma image; the specific content of the labeling information is not limited in the embodiment of the present application.
403. The computer device obtains the output result of the image classification model and calculates the error between the output result and the correct detection result.
In embodiments of the present application, the computer device may calculate the error between the output result and the correct detection result via one or more loss functions.
In the embodiment of the present application, the binocular image prediction task, the left-eye image prediction task, and the right-eye image prediction task may correspond to different weights, where the weight value corresponding to the binocular image prediction task is the largest to highlight the importance of the binocular image prediction task, and each weight value may be set by a developer, for example, the weight of the binocular image prediction task may be set to 0.5, the weight of the left-eye image prediction task may be set to 0.25, and the weight of the right-eye image prediction task may be set to 0.25.
In one possible implementation, the computer device may calculate the error by the following loss function, equation (1), in which the binary cross entropy term is given by equation (2):

$$L = \sum_{i} u_i \, L_{ce}\left(y_i, \hat{y}_i\right) + \lambda R(\theta) \tag{1}$$

$$L_{ce}\left(y_i, \hat{y}_i\right) = -\left[y_i \log \hat{y}_i + \left(1 - y_i\right) \log\left(1 - \hat{y}_i\right)\right] \tag{2}$$

Wherein, $L_{ce}(y_i, \hat{y}_i)$ can represent a binary cross entropy loss, as shown in the above formula (2), and $\lambda R(\theta)$ can represent the network parameter regularization loss. $i$ can represent the number of each detection task; in the embodiment of the application, numbers can be respectively assigned to the binocular image prediction task, the left eye image prediction task and the right eye image prediction task. $y_i$ can represent the correct detection result of task $i$, $\hat{y}_i$ can represent the output result of task $i$, and $u_i$ can represent the weight corresponding to each task. $\theta$ can represent the parameters in the image classification model, and $\lambda$ can represent the regularization coefficient of the image classification model, the specific value of which can be set by a developer.
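A minimal PyTorch-style sketch of this weighted multi-task loss, in the form reconstructed in equations (1) and (2) above, is given below; treating R(θ) as an L2 penalty, the 0.5/0.25/0.25 task weights from the example, the value of λ, and the assumption that the task outputs are already sigmoid probabilities are all illustrative.

```python
import torch
import torch.nn.functional as F

def multi_task_loss(outputs, targets, model, weights=(0.5, 0.25, 0.25), lam=1e-4):
    """Weighted sum of per-task binary cross entropy losses (equation (2)) plus an
    L2 regularization term over the model parameters, as in equation (1).

    outputs/targets: sequences of tensors for the binocular, left-eye and right-eye
    tasks, with outputs assumed to be sigmoid probabilities in (0, 1).
    """
    loss = torch.zeros(())
    for u_i, y_hat_i, y_i in zip(weights, outputs, targets):
        loss = loss + u_i * F.binary_cross_entropy(y_hat_i, y_i)
    l2 = sum(p.pow(2).sum() for p in model.parameters())   # assumed form of R(theta)
    return loss + lam * l2
```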
404. And the computer equipment adjusts each parameter in the image classification model based on the error between the output result and the correct detection result of the image classification model until the target condition is met, so as to obtain the trained image classification model.
In one possible implementation, the computer device may compare the obtained errors with corresponding error thresholds; when any error value is greater than its error threshold, the computer device back-propagates the errors to the image classification model and updates each parameter in the image classification model based on the errors, where the parameters include the parameters corresponding to the convolution kernels, the parameters of the global pooling layer, the parameters of each fully connected layer, and so on. For example, the computer device may back-propagate the error of the binocular image prediction task to the first fully connected layer and both feature extractors, back-propagate the error of the left eye image prediction task to the second fully connected layer and the first feature extractor, and back-propagate the error of the right eye image prediction task to the third fully connected layer and the second feature extractor. The error thresholds can be set by a developer, and the number of error thresholds is the same as the number of acquired errors.
In the embodiment of the application, the target condition may be set by a developer. In one possible implementation, the target condition may be that the number of correct output results reaches a target number, where the target number may be set by the developer. When all the errors are smaller than their error thresholds, the output result obtained by the computer device is considered correct, and the computer device continues to read the next group of optic disc images and performs step 403; if the number of correct output results obtained by the computer device reaches the target number, that is, the target condition is met, the image classification model is considered trained.
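For illustration only, the training loop of steps 403 and 404 might be sketched as follows, simplified to a single aggregate error per group of images; the optimizer, thresholds, and stopping bookkeeping are assumptions.

```python
import torch

def train(model, data_loader, loss_fn, error_threshold=0.1, target_count=1000, lr=1e-3):
    """Back-propagate while the error exceeds the threshold; stop once the number of
    correctly detected image groups reaches the target number (the target condition)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    correct = 0
    for left_img, right_img, labels in data_loader:
        outputs = model(left_img, right_img)          # binocular / left-eye / right-eye outputs
        loss = loss_fn(outputs, labels, model)
        if loss.item() > error_threshold:             # error above threshold: back-propagate
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        else:
            correct += 1                              # output result considered correct
            if correct >= target_count:               # target condition met
                return model
    return model
```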
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.
Fig. 5 is a schematic structural diagram of a fundus image detecting apparatus according to an embodiment of the present application, and referring to fig. 5, the apparatus includes:
an image acquisition module 501, configured to acquire a set of fundus images to be detected in response to an image detection instruction, where the set of fundus images includes a first image corresponding to a left eye and a second image corresponding to a right eye;
An input module 502 for inputting the first image and the second image into an image classification model;
a first vector obtaining module 503, configured to perform feature extraction on the first image and the second image through the image classification model, so as to obtain a first feature vector and a second feature vector;
a second vector obtaining module 504, configured to obtain a third feature vector based on the first feature vector and the second feature vector, where the third feature vector is used to indicate a difference between the first feature vector and the second feature vector;
the image classification module 505 is configured to perform image classification on the set of fundus images based on the first feature vector, the second feature vector and the third feature vector, and output a label corresponding to the set of fundus images.
In one possible implementation, the image acquisition module 501 is configured to:
acquiring a first fundus image of the left eye and a second fundus image of the right eye;
the following steps are performed on any fundus image of the first fundus image and the second fundus image:
inputting the fundus image into an image segmentation model, and calculating, by the image segmentation model, the probability that each pixel point in the fundus image belongs to the optic disc to obtain a probability matrix, wherein the larger the value of an element in the probability matrix, the larger the probability that the position of the element is the optic disc;
Based on the numerical values of the elements in the probability matrix, an image of a target area is acquired from the fundus image, and the center of the image of the target area coincides with the center of the optic disc.
In one possible implementation, the image acquisition module 501 is configured to:
based on a probability threshold, performing binarization processing on the probability matrix to obtain a binarization matrix corresponding to the fundus image;
determining the center of the optic disc and the diameter of the optic disc based on the binarization matrix;
the target area is determined based on the disc center and the disc diameter, the center of the target area coinciding with the disc center.
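A minimal NumPy sketch of how this localization and cropping could be carried out, assuming the crop is a square whose side is a fixed multiple of the optic disc diameter (the multiple, the threshold value, and the function name are illustrative choices, not values fixed by the patent):

```python
import numpy as np

def crop_disc_region(fundus_img: np.ndarray, prob: np.ndarray,
                     threshold: float = 0.5, scale: float = 2.0) -> np.ndarray:
    """Binarize the optic-disc probability matrix, locate the disc, and crop around it.

    Assumes at least one element of `prob` exceeds `threshold`.
    """
    mask = (prob >= threshold).astype(np.uint8)                    # binarization matrix
    ys, xs = np.nonzero(mask)
    cy, cx = int(ys.mean()), int(xs.mean())                        # optic disc center
    diameter = int(max(ys.max() - ys.min(), xs.max() - xs.min()))  # optic disc diameter
    half = int(diameter * scale / 2)                               # half side length of the target area
    y0, y1 = max(cy - half, 0), min(cy + half, fundus_img.shape[0])
    x0, x1 = max(cx - half, 0), min(cx + half, fundus_img.shape[1])
    return fundus_img[y0:y1, x0:x1]                                # image of the target area
```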
In one possible implementation, the first vector acquisition module 503 is configured to:
extracting features of the first image and the second image respectively through a first feature extractor and a second feature extractor in the image classification model to obtain a first feature matrix corresponding to the first image and a second feature matrix corresponding to the second image, and performing global pooling processing on the first feature matrix and the second feature matrix respectively to obtain the first feature vector and the second feature vector.
In one possible implementation, the first feature extractor and the second feature extractor share the same parameters.
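A hedged sketch of this feature extraction step: a single convolutional backbone applied to both eye images is one way to realize two extractors with the same parameters, followed by global average pooling that turns each feature matrix into a feature vector. The tiny two-layer backbone, the input size, and the variable names are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Stand-in convolutional backbone followed by global average pooling."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(                     # placeholder for the real backbone
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)            # global pooling layer

    def forward(self, x):
        feature_matrix = self.conv(x)                  # feature matrix
        return self.pool(feature_matrix).flatten(1)    # feature vector

extractor = FeatureExtractor()                         # reused for both eyes => shared parameters
left_image = torch.randn(1, 3, 224, 224)               # placeholder left-eye crop
right_image = torch.randn(1, 3, 224, 224)              # placeholder right-eye crop
first_vector = extractor(left_image)                   # first feature vector
second_vector = extractor(right_image)                 # second feature vector
```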
In one possible implementation, the second vector acquisition module 504 is configured to:
obtaining a difference vector of the first feature vector and the second feature vector;
and taking the absolute value of each value in the difference vector to obtain the third feature vector.
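A minimal sketch of this step, assuming the two feature vectors are PyTorch tensors of equal length:

```python
import torch

def difference_vector(first_vector: torch.Tensor, second_vector: torch.Tensor) -> torch.Tensor:
    """Element-wise absolute difference of the two eye feature vectors (third feature vector)."""
    return torch.abs(first_vector - second_vector)
```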
In one possible implementation, the image classification module 505 is configured to:
splicing the first feature vector, the second feature vector and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images;
inputting the fourth feature vector into a first full connection layer, and mapping the fourth feature vector into a two-dimensional vector by the first full connection layer;
and taking the label indicated by the two-dimensional vector as the label corresponding to the group of fundus images.
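To make the classification step concrete, the following sketch concatenates the three feature vectors and maps the result through the first fully connected layer to a two-dimensional vector whose larger component indicates the label; the feature length of 2048 and the 0/1 label order are assumptions for illustration.

```python
import torch
import torch.nn as nn

FEAT_DIM = 2048                               # assumed length of each feature vector
fc_first = nn.Linear(FEAT_DIM * 3, 2)         # first fully connected layer

def classify_pair(first_vector, second_vector, third_vector):
    fourth_vector = torch.cat([first_vector, second_vector, third_vector], dim=-1)
    two_dim_vector = fc_first(fourth_vector)  # two-dimensional vector
    return two_dim_vector.argmax(dim=-1)      # index of the indicated label (assumed 0/1 order)

first_vector = torch.randn(1, FEAT_DIM)
second_vector = torch.randn(1, FEAT_DIM)
third_vector = torch.abs(first_vector - second_vector)
label = classify_pair(first_vector, second_vector, third_vector)
```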
In one possible implementation, the apparatus further includes:
the left eye label determining module is used for inputting the first feature vector into a second full-connection layer and determining a left eye label corresponding to the first image based on an output result of the second full-connection layer;
the right eye label determining module is configured to input the second feature vector into a third full-connection layer, and determine a right eye label corresponding to the second image based on an output result of the third full-connection layer.
According to the device provided by the embodiment of the application, a group of fundus images to be detected is acquired in response to an image detection instruction, the group of fundus images including a first image corresponding to the left eye and a second image corresponding to the right eye; the first image and the second image are input into an image classification model; feature extraction is performed on the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector; a third feature vector is obtained based on the first feature vector and the second feature vector, the third feature vector being used to indicate the difference between the first feature vector and the second feature vector; and the group of fundus images is classified based on the first feature vector, the second feature vector and the third feature vector, and the label corresponding to the group of fundus images is output. With this fundus image detection device, image detection is performed based on both the image features and the image difference of the left eye image and the right eye image to obtain the label corresponding to the images, which improves the accuracy of the fundus image detection result.
It should be noted that when the fundus image detection device provided in the above embodiment detects fundus images, the division into the above functional modules is only taken as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the fundus image detection device and the fundus image detection method provided in the foregoing embodiments belong to the same concept, and their specific implementation processes are detailed in the method embodiments, which are not repeated here.
The computer device provided by the above technical solution may be implemented as a terminal or a server. For example, fig. 6 is a schematic structural diagram of a terminal provided in an embodiment of the present application. The terminal 600 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 600 includes: one or more processors 601 and one or more memories 602.
Processor 601 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 601 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 601 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 602 is used to store at least one program code for execution by processor 601 to implement the fundus image detection methods provided by the method embodiments herein.
In some embodiments, the terminal 600 may further optionally include: a peripheral interface 603, and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by a bus or signal line. The individual peripheral devices may be connected to the peripheral device interface 603 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 604, a display 605, a camera assembly 606, audio circuitry 607, and a power supply 609.
Peripheral interface 603 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 601 and the memory 602. In some embodiments, the processor 601, the memory 602, and the peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 604 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 604 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 604 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 includes: antenna systems, RF transceivers, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and so forth. The radio frequency circuit 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity ) networks. In some embodiments, the radio frequency circuitry 604 may also include NFC (Near Field Communication, short range wireless communication) related circuitry, which is not limited in this application.
The display screen 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 605 is a touch display, the display 605 also has the ability to collect touch signals at or above the surface of the display 605. The touch signal may be input as a control signal to the processor 601 for processing. At this point, the display 605 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, the display 605 may be one, providing a front panel of the terminal 600; in other embodiments, the display 605 may be at least two, respectively disposed on different surfaces of the terminal 600 or in a folded design; in some embodiments, the display 605 may be a flexible display, disposed on a curved surface or a folded surface of the terminal 600. Even more, the display 605 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display 605 may be made of LCD (Liquid Crystal Display ), OLED (Organic Light-Emitting Diode) or other materials.
The camera assembly 606 is used to capture images or video. Optionally, the camera assembly 606 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and Virtual Reality (VR) shooting functions or other fused shooting functions. In some embodiments, the camera assembly 606 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 601 for processing, or inputting the electric signals to the radio frequency circuit 604 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 600. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 607 may also include a headphone jack.
A power supply 609 is used to power the various components in the terminal 600. The power source 609 may be alternating current, direct current, disposable battery or rechargeable battery. When the power source 609 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 600 further includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: acceleration sensor 611, gyroscope sensor 612, pressure sensor 613, optical sensor 615, and proximity sensor 616.
The acceleration sensor 611 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 600. For example, the acceleration sensor 611 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 601 may control the display screen 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 611. The acceleration sensor 611 may also be used for the acquisition of motion data of a game or a user.
The gyro sensor 612 may detect a body direction and a rotation angle of the terminal 600, and the gyro sensor 612 may collect a 3D motion of the user on the terminal 600 in cooperation with the acceleration sensor 611. The processor 601 may implement the following functions based on the data collected by the gyro sensor 612: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 613 may be disposed at a side frame of the terminal 600 and/or at a lower layer of the display 605. When the pressure sensor 613 is disposed at a side frame of the terminal 600, a grip signal of the terminal 600 by a user may be detected, and a left-right hand recognition or a shortcut operation may be performed by the processor 601 according to the grip signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed at the lower layer of the display screen 605, the processor 601 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 605. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 615 is used to collect ambient light intensity. In one embodiment, processor 601 may control the display brightness of display 605 based on the intensity of ambient light collected by optical sensor 615. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 605 is turned up; when the ambient light intensity is low, the display brightness of the display screen 605 is turned down. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 based on the ambient light intensity collected by the optical sensor 615.
A proximity sensor 616, also referred to as a distance sensor, is typically provided on the front panel of the terminal 600. The proximity sensor 616 is used to collect the distance between the user and the front of the terminal 600. In one embodiment, when the proximity sensor 616 detects a gradual decrease in the distance between the user and the front face of the terminal 600, the processor 601 controls the display 605 to switch from the bright screen state to the off screen state; when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually increases, the processor 601 controls the display screen 605 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 6 is not limiting of the terminal 600 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
Fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application, where the server 700 may have a relatively large difference due to different configurations or performances, and may include one or more processors (Central Processing Units, CPU) 701 and one or more memories 702, where at least one program code is stored in the one or more memories 702, and the at least one program code is loaded and executed by the one or more processors 701 to implement the methods provided in the foregoing method embodiments. Of course, the server 700 may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
In an exemplary embodiment, there is also provided a computer-readable storage medium, for example, a memory including at least one program code executable by a processor to perform the fundus image detection method in the above-described embodiment. For example, the computer readable storage medium may be Read-Only Memory (ROM), random-access Memory (Random Access Memory, RAM), compact disc Read-Only Memory (CD-ROM), magnetic tape, floppy disk, optical data storage device, and the like.
It will be appreciated by those of ordinary skill in the art that all or part of the steps of implementing the above embodiments may be completed by hardware, or by a program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The foregoing description of the preferred embodiments is merely exemplary in nature and is not intended to limit the invention, but is intended to cover various modifications, substitutions, improvements, and alternatives falling within the spirit and principles of the invention.

Claims (19)

1. A fundus image detecting apparatus, characterized in that the fundus image detecting apparatus is configured to:
In response to an image detection instruction, acquiring a group of fundus images to be detected, wherein the group of fundus images comprises a first image corresponding to a left eye and a second image corresponding to a right eye;
inputting the first image and the second image into an image classification model;
extracting features of the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector;
based on the first feature vector and the second feature vector, a third feature vector is obtained, wherein the third feature vector is used for indicating the difference between the first feature vector and the second feature vector;
splicing the first feature vector, the second feature vector and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images; inputting the fourth feature vector into a first full connection layer, and mapping the fourth feature vector into a two-dimensional vector by the first full connection layer; taking the label indicated by the two-dimensional vector as the label corresponding to the group of fundus images, wherein the label is a glaucoma fundus image or a non-glaucoma fundus image; outputting the label corresponding to the group of fundus images; and,
Inputting the first feature vector into a second full-connection layer, and determining a left eye label corresponding to the first image based on an output result of the second full-connection layer; and inputting the second feature vector into a third full-connection layer, and determining a right eye label corresponding to the second image based on an output result of the third full-connection layer.
2. The apparatus according to claim 1, wherein the fundus image detecting apparatus is configured to:
acquiring a first fundus image of the left eye and a second fundus image of the right eye;
the following steps are performed on any fundus image of the first fundus image and the second fundus image:
inputting the fundus image into an image segmentation model, and calculating, by the image segmentation model, the probability that each pixel point in the fundus image belongs to the optic disc to obtain a probability matrix, wherein the larger the value of an element in the probability matrix, the larger the probability that the position of the element is the optic disc;
and acquiring an image of a target area from the fundus image based on the numerical value of each element in the probability matrix, wherein the center of the image of the target area coincides with the center of the optic disc.
3. The apparatus according to claim 2, wherein the fundus image detecting apparatus is configured to:
based on a probability threshold, performing binarization processing on the probability matrix to obtain a binarization matrix corresponding to the fundus image;
determining the optic disc center and optic disc diameter based on the binarization matrix;
and determining the target area based on the optic disc center and the optic disc diameter, wherein the center of the target area coincides with the optic disc center.
4. The apparatus according to claim 1, wherein the fundus image detecting apparatus is configured to:
extracting features of the first image and the second image respectively through a first feature extractor and a second feature extractor in the image classification model to obtain a first feature matrix corresponding to the first image and a second feature matrix corresponding to the second image, and performing global pooling processing on the first feature matrix and the second feature matrix respectively to obtain the first feature vector and the second feature vector.
5. The apparatus of claim 4, wherein the first feature extractor and the second feature extractor have the same parameters.
6. The apparatus according to claim 1, wherein the fundus image detecting apparatus is configured to:
obtaining a difference vector of the first feature vector and the second feature vector;
and taking the absolute value of each value in the difference vector to obtain the third feature vector.
7. A fundus image detection method, the method comprising:
in response to an image detection instruction, acquiring a group of fundus images to be detected, wherein the group of fundus images comprises a first image corresponding to a left eye and a second image corresponding to a right eye;
inputting the first image and the second image into an image classification model;
extracting features of the first image and the second image through the image classification model to obtain a first feature vector and a second feature vector;
based on the first feature vector and the second feature vector, a third feature vector is obtained, wherein the third feature vector is used for indicating the difference between the first feature vector and the second feature vector;
splicing the first feature vector, the second feature vector and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images; inputting the fourth feature vector into a first full connection layer, and mapping the fourth feature vector into a two-dimensional vector by the first full connection layer; taking the label indicated by the two-dimensional vector as the label corresponding to the group of fundus images, wherein the label is a glaucoma fundus image or a non-glaucoma fundus image; outputting the label corresponding to the group of fundus images; and,
Inputting the first feature vector into a second full-connection layer, and determining a left eye label corresponding to the first image based on an output result of the second full-connection layer; and inputting the second feature vector into a third full-connection layer, and determining a right eye label corresponding to the second image based on an output result of the third full-connection layer.
8. The method of claim 7, wherein the acquiring a set of fundus images to be detected in response to an image detection instruction, the set of fundus images including a first image corresponding to a left eye and a second image corresponding to a right eye, comprises:
acquiring a first fundus image of the left eye and a second fundus image of the right eye;
the following steps are performed on any fundus image of the first fundus image and the second fundus image:
inputting the fundus image into an image segmentation model, and calculating, by the image segmentation model, the probability that each pixel point in the fundus image belongs to the optic disc to obtain a probability matrix, wherein the larger the value of an element in the probability matrix, the larger the probability that the position of the element is the optic disc;
and acquiring an image of a target area from the fundus image based on the numerical value of each element in the probability matrix, wherein the center of the image of the target area coincides with the center of the optic disc.
9. The method of claim 8, wherein the acquiring an image of the target region from the fundus image based on the numerical sizes of the respective elements in the probability matrix comprises:
based on a probability threshold, performing binarization processing on the probability matrix to obtain a binarization matrix corresponding to the fundus image;
determining the optic disc center and optic disc diameter based on the binarization matrix;
and determining the target area based on the optic disc center and the optic disc diameter, wherein the center of the target area coincides with the optic disc center.
10. The method of claim 7, wherein the feature extraction of the first image and the second image by the image classification model to obtain a first feature vector and a second feature vector comprises:
extracting features of the first image and the second image respectively through a first feature extractor and a second feature extractor in the image classification model to obtain a first feature matrix corresponding to the first image and a second feature matrix corresponding to the second image, and performing global pooling processing on the first feature matrix and the second feature matrix respectively to obtain the first feature vector and the second feature vector.
11. The method of claim 10, wherein the first feature extractor and the second feature extractor have the same parameters.
12. The method of claim 7, wherein the obtaining a third feature vector based on the first feature vector and the second feature vector comprises:
obtaining a difference vector of the first feature vector and the second feature vector;
and taking the absolute value of each value in the difference vector to obtain the third feature vector.
13. A fundus image detection apparatus, the apparatus comprising:
the image acquisition module is used for responding to the image detection instruction and acquiring a group of fundus images to be detected, wherein the group of fundus images comprise a first image corresponding to the left eye and a second image corresponding to the right eye;
an input module for inputting the first image and the second image into an image classification model;
the first vector acquisition module is used for extracting the characteristics of the first image and the second image through the image classification model to obtain a first characteristic vector and a second characteristic vector;
a second vector acquisition module, configured to acquire a third feature vector based on the first feature vector and the second feature vector, where the third feature vector is used to indicate a difference between the first feature vector and the second feature vector;
The image classification module is used for splicing the first feature vector, the second feature vector and the third feature vector to obtain a fourth feature vector corresponding to the group of fundus images; inputting the fourth feature vector into a first full connection layer, and mapping the fourth feature vector into a two-dimensional vector by the first full connection layer; taking the label indicated by the two-dimensional vector as the label corresponding to the group of fundus images, wherein the label is a glaucoma fundus image or a non-glaucoma fundus image; outputting the label corresponding to the group of fundus images; and,
the left eye label determining module is used for inputting the first feature vector into a second full-connection layer and determining a left eye label corresponding to the first image based on an output result of the second full-connection layer;
the right eye label determining module is used for inputting the second feature vector into a third full-connection layer, and determining a right eye label corresponding to the second image based on an output result of the third full-connection layer.
14. The apparatus of claim 13, wherein the image acquisition module is to:
acquiring a first fundus image of the left eye and a second fundus image of the right eye;
The following steps are performed on any fundus image of the first fundus image and the second fundus image:
inputting the fundus image into an image segmentation model, and calculating, by the image segmentation model, the probability that each pixel point in the fundus image belongs to the optic disc to obtain a probability matrix, wherein the larger the value of an element in the probability matrix, the larger the probability that the position of the element is the optic disc;
and acquiring an image of a target area from the fundus image based on the numerical value of each element in the probability matrix, wherein the center of the image of the target area coincides with the center of the optic disc.
15. The apparatus of claim 14, wherein the image acquisition module is to:
based on a probability threshold, performing binarization processing on the probability matrix to obtain a binarization matrix corresponding to the fundus image;
determining the optic disc center and optic disc diameter based on the binarization matrix;
and determining the target area based on the optic disc center and the optic disc diameter, wherein the center of the target area coincides with the optic disc center.
16. The apparatus of claim 13, wherein the first vector acquisition module is configured to:
extracting features of the first image and the second image respectively through a first feature extractor and a second feature extractor in the image classification model to obtain a first feature matrix corresponding to the first image and a second feature matrix corresponding to the second image, and performing global pooling processing on the first feature matrix and the second feature matrix respectively to obtain the first feature vector and the second feature vector.
17. The apparatus of claim 16, wherein the first feature extractor and the second feature extractor have the same parameters.
18. The apparatus of claim 13, wherein the second vector acquisition module is to:
obtaining a difference vector of the first feature vector and the second feature vector;
and taking the absolute value of each value in the difference vector to obtain the third feature vector.
19. A computer-readable storage medium having stored therein at least one program code loaded and executed by a processor to perform the operations performed by the fundus image detection method of any of claims 7 to 12.
CN201911327024.7A 2019-12-20 2019-12-20 Fundus image detection device, fundus image detection method, fundus image detection device, and fundus image storage medium Active CN111080630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911327024.7A CN111080630B (en) 2019-12-20 2019-12-20 Fundus image detection device, fundus image detection method, fundus image detection device, and fundus image storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911327024.7A CN111080630B (en) 2019-12-20 2019-12-20 Fundus image detection device, fundus image detection method, fundus image detection device, and fundus image storage medium

Publications (2)

Publication Number Publication Date
CN111080630A CN111080630A (en) 2020-04-28
CN111080630B true CN111080630B (en) 2024-03-08

Family

ID=70316320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911327024.7A Active CN111080630B (en) 2019-12-20 2019-12-20 Fundus image detection device, fundus image detection method, fundus image detection device, and fundus image storage medium

Country Status (1)

Country Link
CN (1) CN111080630B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220207732A1 (en) * 2020-12-28 2022-06-30 Seyed Ehsan Vaghefi Rezaei Systems and methods for processing of fundus images

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106214120A (en) * 2016-08-19 2016-12-14 靳晓亮 A kind of methods for screening of glaucoma
CN107045720A (en) * 2017-05-04 2017-08-15 深圳硅基智能科技有限公司 Artificial neural network and system for recognizing eye fundus image lesion
CN107256410A (en) * 2017-05-26 2017-10-17 北京郁金香伙伴科技有限公司 To the method and device of class mirror image image classification
CN107423571A (en) * 2017-05-04 2017-12-01 深圳硅基仿生科技有限公司 Diabetic retinopathy identifying system based on eye fundus image
CN108665447A (en) * 2018-04-20 2018-10-16 浙江大学 A kind of glaucoma image detecting method based on eye-ground photography deep learning
CN108717868A (en) * 2018-04-26 2018-10-30 博众精工科技股份有限公司 Glaucoma eye fundus image screening method based on deep learning and system
CN110210571A (en) * 2019-06-10 2019-09-06 腾讯科技(深圳)有限公司 Image-recognizing method, device, computer equipment and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10405739B2 (en) * 2015-10-23 2019-09-10 International Business Machines Corporation Automatically detecting eye type in retinal fundus images

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106214120A (en) * 2016-08-19 2016-12-14 靳晓亮 A kind of methods for screening of glaucoma
CN107045720A (en) * 2017-05-04 2017-08-15 深圳硅基智能科技有限公司 Artificial neural network and system for recognizing eye fundus image lesion
CN107423571A (en) * 2017-05-04 2017-12-01 深圳硅基仿生科技有限公司 Diabetic retinopathy identifying system based on eye fundus image
WO2018201633A1 (en) * 2017-05-04 2018-11-08 深圳硅基仿生科技有限公司 Fundus image-based diabetic retinopathy identification system
CN107256410A (en) * 2017-05-26 2017-10-17 北京郁金香伙伴科技有限公司 To the method and device of class mirror image image classification
CN108665447A (en) * 2018-04-20 2018-10-16 浙江大学 A kind of glaucoma image detecting method based on eye-ground photography deep learning
CN108717868A (en) * 2018-04-26 2018-10-30 博众精工科技股份有限公司 Glaucoma eye fundus image screening method based on deep learning and system
CN110210571A (en) * 2019-06-10 2019-09-06 腾讯科技(深圳)有限公司 Image-recognizing method, device, computer equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Glaucoma Cup and Disc Detection Technology Based on Deep Learning; Huang Jinli; CNKI China Master's Theses Electronic Journals (中国知网硕士电子期刊网); 2019-01-15 (No. 1); full text *

Also Published As

Publication number Publication date
CN111080630A (en) 2020-04-28

Similar Documents

Publication Publication Date Title
CN110348543B (en) Fundus image recognition method and device, computer equipment and storage medium
CN110136136B (en) Scene segmentation method and device, computer equipment and storage medium
CN111091576B (en) Image segmentation method, device, equipment and storage medium
CN110097019B (en) Character recognition method, character recognition device, computer equipment and storage medium
CN111476306B (en) Object detection method, device, equipment and storage medium based on artificial intelligence
CN110807361B (en) Human body identification method, device, computer equipment and storage medium
CN109299315B (en) Multimedia resource classification method and device, computer equipment and storage medium
WO2020224479A1 (en) Method and apparatus for acquiring positions of target, and computer device and storage medium
CN111325726A (en) Model training method, image processing method, device, equipment and storage medium
CN110570460B (en) Target tracking method, device, computer equipment and computer readable storage medium
CN112036331B (en) Living body detection model training method, device, equipment and storage medium
CN113610750B (en) Object identification method, device, computer equipment and storage medium
CN112749613B (en) Video data processing method, device, computer equipment and storage medium
CN110675412B (en) Image segmentation method, training method, device and equipment of image segmentation model
CN111192262A (en) Product defect classification method, device, equipment and medium based on artificial intelligence
CN111368116B (en) Image classification method and device, computer equipment and storage medium
CN110647881B (en) Method, device, equipment and storage medium for determining card type corresponding to image
CN112257552B (en) Image processing method, device, equipment and storage medium
CN111738365B (en) Image classification model training method and device, computer equipment and storage medium
CN113918767A (en) Video clip positioning method, device, equipment and storage medium
CN111598896A (en) Image detection method, device, equipment and storage medium
CN110705438A (en) Gait recognition method, device, equipment and storage medium
CN113821658A (en) Method, device and equipment for training encoder and storage medium
CN111080630B (en) Fundus image detection device, fundus image detection method, fundus image detection device, and fundus image storage medium
CN110675473B (en) Method, device, electronic equipment and medium for generating GIF dynamic diagram

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40023022

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant