CN108717693A - An optic disk localization method based on RPN

Info

Publication number: CN108717693A
Application number: CN201810372284.5A
Authority: CN
Inventors: 王丽冉; 汤平; 汤一平; 何霞; 陈朋
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2018-04-24
Filing date: 2018-04-24
Publication date: 2018-10-30

Abstract

An optic disk localization method based on a region proposal network (RPN) comprises global feature extraction from the fundus image by a deep convolutional neural network, preliminary detection of the optic disk region by the RPN, and position refinement of the optic disk candidate regions by a deep convolutional neural network. Using deep learning, the invention builds a deep convolutional neural network that locates the optic disk automatically, achieves accurate, fast, and robust optic disk localization, and assists the diagnosis of fundus diseases.

Description

An optic disk localization method based on RPN

Technical field

The present invention relates to a localization method, and in particular to the application of computer vision, digital image processing, pattern recognition, deep learning, and deep convolutional neural networks to the automatic localization of the optic disk in fundus images.

Background art

Fundus examination is widely used in the prevention, diagnosis, and treatment of ophthalmic diseases such as diabetic retinopathy, glaucoma, and age-related macular degeneration. Apart from the background, the main observable objects in a retinal image are the blood vessels, the fovea, the macula, and the optic disk. In a normal retinal fundus image the optic disk appears as a round, bright yellow region whose diameter is about 1/7 of the fundus image ROI. The major vessels enter the fundus through the optic disk and radiate outward from it, so the optic disk is the convergence point of the vessels and contains many of the thicker ones. Automatic optic disk localization is a prerequisite and key step for retinal fundus image analysis and for computer-aided diagnosis of retinopathies. It can be used to locate other retinal anatomical structures such as the blood vessels and the macula. More importantly, it helps establish a fundus coordinate system for the retina, which can be used to locate retinal lesions such as exudates, microaneurysms, and hemorrhages. Locating the optic disk region of a fundus image quickly and accurately is therefore particularly important.

The earliest optic disk localization methods relied mainly on appearance characteristics of the optic disk, such as brightness and shape. These methods achieve a high detection success rate on good-quality images of normal fundi. In images with lesions, however, detection is error-prone, because lesions alter the appearance of the optic disk or other large bright lesion regions interfere. Because the vessel structure is relatively stable and distinctive in the image, localization methods based on vessel characteristics are usually more robust and perform more stably on images with lesions. The main problem with these methods is vessel extraction, which is itself complex and time-consuming, taking up to several minutes, and vessel segmentation is also easily disturbed by lesions, the optic disk contour, and similar objects. Current optic disk localization algorithms require hand-crafted extraction of features such as brightness, vessels, and texture, and the quality of the feature selection largely determines the quality of the localization. Hand-crafted feature extraction depends heavily on experience, costs time and effort, and rarely yields optic disk features with full expressive power. Efficient, accurate, and robust optic disk localization in fundus images therefore remains difficult.

Summary of the invention

To overcome the inability of existing optic disk localization methods to achieve both accuracy and speed, the present invention proposes a deep-learning-based method that builds a deep convolutional neural network to locate the optic disk automatically, achieving efficient, accurate, and robust optic disk localization.

The technical solution adopted by the present invention to solve this technical problem is:

An optic disk localization method based on RPN comprises global feature extraction from the fundus image based on a deep convolutional neural network, preliminary detection of the optic disk region based on an RPN, and position refinement of the optic disk candidate regions based on a deep convolutional neural network.

In the global feature extraction from the fundus image, the deep convolutional neural network serves as the base network of the whole network model. It has five layers, forms a deep structure of alternating convolutional, activation, and pooling layers, and learns its parameters implicitly from the given fundus image data.

In the preliminary detection of the optic disk region, the RPN generates candidate region information for the target, including the target category, confidence, and location.

In the position refinement of the optic disk candidate regions, the deep convolutional neural network consists of fully connected layers and extracts deep features from the candidate regions obtained in the previous stage. The input region is mapped layer by layer through the network into different representations, from which abstract features are extracted, so that the position of the candidate region is refined and the optic disk localization result is obtained.

Further, in the global feature extraction from the fundus image based on a deep convolutional neural network, the convolution operation enhances the original signal and reduces noise, and the pooling operation exploits local image correlation to subsample the image, reducing the amount of data to process while retaining useful image information.

The network accepts a fundus image of arbitrary size as input and has the following structure. The first convolutional layer Conv1 has 96 kernels of size 7 × 7 × 3, stride 2, and padding 3. The first pooling layer Pool1 has a 7 × 7 × 3 pooling kernel, stride 2, and padding 1, and is followed by a ReLU activation layer. The second convolutional layer Conv2 has 256 kernels of size 5 × 5 × 96, stride 2, and padding 2. The second pooling layer Pool2 has a 7 × 7 × 96 pooling kernel, stride 2, and padding 1, and is followed by a ReLU activation layer. The third convolutional layer Conv3 has 384 kernels of size 3 × 3 × 256 and padding 1, followed by a ReLU activation layer. The fourth convolutional layer Conv4 has 384 kernels of size 3 × 3 × 384 and padding 1, followed by a ReLU activation layer. The fifth convolutional layer Conv5 has 256 kernels of size 3 × 3 × 384 and padding 1, followed by a ReLU activation layer.

After these five layers of feature extraction, each fundus image yields 256 feature maps, which serve as the input to the RPN.
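The spatial size of the 256 feature maps follows from the kernel, stride, and padding values listed above. The sketch below traces one spatial dimension through the layers. It assumes floor rounding in every layer and a stride of 1 for Conv3 to Conv5, for which the text gives no stride; the function names are illustrative only.

```python
def conv_out(n, kernel, stride, pad):
    """Output length of one spatial dimension (floor rounding assumed)."""
    return (n + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad) as listed in the text.
# Stride 1 for Conv3-Conv5 is an assumption; the text does not state it.
LAYERS = [
    ("Conv1", 7, 2, 3),
    ("Pool1", 7, 2, 1),
    ("Conv2", 5, 2, 2),
    ("Pool2", 7, 2, 1),
    ("Conv3", 3, 1, 1),
    ("Conv4", 3, 1, 1),
    ("Conv5", 3, 1, 1),
]

def feature_map_sizes(n):
    """Return [(layer name, output length)] for an input of length n."""
    sizes = []
    for name, kernel, stride, pad in LAYERS:
        n = conv_out(n, kernel, stride, pad)
        sizes.append((name, n))
    return sizes

if __name__ == "__main__":
    for name, size in feature_map_sizes(600):
        print(f"{name}: {size} x {size}")
```

Under these assumptions a 600 × 600 input shrinks to a 35 × 35 map with 256 channels after Conv5. A framework that rounds pooling outputs up instead of down would give slightly different sizes.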

Further, in the preliminary detection of the optic disk region based on the RPN, the RPN takes the 256 feature maps produced by the base network as input, processes them further with three convolutional layers and an algorithm layer, and outputs a set of rectangular target candidate boxes, each with 4 position coordinate variables and a score.

The first convolutional layer of the RPN, Conv1/rpn, has 256 kernels of size 3 × 3 × 256. The second, Conv2/rpn, has 18 kernels of size 1 × 1 × 256. The third, Conv3/rpn, has 36 kernels of size 1 × 1 × 256.

The RPN additionally includes an algorithm layer that generates the region candidate boxes by a multi-scale convolution operation on the feature map, as follows. At each sliding-window position, 3 scales and 3 aspect ratios are used. Centered on the current sliding-window center, each combination of one scale and one aspect ratio is mapped back to the original image, giving 9 candidate regions of different sizes. A shared convolutional feature map of size w × h therefore has w × h × 9 candidate regions in total. Finally, the classification layer outputs w × h × 9 × 2 scores, which are the estimated target/non-target probabilities for each candidate region, and the regression layer outputs w × h × 9 × 4 parameters, which are the coordinate parameters of the candidate regions.
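The candidate-box generation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the scale values 128, 256, 512 are taken from the training description elsewhere in the text, the ratios are read as width:height, and the feature-map stride of 16 is assumed from the four stride-2 layers of the base network. All function names are hypothetical.

```python
import math

SCALES = (128, 256, 512)   # box areas are 128^2, 256^2, 512^2
RATIOS = (1.0, 2.0, 0.5)   # 1:1, 2:1, 1:2, read here as width:height (assumption)

def boxes_at(cx, cy):
    """Nine (x1, y1, x2, y2) boxes centered on (cx, cy) in the original image."""
    boxes = []
    for s in SCALES:
        for r in RATIOS:
            w = s * math.sqrt(r)   # width and height chosen so that w * h = s^2
            h = s / math.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

def generate_candidates(fw, fh, stride=16):
    """All candidate boxes for an fw x fh feature map (stride is an assumption)."""
    out = []
    for j in range(fh):
        for i in range(fw):
            # Map the feature-map cell back to its center in the original image.
            cx = i * stride + stride / 2
            cy = j * stride + stride / 2
            out.extend(boxes_at(cx, cy))
    return out

if __name__ == "__main__":
    cands = generate_candidates(35, 35)
    print(len(cands))  # w * h * 9 candidate regions
```

With 9 boxes per position, the 18 kernels of Conv2/rpn match 9 × 2 class scores and the 36 kernels of Conv3/rpn match 9 × 4 box coordinates.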

The RPN is trained as follows. A 3 × 3 sliding window first traverses every point of the feature map. For each point, the position of the sliding-window center in the original image is found, and candidate regions with 3 scales (128², 256², 512²) and 3 aspect ratios (1:1, 2:1, 1:2) are generated in the original image around that center, so that every point of the feature map corresponds to 9 candidate regions in the original image. For a feature map of size w × h, w × h × 9 candidate regions are generated. All candidate regions then undergo two rounds of screening and two rounds of label assignment. Candidate regions that extend beyond the original image are deleted first, which completes the first screening. For each remaining candidate region, the ratio of intersection to union with every ground-truth labeled region, i.e. the overlap ratio, is computed, and a binary label is assigned to the candidate region according to this ratio to decide whether the region is the optic disk. The criteria are: 1) the candidate region with the largest ratio is taken as a positive sample, i.e. optic disk; 2) among the other candidate regions, a ratio greater than 0.7 gives a positive sample and a ratio less than 0.3 gives a negative sample, i.e. background, while candidate regions whose ratio lies between the two are discarded.

The overlap ratio between a candidate region and the ground-truth box GT is given by formula (1):
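The text defines the overlap ratio as the ratio of intersection to union, which suggests $\mathrm{IoU} = \mathrm{area}(CR \cap GT) / \mathrm{area}(CR \cup GT)$ for a candidate region CR; this reading is an assumption based on that wording. The sketch below applies it together with the screening and labeling rules stated above. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, `gt_boxes` is assumed non-empty, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def label_candidates(candidates, gt_boxes, img_w, img_h, hi=0.7, lo=0.3):
    """Return {candidate index: 1 (optic disk) or 0 (background)}.

    Candidates outside the image, or with an overlap between lo and hi,
    are discarded and do not appear in the result.
    """
    # First screening: drop boxes that extend beyond the original image.
    inside = [k for k, c in enumerate(candidates)
              if c[0] >= 0 and c[1] >= 0 and c[2] <= img_w and c[3] <= img_h]
    best = {k: max(iou(candidates[k], g) for g in gt_boxes) for k in inside}
    labels = {}
    for k, v in best.items():
        if v > hi:
            labels[k] = 1
        elif v < lo:
            labels[k] = 0
    if best:
        # Rule 1: the candidate with the largest ratio is always positive.
        labels[max(best, key=best.get)] = 1
    return labels

if __name__ == "__main__":
    gt = [(100, 100, 200, 200)]
    cands = [(100, 100, 200, 200), (150, 150, 250, 250), (100, 100, 200, 150)]
    print(label_candidates(cands, gt, 600, 600))
```

Applying rule 1 after the threshold test guarantees at least one positive sample even when no candidate exceeds 0.7.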

After the second screening of the candidate regions, the second label assignment is carried out: the label of the ground-truth region that has the largest intersection-over-union ratio with a candidate region is used as the label of that candidate region, i.e. the foreground label, and a background label is added to all negative samples. Positive and negative samples are then randomly sampled. The number of samples is set to 128 and the sampling ratio to 1:1. Positive samples are usually scarce, so if there are fewer than 64 of them the shortfall is made up with negative samples. The 128 positive and negative samples are merged and trained together in the subsequent network, which strengthens the discrimination between labeled and non-labeled samples.
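The sampling rule above (128 samples at 1:1, with negatives filling any shortfall of positives) can be sketched as follows. The function name and the use of Python's `random.Random` are illustrative choices, not taken from the source.

```python
import random

def sample_minibatch(positives, negatives, batch=128, rng=None):
    """Pick up to batch // 2 positives and fill the rest with negatives."""
    rng = rng or random.Random()
    n_pos = min(len(positives), batch // 2)
    # If fewer than batch // 2 positives exist, negatives make up the difference.
    n_neg = min(len(negatives), batch - n_pos)
    return rng.sample(positives, n_pos), rng.sample(negatives, n_neg)

if __name__ == "__main__":
    pos = list(range(10))            # indices of positive candidates
    neg = list(range(100, 1100))     # indices of negative candidates
    p, n = sample_minibatch(pos, neg, rng=random.Random(0))
    print(len(p), len(n))
```

With only 10 positives available, the batch holds 10 positives and 118 negatives; with 64 or more positives it holds 64 of each.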

Further, in the position refinement of the optic disk candidate regions based on a deep convolutional neural network, the deep convolutional neural network adds a pyramid pooling layer for scale normalization. The network uses fully connected layers to extract features from the sampled candidate regions. The candidate regions come in 9 sizes, whereas fully connected layers require inputs of a consistent size, so scale normalization is first performed by the pyramid pooling layer, and the result is then passed to three fully connected layers for deep feature extraction. The number of output neurons of the fully connected layers in this sub-network is set to 2048, giving a 2048-dimensional feature vector. This feature vector is then fed into two separate fully connected layers for feature compression, whose numbers of output neurons are set to 2 and 8 respectively. Finally, the output values are compared with the ground-truth label values, and the loss function applies the regression constraint.
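The scale normalization step can be illustrated with a small pyramid pooling routine that turns a feature map of any size into a vector of fixed length. The pyramid levels (1 × 1, 2 × 2, 4 × 4) and the use of max pooling are assumptions for illustration; the text does not specify them.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a 2D map (list of equal-length rows) into a fixed-length list.

    Each level n splits the map into an n x n grid of bins and keeps the
    maximum of each bin, so the output length is sum(n * n for n in levels).
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for n in levels:
        for by in range(n):
            y0, y1 = (by * h) // n, ceil_div((by + 1) * h, n)
            for bx in range(n):
                x0, x1 = (bx * w) // n, ceil_div((bx + 1) * w, n)
                out.append(max(fmap[y][x]
                               for y in range(y0, y1)
                               for x in range(x0, x1)))
    return out

if __name__ == "__main__":
    small = [[float(x + y) for x in range(5)] for y in range(3)]
    large = [[float(x * y) for x in range(13)] for y in range(9)]
    print(len(pyramid_pool(small)), len(pyramid_pool(large)))
```

Both calls return 21 values per feature map, so regions of all 9 sizes reach the fully connected layers with the same input length.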

The loss function is given by formula (2):

In formula (2), the classification loss function is defined by formula (3):

The position regression loss function is defined by formula (4):

R is the robust loss function $\mathrm{smooth}_{L1}$, given by formula (5):

In these formulas, $N_{cls}$ and $N_{reg}$ are regularization terms that prevent over-fitting, $\lambda$ is a weight coefficient, $i$ is the class index of the candidate region, $t_i$ is the predicted coordinate offset of the candidate region, $t_i^*$ is the actual coordinate offset of the candidate region, $p_i$ is the predicted probability that the candidate region belongs to class $i$, and $p_i^*$ denotes its true class, with $p_i^* = 0$ for the background class and $p_i^* = 1$ for the optic disk class.
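The symbols defined here match the multi-task loss of Faster R-CNN, so formulas (2) to (5) plausibly take that standard form: $L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i, t_i^*)$, with $L_{cls}(p_i, p_i^*) = -\log[p_i^* p_i + (1 - p_i^*)(1 - p_i)]$, $L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$, and $R(x) = 0.5x^2$ for $|x| < 1$, otherwise $|x| - 0.5$. This is an assumption, as is treating $i$ as an index over the sampled candidate regions. A minimal numeric sketch under that assumption, with hypothetical function names:

```python
import math

def smooth_l1(x):
    """Robust loss R: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5

def cls_loss(p, p_star):
    """Log loss for a two-class (optic disk / background) prediction."""
    return -math.log(p_star * p + (1 - p_star) * (1 - p))

def reg_loss(t, t_star):
    """Smooth L1 summed over the box coordinate offsets."""
    return sum(smooth_l1(a - b) for a, b in zip(t, t_star))

def total_loss(p, p_star, t, t_star, n_cls, n_reg, lam=1.0):
    """Classification term plus weighted regression term.

    The regression term is multiplied by p_star, so only optic disk
    samples (p_star = 1) contribute to it.
    """
    l_cls = sum(cls_loss(pi, si) for pi, si in zip(p, p_star)) / n_cls
    l_reg = sum(si * reg_loss(ti, tsi)
                for si, ti, tsi in zip(p_star, t, t_star)) / n_reg
    return l_cls + lam * l_reg

if __name__ == "__main__":
    p, p_star = [0.9, 0.2], [1, 0]
    t = [(0.1, 0.0, 0.2, 0.0), (0.0, 0.0, 0.0, 0.0)]
    t_star = [(0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0)]
    print(total_loss(p, p_star, t, t_star, n_cls=2, n_reg=2))
```

The background sample in the example has a large coordinate error, but it adds nothing to the regression term because its $p_i^*$ is 0.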

These two loss functions compute the errors between the predicted values and the given ground-truth values. The back-propagation algorithm propagates the error back layer by layer, and stochastic gradient descent adjusts and updates the parameters of each layer according to the update rule in formula (6), so that the network's predictions move closer to the ground truth, i.e. the outputs of the last two fully connected layers move closer to the category and location information in the given annotations.

In formula (6), w and w' are the parameter values before and after the update, E is the error computed by the loss function layer, and η is the learning rate.
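Given these symbol definitions, formula (6) is presumably the usual gradient descent update $w' = w - \eta \, \partial E / \partial w$; this form is an assumption. The sketch below applies that update to a toy one-parameter error $E(w) = (w - 3)^2$, chosen only for illustration.

```python
def sgd_step(weights, grads, lr):
    """One update w' = w - lr * dE/dw for each parameter."""
    return [w - lr * g for w, g in zip(weights, grads)]

def toy_gradient(weights):
    """Gradient of the illustrative error E(w) = (w - 3)^2."""
    return [2.0 * (w - 3.0) for w in weights]

if __name__ == "__main__":
    w = [0.0]
    for _ in range(50):
        w = sgd_step(w, toy_gradient(w), lr=0.1)
    print(w[0])  # approaches the minimizer at 3
```

Each step shrinks the distance to the minimizer by a constant factor, which is the behavior the update rule is meant to produce for the network parameters as well.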

The invention aims to achieve efficient, accurate, and robust optic disk localization in fundus images. The two key technologies on which it relies are introduced below:

First, convolutional neural networks

Deep learning has been widely used in computer vision in recent years, thanks to the rapid development of deep learning technology. Convolutional neural networks can make full use of large numbers of training samples and extract the abstract information in them layer by layer, learning the deep features of an image more directly and more comprehensively. In many tasks these features have proved to have stronger representational power than traditional hand-crafted features and to describe the overall structure of an image more completely. Convolutional neural network technology has developed from R-CNN and Fast R-CNN to Faster R-CNN, and from CNN to FCN, covering almost all major areas of computer vision such as object detection, classification, and segmentation.

Convolutional neural networks are built to imitate the human perceptual system. The human brain processes information by passing it on layer by layer, from the concrete to the abstract: low-level features are processed and extracted from the input to obtain the essential information in the data, which then forms the high-level abstract information that the brain can understand. This hierarchical structure retains the essential information about an object while reducing the amount of data the brain has to process. By imitating this pyramid-like flow of information, a deep convolutional neural network gains an important advantage: it extracts information layer by layer, from raw pixel-level data up to abstract semantic concepts, which makes it outstanding at extracting the deep features and semantic information of an image.

Second, the RPN

The purpose of the RPN is to implement an "attention" mechanism that tells the subsequent network which regions to attend to. From an image of arbitrary size it produces a series of candidate regions, each with an objectness score. In operation, a small network slides over the feature map obtained by the convolutional computation. At each position it covers a small window of the feature map and maps that window to a low-dimensional vector, which is then fed into two independent, parallel fully connected layers: a box regression layer and a box classification layer.

The beneficial effects of the present invention are as follows: the deep-learning-based method builds a deep convolutional neural network that locates the optic disk automatically, achieving efficient, accurate, and robust optic disk localization.

Description of the drawings

Fig. 1 is the overall network framework for optic disk localization.

Fig. 2 is the structure of the RPN.

Fig. 3 shows optic disk localization results.

Fig. 4 is a flowchart of the optic disk localization method based on RPN.

Detailed description of the embodiments

The present invention will be further described below in conjunction with the accompanying drawings.

Referring to Fig. 1 to Fig. 4, an optic disk localization method based on RPN comprises global feature extraction from the fundus image based on a deep convolutional neural network, preliminary detection of the optic disk region based on an RPN, and position refinement of the optic disk candidate regions based on a deep convolutional neural network.

Claims

1. An optic disk localization method based on RPN, characterized in that the method comprises global feature extraction from the fundus image based on a deep convolutional neural network, preliminary detection of the optic disk region based on an RPN, and position refinement of the optic disk candidate regions based on a deep convolutional neural network;

in the global feature extraction from the fundus image based on a deep convolutional neural network, the deep convolutional neural network serves as the base network of the whole network model, has five layers, forms a deep structure of alternating convolutional, activation, and pooling layers, and learns its parameters implicitly from the given fundus image data;

in the preliminary detection of the optic disk region based on the RPN, the RPN generates candidate region information for the target, including the target category, confidence, and location;

in the position refinement of the optic disk candidate regions based on a deep convolutional neural network, the deep convolutional neural network consists of fully connected layers and extracts deep features from the candidate regions obtained in the previous stage; the input region is mapped layer by layer through the network into different representations, from which abstract features are extracted, so that the position of the candidate region is refined and the optic disk localization result is obtained.

2. The optic disk localization method based on RPN according to claim 1, characterized in that, in the global feature extraction from the fundus image based on a deep convolutional neural network, the convolution operation enhances the original signal and reduces noise, and the pooling operation exploits local image correlation to subsample the image, reducing the amount of data to process while retaining useful image information;

the network accepts a fundus image of arbitrary size as input and has the following structure: the first convolutional layer Conv1 has 96 kernels of size 7 × 7 × 3, stride 2, and padding 3; the first pooling layer Pool1 has a 7 × 7 × 3 pooling kernel, stride 2, and padding 1, followed by ReLU activation; the second convolutional layer Conv2 has 256 kernels of size 5 × 5 × 96, stride 2, and padding 2; the second pooling layer Pool2 has a 7 × 7 × 96 pooling kernel, stride 2, and padding 1, followed by ReLU activation; the third convolutional layer Conv3 has 384 kernels of size 3 × 3 × 256 and padding 1, followed by ReLU activation; the fourth convolutional layer Conv4 has 384 kernels of size 3 × 3 × 384 and padding 1, followed by ReLU activation; the fifth convolutional layer Conv5 has 256 kernels of size 3 × 3 × 384 and padding 1, followed by ReLU activation.

3. The optic disk localization method based on RPN according to claim 1 or 2, characterized in that, in the preliminary detection of the optic disk region based on the RPN, the RPN takes the 256 feature maps produced by the base network as input, processes them further with three convolutional layers and an algorithm layer, and outputs a set of rectangular target candidate boxes, each with 4 position coordinate variables and a score;

the first convolutional layer of the RPN, Conv1/rpn, has 256 kernels of size 3 × 3 × 256; the second convolutional layer of the RPN, Conv2/rpn, has 18 kernels of size 1 × 1 × 256; the third convolutional layer of the RPN, Conv3/rpn, has 36 kernels of size 1 × 1 × 256;

the RPN additionally includes an algorithm layer that generates the region candidate boxes by a multi-scale convolution operation on the feature map, implemented as follows: at each sliding-window position, 3 scales and 3 aspect ratios are used; centered on the current sliding-window center, each combination of one scale and one aspect ratio is mapped back to the original image, giving 9 candidate regions of different sizes, so that a shared convolutional feature map of size w × h has w × h × 9 candidate regions in total; finally, the classification layer outputs w × h × 9 × 2 scores, which are the estimated target/non-target probabilities for each candidate region, and the regression layer outputs w × h × 9 × 4 parameters, which are the coordinate parameters of the candidate regions;

the RPN is trained as follows: a 3 × 3 sliding window first traverses every point of the feature map; for each point, the position of the sliding-window center in the original image is found, and candidate regions with 3 scales (128², 256², 512²) and 3 aspect ratios (1:1, 2:1, 1:2) are generated in the original image around that center, so that every point of the feature map corresponds to 9 candidate regions in the original image; for a feature map of size w × h, w × h × 9 candidate regions are generated; all candidate regions then undergo two rounds of screening and two rounds of label assignment; candidate regions that extend beyond the original image are deleted first, which completes the first screening; for each remaining candidate region, the ratio of intersection to union with every ground-truth labeled region, i.e. the overlap ratio, is computed, and a binary label is assigned to the candidate region according to this ratio to decide whether the region is the optic disk, the criteria being: 1) the candidate region with the largest ratio is taken as a positive sample, i.e. optic disk; 2) among the other candidate regions, a ratio greater than 0.7 gives a positive sample and a ratio less than 0.3 gives a negative sample, i.e. background, while candidate regions whose ratio lies between the two are discarded;

after the second screening of the candidate regions, the second label assignment is carried out: the label of the ground-truth region that has the largest intersection-over-union ratio with a candidate region is used as the label of that candidate region, i.e. the optic disk label, and a background label is added to all negative samples; positive and negative samples are randomly sampled, the number of samples being set to 128 and the sampling ratio to 1:1; positive samples are usually scarce, so if there are fewer than 64 of them the shortfall is made up with negative samples; the 128 positive and negative samples are merged and trained together in the subsequent network, which strengthens the discrimination between optic disk samples and non-optic disk samples.

4. The optic disk localization method based on RPN according to claim 1 or 2, characterized in that, in the position refinement of the optic disk candidate regions based on a deep convolutional neural network, the deep convolutional neural network adds a pyramid pooling layer for scale normalization;

the network uses fully connected layers to extract features from the sampled candidate regions; the candidate regions come in 9 sizes, whereas fully connected layers require inputs of a consistent size, so scale normalization is first performed by the pyramid pooling layer, and the result is then passed to three fully connected layers for deep feature extraction; the number of output neurons of the fully connected layers in this sub-network is set to 2048, giving a 2048-dimensional feature vector; this feature vector is then fed into two separate fully connected layers for feature compression, whose numbers of output neurons are set to 2 and 8 respectively; finally, the output values are compared with the ground-truth label values, and the loss function applies the regression constraint;

the loss function is given by formula (2):

in formula (2), the classification loss function is defined by formula (3):

the position regression loss function is defined by formula (4):

in these formulas, $N_{cls}$ and $N_{reg}$ are regularization terms that prevent over-fitting, $\lambda$ is a weight coefficient, $i$ is the class index of the candidate region, $t_i$ is the predicted coordinate offset of the candidate region, $t_i^*$ is the actual coordinate offset of the candidate region, $p_i$ is the predicted probability that the candidate region belongs to class $i$, and $p_i^*$ denotes its true class, with $p_i^* = 0$ for the background class and $p_i^* = 1$ for the optic disk class;

these two loss functions compute the errors between the predicted values and the given ground-truth values; the back-propagation algorithm propagates the error back layer by layer, and stochastic gradient descent adjusts and updates the parameters of each layer according to the update rule in formula (6), so that the network's predictions move closer to the ground truth, i.e. the outputs of the last two fully connected layers move closer to the category and location information in the given annotations;

in formula (6), w and w' are the parameter values before and after the update, E is the error computed by the loss function layer, and η is the learning rate.