CN111783830A - Retina classification method and device based on OCT, computer equipment and storage medium


Info

Publication number
CN111783830A
Authority
CN
China
Prior art keywords
classification
splitting
index
retina
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010475698.8A
Other languages
Chinese (zh)
Inventor
王关政
王立龙
王瑞
范栋轶
吕传峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010475698.8A priority Critical patent/CN111783830A/en
Priority to PCT/CN2020/099518 priority patent/WO2021120587A1/en
Publication of CN111783830A publication Critical patent/CN111783830A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention relates to artificial intelligence, and provides an OCT-based retina classification method and device, a computer device, and a storage medium. The OCT-based retina classification method comprises the following steps: acquiring a sample data set from a preset database; constructing decision trees with a random forest algorithm for the training samples in the sample data set to obtain a retina classification model; acquiring, from a preset user library, the GCC parameters to be identified that a user obtained through OCT scanning; performing feature extraction on the GCC parameters to be identified to obtain y data features; and importing the y data features into the retina classification model for classification and outputting a classification result corresponding to the GCC parameters to be identified. The invention also relates to blockchain technology, and the data features can be stored in a blockchain. The invention can improve the accuracy of GCC parameter classification and identification.

Description

Retina classification method and device based on OCT, computer equipment and storage medium
Technical Field
The present invention relates to artificial intelligence, and in particular, to a retina classification method and apparatus based on OCT, a computer device, and a storage medium.
Background
At present, the routine examination of patients with ophthalmic diseases mainly relies on optical coherence tomography (OCT) equipment, which can safely and without contact obtain the ganglion cell complex (GCC) parameter values of the macular region of the fundus retina of an examinee. Identifying and classifying the GCC parameter values helps doctors diagnose the retina in combination with the classification, thereby improving diagnosis efficiency and accuracy.
Disclosure of Invention
The embodiments of the invention provide an OCT (optical coherence tomography)-based retina classification method and device, a computer device, and a storage medium, which aim to solve the problems that the traditional method of identifying and classifying GCC (ganglion cell complex) parameters has low accuracy, which affects the diagnosis accuracy of the target user and reduces working efficiency.
An OCT-based retinal classification method comprising:
acquiring a sample data set from a preset database, wherein the sample data set consists of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1;
aiming at the training samples in the sample data set, a decision tree is constructed by using a random forest algorithm to obtain a retina classification model;
acquiring GCC parameters to be identified, which are obtained by a user through OCT scanning, from a preset user library;
performing feature extraction on the GCC parameter to be identified to obtain y data features, wherein y is a positive integer greater than 1;
and importing the y data characteristics into the retina classification model for classification, and outputting a classification result corresponding to the GCC parameter to be identified.
An OCT-based retinal classification device comprising:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a sample data set from a preset database, the sample data set consists of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1;
the construction module is used for constructing a decision tree by using a random forest algorithm aiming at the training samples in the sample data set to obtain a retina classification model;
the second acquisition module is used for acquiring the GCC parameters to be identified, which are obtained by the user through OCT scanning, from a preset user library;
the characteristic extraction module is used for carrying out characteristic extraction on the GCC parameter to be identified to obtain y data characteristics, wherein y is a positive integer greater than 1;
and the classification module is used for importing the y data characteristics into the retina classification model for classification and outputting a classification result corresponding to the GCC parameter to be identified.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the OCT-based retinal classification method described above when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the OCT-based retinal classification method described above.
According to the OCT-based retina classification method and device, the computer device, and the storage medium, decision trees are constructed from the acquired sample data set to obtain a retina classification model; the GCC parameters to be identified, obtained by the user through OCT scanning, are acquired; features are extracted from the GCC parameters to be identified to obtain data features; and finally the data features are imported into the retina classification model for classification to obtain the classification result corresponding to the GCC parameters to be identified. By constructing decision trees from the sample data set to obtain the retina classification model, the retina classification model can be trained with data features that follow the diagnosis logic of the target user, which improves the accuracy of recognition and classification by the retina classification model and guarantees the validity of the classification result, thereby improving the accuracy of the diagnosis the target user makes from the classification result and further improving the target user's working efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a flow chart of OCT image based retinal classification provided by embodiments of the present invention;
fig. 2 is a flowchart of step S2 in classifying retina based on OCT images according to an embodiment of the present invention;
fig. 3 is a flowchart of step S25 in classifying retina based on OCT images according to an embodiment of the present invention;
fig. 4 is a flowchart of step S253 of classifying retina based on OCT images according to an embodiment of the present invention;
FIG. 5 is a flowchart of calculating and splitting a target Gini index in OCT image based retinal classification according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a retina classification device based on OCT images according to an embodiment of the present invention;
fig. 7 is a block diagram of the basic structure of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The OCT-based retina classification method is applied to a server side, and the server side can be specifically realized by an independent server or a server cluster consisting of a plurality of servers. In one embodiment, as shown in fig. 1, there is provided an OCT-based retinal classification method, including the steps of:
s1: the method comprises the steps of obtaining a sample data set from a preset database, wherein the sample data set is composed of q training samples, the training samples are GCC parameters, and q is a positive integer larger than 1.
In the embodiment of the present invention, the sample data set is directly obtained from the preset database, where the preset database is a database specially used for storing the sample data set.
It should be noted that the sample data set includes q training samples, the training samples are GCC parameters, each training sample has a corresponding classification feature, and the classification feature is mainly a disease category set by the user.
Further, the training sample is mainly GCC parameters obtained by OCT equipment scanning, and the GCC parameters are composed of data features corresponding to 5 GCC thicknesses, which are respectively: all Avg, Sup Avg, Inf Avg, FLV, GLV.
S2: and aiming at the training samples in the sample data set, performing decision tree construction by using a random forest algorithm to obtain a retina classification model.
In the embodiment of the present invention, a plurality of training samples are randomly extracted from the sample data set. Specifically, random sampling with replacement may be adopted: K rounds of extraction are repeatedly performed on the sample data set, and the result of each round of extraction is taken as one sub-training set, yielding K sub-training sets. The K sub-training sets are independent of each other, and repeated training samples may exist within a sub-training set.
It should be noted that the number of extracted training samples may be determined from historical experience, or an appropriate number of training samples may be extracted according to specific business needs and used as a sub-training set for machine model training. Although more training sample data generally gives a more accurate model, it also raises the training cost and makes implementation more difficult; the specific number can therefore be chosen according to the needs of the practical application and is not limited here.
Further, a random forest algorithm is used for constructing a decision tree, one decision tree is constructed for each sub-training set to obtain K decision trees, and then a random forest is constructed according to the K generated decision trees to obtain a retina classification model.
S3: and acquiring the GCC parameters to be identified, which are obtained by the user through OCT scanning, from a preset user library.
Specifically, the GCC parameters to be identified, which are obtained by scanning the user through the OCT equipment, are directly acquired from a preset user library, and after the GCC parameters to be identified are acquired, the GCC parameters to be identified are deleted from the preset user library. The preset user library is a database specially used for storing GCC parameters to be identified.
It should be noted that the GCC parameters to be identified include different parameters and identification information corresponding to the parameters, and the identification information mainly includes GCC thickness and non-GCC thickness.
S4: and performing feature extraction on the GCC parameter to be identified to obtain y data features, wherein y is a positive integer greater than 1.
In the embodiment of the invention, the identification information corresponding to each parameter in the GCC parameters to be identified is examined. If the identification information indicates a GCC thickness, the parameter corresponding to the identification information is extracted and each extracted parameter is used as a data feature, so that y data features are finally extracted; if the identification information indicates a non-GCC thickness, no processing is performed.
It should be noted that the GCC parameters to be identified may specifically include 9 non-GCC thicknesses and 5 GCC thicknesses, where the parameters corresponding to the 5 GCC thicknesses are All Avg, Sup Avg, Inf Avg, FLV, and GLV, respectively.
Furthermore, the type of the retina corresponding to the GCC parameter to be identified is determined by the 5 GCC thicknesses, and the type of the retina corresponding to the GCC parameter to be identified can also be determined by combining the results of the 9 non-GCC thicknesses and the 5 GCC thicknesses.
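As an illustration of the feature extraction in step S4, the following minimal Python sketch filters a parameter record by its identification information; the dictionary layout, the field names "identification" and "value", the string labels, the numeric values, and the "RNFL Avg" non-GCC entry are assumptions made purely for this example, not the patent's actual data schema.

```python
GCC_THICKNESS = "GCC thickness"  # identification info marking a GCC-thickness parameter (assumed label)

def extract_gcc_features(parameter_record):
    """Keep only the parameters whose identification info marks a GCC thickness."""
    features = []
    for name, entry in parameter_record.items():
        if entry["identification"] == GCC_THICKNESS:
            features.append((name, entry["value"]))
        # parameters marked as non-GCC thickness are left untouched (no processing)
    return features

# Example: the 5 GCC thicknesses plus an assumed non-GCC parameter; only the
# former are extracted, so y = 5 data features are obtained.
record = {
    "All Avg":  {"identification": "GCC thickness", "value": 94.2},
    "Sup Avg":  {"identification": "GCC thickness", "value": 95.1},
    "Inf Avg":  {"identification": "GCC thickness", "value": 93.3},
    "FLV":      {"identification": "GCC thickness", "value": 1.8},
    "GLV":      {"identification": "GCC thickness", "value": 4.6},
    "RNFL Avg": {"identification": "non-GCC thickness", "value": 101.0},
}
print(extract_gcc_features(record))
```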
It is emphasized that the data features may also be stored in a node of a blockchain in order to further ensure the privacy and security of the data features.
S5: and importing the y data characteristics into a retina classification model for classification, and outputting a classification result corresponding to the GCC parameter to be identified.
Specifically, the y data features are imported into a retina classification model, the retina classification model classifies the data features after receiving the data features, and classification features corresponding to the data features are output as classification results corresponding to the GCC parameters to be identified.
In this embodiment, decision trees are constructed from the acquired sample data set to obtain a retina classification model; the GCC parameters to be identified, obtained by the user through OCT scanning, are acquired; features are extracted from the GCC parameters to be identified to obtain data features; and finally the data features are imported into the retina classification model for classification to obtain the classification result corresponding to the GCC parameters to be identified. By constructing decision trees from the sample data set to obtain the retina classification model, the retina classification model can be trained with data features that follow the diagnosis logic of the target user, which improves the accuracy of recognition and classification by the retina classification model and guarantees the validity of the classification result, thereby improving the accuracy of the diagnosis the target user makes from the classification result and further improving the target user's working efficiency.
In an embodiment, the training samples include classification features, as shown in fig. 2, in step S2, that is, a random forest algorithm is used to construct a decision tree for the training samples in the sample data set, so as to obtain a retina classification model, including the following steps:
s21: and extracting training samples from the sample data set by using a random sampling mode, and constructing K sub-training sets, wherein K is a positive integer greater than 1.
In the embodiment of the invention, training samples are extracted from the sample data set by random sampling. Specifically, a resampling technique may be used, in which sampling with replacement is performed on the sample data set so that every training sample has an equal probability of being drawn in each round. K rounds of extraction are repeatedly performed on the sample data set, and the result of each round is taken as one sub-training set, yielding K sub-training sets, where the number of training samples in a sub-training set is less than or equal to the number of training samples in the sample data set.
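The following minimal Python sketch illustrates the resampling described above; the function name and the default choice of drawing as many samples per round as the sample data set contains are assumptions made for illustration only.

```python
import random

def build_sub_training_sets(samples, k, subset_size=None):
    """Draw K sub-training sets by sampling with replacement (resampling).

    Every training sample has an equal probability of being drawn in each
    round, a sample may appear more than once within a sub-training set, and
    the K sets are drawn independently of one another.
    """
    subset_size = subset_size or len(samples)
    return [[random.choice(samples) for _ in range(subset_size)] for _ in range(k)]

# Example: q = 6 training samples, K = 3 sub-training sets.
sample_data_set = ["sample1", "sample2", "sample3", "sample4", "sample5", "sample6"]
sub_training_sets = build_sub_training_sets(sample_data_set, k=3)
```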
S22: for each sub-training set, calculating the information entropy of each classification feature according to formula (1):
H(X) = -∑ p(x_i) log₂(p(x_i))    formula (1)
where X is a classification feature, H(X) is the information entropy of the classification feature, i = 1, 2, ..., n, x_i is the i-th classification feature, and p(x_i) is the probability of the feature value of the i-th classification feature.
S23: and (3) calculating the information gain of each classification characteristic according to the formula (2) according to the information entropy:
gain = H(c) - H(c|X)    formula (2)
Wherein, gain is the information gain of the classification characteristic, H (c) is the information entropy before splitting according to the classification characteristic X, and H (c | X) is the information entropy after splitting according to the classification characteristic X.
S24: calculating an information gain ratio of each classification feature according to formula (3) and formula (4) according to the information gain:
gr = gain / IntI    formula (3)
IntI = -∑ (W_X / D) log₂(W_X / D)    formula (4)
wherein IntI is the penalty factor of the classification feature, D is the total number of training samples in the sample data set, W_X is the number of training samples of the classification feature, and gr is the information gain ratio of the classification feature.
Specifically, the penalty factor corresponding to the classification feature is calculated using formula (4), and the information gain ratio of the classification feature is then calculated using formula (3); that is, the information gain ratio of the classification feature equals the information gain of the classification feature divided by the penalty factor of the classification feature.
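The following Python sketch illustrates how formulas (1) through (4) could be evaluated for one classification feature. Representing a split as a list of label groups, and taking the penalty factor IntI to be the split information of those groups, are assumptions of this sketch rather than details fixed by the description.

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum p(x_i) * log2(p(x_i)) over the label distribution (formula (1))."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, groups):
    """gain = H(c) - H(c|X): the entropy before splitting minus the weighted
    entropy of the groups produced by splitting on feature X (formula (2))."""
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups if g)
    return entropy(labels) - conditional

def gain_ratio(labels, groups):
    """gr = gain / IntI, where the penalty factor IntI is taken here as the
    split information -sum (W_X / D) * log2(W_X / D) (formulas (3) and (4))."""
    total = len(labels)
    inti = -sum(len(g) / total * math.log2(len(g) / total) for g in groups if g)
    return information_gain(labels, groups) / inti if inti else 0.0

# Example: splitting six labelled samples on one classification feature
# separates the two classes perfectly, so the gain ratio is 1.0.
labels = ["U", "U", "U", "R", "R", "R"]
groups = [["U", "U", "U"], ["R", "R", "R"]]
print(entropy(labels), information_gain(labels, groups), gain_ratio(labels, groups))
```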
S25: and selecting the classification characteristic corresponding to the maximum information gain ratio as a splitting node, taking the classification characteristics corresponding to other information gain ratios as nodes to be split, and splitting by adopting the splitting node.
In the embodiment of the invention, a C4.5 algorithm is used for constructing a decision tree, penalty factors of the classification features are obtained through calculation according to a formula (4), an information gain ratio of each classification feature is calculated according to a formula (3), the classification feature corresponding to the maximum information gain ratio is used as a splitting node, the classification features corresponding to other information gain ratios are used as nodes to be split, and the splitting node is adopted for splitting.
It should be noted that if splitting were performed using the information gain alone as the split criterion, decision tree construction would tend to select the classification feature with the larger information gain as the split node; however, when the training set contains many classification features and a classification feature takes many values, the information gain of that feature becomes large and the prediction accuracy of the trained decision tree is lower. Calculating the information gain ratio with the penalty factor of each classification feature and splitting on the classification feature with the largest information gain ratio effectively avoids the adverse effect that uniformly distributed attributes would otherwise have on decision tree splitting, and improves the quality of decision tree construction.
S26: and returning to the step S22 to continue executing aiming at the classification features corresponding to the nodes to be split until all the classification features are taken as the splitting nodes to finish splitting, and obtaining K decision trees.
In the embodiment of the present invention, for the classification features corresponding to the nodes to be split, the procedure returns to the step of calculating the information entropy of each classification feature for each sub-training set described in step S22 and continues to execute until all classification features have been used as split nodes to complete splitting into the branches of the decision tree; K decision trees are thus established recursively.
S27: and constructing a random forest according to the K decision trees to obtain a retina classification model.
Specifically, the K decision trees generated in steps S22 to S26 are combined into a random forest to obtain a retina classification model for evaluating the type to which the retina corresponding to the GCC parameters belongs.
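The description does not spell out how the K decision trees are combined at prediction time; the sketch below assumes the usual random-forest aggregation, a majority vote over the trees, with toy stand-in trees and made-up thresholds used purely for illustration.

```python
from collections import Counter

def random_forest_predict(decision_trees, data_features):
    """Aggregate the K decision trees: each tree votes a classification
    feature (disease category) and the majority vote is returned as the
    classification result."""
    votes = [tree(data_features) for tree in decision_trees]
    return Counter(votes).most_common(1)[0][0]

# Example with three toy stand-ins for trained decision trees.
trees = [
    lambda f: "U" if f["FLV"] < 5 else "R",
    lambda f: "U" if f["GLV"] < 6 else "R",
    lambda f: "R",
]
print(random_forest_predict(trees, {"FLV": 1.8, "GLV": 4.6}))  # -> "U"
```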
In this embodiment, training samples are extracted from the sample data set by random sampling with replacement, and several sub-training sets are constructed for machine model training, which increases the randomness of the data used for model training and improves the quality of data feature classification. For each sub-training set, the information gain ratio of each classification feature is calculated and the classification feature with the largest information gain ratio is selected as the split node for each split, until all classification features have served as split nodes and splitting is complete, yielding K decision trees; a random forest is then constructed from the generated decision trees to obtain the retina classification model. Using the maximum information gain ratio as the split criterion effectively avoids the adverse effect of uniformly distributed classification features on decision tree splitting and improves the quality of decision tree construction, while combining multiple decision trees into a random forest strengthens the classification and prediction capability of the machine model and improves the accuracy of the retina classification model. This in turn improves the accuracy of the diagnosis the target user makes from the classification result and further improves the target user's working efficiency.
In an embodiment, as shown in fig. 3, in S25, selecting the classification feature corresponding to the largest information gain ratio as a splitting node, and taking the classification features corresponding to other information gain ratios as nodes to be split, and splitting by using the splitting node includes the following steps:
s251: and selecting the classification characteristic corresponding to the maximum information gain ratio as a splitting node, and taking the classification characteristic corresponding to other information gain ratios as a node to be split.
Specifically, the classification feature corresponding to the largest information gain ratio is selected as a splitting node, and the classification features corresponding to other information gain ratios are used as nodes to be split.
S252: and calculating the Gini index of the split node by using a Gini index formula.
Specifically, the Gini index of the split node is calculated using formula (5):
G(p) = 1 - ∑ p_k²    formula (5)
wherein G(p) is the Gini index, e is the preset classification condition corresponding to the split node, and p_k is the proportion of samples belonging to the same class within a particular group.
S253: and comparing the Gini index with a preset index, and splitting according to a comparison result.
Specifically, the Gini index is compared with the preset index, the comparison result is matched against the description information in a preset rule base, and the set rule matching the description information is selected for splitting. The preset rule base is a database specially used for storing different pieces of description information and the set rules corresponding to them.
For example, suppose the comparison result is that the Gini index is less than or equal to the preset index. The preset rule base contains the description information "the Gini index is less than or equal to the preset index" with the corresponding set rule A, and the description information "the Gini index is greater than the preset index" with the corresponding set rule B; the comparison result is matched against the description information, and rule A is selected for splitting.
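A minimal sketch of this comparison under stated assumptions: the Gini index of a node's sample group is computed with formula (5) as reconstructed above, the rule-base lookup is reduced to a simple A/B choice, and the preset index value 0.2 is the example value mentioned later in the description.

```python
def gini_index(group_labels):
    """G(p) = 1 - sum p_k^2 over the class proportions within the group."""
    total = len(group_labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((group_labels.count(c) / total) ** 2 for c in set(group_labels))

PRESET_INDEX = 0.2  # example preset index value from the description

def select_set_rule(gini, preset_index=PRESET_INDEX):
    """Stand-in for the rule-base lookup: rule A when the Gini index is less
    than or equal to the preset index, rule B otherwise."""
    return "A" if gini <= preset_index else "B"

# Example: a split node groups 100 samples of which only half share one label.
labels = ["U"] * 50 + ["R"] * 50
g = gini_index(labels)
print(g, select_set_rule(g))  # 0.5 -> rule B, i.e. the node should be split further
```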
In this embodiment, the classification feature corresponding to the largest information gain ratio is selected as the split node, the classification features corresponding to the other information gain ratios are taken as nodes to be split, the Gini index corresponding to the split node is calculated using formula (5), and finally the Gini index is compared with the preset index and splitting is performed according to the comparison result. By incorporating the Gini index calculation, part of the decision trees can be split further once all the decision trees have been obtained, which improves the accuracy of the decision trees and further improves the accuracy of the subsequent retina classification model training.
In one embodiment, as shown in fig. 4, the step S253 of comparing the Gini index with a preset index and determining the decision tree according to the comparison result includes the following steps:
S2531: the Gini index is compared with a preset index.
Specifically, the Gini index is compared with the preset index.
S2532: if the Gini index is less than or equal to the preset index, no splitting is performed.
In the embodiment of the present invention, according to the comparison in step S2531, if the Gini index is less than or equal to the preset index, it indicates that the classification effect of the split node corresponding to the Gini index is good, and the split node is not split.
S2533: if the Gini index is greater than the preset index, the split node is split using a preset classification condition until a preset cut-off condition is reached, and splitting then stops.
In the embodiment of the present invention, according to the comparison in step S2531, if the Gini index is greater than the preset index, it indicates that the classification effect of the split node corresponding to the Gini index is poor; the split node is then split using the preset classification condition, and splitting is complete when the Gini index corresponding to every node obtained after each split is less than or equal to the preset index or the preset number of splits is reached.
The preset classification condition is a condition for classifying the sample data set according to the actual requirement of the user.
The preset index may be specifically 0.2, or may be set according to the actual requirement of the user, which is not limited herein.
The preset splitting times refer to the times for stopping splitting nodes from splitting set by a user.
For example, suppose a certain split node of the decision tree classifies a sample as U when FLV < 5 and as R otherwise. If 100 training samples are divided into U according to this split node and the labels of all 100 training samples are U, then p_k = 1 and the Gini index of the node is 0, indicating that the node classifies well, and FLV < 5 is kept as the split node of the decision tree. If 100 samples are divided into U according to the split node but only 50 of the 100 training samples are labelled U, then p_k = 0.5 and the Gini index of the node is greater than the preset index, indicating that the classification effect of the split node is poor, and the split node is split using a preset classification condition.
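The worked example can be reproduced with the following sketch, which re-implements the Gini index of a group of labels; the nested split on "GLV < 6", the feature dictionaries, and the cut-off of three splits are assumptions chosen purely for illustration.

```python
def gini_of(labels):
    """Gini index 1 - sum p_k^2 of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels)) if total else 0.0

def split_node(samples, condition, preset_index=0.2, max_splits=3, depth=0):
    """Keep a node whose Gini index is already <= the preset index (S2532);
    otherwise split it with the preset classification condition and recurse
    until every leaf passes the check or the preset number of splits is
    reached (S2533)."""
    labels = [label for _, label in samples]
    if gini_of(labels) <= preset_index or depth >= max_splits or len(samples) < 2:
        return [samples]                      # classification effect is good enough
    left = [s for s in samples if condition(s[0])]
    right = [s for s in samples if not condition(s[0])]
    if not left or not right:                 # the condition no longer separates the samples
        return [samples]
    return (split_node(left, condition, preset_index, max_splits, depth + 1)
            + split_node(right, condition, preset_index, max_splits, depth + 1))

# The worked example above: 100 samples reach the U branch but only 50 carry
# label U, so the Gini index (0.5) exceeds the preset index and the node is
# split again, here on an assumed secondary condition GLV < 6.
samples = [({"FLV": 4.0, "GLV": 3.0}, "U")] * 50 + [({"FLV": 4.5, "GLV": 8.0}, "R")] * 50
leaves = split_node(samples, condition=lambda feats: feats["GLV"] < 6)
print([gini_of([label for _, label in leaf]) for leaf in leaves])  # [0.0, 0.0]
```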
In this embodiment, the Gini index is compared with the preset index: no splitting is performed when the Gini index is less than or equal to the preset index, and when the Gini index is greater than the preset index the split node is split using a preset classification condition until a preset cut-off condition is reached and splitting stops. Deciding under the different comparison results whether the split node needs further splitting effectively avoids inaccurate splits caused by calculation errors, improves the accuracy of the splitting process, and further guarantees the accuracy of the subsequent retina classification model training.
In one embodiment, as shown in fig. 5, after S26, the OCT-based retina classification method further includes the following steps:
s6: and sequencing the kini indexes corresponding to all the decision trees from small to large to obtain a sequencing result.
In the embodiment of the invention, the Gini indexes corresponding to all the decision trees are sorted from small to large, that is, the smallest Gini index is placed first and the largest Gini index is placed last, so that the corresponding sorting result is obtained.
S7: and selecting the first-order a-order and the second-order b-order kini indexes from the ordering result, and respectively performing weight calculation to obtain target kini indexes, wherein a and b are positive integers larger than 1.
In the embodiment of the present invention, the first a positions refer to the 1st through a-th positions of the sorting result obtained in step S6, and the last b positions refer to the last through b-th-from-last positions of the sorting result obtained in step S6.
Specifically, according to the sorting result obtained in step S6, the Gini indexes in the first a positions are selected as first Gini indexes and the Gini indexes in the last b positions are selected as second Gini indexes; each first Gini index is doubled according to a preset first weight, and the doubled result is taken as a target Gini index; each second Gini index is halved according to a preset second weight, and the halved result is taken as a target Gini index.
It should be noted that doubling the Gini indexes in the first a positions and halving the Gini indexes in the last b positions can improve classification accuracy, where a and b are both positive integers greater than 1 and a and b are the same; their specific values can be set according to the actual needs of the user and are not limited here.
When a Gini index belongs to the first a positions, the calculated Gini index is doubled, that is, the Gini index is multiplied by the preset first weight; for example, if the original Gini index is 1 and the preset first weight is 2, the doubled Gini index is 2. In other words, a classification error on an important feature is more costly, and such errors need to be reduced.
When a Gini index belongs to the last b positions, the calculated Gini index is halved, that is, the Gini index is multiplied by the preset second weight; for example, if the original Gini index is 1 and the preset second weight is 0.5, the halved Gini index is 0.5. In other words, a classification error on an unimportant feature is less costly, and such errors need not be a concern.
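A minimal sketch of steps S6 and S7 under these assumptions: each decision tree is associated with one Gini index, the first weight is 2 (doubling) and the second weight is 0.5 (halving) as in the examples above, a = b = 2, and the Gini values themselves are made up.

```python
def target_gini_indexes(gini_by_tree, a, b, first_weight=2.0, second_weight=0.5):
    """Sort the per-tree Gini indexes from small to large, multiply the first
    a of them by the preset first weight (doubling) and the last b of them by
    the preset second weight (halving), and return the target Gini indexes."""
    ranked = sorted(gini_by_tree.items(), key=lambda item: item[1])
    targets = {}
    for tree_id, g in ranked[:a]:
        targets[tree_id] = g * first_weight   # important features: errors cost more
    for tree_id, g in ranked[-b:]:
        targets[tree_id] = g * second_weight  # unimportant features: errors cost less
    return targets

# Example with K = 6 decision trees and a = b = 2.
gini_by_tree = {"t1": 0.05, "t2": 0.10, "t3": 0.22, "t4": 0.30, "t5": 0.45, "t6": 0.60}
print(target_gini_indexes(gini_by_tree, a=2, b=2))
# {'t1': 0.1, 't2': 0.2, 't5': 0.225, 't6': 0.3}
```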
S8: and splitting the decision trees corresponding to the first-order a-order and second-order b-order kini indexes according to the target kini indexes to obtain split decision trees.
Specifically, the target Gini index obtained in step S7 is compared with the preset index; if the target Gini index is less than or equal to the preset index, the decision tree corresponding to the target Gini index is obtained as it is. If the target Gini index is greater than the preset index, the decision tree corresponding to the target Gini index is split using a preset classification condition until the target Gini index corresponding to each decision tree is less than or equal to the preset index or the preset target number of splits is reached; splitting is then complete, and the split decision tree is obtained.
In the embodiment of the invention, the Gini indexes corresponding to all the decision trees are sorted from small to large to obtain a sorting result; the Gini indexes in the first a positions and the last b positions of the sorting result are selected and weight calculations are performed on them respectively to obtain target Gini indexes; and finally the decision trees corresponding to the Gini indexes in the first a positions and the last b positions are split according to the target Gini indexes to obtain the split decision trees. By calculating the target Gini indexes and using them to split the decision trees, the classification features are further optimized: the analysis and calculation of important classification features is increased while that of unimportant classification features is reduced, which improves the accuracy of the decision trees and further guarantees the accuracy of the subsequent retina classification model training.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In one embodiment, an OCT image-based retina classification device is provided, which corresponds to the OCT image-based retina classification method in the above embodiments one to one. As shown in fig. 6, the retina classification apparatus based on OCT images includes a first acquisition module 61, a construction module 62, a second acquisition module 63, a feature extraction module 64, and a classification module 65. The functional modules are explained in detail as follows:
the first obtaining module 61 is configured to obtain a sample data set from a preset database, where the sample data set is composed of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1;
the building module 62 is configured to perform decision tree building by using a random forest algorithm for training samples in the sample data set to obtain a retina classification model;
a second obtaining module 63, configured to obtain, from a preset user library, a to-be-identified GCC parameter obtained by the user through OCT scanning;
the feature extraction module 64 is configured to perform feature extraction on the GCC parameter to be identified to obtain y data features, where y is a positive integer greater than 1, and it needs to be emphasized that, in order to further ensure the privacy and security of the data features, the data features may also be stored in a node of a block chain;
and the classification module 65 is configured to import the y data features into the retina classification model for classification, and output a classification result corresponding to the GCC parameter to be identified.
Further, the building module 62 includes:
the sub-training set constructing sub-module is used for extracting training samples from the sample data set in a random sampling mode and constructing K sub-training sets, wherein K is a positive integer greater than 1;
and the information entropy calculation submodule is used for calculating the information entropy of each classification characteristic according to a formula (1) for each sub-training set:
H(X) = -∑ p(x_i) log₂(p(x_i))    formula (1)
where X is a classification feature, H(X) is the information entropy of the classification feature, i = 1, 2, ..., n, x_i is the i-th classification feature, and p(x_i) is the probability of the feature value of the i-th classification feature;
and the information gain calculation sub-module is used for calculating the information gain of each classification characteristic according to the information entropy and the formula (2):
gain = H(c) - H(c|X)    formula (2)
Wherein, gain is the information gain of the classification characteristic, H (c) is the information entropy before splitting according to the classification characteristic X, and H (c | X) is the information entropy after splitting according to the classification characteristic X;
and the information gain ratio calculation sub-module is used for calculating the information gain ratio of each classification characteristic according to the formula (3) and the formula (4) according to the information gain:
gr = gain / IntI    formula (3)
IntI = -∑ (W_X / D) log₂(W_X / D)    formula (4)
wherein IntI is the penalty factor of the classification feature, D is the total number of training samples in the sample data set, W_X is the number of training samples of the classification feature, and gr is the information gain ratio of the classification feature;
the splitting node selecting submodule is used for selecting the classification characteristic corresponding to the maximum information gain ratio as a splitting node, taking the classification characteristic corresponding to other information gain ratios as a node to be split, and splitting by adopting the splitting node;
the decision tree generation submodule is used for returning to the step S22 to continue executing aiming at the classification features corresponding to the nodes to be split until all the classification features are taken as the splitting nodes to finish splitting, and K decision trees are obtained;
and the retina classification model construction submodule is used for constructing a random forest according to the K decision trees to obtain a retina classification model.
Further, the splitting node selection submodule comprises:
the splitting node determining unit is used for selecting the classification characteristic corresponding to the maximum information gain ratio as a splitting node and taking the classification characteristic corresponding to other information gain ratios as a node to be split;
the system comprises a Kini index calculation unit, a node splitting unit and a node splitting unit, wherein the Kini index calculation unit is used for calculating the Kini index of a split node by utilizing a Kini index formula;
and the splitting unit is used for comparing the Gini index with a preset index and splitting according to a comparison result.
Further, the splitting unit includes:
the comparison subunit is used for comparing the Gini index with a preset index;
the first comparison subunit is used for not splitting if the Gini index is less than or equal to a preset index;
and the second comparison subunit is used for splitting the split node by using a preset classification condition if the Gini index is greater than the preset index, and stopping splitting until a preset cut-off condition is reached.
Further, the retina classification device based on the OCT image also comprises:
the sorting module is used for sorting the Gini indexes corresponding to all the decision trees from small to large to obtain a sorting result;
the weight calculation module is used for selecting the Gini indexes in the first a positions and the last b positions of the sorting result and performing weight calculations on them respectively to obtain target Gini indexes, wherein a and b are positive integers greater than 1;
and the secondary splitting module is used for splitting the decision trees corresponding to the Gini indexes in the first a positions and the last b positions according to the target Gini indexes to obtain the split decision trees.
Some embodiments of the present application disclose a computer device. Referring specifically to fig. 7, a basic structure block diagram of a computer device 90 according to an embodiment of the present application is shown.
As illustrated in fig. 7, the computer device 90 includes a memory 91, a processor 92, and a network interface 93 communicatively connected to each other through a system bus. It is noted that only a computer device 90 having components 91-93 is shown in FIG. 7, but it is understood that not all of the illustrated components are required to be implemented, and that more or fewer components may alternatively be implemented. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and the hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 91 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the storage 91 may be an internal storage unit of the computer device 90, such as a hard disk or a memory of the computer device 90. In other embodiments, the memory 91 may also be an external storage device of the computer device 90, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 90. Of course, the memory 91 may also include both internal and external memory units of the computer device 90. In this embodiment, the memory 91 is generally used for storing an operating system installed in the computer device 90 and various types of application software, such as program codes of the retina classification method based on OCT images. Further, the memory 91 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 92 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 92 is typically used to control the overall operation of the computer device 90. In this embodiment, the processor 92 is configured to run the program code stored in the memory 91 or process data, for example, run the program code of the retina classification method based on OCT images.
The network interface 93 may include a wireless network interface or a wired network interface, and the network interface 93 is generally used to establish a communication connection between the computer device 90 and other electronic devices.
The present application further provides another embodiment, namely a computer-readable storage medium storing a computer program executable by at least one processor to cause the at least one processor to perform the steps of the OCT image-based retina classification method described above.
It is emphasized that the data features may also be stored in a node of a blockchain in order to further ensure the privacy and security of the data features.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a computer device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Finally, it should be noted that the above-mentioned embodiments illustrate only some of the embodiments of the present application, and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments. This application is capable of embodiments in many different forms and is provided for the purpose of enabling a thorough understanding of the disclosure of the application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that the present application may be practiced without modification or with equivalents of some of the features described in the foregoing embodiments. All equivalent structures made by using the contents of the specification and the drawings of the present application are directly or indirectly applied to other related technical fields and are within the protection scope of the present application.

Claims (10)

1. An OCT-based retinal classification method, comprising:
acquiring a sample data set from a preset database, wherein the sample data set consists of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1;
aiming at the training samples in the sample data set, a decision tree is constructed by using a random forest algorithm to obtain a retina classification model;
acquiring GCC parameters to be identified, which are obtained by a user through OCT scanning, from a preset user library;
performing feature extraction on the GCC parameter to be identified to obtain y data features, wherein y is a positive integer greater than 1;
and importing the y data characteristics into the retina classification model for classification, and outputting a classification result corresponding to the GCC parameter to be identified.
2. The OCT-based retinal classification method of claim 1, wherein the training samples include the classification features, and wherein the step of performing decision tree construction using a random forest algorithm on the training samples in the sample data set to obtain a retinal classification model comprises:
extracting the training samples from the sample data set in a random sampling mode to construct K sub-training sets, wherein K is a positive integer greater than 1;
for each sub-training set, calculating the information entropy of each classification feature according to the following formula:
H(X) = -∑ p(x_i) log₂(p(x_i))
wherein X is the classification feature, H(X) is the information entropy of the classification feature, i = 1, 2, ..., n, x_i is the i-th classification feature, and p(x_i) is the probability of the feature value of the i-th classification feature;
and calculating the information gain of each classification characteristic according to the information entropy and the following formula:
gain=H(c)-H(c|X)
wherein, gain is the information gain of the classification characteristic, H (c) is the information entropy before splitting according to the classification characteristic X, and H (c | X) is the information entropy after splitting according to the classification characteristic X;
and calculating the information gain ratio of each classification characteristic according to the following formula according to the information gain:
gr = gain / IntI
IntI = -∑ (W_X / D) log₂(W_X / D)
wherein IntI is a penalty factor of the classification feature, D is the total number of training samples in the sample data set, W_X is the number of training samples of the classification feature, and gr is the information gain ratio of the classification feature;
selecting the classification characteristic corresponding to the maximum information gain ratio as a splitting node, taking the classification characteristics corresponding to other information gain ratios as nodes to be split, and splitting by adopting the splitting node;
returning the classification features corresponding to the nodes to be split to each sub-training set, and continuously executing the step of calculating the information entropy of each classification feature according to the following formula until all the classification features are used as the splitting nodes to complete splitting to obtain K decision trees;
and constructing a random forest according to the K decision trees to obtain a retina classification model.
3. The OCT-based retina classification method of claim 2, wherein the step of selecting the classification feature corresponding to the largest information gain ratio as a splitting node, taking the classification features corresponding to other information gain ratios as nodes to be split, and splitting by using the splitting node comprises:
selecting the classification characteristic corresponding to the maximum information gain ratio as a splitting node, and taking the classification characteristics corresponding to other information gain ratios as nodes to be split;
calculating a Gini index of the split node by using a Gini index formula;
and comparing the Gini index with a preset index, and splitting according to a comparison result.
4. The OCT-based retinal classification method of claim 3, wherein the step of comparing the Gini index with a preset index and determining the decision tree according to the comparison result comprises:
comparing the Gini index with a preset index;
if the Gini index is less than or equal to a preset index, splitting is not performed;
and if the Gini index is larger than a preset index, splitting the split node by using a preset classification condition until a preset cut-off condition is reached, and stopping splitting.
5. The OCT-based retinal classification method according to claim 2, wherein after the step of returning the classification features corresponding to the nodes to be split to each sub-training set and continuing to execute the step of calculating the information entropy of each classification feature according to the following formula until all the classification features have been used as the split nodes to complete splitting and K decision trees are obtained, the OCT-based retinal classification method further comprises:
sorting the Gini indexes corresponding to all the decision trees from small to large to obtain a sorting result;
selecting the Gini indexes in the first a positions and the last b positions of the sorting result and performing weight calculations on them respectively to obtain target Gini indexes, wherein a and b are positive integers greater than 1;
splitting the decision trees corresponding to the Gini indexes in the first a positions and the last b positions according to the target Gini indexes to obtain split decision trees.
6. An OCT image-based retina classification device, comprising:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a sample data set from a preset database, the sample data set consists of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1;
the construction module is used for constructing a decision tree by using a random forest algorithm aiming at the training samples in the sample data set to obtain a retina classification model;
the second acquisition module is used for acquiring the GCC parameters to be identified, which are obtained by the user through OCT scanning, from a preset user library;
the characteristic extraction module is used for carrying out characteristic extraction on the GCC parameter to be identified to obtain y data characteristics, wherein y is a positive integer greater than 1;
and the classification module is used for importing the y data characteristics into the retina classification model for classification and outputting a classification result corresponding to the GCC parameter to be identified.
7. The OCT image-based retina classification device of claim 6, wherein the construction module comprises:
the sub-training set constructing sub-module is used for extracting the training samples from the sample data set in a random sampling mode to construct K sub-training sets, wherein K is a positive integer greater than 1;
an information entropy calculation sub-module, configured to calculate, for each of the sub-training sets, an information entropy of each of the classification features according to the following formula:
H(X) = -∑ p(x_i) log2(p(x_i))
wherein X is the classification feature, H(X) is the information entropy of the classification feature, i = 1, 2, ..., n, x_i is the i-th classification feature, and p(x_i) is the probability of the feature value of the i-th classification feature;
and the information gain calculation sub-module is used for calculating the information gain of each classification characteristic according to the information entropy and the following formula:
gain = H(c) - H(c|X)
wherein gain is the information gain of the classification feature, H(c) is the information entropy before splitting according to the classification feature X, and H(c|X) is the information entropy after splitting according to the classification feature X;
an information gain ratio calculation sub-module, configured to calculate, according to the information gain, an information gain ratio of each of the classification features according to the following formula:
IntI = -∑ (W_X / D) log2(W_X / D)
gr = gain / IntI
wherein IntI is the penalty factor of the classification feature, D is the total number of training samples in the sample data set, W_X is the number of training samples of the classification feature, and gr is the information gain ratio of the classification feature;
the splitting node selection submodule is used for selecting the classification feature corresponding to the maximum information gain ratio as a splitting node, taking the classification features corresponding to the other information gain ratios as nodes to be split, and splitting by using the splitting node;
the decision tree generation submodule is used for returning the classification features corresponding to the nodes to be split to each of the sub-training sets and continuing to execute the step of calculating the information entropy of each classification feature according to the above formula, until all the classification features have been used as splitting nodes to complete splitting, so as to obtain K decision trees;
and the retina classification model construction submodule is used for constructing a random forest according to the K decision trees to obtain a retina classification model.
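The formulas recited in claim 7 (information entropy, information gain, penalty factor, and information gain ratio) translate directly into code. The sketch below implements them together with the bootstrap construction of the K sub-training sets and the selection of the splitting node. Sampling with replacement, the dictionary-based sample layout, and the example feature names are assumptions made for illustration, and the penalty factor is implemented as the standard C4.5 split information, which is the reading most consistent with the symbols defined in the claim.

```python
import math
import random
from collections import Counter

def information_entropy(values):
    """H(X) = -sum(p(x_i) * log2(p(x_i)))."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def conditional_entropy(feature_values, labels):
    """H(c|X): label entropy weighted by each feature value's share of samples."""
    total = len(labels)
    h = 0.0
    for value in set(feature_values):
        subset = [l for v, l in zip(feature_values, labels) if v == value]
        h += len(subset) / total * information_entropy(subset)
    return h

def information_gain(feature_values, labels):
    """gain = H(c) - H(c|X)."""
    return information_entropy(labels) - conditional_entropy(feature_values, labels)

def penalty_factor(feature_values):
    """IntI: split information that penalises many-valued classification features."""
    return information_entropy(feature_values)

def gain_ratio(feature_values, labels):
    """gr = gain / IntI (0 when the penalty factor vanishes)."""
    inti = penalty_factor(feature_values)
    return 0.0 if inti == 0.0 else information_gain(feature_values, labels) / inti

def build_sub_training_sets(sample_data_set, k, seed=0):
    """K bootstrap sub-training sets; sampling with replacement is an assumption."""
    rng = random.Random(seed)
    q = len(sample_data_set)
    return [[rng.choice(sample_data_set) for _ in range(q)] for _ in range(k)]

def select_splitting_node(samples, feature_names, label_key="label"):
    """Pick the classification feature with the maximum information gain ratio."""
    labels = [s[label_key] for s in samples]
    ratios = {f: gain_ratio([s[f] for s in samples], labels) for f in feature_names}
    splitting_node = max(ratios, key=ratios.get)
    nodes_to_split = [f for f in feature_names if f != splitting_node]
    return splitting_node, nodes_to_split

# Example with hypothetical GCC-derived classification features.
samples = [
    {"gcc_superior": "thin",   "gcc_inferior": "thin",   "label": "abnormal"},
    {"gcc_superior": "normal", "gcc_inferior": "thin",   "label": "abnormal"},
    {"gcc_superior": "normal", "gcc_inferior": "normal", "label": "normal"},
    {"gcc_superior": "thick",  "gcc_inferior": "normal", "label": "normal"},
]
for sub_set in build_sub_training_sets(samples, k=3):
    print(select_splitting_node(sub_set, ["gcc_superior", "gcc_inferior"]))
```

As in the retina classification model construction submodule, a random forest would then be obtained by growing one such tree per sub-training set and aggregating the K trees' outputs, typically by majority vote.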
8. The OCT image-based retina classification device of claim 6, wherein the splitting node selection submodule comprises:
the splitting node determining unit is used for selecting the classification feature corresponding to the maximum information gain ratio as a splitting node and taking the classification features corresponding to the other information gain ratios as nodes to be split;
the Gini index calculation unit is used for calculating the Gini index of the splitting node by using a Gini index formula;
and the splitting unit is used for comparing the Gini index with a preset index and splitting according to the comparison result.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the OCT-based retina classification method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the OCT-based retina classification method according to any one of claims 1 to 5.
CN202010475698.8A 2020-05-29 2020-05-29 Retina classification method and device based on OCT, computer equipment and storage medium Pending CN111783830A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010475698.8A CN111783830A (en) 2020-05-29 2020-05-29 Retina classification method and device based on OCT, computer equipment and storage medium
PCT/CN2020/099518 WO2021120587A1 (en) 2020-05-29 2020-06-30 Method and apparatus for retina classification based on oct, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010475698.8A CN111783830A (en) 2020-05-29 2020-05-29 Retina classification method and device based on OCT, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111783830A (en) 2020-10-16

Family

ID=72754073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010475698.8A Pending CN111783830A (en) 2020-05-29 2020-05-29 Retina classification method and device based on OCT, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111783830A (en)
WO (1) WO2021120587A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677751A (en) * 2022-05-26 2022-06-28 深圳市中文路教育科技有限公司 Learning state monitoring method, monitoring device and storage medium
CN116910669A (en) * 2023-09-13 2023-10-20 深圳市智慧城市科技发展集团有限公司 Data classification method, device, electronic equipment and readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642599A (en) * 2021-06-28 2021-11-12 中国铁道科学研究院集团有限公司 Income prediction method, transportation system and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363226A (en) * 2019-06-21 2019-10-22 平安科技(深圳)有限公司 Ophthalmology disease classifying identification method, device and medium based on random forest

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665159A (en) * 2018-05-09 2018-10-16 深圳壹账通智能科技有限公司 A kind of methods of risk assessment, device, terminal device and storage medium
CN110717524A (en) * 2019-09-20 2020-01-21 浙江工业大学 Method for predicting thermal comfort of old people

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RYO ASAOKA et al.: "Validating the Usefulness of the "Random Forests" Classifier to Diagnose Early Glaucoma With Optical Coherence Tomography", American Journal of Ophthalmology, 9 November 2016 (2016-11-09), pages 2 *

Also Published As

Publication number Publication date
WO2021120587A1 (en) 2021-06-24

Similar Documents

Publication Publication Date Title
CN110276369B (en) Feature selection method, device and equipment based on machine learning and storage medium
CN107423613B (en) Method and device for determining device fingerprint according to similarity and server
CN111783875A (en) Abnormal user detection method, device, equipment and medium based on cluster analysis
CN110827924B (en) Clustering method and device for gene expression data, computer equipment and storage medium
WO2021179630A1 (en) Complications risk prediction system, method, apparatus, and device, and medium
CN112017789B (en) Triage data processing method, triage data processing device, triage data processing equipment and triage data processing medium
CN111695593A (en) XGboost-based data classification method and device, computer equipment and storage medium
CN109918498B (en) Problem warehousing method and device
CN113705092B (en) Disease prediction method and device based on machine learning
CN115577858B (en) Block chain-based carbon emission prediction method and device and electronic equipment
CN111783830A (en) Retina classification method and device based on OCT, computer equipment and storage medium
CN110969172A (en) Text classification method and related equipment
CN113570391B (en) Community division method, device, equipment and storage medium based on artificial intelligence
WO2017001885A2 (en) Method of generating a model of an object
CN116168403A (en) Medical data classification model training method, classification method, device and related medium
CN115600926A (en) Post-project evaluation method and device, electronic device and storage medium
CN115147020A (en) Decoration data processing method, device, equipment and storage medium
WO2021114626A1 (en) Method for detecting quality of medical record data and related device
CN113448876A (en) Service testing method, device, computer equipment and storage medium
CN109308565B (en) Crowd performance grade identification method and device, storage medium and computer equipment
CN110457393B (en) Information sharing method and related product
CN113704697A (en) Medical data missing processing method, device and equipment based on multiple regression model
CN110475258A (en) A kind of reliability estimation method and system of base station
CN117174285A (en) Second diagnosis and treatment opinion generation system and method based on evidence-based medicine
CN109905340B (en) Feature optimization function selection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination