CN109117956B - Method for determining optimal feature subset - Google Patents

Method for determining optimal feature subset

Info

Publication number
CN109117956B
Authority
CN
China
Prior art keywords
feature
subset
feature subset
features
samples
Prior art date
Legal status
Active
Application number
CN201810732008.5A
Other languages
Chinese (zh)
Other versions
CN109117956A (en)
Inventor
杨玲波
黄敬峰
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201810732008.5A priority Critical patent/CN109117956B/en
Publication of CN109117956A publication Critical patent/CN109117956A/en
Application granted granted Critical
Publication of CN109117956B publication Critical patent/CN109117956B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines


Abstract

The invention discloses a method for determining an optimal feature subset, comprising the following steps: acquiring a high-resolution image and applying preprocessing and object-oriented segmentation to obtain a ground-object dataset; calculating the features of each ground object, including shape, index, spectral, and texture features; selecting samples, comprising training and test samples, from the original ground-object dataset; using the training samples to compute the importance of each feature with machine learning methods such as random forest, gradient boosting decision tree, and support vector machine together with cross validation, and screening the features with an improved enhanced recursive feature elimination method to obtain a classification accuracy score for the feature subset at each feature count; and determining, for each method, the optimal feature subset for classification by the highest-score principle, discarding the remaining features as redundant. The method is simple, fast, and accurate.

Description

Method for determining optimal feature subset
Technical Field
The invention relates to the technical field of optimal classification feature subset acquisition, and in particular to a method for determining an optimal feature subset.
Background
Feature screening is the process of eliminating redundant features from an original feature set to obtain an optimal feature subset that is effective for classification; it reduces classification time and can improve classification accuracy. Feature subsets are usually evaluated against predefined criteria such as classification accuracy or class separability. Feature screening is an important step in machine learning: too many features can reduce classification accuracy and increase classification time, a phenomenon known as the curse of dimensionality (Pacifici et al. 2009). Feature screening approaches fall into three classes: filter, wrapper, and embedded (Weston et al. 2003). Filter methods evaluate feature subsets independently of any classifier, while embedded and wrapper methods couple feature screening with the classifier. In embedded methods, feature screening is part of the learning algorithm and is bound to a specific machine learning method; wrapper methods wrap a specific learning algorithm to evaluate candidate feature subsets, minimize the error of the classification result, and finally build a classifier.
Recursive feature elimination (RFE) is a widely used feature screening technique: it evaluates and ranks the importance of each feature with a trained model, step by step removes the least important features from the feature set, and evaluates the performance of each feature subset by cross validation to obtain an optimal feature set (Guyon 2001). Because RFE is an embedded method, the feature subsets it selects tend to achieve higher classification accuracy. However, a feature of low individual importance may have a strong effect on classification accuracy in combination with other features, so screening purely by importance ranking can degrade the performance of the best feature subset (Chen and Jeong 2007). To address this, Chen and Jeong (2007) proposed enhanced recursive feature elimination (EnRFE), which improves the performance of the best feature subset by searching among the lower-importance features for one whose removal improves classification accuracy. The method still has two shortcomings: it is inefficient, and when no accuracy-improving feature is found it simply removes the least important feature, which can also sharply degrade the performance of the remaining subset.
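For reference, the standard RFE loop described above is available off the shelf; a minimal scikit-learn sketch, assuming a random-forest ranker, synthetic data, and a target of 5 features chosen purely for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for an object feature set: 200 samples, 20 features,
# of which only 5 are informative.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# RFE ranks features with the estimator's importances and removes the
# least important one per step until 5 remain.
selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
               n_features_to_select=5, step=1).fit(X, y)

selected = np.flatnonzero(selector.support_)  # indices of the kept features
```

This is the baseline method the patent improves on: plain RFE removes strictly by importance rank, with no search over alternative removals.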
To address these two problems, the invention improves the EnRFE method in both respects, raising the efficiency of feature screening and the performance of the selected optimal feature subset, and builds on this method a complete technical workflow from image preprocessing and feature calculation through feature screening to image classification.
Disclosure of Invention
The invention aims to provide a simple, fast, and accurate method for determining an optimal feature subset, suited to screening large numbers of features and eliminating redundant ones in machine learning. The method is based on an improved enhanced recursive feature elimination method and improves the efficiency of feature screening by limiting the feature search depth and strengthening the parallel computing capability of the search algorithm.
A method for determining an optimal subset of features, comprising the steps of:
step 1, acquiring a high-resolution image, preprocessing and object-oriented segmentation to obtain a surface feature object data set;
step 2, calculating the shape class characteristics, the index class characteristics, the spectrum class characteristics and the texture class characteristics of each object in the surface feature object data set obtained in the step 1 to serve as an initial characteristic set;
step 3, selecting samples from the surface feature object data set obtained in the step 1 to obtain training samples and test samples;
step 4, inputting the training samples obtained in step 3 into a random forest, gradient boosting decision tree, or support vector machine method, calculating the importance of each feature in the initial feature set of step 2, and sorting the features by importance from low to high to obtain a sorted feature set;
step 5, removing the first feature (the least important) from the sorted feature set to obtain a first feature subset and evaluating its score by cross validation; removing the second feature (the second least important) instead to obtain a second feature subset and evaluating its score by cross validation; and so on up to the kth feature subset, each scored by cross validation; then screening out the feature subset with the highest score from the first through the kth feature subsets;
step 6, inputting the training sample obtained in the step 3 into a random forest method, a gradient boosting decision tree method or a support vector machine method, calculating the importance of each type of features in the feature subset with the highest score screened in the step 5, sorting the features according to the importance from low to high to obtain a new sorted feature set, repeating the step 5, and screening out a new feature subset with the highest score;
step 7, repeating step 6 and recording the score of the highest-scoring feature subset at each iteration, until the feature subset is an empty set;
and step 8, selecting the feature subset with the highest score as the optimal feature subset from the scores of the feature subsets at the different feature counts obtained in step 7.
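Steps 4-8 can be sketched in outline with scikit-learn-style estimators that expose `feature_importances_`; the function name `improved_enrfe` and the toy data below are illustrative assumptions, not the patented implementation:

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def improved_enrfe(estimator, X, y, k=7, cv=3):
    """Steps 4-8 in outline: at each iteration, among the k least
    important features, drop the one whose removal gives the best
    cross-validation score; repeat until one feature remains."""
    features = list(range(X.shape[1]))
    history = []                                   # (score, subset) per iteration
    while len(features) > 1:
        est = clone(estimator).fit(X[:, features], y)
        order = np.argsort(est.feature_importances_)   # ascending importance
        best_score, best_subset = -np.inf, None
        for pos in order[: min(k, len(features))]:     # candidate removals
            trial = [f for j, f in enumerate(features) if j != pos]
            score = cross_val_score(clone(estimator), X[:, trial], y,
                                    cv=cv).mean()
            if score > best_score:
                best_score, best_subset = score, trial
        features = best_subset
        history.append((best_score, list(features)))
    return max(history)                            # highest-scoring (score, subset)

# Toy data: only the first two of six features are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
score, subset = improved_enrfe(
    RandomForestClassifier(n_estimators=20, random_state=0), X, y, k=3)
```

The key difference from plain RFE is in the inner loop: instead of always dropping the single least important feature, the k least important are each tried, and the cross-validation score decides which removal is kept.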
In step 1, the preprocessing comprises geometric correction, radiometric calibration, and atmospheric correction.
In step 2, the shape features include length, area, and similar measures; the index features include the Modified Normalized Difference Water Index (MNDWI), the Normalized Difference Vegetation Index (NDVI), the Enhanced Vegetation Index (EVI), and the like; the spectral features include the mean and variance of the spectrum in each band; and the texture features include textures based on the gray-level co-occurrence matrix.
In step 3, the training samples make up 60%-80% and the test samples 20%-40% of the combined total of training and test samples, selected by stratified random sampling. More preferably, the training samples are 70% and the test samples 30% of the total, again selected by stratified random sampling. The samples, comprising training and test samples, are selected from the ground-object dataset obtained in step 1 by methods such as visual interpretation and ground survey.
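The preferred 70/30 stratified draw can be reproduced with scikit-learn; the class counts below are those of the embodiment, and the feature matrix is a stand-in:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the embodiment's class counts: winter wheat, rape, chive, other.
y = np.repeat([0, 1, 2, 3], [649, 230, 176, 970])
X = np.arange(len(y)).reshape(-1, 1)          # stand-in object features

# stratify=y keeps each class's share at ~70% train / ~30% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
```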
In step 5, k is the feature search depth; its value can be set manually according to the situation and must not exceed the total number of features in the initial feature set. The improved enhanced recursive feature elimination method strengthens the algorithm's concurrent search capability by limiting the search depth k, and changes the basis of feature selection from importance alone to the highest cross-validation score, improving the classification capability of the resulting optimal feature subset. The maximum search depth must balance search accuracy and efficiency; it is set equal to the number of CPU cores of the computer but not less than 4. The value of k may be 4-15; further preferably the maximum search depth is 5-10, i.e., k is 5-10; most preferably, k is 7.
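The concurrency enabled by the search-depth limit can be sketched with joblib (which ships with scikit-learn): the k candidate subsets of one iteration are independent, so they can be scored in parallel worker processes. All names and data here are illustrative:

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 8))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
est = RandomForestClassifier(n_estimators=15, random_state=0)

k = 4                                            # search depth
order = np.argsort(est.fit(X, y).feature_importances_)
# One candidate subset per least-important feature considered for removal.
candidates = [[f for f in range(X.shape[1]) if f != drop]
              for drop in order[:k]]

def score_subset(subset):
    return cross_val_score(clone(est), X[:, subset], y, cv=3).mean()

# The k candidates are independent, so score them in parallel.
scores = Parallel(n_jobs=2)(delayed(score_subset)(s) for s in candidates)
best = candidates[int(np.argmax(scores))]
```

With k bounded by the CPU core count, every candidate of an iteration can be evaluated simultaneously, which is the efficiency gain the text describes.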
In step 8, after the optimal feature subset is obtained, the original ground-object dataset is classified, based on that subset, with methods such as random forest, gradient boosting decision tree, and support vector machine, and classification accuracy is evaluated with the test samples.
Compared with the prior art, the invention has the following advantages:
the invention relates to an optimal feature subset determination method based on an improved enhanced feature recursive screening method, which reduces the feature screening time and improves the performance of the optimal feature subset, thereby improving the classification precision of a machine learning method. The method is simple, rapid and accurate, the efficiency of feature screening is improved by limiting the depth of feature search and improving the parallel computing capability of a search algorithm, and on the other hand, the evaluation basis of feature selection is modified from the importance level to the cross validation score level, so that the performance of the optimal feature subset is improved.
Drawings
FIG. 1 is a flow chart of an optimal feature subset determination method based on an improved enhanced feature recursive screening method according to the present invention;
FIG. 2 is a diagram of the geographic location and raw image of a test area;
FIG. 3 is a distribution diagram of various types of ground feature samples in a test area;
FIG. 4 shows the results of the enhanced recursive feature elimination method based on the RF, GBDT, and SVM models.
Fig. 5 is a result of identifying regional crops based on the best feature subset obtained by screening, wherein fig. 5(a) is an identification result of the RF method, fig. 5(b) is an identification result of the GBDT method, fig. 5(c) is an identification result of the SVM method, fig. 5(d) is an enlargement of a result of the rape planting area, and fig. 5(e) is an enlargement of a result of the chive planting area.
Detailed Description
The invention is further illustrated with reference to the figures and examples.
As shown in FIG. 1, the flowchart of the optimal feature subset determination method based on the improved enhanced recursive feature elimination method of the present invention proceeds as follows: first, geometric correction, radiometric calibration, and atmospheric correction are applied to the acquired high-resolution satellite imagery; second, the image of the study area is segmented into ground objects by multi-scale segmentation, and these objects serve as the basic units for classification and recognition; then a subset of the ground objects is extracted as samples by visual interpretation and similar means and divided into training and test samples; next, four broad classes of features (spectral, texture, shape, and index) are calculated for each object; since these features are numerous and highly redundant, feature screening is needed to obtain the optimal feature subset. Based on the improved enhanced recursive feature elimination method, the optimal feature subset of each model is computed from the training data for the RF (Random Forest), GBDT (Gradient Boosting Decision Tree), and SVM (Support Vector Machine) models respectively. Finally, with the optimal feature subset obtained, all objects are classified and recognized with the RF, GBDT, and SVM methods, and recognition accuracy is evaluated with the test samples.
An optimal feature subset determination method based on an improved enhanced feature recursive screening method comprises the following steps:
a, acquiring a high-resolution image, preprocessing and carrying out object-oriented segmentation to obtain a ground feature object data set;
Specifically, the acquired high-resolution remote sensing imagery should be cloud-free, clear-sky imagery in which the different ground objects can be clearly distinguished. After acquisition, the imagery must be preprocessed, chiefly by geometric correction, radiometric calibration, and atmospheric correction. Geometric correction can be performed by collecting control points on the ground or selecting control points on another high-resolution base map (such as Google Earth), picking the corresponding tie points on the image to be corrected, and applying polynomial fine correction. Radiometric calibration uses the calibration coefficients of the corresponding satellite; atmospheric correction uses an atmospheric radiative transfer model such as 6S to obtain a surface reflectance image. Multi-scale segmentation of the corrected imagery then yields the ground objects used as the basic units for classification. The test area (shown in FIGS. 2 and 3) used 5 scenes in total, comprising data from three satellites: Sentinel-2A, Landsat-8, and GF-1 WFV. FIG. 2 shows the geographic location and raw imagery of the test area; FIG. 3 shows the distribution of the ground-object samples in the test area.
B, calculating various characteristics of the ground object, including shape, index, spectrum, texture and the like, as an initial characteristic set;
specifically, the number of shape features is 12, which are area, length, width, compact, density, asymmetry, roundness, insulatic, rectangle, main direction, circle index, shape index, and shape index.
The texture calculation first requires a principal component transform of each scene to obtain the first principal component band, which carries the most information; textures are then computed on this band. Each scene has 8 texture features: GLCM (Gray-Level Co-occurrence Matrix) homogeneity, GLCM contrast, GLCM dissimilarity, GLCM entropy, GLCM angular 2nd moment, GLCM mean, GLCM standard deviation, and GLCM correlation. The 5 scenes thus yield 40 texture features in total.
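To make the texture measures concrete, here is a minimal NumPy sketch of a normalized GLCM for one horizontal offset and a few of the eight statistics listed; the 8-level quantization and function name are assumptions, and production code would typically use a library such as scikit-image:

```python
import numpy as np

def glcm_features(img, levels=8):
    """Normalized gray-level co-occurrence matrix for offset (0, 1),
    plus a few of the standard GLCM statistics."""
    scaled = img.astype(float) / (img.max() + 1e-12) * levels
    q = np.minimum(scaled.astype(int), levels - 1)    # quantized gray levels
    P = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        P[a, b] += 1                                  # count horizontal pairs
    P /= P.sum()
    i, j = np.indices(P.shape)
    nz = P[P > 0]
    return {
        "contrast":    float((P * (i - j) ** 2).sum()),
        "homogeneity": float((P / (1.0 + (i - j) ** 2)).sum()),
        "asm":         float((P ** 2).sum()),          # angular 2nd moment
        "entropy":     float(-(nz * np.log(nz)).sum()),
        "mean":        float((P * i).sum()),
    }
```

A perfectly uniform image gives zero contrast and entropy and homogeneity of 1, which is a quick sanity check for any GLCM implementation.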
For the spectral features, the mean and variance of the object spectrum are calculated for every band of the 5 scenes: 2 Sentinel-2A MSI scenes with 10 bands each, 2 Landsat-8 OLI scenes with 7 bands each, and 1 GF-1 WFV scene with 4 bands. The imagery thus has 38 bands in total, giving 76 spectral features.
The index features include the Normalized Difference Vegetation Index (NDVI), the Enhanced Vegetation Index (EVI), the Land Surface Water Index (LSWI), and the Modified Normalized Difference Water Index (MNDWI). NDVI (Rouse et al. 1974) is one of the most widely used vegetation indexes, with broad application in remote sensing monitoring of crop extraction, crop growth, yield, and the like (Fuller 1998; Wardlow et al. 2007). EVI (Huete et al. 1994) addresses the tendency of NDVI to saturate at high vegetation density: by decoupling the vegetation canopy signal from atmospheric impedance, it enhances the vegetation information in remote sensing imagery and improves the sensitivity and detection capability of the vegetation index in densely vegetated areas (Huete et al. 2002). LSWI is more sensitive to changes in vegetation canopy moisture content and less susceptible to atmospheric effects than NDVI (Gao 1996; Jurgens 1997). MNDWI (Xu 2006) effectively distinguishes water bodies, vegetation, and built-up areas (Mansaray et al. 2017). The calculation formulas of the indices are given in formulas 1-4, where NIR denotes the near-infrared band reflectance, Red the red band reflectance, SWIR the short-wave infrared reflectance, Blue the blue band reflectance, and Green the green band reflectance. Since Sentinel-2A has two short-wave infrared bands, the average of the two SWIR bands is substituted into the formulas when the LSWI and MNDWI indices are calculated from Sentinel-2A imagery. Since the GF-1 WFV imagery has no short-wave infrared band, only NDVI and EVI are calculated for it. In total, 18 index features are obtained.
NDVI = (NIR - Red) / (NIR + Red)    (1)
EVI = 2.5 × (NIR - Red) / (NIR + 6 × Red - 7.5 × Blue + 1)    (2)
LSWI = (NIR - SWIR) / (NIR + SWIR)    (3)
MNDWI = (Green - SWIR) / (Green + SWIR)    (4)
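The four index formulas translate directly into band math; a sketch with NumPy reflectance arrays, using the band abbreviations defined in the text (the function name is illustrative):

```python
import numpy as np

def spectral_indices(nir, red, green, blue, swir):
    """Formulas 1-4: NDVI, EVI, LSWI, MNDWI from reflectance bands."""
    nir, red, green, blue, swir = map(np.asarray, (nir, red, green, blue, swir))
    ndvi = (nir - red) / (nir + red)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    lswi = (nir - swir) / (nir + swir)
    mndwi = (green - swir) / (green + swir)
    return ndvi, evi, lswi, mndwi
```

Because the arithmetic broadcasts, the same function works per object (scalar mean reflectances) or per pixel (full band arrays).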
Step C: samples, comprising training and test samples, are selected from the original ground-object dataset by methods such as visual interpretation and ground survey.
Specifically, 2025 objects were randomly selected from the multi-scale segmentation objects as sample data by visual interpretation: 649 winter wheat objects, 230 rape objects, 176 chive objects, and 970 other objects. The other objects are mainly ground-object types such as buildings, water bodies, wasteland, roads, forest, and greenhouses; the sample distribution is shown in FIG. 3. By stratified random sampling, 70% of the winter wheat, rape, chive, and other sample objects (1418 samples) were drawn as training samples to participate in feature screening and machine learning model training, and the remaining 30% (607 samples) were kept as test samples for assessing the accuracy of the final classification result.
Step D: using the training samples, the importance of each feature is calculated with machine learning methods such as random forest, gradient boosting decision tree, or support vector machine together with cross validation, and the features are screened with the improved enhanced recursive feature elimination method to obtain the classification accuracy score of the feature subset at each feature count.
Specifically, the enhanced recursive feature elimination (EnRFE) technique is adopted and improved, and the improved EnRFE method is used for feature screening as follows:
(a) inputting the training samples into a random forest, gradient boosting decision tree, or support vector machine method, calculating the importance of each feature in the initial feature set, and sorting the features by importance from low to high to obtain a sorted feature set;
(b) removing the first feature (the least important) from the sorted feature set to obtain a first feature subset and evaluating its score by cross validation; removing the second feature (the second least important) instead to obtain a second feature subset and evaluating its score by cross validation; and so on up to the kth feature subset, each scored by cross validation; then screening out the feature subset with the highest score from the first through the kth feature subsets;
k is the feature search depth and can be set manually according to the situation; in this embodiment the feature search depth is limited and the maximum search depth is set to 7;
(c) inputting the training samples into a random forest, gradient boosting decision tree, or support vector machine method, calculating the importance of each feature in the highest-scoring feature subset screened in step (b), sorting the features by importance from low to high to obtain a new sorted feature set, repeating step (b), and screening out a new highest-scoring feature subset;
(d) repeating step (c) and recording the score of the highest-scoring feature subset at each iteration, until the feature subset is an empty set;
and E, selecting the feature subset with the highest score as the optimal feature subset according to the obtained score conditions of the feature subsets with different feature quantities. According to the principle of highest score, determining the optimal feature subset of each classification method, and removing the residual features as redundant features;
in particular, the improved EnRFE method is used for optimal feature subset screening. The relationship between the feature quantity and the cross validation accuracy of the RF, GBDT and SVM models is shown in FIG. 4, and FIG. 4 shows the result of the enhanced feature recursive screening method based on the RF, GBDT and SVM models. From fig. 4, it can be seen that the cross validation accuracy of the three classification methods shows the characteristic of rapid increase and slow decrease as the number of features increases. When the number of the features is small (less than 10), the classification precision of the three methods is rapidly increased along with the increase of the number of the selected features; when the number of the features is 10-20, the verification precision slowly rises; when the number of the features reaches 20-40, the verification accuracy of the three methods reaches the highest point, and the variation amplitude is small; when the number of features is gradually increased, the cross-validation accuracy of all 3 methods shows a trend of decreasing. The GBDT method has the advantages that the descending amplitude is the minimum, and the GBDT method has better robustness for characteristic redundancy; the accuracy of the RF method then shows a slow but significant downward trend; the accuracy of the SVM method is greatly reduced, particularly in the process that the number of the features is increased from 50 to 70, the accuracy is sharply reduced from 0.87 to 0.83, after the number of the features is more than 70, the overall accuracy is not obviously reduced, but the accuracy stability is low, the amplitude is large, the SVM method is easily influenced by redundant features, and the robustness is relatively low. The highest accuracy of the cross-validation of the GBDT and RF methods is close, both around 0.90, while the accuracy of the SVM method is relatively lower, around 0.88. 
Finally, 30 features are selected as the optimal feature subset by the highest-score principle.
Step F: based on the obtained optimal feature subset, the original ground-object dataset is classified with methods such as random forest, gradient boosting decision tree, and support vector machine, and classification accuracy is evaluated with the test samples.
Specifically, the training sample sets are used to train the RF, GBDT, and SVM classification models respectively, and the trained models are used to classify the ground objects of the study area, yielding the spatial distribution of winter wheat, oilseed rape, and green onion crops; the results are shown in FIG. 5. FIG. 5 presents the regional crop recognition results based on the optimal feature subset obtained by screening: FIG. 5(a) is the recognition result of the RF method, FIG. 5(b) of the GBDT method, and FIG. 5(c) of the SVM method; FIG. 5(d) is an enlargement of the rape planting area and FIG. 5(e) of the chive planting area. As FIG. 5 shows, the crop recognition results of the three classification methods are substantially similar.
The crop-extraction accuracy of each classification method was verified with the test sample set. The results show that the GBDT method combined with the optimal feature subset obtained by the improved enhanced recursive feature elimination method achieves the highest overall classification accuracy, with an OA (overall accuracy) of 92.5% and a kappa coefficient of 0.882; the RF method follows, with an overall accuracy of 91.7% and a kappa coefficient of 0.867; the SVM method is relatively the lowest, with an OA of 90.5% and a kappa coefficient of 0.853.
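The two reported measures, OA and the kappa coefficient, can be computed from test-set predictions with scikit-learn; the labels below are toy values for illustration, not the embodiment's results:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Toy ground truth and predictions standing in for the test-sample results.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 2, 1]

oa = accuracy_score(y_true, y_pred)          # overall accuracy (OA)
kappa = cohen_kappa_score(y_true, y_pred)    # chance-corrected agreement
```

Kappa discounts the agreement expected by chance from the class marginals, which is why it is reported alongside OA for imbalanced land-cover samples.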

Claims (3)

1. A method for determining an optimal subset of features, comprising the steps of:
step 1, acquiring a high-resolution image, preprocessing and object-oriented segmentation to obtain a surface feature object data set;
step 2, calculating the shape class characteristics, the index class characteristics, the spectrum class characteristics and the texture class characteristics of each object in the surface feature object data set obtained in the step 1 to serve as an initial characteristic set;
step 3, selecting samples from the surface feature object data set obtained in the step 1 to obtain training samples and test samples;
step 4, inputting the training sample obtained in the step 3 into a random forest method, a gradient boosting decision tree method or a support vector machine method, calculating the importance of each type of features in the initial feature set in the step 2, and sequencing the features from low to high according to the importance to obtain a sequenced feature set;
step 5, removing the first feature in the sorted feature set to obtain a first feature subset, evaluating the score of the feature subset by using a cross validation method, removing the second feature in the sorted feature set to obtain a second feature subset, evaluating the score of the feature subset by using a cross validation method, and repeating the steps to obtain the kth feature subset, and evaluating the score of the feature subset by using the cross validation method; screening out the feature subset with the highest score from the first feature subset, the second feature subset to the kth feature subset;
step 6, inputting the training sample obtained in the step 3 into a random forest method, a gradient boosting decision tree method or a support vector machine method, calculating the importance of each type of features in the feature subset with the highest score screened in the step 5, sorting the features according to the importance from low to high to obtain a new sorted feature set, repeating the step 5, and screening out a new feature subset with the highest score;
step 7, repeating the step 6, and recording the score of the feature subset with the highest score in each iteration until the feature subset is an empty set;
and 8, selecting the feature subset with the highest score as the optimal feature subset according to the score conditions of the feature subsets with different feature quantities obtained in the step 7.
2. The method for determining the optimal subset of features of claim 1, wherein in step 1, the preprocessing comprises: geometric correction, radiometric calibration and atmospheric correction.
3. The method for determining the optimal feature subset of claim 1, wherein in step 3, the training samples account for 60% to 80% of the combined total of training and testing samples, and the testing samples account for 20% to 40% of that total.
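The elimination loop of steps 4 through 8 can be sketched in Python as below. This is a minimal, hedged rendition: the function names (`select_optimal_subset`, `importance_fn`, `score_fn`) and the toy importance/scoring rules are illustrative assumptions, not the patent's own models — in the claimed method, importance would come from a random forest, gradient boosting decision tree, or support vector machine, and the score from cross validation on the training samples.

```python
def select_optimal_subset(features, importance_fn, score_fn):
    """Return the highest-scoring feature subset found by iterative elimination.

    Sketch of steps 4-8: rank features by importance, try removing each one,
    keep the best-scoring candidate subset, and repeat until the set is empty.
    """
    best_subset, best_score = list(features), score_fn(features)
    current = list(features)
    while current:
        # Steps 4/6: rank the current features by importance, low to high.
        ranked = sorted(current, key=lambda f: importance_fn(f, current))
        # Step 5: form k candidate subsets, each omitting one feature, and
        # score every candidate (score_fn stands in for cross validation).
        candidates = [ranked[:i] + ranked[i + 1:] for i in range(len(ranked))]
        iter_score, iter_subset = max(
            ((score_fn(s), s) for s in candidates), key=lambda t: t[0])
        # Steps 7/8: record the best subset seen across all iterations.
        if iter_score > best_score:
            best_score, best_subset = iter_score, iter_subset
        current = iter_subset  # shrink by one feature and iterate until empty
    return best_subset, best_score


# Toy stand-ins (assumptions for illustration only): features 0 and 2 are
# informative, and each irrelevant feature costs a small penalty, mimicking
# a cross-validated accuracy score.
INFORMATIVE = {0, 2}

def toy_importance(f, subset):
    return 1.0 if f in INFORMATIVE else 0.0

def toy_score(subset):
    return len(set(subset) & INFORMATIVE) - 0.1 * len(set(subset) - INFORMATIVE)

best, score = select_optimal_subset([0, 1, 2, 3], toy_importance, toy_score)
print(best, score)  # prints: [0, 2] 2.0 -- the informative features survive
```

Because the best subset over *all* iterations is recorded (step 8), the procedure can return a larger subset than the final iteration produces, which is what distinguishes it from simply eliminating features until a fixed count is reached.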
CN201810732008.5A 2018-07-05 2018-07-05 Method for determining optimal feature subset Active CN109117956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810732008.5A CN109117956B (en) 2018-07-05 2018-07-05 Method for determining optimal feature subset

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810732008.5A CN109117956B (en) 2018-07-05 2018-07-05 Method for determining optimal feature subset

Publications (2)

Publication Number Publication Date
CN109117956A CN109117956A (en) 2019-01-01
CN109117956B true CN109117956B (en) 2021-08-24

Family

ID=64823008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810732008.5A Active CN109117956B (en) 2018-07-05 2018-07-05 Method for determining optimal feature subset

Country Status (1)

Country Link
CN (1) CN109117956B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11151706B2 (en) * 2019-01-16 2021-10-19 Applied Material Israel, Ltd. Method of classifying defects in a semiconductor specimen and system thereof
CN110852475B (en) * 2019-09-24 2020-10-23 广州地理研究所 Extreme gradient lifting algorithm-based vegetation index prediction method, system and equipment
CN110880014B (en) * 2019-10-11 2023-09-05 中国平安财产保险股份有限公司 Data processing method, device, computer equipment and storage medium
CN111028383B (en) * 2019-11-08 2023-03-24 腾讯科技(深圳)有限公司 Vehicle driving data processing method and device
CN111476170A (en) * 2020-04-09 2020-07-31 首都师范大学 Remote sensing image semantic segmentation method combining deep learning and random forest
CN112245728B (en) * 2020-06-03 2022-11-29 北京化工大学 Respirator false positive alarm signal identification method and system based on integrated tree
CN113139578B (en) * 2021-03-23 2022-12-06 广东省科学院智能制造研究所 Deep learning image classification method and system based on optimal training set
CN113413163B (en) * 2021-08-24 2021-11-19 山东大学 Heart sound diagnosis system for mixed deep learning and low-difference forest
CN115399791B (en) * 2022-06-28 2024-06-14 天津大学 Method and system for evaluating functions of lower limbs of stroke based on myoelectric motion multi-data fusion
CN115759446A (en) * 2022-11-25 2023-03-07 南方电网数字电网研究院有限公司 Machine learning feature selection method for new energy high-precision prediction
CN116453000A (en) * 2023-04-21 2023-07-18 成都理工大学 Farmland weed identification method based on visible light image and improved random forest algorithm
CN117079059B (en) * 2023-10-13 2023-12-19 云南师范大学 Tree species automatic classification method based on multi-source satellite image

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260437A (en) * 2015-09-30 2016-01-20 陈一飞 Text classification feature selection method and application thereof to biomedical text classification
CN105279520A (en) * 2015-09-25 2016-01-27 天津师范大学 Optimal character subclass selecting method based on classification ability structure vector complementation
CN105469098A (en) * 2015-11-20 2016-04-06 中北大学 Precise LiDAR data ground object classification method based on adaptive characteristic weight synthesis
CN105574363A (en) * 2015-12-14 2016-05-11 大连理工大学 Feature selection method based on SVM-RFE (Support Vector Machine-Recursive Feature Elimination) and overlapping degree
CN106897821A (en) * 2017-01-24 2017-06-27 中国电力科学研究院 A kind of transient state assesses feature selection approach and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107045503B (en) * 2016-02-05 2019-03-05 华为技术有限公司 A kind of method and device that feature set determines

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105279520A (en) * 2015-09-25 2016-01-27 天津师范大学 Optimal character subclass selecting method based on classification ability structure vector complementation
CN105260437A (en) * 2015-09-30 2016-01-20 陈一飞 Text classification feature selection method and application thereof to biomedical text classification
CN105469098A (en) * 2015-11-20 2016-04-06 中北大学 Precise LiDAR data ground object classification method based on adaptive characteristic weight synthesis
CN105574363A (en) * 2015-12-14 2016-05-11 大连理工大学 Feature selection method based on SVM-RFE (Support Vector Machine-Recursive Feature Elimination) and overlapping degree
CN106897821A (en) * 2017-01-24 2017-06-27 中国电力科学研究院 A kind of transient state assesses feature selection approach and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Enhanced recursive feature elimination; Xue-wen Chen et al.; Sixth International Conference on Machine Learning and Applications (ICMLA 2007); 2008-02-25; pp. 429-435 *
The optimal feature subset selection problem; Chen Bin et al.; Chinese Journal of Computers (《计算机学报》); 1997-02-28; Vol. 20, No. 2; pp. 133-138 *
Research on feature selection algorithms in machine learning; Jiang Baining; China Masters' Theses Full-text Database (Information Science and Technology); 2009-11-15; p. I140-20 *

Also Published As

Publication number Publication date
CN109117956A (en) 2019-01-01

Similar Documents

Publication Publication Date Title
CN109117956B (en) Method for determining optimal feature subset
CN109857889B (en) Image retrieval method, device and equipment and readable storage medium
Nandi et al. A machine vision-based maturity prediction system for sorting of harvested mangoes
CN112541921B (en) Urban green land vegetation information data accurate determination method
CN108280396B (en) Hyperspectral image classification method based on depth multi-feature active migration network
CN109146889A (en) A kind of field boundary extracting method based on high-resolution remote sensing image
CN104182767B (en) The hyperspectral image classification method that Active Learning and neighborhood information are combined
CN113936214B (en) Karst wetland vegetation community classification method based on fusion of aerospace remote sensing images
US6990410B2 (en) Cloud cover assessment: VNIR-SWIR
CN109815357A (en) A kind of remote sensing image retrieval method based on Nonlinear Dimension Reduction and rarefaction representation
CN112861810B (en) Artificial forest planting time automatic detection method based on time sequence remote sensing observation data
CN114266961A (en) Method for integrating, learning and classifying marsh vegetation stacks by integrating hyperspectral and multiband fully-polarized SAR images
CN116310510A (en) Hyperspectral image classification method based on small sample deep learning
CN113723254A (en) Method, device, equipment and storage medium for identifying moso bamboo forest distribution
Jónsson RGB and Multispectral UAV image classification of agricultural fields using a machine learning algorithm
CN114022782B (en) Sea fog detection method based on MODIS satellite data
CN116912578A (en) Crop classification method, system and electronic equipment
Bortolotti et al. A computer vision system for in-field quality evaluation: Preliminary results on peach fruit
CN111882573B (en) Cultivated land block extraction method and system based on high-resolution image data
CN111751295A (en) Modeling method and application of wheat powdery mildew severity detection model based on imaging hyperspectral data
CN113111794B (en) High-resolution annual city green space remote sensing information extraction method for pattern spots
McCann et al. Novel histogram based unsupervised classification technique to determine natural classes from biophysically relevant fit parameters to hyperspectral data
CN112991425B (en) Water area water level extraction method and system and storage medium
CN112651295A (en) Urban green land tree identification system and method
CN112949607A (en) Wetland vegetation feature optimization and fusion method based on JM Relief F

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant