CN115048986B - Ground surface freezing and thawing state classification method based on multi-classifier dynamic pruning selection - Google Patents

Ground surface freezing and thawing state classification method based on multi-classifier dynamic pruning selection

Info

Publication number
CN115048986B
Authority
CN
China
Prior art keywords
model
hyper
classifier
training
freeze
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210552737.9A
Other languages
Chinese (zh)
Other versions
CN115048986A (en)
Inventor
张珂
李曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU
Priority to CN202210552737.9A
Publication of CN115048986A
Application granted
Publication of CN115048986B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N22/00 Investigating or analysing materials by the use of microwaves or radio waves, i.e. electromagnetic waves with a wavelength of one millimetre or more
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Electromagnetism (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Geophysics And Detection Of Objects (AREA)
  • Investigating Or Analyzing Materials Using Thermal Means (AREA)

Abstract

The invention discloses a ground surface freeze-thaw state classification method based on multi-classifier dynamic pruning selection, comprising the following steps: preprocessing the surface temperature data of regional observation sites and the desert grid point samples from regional land use data to serve as labels of the training samples; collecting brightness temperature data at different frequencies and combining them to construct the characteristic indexes of the training samples; training base classifiers of different classes with the processed sample labels and characteristic indexes as model input, while applying Bayesian optimization to the hyper-parameters of the base classifiers; performing dynamic pruning and dynamic selection on the different base classifiers based on the dynamic pruning selection framework to determine an optimal prediction model; and classifying the surface freeze-thaw state with the determined optimal prediction model. By dynamically pruning and selecting among different machine learning models, the invention classifies the surface state rapidly and accurately according to the characteristics of different regions.

Description

Ground surface freezing and thawing state classification method based on multi-classifier dynamic pruning selection
Technical Field
The invention belongs to the technical field of remote sensing, and particularly relates to a method for classifying the surface freeze-thaw state in China based on multi-classifier dynamic pruning selection, mainly used for efficiently and accurately determining the surface freeze-thaw state across the whole of China.
Background
The surface freeze-thaw (F/T) state, as part of the cryosphere, is one of the most important terrestrial physical processes. Its spatiotemporal changes have a significant impact on hydrological, climatic and ecosystem processes. Each year, about 5000 km² of land surface is affected by F/T variation, mainly at high latitudes in the northern hemisphere. The permafrost area of China ranks third in the world and accounts for 22.3 percent of China's total land area. The freeze-thaw cycle is widely distributed and has strong spatiotemporal dynamics. It is also closely related to hydrological and ecological processes and to climate change, affecting the surface energy balance, hydrological processes and the release of greenhouse gases from soil.
Algorithms for monitoring the surface freeze-thaw state by passive microwave remote sensing are developed from the specific microwave radiation characteristics of frozen soil and the surface characteristics of the study area. Microwaves are relatively little affected by the atmosphere, can observe around the clock, have long wavelengths, and penetrate the surface to some depth, so they can retrieve information within a certain depth below ground. Because the dielectric properties of frozen and thawed soils differ significantly, microwaves are very sensitive to the surface freeze-thaw state. Passive microwave remote sensing has high temporal resolution, so the daily surface freeze-thaw state can be monitored over large areas and long periods. Although many algorithms now exist for monitoring the surface freeze-thaw state by microwave remote sensing, most consider only the surface state at the time of the descending satellite overpass. In the transition seasons of spring and autumn, near-surface freeze-thaw cycles may occur within a single day, and these diurnal cycles are more sensitive to climate change, which existing research often ignores. In addition, deserts show scattering characteristics similar to those of frozen ground and are easily mistaken for frozen soil; such interference should be excluded when classifying the freeze-thaw state of large, complex surfaces. At present, there is no method that comprehensively uses multiple machine learning models to improve the accuracy and reliability of surface freeze-thaw classification based on satellite remote sensing data.
Disclosure of Invention
The invention aims to provide a ground surface freezing and thawing state classification method based on multi-classifier dynamic pruning selection, which is used for predicting the ground surface freezing and thawing state.
In order to achieve the purpose, the invention adopts the following technical scheme:
the surface freeze-thaw state classification method based on the dynamic pruning selection framework is characterized by comprising the following steps:
step 1, preprocessing the surface temperature data of regional observation sites and the desert grid point samples from regional land use data to serve as labels of the training samples;
step 2, collecting brightness temperature data at different frequencies from the regional passive microwave radiometer, and constructing 6 characteristic indexes for the surface freeze-thaw state training samples;
step 3, training base classifiers of different classes with the processed sample labels and characteristic indexes as model input, while applying Bayesian optimization to the hyper-parameters of the base classifiers;
step 4, performing dynamic pruning and dynamic selection on the different base classifiers based on the dynamic pruning selection framework to determine an optimal prediction model;
step 5, classifying the surface freeze-thaw state with the optimal prediction model determined in step 4.
The step 1 comprises the following steps:
step 11, determining the labels of the training samples by using the minimum 0 cm ground surface temperature in 2009 from 2398 meteorological stations as the basis for judging the surface freeze-thaw state: when the minimum ground surface temperature $T_g \le 0$ °C, the near-surface soil is considered frozen; conversely, when $T_g > 0$ °C, the near-surface soil is considered thawed;
and step 12, randomly extracting desert grid points as training samples using land use data from the China land use status remote sensing monitoring database, and setting the labels of all observation sites that fall within desert areas of the land use data to desert, so as to eliminate the influence of deserts on the freeze-thaw state judgment.
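The labeling rule of steps 11 and 12 can be illustrated with a minimal Python sketch. The function name `label_sample`, the string labels and the `in_desert` flag are hypothetical; the sketch only assumes that a desert mask derived from the land use data takes precedence over the temperature threshold, as step 12 describes.

```python
FROZEN, THAWED, DESERT = "frozen", "thawed", "desert"


def label_sample(min_surface_temp_c: float, in_desert: bool) -> str:
    """Return the training label for one site or grid point.

    min_surface_temp_c: minimum 0 cm ground surface temperature T_g in deg C.
    in_desert: True if the land use data places the point in a desert area
               (hypothetical flag; the patent derives it from land use data).
    """
    if in_desert:
        # Step 12: every sample inside a desert area is labeled desert.
        return DESERT
    # Step 11: T_g <= 0 deg C means frozen, T_g > 0 deg C means thawed.
    return FROZEN if min_surface_temp_c <= 0.0 else THAWED
```

A sample with $T_g = 0$ °C is labeled frozen, because the threshold in step 11 is inclusive.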
The step 2 comprises the following steps:
step 21, collecting brightness temperature data at different frequencies from the passive microwave radiometer over China, the brightness temperature data comprising surface microwave brightness temperatures at 19.35 GHz, 22.2 GHz, 37.0 GHz and 85.5 GHz;
and step 22, extracting, from the brightness temperature data of China, the 6 classification characteristic indexes corresponding to all the labels in step 1: the 37 GHz and 22 GHz vertical polarization brightness temperatures, the 19 GHz polarization difference PD, the scattering index SI, the spectral gradient SG, and the difference D between 22 GHz and 37 GHz. The 37 GHz and 22 GHz vertical polarization brightness temperatures have proved to be good indexes for distinguishing the freeze-thaw and desert states. The polarization difference PD, the scattering index SI, the spectral gradient SG, and the difference D between 22 GHz and 37 GHz are calculated as follows:
$PD = T_{B19V} - T_{B19H}$
$F = 450.2 - 0.506 \times T_{B19V} - 1.874 \times T_{B22V} + 0.00637 \times T_{B22V}^{2}$
$SI = F - T_{B85V}$
[Equation image BDA0003651188880000031: definition of the spectral gradient SG]
$D = T_{B22V} - T_{B37V}$
in the formulas, $T_{B19V}$ is the 19 GHz vertical polarization brightness temperature; $T_{B19H}$ is the 19 GHz horizontal polarization brightness temperature; $T_{B22V}$ is the 22 GHz vertical polarization brightness temperature; $T_{B37V}$ is the 37 GHz vertical polarization brightness temperature; $T_{B85V}$ is the 85 GHz vertical polarization brightness temperature; PD is the polarization difference of the 19 GHz brightness temperature; F is the estimated 85 GHz vertical polarization brightness temperature in the absence of scattering; SI denotes the degree of deviation of the actual value of $T_{B85V}$ caused by scattering; SG is the spectral gradient between the 19 GHz and 37 GHz brightness temperatures; D is the difference between the 22 GHz and 37 GHz vertical polarization brightness temperatures.
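As an illustration of step 22, the following Python sketch computes the indexes whose formulas are given in the text. The function name `freeze_thaw_indices` and the dictionary keys are hypothetical. The spectral gradient SG is deliberately left out, because its formula appears only as an equation image and is not reproduced here.

```python
def freeze_thaw_indices(tb19v, tb19h, tb22v, tb37v, tb85v):
    """Compute the characteristic indexes from brightness temperatures (K).

    Only the indexes with formulas stated in the text are computed;
    SG is omitted because its definition is not available as text.
    """
    # Polarization difference at 19 GHz.
    pd_19 = tb19v - tb19h
    # Estimated 85 GHz vertical brightness temperature without scattering.
    f_85 = 450.2 - 0.506 * tb19v - 1.874 * tb22v + 0.00637 * tb22v ** 2
    # Scattering index: deviation of the measured 85 GHz value from F.
    si = f_85 - tb85v
    # Difference between the 22 GHz and 37 GHz vertical channels.
    d = tb22v - tb37v
    return {"TB37V": tb37v, "TB22V": tb22v, "PD": pd_19, "SI": si, "D": d}
```

Returning the two raw vertical polarization channels alongside the derived indexes keeps the feature vector in one place, which matches the way the text lists them together as classification features.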
The step 3 comprises the following steps:
step 31, randomly extracting 70% of the data as training data Y by stratified sampling to generate the base classifier set: from a plurality of classifiers, the three models that performed best in testing, with the highest classification accuracy and mutual diversity, namely random forest (RF), extremely randomized trees (ET) and extreme gradient boosting (XGBoost), are selected as the base classifier pool C;
step 32, establishing a Bayesian optimization algorithm with the training accuracy as the objective function to optimize the hyper-parameters of the base classifier models;
step 33, determining that the hyper-parameters to be optimized for RF and ET are the number of sub-models $N_{estimators}$, the number of features $M_{features}$ and the maximum tree depth $M_{depth}$, that the hyper-parameters to be optimized for XGBoost are the weight $L_{rate}$ of the model generated at each iteration and the number of features $M_{feature}$, and determining the optimization range of each hyper-parameter;
step 34, initializing the iteration counter F = 1, setting the maximum number of iterations to $F_{max}$, and randomly selecting one value from the optimization range of each hyper-parameter to determine the hyper-parameter combination of the F-th iteration;
step 35, calculating the cross-validation accuracy of each base classifier on sample Y under the hyper-parameter combination of the F-th iteration, constructing an objective function whose output is the accuracy under a given hyper-parameter combination, fitting the objective function f(p) by Gaussian process regression using the hyper-parameter combinations and cross-validation accuracies of iterations {1, 2, ..., F}, and determining the posterior distribution of the objective function at the F-th iteration. The specific learning model of Bayesian optimization is as follows:
$p^{*} = \arg\max_{p \in P} f(p)$
wherein p is a hyper-parameter combination, P is the hyper-parameter search space, f(p) is the objective function, and $p^{*}$ is the optimal hyper-parameter combination.
Step 36, according to the posterior distribution of the objective function at the F-th iteration, using the upper confidence bound (UCB) algorithm as the acquisition function to search the optimization range for the hyper-parameter combination of the (F+1)-th iteration;
step 37, if $F < F_{max}$, let F = F + 1 and return to step 35; if $F \ge F_{max}$, go to step 38;
step 38, selecting, from the $F_{max}$ hyper-parameter combinations, the combination with the highest accuracy as the model parameters of each base classifier to obtain the trained optimal base classifier pool.
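The loop of steps 34 to 38 can be sketched in plain Python on a single hyper-parameter rescaled to [0, 1]. Everything specific here is an assumption made for illustration: the squared-exponential kernel and its length scale, the noise term, the exploration weight `kappa`, the candidate grid, and the names `rbf`, `solve`, `gp_posterior` and `bayes_opt`. The `objective` argument stands in for the cross-validated accuracy of a base classifier, which the patent evaluates on the training data Y.

```python
import math
import random


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a GP fitted to (xs, ys)."""
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean_y = sum(ys) / len(ys)  # constant prior mean (assumption)
    alpha = solve(gram, [y - mean_y for y in ys])
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve(gram, k_star)
    var = max(1.0 - sum(k * w for k, w in zip(k_star, v)), 0.0)
    return mu, math.sqrt(var)


def bayes_opt(objective, f_max=15, kappa=2.0, seed=0):
    """Maximize objective over [0, 1] with a GP surrogate and UCB."""
    rng = random.Random(seed)
    grid = [i / 100 for i in range(101)]  # candidate hyper-parameter values
    xs = [rng.random()]                   # step 34: random first combination
    ys = [objective(xs[0])]               # step 35: evaluate accuracy
    for _ in range(1, f_max):             # step 37: stop after f_max evaluations
        def ucb(x):                       # step 36: upper confidence bound
            mu, sd = gp_posterior(xs, ys, x)
            return mu + kappa * sd
        x_next = max(grid, key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(ys)), key=ys.__getitem__)  # step 38: best combination
    return xs[best], ys[best]
```

For example, `bayes_opt(lambda x: 0.9 - (x - 0.3) ** 2)` searches for the maximizer of a toy accuracy curve. In practice a library implementation of Gaussian process regression would replace the hand-written solver; the pure-Python version is used here only to keep the sketch self-contained.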
The step 4 comprises the following steps:
step 41, inputting the remaining 30% of the samples produced by stratified sampling as query samples $x_{query}$, and using the KNNE technique to find, among the training samples, the K nearest neighbors $x_j$ (1 ≤ j ≤ K) of $x_{query}$; the set of K nearest neighbors is called the region of competence (ROC), and the initial value of K is set to 3;
step 42, judging whether the $x_j$ in the region of competence cover all 3 categories (thawed, frozen and desert); if samples of all 3 categories are present, go to step 43, otherwise go to step 44;
step 43, when the $x_j$ in the region of competence cover 3 categories, performing dynamic pruning: pre-selecting the classifiers that correctly classify at least two $x_j$ of different categories. When classifiers are pre-selected, the classifier pool is pruned dynamically and the unqualified classifiers are temporarily removed; if no classifier correctly classifies at least two samples of different categories, all base classifiers are retained;
step 44, estimating the competence of each base classifier based on all $x_j$ in the ROC: if a classifier in C correctly classifies i of the K neighbor samples, it casts i votes during integration. The number of votes of each selected base classifier thus equals the number of labels it predicts correctly in the ROC; the classifiers are combined into an ensemble weighted by these votes, the average M of the probabilities with which the models predict the sample to belong to each category is used as the criterion, and the category with the highest probability is the final prediction result;
step 45, letting K = K + 1 (3 ≤ K ≤ 20), repeating steps 41-45, and outputting the accuracy A(K) obtained in each round; the K value at which A(K) reaches its maximum is the K value finally selected for the model.
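Steps 41 to 44 can be sketched for a single query sample as follows. This is an illustration under stated assumptions, not the patented implementation: plain Euclidean nearest neighbors are substituted for the KNNE technique, each base classifier is modeled as a hypothetical callable that maps a feature tuple to a dictionary of class probabilities, and when no selected classifier gets any neighbor right the sketch falls back to equal votes, a case the text does not cover.

```python
import math
from collections import defaultdict


def nearest_neighbors(train, query, k):
    """Return the k (features, label) pairs closest to the query.

    Plain Euclidean KNN is used here in place of KNNE (substitution).
    """
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]


def hard_label(clf, x):
    """Most probable class according to one base classifier."""
    probs = clf(x)
    return max(probs, key=probs.get)


def dps_predict(pool, train, query, k=3, n_classes=3):
    """Classify one query sample by dynamic pruning and weighted voting.

    pool: dict mapping a classifier name to a callable x -> {label: prob}.
    train: list of (feature_tuple, label) pairs.
    """
    roc = nearest_neighbors(train, query, k)  # step 41: region of competence
    # Labels of the neighbors that each classifier predicts correctly.
    hits = {name: [y for x, y in roc if hard_label(clf, x) == y]
            for name, clf in pool.items()}
    selected = list(pool)
    if len({y for _, y in roc}) == n_classes:  # step 42: all classes present
        # Step 43: keep classifiers that are right on two different classes.
        kept = [name for name in pool if len(set(hits[name])) >= 2]
        if kept:  # otherwise retain the whole pool
            selected = kept
    votes = {name: len(hits[name]) for name in selected}  # step 44
    if sum(votes.values()) == 0:
        votes = {name: 1 for name in selected}  # assumption: equal votes
    total = sum(votes.values())
    scores = defaultdict(float)
    for name in selected:
        for label, prob in pool[name](query).items():
            scores[label] += votes[name] * prob / total
    return max(scores, key=scores.get)
```

Weighting the class probabilities by the vote counts is one way to read the instruction to combine the classifiers according to their votes and take the class with the highest average probability.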
Step 5, testing the trained model with ground surface temperature observations from different years and deriving comprehensive model evaluation indexes from the test results: the classification accuracy (Accuracy), recall (Recall Rate) and consistency (Agreement). Accuracy is the number of correctly classified samples divided by the total number of samples; in general, the higher the accuracy, the better the classifier. Recall measures coverage and represents the proportion of all positive examples that are classified as positive. Classification consistency evaluates, through point-to-point comparison between observations and predictions, the percentage of days in the year that are correctly predicted at each observation site;
Figure BDA0003651188880000051
Figure BDA0003651188880000052
in the formulas, $F_F$ is the number of samples observed as frozen and classified by the model as frozen; $F_T$ is the number observed as frozen and misclassified by the model as thawed; $F_D$ is the number observed as frozen and misclassified by the model as desert; $T_F$ is the number observed as thawed and misclassified by the model as frozen; $T_T$ is the number observed as thawed and classified by the model as thawed; $T_D$ is the number observed as thawed and misclassified by the model as desert; $D_F$ is the number observed as desert and misclassified by the model as frozen; $D_T$ is the number observed as desert and misclassified by the model as thawed; $D_D$ is the number observed as desert and classified by the model as desert; TP is the number of positive examples correctly classified as positive; FN is the number of positive examples wrongly classified as negative.
The invention has the beneficial effects that:
The invention provides a novel method for classifying surface freeze-thaw states that jointly uses a plurality of machine learning models and dynamically selects an optimal model on a pixel-by-pixel scale to predict the surface freeze-thaw state. Information from the ascending and descending orbits is integrated, and the surface state is classified into 5 types: frozen (frozen in the morning and in the afternoon), thawed (thawed in the morning and in the afternoon), transition (frozen in the morning and thawed in the afternoon), reverse transition (thawed in the morning and frozen in the afternoon), and desert. The method can also predict the freeze-thaw state of areas without observation data, detect the freeze-thaw state of every region of China where no ground truth data are available, and support research on the interactions among climate, the cryosphere, the carbon cycle and hydrological processes.
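The five-way daily classification described above can be written as a small lookup, sketched here in Python. The function name `daily_state` and the string labels are hypothetical, and the rule that a desert label on either overpass yields desert is an assumption, since the text does not say how desert interacts with the morning and afternoon states.

```python
def daily_state(morning: str, afternoon: str) -> str:
    """Combine the morning and afternoon classifications into a daily type.

    Each input is one of "frozen", "thawed" or "desert".
    """
    if "desert" in (morning, afternoon):
        # Assumption: a desert label on either overpass gives desert.
        return "desert"
    table = {
        ("frozen", "frozen"): "frozen",
        ("thawed", "thawed"): "thawed",
        ("frozen", "thawed"): "transition",
        ("thawed", "frozen"): "reverse transition",
    }
    return table[(morning, afternoon)]
```

Keeping the mapping in a table makes the four freeze-thaw combinations explicit and raises a `KeyError` for any unexpected label.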
Drawings
FIG. 1 is a schematic flow chart of a method for classifying freeze-thaw states of a ground surface according to the present invention;
FIG. 2 is a schematic diagram of a dynamic pruning selection framework provided by the present invention;
FIG. 3 is a clustering characteristic diagram of the 19 GHz polarization difference PD for frozen soil, thawed soil and desert in the specific embodiment;
FIG. 4 is a clustering characteristic diagram of the scattering index SI for frozen soil, thawed soil and desert in the specific embodiment;
FIG. 5 is a spatial distribution diagram of predicted results in an exemplary embodiment.
Detailed Description
The invention is further described with reference to the accompanying drawings and specific examples.
It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, a method for classifying the surface freeze-thaw state based on multi-classifier dynamic pruning selection includes the following steps:
step 1, preprocessing the surface temperature data of observation stations in China and the desert samples from China's land use data to serve as labels of the training samples;
the method comprises the following steps:
step 11, determining the labels of the training samples by using the minimum 0 cm ground surface temperature in 2009 from 2398 meteorological stations as the basis for judging the surface freeze-thaw state: when the minimum ground surface temperature $T_g \le 0$ °C, the near-surface soil is considered frozen; conversely, when $T_g > 0$ °C, the near-surface soil is considered thawed;
and step 12, randomly extracting desert grid points as training samples using land use data from the China land use status remote sensing monitoring database, and setting the labels of all observation sites that fall within desert areas of the land use data to desert, so as to eliminate the influence of deserts on the freeze-thaw state judgment.
Step 2, collecting brightness temperature data at different frequencies from the passive microwave radiometer over China, and constructing the characteristic indexes of the surface freeze-thaw state training samples;
the method comprises the following steps:
step 21, collecting brightness temperature data at different frequencies from the passive microwave radiometer over China, the brightness temperature data comprising surface microwave brightness temperatures at 19.35 GHz, 22.2 GHz, 37.0 GHz and 85.5 GHz;
and step 22, extracting, from the brightness temperature data of China, the 6 classification characteristic indexes corresponding to all the labels in step 1: the 37 GHz and 22 GHz vertical polarization brightness temperatures, the polarization difference PD, the scattering index SI, the spectral gradient SG, and the difference D between 22 GHz and 37 GHz. The 37 GHz and 22 GHz vertical (V) polarization brightness temperatures have proved to be good indexes for distinguishing the freeze-thaw states from the desert state. The polarization difference PD, the scattering index SI, the spectral gradient SG, and the difference D between 22 GHz and 37 GHz are calculated as follows:
$PD = T_{B19V} - T_{B19H}$
$F = 450.2 - 0.506 \times T_{B19V} - 1.874 \times T_{B22V} + 0.00637 \times T_{B22V}^{2}$
$SI = F - T_{B85V}$
[Equation image BDA0003651188880000061: definition of the spectral gradient SG]
$D = T_{B22V} - T_{B37V}$
in the formulas, $T_{B19V}$ is the 19 GHz vertical polarization brightness temperature; $T_{B19H}$ is the 19 GHz horizontal polarization brightness temperature; $T_{B22V}$ is the 22 GHz vertical polarization brightness temperature; $T_{B37V}$ is the 37 GHz vertical polarization brightness temperature; $T_{B85V}$ is the 85 GHz vertical polarization brightness temperature; PD is the 19 GHz polarization difference, mainly used to reflect the roughness of the surface, as shown in fig. 3; F is the estimated 85 GHz vertical polarization brightness temperature in the absence of scattering; the scattering index SI is the degree of deviation of the actual value of $T_{B85V}$ caused by scattering and is mainly used to distinguish strong scatterers from weak scatterers and non-scatterers, as shown in fig. 4; SG is the spectral gradient between the 19 GHz and 37 GHz brightness temperatures; D is the difference between the 22 GHz and 37 GHz vertical polarization brightness temperatures.
Step 3, constructing training samples and labels at the site scale, training the base classifiers with the processed sample labels and characteristic indexes as model input, and applying Bayesian global optimization to the hyper-parameters of the base classifiers;
the method comprises the following steps:
step 31, randomly extracting 70% of the data as training data Y by stratified sampling to generate the base classifier set: from a plurality of basic machine learning classifiers, the three models that performed best in testing, with the highest classification accuracy and mutual diversity, namely random forest (RF), extremely randomized trees (ET) and extreme gradient boosting (XGBoost), are selected as the base classifier pool C;
step 32, establishing a Bayesian optimization algorithm with the training accuracy as the objective function to optimize the hyper-parameters of the base classifier models;
step 33, determining that the hyper-parameters to be optimized for the random forest and the extremely randomized trees are the number of sub-models $N_{estimators}$, the number of features $M_{features}$ and the maximum tree depth $M_{depth}$, that the hyper-parameters to be optimized for XGBoost are the weight $L_{rate}$ of the model generated at each iteration and the number of features $M_{feature}$, and determining the optimization range of each hyper-parameter;
step 34, initializing the iteration counter F = 1, setting the maximum number of iterations to $F_{max}$, and randomly selecting one value from the optimization range of each hyper-parameter to form the hyper-parameter combination of the F-th iteration;
step 35, calculating the cross-validation accuracy of each base classifier on sample Y under the hyper-parameter combination of the F-th iteration; constructing an objective function whose output is the accuracy under a given hyper-parameter combination, fitting the objective function f(p) by Gaussian process regression using the hyper-parameter combinations and cross-validation accuracies of iterations {1, 2, ..., F}, and determining the posterior distribution of the objective function at the F-th iteration. The specific learning model of Bayesian optimization is as follows:
$p^{*} = \arg\max_{p \in P} f(p)$
wherein p is a hyper-parameter combination, P is the hyper-parameter search space, f(p) is the objective function, and $p^{*}$ is the optimal hyper-parameter combination.
Step 36, according to the posterior distribution of the objective function at the F-th iteration, using the upper confidence bound (UCB) algorithm as the acquisition function to search the optimization range for the hyper-parameter combination of the (F+1)-th iteration;
step 37, if $F < F_{max}$, let F = F + 1 and return to step 35; if $F \ge F_{max}$, go to step 38;
step 38, selecting, from the $F_{max}$ hyper-parameter combinations, the combination with the highest accuracy as the model parameters of each base classifier to obtain the optimized base classifier pool.
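The 70/30 stratified split of step 31 can be sketched in plain Python as follows. The function name `stratified_split`, the fixed seed and the rounding rule for the per-class cut are assumptions; the sketch only illustrates that each class contributes the same fraction of its samples to the training set, with the remainder held out as the query samples of step 41.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices so each class keeps the same train fraction.

    Returns (train_indices, test_indices). The seed and rounding rule are
    illustrative choices, not taken from the patent.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)  # random draw within the class
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return train_idx, test_idx
```

Splitting per class matters here because desert samples are added separately from the station data, so a purely random split could leave one class under-represented in either part.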
Step 4, performing dynamic pruning and dynamic selection on the different base classifiers based on the dynamic pruning selection framework, and determining an optimal training model;
the method comprises the following steps:
step 41, inputting the remaining 30% of the samples produced by stratified sampling as query samples $x_{query}$, and using the KNNE technique to find, among the training samples, the K nearest neighbors $x_j$ (1 ≤ j ≤ K) of $x_{query}$; the set of K nearest neighbors is called the region of competence (ROC), and the initial value of K is set to 3;
step 42, judging whether the $x_j$ in the region of competence cover all 3 categories (thawed, frozen and desert); if samples of all 3 categories are present, go to step 43, otherwise go to step 44;
step 43, when the $x_j$ in the region of competence cover 3 categories, performing dynamic pruning: pre-selecting the classifiers that correctly classify at least two $x_j$ of different categories. When classifiers are pre-selected, the classifier pool is pruned dynamically and the unqualified classifiers are temporarily removed; if no classifier correctly classifies at least two samples of different categories, all base classifiers are retained, as shown in FIG. 2;
step 44, estimating the competence of each base classifier based on all $x_j$ in the ROC: if a classifier in C correctly classifies i of the K neighbor samples, it casts i votes during integration. The number of votes of each selected base classifier thus equals the number of labels it predicts correctly in the ROC; the classifiers are combined into an ensemble weighted by these votes, the average M of the probabilities with which the models predict the sample to belong to each category is used as the criterion, and the category with the highest probability is the final prediction result;
step 45, letting K = K + 1 (3 ≤ K ≤ 20), repeating steps 41-45, and outputting the accuracy A(K) obtained in each round; the K value at which A(K) reaches its maximum is the K value finally selected for the model.
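The sweep over K in step 45 amounts to evaluating A(K) for every K from 3 to 20 and keeping the maximizer. A minimal Python sketch follows; `select_k` and its `accuracy_for_k` callback are hypothetical names, and the callback stands in for a full run of steps 41 to 44 on the held-out samples. Ties are resolved in favor of the smallest K, which is an assumption the text does not address.

```python
def select_k(accuracy_for_k, k_min=3, k_max=20):
    """Evaluate A(K) for K in [k_min, k_max] and return the best K.

    accuracy_for_k: callable K -> accuracy on the held-out query samples
                    (hypothetical stand-in for steps 41 to 44).
    Returns (best_k, scores), where scores maps every K to A(K).
    """
    scores = {k: accuracy_for_k(k) for k in range(k_min, k_max + 1)}
    # max() returns the first maximal key, so ties go to the smallest K.
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Returning the full score table alongside the selected K makes it easy to inspect how sensitive the accuracy is to the size of the region of competence.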
Step 5, testing the trained model with ground surface temperature observations from different years and deriving comprehensive model evaluation indexes from the test results: the classification accuracy (Accuracy), recall (Recall Rate) and consistency (Agreement). Accuracy is the most common evaluation index, namely the number of correctly classified samples divided by the total number of samples; in general, the higher the accuracy, the better the classifier. Recall measures coverage and represents the proportion of all positive examples that are classified as positive. Classification consistency evaluates, through point-to-point comparison between observations and predictions, the percentage of days in the year that are correctly predicted at each observation site;
Figure BDA0003651188880000091
Figure BDA0003651188880000092
in the formulas, F, T and D denote the observed frozen, thawed and desert surface states, respectively; the subscripts denote the classified surface states, which likewise comprise three possible states, namely frozen (F), thawed (T) and desert (D). For example, $F_F$ is the number of samples observed as frozen and classified by the model as frozen, and $F_T$ is the number observed as frozen and misclassified by the model as thawed. If the true category is frozen and the predicted category is frozen, the sample is correctly classified as a positive example, and the number of such samples is TP; if the true category is frozen and the predicted category is thawed or desert, the sample is wrongly classified as a negative example, and the number of such samples is FN.
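The two equations of step 5 are available only as images, so the following Python sketch follows the verbal definitions instead: accuracy as correctly classified samples over all samples, and recall as TP / (TP + FN) for a chosen positive class. The function names and the dictionary layout keyed by (observed, predicted) pairs are hypothetical.

```python
def accuracy(confusion):
    """Correctly classified samples divided by all samples.

    confusion: dict mapping (observed, predicted) to a count, with states
               "F" (frozen), "T" (thawed) and "D" (desert).
    """
    correct = sum(n for (obs, pred), n in confusion.items() if obs == pred)
    return correct / sum(confusion.values())


def recall(confusion, positive="F"):
    """TP / (TP + FN) for the class treated as positive."""
    tp = confusion.get((positive, positive), 0)
    fn = sum(n for (obs, pred), n in confusion.items()
             if obs == positive and pred != positive)
    return tp / (tp + fn)
```

With frozen as the positive class, TP corresponds to $F_F$ and FN to $F_T + F_D$, which matches the example given in the text.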
Step 6, predicting the surface freeze-thaw state with the evaluated prediction model.
Taking the whole land surface of China as an example, the pixel-by-pixel surface freeze-thaw states of China at the morning descending-orbit time and the afternoon ascending-orbit time from 2009 to 2020 are predicted. For each pixel, the prediction model selects a combination of base classifiers according to the pixel's 6 characteristic indexes and their similarity to the training samples, and classifies the pixel. The prediction results show that the soil freeze-thaw area of China is largest in winter. The frozen surface area gradually decreases as the temperature rises. In summer, only part of the Qinghai-Tibet Plateau is in a transition state, and the other regions are completely thawed. After summer, the frozen area begins to increase, expanding outward from the Qinghai-Tibet Plateau. By the end of the year, the surface soil in most regions of China, except the southern border region, is frozen, as shown in fig. 5.

Claims (6)

1. A surface freeze-thaw state classification method based on multi-classifier dynamic pruning selection, characterized by comprising the following steps:
step 1, preprocessing the surface temperature data of regional observation stations and the desert grid-point samples of regional land-use data to serve as labels of the training samples;
step 2, collecting brightness temperature data at different frequencies from the regional passive microwave radiometer, and constructing 6 feature indices for the surface freeze-thaw state training samples;
step 3, training base classifiers of different types using the processed sample labels and feature indices as model input, while performing Bayesian optimization of the hyper-parameters of the base classifiers;
step 4, based on a dynamic pruning selection framework, performing dynamic pruning and dynamic selection on the different base classifiers to determine an optimal prediction model;
step 5, classifying the surface freeze-thaw state using the optimal prediction model determined in step 4;
wherein step 2 comprises:
step 21, collecting the surface microwave brightness temperatures of the regional passive microwave radiometer at 19.35, 22.2, 37.0 and 85.5 GHz;
step 22, extracting, from the regional brightness temperature data collected in step 21, the 6 classification feature indices corresponding to all labels of step 1; the 6 classification feature indices are: the 37 GHz vertically polarized brightness temperature, the 22 GHz vertically polarized brightness temperature, the 19 GHz polarization difference PD, the scattering index SI, the spectral gradient SG, and the difference D between 22 GHz and 37 GHz; PD, SI, SG and D are calculated as follows:
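$$PD = T_{B19V} - T_{B19H}$$
$$F = 450.2 - 0.506 \times T_{B19V} - 1.874 \times T_{B22V} + 0.00637 \times T_{B22V}^{2}$$
$$SI = F - T_{B85V}$$
Figure FDA0004056204240000011 (formula image for the spectral gradient SG)
$$D = T_{B22V} - T_{B37V}$$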
in the formulas, $T_{B19V}$ is the 19 GHz vertically polarized brightness temperature; $T_{B19H}$ is the 19 GHz horizontally polarized brightness temperature; $T_{B22V}$ is the 22 GHz vertically polarized brightness temperature; $T_{B37V}$ is the 37 GHz vertically polarized brightness temperature; $T_{B85V}$ is the 85 GHz vertically polarized brightness temperature; PD is the polarization difference of the 19 GHz brightness temperature; F is the estimated 85 GHz vertically polarized brightness temperature in the absence of scattering; SI denotes the degree to which scattering causes the actual value of $T_{B85V}$ to deviate from that estimate; SG is the spectral gradient between the 19 GHz and 37 GHz brightness temperatures; D is the difference between the 22 GHz and 37 GHz vertically polarized brightness temperatures.
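The index formulas of step 22 can be sketched in a few lines of Python. This is only an illustration: the class and field names are hypothetical, the brightness temperatures are assumed to be in kelvin, and SG is left out because its formula survives only as an image reference in this text.

```python
from dataclasses import dataclass


@dataclass
class BrightnessTemps:
    """Brightness temperatures of one pixel in kelvin (field names are illustrative)."""
    tb19v: float
    tb19h: float
    tb22v: float
    tb37v: float
    tb85v: float


def feature_indices(tb: BrightnessTemps) -> dict:
    """Compute the indices whose formulas are given in the claim text (SG omitted)."""
    pd = tb.tb19v - tb.tb19h                      # 19 GHz polarization difference
    f = (450.2 - 0.506 * tb.tb19v - 1.874 * tb.tb22v
         + 0.00637 * tb.tb22v ** 2)               # scatter-free estimate of TB85V
    si = f - tb.tb85v                             # scattering index
    d = tb.tb22v - tb.tb37v                       # 22 GHz minus 37 GHz difference
    return {"TB37V": tb.tb37v, "TB22V": tb.tb22v, "PD": pd, "SI": si, "D": d}


# Made-up sample values, used only to exercise the function.
sample = BrightnessTemps(tb19v=250.0, tb19h=240.0, tb22v=252.0, tb37v=248.0, tb85v=245.0)
indices = feature_indices(sample)
```

The returned dictionary holds five of the six features; the sixth (SG) would be added once its formula is available.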
2. The surface freeze-thaw state classification method according to claim 1, wherein step 1 comprises:
step 11, using the daily minimum 0 cm surface temperature $T_g$ of the observation stations as the basis for judging the surface freeze-thaw state, and determining the labels of the training samples: when the minimum surface temperature $T_g \le 0$ °C, the near-surface soil is regarded as frozen; conversely, when $T_g > 0$ °C, the near-surface soil is regarded as thawed;
step 12, randomly extracting desert grid points as training samples using the land-use data of the land-use status remote sensing monitoring database, and setting the labels of all observation stations whose land-use type is desert to desert, so as to eliminate the influence of desert on the freeze-thaw state determination.
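The labelling rule of steps 11 and 12 can be sketched as follows. The function name, its arguments and the label strings are hypothetical; the sketch assumes that the desert flag comes from the land-use data and takes precedence over the temperature threshold.

```python
from typing import Optional


def surface_label(min_surface_temp_c: Optional[float], is_desert: bool) -> str:
    """Return 'desert', 'frozen' or 'thawed' for one training sample.

    min_surface_temp_c is the daily minimum 0 cm surface temperature in degrees Celsius.
    """
    if is_desert:
        # Step 12: desert land use overrides the temperature-based label.
        return "desert"
    if min_surface_temp_c is None:
        raise ValueError("a surface temperature is required for non-desert samples")
    # Step 11: a minimum temperature of 0 degrees Celsius or below counts as frozen.
    return "frozen" if min_surface_temp_c <= 0.0 else "thawed"
```

For example, a station with a daily minimum of exactly 0 °C is labelled frozen, since the threshold is inclusive.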
3. The surface freeze-thaw state classification method according to claim 1, wherein the step 3 comprises:
step 31, randomly extracting part of the training data Y by stratified sampling to generate the base classifier set: from a plurality of classifiers, the three models that performed best in testing, had the highest classification accuracy and differ from one another, namely random forest (RF), extremely randomized trees (ET) and extreme gradient boosting (XGBoost), are selected as the base classifier pool C;
step 32, establishing a Bayesian optimization algorithm with the training accuracy as the objective function to optimize the hyper-parameters of the base classifier models;
step 33, determining that the hyper-parameters to be optimized for the random forest RF and the extremely randomized trees ET are the number of sub-models $N_{estimators}$, the number of features $M_{features}$ and the maximum tree depth $M_{depth}$, that the hyper-parameters to be optimized for extreme gradient boosting XGBoost are the weight $L_{rate}$ of the model generated in each iteration and the number of features $M_{feature}$, and determining the optimization range of each hyper-parameter;
step 34, initializing the iteration count F = 1, setting the maximum number of iterations to $F_{max}$, and randomly selecting one value from the optimization range of each hyper-parameter to determine the hyper-parameter combination of the F-th iteration;
step 35, calculating the cross-validation accuracy of each base classifier on the samples Y under the hyper-parameter combination of the F-th iteration, and constructing an objective function whose output is the accuracy under a given hyper-parameter combination; fitting the objective function f(p) by Gaussian process regression using the hyper-parameter combinations and cross-validation accuracies of iterations {1, 2, ..., F}, and determining the posterior distribution of the objective function at the F-th iteration, the learning model being:
$$p^{*} = \arg\max_{p \in P} f(p)$$
wherein $p^{*}$ is the optimal hyper-parameter combination; p is a hyper-parameter combination with $p \in P$; P is the hyper-parameter search space; and f(p) is the objective function;
step 36, according to the posterior distribution of the objective function at the F-th iteration, using the upper confidence bound (UCB) algorithm as the acquisition function to search the optimization range for the hyper-parameter combination of the (F+1)-th iteration;
step 37, if F < $F_{max}$, setting F = F + 1 and returning to step 35; if F ≥ $F_{max}$, proceeding to step 38;
step 38, selecting, from the $F_{max}$ hyper-parameter combinations, the combination with the highest accuracy as the model parameters of each base classifier, to obtain the trained optimal base classifier pool.
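The acquisition step (step 36) can be illustrated in isolation. The sketch below assumes that a posterior mean and standard deviation of the accuracy are already available for each candidate hyper-parameter combination; the Gaussian process regression of step 35 is not implemented here and is replaced by a hypothetical lookup table. The function names and the exploration weight `kappa` are likewise assumptions, not values given in the claims.

```python
def ucb(mean: float, std: float, kappa: float = 2.0) -> float:
    """Upper confidence bound: posterior mean plus kappa posterior standard deviations."""
    return mean + kappa * std


def next_combination(candidates, posterior, kappa: float = 2.0):
    """Return the candidate hyper-parameter combination with the highest UCB score.

    posterior maps a combination to (mean, std) of the predicted accuracy.
    """
    return max(candidates, key=lambda p: ucb(*posterior(p), kappa=kappa))


# Hypothetical posterior over (number of sub-models, maximum depth) pairs.
table = {
    (100, 5): (0.90, 0.01),
    (200, 10): (0.88, 0.05),
    (300, 15): (0.85, 0.02),
}
best = next_combination(list(table), table.get)
```

A larger `kappa` favours combinations whose accuracy is still uncertain, while `kappa = 0` reduces the rule to picking the highest posterior mean.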
4. The surface freeze-thaw state classification method according to claim 3, wherein the step 4 comprises:
step 41, inputting a remaining sample $x_{query}$ produced by the stratified sampling, and using the KNNE technique to find, among the training samples, the K nearest neighbors $x_j$ (1 ≤ j ≤ K) of $x_{query}$; the set formed by the K nearest neighbors is called the region of competence (ROC), and the initial value of K is set to 3;
step 42, judging whether the $x_j$ in the region of competence cover all 3 classes (thawed, frozen and desert); if samples of all 3 classes are present, going to step 43, otherwise going to step 44;
step 43, performing dynamic pruning for a region of competence whose $x_j$ cover 3 classes: pre-selecting the classifiers that correctly classify samples $x_j$ of at least two different classes within the region of competence; during pre-selection, the classifier pool is cleaned dynamically and unqualified classifiers are temporarily removed; if no classifier correctly classifies samples of at least two different classes, all base classifiers are retained;
step 44, estimating the competence of each base classifier on all $x_j$ in the region of competence ROC: if a classifier in C correctly classifies i of the K neighboring samples, it casts i votes in the ensemble, that is, the number of votes of each selected base classifier equals the number of labels it predicts correctly in the ROC; the model is trained on the ensemble formed by the vote-weighted classifiers, the mean M of the probabilities with which all models assign the sample to each class is taken as the criterion, and the class with the highest probability is the final prediction;
step 45, setting K = K + 1 with K ≤ 20, repeating steps 41 to 45, and outputting the accuracy A(K) obtained in each round of training; the K value at which A(K) is maximal is the K value finally selected for the model.
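Steps 41 to 44 can be sketched on toy data as follows. Several simplifications are assumed and are not taken from the claims: plain Euclidean nearest neighbors stand in for the KNNE technique, the base classifiers are hypothetical hand-written functions rather than trained RF, ET or XGBoost models, and a vote-weighted majority over predicted labels replaces the averaging of class probabilities.

```python
import math
from collections import Counter


def region_of_competence(x_query, samples, k):
    """Step 41 (simplified): the k training samples closest to x_query."""
    return sorted(samples, key=lambda s: math.dist(s[0], x_query))[:k]


def prune(classifiers, roc):
    """Steps 42 and 43: prune only when the region holds all three classes."""
    if len({label for _, label in roc}) < 3:
        return list(classifiers)
    kept = [clf for clf in classifiers
            if len({y for x, y in roc if clf(x) == y}) >= 2]
    return kept or list(classifiers)  # keep the whole pool if nobody qualifies


def predict(x_query, samples, classifiers, k=3):
    """Step 44 (simplified): each kept classifier votes with its number of hits."""
    roc = region_of_competence(x_query, samples, k)
    votes = Counter()
    for clf in prune(classifiers, roc):
        weight = sum(1 for x, y in roc if clf(x) == y)
        votes[clf(x_query)] += weight
    return votes.most_common(1)[0][0]


# Toy one-dimensional data and hypothetical classifiers.
samples = [((0.0,), "frozen"), ((1.0,), "thawed"), ((2.0,), "desert"), ((10.0,), "thawed")]


def clf_a(x):
    return "frozen" if x[0] < 0.5 else ("thawed" if x[0] < 1.5 else "desert")


def clf_b(x):
    return "frozen"


def clf_c(x):
    return "desert"


pool = [clf_a, clf_b, clf_c]
roc3 = region_of_competence((0.2,), samples, 3)
label = predict((0.2,), samples, pool, k=3)
```

The search over K in step 45 would wrap `predict` in a loop from K = 3 to K = 20 and keep the K with the highest accuracy on held-out samples.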
5. The surface freeze-thaw state classification method according to claim 4, wherein the step 4 further comprises:
step 46, evaluating the predictive performance of the prediction model trained in step 44 on test sets from different years; if an evaluation index of the predictive performance is below its target value or the model overfits, adjusting the amount of training data or replacing the base classifiers and retraining the model.
6. The surface freeze-thaw state classification method according to claim 5, wherein the evaluation indexes for evaluating the predictive performance are accuracy, recall rate and consistency of classification:
$$\mathrm{Accuracy} = \frac{F_F + T_T + D_D}{F_F + F_T + F_D + T_F + T_T + T_D + D_F + D_T + D_D}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
wherein Accuracy is the accuracy and Recall is the recall; $F_F$ is the number of samples observed as frozen and classified by the model as frozen; $F_T$ is the number of samples observed as frozen and misclassified by the model as thawed; $F_D$ is the number of samples observed as frozen and misclassified by the model as desert; $T_F$ is the number of samples observed as thawed and misclassified by the model as frozen; $T_T$ is the number of samples observed as thawed and classified by the model as thawed; $T_D$ is the number of samples observed as thawed and misclassified by the model as desert; $D_F$ is the number of samples observed as desert and misclassified by the model as frozen; $D_T$ is the number of samples observed as desert and misclassified by the model as thawed; $D_D$ is the number of samples observed as desert and classified by the model as desert; TP is the number of positive examples correctly classified as positive; FN is the number of positive examples wrongly classified as negative.
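The two evaluation indices can be computed from a three-class confusion matrix as sketched below. The nested-dictionary layout, indexed as observed class then classified class, is an assumption made for illustration, and the counts are made up.

```python
def accuracy(cm: dict) -> float:
    """Share of samples on the diagonal of cm[observed][classified]."""
    correct = sum(cm[c][c] for c in cm)
    total = sum(sum(row.values()) for row in cm.values())
    return correct / total


def recall(cm: dict, positive: str = "F") -> float:
    """TP / (TP + FN) with the chosen observed class as the positive class."""
    tp = cm[positive][positive]
    fn = sum(n for predicted, n in cm[positive].items() if predicted != positive)
    return tp / (tp + fn)


# Made-up counts: keys are F (frozen), T (thawed) and D (desert).
cm = {
    "F": {"F": 80, "T": 15, "D": 5},
    "T": {"F": 10, "T": 85, "D": 5},
    "D": {"F": 0, "T": 10, "D": 90},
}
acc = accuracy(cm)
rec_frozen = recall(cm, "F")
```

Here `cm["F"]["T"]` plays the role of $F_T$, the number of samples observed as frozen but classified as thawed.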
CN202210552737.9A 2022-05-19 2022-05-19 Ground surface freezing and thawing state classification method based on multi-classifier dynamic pruning selection Active CN115048986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210552737.9A CN115048986B (en) 2022-05-19 2022-05-19 Ground surface freezing and thawing state classification method based on multi-classifier dynamic pruning selection

Publications (2)

Publication Number Publication Date
CN115048986A CN115048986A (en) 2022-09-13
CN115048986B true CN115048986B (en) 2023-04-07

Family

ID=83159367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210552737.9A Active CN115048986B (en) 2022-05-19 2022-05-19 Ground surface freezing and thawing state classification method based on multi-classifier dynamic pruning selection

Country Status (1)

Country Link
CN (1) CN115048986B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117876467B (en) * 2024-03-13 2024-06-04 广东海纬地恒空间信息技术有限公司 Surface area measurement method and device based on three-dimensional space positioning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111337531A (en) * 2020-02-26 2020-06-26 中国科学院遥感与数字地球研究所 Soil freezing and thawing state determination method and device and electronic equipment
CN113484338A (en) * 2021-06-24 2021-10-08 中国科学院空天信息创新研究院 Permafrost monitoring and classifying method based on passive microwave remote sensing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444657B (en) * 2020-03-10 2023-05-02 五邑大学 Method and device for constructing fatigue driving prediction model and storage medium
CN114186644A (en) * 2021-12-29 2022-03-15 南通大学 Defect report severity prediction method based on optimized random forest

Also Published As

Publication number Publication date
CN115048986A (en) 2022-09-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant