CN110502526B - Data sequence interpolation method suitable for icing phenomenon - Google Patents



Publication number
CN110502526B
Authority
CN
China
Prior art keywords
icing
sequence
data
station
daily
Legal status
Active
Application number
CN201910789977.9A
Other languages
Chinese (zh)
Other versions
CN110502526A (en)
Inventor
温华洋
朱华亮
盛绍学
Current Assignee
Anhui Meteorological Information Center
Original Assignee
Anhui Meteorological Information Center
Application filed by Anhui Meteorological Information Center
Priority to CN201910789977.9A
Publication of CN110502526A
Application granted
Publication of CN110502526B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data sequence interpolation method suitable for icing phenomena. Taking full account of the mechanism by which icing forms, it uses the air temperature and ground surface temperature observed at surface weather stations to quality-control the icing days observed at more than 2,400 weather stations in China. Using several classification methods, such as Bayesian discrimination, it builds an icing discrimination model with high discrimination accuracy based on the daily minimum air temperature and the daily minimum ground surface temperature, corrects the data of abnormal years, and forms a reconstructed icing data set. The invention can detect most missed and erroneous observations, addresses the incompleteness and lag of existing data quality control, and ensures the reliability of data quality.

Description

Data sequence interpolation method suitable for icing phenomenon
Technical Field
The present invention relates to a data sequence interpolation method, and more particularly to a data sequence interpolation method suitable for icing phenomena.
Background
Icing (also called freezing) refers to an open water surface freezing into ice, including water in a container freezing into ice. It is a common natural phenomenon in daily life, but it is usually accompanied by low-temperature cold damage and strongly affects human production and life as well as the growth of animals and plants. For example, icing on expressways and airport runways on rainy or snowy winter days is closely tied to production safety and the safety of public travel, and the traffic accidents and delays it causes are a concern widely discussed across society. Monitoring and forecasting icing is therefore very important for meteorological departments. Long, continuous icing records support climate change analysis and agricultural meteorological and road traffic forecasting and services, and they provide a scientific basis for effective disaster prevention, emergency rescue and similar activities.
Weather stations in China have recorded icing since they were established, producing long icing observation records. These observations are important for meteorological services in agriculture, transportation and other fields, but they also have many problems. First, icing has always been observed manually by eye, and manual observation is highly subjective and infrequent, so icing records suffer from missed and erroneous observations to varying degrees; moreover, because data quality control lacks suitable tools, most missed observations and some erroneous ones cannot be detected. Second, China's weather stations have undergone many changes, and the observation tasks of some stations have been adjusted several times. For example, night duty at ordinary weather stations was cancelled in 2013, so icing records at some stations dropped markedly or stopped altogether; and under the ground observation automation pilot carried out in 2018, icing observation was also cancelled at some stations in the province. In addition, many errors and omissions were introduced when historical records were digitized, and blank or missing records need further verification. At present, quality control of icing data in China is fairly simple and has not gone deeper. The method for correcting stations with many missing or abnormal years is limited: the linear regression method used can only interpolate annual (or monthly) icing-day counts, which makes it difficult to determine whether icing occurred on a particular day.
Disclosure of Invention
In view of the above problems, the present invention provides a data sequence interpolation method suitable for icing, which performs deeper quality control on icing data and completes the interpolation of annual (or monthly) icing-day counts, forming a homogeneous reconstructed icing data set.
The invention adopts the following technical scheme: a data sequence interpolation method suitable for icing phenomena, carried out in the following steps:
S1. For a manually observed historical icing data series, perform data quality control using the observations of related meteorological elements, and flag the years in which no icing was recorded and the years with abnormal icing-day counts.
S2. Perform a homogeneity test on the data series processed in step S1, and classify weather stations according to the results:
stations whose data series have good homogeneity and no missing or abnormal years are class A; stations whose data series have fairly good homogeneity and whose missing or abnormal years account for no more than 50% of the series length are class B; stations whose missing or abnormal years account for more than 50% of the series length, or whose data series have poor homogeneity, are class C.
S3. Establish an icing discrimination model for each station and evaluate it to form the optimal sequence correction model, as follows: remove the abnormal years from the data of class B stations; randomly select the data of class A and class B stations, using 80% as the training set and 20% as the test set; train icing discrimination models by Bayesian discrimination, binary logistic regression and decision tree classification respectively; evaluate each trained model on the test set, counting a discrimination result as correct when it agrees with the manual observation; and retain the models whose discrimination accuracy reaches 85%.
The trained discrimination models are then fused by voting or by weighting based on discrimination accuracy to obtain the optimal sequence correction model.
S4. Correct the abnormal-year data of class B and class C stations with the optimal sequence correction model to obtain corrected data series, test the homogeneity of the corrected series, and apply homogenization correction to the series that fail the test, finally obtaining a reconstructed icing data set.
Further, the quality control in step S1 comprises a missing-observation check, a limit-value check, an internal consistency check, an element consistency check and a spatial consistency check.
Further, the element consistency check uses the daily minimum air temperature and the daily minimum surface temperature; if either of them was not measured, it is replaced by the minimum of that day's fixed-time observations.
Further, the internal consistency check is performed as follows: compute the mean μ and standard deviation σ of the annual icing-day series; if a year's icing-day count is below μ − 3σ or above μ + 3σ, that count is judged abnormal and flagged.
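The μ ± 3σ rule can be sketched in Python as follows. This is a minimal illustration, assuming one station's annual icing-day counts are supplied as a list; the function name is hypothetical, and since the text does not say whether σ is the sample or the population standard deviation, the sample value is assumed.

```python
import statistics


def flag_internal_outliers(annual_icing_days):
    """Return the indices of years whose icing-day count lies outside mu +/- 3*sigma."""
    mu = statistics.mean(annual_icing_days)
    # Assumption: sigma is the sample standard deviation (n - 1 denominator).
    sigma = statistics.stdev(annual_icing_days)
    lower, upper = mu - 3 * sigma, mu + 3 * sigma
    return [i for i, days in enumerate(annual_icing_days) if days < lower or days > upper]


# Nineteen ordinary years and one implausible year at the end of a toy series.
print(flag_internal_outliers([30] * 19 + [100]))  # -> [19]
```

With few years, a single outlier inflates σ enough that it may never exceed 3σ, so the check is most useful on long series.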
Further, the element consistency check on annual icing-day counts comprises the following steps:
first, icing is checked day by day: if icing was recorded on a day on which both the minimum air temperature and the minimum surface temperature were above 10 ℃, the icing record for that day is judged erroneous;
second, for each station, compute the correlation coefficient r between the annual icing-day series and the annual count of days with daily minimum air temperature below 0 ℃, and apply a t test to r; if r passes the correlation test at the 0.05 significance level, obtain a linear fit between the two series and compute the mean absolute deviation between the estimated and true values;
the linear fit is then used to estimate the icing-day count of the year under test; if the true value is at least 10 days and falls more than 50% below the estimate, or the true value is less than 10 days and differs from the estimate by 5 days or more, the year is considered abnormal and flagged. The correlation coefficient is
$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$
where $X_1, X_2, \dots, X_n$ is the annual icing-day series and $Y_1, Y_2, \dots, Y_n$ is the annual count of days with daily minimum air temperature below 0 ℃. The t test statistic is
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
where n is the series length.
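A minimal Python sketch of this element consistency check, with hypothetical function names and toy inputs. It computes r, the statistic t = r√(n−2)/√(1−r²), a line Ŷ = Ȳ + b(X − X̄) fitted by least squares (the least-squares form is an assumption, since the text only says "linear fit"), the mean absolute deviation, and the abnormal-year rule, reading "more than 50% below the estimate" as "less than half of the estimate". Comparing t with the 0.05 critical value is left to the caller, because Python's standard library has no t distribution.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); undefined when |r| == 1."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def fit_line(x, y):
    """Least-squares line through the means: y_hat = y_mean + b * (x - x_mean)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return lambda v: my + b * (v - mx)


def mean_abs_deviation(x, y, predict):
    """Mean absolute deviation between fitted estimates and true values."""
    return sum(abs(predict(a) - c) for a, c in zip(x, y)) / len(x)


def is_abnormal_year(true_days, estimated_days):
    """Abnormal-year rule from the summary (interpretation of '50% lower' assumed)."""
    if true_days >= 10:
        return true_days < 0.5 * estimated_days
    return abs(true_days - estimated_days) >= 5


# Toy series: x = annual days with minimum air temperature < 0, y = annual icing days.
cold_days = [1, 2, 3, 4, 5]
icing_days = [2, 4, 5, 4, 5]
r = pearson_r(cold_days, icing_days)
print(round(r, 3))  # -> 0.775
```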
Further, the spatial consistency check is performed as follows: determine whether the annual icing-day count of the station under test is higher or lower than the mean icing-day count of its neighbouring stations by 50%; if the deviation exceeds 20%, the station's icing-day count is considered abnormal and flagged.
The neighbouring stations are selected as follows: (1) the altitude difference from the station under test is no more than 200 meters; (2) the correlation coefficient between the two stations' annual mean air temperature series is greater than 0.7 and passes a correlation test at the 0.05 significance level; (3) the 5 nearest qualifying weather stations are used; if fewer than 5 qualify, the actual number is used, and if none qualify, no spatial consistency check is performed.
Further, the homogeneity test in step S2 uses the PMFT algorithm.
Further, in step S4, the abnormal-year data of class C stations are corrected with the optimal model of a reference station, which is selected by the following criteria: (1) the altitude difference from the station being corrected is no more than 500 meters; (2) the correlation coefficient between the annual mean air temperature series of the two stations is greater than 0.7 and passes a correlation test at the 0.05 significance level; (3) it is the weather station nearest to the station being corrected.
The advantages of the invention are as follows. The data sequence interpolation method for icing can detect most missed and erroneous observations, addresses the incompleteness and lag of existing data quality control, and ensures the reliability of data quality. It makes full use of existing observations and combines several discrimination methods to build an icing discrimination model based on the daily minimum air temperature and the daily minimum surface temperature, which determines whether icing occurred on a given day and allows icing data to be interpolated and corrected at daily, monthly and annual scales. It also tests and corrects the homogeneity of the interpolated icing series, forming a long, homogeneous reconstructed icing climate data set that provides a basic data source for climate change research, road icing forecasting, agricultural meteorological services and other applications.
Drawings
FIG. 1 is a flowchart of the data sequence interpolation method for icing phenomena provided by the invention;
FIG. 2 is a flowchart of data series quality control in the data sequence interpolation method provided by the invention;
FIG. 3 is a flowchart of model building in the data sequence interpolation method provided by the invention;
FIG. 4 is a flowchart of sequence correction and evaluation in the data sequence interpolation method provided by the invention.
Detailed Description
The invention is described in detail below with reference to the drawings and the specific embodiments.
Example 1: this embodiment provides a data sequence interpolation method suitable for icing phenomena. Taking full account of the mechanism by which icing forms, it uses the air temperature and ground surface temperature observed at surface weather stations to quality-control the icing days observed at more than 2,400 weather stations in China, applies several classification methods such as Bayesian discrimination to build an icing discrimination model with high discrimination accuracy based on the daily minimum air temperature and the daily minimum ground surface temperature, corrects the data of abnormal years, and forms a reconstructed icing data set. As shown in fig. 1, the specific steps are as follows:
S1. For a manually observed historical icing data series, perform data quality control using the observations of related meteorological elements, and flag the years in which no icing was recorded and the years with abnormal icing-day counts.
As shown in fig. 2, data quality control is applied to the manually observed icing data series of more than 2,400 weather stations nationwide using a missing-observation check, a limit-value check, an internal consistency check, an element consistency check, a spatial consistency check and similar methods, and the years with no icing records or with abnormal icing-day counts are flagged. The specific steps are as follows:
(1) Missing-observation check: for each of the more than 2,400 weather stations nationwide, check whether icing went unobserved during normal observation; if icing was not observed in a given year, that year's data are judged abnormal and flagged.
(2) Limit-value check: check whether each station's annual icing-day count lies between 0 and 366 days inclusive; if it falls outside [0, 366], that year's data are judged abnormal and flagged.
(3) Element consistency check: the check uses the daily minimum air temperature and the daily minimum surface temperature. If the daily minimum air temperature (or daily minimum surface temperature) was not measured, the minimum of that day's fixed-time air temperature (or surface temperature) observations is used instead.
First, icing is checked day by day: if icing was recorded on a day on which both the minimum air temperature and the minimum surface temperature were above 10 ℃, the icing record for that day is judged erroneous and corrected to no icing.
Second, for each station, calculate the correlation coefficient between the annual count of days with daily minimum air temperature below 0 ℃ and the annual icing-day series:
$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$
where $X_1, X_2, \dots, X_n$ is the annual count of days with daily minimum air temperature below 0 ℃ and $Y_1, Y_2, \dots, Y_n$ is the annual icing-day series. If $r \ge 0.6$ and the correlation passes a t test at the 0.05 significance level, with test statistic
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
where n is the series length, a linear fit between the two series is obtained:
$$\hat{Y} = \bar{Y} + b\,(X - \bar{X}), \qquad b = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}$$
where X is the annual count of days with minimum air temperature below 0 ℃, $\hat{Y}$ is the estimated annual icing-day count, $\bar{X}$ is the mean annual count of days with minimum air temperature below 0 ℃, and $\bar{Y}$ is the mean annual icing-day count. The mean absolute deviation between the estimates $\hat{Y}_i$ from the linear fit and the true values $Y_i$ is
$$B = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{Y}_i - Y_i\right|$$
If B ≤ 6 days, the linear fit is used to estimate the icing-day count of a given year; if the actual value falls more than 50% below the estimate, the year under test is considered abnormal and flagged.
(4) Internal consistency check: compute the mean μ and standard deviation σ of the annual icing-day series; if a year's icing-day count is below μ − 3σ or above μ + 3σ, it is judged abnormal and flagged.
(5) Spatial consistency check: reference stations are selected with the following criteria and steps:
a. the altitude difference from the station under test is no more than 200 meters;
b. the correlation coefficient between the annual mean air temperature series of the two stations is greater than 0.7 and passes a correlation test at the 0.05 significance level;
c. the 5 weather stations nearest to the station under test are used (if fewer than 5 qualify, the actual number is used; if none qualify, no spatial consistency check is performed).
Calculate the mean annual icing-day count of the reference stations:
$$\bar{Y} = \frac{1}{m}\sum_{k=1}^{m} Y_k, \qquad m \le 5$$
where $Y_1, \dots, Y_m$ are the icing-day counts of the reference stations. If the relative deviation between the icing-day count Y of the station under test in a given year and the reference mean,
$$\delta = \frac{\left|Y - \bar{Y}\right|}{\bar{Y}} \times 100\%,$$
exceeds 20%, that year's icing-day record is considered abnormal and flagged.
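The reference-mean and 20% relative-deviation test can be sketched as below. This assumes that the neighbour counts passed in already satisfy criteria a and b and are sorted from nearest to farthest; the function name and the handling of a zero reference mean are assumptions, not part of the text.

```python
def spatial_consistency_flag(station_days, neighbour_days, threshold=0.20, max_neighbours=5):
    """Flag a year whose icing-day count deviates from the neighbour mean by more than 20%.

    neighbour_days: that year's icing-day counts at qualifying stations, nearest first
    (assumed input). Returns None when the check cannot be performed.
    """
    refs = neighbour_days[:max_neighbours]  # at most the 5 nearest stations
    if not refs:
        return None  # no qualifying neighbours: skip the check
    ref_mean = sum(refs) / len(refs)
    if ref_mean == 0:
        return None  # relative deviation undefined (handling assumed)
    return abs(station_days - ref_mean) / ref_mean > threshold


print(spatial_consistency_flag(40, [25, 26, 24, 25, 25]))  # -> True
```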
S2. Perform a homogeneity test on the data series processed in step S1, and classify weather stations according to the results.
Methods such as PMFT (penalized maximal F test) are applied to the quality-controlled data series to test their homogeneity, with test statistic
$$PF_{\max} = \max_{1 \le k \le N-1}\left[P(k)\,F_c(k)\right]$$
where $F_c(k)$ is the F statistic for a mean shift after the k-th point of a series of length N and $P(k)$ is the penalty function.
If the test statistic exceeds the critical value, the station's data series is considered to contain a change point, its homogeneity is poor, and it is flagged. For example, for the 1960-2010 annual icing-day series of the Hefei weather station, the PMFT statistic reaches its maximum in 2000 and this maximum exceeds 11.06; that is, the series has a change point in 2000 and its homogeneity is poor.
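The PMFT statistic depends on an empirically derived penalty function that the text does not reproduce, so the sketch below instead uses SNHT (the standard normal homogeneity test, which the document lists among the tests in step S4) to illustrate the general idea of scanning every candidate break point. It is a swapped-in technique, not PMFT; the function name is hypothetical, and critical values must come from published tables.

```python
import statistics


def snht_statistic(series):
    """Return (T_max, k) for the standard normal homogeneity test (SNHT).

    T(k) = k * z1_mean**2 + (n - k) * z2_mean**2, where z is the standardized
    series, z1_mean is the mean of the first k values and z2_mean that of the rest.
    A break between positions k-1 and k is suggested when T_max exceeds a
    tabulated critical value (not included). A constant series raises ZeroDivisionError.
    """
    n = len(series)
    mu = statistics.mean(series)
    s = statistics.stdev(series)
    z = [(v - mu) / s for v in series]
    best_t, best_k = float("-inf"), None
    for k in range(1, n):
        z1 = sum(z[:k]) / k
        z2 = sum(z[k:]) / (n - k)
        t = k * z1 ** 2 + (n - k) * z2 ** 2
        if t > best_t:
            best_t, best_k = t, k
    return best_t, best_k


# Toy series with an abrupt shift after the 10th value.
print(snht_statistic([10] * 10 + [20] * 10)[1])  # -> 10
```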
The more than 2,400 weather stations are classified according to the homogeneity-tested data series. Stations whose data series have good homogeneity and no missing or abnormal years are class A; stations whose data series have fairly good homogeneity and whose missing or abnormal years account for no more than 50% of the series length are class B; stations whose missing or abnormal years account for more than 50% of the series length, or whose data series have poor homogeneity, are class C. Since the 1960-2010 annual icing-day series of the Hefei weather station is inhomogeneous, the Hefei weather station is a class C station.
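A simplified Python sketch of the A/B/C rule, assuming the homogeneity outcome is reduced to a single boolean; the text distinguishes "good" from "fairly good" homogeneity, which this sketch does not, and the function name is hypothetical.

```python
def classify_station(n_years, n_missing_or_abnormal, homogeneous):
    """Assign class A, B or C (homogeneity simplified to one boolean)."""
    share = n_missing_or_abnormal / n_years
    if not homogeneous or share > 0.5:
        return "C"  # poor homogeneity, or more than 50% missing/abnormal years
    if n_missing_or_abnormal == 0:
        return "A"  # homogeneous with no missing or abnormal years
    return "B"      # homogeneous with at most 50% missing/abnormal years
```

For a 51-year series such as 1960-2010, `classify_station(51, 0, False)` returns `"C"` whatever the number of missing years, because an inhomogeneous series is always class C.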
S3. Establish an icing discrimination model for each station and evaluate it to form the optimal sequence correction model, as follows: remove the abnormal years from the data of class B stations; randomly select the data of class A and class B stations, using 80% as the training set and 20% as the test set; train icing discrimination models by Bayesian discrimination, binary logistic regression and decision tree classification respectively; evaluate each trained model on the test set, counting a discrimination result as correct when it agrees with the manual observation; and retain the models whose discrimination accuracy reaches 85%.
The trained discrimination models are then fused by voting or by weighting based on discrimination accuracy to obtain the optimal sequence correction model.
(1) Model building
As shown in fig. 3, the abnormal years are first removed from the data of class B stations; then, by random selection, 80% of the class A and class B station data are used as the training set and 20% as the test set, and icing discrimination models are trained by Bayesian discrimination, binary logistic regression and decision tree classification respectively.
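The random 80/20 split can be sketched as follows, assuming each sample is a tuple such as (daily minimum air temperature, daily minimum surface temperature, icing flag) drawn from class A and class B station-days with abnormal years already removed; the function name and the optional seed are illustrative.

```python
import random


def split_train_test(samples, train_fraction=0.8, seed=None):
    """Randomly split samples into a training set (80%) and a test set (20%)."""
    rng = random.Random(seed)  # seed only makes the split reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```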
The Bayesian discrimination training method comprises the following steps:
a. compute the frequency of icing in the training set as the prior probability $p(y_i)$ (i = 0, 1), where i = 0 denotes no icing and i = 1 denotes icing;
b. compute the conditional probabilities of the daily minimum air temperature $z_1$ and the daily minimum surface temperature $z_2$ under each class, $p(z_1 \mid y_j)$ and $p(z_2 \mid y_j)$ (j = 0, 1);
c. by Bayes' theorem, treating $z_1$ and $z_2$ as conditionally independent given the class, compute $p(Z \mid y_i)\,p(y_i)$ for each class:
$$p(Z \mid y_i)\,p(y_i) = p(z_1 \mid y_i)\,p(z_2 \mid y_i)\,p(y_i)$$
d. take the class with the largest $p(Z \mid y_i)\,p(y_i)$ as the icing classification: if $p(Z \mid y_1)\,p(y_1) > p(Z \mid y_0)\,p(y_0)$, icing is considered to have occurred; otherwise it is considered absent.
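The text does not say how the conditional probabilities of continuous temperatures are estimated. The sketch below assumes a normal (Gaussian) density for each class and feature, which makes it a Gaussian naive Bayes classifier; the function names and the (z1, z2, y) sample layout are hypothetical.

```python
import math
from statistics import mean, pstdev


def gaussian_pdf(x, mu, sigma):
    """Normal density, used here as the class-conditional p(z | y) (assumption)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def train_naive_bayes(samples):
    """samples: (z1, z2, y) tuples, z1 = min air temp, z2 = min surface temp, y in {0, 1}.

    Both classes must be present in the training data.
    """
    model = {}
    for cls in (0, 1):
        rows = [s for s in samples if s[2] == cls]
        prior = len(rows) / len(samples)  # p(y_i) as a relative frequency
        params = []
        for j in (0, 1):
            values = [row[j] for row in rows]
            # Floor on sigma avoids division by zero for a constant feature.
            params.append((mean(values), max(pstdev(values), 1e-6)))
        model[cls] = (prior, params)
    return model


def predict_naive_bayes(model, z1, z2):
    """Return 1 (icing) if p(z1|y1) p(z2|y1) p(y1) > p(z1|y0) p(z2|y0) p(y0), else 0."""
    score = {}
    for cls, (prior, ((mu1, s1), (mu2, s2))) in model.items():
        score[cls] = gaussian_pdf(z1, mu1, s1) * gaussian_pdf(z2, mu2, s2) * prior
    return 1 if score[1] > score[0] else 0
```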
Binary logistic regression maps the result into the interval (0, 1) through the Sigmoid (S-shaped growth curve) function and uses a threshold of 0.5. The training process is as follows:
a. the optimal regression coefficients of the following expression are obtained from the training samples by stochastic gradient ascent:
$$x = w_0 + w_1 z_1 + w_2 z_2$$
b. the value of this expression is used as the input of the Sigmoid function:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
c. with a threshold of 0.5, outputs greater than 0.5 are assigned to class 1 (icing occurs) and outputs less than 0.5 to class 0 (no icing).
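A minimal sketch of binary logistic regression trained by stochastic gradient ascent on the log-likelihood, assuming samples of the form (z1, z2, y). The learning rate, epoch count and function names are illustrative assumptions, not values from the text.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + e^-x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_logistic_sga(samples, alpha=0.01, epochs=200, seed=0):
    """Fit w0, w1, w2 in x = w0 + w1*z1 + w2*z2 by stochastic gradient ascent."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit samples in random order each epoch
        for z1, z2, y in data:
            # (y - p) times the input is the gradient of the log-likelihood.
            err = y - sigmoid(w[0] + w[1] * z1 + w[2] * z2)
            w[0] += alpha * err
            w[1] += alpha * err * z1
            w[2] += alpha * err * z2
    return w


def predict_logistic(w, z1, z2, threshold=0.5):
    """Class 1 (icing) if the Sigmoid output exceeds the threshold, else class 0."""
    return 1 if sigmoid(w[0] + w[1] * z1 + w[2] * z2) > threshold else 0
```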
The decision tree training method uses the CART (classification and regression tree) algorithm to generate a decision tree. The training steps are as follows:
a. for a given training sample set D, the Gini index is
$$\mathrm{Gini}(D) = 1 - \left(\frac{|C_0|}{|D|}\right)^2 - \left(\frac{|C_1|}{|D|}\right)^2$$
where $|C_0|$ is the number of samples without icing, $|C_1|$ is the number of samples with icing, and $|D|$ is the number of training samples. If D is divided into two parts $D_1$ and $D_2$ according to whether feature A (e.g. the daily minimum air temperature) takes a possible value a (e.g. 0 ℃), that is,
$$D_1 = \{(x, y) \in D \mid A(x) = a\}, \qquad D_2 = D - D_1$$
then, conditioned on feature A, the Gini index of D is
$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$
b. among all possible features A and all possible split points a, select the feature and split point with the smallest Gini index as the optimal feature and optimal split point; generate two child nodes from the current node accordingly and distribute the training data between them by that feature;
c. recursively apply steps a and b to both child nodes until no features remain;
d. generate the CART decision tree and prune it with the loss function $C_\alpha(T_t) = C(T_t) + \alpha\,|T_t|$ to obtain the optimal decision tree, where α is a regularization parameter, $C(T_t)$ is the prediction error on the training data, and $|T_t|$ is the number of leaf nodes of subtree $T_t$.
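The Gini computations in steps a and b can be sketched as follows. Because temperatures are continuous, the sketch splits on A(x) ≤ a rather than the equality A(x) = a written in step a; this threshold form, the function names and the omission of recursion and cost-complexity pruning are all simplifications.

```python
def gini(labels):
    """Gini index of a list of 0/1 labels: 1 - p0^2 - p1^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    p0 = 1 - p1
    return 1 - p0 ** 2 - p1 ** 2


def split_gini(rows, feature, threshold):
    """Weighted Gini index after splitting rows (z1, z2, y) on rows[feature] <= threshold."""
    left = [r[2] for r in rows if r[feature] <= threshold]
    right = [r[2] for r in rows if r[feature] > threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows):
    """Return (gini, feature index, threshold) with the smallest weighted Gini index."""
    best = None
    for feature in (0, 1):  # 0 = min air temperature, 1 = min surface temperature
        for threshold in sorted({r[feature] for r in rows}):
            g = split_gini(rows, feature, threshold)
            if best is None or g < best[0]:
                best = (g, feature, threshold)
    return best
```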
(2) Model evaluation
The icing discrimination models built in step (1) are evaluated on the test set; a model's result is counted as correct when it agrees with the manual observation, and the models whose discrimination accuracy reaches at least 85% are retained. If none of the models built for a station reaches 85% accuracy, model building for that station is considered unsuccessful.
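The accuracy check can be sketched as follows, assuming each trained model is exposed as a callable taking (z1, z2) and returning 0 or 1, and test samples are (z1, z2, y) tuples; the function names are hypothetical.

```python
def accuracy(predict, test_samples):
    """Share of test samples where the model agrees with the manual observation."""
    correct = sum(1 for z1, z2, y in test_samples if predict(z1, z2) == y)
    return correct / len(test_samples)


def retain_models(models, test_samples, min_accuracy=0.85):
    """Keep models reaching min_accuracy; an empty result means model building failed."""
    kept = {}
    for name, predict in models.items():
        acc = accuracy(predict, test_samples)
        if acc >= min_accuracy:
            kept[name] = acc
    return kept
```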
(3) Model fusion
For the discrimination models $\{D_1, D_2, D_3, \dots\}$ obtained in (1), where $D_1$, $D_2$ and $D_3$ are the models trained by Bayesian discrimination, logistic regression and decision tree classification respectively, the daily minimum air temperature and daily minimum surface temperature observed each day are input to each model to obtain its icing discrimination result, and the results of all models are fused into a final result. Fusion is performed by weighting based on discrimination accuracy, as follows:
a. initialize the weights of the two classes, icing and no icing, to 0;
b. calculate the weight of each model $D_1, D_2, D_3$ as
$$w_i = \log\frac{1 - \mathrm{error}(D_i)}{\mathrm{error}(D_i)}$$
where $\mathrm{error}(D_i)$ is the error rate of model $D_i$, so the higher the accuracy, the greater the weight;
c. for the input observation x, compute the icing result of the i-th model, $y = D_i(x)$ (i = 1, 2, 3);
d. add the weight $w_i$ of model $D_i$ to the weight of class y; finally, the class (icing or no icing) with the larger total weight is taken as the final result. If the total weight for icing exceeds that for no icing, icing is finally judged to have occurred.
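A sketch of accuracy-weighted fusion following steps a to d, assuming each model's weight is w_i = log((1 − error(D_i)) / error(D_i)), a common choice in which higher accuracy yields a larger weight. The function names, the clamping of the error rate and the tie-breaking rule are assumptions.

```python
import math


def model_weight(error_rate):
    """w = log((1 - error) / error); error clamped so it stays inside (0, 1)."""
    e = min(max(error_rate, 1e-6), 1 - 1e-6)
    return math.log((1 - e) / e)


def fuse(predictions, error_rates):
    """Weighted vote over per-model icing decisions (0 = no icing, 1 = icing)."""
    votes = {0: 0.0, 1: 0.0}                 # step a: class weights start at 0
    for y, err in zip(predictions, error_rates):
        votes[y] += model_weight(err)        # steps b-d: add the model's weight to its class
    return 1 if votes[1] > votes[0] else 0   # ties resolved as "no icing" (assumption)
```

A single very accurate model can outvote two weaker ones: with error rates 0.01, 0.14 and 0.15, `fuse([1, 0, 0], [0.01, 0.14, 0.15])` returns 1.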
S4. Correct the abnormal-year data of class B and class C stations with the optimal sequence correction model to obtain corrected data series, test the homogeneity of the corrected series, and apply homogenization correction to the series that fail the test, finally obtaining a reconstructed icing data set.
(1) Reference station selection
Aiming at the unsuccessful site and the class C site established by the class B model, selecting a reference site by adopting the following standards and steps:
a. the altitude difference between the reference station and the station being corrected is no more than 500 meters;
b. the correlation coefficient between the annual mean air temperature sequences of the reference station and the station being corrected is greater than 0.7 and passes the correlation significance test at the 0.05 level;
c. among the stations meeting a and b, the one closest to the station being corrected whose model was successfully established is chosen.
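Criteria a to c amount to a filter followed by a nearest-station choice, sketched below. Assumptions: the significance test is the usual two-sided t test on r with statistic t = r·sqrt(n - 2)/sqrt(1 - r²) and n - 2 degrees of freedom; the Student t p-value is computed numerically so the sketch needs only the standard library; distance is great-circle distance from station coordinates; and `Station`, `qualified_stations` and `select_reference` are hypothetical names. The same filter with a 200 m altitude limit, keeping at most five stations, would give the adjacent stations described in claim 3.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Station:
    station_id: str
    lat: float                          # degrees north
    lon: float                          # degrees east
    altitude_m: float
    annual_mean_temp: Sequence[float]   # annual mean air temperature, same years for all stations
    model_ok: bool = False              # True if this station's icing model was established


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided Student-t p-value: Simpson's rule after substituting t = sqrt(df) * tan(theta)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    theta_max = math.atan(abs(t) / math.sqrt(df))
    h = theta_max / steps
    f = lambda th: math.cos(th) ** (df - 1)
    s = f(0.0) + f(theta_max) + sum((4 if k % 2 else 2) * f(k * h) for k in range(1, steps))
    return min(1.0, max(0.0, 1.0 - 2.0 * c * s * h / 3.0))


def correlated(x: Sequence[float], y: Sequence[float],
               min_r: float = 0.7, alpha: float = 0.05) -> bool:
    """Criterion b: r > min_r and significant at level alpha (df = n - 2)."""
    r, n = pearson_r(x, y), len(x)
    if r <= min_r:
        return False
    if r >= 1.0:
        return True
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return t_two_sided_p(t, n - 2) < alpha


def distance_km(a: Station, b: Station) -> float:
    """Great-circle (haversine) distance on a 6371 km sphere."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp, dl = p2 - p1, math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def qualified_stations(target: Station, candidates: Sequence[Station],
                       max_dalt_m: float) -> List[Station]:
    """Candidates meeting the altitude and correlation criteria, nearest first."""
    ok = [s for s in candidates
          if s.station_id != target.station_id
          and abs(s.altitude_m - target.altitude_m) <= max_dalt_m
          and correlated(target.annual_mean_temp, s.annual_mean_temp)]
    return sorted(ok, key=lambda s: distance_km(target, s))


def select_reference(target: Station, candidates: Sequence[Station]) -> Optional[Station]:
    """Criteria a to c: within 500 m altitude, correlated, nearest station with an established model."""
    for s in qualified_stations(target, candidates, max_dalt_m=500.0):
        if s.model_ok:
            return s
    return None
```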
(2) Sequence correction
Correct the abnormal-year data of class B and class C stations. A class B station is corrected with its own optimal correction model; if that station's model could not be established, the optimal model of its reference station is used instead. A class C station is corrected with the optimal model of its reference station.
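The model-selection rule and the correction of abnormal years can be sketched as follows. Because the text does not spell out the correction itself, the sketch assumes that correcting an abnormal year means re-deriving each day's icing determination from that year's daily minimum air and surface temperatures with the chosen model and recounting the annual icing days. `choose_model` and `correct_abnormal_years` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# (daily minimum air temperature, daily minimum surface temperature) -> 1 (icing) or 0 (no icing)
Model = Callable[[float, float], int]


def choose_model(station_class: str, own_model: Optional[Model],
                 reference_model: Optional[Model]) -> Model:
    """Class B: own optimal model if established, else the reference station's.
    Class C: always the reference station's optimal model."""
    if station_class == "B" and own_model is not None:
        return own_model
    if station_class in ("B", "C") and reference_model is not None:
        return reference_model
    raise ValueError(f"no usable model for class {station_class!r} station")


def correct_abnormal_years(annual_icing_days: Dict[int, int],
                           daily_temps: Dict[int, List[Tuple[float, float]]],
                           abnormal_years: Set[int], model: Model) -> Dict[int, int]:
    """Replace each abnormal year's icing-day count with a model-based count (ASSUMED procedure)."""
    corrected = dict(annual_icing_days)   # normal years are kept unchanged
    for year in abnormal_years:
        days = daily_temps.get(year, [])
        corrected[year] = sum(model(t_air, t_surface) for t_air, t_surface in days)
    return corrected
```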
(3) Homogeneity test
The homogeneity of the corrected sequences is tested with SNHT (standard normal homogeneity test), PMT (penalized maximal t-test), PMFT (penalized maximal F-test), or similar methods. Sequences that fail the homogeneity test are adjusted with the difference correction method and the comprehensive correction method to obtain the reconstructed icing sequence data set. The correction formula of the difference correction method is as follows:
[Formula image not reproduced: difference correction formula]
The correction formula of the comprehensive correction method is as follows:
[Formula image not reproduced: comprehensive correction formula]
In the two formulas above, y_α and x_α are the original values, ȳ and x̄ the mean values, and σ_y and σ_x the mean square deviations (standard deviations) of the sequence being corrected and of the base sequence, respectively, each with sample size N.
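Of the tests named above, SNHT has a simple closed form (Alexandersson's single-break statistic): after standardizing the series to z, T(k) = k·z̄₁² + (n - k)·z̄₂², where z̄₁ and z̄₂ are the means of z before and after position k, and T0 is the maximum over k. The sketch below assumes the input is already a candidate-minus-reference (or ratio) series; it returns only T0 and the break position, and leaves out the comparison with a length-dependent critical value as well as PMT and PMFT. `snht` is a hypothetical name.

```python
import statistics
from typing import Sequence, Tuple


def snht(series: Sequence[float]) -> Tuple[float, int]:
    """Return (T0, k): the maximum SNHT statistic and the length of the segment before the break."""
    n = len(series)
    if n < 3:
        raise ValueError("SNHT needs at least 3 values")
    mean, sd = statistics.fmean(series), statistics.stdev(series)
    if sd == 0:
        return 0.0, 0                      # constant series: no break detectable
    z = [(x - mean) / sd for x in series]  # standardised series
    total, prefix = sum(z), 0.0
    best_t, best_k = -1.0, 0
    for k in range(1, n):
        prefix += z[k - 1]
        z1 = prefix / k                    # mean of z over the first k values
        z2 = (total - prefix) / (n - k)    # mean of z over the remaining n - k values
        t = k * z1 * z1 + (n - k) * z2 * z2
        if t > best_t:
            best_t, best_k = t, k
    return best_t, best_k
```

For a step series of ten 0s followed by ten 5s, the statistic peaks at k = 10 (a break between the 10th and 11th values) with T0 = 19.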
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be appreciated by persons skilled in the art that the above embodiments are not intended to limit the invention in any way, and that all technical solutions obtained by means of equivalent substitutions or equivalent transformations fall within the scope of the invention.

Claims (5)

1. A data sequence interpolation method suitable for the icing phenomenon, characterized by comprising the following steps:
S1. Perform data quality control on the manually observed historical icing-phenomenon data sequence in combination with observed values of related meteorological elements, the quality control comprising a missing-data check, a limit-value check, an internal consistency check, an inter-element consistency check, and a spatial consistency check;
the inter-element consistency check uses the daily minimum air temperature and the daily minimum surface temperature; if the daily minimum air temperature or daily minimum surface temperature was not observed, the minimum of that day's fixed-time air temperature or surface temperature observations is used instead;
the internal consistency check is performed as follows: calculate the mean μ and standard deviation σ of the annual icing days over the sequence; if a year's icing days are less than μ-3σ or greater than μ+3σ, that year is judged abnormal and flagged;
the spatial consistency check is performed as follows: determine whether the annual icing days of the station being checked are 50% higher or lower than the mean annual icing days of its adjacent stations; if the deviation exceeds 20%, the station's icing days are considered abnormal and flagged;
in the quality-controlled sequence, flag the years in which no icing phenomenon was recorded and the years with abnormal icing days;
S2. Perform a homogeneity test on the data sequence processed in step S1, and classify the meteorological stations according to the test results;
stations whose data sequence is homogeneous and has no missing or abnormal years are classed as A; stations whose data sequence is reasonably homogeneous and whose missing or abnormal years make up no more than 50% of the sequence length are classed as B; stations whose missing or abnormal years exceed 50% of the sequence length, or whose data sequence is poorly homogeneous, are classed as C;
S3. Establish an icing discrimination model for each station and evaluate the discrimination models to form the optimal sequence correction model, as follows: for class B stations, remove the abnormal years from the data; randomly select data from class A and class B stations, using 80% as the training data set and 20% as the validation data set; train icing discrimination models with Bayesian discriminant classification, binary logistic regression, and decision tree classification respectively; evaluate the trained models on the validation data set, counting a model's determination as correct when it agrees with the manual observation; and retain the discrimination models whose accuracy reaches 85%;
fuse the trained discrimination models with a voting method or a weighting method based on discrimination accuracy to obtain the optimal sequence correction model;
S4. Correct the abnormal-year data of class B and class C stations with the optimal sequence correction model to obtain corrected data sequences, test the homogeneity of the corrected sequences, and homogenize the sequences that fail the test, finally obtaining a reconstructed sequence data set of the icing phenomenon.
2. The data sequence interpolation method suitable for the icing phenomenon according to claim 1, wherein the annual icing days are checked as follows:
firstly, check the daily icing records: if icing is recorded on a day on which both the minimum air temperature and the minimum surface temperature are higher than 10 ℃, the icing record for that day is judged erroneous;
secondly, for each station calculate the correlation coefficient r between the annual icing-day sequence and the annual number of days with daily minimum air temperature below 0 ℃, and perform a t test on r; if r passes the correlation significance test at the 0.05 level, obtain a linear fitting formula relating the annual icing days to the number of days with daily minimum air temperature below 0 ℃, and calculate the mean absolute deviation between the estimated and true values;
estimate the annual icing days of the year being checked with the linear fitting formula; if the true value is at least 10 days and is 50% lower than the estimated value, or the true value is less than 10 days and the absolute difference between the true and estimated values is at least 5 days, the year is considered abnormal and flagged. The correlation coefficient is
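$$r=\frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$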
where X_1, X_2, …, X_n is the annual icing-day sequence and Y_1, Y_2, …, Y_n is the annual sequence of days with minimum air temperature below 0 ℃; the t test statistic is
$$t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
n is the sequence length.
3. The data sequence interpolation method suitable for the icing phenomenon according to claim 1, wherein the adjacent stations are selected as follows: (1) the altitude difference from the station being checked is no more than 200 meters; (2) the correlation coefficient of the annual mean air temperature sequences is greater than 0.7 and passes the correlation significance test at the 0.05 level; (3) 5 adjacent stations are used; if fewer than 5 qualify, the actual number is used, and if none qualifies, the spatial consistency check is not performed.
4. The method according to claim 1, wherein the homogeneity test in step S2 is performed with the PMFT algorithm.
5. The method according to claim 1, wherein the abnormal-year data of class C stations in step S4 are corrected as follows: the optimal model of a reference station is used, the reference station being selected by the following criteria: (1) the altitude difference from the station being corrected is no more than 500 meters; (2) the correlation coefficient of the annual mean air temperature sequences of the two stations is greater than 0.7 and passes the correlation significance test at the 0.05 level; (3) it is the weather station closest to the station being corrected.
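Several of the quality-control rules in the claims reduce to short computations: the 3σ internal consistency check of claim 1, the daily 10 ℃ plausibility check and the regression-based check of annual icing days in claim 2, and the A/B/C station classification of step S2. The sketch below makes these assumptions: a population standard deviation for the 3σ rule; "50% lower than the estimated value" read as true ≤ 0.5 × estimate; the two-sided t test of claim 2 with a numerically computed p-value; and the S2 homogeneity result supplied as a boolean. All function names are hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence, Set, Tuple


def internal_outliers(annual_days: Sequence[float]) -> Set[int]:
    """Claim 1: indices of years outside mu +/- 3 sigma (population sigma assumed)."""
    mu, sigma = statistics.fmean(annual_days), statistics.pstdev(annual_days)
    return {i for i, d in enumerate(annual_days) if d < mu - 3 * sigma or d > mu + 3 * sigma}


def daily_icing_error(icing: bool, t_min_air: float, t_min_surface: float) -> bool:
    """Claim 2, first step: icing recorded while both daily minima exceed 10 degC."""
    return icing and t_min_air > 10.0 and t_min_surface > 10.0


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided Student-t p-value via Simpson's rule on t = sqrt(df) * tan(theta)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    theta_max = math.atan(abs(t) / math.sqrt(df))
    h = theta_max / steps
    f = lambda th: math.cos(th) ** (df - 1)
    s = f(0.0) + f(theta_max) + sum((4 if k % 2 else 2) * f(k * h) for k in range(1, steps))
    return min(1.0, max(0.0, 1.0 - 2.0 * c * s * h / 3.0))


def regression_outliers(icing_days: Sequence[float], cold_days: Sequence[float],
                        alpha: float = 0.05) -> Optional[Tuple[Set[int], float, float]]:
    """Claim 2: flag years against a linear fit of icing days on days with Tmin < 0 degC.

    Returns (flagged indices, r, mean absolute deviation), or None if r is not significant.
    """
    n = len(icing_days)
    mx, my = statistics.fmean(cold_days), statistics.fmean(icing_days)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cold_days, icing_days))
    sxx = sum((x - mx) ** 2 for x in cold_days)
    syy = sum((y - my) ** 2 for y in icing_days)
    if sxx == 0 or syy == 0:
        return None
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) < 1.0:
        t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
        if t_two_sided_p(t, n - 2) >= alpha:
            return None                      # correlation not significant: no fit is made
    slope = sxy / sxx
    intercept = my - slope * mx
    estimates = [intercept + slope * x for x in cold_days]
    mad = statistics.fmean([abs(e - y) for e, y in zip(estimates, icing_days)])
    flagged = set()
    for i, (y, e) in enumerate(zip(icing_days, estimates)):
        if (y >= 10 and y <= 0.5 * e) or (y < 10 and abs(y - e) >= 5):
            flagged.add(i)
    return flagged, r, mad


def classify_station(n_years: int, bad_years: int, homogeneous: bool) -> str:
    """Step S2: A = homogeneous with no missing/abnormal years; B = at most 50 % such years; else C."""
    if not homogeneous or bad_years > 0.5 * n_years:
        return "C"
    return "A" if bad_years == 0 else "B"
```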
CN201910789977.9A 2019-08-26 2019-08-26 Data sequence interpolation method suitable for icing phenomenon Active CN110502526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910789977.9A CN110502526B (en) 2019-08-26 2019-08-26 Data sequence interpolation method suitable for icing phenomenon

Publications (2)

Publication Number Publication Date
CN110502526A CN110502526A (en) 2019-11-26
CN110502526B true CN110502526B (en) 2023-05-09

Family

ID=68589558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910789977.9A Active CN110502526B (en) 2019-08-26 2019-08-26 Data sequence interpolation method suitable for icing phenomenon

Country Status (1)

Country Link
CN (1) CN110502526B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112286924A (en) * 2020-11-20 2021-01-29 China Institute of Water Resources and Hydropower Research Data cleaning technology for dynamic identification of data abnormality and multi-mode self-matching
CN113192007B (en) * 2021-04-07 2022-01-21 Qingdao Geological Engineering Survey Institute (Qingdao Geological Exploration and Development Bureau) Multi-scale information fusion geothermal abnormal region extraction method
CN118069895B (en) * 2024-04-19 2024-07-23 Linyi University Optimized storage method and system for adolescent physical fitness big data

Citations (1)

Publication number Priority date Publication date Assignee Title
CN109034252A (en) * 2018-08-01 2018-12-18 Institute of Atmospheric Physics, Chinese Academy of Sciences Automatic identification method for abnormal monitoring data from air quality monitoring stations

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US7228234B2 (en) * 2005-01-26 2007-06-05 Siemens Building Technologies, Inc. Weather data quality control and ranking method
CN104635281B (en) * 2015-02-17 2016-08-24 Nanjing University of Information Science and Technology Quality control method for automatic weather station data based on severe weather process correction
CN106503458B (en) * 2016-10-26 2019-04-16 Nanjing University of Information Science and Technology Quality control method for surface air temperature data
CN106909722B (en) * 2017-02-10 2019-07-26 Institute of Meteorological Disaster Mitigation of Guangxi Zhuang Autonomous Region Accurate inversion method for near-surface air temperature over large areas
CN109958588B (en) * 2017-12-14 2020-08-07 Beijing Goldwind Science & Creation Windpower Equipment Co., Ltd. Icing prediction method, icing prediction device, storage medium, model generation method and model generation device
CN109165693B (en) * 2018-09-11 2022-12-06 Anhui Meteorological Information Center Automatic identification method suitable for dew, frost and icing weather phenomena

Non-Patent Citations (3)

Title
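Wind power prediction with missing data using Gaussian process regression and multiple imputation; T. Liu et al.; Applied Soft Computing; pp. 905-916 *
Establishment of the icing phenomenon series in China and analysis of its climate change; Yu Yu et al.; Plateau Meteorology (No. 02); pp. 252-258 *
Method for filling missing temperature values based on neighborhood features; Tang Yunhui et al.; Chinese Journal of Agrometeorology (No. 04); pp. 76-79 *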

Also Published As

Publication number Publication date
CN110502526A (en) 2019-11-26

Similar Documents

Publication Publication Date Title
CN113919448B (en) Method for analyzing influence factors of carbon dioxide concentration prediction at any time-space position
CN110502526B (en) Data sequence interpolation method suitable for icing phenomenon
Tsakiris et al. Regional drought assessment based on the Reconnaissance Drought Index (RDI)
CN109165693B (en) Automatic identification method suitable for dew, frost and icing weather phenomena
CN113919231B (en) PM2.5 concentration space-time change prediction method and system based on space-time diagram neural network
CN111260111B (en) Runoff forecasting improvement method based on weather big data
Sakamoto et al. Detecting spatiotemporal changes of corn developmental stages in the US corn belt using MODIS WDRVI data
CN113033957B (en) Multi-mode rainfall forecast and real-time dynamic inspection and evaluation system
CN114298162A (en) Rainfall quality control and evaluation method fusing multi-source data of satellite radar and application
CN114936201A (en) Satellite precipitation data correction method based on adaptive block neural network model
CN114648705A (en) Carbon sink monitoring system and method based on satellite remote sensing
CN114880933A (en) Atmospheric temperature and humidity profile inversion method and system for non-exploration-site foundation microwave radiometer based on reanalysis data
CN112069673A (en) Method for estimating surface PM2.5 concentration based on gradient lifting decision tree
CN114926743A (en) Crop classification method and system based on dynamic time window
CN109543911B (en) Sunlight radiation prediction method and system
CN115691049A (en) Convection birth early warning method based on deep learning
CN108830444B (en) Method and device for evaluating and correcting sounding observation data
CN113742929A (en) Data quality evaluation method for grid weather live
CN113821895B (en) Method and device for constructing power transmission line icing thickness prediction model and storage medium
CN115420688A (en) Agricultural disaster information remote sensing extraction loss evaluation method based on Internet of things
Ndakize et al. A statistical analysis of the historical rainfall data over eastern province in Rwanda
Ou et al. Sensitivity of calibrated week-2 probabilistic forecast skill to reforecast sampling of the NCEP Global Ensemble Forecast System
Imfeld et al. 250 years of daily weather: Temperature and precipitation fields for Switzerland since 1763
CN112380778A (en) Weather drought forecasting method based on sea temperature
CN117933476B (en) Vegetation character spatial distribution estimation method for multi-year frozen soil region of Qinghai-Tibet plateau

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant