CN111368887B - Training method of thunderstorm weather prediction model and thunderstorm weather prediction method - Google Patents

Training method of thunderstorm weather prediction model and thunderstorm weather prediction method

Info

Publication number
CN111368887B
CN111368887B CN202010116671.XA CN202010116671A CN111368887B CN 111368887 B CN111368887 B CN 111368887B CN 202010116671 A CN202010116671 A CN 202010116671A CN 111368887 B CN111368887 B CN 111368887B
Authority
CN
China
Prior art keywords
data
features
feature
thunderstorm weather
screening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010116671.XA
Other languages
Chinese (zh)
Other versions
CN111368887A (en)
Inventor
段洪云
彭琛
汪伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010116671.XA
Publication of CN111368887A
Priority to PCT/CN2020/117578 (WO2021169271A1)
Application granted
Publication of CN111368887B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01W METEOROLOGY
    • G01W1/00 Meteorology
    • G01W1/10 Devices for predicting weather conditions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Atmospheric Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Ecology (AREA)
  • Environmental Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a training method of a thunderstorm weather prediction model, comprising the following steps: acquiring multiple sets of data, wherein each set of data comprises thunderstorm weather, a plurality of features of the thunderstorm weather, and the association relation between the thunderstorm weather and the plurality of features; screening target features from the plurality of features of the multiple sets of data, wherein the target features are features whose first feature importance meets a first predetermined condition; in each set of data of the multiple sets of data, removing the features irrelevant to the target features to form multiple sets of training data; and training a preset algorithm using the multiple sets of training data to obtain the thunderstorm weather prediction model. The invention also provides a thunderstorm weather prediction method, a training device for the thunderstorm weather prediction model, a thunderstorm weather prediction device, computer equipment and a computer-readable storage medium.

Description

Training method of thunderstorm weather prediction model and thunderstorm weather prediction method
Technical Field
The invention relates to the technical field of computers, in particular to a training method of a thunderstorm weather prediction model, a thunderstorm weather prediction method, a device, computer equipment and a computer readable storage medium.
Background
With the development of meteorological technology, methods for predicting weather conditions have emerged. In general, weather conditions are predicted from meteorological data collected by large-scale equipment such as satellites and radars, for example by inputting the collected data into a pre-trained weather prediction model. Accurate weather prediction therefore depends on the prediction accuracy of the weather prediction model, which places high demands on the training process of that model.
However, in the course of studying the present invention, the inventors found at least the following drawback in the prior art: when a weather model is trained, the weather factors in a weather factor pool are only coarsely screened, so too many redundant factors are retained; because the core factors cannot be isolated for model training, an effective weather prediction model cannot be trained.
Disclosure of Invention
The invention aims to provide a training method of a thunderstorm weather prediction model, a thunderstorm weather prediction method, a device, computer equipment and a computer-readable storage medium, which can overcome the above drawback of the prior art.
One aspect of the invention provides a training method of a thunderstorm weather prediction model, comprising the following steps: acquiring multiple sets of data, wherein each set of data comprises thunderstorm weather, a plurality of features of the thunderstorm weather, and the association relation between the thunderstorm weather and the plurality of features; screening target features from the plurality of features of the multiple sets of data, wherein the target features are features whose first feature importance meets a first predetermined condition; removing the features irrelevant to the target features from each set of data of the multiple sets of data to form multiple sets of training data; and training a preset algorithm using the multiple sets of training data to obtain the thunderstorm weather prediction model.
Optionally, the target features include linear target features belonging to a linear type, and screening the target features from the plurality of features of the multiple sets of data includes: sampling the multiple sets of data N times to obtain N data sets, wherein each data set comprises one or more of the multiple sets of data; for each of the N data sets, inputting the data set into a linear feature screening model, wherein the linear feature screening model is configured to calculate a second feature importance for each of the plurality of features of the data set, and to output the features that belong to the linear type and whose second feature importance satisfies a second predetermined condition, referred to as a group of preliminary linear features; acquiring the N groups of preliminary linear features output by the linear feature screening model; and screening out the linear target features using the N groups of preliminary linear features.
Optionally, screening out the linear target features using the N groups of preliminary linear features includes: counting all the features in the N groups of preliminary linear features to obtain a third feature importance of each feature; screening out, from the N groups of preliminary linear features, the features whose third feature importance meets a third predetermined condition, referred to as secondary linear features; and screening out the linear target features using the secondary linear features.
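The counting step above can be illustrated with a minimal sketch. The text does not define the third feature importance or the third predetermined condition, so this example assumes (hypothetically) that the importance is the fraction of the N groups in which a feature appears and that the condition is a minimum fraction; the function name and the threshold are illustrative only.

```python
from collections import Counter

def secondary_linear_features(preliminary_groups, min_fraction=0.5):
    """Keep features that appear in at least min_fraction of the N groups.

    Assumption: "third feature importance" is taken to be the selection
    frequency of a feature across the N groups of preliminary linear features.
    """
    n = len(preliminary_groups)
    # set(group) ensures a feature is counted at most once per group.
    counts = Counter(f for group in preliminary_groups for f in set(group))
    return sorted(f for f, c in counts.items() if c / n >= min_fraction)

# Toy groups of preliminary linear features (invented for illustration).
preliminary = [
    ["temperature", "humidity"],
    ["humidity", "pressure"],
    ["humidity", "temperature", "rainfall"],
]
secondary = secondary_linear_features(preliminary, min_fraction=0.6)
```

Here "humidity" appears in all three groups and "temperature" in two of three, so both pass the assumed 0.6 threshold, while "pressure" and "rainfall" do not.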
Optionally, screening out the linear target features using the secondary linear features includes:
Step A1: calculating the number M of features in the secondary linear features and the correlation coefficient between each feature and the thunderstorm weather;
Step A2: taking the feature with the largest correlation coefficient as one of the linear target features;
Step A3: inputting the feature with the largest correlation coefficient and the thunderstorm weather into a 1st preset regression model to obtain a 1st significance;
Step A4: judging whether i is greater than M, executing step A5 when i is not greater than M, and executing step A8 when i is greater than M, wherein the initial value of i is 1;
Step A5: inputting the feature with the (i+1)th largest correlation coefficient into the (i+1)th preset regression model to obtain the (i+1)th significance, wherein the (i+1)th preset regression model is obtained by inputting the first i features and the thunderstorm weather into the ith preset regression model;
Step A6: judging whether the relation between the ith significance and the (i+1)th significance meets a sixth predetermined condition; if so, executing step A7, and if not, executing step A4;
Step A7: determining the feature with the (i+1)th largest correlation coefficient as one of the linear target features;
Step A8: taking all the features so determined from the secondary linear features as the linear target features.
Optionally, step A3 includes: inputting the feature with the largest correlation coefficient and the thunderstorm weather into the 1st preset regression model to obtain the 1st significance and a 1st first goodness of fit; step A5 includes: inputting the feature with the (i+1)th largest correlation coefficient into the (i+1)th preset regression model to obtain the (i+1)th significance and an (i+1)th first goodness of fit; after step A7 and before step A8, the method further includes: judging whether the relation between the ith first goodness of fit and the (i+1)th first goodness of fit meets a seventh predetermined condition; if not, executing step A4, and if so, executing step A8.
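The stepwise procedure can be sketched as a simple forward selection. This is only an illustration under stated assumptions: it uses ordinary least squares as the "preset regression model", uses R^2 as the goodness of fit, assumes the seventh predetermined condition means "R^2 improves by less than min_gain", and omits the significance check of steps A3 to A6 entirely. All function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)  # normal equations; assumes no collinear columns
    pred = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def forward_select(features, y, ranked_names, min_gain=0.01):
    """Add features in order of decreasing correlation until R^2 stops improving."""
    selected = [ranked_names[0]]  # step A2: the top-ranked feature is always kept
    best = r_squared([features[n] for n in selected], y)
    for name in ranked_names[1:]:
        candidate = selected + [name]
        score = r_squared([features[n] for n in candidate], y)
        if score - best < min_gain:  # assumed seventh condition: no real gain
            break
        selected, best = candidate, score
    return selected

# Toy data (invented): y depends on x1 and x2 only; x3 is unrelated.
features = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 0, 1, 0, 0, 1],
    "x3": [0.3, 0.1, 0.2, 0.3, 0.1, 0.2],
}
y = [2 * a + 3 * b for a, b in zip(features["x1"], features["x2"])]
chosen = forward_select(features, y, ranked_names=["x1", "x2", "x3"])
```

In a fuller version, each added factor would also be tested for its effect on the significance of the factors already in the model, as steps A5 to A7 describe.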
Optionally, the target features include nonlinear target features belonging to a nonlinear type, and screening the target features from the plurality of features of the multiple sets of data includes: inputting the multiple sets of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is used to calculate a fourth feature importance of each of the plurality of features using the multiple sets of data, and to output the features that belong to the nonlinear type and whose fourth feature importance meets a fourth predetermined condition; removing, from the features output by the nonlinear feature screening model, the features whose fourth feature importance meets a fifth predetermined condition, to obtain preliminary nonlinear features; for each set of data of the multiple sets of data, removing the features irrelevant to the preliminary nonlinear features, to obtain multiple sets of preliminary screening data; and continuing to input the multiple sets of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
Optionally, after inputting the multiple sets of data into the nonlinear feature screening model, the method further comprises: calculating a second goodness of fit of the nonlinear feature screening model; and continuing to input the multiple sets of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out includes: continuing to input the multiple sets of preliminary screening data into the nonlinear feature screening model to obtain secondary nonlinear features; for each set of the multiple sets of preliminary screening data, removing the features irrelevant to the secondary nonlinear features, to obtain multiple sets of secondary screening data; calculating a third goodness of fit of the nonlinear feature screening model; judging whether the relation between the second goodness of fit and the third goodness of fit meets an eighth predetermined condition; if so, determining the secondary nonlinear features as the nonlinear target features; if not, continuing to input the multiple sets of secondary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
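The round-by-round elimination can be sketched as a loop around a pluggable model. Everything specific here is an assumption made for illustration: `fit` is a hypothetical callback standing in for the nonlinear feature screening model and returning per-feature importances plus a goodness of fit; the fifth predetermined condition is assumed to mean "the n_drop least important features"; and the eighth predetermined condition is assumed to mean "goodness of fit improved by no more than tol".

```python
def iterative_screen(features, fit, n_drop=1, tol=0.01):
    """Repeatedly drop the least important features until the fit stops improving.

    fit(names) must return (importances: dict, goodness: float); it is a
    hypothetical stand-in for training the nonlinear feature screening model.
    """
    current = list(features)
    importances, goodness = fit(current)
    while len(current) > n_drop:
        ranked = sorted(current, key=importances.get, reverse=True)
        reduced = ranked[:-n_drop]          # remove the least important features
        new_importances, new_goodness = fit(reduced)
        if new_goodness - goodness <= tol:  # assumed eighth condition is met
            return reduced                  # the reduced set is the target set
        current, importances, goodness = reduced, new_importances, new_goodness
    return current

# Toy stand-in model (invented): fixed importances, noisy features hurt the fit.
TOY_IMPORTANCE = {
    "cape": 0.5, "humidity": 0.3, "wind_shear": 0.15,
    "noise_a": 0.03, "noise_b": 0.02,
}

def toy_fit(names):
    importances = {n: TOY_IMPORTANCE[n] for n in names}
    goodness = 0.9 - 0.05 * sum(1 for n in names if n.startswith("noise"))
    return importances, goodness

targets = iterative_screen(list(TOY_IMPORTANCE), toy_fit)
```

With the toy model, the two noise features are dropped in the first two rounds because each removal improves the fit; the third round no longer improves it, so the loop stops and returns that round's reduced set, following the wording of the paragraph above.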
Optionally, inputting the multiple sets of data into the nonlinear feature screening model includes: for each set of data in the multiple sets of data, pre-screening the plurality of features using a preset rule to obtain multiple sets of preprocessed data; and inputting the multiple sets of preprocessed data into the nonlinear feature screening model, wherein the nonlinear feature screening model is used to calculate the fourth feature importance of each of the pre-screened features using the multiple sets of preprocessed data, and to output the features that belong to the nonlinear type and whose fourth feature importance meets the fourth predetermined condition.
Another aspect of the present invention provides a thunderstorm weather prediction method, comprising: acquiring target characteristics of the current weather; inputting the target characteristics into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result, wherein the thunderstorm weather prediction model is obtained by the method in any one embodiment; judging whether the future weather is thunderstorm weather or not according to the weather prediction result.
In still another aspect, the present invention provides a training apparatus for a thunderstorm weather prediction model, comprising: the first acquisition module is used for acquiring a plurality of groups of data, wherein each group of data comprises thunderstorm weather, a plurality of characteristics of the thunderstorm weather and an association relation of the thunderstorm weather and the characteristics of the thunderstorm weather; the screening module is used for screening target features from the multiple features of the multiple groups of data, wherein the target features are features with first feature importance meeting a first preset condition; the rejecting module is used for rejecting the characteristics irrelevant to the target characteristics in each group of data of the plurality of groups of data to form a plurality of groups of training data; and the training module is used for training a preset algorithm by utilizing the plurality of sets of training data to obtain the thunderstorm weather prediction model.
Still another aspect of the present invention provides a thunderstorm weather prediction apparatus, comprising: the second acquisition module is used for acquiring target characteristics of the current weather; the input module is used for inputting the target characteristics into a pre-trained thunderstorm weather prediction model so as to enable the thunderstorm weather prediction model to output weather prediction results, wherein the thunderstorm weather prediction model is obtained by the method of any one embodiment; and the judging module is used for judging whether the future weather is thunderstorm weather or not according to the weather prediction result.
Yet another aspect of the present invention provides a computer apparatus comprising: the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of the training method of the thunderstorm weather prediction model and/or the steps of the thunderstorm weather prediction method when executing the computer program.
Yet another aspect of the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the training method of the thunderstorm weather prediction model and/or the steps of the thunderstorm weather prediction method described above.
According to the training method of the thunderstorm weather prediction model provided by the invention, target features whose first feature importance meets the first predetermined condition are screened out, features irrelevant to the target features are removed to obtain multiple sets of training data, and the thunderstorm weather prediction model is then trained using these training data. Because the training data contain no redundant features and the number of features in the training data is markedly reduced, the drawback of the prior art is overcome and the accuracy of the trained thunderstorm weather prediction model is improved.
Furthermore, building on existing feature-engineering screening, the invention considers two kinds of features: linear-type features and nonlinear-type features. It thus takes into account both the independent effect of individual features and the synergistic effect among multiple features, and the added nonlinear effects improve the expressive capacity of the model.
For linear-type features, N groups of preliminary linear features are screened out by sampling N times and inputting each sample in turn into the linear feature screening model; secondary linear features are then obtained by counting over the N groups of preliminary linear features; next, using an improved preset regression model, the x to which the output y responds most strongly is selected first and new factors are added step by step, ensuring that each new factor does not cause a significant change in the existing factors, until the goodness of fit of the model no longer improves. Because the two layers of screening target different aspects, the interpretability of the feature screening process and the validity of the final linear target features are both improved.
For nonlinear-type features, pre-screening keeps the number of features manageable so that they can conveniently be input into the nonlinear feature screening model. After each round of training, the features whose fourth feature importance meets the fifth predetermined condition are carried into the next round of training according to their fourth feature importance, and features of lower importance are gradually deleted, so that the number of features input into the nonlinear feature screening model decreases from round to round; nonlinear target features are thereby screened out while the accuracy of the model is improved. The expressive capacity of the model depends not only on the individual features but also on the cooperative expression among features, which can improve the fit of the model to a certain extent and the accuracy of the result.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 schematically illustrates a flow chart of a training method of a thunderstorm weather prediction model, according to an embodiment of the invention;
FIG. 2 schematically illustrates a flow chart of a thunderstorm weather prediction method, according to an embodiment of the invention;
FIG. 3 schematically illustrates a block diagram of a training apparatus of a thunderstorm weather prediction model, according to an embodiment of the invention;
FIG. 4 schematically illustrates a block diagram of a thunderstorm weather prediction device, according to an embodiment of the invention;
FIG. 5 schematically illustrates a block diagram of a computer device adapted to implement a training method of a thunderstorm weather prediction model and/or a thunderstorm weather prediction method, according to an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In order to better understand the beneficial technical effects achieved by the present invention, the related prior art is described before the specific embodiments. In the prior art, feature screening is also performed before model training. With improvements in storage technology and computing power, a large number of feature indexes are available, which makes model construction more complete and helps guarantee accurate results; however, the large number of redundant features makes model training extremely time-consuming and prone to overfitting. Existing feature screening methods are mainly statistics-based, for example based on null rate, variance, correlation or collinearity. Such methods can discriminate between features to some extent, but when the feature pool is very large they alone can hardly reduce the number of features effectively. On the one hand, this kind of objective screening depends too heavily on statistical theory, which reduces the interpretability of the features during screening; on the other hand, screening from a single angle gives the model poor extensibility and ignores the influence of the interplay among multiple features on the dependent variable. Core features therefore cannot be obtained by statistics-based feature selection alone, so an effective attribution model cannot be fitted.
FIG. 1 schematically illustrates a flow chart of a training method of a thunderstorm weather prediction model, according to an embodiment of the invention.
As shown in fig. 1, the training method of the thunderstorm weather prediction model may include steps S1 to S4, where:
Step S1, acquiring multiple sets of data, wherein each set of data comprises thunderstorm weather, a plurality of features of the thunderstorm weather, and the association relation between the thunderstorm weather and the plurality of features.
In this embodiment, each set of data corresponds to a particular historical thunderstorm day and comprises an output y and an input x: the thunderstorm weather is the output y, and the plurality of features of the thunderstorm weather are the input x. An association relation exists between y and x, that is, between the thunderstorm weather and the plurality of features. The features of thunderstorm weather may be, for example, temperature, air pressure, rainfall, humidity, air density, air volume, etc.
For example, there are 4 sets of data. The first set corresponds to March 15 and includes the thunderstorm weather on March 15, a plurality of features of that thunderstorm weather, and the association relation between them; the second set corresponds to March 18 and includes the thunderstorm weather on March 18, a plurality of features of that thunderstorm weather, and the association relation between them; the third set corresponds to May 7 and includes the thunderstorm weather on May 7, a plurality of features of that thunderstorm weather, and the association relation between them; the fourth set corresponds to June 24 and includes the thunderstorm weather on June 24, a plurality of features of that thunderstorm weather, and the association relation between them.
Step S2, screening target features from the plurality of features of the multiple sets of data, wherein the target features are features whose first feature importance meets a first predetermined condition.
The aim of this embodiment is to train a thunderstorm weather prediction model using target features, thereby overcoming the drawback of the prior art. It is therefore necessary to screen out, from the plurality of features, the features whose first feature importance meets the first predetermined condition as the target features. Each feature has a first feature importance, which measures how closely the feature is associated with thunderstorm weather. Optionally, the first feature importance may be the correlation coefficient between each feature and the thunderstorm weather, and the first predetermined condition may be that the first feature importance ranks before a predetermined position.
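As a minimal sketch of the optional variant just described, the example below takes the first feature importance to be the absolute Pearson correlation between a feature and y, and the first predetermined condition to be "ranked in the top k". Using the absolute value, the function names, and the toy numbers are assumptions made for illustration, not details given in the text.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)

def screen_by_correlation(features, y, top_k):
    """Return the names of the top_k features ranked by |correlation| with y."""
    importance = {name: abs(pearson(col, y)) for name, col in features.items()}
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:top_k]

# Toy data (invented): y is a 0/1 thunderstorm indicator per day.
y = [1, 0, 1, 0, 1, 0]
features = {
    "humidity": [0.9, 0.4, 0.8, 0.3, 0.95, 0.5],
    "pressure": [1000, 1015, 1002, 1018, 998, 1012],
    "noise": [5, 5, 6, 6, 5, 6],
}
selected = screen_by_correlation(features, y, top_k=2)
```

On this toy data, humidity and pressure are both strongly correlated with y (positively and negatively, respectively), so they are kept and the weakly correlated "noise" column is dropped.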
Optionally, step S2 may include step S21 and/or step S22, wherein:
Step S21, screening out linear target features belonging to a linear type from the plurality of features using the multiple sets of data; and/or
Step S22, screening out nonlinear target features belonging to a nonlinear type from the plurality of features using the multiple sets of data.
The plurality of features may include a linear type feature or a nonlinear type feature, and the linear type feature may also belong to the nonlinear type at the same time. In the present embodiment, when only a feature belonging to a linear type exists among a plurality of features, a linear target feature is determined as a target feature; determining a nonlinear target feature as a target feature when only a feature belonging to a nonlinear type exists in the plurality of features; when there is a feature belonging to both the linear type and the nonlinear type among the plurality of features, the linear target feature and the nonlinear target feature are determined as target features.
It should be noted that it is not known in advance which features are of the linear type and which features are of the nonlinear type, so, in order to ensure that the linear target features can be accurately screened when the features of the linear type exist, step S2 may include steps S21 to S24, where the target features may include linear target features of the linear type, specifically:
Step S21, sampling the plurality of groups of data N times to obtain N data sets, where each data set includes one or more of the plurality of groups of data.
The sampling manner is not limited; for example, the sampling may follow the idea of the bootstrapping algorithm. For example, with N = 3, the first data set comprises: a first group of data, a third group of data, and a fourth group of data; the second data set comprises: a second group of data, a third group of data, and a fourth group of data; the third data set comprises: the first group of data, the second group of data, and the fourth group of data.
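The sampling in step S21 can be sketched as follows. This is a minimal illustration that assumes classic bootstrapping (drawing with replacement), which the embodiment names only as one possible idea; the function name `bootstrap_datasets`, the `seed` parameter and the placeholder group labels are hypothetical. Unlike the hand-written example above, a bootstrap resample may contain the same group of data more than once.

```python
import random


def bootstrap_datasets(groups, n_datasets, seed=0):
    """Draw n_datasets resamples of the groups of data, with replacement.

    Assumption: each resample has as many draws as there are groups,
    which is the usual bootstrap convention, not a requirement of the text.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.choices(groups, k=len(groups)) for _ in range(n_datasets)]


# Placeholder labels standing in for four groups of weather data.
groups = ["group1", "group2", "group3", "group4"]
datasets = bootstrap_datasets(groups, n_datasets=3)  # N = 3
```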
Step S22, for each of the N data sets, inputting the data set into a linear feature screening model, wherein the linear feature screening model is used for calculating a second feature importance of each of the plurality of features of the data set, and outputting the features that belong to the linear type and whose second feature importance satisfies a second predetermined condition, referred to as a group of preliminary linear features.
The linear feature screening model outputs only features of the linear type, and calculates the second feature importance of each such feature. When the model outputs the features, each feature may be preceded by its coefficient, which represents the importance of the feature: the larger the coefficient, the higher the importance. In this embodiment, the second feature importance is therefore the coefficient preceding each feature. The model then outputs the features that belong to the linear type and satisfy the second predetermined condition, for example, the features of the linear type whose second feature importance is not 0.
Optionally, the linear feature screening model is a Lasso model with an L1 regularization term. It outputs features of the linear type, automatically calculates their second feature importance, and outputs that importance as the coefficient of each feature; for example, in the output "0.8 humidity", 0.8 is the second feature importance of humidity. As another example, if the second predetermined condition is that the second feature importance is not 0, then for each data set the Lasso model outputs the features of the linear type whose coefficients are not 0.
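For reference, the Lasso objective can be written as below. This is the standard textbook form, not a formula given in this embodiment; the symbols n (number of samples in a data set), p (number of features), x_{ij}, y_i, \beta and the regularization strength \lambda are notation assumed here. The L1 penalty drives some coefficients exactly to 0, and the features with \hat{\beta}_j \neq 0 correspond to a group of preliminary linear features.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert ,
\qquad
\text{selected features} = \{\, j : \hat{\beta}_j \neq 0 \,\}
```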
Step S23, N groups of preliminary linear features output by the linear feature screening model are obtained.
Since the N data sets are sequentially input into the linear feature screening model, and each data set corresponds to one group of preliminary linear features, the linear feature screening model sequentially outputs N groups of preliminary linear features, and the features included in each group may differ.
For example, in combination with the above example, the first group of preliminary linear features includes: temperature, air pressure and humidity; the second group of preliminary linear features includes: temperature, air pressure, rainfall and air volume; the third group of preliminary linear features includes: temperature and humidity.
Step S24, screening out the linear target features by using the N groups of preliminary linear features.
Alternatively, step S24 may include steps S241 to S243, wherein:
Step S241, counting all the features in the N groups of preliminary linear features to obtain a third feature importance of each feature;
Step S242, screening out, from the N groups of preliminary linear features, the features whose third feature importance satisfies a third predetermined condition, referred to as secondary linear features;
Step S243, screening out the linear target features by using the secondary linear features.
In this embodiment, the third feature importance may be the number of occurrences of each feature in the N sets of preliminary linear features, and the third predetermined condition may be the number exceeding a predetermined number threshold.
For example, in combination with the above example, the number of times of occurrence of temperature is 3, the number of times of occurrence of air pressure is 2, the number of times of occurrence of humidity is 2, the number of times of occurrence of rainfall is 1, and the number of times of occurrence of air volume is 1. If the third predetermined condition is that the number of times exceeds 1, the secondary linear characteristic is temperature, air pressure and humidity.
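The counting in steps S241 and S242 can be sketched as follows, using the three example groups from this embodiment. The function name `secondary_linear_features` and the parameter `min_count` are hypothetical, and the sketch assumes each group lists a feature at most once.

```python
from collections import Counter


def secondary_linear_features(preliminary_groups, min_count):
    """Count feature occurrences across groups; keep those above min_count."""
    counts = Counter(f for group in preliminary_groups for f in group)
    # Counter keeps first-seen order, so the result order is deterministic.
    selected = [f for f, c in counts.items() if c > min_count]
    return selected, counts


preliminary_groups = [
    ["temperature", "air pressure", "humidity"],
    ["temperature", "air pressure", "rainfall", "air volume"],
    ["temperature", "humidity"],
]
# Third predetermined condition from the example: more than 1 occurrence.
selected, counts = secondary_linear_features(preliminary_groups, min_count=1)
```

With these inputs, `selected` holds temperature, air pressure and humidity, matching the example.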
Further, the linear target features may be screened out based on the secondary linear features. For example, the secondary linear features are directly taken as the linear target features.
However, since the loss function with the L1 regularization term is not differentiable, linear target features determined directly by the Lasso model with the L1 regularization term are somewhat unstable. To overcome this drawback, this embodiment may input the secondary linear features into a predetermined regression model and determine the final linear target features through the predetermined regression model, thereby improving the accuracy of determining the linear target features. Specifically, step S243 may include steps A1 to A8, wherein:
Step A1: calculating the number M of features in the secondary linear features and the correlation coefficient between each feature and thunderstorm weather;
Step A2: taking the feature with the largest correlation coefficient as one feature of the linear target features;
Step A3: inputting the feature with the largest correlation coefficient and thunderstorm weather into a 1st predetermined regression model to obtain a 1st significance;
Step A4: judging whether i is greater than M; executing step A5 when i is not greater than M, and executing step A8 when i is greater than M, wherein the initial value of i is 1;
Step A5: inputting the feature with the (i+1)-th largest correlation coefficient into an (i+1)-th predetermined regression model to obtain an (i+1)-th significance, wherein the (i+1)-th predetermined regression model is obtained by inputting the first i features and thunderstorm weather into the i-th predetermined regression model;
Step A6: judging whether the relation between the i-th significance and the (i+1)-th significance satisfies a sixth predetermined condition; if so, executing step A7, and if not, executing step A4;
Step A7: determining the feature with the (i+1)-th largest correlation coefficient as one feature of the linear target features;
Step A8: taking all the features determined from the secondary linear features as the linear target features.
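Steps A1 and A2 rely on ranking the secondary linear features by their correlation coefficient with thunderstorm weather. A minimal sketch of that ranking is given below; it assumes the Pearson correlation coefficient (the embodiment does not name a specific coefficient), non-constant columns, and made-up toy values. The names `pearson` and `rank_by_correlation` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)  # assumes neither column is constant


def rank_by_correlation(columns, y):
    """Return feature names sorted by correlation with y, largest first."""
    scores = {name: pearson(values, y) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy values only: y is 1 for thunderstorm weather and 0 otherwise.
y = [0, 0, 1, 1, 1]
columns = {
    "temperature": [20, 25, 22, 30, 28],
    "humidity": [40, 45, 80, 85, 90],
}
ranked, scores = rank_by_correlation(columns, y)
M = len(ranked)  # feature quantity M from step A1
```

Here `ranked[0]` is the feature that step A2 would take as the first linear target feature.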
This embodiment is a cyclic operation. Specifically, the feature having the largest correlation coefficient with the output y is selected from the secondary linear features as one feature of the linear target features. This feature and the output y are input into a predetermined regression model (referred to at this point as the 1st predetermined regression model) to obtain a significance, referred to as the 1st significance; the model obtained by inputting this feature into the 1st predetermined regression model is referred to as the 2nd predetermined regression model. Next, the feature having the second-largest correlation coefficient with y is selected from the secondary linear features and input into the 2nd predetermined regression model to obtain a significance, referred to as the 2nd significance.
It is then judged whether the relation between the 1st significance and the 2nd significance satisfies the sixth predetermined condition (for example, whether the difference between the two significances is greater than 0.0001). If so, the feature with the second-largest correlation coefficient has a significant influence on the feature with the largest correlation coefficient, and the relation between the significance of the feature with the third-largest correlation coefficient and the 1st significance is judged next. If not, the feature with the second-largest correlation coefficient is also taken as one of the linear target features, and the relation between the significance of the feature with the third-largest correlation coefficient and that of the feature with the second-largest correlation coefficient is judged next. The process is repeated until all the features in the secondary linear features have been judged.
Note that the significance may be characterized by the t-statistic.
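For reference, the usual t-statistic of a regression coefficient is shown below. This is the standard definition rather than a formula stated in this embodiment; \hat{\beta}_j denotes the estimated coefficient of feature j in the predetermined regression model and \mathrm{SE}(\hat{\beta}_j) its standard error, both of which are notation assumed here.

```latex
t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)}
```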
Optionally, when the secondary linear features include a large number of features, cyclically executing the significance judging logic for all of them may greatly increase the processor workload. In this case, when to stop the significance judging logic may be determined from the goodness of fit of the predetermined regression model, as follows:
Step A3 may include: inputting the feature with the largest correlation coefficient and thunderstorm weather into the 1st predetermined regression model to obtain the 1st significance and a 1st first goodness of fit;
Step A5 may include: inputting the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance and an (i+1)-th first goodness of fit;
After step A7 and before step A8, the training method of the thunderstorm weather prediction model may further include: judging whether the relation between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies a seventh predetermined condition; if not, executing step A4, and if so, executing step A8.
In this embodiment, when the significance judging logic has not yet been executed for all the features, if the relation between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies the seventh predetermined condition, the significance judging logic is not continued for the remaining features, and all the features determined from the secondary linear features up to this point are taken as the linear target features. For example, the relation between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfying the seventh predetermined condition may be: the difference between the i-th first goodness of fit and the (i+1)-th first goodness of fit is less than 0.0001.
The goodness of fit may be measured by R², also known as the coefficient of determination.
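The goodness-of-fit check can be sketched as follows. The R² formula is the standard one; treating the "difference" in the seventh predetermined condition as an absolute difference is an assumption, and the function names `r_squared` and `fit_has_plateaued` are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot  # assumes y_true is not constant


def fit_has_plateaued(r2_previous, r2_next, tol=1e-4):
    """Seventh predetermined condition from the example: difference < 0.0001.

    Assumption: the difference is taken as an absolute value.
    """
    return abs(r2_next - r2_previous) < tol
```

For instance, `fit_has_plateaued(0.81234, 0.81237)` holds, so the loop over the remaining features would stop, whereas a jump from 0.80 to 0.85 would let it continue.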
Optionally, in order to ensure that the nonlinear target feature can be accurately screened out when the nonlinear type feature exists, step S2 may further include steps S21 'to S24', where the target feature may include a nonlinear target feature belonging to the nonlinear type, specifically:
Step S21', inputting the plurality of groups of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating a fourth feature importance of each of the plurality of features by utilizing the plurality of groups of data, and outputting the features that belong to the nonlinear type and whose fourth feature importance satisfies a fourth predetermined condition.
The nonlinear feature screening model outputs only the features of the nonlinear type, and for each feature of the nonlinear type, calculates a fourth feature importance of the features, and then outputs features belonging to the nonlinear type and having the fourth feature importance satisfying a fourth predetermined condition, for example, outputs features having a fourth feature importance other than 0 and belonging to the nonlinear type.
Optionally, the nonlinear feature screening model is a machine learning model, such as a random forest (Random Forest, RF) algorithm or a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT). Taking the random forest algorithm as an example, each tree constructed by the algorithm can record, at each node, the decrease in the Gini coefficient after the node is split on a feature. By randomly generating a plurality of trees and randomly selecting features, the degree to which a given feature improves classification or regression purity can be obtained when the amount of data is large; this value is the contribution degree, namely the fourth feature importance. The fourth feature importance of a feature belonging to the nonlinear type may also be output as the coefficient of the feature; for example, in the output "0.6 air density", 0.6 is the fourth feature importance of air density. As another example, if the fourth predetermined condition is that the fourth feature importance is not 0, then for each data set the nonlinear feature screening model outputs the features of the nonlinear type whose coefficients are not 0.
Step S22', removing, from the features output by the nonlinear feature screening model, the features whose fourth feature importance satisfies a fifth predetermined condition, to obtain preliminary nonlinear features.
For example, the fifth predetermined condition is that the feature has the smallest fourth feature importance. In this embodiment, the features output by the nonlinear feature screening model may be sorted in descending order of fourth feature importance, and the feature ranked last is then removed to obtain the preliminary nonlinear features.
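Step S22' can be sketched as a sort followed by dropping the last entry. The function name `drop_least_important` and the toy importance values are hypothetical; the importances would in practice come from the nonlinear feature screening model.

```python
def drop_least_important(importances):
    """Sort features by importance, largest first, and drop the last one."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:-1]


# Toy fourth feature importances, for illustration only.
fourth_importance = {"humidity": 0.3, "air density": 0.6, "wind speed": 0.1}
preliminary_nonlinear = drop_least_important(fourth_importance)
```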
Step S23', for each group of data of the plurality of groups of data, eliminating the characteristics irrelevant to the preliminary nonlinear characteristics to obtain a plurality of groups of preliminary screening data.
Eliminating the features irrelevant to the preliminary nonlinear features means eliminating all features other than the preliminary nonlinear features.
Step S24', continuing to input the plurality of groups of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
Optionally, after step S21', the training method of the thunderstorm weather prediction model may further include: and calculating a second goodness-of-fit of the nonlinear feature screening model.
Step S24' may include step S241' to step S246', wherein:
Step S241', continuing to input the plurality of groups of preliminary screening data into the nonlinear feature screening model to obtain secondary nonlinear features;
Step S242', for each group of the plurality of groups of preliminary screening data, removing the features irrelevant to the secondary nonlinear features to obtain a plurality of groups of secondary screening data;
Step S243', calculating a third goodness of fit of the nonlinear feature screening model;
Step S244', judging whether the relation between the second goodness of fit and the third goodness of fit satisfies an eighth predetermined condition; if so, executing step S245'; if not, executing step S246';
Step S245', determining the secondary nonlinear features as the nonlinear target features;
Step S246', continuing to input the plurality of groups of secondary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
This embodiment is also a cyclic operation. Specifically, the plurality of groups of preliminary screening data are obtained first and the second goodness of fit is calculated; the plurality of groups of secondary screening data are then obtained and the third goodness of fit is calculated. If the relation between the second goodness of fit and the third goodness of fit satisfies the eighth predetermined condition, the secondary nonlinear features are determined as the nonlinear target features; otherwise, the plurality of groups of secondary screening data continue to be input into the nonlinear feature screening model until the relation between the goodness-of-fit values satisfies the eighth predetermined condition. The eighth predetermined condition is, for example, that the difference between the loss function corresponding to the second goodness of fit and the loss function corresponding to the third goodness of fit is less than 0.0001.
Optionally, directly inputting the plurality of groups of data into the nonlinear feature screening model may make the processing load at one time too heavy and cause other problems, such as the machine hanging. To avoid this, this embodiment may first preprocess the plurality of groups of data and then input the preprocessed data into the nonlinear feature screening model, as follows:
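The loop of steps S21' to S24' can be sketched as below. This is one possible reading of the embodiment, not its definitive implementation: the callable `evaluate` is a hypothetical stand-in for running the nonlinear feature screening model (for example a random forest) on data restricted to the given features and returning the feature importances together with a goodness-of-fit value, and the stopping rule compares successive goodness-of-fit values directly with a tolerance of 0.0001.

```python
def screen_nonlinear_features(features, evaluate, tol=1e-4):
    """Repeatedly drop the least important feature until the fit stabilises.

    evaluate(features) -> (importances: dict, goodness_of_fit: float)
    is a hypothetical callable supplied by the caller.
    """
    current = list(features)
    importances, previous_fit = evaluate(current)
    while len(current) > 1:
        # Keep features with non-zero importance, most important first.
        ranked = sorted(
            (f for f in current if importances[f] != 0),
            key=lambda f: importances[f],
            reverse=True,
        )
        candidate = ranked[:-1]  # remove the least important feature
        if not candidate:
            break
        importances, fit = evaluate(candidate)
        current = candidate
        if abs(previous_fit - fit) < tol:  # eighth predetermined condition
            break
        previous_fit = fit
    return current


# Toy stand-in for the screening model, for illustration only.
TOY_IMPORTANCE = {
    "air density": 0.5, "humidity": 0.3, "wind speed": 0.15, "visibility": 0.05,
}


def toy_evaluate(features):
    importances = {f: TOY_IMPORTANCE[f] for f in features}
    fit = 0.90 if len(features) >= 3 else 0.70
    return importances, fit


nonlinear_targets = screen_nonlinear_features(list(TOY_IMPORTANCE), toy_evaluate)
```

In the toy run, removing "visibility" leaves the fit unchanged, so the loop stops after one round with the remaining three features.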
Step S21' may include step S211' and step S212', wherein:
Step S211', for each group of data in the plurality of groups of data, pre-screening the plurality of features by utilizing a predetermined rule to obtain a plurality of groups of preprocessed data;
Step S212', inputting a plurality of groups of preprocessing data into a nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating fourth feature importance of each feature in the features subjected to the pre-screening by utilizing the plurality of groups of preprocessing data, and outputting the features of which the fourth feature importance meets a fourth preset condition and belongs to a nonlinear type.
In this embodiment, the preprocessing may be to calculate, for each group of data, the distance between every two features, such as the Euclidean distance. If the distance between two features is greater than a predetermined threshold, the correlation between the two features is considered strong and only one of them needs to be retained; in that case, the distance between each of the two features and the output y (thunderstorm weather) may then be calculated, and the feature having the smaller distance from thunderstorm weather may be removed. Through this preprocessing, a plurality of groups of preprocessed data are obtained. The plurality of groups of preprocessed data are then input into the nonlinear feature screening model; the processing logic is the same as when the plurality of groups of data are input directly into the nonlinear feature screening model, and is not repeated here.
Step S3, eliminating, in each group of data of the plurality of groups of data, the features irrelevant to the target features to form a plurality of groups of training data.
Wherein, when only the features belonging to the linear type exist in the plurality of features, the target features only comprise linear target features; when only the characteristics belonging to the nonlinear type exist in the plurality of characteristics, the target characteristics only comprise nonlinear target characteristics; when there is a feature belonging to both a linear type and a non-linear type among the plurality of features, the target feature includes both a linear target feature and a non-linear target feature.
In this embodiment, for each group of data, the features other than the target features are eliminated from the plurality of features of that group of data. After step S3 is executed, the data includes the features that contribute significantly to thunderstorm weather.
Step S4, training a predetermined algorithm by utilizing the plurality of groups of training data to obtain the thunderstorm weather prediction model.
The plurality of groups of training data are taken as a training set to train the predetermined algorithm, thereby obtaining the thunderstorm weather prediction model, which is used for predicting, from the features of the current weather, whether future weather is thunderstorm weather. The predetermined algorithm is, for example, a support vector machine (Support Vector Machine, abbreviated as SVM) algorithm, an adaptive boosting (Adaptive Boosting, abbreviated as AdaBoost) algorithm, a logistic regression (Logistic Regression, abbreviated as LR) algorithm, or a decision tree (Decision Tree) algorithm.
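Step S3 can be sketched as a simple filter over each group of data. The sketch assumes each group is represented as a dictionary mapping feature names to values, with the thunderstorm label stored under a key named `thunderstorm`; this representation, the key name and the function name `keep_target_features` are all hypothetical.

```python
def keep_target_features(groups, target_features, label_key="thunderstorm"):
    """Keep only the target features (and the label) in every group of data."""
    keep = set(target_features) | {label_key}
    return [{k: v for k, v in group.items() if k in keep} for group in groups]


# Toy groups of data, for illustration only.
groups = [
    {"temperature": 30, "humidity": 85, "rainfall": 12, "thunderstorm": 1},
    {"temperature": 22, "humidity": 40, "rainfall": 0, "thunderstorm": 0},
]
training_data = keep_target_features(groups, ["temperature", "humidity"])
```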
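As an illustration of step S4, the sketch below trains logistic regression, one of the algorithms listed above, by plain batch gradient descent. It is a toy under stated assumptions and not the embodiment's implementation: the learning rate, epoch count, single-feature data and the names `train_logistic_regression` and `predict` are all hypothetical.

```python
import math


def train_logistic_regression(rows, labels, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y  # predicted prob. - label
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (thunderstorm weather) if the linear score is positive."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z > 0 else 0


# Toy training data: one target feature, label 1 = thunderstorm weather.
rows = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(rows, labels)
```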
Fig. 2 schematically shows a flow chart of a thunderstorm weather prediction method according to an embodiment of the invention.
As shown in fig. 2, the thunderstorm weather prediction method may include steps M1 to M3, wherein:
Step M1, obtaining target features of the current weather;
Step M2, inputting the target features into a pre-trained thunderstorm weather prediction model, so that the thunderstorm weather prediction model outputs a weather prediction result,
wherein the thunderstorm weather prediction model is obtained by the method of the first embodiment;
And step M3, judging whether the future weather is thunderstorm weather or not according to weather prediction results.
In this embodiment, the target features of the current weather are input into the pre-trained thunderstorm weather prediction model. Because the training process of the thunderstorm weather prediction model is rigorous, the training result is accurate and the weather prediction result obtained is more reliable. The weather prediction result may be either thunderstorm weather or not thunderstorm weather: when the weather prediction result is thunderstorm weather, the future weather is predicted to be thunderstorm weather, and when the weather prediction result is not thunderstorm weather, the future weather is predicted not to be thunderstorm weather.
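Steps M1 to M3 can be sketched as follows. The sketch assumes the current weather is a dictionary of feature values and that the trained model is a callable returning 1 for thunderstorm weather and 0 otherwise; the function name `predict_thunderstorm`, the toy model and the threshold of 70 are hypothetical.

```python
def predict_thunderstorm(current_weather, target_features, model):
    """Return True if the model predicts thunderstorm weather."""
    # Step M1: obtain the target features of the current weather.
    features = [current_weather[name] for name in target_features]
    # Step M2: input them into the pre-trained prediction model.
    result = model(features)
    # Step M3: judge whether the future weather is thunderstorm weather.
    return result == 1


def toy_model(features):
    """Stand-in for a trained model: flags high humidity (illustrative only)."""
    return 1 if features[0] > 70 else 0


humid = {"humidity": 88, "temperature": 31, "rainfall": 4}
dry = {"humidity": 35, "temperature": 24, "rainfall": 0}
will_storm = predict_thunderstorm(humid, ["humidity", "temperature"], toy_model)
stays_clear = predict_thunderstorm(dry, ["humidity", "temperature"], toy_model)
```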
An embodiment of the invention also provides a training device of the thunderstorm weather prediction model, which corresponds to the training method of the thunderstorm weather prediction model provided by the above embodiment. The corresponding technical features and technical effects are not described in detail in this embodiment; for the related parts, reference may be made to the above embodiment. Specifically, fig. 3 schematically shows a block diagram of a training device of a thunderstorm weather prediction model according to an embodiment of the invention. As shown in fig. 3, the training device 300 of the thunderstorm weather prediction model may include a first acquisition module 301, a screening module 302, a culling module 303, and a training module 304, where:
a first acquisition module 301, configured to obtain multiple sets of data, where each set of data includes a thunderstorm weather, multiple features of the thunderstorm weather, and an association relationship between the thunderstorm weather and the multiple features of the thunderstorm weather;
a screening module 302, configured to screen a target feature from a plurality of features of the plurality of sets of data, where the target feature is a feature whose first feature importance meets a first predetermined condition;
a culling module 303, configured to eliminate, in each set of data of the multiple sets of data, features unrelated to the target feature, to form multiple sets of training data;
the training module 304 is configured to train a predetermined algorithm using the plurality of sets of training data to obtain a thunderstorm weather prediction model.
Optionally, the screening module is further configured to: screening linear target features belonging to a linear type from the plurality of features by utilizing the plurality of groups of data; and/or screening nonlinear target features belonging to nonlinear types from the plurality of features by utilizing the plurality of sets of data.
Optionally, the target feature includes a linear target feature belonging to a linear type, and the screening module is further configured to, when screening the target feature from the plurality of features of the plurality of sets of data: sample the plurality of sets of data N times to obtain N data sets, wherein each data set comprises one or more of the plurality of sets of data; for each of the N data sets, input the data set into a linear feature screening model, wherein the linear feature screening model is configured to calculate a second feature importance of each of the plurality of features of the data set, and output the features that belong to the linear type and whose second feature importance satisfies a second predetermined condition, referred to as a set of preliminary linear features; acquire the N sets of preliminary linear features output by the linear feature screening model; and screen out the linear target features by using the N sets of preliminary linear features.
Optionally, when the linear target feature is selected by using the N sets of preliminary linear features, the screening module is further configured to: counting all the characteristics in the N groups of preliminary linear characteristics to obtain a third characteristic importance degree of each characteristic; screening out features with third feature importance meeting a third preset condition from the N groups of preliminary linear features, wherein the features are called secondary linear features; and screening out the linear target features by using the secondary linear features.
Optionally, when screening out the linear target features by using the secondary linear features, the screening module is further configured to perform: step A1: calculating the number M of features in the secondary linear features and the correlation coefficient between each feature and the thunderstorm weather; step A2: taking the feature with the largest correlation coefficient as one feature of the linear target features; step A3: inputting the feature with the largest correlation coefficient and the thunderstorm weather into a 1st predetermined regression model to obtain a 1st significance; step A4: judging whether i is greater than M, executing step A5 when i is not greater than M, and executing step A8 when i is greater than M, wherein the initial value of i is 1; step A5: inputting the feature with the (i+1)-th largest correlation coefficient into an (i+1)-th predetermined regression model to obtain an (i+1)-th significance, wherein the (i+1)-th predetermined regression model is obtained by inputting the first i features and the thunderstorm weather into the i-th predetermined regression model; step A6: judging whether the relation between the i-th significance and the (i+1)-th significance satisfies a sixth predetermined condition, if so, executing step A7, and if not, executing step A4; step A7: determining the feature with the (i+1)-th largest correlation coefficient as one feature of the linear target features; step A8: taking all the features determined from the secondary linear features as the linear target features.
Optionally, when performing step A3, the screening module is further configured to: input the feature with the largest correlation coefficient and the thunderstorm weather into the 1st predetermined regression model to obtain the 1st significance and a 1st first goodness of fit; when performing step A5, the screening module is further configured to: input the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance and an (i+1)-th first goodness of fit; after step A7 and before step A8, the device further comprises: a judging module, configured to judge whether the relation between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies a seventh predetermined condition; if not, the screening module executes step A4, and if so, the screening module executes step A8.
Optionally, the target feature includes a nonlinear target feature belonging to a nonlinear type, and the screening module is further configured to, when screening the target feature from the plurality of features of the plurality of sets of data: inputting the plurality of sets of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating a fourth feature importance of each feature in the plurality of features by utilizing the plurality of sets of data, and outputting the features of which the fourth feature importance meets a fourth preset condition and belongs to the nonlinear type; removing the characteristics of which the fourth characteristic importance meets a fifth preset condition from the characteristics output by the nonlinear characteristic screening model to obtain preliminary nonlinear characteristics; removing characteristics irrelevant to the preliminary nonlinear characteristics aiming at each group of data of the plurality of groups of data to obtain a plurality of groups of preliminary screening data; and continuously inputting the plurality of groups of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
Optionally, after inputting the plurality of sets of data into the nonlinear feature screening model, the apparatus further comprises: the calculation module is used for calculating the second fitting goodness of the nonlinear feature screening model;
The screening module is further configured to, when continuing to input the plurality of sets of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out: continue to input the plurality of sets of preliminary screening data into the nonlinear feature screening model to obtain secondary nonlinear features; for each set of the plurality of sets of preliminary screening data, remove the features irrelevant to the secondary nonlinear features to obtain a plurality of sets of secondary screening data; calculate a third goodness of fit of the nonlinear feature screening model; judge whether the relation between the second goodness of fit and the third goodness of fit satisfies an eighth predetermined condition; if so, determine the secondary nonlinear features as the nonlinear target features; if not, continue to input the plurality of sets of secondary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
Optionally, the screening module is further configured to, when inputting the plurality of sets of data into the nonlinear feature screening model: for each set of data in the plurality of sets of data, pre-screen the plurality of features by utilizing a predetermined rule to obtain a plurality of sets of preprocessed data; input the plurality of sets of preprocessed data into the nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating the fourth feature importance of each of the pre-screened features by utilizing the plurality of sets of preprocessed data, and outputting the features that belong to the nonlinear type and whose fourth feature importance satisfies the fourth predetermined condition.
The embodiment of the invention also provides a thunderstorm weather prediction method device, which corresponds to the thunderstorm weather prediction method provided by the embodiment, corresponding technical features and technical effects are not described in detail in the embodiment, and the related parts can be referred to the embodiment. In particular, the method comprises the steps of,
Fig. 4 schematically shows a block diagram of a thunderstorm weather prediction device according to an embodiment of the invention. As shown in fig. 4, the thunderstorm weather prediction device 400 may include a second acquisition module 401, an input module 402, and a judging module 403, wherein:
the second acquisition module 401 is configured to acquire target features of the current weather;
the input module 402 is configured to input the target features into a pre-trained thunderstorm weather prediction model, so that the thunderstorm weather prediction model outputs a weather prediction result, where the thunderstorm weather prediction model is obtained by the training method of the thunderstorm weather prediction model;
and the judging module 403 is configured to judge whether the future weather is thunderstorm weather according to the weather prediction result.
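The three modules of the prediction device can be pictured as three small methods on one object. The sketch below is hypothetical: the class name `ThunderstormPredictor`, the `threshold` used for judging, the toy scoring function and the feature names are illustrative assumptions, and the patent does not prescribe the form of the model output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ThunderstormPredictor:
    # Assumed model interface: feature values in, probability-like score out.
    model: Callable[[Sequence[float]], float]
    target_features: Sequence[str]
    threshold: float = 0.5  # assumed decision threshold

    def acquire(self, observation: Dict[str, float]) -> List[float]:
        """Acquisition step: pick the target features out of the current weather."""
        return [observation[name] for name in self.target_features]

    def predict(self, observation: Dict[str, float]) -> float:
        """Input step: feed the target features to the trained model."""
        return self.model(self.acquire(observation))

    def is_thunderstorm(self, observation: Dict[str, float]) -> bool:
        """Judging step: turn the prediction result into a yes/no answer."""
        return self.predict(observation) >= self.threshold


def toy_model(values: Sequence[float]) -> float:
    # Stand-in for a trained model: scaled CAPE times relative humidity.
    cape, humidity = values
    return min(1.0, cape / 2000.0 * humidity)


predictor = ThunderstormPredictor(model=toy_model, target_features=["cape", "humidity"])
humid_day = {"cape": 1800.0, "humidity": 0.9, "pressure": 1002.0}
dry_day = {"cape": 400.0, "humidity": 0.5, "pressure": 1018.0}
print(predictor.is_thunderstorm(humid_day), predictor.is_thunderstorm(dry_day))
```

Features that were eliminated during training, such as `pressure` here, are simply ignored at prediction time.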
Fig. 5 schematically shows a block diagram of a computer device adapted to implement a training method of a thunderstorm weather prediction model and/or a thunderstorm weather prediction method according to an embodiment of the invention. In this embodiment, the computer device 500 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack-mounted server, a blade server, a tower server, or a cabinet server (including a stand-alone server or a server cluster formed by a plurality of servers) capable of executing a program, and so on. As shown in fig. 5, the computer device 500 of the present embodiment includes at least, but is not limited to: a memory 501, a processor 502, and a network interface 503 that may be communicatively coupled to each other via a system bus. It is noted that fig. 5 only shows the computer device 500 with components 501-503, but it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead.
In this embodiment, the memory 501 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 501 may be an internal storage unit of the computer device 500, such as a hard disk or memory of the computer device 500. In other embodiments, the memory 501 may also be an external storage device of the computer device 500, such as a plug-in hard disk provided on the computer device 500, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like. Of course, the memory 501 may also include both internal storage units of the computer device 500 and external storage devices. In this embodiment, the memory 501 is typically used to store an operating system and various types of application software installed on the computer device 500, such as program code of a training method of a thunderstorm weather prediction model and/or program code of a thunderstorm weather prediction method. Further, the memory 501 may be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 502 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 502 is generally used to control the overall operation of the computer device 500, for example performing control and processing related to data interaction or communication with the computer device 500. In this embodiment, the processor 502 is configured to execute the program code of a training method of a thunderstorm weather prediction model and/or the program code of a thunderstorm weather prediction method stored in the memory 501.
In this embodiment, the program code of the training method of the thunderstorm weather prediction model and/or of the thunderstorm weather prediction method stored in the memory 501 may also be divided into one or more program modules and executed by one or more processors (the processor 502 in this embodiment) to implement the present invention.
The network interface 503 may include a wireless network interface or a wired network interface, and is typically used to establish a communication link between the computer device 500 and other computer devices. For example, the network interface 503 is used to connect the computer device 500 to an external terminal through a network, and to establish a data transmission channel and a communication link between the computer device 500 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, etc.
The present embodiment also provides a computer-readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an app store, etc., having stored thereon a computer program that, when executed by a processor, implements the steps of the training method of the thunderstorm weather prediction model and/or the steps of the thunderstorm weather prediction method.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that given here. They may also be fabricated separately as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment.
The foregoing describes only preferred embodiments of the present invention and does not thereby limit the scope of the invention. Any equivalent structure or equivalent process derived from the content of this description, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the invention.

Claims (8)

1. A method of training a thunderstorm weather prediction model, comprising:
Acquiring multiple groups of data, wherein each group of data comprises thunderstorm weather, multiple characteristics of the thunderstorm weather and association relations of the thunderstorm weather and the multiple characteristics of the thunderstorm weather;
screening target features from a plurality of features of the plurality of groups of data, wherein the target features are features with first feature importance meeting a first preset condition;
in each set of data of the plurality of sets of data, eliminating the characteristics irrelevant to the target characteristics to form a plurality of sets of training data;
training a preset algorithm by utilizing the plurality of sets of training data to obtain a thunderstorm weather prediction model;
wherein the target features include linear target features belonging to a linear type, and screening the target features from the plurality of features of the plurality of sets of data comprises:
sampling the plurality of sets of data N times to obtain N data sets, each of the data sets including one or more of the plurality of sets of data;
for each of the N data sets, inputting the data set into a linear feature screening model, wherein the linear feature screening model is used for calculating a second feature importance of each feature of the data set and outputting the features whose second feature importance meets a second preset condition and which belong to the linear type, these features being called a group of preliminary linear features;
acquiring the N groups of preliminary linear features output by the linear feature screening model;
counting all the features in the N groups of preliminary linear features to obtain a third feature importance of each feature;
screening out, from the N groups of preliminary linear features, the features whose third feature importance meets a third preset condition, these features being called secondary linear features;
screening out the linear target features by using the secondary linear features, specifically:
step A1, calculating the number M of features in the secondary linear features and the correlation coefficient between each feature and thunderstorm weather;
step A2, taking the feature with the largest correlation coefficient as one feature of the linear target features;
step A3, inputting the feature with the largest correlation coefficient and the thunderstorm weather into a 1st preset regression model to obtain a 1st significance;
step A4, judging whether i is greater than M, executing step A5 when i is not greater than M, and executing step A8 when i is greater than M, wherein the initial value of i is 1;
step A5, inputting the feature with the (i+1)-th largest correlation coefficient into an (i+1)-th preset regression model to obtain an (i+1)-th significance, wherein the (i+1)-th preset regression model is obtained by inputting the first i features and the thunderstorm weather into the i-th preset regression model;
step A6, judging whether the relation between the i-th significance and the (i+1)-th significance meets a sixth preset condition; if so, executing step A7, and if not, executing step A4;
step A7, determining the feature with the (i+1)-th largest correlation coefficient as one feature of the linear target features;
and step A8, determining all the features determined from the secondary linear features as the linear target features.
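Steps A1 to A8 resemble forward stepwise regression: rank the features by correlation with the label, then admit them one at a time while the regression keeps improving. The sketch below is a simplified illustration, not the claimed procedure. It substitutes the gain in R² for the "significance" comparison and a fixed `min_gain` for the sixth preset condition, it carries only accepted features into later fits, and the names `pearson`, `solve`, `r_squared`, `forward_select` and the feature names are all hypothetical.

```python
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    x = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(x[0])
    xtx = [[sum(row[p] * row[q] for row in x) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(x, y)) for p in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, row)) for row in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(features: Dict[str, Sequence[float]], y: Sequence[float],
                   min_gain: float = 0.02) -> List[str]:
    # A1: rank features by absolute correlation with the label.
    ranked = sorted(features, key=lambda n: abs(pearson(features[n], y)), reverse=True)
    selected = [ranked[0]]                       # A2: best-correlated feature
    score = r_squared([features[ranked[0]]], y)  # A3: score of the first model
    for name in ranked[1:]:                      # A4/A5: try the next-ranked feature
        trial = r_squared([features[n] for n in selected] + [features[name]], y)
        if trial - score >= min_gain:            # A6: assumed acceptance condition
            selected.append(name)                # A7: keep the feature
            score = trial
    return selected                              # A8: everything kept so far


features = {
    "cape": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "shear": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "noise": [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0],
}
label = [3 * c + 5 * s for c, s in zip(features["cape"], features["shear"])]
chosen = forward_select(features, label)
print(chosen)
```

In the toy data the label depends only on `cape` and `shear`, so `noise` adds no fit and is left out. A real implementation would typically use a statistics library and a proper significance test rather than the hand-rolled solver shown here.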
2. The method of claim 1, wherein the target features comprise nonlinear target features that are of a nonlinear type, and wherein screening target features from the plurality of features of the plurality of sets of data comprises:
inputting the plurality of sets of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating a fourth feature importance of each of the plurality of features by using the plurality of sets of data, and outputting the features whose fourth feature importance meets a fourth preset condition and which belong to the nonlinear type;
removing, from the features output by the nonlinear feature screening model, the features whose fourth feature importance meets a fifth preset condition, to obtain preliminary nonlinear features;
for each set of data of the plurality of sets of data, removing the features irrelevant to the preliminary nonlinear features to obtain a plurality of sets of preliminary screening data;
and continuing to input the plurality of sets of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
3. The method of claim 2, wherein inputting the plurality of sets of data into a nonlinear feature screening model comprises:
for each set of data of the plurality of sets of data, pre-screening the plurality of features by using a preset rule to obtain a plurality of sets of preprocessed data;
inputting the plurality of sets of preprocessed data into the nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating, by using the plurality of sets of preprocessed data, the fourth feature importance of each of the pre-screened features, and outputting the features whose fourth feature importance meets the fourth preset condition and which belong to the nonlinear type.
4. A method of predicting thunderstorm weather, comprising:
acquiring target characteristics of the current weather;
Inputting the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result, wherein the thunderstorm weather prediction model is obtained by the method of any one of claims 1 to 3;
judging whether the future weather is thunderstorm weather or not according to the weather prediction result.
5. A training device for a thunderstorm weather prediction model, for implementing the method of any one of claims 1 to 3, comprising:
a first acquisition module, used for acquiring a plurality of sets of data, wherein each set of data comprises thunderstorm weather, a plurality of features of the thunderstorm weather, and an association relation between the thunderstorm weather and the plurality of features of the thunderstorm weather;
a screening module, used for screening target features from the plurality of features of the plurality of sets of data, wherein the target features are features whose first feature importance meets a first preset condition;
a rejecting module, used for eliminating, in each set of data of the plurality of sets of data, the features irrelevant to the target features to form a plurality of sets of training data;
and a training module, used for training a preset algorithm by using the plurality of sets of training data to obtain a thunderstorm weather prediction model.
6. A thunderstorm weather prediction device for implementing the method of claim 4, comprising:
a second acquisition module, used for acquiring target features of the current weather;
an input module, used for inputting the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result, wherein the thunderstorm weather prediction model is obtained by the method of any one of claims 1 to 3;
and a judging module, used for judging whether the future weather is thunderstorm weather according to the weather prediction result.
7. A computer device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 3 and/or the steps of the method according to claim 4.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, is adapted to carry out the method steps of any one of claims 1 to 3 and/or the method steps of claim 4.
CN202010116671.XA 2020-02-25 2020-02-25 Training method of thunderstorm weather prediction model and thunderstorm weather prediction method Active CN111368887B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010116671.XA CN111368887B (en) 2020-02-25 2020-02-25 Training method of thunderstorm weather prediction model and thunderstorm weather prediction method
PCT/CN2020/117578 WO2021169271A1 (en) 2020-02-25 2020-09-25 Training method for thunderstorm weather prediction model, and thunderstorm weather prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010116671.XA CN111368887B (en) 2020-02-25 2020-02-25 Training method of thunderstorm weather prediction model and thunderstorm weather prediction method

Publications (2)

Publication Number Publication Date
CN111368887A CN111368887A (en) 2020-07-03
CN111368887B true CN111368887B (en) 2024-05-03

Family

ID=71208274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010116671.XA Active CN111368887B (en) 2020-02-25 2020-02-25 Training method of thunderstorm weather prediction model and thunderstorm weather prediction method

Country Status (2)

Country Link
CN (1) CN111368887B (en)
WO (1) WO2021169271A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368887B (en) * 2020-02-25 2024-05-03 平安科技(深圳)有限公司 Training method of thunderstorm weather prediction model and thunderstorm weather prediction method
CN111832828B (en) * 2020-07-17 2023-12-19 国家卫星气象中心(国家空间天气监测预警中心) Intelligent precipitation prediction method based on wind cloud No. four meteorological satellites
CN111915068A (en) * 2020-07-17 2020-11-10 同济大学 Road visibility temporary prediction method based on ensemble learning
CN112561199B (en) * 2020-12-23 2024-06-21 北京百度网讯科技有限公司 Weather parameter prediction model training method, weather parameter prediction method and device
CN113985145A (en) * 2021-09-13 2022-01-28 广东电网有限责任公司广州供电局 Thunder and lightning early warning method, early warning device and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109031472A (en) * 2017-06-09 2018-12-18 阿里巴巴集团控股有限公司 A kind of data processing method and device for weather prognosis
CN109472283A (en) * 2018-09-13 2019-03-15 中国科学院计算机网络信息中心 A kind of hazardous weather event prediction method and apparatus based on Multiple Incremental regression tree model
JP2019095323A (en) * 2017-11-24 2019-06-20 株式会社日立製作所 Weather prediction device
CN110428015A (en) * 2019-08-07 2019-11-08 北京嘉和海森健康科技有限公司 A kind of training method and relevant device of model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298389A (en) * 2019-06-11 2019-10-01 上海冰鉴信息科技有限公司 More wheels circulation feature selection approach and device when training pattern
CN111368887B (en) * 2020-02-25 2024-05-03 平安科技(深圳)有限公司 Training method of thunderstorm weather prediction model and thunderstorm weather prediction method


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
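Title
"A statistical scheme to forecast the daily lightning threat over southern Africa using the Unified Model"; Morné Gijben et al.; Atmospheric Research; Vol. 194; Section 2.3 *
"Application of GPS water vapor data in thunderstorm forecasting"; Chen Lei et al.; Atmospheric Science Research and Application; No. 2; Sections 2-4 and Tables 1-2 *
"Study on a thunderstorm probability forecasting method for the eastern part of Northwest China based on stepwise regression analysis"; Kong Debing et al.; Journal of Arid Meteorology; Vol. 34, No. 1; Section 3.2 *
Chen Lei et al. "Application of GPS water vapor data in thunderstorm forecasting". Atmospheric Science Research and Application. 2007, No. 2, Sections 2-4 and Tables 1-2. *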

Also Published As

Publication number Publication date
WO2021169271A1 (en) 2021-09-02
CN111368887A (en) 2020-07-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant