CN111368887A - Training method of thunderstorm weather prediction model and thunderstorm weather prediction method - Google Patents

Publication number
CN111368887A
Authority
CN
China
Prior art keywords
feature
data
features
screening
thunderstorm weather
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010116671.XA
Other languages
Chinese (zh)
Other versions
CN111368887B (en)
Inventor
段洪云
彭琛
汪伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010116671.XA (CN111368887B)
Publication of CN111368887A
Priority to PCT/CN2020/117578 (WO2021169271A1)
Application granted
Publication of CN111368887B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01W METEOROLOGY
    • G01W1/00 Meteorology
    • G01W1/10 Devices for predicting weather conditions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Atmospheric Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Ecology (AREA)
  • Environmental Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a training method for a thunderstorm weather prediction model, comprising: acquiring multiple groups of data, wherein each group of data comprises thunderstorm weather, a plurality of features of the thunderstorm weather, and an association between the thunderstorm weather and the plurality of features; screening target features from the plurality of features of the multiple groups of data, wherein the target features are features whose first feature importance satisfies a first predetermined condition; in each group of the multiple groups of data, eliminating features irrelevant to the target features to form multiple groups of training data; and training a predetermined algorithm with the multiple groups of training data to obtain a thunderstorm weather prediction model. The invention also provides a thunderstorm weather prediction method, a training apparatus for the thunderstorm weather prediction model, a thunderstorm weather prediction apparatus, a computer device, and a computer-readable storage medium.

Description

Training method of thunderstorm weather prediction model and thunderstorm weather prediction method
Technical Field
The invention relates to the field of computer technology, and in particular to a training method for a thunderstorm weather prediction model, a thunderstorm weather prediction method, corresponding apparatuses, a computer device, and a computer-readable storage medium.
Background
With the development of meteorological technology, methods for predicting weather conditions have also advanced. Generally, a weather condition is predicted from weather data collected by large equipment such as satellites and radar; for example, the collected weather data are input into a weather prediction model that has been trained in advance. To ensure the accuracy of the weather prediction, the prediction accuracy of the weather prediction model itself must be ensured, which places high demands on the training process of the model.
However, in the course of researching the invention, the inventors found that the prior art has at least the following defect: when a weather model is trained, the weather factors in the weather factor pool are only screened in a simple way, so too many redundant factors are retained; because the core factors cannot be obtained, an effective weather prediction model cannot be trained.
Disclosure of Invention
The invention aims to provide a training method for a thunderstorm weather prediction model, a thunderstorm weather prediction method, corresponding apparatuses, a computer device, and a computer-readable storage medium, which can overcome the defects of the prior art.
One aspect of the present invention provides a training method for a thunderstorm weather prediction model, including: acquiring multiple groups of data, wherein each group of data comprises thunderstorm weather, a plurality of features of the thunderstorm weather, and an association between the thunderstorm weather and the plurality of features; screening target features from the plurality of features of the multiple groups of data, wherein the target features are features whose first feature importance satisfies a first predetermined condition; in each group of the multiple groups of data, eliminating the features irrelevant to the target features to form multiple groups of training data; and training a predetermined algorithm with the multiple groups of training data to obtain a thunderstorm weather prediction model.
Optionally, the target features include linear target features belonging to a linear type, and screening the target features from the plurality of features of the multiple groups of data includes: sampling the multiple groups of data N times to obtain N data sets, wherein each data set comprises one or more of the multiple groups of data; for each of the N data sets, inputting the data set into a linear feature screening model, wherein the linear feature screening model is configured to calculate a second feature importance for each of the plurality of features of the data set and to output the features that belong to the linear type and whose second feature importance satisfies a second predetermined condition, these features being called a group of preliminary linear features; acquiring the N groups of preliminary linear features output by the linear feature screening model; and screening out the linear target features by using the N groups of preliminary linear features.
Optionally, screening out the linear target features by using the N groups of preliminary linear features includes: counting all the features in the N groups of preliminary linear features to obtain a third feature importance for each feature; screening out, from the N groups of preliminary linear features, the features whose third feature importance satisfies a third predetermined condition, these features being called secondary linear features; and screening out the linear target features by using the secondary linear features.
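The counting step above can be sketched as follows. This is a minimal illustration only: it assumes that the third feature importance is the number of groups in which a feature appears and that the third predetermined condition is a minimum count, neither of which is fixed by the text. The function name `secondary_linear_features` and the parameter `min_count` are hypothetical.

```python
from collections import Counter

def secondary_linear_features(preliminary_groups, min_count):
    """Keep features that occur in at least `min_count` of the N groups.

    Assumption: the occurrence count stands in for the third feature importance.
    """
    # Count each feature once per group, even if a group lists it twice.
    counts = Counter(f for group in preliminary_groups for f in set(group))
    return sorted(f for f, c in counts.items() if c >= min_count)

# Illustrative output of the linear feature screening model for N = 3.
groups = [
    ["temperature", "humidity", "air_pressure"],
    ["temperature", "humidity"],
    ["temperature", "rainfall"],
]
selected = secondary_linear_features(groups, min_count=2)
print(selected)
```

A count threshold keeps only features that are selected consistently across resampled data sets, which is one way to make the screening less sensitive to a single sample.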
Optionally, screening out the linear target features by using the secondary linear features includes:
Step A1: calculating the number M of features in the secondary linear features and the correlation coefficient between each feature and thunderstorm weather;
Step A2: taking the feature with the largest correlation coefficient as one of the linear target features;
Step A3: inputting the feature with the largest correlation coefficient and thunderstorm weather into a 1st predetermined regression model to obtain a 1st significance;
Step A4: judging whether i is greater than M, executing step A5 when i is not greater than M, and executing step A8 when i is greater than M, wherein the initial value of i is 1;
Step A5: inputting the feature with the (i+1)-th largest correlation coefficient into an (i+1)-th predetermined regression model to obtain an (i+1)-th significance, wherein the (i+1)-th predetermined regression model is obtained by inputting the first i features and thunderstorm weather into the i-th predetermined regression model;
Step A6: judging whether the relation between the i-th significance and the (i+1)-th significance satisfies a sixth predetermined condition, executing step A7 if so, and executing step A4 if not;
Step A7: taking the feature with the (i+1)-th largest correlation coefficient as one of the linear target features;
Step A8: taking all the features so determined from the secondary linear features as the linear target features.
Optionally, step A3 includes: inputting the feature with the largest correlation coefficient and thunderstorm weather into the 1st predetermined regression model to obtain the 1st significance and a 1st first goodness of fit; step A5 includes: inputting the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance and an (i+1)-th first goodness of fit; and after step A7 and before step A8, the method further includes: judging whether the relation between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies a seventh predetermined condition, executing step A4 if not, and executing step A8 if so.
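As a rough illustration of steps A1 to A8, the sketch below performs forward selection in order of absolute correlation and stops when the goodness of fit no longer improves. It is a simplification under stated assumptions: ordinary least squares stands in for the predetermined regression model, R² stands in for the first goodness of fit, the stopping rule (`min_gain`) stands in for the seventh predetermined condition, and the significance check of step A6 is omitted. All function names and the toy numbers are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[k] for col in columns] for k in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yk - pk) ** 2 for yk, pk in zip(y, pred))
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    return 1.0 - ss_res / ss_tot

def forward_select(features, y, min_gain=0.01):
    """Add features in order of |correlation| until R^2 stops improving."""
    ranked = sorted(features, key=lambda f: abs(pearson(features[f], y)), reverse=True)
    chosen = [ranked[0]]  # step A2: the most correlated feature is always kept
    best = r_squared([features[f] for f in chosen], y)
    for name in ranked[1:]:
        trial = r_squared([features[f] for f in chosen + [name]], y)
        if trial - best < min_gain:
            break  # assumed stopping rule: fit no longer improves
        chosen.append(name)
        best = trial
    return chosen

# Toy data: y depends on the first two features only.
features = {
    "temperature": [1, 2, 3, 4, 5, 6],
    "humidity": [1, 0, 1, 0, 1, 0],
    "air_pressure": [1, 0, 0, 1, 1, 0],
}
y = [2 * t + 3 * h for t, h in zip(features["temperature"], features["humidity"])]
print(forward_select(features, y))
```

Adding a significance comparison between consecutive models, as step A6 describes, would require p-values for the regression coefficients; a statistics library would be the natural place to obtain them.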
Optionally, the target features include nonlinear target features belonging to a nonlinear type, and screening the target features from the plurality of features of the multiple groups of data includes: inputting the multiple groups of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is configured to calculate a fourth feature importance for each of the plurality of features using the multiple groups of data and to output the features that belong to the nonlinear type and whose fourth feature importance satisfies a fourth predetermined condition; removing, from the features output by the nonlinear feature screening model, the features whose fourth feature importance satisfies a fifth predetermined condition, to obtain preliminary nonlinear features; removing the features irrelevant to the preliminary nonlinear features from each group of the multiple groups of data to obtain multiple groups of preliminary screening data; and continuing to input the multiple groups of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
Optionally, after the multiple groups of data are input into the nonlinear feature screening model, the method further includes: calculating a second goodness of fit of the nonlinear feature screening model at that time. Continuing to input the multiple groups of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out includes: inputting the multiple groups of preliminary screening data into the nonlinear feature screening model to obtain secondary nonlinear features; removing the features irrelevant to the secondary nonlinear features from each group of the multiple groups of preliminary screening data to obtain multiple groups of secondary screening data; calculating a third goodness of fit of the nonlinear feature screening model at that time; judging whether the relation between the second goodness of fit and the third goodness of fit satisfies an eighth predetermined condition; if so, determining the secondary nonlinear features as the nonlinear target features; and if not, continuing to input the multiple groups of secondary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
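The iterative screening described above can be sketched as a loop that refits, drops low-importance features, and stops once the goodness of fit stabilizes. This is an illustrative reading only: the threshold `drop_below` stands in for the fifth predetermined condition, the tolerance `tol` for the eighth predetermined condition, and `toy_fit` is a hypothetical stand-in for the nonlinear feature screening model (the text does not name a concrete model).

```python
def iterative_screen(features, fit, drop_below=0.05, tol=0.01, max_rounds=20):
    """Repeatedly refit and drop low-importance features.

    `fit(features)` must return (importance_by_feature, goodness_of_fit).
    Stops when nothing more can be dropped or the fit changes by less than `tol`.
    """
    current = list(features)
    importance, prev_fit = fit(current)
    for _ in range(max_rounds):
        kept = [f for f in current if importance[f] >= drop_below]
        if not kept or len(kept) == len(current):
            break  # nothing left to remove
        importance, new_fit = fit(kept)
        current = kept
        if abs(new_fit - prev_fit) < tol:
            break  # assumed stopping rule: goodness of fit has stabilized
        prev_fit = new_fit
    return current

# Hypothetical model: fixed weights, renormalized over the supplied features.
WEIGHTS = {"temperature": 0.50, "humidity": 0.30, "air_pressure": 0.17,
           "rainfall": 0.02, "air_density": 0.01}

def toy_fit(features):
    total = sum(WEIGHTS[f] for f in features)
    importance = {f: WEIGHTS[f] / total for f in features}
    goodness = 0.80 + 0.02 * (len(WEIGHTS) - len(features))
    return importance, goodness

targets = iterative_screen(list(WEIGHTS), toy_fit)
print(targets)
```

Passing the model in as a callable keeps the loop independent of any particular learner; a tree-based model that reports feature importances would be one possible choice.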
Optionally, inputting the multiple groups of data into the nonlinear feature screening model includes: for each group of the multiple groups of data, pre-screening the plurality of features using a preset rule to obtain multiple groups of preprocessed data; and inputting the multiple groups of preprocessed data into the nonlinear feature screening model, wherein the nonlinear feature screening model is configured to calculate, using the multiple groups of preprocessed data, the fourth feature importance of each of the pre-screened features and to output the features that belong to the nonlinear type and whose fourth feature importance satisfies the fourth predetermined condition.
Another aspect of the present invention provides a thunderstorm weather prediction method, including: acquiring target features of the current weather; inputting the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result, wherein the thunderstorm weather prediction model is obtained by the method of any one of the embodiments; and judging, according to the weather prediction result, whether the future weather is thunderstorm weather.
In another aspect, the present invention provides a training apparatus for a thunderstorm weather prediction model, including: a first acquisition module, configured to acquire multiple groups of data, wherein each group of data comprises thunderstorm weather, a plurality of features of the thunderstorm weather, and an association between the thunderstorm weather and the plurality of features; a screening module, configured to screen target features from the plurality of features of the multiple groups of data, wherein the target features are features whose first feature importance satisfies a first predetermined condition; an elimination module, configured to eliminate, in each group of the multiple groups of data, the features irrelevant to the target features to form multiple groups of training data; and a training module, configured to train a predetermined algorithm with the multiple groups of training data to obtain a thunderstorm weather prediction model.
Yet another aspect of the present invention provides a thunderstorm weather prediction apparatus, including: a second acquisition module, configured to acquire target features of the current weather; an input module, configured to input the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result, wherein the thunderstorm weather prediction model is obtained by the method of any one of the embodiments; and a judging module, configured to judge, according to the weather prediction result, whether the future weather is thunderstorm weather.
Yet another aspect of the present invention provides a computer device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the training method of the thunderstorm weather prediction model and/or the steps of the thunderstorm weather prediction method.
Yet another aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the above training method of the thunderstorm weather prediction model and/or the steps of the above thunderstorm weather prediction method.
According to the training method of the thunderstorm weather prediction model provided by the invention, the target features whose first feature importance satisfies the first predetermined condition are screened out, the features irrelevant to the target features are eliminated to obtain multiple groups of training data, and the thunderstorm weather prediction model is trained with the multiple groups of training data. Because the training data contain no redundant features and the number of features in the training data is markedly reduced, the defects of the prior art are overcome and the accuracy of the trained thunderstorm weather prediction model is improved.
Further, on the basis of existing feature-engineering screening, the invention considers two classes of features: linear-type features and nonlinear-type features. It considers the independent effect of each class and, on that basis, the synergistic effect among multiple features, adding nonlinear influence to improve the expressive power of the model.
For linear-type features, the data are sampled N times and each sample is input in turn into the linear feature screening model to screen out N groups of preliminary linear features; the secondary linear features are obtained by counting over the N groups of preliminary linear features; then the x with the largest response to the output y is selected through an improved predetermined regression model, and new factors are added step by step, ensuring that a new factor does not cause a significant change in the existing factors, until the goodness of fit of the model no longer improves. The two layers of screening target different aspects, which improves both the interpretability of the feature screening process and the effectiveness of the final linear target features.
For nonlinear-type features, pre-screening keeps the number of features manageable so that they can be conveniently input into the nonlinear feature screening model. After each round of training, the features are assessed by their fourth feature importance: those whose fourth feature importance satisfies the fifth predetermined condition are removed and the remaining features are carried into the next round, so that features of lower importance are deleted step by step. The number of features input into the nonlinear feature screening model therefore decreases from round to round, and the nonlinear target features are screened out while the model accuracy is improved. The expressive power of the model depends on the existing individual features, and the synergistic expression of features can also improve the model fit to a certain extent, thereby improving the accuracy of the result.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 schematically shows a flow chart of a training method of a thunderstorm weather prediction model according to an embodiment of the invention;
FIG. 2 schematically illustrates a flow chart of a method of thunderstorm weather prediction according to an embodiment of the invention;
FIG. 3 schematically shows a block diagram of a training apparatus for a thunderstorm weather prediction model according to an embodiment of the invention;
FIG. 4 schematically shows a block diagram of a thunderstorm weather prediction apparatus according to an embodiment of the present invention;
FIG. 5 schematically shows a block diagram of a computer device adapted to implement a training method for a thunderstorm weather prediction model and/or a thunderstorm weather prediction method according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
To better understand the technical advantages of the present invention, the related prior art is described before the embodiments. In the prior art, feature screening is also performed before model training. Thanks to improvements in storage technology and computing power, a large number of feature indexes can be used, which makes the model more complete and helps ensure the accuracy of the result; however, the large number of redundant features makes model training very time-consuming and easily leads to overfitting. Existing feature screening methods are mainly statistics-based, using measures such as null-value rate, variance, correlation, and collinearity. Such methods can distinguish features to a certain extent, but when the feature pool is very large they alone can hardly reduce the number of features effectively. On the one hand, such objective screening depends too heavily on statistical theory, which reduces the interpretability of the features in the screening process; on the other hand, the features are screened from only a single angle, so the model is not readily extensible and the influence of multi-feature interaction on the dependent variable is not taken into account. Therefore, feature selection using statistical methods alone still cannot obtain the core features, and an effective attribution model cannot be fitted.
Fig. 1 schematically shows a flowchart of a training method of a thunderstorm weather prediction model according to an embodiment of the present invention.
As shown in fig. 1, the training method of the thunderstorm weather prediction model may include steps S1 to S4, where:
Step S1: acquiring multiple groups of data, wherein each group of data comprises thunderstorm weather, a plurality of features of the thunderstorm weather, and an association between the thunderstorm weather and the plurality of features.
In this embodiment, each group of data corresponds to one historical thunderstorm weather day and includes an output y and an input x: the thunderstorm weather is the output y, the plurality of features of the thunderstorm weather are the input x, and there is an association between y and x, that is, between the thunderstorm weather and the plurality of features. The plurality of features of thunderstorm weather may be, for example, temperature, air pressure, rainfall, humidity, air density, air volume, and the like.
For example, there are 4 groups of data. The first group corresponds to March 15 and includes the thunderstorm weather, a plurality of features of the thunderstorm weather on March 15, and the association between the two; the second group corresponds to March 18 and includes the thunderstorm weather, a plurality of features of the thunderstorm weather on March 18, and the association between the two; the third group corresponds to May 7 and includes the thunderstorm weather, a plurality of features of the thunderstorm weather on May 7, and the association between the two; the fourth group corresponds to June 24 and includes the thunderstorm weather, a plurality of features of the thunderstorm weather on June 24, and the association between the two.
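One possible in-memory layout for a group of data, together with the elimination of features that are not target features, is sketched below. The class `DayRecord`, the function `keep_target_features`, and all numeric values are illustrative assumptions and do not come from the source.

```python
from dataclasses import dataclass

@dataclass
class DayRecord:
    """One group of data: a historical day, its features x, and its label y."""
    date: str
    features: dict
    thunderstorm: bool

def keep_target_features(records, target_features):
    """Return copies of the records restricted to the target features."""
    wanted = set(target_features)
    return [
        DayRecord(
            r.date,
            {k: v for k, v in r.features.items() if k in wanted},
            r.thunderstorm,
        )
        for r in records
    ]

# Illustrative values only.
records = [
    DayRecord("03-15", {"temperature": 24.0, "humidity": 0.91, "air_pressure": 1002.0}, True),
    DayRecord("03-18", {"temperature": 22.5, "humidity": 0.88, "air_pressure": 1004.5}, True),
]
training = keep_target_features(records, ["humidity", "air_pressure"])
print(training[0].features)
```

Returning new records rather than mutating the originals keeps the full feature pool available for later screening rounds.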
Step S2: screening target features from the plurality of features of the multiple groups of data, wherein the target features are features whose first feature importance satisfies a first predetermined condition.
The purpose of this embodiment is to train a thunderstorm weather prediction model using the target features, thereby overcoming the defects of the prior art. It is therefore necessary to screen out, from the plurality of features, the features whose first feature importance satisfies the first predetermined condition as the target features. Each feature has a first feature importance, which measures how closely the feature is associated with thunderstorm weather. Optionally, the first feature importance may be the correlation coefficient between each feature and thunderstorm weather, and the first predetermined condition may be that the first feature importance ranks above a predetermined position.
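The optional reading above, with the correlation coefficient as the first feature importance and a rank cutoff as the first predetermined condition, can be sketched as follows. Ranking by absolute value, the function names, the parameter `top_k`, and the sample values are assumptions made for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def top_k_by_correlation(features, y, top_k):
    """Rank features by |correlation with y| and keep the first `top_k`."""
    ranked = sorted(features, key=lambda f: abs(pearson(features[f], y)), reverse=True)
    return ranked[:top_k]

# Illustrative values: y is 1 on thunderstorm days and 0 otherwise.
y = [1, 0, 1, 0, 1, 0]
features = {
    "temperature": [30, 28, 31, 27, 29, 30],
    "humidity": [90, 40, 85, 50, 95, 45],
    "air_pressure": [1005, 1012, 1008, 1010, 1009, 1006],
}
print(top_k_by_correlation(features, y, top_k=2))
```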
Optionally, step S2 may include step S21 and/or step S22, where:
Step S21: using the multiple groups of data to screen out, from the plurality of features, linear target features belonging to the linear type; and/or
Step S22: using the multiple groups of data to screen out, from the plurality of features, nonlinear target features belonging to the nonlinear type.
The plurality of features may include linear-type features and may also include nonlinear-type features, and a linear-type feature may at the same time belong to the nonlinear type. In this embodiment, when only linear-type features exist among the plurality of features, the linear target features are determined as the target features; when only nonlinear-type features exist, the nonlinear target features are determined as the target features; when both exist, the linear target features and the nonlinear target features are together determined as the target features.
It should be noted that it is not known in advance which features are of the linear type and which are of the nonlinear type. Therefore, to ensure that the linear target features can be accurately screened out when linear-type features exist, step S2 may include steps S21 to S24, where the target features may include linear target features belonging to the linear type. Specifically:
in step S21, N times of sampling are performed on the multiple sets of data to obtain N data sets, where each data set includes one or more of the multiple sets of data.
The sampling method is not limited; for example, the sampling may follow the idea of the Bootstrapping algorithm. For example, when N = 3, the first data set includes: a first set of data, a third set of data, and a fourth set of data; the second data set includes: a second set of data, a third set of data, and a fourth set of data; the third data set includes: a first set of data, a second set of data, and a fourth set of data.
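The sampling in step S21 can be sketched as follows. This is a minimal illustration only, assuming the sets of data are held in a Python list; the function name `bootstrap_datasets` and the parameters `sample_size` and `seed` are hypothetical and do not appear in the source.

```python
import random

def bootstrap_datasets(groups, n_datasets, sample_size=None, seed=0):
    """Draw n_datasets resamples (with replacement) from the list of data groups."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    size = len(groups) if sample_size is None else sample_size
    return [[rng.choice(groups) for _ in range(size)] for _ in range(n_datasets)]

# Four hypothetical sets of data, resampled N = 3 times.
groups = ["group 1", "group 2", "group 3", "group 4"]
datasets = bootstrap_datasets(groups, n_datasets=3, sample_size=3)
```

Because classical bootstrapping samples with replacement, a resample may contain the same set of data more than once, unlike the worked example above, in which every data set happens to contain three distinct sets of data.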
Step S22, for each of the N data sets, inputting the data set into a linear feature screening model, wherein the linear feature screening model is used for calculating a second feature importance of each of the plurality of features of the data set and outputting the features that belong to the linear type and whose second feature importance satisfies a second predetermined condition; the output features are called a group of preliminary linear features.
The linear feature screening model outputs only features of the linear type. For each such feature it calculates a second feature importance: when the model outputs a feature, the feature is preceded by a coefficient that represents its importance, and the larger the coefficient, the higher the importance. The second feature importance in this embodiment is therefore the coefficient in front of each feature. The model then outputs the features that belong to the linear type and whose second feature importance satisfies the second predetermined condition, e.g., the features whose second feature importance is not 0.
Optionally, the linear feature screening model is a Lasso model, i.e., a linear model that introduces an L1 regularization term. The model outputs features of the linear type, automatically calculates the second feature importance of each feature, and outputs that importance in the form of a coefficient of the feature; for example, in the output "0.8 humidity", 0.8 is the second feature importance of humidity. For example, if the second predetermined condition is that the second feature importance is not 0, the Lasso model outputs, for each data set, the features that belong to the linear type and whose coefficients are not 0.
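The way an L1 regularization term drives the coefficients of weak features to exactly 0 can be sketched as follows. This is a simplified pure-Python coordinate-descent solver, not the implementation used in the embodiment; it assumes centered data with no intercept, and the function names, the toy data, the feature names and the value `alpha=0.1` are all hypothetical. A practical system would more likely rely on a library implementation of Lasso.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coefficients(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm > 0 else 0.0
    return w

# Toy data: the response depends only on the first (hypothetical) feature.
feature_names = ["humidity", "wind speed"]
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
coefficients = lasso_coefficients(X, y, alpha=0.1)

# Second predetermined condition from the text: keep features whose coefficient is not 0.
preliminary_linear = [name for name, c in zip(feature_names, coefficients) if c != 0.0]
```

In this toy run the coefficient of the second feature is shrunk to exactly 0, so only the first feature is kept as a preliminary linear feature.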
Step S23, acquiring the N groups of preliminary linear features output by the linear feature screening model.
Because the N data sets are input into the linear feature screening model in sequence and each data set corresponds to one group of preliminary linear features, the linear feature screening model outputs N groups of preliminary linear features in sequence, and the features contained in each group may differ.
For example, in connection with the above example, the first set of preliminary linear features includes: temperature, air pressure and humidity; the second set of preliminary linear features includes: temperature, air pressure, rainfall and air volume; the third set of preliminary linear features includes: temperature and humidity.
Step S24, screening out the linear target features by using the N groups of preliminary linear features.
Optionally, step S24 may include steps S241 to S243, wherein:
step S241, counting all the features in the N groups of preliminary linear features to obtain a third feature importance of each feature;
step S242, screening out, from the N groups of preliminary linear features, the features whose third feature importance satisfies a third predetermined condition, these features being called secondary linear features;
step S243, screening out the linear target features by using the secondary linear features.
In this embodiment, the third feature importance may be the number of times each feature of the N groups of preliminary linear features appears, and the third predetermined condition may be that the number of times exceeds a predetermined number threshold.
For example, in connection with the above example, temperature occurs 3 times, air pressure 2 times, humidity 2 times, rainfall 1 time, and air volume 1 time. If the third predetermined condition is that the number of occurrences exceeds 1, the secondary linear features are temperature, air pressure and humidity.
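The counting in steps S241 and S242 can be reproduced with a short sketch that uses the three groups of preliminary linear features from the example. The variable names and the use of `collections.Counter` are illustrative choices, not part of the source.

```python
from collections import Counter

# The N = 3 groups of preliminary linear features from the example.
preliminary_groups = [
    ["temperature", "air pressure", "humidity"],
    ["temperature", "air pressure", "rainfall", "air volume"],
    ["temperature", "humidity"],
]

# Third feature importance: number of groups in which each feature occurs.
occurrences = Counter(f for group in preliminary_groups for f in group)

# Third predetermined condition from the example: the count exceeds 1.
count_threshold = 1
secondary_linear = [f for f, n in occurrences.items() if n > count_threshold]
# secondary_linear == ["temperature", "air pressure", "humidity"]
```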
Further, the linear target features may be screened out based on the secondary linear features. For example, the secondary linear features are taken directly as the linear target features.
However, because the loss function with the L1 regularization term is not differentiable, the linear target features determined directly by the Lasso model with the L1 regularization term are somewhat unstable. To overcome this defect, this embodiment may input the secondary linear features into a predetermined regression model and determine the final linear target features through the predetermined regression model, thereby improving the accuracy of the determined linear target features. Specifically, step S243 may include steps A1 to A8, wherein:
step A1: calculating the number M of features in the secondary linear features and the correlation coefficient between each feature and thunderstorm weather;
step A2: taking the feature with the largest correlation coefficient as one feature of the linear target features;
step A3: inputting the feature with the largest correlation coefficient and thunderstorm weather into a 1st predetermined regression model to obtain a 1st significance;
step A4: judging whether i is larger than M, executing step A5 when i is not larger than M, and executing step A8 when i is larger than M, wherein the initial value of i is 1;
step A5: inputting the feature with the (i+1)-th largest correlation coefficient into an (i+1)-th predetermined regression model to obtain an (i+1)-th significance, wherein the (i+1)-th predetermined regression model is obtained by inputting the first i features and thunderstorm weather into the i-th predetermined regression model;
step A6: judging whether the relation between the i-th significance and the (i+1)-th significance satisfies a sixth predetermined condition, if so, executing step A7, and if not, executing step A4;
step A7: determining the feature with the (i+1)-th largest correlation coefficient as one feature of the linear target features;
step A8: taking all the features determined from the secondary linear features as the linear target features.
This embodiment is a loop operation. Specifically, the feature having the largest correlation coefficient with the output y (referred to as the feature with the 1st largest correlation coefficient) is first selected from the secondary linear features and taken as one feature of the linear target features. This feature and the output y are input into a predetermined regression model (referred to here as the 1st predetermined regression model) to obtain a significance, referred to as the 1st significance; the model obtained after this feature is input into the 1st predetermined regression model is referred to as the 2nd predetermined regression model. Next, the feature having the 2nd largest correlation coefficient with y is selected from the secondary linear features and input into the 2nd predetermined regression model to obtain a significance, referred to as the 2nd significance.
Then, it is judged whether the relation between the 1st significance and the 2nd significance satisfies the sixth predetermined condition (for example, whether the difference between the two significances is larger than 0.0001). If so, the feature with the 2nd largest correlation coefficient has a significant influence on the feature with the 1st largest correlation coefficient, and the relation between the significance of the feature with the 3rd largest correlation coefficient and the 1st significance is judged next; if not, the feature with the 2nd largest correlation coefficient is taken as one feature of the linear target features, and the relation between the significance of the feature with the 3rd largest correlation coefficient and the 2nd significance is judged next. These steps are repeated until all the features in the secondary linear features have been judged.
It should be noted that the significance can be characterized by the t-statistic.
Optionally, when the secondary linear features contain a large number of features, executing the significance judgment logic in a loop for all of them would seriously increase the workload of the processor. In this case, when to stop the significance judgment logic can be decided by judging the goodness of fit of the predetermined regression model. The specific steps are as follows:
step A3 may include: inputting the feature with the largest correlation coefficient and thunderstorm weather into the 1st predetermined regression model to obtain the 1st significance and a 1st first goodness of fit;
step A5 may include: inputting the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance and an (i+1)-th first goodness of fit;
after step A7 and before step A8, the training method of the thunderstorm weather prediction model may further include: judging whether the relation between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies a seventh predetermined condition, if not, executing step A4, and if so, executing step A8.
In this embodiment, even if the significance judgment logic has not been executed for all the features, once the relation between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies the seventh predetermined condition, the judgment logic is not continued for the remaining features, and all the features determined from the secondary linear features up to that point are taken as the linear target features. For example, the seventh predetermined condition may be that the difference between the i-th first goodness of fit and the (i+1)-th first goodness of fit is less than 0.0001.
The goodness of fit can be measured by R², which is also known as the coefficient of determination.
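For reference, R² is conventionally computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it and applies a stopping test of the kind described for the seventh predetermined condition; the function names and the toy values are hypothetical, and the tolerance 0.0001 is the example value from the text.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_has_converged(previous_r2, current_r2, tol=0.0001):
    """True when adding a feature changed the goodness of fit by less than tol."""
    return abs(previous_r2 - current_r2) < tol

# Toy observed values and predictions from a hypothetical regression model.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(observed, predicted)
```

With these toy values the residual sum of squares is 1.0 and the total sum of squares is 5.0, so R² is approximately 0.8.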
Optionally, to ensure that the nonlinear target features can be accurately screened out when features of the nonlinear type exist, step S2 may further include steps S21' to S24', where the target features may include nonlinear target features belonging to the nonlinear type, specifically:
Step S21', inputting the plurality of sets of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating a fourth feature importance of each of the plurality of features by using the plurality of sets of data, and outputting the features that belong to the nonlinear type and whose fourth feature importance satisfies a fourth predetermined condition.
The nonlinear feature screening model outputs only features of the nonlinear type. For each such feature it calculates a fourth feature importance, and then outputs the features that belong to the nonlinear type and whose fourth feature importance satisfies the fourth predetermined condition, e.g., the features whose fourth feature importance is not 0.
Optionally, the nonlinear feature screening model is a machine learning model, such as a Random Forest (RF) algorithm or a Gradient Boosting Decision Tree (GBDT). Taking the random forest algorithm as an example, each tree constructed by the algorithm can record how much the Gini coefficient decreases after a node is split on that node's feature. Because many trees are generated randomly and features are selected randomly, the improvement in classification or regression purity attributable to a given feature can be obtained when the data volume is large; this value is the contribution of the feature, i.e., its fourth feature importance. The fourth feature importance of a feature belonging to the nonlinear type can also be output in the form of a coefficient of the feature; for example, in the output "0.6 air density", 0.6 is the fourth feature importance of air density. For example, if the fourth predetermined condition is that the fourth feature importance is not 0, the nonlinear feature screening model outputs, for each data set, the features that belong to the nonlinear type and whose coefficients are not 0.
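The decrease in the Gini coefficient produced by a single node split, which tree-based models accumulate into a feature importance, can be sketched as follows. This shows one split only, not a full random forest; the function names and the toy labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

# Toy node: 1 = thunderstorm, 0 = no thunderstorm; the split separates the classes perfectly.
parent = [1, 1, 0, 0]
left, right = [1, 1], [0, 0]
decrease = gini_decrease(parent, left, right)  # 0.5
```

In a random forest, decreases of this kind are typically summed over all nodes that split on a given feature and averaged over the trees to give that feature's importance.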
Step S22', removing the features whose fourth feature importance satisfies a fifth predetermined condition from the features output by the nonlinear feature screening model to obtain preliminary nonlinear features.
For example, the fifth predetermined condition is that the fourth feature importance of the feature is the smallest. In this embodiment, the features output by the nonlinear feature screening model may be sorted in descending order of fourth feature importance, and the feature ranked last is then removed to obtain the preliminary nonlinear features.
Step S23', for each of the plurality of sets of data, removing the features irrelevant to the preliminary nonlinear features to obtain a plurality of sets of preliminary screening data.
Removing the features irrelevant to the preliminary nonlinear features means removing all features other than the preliminary nonlinear features.
Step S24', continuing to input the plurality of sets of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
Optionally, after step S21', the training method of the thunderstorm weather prediction model may further include: calculating a second goodness of fit of the nonlinear feature screening model.
Step S24' may include steps S241' to S246', wherein:
step S241', continuing to input the plurality of sets of preliminary screening data into the nonlinear feature screening model to obtain secondary nonlinear features;
step S242', removing the features irrelevant to the secondary nonlinear features from each of the plurality of sets of preliminary screening data to obtain a plurality of sets of secondary screening data;
step S243', calculating a third goodness of fit of the nonlinear feature screening model at this time;
step S244', judging whether the relation between the second goodness of fit and the third goodness of fit satisfies an eighth predetermined condition; if so, executing step S245'; if not, executing step S246';
step S245', determining the secondary nonlinear features as the nonlinear target features;
step S246', continuing to input the plurality of sets of secondary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
This embodiment is also a loop operation. Specifically, the plurality of sets of preliminary screening data are obtained first and the second goodness of fit is calculated; the plurality of sets of secondary screening data are then obtained and the third goodness of fit is calculated. If the relation between the second goodness of fit and the third goodness of fit satisfies the eighth predetermined condition, the secondary nonlinear features are determined as the nonlinear target features; otherwise, the plurality of sets of secondary screening data continue to be input into the nonlinear feature screening model until the relation between successive goodness-of-fit values satisfies the eighth predetermined condition. The eighth predetermined condition is, for example, that the difference between the loss function corresponding to the second goodness of fit and the loss function corresponding to the third goodness of fit is less than 0.0001.
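The loop described above can be sketched as a simplified elimination procedure. This is an interpretation under stated assumptions, not the embodiment's implementation: the nonlinear feature screening model is replaced by a hypothetical callable `score_fn` that returns the feature importances and a goodness-of-fit value, one least-important feature is removed per round, and the toy importances (including the feature name "visibility") are invented for illustration.

```python
def iterative_screen(features, score_fn, tol=0.0001, max_rounds=100):
    """Remove the least important feature each round until the fit stops changing."""
    importances, previous_fit = score_fn(features)
    for _ in range(max_rounds):
        if len(features) <= 1:
            break
        weakest = min(features, key=lambda f: importances[f])
        features = [f for f in features if f != weakest]
        importances, fit = score_fn(features)
        # Example stopping rule from the text: the change is less than 0.0001.
        if abs(previous_fit - fit) < tol:
            break
        previous_fit = fit
    return features

# Hypothetical stand-in for the screening model: fixed importances, fit = their sum.
TOY_IMPORTANCE = {"air density": 0.6, "humidity": 0.3,
                  "temperature": 0.09995, "visibility": 0.00005}

def toy_score(features):
    importances = {f: TOY_IMPORTANCE[f] for f in features}
    return importances, sum(importances.values())

nonlinear_target = iterative_screen(list(TOY_IMPORTANCE), toy_score)
```

In this toy run, removing the weakest feature changes the fit by about 0.00005, which is below the tolerance, so the loop stops after one round and the remaining three features are returned.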
Optionally, directly inputting the plurality of sets of data into the nonlinear feature screening model may make the simultaneous processing load too heavy and cause other problems, such as a machine crash. To avoid this, this embodiment may first preprocess the plurality of sets of data and then input the preprocessed data into the nonlinear feature screening model. The specific steps are as follows:
Step S21' may include step S211' and step S212', wherein:
step S211', for each of the plurality of sets of data, pre-screening the plurality of features by using a preset rule to obtain a plurality of sets of preprocessed data;
step S212', inputting the plurality of sets of preprocessed data into the nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating a fourth feature importance of each of the pre-screened features by using the plurality of sets of preprocessed data, and outputting the features that belong to the nonlinear type and whose fourth feature importance satisfies the fourth predetermined condition.
In this embodiment, the preprocessing may be to calculate, for each set of data, the distance between every two features, for example the Euclidean distance. If the distance between two features is greater than a predetermined threshold, the correlation between the two features is considered strong and only one of them needs to be retained; in that case, the distance between each of the two features and the output y (thunderstorm weather) may be further calculated, and the feature with the smaller distance from thunderstorm weather is removed. Through this preprocessing, a plurality of sets of preprocessed data are obtained. The plurality of sets of preprocessed data are then input into the nonlinear feature screening model; the processing logic here is the same as when the plurality of sets of data are input directly into the nonlinear feature screening model and is not repeated.
Step S3, removing the features irrelevant to the target features from each of the plurality of sets of data to form a plurality of sets of training data.
Wherein, when only the features belonging to the linear type exist in the plurality of features, the target feature only includes a linear target feature; when only the features belonging to the non-linear type exist in the plurality of features, the target features only include the non-linear target features; when there are both features belonging to the linear type and features belonging to the non-linear type among the plurality of features, the target feature includes both the linear target feature and the non-linear target feature.
In this embodiment, for each set of data, the features other than the target features are removed from the plurality of features of that set of data. After step S3 is executed, the data therefore contain the features that contribute significantly to thunderstorm weather.
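Step S3 amounts to a column filter over every set of data. The sketch below assumes each set of data is a Python dictionary that maps feature names to values and stores the thunderstorm label under a hypothetical key `"thunderstorm"`; the function name and the sample values are likewise illustrative.

```python
def keep_target_features(data_sets, target_features, label_key="thunderstorm"):
    """Drop every feature that is not a target feature; the label is always kept."""
    wanted = set(target_features) | {label_key}
    return [{k: v for k, v in row.items() if k in wanted} for row in data_sets]

# Two hypothetical sets of data with invented values.
data_sets = [
    {"temperature": 31.0, "humidity": 0.82, "rainfall": 4.0, "thunderstorm": 1},
    {"temperature": 18.5, "humidity": 0.40, "rainfall": 0.0, "thunderstorm": 0},
]
training_data = keep_target_features(data_sets, ["temperature", "humidity"])
```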
Step S4, training a predetermined algorithm by using the plurality of sets of training data to obtain a thunderstorm weather prediction model.
The plurality of sets of training data are used as a training set to train the predetermined algorithm and thereby obtain the thunderstorm weather prediction model, which is used for predicting, from the features of the current weather, whether future weather will be thunderstorm weather. The predetermined algorithm is, for example, a Support Vector Machine (SVM) algorithm, an Adaptive Boosting (AdaBoost) algorithm, a Logistic Regression (LR) algorithm, or a Decision Tree algorithm.
Fig. 2 schematically shows a flow chart of a thunderstorm weather prediction method according to an embodiment of the invention.
As shown in fig. 2, the thunderstorm weather prediction method may include steps M1 to M3, wherein:
step M1, acquiring target features of the current weather;
step M2, inputting the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result, wherein the thunderstorm weather prediction model is obtained by the method of the first embodiment;
step M3, judging whether future weather is thunderstorm weather according to the weather prediction result.
In this embodiment, the target features of the current weather are input into the pre-trained thunderstorm weather prediction model; because the training process of the thunderstorm weather prediction model is rigorous and the training result is accurate, the obtained weather prediction result is more reliable. The weather prediction result may be either thunderstorm weather or non-thunderstorm weather: when the weather prediction result is thunderstorm weather, the predicted future weather is thunderstorm weather, and when the weather prediction result is non-thunderstorm weather, the predicted future weather is not thunderstorm weather.
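Steps M1 to M3 can be sketched as a thin wrapper around a trained model. Everything named here is hypothetical: the model is represented by any callable that returns a truthy value for thunderstorm weather, and `toy_model` is a stand-in threshold rule used only to make the sketch runnable, not a trained thunderstorm weather prediction model.

```python
def predict_thunderstorm(current_weather, target_features, model):
    """M1: pick the target features; M2: query the model; M3: interpret the result."""
    inputs = [current_weather[name] for name in target_features]  # step M1
    result = model(inputs)                                        # step M2
    return bool(result)                                           # step M3

def toy_model(inputs):
    """Stand-in rule: predicts a thunderstorm when the second input exceeds 0.7."""
    return inputs[1] > 0.7

current_weather = {"temperature": 30.0, "humidity": 0.85, "rainfall": 2.0}
is_thunderstorm = predict_thunderstorm(
    current_weather, ["temperature", "humidity"], toy_model
)
```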
The embodiment of the invention also provides a training device of the thunderstorm weather prediction model, which corresponds to the training method of the thunderstorm weather prediction model provided by the embodiment, corresponding technical features and technical effects are not detailed in the embodiment, and related points can refer to the embodiment. Specifically, fig. 3 schematically shows a block diagram of a training apparatus of a thunderstorm weather prediction model according to an embodiment of the present invention. As shown in fig. 3, the training apparatus 300 for the thunderstorm weather prediction model may include a first obtaining module 301, a screening module 302, a culling module 303, and a training module 304, wherein:
a first obtaining module 301, configured to obtain multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and an association relationship between the thunderstorm weather and the multiple features of the thunderstorm weather;
a screening module 302, configured to screen out a target feature from a plurality of features of the plurality of sets of data, where the target feature is a feature whose first feature importance satisfies a first predetermined condition;
a removing module 303, configured to remove, from each of the multiple sets of data, a feature that is unrelated to the target feature to form multiple sets of training data;
and the training module 304 is configured to train a predetermined algorithm by using the plurality of sets of training data to obtain a thunderstorm weather prediction model.
Optionally, the screening module is further configured to: screening out linear target characteristics belonging to a linear type from the characteristics by using the multiple groups of data; and/or screening out a nonlinear target characteristic belonging to a nonlinear type from the plurality of characteristics by using the plurality of groups of data.
Optionally, the target feature comprises a linear target feature belonging to a linear type, and the screening module, when screening out the target feature from the plurality of features of the plurality of sets of data, is further configured to: perform N times of sampling on the plurality of sets of data to obtain N data sets, wherein each data set comprises one or more of the plurality of sets of data; for each of the N data sets, input the data set into a linear feature screening model, wherein the linear feature screening model is configured to calculate a second feature importance of each of the plurality of features of the data set and output the features that belong to the linear type and whose second feature importance satisfies a second predetermined condition, these features being called a group of preliminary linear features; acquire the N groups of preliminary linear features output by the linear feature screening model; and screen out the linear target features by using the N groups of preliminary linear features.
Optionally, the screening module, when screening out the linear target features by using the N groups of preliminary linear features, is further configured to: count all the features in the N groups of preliminary linear features to obtain a third feature importance of each feature; screen out, from the N groups of preliminary linear features, the features whose third feature importance satisfies a third predetermined condition, these features being called secondary linear features; and screen out the linear target features by using the secondary linear features.
Optionally, the screening module, when screening out the linear target features by using the secondary linear features, is further configured to execute: step A1: calculating the number M of features in the secondary linear features and the correlation coefficient between each feature and thunderstorm weather; step A2: taking the feature with the largest correlation coefficient as one feature of the linear target features; step A3: inputting the feature with the largest correlation coefficient and thunderstorm weather into a 1st predetermined regression model to obtain a 1st significance; step A4: judging whether i is larger than M, executing step A5 when i is not larger than M, and executing step A8 when i is larger than M, wherein the initial value of i is 1; step A5: inputting the feature with the (i+1)-th largest correlation coefficient into an (i+1)-th predetermined regression model to obtain an (i+1)-th significance, wherein the (i+1)-th predetermined regression model is obtained by inputting the first i features and thunderstorm weather into the i-th predetermined regression model; step A6: judging whether the relation between the i-th significance and the (i+1)-th significance satisfies a sixth predetermined condition, if so, executing step A7, and if not, executing step A4; step A7: determining the feature with the (i+1)-th largest correlation coefficient as one feature of the linear target features; step A8: taking all the features determined from the secondary linear features as the linear target features.
Optionally, the screening module, when executing step A3, is further configured to: input the feature with the largest correlation coefficient and thunderstorm weather into the 1st predetermined regression model to obtain the 1st significance and a 1st first goodness of fit; the screening module, when executing step A5, is further configured to: input the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance and an (i+1)-th first goodness of fit; the apparatus further comprises a judging module, configured to judge, after step A7 and before step A8, whether the relation between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies a seventh predetermined condition; if not, the screening module executes step A4, and if so, the screening module executes step A8.
Optionally, the target feature includes a nonlinear target feature belonging to a nonlinear type, and the screening module, when screening out the target feature from the plurality of features of the plurality of sets of data, is further configured to: input the plurality of sets of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating a fourth feature importance of each of the plurality of features by using the plurality of sets of data, and outputting the features that belong to the nonlinear type and whose fourth feature importance satisfies a fourth predetermined condition; remove the features whose fourth feature importance satisfies a fifth predetermined condition from the features output by the nonlinear feature screening model to obtain preliminary nonlinear features; for each of the plurality of sets of data, remove the features irrelevant to the preliminary nonlinear features to obtain a plurality of sets of preliminary screening data; and continue to input the plurality of sets of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
Optionally, after inputting the plurality of sets of data into the nonlinear feature screening model, the apparatus further comprises: the calculation module is used for calculating a second goodness of fit of the nonlinear feature screening model at this time;
the screening module, when continuing to input the plurality of sets of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out, is further configured to: continue to input the plurality of sets of preliminary screening data into the nonlinear feature screening model to obtain secondary nonlinear features; remove the features irrelevant to the secondary nonlinear features from each of the plurality of sets of preliminary screening data to obtain a plurality of sets of secondary screening data; calculate a third goodness of fit of the nonlinear feature screening model at this time; judge whether the relation between the second goodness of fit and the third goodness of fit satisfies an eighth predetermined condition; if so, determine the secondary nonlinear features as the nonlinear target features; and if not, continue to input the plurality of sets of secondary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
Optionally, the screening module, when inputting the plurality of sets of data into the nonlinear feature screening model, is further configured to: pre-screening the plurality of characteristics by using a preset rule aiming at each group of data in the plurality of groups of data to obtain a plurality of groups of pre-processing data; and inputting the multiple groups of preprocessing data into a nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating the fourth feature importance of each feature in the features subjected to pre-screening by using the multiple groups of preprocessing data, and outputting the feature of which the fourth feature importance meets the fourth preset condition and belongs to the nonlinear type.
The embodiment of the invention also provides a thunderstorm weather prediction apparatus, which corresponds to the thunderstorm weather prediction method provided by the above embodiment; the corresponding technical features and technical effects are not detailed in this embodiment, and reference may be made to the above embodiment for related points. Specifically, fig. 4 schematically shows a block diagram of a thunderstorm weather prediction apparatus according to an embodiment of the present invention. As shown in fig. 4, the thunderstorm weather prediction apparatus 400 may include a second obtaining module 401, an input module 402, and a determination module 403, wherein:
a second obtaining module 401, configured to obtain a target feature of current weather;
an input module 402, configured to input the target feature into a pre-trained thunderstorm weather prediction model, so that the thunderstorm weather prediction model outputs a weather prediction result, where the thunderstorm weather prediction model is obtained by a training method of the thunderstorm weather prediction model;
and a determining module 403, configured to determine whether the future weather is thunderstorm weather according to the weather prediction result.
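The three modules of the prediction apparatus can be pictured with the following minimal sketch. It assumes, for illustration only, that the trained model is a callable returning a thunderstorm probability and that the determination is a fixed threshold; the class name `ThunderstormPredictor`, the feature names used in the usage note, and the value 0.5 are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ThunderstormPredictor:
    # Assumed form of the pre-trained model: feature vector -> probability.
    model: Callable[[Sequence[float]], float]
    # Names of the target features kept by the training method.
    target_features: List[str]
    # Assumed decision threshold for the determining step.
    threshold: float = 0.5

    def acquire(self, observation: Dict[str, float]) -> List[float]:
        """Obtaining step: pick the target features of the current weather."""
        return [observation[name] for name in self.target_features]

    def predict(self, observation: Dict[str, float]) -> float:
        """Input step: feed the target features to the pre-trained model."""
        return self.model(self.acquire(observation))

    def is_thunderstorm(self, observation: Dict[str, float]) -> bool:
        """Determining step: decide from the weather prediction result."""
        return self.predict(observation) >= self.threshold
```

For example, with `target_features=["cape", "k_index"]` (hypothetical names), any additional measurements present in an observation are simply ignored by `acquire`.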
Fig. 5 schematically shows a block diagram of a computer device adapted to implement a training method for a thunderstorm weather prediction model and/or a thunderstorm weather prediction method according to an embodiment of the invention. In this embodiment, the computer device 500 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack-mounted server, a blade server, a tower server, or a cabinet-type server (including an independent server or a server cluster composed of a plurality of servers) capable of executing programs, and the like. As shown in Fig. 5, the computer device 500 of the present embodiment includes at least, but is not limited to: a memory 501, a processor 502, and a network interface 503 communicatively coupled to each other via a system bus. It is noted that Fig. 5 only illustrates the computer device 500 having components 501 to 503, but it is to be understood that not all of the illustrated components are required, and that more or fewer components may alternatively be implemented.
In this embodiment, the memory 501 includes at least one type of computer-readable storage medium, which includes flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 501 may be an internal storage unit of the computer device 500, such as a hard disk or an internal memory of the computer device 500. In other embodiments, the memory 501 may also be an external storage device of the computer device 500, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card provided on the computer device 500. Of course, the memory 501 may also include both an internal storage unit and an external storage device of the computer device 500. In the present embodiment, the memory 501 is generally used for storing an operating system and various types of application software installed on the computer device 500, such as the program code of the training method of the thunderstorm weather prediction model and/or the program code of the thunderstorm weather prediction method. Further, the memory 501 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 502 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 502 generally controls the overall operation of the computer device 500, for example performing control and processing related to data interaction or communication with the computer device 500. In this embodiment, the processor 502 is configured to execute the program code of the training method of the thunderstorm weather prediction model and/or the program code of the thunderstorm weather prediction method stored in the memory 501.
In this embodiment, the program code of the training method of the thunderstorm weather prediction model and/or of the thunderstorm weather prediction method stored in the memory 501 may be further divided into one or more program modules, which are executed by one or more processors (in this embodiment, the processor 502) to implement the present invention.
The network interface 503 may include a wireless network interface or a wired network interface, and is typically used to establish communication links between the computer device 500 and other computer devices. For example, the network interface 503 is used to connect the computer device 500 to an external terminal via a network, and to establish a data transmission channel and a communication link between the computer device 500 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, etc.
The present embodiment also provides a computer-readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an app store, etc., on which a computer program is stored. When executed by a processor, the computer program implements the steps of the training method of the thunderstorm weather prediction model and/or the steps of the thunderstorm weather prediction method.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented by a general-purpose computing device, and may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described herein, or the modules or steps may be separately fabricated into individual integrated circuit modules, or several of them may be fabricated into a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A training method of a thunderstorm weather prediction model is characterized by comprising the following steps:
acquiring a plurality of groups of data, wherein each group of data comprises thunderstorm weather, a plurality of characteristics of the thunderstorm weather and an incidence relation between the thunderstorm weather and the plurality of characteristics of the thunderstorm weather;
screening target characteristics from a plurality of characteristics of the plurality of groups of data, wherein the target characteristics are characteristics of which first characteristic importance degrees meet first preset conditions;
in each group of data of the multiple groups of data, eliminating features irrelevant to the target features to form multiple groups of training data;
and training a predetermined algorithm by using the plurality of groups of training data to obtain a thunderstorm weather prediction model.
2. The method of claim 1, wherein the target feature comprises a linear target feature belonging to a linear type, and wherein the screening of the target feature from the plurality of features of the plurality of sets of data comprises:
performing N times of sampling on the multiple groups of data to obtain N data sets, wherein each data set comprises one or more groups of the multiple groups of data;
inputting the data sets into a linear feature screening model for each of the N data sets, wherein the linear feature screening model is configured to calculate a second feature importance of each feature for the plurality of features of the data sets, and output features, called a set of preliminary linear features, of which the second feature importance satisfies a second predetermined condition and belongs to the linear type;
acquiring N groups of preliminary linear characteristics output by the linear characteristic screening model;
and screening out the linear target characteristics by using the N groups of preliminary linear characteristics.
3. The method of claim 2, wherein using the N sets of preliminary linear features to screen out the linear target feature comprises:
counting all the characteristics in the N groups of preliminary linear characteristics to obtain a third characteristic importance degree of each characteristic;
screening out the characteristics with the third characteristic importance degree meeting a third preset condition from the N groups of preliminary linear characteristics, and calling the characteristics as secondary linear characteristics;
and screening out the linear target characteristics by using the secondary linear characteristics.
4. The method of claim 1, wherein the target feature comprises a non-linear target feature belonging to a non-linear type, and wherein the screening of the target feature from the plurality of features of the plurality of sets of data comprises:
inputting the multiple groups of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating a fourth feature importance degree of each feature in the multiple features by using the multiple groups of data, and outputting the feature of which the fourth feature importance degree meets a fourth preset condition and belongs to the nonlinear type;
removing the features of which the fourth feature importance degree meets a fifth preset condition from the features output by the nonlinear feature screening model to obtain a preliminary nonlinear feature;
removing features irrelevant to the preliminary nonlinear features aiming at each group of data of the multiple groups of data to obtain multiple groups of preliminary screening data;
and continuously inputting the multiple groups of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
5. The method of claim 4, wherein inputting the plurality of sets of data into a nonlinear feature screening model comprises:
pre-screening the plurality of characteristics by using a preset rule aiming at each group of data in the plurality of groups of data to obtain a plurality of groups of pre-processing data;
and inputting the multiple groups of preprocessing data into a nonlinear feature screening model, wherein the nonlinear feature screening model is used for calculating the fourth feature importance of each feature in the features subjected to pre-screening by using the multiple groups of preprocessing data, and outputting the feature of which the fourth feature importance meets the fourth preset condition and belongs to the nonlinear type.
6. A thunderstorm weather prediction method is characterized by comprising the following steps:
acquiring target characteristics of current weather;
inputting the target characteristics into a pre-trained thunderstorm weather prediction model so as to enable the thunderstorm weather prediction model to output a weather prediction result, wherein the thunderstorm weather prediction model is obtained by the method of any one of claims 1 to 5;
and judging whether the future weather is thunderstorm weather or not according to the weather prediction result.
7. A training device for a thunderstorm weather prediction model is characterized by comprising:
the first acquisition module is used for acquiring a plurality of groups of data, wherein each group of data comprises thunderstorm weather, a plurality of characteristics of the thunderstorm weather and an incidence relation between the thunderstorm weather and the plurality of characteristics of the thunderstorm weather;
the screening module is used for screening target characteristics from a plurality of characteristics of the plurality of groups of data, wherein the target characteristics are characteristics of which the first characteristic importance degree meets a first preset condition;
the rejecting module is used for rejecting the features irrelevant to the target features in each group of data of the multiple groups of data to form multiple groups of training data;
and the training module is used for training a predetermined algorithm by utilizing the plurality of groups of training data to obtain a thunderstorm weather prediction model.
8. A thunderstorm weather prediction apparatus, comprising:
the second acquisition module is used for acquiring the target characteristics of the current weather;
an input module, configured to input the target feature into a pre-trained thunderstorm weather prediction model, so that the thunderstorm weather prediction model outputs a weather prediction result, where the thunderstorm weather prediction model is obtained by the method according to any one of claims 1 to 5;
and the judging module is used for judging whether the future weather is thunderstorm weather or not according to the weather prediction result.
9. A computer device, the computer device comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1 to 5 and/or the steps of the method of claim 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method of any one of claims 1 to 5 and/or the steps of the method of claim 6.
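By way of illustration only, the sampling-and-counting procedure of claims 2 and 3 can be sketched as follows. The sketch is not the claimed implementation: the absolute Pearson correlation stands in for the linear feature screening model, the selection frequency across the N data sets stands in for the third feature importance, and the names `screen_linear`, `min_score` and `min_frequency` together with their default values are assumptions.

```python
import random
from collections import Counter
from math import sqrt
from typing import Dict, List


def abs_correlation(xs: List[float], ys: List[float]) -> float:
    """Absolute Pearson correlation; 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / sqrt(var_x * var_y)


def screen_linear(
    rows: List[Dict[str, float]],
    labels: List[int],
    n_samples: int = 20,         # N rounds of sampling
    sample_size: int = 30,       # groups of data drawn per data set
    min_score: float = 0.6,      # assumed per-data-set importance condition
    min_frequency: float = 0.5,  # assumed condition on the selection frequency
    seed: int = 0,
) -> List[str]:
    """Keep the features selected in at least `min_frequency` of the N data sets."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_samples):
        # Draw one data set by sampling groups of data with replacement.
        indices = [rng.randrange(len(rows)) for _ in range(sample_size)]
        ys = [float(labels[i]) for i in indices]
        for name in rows[0]:
            xs = [rows[i][name] for i in indices]
            if abs_correlation(xs, ys) >= min_score:
                counts[name] += 1  # feature is in this set of preliminary features
    return [name for name in rows[0] if counts[name] / n_samples >= min_frequency]
```

Counting how often a feature survives across resampled data sets favors features whose importance is stable rather than an artifact of one particular sample.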
CN202010116671.XA 2020-02-25 2020-02-25 Training method of thunderstorm weather prediction model and thunderstorm weather prediction method Active CN111368887B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010116671.XA CN111368887B (en) 2020-02-25 2020-02-25 Training method of thunderstorm weather prediction model and thunderstorm weather prediction method
PCT/CN2020/117578 WO2021169271A1 (en) 2020-02-25 2020-09-25 Training method for thunderstorm weather prediction model, and thunderstorm weather prediction method

Publications (2)

Publication Number Publication Date
CN111368887A true CN111368887A (en) 2020-07-03
CN111368887B CN111368887B (en) 2024-05-03

Family

ID=71208274

Country Status (2)

Country Link
CN (1) CN111368887B (en)
WO (1) WO2021169271A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109031472A (en) * 2017-06-09 2018-12-18 阿里巴巴集团控股有限公司 A kind of data processing method and device for weather prognosis
CN109472283A (en) * 2018-09-13 2019-03-15 中国科学院计算机网络信息中心 A kind of hazardous weather event prediction method and apparatus based on Multiple Incremental regression tree model
JP2019095323A (en) * 2017-11-24 2019-06-20 株式会社日立製作所 Weather prediction device
CN110428015A (en) * 2019-08-07 2019-11-08 北京嘉和海森健康科技有限公司 A kind of training method and relevant device of model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298389A (en) * 2019-06-11 2019-10-01 上海冰鉴信息科技有限公司 More wheels circulation feature selection approach and device when training pattern
CN111368887B (en) * 2020-02-25 2024-05-03 平安科技(深圳)有限公司 Training method of thunderstorm weather prediction model and thunderstorm weather prediction method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
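MORNÉ GIJBEN et al.: "A statistical scheme to forecast the daily lightning threat over southern Africa using the Unified Model", Atmospheric Research, vol. 194, pages 2 *
Kong Debing (孔德兵) et al.: "Research on a thunderstorm probability forecasting method for the eastern part of Northwest China based on stepwise regression analysis" (基于逐步回归分析的西北地区东部雷暴概率预报方法研究), Journal of Arid Meteorology (《干旱气象》), vol. 34, no. 1, pages 3 *
Chen Lei (陈雷) et al.: "Application of GPS water vapor data in thunderstorm forecasting" (GPS水汽资料在雷雨预报中的应用), Atmospheric Science Research and Application (《大气科学研究与应用》), no. 2, pages 2 - 4 *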

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021169271A1 (en) * 2020-02-25 2021-09-02 平安科技(深圳)有限公司 Training method for thunderstorm weather prediction model, and thunderstorm weather prediction method
CN111832828A (en) * 2020-07-17 2020-10-27 国家卫星气象中心(国家空间天气监测预警中心) Intelligent precipitation prediction method based on the Fengyun-4 meteorological satellite
CN111915068A (en) * 2020-07-17 2020-11-10 同济大学 Road visibility temporary prediction method based on ensemble learning
CN111832828B (en) * 2020-07-17 2023-12-19 国家卫星气象中心(国家空间天气监测预警中心) Intelligent precipitation prediction method based on the Fengyun-4 meteorological satellite
CN112561199A (en) * 2020-12-23 2021-03-26 北京百度网讯科技有限公司 Weather parameter prediction model training method, weather parameter prediction method and device
CN113985145A (en) * 2021-09-13 2022-01-28 广东电网有限责任公司广州供电局 Thunder and lightning early warning method, early warning device and computer readable storage medium

Also Published As

Publication number Publication date
CN111368887B (en) 2024-05-03
WO2021169271A1 (en) 2021-09-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant