CN115759446A - Machine learning feature selection method for new energy high-precision prediction - Google Patents

Info

Publication number
CN115759446A
Authority
CN
China
Prior art keywords
meteorological
feature set
feature
features
meteorological feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211488356.5A
Other languages
Chinese (zh)
Inventor
周悦
马溪原
陈元峰
陈炎森
程凯
张子昊
周长城
李卓环
包涛
姚森敬
李鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern Power Grid Digital Grid Research Institute Co Ltd
Original Assignee
Southern Power Grid Digital Grid Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern Power Grid Digital Grid Research Institute Co Ltd
Priority to CN202211488356.5A
Publication of CN115759446A
Priority to CN202310972728.XA (CN117113230A)

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a machine learning feature selection method and apparatus for high-precision prediction of new energy, a computer device, a storage medium and a computer program product. The method comprises: obtaining an initial meteorological feature set; screening the meteorological features in the initial meteorological feature set by a random forest algorithm to obtain a first meteorological feature set; screening the first meteorological feature set based on correlation analysis to obtain a second meteorological feature set; and screening the meteorological features in the second meteorological feature set by a recursive feature elimination method to obtain a target meteorological feature set. In this scheme, meteorological features are first extracted with the random forest algorithm, correlation analysis is then performed on the extracted features and a second extraction is made according to its results, and the remaining features are finally subjected to recursive elimination. Through these successive screenings, the features with the greatest influence on the new energy generated power are retained, so that more accurate meteorological features are obtained.

Description

Machine learning feature selection method for new energy high-precision prediction
Technical Field
The present application relates to the field of new energy technologies, and in particular, to a method and an apparatus for selecting machine learning features for high-precision prediction of new energy, a computer device, a storage medium, and a computer program product.
Background
New energy generally refers to renewable energy developed and utilized on the basis of new technology. As conventional energy is limited and environmental problems become increasingly prominent, new energy, which is environmentally friendly and renewable, is receiving more and more attention. The output of new energy is influenced by meteorological factors and is strongly random and volatile. Large-scale grid connection of new energy challenges the safe and stable operation of the electric power system, and power prediction is one of the key measures for addressing new energy grid connection.
In the traditional power prediction process, power prediction is mainly carried out by selecting meteorological features. However, many meteorological factors influence new energy output, and the meteorological features that influence the output of a new energy power generation system differ under different weather conditions. As a result, the features that accurately affect the generated power cannot be extracted.
Disclosure of Invention
In view of the above, it is necessary to provide a machine learning feature selection method, apparatus, computer device, computer-readable storage medium and computer program product for high-precision prediction of new energy, which can accurately obtain the features that affect new energy power generation.
In a first aspect, the application provides a new energy high-precision prediction-oriented machine learning feature selection method. The method comprises the following steps:
acquiring initial meteorological features influencing the power generation power in the new energy power generation system to obtain an initial meteorological feature set;
screening meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain a first meteorological feature set;
based on the correlation analysis between the meteorological features and the generated power, carrying out meteorological feature screening on the first meteorological feature set to obtain a second meteorological feature set;
screening meteorological features in the second meteorological feature set by adopting a recursive feature elimination method to obtain a target meteorological feature set;
wherein the average degree of influence of the meteorological features on the generated power increases step by step from the initial meteorological feature set, through the first meteorological feature set and the second meteorological feature set, to the target meteorological feature set.
In one embodiment, screening meteorological features in an initial meteorological feature set by using a random forest algorithm to obtain a first meteorological feature set includes: acquiring power generation sample data of a new energy power generation system; according to the power generation sample data, scoring the meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain feature scores; and according to the feature scores, removing features with the feature scores lower than a preset score threshold value from the initial meteorological feature set to obtain a first meteorological feature set.
In one embodiment, according to power generation sample data, scoring meteorological features in the initial meteorological feature set by using a random forest algorithm, and obtaining feature scores includes: randomly sampling power generation sample data to obtain training sample data; obtaining test sample data according to the power generation sample data which is not sampled; constructing a decision tree according to training sample data; and calculating the importance degree of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data to obtain a feature score.
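As a non-limiting illustration of the sampling in this embodiment, the following minimal sketch draws training indices at random and treats the samples that were never drawn as test data. Sampling with replacement (bootstrap sampling, as is usual in random forests) is an assumption here, since the text only says the data is randomly sampled; the function name `bootstrap_split` and the sample count are hypothetical.

```python
import random

def bootstrap_split(n_samples, seed=0):
    """Draw n_samples indices with replacement (assumed); unsampled indices become test data."""
    rng = random.Random(seed)
    # Training indices: random draws, so some samples repeat and others are never chosen.
    train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train_idx)
    # Test indices: the power generation samples that were not sampled.
    test_idx = [i for i in range(n_samples) if i not in drawn]
    return train_idx, test_idx

# Hypothetical example with 20 power generation samples.
train_idx, test_idx = bootstrap_split(n_samples=20, seed=42)
print(len(train_idx), "training draws,", len(test_idx), "unsampled test samples")
```

Each decision tree would be built from its own draw, so every tree is evaluated on samples it did not see during construction.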
In one embodiment, calculating the importance degree of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data to obtain the feature score includes: calculating the prediction error rate of the decision tree according to the test sample data; randomly adding noise into a single meteorological feature of the test sample data, and calculating the noise prediction error rate of the decision tree; and determining the importance degree of the meteorological features in the meteorological feature set according to the prediction error rate and the noise prediction error rate to obtain a feature score.
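The scoring in this embodiment can be illustrated by the following minimal sketch, which compares the prediction error of a tree on the test data with its error after noise is added to one feature. Several points are assumptions rather than details from the text: mean squared error stands in for the "error rate", the noise is Gaussian with a hypothetical scale `sigma`, the score is taken as the simple difference of the two errors, and `toy_tree` is a made-up stand-in for a trained decision tree.

```python
import random

def mean_squared_error(predict, rows, targets):
    """Prediction error on the test rows (MSE is an assumed stand-in for 'error rate')."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def noise_importance(predict, rows, targets, feature, sigma=1.0, seed=0):
    """Score one feature as: error with noise on that feature minus error without noise."""
    rng = random.Random(seed)
    base_err = mean_squared_error(predict, rows, targets)
    # Add random noise to a single meteorological feature, leaving the others untouched.
    noisy_rows = [dict(r, **{feature: r[feature] + rng.gauss(0.0, sigma)}) for r in rows]
    noisy_err = mean_squared_error(predict, noisy_rows, targets)
    return noisy_err - base_err

def toy_tree(row):
    # Hypothetical stand-in for a trained decision tree: it only looks at wind speed.
    return 2.0 * row["WS"]

# Made-up test samples; targets are generated by the toy model itself.
test_rows = [
    {"WS": 3.0, "TEM": 20.0},
    {"WS": 5.0, "TEM": 18.0},
    {"WS": 8.0, "TEM": 25.0},
    {"WS": 11.0, "TEM": 15.0},
]
test_power = [toy_tree(r) for r in test_rows]

ws_score = noise_importance(toy_tree, test_rows, test_power, "WS")
tem_score = noise_importance(toy_tree, test_rows, test_power, "TEM")
print("WS score:", ws_score, "TEM score:", tem_score)
```

A feature the model relies on shows a clear rise in error when it is perturbed, while a feature the model ignores leaves the error unchanged.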
In one embodiment, performing meteorological feature screening on the first meteorological feature set based on the correlation analysis between the meteorological features and the generated power to obtain the second meteorological feature set includes: calculating the correlation between each meteorological feature in the first meteorological feature set and the generated power; and screening the first meteorological feature set according to these correlations to obtain the second meteorological feature set.
In one embodiment, screening the meteorological features in the second meteorological feature set by a recursive feature elimination method to obtain the target meteorological feature set includes: performing meteorological feature extraction on the second meteorological feature set by the recursive feature elimination method; and performing cross validation on the extracted meteorological features to obtain the target meteorological feature set.
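The cross validation mentioned in this embodiment might be sketched as follows: candidate feature subsets are compared by their average error over k folds, and the subset with the lowest error is kept. The text does not specify the model, the number of folds or the error measure, so the 1-nearest-neighbour predictor, the three contiguous folds, the squared error and the data below are all hypothetical choices made only to keep the example self-contained.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def one_nn_predict(train_x, train_y, x):
    """Predict with the target of the closest training row (assumed stand-in model)."""
    nearest = min(range(len(train_x)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[nearest]

def cv_error(X, y, cols, k=3):
    """Average squared error over k folds using only the feature columns in `cols`."""
    sub = [[row[c] for c in cols] for row in X]
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(sub), k):
        train_x = [sub[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        err = sum((one_nn_predict(train_x, train_y, sub[i]) - y[i]) ** 2
                  for i in test_idx) / len(test_idx)
        fold_errors.append(err)
    return sum(fold_errors) / len(fold_errors)

# Made-up data: column 0 drives the target, column 1 is unrelated.
unrelated = [50, 0, 30, 10, 40, 20, 60, 5, 35]
X = [[i, unrelated[i]] for i in range(9)]
y = [2 * i for i in range(9)]

candidates = [[0], [1], [0, 1]]
best = min(candidates, key=lambda cols: cv_error(X, y, cols))
print("selected columns:", best)
```

In this toy data the subset containing only the informative column gives the lowest cross-validated error, which is the sense in which cross validation confirms the extracted features.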
In a second aspect, the application further provides a new energy high-precision prediction-oriented machine learning feature selection device. The device includes:
the acquisition module is used for acquiring initial meteorological features influencing the power generation power in the new energy power generation system to obtain an initial meteorological feature set;
the first extraction module is used for screening meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain a first meteorological feature set;
the second extraction module is used for screening meteorological features of the first meteorological feature set based on correlation analysis between the meteorological features and the generated power to obtain a second meteorological feature set;
the third extraction module is used for screening meteorological features in the second meteorological feature set by adopting a recursive feature elimination method to obtain a target meteorological feature set;
wherein the average degree of influence of the meteorological features on the generated power increases step by step from the initial meteorological feature set, through the first meteorological feature set and the second meteorological feature set, to the target meteorological feature set.
In one embodiment, the first extraction module is further configured to obtain power generation sample data of the new energy power generation system; according to the power generation sample data, scoring the meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain feature scores; and according to the feature scores, removing features with the feature scores lower than a preset score threshold value from the initial meteorological feature set to obtain a first meteorological feature set.
In one embodiment, the first extraction module is further configured to randomly sample power generation sample data to obtain training sample data; obtaining test sample data according to the power generation sample data which is not sampled; constructing a decision tree according to training sample data; and calculating the importance degree of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data to obtain a feature score.
In one embodiment, the first extraction module is further configured to calculate a prediction error rate of the decision tree according to the test sample data; randomly adding noise into a single meteorological feature of the test sample data, and calculating the noise prediction error rate of the decision tree; and determining the importance degree of the meteorological features in the meteorological feature set according to the prediction error rate and the noise prediction error rate to obtain a feature score.
In one embodiment, the second extraction module is further used for calculating the correlation between each meteorological feature in the first meteorological feature set and the generated power; and performing meteorological feature screening on the first meteorological feature set according to the correlation between each meteorological feature and the generated power to obtain a second meteorological feature set.
In one embodiment, the third extraction module is further configured to perform meteorological feature extraction on the second meteorological feature set by a recursive feature elimination method, and to perform cross validation on the extracted meteorological features to obtain the target meteorological feature set.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:
acquiring initial meteorological features influencing the power generation power in the new energy power generation system to obtain an initial meteorological feature set;
screening meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain a first meteorological feature set;
based on the correlation analysis between the meteorological features and the generated power, carrying out meteorological feature screening on the first meteorological feature set to obtain a second meteorological feature set;
screening meteorological features in the second meteorological feature set by adopting a recursive feature elimination method to obtain a target meteorological feature set;
wherein the average degree of influence of the meteorological features on the generated power increases step by step from the initial meteorological feature set, through the first meteorological feature set and the second meteorological feature set, to the target meteorological feature set.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring initial meteorological features influencing the power generation power in the new energy power generation system to obtain an initial meteorological feature set;
screening meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain a first meteorological feature set;
based on the correlation analysis between the meteorological features and the generated power, carrying out meteorological feature screening on the first meteorological feature set to obtain a second meteorological feature set;
screening meteorological features in the second meteorological feature set by adopting a recursive feature elimination method to obtain a target meteorological feature set;
wherein the average degree of influence of the meteorological features on the generated power increases step by step from the initial meteorological feature set, through the first meteorological feature set and the second meteorological feature set, to the target meteorological feature set.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the steps of:
acquiring initial meteorological features influencing the power generation power in the new energy power generation system to obtain an initial meteorological feature set;
screening meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain a first meteorological feature set;
performing meteorological feature screening on the first meteorological feature set based on correlation analysis between meteorological features and generated power to obtain a second meteorological feature set;
screening meteorological features in the second meteorological feature set by adopting a recursive feature elimination method to obtain a target meteorological feature set;
wherein the average degree of influence of the meteorological features on the generated power increases step by step from the initial meteorological feature set, through the first meteorological feature set and the second meteorological feature set, to the target meteorological feature set.
According to the above machine learning feature selection method, apparatus, computer device, storage medium and computer program product for high-precision prediction of new energy, initial meteorological features influencing the generated power in a new energy power generation system are acquired to obtain an initial meteorological feature set; the meteorological features in the initial meteorological feature set are screened by a random forest algorithm to obtain a first meteorological feature set; the first meteorological feature set is screened based on correlation analysis between the meteorological features and the generated power to obtain a second meteorological feature set; and the meteorological features in the second meteorological feature set are screened by a recursive feature elimination method to obtain a target meteorological feature set, wherein the average degree of influence of the meteorological features on the generated power increases step by step from the initial meteorological feature set, through the first and second meteorological feature sets, to the target meteorological feature set. The scheme as a whole first acquires a comprehensive meteorological feature set and makes a preliminary extraction with the random forest algorithm. It then performs correlation analysis on the features extracted in the first pass and extracts again according to the results. Finally, it applies recursive elimination to the remaining features. Through these successive screenings, the features with the greatest influence on the new energy generated power are retained, so that more accurate meteorological features are obtained.
Drawings
FIG. 1 is an application scenario diagram of the machine learning feature selection method for high-precision prediction of new energy in one embodiment;
FIG. 2 is a schematic flowchart of the machine learning feature selection method for high-precision prediction of new energy in one embodiment;
FIG. 3 is a schematic flowchart of the first meteorological feature extraction in one embodiment;
FIG. 4 is a schematic diagram of the correlation between the features and the generated power before the second meteorological feature extraction in one embodiment;
FIG. 5 is a schematic diagram of the correlation between the features and the generated power after the second meteorological feature extraction in one embodiment;
FIG. 6 is a diagram of the relationship between the number of features and the prediction accuracy in one embodiment;
FIG. 7 is a schematic diagram comparing the accuracy of generated power prediction before and after applying the meteorological features extracted by the present application in one embodiment;
FIG. 8 is an overall diagram of the machine learning feature selection method for high-precision prediction of new energy in one embodiment;
FIG. 9 is a block diagram of the machine learning feature selection apparatus for high-precision prediction of new energy in one embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
At present, most new energy power prediction methods focus on how the choice of algorithm model affects prediction accuracy, and fail to fully mine the association rules among meteorological features. Because many meteorological factors affect new energy output, the meteorological features that influence output differ among new energy power generation systems (that is, new energy stations) under different climatic conditions, and a feature selection method that is procedural, portable and widely applicable is lacking. Fully exploiting the latent patterns in the meteorological features and selecting the optimal training features therefore plays an important role in improving new energy power prediction accuracy.
The method first screens out, based on the degree of influence assessed by a random forest algorithm, the combination of meteorological factors with the greatest influence on new energy power variation. On this basis, statistical correlation analysis is applied to study the correlation between the meteorological features and the generated power, and the meteorological features are screened further. Finally, a recursive feature elimination method determines the feature set ultimately used for prediction. Selecting meteorological features in this way for power prediction can improve new energy power prediction accuracy.
The machine learning feature selection method for high-precision prediction of new energy can be applied in the application environment shown in FIG. 1. The user 102 communicates with the terminal 104 through a network, and the terminal 104 communicates with the new energy power generation system 106 through a network. A data storage system may store the power generation sample data of the new energy power generation system 106. The data storage system may be integrated in the new energy power generation system 106, or may be placed on the cloud or on another network server. The user 102 initiates a meteorological feature analysis request to the terminal 104. The terminal 104 receives the request and, according to it, obtains an initial meteorological feature set from the new energy power generation system 106; screens the meteorological features in the initial meteorological feature set by a random forest algorithm to obtain a first meteorological feature set; screens the first meteorological feature set based on correlation analysis between the meteorological features and the generated power to obtain a second meteorological feature set; and screens the meteorological features in the second meteorological feature set by a recursive feature elimination method to obtain a target meteorological feature set. The average degree of influence of the meteorological features on the generated power increases step by step from the initial meteorological feature set, through the first and second meteorological feature sets, to the target meteorological feature set.
The terminal 104 may be, but is not limited to, any of various physical servers, virtual hosts, personal computers, notebook computers, smart phones, tablet computers, internet of things devices and portable wearable devices. The internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle-mounted devices, and the like. The portable wearable devices may be smart watches, smart bracelets, head-mounted devices, and the like. The new energy power generation system 106 may be implemented as a stand-alone server or as a server cluster including a plurality of servers.
In an embodiment, as shown in fig. 2, a method for selecting a machine learning feature oriented to new energy high-precision prediction is provided, and this embodiment is illustrated by applying the method to a terminal, and includes the following steps:
Step 202, obtaining initial meteorological features influencing the generated power in the new energy power generation system to obtain an initial meteorological feature set.
The new energy power generation system includes new energy power generation systems of different generation types, such as photovoltaic power stations and wind power stations. The initial meteorological features are all meteorological features that influence the new energy generated power, and the set formed by them is the initial meteorological feature set. The initial meteorological feature set includes wind speed (WS), wind direction (WD), temperature (TEM), air density (Density), air pressure (PRS), total radiation (SR), direct radiation (SWDDIR), scattered radiation (SWDDIF), high cloud cover (HCC), total cloud cover (TCC), and the like.
Specifically, the terminal obtains a meteorological feature analysis request from a user, parses the request, and determines the meteorological feature type, for example that of a wind power station or a photovoltaic station. It then obtains the initial meteorological features that correspond to this type and influence the generated power in the new energy power generation system, and takes the set formed by them as the initial meteorological feature set. The initial meteorological features may be extracted by the terminal through correlation analysis between all available meteorological features and the new energy generated power. They may also be meteorological features that new energy experts have identified from research as influencing the new energy generated power.
Step 204, screening meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain a first meteorological feature set.
The first meteorological feature set refers to a meteorological feature set which is obtained by screening meteorological features through a random forest algorithm and has a large influence on the power generation power.
Specifically, the terminal uses the random forest algorithm to construct a data classifier comprising a plurality of decision trees, and uses the classifier together with the relation between historical meteorological data and historical generated power to extract the meteorological features that have a large influence on the generated power, obtaining the first meteorological feature set.
Step 206, based on the correlation analysis between the meteorological features and the generated power, performing meteorological feature screening on the first meteorological feature set to obtain a second meteorological feature set.
Specifically, the terminal extracts the historical data of the meteorological features contained in the first meteorological feature set from the historical meteorological data, performs correlation analysis between the historical data of the first meteorological features and the historical generated power, eliminates the meteorological features lower than a preset correlation threshold value in the correlation analysis result, and combines the meteorological features remaining in the first meteorological feature set after the elimination to obtain a second meteorological feature set.
Step 208, screening the meteorological features in the second meteorological feature set by adopting a recursive feature elimination method to obtain a target meteorological feature set.
In each of the terminal's successive screenings, the meteorological features that have a large influence on the generated power are retained, so that the average degree of influence of the meteorological features on the generated power increases step by step from the initial meteorological feature set, through the first and second meteorological feature sets, to the target meteorological feature set.
Specifically, after obtaining the second meteorological feature set, the terminal recursively builds a decision tree model and selects the feature most strongly correlated with the generated power according to the correlation coefficients of the meteorological features in the second meteorological feature set. It then rebuilds the decision tree model on the remaining meteorological features and again selects the most strongly correlated feature among them, repeating until a preset number of meteorological features has been selected. Screening then stops, and the selected meteorological features form the target meteorological feature set.
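One common form of recursive feature elimination works backward: all remaining features are scored, the weakest one is removed, and the scoring is repeated until a preset number of features is left. The sketch below follows that form under stated assumptions. The absolute Pearson correlation with the generated power is used as the score in place of an importance taken from a rebuilt decision tree model, and the function names, the preset count of two and the data are hypothetical.

```python
import math

def abs_pearson(xs, ys):
    """Absolute Pearson correlation, used here as an assumed stand-in importance score."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys))
    return abs(cov / norm) if norm else 0.0

def recursive_feature_elimination(features, target, n_keep, importance=abs_pearson):
    """Repeatedly re-score the remaining features and drop the weakest until n_keep remain."""
    remaining = dict(features)
    eliminated = []
    while len(remaining) > n_keep:
        # Re-score on every round; a model-based score would be refitted here.
        scores = {name: importance(values, target) for name, values in remaining.items()}
        weakest = min(scores, key=scores.get)
        eliminated.append(weakest)
        del remaining[weakest]
    return list(remaining), eliminated

# Made-up second meteorological feature set and generated power.
power = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
second_set = {
    "WS": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "SR": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
    "TEM": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    "PRS": [3.0, 4.0, 2.0, 5.0, 1.0, 6.0],
}

target_set, dropped = recursive_feature_elimination(second_set, power, n_keep=2)
print("target set:", target_set, "eliminated in order:", dropped)
```

The `importance` parameter is left pluggable so that a score derived from a fitted model can replace the correlation stand-in without changing the loop.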
In the above machine learning feature selection method for high-precision prediction of new energy, initial meteorological features influencing the generated power in a new energy power generation system are acquired to obtain an initial meteorological feature set; the meteorological features in the initial meteorological feature set are screened by a random forest algorithm to obtain a first meteorological feature set; the first meteorological feature set is screened based on correlation analysis between the meteorological features and the generated power to obtain a second meteorological feature set; and the meteorological features in the second meteorological feature set are screened by a recursive feature elimination method to obtain a target meteorological feature set, wherein the average degree of influence of the meteorological features on the generated power increases step by step from the initial meteorological feature set, through the first and second meteorological feature sets, to the target meteorological feature set. The scheme as a whole first acquires a comprehensive meteorological feature set and makes a preliminary extraction with the random forest algorithm. It then performs correlation analysis on the features extracted in the first pass and extracts again according to the results. Finally, it applies recursive elimination to the remaining features. Through these successive screenings, the features with the greatest influence on the new energy generated power are retained, so that more accurate meteorological features are obtained.
In an alternative embodiment, as shown in fig. 3, the screening meteorological features in the initial meteorological feature set by using a random forest algorithm to obtain the first meteorological feature set includes:
Step 302, acquiring power generation sample data of the new energy power generation system.
The power generation sample data comprises meteorological sample data and generated power sample data. The power generation sample data is obtained by collecting historical meteorological data and historical generated power data of the new energy source under study over a period of time.
Specifically, the terminal obtains a meteorological feature analysis request of a user, analyzes the meteorological feature analysis request to obtain a meteorological feature type and a sample storage path, and obtains power generation sample data of preset days and initial meteorological features influencing power generation power in the new energy power generation system according to the sample storage path and the meteorological feature type.
Step 304, scoring the meteorological features in the initial meteorological feature set by using a random forest algorithm according to the power generation sample data to obtain feature scores.
Specifically, the terminal constructs a plurality of decision trees by adopting a random forest algorithm according to part of sample data in the power generation sample data, calculates the prediction accuracy of the constructed decision trees according to other residual sample data, and calculates according to the prediction accuracy of the decision trees to obtain the feature scores of a plurality of meteorological features.
Step 306, removing, according to the feature scores, the features whose feature scores are lower than a preset score threshold from the initial meteorological feature set to obtain a first meteorological feature set.
Specifically, the terminal compares the characteristic score of each meteorological feature in the initial meteorological feature set with a preset score threshold value in sequence, the meteorological features with the characteristic scores lower than the preset score threshold value are removed from the initial meteorological feature set, the meteorological features left in the initial meteorological feature set after removal are features important for predicting the power generation power in a decision tree, and a first meteorological feature set is formed.
In an optional embodiment, scoring the meteorological features in the initial meteorological feature set by using a random forest algorithm according to power generation sample data, and obtaining the feature score includes: randomly sampling power generation sample data to obtain training sample data; obtaining test sample data according to the power generation sample data which is not sampled; constructing a decision tree according to training sample data; and calculating the importance degree of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data to obtain a feature score.
Specifically, the terminal conducts repeated random sampling on the power generation sample data, and when the total amount of data obtained through sampling reaches preset training sample data, the sampling is stopped, and the training sample data are obtained; and taking the data which is not sampled in the power generation sample data as the test sample data. And constructing a plurality of decision trees according to the meteorological sample data and the generating power sample data in the training sample data. Verifying the prediction accuracy of the decision trees according to the meteorological sample data and the generated power sample data in the test sample data, and calculating the importance degree of the meteorological features in the decision trees according to the prediction accuracy of the decision trees to obtain the feature score of each meteorological feature.
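The sampling step above can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: it assumes the repeated random sampling is done with replacement (as in ordinary bagging), and the function name `bootstrap_split` and its parameters are hypothetical.

```python
import random

def bootstrap_split(n_samples, n_train, seed=0):
    """Draw n_train row indices at random (with replacement, an assumption);
    rows that were never drawn form the test (out-of-bag) sample data."""
    rng = random.Random(seed)
    train_idx = [rng.randrange(n_samples) for _ in range(n_train)]
    sampled = set(train_idx)
    oob_idx = [i for i in range(n_samples) if i not in sampled]
    return train_idx, oob_idx

# Hypothetical sizes: 100 rows of power generation sample data, 100 draws.
train_idx, oob_idx = bootstrap_split(n_samples=100, n_train=100, seed=42)
print(len(train_idx), len(oob_idx))
```

Because the test rows are exactly the rows that were never drawn, the training and test sample data never overlap, which is what allows the test data to give an unbiased check of each tree.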
In an optional embodiment, calculating the importance degree of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data, and obtaining the feature score includes: calculating the prediction error rate of the decision tree according to the test sample data; randomly adding noise into a single meteorological feature of the test sample data, and calculating the noise prediction error rate of the decision tree; and determining the importance degree of the meteorological features in the meteorological feature set according to the prediction error rate and the noise prediction error rate to obtain a feature score.
Specifically, the terminal verifies the prediction error rates of the plurality of decision trees according to the meteorological sample data and generated power sample data in the test sample data. The terminal then randomly adds noise to feature X of all samples in the test sample data, that is, randomly changes the values of the samples at feature X, and calculates the noise prediction error rate of each decision tree (that is, the out-of-bag error). The prediction error rate is subtracted from the noise prediction error rate, and the differences accumulated over the plurality of decision trees are divided by the number of decision trees to obtain the importance of feature X. The value of each feature in the initial meteorological feature set is randomly changed in turn, and the importance of each feature is determined from the prediction error rate and the noise prediction error rate of the decision trees, giving the feature score of the corresponding meteorological feature. Meteorological features whose feature scores are lower than the preset score threshold are removed from the initial meteorological feature set; the meteorological features that remain are those that are important for predicting the generated power in the decision trees, and they form the first meteorological feature set.
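The noise step for a single tree might look like the sketch below. Several things here are assumptions made for illustration: noise is added by shuffling the feature column, the error rate is measured as mean absolute error, and `predict` is a hand-written stand-in for a trained decision tree. None of the names come from the original text.

```python
import random

def mean_abs_error(predict, rows, targets):
    """Error measure used here as the 'prediction error rate' (an assumption)."""
    return sum(abs(predict(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_error_increase(predict, rows, targets, feature, seed=0):
    """Return (noise prediction error - prediction error) for one model
    after randomly shuffling the values of one feature across the samples."""
    rng = random.Random(seed)
    base_error = mean_abs_error(predict, rows, targets)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    noisy_rows = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    noisy_error = mean_abs_error(predict, noisy_rows, targets)
    return noisy_error - base_error

# Toy test data in which power depends on irradiance only (hypothetical values).
rows = [{"irradiance": float(i), "humidity": float((i * 7) % 10)} for i in range(20)]
targets = [2.0 * r["irradiance"] for r in rows]

def predict(row):
    # Stand-in for a trained decision tree.
    return 2.0 * row["irradiance"]

inc_irradiance = permutation_error_increase(predict, rows, targets, "irradiance")
inc_humidity = permutation_error_increase(predict, rows, targets, "humidity")
print(inc_irradiance, inc_humidity)
```

Shuffling a feature the model relies on raises the error, while shuffling a feature the model ignores leaves it unchanged, which is the signal the feature score is built on.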
Further, the feature score is based on the out-of-bag error: the value of a given meteorological feature is randomly changed, and if the out-of-bag error rises sharply, that meteorological feature has a large influence on the prediction result for the samples and its importance is high. The feature score is calculated as follows:
First, training sample data is obtained by repeatedly sampling the power generation sample data, and a decision tree is built from the training sample data; the data that was not sampled (the out-of-bag data) is used to evaluate the performance of the random forest and to calculate the prediction error rate of the model, that is, the out-of-bag error.
Second, for each decision tree in the random forest, the prediction error rate of the model is calculated and recorded as errOOB1.
Third, noise is randomly added to feature X of all samples in the out-of-bag data (that is, the values of the samples at feature X are randomly changed), and the out-of-bag error is calculated again and recorded as errOOB2.
Fourth, assuming there are N trees in the random forest, the importance of feature X = ∑(errOOB2 - errOOB1)/N.
Features below a preset scoring threshold are discarded. The preset score threshold is used as a variable parameter of the feature selection method, and can be manually adjusted, and the default value of the random forest score threshold set in this embodiment is 0.05.
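The four steps and the threshold rule above reduce to a short calculation once the per-tree errors are known. The sketch below assumes those errors have already been measured; the function names and the error values are hypothetical, while the 0.05 default comes from the text.

```python
def feature_importance(err_oob1, err_oob2):
    """Importance of one feature: sum(errOOB2 - errOOB1) / N over N trees."""
    if len(err_oob1) != len(err_oob2) or not err_oob1:
        raise ValueError("need one errOOB1 and one errOOB2 value per tree")
    return sum(e2 - e1 for e1, e2 in zip(err_oob1, err_oob2)) / len(err_oob1)

def drop_low_scores(scores, threshold=0.05):
    """Keep features whose score is not below the preset score threshold."""
    return [name for name, score in scores.items() if score >= threshold]

# Hypothetical errors for a forest of N = 3 trees.
err_oob1 = [0.10, 0.12, 0.08]
err_oob2 = {
    "irradiance": [0.40, 0.45, 0.35],  # error rises sharply after adding noise
    "humidity": [0.11, 0.12, 0.09],    # error barely changes
}
scores = {name: feature_importance(err_oob1, e2) for name, e2 in err_oob2.items()}
first_feature_set = drop_low_scores(scores)
print(scores, first_feature_set)
```

In this toy case the irradiance score is about 0.30 and the humidity score is below 0.01, so only irradiance survives the default threshold.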
In an optional embodiment, the performing meteorological feature screening on the first meteorological feature set based on correlation analysis between meteorological features and generated power to obtain the second meteorological feature set comprises: calculating the correlation between each meteorological feature in the first meteorological feature set and the generated power; and performing meteorological feature screening on the first meteorological feature set according to the correlation between each meteorological feature and the generated power to obtain a second meteorological feature set.
Specifically, after the terminal obtains a first meteorological feature set, a Pearson correlation coefficient between each meteorological feature in the first meteorological feature set and generated power is calculated to obtain a characteristic correlation coefficient; and removing meteorological features of which the correlation coefficients are lower than a preset correlation threshold value from the first meteorological feature set, and combining the remaining features to obtain a second meteorological feature set.
Further, based on correlation analysis, the degree of linear relationship between the different meteorological features and power is studied through the statistical correlation coefficient r. The absolute value of the correlation coefficient between each feature and power is compared with a critical value (threshold), and features whose absolute correlation coefficient is below the critical value are discarded. The critical value is a variable parameter of the feature selection method and can be manually adjusted. Generally, an absolute correlation coefficient of 0-0.09 is regarded as no correlation, 0.1-0.3 as weak correlation, 0.3-0.5 as medium correlation, and 0.5-1.0 as strong correlation. In this method, the default critical value of the correlation coefficient is set to 0.2 for photovoltaic power and 0.45 for wind power. The correlation coefficient r is calculated as follows:
Figure BDA0003963699700000121
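The embodiment above identifies r as the Pearson correlation coefficient. In its standard form, written with notation assumed here (x_i for the value of a meteorological feature in sample i, y_i for the generated power in the same sample, x̄ and ȳ for their means, and n for the number of samples), it reads:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```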
As shown in fig. 4, the correlation coefficient between each meteorological feature and power is calculated before the meteorological features are screened by the correlation analysis. Before screening, the meteorological features are strongly linearly correlated with one another, which easily degrades the performance of the prediction model.
As shown in fig. 5, after the meteorological features are screened by the correlation analysis, the correlation coefficient between each remaining meteorological feature and power is high. After screening, the correlation among the meteorological features is reduced, and essentially no linear correlation remains among them.
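The correlation screening can be illustrated with a small pure-Python sketch. The default thresholds (0.2 for photovoltaic, 0.45 for wind power) are taken from the text; the function names, the dictionary keys and the sample values are hypothetical, and treating a constant series as uncorrelated is an assumption made here to avoid division by zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant series: treated as uncorrelated (assumption)
    return cov / math.sqrt(var_x * var_y)

# Default critical values stated in the text.
DEFAULT_THRESHOLDS = {"photovoltaic": 0.2, "wind": 0.45}

def select_by_correlation(features, power, energy_type):
    """Keep features whose |r| with power is not below the critical value."""
    threshold = DEFAULT_THRESHOLDS[energy_type]
    return [name for name, values in features.items()
            if abs(pearson_r(values, power)) >= threshold]

# Hypothetical sample data.
power = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "irradiance": [10.0, 21.0, 29.0, 41.0, 50.0, 61.0],  # nearly linear in power
    "pressure": [5.0, 1.0, 4.0, 4.0, 1.0, 5.0],           # no linear relation
}
second_feature_set = select_by_correlation(features, power, "photovoltaic")
print(second_feature_set)
```

The comparison uses the absolute value of r, so a strongly negative correlation is kept just like a strongly positive one.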
In an optional embodiment, screening the meteorological features in the second meteorological feature set by using a recursive feature elimination method to obtain the target meteorological feature set includes: performing meteorological feature extraction on the second meteorological feature set by using a recursive feature elimination method; and performing cross-validation on the extracted meteorological features to obtain the target meteorological feature set.
Specifically, after obtaining the second meteorological feature set, the terminal recursively constructs a decision tree model, selects the best feature according to the correlation coefficient through cross-validation, and then repeats the process on the remaining features until all features have been traversed. The features are ranked during the recursion, and the k top-ranked features are retained as the feature set. The value k is a parameter of the feature selection method and can be manually adjusted; the default number of features is 4 and can be adjusted for different types of new energy power generation. As shown in fig. 6, recursive elimination determines that the model prediction accuracy is highest when the number of training features is 4. Finally, the terminal outputs the target meteorological feature set as a list, as the result of the feature engineering, and can also save it as a configuration file in csv format to facilitate future training or research.
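The recursion can be sketched in its common "drop the weakest feature and refit" form. This is an illustration under assumptions, not the embodiment's code: `importance_fn` stands in for fitting a decision tree model with cross-validation on the current subset and returning one score per feature, and the toy scores are invented. Only the default k = 4 comes from the text.

```python
def recursive_feature_elimination(features, importance_fn, k=4):
    """Refit on the current subset, drop the lowest-ranked feature, and
    repeat until only the k top-ranked features remain."""
    remaining = list(features)
    eliminated = []  # weakest feature first
    while len(remaining) > k:
        importances = importance_fn(remaining)  # re-evaluated on every round
        weakest = min(remaining, key=lambda name: importances[name])
        remaining.remove(weakest)
        eliminated.append(weakest)
    return remaining, eliminated

# Hypothetical fixed scores standing in for a cross-validated decision tree.
TOY_SCORES = {
    "irradiance": 0.90, "temperature": 0.50, "cloud_cover": 0.40,
    "wind_speed": 0.30, "humidity": 0.10, "pressure": 0.05,
}

def toy_importance(subset):
    return {name: TOY_SCORES[name] for name in subset}

target_feature_set, dropped = recursive_feature_elimination(
    list(TOY_SCORES), toy_importance, k=4)
print(target_feature_set, dropped)
```

Re-evaluating the scores on every round matters in practice, because the importance of a feature can change once a correlated feature has been removed.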
As shown in fig. 7, after screening with the meteorological feature selection method provided by the present application, the prediction accuracy of new energy generated power is significantly improved.
In this method, the combination of meteorological feature factors with the greatest influence on new energy power variation is first screened out based on the random forest algorithm. On this basis, statistical correlation analysis is applied to study the correlation among the meteorological features and screen them further. Finally, cross-validation and recursive feature elimination are used to select the meteorological features used for power prediction, which improves the prediction accuracy of new energy power. The method was applied to 7 new energy power generation systems in Yunnan, Guizhou and other regions to select meteorological features. Compared with the prediction accuracy of new energy power without feature engineering, the accuracy of short-term prediction (72 hours ahead) improved by a minimum of 0.22%, a maximum of 7.09% and an average of 2.08%; the accuracy of ultra-short-term prediction (4 hours ahead) improved by a minimum of 0.14%, a maximum of 11.16% and an average of 2.72%.
To facilitate understanding of the technical solution provided by the embodiment of the present application, as shown in fig. 8, the machine learning feature selection method for high-precision prediction of new energy is briefly described below by way of a complete feature selection process:
(1) Pass in the path of the data set, the number of days of data to load, the meteorological feature type (wind power/photovoltaic), and the random forest and correlation coefficient thresholds. Screen out the combination of meteorological feature factors with the greatest influence on new energy power variation based on the random forest algorithm.
(2) Apply statistical correlation analysis to study the correlation between the meteorological features and the generated power, and further screen the meteorological features according to that correlation.
(3) Use cross-validation and recursive feature elimination to recursively construct a decision tree model, select the best feature according to the correlation coefficient, and repeat the process on the remaining features until all features have been traversed, obtaining the target meteorological feature set.
(4) Export the feature set obtained by the feature engineering screening as a configuration file in csv format.
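The inputs of step (1) and the export of step (4) can be sketched as a small configuration object and a CSV writer. The default values 0.05, 0.2 and 4 are those stated in the embodiments; the class name, field names, file path, number of days and the one-column CSV layout are assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class SelectionConfig:
    data_path: str                # path of the data set
    days: int                     # number of days of data to load
    energy_type: str              # "wind" or "photovoltaic"
    rf_threshold: float = 0.05    # random forest score threshold
    corr_threshold: float = 0.2   # correlation critical value (photovoltaic default)
    n_features: int = 4           # number of features kept by recursive elimination

def export_feature_set(features, stream):
    """Write the selected features as a one-column CSV (layout is an assumption)."""
    writer = csv.writer(stream)
    writer.writerow(["feature"])
    for name in features:
        writer.writerow([name])

# Hypothetical path and day count.
config = SelectionConfig(data_path="data/station_01.csv", days=90,
                         energy_type="photovoltaic")
buffer = io.StringIO()  # stands in for an open csv file
export_feature_set(["irradiance", "temperature"], buffer)
csv_text = buffer.getvalue()
print(config)
print(csv_text)
```

Writing to any file-like object keeps the export step easy to test; in real use the stream would be a file opened with `newline=""`, as the Python `csv` module recommends.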
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially either, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a new energy high-precision prediction-oriented machine learning feature selection device for realizing the new energy high-precision prediction-oriented machine learning feature selection method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme recorded in the method, so that specific limitations in one or more embodiments of the device for selecting machine learning features for high-precision prediction of new energy provided below can be referred to the limitations in the above method for selecting machine learning features for high-precision prediction of new energy, and are not described herein again.
In one embodiment, as shown in fig. 9, there is provided a new energy high-precision prediction oriented machine learning feature selection apparatus, including: an acquisition module 902, a first extraction module 904, a second extraction module 906, and a third extraction module 908, wherein:
An acquisition module 902, configured to acquire initial meteorological features that influence the generated power in the new energy power generation system, to obtain an initial meteorological feature set.
A first extraction module 904, configured to screen meteorological features in the initial meteorological feature set by using a random forest algorithm, to obtain a first meteorological feature set.
A second extraction module 906, configured to perform meteorological feature screening on the first meteorological feature set based on correlation analysis between meteorological features and generated power, to obtain a second meteorological feature set.
A third extraction module 908, configured to screen meteorological features in the second meteorological feature set by using a recursive feature elimination method, to obtain a target meteorological feature set.
The average degree of influence of the meteorological features on the generated power increases step by step across the initial meteorological feature set, the first meteorological feature set, the second meteorological feature set and the target meteorological feature set.
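The module structure can be pictured as four callables chained in order. The sketch below is only an illustration of that composition: the class name, field names and the toy screening rules are hypothetical, and each lambda stands in for the real random forest, correlation and recursive elimination stages.

```python
from dataclasses import dataclass
from typing import Callable, List

FeatureSet = List[str]

@dataclass
class FeatureSelectionDevice:
    acquire: Callable[[], FeatureSet]                    # acquisition module 902
    first_extract: Callable[[FeatureSet], FeatureSet]    # first extraction module 904
    second_extract: Callable[[FeatureSet], FeatureSet]   # second extraction module 906
    third_extract: Callable[[FeatureSet], FeatureSet]    # third extraction module 908

    def run(self) -> List[FeatureSet]:
        """Run the modules in order and return every intermediate feature set."""
        initial = self.acquire()
        first = self.first_extract(initial)
        second = self.second_extract(first)
        target = self.third_extract(second)
        return [initial, first, second, target]

# Toy stand-ins for the real screening stages.
device = FeatureSelectionDevice(
    acquire=lambda: ["irradiance", "temperature", "humidity", "pressure"],
    first_extract=lambda fs: [f for f in fs if f != "pressure"],
    second_extract=lambda fs: [f for f in fs if f != "humidity"],
    third_extract=lambda fs: fs[:1],
)
stages = device.run()
print(stages)
```

Returning all four sets, rather than only the last, makes it straightforward to check that each stage only narrows the set produced by the stage before it.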
In one embodiment, the first extraction module 904 is further configured to acquire power generation sample data of the new energy power generation system; score the meteorological features in the initial meteorological feature set by using a random forest algorithm according to the power generation sample data to obtain feature scores; and remove, according to the feature scores, features whose feature scores are lower than a preset score threshold from the initial meteorological feature set to obtain a first meteorological feature set.
In one embodiment, the first extraction module 904 is further configured to randomly sample power generation sample data to obtain training sample data; obtaining test sample data according to the power generation sample data which is not sampled; constructing a decision tree according to training sample data; and calculating the importance degree of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data to obtain a feature score.
In one embodiment, the first extraction module 904 is further configured to calculate a prediction error rate of the decision tree according to the test sample data; randomly adding noise into a single meteorological feature of the test sample data, and calculating the noise prediction error rate of the decision tree; and determining the importance degree of the meteorological features in the meteorological feature set according to the prediction error rate and the noise prediction error rate to obtain a feature score.
In one embodiment, the second extraction module 906 is further configured to calculate a correlation between each meteorological feature in the first set of meteorological features and the generated power; and performing meteorological feature screening on the first meteorological feature set according to the correlation between each meteorological feature and the generated power to obtain a second meteorological feature set.
In one embodiment, the third extraction module 908 is further configured to perform meteorological feature extraction on the second meteorological feature set by using a recursive feature elimination method; and perform cross-validation on the extracted meteorological features to obtain a target meteorological feature set.
All or part of the modules in the above machine learning feature selection device for high-precision prediction of new energy can be implemented by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for communicating with an external terminal in a wired or wireless manner, and the wireless manner can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to realize a meteorological feature extraction method for influencing the generated power in the new energy power generation system. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring initial meteorological features influencing the power generation power in the new energy power generation system to obtain an initial meteorological feature set;
screening meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain a first meteorological feature set;
based on the correlation analysis between the meteorological features and the generated power, carrying out meteorological feature screening on the first meteorological feature set to obtain a second meteorological feature set;
screening meteorological features in the second meteorological feature set by adopting a recursive feature elimination method to obtain a target meteorological feature set;
wherein the average degree of influence of the meteorological features on the generated power increases step by step across the initial meteorological feature set, the first meteorological feature set, the second meteorological feature set and the target meteorological feature set.
In one embodiment, the processor, when executing the computer program, further implements the following: screening meteorological features in the initial meteorological feature set by using a random forest algorithm to obtain a first meteorological feature set comprises: acquiring power generation sample data of the new energy power generation system; scoring the meteorological features in the initial meteorological feature set by using a random forest algorithm according to the power generation sample data to obtain feature scores; and removing, according to the feature scores, features whose feature scores are lower than a preset score threshold from the initial meteorological feature set to obtain the first meteorological feature set.
In one embodiment, the processor, when executing the computer program, further performs the steps of: according to the power generation sample data, scoring the meteorological features in the initial meteorological feature set by adopting a random forest algorithm, and obtaining feature scores comprises the following steps: randomly sampling power generation sample data to obtain training sample data; obtaining test sample data according to the power generation sample data which is not sampled; constructing a decision tree according to training sample data; and calculating the importance degree of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data to obtain a feature score.
In one embodiment, the processor, when executing the computer program, further performs the steps of: calculating the importance degree of meteorological features in the initial meteorological feature set according to the decision tree and the test sample data, and obtaining the feature score comprises the following steps: calculating the prediction error rate of the decision tree according to the test sample data; randomly adding noise to a single meteorological feature of the test sample data, and calculating the noise prediction error rate of the decision tree; and determining the importance degree of the meteorological features in the meteorological feature set according to the prediction error rate and the noise prediction error rate to obtain a feature score.
In one embodiment, the processor, when executing the computer program, further performs the steps of: based on correlation analysis between meteorological features and generated power, the meteorological feature screening is carried out on the first meteorological feature set, and the obtaining of a second meteorological feature set comprises the following steps: calculating the correlation between each meteorological feature in the first meteorological feature set and the generated power; and performing meteorological feature screening on the first meteorological feature set according to the correlation between each meteorological feature and the generated power to obtain a second meteorological feature set.
In one embodiment, the processor, when executing the computer program, further implements the following: screening meteorological features in the second meteorological feature set by using a recursive feature elimination method to obtain a target meteorological feature set comprises: performing meteorological feature extraction on the second meteorological feature set by using a recursive feature elimination method; and performing cross-validation on the extracted meteorological features to obtain the target meteorological feature set.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring initial meteorological features influencing the power generation power in the new energy power generation system to obtain an initial meteorological feature set;
screening meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain a first meteorological feature set;
based on the correlation analysis between the meteorological features and the generated power, carrying out meteorological feature screening on the first meteorological feature set to obtain a second meteorological feature set;
screening meteorological features in the second meteorological feature set by adopting a recursive feature elimination method to obtain a target meteorological feature set;
wherein the average degree of influence of the meteorological features on the generated power increases step by step across the initial meteorological feature set, the first meteorological feature set, the second meteorological feature set and the target meteorological feature set.
In one embodiment, the computer program, when executed by the processor, further implements the following: screening meteorological features in the initial meteorological feature set by using a random forest algorithm to obtain a first meteorological feature set comprises: acquiring power generation sample data of the new energy power generation system; scoring the meteorological features in the initial meteorological feature set by using a random forest algorithm according to the power generation sample data to obtain feature scores; and removing, according to the feature scores, features whose feature scores are lower than a preset score threshold from the initial meteorological feature set to obtain the first meteorological feature set.
In one embodiment, the computer program when executed by the processor further performs the steps of: according to the power generation sample data, scoring the meteorological features in the initial meteorological feature set by adopting a random forest algorithm, and obtaining feature scores comprises the following steps: randomly sampling power generation sample data to obtain training sample data; obtaining test sample data according to the power generation sample data which is not sampled; constructing a decision tree according to training sample data; and calculating the importance degree of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data to obtain a feature score.
In one embodiment, the computer program when executed by the processor further performs the steps of: calculating the importance degree of meteorological features in the initial meteorological feature set according to the decision tree and the test sample data, and obtaining the feature score comprises the following steps: calculating the prediction error rate of the decision tree according to the test sample data; randomly adding noise into a single meteorological feature of the test sample data, and calculating the noise prediction error rate of the decision tree; and determining the importance degree of the meteorological features in the meteorological feature set according to the prediction error rate and the noise prediction error rate to obtain a feature score.
In one embodiment, the computer program when executed by the processor further performs the steps of: based on correlation analysis between meteorological features and generated power, the meteorological feature screening is carried out on the first meteorological feature set, and the obtaining of a second meteorological feature set comprises the following steps: calculating the correlation between each meteorological feature in the first meteorological feature set and the generated power; and performing meteorological feature screening on the first meteorological feature set according to the correlation between each meteorological feature and the generated power to obtain a second meteorological feature set.
In one embodiment, when the computer program is executed by the processor, the step of screening the meteorological features in the second meteorological feature set with a recursive feature elimination method to obtain a target meteorological feature set comprises: performing meteorological feature extraction on the second meteorological feature set with the recursive feature elimination method; and performing cross validation on the extracted meteorological features to obtain the target meteorological feature set.
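One possible reading of this step is sketched below with a simplified scheme: a linear model is refitted after each elimination, the feature with the smallest absolute weight is dropped, and every candidate subset is scored by k-fold cross validation. The base model, the fold layout, the assumption that features are on comparable scales, and all names (`fit_linear`, `cv_error`, `rfe_cv`, the feature names) are hypothetical choices, not details taken from the text.

```python
import random

def fit_linear(X, y):
    """Least-squares weights (bias last) from the normal equations via Gauss-Jordan."""
    rows = [list(r) + [1.0] for r in X]
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y], with a tiny ridge term for stability.
    A = [[sum(r[i] * r[j] for r in rows) + (1e-9 if i == j else 0.0) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

def cv_error(X, y, cols, k=4):
    """Mean squared error of the linear model on columns `cols`, k-fold cross validated."""
    n, total = len(X), 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        w = fit_linear([[X[i][c] for c in cols] for i in train], [y[i] for i in train])
        for i in test:
            pred = sum(wj * X[i][c] for wj, c in zip(w, cols)) + w[-1]
            total += (pred - y[i]) ** 2
    return total / n

def rfe_cv(X, y, names, k=4):
    """Drop the weakest feature repeatedly; return the subset with the lowest CV error."""
    cols = list(range(len(names)))
    best_cols, best_err = cols[:], cv_error(X, y, cols, k)
    while len(cols) > 1:
        w = fit_linear([[r[c] for c in cols] for r in X], y)
        cols.pop(min(range(len(cols)), key=lambda j: abs(w[j])))
        err = cv_error(X, y, cols, k)
        if err <= best_err:
            best_cols, best_err = cols[:], err
    return [names[c] for c in best_cols]

# Hypothetical second feature set: only the first two features drive the power.
rng = random.Random(0)
names = ["irradiance", "temperature", "humidity"]
X = [[rng.gauss(0, 1) for _ in names] for _ in range(60)]
y = [3.0 * r[0] - 2.0 * r[1] + rng.gauss(0, 0.1) for r in X]
target_set = rfe_cv(X, y, names)
```

Ties in cross-validated error are resolved in favor of the smaller subset, so an uninformative feature is removed whenever dropping it does not hurt the validation error.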
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:
acquiring initial meteorological features that affect the generated power of the new energy power generation system to obtain an initial meteorological feature set;
screening meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain a first meteorological feature set;
based on the correlation analysis between the meteorological features and the generated power, carrying out meteorological feature screening on the first meteorological feature set to obtain a second meteorological feature set;
screening meteorological features in the second meteorological feature set by adopting a recursive feature elimination method to obtain a target meteorological feature set;
wherein the average degree of influence of each meteorological feature on the generated power increases step by step across the initial meteorological feature set, the first meteorological feature set, the second meteorological feature set and the target meteorological feature set.
In one embodiment, when the computer program is executed by the processor, the step of screening the meteorological features in the initial meteorological feature set with a random forest algorithm to obtain a first meteorological feature set comprises: acquiring power generation sample data of the new energy power generation system; scoring the meteorological features in the initial meteorological feature set with the random forest algorithm according to the power generation sample data to obtain feature scores; and removing, from the initial meteorological feature set, the features whose feature scores are lower than a preset score threshold to obtain the first meteorological feature set.
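The thresholding step can be illustrated with the short sketch below. It assumes that the forest-level score of a feature is the average of its per-tree scores, which the text does not spell out, and the feature names, per-tree scores and threshold value are hypothetical.

```python
def forest_scores(per_tree_scores):
    """Average each feature's score over all trees (assumed aggregation rule)."""
    names = per_tree_scores[0].keys()
    return {n: sum(tree[n] for tree in per_tree_scores) / len(per_tree_scores)
            for n in names}

def first_feature_set(scores, threshold):
    """Remove features whose score is lower than the preset threshold."""
    return [name for name, score in scores.items() if score >= threshold]

# Hypothetical per-tree importance scores for three meteorological features.
per_tree = [
    {"irradiance": 0.9, "temperature": 0.4, "pressure": 0.0},
    {"irradiance": 0.7, "temperature": 0.2, "pressure": 0.1},
]
scores = forest_scores(per_tree)
selected = first_feature_set(scores, threshold=0.2)
```

Here the pressure feature averages well below the threshold and is removed, leaving the other two features as the first meteorological feature set.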
In one embodiment, when the computer program is executed by the processor, the step of scoring the meteorological features in the initial meteorological feature set with a random forest algorithm according to the power generation sample data to obtain feature scores comprises: randomly sampling the power generation sample data to obtain training sample data; obtaining test sample data from the power generation sample data that was not sampled; constructing a decision tree from the training sample data; and calculating the importance of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data to obtain the feature scores.
In one embodiment, when the computer program is executed by the processor, the step of calculating the importance of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data to obtain the feature scores comprises: calculating a prediction error rate of the decision tree on the test sample data; randomly adding noise to a single meteorological feature of the test sample data and calculating a noise prediction error rate of the decision tree; and determining the importance of the meteorological features in the meteorological feature set from the prediction error rate and the noise prediction error rate to obtain the feature scores.
In one embodiment, when the computer program is executed by the processor, the step of performing meteorological feature screening on the first meteorological feature set based on correlation analysis between the meteorological features and the generated power to obtain a second meteorological feature set comprises: calculating the correlation between each meteorological feature in the first meteorological feature set and the generated power; and performing meteorological feature screening on the first meteorological feature set according to the correlation between each meteorological feature and the generated power to obtain the second meteorological feature set.
In one embodiment, when the computer program is executed by the processor, the step of screening the meteorological features in the second meteorological feature set with a recursive feature elimination method to obtain a target meteorological feature set comprises: performing meteorological feature extraction on the second meteorological feature set with the recursive feature elimination method; and performing cross validation on the extracted meteorological features to obtain the target meteorological feature set.
Those skilled in the art will understand that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, or data processing logic devices based on quantum computing.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A machine learning feature selection method for new energy high-precision prediction, characterized by comprising the following steps:
acquiring initial meteorological features that affect the generated power of the new energy power generation system to obtain an initial meteorological feature set;
screening meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain a first meteorological feature set;
based on correlation analysis between meteorological features and generated power, carrying out meteorological feature screening on the first meteorological feature set to obtain a second meteorological feature set;
screening meteorological features in the second meteorological feature set by adopting a recursive feature elimination method to obtain a target meteorological feature set;
wherein the degree of influence of each meteorological feature on the generated power increases step by step across the initial meteorological feature set, the first meteorological feature set, the second meteorological feature set and the target meteorological feature set.
2. The method of claim 1, wherein the screening meteorological features in the initial meteorological feature set by using a random forest algorithm to obtain a first meteorological feature set comprises:
acquiring power generation sample data of the new energy power generation system;
according to the power generation sample data, scoring the meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain feature scores;
and according to the feature scores, removing the features of which the feature scores are lower than a preset score threshold value from the initial meteorological feature set to obtain a first meteorological feature set.
3. The method according to claim 2, wherein the scoring meteorological features in the initial meteorological feature set by using a random forest algorithm according to the power generation sample data to obtain a feature score comprises:
randomly sampling the power generation sample data to obtain training sample data;
obtaining test sample data according to the power generation sample data which is not sampled;
constructing a decision tree according to the training sample data;
and calculating the importance degree of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data to obtain a feature score.
4. The method of claim 3, wherein calculating the importance of the meteorological features in the initial meteorological feature set according to the decision tree and the test sample data to obtain the feature score comprises:
calculating the prediction error rate of the decision tree according to the test sample data;
randomly adding noise to a single meteorological feature of the test sample data, and calculating the noise prediction error rate of the decision tree;
and determining the importance degree of the meteorological features in the meteorological feature set according to the prediction error rate and the noise prediction error rate to obtain a feature score.
5. The method of claim 1, wherein the carrying out meteorological feature screening on the first meteorological feature set based on correlation analysis between meteorological features and generated power to obtain a second meteorological feature set comprises:
calculating the correlation between each meteorological feature in the first meteorological feature set and the generated power;
and performing meteorological feature screening on the first meteorological feature set according to the correlation between each meteorological feature and the generated power to obtain a second meteorological feature set.
6. The method of claim 1, wherein the screening meteorological features in the second meteorological feature set by using a recursive feature elimination method to obtain a target meteorological feature set comprises:
performing meteorological feature extraction on the second meteorological feature set by adopting a recursive feature elimination method;
and performing cross validation on the extracted meteorological features to obtain a target meteorological feature set.
7. A machine learning feature selection device for new energy high-precision prediction, characterized by comprising:
the acquisition module is used for acquiring initial meteorological features that affect the generated power of the new energy power generation system to obtain an initial meteorological feature set;
the first extraction module is used for screening meteorological features in the initial meteorological feature set by adopting a random forest algorithm to obtain a first meteorological feature set;
the second extraction module is used for screening meteorological features of the first meteorological feature set based on correlation analysis between the meteorological features and the generated power to obtain a second meteorological feature set;
the third extraction module is used for screening the meteorological features in the second meteorological feature set by adopting a recursive feature elimination method to obtain a target meteorological feature set;
wherein the degree of influence of each meteorological feature on the generated power increases step by step across the initial meteorological feature set, the first meteorological feature set, the second meteorological feature set and the target meteorological feature set.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 6 when executed by a processor.
CN202211488356.5A 2022-08-19 2022-11-25 Machine learning feature selection method for new energy high-precision prediction Pending CN115759446A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211488356.5A CN115759446A (en) 2022-11-25 2022-11-25 Machine learning feature selection method for new energy high-precision prediction
CN202310972728.XA CN117113230A (en) 2022-08-19 2023-08-03 New energy high-precision prediction-oriented machine learning feature selection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211488356.5A CN115759446A (en) 2022-11-25 2022-11-25 Machine learning feature selection method for new energy high-precision prediction

Publications (1)

Publication Number Publication Date
CN115759446A true CN115759446A (en) 2023-03-07

Family

ID=85337812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211488356.5A Pending CN115759446A (en) 2022-08-19 2022-11-25 Machine learning feature selection method for new energy high-precision prediction

Country Status (1)

Country Link
CN (1) CN115759446A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117956A (en) * 2018-07-05 2019-01-01 浙江大学 A kind of determination method of optimal feature subset
CN114266421A (en) * 2022-03-01 2022-04-01 南方电网数字电网研究院有限公司 New energy power prediction method based on composite meteorological feature construction and selection
CN115329880A (en) * 2022-08-19 2022-11-11 南方电网数字电网研究院有限公司 Meteorological feature extraction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20230307)