CN112364477B - Outdoor empirical prediction model library generation method and system - Google Patents


Publication number
CN112364477B
Authority
CN
China
Legal status
Active
Application number
CN202011048216.7A
Other languages
Chinese (zh)
Other versions
CN112364477A (en)
Inventor
赵上懿
曾湘安
洪志浩
揭敢新
王俊
许楚斯
Current Assignee
China National Electric Apparatus Research Institute Co Ltd
Original Assignee
China National Electric Apparatus Research Institute Co Ltd
Application filed by China National Electric Apparatus Research Institute Co Ltd
Priority to CN202011048216.7A
Publication of CN112364477A
Application granted
Publication of CN112364477B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and system for generating an outdoor empirical prediction model library. Following the method, the system uses data collected in outdoor empirical tests to obtain, for every tested photovoltaic string, a prediction model of a performance index that depends on environmental factors; each prediction model thus captures the intrinsic relationship between specific meteorological monitoring data and the outdoor empirical test results of the tested string. Substituting the same meteorological monitoring data into the prediction model of each string fits the tested performance of every string to identical environmental conditions, so that the photovoltaic strings in the outdoor empirical data set can be compared with one another on the tested items.

Description

Outdoor empirical prediction model library generation method and system
Technical Field
The invention relates to the technical field of photovoltaic power generation, and in particular to a method and system for generating an outdoor empirical prediction model library.
Background
Photovoltaic power generation converts light energy directly into electric energy through the photovoltaic effect at a semiconductor interface. As one of the main solar power technologies, it has developed particularly rapidly in recent years. Outdoor empirical testing, which evaluates the performance of photovoltaic products in a real environment, has likewise been widely adopted in the photovoltaic industry.
At present, outdoor empirical test results of photovoltaic products can only be compared among products that took part in the same outdoor empirical test. Because environmental factors are complex, results obtained by products tested in different environments cannot be compared directly.
For example, if one batch of photovoltaic strings A1 undergoes an outdoor empirical test under environmental condition B1 and yields test result C1, while another batch A2 is tested under environmental condition B2 and yields test result C2, then A1 and A2 cannot be compared on the performance indexes reflected by their respective results, because the environmental conditions differ.
An obvious remedy is to take part of photovoltaic string A1 and test it under environmental condition B2 together with photovoltaic string A2, so that the performance indexes reflected by the outdoor empirical test results of A1 and A2 can be compared.
However, when the number of product batches to be compared is large, the limited space of an outdoor empirical test field severely restricts testing throughput, which greatly increases the experimental cost of comparing outdoor empirical test results.
Disclosure of Invention
To allow comparison of the performance indexes reflected by the test results obtained when different batches of photovoltaic strings are tested outdoors under different environmental conditions, the invention provides a method and system for generating an outdoor empirical prediction model library, aiming to mine the intrinsic relationship between the meteorological monitoring data and each photovoltaic string's test result data during the outdoor empirical test. The technical solution is as follows:
In one aspect, a method for generating an outdoor empirical prediction model library is provided, the method including:
step one: collecting the empirical test result data of all tested photovoltaic strings and the meteorological monitoring data recorded by an outdoor empirical test field, and establishing an outdoor empirical database;
step two: selecting one group of photovoltaic strings in the outdoor empirical database and, for the tested item to be compared, extracting the corresponding meteorological monitoring data and empirical test result data to form an outdoor empirical data set of that string,
wherein the meteorological monitoring data serve as the feature columns of the outdoor empirical data set and the empirical test result data serve as its label column;
step three: cleaning and preprocessing the outdoor empirical data set;
step four: shuffling the outdoor empirical data set processed in step three and splitting it into a training set, a validation set and a test set; encoding the discrete features and normalizing the continuous features of the training set; mining the intrinsic relationship between the feature columns and the label column of the training set; and generating a prediction model of the label column from the feature columns,
wherein the training set contains more data than the validation set and the test set combined;
step five: evaluating the accuracy of the prediction model generated in step four on the test set, storing the prediction model if its accuracy is above a preset threshold, and optimizing it if its accuracy is below the threshold;
step six: repeating steps two to five until prediction models for all photovoltaic strings are obtained, and combining all the prediction models into a prediction model library of outdoor empirical test results.
In another aspect, a system for generating an outdoor empirical prediction model library is provided, including:
a data acquisition module, configured to acquire the meteorological monitoring data of an outdoor empirical test field and the empirical test result data of all tested photovoltaic strings;
a data storage module, configured to aggregate the meteorological monitoring data of the outdoor empirical test field and the empirical test result data of all tested photovoltaic strings and to establish an outdoor empirical database;
a data extraction module, configured to select one group of photovoltaic strings in the outdoor empirical database, extract the meteorological monitoring data and empirical test result data corresponding to the tested item to be compared, and generate an outdoor empirical data set, with the meteorological monitoring data as its feature columns and the empirical test result data as its label column;
a data value preprocessing module, configured to clean and preprocess the outdoor empirical data set;
a data segmentation and coding module, configured to shuffle the outdoor empirical data set processed by the data value preprocessing module, split it into a training set, a validation set and a test set, and encode the discrete features and normalize the continuous features of the training set;
a data mining modeling module, configured to mine the intrinsic relationship between the feature columns and the label column of the training set using data mining techniques and to generate a prediction model of the label column from the feature columns;
a model library storage module, configured to store all prediction models generated by the data mining modeling module;
wherein the data mining modeling module applies the prediction model to the validation set for verification, sends the verified prediction model to the model library storage module, and activates the data extraction module to extract the next group of photovoltaic strings.
The technical solution provided by the invention has the following beneficial effects:
following the outdoor empirical prediction model library generation method, the generation system uses data collected in outdoor empirical tests to obtain, for every tested photovoltaic string, a prediction model of a performance index related to environmental factors; that is, each prediction model reveals the intrinsic relationship between specific meteorological monitoring data and the outdoor empirical test results of the tested string;
substituting the same meteorological monitoring data into the prediction model of each photovoltaic string fits the tested performance of every string to identical environmental conditions, so that the photovoltaic strings in the outdoor empirical data set can finally be compared with one another on the tested items;
moreover, as outdoor empirical tests are run on more batches of photovoltaic strings, the accumulated data sets keep enriching the outdoor empirical database, and new prediction models are added to the library by the same method and system. The prediction models thus become increasingly accurate, more batches of strings become comparable on the tested items, and the experimental cost of comparing outdoor empirical test results is greatly reduced.
Drawings
To illustrate the technical solution of this embodiment more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for generating an outdoor empirical prediction model library according to the present invention;
FIG. 2 is a schematic flow chart of step 103 according to the present invention;
FIG. 3 is a schematic flow chart illustrating steps 104 to 105 of the present invention;
FIG. 4 is a block diagram of an outdoor empirical prediction model library generation system according to the present invention;
FIG. 5 is a schematic diagram comparing the true and predicted values of the DC generated power of one photovoltaic string at 100 points randomly selected from the test set;
FIG. 6 is a schematic diagram of the process of fitting the tested items of all photovoltaic strings to uniform meteorological conditions so that their test results become mutually comparable.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to FIG. 1, the outdoor empirical prediction model library generation method may include the following steps:
Step 101: collect the empirical test result data of all tested photovoltaic strings and the meteorological monitoring data recorded by the outdoor empirical test field, and establish an outdoor empirical database.
Specifically, the outdoor empirical test field is located in Hainan and tested photovoltaic strings from 10 different batches between 2016 and 2020; each batch was tested for 1 year, and the empirical test results were recorded at 5-minute intervals. The meteorological monitoring data comprise 10 items: date and time, air humidity, air temperature, total solar irradiance, ultraviolet irradiance, wind speed, wind direction, total daily sunshine hours, total daily rainfall and air pressure.
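As a rough illustration of the data volume, the sketch below models one meteorological record with the 10 items listed above and computes how many 5-minute result records a single string accumulates over a one-year batch. The field names and units are hypothetical; the patent does not specify a schema.

```python
from dataclasses import dataclass, fields
from datetime import datetime

# Hypothetical record layout for the 10 meteorological items of step 101.
# Field names and units are illustrative assumptions, not taken from the patent.
@dataclass
class WeatherRecord:
    timestamp: datetime            # date and time
    air_humidity_pct: float
    air_temperature_c: float
    total_irradiance_w_m2: float   # total solar irradiance
    uv_irradiance_w_m2: float      # ultraviolet irradiance
    wind_speed_m_s: float
    wind_direction_deg: float
    sunshine_hours_day: float      # total sunshine hours per day
    rainfall_mm_day: float         # total rainfall per day
    air_pressure_hpa: float

RECORD_INTERVAL_MIN = 5            # empirical test results are logged every 5 minutes

def records_per_batch(days: int = 365) -> int:
    """Number of 5-minute result records one string produces over a batch of `days` days."""
    return days * 24 * 60 // RECORD_INTERVAL_MIN
```

For a 365-day batch this gives 105,120 records per string, or 105,408 if the test year includes 29 February.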
Step 102: select one group of photovoltaic strings in the outdoor empirical database and, for the tested item to be compared, extract the corresponding meteorological monitoring data and empirical test result data to form an outdoor empirical data set of that string.
Specifically, the tested item to be compared is the DC generated power; the string's meteorological monitoring data and DC power data are extracted from the outdoor empirical database to form the outdoor empirical data set.
The meteorological monitoring data serve as the feature columns of the outdoor empirical data set, and the empirical test result data, i.e. the DC power data, serve as its label column.
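A minimal sketch of step 102, assuming the database can be read into two hypothetical Python dictionaries keyed by timestamp: it joins the weather log with one string's DC power log so that each record carries the feature columns and the label.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-ins for the outdoor empirical database:
#   weather[timestamp]           -> dict of meteorological values (feature columns)
#   dc_power[(string_id, ts)]    -> measured DC power of one string (label column)

def build_dataset(
    string_id: str,
    weather: Dict[str, Dict[str, float]],
    dc_power: Dict[Tuple[str, str], Optional[float]],
) -> Tuple[List[dict], List[Optional[float]]]:
    """Join the weather log with one string's DC power log on timestamp.

    Each feature row keeps its timestamp so that later feature conversion can use it.
    Timestamps with no power reading for this string get a None label; the
    label-column cleaning step removes those rows afterwards.
    """
    X: List[dict] = []
    y: List[Optional[float]] = []
    for ts in sorted(weather):
        X.append({"timestamp": ts, **weather[ts]})
        y.append(dc_power.get((string_id, ts)))
    return X, y
```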
Step 103: clean and preprocess the outdoor empirical data set.
Specifically, referring to FIG. 2, data cleaning and preprocessing performs, in sequence, feature conversion of the feature columns, cleaning of the label column, abnormal-value cleaning of the feature columns, and missing-value processing of the feature columns.
Optionally, the feature conversion includes converting dates into solar terms and times into periods of the day. For example, dates are mapped to the 24 solar terms of the year to mine implicit seasonal meteorological features, and times are mapped to periods according to the daily dawn, dusk and midday schedule of the outdoor empirical test field to mine implicit intraday changes in meteorological features.
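The sketch below shows one way to implement both conversions. It assumes a fixed table of approximate solar-term start dates (real dates shift by about a day between years and would normally come from an astronomical calendar) and hypothetical dawn, midday and dusk times for the test field.

```python
from bisect import bisect_right
from datetime import date, time

# Approximate Gregorian start dates of the 24 solar terms; they drift by about
# +/-1 day from year to year, so a fixed table is a simplifying assumption.
SOLAR_TERMS = [
    ((1, 6), "Minor Cold"), ((1, 20), "Major Cold"), ((2, 4), "Beginning of Spring"),
    ((2, 19), "Rain Water"), ((3, 6), "Awakening of Insects"), ((3, 21), "Spring Equinox"),
    ((4, 5), "Clear and Bright"), ((4, 20), "Grain Rain"), ((5, 6), "Beginning of Summer"),
    ((5, 21), "Grain Buds"), ((6, 6), "Grain in Ear"), ((6, 21), "Summer Solstice"),
    ((7, 7), "Minor Heat"), ((7, 23), "Major Heat"), ((8, 8), "Beginning of Autumn"),
    ((8, 23), "End of Heat"), ((9, 8), "White Dew"), ((9, 23), "Autumn Equinox"),
    ((10, 8), "Cold Dew"), ((10, 23), "Frost's Descent"), ((11, 7), "Beginning of Winter"),
    ((11, 22), "Minor Snow"), ((12, 7), "Major Snow"), ((12, 22), "Winter Solstice"),
]
_STARTS = [month_day for month_day, _ in SOLAR_TERMS]

def solar_term(d: date) -> str:
    """Map a calendar date to the solar term in effect on that date."""
    i = bisect_right(_STARTS, (d.month, d.day)) - 1
    # i == -1 for dates before ~Jan 6: they still belong to the previous
    # year's Winter Solstice term, which is the last table entry.
    return SOLAR_TERMS[i][1]

# Hypothetical daily schedule for the test field; real values would come from
# the site's recorded dawn, midday and dusk times.
DAWN, MIDDAY, DUSK = time(6, 30), time(12, 30), time(19, 0)

def time_period(t: time) -> str:
    """Bucket a time of day into night / morning / afternoon."""
    if t < DAWN or t >= DUSK:
        return "night"
    return "morning" if t < MIDDAY else "afternoon"
```

The resulting labels are discrete features and would be encoded as such in step 104.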
Specifically, label-column cleaning deletes every data set record row in which the label column has a missing value, so that the label column contains no missing values.
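A minimal sketch of the label-column cleaning, treating both `None` and NaN as missing (an assumption about how gaps are stored):

```python
from typing import List, Optional, Tuple

def clean_labels(X: List[dict], y: List[Optional[float]]) -> Tuple[List[dict], List[float]]:
    """Delete every record whose label (DC power) is missing: None or NaN."""
    kept = [(row, label) for row, label in zip(X, y)
            if label is not None and label == label]  # NaN is the only value != itself
    return [row for row, _ in kept], [label for _, label in kept]
```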
Specifically, abnormal-value cleaning of the feature columns denoises the photovoltaic string's outdoor empirical data set as follows:
Extreme-value screening: set a reasonable numerical range for each feature column, compute statistics for each column, and check whether its maximum and minimum lie within that range.
For example, after a reasonable numerical range is set for each feature column of the outdoor empirical data set from the Hainan outdoor empirical test field, the maximum and minimum of each feature column are checked against that range.
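Extreme-value screening can be sketched as follows. The column names and ranges are hypothetical placeholders, since the embodiment does not disclose the ranges it used for the Hainan site.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical "reasonable ranges"; the patent does not list its actual values.
REASONABLE_RANGES: Dict[str, Tuple[float, float]] = {
    "air_temperature_c": (0.0, 45.0),
    "air_humidity_pct": (0.0, 100.0),
    "total_irradiance_w_m2": (0.0, 1500.0),
}

def extreme_value_report(rows: List[Dict[str, Optional[float]]]) -> Dict[str, dict]:
    """For each range-checked column, report its min/max and whether both lie in range."""
    report = {}
    for col, (lo, hi) in REASONABLE_RANGES.items():
        values = [r[col] for r in rows if r.get(col) is not None]
        if not values:  # column absent or entirely missing: nothing to check
            continue
        cmin, cmax = min(values), max(values)
        report[col] = {"min": cmin, "max": cmax, "ok": lo <= cmin and cmax <= hi}
    return report
```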
Logic screening: analyse the feature columns, identify those with subordinate or dependent relationships, and check their data for illogical values.
For example, feature analysis of the photovoltaic string's outdoor empirical data set shows that total solar irradiance and ultraviolet irradiance are interdependent, as are total solar irradiance and DC generated power; logic screening therefore ensures that whenever total solar irradiance is 0, both ultraviolet irradiance and DC generated power are also 0.
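The irradiance rule in this example can be sketched as a row-level check. Column names are hypothetical, and DC power is included in the row because the patent screens it against irradiance even though it is the label column.

```python
from typing import Dict, List

def logic_violations(
    rows: List[Dict[str, float]],
    irr: str = "total_irradiance_w_m2",   # hypothetical column names
    uv: str = "uv_irradiance_w_m2",
    power: str = "dc_power_kw",
) -> List[int]:
    """Return indices of rows where total irradiance is 0 but UV irradiance or DC power is not."""
    bad = []
    for i, r in enumerate(rows):
        # A missing UV or power value is treated as 0 here, i.e. not a violation.
        if r.get(irr) == 0 and (r.get(uv, 0) != 0 or r.get(power, 0) != 0):
            bad.append(i)
    return bad
```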
Abnormal-value processing: set an abnormal-proportion threshold M% (M > 0). A feature column whose abnormal proportion is below the threshold is a low-abnormality column; otherwise it is a high-abnormality column. Record rows holding abnormal values in low-abnormality columns are deleted from the outdoor empirical data set; abnormal values in high-abnormality columns are corrected, on the premise that the accuracy of the original data is preserved, so that they pass extreme-value and logic screening.
For example, with M = 5, when a feature column's abnormal proportion is below 5%, the record rows holding its abnormal values are deleted; when it is above 5%, its abnormal values are corrected to agree with the extreme-value and logic screening results, after first confirming that the operation, data acquisition, data recording and data aggregation of the outdoor empirical test were accurate. This denoises the outdoor empirical data set.
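A sketch of the abnormal-value processing with M = 5. The screening result and the correction rule are passed in as functions because the patent leaves the correction method open; `fix` is purely hypothetical (it could, for instance, clip a value to its reasonable range).

```python
from typing import Callable, Dict, List, Set, Tuple

def process_abnormal(
    rows: List[Dict[str, float]],
    is_abnormal: Callable[[str, float], bool],
    fix: Callable[[str, float], float],
    m_pct: float = 5.0,
) -> Tuple[List[Dict[str, float]], Set[str], Set[str]]:
    """Abnormal-value processing with threshold M% (M = 5 in the embodiment).

    is_abnormal: outcome of extreme-value and logic screening for one value (caller-supplied).
    fix: hypothetical correction applied to values in high-abnormality columns.
    Returns the processed rows plus the low- and high-abnormality column sets.
    """
    if not rows:
        return [], set(), set()

    def bad(row: Dict[str, float], col: str) -> bool:
        return row.get(col) is not None and is_abnormal(col, row[col])

    cols = {c for r in rows for c in r}
    share = {c: 100.0 * sum(bad(r, c) for r in rows) / len(rows) for c in cols}
    low = {c for c, s in share.items() if 0 < s < m_pct}
    high = {c for c, s in share.items() if s >= m_pct}
    # Low-abnormality columns: delete the record rows that hold their abnormal values.
    kept = [r for r in rows if not any(bad(r, c) for c in low)]
    # High-abnormality columns: correct the abnormal values instead of dropping rows.
    fixed = [{c: fix(c, v) if c in high and bad(r, c) else v for c, v in r.items()}
             for r in kept]
    return fixed, low, high
```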
Specifically, missing values in the feature columns are processed as follows, to complete the photovoltaic string's outdoor empirical data set:
classify the missing-value situation by setting a missing-degree threshold N% (N > 0): a feature column whose missing proportion is below the threshold is a low-missing-degree column; otherwise it is a high-missing-degree column;
delete the record rows holding missing values in low-missing-degree columns;
fill the missing values of high-missing-degree columns; the filled values must then be checked by the feature-column abnormal-value cleaning.
For example, with N = 5, when a feature column's missing proportion is below 5%, the record rows holding its missing values are deleted; when it is above 5%, its missing values are filled using a method such as fixed-value, mean, mode or algorithmic filling. The specific filling method is not limited; in this embodiment, a random forest algorithm can predict the missing values from the existing data. The filled data then undergo abnormal-value cleaning again to ensure that algorithmically filled values are reliable.
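A sketch of the missing-value processing with N = 5. Mean filling, one of the options listed above, stands in for the random-forest imputation of the embodiment to keep the example dependency-free; in a real pipeline the filled rows would go back through abnormal-value screening.

```python
from statistics import mean
from typing import Dict, List, Optional, Set, Tuple

def process_missing(
    rows: List[Dict[str, Optional[float]]], n_pct: float = 5.0
) -> Tuple[List[Dict[str, float]], Set[str], Set[str]]:
    """Missing-value processing with threshold N% (N = 5 in the embodiment).

    Columns with < n_pct % missing: delete the rows that hold the missing values.
    Columns with >= n_pct % missing: fill the gaps (mean filling as a stand-in).
    """
    if not rows:
        return [], set(), set()
    cols = {c for r in rows for c in r}
    share = {c: 100.0 * sum(1 for r in rows if r.get(c) is None) / len(rows) for c in cols}
    low = {c for c, s in share.items() if 0 < s < n_pct}
    high = {c for c, s in share.items() if s >= n_pct}
    kept = [r for r in rows if all(r.get(c) is not None for c in low)]
    # Column means are computed from the rows that survive the deletion step.
    fill = {c: mean(r[c] for r in kept if r.get(c) is not None) for c in high}
    filled = [{**r, **{c: fill[c] for c in high if r.get(c) is None}} for r in kept]
    return filled, low, high
```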
Step 104: shuffle the outdoor empirical data set processed in step 103, split it into a training set, a validation set and a test set, encode the discrete features and normalize the continuous features of the training set, mine the intrinsic relationship between the feature columns and the label column of the training set, and generate a prediction model of the label column from the feature columns.
Specifically, the training set is larger than the validation and test sets combined; this embodiment uses a 6:2:2 split.
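Shuffling and the 6:2:2 split can be sketched with the standard library; the fixed seed is an added assumption for reproducibility.

```python
import random
from typing import List, Sequence, Tuple

def shuffle_split(rows: Sequence, ratios: Tuple[int, int, int] = (6, 2, 2),
                  seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle records and split them into train/validation/test sets by the given ratio."""
    if ratios[0] <= ratios[1] + ratios[2]:
        raise ValueError("training share must exceed validation + test shares")
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    total = sum(ratios)
    n_train = len(rows) * ratios[0] // total
    n_val = len(rows) * ratios[1] // total
    # Any rounding remainder goes to the test set.
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]
```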
Optionally, referring to FIG. 3, after the discrete and continuous features of the training set are encoded and normalized, the encoding and normalization models fitted on the training data are used to encode and normalize the discrete and continuous features of the validation and test sets.
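The sketch below illustrates the point of fitting the encoders on training data only: one-hot categories and min-max bounds are learned from the training set and reused unchanged on validation and test records. One-hot encoding and min-max scaling are assumptions here, as the patent names no specific encoding or normalization method.

```python
from typing import List

class Encoder:
    """One-hot encode discrete columns and min-max normalise continuous ones.

    Categories and bounds are learned by fit() on the training set only, then
    transform() applies them unchanged to validation and test records.
    """

    def __init__(self, discrete: List[str], continuous: List[str]):
        self.discrete, self.continuous = discrete, continuous

    def fit(self, rows: List[dict]) -> "Encoder":
        self.categories = {c: sorted({r[c] for r in rows}) for c in self.discrete}
        self.bounds = {c: (min(r[c] for r in rows), max(r[c] for r in rows))
                       for c in self.continuous}
        return self

    def transform(self, row: dict) -> List[float]:
        vec: List[float] = []
        for c in self.discrete:
            # Categories never seen in training encode as all zeros.
            vec.extend(1.0 if row[c] == cat else 0.0 for cat in self.categories[c])
        for c in self.continuous:
            lo, hi = self.bounds[c]
            # Values outside the training range map outside [0, 1].
            vec.append((row[c] - lo) / (hi - lo) if hi > lo else 0.0)
        return vec
```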
Specifically, this embodiment uses the XGBoost algorithm to mine the intrinsic relationship between the feature columns and the label column of the training set and to generate a prediction model of the label column from the feature columns.
Optionally, the prediction model is applied to the validation set to verify its robustness and stability. Specifically, while the model is being trained on the training set, it is repeatedly applied to the validation set to check its robustness and stability and thus prevent overfitting.
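XGBoost itself is not reproduced here. As a dependency-free stand-in, the sketch below implements plain gradient boosting with depth-1 regression trees and uses the validation set the way the embodiment describes: training continues while validation error improves and stops after `patience` rounds without improvement, keeping the best-scoring ensemble. With the actual XGBoost library, its built-in early stopping on an evaluation set serves the same purpose. The learning rate, round limit and patience values are arbitrary.

```python
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]

def mse(y: Sequence[float], p: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)

def fit_stump(X: List[Row], r: List[float]) -> Callable[[Row], float]:
    """Depth-1 regression tree minimising squared error on residuals r."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to the mean residual
        m = mean(r)
        return lambda row: m
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm

def train_boosted(X_tr, y_tr, X_va, y_va, lr=0.3, max_rounds=300, patience=20):
    """Gradient-boosted stumps with early stopping on the validation set."""
    base = mean(y_tr)
    stumps: List[Callable[[Row], float]] = []
    pred_tr = [base] * len(y_tr)
    pred_va = [base] * len(y_va)
    best_mse, best_n, stale = mse(y_va, pred_va), 0, 0
    for _ in range(max_rounds):
        residuals = [y - p for y, p in zip(y_tr, pred_tr)]
        stump = fit_stump(X_tr, residuals)
        stumps.append(stump)
        pred_tr = [p + lr * stump(x) for p, x in zip(pred_tr, X_tr)]
        pred_va = [p + lr * stump(x) for p, x in zip(pred_va, X_va)]
        val = mse(y_va, pred_va)
        if val < best_mse:
            best_mse, best_n, stale = val, len(stumps), 0
        else:
            stale += 1
            if stale >= patience:  # validation error stopped improving: likely overfitting
                break
    kept = stumps[:best_n]  # roll back to the best validation round
    return lambda row: base + lr * sum(s(row) for s in kept)
```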
Step 105: evaluate the accuracy of the prediction model generated in step 104 on the test set; if the accuracy is above a preset threshold, store the model; otherwise, optimize it.
Specifically, this embodiment uses the coefficient of determination (R²) as the accuracy metric; it reflects how well the model captures the information in the data, and values closer to 1 are better.
For example, with the accuracy threshold set to 0.9, a model that fails the threshold is optimized, and a model that passes is stored in the prediction model library.
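The acceptance test of step 105 can be sketched as follows, with R² computed as 1 - SS_res/SS_tot and the 0.9 threshold from the embodiment:

```python
from typing import Sequence

R2_THRESHOLD = 0.9  # accuracy threshold used in the embodiment

def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mu) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def accept_model(y_true: Sequence[float], y_pred: Sequence[float],
                 threshold: float = R2_THRESHOLD) -> bool:
    """True: store the model in the library. False: go back and optimise it."""
    return r2_score(y_true, y_pred) > threshold
```

Predicting the mean of the test labels for every record gives R² = 0, so the threshold demands that the model explain most of the variance in DC power.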
Step 106: repeat steps 102 to 105 until prediction models for all photovoltaic strings are obtained, and combine all the prediction models into a prediction model library of outdoor empirical test results.
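The iteration of step 106 can be sketched as a loop over string identifiers. `train_and_evaluate` is a hypothetical callable wrapping steps 102 to 105 that returns a model only when it clears the R² threshold; the retry limit, and passing the attempt number so the callable can vary its hyper-parameters, are assumptions, since the patent only says a failing model is optimized.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Model = Callable[[dict], float]

def build_model_library(
    string_ids: Iterable[str],
    train_and_evaluate: Callable[[str, int], Optional[Model]],
    max_attempts: int = 3,
) -> Tuple[Dict[str, Model], List[str]]:
    """Run steps 102-105 for every string; collect accepted models and list the failures."""
    library: Dict[str, Model] = {}
    failed: List[str] = []
    for sid in string_ids:
        for attempt in range(max_attempts):
            model = train_and_evaluate(sid, attempt)
            if model is not None:  # passed the R^2 check: store it
                library[sid] = model
                break
        else:  # no attempt passed within the retry limit
            failed.append(sid)
    return library, failed
```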
Referring to FIG. 4, the system for generating the outdoor empirical prediction model library includes a data acquisition module, a data storage module, a data extraction module, a data value preprocessing module, a data segmentation and coding module, a data mining modeling module and a model library storage module.
The data acquisition module acquires the empirical test result data of all tested photovoltaic strings at the Hainan outdoor empirical test field together with 10 meteorological monitoring items: date and time, air humidity, air temperature, total solar irradiance, ultraviolet irradiance, wind speed, wind direction, total daily sunshine hours, total daily rainfall and air pressure. Each batch of tested strings is monitored for 1 year, and the empirical test result data are recorded at 5-minute intervals.
The data storage module aggregates the meteorological monitoring data of the Hainan outdoor empirical test field and the empirical test result data of all tested photovoltaic strings, and establishes an outdoor empirical database.
The data extraction module selects one group of photovoltaic strings in the outdoor empirical database, takes DC generated power as the tested item to be compared, extracts the string's corresponding meteorological monitoring data and DC power data from the outdoor empirical database to form an outdoor empirical data set, and uses the meteorological monitoring data as its feature columns and the empirical test result data, i.e. the DC power data, as its label column.
The data value preprocessing module cleans and preprocesses the outdoor empirical data set.
Specifically, the data value preprocessing module comprises a feature conversion module, which converts the feature columns; a label column cleaning module, which cleans the label column; a feature column abnormal-value cleaning module, which cleans abnormal values in the feature columns; and a feature column missing-value processing module, which processes missing values in the feature columns.
Specifically, the feature conversion module can convert dates in the feature columns into the 24 solar terms of the year to mine implicit seasonal meteorological features, and convert times into periods according to the daily dawn, dusk and midday schedule of the outdoor empirical test field to mine implicit intraday changes in meteorological features.
Specifically, the label column cleaning module deletes every data set record row in which the label column has a missing value, so that the label column contains no missing values.
Specifically, the feature column abnormal-value cleaning module contains an extreme-value screening submodule, a logic screening submodule and an abnormal-value processing submodule, wherein:
the extreme-value screening submodule sets a reasonable numerical range for the data of each feature column, for example the feature columns of the outdoor empirical data set from the Hainan outdoor empirical test field, computes statistics for each column, and checks whether its maximum and minimum lie within the range;
the logic screening submodule analyses the feature columns, identifies those with subordinate or dependent relationships, such as total solar irradiance and ultraviolet irradiance, or total solar irradiance and DC generated power, and checks their data for illogical values;
the abnormal-value processing submodule sets an abnormal-proportion threshold M% (M > 0); a feature column below the threshold is a low-abnormality column, otherwise it is a high-abnormality column; record rows holding abnormal values in low-abnormality columns are deleted, and abnormal values in high-abnormality columns are corrected, on the premise that the accuracy of the original data is preserved, so that they pass extreme-value and logic screening.
Specifically, the feature column missing-value processing module sets a missing-degree threshold N% (N > 0); a feature column below the threshold is a low-missing-degree column, otherwise it is a high-missing-degree column. Record rows holding missing values in low-missing-degree columns are deleted; missing values in high-missing-degree columns are filled, and the filled values are returned to the feature column abnormal-value cleaning module for checking and screening.
The data segmentation and coding module shuffles the outdoor empirical data set processed by the data value preprocessing module, splits it into a training set, a validation set and a test set, and encodes the discrete features and normalizes the continuous features of the training set.
The data mining modeling module mines the intrinsic relationship between the feature columns and the label column of the training set using data mining techniques and generates a prediction model of the label column from the feature columns. It applies the prediction model to the validation set to verify its robustness and stability, sends the verified model to the model library storage module, and activates the data extraction module to extract the next group of photovoltaic strings.
Specifically, the XGBoost algorithm is used to mine the intrinsic relationship between the feature columns and the label column of the training set and to generate a prediction model of the label column from the feature columns; the model is applied to the validation set to verify its robustness and stability.
The model library storage module stores all prediction models generated by the data mining modeling module.
Referring to fig. 5, a schematic diagram of a comparison between a real value and a predicted value of generated power of 100 randomly selected points on a test set of one photovoltaic string is shown, and a comparison result in the diagram shows that a prediction model of a tested photovoltaic string in an outdoor demonstration database based on XGBoost performs well on data in the test set, that is, the predicted value of generated power is very close to the real value.
Referring to fig. 6, the same meteorological monitoring data are substituted into the prediction models of all the photovoltaic strings in the prediction model library, so that the DC power generation results of all the strings are fitted to a unified set of meteorological conditions. This makes the DC power generation test results of the photovoltaic strings in the Hainan outdoor empirical database comparable with one another.
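The comparison step amounts to evaluating every stored model on one shared weather record. In the sketch below, each model is assumed to be a callable from a weather record to predicted DC power. The two lambda models, their coefficients and the field names `irradiance` and `module_temp` are made up for illustration; they are not results from the Hainan test field.

```python
from typing import Callable, Dict, Mapping

WeatherRecord = Mapping[str, float]
StringModel = Callable[[WeatherRecord], float]

def compare_under_common_weather(library: Dict[str, StringModel],
                                 weather: WeatherRecord) -> Dict[str, float]:
    """Substitute one weather record into every string's model and rank the predicted DC power."""
    predicted = {string_id: model(weather) for string_id, model in library.items()}
    return dict(sorted(predicted.items(), key=lambda item: item[1], reverse=True))

# HYPOTHETICAL stand-in models: power = factor * irradiance, with a linear temperature loss.
library = {
    "string_A": lambda w: 0.20 * w["irradiance"] * (1 - 0.004 * (w["module_temp"] - 25)),
    "string_B": lambda w: 0.19 * w["irradiance"] * (1 - 0.003 * (w["module_temp"] - 25)),
}
```

Because every model sees identical inputs, any difference in the returned values reflects the strings themselves rather than the weather on the days they happened to be measured.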
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or may be implemented by hardware related to instructions of a program, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (11)

1. An outdoor empirical prediction model library generation method, comprising:
step one, collecting the empirical test result data of all tested photovoltaic strings and the meteorological monitoring data recorded by an outdoor empirical test field, and establishing an outdoor empirical database;
step two, selecting, in the outdoor empirical database, one group of photovoltaic strings tested under the same meteorological monitoring data, and, for the empirical test item to be compared, extracting the corresponding meteorological monitoring data and empirical test result data to form an outdoor empirical data set of the photovoltaic strings,
wherein the meteorological monitoring data are used as the feature columns of the outdoor empirical data set, and the empirical test result data are used as the label columns of the outdoor empirical data set;
step three, cleaning and preprocessing the outdoor empirical data set;
step four, shuffling the outdoor empirical data set processed in step three and splitting it into a training set, a validation set and a test set, encoding the discrete features and normalizing the continuous features of the training set, respectively, mining the internal relationship between the feature columns and the label columns of the training set, and generating a prediction model of the label columns based on the feature columns,
wherein the data volume of the training set is greater than the sum of the data volumes of the validation set and the test set;
step five, evaluating the accuracy of the prediction model generated in step four on the test set data; if the accuracy is higher than a preset threshold, storing the prediction model; if the accuracy is lower than the preset threshold, optimizing the prediction model;
and step six, repeating steps two to five until the prediction models of all the photovoltaic strings are obtained, and collecting all the prediction models to generate a prediction model library of outdoor empirical test results.
2. The outdoor empirical prediction model library generation method of claim 1, wherein step three comprises performing feature conversion on the feature columns, cleaning the label column, cleaning outliers of the feature columns, and processing missing values of the feature columns.
3. The outdoor empirical prediction model library generation method of claim 2, wherein the feature conversion of the feature columns includes converting dates into solar terms and converting times into time periods.
4. The outdoor empirical prediction model library generation method of claim 2, wherein the label column is cleaned by deleting the data set record rows that contain missing values in the label column, so that no missing values remain in the label column.
5. The outdoor empirical prediction model library generation method of claim 2, wherein the outliers of the feature columns are cleaned, so as to reduce the noise of the outdoor empirical data set of the photovoltaic strings, by the following steps:
extremum screening: setting a reasonable numerical range for the data of each feature column, computing statistics for each feature column, and checking whether the maximum and minimum values of each feature column fall within the set reasonable range;
logic screening: analyzing each feature column, identifying feature columns that have subordination or dependency relationships, and checking whether the data of these feature columns contain illogical values;
outlier processing: setting an abnormal-proportion threshold M% (M > 0), wherein a feature column whose abnormal proportion is below the threshold is a low-abnormality feature column and otherwise is a high-abnormality feature column; deleting the record rows of the outdoor empirical data set that contain the abnormal values of low-abnormality feature columns; and, while preserving the accuracy of the original data, modifying the abnormal values of high-abnormality feature columns so that they satisfy the extremum screening and logic screening checks.
6. The outdoor empirical prediction model library generation method of claim 2, wherein processing the missing values of the feature columns to complete the outdoor empirical data set of the photovoltaic strings comprises:
classifying the feature missing conditions by setting a missing-degree threshold N% (N > 0), wherein a feature column whose missing proportion is below the threshold is a low-missing-degree feature column and otherwise is a high-missing-degree feature column;
deleting the record rows of the outdoor empirical data set that contain the missing values of low-missing-degree feature columns;
and filling the missing values of high-missing-degree feature columns, wherein the filled values are inspected and screened by the feature column outlier cleaning.
7. The outdoor empirical prediction model library generation method of any of claims 1-6, wherein in step four, after the discrete features and continuous features of the training set are encoded and normalized, the discrete and continuous feature columns of the validation set and the test set are encoded and normalized, respectively, using the encoding and normalization models created from the training set data.
8. The outdoor empirical prediction model library generation method of claim 7, wherein the prediction model is applied to the validation set to verify the robustness and stability of the generated prediction model; specifically, while the prediction model is being trained on the training set data, the model being trained is continuously applied to the validation set data to verify its robustness and stability.
9. An outdoor empirical prediction model library generation system, comprising:
a data acquisition module, configured to acquire the meteorological monitoring data of an outdoor empirical test field and the empirical test result data of all tested photovoltaic strings;
a data storage module, configured to aggregate the meteorological monitoring data of the outdoor empirical test field and the empirical test result data of all tested photovoltaic strings, and to establish an outdoor empirical database;
a data extraction module, configured to select, in the outdoor empirical database, a group of photovoltaic strings tested under the same meteorological monitoring data, to extract, for the empirical test item to be compared, the corresponding meteorological monitoring data and empirical test result data to generate an outdoor empirical data set, and to use the meteorological monitoring data as the feature columns and the empirical test result data as the label columns of the outdoor empirical data set;
a data value preprocessing module, configured to clean and preprocess the data of the outdoor empirical data set;
a data segmentation and coding module, configured to shuffle the outdoor empirical data set processed by the data value preprocessing module, split it into a training set, a validation set and a test set, and encode the discrete features and normalize the continuous features of the training set, respectively;
a data mining modeling module, configured to mine the internal relationship between the feature columns and the label columns of the training set using data mining techniques and to generate a prediction model of the label columns based on the feature columns;
and a model library storage module, configured to store all the prediction models generated by the data mining modeling module;
wherein the data mining modeling module applies the prediction model to the validation set for verification, sends the verified prediction model to the model library storage module, and activates the data extraction module to extract the next group of photovoltaic strings.
10. The system according to claim 9, wherein the data value preprocessing module comprises a feature conversion module for performing feature conversion on the feature columns, a label column cleaning module for cleaning the label column, a feature column outlier cleaning module for cleaning outliers of the feature columns, and a feature column missing value processing module for processing missing values of the feature columns.
11. The outdoor empirical prediction model library generation system of claim 10, wherein the feature column outlier cleaning module embeds an extremum screening submodule, a logic screening submodule and an outlier processing submodule, wherein
the extremum screening submodule is configured to set a reasonable numerical range for the data of each feature column, compute statistics for each feature column, and check whether the maximum and minimum values of each feature column fall within the set reasonable range;
the logic screening submodule is configured to analyze each feature column, identify feature columns that have subordination or dependency relationships, and check whether the data of these feature columns contain illogical values;
the outlier processing submodule is configured to set an abnormal-proportion threshold M% (M > 0), wherein a feature column whose abnormal proportion is below the threshold is a low-abnormality feature column and otherwise is a high-abnormality feature column; to delete the record rows of the outdoor empirical data set that contain the abnormal values of low-abnormality feature columns; and, while preserving the accuracy of the original data, to correct the abnormal values of high-abnormality feature columns so that they satisfy the extremum screening and logic screening checks.
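The preprocessing submodules named in claims 3, 4, 5, 10 and 11 can be sketched as plain functions. Several assumptions apply:

- The 24 solar terms are approximated by fixed Gregorian start dates; the true dates shift by about a day between years.
- Time periods are fixed 3-hour buckets.
- Missing labels are `None` or NaN.
- High-abnormality columns are "modified" by clipping to the reasonable range and then applying a per-rule fix.

All function names are hypothetical.

```python
import math
from datetime import datetime
from typing import Callable, Dict, List, Optional, Set, Tuple

# Claim 3, date -> solar term. ASSUMPTION: fixed approximate start dates (month, day).
SOLAR_TERMS = [
    ((1, 6), "Minor Cold"), ((1, 20), "Major Cold"), ((2, 4), "Start of Spring"),
    ((2, 19), "Rain Water"), ((3, 6), "Awakening of Insects"), ((3, 21), "Spring Equinox"),
    ((4, 5), "Pure Brightness"), ((4, 20), "Grain Rain"), ((5, 6), "Start of Summer"),
    ((5, 21), "Grain Buds"), ((6, 6), "Grain in Ear"), ((6, 21), "Summer Solstice"),
    ((7, 7), "Minor Heat"), ((7, 23), "Major Heat"), ((8, 8), "Start of Autumn"),
    ((8, 23), "End of Heat"), ((9, 8), "White Dew"), ((9, 23), "Autumn Equinox"),
    ((10, 8), "Cold Dew"), ((10, 23), "Frost's Descent"), ((11, 7), "Start of Winter"),
    ((11, 22), "Minor Snow"), ((12, 7), "Major Snow"), ((12, 22), "Winter Solstice"),
]

def solar_term(ts: datetime) -> str:
    key, name = (ts.month, ts.day), SOLAR_TERMS[-1][1]  # early January: Winter Solstice
    for start, term in SOLAR_TERMS:
        if key < start:
            break
        name = term
    return name

def time_period(ts: datetime, hours_per_period: int = 3) -> int:
    """Claim 3, time -> time period. ASSUMPTION: fixed-width hour buckets."""
    return ts.hour // hours_per_period

def clean_label_column(rows: List[Dict[str, Optional[float]]], label: str):
    """Claim 4: delete record rows whose label is missing (None or NaN)."""
    return [r for r in rows if r.get(label) is not None and not math.isnan(r[label])]

Row = Dict[str, float]
# (checked column, condition that must hold, fix returning a corrected value)
LogicRule = Tuple[str, Callable[[Row], bool], Callable[[Row], float]]

def clean_outliers(rows: List[Row], ranges: Dict[str, Tuple[float, float]],
                   rules: List[LogicRule], m_percent: float) -> List[Row]:
    """Claims 5 and 11: extremum screening, logic screening, outlier processing."""
    flagged: Dict[str, Set[int]] = {col: set() for col in ranges}
    for i, row in enumerate(rows):
        for col, (lo, hi) in ranges.items():      # extremum screening
            if not lo <= row[col] <= hi:
                flagged[col].add(i)
        for col, check, _ in rules:               # logic screening
            if not check(row):
                flagged.setdefault(col, set()).add(i)
    drop: Set[int] = set()
    high: Set[str] = set()
    for col, idx in flagged.items():
        if rows and 100.0 * len(idx) / len(rows) < m_percent:
            drop |= idx                            # low abnormality: delete rows
        else:
            high.add(col)                          # high abnormality: correct values
    cleaned: List[Row] = []
    for i, row in enumerate(rows):
        if i in drop:
            continue
        new = dict(row)
        for col in high:
            if col in ranges:                      # ASSUMPTION: correct by clipping
                lo, hi = ranges[col]
                new[col] = min(max(new[col], lo), hi)
        for col, check, fix in rules:
            if col in high and not check(new):
                new[col] = fix(new)
        cleaned.append(new)
    return cleaned
```

As an example of a dependency rule, which is itself an illustrative assumption, `("dhi", lambda r: r["dhi"] <= r["ghi"], lambda r: r["ghi"])` flags diffuse irradiance above global irradiance. When `dhi` turns out to be a high-abnormality column, the rule replaces the offending diffuse value with the global value instead of deleting the row.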
CN202011048216.7A 2020-09-29 2020-09-29 Outdoor empirical prediction model library generation method and system Active CN112364477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011048216.7A CN112364477B (en) 2020-09-29 2020-09-29 Outdoor empirical prediction model library generation method and system

Publications (2)

Publication Number Publication Date
CN112364477A CN112364477A (en) 2021-02-12
CN112364477B true CN112364477B (en) 2022-12-06

Family

ID=74508317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011048216.7A Active CN112364477B (en) 2020-09-29 2020-09-29 Outdoor empirical prediction model library generation method and system

Country Status (1)

Country Link
CN (1) CN112364477B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114298084A (en) * 2021-11-17 2022-04-08 华能大理风力发电有限公司洱源分公司 XGboost-based photovoltaic group string communication abnormity identification method and system

Citations (15)

Publication number Priority date Publication date Assignee Title
CN107330566A (en) * 2017-07-19 2017-11-07 桑夏太阳能股份有限公司 The predictor method and system of photovoltaic array power output
CN107944604A (en) * 2017-11-10 2018-04-20 中国电力科学研究院有限公司 A kind of weather pattern recognition methods and device for photovoltaic power prediction
CN108197744A (en) * 2018-01-02 2018-06-22 华北电力大学(保定) A kind of determining method and system of photovoltaic generation power
CN108694484A (en) * 2018-08-30 2018-10-23 广东工业大学 A kind of photovoltaic power generation power prediction method
CN109657881A (en) * 2019-01-14 2019-04-19 南京国电南自电网自动化有限公司 A kind of neural network photovoltaic power generation prediction technique and system suitable for small sample
CN109711609A (en) * 2018-12-15 2019-05-03 福州大学 Photovoltaic plant output power predicting method based on wavelet transformation and extreme learning machine
CN109978258A (en) * 2019-03-26 2019-07-05 北京博望华科科技有限公司 Multi-data source method for forecasting photovoltaic power generation quantity and system based on machine learning
CN110084412A (en) * 2019-04-12 2019-08-02 重庆邮电大学 A kind of photovoltaic power generation big data prediction technique based on the study of Feature Conversion multi-tag
CN110414748A (en) * 2019-08-12 2019-11-05 合肥阳光新能源科技有限公司 Photovoltaic power prediction technique
CN110516844A (en) * 2019-07-25 2019-11-29 太原理工大学 Multivariable based on EMD-PCA-LSTM inputs photovoltaic power forecasting method
CN110543988A (en) * 2019-08-28 2019-12-06 上海电力大学 Photovoltaic short-term output prediction system and method based on XGboost algorithm
CN110689161A (en) * 2019-08-09 2020-01-14 南京因泰莱电器股份有限公司 Method for realizing photovoltaic power generation power prediction model with reusability
CN110796292A (en) * 2019-10-14 2020-02-14 国网辽宁省电力有限公司盘锦供电公司 Photovoltaic power short-term prediction method considering haze influence
CN110909919A (en) * 2019-11-07 2020-03-24 哈尔滨工程大学 Photovoltaic power prediction method of depth neural network model with attention mechanism fused
CN111612244A (en) * 2020-05-18 2020-09-01 南瑞集团有限公司 QRA-LSTM-based method for predicting nonparametric probability of photovoltaic power before day

Non-Patent Citations (1)

Title
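Research on the performance of different types of crystalline silicon photovoltaic modules in hot and humid environments; Zeng Xiang'an, Jie Ganxin, et al.; Environmental Testing (《环境试验》); 20190825; pp. 19-22 *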

Also Published As

Publication number Publication date
CN112364477A (en) 2021-02-12

Similar Documents

Publication Publication Date Title
CN108647716B (en) Photovoltaic array fault diagnosis method based on composite information
Jebli et al. Prediction of solar energy guided by pearson correlation using machine learning
CN111382542B (en) Highway electromechanical device life prediction system facing full life cycle
Kang et al. Big data analytics in China's electric power industry: modern information, communication technologies, and millions of smart meters
CN109376960A (en) Load Forecasting based on LSTM neural network
CN115994325B (en) Fan icing power generation data enhancement method based on TimeGAN deep learning method
CN112257784A (en) Electricity stealing detection method based on gradient boosting decision tree
CN109617526A (en) A method of photovoltaic power generation array fault diagnosis and classification based on wavelet multiresolution analysis and SVM
CN111967675A (en) Photovoltaic power generation amount prediction method and prediction device
CN112364477B (en) Outdoor empirical prediction model library generation method and system
CN109670549A (en) The data screening method, apparatus and computer equipment of fired power generating unit
CN115718861A (en) Method and system for classifying power users and monitoring abnormal behaviors in high-energy-consumption industry
Yun et al. Research on fault diagnosis of photovoltaic array based on random forest algorithm
CN115859099A (en) Sample generation method and device, electronic equipment and storage medium
CN115758151A (en) Combined diagnosis model establishing method and photovoltaic module fault diagnosis method
CN116756505B (en) Photovoltaic equipment intelligent management system and method based on big data
Kardi et al. Anomaly detection in electricity consumption data using deep learning
CN117495126A (en) High-proportion new energy distribution network line loss prediction method and device
CN117439045A (en) Multi-element load prediction method for comprehensive energy system
CN115766504A (en) Method for detecting cycle time sequence abnormity
Maalej et al. Sensor data augmentation strategy for load forecasting in smart grid context
CN113435494A (en) Low-voltage resident user abnormal electricity utilization identification method and simulation system
Xia et al. Research on Solar Radiation Estimation based on Singular Spectrum Analysis-Deep Belief Network
Jian et al. Abnormal detection of power consumption based on a stacking ensemble model
Zhang et al. Abnormal Electricity Consumption Detection from Incomplete Records in Power System

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant