CN112364477A - Outdoor empirical prediction model library generation method and system - Google Patents

Outdoor empirical prediction model library generation method and system

Info

Publication number
CN112364477A
CN112364477A CN202011048216.7A CN202011048216A CN112364477A CN 112364477 A CN112364477 A CN 112364477A CN 202011048216 A CN202011048216 A CN 202011048216A CN 112364477 A CN112364477 A CN 112364477A
Authority
CN
China
Prior art keywords
data
outdoor
column
prediction model
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011048216.7A
Other languages
Chinese (zh)
Other versions
CN112364477B (en)
Inventor
赵上懿
曾湘安
洪志浩
揭敢新
王俊
许楚斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China National Electric Apparatus Research Institute Co Ltd
Original Assignee
China National Electric Apparatus Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China National Electric Apparatus Research Institute Co Ltd
Priority to CN202011048216.7A
Publication of CN112364477A
Application granted
Publication of CN112364477B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and system for generating an outdoor empirical prediction model library. Based on data collected in outdoor empirical tests, the generation system applies the generation method to obtain, for every tested photovoltaic string, a prediction model of the performance index reflected by its tested performance as a function of environmental factors; that is, each prediction model reveals the internal relation between specific meteorological monitoring data and the outdoor empirical test result data of the tested photovoltaic string. Substituting the same meteorological monitoring data into the prediction model of each group of photovoltaic strings fits the tested performance of every group to consistent environmental factors, so that the photovoltaic strings in the outdoor empirical data set can finally be compared with one another on the tested item.

Description

Outdoor empirical prediction model library generation method and system
Technical Field
The invention relates to the technical field of photovoltaic power generation, in particular to a method and a system for generating an outdoor empirical prediction model library.
Background
Photovoltaic power generation is a technology that converts light energy directly into electric energy by means of the photovoltaic effect at a semiconductor interface. As one of the main solar power generation technologies, it has developed particularly rapidly in recent years. Outdoor empirical testing, a means of evaluating the performance of a photovoltaic product in a real environment, has been widely applied in the photovoltaic industry in recent years.
At present, outdoor empirical test results of photovoltaic products can only be compared among products that took part in the same outdoor empirical test. Because environmental factors are complex, the results of outdoor empirical tests carried out under different environmental conditions cannot be compared directly.
For example, suppose one batch of photovoltaic strings A1 undergoes an outdoor empirical test under environmental condition B1, yielding test result C1, and another batch of photovoltaic strings A2 undergoes an outdoor empirical test under environmental condition B2, yielding test result C2. Because the environmental conditions differ, A1 and A2 cannot be compared on the performance indexes reflected by their respective test results.
To solve this technical problem, an obvious approach is to take part of the photovoltaic strings A1 and have them participate, together with the photovoltaic strings A2, in the outdoor empirical test under environmental condition B2, so that the performance indexes reflected by the test results of A1 and A2 can be compared.
However, with that approach, if a large number of batches of photovoltaic products must be compared, the limited area of the outdoor empirical test field severely restricts testing efficiency and greatly increases the cost of comparing outdoor empirical test results.
Disclosure of Invention
To enable comparison of the performance indexes reflected by test results obtained when different batches of photovoltaic strings undergo outdoor empirical tests under different environmental conditions, the invention provides a method and system for generating an outdoor empirical prediction model library, which mine the internal relation between the meteorological monitoring data and the test result data of each photovoltaic string during the outdoor empirical test. The technical scheme is as follows:
In one aspect, an outdoor empirical prediction model library generation method is provided, the method comprising:
step one, summarizing the empirical test result data and meteorological monitoring data of all tested photovoltaic strings collected and recorded at an outdoor empirical test field, and establishing an outdoor empirical database;
step two, selecting one group of photovoltaic strings in the outdoor empirical database and, for the empirical test item on which comparison is to be achieved, extracting the meteorological monitoring data and empirical test result data corresponding to that item to form an outdoor empirical data set of the photovoltaic strings,
wherein the meteorological monitoring data serve as the feature columns of the outdoor empirical data set and the empirical test result data serve as the label column of the outdoor empirical data set;
step three, performing data cleaning and preprocessing on the outdoor empirical data set;
step four, shuffling the outdoor empirical data set processed in step three and splitting it into a training set, a validation set and a test set; encoding the discrete features and normalizing the continuous features of the training set; mining the internal relation between the feature columns and the label column of the training set; and generating a prediction model of the label column based on the feature columns,
wherein the data volume of the training set is greater than the sum of the data volumes of the validation set and the test set;
step five, evaluating the accuracy of the prediction model generated in step four on the test set data; if the accuracy is higher than a preset critical value, storing the prediction model; if the accuracy is lower than the preset critical value, optimizing the prediction model;
and step six, iterating steps two to five until prediction models of all photovoltaic strings are obtained, and collecting all prediction models to generate a prediction model library of outdoor empirical test results.
In another aspect, an outdoor empirical prediction model library generation system is provided, and includes:
a data acquisition module, used for acquiring the meteorological monitoring data of an outdoor empirical test field and the empirical test result data of all tested photovoltaic strings;
a data storage module, used for collecting the meteorological monitoring data of the outdoor empirical test field and the empirical test result data of all tested photovoltaic strings and establishing an outdoor empirical database;
a data extraction module, used for selecting one group of photovoltaic strings in the outdoor empirical database, extracting, for the empirical test item on which comparison is to be achieved, the corresponding meteorological monitoring data and empirical test result data to generate an outdoor empirical data set, taking the meteorological monitoring data as the feature columns of the outdoor empirical data set and the empirical test result data as the label column;
a data value preprocessing module, used for cleaning and preprocessing the data of the outdoor empirical data set;
a data segmentation and coding module, used for shuffling the outdoor empirical data set processed by the data value preprocessing module, splitting it into a training set, a validation set and a test set, and encoding the discrete features and normalizing the continuous features of the training set;
a data mining modeling module, used for mining the internal relation between the feature columns and the label column of the training set based on data mining technology and generating a prediction model of the label column based on the feature columns;
and a model library storage module, used for storing all prediction models generated by the data mining modeling module;
wherein the data mining modeling module applies the prediction model to the validation set for verification, sends the verified prediction model to the model library storage module, and activates the data extraction module to extract the next group of photovoltaic strings.
The technical scheme provided by the invention has the beneficial effects that:
Based on data collected in outdoor empirical tests, the outdoor empirical prediction model library generation system applies the generation method to obtain, for every tested photovoltaic string, a prediction model of the performance index reflected by its tested performance as a function of environmental factors; that is, each prediction model reveals the internal relation between specific meteorological monitoring data and the outdoor empirical test result data of the tested photovoltaic string.
Substituting the same meteorological monitoring data into the prediction model of each group of photovoltaic strings fits the tested performance of every group to consistent environmental factors, so that the photovoltaic strings in the outdoor empirical data set can be compared with one another on the tested item.
As outdoor empirical tests are run on more batches of photovoltaic strings, the accumulated outdoor empirical data sets fill the outdoor empirical database, and the generation system adds new prediction models to the outdoor empirical prediction model library according to the generation method. The prediction models thus become increasingly accurate, more batches of photovoltaic strings can be compared with one another on the tested items, and the experimental cost of comparing outdoor empirical test results is greatly reduced.
Drawings
To illustrate the technical solution of the present embodiment more clearly, the drawings used in the description of the embodiment are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for generating an outdoor empirical prediction model library according to the present invention;
FIG. 2 is a schematic flow chart of step 103 according to the present invention;
FIG. 3 is a schematic flow chart illustrating steps 104 to 105 of the present invention;
FIG. 4 is a block diagram of an outdoor empirical prediction model library generation system according to the present invention;
FIG. 5 is a schematic diagram comparing the real and predicted values of direct-current generated power at 100 points randomly selected from the test set for one photovoltaic string;
FIG. 6 is a diagram of the specific process of fitting the tested items of all photovoltaic strings to uniform meteorological conditions so that the test results of the tested items become mutually comparable.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Referring to fig. 1, the outdoor empirical prediction model library generating method may include the following steps:
Step 101, summarizing the empirical test result data and meteorological monitoring data of all tested photovoltaic strings collected and recorded at an outdoor empirical test field, and establishing an outdoor empirical database.
Specifically, the outdoor empirical test field is located in Hainan. Ten different batches of photovoltaic strings were tested between 2016 and 2020; each batch was tested for 1 year, and empirical test results were recorded at 5-minute intervals. The meteorological monitoring data comprise 10 items: date and time, air humidity, air temperature, total solar irradiance, ultraviolet irradiance, wind speed, wind direction, total daily sunshine hours, total daily rainfall, and air pressure.
Step 102, selecting one group of photovoltaic strings in the outdoor empirical database and, for the empirical test item on which comparison is to be achieved, extracting the corresponding meteorological monitoring data and empirical test result data to form an outdoor empirical data set of the photovoltaic strings.
Specifically, the empirical test item on which comparison is to be achieved is the direct-current generated power. The direct-current generated power data of the photovoltaic strings and the corresponding meteorological monitoring data are extracted from the outdoor empirical database to form the outdoor empirical data set.
The meteorological monitoring data serve as the feature columns of the outdoor empirical data set, and the empirical test result data, namely the direct-current generated power data, serve as the label column.
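As an illustration only, the division of each record into feature columns and a label column might be sketched in Python as follows. The column names and the record layout (one dictionary per 5-minute record) are assumptions made for this sketch and do not appear in the original disclosure.

```python
# Hypothetical column names for the 10 meteorological items and the tested item.
FEATURE_COLUMNS = [
    "timestamp", "air_humidity", "air_temperature", "total_irradiance",
    "uv_irradiance", "wind_speed", "wind_direction",
    "daily_sunshine_hours", "daily_rainfall", "air_pressure",
]
LABEL_COLUMN = "dc_power"


def split_features_label(records):
    """Split raw records into feature rows and the label column.

    Fields that are absent from a record become None so that the later
    missing-value step can see them.
    """
    features = [{name: rec.get(name) for name in FEATURE_COLUMNS} for rec in records]
    labels = [rec.get(LABEL_COLUMN) for rec in records]
    return features, labels
```

Keeping the label out of the feature rows at this point avoids leaking the direct-current generated power into the model inputs later on.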
Step 103, performing data cleaning and preprocessing on the outdoor empirical data set.
Specifically, referring to fig. 2, data cleaning and preprocessing consists of, in sequence, feature conversion of the feature columns, cleaning of the label column, cleaning of abnormal values in the feature columns, and processing of missing values in the feature columns.
Optionally, feature conversion of the feature columns includes converting the date into a solar term and converting the time into a time period. For example, the date is converted into one of the 24 solar terms of the year in order to mine implicit seasonal meteorological features, and the time is converted into a time period according to the daily dawn, midday and dusk times at the outdoor empirical test field in order to mine implicit intraday changes in meteorological features.
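A minimal Python sketch of the two conversions is given below, under stated assumptions. The start dates of the solar terms are approximate calendar dates (the true dates shift by about a day from year to year), and the dawn, midday and dusk times are hypothetical defaults; the disclosure does not specify either, and a real implementation would use the schedule of the test field.

```python
from bisect import bisect_right
from datetime import date, time

# Approximate (month, day) on which each of the 24 solar terms begins,
# starting with Minor Cold in early January. Assumed values, +/- 1 day.
SOLAR_TERM_STARTS = [
    (1, 6), (1, 20), (2, 4), (2, 19), (3, 6), (3, 21), (4, 5), (4, 20),
    (5, 6), (5, 21), (6, 6), (6, 21), (7, 7), (7, 23), (8, 8), (8, 23),
    (9, 8), (9, 23), (10, 8), (10, 23), (11, 7), (11, 22), (12, 7), (12, 22),
]


def solar_term_index(d: date) -> int:
    """Return the index (0..23) of the solar term containing date d.

    Dates before January 6 fall in the term that began on December 22
    of the previous year, which has index 23.
    """
    position = bisect_right(SOLAR_TERM_STARTS, (d.month, d.day)) - 1
    return position % 24


def time_period(t: time, dawn=time(6, 30), midday=time(12, 30), dusk=time(18, 30)) -> str:
    """Map a clock time to a coarse period using hypothetical boundaries."""
    if t < dawn or t >= dusk:
        return "night"
    return "morning" if t < midday else "afternoon"
```

Both functions return discrete values, so their outputs would be treated as discrete features in the later encoding step.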
Specifically, to clean the label column, the data set record rows containing missing values in the label column are deleted, so that the label column contains no missing values.
Specifically, cleaning abnormal values in the feature columns reduces noise in the outdoor empirical data set of the photovoltaic strings. The specific method is as follows:
Extreme value screening: set a reasonable numerical range for the data of each feature column, compute statistics for each feature column, and check whether its maximum and minimum values fall within the set range.
For example, a reasonable numerical range is set for each feature column of the outdoor empirical data set of the Hainan outdoor empirical test field, and the maximum and minimum values of each feature column are then checked against that range.
Logic screening: perform feature analysis on the feature columns, screen out the feature columns that have a subordinate or dependent relationship, and check whether the data of those columns contain illogical values.
For example, feature analysis of the outdoor empirical data set of the photovoltaic strings shows that total solar irradiance and ultraviolet irradiance are interdependent, as are total solar irradiance and direct-current generated power. Logic screening therefore checks that whenever the total solar irradiance is 0, the ultraviolet irradiance and the direct-current generated power are both 0.
Abnormal value processing: set an abnormal proportion critical value M%, where M is greater than 0. A feature column whose proportion of abnormal values is below the critical value is a low-abnormality feature column; otherwise it is a high-abnormality feature column. The data set record rows containing abnormal values of a low-abnormality feature column are deleted. The abnormal values of a high-abnormality feature column are corrected, on the premise that the accuracy of the original data is ensured, so that they satisfy extreme value screening and logic screening.
For example, with M = 5: when the proportion of abnormal values in a feature column is below 5%, the data set record rows containing those abnormal values are deleted. When the proportion is above 5%, the abnormal values are corrected so that they satisfy extreme value screening and logic screening, on the premise that the operation of the outdoor empirical test and the data acquisition, recording and summarization processes have been checked and found accurate. This achieves noise reduction of the outdoor empirical data set.
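The three screening stages described above can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation: the column names and numerical ranges are hypothetical, only one logic rule (ultraviolet irradiance must be 0 when total irradiance is 0) is shown, and high-abnormality columns are merely reported for correction because the disclosure leaves the correction procedure to inspection of the original data.

```python
# Hypothetical reasonable ranges per feature column (extreme value screening).
VALID_RANGES = {
    "air_temperature": (-10.0, 50.0),
    "total_irradiance": (0.0, 1500.0),
    "uv_irradiance": (0.0, 150.0),
}


def is_abnormal(row, column):
    """Apply extreme value screening, then one example logic rule."""
    value = row[column]
    if value is None:  # missing values are handled by a separate step
        return False
    low, high = VALID_RANGES[column]
    if not low <= value <= high:
        return True
    # Logic screening: UV irradiance cannot be non-zero without sunlight.
    if column == "uv_irradiance" and row.get("total_irradiance") == 0 and value != 0:
        return True
    return False


def clean_outliers(rows, m_percent=5.0):
    """Return (kept_rows, columns_needing_correction).

    Rows holding abnormal values of low-abnormality columns are dropped;
    high-abnormality columns are only reported, not modified.
    """
    rows_to_drop = set()
    needs_correction = []
    for column in VALID_RANGES:
        bad = [i for i, row in enumerate(rows) if is_abnormal(row, column)]
        ratio = 100.0 * len(bad) / len(rows) if rows else 0.0
        if ratio < m_percent:
            rows_to_drop.update(bad)
        elif bad:
            needs_correction.append(column)
    kept = [row for i, row in enumerate(rows) if i not in rows_to_drop]
    return kept, needs_correction
```

Reporting rather than silently rewriting the high-abnormality columns reflects the condition in the text that corrections presuppose a check of the original test and recording process.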
Specifically, missing values in the feature columns are processed to complete the outdoor empirical data set of the photovoltaic strings and thereby resolve the missing value problem in the data set. The method includes:
classifying the feature columns by degree of missingness: set a missingness critical value N%, where N is greater than 0; a feature column whose proportion of missing values is below the critical value is a low-missingness feature column, and otherwise it is a high-missingness feature column;
deleting the outdoor empirical data set record rows containing missing values of a low-missingness feature column;
and filling the missing values of a high-missingness feature column, where the filled values must then be inspected and screened by the abnormal value cleaning of the feature columns.
For example, with N = 5: when the proportion of missing values in a feature column is below 5%, the data set record rows containing those missing values are deleted. When the proportion is above 5%, the missing values are filled using a method such as fixed value filling, mean filling, mode filling or algorithm-based filling; the specific filling method is not limited. In this embodiment, a random forest algorithm may be used to predict the missing values from the existing data. The filled data must undergo abnormal value cleaning again to ensure that the values filled by the algorithm are reliable.
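The threshold rule for missing values might be sketched as below. Note the substitution: this sketch uses mean filling, one of the simpler methods the text lists, as a stand-in for the random forest prediction mentioned for the embodiment. The function name and the use of None to mark a missing value are assumptions.

```python
from statistics import mean


def handle_missing(rows, columns, n_percent=5.0):
    """Drop rows for low-missingness columns, fill high-missingness columns.

    Mean filling is used here purely as a simple stand-in; the embodiment
    mentions predicting the missing values with a random forest instead.
    """
    filled = [dict(row) for row in rows]  # work on copies
    rows_to_drop = set()
    for column in columns:
        missing = [i for i, row in enumerate(rows) if row.get(column) is None]
        if not missing:
            continue
        ratio = 100.0 * len(missing) / len(rows)
        if ratio < n_percent:
            rows_to_drop.update(missing)
        else:
            present = [row[column] for row in rows if row.get(column) is not None]
            fill_value = mean(present) if present else None
            for i in missing:
                filled[i][column] = fill_value
    return [row for i, row in enumerate(filled) if i not in rows_to_drop]
```

As the text requires, the rows returned by such a function would be passed through abnormal value cleaning once more before being used.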
Step 104, shuffling the outdoor empirical data set processed in step 103 and splitting it into a training set, a validation set and a test set; encoding the discrete features and normalizing the continuous features of the training set; mining the internal relation between the feature columns and the label column of the training set; and generating a prediction model of the label column based on the feature columns.
Specifically, the data volume of the training set is greater than the sum of the data volumes of the validation set and the test set; the split ratio selected in this embodiment is 6:2:2.
Optionally, referring to fig. 3, after the discrete features of the training set are encoded and its continuous features normalized, the encoding and normalization models fitted on the training set data are used to encode the discrete features and normalize the continuous features of the validation set and the test set.
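The following Python sketch illustrates the shuffle, the 6:2:2 split, and the idea of fitting the encoder and normalizer on the training set only and then reusing them on the other two sets. Min-max normalization and one-hot encoding are assumptions chosen for the sketch; the disclosure does not name specific encoding or normalization methods.

```python
import random


def shuffle_split(rows, ratios=(6, 2, 2), seed=0):
    """Shuffle rows and split them into train/validation/test by integer ratios."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    total = sum(ratios)
    n_train = len(rows) * ratios[0] // total
    n_val = len(rows) * ratios[1] // total
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_min_max(train_values):
    """Fit min-max normalization on training values; return the transform."""
    low, high = min(train_values), max(train_values)
    span = (high - low) or 1.0  # avoid division by zero for constant columns
    return lambda value: (value - low) / span


def fit_one_hot(train_values):
    """Fit one-hot encoding on training categories; return the transform.

    A category never seen in training encodes to an all-zero vector.
    """
    categories = sorted(set(train_values))
    return lambda value: [1 if value == c else 0 for c in categories]
```

Fitting the transforms on the training set alone keeps statistics of the validation and test data from influencing the model, which is why the same fitted transforms are applied to all three sets.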
Specifically, this embodiment uses the XGBoost algorithm to mine the internal relation between the feature columns and the label column of the training set and to generate the prediction model of the label column based on the feature columns.
Optionally, the prediction model is applied to the validation set to verify the robustness and stability of the generated model. Specifically, while the prediction model is being trained on the training set data, it is repeatedly applied to the validation set data to verify its robustness and stability and so prevent over-fitting.
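One common way to realize "repeatedly applying the model to the validation set during training" is early stopping, sketched below in library-independent Python. Early stopping is an interpretation adopted for this sketch, not something the disclosure states; the callbacks `fit_round` and `validation_error` and the `patience` parameter are hypothetical names.

```python
def train_with_validation(fit_round, validation_error, max_rounds=200, patience=10):
    """Train round by round and stop once the validation error stops improving.

    fit_round(i) performs training round i (for a boosted model, adding one
    tree); validation_error() returns the current error on the validation set.
    Returns (best_round, best_error).
    """
    best_error = float("inf")
    best_round = -1
    for round_no in range(max_rounds):
        fit_round(round_no)
        error = validation_error()
        if error < best_error:
            best_error, best_round = error, round_no
        elif round_no - best_round >= patience:
            break  # validation error has not improved for `patience` rounds
    return best_round, best_error
```

A validation error that rises while the training error keeps falling is the usual sign of over-fitting, so the round with the lowest validation error is the one worth keeping.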
Step 105, evaluating the accuracy of the prediction model generated in step 104 on the test set data; if the accuracy is higher than a preset critical value, storing the prediction model; if the accuracy is lower than the preset critical value, optimizing the prediction model.
Specifically, the accuracy evaluation index selected in this embodiment is the coefficient of determination (R²), which reflects the model's ability to capture the information in the data; the closer it is to 1, the better.
For example, the accuracy critical value is set to 0.9. If the prediction model does not reach the critical value, it is optimized; if it does, it is stored in the prediction model library.
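For reference, the coefficient of determination is R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared differences between real and predicted values and SS_tot is the sum of squared differences between the real values and their mean. A minimal Python sketch of the metric and the threshold check follows; the function names are hypothetical, and treating a score exactly equal to the critical value as not passing is an assumption, since the text only covers "higher" and "lower".

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def passes_accuracy_gate(y_true, y_pred, critical_value=0.9):
    """True if the model may be stored, False if it should be optimized."""
    return r2_score(y_true, y_pred) > critical_value
```

A model that always predicts the mean of the real values scores 0, and a perfect model scores 1, which is why values close to 1 are preferred.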
Step 106, iterating steps 102 to 105 until prediction models of all photovoltaic strings are obtained, and collecting all prediction models to generate a prediction model library of outdoor empirical test results.
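The iteration over photovoltaic strings can be sketched as a loop that trains, evaluates and stores one model per string. In the Python sketch below, `train` and `evaluate` are hypothetical callbacks standing in for steps 104 and 105, and the cap on optimization rounds (`max_rounds`) is an assumption; the disclosure does not say how many optimization attempts are made.

```python
def build_model_library(datasets, train, evaluate, critical_value=0.9, max_rounds=3):
    """Build {string_id: model} for every string whose model passes the gate.

    datasets maps a string identifier to its cleaned outdoor empirical data set.
    train(dataset, round_no) returns a model; a higher round_no stands for a
    further optimization attempt. evaluate(model, dataset) returns the
    accuracy of the model on the test set portion of the data.
    """
    library = {}
    for string_id, dataset in datasets.items():
        for round_no in range(max_rounds):
            model = train(dataset, round_no)
            if evaluate(model, dataset) > critical_value:
                library[string_id] = model
                break  # model stored; move on to the next string
    return library
```

Strings whose model never passes the critical value are simply absent from the returned library in this sketch, which makes such cases easy to detect by comparing keys.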
Referring to fig. 4, the system for generating the outdoor empirical prediction model library includes a data acquisition module, a data storage module, a data extraction module, a data value preprocessing module, a data segmentation and coding module, a data mining and modeling module, and a model library storage module.
The data acquisition module acquires the empirical test result data of all tested photovoltaic strings at the Hainan outdoor empirical test field together with 10 items of meteorological monitoring data: date and time, air humidity, air temperature, total solar irradiance, ultraviolet irradiance, wind speed, wind direction, total daily sunshine hours, total daily rainfall, and air pressure. Each batch of tested photovoltaic strings is monitored for 1 year, and the empirical test result data are recorded at 5-minute intervals.
The data storage module collects the meteorological monitoring data of the Hainan outdoor empirical test field and the empirical test result data of all tested photovoltaic strings, and establishes an outdoor empirical database.
The data extraction module selects one group of photovoltaic strings in the outdoor empirical database, takes the direct-current generated power as the empirical test item on which comparison is to be achieved, and extracts the direct-current generated power data of the photovoltaic strings and the corresponding meteorological monitoring data from the outdoor empirical database to form an outdoor empirical data set. The meteorological monitoring data serve as the feature columns of the outdoor empirical data set, and the empirical test result data, namely the direct-current generated power data, serve as the label column.
The data value preprocessing module cleans and preprocesses the data of the outdoor empirical data set.
Specifically, the data value preprocessing module comprises a feature conversion module for performing feature conversion on the feature columns, a label column cleaning module for cleaning the label column, a feature column abnormal value cleaning module for cleaning abnormal values in the feature columns, and a feature column missing value processing module for processing missing values in the feature columns.
Specifically, the feature conversion module can convert the date in the feature columns into one of the 24 solar terms of the year in order to mine implicit seasonal meteorological features, and convert the time in the feature columns into a time period according to the daily dawn, midday and dusk times at the outdoor empirical test field in order to mine implicit intraday changes in meteorological features.
Specifically, the label column cleaning module deletes the data set record rows containing missing values in the label column, so that the label column contains no missing values.
Specifically, the feature column abnormal value cleaning module contains an extreme value screening submodule, a logic screening submodule and an abnormal value processing submodule, wherein:
The extreme value screening submodule sets a reasonable numerical range for the data of each feature column, for example for the feature columns of the outdoor empirical data set of the Hainan outdoor empirical test field, computes statistics for each feature column, and checks whether its maximum and minimum values fall within the set range.
The logic screening submodule performs feature analysis on the feature columns, screens out the feature columns that have a subordinate or dependent relationship, such as total solar irradiance and ultraviolet irradiance, or total solar irradiance and direct-current generated power, and checks whether the data of those columns contain illogical values.
The abnormal value processing submodule sets an abnormal proportion critical value M%, where M is greater than 0. A feature column whose proportion of abnormal values is below the critical value is a low-abnormality feature column; otherwise it is a high-abnormality feature column. The data set record rows containing abnormal values of a low-abnormality feature column are deleted. The abnormal values of a high-abnormality feature column are corrected, on the premise that the accuracy of the original data is ensured, so that they satisfy extreme value screening and logic screening.
Specifically, the feature column missing value processing module sets a missingness critical value N%, where N is greater than 0. A feature column whose proportion of missing values is below the critical value is a low-missingness feature column; otherwise it is a high-missingness feature column. The data set record rows containing missing values of a low-missingness feature column are deleted. The missing values of a high-missingness feature column are filled, and the filled values are returned to the feature column abnormal value cleaning module for inspection and screening.
The data segmentation and coding module shuffles the outdoor empirical data set processed by the data value preprocessing module, splits it into a training set, a validation set and a test set, and encodes the discrete features and normalizes the continuous features of the training set.
The data mining modeling module mines the internal relation between the feature columns and the label column of the training set based on data mining technology and generates a prediction model of the label column based on the feature columns. The prediction model is applied to the validation set to verify its robustness and stability; the verified prediction model is sent to the model library storage module, and the data extraction module is activated to extract the next group of photovoltaic strings.
Specifically, the XGBoost algorithm is used to mine the internal relation between the feature columns and the label column of the training set and to generate the prediction model of the label column based on the feature columns, and the prediction model is applied to the validation set to verify its robustness and stability.
The model library storage module stores all prediction models generated by the data mining modeling module.
Fig. 5 is a schematic diagram comparing the real and predicted values of generated power at 100 randomly selected points of one photovoltaic string on the test set. The comparison shows that the XGBoost-based prediction model of the tested photovoltaic string in the outdoor empirical database performs well on the test set data, that is, the predicted generated power is very close to the real value.
Referring to fig. 6, the same meteorological monitoring data is substituted into the prediction models of all the photovoltaic strings in the prediction model library, so that the DC generated power results of all the photovoltaic strings are fitted to uniform meteorological conditions, which makes it possible to compare the DC generated power test results of the photovoltaic strings in the Hainan outdoor empirical database with one another.
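The comparison under uniform meteorological conditions can be sketched as below. The sketch assumes that the prediction model library is a dictionary mapping a string identifier to a callable model, and that the comparison metric is the mean predicted DC power; both the function name and these choices are hypothetical and are not specified in the text.

```python
def compare_under_common_weather(model_library, weather_rows):
    """Feed identical weather records to every string's model and rank the strings.

    model_library: dict of string id -> callable(weather_row) -> predicted DC power.
    Returns (string id, mean predicted power) pairs, highest first.
    """
    means = {}
    for string_id, model in model_library.items():
        predictions = [model(row) for row in weather_rows]
        means[string_id] = sum(predictions) / len(predictions)
    return sorted(means.items(), key=lambda item: item[1], reverse=True)
```

Because every model receives exactly the same weather records, differences in the ranking reflect the strings themselves rather than the conditions under which each string happened to be measured.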
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be considered as within the scope of the present invention.

Claims (11)

1. An outdoor empirical prediction model library generation method, the method comprising:
step one, collecting and recording the empirical test result data and meteorological monitoring data of all tested photovoltaic strings at an outdoor empirical test field, and establishing an outdoor empirical database;
step two, selecting one group of photovoltaic strings in the outdoor empirical database and, based on the empirical test item to be compared, extracting the meteorological monitoring data and the empirical test result data corresponding to that test item to form an outdoor empirical data set of the photovoltaic string,
the meteorological monitoring data is used as a characteristic column of the outdoor demonstration data set, and the demonstration test result data is used as a label column of the outdoor demonstration data set;
step three, carrying out data cleaning and preprocessing on the outdoor empirical data set;
step four, shuffling the outdoor empirical data set processed in step three and then dividing it into a training set, a verification set and a test set, encoding the discrete features and normalizing the continuous features of the training set respectively, mining the internal relation between the feature columns and the label column of the training set, and generating a prediction model of the label column based on the feature columns,
wherein the data volume of the training set is greater than the sum of the data volumes of the validation set and the test set;
step five, evaluating the accuracy of the prediction model generated in step four on the test set data, and storing the prediction model if the accuracy is higher than a preset critical value; if the accuracy is lower than the preset critical value, optimizing the prediction model;
and step six, iterating the step two to the step five until the prediction models of all the photovoltaic strings are obtained, and summarizing all the prediction models to generate a prediction model library of outdoor demonstration test results.
2. The outdoor empirical prediction model library generating method of claim 1, wherein step three comprises performing feature conversion on the feature columns, cleaning the label column, cleaning abnormal values of the feature columns, and processing missing values of the feature columns.
3. The outdoor empirical prediction model library generating method of claim 2, wherein the feature conversion performed on the feature columns comprises solar term conversion and time period conversion of the date.
4. The outdoor empirical prediction model library generating method of claim 2, wherein the label column is cleaned by deleting the data set record rows in which the missing values of the label column are located, so that no missing value remains in the label column.
5. The outdoor empirical prediction model library generation method of claim 2, wherein the abnormal values of the feature columns are cleaned, so as to reduce noise in the outdoor empirical data set of the photovoltaic string, by the following steps:
screening extreme values, setting a reasonable numerical range for the data of each characteristic column, performing data statistics on each characteristic column, and checking whether the maximum value and the minimum value of each characteristic column are within the set reasonable numerical range;
logic screening, namely performing characteristic analysis on each characteristic column, screening out the characteristic columns with subordination or dependency relationship, and checking whether data of the characteristic columns have an illogical value or not;
abnormal value processing, namely setting an abnormal proportion critical value M%, where M is greater than 0, a feature column whose abnormal proportion is below the critical value being a low-abnormality feature column and otherwise a high-abnormality feature column; deleting the record rows of the outdoor empirical data set where the abnormal values of a low-abnormality feature column are located; and, on the premise of ensuring the accuracy of the original data, modifying the abnormal values of a high-abnormality feature column so that they pass the extreme value screening and the logic screening.
6. The outdoor empirical prediction model library generation method of claim 2, wherein processing the missing values of the feature columns, so as to complete the outdoor empirical data set of the photovoltaic string, comprises:
classifying the feature columns by missing degree, namely setting a missing-degree critical value N%, where N is greater than 0, a feature column whose missing proportion is below the critical value being a low-missing-degree feature column and otherwise a high-missing-degree feature column;
deleting the record rows of the outdoor empirical data set where the missing values of a low-missing-degree feature column are located;
and filling the missing values of a high-missing-degree feature column, wherein the filled values need to be inspected and screened by the cleaning of abnormal values of the feature columns.
7. The outdoor empirical prediction model library generating method of any one of claims 1 to 6, wherein in the fourth step, after the discrete features and the continuous features of the training set are encoded and normalized, the discrete feature columns and the continuous feature columns of the verification set and the test set are encoded and normalized respectively by using an encoding and normalizing model generated based on the training set data.
8. The outdoor empirical prediction model library generation method of claim 7, wherein applying the prediction model to a validation set verifies robustness and stability of the generated prediction model; specifically, when the prediction model is trained by using the training set data, the trained prediction model is continuously applied to the verification set data to verify the robustness and stability of the prediction model.
9. An outdoor empirical prediction model library generation system, comprising:
the data acquisition module is used for acquiring meteorological monitoring data of an outdoor empirical test field and empirical test result data of all tested photovoltaic group strings;
the data storage module is used for summarizing meteorological monitoring data of the outdoor demonstration test field and demonstration test result data of all tested photovoltaic group strings and establishing an outdoor demonstration database;
the data extraction module is used for selecting a group of photovoltaic string in the outdoor evidence database, extracting meteorological monitoring data and evidence test result data corresponding to an evidence tested project based on the corresponding evidence tested project for realizing comparison, generating an outdoor evidence data set, taking the meteorological monitoring data as a characteristic column of the outdoor evidence data set, and taking the evidence test result data as a label column of the outdoor evidence data set;
the data value preprocessing module is used for cleaning and preprocessing the data of the outdoor demonstration data set;
the data segmentation and coding module is used for disordering the outdoor empirical data set processed by the data value preprocessing module, segmenting the outdoor empirical data set into a training set, a verification set and a test set, and coding and normalizing discrete features and continuous features of the training set respectively;
the data mining modeling module is used for mining the internal relation between the feature columns and the label columns of the training set based on the data mining technology and generating a prediction model of the label columns based on the feature columns;
the model base storage module is used for storing all the prediction models generated by the data mining modeling module;
wherein the data mining modeling module applies the prediction model to the verification set for verification, sends the verified prediction model to the model base storage module, and activates the data extraction module to extract the next group of photovoltaic strings.
10. The system according to claim 9, wherein the data value preprocessing module comprises a feature conversion module for performing feature conversion on the feature columns, a label column cleaning module for cleaning the label column, a feature column abnormal value cleaning module for cleaning the abnormal values of the feature columns, and a feature column missing value processing module for processing the missing values of the feature columns.
11. The outdoor empirical prediction model library generation system of claim 10, wherein the feature column outliers cleaning module is embedded with an extremum screening submodule, a logic screening submodule, and an outliers processing submodule, wherein,
the extreme value screening submodule is used for setting a reasonable numerical range for the data of each characteristic column, performing data statistics on each characteristic column, and checking whether the maximum value and the minimum value of each characteristic column are in the set reasonable numerical range;
the logic screening submodule is used for carrying out characteristic analysis on each characteristic column, screening out the characteristic columns with subordination or dependency relationship, and checking whether the data of the characteristic columns have an illogical value or not;
the abnormal value processing submodule is used for setting an abnormal proportion critical value M%, where M is greater than 0, a feature column whose abnormal proportion is below the critical value being a low-abnormality feature column and otherwise a high-abnormality feature column; deleting the record rows of the outdoor empirical data set where the abnormal values of a low-abnormality feature column are located; and, on the premise of ensuring the accuracy of the original data, modifying the abnormal values of a high-abnormality feature column so that they pass the extreme value screening and the logic screening.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011048216.7A CN112364477B (en) 2020-09-29 2020-09-29 Outdoor empirical prediction model library generation method and system

Publications (2)

Publication Number Publication Date
CN112364477A (en) 2021-02-12
CN112364477B (en) 2022-12-06

Family

ID=74508317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011048216.7A Active CN112364477B (en) 2020-09-29 2020-09-29 Outdoor empirical prediction model library generation method and system

Country Status (1)

Country Link
CN (1) CN112364477B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023087569A1 (en) * 2021-11-17 2023-05-25 中国华能集团清洁能源技术研究院有限公司 Photovoltaic string communication abnormality identification method and system based on xgboost

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330566A (en) * 2017-07-19 2017-11-07 桑夏太阳能股份有限公司 The predictor method and system of photovoltaic array power output
CN107944604A (en) * 2017-11-10 2018-04-20 中国电力科学研究院有限公司 A kind of weather pattern recognition methods and device for photovoltaic power prediction
CN108197744A (en) * 2018-01-02 2018-06-22 华北电力大学(保定) A kind of determining method and system of photovoltaic generation power
CN108694484A (en) * 2018-08-30 2018-10-23 广东工业大学 A kind of photovoltaic power generation power prediction method
CN109711609A (en) * 2018-12-15 2019-05-03 福州大学 Photovoltaic plant output power predicting method based on wavelet transformation and extreme learning machine
CN109657881A (en) * 2019-01-14 2019-04-19 南京国电南自电网自动化有限公司 A kind of neural network photovoltaic power generation prediction technique and system suitable for small sample
CN109978258A (en) * 2019-03-26 2019-07-05 北京博望华科科技有限公司 Multi-data source method for forecasting photovoltaic power generation quantity and system based on machine learning
CN110084412A (en) * 2019-04-12 2019-08-02 重庆邮电大学 A kind of photovoltaic power generation big data prediction technique based on the study of Feature Conversion multi-tag
CN110516844A (en) * 2019-07-25 2019-11-29 太原理工大学 Multivariable based on EMD-PCA-LSTM inputs photovoltaic power forecasting method
CN110689161A (en) * 2019-08-09 2020-01-14 南京因泰莱电器股份有限公司 Method for realizing photovoltaic power generation power prediction model with reusability
CN110414748A (en) * 2019-08-12 2019-11-05 合肥阳光新能源科技有限公司 Photovoltaic power prediction technique
CN110543988A (en) * 2019-08-28 2019-12-06 上海电力大学 Photovoltaic short-term output prediction system and method based on XGboost algorithm
CN110796292A (en) * 2019-10-14 2020-02-14 国网辽宁省电力有限公司盘锦供电公司 Photovoltaic power short-term prediction method considering haze influence
CN110909919A (en) * 2019-11-07 2020-03-24 哈尔滨工程大学 Photovoltaic power prediction method of depth neural network model with attention mechanism fused
CN111612244A (en) * 2020-05-18 2020-09-01 南瑞集团有限公司 QRA-LSTM-based method for predicting nonparametric probability of photovoltaic power before day

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
曾湘安、揭敢新等: "不同类型晶硅光伏组件在湿热环境下的性能研究", 《环境试验》 *

Also Published As

Publication number Publication date
CN112364477B (en) 2022-12-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant