Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for detecting abnormal data of a photovoltaic power station, and an electronic device, so as to address the urgent need to detect abnormal data of distributed photovoltaic power stations.
In order to solve the above technical problem, the invention adopts the following technical solutions:
A method for detecting abnormal data of a photovoltaic power station comprises the following steps:
acquiring at least one piece of photovoltaic residual data;
calculating a density value and a distance value corresponding to each piece of photovoltaic residual data;
screening out the photovoltaic residual data whose corresponding density value and distance value meet a preset condition, and taking the screened-out photovoltaic residual data as a clustering center point of the at least one piece of photovoltaic residual data;
clustering the photovoltaic residual data based on the clustering center point to obtain a clustering result;
and determining the photovoltaic residual data that do not belong to any cluster in the clustering result as abnormal data.
Optionally, calculating a density value corresponding to the photovoltaic residual data includes:
acquiring a photovoltaic residual data threshold;
calculating, by using a formula, the density value $\rho_i$ corresponding to the photovoltaic residual data; wherein $i$ and $j$ are identifiers of the photovoltaic residual data; $d_{i,j}$ is the Euclidean distance between two pieces of photovoltaic residual data; and $d_c$ is the photovoltaic residual data threshold.
Optionally, the calculating a distance value corresponding to the photovoltaic residual data includes:
for each of the photovoltaic residual data, determining a set of photovoltaic residual data having a corresponding density value greater than a corresponding density value of the photovoltaic residual data;
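calculating the distance value $\sigma_i$ corresponding to the photovoltaic residual data according to
$$\sigma_i = \begin{cases} \min\limits_{j \in I_s} d_{i,j}, & I_s \neq \emptyset \\ \max\limits_{j} d_{i,j}, & I_s = \emptyset \end{cases}$$
wherein $I_s$ is the set of photovoltaic residual data whose corresponding density value is greater than the corresponding density value of the photovoltaic residual data.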
Optionally, the preset condition includes that the density value is greater than an average value of the density values corresponding to all the photovoltaic residual data, and the distance value is greater than an average value of the distance values corresponding to all the photovoltaic residual data.
Optionally, acquiring at least one photovoltaic residual data comprises:
acquiring actual operation data and predicted operation data of at least one power station;
and calculating the difference between the actual operation data and the predicted operation data corresponding to the same power station to obtain the photovoltaic residual data corresponding to that power station.
Optionally, obtaining predicted operating data of at least one power station comprises:
acquiring the predicted operation data at the previous data acquisition moment, a weight value, and the actual operation data collected at the previous data acquisition moment;
calculating the predicted operation data of the power station according to the formula $S_t = a\,y_{t-1} + (1-a)\,S_{t-1}$; wherein $a$ is the weight value; $y_{t-1}$ is the actual operation data collected at the previous data acquisition moment; and $S_{t-1}$ is the predicted operation data at the previous data acquisition moment.
An apparatus for detecting abnormal data of a photovoltaic power station comprises:
a data acquisition module, configured to acquire at least one piece of photovoltaic residual data;
a numerical calculation module, configured to calculate a density value and a distance value corresponding to each piece of photovoltaic residual data;
a data screening module, configured to screen out the photovoltaic residual data whose corresponding density value and distance value meet a preset condition, and to take the screened-out photovoltaic residual data as a clustering center point of the at least one piece of photovoltaic residual data;
a clustering module, configured to cluster the photovoltaic residual data based on the clustering center point to obtain a clustering result;
and an abnormal data determining module, configured to determine the photovoltaic residual data that do not belong to any cluster in the clustering result as abnormal data.
Optionally, when the numerical calculation module is configured to calculate the density value corresponding to the photovoltaic residual data, the numerical calculation module is specifically configured to:
acquire a photovoltaic residual data threshold, and calculate, by using a formula, the density value $\rho_i$ corresponding to the photovoltaic residual data; wherein $i$ and $j$ are identifiers of the photovoltaic residual data; $d_{i,j}$ is the Euclidean distance between two pieces of photovoltaic residual data; and $d_c$ is the photovoltaic residual data threshold.
Optionally, when the numerical calculation module is configured to calculate a distance value corresponding to the photovoltaic residual data, the numerical calculation module is specifically configured to:
for each piece of photovoltaic residual data, determine the set of photovoltaic residual data whose corresponding density value is greater than the corresponding density value of that piece, and calculate the distance value $\sigma_i$ corresponding to the photovoltaic residual data according to
$$\sigma_i = \begin{cases} \min\limits_{j \in I_s} d_{i,j}, & I_s \neq \emptyset \\ \max\limits_{j} d_{i,j}, & I_s = \emptyset \end{cases}$$
wherein $I_s$ is the set of photovoltaic residual data whose corresponding density value is greater than the corresponding density value of the photovoltaic residual data.
An electronic device, comprising: a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to call the program in order to perform the following:
acquiring at least one photovoltaic residual data;
calculating a density value and a distance value corresponding to the photovoltaic residual data;
screening out photovoltaic residual data of which the corresponding density values and distance values meet preset conditions, and taking the photovoltaic residual data as a clustering center point of at least one photovoltaic residual data;
clustering the photovoltaic residual data based on the clustering central point to obtain a clustering result;
and determining photovoltaic residual data which do not belong to any cluster in the clustering result as abnormal data.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a method and an apparatus for detecting abnormal data of a photovoltaic power station, and an electronic device. The clustering center point is determined from two dimensions of data, the density value and the distance value, so that the determined clustering center point is more accurate, the clustering result obtained by using the clustering center point is more accurate, and the determined abnormal data are in turn more accurate.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to monitor abnormal data in photovoltaic power stations at different locations, the inventor found that abnormal data can be detected by a data anomaly detection model based on cubic exponential smoothing and DBSCAN. The data anomaly detection model mainly comprises a cubic exponential smoothing model and the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm. The cubic exponential smoothing model performs time-series modeling on the input power consumption data sequence and predicts moment by moment, obtaining a predicted power consumption value for each moment. The DBSCAN clustering algorithm is then used to perform cluster analysis on the residual terms between the true values and the predicted values of the power consumption data, thereby detecting abnormal data points. Specifically, a clustering center point is determined manually, based on experience, with reference to the residual terms between the true values and the predicted values; clustering is then performed by using the clustering center point to obtain a clustering result; and the photovoltaic residual data that do not belong to any cluster in the clustering result are determined as abnormal data.
However, the inventor found that the above DBSCAN-based anomaly detection technique determines the clustering center point manually. If the manually determined clustering center point is inaccurate, then in scenarios where the density of the data clusters is non-uniform and the spacing between clusters varies greatly, it is difficult to select the minimum number of contained points MinPts and the scanning radius Eps on the basis of that inaccurate clustering center point. As a result, the clustering result is inaccurate, that is, the clustering quality is poor, and the abnormal data screened out are in turn inaccurate. If this technique is applied to power data anomaly detection, abnormal data cannot be detected and distinguished quickly and accurately, and the improvement in data processing accuracy and real-time performance is not significant.
In order to solve the problem that manually determining the clustering center point may lead to a wrongly selected clustering center point, and thus to clustering failure and inaccurate detection of abnormal data, the inventor proposes determining the clustering center point and screening abnormal data based on the CFSFDP (Clustering by Fast Search and Find of Density Peaks) algorithm, a clustering algorithm based on density peaks. When the clustering center point is determined, two influencing factors, density and distance, are considered together: the density factor allows similar data to be clustered together, and the distance factor ensures that two clustering center points are far enough apart, so that the difference between the two clusters is larger and the accuracy of deciding which cluster a piece of data should fall into is improved. The clustering center point can therefore be determined more quickly and accurately, and clustering is then performed based on the clustering center point to obtain the abnormal data, so that the determined abnormal data are more accurate.
Specifically, referring to fig. 1, the abnormal data detection method of the photovoltaic power station may include:
S11: acquiring at least one piece of photovoltaic residual data.
Distributed photovoltaic power stations are widely distributed; for example, one photovoltaic power station may be located at each of sites A, B, C, D, E, F and so on. The abnormal data detection method in the embodiment of the invention is provided in order to determine which of the data monitored by the photovoltaic power stations at the same moment are abnormal. In the present embodiment, the case in which irradiance and ambient temperature are collected simultaneously is described.
Assuming there are 23 photovoltaic power stations, each station collects the ambient temperature and the irradiance at the current moment, and the two values collected by each station form a vector. The 23 photovoltaic power stations thus correspond to 23 vectors, and each vector is a piece of actual operation data.
After the actual operation data are collected, data cleaning still needs to be performed on them to remove dirty data. Data cleaning may include:
The first step: missing value cleaning, which comprises confirming the range of the missing values, removing unnecessary fields, filling in the missing content, and re-fetching the data.
The second step: format and content cleaning, which comprises two stages: adjusting inconsistent display formats, and resolving inconsistencies between the content and the field in which it is stored.
The third step: logical error cleaning, which comprises three stages: removing duplicates, removing unreasonable values, and correcting contradictory content.
The fourth step: relevance verification, that is, cross-checking a plurality of data sources so that the data from the plurality of data sources do not contradict one another.
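Part of the cleaning steps above could be sketched as follows. This is only a minimal illustration under stated assumptions: the record layout (`station`, `temperature`, `irradiance`), the function name `clean_records`, and the plausibility ranges are hypothetical and are not taken from the embodiment. Records with missing values are simply dropped here, whereas the embodiment also allows filling or re-fetching them, and relevance verification across data sources is not shown.

```python
def clean_records(records, temp_range=(-50.0, 70.0), irr_range=(0.0, 1500.0)):
    """Drop incomplete, malformed, implausible, and duplicate station records.

    The field names and the two ranges are illustrative assumptions.
    """
    seen_stations = set()
    cleaned = []
    for rec in records:
        station = rec.get("station")
        temp = rec.get("temperature")
        irr = rec.get("irradiance")
        # Missing value cleaning: skip records that lack a required field.
        if station is None or temp is None or irr is None:
            continue
        # Format cleaning: normalise numeric fields to float.
        try:
            temp, irr = float(temp), float(irr)
        except (TypeError, ValueError):
            continue
        # Logical error cleaning: remove unreasonable values.
        if not (temp_range[0] <= temp <= temp_range[1]):
            continue
        if not (irr_range[0] <= irr <= irr_range[1]):
            continue
        # Logical error cleaning: remove duplicates (one record per station).
        if station in seen_stations:
            continue
        seen_stations.add(station)
        cleaned.append({"station": station, "temperature": temp, "irradiance": irr})
    return cleaned


# Hypothetical raw readings for one acquisition moment.
raw = [
    {"station": "A", "temperature": "25.1", "irradiance": 810},
    {"station": "A", "temperature": "25.1", "irradiance": 810},  # duplicate
    {"station": "B", "temperature": None, "irradiance": 790},    # missing value
    {"station": "C", "temperature": 24.8, "irradiance": -5},     # unreasonable
    {"station": "D", "temperature": 26.0, "irradiance": 805},
]
cleaned = clean_records(raw)
print([r["station"] for r in cleaned])
```

With these sample readings only the records of stations A and D remain.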
After the actual operation data are acquired, the predicted operation data at the current moment also need to be determined. The predicted operation data may be calculated by using an exponential smoothing algorithm, and the predicted operation data $S_t$ are calculated as follows:
$S_t = a\,y_{t-1} + (1-a)\,S_{t-1}$; wherein $a$ is the weight value; $y_{t-1}$ is the actual operation data at the previous data acquisition moment; and $S_{t-1}$ is the predicted operation data at the previous data acquisition moment.
The predicted operation data of each power station at the current moment can be determined through the formula, and the predicted operation data of 23 power stations at the current moment can be obtained.
It should be noted that, if data are collected at a fixed interval, such as every 5 seconds, 10 seconds, or 1 minute, the previous data acquisition moment is the acquisition moment immediately preceding the current moment; for example, if the fixed interval is 5 seconds, the previous data acquisition moment is 5 seconds earlier. The weight value a is set by a technician according to the specific data detection scenario and is not limited to a particular numerical value.
If the actual operation data contain two quantities, such as the ambient temperature and the irradiance, the predicted ambient temperature and the predicted irradiance are calculated separately according to the above formula, and the two predicted values then form a vector that serves as the predicted operation data.
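As an illustration, one smoothing step applied separately to each component might look like the sketch below. The function name `predict_next`, the sample readings, and the weight 0.5 are hypothetical; restricting the weight to the interval [0, 1] is the usual convention for exponential smoothing and is assumed here rather than stated in the embodiment.

```python
from typing import List, Sequence


def predict_next(a: float, y_prev: Sequence[float], s_prev: Sequence[float]) -> List[float]:
    """Compute S_t = a * y_{t-1} + (1 - a) * S_{t-1} for each component."""
    if not 0.0 <= a <= 1.0:
        # Assumption: the weight is a conventional smoothing factor in [0, 1].
        raise ValueError("weight a is assumed to lie in [0, 1]")
    if len(y_prev) != len(s_prev):
        raise ValueError("actual and predicted vectors must have the same length")
    return [a * y + (1.0 - a) * s for y, s in zip(y_prev, s_prev)]


# Hypothetical vectors: (ambient temperature, irradiance) at the previous moment.
y_prev = [25.0, 800.0]   # actual operation data y_{t-1}
s_prev = [24.0, 780.0]   # predicted operation data S_{t-1}
s_now = predict_next(0.5, y_prev, s_prev)
print(s_now)  # [24.5, 790.0]
```

Each station would run this step once per acquisition moment, feeding the result back in as `s_prev` for the next moment.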
After the actual operation data and the predicted operation data corresponding to each power station are obtained, the difference between the actual operation data and the predicted operation data corresponding to the same power station is calculated, and the absolute value of the difference is taken as the final photovoltaic residual data corresponding to that power station.
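The residual computation can be sketched in a few lines. The function name `residual` and the sample values are hypothetical; the absolute difference is taken per component, which is one reading of "the absolute value of the difference" for vector-valued data.

```python
def residual(actual, predicted):
    """Component-wise absolute difference between actual and predicted data."""
    if len(actual) != len(predicted):
        raise ValueError("vectors must have the same length")
    return [abs(a - p) for a, p in zip(actual, predicted)]


# Hypothetical (ambient temperature, irradiance) vectors for one station.
actual = [26.0, 760.0]
predicted = [24.5, 790.0]
res = residual(actual, predicted)
print(res)  # [1.5, 30.0]
```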
If the number of power stations is 23, 23 pieces of photovoltaic residual data, that is, 23 photovoltaic residual vectors, are obtained; a schematic diagram of the photovoltaic residual data is shown in fig. 2. In fig. 2, the abscissa and the ordinate may represent the temperature difference and the illumination difference. There are 23 circles, each representing one piece of photovoltaic residual data.
As can be seen from fig. 2, the photovoltaic residual data at point 23 lie far from the other photovoltaic residual data and are therefore very likely to be abnormal data.
S12: determining the clustering center point of the photovoltaic residual data according to the photovoltaic residual data.
S13: clustering the photovoltaic residual data based on the clustering center point to obtain a clustering result.
The clustering center point is the center point used in data clustering. After a plurality of pieces of photovoltaic residual data are obtained, clustering is performed by using the clustering center point, so that a plurality of clusters are obtained; these clusters constitute the clustering result.
Clustering may be performed by using an algorithm such as the DBSCAN algorithm, the K-means algorithm, or an improved KNN algorithm. Taking the improved KNN algorithm as an example, a corresponding density threshold (also referred to as the scanning radius) Eps is set and the photovoltaic residual data are clustered, whereby the clustering result is obtained.
S14: determining the photovoltaic residual data that do not belong to any cluster in the clustering result as abnormal data.
In practical applications, the photovoltaic residual data that do not belong to any cluster are the abnormal data. After the abnormal data are obtained, they are pushed to a monitoring system so that photovoltaic operation and maintenance personnel can review them, make decisions, and handle faults in time.
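Steps S13 and S14 could be illustrated with the simplified stand-in below. It is not DBSCAN, K-means, or the improved KNN algorithm named in the embodiment: as an assumption made only for this sketch, each piece of residual data joins its nearest clustering center point if it lies within the scanning radius Eps of that center, and is otherwise left without a cluster and reported as abnormal. The function name `cluster_and_flag` and the sample points are hypothetical.

```python
import math


def cluster_and_flag(points, center_ids, eps):
    """Assign each point to its nearest center within eps; others are anomalies.

    Returns (labels, anomalies): labels[i] is the index of the center that
    point i belongs to, or None; anomalies lists the unassigned indices.
    """
    if not center_ids:
        raise ValueError("at least one clustering center point is required")
    labels = []
    for p in points:
        nearest = min(center_ids, key=lambda c: math.dist(p, points[c]))
        within = math.dist(p, points[nearest]) <= eps
        labels.append(nearest if within else None)
    anomalies = [i for i, label in enumerate(labels) if label is None]
    return labels, anomalies


# Hypothetical residual vectors; points 0 and 3 play the role of centers.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
          (5.0, 5.0), (5.5, 5.0), (20.0, 20.0)]
labels, anomalies = cluster_and_flag(points, center_ids=[0, 3], eps=1.0)
print(labels)     # [0, 0, 0, 3, 3, None]
print(anomalies)  # [5]
```

The last point lies far from both centers, so it belongs to no cluster and is the one that would be pushed to the monitoring system.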
In this embodiment, after the photovoltaic residual data corresponding to the power stations are obtained, the clustering center point of the photovoltaic residual data is determined from the photovoltaic residual data, the photovoltaic residual data are clustered based on the clustering center point to obtain a clustering result, and the photovoltaic residual data that do not belong to any cluster in the clustering result are determined as abnormal data. Furthermore, because the clustering center point is determined from two dimensions of data, the density value and the distance value, the determined clustering center point is more accurate, the clustering result obtained by using it is more accurate, and the determined abnormal data are in turn more accurate.
The step of "determining the clustering center point of the photovoltaic residual data according to the photovoltaic residual data" mentioned above is now described in detail. Specifically, referring to fig. 3, step S12 may include:
S21: calculating the density value and the distance value corresponding to the photovoltaic residual data.
In this embodiment, a CFSFDP algorithm is used to determine the clustering center point, and when determining the clustering center point, a density value and a distance value corresponding to each photovoltaic residual data need to be determined.
In practical applications, the density value of a piece of photovoltaic residual data is related to the Euclidean distances between that piece and all the other pieces of photovoltaic residual data. Specifically, the density value $\rho_i$ is calculated by a formula in which $i$ and $j$ are identifiers of the photovoltaic residual data, that is, they indicate which specific piece of photovoltaic residual data is meant; $d_c$ is the photovoltaic residual data threshold; and $d_{i,j}$ is the Euclidean distance between two pieces of photovoltaic residual data.
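One possible way to compute the density value is sketched below. It assumes the cut-off form commonly used with the CFSFDP algorithm, in which the density value of a piece of residual data is the number of other pieces whose Euclidean distance to it is smaller than the threshold $d_c$; this is an assumption, and other kernels (for example a Gaussian kernel) would fit the same definitions of $d_{i,j}$ and $d_c$. The function name `density_values` and the sample points are hypothetical.

```python
import math


def density_values(points, d_c):
    """Count, for each point i, the other points j with d_ij < d_c.

    Assumption: cut-off kernel, rho_i = sum over j != i of chi(d_ij - d_c),
    where chi(x) = 1 if x < 0 and 0 otherwise.
    """
    n = len(points)
    rho = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) < d_c:
                rho[i] += 1
    return rho


# Hypothetical residual vectors: three close together, one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
rho = density_values(points, d_c=2.0)
print(rho)  # [2, 2, 2, 0]
```

The isolated point receives a density value of 0, which is the low-density signature of an outlier.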
After the density value $\rho_i$ is determined, the distance value $\sigma_i$ of the photovoltaic residual data is determined. The distance value is related to the density value of the photovoltaic residual data and to the Euclidean distances between that piece and all the other pieces of photovoltaic residual data. Specifically,
$$\sigma_i = \begin{cases} \min\limits_{j \in I_s} d_{i,j}, & I_s \neq \emptyset \\ \max\limits_{j} d_{i,j}, & I_s = \emptyset \end{cases}$$
wherein, for a piece of photovoltaic residual data, $I_s$ is the set of photovoltaic residual data whose corresponding density value is greater than that of the piece in question. If $I_s$ is not empty, the minimum Euclidean distance between the piece and the photovoltaic residual data in $I_s$ is taken as $\sigma_i$; if $I_s$ is an empty set, the maximum Euclidean distance between the piece and all the photovoltaic residual data is taken as $\sigma_i$.
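The rule for the distance value can be sketched directly from the description above. The function name `distance_values` and the sample data are hypothetical; the density values are passed in as a precomputed list so that the sketch does not depend on any particular density formula.

```python
import math


def distance_values(points, rho):
    """Distance value per point: min distance to any higher-density point,
    or the max distance to all points when no higher-density point exists."""
    n = len(points)
    sigma = []
    for i in range(n):
        # I_s: indices of points whose density value exceeds that of point i.
        higher = [j for j in range(n) if rho[j] > rho[i]]
        if higher:
            sigma.append(min(math.dist(points[i], points[j]) for j in higher))
        else:
            sigma.append(max(math.dist(points[i], points[j]) for j in range(n)))
    return sigma


# Hypothetical residual vectors with precomputed density values.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
rho = [2, 1, 0]
sigma = distance_values(points, rho)
print(sigma)
```

Here point 0 has the highest density, so its distance value is its largest distance to any point (10), while points 1 and 2 each take the distance to their nearest higher-density neighbour (5 in both cases).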
S22: screening out the photovoltaic residual data whose corresponding density value and distance value meet the preset condition, and taking them as the clustering center points.
After the density value and the distance value of each piece of photovoltaic residual data are determined, it is judged whether the density value and the distance value corresponding to each piece meet the preset condition. The preset condition may be that the density value is greater than the average of the density values corresponding to all the photovoltaic residual data and the distance value is greater than the average of the distance values corresponding to all the photovoltaic residual data. That is, the photovoltaic residual data with both a larger density value and a larger distance value are screened out, and these data are the clustering center points.
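The preset condition described above can be sketched as a simple filter on the two lists of values. The function name `select_centers` and the sample values are hypothetical.

```python
def select_centers(rho, sigma):
    """Return indices whose density and distance values both exceed the mean."""
    if len(rho) != len(sigma) or not rho:
        raise ValueError("rho and sigma must be non-empty and of equal length")
    mean_rho = sum(rho) / len(rho)
    mean_sigma = sum(sigma) / len(sigma)
    return [i for i, (r, s) in enumerate(zip(rho, sigma))
            if r > mean_rho and s > mean_sigma]


# Hypothetical density and distance values for five pieces of residual data.
rho = [3, 5, 3, 4, 0]
sigma = [0.5, 6.0, 0.4, 5.0, 8.0]
centers = select_centers(rho, sigma)
print(centers)  # [1, 3]
```

In this sample the last item has the largest distance value but a density value of 0, so it is not selected as a center; a low density value combined with a large distance value is the pattern associated with abnormal data.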
Referring to fig. 4, after the density value and the distance value of each piece of photovoltaic residual data are obtained, a two-dimensional graph of ρ and σ may be constructed, and the points with larger density values and distance values, such as points 14 and 19, are screened out from the graph as the clustering center points. Note that point 23 in fig. 4 is close to the σ axis and far from the ρ axis, that is, it has a low density value and a large distance value, and this point is determined as an abnormal point. Point 23 can be screened out by the clustering algorithm; compared with existing anomaly detection algorithms, this algorithm has higher precision and a higher operation speed.
In this embodiment, the CFSFDP clustering algorithm based on density and distance is used. By analyzing two characteristics of an outlier, namely the low density around it and its large distance from the center point, the density factor allows similar data to be clustered together and the distance factor ensures that two clustering center points are far enough apart, so that the difference between the two clusters is larger and the accuracy of deciding which cluster a piece of data should fall into is improved. Outliers, that is, abnormal data, can thus be found quickly. Compared with a data anomaly detection algorithm based on density alone or on distance alone, the method on the one hand preserves the relevance among the original power data as far as possible and on the other hand reduces the dimensionality and complexity of the data, achieving accurate detection of abnormal data; the security of the power big data network is thereby safeguarded, the anomaly detection result is more accurate, and the operation is faster.
Optionally, on the basis of the above embodiment of the abnormal data detection method, another embodiment of the present invention provides an abnormal data detection apparatus for a photovoltaic power plant, and with reference to fig. 5, the abnormal data detection apparatus may include:
a data obtaining module 11, configured to obtain at least one photovoltaic residual data;
a numerical calculation module 12, configured to calculate a density value and a distance value corresponding to the photovoltaic residual data;
the data screening module 13 is configured to screen out photovoltaic residual data of which the corresponding density values and distance values meet preset conditions, and use the photovoltaic residual data as a clustering center point of at least one photovoltaic residual data;
the clustering module 14 is configured to cluster the photovoltaic residual data based on the clustering center point to obtain a clustering result;
and the abnormal data determining module 15 is configured to determine photovoltaic residual data, which do not belong to any cluster, in the clustering result as abnormal data.
Further, when the numerical calculation module is configured to calculate the density value corresponding to the photovoltaic residual data, the numerical calculation module is specifically configured to:
acquire a photovoltaic residual data threshold, and calculate, by using a formula, the density value $\rho_i$ corresponding to the photovoltaic residual data; wherein $i$ and $j$ are identifiers of the photovoltaic residual data; $d_{i,j}$ is the Euclidean distance between two pieces of photovoltaic residual data; and $d_c$ is the photovoltaic residual data threshold.
Further, when the numerical calculation module is configured to calculate a distance value corresponding to the photovoltaic residual data, the numerical calculation module is specifically configured to:
for each piece of photovoltaic residual data, determine the set of photovoltaic residual data whose corresponding density value is greater than the corresponding density value of that piece, and calculate the distance value $\sigma_i$ corresponding to the photovoltaic residual data according to
$$\sigma_i = \begin{cases} \min\limits_{j \in I_s} d_{i,j}, & I_s \neq \emptyset \\ \max\limits_{j} d_{i,j}, & I_s = \emptyset \end{cases}$$
wherein $I_s$ is the set of photovoltaic residual data whose corresponding density value is greater than the corresponding density value of the photovoltaic residual data.
Further, the preset conditions include that the density value is greater than an average value of the density values corresponding to all the photovoltaic residual data, and the distance value is greater than an average value of the distance values corresponding to all the photovoltaic residual data.
Further, when the data obtaining module is configured to obtain at least one photovoltaic residual data, the data obtaining module is specifically configured to: acquiring actual operation data and predicted operation data of at least one power station, and subtracting the actual operation data and the predicted operation data corresponding to the same power station to obtain photovoltaic residual data corresponding to the power station.
Further, obtaining predicted operational data for at least one power station, comprising:
acquiring the predicted operation data at the previous data acquisition moment, a weight value, and the actual operation data collected at the previous data acquisition moment, and calculating the predicted operation data of the power station according to the formula $S_t = a\,y_{t-1} + (1-a)\,S_{t-1}$; wherein $a$ is the weight value; $y_{t-1}$ is the actual operation data collected at the previous data acquisition moment; and $S_{t-1}$ is the predicted operation data at the previous data acquisition moment.
In this embodiment, after the photovoltaic residual data corresponding to the power stations are obtained, the clustering center point of the photovoltaic residual data is determined from the photovoltaic residual data, the photovoltaic residual data are clustered based on the clustering center point to obtain a clustering result, and the photovoltaic residual data that do not belong to any cluster in the clustering result are determined as abnormal data. Furthermore, because the clustering center point is determined from two dimensions of data, the density value and the distance value, the determined clustering center point is more accurate, the clustering result obtained by using it is more accurate, and the determined abnormal data are in turn more accurate.
In addition, the CFSFDP clustering algorithm based on density and distance is used. By analyzing two characteristics of an outlier, namely the low density around it and its large distance from the center point, the density factor allows similar data to be clustered together and the distance factor ensures that two clustering center points are far enough apart, so that the difference between the two clusters is larger and the accuracy of deciding which cluster a piece of data should fall into is improved. Outliers, that is, abnormal data, can thus be found quickly. Compared with a data anomaly detection algorithm based on density alone or on distance alone, the apparatus on the one hand preserves the relevance among the original power data as far as possible and on the other hand reduces the dimensionality and complexity of the data, achieving accurate detection of abnormal data; the security of the power big data network is thereby safeguarded, the anomaly detection result is more accurate, and the operation is faster.
It should be noted that, for the working process of each module in this embodiment, please refer to the corresponding description in the above embodiments, which is not described herein again.
Optionally, on the basis of the above embodiments of the abnormal data detection method and apparatus, another embodiment of the present invention provides an electronic device, including: a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to call the program in order to perform the following:
acquiring at least one photovoltaic residual data;
calculating a density value and a distance value corresponding to the photovoltaic residual data;
screening out photovoltaic residual data of which the corresponding density values and distance values meet preset conditions, and taking the photovoltaic residual data as a clustering center point of at least one photovoltaic residual data;
clustering the photovoltaic residual data based on the clustering central point to obtain a clustering result;
and determining photovoltaic residual data which do not belong to any cluster in the clustering result as abnormal data.
In this embodiment, after the photovoltaic residual data corresponding to the power stations are obtained, the clustering center point of the photovoltaic residual data is determined from the photovoltaic residual data, the photovoltaic residual data are clustered based on the clustering center point to obtain a clustering result, and the photovoltaic residual data that do not belong to any cluster in the clustering result are determined as abnormal data. Furthermore, because the clustering center point is determined from two dimensions of data, the density value and the distance value, the determined clustering center point is more accurate, the clustering result obtained by using it is more accurate, and the determined abnormal data are in turn more accurate.
In addition, the CFSFDP clustering algorithm based on density and distance is used. By analyzing two characteristics of an outlier, namely the low density around it and its large distance from the center point, the density factor allows similar data to be clustered together and the distance factor ensures that two clustering center points are far enough apart, so that the difference between the two clusters is larger and the accuracy of deciding which cluster a piece of data should fall into is improved. Outliers, that is, abnormal data, can thus be found quickly. Compared with a data anomaly detection algorithm based on density alone or on distance alone, the electronic device on the one hand preserves the relevance among the original power data as far as possible and on the other hand reduces the dimensionality and complexity of the data, achieving accurate detection of abnormal data; the security of the power big data network is thereby safeguarded, the anomaly detection result is more accurate, and the operation is faster.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.