CN112085621B

CN112085621B - Distributed photovoltaic power station fault early warning algorithm based on K-Means-HMM model

Info

Publication number: CN112085621B
Application number: CN202010952635.7A
Authority: CN
Inventors: Liang Huafeng (梁华锋); Chen Hao (陈昊); Lin Chong (林翀)
Original assignee: Hangzhou Huadian Xiasha Thermoelectricity Co ltd
Current assignee: Hangzhou Huadian Xiasha Thermoelectricity Co ltd
Priority date: 2020-09-11
Filing date: 2020-09-11
Publication date: 2022-08-02
Anticipated expiration: 2040-09-11
Also published as: CN112085621A

Abstract

The invention relates to the technical field of photovoltaic power generation, and in particular to a fault early-warning algorithm for distributed photovoltaic power stations based on a K-Means-HMM model. The algorithm comprises the following steps. First, N production records are collected and the M records whose generated power exceeds 1 kW are screened from them; the generated power is discretized into levels, and K-Means clustering is performed on the two dimensions of output current and output voltage. Second, HMMs are trained with the clustered working-state sample data of the photovoltaic power station. Third, real-time production data of a photovoltaic module are collected and their clustering result is computed in the same two dimensions of output current and output voltage. Finally, the clustering result is input into the trained HMMs to obtain an early-warning result. The method discretizes the generated power of the photovoltaic module by power interval, uses K-Means to group output current and output voltage into categories, and exploits the ability of HMMs to learn and predict time series to learn the normal state and several fault states, so that the working state of the module can be judged and faults can be warned about in advance.

Description

Distributed photovoltaic power station fault early warning algorithm based on K-Means-HMM model

Technical Field

The invention relates to the technical field of photovoltaic power generation, and in particular to a fault early-warning algorithm for distributed photovoltaic power stations based on a K-Means-HMM model.

Background

With the spread of photovoltaic power generation in China, the high failure rate of photovoltaic panels has become increasingly apparent. Because panels operate outdoors for long periods, they inevitably suffer damage and aging, which lead to faults such as hot spots and cracks; these faults disturb stable operation and shorten the service life of the panels. Fault diagnosis of photovoltaic modules has therefore become a key topic in the industry.

Common defects of photovoltaic modules include hot spots, glass cracking and snail trails ("lightning striations"). Hot spots arise when dust, fallen leaves or other obstructions in the outdoor environment, or internal defects in a cell, cause increased local power dissipation and hence a local rise in temperature. Glass cracking is breakage of the cover glass caused by falling objects or other external forces. The exact cause of snail trails is not clear, but studies indicate that they can reduce output power.

The conventional method for detecting faults in photovoltaic modules is for a tester to measure the current and voltage of a module with an I-V curve tester and for a specialist to analyze the data. This method has several drawbacks: it is hard to carry out at scale, because a power plant contains a very large number of modules and only selected units can be checked; it cannot monitor all equipment online in real time; and it relies on expert knowledge, placing high demands on the analyst's expertise.

Photovoltaic power plants now have mature information systems that collect and store real-time production data for each module, which has made data-driven fault diagnosis of photovoltaic modules popular. However, most current diagnosis algorithms are developed by academic institutions in test environments, some even using simulation software to generate test data so that large numbers of fault samples are available; these conditions differ greatly from those of a real photovoltaic power station. Moreover, such algorithms analyze only a single point in time on each run and raise an alarm only once a fault has occurred, so the information in the time dimension is not effectively used.

A search of the prior art found a fault early-warning algorithm for distributed photovoltaic power stations based on a hidden Markov model (CN 111124840A), whose data preprocessing covers only missing-value handling and data encoding. That document does not explain how the encoding is performed, so it is unknown whether its encoding scheme supports accurate model prediction. Extensive manual encoding or unreasonable encoding rules would hinder sample expansion and dynamic model updating; the resulting early-warning model would be hard to maintain and extend, and its prediction accuracy could not be guaranteed.

Disclosure of Invention

The aim of the invention is to provide a fault early-warning algorithm for distributed photovoltaic power stations based on a K-Means-HMM model, so as to solve the problems described in the Background.

To achieve this aim, the invention provides the following technical solution: a fault early-warning algorithm for distributed photovoltaic power stations based on a K-Means-HMM model, comprising the following steps:

S1, collecting N production records, screening from them the M records whose generated power exceeds 1 kW, discretizing the generated power into levels, and performing K-Means clustering on the two dimensions of output current and output voltage;

S2, training HMMs with the clustered working-state sample data of the photovoltaic power station;

S3, collecting real-time production data of a photovoltaic module and computing its clustering result in the two dimensions of output current and output voltage;

and S4, inputting the clustered real-time production data of the photovoltaic module into the trained HMMs to obtain an early-warning result.

Preferably, in step S1, N production records covering one year are collected, each containing five parameters: module number, time, output current, output voltage and generated power. The M records whose generated power exceeds 1 kW are screened from the N records; the screened power values range from a minimum of 1 kW to a maximum of 14.7 kW. Let P be the generated power of a record: if n ≤ P < n + 1 with 1 ≤ n ≤ 13, the integer n is taken as the power level P of that record and used as the value of the HMM state sequence.
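As a minimal sketch (Python, standard library only), the screening and power-level discretization could look like the following. The function name `power_level` is hypothetical and power is assumed to be in kW. Clamping readings of 14 kW and above to level 13 is an assumption: the text allows at most n = 13 while the screened data reach 14.7 kW, and it does not say how that top interval is handled.

```python
import math


def power_level(p_kw: float, n_max: int = 13) -> int:
    """Power level n with n <= P < n + 1, for a generated power P in kW.

    Records below 1 kW are expected to have been screened out already.
    ASSUMPTION: the text caps n at 13 while screened data reach 14.7 kW,
    so readings of 14 kW and above are clamped to level n_max here.
    """
    if p_kw < 1.0:
        raise ValueError("records below 1 kW should have been screened out")
    return min(math.floor(p_kw), n_max)


# Illustrative readings in kW (not plant data): screen, then discretize.
readings = [0.4, 1.0, 3.7, 12.99, 14.7]
levels = [power_level(p) for p in readings if p >= 1.0]  # keeps the 1 kW minimum
print(levels)  # [1, 3, 12, 13]
```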

Preferably, the K-Means clustering on the two dimensions of output current and output voltage in step S1 is performed as follows:

the output current and output voltage of the M records are normalized; treating them as an M × 2 matrix, the normalization formula is:

$$x'_{ij} = \frac{x_{ij} - \min_j}{\max_j - \min_j}$$

wherein:

$i$ and $j$ are the row and column indices of the matrix, respectively,

$x'_{ij}$ is the matrix element at position $(i, j)$ after normalization,

$x_{ij}$ is the matrix element at position $(i, j)$ before normalization,

$\min_j$ is the smallest element in column $j$,

$\max_j$ is the largest element in column $j$;
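A minimal pure-Python sketch of this column-wise min-max normalization follows; `fit_min_max` and `apply_min_max` are hypothetical names. It assumes each row is one record's (output current, output voltage) pair, and it maps a constant column to 0 to avoid dividing by zero, a case the text does not address.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def fit_min_max(rows: Sequence[Row]) -> List[Tuple[float, float]]:
    """Per-column (min_j, max_j) of an M x 2 matrix of (current, voltage) rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows: Sequence[Row], bounds: Sequence[Tuple[float, float]]) -> List[Row]:
    """x'_ij = (x_ij - min_j) / (max_j - min_j); a constant column maps to 0."""
    out = []
    for row in rows:
        out.append(tuple(
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, (lo, hi) in zip(row, bounds)
        ))
    return out


# Illustrative (current in A, voltage in V) rows, not measured data.
rows = [(2.0, 300.0), (4.0, 400.0), (6.0, 500.0)]
bounds = fit_min_max(rows)              # [(2.0, 6.0), (300.0, 500.0)]
normalized = apply_min_max(rows, bounds)
```

Keeping `bounds` separate from the scaling step lets the same training extremes be reused on real-time readings later; whether the method reuses the training extremes or recomputes them is not stated, so that reuse is an assumption.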

K-Means clustering is then performed on the normalized output current and output voltage data with K = 20, giving a clustering result C for each of the M records; this clustering result is used as the value of the HMM observation sequence.
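The clustering step can be sketched with a plain Lloyd's-algorithm K-Means, written without third-party libraries so it stays self-contained (in practice a library implementation such as scikit-learn's `KMeans(n_clusters=20)` would normally be used). The farthest-first initialization, the iteration cap and the function names are assumptions; the text fixes only K = 20 and that the cluster index becomes the observation value C.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Sequence[float], centroids: Sequence[Point]) -> int:
    """Index of the closest centroid under squared Euclidean distance."""
    return min(range(len(centroids)), key=lambda c: _sq_dist(point, centroids[c]))


def kmeans(points: Sequence[Point], k: int = 20, iters: int = 100, seed: int = 0):
    """Lloyd's algorithm on normalized (current, voltage) rows.

    Returns (centroids, labels) with labels in 0..k-1; the labels play the role of C.
    ASSUMPTION: farthest-first initialization (random first centre, then the row
    farthest from all chosen centres); the text does not specify one.
    """
    rng = random.Random(seed)
    centroids: List[Point] = [tuple(points[rng.randrange(len(points))])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # an empty cluster keeps its old centre
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]  # labels for the final centres
    return centroids, labels


# Two well-separated toy groups in normalized space (illustrative only).
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
centroids, labels = kmeans(pts, k=2)
```

The returned centroids matter as much as the labels: they are what later real-time readings would be compared against.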

Preferably, the training samples in step S2 are selected from one year of fault records of the photovoltaic power station: from January to October of that year, 32 normal-operation time points and the corresponding module numbers are selected as normal-state labels of the training set, and, for each of the four faults of shading, lead short circuit, lead open circuit and cell aging, 8 time points and the corresponding module numbers are selected as fault-state labels of the training set;

from November to December of the same year, 8 normal-operation time points and the corresponding module numbers are selected as normal-state labels of the test set, and, for each of the four faults, 2 time points and the corresponding module numbers are selected as fault-state labels of the test set.

Preferably, for each normal-state label obtained in step S2, in both the training set and the test set, the corresponding records are looked up among the M records by time point and module number, and the clustering results C and power levels P from 300 minutes before the labeled time point up to that time point are selected. This yields 32 normal training samples and 8 normal test samples, each containing 300 C values and 300 P values;

for each fault-state label obtained in step S2, in both the training set and the test set, the corresponding records are looked up among the M records by time point and module number, and the C and P values from 360 minutes to 60 minutes before the labeled time point are selected. This yields 32 fault training samples and 8 fault test samples (8 and 2 per fault type, respectively), each containing 300 C values and 300 P values.
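A sketch of how the 300-value windows might be cut from one module's records. It assumes, hypothetically, that a module's records are held in a dict keyed by minute timestamps with (C, P) values, that there is exactly one record per minute (which is what 300 minutes yielding 300 values implies), and that the later endpoint of each window is excluded; the text does not say which endpoint is dropped.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical layout: one module's records keyed by minute, value = (C, P).
Series = Dict[datetime, Tuple[int, int]]


def window(series: Series, label_time: datetime,
           start_before: int, end_before: int) -> Tuple[List[int], List[int]]:
    """Collect (C values, P values) for minutes in [t - start_before, t - end_before).

    Normal sample: start_before=300, end_before=0.
    Fault sample:  start_before=360, end_before=60.
    ASSUMPTIONS: one record per minute, and the later endpoint is excluded so
    each window holds exactly start_before - end_before = 300 values.
    """
    cs: List[int] = []
    ps: List[int] = []
    for m in range(start_before, end_before, -1):
        t = label_time - timedelta(minutes=m)
        if t not in series:
            raise KeyError(f"no record at {t:%Y-%m-%d %H:%M}")
        c, p = series[t]
        cs.append(c)
        ps.append(p)
    return cs, ps


# Synthetic minute-level series for illustration only.
base = datetime(2019, 1, 1)
series = {base + timedelta(minutes=i): (i % 20, 1 + i % 13) for i in range(500)}
label = base + timedelta(minutes=400)
normal_c, normal_p = window(series, label, 300, 0)   # minutes 100..399
fault_c, fault_p = window(series, label, 360, 60)    # minutes 40..339
```

Ending fault windows an hour before the recorded fault is what lets the trained models recognize the run-up to a fault rather than the fault itself.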

Preferably, the HMMs in step S2 are trained as follows:

the state set has N = 13 states and the observation set has M = 20 values (here N and M denote the sizes of the state and observation sets, not the record counts above); for each training sample, the 300 P values form the state sequence and the 300 C values form the observation sequence, and the initial state probability vector $\pi$ has 13 elements, each equal to 1/13. For the samples of each of the five states in the training set (normal, shading, lead short circuit, lead open circuit and cell aging), one HMM is trained, giving five HMMs, according to the following algorithm:

since the state sequences are known, the transition probability matrix A and the observation probability matrix B of each HMM are estimated by supervised learning (frequency counting);

(1) Estimating the transition probability matrix A:

over all training samples, let $A_{ij}$ be the number of transitions of the state (P value) from i at time t to j at time t + 1;

the estimated probability of a transition from state i to state j is then:

$$\hat{a}_{ij} = \frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}, \quad i, j = 1, 2, \ldots, N$$

and the resulting matrix is the transition probability matrix A;

(2) Estimating the observation probability matrix B:

over all training samples, let $B_{jk}$ be the number of time steps at which the state (P value) is j and the observation (C value) is k;

the estimated probability of observing k in state j is then:

$$\hat{b}_j(k) = \frac{B_{jk}}{\sum_{k=1}^{M} B_{jk}}, \quad j = 1, 2, \ldots, N; \; k = 1, 2, \ldots, M$$

and the resulting matrix is the observation probability matrix B.
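The counting estimates translate directly into code. In this minimal sketch `estimate_hmm` (a hypothetical name) would be called once per class, that is five times, each time with that class's list of (P sequence, C sequence) samples. P levels 1 to 13 are mapped to row indices 0 to 12, and C is assumed to be a cluster index in 0 to 19. The uniform fallback for states never seen in training and the optional add-`alpha` smoothing are assumptions not found in the text; with `alpha = 0` the code reproduces the plain frequency ratios.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _normalize_rows(counts: Matrix, alpha: float = 0.0) -> Matrix:
    """Turn count rows into probability rows, optionally with add-alpha smoothing."""
    rows = []
    for row in counts:
        smoothed = [c + alpha for c in row]
        total = sum(smoothed)
        # ASSUMPTION: a state never seen in training gets a uniform row.
        rows.append([c / total for c in smoothed] if total > 0 else [1.0 / len(row)] * len(row))
    return rows


def estimate_hmm(samples: Sequence[Tuple[Sequence[int], Sequence[int]]],
                 n_states: int = 13, n_obs: int = 20, alpha: float = 0.0):
    """Supervised (counting) estimate of (pi, A, B) for one class.

    Each sample is (P values, C values): P in 1..n_states, C in 0..n_obs-1.
    """
    trans = [[0.0] * n_states for _ in range(n_states)]
    emit = [[0.0] * n_obs for _ in range(n_states)]
    for levels, clusters in samples:
        states = [p - 1 for p in levels]          # level n -> row index n - 1
        for s, s_next in zip(states, states[1:]):
            trans[s][s_next] += 1                 # A_ij: transitions i -> j
        for s, o in zip(states, clusters):
            emit[s][o] += 1                       # B_jk: state j emits k
    pi = [1.0 / n_states] * n_states              # uniform, 1/13 each by default
    return pi, _normalize_rows(trans, alpha), _normalize_rows(emit, alpha)


# Toy example with 2 states and 2 observation values (illustrative only).
toy = [([1, 1, 2], [0, 1, 1]), ([2, 1], [1, 0])]
pi, A, B = estimate_hmm(toy, n_states=2, n_obs=2)
# A == [[0.5, 0.5], [1.0, 0.0]]
```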

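Preferably, the prediction accuracy of the trained HMMs is verified: the observation sequence of each test sample is substituted into each of the five trained HMMs, and the probability of the observation sequence under each model is computed with the forward-backward algorithm:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = 1, 2, \ldots, T-1$$

wherein $\lambda = (A, B, \pi)$ denotes a trained HMM and $T = 300$ is the length of the observation sequence;

$\alpha_t(i)$ is the forward probability, i.e. the probability that the partial observation sequence up to time t is $o_1, o_2, \ldots, o_t$ and the state at time t is $q_i$, recorded as:

$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda)$$

$\beta_t(i)$ is the backward probability, i.e. the probability of the partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$ from time t + 1 to T given that the state at time t is $q_i$, recorded as:

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda)$$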

the state of the model giving the largest of the five probability values is taken as the predicted state of the test sample; the prediction accuracy of the HMMs is calculated from the predicted and actual states, and if the accuracy falls outside the acceptable range, data are reselected and the HMMs are retrained following steps S1 to S2.
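The following sketch scores an observation sequence with the scaled forward algorithm and picks the most likely of several models. Because $P(O \mid \lambda) = \sum_i \alpha_T(i)$, the forward pass alone gives the same value as the forward-backward expression at any t. Per-step scaling and logarithms are an implementation choice, not from the text: a product of 300 small probabilities would otherwise underflow double precision. Models are hypothetical `(pi, A, B)` tuples of plain lists.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A model is (pi, A, B): initial vector, transition matrix, observation matrix.
Model = Tuple[List[float], List[List[float]], List[List[float]]]


def log_likelihood(obs: Sequence[int], model: Model) -> float:
    """log P(O | lambda) for a non-empty observation sequence, via the scaled forward pass.

    Returns -inf if the sequence is impossible under the model.
    """
    pi, A, B = model
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(o_t), on rescaled alphas
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)            # log P(O) = sum_t log c_t
        alpha = [a / scale for a in alpha]  # rescale so the alphas sum to 1
    return log_p


def classify(obs: Sequence[int], models: Dict[str, Model]) -> str:
    """Return the label of the model under which obs is most likely."""
    return max(models, key=lambda name: log_likelihood(obs, models[name]))


# Usage with two toy single-state models (illustrative numbers only).
models = {
    "normal": ([1.0], [[1.0]], [[0.9, 0.1]]),
    "shading": ([1.0], [[1.0]], [[0.1, 0.9]]),
}
prediction = classify([0, 0, 1], models)    # "normal": 0.081 vs 0.009
```

Comparing log-likelihoods instead of raw probabilities does not change which model wins, since the logarithm is monotonic.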

Preferably, in step S3, the output current and output voltage of any photovoltaic module over a 300-minute period are obtained, their normalized values are calculated with the normalization formula of step S1, and the clustering result is calculated with the K-Means model trained in step S1.
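In step S3 no new clustering model is fitted; each live reading is normalized and assigned to the nearest of the centroids learned in step S1. A minimal sketch, assuming the per-column (min, max) bounds and the centroids from training have been saved (`assign_clusters` is a hypothetical name). Live values outside the training range normalize to values outside [0, 1], which nearest-centroid assignment still handles.

```python
from typing import List, Sequence, Tuple


def assign_clusters(readings: Sequence[Tuple[float, float]],
                    bounds: Sequence[Tuple[float, float]],
                    centroids: Sequence[Sequence[float]]) -> List[int]:
    """Map live (current, voltage) readings to cluster ids C.

    bounds:    per-column (min, max) saved from the S1 training data
    centroids: K-Means centres fitted in S1 (K = 20 in the text)
    """
    labels = []
    for row in readings:
        # Same min-max formula as S1, using the training extremes.
        x = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
        labels.append(dists.index(min(dists)))  # nearest centre wins
    return labels


# Illustrative values: two centres in normalized space, three live readings.
bounds = [(0.0, 10.0), (0.0, 100.0)]
centroids = [(0.0, 0.0), (1.0, 1.0)]
live = [(1.0, 10.0), (9.0, 95.0), (12.0, 120.0)]
observations = assign_clusters(live, bounds, centroids)  # [0, 1, 1]
```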

Preferably, in step S4, the 300 clustering results obtained in step S3 are used as the observation sequence and input into each of the five trained HMMs; five probability values are obtained with the forward-backward algorithm, and the state of the model with the largest probability is taken as the predicted state.
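For online early warning, one way to apply step S4 continuously is to keep a rolling buffer of each module's most recent 300 cluster labels and classify the buffer once it is full. The `RollingWarner` class below is a hypothetical helper: the text specifies the 300-value observation sequence and the argmax over five models, but not how often the check runs or how warnings are raised. Each scorer is any callable returning a log-likelihood for a sequence, for example a forward-algorithm score under one trained HMM.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional, Sequence

Scorer = Callable[[Sequence[int]], float]  # e.g. log P(O | lambda) under one trained HMM


class RollingWarner:
    """Hypothetical helper: rolling window of cluster labels per module, argmax over models."""

    def __init__(self, scorers: Dict[str, Scorer], window: int = 300,
                 normal_label: str = "normal") -> None:
        self.scorers = scorers
        self.buffer: Deque[int] = deque(maxlen=window)
        self.normal_label = normal_label

    def push(self, cluster_id: int) -> Optional[str]:
        """Add one observation; return a fault label, or None if normal or not yet full."""
        self.buffer.append(cluster_id)  # the oldest label drops out automatically
        if len(self.buffer) < self.buffer.maxlen:
            return None
        obs = list(self.buffer)
        state = max(self.scorers, key=lambda name: self.scorers[name](obs))
        return None if state == self.normal_label else state


# Toy scorers for illustration only: high cluster ids look like shading.
warner = RollingWarner({"normal": lambda o: -float(sum(o)),
                        "shading": lambda o: float(sum(o)) - 5.0}, window=3)
alerts = [warner.push(c) for c in [0, 0, 0, 5]]  # [None, None, None, "shading"]
```

Re-scoring on every new minute is the simplest policy; scoring less often would reduce computation at the cost of later warnings.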

Compared with the prior art, the invention has the following beneficial effects: the generated power of the photovoltaic module is discretized by power interval, output current and output voltage are grouped into categories by K-Means, and the ability of HMMs to learn and predict time series is used to learn the normal state and several fault states, so that the working state of the photovoltaic module can be judged and faults can be warned about in advance.

Early warning according to the invention offers the following advantages:

1. Higher accuracy on small sample datasets. Common time-series models include the ARIMA model, the HMM and the RNN. ARIMA predicts poorly on non-stationary time series, and although the RNN has strong learning capacity, its structure is complex and it needs large amounts of data and time to train; the number of fault samples available for photovoltaic modules cannot meet that requirement.

2. Comprehensiveness and timeliness. Whereas traditional manual inspection spot-checks only a few modules, the model covers all modules, monitors them online in real time, and issues warnings before faults occur.

3. A high degree of automation. Faults are diagnosed automatically; maintenance staff do not need to go on site to test modules with instruments, and the diagnosis requires no manual involvement.

Drawings

FIG. 1 is a prediction flow diagram of the present invention;

FIG. 2 is a diagram illustrating the structure of an HMM model according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.

Referring to FIGS. 1-2, the present invention provides the following technical solution:

a fault early-warning algorithm for distributed photovoltaic power stations based on a K-Means-HMM model, comprising the following steps:

S1, collecting N production records, screening from them the M records whose generated power exceeds 1 kW, discretizing the generated power into levels, and performing K-Means clustering on the two dimensions of output current and output voltage.

The specific process is as follows:

N production records covering the whole of 2019 (or another period) are collected, each containing five parameters: module number, time, output current, output voltage and generated power. The M records whose generated power exceeds 1 kW are screened from the N records; the screened power values range from a minimum of 1 kW to a maximum of 14.7 kW. Let P be the generated power of a record: if n ≤ P < n + 1 with 1 ≤ n ≤ 13, the integer n is taken as the power level P of that record and used as the value of the HMM state sequence.

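The K-Means clustering on the two dimensions of output current and output voltage is performed as follows. The output current and output voltage of the M records are treated as an M × 2 matrix and normalized column by column:

$$x'_{ij} = \frac{x_{ij} - \min_j}{\max_j - \min_j}$$

wherein:

$i$ and $j$ are the row and column indices of the matrix, respectively,

$x'_{ij}$ is the matrix element at position $(i, j)$ after normalization,

$x_{ij}$ is the matrix element at position $(i, j)$ before normalization,

$\min_j$ is the smallest element in column $j$,

$\max_j$ is the largest element in column $j$.

The normalized data are then clustered with K-Means (K = 20), and the cluster label C of each record is used as the value of the HMM observation sequence.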

S2, training HMMs with the clustered working-state sample data of the photovoltaic power station.

The specific process is as follows:

The training samples are selected from the 2019 fault records of the photovoltaic power station (records from another period can be used, but they must correspond to the period collected in step S1): from January to October 2019, 32 normal-operation time points and the corresponding module numbers are selected as normal-state labels of the training set, and, for each of the four faults of shading, lead short circuit, lead open circuit and cell aging, 8 time points and the corresponding module numbers are selected as fault-state labels of the training set;

from November to December 2019, 8 normal-operation time points and the corresponding module numbers are selected as normal-state labels of the test set, and, for each of the four faults, 2 time points and the corresponding module numbers are selected as fault-state labels of the test set.

For each normal-state label obtained in step S2, in both the training set and the test set, the corresponding records are looked up among the M records by time point and module number, and the clustering results C and power levels P from 300 minutes before the labeled time point up to that time point are selected, giving 32 normal training samples and 8 normal test samples, each containing 300 C values and 300 P values.

The HMM is a time-series model: a hidden Markov chain generates a sequence of states, and each state generates an observation, and these observations form the observation sequence. The HMM structure is shown in FIG. 2. The HMMs are trained as follows:

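(1) Estimating the transition probability matrix A: over all training samples, let $A_{ij}$ be the number of transitions of the state (P value) from i at time t to j at time t + 1. The estimated probability of a transition from state i to state j is

$$\hat{a}_{ij} = \frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}, \quad i, j = 1, 2, \ldots, N$$

where N = 13 is the number of power levels; the resulting matrix is the transition probability matrix A.

(2) Estimating the observation probability matrix B: over all training samples, let $B_{jk}$ be the number of time steps at which the state (P value) is j and the observation (C value) is k. The estimated probability of observing k in state j is

$$\hat{b}_j(k) = \frac{B_{jk}}{\sum_{k=1}^{M} B_{jk}}, \quad j = 1, 2, \ldots, N; \; k = 1, 2, \ldots, M$$

where M = 20 is the number of clusters; the resulting matrix is the observation probability matrix B.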

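The prediction accuracy of the trained HMMs is verified as follows:

the observation sequence of each test sample is substituted into each of the five trained HMMs, and the probability of the observation sequence under each model is computed with the forward-backward algorithm:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = 1, 2, \ldots, T-1$$

where $\lambda = (A, B, \pi)$ denotes a trained HMM, $T = 300$ is the length of the observation sequence, and $\alpha_t(i)$ is the forward probability of the partial observation sequence $o_1, o_2, \ldots, o_t$ with state $q_i$ at time t, recorded as:

$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda)$$

and $\beta_t(i)$ is the backward probability of the partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$ given state $q_i$ at time t, recorded as:

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda)$$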

The state of the model giving the largest of the five probability values is taken as the predicted state of the test sample. The prediction accuracy of the HMMs is calculated from the predicted and actual states; if the accuracy falls outside the acceptable range, data are reselected and the HMMs are retrained following steps S1 to S2. The acceptable range can be set as required, for example above 90% or above 95%.
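The accuracy check is simple arithmetic. With the test set described above (8 normal and 8 fault samples, 16 in total), 15 correct predictions give 15/16 = 93.75%, which passes a 90% threshold but fails a 95% one. A minimal sketch; the function names and the short fault labels ("short", "open", "aging") are hypothetical.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of test samples whose predicted state matches the labeled state."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def needs_retraining(acc: float, threshold: float = 0.90) -> bool:
    """True when accuracy falls below the chosen threshold, triggering S1-S2 again."""
    return acc < threshold


# 16 test samples (8 normal, 2 per fault type); one normal sample misread as shading.
actual = ["normal"] * 8 + ["shading", "shading", "short", "short",
                           "open", "open", "aging", "aging"]
predicted = ["normal"] * 7 + ["shading"] + actual[8:]
acc = accuracy(predicted, actual)  # 15 / 16 = 0.9375
```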

S3, collecting real-time production data of the photovoltaic module and computing its clustering result in the two dimensions of output current and output voltage.

The specific process is as follows:

The output current and output voltage of any photovoltaic module over a 300-minute period are taken, their normalized values are calculated with the normalization formula of step S1, and the clustering result is calculated with the K-Means model trained in step S1.

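S4, inputting the clustered real-time production data of the photovoltaic module into the trained HMMs to obtain an early-warning result.

The specific process is as follows: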

the 300 clustering results obtained in step S3 are used as the observation sequence and input into each of the five trained HMMs; five probability values are obtained with the forward-backward algorithm, and the state of the model with the largest probability is taken as the predicted state.

The key inventive step of the invention is to discretize the generated power of the photovoltaic module by power interval, group the output current and output voltage into categories with K-Means, and use the ability of HMMs to learn and predict time series to learn the normal state and several fault states, so as to judge the working state of the photovoltaic module and warn of faults in advance.

The HMM is the main algorithm used for photovoltaic fault early warning. HMMs can be trained by supervised or unsupervised learning; this scheme uses supervised training and applies a series of preprocessing steps so that the data meet the HMM's requirement for discrete values: first, the power data are discretized into levels; second, voltage and current are discretized by K-Means. K-Means clustering has also been applied to fault early warning in the prior art, but its ability to discretize data has not been exploited there. In this scheme, K-Means is used in an unsupervised manner to solve the data-discretization problem that arises in supervised HMM learning, which yields an unexpected technical effect.

Moreover, it is only necessary to know that the clustered data fall into different categories; what each category specifically represents does not matter. K-Means is only one possible clustering method, and GMM (Gaussian mixture model) clustering can also be used.

The above description is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed herein, according to the technical solutions and inventive concept of the invention, shall fall within the scope of protection of the invention.

Claims

1. A fault early-warning algorithm for distributed photovoltaic power stations based on a K-Means-HMM model, characterized by comprising the following steps:

collecting N production records covering one year, each record containing five parameters: module number, time, output current, output voltage and generated power; screening from the N records the M records whose generated power exceeds 1 kW, the screened power values ranging from a minimum of 1 kW to a maximum of 14.7 kW; letting P be the generated power of a record, and if n ≤ P < n + 1 with 1 ≤ n ≤ 13, taking the integer n as the power level P and using it as the value of the HMM state sequence;

normalizing the output current and output voltage of the M records, treated as an M × 2 matrix, according to

$$x'_{ij} = \frac{x_{ij} - \min_j}{\max_j - \min_j}$$

wherein:

$i$ and $j$ are the row and column indices of the matrix, respectively,

$x'_{ij}$ is the matrix element at position $(i, j)$ after normalization,

$x_{ij}$ is the matrix element at position $(i, j)$ before normalization,

$\min_j$ is the smallest element in column $j$,

$\max_j$ is the largest element in column $j$;

performing K-Means clustering on the normalized output current and output voltage data with K = 20 to obtain a clustering result C for each of the M records, the clustering result being used as the value of the HMM observation sequence;

selecting training samples from one year of fault records of the photovoltaic power station: in ten months of that year, 32 normal-operation time points and the corresponding module numbers are selected as normal-state labels of the training set, and, for each of the four faults of shading, lead short circuit, lead open circuit and cell aging, 8 time points and the corresponding module numbers are selected as fault-state labels of the training set;

in the remaining 2 months of that year, 8 normal-operation time points and the corresponding module numbers are selected as normal-state labels of the test set, and, for each of the four faults, 2 time points and the corresponding module numbers are selected as fault-state labels of the test set;

for each normal-state label, in both the training set and the test set, looking up the corresponding records among the M records by time point and module number and selecting the clustering results C and power levels P from 300 minutes before the labeled time point up to that time point, giving 32 normal training samples and 8 normal test samples, each containing 300 C values and 300 P values;

for each fault-state label, in both the training set and the test set, looking up the corresponding records among the M records by time point and module number and selecting the C and P values from 360 minutes to 60 minutes before the labeled time point, giving 32 fault training samples and 8 fault test samples, each containing 300 C values and 300 P values;

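training the HMMs as follows: the state set has N = 13 states and the observation set has M = 20 values; for each training sample, the 300 P values form the state sequence and the 300 C values form the observation sequence; the initial state probability vector $\pi$ has 13 elements, each equal to 1/13; for the samples of each of the five states in the training set (normal, shading, lead short circuit, lead open circuit and cell aging), one HMM is trained according to the following algorithm: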

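(1) estimating the transition probability matrix A: over all training samples, let $A_{ij}$ be the number of transitions of the state (P value) from i at time t to j at time t + 1; the estimated probability of a transition from state i to state j is

$$\hat{a}_{ij} = \frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}, \quad i, j = 1, 2, \ldots, N$$

and the resulting matrix is the transition probability matrix A;

(2) estimating the observation probability matrix B: over all training samples, let $B_{jk}$ be the number of time steps at which the state (P value) is j and the observation (C value) is k; the estimated probability of observing k in state j is

$$\hat{b}_j(k) = \frac{B_{jk}}{\sum_{k=1}^{M} B_{jk}}, \quad j = 1, 2, \ldots, N; \; k = 1, 2, \ldots, M$$

and the resulting matrix is the observation probability matrix B;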

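verifying the prediction accuracy of the trained HMMs by substituting the observation sequence of each test sample into each of the five trained HMMs and computing the probability of the observation sequence under each model with the forward-backward algorithm:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = 1, 2, \ldots, T-1$$

wherein $\lambda = (A, B, \pi)$ denotes a trained HMM, $T = 300$ is the length of the observation sequence, and $\alpha_t(i)$ is the forward probability of the partial observation sequence $o_1, o_2, \ldots, o_t$ with state $q_i$ at time t, recorded as:

$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda)$$

and $\beta_t(i)$ is the backward probability of the partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$ given state $q_i$ at time t, recorded as:

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda)$$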

taking the state of the model giving the largest of the five probability values as the predicted state of the test data; calculating the prediction accuracy of the HMMs from the predicted and actual states, and if the accuracy falls outside the acceptable range, reselecting data and retraining the HMMs according to steps S1 to S2;

obtaining the output current and output voltage of any photovoltaic module over a 300-minute period, calculating their normalized values with the normalization formula of step S1, and calculating the clustering result with the K-Means model trained in step S1;

S4, inputting the clustered real-time production data of the photovoltaic module into the trained HMMs to obtain an early-warning result.