CN112085621B - Distributed photovoltaic power station fault early warning algorithm based on K-Means-HMM model - Google Patents

Info

Publication number
CN112085621B
Authority
CN
China
Prior art keywords
data
state
probability
value
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010952635.7A
Other languages
Chinese (zh)
Other versions
CN112085621A (en
Inventor
梁华锋
陈昊
林翀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Huadian Xiasha Thermoelectricity Co ltd
Original Assignee
Hangzhou Huadian Xiasha Thermoelectricity Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Huadian Xiasha Thermoelectricity Co ltd filed Critical Hangzhou Huadian Xiasha Thermoelectricity Co ltd
Priority to CN202010952635.7A priority Critical patent/CN112085621B/en
Publication of CN112085621A publication Critical patent/CN112085621A/en
Application granted granted Critical
Publication of CN112085621B publication Critical patent/CN112085621B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Probability & Statistics with Applications (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Testing Of Individual Semiconductor Devices (AREA)
  • Photovoltaic Devices (AREA)

Abstract

The invention relates to the technical field of photovoltaic power generation, in particular to a distributed photovoltaic power station fault early warning algorithm based on a K-Means-HMM model, which comprises the steps of: collecting N production records, screening from them the M records whose generated power is more than 1 kW, discretizing the generated power of the screened data into levels, and performing K-Means clustering on the two dimensions of output current and output voltage; training an HMM model with the clustered working state sample data of the photovoltaic power station; collecting real-time production data of the photovoltaic module and calculating its clustering result from the two dimensions of output current and output voltage; and inputting the clustering result into the trained HMM model to obtain an early warning result. The method discretizes the generated power of the photovoltaic module by power interval, uses K-Means to divide output current and output voltage into different categories, and uses the ability of the HMM to learn and predict time series to learn the normal state and various fault states, so as to judge and give early warning of the working state of the photovoltaic module.

Description

Distributed photovoltaic power station fault early warning algorithm based on K-Means-HMM model
Technical Field
The invention relates to the technical field of photovoltaic power generation, in particular to a distributed photovoltaic power station fault early warning algorithm based on a K-Means-HMM model.
Background
With the popularization of photovoltaic power generation in China, the high failure rate of photovoltaic panels has gradually become apparent. Because a photovoltaic panel works outdoors for a long time, it inevitably suffers damage, aging and similar problems, which cause faults such as hot spots and cracks, affect the stable operation of the panel, and shorten its service life. The diagnosis of photovoltaic module faults is therefore a key topic of interest in the industry.
Common defects of the photovoltaic module include hot spots, glass cracking, lightning striations and the like. Hot spots arise when dust, fallen leaves or other obstructions in the outdoor environment, or internal defects in the battery, cause a local increase in power and therefore a local increase in temperature. Glass cracking is breakage of the glass caused by falling objects or other external forces. The specific cause of lightning striations is not clear, but research indicates that they can attenuate the output power.
The conventional method of detecting photovoltaic module faults is for a tester to measure the current and voltage data of the module with an IV detector and for a professional to analyze the data. This method has poor operability: a power plant has a very large number of components, so only specific equipment can be checked and not all equipment can be monitored online in real time. It also relies on expert knowledge and places high demands on the professional level of the analyst.
At present, the information systems of photovoltaic power plants are quite complete, and the real-time production data of photovoltaic modules are collected and stored by these systems, so data-driven photovoltaic module fault diagnosis methods have become popular. However, most current photovoltaic module fault diagnosis algorithms are studied by academic research institutions in test environments, or even with software that simulates the environment to generate test data. These settings yield a large number of fault samples, which differs greatly from the real situation at a photovoltaic power station. Moreover, such algorithms diagnose and analyze only a single time point in each run and raise an alarm only when a fault occurs, so the information in the time dimension is not used effectively.
A search of the prior art found a distributed photovoltaic power station fault early warning algorithm based on a hidden Markov model (CN 111124840A). Its data preprocessing discusses only missing-value handling and data encoding. How the encoding is performed, and whether the encoding principle benefits the accuracy of model prediction, is not disclosed. If a large amount of manual encoding is required or unreasonable encoding rules are formulated, sample expansion and dynamic model updating are hindered; the early warning model is then difficult to maintain and extend, and its prediction accuracy cannot be guaranteed.
Disclosure of Invention
The invention aims to provide a distributed photovoltaic power station fault early warning algorithm based on a K-Means-HMM model, so as to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme: a distributed photovoltaic power station fault early warning algorithm based on a K-Means-HMM model comprises the following steps:
S1, collecting N production records, screening from them M records whose generated power is more than 1 kW, discretizing the generated power of the screened data into levels, and performing K-Means clustering on the two dimensions of output current and output voltage;
S2, training an HMM model with the clustered working state sample data of the photovoltaic power station;
S3, collecting real-time production data of the photovoltaic module, and calculating its clustering result from the two dimensions of output current and output voltage;
S4, inputting the clustered real-time production data of the photovoltaic module into the trained HMM model to obtain an early warning result.
Preferably, in step S1, N production records covering one year are collected, each record comprising five parameters: component number, time, output current, output voltage, and generated power. M records whose generated power is more than 1 kW are screened from the N records; the minimum generated power in the screened data is 1 kW and the maximum is 14.7 kW. Let the generated power of a record be P; if n ≤ P < n + 1 with 1 ≤ n ≤ 13, then n is taken as the generated power level of that record, also denoted P, and is used as a value in the HMM state sequence.
Preferably, the specific steps of performing K-Means clustering on the two dimensions of output current and output voltage in step S1 are as follows:
The output current and output voltage in the M records are normalized. The output current and output voltage are regarded as a matrix of size M × 2, and the normalization formula is:
$$x'_{ij} = \frac{x_{ij} - \min_j}{\max_j - \min_j}$$
where $i, j$ are respectively the row and column indices in the matrix, $x'_{ij}$ is the matrix element at position $(i, j)$ after normalization, $x_{ij}$ is the matrix element at position $(i, j)$ before normalization, $\min_j$ is the smallest matrix element in column $j$ before normalization, and $\max_j$ is the largest matrix element in column $j$ before normalization;
K-means clustering is performed on the normalized output current and output voltage data with the K value set to 20, giving a clustering result C for each of the M records; the clustering result is used as a value in the HMM observation sequence.
Preferably, the training samples in step S2 are obtained as follows: according to the fault records of the photovoltaic power station for the year, 32 normal operation time points and the corresponding component numbers in months 1 to 10 of that year are selected as training set normal state labels, and 8 time points and the corresponding component numbers are selected for each of the four faults (shielding, lead short circuit, lead open circuit, and battery aging) as training set fault state labels;
in months 11 to 12 of that year, 8 normal operation time points and the corresponding component numbers are selected as test set normal state labels, and 2 time points and the corresponding component numbers are selected for each of the four faults (shielding, lead short circuit, lead open circuit, and battery aging) as test set fault state labels.
Preferably, for each normal state label obtained in step S2, in both the training set and the test set, the corresponding record is looked up in the M records by time point and component number, and the clustering result C values and generated power level P values from 300 minutes before the time point up to the time point are selected, giving 32 training set normal samples and 8 test set normal samples, each containing 300 C values and 300 P values;
for each fault state label obtained in step S2, in both the training set and the test set, the corresponding record is looked up in the M records by time point and component number, and the clustering result C values and generated power level P values from 360 minutes before to 60 minutes before the time point are selected, giving 32 training set fault samples and 8 test set fault samples, each containing 300 C values and 300 P values.
Preferably, the process of training the HMM model in step S2 is as follows:
The number of states in the state set is N = 13 and the number of observations in the observation set is M = 20. For each training sample, the 300 P values form the state sequence and the 300 C values form the observation sequence. The initial state probability vector $\pi$ has 13 elements, each with value 1/13. For the training set samples in the five states of normal, shielding, lead short circuit, lead open circuit, and battery aging, five HMM models are trained respectively according to the following algorithm:
since the state sequences are known, a supervised learning algorithm is used to solve the transition probability matrix A and the observation probability matrix B of the HMM;
(1) Solving the transition probability matrix A:
over all training samples, the number of times the state (P value) transitions from i to j is counted as $A_{ij}$. The estimate of the probability of transitioning from state i to state j is then:
$$\hat{a}_{ij} = \frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}, \quad i, j = 1, 2, \ldots, N$$
The matrix formed by the $\hat{a}_{ij}$ is the transition probability matrix A;
(2) Solving the observation probability matrix B:
over all training samples, the number of records whose state (P value) is i and whose observation (C value) is j is counted as $B_{ij}$. The estimate of the probability of observing j in state i is then:
$$\hat{b}_{i}(j) = \frac{B_{ij}}{\sum_{j=1}^{M} B_{ij}}, \quad i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, M$$
The matrix formed by the $\hat{b}_{i}(j)$ is the observation probability matrix B.
Preferably, the prediction accuracy of the trained HMM model needs to be verified. The observation sequence of each test sample is substituted into each of the five trained HMM models, and the forward-backward algorithm is used to calculate the probability of the observation sequence under each of the five models. With $\lambda = (\pi, A, B)$ denoting the parameters of one model, the formula of the forward-backward algorithm is:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i), \quad 1 \le t \le T$$
where $\alpha_t(i)$ is the forward probability, i.e. the probability that the partial observation sequence up to time t is $O_1, O_2, O_3, \ldots, O_t$ and the state at time t is $q_i$, recorded as:
$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, i_t = q_i \mid \lambda)$$
and $\beta_t(i)$ is the backward probability, i.e. the probability that, given state $q_i$ at time t, the partial observation sequence from time t + 1 to T is $O_{t+1}, O_{t+2}, O_{t+3}, \ldots, O_T$, recorded as:
$$\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid i_t = q_i, \lambda)$$
From the 5 probability values calculated by the formula, the model type corresponding to the maximum probability value is taken as the predicted state of the test data. The prediction accuracy of the HMM model is calculated from the predicted and actual states; if the accuracy falls outside the required range, data are reselected and the HMM model is retrained according to steps S1 to S2.
Preferably, in step S3, the output current and output voltage data of any photovoltaic module over a span of 300 minutes are obtained, the normalized values of the output current and output voltage are calculated according to the normalization formula in step S1, and the clustering result is calculated using the K-means model trained in step S1.
Preferably, in step S4, the 300 clustering results obtained in step S3 are used as the observation sequence and input into each of the five trained HMMs, 5 probability values are obtained with the forward-backward algorithm, and the model type corresponding to the maximum probability value is taken as the predicted state.
Compared with the prior art, the invention has the following beneficial effects: the method discretizes the generated power of the photovoltaic module by power interval, uses K-Means to divide output current and output voltage into different categories, and uses the ability of the HMM (hidden Markov model) to learn and predict time series to learn the normal state and various fault states, thereby judging and giving early warning of the working state of the photovoltaic module.
The early warning of the invention provides the following advantages:
1. Higher accuracy on small sample datasets. Common time series models include ARIMA, HMM and RNN. ARIMA predicts poorly on non-stationary time series data. An RNN has strong learning capacity, but its structure is complex and it needs a large amount of data and time for training; the number of photovoltaic module fault samples cannot meet that training requirement.
2. Comprehensiveness and timeliness. Whereas traditional manual inspection spot-checks only a few components, the model can cover all components, monitor them online in real time, and raise an alarm before a fault occurs.
3. High degree of intelligence. Faults are diagnosed automatically; maintenance personnel do not need to go on site to test components with instruments, and the diagnosis process requires no manual participation.
Drawings
FIG. 1 is a prediction flow diagram of the present invention;
FIG. 2 is a diagram illustrating the structure of an HMM model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-2, the present invention provides a technical solution:
a distributed photovoltaic power station fault early warning algorithm based on a K-Means-HMM model comprises the following steps:
S1, collecting N production records, screening from them M records whose generated power is more than 1 kW, discretizing the generated power of the screened data into levels, and performing K-Means clustering on the two dimensions of output current and output voltage.
The specific process is as follows:
N production records for the whole of 2019 (or another time period) are collected, each record comprising five parameters: component number, time, output current, output voltage, and generated power. M records whose generated power is more than 1 kW are screened from the N records; the minimum generated power in the screened data is 1 kW and the maximum is 14.7 kW. Let the generated power of a record be P; if n ≤ P < n + 1 with 1 ≤ n ≤ 13, then n is taken as the generated power level of that record, also denoted P, and is used as a value in the HMM state sequence.
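The screening and level assignment described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout and the function name `power_level` are hypothetical, a record at exactly 1 kW is kept because the text gives 1 kW as the minimum screened value, and capping readings of 14 kW and above at level 13 is an assumption made here so that the 14.7 kW maximum still maps to one of the 13 states.

```python
import math

def power_level(power_kw):
    """Return the level n with n <= power_kw < n + 1, or None if screened out."""
    if power_kw < 1.0:
        return None  # records below 1 kW are screened out
    # Assumption: readings of 14 kW and above are capped at level 13,
    # so that the HMM keeps the 13 states stated in the text.
    return min(int(math.floor(power_kw)), 13)

# Hypothetical production records (component number, minute index, power).
records = [
    {"module": "A01", "minute": 0, "power_kw": 0.4},
    {"module": "A01", "minute": 1, "power_kw": 1.0},
    {"module": "A01", "minute": 2, "power_kw": 7.9},
    {"module": "A01", "minute": 3, "power_kw": 14.7},
]

levels = [power_level(r["power_kw"]) for r in records]
kept = [(r["minute"], n) for r, n in zip(records, levels) if n is not None]
print(kept)
```

The first record is dropped by the screening step; the remaining three receive integer levels that can serve directly as HMM state values.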
The specific steps of performing K-Means clustering by using two dimensions of output current and voltage are as follows:
The output current and output voltage in the M records are normalized. The output current and output voltage are regarded as a matrix of size M × 2, and the normalization formula is:
$$x'_{ij} = \frac{x_{ij} - \min_j}{\max_j - \min_j}$$
where $i, j$ are respectively the row and column indices in the matrix, $x'_{ij}$ is the matrix element at position $(i, j)$ after normalization, $x_{ij}$ is the matrix element at position $(i, j)$ before normalization, $\min_j$ is the smallest matrix element in column $j$ before normalization, and $\max_j$ is the largest matrix element in column $j$ before normalization;
K-means clustering is performed on the normalized output current and output voltage data with the K value set to 20, giving a clustering result C for each of the M records; the clustering result is used as a value in the HMM observation sequence.
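As a rough sketch of this step, the code below applies column-wise min-max normalization and then a plain Lloyd-style K-means using only the Python standard library. The function names, the toy current/voltage values, the random initialization, and the handling of constant columns and empty clusters are all assumptions for illustration; the text itself specifies only the normalization and K = 20.

```python
import random

def min_max_normalize(rows):
    """Scale each column of an M x 2 list of [current, voltage] rows to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # Assumption: a constant column is mapped to 0.0 instead of dividing by zero.
    span = [h - l if h > l else 1.0 for l, h in zip(lo, hi)]
    scaled = [[(v - l) / s for v, l, s in zip(row, lo, span)] for row in rows]
    return scaled, lo, hi

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def k_means(points, k, iterations=50, seed=0):
    """Plain Lloyd iteration; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [nearest(p, centroids) for p in points]
    for _ in range(iterations):
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return centroids, labels

# Toy [current (A), voltage (V)] rows; the text clusters the full data with k=20.
iv = [[1.0, 30.0], [1.2, 31.0], [0.9, 29.5], [8.0, 36.0], [8.3, 35.5], [7.9, 36.2]]
scaled, col_min, col_max = min_max_normalize(iv)
centroids, labels = k_means(scaled, k=2)
print(labels)
```

Keeping `col_min`, `col_max` and `centroids` from this step matters later, because real-time data must be mapped to the same cluster indices that the HMM observed during training.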
And S2, training an HMM model by using the clustered working state sample data of the photovoltaic power station.
The specific process is as follows:
The training samples are obtained as follows: according to the fault records of the photovoltaic power station in 2019 (another time period may be used, but it must correspond to the period in step S1), 32 normal operation time points and the corresponding component numbers in January to October 2019 are selected as training set normal state labels, and 8 time points and the corresponding component numbers are selected for each of the four faults (shielding, lead short circuit, lead open circuit, and battery aging) as training set fault state labels;
in November to December 2019, 8 normal operation time points and the corresponding component numbers are selected as test set normal state labels, and 2 time points and the corresponding component numbers are selected for each of the four faults (shielding, lead short circuit, lead open circuit, and battery aging) as test set fault state labels.
For each normal state label obtained in step S2, in both the training set and the test set, the corresponding record is looked up in the M records by time point and component number, and the clustering result C values and generated power level P values from 300 minutes before the time point up to the time point are selected, giving 32 training set normal samples and 8 test set normal samples, each containing 300 C values and 300 P values;
for each fault state label obtained in step S2, in both the training set and the test set, the corresponding record is looked up in the M records by time point and component number, and the clustering result C values and generated power level P values from 360 minutes before to 60 minutes before the time point are selected, giving 32 training set fault samples and 8 test set fault samples, each containing 300 C values and 300 P values.
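The window selection for normal and fault labels might look like the sketch below. It assumes one record per minute (which is what 300 values per 300 minutes implies), a hypothetical dictionary `series` mapping a minute index to a (C, P) pair for one component, and it skips any sample whose window has missing minutes; none of these details are stated in the text.

```python
def extract_window(series, event_minute, is_fault, length=300, lead=60):
    """Collect `length` consecutive (C, P) records for one labelled time point.

    Normal label: the window ends at the labelled time point.
    Fault label:  the window ends `lead` minutes before the fault, so the
                  model learns the pattern that precedes the fault.
    """
    end = event_minute - lead if is_fault else event_minute
    minutes = range(end - length, end)
    if any(m not in series for m in minutes):
        return None  # assumption: samples with missing minutes are skipped
    c_values = [series[m][0] for m in minutes]
    p_values = [series[m][1] for m in minutes]
    return c_values, p_values

# Hypothetical per-minute data for one component: minute -> (cluster C, level P).
series = {m: (m % 20, 1 + m % 13) for m in range(1000)}

normal = extract_window(series, event_minute=500, is_fault=False)
fault = extract_window(series, event_minute=500, is_fault=True)
print(len(normal[0]), len(fault[0]))
```

Ending the fault window 60 minutes before the recorded fault is what turns the classifier into an early warning: at prediction time a match against a fault model indicates the pattern seen roughly an hour ahead of a failure.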
The HMM is a time series model: a hidden Markov chain generates a state sequence, and each state generates an observation, forming the observation sequence. The HMM structure is shown in fig. 2. The process of training the HMM model is as follows:
The number of states in the state set is N = 13 and the number of observations in the observation set is M = 20. For each training sample, the 300 P values form the state sequence and the 300 C values form the observation sequence. The initial state probability vector $\pi$ has 13 elements, each with value 1/13. For the training set samples in the five states of normal, shielding, lead short circuit, lead open circuit, and battery aging, five HMM models are trained respectively according to the following algorithm:
since the state sequences are known, a supervised learning algorithm is used to solve the transition probability matrix A and the observation probability matrix B of the HMM;
(1) Solving the transition probability matrix A:
over all training samples, the number of times the state (P value) transitions from i to j is counted as $A_{ij}$. The estimate of the probability of transitioning from state i to state j is then:
$$\hat{a}_{ij} = \frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}, \quad i, j = 1, 2, \ldots, N$$
The matrix formed by the $\hat{a}_{ij}$ is the transition probability matrix A;
(2) Solving the observation probability matrix B:
over all training samples, the number of records whose state (P value) is i and whose observation (C value) is j is counted as $B_{ij}$. The estimate of the probability of observing j in state i is then:
$$\hat{b}_{i}(j) = \frac{B_{ij}}{\sum_{j=1}^{M} B_{ij}}, \quad i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, M$$
The matrix formed by the $\hat{b}_{i}(j)$ is the observation probability matrix B.
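A minimal sketch of this counting-based estimation is given below, under the assumption that the states are the power levels P (numbered 1 to N) and the observations are the cluster labels C (numbered from 0), as defined at the start of the training procedure. The function name, the zero-based cluster numbering, and the uniform fallback for rows with no counts are choices made for this illustration. One such model would be trained per operating state, giving five in total.

```python
def train_supervised_hmm(samples, n_states=13, n_obs=20):
    """Estimate (pi, A, B) by frequency counting.

    samples: list of (states, observations) pairs; states are power levels
    in 1..n_states, observations are cluster labels in 0..n_obs-1.
    """
    trans = [[0] * n_states for _ in range(n_states)]
    emit = [[0] * n_obs for _ in range(n_states)]
    for states, observations in samples:
        for s, o in zip(states, observations):
            emit[s - 1][o] += 1                   # state s emitted observation o
        for s_prev, s_next in zip(states, states[1:]):
            trans[s_prev - 1][s_next - 1] += 1    # transition s_prev -> s_next

    def normalize(row):
        total = sum(row)
        # Assumption: a row with no counts falls back to a uniform distribution.
        return [c / total for c in row] if total else [1.0 / len(row)] * len(row)

    pi = [1.0 / n_states] * n_states  # uniform initial vector, as in the text
    A = [normalize(r) for r in trans]
    B = [normalize(r) for r in emit]
    return pi, A, B

# Tiny illustrative data set with 2 states and 3 observation symbols.
samples = [([1, 1, 2, 2], [0, 0, 2, 1]), ([2, 1, 1], [2, 0, 0])]
pi, A, B = train_supervised_hmm(samples, n_states=2, n_obs=3)
print(A)
print(B)
```

With only 32 sequences per model, many of the 13 × 13 and 13 × 20 cells can be empty; a zero entry makes any test sequence that uses it impossible under that model, so some form of smoothing may be worth considering in practice.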
Verifying the prediction accuracy of the trained HMM model, and the process is as follows:
The observation sequence of each test sample is substituted into each of the five trained HMM models, and the forward-backward algorithm is used to calculate the probability of the observation sequence under each of the five models. With $\lambda = (\pi, A, B)$ denoting the parameters of one model, the formula of the forward-backward algorithm is:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i), \quad 1 \le t \le T$$
where $\alpha_t(i)$ is the forward probability, i.e. the probability that the partial observation sequence up to time t is $O_1, O_2, O_3, \ldots, O_t$ and the state at time t is $q_i$, recorded as:
$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, i_t = q_i \mid \lambda)$$
and $\beta_t(i)$ is the backward probability, i.e. the probability that, given state $q_i$ at time t, the partial observation sequence from time t + 1 to T is $O_{t+1}, O_{t+2}, O_{t+3}, \ldots, O_T$, recorded as:
$$\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid i_t = q_i, \lambda)$$
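The probability of an observation sequence under one model can be computed as sketched below. Note the substitution: this sketch uses only the forward recursion, which yields the same value of P(O | λ) as the forward-backward expression, and it rescales at every step and returns a logarithm, since multiplying 300 probabilities directly would underflow. The function name and the three-state toy model are illustrative assumptions.

```python
import math

def forward_log_likelihood(observations, pi, A, B):
    """Return log P(O | lambda) using the forward recursion with per-step scaling."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(observations)):
        if t > 0:
            o = observations[t]
            # Recursion: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(O_t)
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model assigns zero probability to O
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)         # the scales multiply to P(O | lambda)
    return log_prob

# Toy model with 3 states and 2 observation symbols (numbers are illustrative).
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
log_p = forward_log_likelihood([0, 1, 0], pi, A, B)
print(round(math.exp(log_p), 6))
```

Because the logarithm is monotonic, comparing log-likelihoods across the five models selects the same model as comparing the probabilities themselves.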
According to the 5 probability values obtained by the formula calculation, the model type corresponding to the maximum probability value is taken as the prediction state of the test data; and calculating the prediction accuracy of the HMM model according to the predicted state and the actual state, and if the accuracy is beyond the range, re-selecting data to train the HMM model according to the steps from S1 to S2. The accuracy range can be set by itself, for example, more than 90% or more than 95%.
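The selection of the predicted state and the accuracy check can be sketched as follows. The state names, the log-likelihood figures and the 90% threshold are hypothetical values chosen for illustration (the text mentions 90% or 95% only as examples of a user-set range); in practice each score would come from evaluating a test sequence under one of the five trained HMMs.

```python
STATES = ["normal", "shielding", "lead_short", "lead_open", "battery_aging"]

def predict_state(log_likelihoods):
    """Return the state whose HMM gives the highest (log-)likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)

def accuracy(predicted, actual):
    """Fraction of test samples whose predicted state matches the recorded one."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)

# Hypothetical log-likelihoods of three test sequences under the five models.
test_scores = [
    dict(zip(STATES, [-410.2, -455.0, -470.3, -468.9, -452.1])),
    dict(zip(STATES, [-520.7, -498.4, -530.0, -541.6, -505.3])),
    dict(zip(STATES, [-433.0, -447.9, -461.2, -429.5, -450.8])),
]
actual = ["normal", "shielding", "lead_short"]

predicted = [predict_state(s) for s in test_scores]
acc = accuracy(predicted, actual)
needs_retraining = acc < 0.90  # example threshold; the range is user-defined
print(predicted, acc, needs_retraining)
```

In this made-up example the third sequence is misclassified, so the accuracy falls below the example threshold and the procedure would return to steps S1 and S2 with newly selected data.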
And S3, collecting real-time production data of the photovoltaic module, performing K-Means clustering by two dimensions of output current and voltage, and calculating a clustering result.
The specific process is as follows:
The output current and output voltage data of any photovoltaic module over a span of 300 minutes are taken, the normalized values of the output current and output voltage are calculated according to the normalization formula in step S1, and the clustering result is calculated using the K-means model trained in step S1.
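One way to realize this step is sketched below: each real-time reading is scaled with the column minima and maxima and then assigned to the nearest centroid of the trained K-means model. Reusing the minima and maxima from the training data, rather than recomputing them on the 300-minute window, is an assumption made here so that the cluster indices stay comparable with training; the function name and all numbers are illustrative.

```python
def assign_clusters(rows, col_min, col_max, centroids):
    """Map [current, voltage] rows to the index of the nearest trained centroid."""
    labels = []
    for row in rows:
        # Assumption: scale with the minima and maxima saved from step S1.
        scaled = [(v - lo) / (hi - lo) for v, lo, hi in zip(row, col_min, col_max)]
        dists = [sum((s - c) ** 2 for s, c in zip(scaled, cen)) for cen in centroids]
        labels.append(dists.index(min(dists)))
    return labels

# Hypothetical values saved from training (2 centroids shown instead of 20).
col_min, col_max = [0.0, 20.0], [10.0, 40.0]
centroids = [[0.1, 0.5], [0.8, 0.8]]

# A short real-time window; the text uses 300 one-minute readings.
window = [[1.0, 30.0], [8.5, 36.0], [7.0, 35.0]]
observations = assign_clusters(window, col_min, col_max, centroids)
print(observations)
```

The resulting list of cluster indices is the observation sequence that step S4 scores against each trained HMM.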
And S4, inputting the real-time production data of the clustered photovoltaic modules into the trained HMM model to obtain an early warning result.
The specific process is as follows:
The 300 clustering results obtained in step S3 are used as the observation sequence and input into each of the five trained HMMs, 5 probability values are obtained with the forward-backward algorithm, and the model type corresponding to the maximum probability value is taken as the predicted state.
The key inventive step of the invention is to discretize the generated power of the photovoltaic module by power interval, divide output current and output voltage into different categories with K-Means, and use the ability of the HMM to learn and predict time series to learn the normal state and various fault states, thereby judging and giving early warning of the working state of the photovoltaic module.
The HMM is the main algorithm used for photovoltaic fault early warning. HMM training can be supervised or unsupervised; this scheme adopts supervised training. Because the HMM requires discrete data, a series of data preprocessing steps is carried out: first, the power data are discretized into levels; second, voltage and current are discretized by K-Means, which satisfies the structural requirements of the HMM. In the prior art, the k-means clustering algorithm has also been applied to fault early warning, but its ability to discretize data has not been exploited. In this technical scheme, K-Means solves, in an unsupervised manner, the data discretization problem that arises in supervised learning of the HMM. This yields an unexpected technical effect.
It is only necessary to know that the clustered data fall into different categories; what each category specifically represents is of no concern. K-Means is only one possible clustering method; GMM (Gaussian mixture model) clustering can also be adopted.
The above description is only the preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (1)

1. A distributed photovoltaic power station fault early warning algorithm based on a K-Means-HMM model is characterized in that: the method comprises the following steps:
S1, collecting N production records, screening from them M records whose generated power is more than 1 kW, discretizing the generated power of the screened data into levels, and performing K-Means clustering on the two dimensions of output current and output voltage;
collecting N production records covering one year, wherein each record comprises five parameters: component number, time, output current, output voltage, and generated power; M records whose generated power is more than 1 kW are screened from the N records, the minimum generated power in the screened data being 1 kW and the maximum being 14.7 kW; letting the generated power of a record be P, if n ≤ P < n + 1 with 1 ≤ n ≤ 13, then n is taken as the generated power level of that record, also denoted P, and is used as a value in the HMM state sequence;
the specific steps of performing K-Means clustering by using two dimensions of output current and voltage are as follows:
the output current and output voltage in the M records are normalized; the output current and output voltage are regarded as a matrix of size M×2, and the normalization formula is:
$$x'_{ij} = \frac{x_{ij} - \min_j}{\max_j - \min_j}$$
wherein $i$ and $j$ are respectively the row and column indices of the matrix, $x'_{ij}$ is the matrix element at position $(i, j)$ after normalization, $x_{ij}$ is the matrix element at position $(i, j)$ before normalization, $\min_j$ is the smallest matrix element in column $j$ before normalization, and $\max_j$ is the largest matrix element in column $j$ before normalization;
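The column-wise min-max normalization defined above can be sketched in a few lines of Python. The function name `min_max_normalize` is hypothetical, and returning 0.0 for a constant column is an assumption added only to avoid division by zero; the text does not cover that case.

```python
def min_max_normalize(rows):
    """Scale each column of an M x 2 list of [current, voltage] rows to [0, 1].

    Returns the scaled rows together with the per-column minima and maxima,
    so the same scaling can be reapplied to other data later.
    """
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    scaled = [
        [(x - lo) / (hi - lo) if hi > lo else 0.0  # assumption: constant column -> 0.0
         for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]
    return scaled, mins, maxs

# Example with three [current (A), voltage (V)] records
scaled, mins, maxs = min_max_normalize([[2.0, 30.0], [4.0, 40.0], [6.0, 35.0]])
```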
performing K-Means clustering on the normalized output current and output voltage data with the K value set to 20, obtaining a clustering result C for each of the M records, wherein the clustering result is used as the value of the HMM observation sequence;
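To show how clustering turns continuous (current, voltage) pairs into discrete observation symbols, here is a minimal sketch of Lloyd's iteration for K-Means in plain Python. It is not the patent's implementation: the function names, the random initialization from data points, the fixed seed and the iteration limit are all assumptions, and a production system would more likely use an established library. The example uses k = 2 on toy data for brevity, whereas the claim sets K to 20.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # assumption: init from data
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Toy normalized data: two well-separated groups
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
centroids, labels = kmeans(points, k=2)
```

Each label is only a category index; as the description notes, what a given category physically means does not matter for the HMM.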
S2, training HMM models using the clustered working-state sample data of the photovoltaic power station;
the training samples are selected according to the fault records of the photovoltaic power station for the year: within 10 months of the year, 32 normal operation time points and the corresponding component numbers are selected as training set normal state labels, and 8 time points and the corresponding component numbers are selected for each of the four faults (shielding, lead short circuit, lead open circuit and battery aging) as training set fault state labels;
within the other 2 months of the year, 8 normal operation time points and the corresponding component numbers are selected as test set normal state labels, and 2 time points and the corresponding component numbers are selected for each of the four faults (shielding, lead short circuit, lead open circuit and battery aging) as test set fault state labels;
for each normal state label, in both the training set and the test set, the corresponding record is located in the M records according to the time point and component number, and the clustering result C values and generated power step number P values from 300 minutes before the corresponding time point up to that time point are selected, giving 32 training set normal samples and 8 test set normal samples, each containing 300 C values and 300 P values;
for each fault state label, in both the training set and the test set, the corresponding record is located in the M records according to the time point and component number, and the clustering result C values and generated power step number P values from 360 minutes before to 60 minutes before the corresponding time point are selected, giving 32 training set fault samples and 8 test set fault samples, each containing 300 C values and 300 P values;
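The two sampling windows can be sketched as follows. This is an illustration only: it assumes one record per minute per component, stored in a list indexed by minute, and it treats each window as half-open so that it contains exactly 300 values. The function name `extract_window` and its parameters are hypothetical.

```python
def extract_window(series, t, is_fault, length=300, lead=60):
    """Return the 300-value window for a label at minute index t.

    Normal labels use the 300 minutes ending at t; fault labels use the
    window from 360 to 60 minutes before t, i.e. shifted back by `lead`.
    Returns None when the series does not reach far enough back.
    """
    end = t - lead if is_fault else t
    start = end - length
    if start < 0:
        return None
    return series[start:end]

# Example: a per-minute series of cluster labels (C values) for one component
c_values = [m % 20 for m in range(1000)]
normal_sample = extract_window(c_values, t=500, is_fault=False)
fault_sample = extract_window(c_values, t=500, is_fault=True)
```

Ending the fault window 60 minutes before the labelled time point is what makes the trained model an early warning model: it learns the pattern that precedes the fault rather than the fault itself.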
the process of training the HMM model is as follows:
the number of states in the state set is N = 13 and the number of observations in the observation set is M = 20; for each training sample, the 300 P values are taken as the state sequence and the 300 C values as the observation sequence, and the initial state probability vector $\pi$ has 13 elements, each with a value of 1/13; the training set samples of the five states (normal, shielding, lead short circuit, lead open circuit and battery aging) are trained separately according to the following algorithm:
based on known information, a supervised learning algorithm is used for solving a transition probability matrix A and an observation probability matrix B of the HMM;
(1) solving the transition probability matrix A:
for all training samples, let the counted number of transitions of the P value (the state) from i to j be $A_{ij}$; then the estimated probability of transitioning from state i to state j is:
$$\hat{a}_{ij} = \frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}, \quad i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, N$$
the matrix formed by the $\hat{a}_{ij}$ is the transition probability matrix A;
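The counting estimate of the transition probability matrix can be sketched as below. The sketch assumes that the hidden states are the power step numbers 1 to 13, as set out where the state sequence is defined, and that each training sample is a list of such step numbers. The function name is hypothetical, and falling back to a uniform row when a state never occurs is an assumption the text does not address.

```python
def estimate_transition_matrix(state_sequences, n_states=13):
    """Supervised estimate: a_ij = count(i -> j) / count(i -> any state)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in state_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev - 1][nxt - 1] += 1  # states are numbered from 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            matrix.append([1.0 / n_states] * n_states)  # assumption: unseen state
    return matrix

# Toy example with two states and two short training sequences
A = estimate_transition_matrix([[1, 1, 2], [1, 2, 2]], n_states=2)
```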
(2) solving the observation probability matrix B:
for all training samples, let the counted number of occurrences in which the P value (the state) is i and the C value (the observation) is j be $B_{ij}$; then the estimated probability that the observed value is j when the state is i is:
$$\hat{b}_{i}(j) = \frac{B_{ij}}{\sum_{j=1}^{M} B_{ij}}, \quad i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, M$$
the matrix formed by the $\hat{b}_{i}(j)$ is the observation probability matrix B;
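A matching sketch for the observation probability matrix counts how often each observation symbol occurs in each state. It assumes states are the power step numbers 1 to 13 and observations are cluster labels numbered 0 to 19; the zero-based cluster numbering, the function name and the uniform fallback for unseen states are assumptions made for this illustration.

```python
def estimate_observation_matrix(state_sequences, obs_sequences, n_states=13, n_obs=20):
    """Supervised estimate: b_i(j) = count(state i, obs j) / count(state i)."""
    counts = [[0] * n_obs for _ in range(n_states)]
    for states, observations in zip(state_sequences, obs_sequences):
        for s, o in zip(states, observations):
            counts[s - 1][o] += 1  # states from 1, cluster labels from 0
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            matrix.append([1.0 / n_obs] * n_obs)  # assumption: unseen state
    return matrix

# Toy example with two states and two observation symbols
B = estimate_observation_matrix([[1, 1, 2]], [[0, 1, 1]], n_states=2, n_obs=2)
```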
verifying the prediction accuracy of the trained HMM models: the observation sequence of each test sample is substituted into each of the five trained HMM models, and the forward-backward algorithm is used to calculate the probability of the observation sequence under each of the five models, wherein the formula of the forward-backward algorithm is:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \quad t = 1, 2, \ldots, T-1$$
wherein $\alpha_t(i)$ is the forward probability, i.e. the probability that the partial observation sequence up to time t is $O_1, O_2, O_3, \ldots, O_t$ and the state at time t is $q_i$, recorded as:
$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t,\; i_t = q_i \mid \lambda)$$
$\beta_{t+1}(j)$ is the backward probability, i.e. the probability of the partial observation sequence $O_{t+2}, O_{t+3}, \ldots, O_T$ given that the state at time t+1 is $q_j$, recorded as:
$$\beta_{t+1}(j) = P(O_{t+2}, O_{t+3}, \ldots, O_T \mid i_{t+1} = q_j,\; \lambda)$$
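The forward and backward recursions behind this formula can be sketched in plain Python as follows. The function names are hypothetical, states and observation symbols are zero-based indices here, and the toy three-state model in the example is not taken from the patent. The sketch works with raw probabilities; for sequences of 300 observations these products can become extremely small, so a scaled or log-domain variant is advisable in practice.

```python
def forward(pi, A, B, obs):
    """alpha[t][i]: probability of obs[0..t] with state i at time t."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha

def backward(A, B, obs):
    """beta[t][i]: probability of obs[t+1..] given state i at time t."""
    n = len(A)
    beta = [[1.0] * n]  # beta at the final time step is 1
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def sequence_probability(pi, A, B, obs, t=0):
    """P(O | model) from the combined formula, evaluated at any t < T - 1."""
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    n = len(pi)
    return sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
               for i in range(n) for j in range(n))

# Toy model: 3 states, 2 observation symbols
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
p = sequence_probability(pi, A, B, [0, 1, 0])
```

The result does not depend on the choice of t, which gives a convenient consistency check on an implementation.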
according to the 5 probability values obtained from the above formula, the model type corresponding to the maximum probability value is taken as the predicted state of the test data; the prediction accuracy of the HMM models is calculated from the predicted states and the actual states, and if the accuracy falls outside the acceptable range, data are re-selected and the HMM models are retrained according to steps S1 to S2;
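Selecting the most likely model and scoring the predictions might look like the sketch below. It evaluates each model with a scaled forward pass and compares log-probabilities, which picks the same model as comparing raw probabilities but avoids numerical underflow on long sequences; this scaling is a substitution made here, not something the text specifies. The names `log_likelihood`, `predict_state` and `accuracy`, and the layout of `models` as a dict of `(pi, A, B)` tuples, are hypothetical.

```python
import math

def log_likelihood(pi, A, B, obs):
    """log P(O | model) using a forward pass rescaled at every step."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for o in obs[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    total = sum(alpha)
    return log_p + math.log(total) if total > 0.0 else float("-inf")

def predict_state(models, obs):
    """Name of the model under which the observation sequence is most probable."""
    return max(models, key=lambda name: log_likelihood(*models[name], obs))

def accuracy(predicted, actual):
    """Fraction of samples whose predicted state equals the actual state."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Toy one-state models that differ only in their observation probabilities
models = {
    "normal": ([1.0], [[1.0]], [[0.9, 0.1]]),
    "shielding": ([1.0], [[1.0]], [[0.1, 0.9]]),
}
state = predict_state(models, [0, 0, 1, 0])
```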
S3, collecting real-time production data of the photovoltaic module, performing K-Means clustering on the two dimensions of output current and output voltage, and calculating the clustering result;
taking output current and output voltage data of length 300 for any photovoltaic module, calculating the normalized values of the output current and output voltage according to the normalization formula in step S1, and calculating the clustering results using the K-Means model trained in step S1;
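For real-time data, applying the trained clustering model amounts to scaling each new record and assigning it to the nearest trained centroid. The sketch below assumes that the column minima and maxima from the training data are reused for scaling, which the text does not state explicitly, and that centroids are given in normalized coordinates. The function name and the example values are hypothetical.

```python
def assign_clusters(rows, mins, maxs, centroids):
    """Map raw [current, voltage] rows to indices of the nearest trained centroid."""
    labels = []
    for row in rows:
        # Assumption: reuse the training-set min and max for each column
        scaled = [(x - lo) / (hi - lo) if hi > lo else 0.0
                  for x, lo, hi in zip(row, mins, maxs)]
        labels.append(min(
            range(len(centroids)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(scaled, centroids[c])),
        ))
    return labels

# Example: two trained centroids and two incoming records
observation_sequence = assign_clusters(
    rows=[[1.0, 12.0], [9.0, 95.0]],
    mins=[0.0, 0.0],
    maxs=[10.0, 100.0],
    centroids=[[0.1, 0.1], [0.9, 0.9]],
)
```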
S4, inputting the clustered real-time production data of the photovoltaic module into the trained HMM models to obtain the early warning result;
taking the 300 clustering results obtained in step S3 as the observation sequence, inputting it into each of the five trained HMMs, obtaining 5 probability values with the forward-backward algorithm, and taking the model type corresponding to the maximum probability value as the predicted state.
CN202010952635.7A 2020-09-11 2020-09-11 Distributed photovoltaic power station fault early warning algorithm based on K-Means-HMM model Active CN112085621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010952635.7A CN112085621B (en) 2020-09-11 2020-09-11 Distributed photovoltaic power station fault early warning algorithm based on K-Means-HMM model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010952635.7A CN112085621B (en) 2020-09-11 2020-09-11 Distributed photovoltaic power station fault early warning algorithm based on K-Means-HMM model

Publications (2)

Publication Number Publication Date
CN112085621A CN112085621A (en) 2020-12-15
CN112085621B true CN112085621B (en) 2022-08-02

Family

ID=73738046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010952635.7A Active CN112085621B (en) 2020-09-11 2020-09-11 Distributed photovoltaic power station fault early warning algorithm based on K-Means-HMM model

Country Status (1)

Country Link
CN (1) CN112085621B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114710114B (en) * 2022-05-23 2022-09-13 北京华清未来能源技术研究院有限公司 Photovoltaic inverter fault prediction method
CN115601197B (en) * 2022-11-28 2023-03-10 深圳市峰和数智科技有限公司 Abnormal state detection method and device for photovoltaic power station
CN116826788B (en) * 2023-08-30 2024-01-05 东方电气集团科学技术研究院有限公司 Photovoltaic power generation active support cluster construction and control method
CN117390481B (en) * 2023-12-12 2024-02-27 国网辽宁省电力有限公司 Distributed photovoltaic power generation system cluster division system and method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2309487A1 (en) * 2009-09-11 2011-04-13 Honda Research Institute Europe GmbH Automatic speech recognition system integrating multiple sequence alignment for model bootstrapping
CN106485093A (en) * 2016-11-10 2017-03-08 哈尔滨工程大学 Based on the solar irradiance time series synthetic method for improving Markov chain
CN107102223A (en) * 2017-03-29 2017-08-29 江苏大学 NPC photovoltaic DC-to-AC converter method for diagnosing faults based on improved hidden Markov model GHMM
CN111124840A (en) * 2019-12-02 2020-05-08 北京天元创新科技有限公司 Method and device for predicting alarm in business operation and maintenance and electronic equipment
CN111428816A (en) * 2020-04-17 2020-07-17 贵州电网有限责任公司 Non-invasive load decomposition method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
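Multi-dimensional time series simulation of large-scale photovoltaic power station output based on hourly clear-sky index; Li Guoqing et al.; Power System Technology (《电网技术》); 2020-03-16 (No. 09); full text *
Turnout fault diagnosis method based on hidden Markov model; Xu Qingyang et al.; Journal of the China Railway Society (《铁道学报》); 2018-08-15 (No. 08); full text *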

Also Published As

Publication number Publication date
CN112085621A (en) 2020-12-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant