CN113469571A - Data quality evaluation method and device, computer equipment and readable storage medium - Google Patents

Data quality evaluation method and device, computer equipment and readable storage medium

Info

Publication number
CN113469571A
CN113469571A CN202110829306.8A CN202110829306A CN113469571A CN 113469571 A CN113469571 A CN 113469571A CN 202110829306 A CN202110829306 A CN 202110829306A CN 113469571 A CN113469571 A CN 113469571A
Authority
CN
China
Prior art keywords
data
quality evaluation
data quality
weighting
evaluation index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110829306.8A
Other languages
Chinese (zh)
Inventor
张尧
李子森
任炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202110829306.8A
Publication of CN113469571A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Primary Health Care (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The application relates to a data quality evaluation method, a data quality evaluation device, computer equipment and a storage medium. The method comprises: obtaining a quantized value of each data quality evaluation index in a data quality evaluation system according to substation data; weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weight coefficient corresponding to each data quality evaluation index; and calculating a data quality evaluation result of the substation data according to the quantized value of each data quality evaluation index and the corresponding comprehensive weight coefficient. With this method, data quality can be evaluated from multi-dimensional influence factors, so that the data quality evaluation indexes are more comprehensive; weighting each data quality evaluation index through the combined weighting method reduces the subjective randomness of the weighting; and calculating the data quality evaluation result of the substation data from the quantized values of the data quality evaluation indexes and the corresponding comprehensive weight coefficients improves the accuracy of the data quality evaluation result.

Description

Data quality evaluation method and device, computer equipment and readable storage medium
Technical Field
The application relates to the technical field of transformer substations, in particular to a data quality evaluation method and device, computer equipment and a readable storage medium.
Background
The intelligent substation is a key link in constructing an intelligent power grid. As the construction of intelligent substations continues to advance, the data they generate and store keep growing and gradually show the characteristics of big data, and data quality problems such as data loss and data redundancy often occur when the data are uploaded to and stored in each system. In an electric power system, data quality not only affects the accuracy and effectiveness of the application analysis of a substation, but also directly affects the safe and reliable operation of the intelligent substation. A reasonable data quality evaluation technology can therefore effectively reflect the operating condition of the substation.
In the traditional technology, data quality evaluation indexes are determined, each index is then weighted by a subjective method, and a data quality evaluation result is obtained from the data quality evaluation indexes and the corresponding weighting results. However, the accuracy of the data quality evaluation result obtained in this way is low.
Disclosure of Invention
In view of the above, it is desirable to provide a data quality evaluation method, an apparatus, a computer device, and a readable storage medium capable of improving accuracy of a data quality evaluation result.
A method of data quality evaluation, the method comprising:
obtaining a quantitative value of each data quality evaluation index in a data quality evaluation system according to the data of the transformer substation, wherein the number of the data quality evaluation indexes in the data quality evaluation system is greater than a preset number threshold;
weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weighting coefficient corresponding to each data quality evaluation index, wherein the combined weighting method comprises a subjective weighting method and an objective weighting method;
and calculating to obtain a data quality evaluation result of the substation data according to the quantization value of each data quality evaluation index and the comprehensive weight coefficient corresponding to each data quality evaluation index.
In one embodiment, the weighting each of the data quality evaluation indexes by a combined weighting method to obtain a comprehensive weight coefficient corresponding to each of the data quality evaluation indexes includes:
subjectively weighting each data quality evaluation index by adopting a sequence relation method to obtain a subjective weight coefficient of each data quality evaluation index;
performing objective weighting on each data quality evaluation index by using a variation coefficient method to obtain an objective weighting coefficient of each data quality evaluation index;
and acquiring a comprehensive weight coefficient corresponding to each data quality evaluation index according to the subjective weight coefficient and the objective weight coefficient.
In one embodiment, the subjectively weighting each of the data quality evaluation indexes by using a sequence relation method to obtain a subjective weight coefficient of each of the data quality evaluation indexes includes:
ordering the importance of each data quality evaluation index to obtain an importance sequence;
weighting each data quality evaluation index to obtain an initial weight coefficient of each data quality evaluation index;
calculating a weight evaluation scale according to the importance sequence and the initial weight coefficient;
and determining a subjective weight coefficient of each data quality evaluation index according to the weight evaluation scale.
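The sequence relation steps above can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes the "weight evaluation scale" is the importance ratio between adjacent indexes in the importance sequence (the usual form of the order relation, or G1, method), and the function name `order_relation_weights` and its `ratios` parameter are hypothetical.

```python
def order_relation_weights(ratios):
    """Subjective weights from the order relation (G1) recurrence.

    ratios[k] is the ASSUMED importance ratio w_k / w_(k+1) between the
    k-th and (k+1)-th indexes of the importance sequence (most important
    first), so len(ratios) == number_of_indexes - 1.
    """
    if any(r <= 0 for r in ratios):
        raise ValueError("importance ratios must be positive")
    # Weight of the least important index:
    # w_n = 1 / (1 + sum of the trailing products of the ratios).
    total = 0.0
    product = 1.0
    for r in reversed(ratios):
        product *= r
        total += product
    weights = [1.0 / (1.0 + total)]
    # Walk back up the sequence: each weight is its ratio times the next one.
    for r in reversed(ratios):
        weights.append(r * weights[-1])
    weights.reverse()  # most important index first
    return weights


# Three indexes; the first is rated 1.2x the second, the second 1.4x the third.
subjective = order_relation_weights([1.2, 1.4])
```

The returned weights sum to 1 and keep the ratios that were supplied, so the ordering of the importance sequence is preserved.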
In one embodiment, the objectively weighting each of the data quality evaluation indexes by using a variation coefficient method to obtain an objective weighting coefficient of each of the data quality evaluation indexes includes:
acquiring the average value and the standard deviation of each data quality evaluation index according to the quantization value of each data quality evaluation index;
and calculating objective weight coefficients of the data quality evaluation indexes according to the average value and the standard deviation.
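A minimal sketch of the variation coefficient step, assuming the common form in which each index's coefficient of variation (standard deviation divided by mean) is normalized over all indexes. The use of the population standard deviation, the equal-weight fallback and the name `variation_coefficient_weights` are assumptions made for illustration.

```python
import statistics


def variation_coefficient_weights(samples):
    """Objective weights from the coefficient of variation of each index.

    samples maps an index name to its quantized values (for example one
    value per data set). ASSUMPTION: population standard deviation.
    """
    cv = {}
    for name, values in samples.items():
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        # An index whose mean is zero carries no usable variation signal.
        cv[name] = std / mean if mean else 0.0
    total = sum(cv.values())
    if total == 0:
        # No index varies at all: fall back to equal weights.
        return {name: 1.0 / len(cv) for name in cv}
    return {name: v / total for name, v in cv.items()}


objective = variation_coefficient_weights({
    "accuracy": [0.9, 0.9, 0.9],
    "timeliness": [0.5, 0.7, 0.9],
})
```

An index whose quantized values barely differ between data sets receives a small objective weight, because it does little to distinguish them.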
In one embodiment, the obtaining a comprehensive weight coefficient corresponding to each of the data quality evaluation indexes by using the subjective weight coefficient and the objective weight coefficient includes:
acquiring a first relative importance degree corresponding to the subjective weight coefficient and a second relative importance degree corresponding to the objective weight coefficient;
and calculating the comprehensive weight coefficient by using a combined weighting method according to the first relative importance degree, the second relative importance degree, the subjective weight coefficient and the objective weight coefficient.
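One way to read the combination step is a linear blend of the two weight vectors. The sketch below assumes that form; the text does not state the exact combination formula, and `alpha`, `beta` and `combine_weights` are hypothetical names for the first and second relative importance degrees and the blending function.

```python
def combine_weights(subjective, objective, alpha=0.5, beta=0.5):
    """Blend subjective and objective weights into comprehensive weights.

    ASSUMPTION: linear combination alpha * subjective + beta * objective,
    renormalized so that the comprehensive weights sum to 1.
    """
    if len(subjective) != len(objective):
        raise ValueError("weight vectors must have the same length")
    if alpha < 0 or beta < 0 or alpha + beta == 0:
        raise ValueError("relative importance degrees must be non-negative and not both zero")
    blended = [alpha * s + beta * o for s, o in zip(subjective, objective)]
    total = sum(blended)
    return [b / total for b in blended]


comprehensive = combine_weights([0.6, 0.4], [0.2, 0.8], alpha=0.5, beta=0.5)
```

Setting `alpha` above `beta` lets expert judgment dominate; the reverse lets the spread of the data dominate.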
In one embodiment, the obtaining of the quantized value of each data quality evaluation index in the data quality evaluation system according to the substation data includes:
acquiring the transformer substation data;
clustering the transformer substation data through an improved K-means algorithm to obtain a plurality of data sets;
and calculating the quantitative value of the data quality evaluation index of each data set through a data quality evaluation algorithm.
In one embodiment, the clustering the substation data by using the improved K-means algorithm to obtain a plurality of data sets includes:
acquiring the density, the average distance and the weight of each data point in the substation data;
processing the substation data to obtain a clustering center and a clustering number according to the density, the average distance and the weight;
and based on the clustering centers and the clustering numbers, clustering the transformer substation data by adopting a K-means algorithm to obtain a plurality of data sets.
In one embodiment, the data quality evaluation indexes comprise accuracy, completeness, consistency, timeliness, correctness and redundancy of the substation data;
wherein the accuracy is determined according to the total number of data in the data set, the number of data with inaccurate precision, the number of data with a range not meeting a threshold, the number of data with invalid data bits, and the number of data with redundant records;
the completeness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of null data and the number of data with redundant records;
the consistency is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records, the number of data with abnormal reference consistency of the same data and the number of data with abnormal logic consistency of different data;
the timeliness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records and the number of data not updated in time;
the correctness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records, the number of data with incorrect format and the number of data with incorrect length;
the redundancy is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records, the total number of identical data in each row and the total number of identical data in each column.
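The passage above names the counts that feed each index but does not give the formulas. The sketch below shows one plausible ratio form, purely as an assumption: records with invalid data bits or redundant records are excluded from the denominator, and the index is the share of the remaining records without an index-specific defect. The function `index_score` and its parameters are hypothetical.

```python
def index_score(total, invalid_bits, redundant_records, defect_counts):
    """Quantize one data quality index as a value in [0, 1].

    ASSUMED form: 1 - defects / (total - invalid_bits - redundant_records).
    defect_counts holds the index-specific counts, e.g. for accuracy the
    number of data with inaccurate precision and with out-of-range values.
    """
    effective = total - invalid_bits - redundant_records
    if effective <= 0:
        return 0.0  # nothing usable is left to score
    defects = sum(defect_counts)
    return max(0.0, 1.0 - defects / effective)


# 1000 records, 50 with invalid bits, 50 redundant, 90 imprecise, 90 out of range.
accuracy = index_score(1000, 50, 50, [90, 90])
```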
A data quality evaluation apparatus, the apparatus comprising:
the index quantitative value acquisition module is used for acquiring the quantitative value of each data quality evaluation index in a data quality evaluation system according to the data of the transformer substation, wherein the number of the data quality evaluation indexes in the data quality evaluation system is greater than a preset number threshold;
the weighting module is used for weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weighting coefficient corresponding to each data quality evaluation index, and the combined weighting method comprises a subjective weighting method and an objective weighting method;
and the evaluation result calculation module is used for calculating the data quality evaluation result of the substation data according to the quantized value of each data quality evaluation index and the comprehensive weight coefficient corresponding to each data quality evaluation index.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
obtaining a quantitative value of each data quality evaluation index in a data quality evaluation system according to the data of the transformer substation, wherein the number of the data quality evaluation indexes in the data quality evaluation system is greater than a preset number threshold;
weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weighting coefficient corresponding to each data quality evaluation index, wherein the combined weighting method comprises a subjective weighting method and an objective weighting method;
and calculating to obtain a data quality evaluation result of the substation data according to the quantization value of each data quality evaluation index and the comprehensive weight coefficient corresponding to each data quality evaluation index.
A readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
obtaining a quantitative value of each data quality evaluation index in a data quality evaluation system according to the data of the transformer substation, wherein the number of the data quality evaluation indexes in the data quality evaluation system is greater than a preset number threshold;
weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weighting coefficient corresponding to each data quality evaluation index, wherein the combined weighting method comprises a subjective weighting method and an objective weighting method;
and calculating to obtain a data quality evaluation result of the substation data according to the quantization value of each data quality evaluation index and the comprehensive weight coefficient corresponding to each data quality evaluation index.
According to the data quality evaluation method, the data quality evaluation device, the computer equipment and the readable storage medium, the quantized value of each data quality evaluation index in a data quality evaluation system is obtained according to the substation data, each data quality evaluation index is weighted by a combined weighting method to obtain a comprehensive weight coefficient corresponding to each data quality evaluation index, and the data quality evaluation result of the substation data is calculated according to the quantized value of each data quality evaluation index and the corresponding comprehensive weight coefficient. The method evaluates data quality from multi-dimensional influence factors, so that the data quality evaluation indexes are more comprehensive. Weighting each data quality evaluation index by the combined weighting method reduces the subjective randomness of the weighting and improves the accuracy of the weight coefficients. Finally, calculating the data quality evaluation result of the substation data from the quantized values of the data quality evaluation indexes and their comprehensive weight coefficients improves the accuracy of the data quality evaluation result.
Drawings
FIG. 1 is a diagram of an application environment of a data quality evaluation method in one embodiment;
FIG. 2 is a schematic flow chart diagram of a data quality evaluation method according to an embodiment;
FIG. 3 is a flowchart illustrating a method for calculating a quantitative value of a data quality indicator according to an embodiment;
FIG. 4 is a graph showing data quality evaluation indexes in another embodiment;
FIG. 5 is a schematic diagram of a data quality evaluation architecture in another embodiment;
FIG. 6 is a flowchart illustrating a method for assigning weights to data quality assessment indicators in another embodiment;
FIG. 7 is a schematic flow chart illustrating a method for obtaining integrated weight coefficients according to another embodiment;
FIG. 8 is a flowchart illustrating a detailed procedure of a data quality evaluation method according to another embodiment;
FIG. 9 is a data proportion diagram of classification of substation data into data sets in another embodiment;
FIG. 10 is a block diagram showing the construction of a data quality evaluating apparatus according to an embodiment;
FIG. 11 is a diagram illustrating an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The data quality evaluation method provided by the application can be applied to the application environment shown in fig. 1. The application environment comprises transformer substation primary equipment, transformer substation secondary equipment and an operation and maintenance center. The transformer substation secondary equipment is communicated with the transformer substation primary equipment and the operation and maintenance center through a network respectively, transformer substation data in the operation process of the transformer substation are obtained in real time, and the transformer substation data are stored in the transformer substation secondary equipment. After the computer equipment acquires the transformer substation data in the secondary equipment of the transformer substation, the data quality evaluation can be carried out on the transformer substation data, so that the operation condition of the transformer substation can be more effectively analyzed through the data quality evaluation result. The primary equipment of the transformer substation can be a transformer and accessory equipment thereof, GIS equipment, switch cabinet equipment, a grounding transformer, a station transformer and a dynamic reactive power compensation device, and can also be other accessory equipment; the secondary equipment of the transformer substation can be integrated automation equipment, an integrated power supply system, communication equipment and the like; the operation and maintenance center may be implemented by an independent server or a server cluster composed of a plurality of servers, and may also be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. It should be noted that the embodiment does not limit the specific form of the operation and maintenance center.
In an embodiment, as shown in fig. 2, a data quality evaluation method is provided, which is described by taking an example that the method is applied to an operation and maintenance center, and includes the following steps:
S100, obtaining a quantized value of each data quality evaluation index in a data quality evaluation system according to the substation data, wherein the number of the data quality evaluation indexes in the data quality evaluation system is greater than a preset number threshold.
Specifically, the substation data may represent relevant data generated by the substation primary device in the use process, such as monitoring data of the transformer (three-phase voltage, three-phase current, gas content in the transformer, and the like), temperature, voltage, current, humidity, power, and the like; the substation data can be obtained by measuring substation secondary equipment.
In order to evaluate the data quality, data quality evaluation indexes may be selected from the multi-dimensional influence factors of the substation data, and the number of data quality evaluation indexes may be greater than a preset number threshold, which may be equal to 5 in this embodiment. The influence factors of the data quality can be analyzed from dimensions such as the importance, regionality, difference and real-time nature of the data, so the data quality evaluation indexes may be the accuracy, completeness, consistency, timeliness, correctness, redundancy, standardization, accessibility, relevance and the like of the substation data, without limitation.
S200, weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weight coefficient corresponding to each data quality evaluation index, wherein the combined weighting method comprises a subjective weighting method and an objective weighting method.
Specifically, the operation and maintenance center can perform weighting on each data quality evaluation index through a combined weighting method to obtain a comprehensive weight coefficient corresponding to each data quality evaluation index. The combined weighting method can comprise a subjective weighting method and an objective weighting method, and each data quality evaluation index is weighted by the combined weighting method to obtain an optimal weight coefficient so as to improve the accuracy of each data quality evaluation index weight coefficient. The method for obtaining the comprehensive weight coefficient may be to perform a combined operation on the weight coefficient obtained by the subjective weighting method and the weight coefficient obtained by the objective weighting method to obtain the comprehensive weight coefficient.
S300, calculating a data quality evaluation result of the substation data according to the quantized value of each data quality evaluation index and the comprehensive weight coefficient corresponding to each data quality evaluation index.
Specifically, the operation and maintenance center may perform a combined operation on the quantized value of each data quality evaluation index and the comprehensive weight coefficient corresponding to each data quality evaluation index to obtain a data quality evaluation result of the substation data. The combination operation may be a combination operation between arithmetic, logarithmic, exponential, power operation, and the like, and is not limited thereto.
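As one concrete instance of the combination operation in S300, the sketch below uses a weighted arithmetic sum. This is an assumption: the text allows other operations (logarithmic, exponential, power), and the name `quality_score` and the sample values are illustrative only.

```python
def quality_score(quantized, weights):
    """Weighted arithmetic sum of index values (one ASSUMED combination).

    quantized maps index name -> quantized value; weights maps index
    name -> comprehensive weight coefficient.
    """
    mismatched = set(quantized) ^ set(weights)
    if mismatched:
        raise ValueError(f"indexes without a matching value or weight: {sorted(mismatched)}")
    return sum(weights[name] * quantized[name] for name in quantized)


score = quality_score(
    {"accuracy": 0.9, "consistency": 0.8},
    {"accuracy": 0.25, "consistency": 0.75},
)
```

With weights that sum to 1 and index values in [0, 1], the result also lies in [0, 1], which makes scores comparable across data sets.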
According to the data quality evaluation method, the quantized value of each data quality evaluation index in a data quality evaluation system is obtained according to the substation data, each data quality evaluation index is weighted by a combined weighting method to obtain the corresponding comprehensive weight coefficient, and the data quality evaluation result of the substation data is calculated according to the quantized value of each data quality evaluation index and the corresponding comprehensive weight coefficient. The method evaluates data quality from multi-dimensional influence factors, so that the data quality evaluation indexes are more comprehensive. Weighting each data quality evaluation index by the combined weighting method avoids the problem in the prior art that a single weighting method cannot fully consider the meaning and the interrelation of the indexes, which makes the evaluation result inconsistent with reality. The combined weighting method reflects the preference of the decision maker for the evaluation indexes while reducing the subjective randomness of the weighting, which improves the accuracy of the weight coefficients. Finally, the data quality evaluation result of the substation data is calculated from the quantized values of the data quality evaluation indexes and their comprehensive weight coefficients, which improves the accuracy of the data quality evaluation result, makes the evaluation result real and reliable, and further ensures the stable operation of the substation secondary equipment.
In some scenarios, in order to make the data quality evaluation index more comprehensive, the data quality evaluation index may be considered from multiple dimensions of the substation data, and therefore, as an embodiment, as shown in fig. 3, the step of obtaining the quantized value of each data quality evaluation index in the data quality evaluation system according to the substation data in S100 may be implemented by the following steps:
S110, acquiring substation data.
It is understood that the operation and maintenance center may obtain substation data from the substation secondary equipment. Specifically, the operation and maintenance center may obtain the substation data stored in the secondary device of the substation within a preset time period, where the preset time period may be 1 hour, one week, one month, or two months, and the time period is not limited.
During operation of the substation primary equipment, its electrical parameter data and the temperature and humidity data of its operating environment can be acquired in real time and stored in a database of the substation secondary equipment. When the quality of the substation data in the preset time period needs to be evaluated, the substation data can be extracted directly from the database in the substation secondary equipment, that is, the data of the substation secondary equipment in the preset time period is obtained.
S120, clustering the substation data through an improved K-means algorithm to obtain a plurality of data sets.
In particular, the substation data may be understood as a large data set. Because the substation data is large in scale and low in density, a clustering algorithm is needed before the quality evaluation to classify the high-dimensional substation data into a plurality of data sets and extract valuable data from the mass data, which reduces the dimensionality and thus the algorithm complexity. The operation and maintenance center can therefore cluster the substation data through an improved K-means algorithm and divide the substation data into a plurality of data sets. The improved K-means algorithm can be a K-means algorithm based on an optimized clustering center, a K-means algorithm based on abnormal data elimination, or a K-means algorithm based on data preprocessing. In this embodiment, the improved K-means algorithm may be a K-means algorithm improved on the basis of distance and weight.
In S120, the step of clustering the substation data by using the improved K-means algorithm to obtain a plurality of data sets may specifically include: obtaining the density, the average distance and the weight of each data point in the substation data; processing the substation data according to the density, the average distance and the weight to obtain clustering centers and a clustering number; and clustering the substation data through a K-means algorithm based on the clustering centers and the clustering number to obtain a plurality of data sets.
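The last of these steps, running K-means from clustering centers that were chosen beforehand, can be sketched in plain Python as below. This is a simplified illustration: it uses the ordinary (unweighted) Euclidean distance rather than a weighted one, and the function name `kmeans` and the iteration cap `max_iter` are assumptions.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Standard K-means iterations started from the given initial centers.

    Returns the final centers and the cluster label of every point.
    ASSUMPTION: plain Euclidean distance via math.dist.
    """
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: move every center to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # the centers no longer change
        centers = new_centers
    return centers, labels


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = kmeans(points, centers=[(0, 0), (10, 10)])
```

Because the number of clusters equals the number of initial centers, no separate `k` argument is needed.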
Let the substation data be a set $D$ containing $n$ data points (i.e., sample points), $D = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$, where each sample point can be represented as $x_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}$, $1 \le i \le n$, and $m$ is the dimension of each sample point. In this embodiment, the density, the average distance and the weight of each sample point in the set $D$ may be calculated first, and the sample point with the highest density is selected as the initial clustering center. The sample points in $D$ whose distance from the initial clustering center is smaller than the average sample distance Meandist($D$) are then deleted, the parameter $\tau_i$ of each sample point remaining in $D$ is calculated, and the sample point with the largest $\tau_i$ is selected as the second clustering center. The sample points remaining in $D$ whose distance from the second clustering center is smaller than Meandist($D$) are deleted in turn, and these steps are repeated until only one sample point remains in $D$. The clustering centers and the number of clusters obtained in this way are then fed into the traditional K-means algorithm, which iterates until the clustering centers no longer change. The parameters involved in this process are calculated as follows:
(1) calculating the distance weight ωid of each sample point. The formula can be expressed as:
Figure BDA0003174894940000081
(2) calculating the weighted Euclidean distance dω(xi, xj) between every two sample points. The formula can be expressed as:
Figure BDA0003174894940000082
In formula (2), xid and xjd are the d-th dimension coordinates of the i-th and j-th data points, and dω(xi, xj) denotes the weighted Euclidean distance between the i-th and j-th data points.
(3) Calculating an average sample distance Meandist (D) of the substation data D, wherein the formula can be expressed as:
Figure BDA0003174894940000083
(4) calculating the density ρ(i) of sample point xi in the substation data D. The formula may be expressed as:
Figure BDA0003174894940000084
in the formula (4), function
Figure BDA0003174894940000085
ρ(i) denotes the number of sample points contained in the circle centered on sample point xi with the average sample distance Meandist(D) as its radius.
(5) calculating, for the circle centered on each sample point xi with radius Meandist(D), the average distance ai between the sample points contained in the circle. The formula can be expressed as:
Figure BDA0003174894940000086
(6) calculating the inter-cluster distance si, specifically as follows:
if the sample point xi is not the sample point with the highest density, the inter-cluster distance is the minimum distance between the sample point and the cluster, i.e. si = min(dw(xi, xj)); if the sample point xi is the sample point with the highest density, the inter-cluster distance is defined as the maximum distance in the cluster, i.e. si = max(dw(xi, xj)).
(7) calculating the weight ωi of sample point xi. The formula can be expressed as:
Figure BDA0003174894940000087
(8) calculating the parameter τi. The formula can be expressed as:
τi=ωi·dw(xi,ci-1) (7);
In formula (7), dw(xi, ci-1) represents the weighted distance between sample point xi and the previous clustering center ci-1. The parameter τi combines the weight of the candidate sample point with its distance to the previous clustering center: the farther the distance and the larger the weight, the larger τi, and the higher the probability that a clustering center lies near that sample point, which better reflects the global characteristics of the power information. In this embodiment, clustering the substation data turns the high-dimensional substation data into low-dimensional data, which reduces the computation of the data quality evaluation algorithm and thus improves its efficiency.
S130, calculating the quantized value of the data quality evaluation index of each data set through a data quality evaluation algorithm.
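The center-selection loop described above can be sketched as follows. This is a simplified illustration, not the patented procedure: formulas (1) to (6) are not reproduced in this text, so the sketch assumes plain (unweighted) Euclidean distance, uses the neighbour count as a stand-in for the weight ωi, and loops until no sample points remain. The function name `select_initial_centers` is hypothetical.

```python
import math

def select_initial_centers(points):
    """Choose initial cluster centers by density and distance (simplified sketch)."""
    n = len(points)
    # Assumption: plain Euclidean distance instead of the weighted distance of formula (2).
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Average pairwise distance, standing in for Meandist(D).
    mean_dist = sum(dist[i][j] for i in range(n) for j in range(i + 1, n)) / (n * (n - 1) / 2)
    # Density: number of points (the point itself included) closer than mean_dist.
    density = [sum(d < mean_dist for d in row) for row in dist]

    first = max(range(n), key=density.__getitem__)  # densest point is the first center
    centers = [first]
    remaining = {i for i in range(n) if dist[first][i] >= mean_dist}
    while remaining:
        prev = centers[-1]
        # Assumption: tau_i approximated as density_i * distance to the previous center.
        nxt = max(sorted(remaining), key=lambda i: density[i] * dist[i][prev])
        centers.append(nxt)
        remaining = {i for i in remaining if i != nxt and dist[nxt][i] >= mean_dist}
    return [points[i] for i in centers]

blobs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
initial_centers = select_initial_centers(blobs)
print(initial_centers)
```

The returned centers and their count would then seed an ordinary K-means run.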
In this embodiment, as shown in fig. 4, the data quality evaluation indexes include the accuracy, completeness, consistency, timeliness, correctness and redundancy of the substation data;
the accuracy is determined according to the total number of data in the data set, the number of data with inaccurate precision, the number of data whose range does not meet the threshold, the number of data with invalid data bits and the number of redundantly recorded data;
the completeness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of null data and the number of redundantly recorded data;
the consistency is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of redundantly recorded data, the number of data with abnormal same-type reference consistency and the number of data with abnormal different-type logical consistency;
the timeliness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of redundantly recorded data and the number of data not updated in time;
the correctness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of redundantly recorded data, the number of data with an incorrect format and the number of data with an incorrect length;
the redundancy is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of redundantly recorded data, the total number of identical data in each row and the total number of identical data in each column.
Specifically, as shown in fig. 5, the data quality evaluation system may be divided into five layers: a data layer, a method layer, a criterion layer, an index layer and an evaluation layer. The data layer contains the substation data acquired by the substation secondary equipment; the method layer performs subjective and objective weighting of the index layer; the criterion layer holds the evaluation criteria, each index being evaluated by one or more criteria; the index layer calculates the data quality evaluation indexes; and the evaluation layer evaluates the data quality of the substation data. In this embodiment, data quality evaluation can be realized through the index layer. After the operation and maintenance center classifies the substation data into a plurality of data sets, it calculates, through a data quality evaluation algorithm, the quantized values of the six data quality evaluation indexes for each data set; these quantized values may be the same or different across data sets. The data quality evaluation algorithm may be understood as an evaluation index calculation method, that is, a method that computes a data quality evaluation index from the values of one or more influence parameters affecting it; the influence parameters are determined by the specific data quality evaluation index and are not limited here.
The specific process of calculating the data quality evaluation index quantitative value of each data set by the operation and maintenance center through the data quality evaluation algorithm is as follows:
(1) accuracy:
since both the precision of the data and the range of the data affect the accuracy of the data, the method of calculating the accuracy quantification of the data set can be formulated as:
Figure BDA0003174894940000101
In formula (8), S is the total number of data in the data set, SB12 is the number of data in the data set with inaccurate precision, SB11 is the number of data in the data set whose range does not meet the threshold, SB21 is the number of data in the data set with invalid data bits, and SB61 is the number of redundantly recorded data in the data set.
According to the operation and maintenance requirements of the substation, a maximum data threshold and a minimum data threshold may be set for each data set, so as to obtain the number SB11 of data whose values do not lie between the minimum and maximum data thresholds. This embodiment may also specify the number of valid digits after the decimal point, and then count the number SB21 of data with invalid data bits.
(2) completeness:
During data acquisition, transmission, reception and similar processes in the substation, data may be lost or become invalid. The completeness of a data set therefore characterizes both the completeness of the data records and the completeness of the data themselves, and is influenced by two parameters: the number SB21 of null data in the data set and the number SB22 of data with invalid (null) data bits. The completeness quantized value of the data set can therefore be expressed by the formula:
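The two counts described above can be illustrated with a short sketch. It assumes that readings arrive as decimal strings and that a reading has "invalid data bits" when it carries more digits after the decimal point than specified; both function names and the sample readings are hypothetical.

```python
def count_out_of_range(values, lower, upper):
    """Count values outside the closed interval [lower, upper] (candidate for SB11)."""
    return sum(1 for v in values if not lower <= v <= upper)

def count_invalid_precision(raw_values, max_decimals):
    """Count readings with more than max_decimals digits after the decimal point."""
    invalid = 0
    for text in raw_values:
        _, _, decimals = text.partition(".")
        if len(decimals) > max_decimals:
            invalid += 1
    return invalid

readings = ["3.1416", "15.20", "16.50001", "-0.5"]
out_of_range = count_out_of_range([float(r) for r in readings], 0.0, 16.0)
bad_precision = count_invalid_precision(readings, 4)
print(out_of_range, bad_precision)
```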
Figure BDA0003174894940000102
(3) consistency:
On the one hand, data of the same type generally do not fluctuate greatly between adjacent moments, which is referred to as same-type data reference consistency; on the other hand, different types of data obey certain mathematical relationships at the same moment, which is referred to as different-type data logical consistency. The consistency of a data set thus characterizes whether same-type data, and the data at a particular moment, show deviations.
Suppose the data set contains N different types of data, i.e. the data set is denoted X = {X1, X2, …, Xi, …, XN}, where Xi contains N data points, Xi = {xi1, xi2, …, xiN}.
A. Same-type data reference consistency check:
First, the same-type data are plotted as discrete points on coordinate axes, i.e. as a scatter plot, and the points are then fitted by the least squares method. Specifically, a polynomial model y = a0 + a1x^1 + … + anx^n may be fitted, where a0, a1, …, an are the coefficients to be determined. The other values of the same data type are predicted with the fitted model, and the deviation sequence is:
Figure BDA0003174894940000111
wherein, yi
Figure BDA0003174894940000112
are, respectively, the i-th actual value and the i-th fitted value of index y. A deviation tolerance Kc is set: if |Bi| > Kc, the data point is regarded as abnormal, and the points are judged in sequence until the reference consistency check of the same-type substation data is complete.
B. Different-type data logical consistency check:
A regression equation is established for each row of data in the data set through multiple regression analysis, and the logical consistency of different types of data is tested using the degree of deviation between the result of the regression equation and the true value. One type of data is selected as the independent variable and the other data as dependent variables; scatter plots of the dependent variables against the independent variable are drawn, the relationship between them is determined from the scatter plots, and the regression equation is determined as:
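A minimal sketch of the same-type check follows. It makes two assumptions that the text does not confirm: a first-degree polynomial (straight line) is fitted instead of a general polynomial, and the deviation Bi is taken as the plain residual between the actual and the fitted value, since the deviation formula is not reproduced here. The names `fit_line` and `reference_anomalies` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a0 + a1 * x; returns (a0, a1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def reference_anomalies(xs, ys, tolerance):
    """Return indices whose residual exceeds the tolerance (assumed form of |Bi| > Kc)."""
    a0, a1 = fit_line(xs, ys)
    deviations = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
    return [i for i, b in enumerate(deviations) if abs(b) > tolerance]

times = list(range(10))
voltages = [2.0 * t + 1.0 for t in times]
voltages[5] += 5.0  # inject one abnormal reading
flagged = reference_anomalies(times, voltages, tolerance=1.0)
print(flagged)
```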
Figure BDA0003174894940000113
The dependent variable is predicted with the regression equation and its degree of deviation is calculated. With the deviation tolerance set to Kr, a data point is regarded as abnormal if |Bi| > Kr; the points are judged in sequence until the logical consistency check of the different types of data of the intelligent substation is complete. The problem data found by the same-type reference consistency check and the different-type logical consistency check are counted, so the consistency quantized value of the data set can be expressed by the following formula:
Figure BDA0003174894940000114
where SB31 is the number of data in the data set with abnormal same-type reference consistency, and SB32 is the number of data in the data set with abnormal different-type logical consistency.
For example, the voltage values of one phase of the three-phase voltage of a transformer at the three moments t-1, t and t+1 do not fluctuate greatly, which is same-type data reference consistency; the voltage and the current output by the transformer obey a mathematical relationship, such as R = U/I for direct current, which is different-type data logical consistency.
(4) timeliness:
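The logical check between different data types can be sketched with the R = U/I example. This is an illustration under assumptions: the regression is a one-parameter fit through the origin (U = R·I) rather than a general multiple regression, and the deviation is the plain residual. The function name and the sample values are hypothetical.

```python
def logical_anomalies(currents, voltages, tolerance):
    """Fit U = R * I by least squares and flag points whose residual exceeds the tolerance."""
    resistance = sum(u * i for u, i in zip(voltages, currents)) / sum(i * i for i in currents)
    residuals = [u - resistance * i for u, i in zip(voltages, currents)]
    return [k for k, b in enumerate(residuals) if abs(b) > tolerance]

currents = [1.0, 2.0, 3.0, 4.0, 5.0]
voltages = [10.0, 20.0, 40.0, 40.0, 50.0]  # the third reading breaks the U = 10 * I pattern
suspect = logical_anomalies(currents, voltages, tolerance=5.0)
print(suspect)
```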
the data sources of all the devices in the transformer substation are unique, all the devices can update data at a preset time point, and if the actual updating time point is different from the preset updating time point, the data in the data set is not updated timely; according to the operation condition of the transformer substation, a maximum time threshold value can be set, and if the difference value between the actual updating time point of data and the preset updating time point of the data is greater than the maximum time threshold value, the data is considered to be not updated timely; therefore, the timeliness of the data mainly represents whether the time used for data transmission and data reception can meet the requirement of the secondary equipment of the transformer substation after data acquisition, that is, whether the update of the data set is timely or not, so that the timeliness quantification value of the data set can be expressed as follows:
Figure BDA0003174894940000121
where SB41 is the number of data in the data set that are not updated in time.
(5) correctness:
The correctness of a data set can be measured by two factors: incorrect data format and incorrect data length. The same value may be written in different formats, for example with "%", "/" or a decimal point (45%, 45/100, 0.45). In this embodiment, one format may be selected as the standard according to actual requirements and the number of data that do not meet the standard format counted; likewise, the number of data that do not meet the standard length may be counted according to actual requirements. The correctness quantized value of the data set can be represented by the following formula:
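Counting the late updates can be sketched as below, assuming that each datum carries a scheduled and an actual update time in seconds; the function name and the sample times are hypothetical.

```python
def count_late_updates(actual_times, scheduled_times, max_delay):
    """Count updates whose time offset from the schedule exceeds max_delay (candidate for SB41)."""
    return sum(1 for a, s in zip(actual_times, scheduled_times) if abs(a - s) > max_delay)

scheduled = [0.0, 2.0, 4.0, 6.0, 8.0]   # one update every 2 seconds
actual = [0.0, 2.01, 4.10, 6.0, 8.06]
late = count_late_updates(actual, scheduled, max_delay=0.05)
print(late)
```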
Figure BDA0003174894940000122
where SB51 is the number of data in the data set with an incorrect format, and SB52 is the number of data in the data set with an incorrect length.
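The two correctness counts can be sketched as follows, assuming the decimal-point format is chosen as the standard and the readings are available as strings. The regular expression and the maximum length of 6 characters are illustrative choices, not values from the text.

```python
import re

# Assumed standard format: optional sign, digits, decimal point, digits (e.g. 0.45).
DECIMAL_FORMAT = re.compile(r"-?\d+\.\d+")

def count_format_errors(raw_values, pattern=DECIMAL_FORMAT):
    """Count readings that do not match the standard format (candidate for SB51)."""
    return sum(1 for text in raw_values if not pattern.fullmatch(text))

def count_length_errors(raw_values, max_length):
    """Count readings longer than the standard length (candidate for SB52)."""
    return sum(1 for text in raw_values if len(text) > max_length)

samples = ["0.45", "45%", "45/100", "0.123456789"]
format_errors = count_format_errors(samples)
length_errors = count_length_errors(samples, max_length=6)
print(format_errors, length_errors)
```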
(6) Redundancy:
the redundancy of a data set may characterize whether duplicate data is present in the data set as compared to a reference data set, i.e., the redundancy includes both the record redundancy as compared to the reference data set and the data redundancy of the data set.
A. Record redundancy compared to the reference data set:
Because two sets of detection devices are provided when the substation collects data, two data sets are generated in practice. One of them is used as the reference data set to determine whether duplicate records exist, and the number SB61 of duplicated data is counted.
B. Data redundancy of data sets:
Data redundancy mainly checks whether the data within each row and within each column are identical, records whether the number of identical data in each row and each column exceeds a threshold, and counts the number SB62 of duplicated data. The redundancy quantized value of the data set can thus be expressed by the following formula:
Figure BDA0003174894940000131
where SB621 is the total number of identical data per row in the data set, and SB622 is the total number of identical data per column in the data set.
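The redundancy counts can be sketched as below. The interpretation is an assumption: a record is redundant if it also appears in the reference data set, and the identical data in a row (or column) are the values beyond the first occurrence of each distinct value. Both function names are hypothetical.

```python
def count_duplicate_records(records, reference):
    """Count records that also appear in the reference data set (candidate for SB61)."""
    seen = set(reference)
    return sum(1 for record in records if record in seen)

def count_repeated_values(values):
    """Count values in one row or column that repeat an earlier value."""
    return len(values) - len(set(values))

records = [(1, 2), (3, 4), (5, 6)]
reference = [(3, 4), (7, 8)]
duplicates = count_duplicate_records(records, reference)
repeats = count_repeated_values([1, 1, 2, 1])
print(duplicates, repeats)
```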
According to the data quality evaluation method, the data quality evaluation indexes of the substation data can be acquired, so that the data quality evaluation indexes are more comprehensive, the data quality is evaluated through the more comprehensive data quality evaluation indexes, the accuracy of the data quality evaluation result can be further improved, the evaluation result is real and reliable, and the stable operation of the substation secondary equipment is further ensured.
As an example, as shown in fig. 6, the step of weighting each data quality evaluation index by a combined weighting method in S200 to obtain an integrated weight coefficient corresponding to each data quality evaluation index may be implemented by the following steps:
S210, subjectively weighting each data quality evaluation index by adopting an order relation method to obtain a subjective weight coefficient of each data quality evaluation index.
Specifically, the operation and maintenance center may subjectively weight each data quality evaluation index by using a subjective weighting method to obtain a subjective weighting coefficient of each data quality evaluation index. The subjective weighting method may be a binomial coefficient method, an analytic hierarchy method, an expert survey method, or the like, and is not limited thereto. In this embodiment, the operation and maintenance center may subjectively weight each data quality evaluation index by using a sequence relation method to obtain a subjective weight coefficient of each data quality evaluation index.
In S210, the step of subjectively weighting each data quality evaluation index by using the order relation method to obtain a subjective weight coefficient of each data quality evaluation index may specifically include: sorting the data quality evaluation indexes by importance to obtain an importance sequence; weighting each data quality evaluation index to obtain its initial weight coefficient; calculating the weight evaluation scales according to the importance sequence and the initial weight coefficients; and determining the subjective weight coefficient of each data quality evaluation index according to the weight evaluation scales.
Suppose there are n data quality evaluation indexes, forming the data quality evaluation index set T = {T1, T2, …, Tn}. First, the n data quality evaluation indexes are sorted by importance according to expert opinion and operation requirements, for example Ti > Tj > … > Tk. Second, the ratio of the initial weight coefficient Wi of data quality evaluation index Ti to the initial weight coefficient Wj of the index Tj adjacent to it in importance is defined as the weight evaluation scale ri = Wi/Wj; ri is judged independently by each expert and then averaged, and if two data quality evaluation indexes are equally important, ri = 1 is taken. Finally, the subjective weight coefficients of the data quality evaluation indexes are determined from the weight evaluation scales:
Figure BDA0003174894940000141
where ri is the weight evaluation scale, and Wj is the subjective weight coefficient of the j-th data quality evaluation index.
S220, objectively weighting each data quality evaluation index by adopting a variation coefficient method to obtain an objective weight coefficient of each data quality evaluation index.
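The weight formula itself is not reproduced in this text, so the sketch below uses the commonly cited form of the order relation method as an assumption: with the indexes sorted from most to least important, the last weight is fixed by normalization and each earlier weight equals the next one multiplied by its scale ri. The function name is hypothetical. Fed with the five averaged scales of the worked example later in this description (3, 4, 2, 4, 2, in the stated importance order), it yields 0.6784, 0.2261, 0.0565, 0.0283, 0.0071 and 0.0035, which are the values printed in Table 1 under the heading "Objective weight coefficient".

```python
def order_relation_weights(scales):
    """Weights from adjacent-importance ratios, most important index first.

    scales[k] is the ratio of the weight of the k-th index to that of the (k+1)-th.
    """
    weights = [1.0]                      # unnormalized weight of the least important index
    for ratio in reversed(scales):
        weights.append(weights[-1] * ratio)
    weights.reverse()
    total = sum(weights)
    return [w / total for w in weights]

# Order: accuracy, consistency, completeness, correctness, timeliness, redundancy.
example_weights = order_relation_weights([3, 4, 2, 4, 2])
print([round(w, 4) for w in example_weights])
```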
Specifically, the operation and maintenance center may perform objective weighting on each data quality evaluation index by using an objective weighting method to obtain an objective weighting coefficient of each data quality evaluation index. The objective weighting method may be a principal component analysis method, a dispersion and mean square error method, a multi-objective programming method, and the like, which is not limited. In this embodiment, the operation and maintenance center may perform objective weighting on each data quality evaluation index by using a variation coefficient method to obtain an objective weighting coefficient corresponding to each data quality evaluation index.
In S220, the step of performing objective weighting on each data quality evaluation index by using a variation coefficient method to obtain an objective weighting coefficient of each data quality evaluation index may specifically include: and obtaining the average value and the standard deviation of each data quality evaluation index according to the quantized value of each data quality evaluation index, and calculating the objective weight coefficient of each data quality evaluation index according to the average value and the standard deviation.
Suppose n data quality evaluation indexes and m data sets form an m × n judgment matrix U = (uij)m×n, where uij is the quantized value of the j-th data quality evaluation index of the i-th data set, and the resulting objective weight coefficients are ω = {ω1, ω2, …, ωj, …, ωn}. First, the average value Pj and the standard deviation δj of the quantized values of the j-th data quality evaluation index are calculated:
Figure BDA0003174894940000142
Figure BDA0003174894940000143
where j = 1, 2, …, n; i = 1, 2, …, m.
Next, according to the average value Pj and the standard deviation δj, the variation coefficient zj of the quantized values of the j-th data quality evaluation index is calculated:
Figure BDA0003174894940000144
Finally, the objective weight coefficient ωj of the j-th data quality evaluation index is obtained by the variation coefficient method:
Figure BDA0003174894940000145
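The steps above follow the usual coefficient-of-variation scheme, which can be sketched as follows. Because the formulas are not reproduced in this text, two details are assumptions: the population standard deviation is used, and each weight is the variation coefficient of its column divided by the sum of all variation coefficients. The function name and the sample matrix are hypothetical.

```python
from statistics import mean, pstdev

def variation_coefficient_weights(matrix):
    """Objective weights per column of an m x n matrix of quantized values."""
    columns = list(zip(*matrix))
    # Variation coefficient z_j = standard deviation / mean of column j.
    coefficients = [pstdev(col) / mean(col) for col in columns]
    total = sum(coefficients)  # assumes at least one column actually varies
    return [z / total for z in coefficients]

judgment_matrix = [[1.0, 4.0], [3.0, 12.0]]  # two data sets, two indexes
objective_weights = variation_coefficient_weights(judgment_matrix)
print(objective_weights)
```

Indexes whose quantized values vary more across data sets receive larger weights; in the sample both columns have the same relative spread, so the weights are equal.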
S230, acquiring a comprehensive weight coefficient corresponding to each data quality evaluation index through the subjective weight coefficient and the objective weight coefficient.
Specifically, the operation and maintenance center may perform a combined operation through the subjective weight coefficient and the objective weight coefficient to obtain a comprehensive weight coefficient corresponding to each data quality evaluation index. The combination operation may be addition operation, subtraction operation, multiplication operation or division operation, or may be any combination operation of these operations, of course, it may also be weighted summation operation, and the specific weight coefficient may be set by user-defined or calculated according to a specific algorithm.
According to the data quality evaluation method, each data quality evaluation index is comprehensively weighted through a subjective weighting method and an objective weighting method, and the subjective weighting coefficient and the objective weighting coefficient are combined to obtain the comprehensive weighting coefficient, so that the preference of a decision maker on the evaluation indexes is reflected, the subjective randomness of weighting is reduced, the accuracy of the weighting coefficient is improved, the data quality evaluation result of the transformer substation data is obtained through calculation through the quantitative value of the data quality evaluation index and the comprehensive weighting coefficient corresponding to the data quality evaluation index, the accuracy of the data quality evaluation result is improved, and the evaluation result is real and reliable.
As shown in fig. 7, the step of obtaining the comprehensive weight coefficient corresponding to each data quality evaluation index through the objective weight coefficient and the subjective weight coefficient in S230 may specifically include:
S231, acquiring a first relative importance degree corresponding to the subjective weight coefficient and a second relative importance degree corresponding to the objective weight coefficient.
Suppose the first relative importance degree corresponding to the subjective weight coefficient is α, the second relative importance degree corresponding to the objective weight coefficient is β, and n data quality evaluation indexes and m data sets form an m × n judgment matrix U = (uij)m×n, where uij is the quantized value of the j-th data quality evaluation index of the i-th data set. The first relative importance degree α and the second relative importance degree β can then be obtained through the following formulas:
Figure BDA0003174894940000151
Figure BDA0003174894940000152
S232, calculating a comprehensive weight coefficient by using a combined weighting method according to the first relative importance degree, the second relative importance degree, the subjective weight coefficient and the objective weight coefficient.
Further, the operation and maintenance center can perform weighted summation through the first relative importance degree, the second relative importance degree, the subjective weight coefficient and the objective weight coefficient to obtain the comprehensive weight coefficient. In this embodiment, the comprehensive weight coefficient is calculated by a combination weighting method through the first relative importance degree, the second relative importance degree, the subjective weight coefficient and the objective weight coefficient, and the specific formula for calculating the comprehensive weight coefficient is as follows:
Uj=αWj+βωj (23);
where Uj is the comprehensive weight coefficient of the j-th data quality evaluation index, ω1, ω2, …, ωj, …, ωn are the objective weights determined by the variation coefficient method, and W1, W2, …, Wj, …, Wn are the subjective weights determined by the order relation method.
Further, the operation and maintenance center may multiply the quantized value of each data quality evaluation index by the corresponding comprehensive weight coefficient and sum the products to obtain the data quality evaluation result AS of the substation data, which can be expressed by the formula:
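Formula (23) can be checked against the worked example given later in this description, using the relative importance degrees 0.6142 and 0.3854 and the subjective and objective columns as labelled in Table 1. The sketch below reproduces the combined column to about three decimal places; the small differences are consistent with rounding of the printed inputs. Variable and function names are illustrative.

```python
ALPHA, BETA = 0.6142, 0.3854  # relative importance degrees from the worked example

# Columns as labelled in Table 1: accuracy, completeness, consistency,
# timeliness, correctness, redundancy.
subjective = [0.1599, 0.0536, 0.3324, 0.0460, 0.3172, 0.0908]
objective = [0.6784, 0.0565, 0.2261, 0.0071, 0.0283, 0.0035]
table_combined = [0.3597, 0.0547, 0.2915, 0.0310, 0.2059, 0.0572]

def combine_weights(w_subjective, w_objective, alpha, beta):
    """Formula (23): U_j = alpha * W_j + beta * omega_j for every index j."""
    return [alpha * w + beta * o for w, o in zip(w_subjective, w_objective)]

combined = combine_weights(subjective, objective, ALPHA, BETA)
print([round(u, 4) for u in combined])
```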
Figure BDA0003174894940000161
The data quality evaluation result can be expressed as a percentage score, and the grade intervals of the data quality evaluation result are set according to the operation and maintenance requirements of the substation. If the score is on a hundred-point scale, it can be divided into six grades, "excellent, good, general, pass, poor and extremely poor", according to the operation and maintenance requirements of the substation. For example: if A ∈ [0,30), the data quality evaluation result of the substation data is "extremely poor"; if A ∈ [30,60), it is "poor"; if A ∈ [60,75), it is "pass"; if A ∈ [75,85), it is "general"; if A ∈ [85,95), it is "good"; and if A ∈ [95,100], it is "excellent". This embodiment can therefore assign the data quality one of the grades "excellent, good, general, pass, poor and extremely poor" according to the obtained data quality evaluation result.
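The grading rule can be sketched as a simple interval lookup. The sketch assumes half-open intervals, so that a boundary score such as 60 falls into the higher grade; the intervals printed in the original text share their endpoints, which leaves that choice open.

```python
from bisect import bisect_right

GRADE_BOUNDS = [30, 60, 75, 85, 95]  # lower bounds of all grades except the lowest
GRADE_NAMES = ["extremely poor", "poor", "pass", "general", "good", "excellent"]

def grade(score):
    """Map a score on the hundred-point scale to one of the six grades."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    return GRADE_NAMES[bisect_right(GRADE_BOUNDS, score)]

print(grade(96.2988), grade(60), grade(29.9))
```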
In order to facilitate understanding of those skilled in the art, specifically, as shown in fig. 8, the method includes:
S401, establishing multi-dimensional data quality evaluation indexes in a data quality evaluation system according to actual conditions and service requirements;
S402, acquiring the substation data collected by the substation secondary equipment;
S403, clustering the substation data by using an improved K-means algorithm to obtain a plurality of data sets;
S404, calculating the quantized value of each data quality evaluation index for each data set by using an index evaluation algorithm;
S405, giving the importance ranking of the data quality evaluation indexes and the weight coefficient evaluation scales in combination with expert opinion;
S406, subjectively weighting each data quality evaluation index by using the order relation method to obtain the subjective weight coefficients;
S407, calculating the average value and the standard deviation of each data quality evaluation index according to its quantized values;
S408, calculating the objective weight coefficient of each data quality evaluation index from the average value and the standard deviation;
S409, comprehensively weighting each data quality evaluation index according to the principle of minimizing the sum of squared deviations of the subjective and objective weight coefficients and maximizing the comprehensive evaluation value of the decision scheme, to obtain the comprehensive weight coefficients;
S410, calculating the data quality evaluation result by using the quantized value of each data quality evaluation index and the corresponding comprehensive weight coefficient;
S411, obtaining the data quality of the substation according to the data quality evaluation result.
The execution process of S401 to S411 may specifically refer to the description of the above embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Illustratively, a simulation model is built, and 2000 groups of data are selected as sample data (namely, substation data) from the obtained simulation data, so that a 2000 × 2 data matrix is obtained. The method in the embodiment is adopted to evaluate the quality of the data of the transformer substation, and specifically comprises the following steps:
(1) preprocessing the substation data with the distance-and-weight-weighted K-means algorithm:
First, the substation data are normalized to obtain dimensionless data; the dimensionless data are then reduced in dimensionality with the distance-and-weight-weighted K-means algorithm, which yields two clustering centers. The two clustered data sets account for 57% and 43% of the data, as shown in fig. 9.
(2) calculating the quantized values of the data quality evaluation indexes:
a. Accuracy quantized value. According to the characteristics of the substation data, the normal value ranges of the voltage and the current are set to U ∈ [67,10] and I ∈ [0,16], and four valid digits after the decimal point are taken as the standard. Detection gives the number of data with inaccurate precision in the data set, SB12 = 56, and the number of data whose range does not meet the threshold, SB11 = 34.
b. Completeness quantized value. Detecting null data and invalid (null) data bits in each data set gives the number of null data and the number of data with invalid data bits as SB21 = 0 and SB22 = 29, respectively.
c. Consistency quantized value. After clustering, the substation data are divided into two data sets, and the consistency check is performed on each of them. The same-type data deviation tolerance is set to Kc = 0.1 and the different-type data deviation tolerance to Kr = 0.2. After consistency detection on the two data sets, the number of data with abnormal same-type reference consistency is SB31 = 39, and the number of data with abnormal different-type logical consistency is SB32 = 36.
d. Timeliness quantized value. Because the simulation system automatically uploads data every 2 seconds, the maximum time threshold is set to 0.05 s; the number of data for which the difference between the actual and the expected update time point exceeds 0.05 s is SB41 = 75.
e. Correctness quantized value. The decimal-point format is selected as the correct format; correctness detection gives the number of data with an incorrect format in the data set, SB51 = 50, and the number of data with an incorrect length, SB52 = 86.
f. Redundancy quantized value. The record redundancy SB61 is obtained by comparison with the other, reference data set. The row threshold and the column threshold are both set to 15; after data redundancy detection, the total number of identical data per row in the data set is SB621 = 13, and the total number of identical data per column is SB622 = 11.
Calculating by the method in the above embodiment gives: accuracy quantized value SB1 = 0.9551, completeness quantized value SB2 = 0.9857, consistency quantized value SB3 = 0.9633, timeliness quantized value SB4 = 0.9622, correctness quantized value SB5 = 0.9323, and redundancy quantized value SB6 = 0.9876.
The calculated quantized values show that the correctness of the substation data is relatively low, but the overall quality level of the substation data is high.
(3) The combined weighting method is used for weighting:
a. subjective weighting based on the order relation method. 5 experts were invited to discuss the importance of the data quality evaluation index and score the index evaluation scale. The importance degree of the data quality evaluation index can be expressed as: accuracy (B)1)>Consistency (B)3)>Perfection (B)2)>Accuracy (B)5)>Timeliness (B)4)>Redundancy (B)6) Taking the index evaluation scale of five experts and averaging to obtain r1=3,r3=4,r2=2,r 54, r4 is 2. The subjective weighting factors obtained by calculation are shown in table 1.
b. Objective weighting based on the coefficient of variation method. An evaluation matrix is built and the objective weight coefficients are calculated from it; they are shown in Table 1.
c. Combined weighting. Based on the principle of minimizing the sum of squared deviations between the subjective and objective weight coefficients and maximizing the comprehensive evaluation value of the decision scheme, the relative importance degree of the subjective weight coefficient is 0.6142 and that of the objective weight coefficient is 0.3854. The resulting combined weight coefficients are shown in Table 1.
TABLE 1
Index               Subjective weight coefficient   Objective weight coefficient   Combined weight coefficient
Accuracy (B1)       0.1599                          0.6784                         0.3597
Completeness (B2)   0.0536                          0.0565                         0.0547
Consistency (B3)    0.3324                          0.2261                         0.2915
Timeliness (B4)     0.0460                          0.0071                         0.0310
Correctness (B5)    0.3172                          0.0283                         0.2059
Redundancy (B6)     0.0908                          0.0035                         0.0572
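The order relation step (step a) can be illustrated with a minimal sketch. It assumes the usual order relation (G1) formulation, in which each evaluation scale is the ratio between the weight of one index and the weight of the next index in the importance ranking; the function name and the mapping of r_1, r_3, r_2, r_5, r_4 onto consecutive ranking positions are assumptions, not details confirmed by this passage.

```python
from math import prod

def order_relation_weights(ratios):
    """Return weights in descending order of importance.

    ratios[k] is the assumed importance ratio between the k-th and the
    (k+1)-th ranked index, so len(ratios) == number_of_indexes - 1.
    """
    n = len(ratios)
    # Weight of the least important index:
    # w_last = 1 / (1 + sum of the products of the trailing ratios)
    tail_products = [prod(ratios[k:]) for k in range(n)]
    w_last = 1.0 / (1.0 + sum(tail_products))
    # Walk back up the ranking: w_previous = ratio * w_next
    weights = [w_last]
    for r in reversed(ratios):
        weights.append(r * weights[-1])
    return list(reversed(weights))

ranking = ["B1", "B3", "B2", "B5", "B4", "B6"]  # importance order from step a
scales = [3, 4, 2, 4, 2]                        # r_1, r_3, r_2, r_5, r_4 from step a
weights = dict(zip(ranking, order_relation_weights(scales)))
```

With these scales the sketch returns approximately 0.6784, 0.2261, 0.0565, 0.0283, 0.0071 and 0.0035 for B1, B3, B2, B5, B4 and B6, which are the figures that Table 1 prints for those indexes in its second numeric column.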
(4) Calculating a data quality evaluation result:
According to the quantized value of each data quality evaluation index and the corresponding comprehensive weight coefficient, the data quality score of the substation secondary equipment is obtained:
[Formula image BDA0003174894940000181: data quality score of the substation secondary equipment]
The grade intervals of the data quality can be divided freely according to the actual situation of each secondary device of the substation. With the intervals set in the above embodiment, 96.2988 ∈ [95, 100], so the data quality of the substation secondary equipment is rated "excellent".
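The scoring and grading step can be sketched as follows. The sketch assumes that the score is a plain weighted sum of the quantized values scaled to 100; the original formula is an image that is not reproduced in this text, and with the rounded inputs quoted above the plain sum gives a somewhat lower value than the reported 96.2988, although it still falls in [95, 100]. All grade intervals other than [95, 100] are hypothetical.

```python
# Quantized values and combined weight coefficients quoted in the embodiment (B1..B6).
quantized = [0.9551, 0.9857, 0.9633, 0.9622, 0.9323, 0.9876]
combined = [0.3597, 0.0547, 0.2915, 0.0310, 0.2059, 0.0572]

def quality_score(values, weights):
    """Weighted sum of the quantized index values, scaled to 0-100 (assumed form)."""
    return 100.0 * sum(v * w for v, w in zip(values, weights))

# Lower bounds of the grade intervals. Only [95, 100] -> "excellent" comes from
# the text; the remaining intervals are hypothetical placeholders.
GRADES = [(95.0, "excellent"), (85.0, "good"), (70.0, "fair"), (0.0, "poor")]

def grade(score):
    """Return the label of the first interval whose lower bound the score reaches."""
    for lower, label in GRADES:
        if score >= lower:
            return label
    raise ValueError("score must be non-negative")

score = quality_score(quantized, combined)
label = grade(score)
```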
The data quality evaluation method combines a subjective weighting method with an objective weighting method and weights each data quality evaluation index through the combined weighting method to obtain a comprehensive weight coefficient. This addresses the problem in the conventional technology that a single weighting method cannot fully account for the meaning of the indexes and their interrelations, which leads to evaluation results that are inconsistent with reality. The combined weighting method reflects the decision maker's preference for the evaluation indexes while reducing the subjective arbitrariness of the weighting. This improves the accuracy of the weight coefficients and, in turn, of the data quality evaluation result, so that the evaluation result is true and reliable and helps ensure the stable operation of the substation secondary equipment.
It should be understood that although the steps in the flowcharts of fig. 2, 3 and 6-8 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2, 3 and 6-8 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 10, there is provided a data quality evaluation apparatus including: an index quantitative value acquisition module 11, a weighting module 12 and an evaluation result calculation module 13, wherein:
the index quantitative value obtaining module 11 is configured to obtain a quantitative value of each data quality evaluation index in a data quality evaluation system according to the data of the substation, where the number of the data quality evaluation indexes in the data quality evaluation system is greater than a preset number threshold;
the weighting module 12 is used for weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weight coefficient corresponding to each data quality evaluation index, wherein the combined weighting method comprises a subjective weighting method and an objective weighting method;
and the evaluation result calculation module 13 is configured to calculate a data quality evaluation result of the substation data according to the quantization value of each data quality evaluation index and the comprehensive weight coefficient corresponding to each data quality evaluation index.
The data quality evaluation device provided in this embodiment may implement the method embodiments described above, and the implementation principle and technical effect are similar, which are not described herein again.
In one embodiment, the weighting module 12 comprises: a subjective weighting unit, an objective weighting unit and a comprehensive weight calculating unit, wherein,
the subjective weighting unit is used for subjectively weighting each data quality evaluation index by adopting an order relation method to obtain a subjective weight coefficient of each data quality evaluation index;
the objective weighting unit is used for performing objective weighting on each data quality evaluation index by adopting a coefficient of variation method to obtain an objective weight coefficient of each data quality evaluation index;
and the comprehensive weight calculating unit is used for acquiring the comprehensive weight coefficient corresponding to each data quality evaluation index through the subjective weight coefficient and the objective weight coefficient.
The data quality evaluation device provided in this embodiment may implement the method embodiments described above, and the implementation principle and technical effect are similar, which are not described herein again.
In one embodiment, the subjective weighting unit includes: a ranking subunit, a weighting subunit, an evaluation scale calculation subunit, and a subjective weight calculation subunit, wherein,
the ordering subunit is used for ordering the importance of each data quality evaluation index to obtain an importance sequence;
the weighting subunit is used for weighting each data quality evaluation index to obtain an initial weighting coefficient of each data quality evaluation index;
an evaluation scale calculation subunit, configured to calculate a weight evaluation scale according to the importance sequence and the initial weight coefficient;
and the subjective weight calculating subunit is used for determining the subjective weight coefficient of each data quality evaluation index according to the weight evaluation scale.
The data quality evaluation device provided in this embodiment may implement the method embodiments described above, and the implementation principle and technical effect are similar, which are not described herein again.
In one embodiment, the objective weighting unit comprises: a first calculating subunit and an objective weight calculating subunit, wherein:
the first calculating subunit is used for acquiring the average value and the standard deviation of each data quality evaluation index according to the quantized value of each data quality evaluation index;
and the objective weight calculating subunit is used for calculating objective weight coefficients of the data quality evaluation indexes through the average value and the standard deviation.
The data quality evaluation device provided in this embodiment may implement the method embodiments described above, and the implementation principle and technical effect are similar, which are not described herein again.
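The two subunits of the objective weighting unit can be sketched as below. This is a minimal illustration of the coefficient of variation method under stated assumptions: each index has quantized values from several data sets, the population standard deviation is used, and the weights are the normalized coefficients of variation. The function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation_weights(samples):
    """samples maps an index name to its quantized values over several data sets."""
    cv = {}
    for name, values in samples.items():
        mu = mean(values)
        sigma = pstdev(values)   # population standard deviation (an assumption)
        cv[name] = sigma / mu    # coefficient of variation of this index
    total = sum(cv.values())     # must be non-zero, i.e. at least one index varies
    return {name: v / total for name, v in cv.items()}

# Hypothetical quantized values for three indexes over two clustered data sets.
samples = {
    "accuracy": [0.95, 0.97],
    "consistency": [0.90, 0.98],
    "timeliness": [0.96, 0.96],
}
objective = coefficient_of_variation_weights(samples)
```

An index whose values barely differ between data sets carries little discriminating information and therefore receives a small weight; in the hypothetical data the constant timeliness values receive a weight of zero.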
In one embodiment, the comprehensive weight calculating unit includes: a second calculating subunit and a third calculating subunit, wherein:
the second calculating subunit is used for acquiring a first relative importance degree corresponding to the subjective weight coefficient and a second relative importance degree corresponding to the objective weight coefficient;
and the third calculating subunit is used for calculating the comprehensive weight coefficient by using a combined weighting method through the first relative importance degree, the second relative importance degree, the subjective weight coefficient and the objective weight coefficient.
The data quality evaluation device provided in this embodiment may implement the method embodiments described above, and the implementation principle and technical effect are similar, which are not described herein again.
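The work of the third calculating subunit can be illustrated with the figures from Table 1 of the embodiment. The sketch assumes that the combined weight is the linear combination of the subjective and objective weight coefficients, each multiplied by its relative importance degree; this form is an assumption, chosen because it matches the combined column of Table 1 to within about 0.0003.

```python
# Values quoted in Table 1 of the embodiment, in the order B1..B6.
subjective = [0.1599, 0.0536, 0.3324, 0.0460, 0.3172, 0.0908]
objective = [0.6784, 0.0565, 0.2261, 0.0071, 0.0283, 0.0035]
table_combined = [0.3597, 0.0547, 0.2915, 0.0310, 0.2059, 0.0572]

ALPHA = 0.6142  # first relative importance degree (subjective weight coefficient)
BETA = 0.3854   # second relative importance degree (objective weight coefficient)

def combine(subj, obj, alpha, beta):
    """Linear combination alpha * subjective + beta * objective (assumed form)."""
    return [alpha * s + beta * o for s, o in zip(subj, obj)]

combined = combine(subjective, objective, ALPHA, BETA)
# Largest absolute difference from the combined column printed in Table 1.
max_gap = max(abs(c - t) for c, t in zip(combined, table_combined))
```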
In one embodiment, the index quantized value obtaining module 11 includes: a substation data acquisition unit, a clustering unit and an index quantized value calculating unit, wherein,
the substation data acquisition unit is used for acquiring substation data;
the clustering unit is used for clustering the substation data through an improved K-means algorithm to obtain a plurality of data sets;
and the index quantized value calculating unit is used for calculating the quantized value of the data quality evaluation index of each data set through a data quality evaluation algorithm.
The data quality evaluation device provided in this embodiment may implement the method embodiments described above, and the implementation principle and technical effect are similar, which are not described herein again.
In one embodiment, the clustering unit includes: a fourth calculation subunit, a data processing subunit and a clustering subunit, wherein,
the fourth calculating subunit is used for acquiring the density, the average distance and the weight of each data point in the substation data;
the data processing subunit is used for processing the transformer substation data through the density, the average distance and the weight to obtain a clustering center and a clustering number;
and the clustering subunit is used for clustering the transformer substation data by adopting a K-means algorithm based on the clustering center and the clustering number to obtain a plurality of data sets.
The data quality evaluation device provided in this embodiment may implement the method embodiments described above, and the implementation principle and technical effect are similar, which are not described herein again.
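The three subunits of the clustering unit can be sketched as follows. This passage names density, average distance and weight but does not give their formulas, so the definitions used here are assumptions for illustration only: the neighborhood radius is the mean pairwise distance, the weight is the density divided by the average neighbor distance, and centers are chosen greedily by removing each chosen point's neighborhood. The function names are hypothetical.

```python
from math import dist

def initial_centers(points):
    """Derive the cluster centers and their number from density, average distance and weight."""
    n = len(points)
    pair = [[dist(p, q) for q in points] for p in points]
    radius = sum(sum(row) for row in pair) / (n * (n - 1))  # mean pairwise distance
    weight = []
    for i in range(n):
        near = [pair[i][j] for j in range(n) if j != i and pair[i][j] <= radius]
        density = len(near)                                  # neighbors within the radius
        avg = sum(near) / density if density else float("inf")
        weight.append(density / avg if avg > 0 else float("inf"))
    remaining = set(range(n))
    centers = []
    while remaining:
        best = max(remaining, key=lambda i: weight[i])       # heaviest remaining point
        centers.append(points[best])
        remaining -= {j for j in remaining if pair[best][j] <= radius}
    return centers

def kmeans(points, centers, iterations=100):
    """Standard K-means iterations starting from the given centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
        if updated == centers:   # centers no longer move: converged
            break
        centers = updated
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, clusters = kmeans(points, initial_centers(points))
```

Because the number of centers comes out of the initialization step, the cluster count does not have to be supplied in advance, which is the property the embodiment attributes to the improved algorithm.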
In one embodiment, the data quality evaluation indexes comprise the accuracy, completeness, consistency, timeliness, correctness and redundancy of the substation data;
wherein the accuracy is determined according to the total number of data in the data set, the number of data with inaccurate precision, the number of data whose range does not meet a threshold, the number of data with invalid data bits and the number of data with redundant records;
the completeness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of null data and the number of data with redundant records;
the consistency is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records, the number of data with abnormal reference consistency of the same data and the number of data with abnormal logical consistency of different data;
the timeliness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records and the number of data that is not updated in time;
the correctness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records, the number of data with an incorrect format and the number of data with an incorrect length;
the redundancy is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records, the total number of identical data in each row and the total number of identical data in each column.
The data quality evaluation device provided in this embodiment may implement the method embodiments described above, and the implementation principle and technical effect are similar, which are not described herein again.
For specific limitations of the data quality evaluation device, reference may be made to the above limitations of the data quality evaluation method, which are not repeated here. All or part of the modules in the data quality evaluation device can be implemented by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing substation data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data quality evaluation method.
Those skilled in the art will appreciate that the architecture shown in fig. 11 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
obtaining a quantitative value of each data quality evaluation index in a data quality evaluation system according to the data of the transformer substation, wherein the number of the data quality evaluation indexes in the data quality evaluation system is greater than a preset number threshold;
weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weighting coefficient corresponding to each data quality evaluation index, wherein the combined weighting method comprises a subjective weighting method and an objective weighting method;
and calculating to obtain a data quality evaluation result of the substation data according to the quantized value of each data quality evaluation index and the comprehensive weight coefficient corresponding to each data quality evaluation index.
In one embodiment, a storage medium is provided having a computer program stored thereon, the computer program when executed by a processor implementing the steps of:
obtaining a quantitative value of each data quality evaluation index in a data quality evaluation system according to the data of the transformer substation, wherein the number of the data quality evaluation indexes in the data quality evaluation system is greater than a preset number threshold;
weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weighting coefficient corresponding to each data quality evaluation index, wherein the combined weighting method comprises a subjective weighting method and an objective weighting method;
and calculating to obtain a data quality evaluation result of the substation data according to the quantized value of each data quality evaluation index and the comprehensive weight coefficient corresponding to each data quality evaluation index.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described, but any combination of these technical features should be considered within the scope of the present specification as long as it involves no contradiction.
The above-mentioned embodiments only express several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (11)

1. A data quality evaluation method, characterized in that the method comprises:
obtaining a quantitative value of each data quality evaluation index in a data quality evaluation system according to the data of the transformer substation, wherein the number of the data quality evaluation indexes in the data quality evaluation system is greater than a preset number threshold;
weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weighting coefficient corresponding to each data quality evaluation index, wherein the combined weighting method comprises a subjective weighting method and an objective weighting method;
and calculating to obtain a data quality evaluation result of the substation data according to the quantization value of each data quality evaluation index and the comprehensive weight coefficient corresponding to each data quality evaluation index.
2. The method of claim 1, wherein the weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weighting coefficient corresponding to each data quality evaluation index comprises:
subjectively weighting each data quality evaluation index by adopting an order relation method to obtain a subjective weight coefficient of each data quality evaluation index;
performing objective weighting on each data quality evaluation index by using a coefficient of variation method to obtain an objective weight coefficient of each data quality evaluation index;
and acquiring a comprehensive weight coefficient corresponding to each data quality evaluation index according to the subjective weight coefficient and the objective weight coefficient.
3. The method of claim 2, wherein the subjectively weighting each data quality evaluation index by adopting an order relation method to obtain a subjective weight coefficient of each data quality evaluation index comprises:
ordering the importance of each data quality evaluation index to obtain an importance sequence;
weighting each data quality evaluation index to obtain an initial weight coefficient of each data quality evaluation index;
calculating a weight evaluation scale according to the importance sequence and the initial weight coefficient;
and determining a subjective weight coefficient of each data quality evaluation index according to the weight evaluation scale.
4. The method according to claim 2, wherein the objective weighting of each data quality evaluation index by using a coefficient of variation method to obtain an objective weighting coefficient of each data quality evaluation index comprises:
acquiring the average value and the standard deviation of each data quality evaluation index according to the quantization value of each data quality evaluation index;
and calculating objective weight coefficients of the data quality evaluation indexes according to the average value and the standard deviation.
5. The method according to claim 2, wherein the obtaining of the comprehensive weight coefficient corresponding to the data quality evaluation index through the subjective weight coefficient and the objective weight coefficient includes:
acquiring a first relative importance degree corresponding to the subjective weight coefficient and a second relative importance degree corresponding to the objective weight coefficient;
and calculating the comprehensive weight coefficient by using a combined weighting method according to the first relative importance degree, the second relative importance degree, the subjective weight coefficient and the objective weight coefficient.
6. The method according to any one of claims 1 to 5, wherein the obtaining of the quantitative value of each data quality evaluation index in the data quality evaluation system according to the substation data comprises:
acquiring the transformer substation data;
clustering the transformer substation data through an improved K-means algorithm to obtain a plurality of data sets;
and calculating the quantitative value of the data quality evaluation index of each data set through a data quality evaluation algorithm.
7. The method of claim 6, wherein the clustering the substation data by the modified K-means algorithm results in a plurality of data sets, comprising:
acquiring the density, the average distance and the weight of each data point in the substation data;
processing the substation data to obtain a clustering center and a clustering number according to the density, the average distance and the weight;
and based on the clustering centers and the clustering numbers, clustering the transformer substation data by adopting a K-means algorithm to obtain a plurality of data sets.
8. The method of claim 6, wherein the data quality evaluation indexes include accuracy, completeness, consistency, timeliness, correctness and redundancy of the substation data;
wherein the accuracy is determined according to the total number of data in the data set, the number of data with inaccurate precision, the number of data whose range does not meet a threshold, the number of data with invalid data bits and the number of data with redundant records;
the completeness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of null data and the number of data with redundant records;
the consistency is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records, the number of data with abnormal reference consistency of the same data and the number of data with abnormal logical consistency of different data;
the timeliness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records and the number of data that is not updated in time;
the correctness is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records, the number of data with an incorrect format and the number of data with an incorrect length;
the redundancy is determined according to the total number of data in the data set, the number of data with invalid data bits, the number of data with redundant records, the total number of identical data in each row and the total number of identical data in each column.
9. A data quality evaluation apparatus, characterized in that the apparatus comprises:
the index quantitative value acquisition module is used for acquiring the quantitative value of each data quality evaluation index in a data quality evaluation system according to the data of the transformer substation, wherein the number of the data quality evaluation indexes in the data quality evaluation system is greater than a preset number threshold;
the weighting module is used for weighting each data quality evaluation index through a combined weighting method to obtain a comprehensive weighting coefficient corresponding to each data quality evaluation index, and the combined weighting method comprises a subjective weighting method and an objective weighting method;
and the evaluation result calculation module is used for calculating the data quality evaluation result of the substation data according to the quantized value of each data quality evaluation index and the comprehensive weight coefficient corresponding to each data quality evaluation index.
10. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 8 when executing the computer program.
11. A storage medium having a computer program stored thereon, the computer program, when being executed by a processor, realizing the steps of the method according to any one of claims 1 to 8.
CN202110829306.8A 2021-07-22 2021-07-22 Data quality evaluation method and device, computer equipment and readable storage medium Pending CN113469571A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110829306.8A CN113469571A (en) 2021-07-22 2021-07-22 Data quality evaluation method and device, computer equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110829306.8A CN113469571A (en) 2021-07-22 2021-07-22 Data quality evaluation method and device, computer equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN113469571A true CN113469571A (en) 2021-10-01

Family

ID=77881892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110829306.8A Pending CN113469571A (en) 2021-07-22 2021-07-22 Data quality evaluation method and device, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113469571A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113641825A (en) * 2021-10-15 2021-11-12 人民法院信息技术服务中心 Smart court system big data processing method and device based on objective information theory
CN116028838A (en) * 2023-01-09 2023-04-28 广东电网有限责任公司 Clustering algorithm-based energy data processing method and device and terminal equipment
CN116028838B (en) * 2023-01-09 2023-09-19 广东电网有限责任公司 Clustering algorithm-based energy data processing method and device and terminal equipment
CN117273552A (en) * 2023-11-22 2023-12-22 山东顺国电子科技有限公司 Big data intelligent treatment decision-making method and system based on machine learning
CN117273552B (en) * 2023-11-22 2024-02-13 山东顺国电子科技有限公司 Big data intelligent treatment decision-making method and system based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination