CN117555891A - Data quality management method and device - Google Patents


Info

Publication number
CN117555891A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311773093.7A
Other languages
Chinese (zh)
Inventor
冀晨宇
苟晓东
王世睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ant Blockchain Technology Shanghai Co Ltd
Original Assignee
Ant Blockchain Technology Shanghai Co Ltd
Application filed by Ant Blockchain Technology Shanghai Co Ltd
Priority to CN202311773093.7A
Publication of CN117555891A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Factory Administration (AREA)

Abstract

A data quality management method and apparatus, the method comprising: determining a kth group of index values corresponding to a kth group of state data of the target equipment, wherein the kth group of state data comprises a plurality of state records reported by the target equipment, and the kth group of index values comprise m index values corresponding to m evaluation indexes; and processing the kth group of index values according to m comprehensive weights corresponding to the m evaluation indexes to obtain the quality score of the kth group of state data, wherein the comprehensive weight corresponding to any jth evaluation index is calculated based on the objective weight and subjective weight corresponding to the jth evaluation index, the objective weight corresponding to each of the m evaluation indexes is calculated based on n groups of history index values, and the n groups of history index values belong to the previous k groups of index values corresponding to the previous k groups of state data of the target equipment.

Description

Data quality management method and device
Technical Field
The embodiments of this specification belong to the technical field of computers, and in particular relate to a data quality management method and device.
Background
With the advent of the digital wave, internet of things technology has developed rapidly, and more and more internet of things devices and sensors are connected to networks, generating a large amount of internet of things data. However, due to factors such as device quality and environment, the quality of internet of things data is uneven, while high-quality internet of things data is an important factor affecting downstream data applications and business decisions.
A new solution is desired in order to be able to evaluate the data quality more accurately.
Disclosure of Invention
The invention aims to provide a data quality management method and device.
In a first aspect, a data quality management method is provided, including: determining a kth group of index values corresponding to a kth group of state data of target equipment, wherein the kth group of state data comprises a plurality of state records reported by the target equipment, and the kth group of index values comprise m index values corresponding to m evaluation indexes; and processing the kth group of index values according to m comprehensive weights corresponding to the m evaluation indexes to obtain quality scores of the kth group of state data, wherein the comprehensive weight corresponding to any jth evaluation index is calculated based on the objective weight and subjective weight corresponding to the jth evaluation index, the objective weight corresponding to each of the m evaluation indexes is calculated based on n groups of history index values, and the n groups of history index values belong to the previous k groups of index values corresponding to the previous k groups of state data of the target equipment.
In a second aspect, there is provided a data quality management apparatus comprising: the index calculation unit is configured to determine a kth group of index values corresponding to a kth group of state data of the target equipment, wherein the kth group of state data comprises a plurality of state records reported by the target equipment, and the kth group of index values comprise m index values corresponding to m evaluation indexes; the quality calculation unit is configured to process the kth group of index values according to m comprehensive weights corresponding to the m evaluation indexes to obtain quality scores of the kth group of state data, wherein the comprehensive weight corresponding to any jth evaluation index is calculated based on the corresponding objective weight and subjective weight, the objective weight corresponding to each of the m evaluation indexes is calculated based on n groups of history index values, and the n groups of history index values belong to the previous k groups of index values corresponding to the previous k groups of state data of the target equipment.
In a third aspect, there is provided a computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method provided in the first aspect.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computing device, performs the method provided in the first aspect.
According to the technical solution provided by the embodiments of this specification, objective weights corresponding to the m evaluation indexes set for the target device are determined based on n groups of history index values corresponding to n groups of history state data of the target device. Comprehensive weights corresponding to the m evaluation indexes are then evaluated based on the subjective weights and objective weights corresponding to the m evaluation indexes. The index values, under the m evaluation indexes, of the state data requiring quality evaluation are then processed based on these comprehensive weights. The resulting quality score expresses the data quality of the corresponding state data more accurately, so that downstream services can be better carried out based on the quality scores of the state data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present disclosure, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a technical scenario of a technical solution provided in an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for data quality management provided in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a process for maintaining a set of metrics as exemplarily provided in an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a data quality management device according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solution in the present specification better understood by those skilled in the art, the technical solution in the embodiments of the present specification will be clearly and completely described in the following with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
Internet of things devices mainly include various types of smart home devices, smart wearable devices, industrial internet of things devices, and the like. A smart home device is a household device made intelligent through internet of things technology, and may include, for example, smart door locks, smart cameras, smart sockets, and smart lighting. A smart wearable device is a wearable device that integrates a sensor, a computing unit, and a communication unit into articles such as clothes, watches, and glasses and that has data acquisition, processing, and transmission functions, and may include, for example, smart watches and smart glasses. An industrial internet of things device is an internet of things device applied in fields such as production and manufacturing, traffic and logistics, and urban construction, for example sensors, industrial robots, and intelligent traffic equipment. Internet of things devices may also include types other than the foregoing examples, such as smart medical devices, battery devices, smart cars, and photovoltaic devices.
An internet of things device may collect and report state records according to certain rules, for example at a predetermined time interval T1. For any internet of things device, a single state record that it collects and reports generally includes field values corresponding to a plurality of preset fields, and each field value can indicate the running state of the device in a certain respect.
The state records collected by internet of things devices of different categories may correspond to different preset fields. For example, the state record of a battery device may include field values corresponding to preset fields such as "battery power", "battery voltage", "battery health", and "reporting time"; the state record of a smart car may include field values corresponding to preset fields such as "driving mileage", "driving speed", "azimuth", "driving state", and "reporting time".
A state record collected by an internet of things device generally includes a plurality of field values, and the field values corresponding to different fields generally need to be collected by different functional modules. Factors such as device quality and environmental conditions may affect the collection of these field values, which may cause quality problems in the state records such as missing data, abnormal values, and poor timeliness. Quality problems in state records may prevent downstream services from being carried out properly. For example, it may be difficult for a downstream application to implement other services using poor-quality internet of things data; for another example, it may be difficult for a downstream application to regulate the running state of the relevant internet of things device in a timely and accurate manner, so that the device cannot run efficiently to achieve a specific business purpose.
The embodiments of this specification provide at least a data quality management method and device. In the method, a kth group of index values corresponding to a kth group of state data of a target device is first determined, where the kth group of state data includes a plurality of state records reported by the target device, and the kth group of index values includes m index values corresponding to m evaluation indexes. The kth group of index values is then processed according to m comprehensive weights corresponding to the m evaluation indexes to obtain the quality score of the kth group of state data, where the comprehensive weight corresponding to any jth evaluation index is calculated based on the objective weight and subjective weight corresponding to the jth evaluation index, the objective weights corresponding to the m evaluation indexes are calculated based on n groups of history index values, and the n groups of history index values belong to the first k groups of index values corresponding to the first k groups of state data of the target device.
In the embodiments of this specification, objective weights corresponding to the m evaluation indexes set for the target device are determined based on n groups of history index values corresponding to n groups of history state data of the target device. Comprehensive weights corresponding to the m evaluation indexes are then evaluated based on the subjective weights and objective weights corresponding to the m evaluation indexes, and the index values, under the m evaluation indexes, of the state data requiring quality evaluation are processed based on these comprehensive weights. The resulting quality score expresses the data quality of the corresponding state data more accurately, so that downstream services can be better carried out based on the quality scores of the state data.
FIG. 1 is a schematic diagram of a technical scenario of a technical solution provided in an embodiment of the present disclosure. Referring to FIG. 1, a data quality management apparatus for evaluating the quality of state data of a target device may be deployed in a cloud computing platform in the form of a Function as a Service (FaaS) based on any of various possible containerization techniques; the cloud computing platform is built on a plurality of computing devices, storage devices, and network devices. Any target device, for example one of the aforementioned internet of things devices such as a smart car or a battery device, may be communicatively connected to the data quality management apparatus. The target device may, automatically or when triggered by a user, repeatedly collect its own state records and report them to the data quality management apparatus; for example, a smart car may periodically and automatically collect and report its own state records at a preset time interval T1. The data quality management apparatus may, automatically or when triggered by a user, evaluate the quality of one or more state records that a device has reported over multiple reports and that have not yet been processed, and obtain a corresponding quality score; for example, the data quality management apparatus may periodically evaluate, at a preset time interval T2, the state records that the smart car has reported and that have not yet been processed, where T2 is not less than T1.
The data quality management apparatus can be configured with object models for various categories of internet of things devices. An object model is a digital representation, in virtual space, of an entity in physical space (such as a sensor, an on-board device, a building, or a factory); through dimensions such as attributes, services, and events, it describes what the entity is, what it can do, and what information it can provide externally. More specifically, the object model of the target device may describe the device category to which the target device belongs, the fields (i.e., the plurality of preset fields) for which the state records collected and reported by the target device need to or should include field values, and the data type of the field value under each preset field, so that the data quality management apparatus can receive structured state records through its data acquisition unit.
By configuring object models corresponding to the various categories of internet of things devices, the data acquisition unit of the data quality management apparatus can use the different object models to receive structured state records from devices of the corresponding categories, so that quality scores can be calculated separately for the state records reported by each category of device.
Continuing with the previous example, the object model X1 corresponding to the battery device may define, for example, that its reported structured state record includes field values corresponding to preset fields such as "battery power", "battery voltage", "battery health", and "reporting time"; the object model X2 corresponding to the smart car may define, for example, that its reported state record includes field values corresponding to preset fields such as "driving mileage", "driving speed", "azimuth", "driving state", and "reporting time". If the object model X1 and the object model X2 are preconfigured in the data quality management apparatus, the apparatus may receive the state records reported by the battery device and the smart car through the object model X1 and the object model X2 respectively, and may then calculate the quality scores of the state records reported by the battery device and by the smart car respectively.
For the plurality of preset fields defined by any object model, the data quality management apparatus may further set, through a corresponding rule configuration unit, an abnormal value determination rule for any yth field among the preset fields. After the data quality management apparatus receives, through the object model, a state record reported by the corresponding internet of things device, the rule is used to determine whether the yth field value in the state record is an abnormal value. The abnormal value determination rule may be set in combination with a specific service scenario. For example, for the preset field "battery power" defined in the object model of a battery device, the corresponding determination rule may include: for the field value corresponding to the preset field "battery power" in a state record, if the unit of the field value is not "%", the field value is an abnormal value; and if the field value falls outside the interval [0, 100], the field value is an abnormal value.
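As an illustrative sketch only, the "battery power" rule described above could be expressed as a small predicate. The `FieldValue` type and the function name are hypothetical and are not taken from the specification; the sketch also assumes that a missing (Null) value is not judged as abnormal, since missing values are counted under integrity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldValue:
    """Hypothetical container for one field value of a state record."""
    value: Optional[float]  # None plays the role of a missing (Null) value
    unit: str = ""


def battery_power_is_abnormal(field: FieldValue) -> bool:
    """Return True if the 'battery power' field value is an abnormal value.

    Rule from the text: abnormal if the unit is not '%', or if the value
    falls outside the interval [0, 100].
    """
    if field.value is None:
        # Assumption: Null values are handled by the integrity index instead.
        return False
    if field.unit != "%":
        return True
    return not (0 <= field.value <= 100)
```

Rules for other preset fields would follow the same shape, one predicate per field, selected through the object model of the device category.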
FIG. 2 is a flowchart of a data quality management method according to an embodiment of the present disclosure. The method, which illustratively describes the process of evaluating the quality of several state records reported by a target device, may be performed by any apparatus, device, platform, or cluster of devices having computing and processing capabilities, such as a data quality management apparatus deployed in a cloud computing platform.
Referring to FIG. 2, the method may include, but is not limited to, some or all of the following steps S201 to S205.
Step S201, determining a kth group of index values corresponding to a kth group of status data of the target device, where the kth group of status data includes a plurality of status records reported by the target device, and the kth group of index values includes m index values corresponding to m evaluation indexes.
Step S201 may be executed automatically or when triggered by a user. The following description mainly takes as an example the case where the target device periodically collects its state records and reports them to the data quality management apparatus at a preset time interval T1, and the data quality management apparatus periodically processes the state records reported by the target device at a preset time interval T2, where T2 is not less than T1.
For any kth time interval, the data quality management apparatus may receive a plurality of state records reported by the target device within the kth time interval, and these state records may serve as the kth group of state data of the target device.
The single state record includes field values corresponding to a plurality of preset fields. Wherein referring to the foregoing example, a plurality of preset fields corresponding to the state record may be defined in an object model of the target device, and the data quality management apparatus may receive, from the target device, a structured state record including field values corresponding to the plurality of preset fields through the object model of the target device.
The m evaluation indexes include at least one of the following indexes: integrity, accuracy, timeliness, and consistency.
Each evaluation index in the foregoing examples and its corresponding index value are described below by way of example, assuming that the target device should report a state records in the kth time interval, that a single state record should include b field values corresponding to b preset fields, and that a missing field value in a state record is represented by the predetermined character Null. It will be appreciated that the predetermined character used to represent a missing field value may be a character other than Null.
Integrity may generally be determined based on the null rate. Illustratively, the kth group of state data should include a×b field values in total. However, the number of state records reported by the target device in the kth time interval may be less than a, and in any xth state record reported in the kth time interval the field values corresponding to one or more preset fields may be missing (i.e., Null), so that c field values are cumulatively missing from the a state records; the null rate is then c/(a×b). Accordingly, the index value corresponding to the evaluation index "integrity" in the kth group of index values is, for example, 1-c/(a×b).
Accuracy can typically be expressed as the proportion of correct field values among all field values. For example, an abnormal value determination rule may be preset for any yth preset field among the b preset fields; for any yth field value that is not Null in any xth state record, whether that field value is an abnormal value is determined according to the preset abnormal value determination rule corresponding to the yth preset field; the accuracy of the kth group of state data is then determined according to the cumulative number d of abnormal values found in the plurality of state records. For example, the index value corresponding to the evaluation index "accuracy" in the kth group of index values may be (a×b-d)/(a×b).
Timeliness may generally be determined based on the average delay and/or the update frequency of the state records. Some state records may include a reporting time, and the delay of a single state record may be determined based on the difference between its reporting time and its acquisition time, where the acquisition time may be included in the state record or may be determined based on the time interval T1 at which the target device periodically collects state records; when the target device periodically collects state records at the time interval T1, the update frequency is determined based on T1. Correspondingly, the index value corresponding to the evaluation index "timeliness" in the kth group of index values can be calculated based on a preset reference delay and a preset reference update frequency, where the index value is negatively correlated with the average delay and positively correlated with the update frequency.
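The integrity and accuracy index values described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the specification: the record layout (a dict per state record with `None` standing for Null), the function name, and the per-field rule table are all hypothetical, and the sketch assumes that no more than a records are received in the interval.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]  # None plays the role of Null


def integrity_and_accuracy(
    records: List[Record],
    fields: List[str],
    expected_records: int,
    is_abnormal: Dict[str, Callable[[float], bool]],
) -> Tuple[float, float]:
    """Return (integrity, accuracy) for one group of state data.

    integrity = 1 - c / (a * b), accuracy = (a * b - d) / (a * b), where
    a is the expected record count, b the number of preset fields, c the
    number of missing field values and d the number of abnormal values.
    """
    a, b = expected_records, len(fields)
    total = a * b
    # Field values of records that were never reported count as missing.
    c = max(a - len(records), 0) * b
    d = 0
    for record in records:
        for name in fields:
            value = record.get(name)
            if value is None:
                c += 1  # Null field value
            elif is_abnormal[name](value):
                d += 1  # only non-Null values are checked against the rule
    return 1 - c / total, (total - d) / total


# Hypothetical usage: 3 records expected, 2 received.
rules = {
    "battery_power": lambda v: not (0 <= v <= 100),
    "battery_voltage": lambda v: v < 0,
}
group = [
    {"battery_power": 50.0, "battery_voltage": 3.7},
    {"battery_power": 120.0, "battery_voltage": None},
]
integrity, accuracy = integrity_and_accuracy(
    group, ["battery_power", "battery_voltage"], 3, rules
)
```

In this usage example a×b = 6, c = 3 (one unreported record plus one Null) and d = 1, so the integrity is 1 - 3/6 and the accuracy is 5/6.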
Consistency may generally be determined based on the number of field values that are contradictory or conflicting under the same field. For example, a corresponding conflict detection rule can be set for any y-th field in the b preset fields, and a plurality of field values corresponding to the y-th field in a plurality of state records are processed through the conflict detection rule to obtain a conflict value of the y-th field; and further determining the consistency of the kth group of state data according to the accumulated number e of the determined conflict values in the state records. In one example, the index value corresponding to the evaluation index "consistency" in the kth group of index values may be (a×b-e)/(a×b).
Step S203, obtaining m comprehensive weights corresponding to the m evaluation indexes, where the comprehensive weight corresponding to any jth evaluation index among the m evaluation indexes is calculated based on the objective weight and subjective weight corresponding to the jth evaluation index, and the objective weights corresponding to the m evaluation indexes are calculated based on n groups of history index values, where the n groups of history index values belong to the first k groups of index values corresponding to the first k groups of state data of the target device.
When k is 1, the comprehensive weight corresponding to any jth evaluation index may be a reference value configured in advance.
When k is greater than 1, the n sets of history index values may be k-1 sets of index values corresponding to the first k-1 sets of status data of the target device, or may be k sets of index values corresponding to the first k sets of status data of the target device.
When the n sets of history index values do not include the kth set of index values, the m comprehensive weights corresponding to the m evaluation indexes may be calculated in advance, and step S203 only needs to read the m comprehensive weights from the designated storage location, so as to quickly complete the calculation of the quality score of the kth set of state data. It is understood that, even if the n sets of history index values do not include the kth set of index values, m comprehensive weights corresponding to the m evaluation indexes may be calculated after the kth set of index values are obtained.
When k is relatively large, a screening rule may also be set so that n groups of history index values, with n not more than k, are screened from the k-1 groups or k groups of index values corresponding to the first k-1 groups or first k groups of state data of the target device. For example, referring to FIG. 3, an index set may be established, and the aforementioned n may be the number of elements included in the index set. For any kth group of state data, if the calculated quality score of the kth group of state data is larger than a preset threshold value, the kth group of index values corresponding to the kth group of state data is added to the index set as an element; if the number of elements included in the index set then exceeds a preset value L, the element that was added to the index set earliest is deleted from the index set. The elements included in the index set thus constitute the n groups of history index values.
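The maintenance of the index set described above can be sketched with a first-in, first-out queue. The class and method names are hypothetical; only the admission threshold and the preset maximum size L come from the text.

```python
from collections import deque
from typing import Deque, List, Sequence


class IndexSet:
    """Bounded set of history index-value groups, oldest element evicted first."""

    def __init__(self, max_size: int, threshold: float) -> None:
        self.max_size = max_size    # the preset value L
        self.threshold = threshold  # the preset quality-score threshold
        self._items: Deque[List[float]] = deque()

    def offer(self, index_values: Sequence[float], quality_score: float) -> None:
        """Add a group of index values only if its quality score is high enough."""
        if quality_score > self.threshold:
            self._items.append(list(index_values))
            if len(self._items) > self.max_size:
                self._items.popleft()  # delete the element added earliest

    def history(self) -> List[List[float]]:
        """Return the n groups of history index values currently kept."""
        return list(self._items)
```

One consequence of this screening rule is that the objective weights are only ever computed from groups of state data that were themselves scored above the threshold.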
An objective weight method can be adopted to calculate the objective weights corresponding to the m evaluation indexes based on the n groups of historical index values.
The foregoing objective weighting methods may include, but are not limited to, entropy weighting methods, principal component analysis methods, and standard deviation methods.
An example of calculating objective weights corresponding to the m evaluation indexes by using the entropy weight method is described below.
First, an original matrix $R = (r_{ij})_{n \times m}$ can be constructed based on the n groups of history index values and the m evaluation indexes, where $r_{ij}$ characterizes the standardized value of the index value corresponding to the jth evaluation index in the ith group of history index values. If the index value under the jth evaluation index is positively correlated with the quality of the state data, $r_{ij} = (x_{ij} - x_{\min}) / (x_{\max} - x_{\min})$; if the index value under the jth evaluation index is negatively correlated with the quality of the state data, $r_{ij} = (x_{\max} - x_{ij}) / (x_{\max} - x_{\min})$. Here $x_{ij}$ characterizes the index value corresponding to the jth evaluation index in the ith group of history index values, $x_{\min}$ characterizes the minimum of the n index values corresponding to the jth evaluation index in the n groups of history index values, and $x_{\max}$ characterizes the maximum of those n index values. It should be noted that if a larger index value under a certain evaluation index means better quality of the corresponding state data, the index value under that evaluation index is positively correlated with the quality of the state data; if a larger index value means worse quality, the index value is negatively correlated with the quality of the state data. It can be understood that, in the embodiments of the present disclosure, the index values corresponding to evaluation indexes such as integrity, accuracy, timeliness, and consistency are all positively correlated with the quality of the state data.
Then, the original matrix is normalized to obtain a normalized matrix $P = (p_{ij})_{n \times m}$, where $p_{ij} = r_{ij} / \sum_{i=1}^{n} r_{ij}$.
Then, the entropy value of the jth evaluation index is calculated as $E_j = -K \sum_{i=1}^{n} p_{ij} \ln p_{ij}$, where $K = 1/\ln n$, which satisfies $E_j \ge 0$.
Then, the information entropy redundancy (difference) is calculated as $d_j = |1 - E_j|$.
Finally, the objective weight corresponding to the jth evaluation index is calculated as $W_j^{o} = d_j / \sum_{j=1}^{m} d_j$.
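The entropy weight steps above can be sketched in plain Python. This is a minimal sketch assuming the standard form of the entropy weight method, that is, $p_{ij} = r_{ij} / \sum_i r_{ij}$, $E_j = -K \sum_i p_{ij} \ln p_{ij}$ with $K = 1/\ln n$, and weights proportional to $d_j = |1 - E_j|$. The function name is hypothetical. Three further choices are assumptions of this sketch rather than statements of the specification: a term with $p_{ij} = 0$ contributes 0 to the entropy, a column whose values are all equal is standardized to 1.0 (which yields a weight of about 0), and equal weights are returned if no column carries any information. All indexes are treated as positively correlated with quality.

```python
import math
from typing import List, Sequence


def entropy_weights(history: Sequence[Sequence[float]]) -> List[float]:
    """Objective weights for m indexes from n groups of history index values."""
    n, m = len(history), len(history[0])
    if n < 2:
        raise ValueError("at least two groups of history index values are needed")
    k = 1.0 / math.log(n)
    redundancy = []
    for j in range(m):
        column = [row[j] for row in history]
        lo, hi = min(column), max(column)
        span = hi - lo
        # Min-max standardization; a constant column maps to 1.0 (assumption).
        r = [(x - lo) / span if span > 0 else 1.0 for x in column]
        total = sum(r)
        entropy = 0.0
        for value in r:
            p = value / total
            if p > 0:  # convention: 0 * ln(0) is treated as 0
                entropy -= k * p * math.log(p)
        redundancy.append(abs(1.0 - entropy))
    denom = sum(redundancy)
    if denom < 1e-12:  # no index is informative: fall back to equal weights
        return [1.0 / m] * m
    return [d / denom for d in redundancy]
```

An index whose history values barely differ between groups has an entropy close to 1 and therefore receives a small objective weight, while an index whose values vary strongly receives a larger one.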
The subjective weight method can be adopted to calculate the subjective weight corresponding to each of the m evaluation indexes.
The foregoing subjective weighting methods may include, but are not limited to, the analytic hierarchy process, the best-case method, and the Delphi method.
An exemplary description will be made below taking the subjective weight corresponding to each of the m evaluation indexes calculated by using the analytic hierarchy process as an example.
First, a comparison matrix $R = (r_{ij})_{m \times m}$ of m rows and m columns is constructed using a scale of relative importance, where $r_{ij}$ represents the importance of the i-th evaluation index relative to the j-th evaluation index and may be a natural number from 1 to 9. A value of 1 indicates that the i-th evaluation index is as important as the j-th evaluation index. Values of 3, 5, 7, and 9 indicate that the importance of the i-th evaluation index relative to the j-th evaluation index is, respectively, "slightly important", "generally important", "very important", and "extremely important". Values of 2, 4, 6, and 8 represent intermediate degrees between the adjacent values.
Then, the relative normalized weight of the j-th evaluation index, in other words the subjective weight of the j-th evaluation index, is calculated from the comparison matrix [formula not recoverable from the source text].
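Because the exact formula is not legible in this text, the following sketch uses one common way of deriving weights in the analytic hierarchy process, the row geometric mean method; this choice is an assumption and may differ from the formula the patent actually uses. The function name `ahp_weights` and the reciprocal entries ($r_{ji} = 1/r_{ij}$) in the example matrix are also assumptions.

```python
import math
from typing import List


def ahp_weights(comparison: List[List[float]]) -> List[float]:
    """Subjective weights from an m x m pairwise comparison matrix.

    Uses the row geometric mean method (an assumed variant of AHP):
    take the geometric mean of each row, then normalize so the weights sum to 1.
    """
    m = len(comparison)
    if any(len(row) != m for row in comparison):
        raise ValueError("comparison matrix must be square")
    geo_means = [math.prod(row) ** (1.0 / m) for row in comparison]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgment: the first index is "slightly important" (3)
# relative to the second, with the reciprocal entry 1/3 assumed.
comparison = [
    [1.0, 3.0],
    [1.0 / 3.0, 1.0],
]
subjective = ahp_weights(comparison)
print(subjective)
```

In practice a consistency check on the comparison matrix is usually performed as well; it is omitted here to keep the sketch short.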
After the objective weight $\omega_j^{o}$ and the subjective weight $\omega_j^{s}$ corresponding to the j-th evaluation index are calculated by any of the possible methods described above, the comprehensive weight corresponding to the j-th evaluation index is calculated based on them. For example, an objective weight coefficient $w_o$ corresponding to the objective weight and a subjective weight coefficient $w_s$ corresponding to the subjective weight may be preset, where $w_o$ and $w_s$ sum to 1; for example, $w_o$ and $w_s$ may be 0.4 and 0.6, respectively. The comprehensive weight corresponding to the j-th evaluation index is then calculated as $\omega_j = w_o \omega_j^{o} + w_s \omega_j^{s}$.
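The combination of the two weights can be sketched as a simple convex combination, using the example coefficients 0.4 and 0.6 from the text. The function name `comprehensive_weights` and the input vectors are hypothetical.

```python
from typing import List


def comprehensive_weights(
    objective: List[float],
    subjective: List[float],
    w_o: float = 0.4,
    w_s: float = 0.6,
) -> List[float]:
    """Combine objective and subjective weights index by index."""
    if len(objective) != len(subjective):
        raise ValueError("both weight vectors must cover the same m evaluation indexes")
    if abs(w_o + w_s - 1.0) > 1e-9:
        raise ValueError("the two weight coefficients must sum to 1")
    return [w_o * o + w_s * s for o, s in zip(objective, subjective)]


# Hypothetical objective and subjective weights for two evaluation indexes.
combined = comprehensive_weights([0.5, 0.5], [0.75, 0.25])
print(combined)
```

Because each input vector sums to 1 and the coefficients sum to 1, the comprehensive weights also sum to 1.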
Step S205: the kth group of index values is processed according to the m comprehensive weights corresponding to the m evaluation indexes to obtain the quality score of the kth group of state data of the target device.
The quality score of the kth set of state data of the target device may be obtained by weighted summation of the m index values included in the kth set of index values according to m comprehensive weights corresponding to the m evaluation indexes.
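The weighted summation can be sketched as below. The function name `quality_score` and the sample values (four index values for integrity, accuracy, timeliness, and consistency) are illustrative assumptions.

```python
from typing import List


def quality_score(index_values: List[float], weights: List[float]) -> float:
    """Weighted sum of the m index values of one group of state data."""
    if len(index_values) != len(weights):
        raise ValueError("one comprehensive weight is needed per index value")
    return sum(w * v for w, v in zip(weights, index_values))


# Hypothetical kth group: integrity, accuracy, timeliness, consistency.
kth_index_values = [0.9, 1.0, 0.8, 1.0]
comprehensive = [0.3, 0.3, 0.2, 0.2]
score = quality_score(kth_index_values, comprehensive)
print(round(score, 4))
```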
In some embodiments, a corresponding visual report may be generated based on the kth group of state data, the kth group of index values corresponding to the kth group of state data, and the quality score of the kth group of state data, so as to clearly show the quality condition of the kth group of state data to the user, and support other operations such as querying, analyzing, tracing, and the like, performed on the kth group of state data.
In some embodiments, when the quality score of the kth group of state data is lower than a preset threshold, the user may be alerted promptly according to a corresponding rule, so that the user can take countermeasures in time, such as checking the working state of the target device or stopping downstream applications from continuing to use the kth group of state data for the corresponding services.
In some embodiments, when the quality score of the kth group of state data reaches a preset threshold, the kth group of index values corresponding to the kth group of state data may be added as an element to the index set in the foregoing example. When the quality scores of the (k+1)th and subsequent groups of state data of the target device are calculated later, the kth group of index values then serves as a group of historical index values to support calculation of the subjective weights and objective weights corresponding to the m evaluation indexes.
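The two threshold rules above can be sketched together. The threshold values 0.6 and 0.8, the function name `handle_score`, and the use of a plain list as the index set are assumptions; the text only says that preset thresholds are used.

```python
from typing import List


def handle_score(
    score: float,
    index_values: List[float],
    history: List[List[float]],
    alert_threshold: float = 0.6,   # hypothetical value
    accept_threshold: float = 0.8,  # hypothetical value
) -> bool:
    """Return True if an alert should be raised for this group.

    A group whose score reaches the acceptance threshold is appended to the
    history so that later groups can use it when weights are recomputed.
    """
    if score >= accept_threshold:
        history.append(list(index_values))
    return score < alert_threshold


history: List[List[float]] = []
alert_low = handle_score(0.45, [0.5, 0.4, 0.5, 0.4], history)
alert_high = handle_score(0.93, [0.9, 1.0, 0.8, 1.0], history)
print(alert_low, alert_high, len(history))
```

Keeping only well-scored groups in the history is one way to prevent poor-quality data from skewing the weights used for later groups.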
Based on the same concept as the foregoing method embodiments, a data quality management apparatus 400 is also provided in the embodiments of the present specification. As shown in FIG. 4, the data quality management apparatus includes: an index calculation unit 401, configured to determine a kth group of index values corresponding to a kth group of state data of a target device, where the kth group of state data includes a plurality of state records reported by the target device, and the kth group of index values includes m index values corresponding to m evaluation indexes; and a quality calculation unit 403, configured to process the kth group of index values according to m comprehensive weights corresponding to the m evaluation indexes to obtain a quality score of the kth group of state data, where the comprehensive weight corresponding to any j-th evaluation index is calculated based on the objective weight and the subjective weight corresponding to the j-th evaluation index, the subjective weights and objective weights corresponding to the m evaluation indexes are calculated based on n groups of historical index values, and the n groups of historical index values belong to the previous k groups of index values corresponding to the previous k groups of state data of the target device.
In one possible implementation, the m evaluation indexes include accuracy, and a single state record includes field values corresponding to a plurality of preset fields. The apparatus 400 further includes: a rule configuration unit 405, configured to preset, for any y-th preset field of the plurality of preset fields, an outlier determination rule corresponding to the y-th preset field. The quality calculation unit 403 is configured to determine, for the y-th field value in any x-th state record, whether that field value is an outlier according to the outlier determination rule corresponding to the y-th preset field, and to determine the accuracy of the kth group of state data according to the accumulated number of field values determined to be outliers.
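A minimal sketch of the accuracy calculation follows. The text only states that accuracy is determined from the accumulated number of outliers; the concrete formula used here (1 minus the share of checked field values that are outliers), the function name `accuracy`, and the sample rules and field names are all assumptions.

```python
from typing import Any, Callable, Dict, List

# A rule returns True when the given field value is an outlier.
OutlierRule = Callable[[Any], bool]


def accuracy(records: List[Dict[str, Any]], rules: Dict[str, OutlierRule]) -> float:
    """Share of checked field values that are not outliers (assumed formula)."""
    checked = 0
    outliers = 0
    for record in records:
        for field, is_outlier in rules.items():
            if field not in record:
                continue  # missing fields are left to the integrity index
            checked += 1
            if is_outlier(record[field]):
                outliers += 1
    return 1.0 - outliers / checked if checked else 0.0


# Hypothetical rules for two preset fields.
rules: Dict[str, OutlierRule] = {
    "temperature": lambda v: not (-40 <= v <= 125),
    "voltage": lambda v: v < 0,
}
records = [
    {"temperature": 25, "voltage": 3.3},
    {"temperature": 300, "voltage": 3.2},
]
acc = accuracy(records, rules)
print(acc)
```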
In one possible implementation, the apparatus 400 further includes: the data collection unit 407 is configured to receive, through a preconfigured object model corresponding to the target device, any xth status record reported by the target device, where the object model includes a device class of the target device, a plurality of preset fields, and data formats required by the plurality of preset fields.
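The object model described above can be pictured as a small schema against which each reported record is checked. The class name `ObjectModel`, its attribute names, the use of Python types as "data formats", and the sample device are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ObjectModel:
    """Hypothetical object model: device category plus preset fields and formats."""
    device_category: str
    field_formats: Dict[str, type]


def conforms(record: Dict[str, Any], model: ObjectModel) -> bool:
    """True if every preset field is present and has the required format."""
    return all(
        name in record and isinstance(record[name], expected)
        for name, expected in model.field_formats.items()
    )


model = ObjectModel(
    device_category="thermometer",
    field_formats={"temperature": float, "timestamp": int},
)
ok = conforms({"temperature": 21.5, "timestamp": 1700000000}, model)
bad = conforms({"temperature": "21.5", "timestamp": 1700000000}, model)
print(ok, bad)
```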
There is further provided in embodiments of the present specification a computing device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method provided in any embodiment of the present specification.
There is also provided in embodiments of the present specification a computer readable storage medium having stored thereon a computer program/instructions which, when executed in a computer, cause the computer to perform the method provided in any of the embodiments of the present specification.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to current method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually fabricating integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely performing a little logic programming of the method flow in one of the hardware description languages described above and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded as both software modules implementing the method and structures within the hardware component.
The system, apparatus, module, or unit set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. One typical implementation device is a server system. Of course, the present application does not exclude that, as computer technology evolves in the future, the computer implementing the functions of the above embodiments may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Although one or more embodiments of the present specification provide the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible orders of execution and does not represent the only order of execution. When implemented in an actual apparatus or end product, the steps may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded processing environment, or even in a distributed data processing environment) according to the methods shown in the embodiments or the figures. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the described element is not excluded. Words such as "first" and "second", if used, denote names and do not denote any particular order.
For convenience of description, the above apparatus is described as being functionally divided into various modules. Of course, when one or more embodiments of the present specification are implemented, the functions of the modules may be implemented in the same piece or multiple pieces of software and/or hardware, or a module that implements one function may be implemented by a combination of multiple sub-modules or sub-units. The apparatus embodiments described above are merely illustrative. For example, the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage, graphene storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
One skilled in the relevant art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present description may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant parts, refer to the description of the method embodiments. In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present specification. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of the various embodiments or examples, provided that they do not contradict one another.
The foregoing is merely an example of one or more embodiments of the present specification and is not intended to limit the one or more embodiments of the present specification. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of the present specification, should be included in the scope of the claims.

Claims (13)

1. A data quality management method, comprising:
determining a kth group of index values corresponding to a kth group of state data of target equipment, wherein the kth group of state data comprises a plurality of state records reported by the target equipment, and the kth group of index values comprise m index values corresponding to m evaluation indexes;
and processing the kth group of index values according to m comprehensive weights corresponding to the m evaluation indexes to obtain quality scores of the kth group of state data, wherein the comprehensive weight corresponding to any jth evaluation index is calculated based on the objective weight and subjective weight corresponding to the jth evaluation index, the objective weight corresponding to each of the m evaluation indexes is calculated based on n groups of history index values, and the n groups of history index values belong to the previous k groups of index values corresponding to the previous k groups of state data of the target equipment.
2. The method according to claim 1, wherein the comprehensive weight corresponding to the jth evaluation index is obtained by weighted summation of the subjective weight and the objective weight corresponding to the jth evaluation index based on a preset subjective weight coefficient and an objective weight coefficient.
3. The method of claim 1, wherein the m evaluation indexes comprise at least one of: integrity, accuracy, timeliness, and consistency.
4. The method of claim 1, wherein the m evaluation indexes comprise accuracy; a single state record comprises field values corresponding to a plurality of preset fields; and the method further comprises: presetting, for any y-th preset field of the plurality of preset fields, an abnormal value judgment rule corresponding to the y-th preset field;
the determining the kth group index value corresponding to the kth group state data of the target device comprises the following steps:
for the y-th field value in any x-th state record, determining whether the y-th field value in the x-th state record is an abnormal value according to the abnormal value judgment rule corresponding to the y-th preset field;
and determining the accuracy of the k-th set of state data according to the determined accumulated number of the abnormal values.
5. The method of claim 1, further comprising: receiving any x-th state record reported by the target device through a preconfigured object model corresponding to the target device, wherein the object model comprises the device category of the target device, a plurality of preset fields, and the data formats required by the plurality of preset fields.
6. The method according to claim 1, wherein the objective weights corresponding to the m evaluation indexes are calculated based on one of the following methods: entropy weight method, principal component analysis method and standard deviation method.
7. The method of claim 1, wherein the subjective weights corresponding to the m evaluation indexes are calculated based on one of the following methods: the analytic hierarchy process, the best-case method, and the Delphi method.
8. The method of any of claims 1-7, performed by a data quality management device deployed in a cloud computing platform in the form of a function-as-a-service.
9. A data quality management apparatus comprising:
the index calculation unit is configured to determine a kth group of index values corresponding to a kth group of state data of the target equipment, wherein the kth group of state data comprises a plurality of state records reported by the target equipment, and the kth group of index values comprise m index values corresponding to m evaluation indexes;
The quality calculation unit is configured to process the kth group of index values according to m comprehensive weights corresponding to the m evaluation indexes to obtain quality scores of the kth group of state data, wherein the comprehensive weights corresponding to any jth evaluation index are calculated based on the objective weights and subjective weights corresponding to the jth evaluation index, the subjective weights and objective weights corresponding to the m evaluation indexes are calculated based on n groups of history index values, and the n groups of history index values belong to the previous k groups of index values corresponding to the previous k groups of state data of the target equipment.
10. The apparatus of claim 9, wherein the m evaluation indexes comprise accuracy; a single state record comprises field values corresponding to a plurality of preset fields; and the apparatus further comprises:
a rule configuration unit configured to preset an abnormal value judgment rule corresponding to any y-th preset field in the plurality of preset fields;
the quality calculation unit is configured to determine, for the y-th field value in any x-th state record, whether the y-th field value is an abnormal value according to the abnormal value judgment rule corresponding to the y-th preset field; and to determine the accuracy of the k-th group of state data according to the accumulated number of determined abnormal values.
11. The apparatus according to claim 9 or 10, the apparatus further comprising: the data acquisition unit is configured to receive any xth state record reported by the target equipment through a preconfigured object model corresponding to the target equipment, wherein the object model comprises equipment types of the target equipment, a plurality of preset fields and data formats required by the preset fields.
12. A computing device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method of any of claims 1-8.
13. A computer readable storage medium having stored thereon a computer program which, when executed in a computing device, performs the method of any of claims 1-8.
CN202311773093.7A 2023-12-20 2023-12-20 Data quality management method and device Pending CN117555891A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311773093.7A CN117555891A (en) 2023-12-20 2023-12-20 Data quality management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311773093.7A CN117555891A (en) 2023-12-20 2023-12-20 Data quality management method and device

Publications (1)

Publication Number Publication Date
CN117555891A true CN117555891A (en) 2024-02-13

Family

ID=89818518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311773093.7A Pending CN117555891A (en) 2023-12-20 2023-12-20 Data quality management method and device

Country Status (1)

Country Link
CN (1) CN117555891A (en)

Similar Documents

Publication Publication Date Title
CN109948641B (en) Abnormal group identification method and device
CN110262937B (en) Identification method and device for index abnormality reasons
CN109714395A (en) Cloud platform resource uses prediction technique and terminal device
CN110443618B (en) Method and device for generating wind control strategy
CN110634030B (en) Method, device and equipment for mining service indexes of applications
CN110046633B (en) Data quality detection method and device
CN109426912B (en) Wind control system optimization method, system and device and electronic equipment
CN109242220A (en) Charging station transaction power predicating method, device, electronic equipment and storage medium
CN115618748B (en) Model optimization method, device, equipment and storage medium
CN110909775A (en) Data processing method and device and electronic equipment
CN115964211A (en) Root cause positioning method, device, equipment and readable medium
CN116896512B (en) Cloud edge cooperative system evaluation method and device, storage medium and electronic equipment
CN116757278B (en) Training method and device of prediction model, storage medium and electronic equipment
CN116882767A (en) Risk prediction method and device based on imperfect heterogeneous relation network diagram
CN110874652B (en) Equipment state evaluation method, device, equipment and system
CN115456801B (en) Artificial intelligence big data wind control system, method and storage medium for personal credit
CN117555891A (en) Data quality management method and device
CN110874601B (en) Method for identifying running state of equipment, state identification model training method and device
CN113157767B (en) Risk data monitoring method, device and equipment
CN111242195A (en) Model, insurance wind control model training method and device and electronic equipment
CN111414945A (en) Method and device for identifying fluctuation of model efficiency
CN115862675B (en) Emotion recognition method, device, equipment and storage medium
CN111104979B (en) Method, device and equipment for generating user behavior value evaluation model
CN116363418A (en) Method and device for training classification model, storage medium and electronic equipment
CN116483622A (en) Method, device and equipment for generating abnormal recovery prediction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination