CN108595563A - A data quality management method and device


Info

Publication number
CN108595563A
Authority
CN
China
Legal status
Pending
Application number
CN201810328531.1A
Other languages
Chinese (zh)
Inventor
林秀丽
石坤
Current Assignee
Individual
Original Assignee
Individual

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data quality management method and device. The method includes: describing objects that contain data and information resources using metadata technology to obtain physical table metadata, the physical table metadata including fields and field attributes; creating a data quality management rule base; configuring, according to the created rule base, one or more data quality management rules for each field attribute; setting logical relations among the multiple rules configured for each field attribute; setting a corresponding run cycle for each field; and extracting the fields in the obtained physical table metadata, obtaining, according to the field attribute of each extracted field, the data quality management rules and logical relations mapped to that field attribute, and auditing data quality according to the obtained rules and logical relations. The invention improves data quality management efficiency.

Description

A data quality management method and device
Technical field
The present invention relates to data quality technology, and in particular to a data quality management method and device.
Background art
With the rapid development of the Internet, the volume of data on networks keeps growing, and the storage and analysis of the resulting big data have become a valuable asset for every enterprise. Data quality management for big data is therefore a problem every enterprise must face and a prerequisite for data storage and analysis. Data quality management must ensure the consistency, accuracy and completeness of data, thereby guaranteeing the quality of the enterprise's business data.
At present, the elements that characterize data quality, such as consistency, accuracy, completeness and timeliness, are rather abstract and difficult to use for guiding and implementing data quality management, so data is usually screened manually on the basis of experience. With large data volumes, however, manual screening is nearly impossible and data quality management becomes extremely inefficient. Manual inspection also easily misses data that does not meet quality requirements, which lowers data quality and in turn degrades subsequent analysis and data mining. It is therefore necessary to establish technical means for defining, measuring and detecting data quality, and to use information technology to promote and implement data quality management.
Summary of the invention
In view of this, the main object of the present invention is to provide a data quality management method and device that improve data quality management efficiency.
To achieve the above object, the present invention provides a data quality management method, including:
Describing objects containing data and information resources using metadata technology to obtain physical table metadata, the physical table metadata including fields and field attributes;
Creating a data quality management rule base, where the data quality management rules include one or any combination of a length check rule, a non-empty check rule, a uniqueness check rule, a primary/foreign key check rule, a consistency check rule, a code check rule, a duplicate data check rule and the like, and setting mapping relations between field attributes and data quality management rules, where different field attributes may map to the same or different rules and the same rule may map to different field attributes;
Configuring, according to the created rule base, one or more data quality management rules for each field attribute;
Setting logical relations among the multiple data quality management rules configured for each field attribute;
Setting a corresponding run cycle for each field;
Extracting the fields in the obtained physical table metadata, obtaining, according to the field attribute of each extracted field, the data quality management rules and logical relations mapped to that field attribute, and auditing data quality according to the obtained rules and logical relations.
Preferably, configuring one or more data quality management rules for each field attribute includes:
Selecting each field attribute in the physical table metadata, querying the rule base for the mapping relations between field attributes and data quality management rules, and obtaining the data quality management rules mapped to that field attribute.
Preferably, if a field fails any of the data quality management rules, combined according to the logical relations, that are mapped to its field attribute, the field and the one or more rules it fails are recorded and saved.
Preferably, the method further includes:
After auditing data quality, aggregating the results into a data quality report divided into two parts: a summary report and report details. The summary report includes the physical table name, audit start time, audit end time, total audit duration, total amount of audited data, amount of abnormal data and pass rate. The report details include, for each metadata record, the names of the fields that failed and the data quality management rules they failed.
Preferably, the method further includes:
Sending the generated data quality report to a preset responsible person so that the metadata can be corrected.
Preferably, the method further includes:
After auditing data quality, storing the physical table metadata that passed the audit; obtaining, according to the maintained life cycle of each physical table or the run cycle of each field in the physical table metadata, the metadata that has exceeded its run cycle; and pushing that metadata to the responsible person of the relevant data source (i.e., the person who originally submitted it), who confirms whether to void it. Voided metadata is no longer audited or shared with other business systems. If the responsible person decides not to void it, the metadata is flagged with a one-cycle extension and is pushed again after the next run cycle.
To achieve the above object, the present invention also provides a data quality management device, including a metadata description module, a rule base creation module, a rule configuration module, a logical relation setting module, a run cycle setting module and an audit module, wherein:
The metadata description module is configured to describe objects containing data and information resources using metadata technology to obtain physical table metadata, the physical table metadata including fields and field attributes;
The rule base creation module is configured to create a data quality management rule base, where the data quality management rules include one or any combination of a length check rule, a non-empty check rule, a uniqueness check rule, a primary/foreign key check rule, a consistency check rule, a code check rule, a duplicate data check rule and the like, and to set mapping relations between field attributes and data quality management rules, where different field attributes may map to the same or different rules and the same rule may map to different field attributes;
The rule configuration module is configured to configure, according to the created rule base, one or more data quality management rules for each field attribute;
The logical relation setting module is configured to set logical relations among the multiple data quality management rules configured for each field attribute;
The run cycle setting module is configured to set a corresponding run cycle for each field;
The audit module is configured to extract the fields in the obtained physical table metadata, obtain, according to the field attribute of each extracted field, the data quality management rules and logical relations mapped to that field attribute, and audit data quality according to the obtained rules and logical relations.
Preferably, the rule configuration module includes a selection unit, a query unit and a data quality management rule base storage unit, wherein:
The selection unit is configured to select each field attribute in the physical table metadata;
The query unit is configured to query, in the rule base stored by the rule base storage unit, the mapping relations between field attributes and data quality management rules, and to obtain the data quality management rules mapped to that field attribute.
Preferably, the device further includes:
A data quality report generation module, configured to aggregate the results after data quality has been audited into a data quality report divided into two parts: a summary report and report details. The summary report includes the physical table name, audit start time, audit end time, total audit duration, total amount of audited data, amount of abnormal data and pass rate. The report details include, for each metadata record, the names of the fields that failed and the data quality management rules they failed.
Preferably, the device further includes:
A metadata maintenance module, configured to store, after data quality has been audited, the physical table metadata that passed the audit; to obtain, according to the maintained life cycle of each physical table or the run cycle of each field in the physical table metadata, the metadata that has exceeded its run cycle; and to push that metadata to the responsible person of the relevant data source (i.e., the person who originally submitted it), who confirms whether to void it. Voided metadata is no longer audited or shared with other business systems. If the responsible person decides not to void it, the metadata is flagged with a one-cycle extension and is pushed again after the next run cycle.
As can be seen from the above technical solutions, in the data quality management method and device provided by the present invention, the method includes: describing objects that contain data and information resources using metadata technology to obtain physical table metadata, the physical table metadata including fields and field attributes; creating a data quality management rule base; configuring, according to the created rule base, one or more data quality management rules for each field attribute; setting logical relations among the multiple rules configured for each field attribute; setting a corresponding run cycle for each field; and extracting the fields in the obtained physical table metadata, obtaining, according to the field attribute of each extracted field, the rules and logical relations mapped to that field attribute, and auditing data quality according to the obtained rules and logical relations. This effectively improves data quality management efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the data quality management method of the present invention;
Fig. 2 is a schematic structural diagram of the data quality management device of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flowchart of the data quality management method of the present invention. As shown in Fig. 1, the method includes:
Step 101: describe objects containing data and information resources using metadata technology to obtain physical table metadata, the physical table metadata including fields and field attributes;
In this embodiment, JDBC is used to obtain the various types of database metadata, including database metadata, physical table metadata and field metadata. As an optional embodiment, a complete closed-loop data quality management process is established based on MIT's Total Data Quality Management (TDQM) methodology.
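The patent obtains metadata through JDBC (in Java, typically `DatabaseMetaData.getTables()` and `getColumns()`). As a minimal, runnable stand-in, the Python sketch below reads comparable table and column metadata from SQLite using `sqlite_master` and `PRAGMA table_info`. The helper name and the dictionary layout are illustrative assumptions, not part of the patent.

```python
import sqlite3


def read_table_metadata(conn: sqlite3.Connection) -> dict:
    """Return {table_name: [column descriptors]} for every user table.

    Hypothetical helper: a SQLite stand-in for JDBC's
    DatabaseMetaData.getTables()/getColumns().
    """
    table_names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    metadata = {}
    for table in table_names:
        columns = []
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, notnull, default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            columns.append({
                "name": name,
                "type": col_type,
                "nullable": not notnull,
                "default": default,
                "primary_key": bool(pk),
            })
        metadata[table] = columns
    return metadata


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, mobile TEXT NOT NULL, org_id INTEGER)"
    )
    for column in read_table_metadata(conn)["person"]:
        print(column)
```

Going through a metadata API rather than hard-coded schemas is what would let the same audit code run against different database products.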
In this embodiment, to make data quality management tool-based, visual and configurable rather than dependent on custom development, data and information resources can be described using metadata technology.
Metadata is data that describes other data (data about data), that is, structured data that provides information about information resources. It describes objects such as information resources or data and can be used to identify information resources; through metadata, large volumes of networked data can be managed simply and efficiently, and information resources can be discovered and located effectively. For example, for objects such as government information resources, the descriptive metadata includes the information resource title, publication date, abstract, responsible party, format and so on. Data is summarized and classified, and physical table metadata characterizing each class of data or information resources is analyzed. Each part of the physical table metadata includes one or more fields, and fields are identified by field attributes; one field attribute may cover multiple fields. Multiple physical table metadata make up the database metadata, and the field attributes within one physical table metadata are all distinct. Taking government information resource objects as an example, the field attributes include the information resource title, publication date, abstract, responsible party and format, and the specific content under a field attribute is the field.
In this embodiment, as an optional embodiment, the database metadata may describe the financial database of a financial system and include physical table metadata, a version number, a user name, an address and other information.
In this embodiment, as an optional embodiment, the physical table metadata is divided into institution table metadata, personnel table metadata and order table metadata, each of which includes a table name, fields and field attributes. The table name identifies the institution table, personnel table or order table metadata, which are keyed by institution ID, personnel ID and order ID respectively. A field attribute may include a field identifier, the field length, whether the field may be empty, a default value and other information. The field identifier characterizes the type of the field; for example, it may be a mobile phone number, a landline number, a QQ number or an ID card number.
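The metadata hierarchy described above (database metadata containing physical table metadata, which in turn contains fields and their field attributes) can be sketched as plain data classes. The class and attribute names below are hypothetical; the patent does not specify a concrete schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldAttribute:
    identifier: str                 # field type, e.g. "mobile_number", "qq_number"
    length: Optional[int] = None    # declared field length
    nullable: bool = True           # whether the field may be empty
    default: Optional[str] = None   # default value, if any


@dataclass
class Field:
    name: str
    attribute: FieldAttribute


@dataclass
class PhysicalTableMetadata:
    table_name: str                 # e.g. "institution", "personnel", "order"
    key_field: str                  # e.g. "org_id", "person_id", "order_id"
    fields: List[Field] = field(default_factory=list)

    def fields_with_identifier(self, identifier: str) -> List[Field]:
        """One field attribute may cover several fields."""
        return [f for f in self.fields if f.attribute.identifier == identifier]


@dataclass
class DatabaseMetadata:
    version: str
    user_name: str
    address: str
    tables: List[PhysicalTableMetadata] = field(default_factory=list)


# Hypothetical personnel table used for illustration.
personnel = PhysicalTableMetadata(
    table_name="personnel",
    key_field="person_id",
    fields=[
        Field("person_id", FieldAttribute("person_id", nullable=False)),
        Field("mobile", FieldAttribute("mobile_number", length=11)),
        Field("landline", FieldAttribute("landline_number", length=8)),
        Field("id_no", FieldAttribute("id_card_number", length=18, nullable=False)),
    ],
)
```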
Step 102: create a data quality management rule base;
In this embodiment, as an optional embodiment, each field attribute can be analyzed and the rule base created according to the field attributes; the created rule base defines the data quality management rules. As an optional embodiment, the rules include one or any combination of a length check rule, a non-empty check rule, a uniqueness check rule, a primary/foreign key check rule, a consistency check rule, a code check rule, a duplicate data check rule and the like, so as to meet the various requirements of data quality management and inspection. Specifically:
The length check rule checks the length of metadata; for example, for mobile phone number metadata, the data length must be 11.
In this embodiment, mapping relations between field attributes and length check rules are set: different field attributes map to different length check rules, and the same length check rule can map to different field attributes. For example, the mobile phone number field attribute (field identifier) maps to a length of 11, the landline number field attribute maps to a length of 7 or 8, and the postal code field attribute maps to a length of 6.
The non-empty check rule checks that metadata is not empty; for example, for bank account opening metadata, the ID card number must be filled in.
The uniqueness check rule checks that specified data in the metadata is unique; for example, for website registration metadata, the mobile phone number must be unique.
The primary/foreign key check rule checks specified primary and foreign keys in the metadata; for example, in personnel table metadata, the institution ID field of the personnel table must be a foreign key referencing the primary key (ID field) of the institution table.
The consistency check rule checks metadata for consistency; for example, for metadata whose field identifier is an ID card number, the date-of-birth portion of the ID card number must match the date-of-birth data item.
The code check rule checks that metadata conforms to a coding format; for example, metadata whose field identifier is an email address must match: "^[A-Za-z0-9\u4e00-\u9fa5]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$".
The duplicate data check rule checks whether the metadata in fields is duplicated. As an optional embodiment, this embodiment deduplicates metadata using the Jaro-Winkler algorithm with a minimum similarity of 0.8. For example, by combining data items such as "hospital name", "address" and "grade", two "Chengdu Institute of Traditional Chinese Medicine" records can be identified as similar data, i.e., duplicate metadata.
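A minimal sketch of the duplicate data check: a standard Jaro-Winkler implementation (prefix scale 0.1 and a prefix capped at 4 characters, the usual textbook defaults) plus a pairwise comparison that joins the chosen data items into one string and flags pairs scoring at least 0.8. The patent does not say how the data items are combined; joining them with spaces, and the function and field names, are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (half-transpositions).
    half_transpositions, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for a shared prefix of up to max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def find_similar_records(
    records: List[Dict[str, str]],
    keys: Sequence[str] = ("hospital_name", "address", "grade"),
    threshold: float = 0.8,
) -> List[Tuple[int, int]]:
    """Return index pairs whose combined data items reach the similarity threshold."""
    combined = [" ".join(str(r.get(k, "")) for k in keys) for r in records]
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaro_winkler(combined[i], combined[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The all-pairs loop is quadratic; at the data volumes the patent mentions, candidate pairs would normally be narrowed first (for example by grouping on a shared key) before scoring.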
In this embodiment, similarly to the mapping between field attributes and length check rules, different field attributes can map to different non-empty, uniqueness, primary/foreign key, consistency and code check rules, while the duplicate data check rule applies to all metadata.
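The rules described above can be sketched as small predicate factories collected into a rule base keyed by field identifier. This is a minimal illustration, not the patent's implementation: the identifier names, rule names and the `(record, field_name)` predicate signature are assumptions, and the ID card consistency rule assumes the 18-digit PRC resident ID layout in which characters 7 to 14 encode the birth date as YYYYMMDD. Uniqueness and primary/foreign key checks look across rows, so they are written as separate functions over a list of rows.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# A field-level rule takes (record, field_name) and returns True if the record passes.
Rule = Callable[[dict, str], bool]


def length_rule(*allowed: int) -> Rule:
    """Length check: the value's length must be one of the allowed lengths."""
    return lambda rec, f: rec.get(f) is not None and len(str(rec[f])) in allowed


def non_empty_rule() -> Rule:
    """Non-empty check: the value must be neither None nor an empty string."""
    return lambda rec, f: rec.get(f) not in (None, "")


def code_rule(pattern: str) -> Rule:
    """Code check: the whole value must match the given regular expression."""
    compiled = re.compile(pattern)
    return lambda rec, f: rec.get(f) is not None and compiled.fullmatch(str(rec[f])) is not None


def id_birthday_rule(birthday_field: str) -> Rule:
    """Consistency check: characters 7-14 of an 18-digit resident ID are YYYYMMDD."""
    def check(rec: dict, f: str) -> bool:
        id_no = str(rec.get(f) or "")
        birthday = str(rec.get(birthday_field) or "").replace("-", "")
        return len(id_no) == 18 and id_no[6:14] == birthday
    return check


# The e-mail pattern quoted in the code check rule.
EMAIL_PATTERN = r"^[A-Za-z0-9\u4e00-\u9fa5]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$"

# Hypothetical rule base: field identifier -> {rule name: rule}.
RULE_BASE: Dict[str, Dict[str, Rule]] = {
    "mobile_number": {"length_11": length_rule(11)},
    "landline_number": {"length_7_or_8": length_rule(7, 8)},
    "postal_code": {"length_6": length_rule(6)},
    "email": {"email_code": code_rule(EMAIL_PATTERN)},
    "id_card_number": {
        "not_empty": non_empty_rule(),
        "id_birthday_consistency": id_birthday_rule("birthday"),
    },
}


def uniqueness_violations(rows: List[dict], f: str) -> List[object]:
    """Uniqueness check: values of field f that occur more than once."""
    counts = Counter(r.get(f) for r in rows)
    return [value for value, n in counts.items() if n > 1]


def foreign_key_violations(child: List[dict], fk: str, parent: List[dict], pk: str) -> List[dict]:
    """Primary/foreign key check: child rows whose fk has no matching parent pk."""
    parent_keys = {r.get(pk) for r in parent}
    return [r for r in child if r.get(fk) not in parent_keys]
```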
Step 103: configure, according to the created rule base, one or more data quality management rules for each field attribute;
In this embodiment, several data quality management rules are configured for each of the different field attributes in the different physical table metadata.
In this embodiment, as an optional embodiment, the corresponding physical table metadata is selected from the database metadata; each field attribute in the physical table metadata is selected, and the rule base is queried for the mapping relations between field attributes and data quality management rules to obtain the one or more rules mapped to that field attribute. For example, the mobile phone number field of the personnel table is configured with the mobile phone number code check rule, and the ID card number and date-of-birth fields are configured with the ID card number code check rule and the ID card number/date-of-birth consistency check rule.
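Step 103 amounts to a lookup: for each field in a selected physical table, take its field attribute and read the rules mapped to that attribute in the rule base. In the sketch below the mapping contents (field identifiers and rule names) are hypothetical, and each field's attribute is reduced to its field identifier.

```python
from typing import Dict, List

# Hypothetical rule-base mapping: field identifier -> names of the rules mapped to it.
# The same rule may map to several identifiers (here the consistency rule).
RULE_MAPPING: Dict[str, List[str]] = {
    "mobile_number": ["mobile_code_check"],
    "id_card_number": ["id_card_code_check", "id_birthday_consistency"],
    "birthday": ["id_birthday_consistency"],
}


def configure_rules(
    table_fields: Dict[str, str],
    mapping: Dict[str, List[str]] = RULE_MAPPING,
) -> Dict[str, List[str]]:
    """Map each field name to the rules of its field identifier.

    table_fields: field name -> field identifier for one physical table.
    Fields whose identifier has no mapped rule get an empty list.
    """
    return {name: list(mapping.get(identifier, [])) for name, identifier in table_fields.items()}


# Hypothetical personnel table: field name -> field identifier.
personnel_fields = {
    "mobile": "mobile_number",
    "id_no": "id_card_number",
    "birth_date": "birthday",
    "remark": "free_text",
}
config = configure_rules(personnel_fields)
```

Copying each rule list keeps later per-field edits from mutating the shared rule base.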
Step 104: set logical relations among the multiple data quality management rules configured for each field attribute;
In this embodiment, as an optional embodiment, the logical relations among the selected rules are configured according to data quality requirements; logical relations support combinations such as AND, OR and XOR. For example, if the contact-method field attribute allows either the mobile phone number field or the landline field to be empty, the non-empty check rule mapped to the mobile phone number field and the non-empty check rule mapped to the landline field can be combined with a logical OR relation.
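Logical relations can be modeled as a small expression tree over rule names and evaluated against the pass/fail result of each rule. The sketch assumes a tuple-based expression format and rule names of its own; the patent names AND, OR and XOR but does not define a syntax.

```python
import operator
from functools import reduce
from typing import Dict, Union

# An expression is either a rule name or a tuple (operator, operand, operand, ...).
Expr = Union[str, tuple]

OPERATORS = {
    "AND": all,
    "OR": any,
    # n-ary XOR: true when an odd number of operands are true
    "XOR": lambda values: reduce(operator.xor, values, False),
}


def evaluate(expr: Expr, results: Dict[str, bool]) -> bool:
    """Combine individual rule results according to a logical-relation expression."""
    if isinstance(expr, str):
        return results[expr]
    op, *operands = expr
    return OPERATORS[op]([evaluate(e, results) for e in operands])


# Contact method: at least one of the two numbers must be filled in.
CONTACT_RELATION = ("OR", "mobile_not_empty", "landline_not_empty")


def check_contact(record: dict) -> bool:
    results = {
        "mobile_not_empty": bool(record.get("mobile")),
        "landline_not_empty": bool(record.get("landline")),
    }
    return evaluate(CONTACT_RELATION, results)
```

Because expressions nest, a relation such as "rule A and (rule B or rule C)" needs no extra machinery.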
Step 105: set a corresponding run cycle for each field;
In this embodiment, as an optional embodiment, run cycles come in several types, such as seconds, minutes, hours, days, weeks and months.
Step 106: extract the fields in the obtained physical table metadata, obtain, according to the field attribute of each extracted field, the data quality management rules and logical relations mapped to that field attribute, and audit data quality according to the obtained rules and logical relations.
In this embodiment, data quality is checked and audited: the quality of the data in the physical table metadata is audited according to the configured rules. If a field fails any of the rules, combined according to the logical relations, that are mapped to its field attribute, the field and the one or more rules it fails are recorded and saved for subsequent analysis and processing.
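The audit step can be sketched as a loop that runs each field's configured rules against every record and, when a field fails, records the field together with the rules it failed. For brevity this sketch combines a field's rules with AND (every rule must pass); a logical-relation expression could be substituted. The `Failure` record and the predicate-on-value rule signature are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical rule signature: takes a field value, returns True if it passes.
ValueRule = Callable[[object], bool]


@dataclass
class Failure:
    record_id: object
    field: str
    failed_rules: List[str]


def audit(records: List[dict], field_rules: Dict[str, Dict[str, ValueRule]],
          id_field: str = "id") -> List[Failure]:
    """Audit records; a field fails if any of its rules fails (AND relation)."""
    failures = []
    for rec in records:
        for field_name, rules in field_rules.items():
            failed = [name for name, check in rules.items() if not check(rec.get(field_name))]
            if failed:
                # Save the field and every rule it did not satisfy for later analysis.
                failures.append(Failure(rec.get(id_field), field_name, failed))
    return failures


MOBILE_RULES: Dict[str, ValueRule] = {
    "not_empty": lambda v: v not in (None, ""),
    "length_11": lambda v: v is not None and len(str(v)) == 11,
}
```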
In this embodiment, when a field is extracted from the obtained physical table metadata, its run cycle can be queried first. If the field is within its run cycle, the step of obtaining the field attribute of the extracted field is executed; if not, the field is discarded, the next field in the physical table metadata is extracted and its run cycle is queried, and so on until the last field in the physical table metadata.
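The run-cycle filter can be sketched as follows. The patent does not define exactly when a field is "within its run cycle"; this sketch assumes it is when the time since the field's last update does not exceed the cycle length, and it approximates a month as 30 days. The field dictionaries and unit names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

CYCLE_UNITS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),  # assumption: a month approximated as 30 days
}


def within_cycle(last_updated: datetime, count: int, unit: str, now: datetime) -> bool:
    """True if the field was updated no longer ago than its run cycle."""
    return now - last_updated <= count * CYCLE_UNITS[unit]


def fields_to_audit(fields: Iterable[dict], now: datetime) -> List[str]:
    """Walk the fields in order and keep only those within their run cycle."""
    selected = []
    for f in fields:
        count, unit = f["cycle"]
        if within_cycle(f["last_updated"], count, unit, now):
            selected.append(f["name"])  # continue with field attribute lookup and audit
        # otherwise: skip this field and move on to the next one
    return selected
```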
In this embodiment, since enterprise business data typically amounts to millions of records or more, as an optional embodiment, multiple servers can perform data quality management using the MapReduce model. The physical table metadata to be audited is partitioned according to server performance: a higher-performance server, which can bear a heavier load, receives a larger partition, which reduces the number of servers required. Each server audits the metadata in the partition it receives and, when the audit is complete, uploads its results so that the results for the physical table metadata of the same enterprise can be aggregated. As an optional embodiment, each enterprise corresponds to one set of physical table metadata.
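A single-process sketch of the distribution scheme: records are split into contiguous chunks proportional to a per-server performance score, each chunk is audited independently (map), and the per-enterprise counts are summed (reduce). The capacity scores, record layout and `check` predicate are assumptions; a real deployment would run the map step on separate servers, for example under Hadoop MapReduce.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Counts = Tuple[Counter, Counter]  # (records audited, records failed) per enterprise


def partition_by_capacity(items: List[dict], capacities: Dict[str, float]) -> Dict[str, List[dict]]:
    """Give each server a contiguous chunk proportional to its capacity score."""
    total = sum(capacities.values())
    servers = list(capacities)
    chunks, start = {}, 0
    for i, server in enumerate(servers):
        if i == len(servers) - 1:
            end = len(items)  # the last server takes the remainder
        else:
            end = min(len(items), start + round(len(items) * capacities[server] / total))
        chunks[server] = items[start:end]
        start = end
    return chunks


def map_audit(chunk: List[dict], check: Callable[[dict], bool]) -> Counts:
    """Map step: audit one chunk and count audited/failed records per enterprise."""
    audited, failed = Counter(), Counter()
    for rec in chunk:
        audited[rec["enterprise"]] += 1
        if not check(rec):
            failed[rec["enterprise"]] += 1
    return audited, failed


def reduce_counts(partials: List[Counts]) -> Counts:
    """Reduce step: merge the per-server counts enterprise by enterprise."""
    audited, failed = Counter(), Counter()
    for part_audited, part_failed in partials:
        audited.update(part_audited)
        failed.update(part_failed)
    return audited, failed
```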
In this embodiment, as an optional embodiment, the rule base can be updated, including but not limited to adding, deleting and modifying data quality management rules.
In this embodiment, as another optional embodiment, after data quality has been audited, the results can also be aggregated into a data quality report divided into two parts: a summary report and report details. The summary report includes the physical table name, audit start time, audit end time, total audit duration, total amount of audited data, amount of abnormal data and pass rate. The report details include, for each metadata record, the names of the fields that failed and the data quality management rules they failed.
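A sketch of assembling the two-part report from the audit output. It assumes failures arrive as `(record_id, field_name, rule_name)` tuples and counts a record as abnormal if it failed at least one rule; the patent does not define how the abnormal data amount or the pass rate is computed.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def build_report(table_name: str, start: datetime, end: datetime, total_records: int,
                 failures: List[Tuple[object, str, str]]) -> Dict[str, dict]:
    """Build {"summary": ..., "details": ...} from (record_id, field, rule) failures."""
    abnormal_ids = {record_id for record_id, _, _ in failures}
    summary = {
        "table": table_name,
        "audit_start": start,
        "audit_end": end,
        "duration_seconds": (end - start).total_seconds(),
        "total_records": total_records,
        "abnormal_records": len(abnormal_ids),
        "pass_rate": (total_records - len(abnormal_ids)) / total_records if total_records else 1.0,
    }
    # Details: record -> field -> list of failed rules.
    details: Dict[object, Dict[str, List[str]]] = {}
    for record_id, field_name, rule_name in failures:
        details.setdefault(record_id, {}).setdefault(field_name, []).append(rule_name)
    return {"summary": summary, "details": details}
```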
In this embodiment, as yet another optional embodiment, the generated data quality report can also be sent to a preset responsible person so that the metadata can be corrected. For example, all metadata quality problems are distributed to the responsible persons of their data sources, who handle the problematic metadata and resubmit it, after which the step of extracting the fields in the obtained physical table metadata is executed again. In this way, the responsible person for each data source learns about the problems in the data they are responsible for.
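Distributing problems to data source owners is essentially a group-by on the source of each failed record. In this sketch the `source` key, the owner lookup table and the "unassigned" fallback are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def route_issues(failures: List[dict], owner_of: Dict[str, str]) -> Dict[str, List[dict]]:
    """Group failed records by the person responsible for their data source."""
    inbox: Dict[str, List[dict]] = defaultdict(list)
    for failure in failures:
        owner = owner_of.get(failure["source"], "unassigned")
        inbox[owner].append(failure)
    return dict(inbox)
```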
In this embodiment, as yet another optional embodiment, after the responsible person for a data source has handled the problematic metadata, the modified metadata is saved using a slowly changing dimension: each modified version is retained and recorded as a new data row, preserving the metadata's change history.
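Keeping every modified version as a new row corresponds to what is commonly called a Type 2 slowly changing dimension. A minimal in-memory sketch, assuming each row carries `valid_from`, `valid_to` and an `is_current` flag (the column names are illustrative):

```python
from datetime import datetime
from typing import List


def scd2_upsert(history: List[dict], key: str, values: dict, changed_at: datetime) -> List[dict]:
    """Record a new version of `key` without overwriting earlier versions."""
    for row in history:
        if row["key"] == key and row["is_current"]:
            if row["values"] == values:
                return history  # unchanged: no new version needed
            # Close the current version instead of updating it in place.
            row["is_current"] = False
            row["valid_to"] = changed_at
    history.append({
        "key": key,
        "values": dict(values),
        "valid_from": changed_at,
        "valid_to": None,
        "is_current": True,
    })
    return history
```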
In this embodiment, as yet another optional embodiment, after data quality has been audited, the physical table metadata that passed the audit can also be stored. According to the maintained life cycle of each physical table or the run cycle of each field in the physical table metadata, the metadata that has exceeded its run cycle is obtained and pushed to the responsible person of the relevant data source (i.e., the person who originally submitted it), who confirms whether to void it. Voided metadata is no longer audited or shared with other business systems. If the responsible person decides not to void it, the metadata is flagged with a one-cycle extension and is pushed again after the next run cycle.
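The expiry workflow can be sketched as a periodic pass over stored metadata. Assumptions in this sketch: each record stores its last update time, its run cycle as a `timedelta` and a count of granted extensions; the owner's confirmation is modeled as a `decide` callback that returns True to void; and "pushing" is reduced to collecting `(owner, record_id)` pairs.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


def process_expired(records: List[dict], now: datetime,
                    decide: Callable[[dict], bool]) -> List[Tuple[str, str]]:
    """Push metadata past its run cycle to its owner; void it or extend it by one cycle."""
    pushed = []
    for rec in records:
        if rec["voided"]:
            continue  # voided metadata is no longer audited or shared
        deadline = rec["last_updated"] + rec["cycle"] * (1 + rec["extensions"])
        if now <= deadline:
            continue  # still within its (possibly extended) run cycle
        pushed.append((rec["owner"], rec["id"]))
        if decide(rec):
            rec["voided"] = True
        else:
            rec["extensions"] += 1  # extension flag: push again after the next cycle
    return pushed


if __name__ == "__main__":
    now = datetime(2018, 4, 13)
    sample = [{"id": "a", "owner": "alice", "last_updated": now - timedelta(days=10),
               "cycle": timedelta(days=7), "voided": False, "extensions": 0}]
    print(process_expired(sample, now, decide=lambda r: False))
```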
The data quality management method of the present invention establishes a complete data quality management process with broad applicability and provides a sound mechanism for resolving and sharing quality problems, achieving end-to-end closed-loop management of data quality and improving data quality management efficiency. Further, the rule base is highly configurable and supports flexible extension: the rules and logical relations for different fields of different database tables can be configured flexibly and take effect in real time, supporting the diversity of an enterprise's business. Moreover, because the method is built on database metadata, it can support a variety of heterogeneous databases such as Sybase, Informix, Oracle, DB2, SQL Server and MySQL.
Fig. 2 is a schematic structural diagram of the data quality management device of the present invention. As shown in Fig. 2, the device includes a metadata description module 21, a rule base creation module 22, a rule configuration module 23, a logical relation setting module 24, a run cycle setting module 25 and an audit module 26, wherein:
The metadata description module 21 is configured to describe objects containing data and information resources using metadata technology to obtain physical table metadata, the physical table metadata including fields and field attributes;
In this embodiment, data is summarized and classified, and physical table metadata characterizing each class of data or information resources is analyzed. Each part of the physical table metadata includes one or more fields, and fields are identified by field attributes; one field attribute may cover multiple fields. Multiple physical table metadata make up the database metadata, and the field attributes within one physical table metadata are all distinct. Taking government information resource objects as an example, the field attributes include the information resource title, publication date, abstract, responsible party and format, and the specific content under a field attribute is the field.
The rule base creation module 22 is configured to create a data quality management rule base, where the data quality management rules include one or any combination of a length check rule, a non-empty check rule, a uniqueness check rule, a primary/foreign key check rule, a consistency check rule, a code check rule, a duplicate data check rule and the like, and to set mapping relations between field attributes and data quality management rules, where different field attributes may map to the same or different rules and the same rule may map to different field attributes;
In this embodiment, mapping relations between field attributes and data quality management rules are set: different field attributes may map to different rules, and the same rule can map to different field attributes.
The rule configuration module 23 is configured to configure, according to the created rule base, one or more data quality management rules for each field attribute;
The logical relation setting module 24 is configured to set logical relations among the multiple data quality management rules configured for each field attribute;
In this embodiment, as an optional embodiment, logical relations include combinations such as AND, OR and XOR.
The corresponding cycle of operation is arranged for respectively each field in cycle of operation setup module 25;
In the present embodiment, as an alternative embodiment, the cycle of operation includes:Second, point, when, day, week, the multiple types such as the moon.
Audit Module 26, for extracting the field in obtained physics table metadata, the corresponding word of field according to extraction Section attribute obtains the data quality management rule and logical relation of field attribute mapping, according to the quality of data pipe obtained Reason rule and logical relation audit to the quality of data.
In this embodiment, data quality is checked as follows: the data quality in the physical table metadata is audited according to the configured data quality management rules. If a field does not satisfy the data quality management rules mapped to its field attribute, as combined under the configured logical relation, the field and the one or more rules it fails are recorded and saved for subsequent analysis and processing.
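Auditing one field and recording the rules it fails could be sketched as follows. The function `audit_field` and the shape of the failure record `(attribute, value, failed rule names)` are hypothetical; for brevity only AND and OR relations are handled.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[str], bool]
Failure = Tuple[str, str, List[str]]


def audit_field(attribute: str, value: str, rules: Dict[str, Rule],
                relation: str, failures: List[Failure]) -> bool:
    """Audit one field; on failure, record (attribute, value, failed rules)."""
    results = {name: rule(value) for name, rule in rules.items()}
    if relation == "AND":
        passed = all(results.values())
    elif relation == "OR":
        passed = any(results.values())
    else:
        raise ValueError(f"unsupported relation: {relation}")
    if not passed:
        # Keep the names of the individual rules that did not hold,
        # so they can be analyzed and processed later.
        failed = [name for name, ok in results.items() if not ok]
        failures.append((attribute, value, failed))
    return passed
```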
In this embodiment, when the fields in the obtained physical table metadata are extracted, the run cycle corresponding to a field may be queried first. If the field is within its run cycle, the step of obtaining the field attribute corresponding to the extracted field is executed. If it is not, the field is removed from this pass, the next field in the extracted physical table metadata is taken, and the run cycle of that next field is queried. This loop repeats until the last field in the physical table metadata has been processed.
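The loop described above, which walks the fields in order and audits only those whose run cycle applies, might be sketched like this. `audit_table`, `is_due`, and `audit` are hypothetical names; the two callbacks stand in for the run cycle query and the rule-based audit.

```python
from typing import Callable, Iterable, List, Tuple


def audit_table(fields: Iterable[Tuple[str, str]],
                is_due: Callable[[str], bool],
                audit: Callable[[str, str], bool]) -> List[str]:
    """Walk a table's fields in order, auditing only those within their run cycle.

    `fields` yields (field attribute, field name) pairs. Returns the names of
    the fields that were audited in this pass.
    """
    audited = []
    for attribute, name in fields:
        if not is_due(name):
            continue  # outside its run cycle: drop it and go to the next field
        audit(attribute, name)
        audited.append(name)
    return audited
```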
In this embodiment, as an optional implementation, the data quality management rule base can be updated, including but not limited to adding, deleting, and modifying data quality management rules.
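An updatable rule base could expose add, modify, and delete operations, as in the sketch below. The class `RuleBase` is hypothetical. One design choice worth noting is that deleting a rule also removes it from every field attribute mapping, so no attribute is left pointing at a rule that no longer exists; the patent does not specify this behavior.

```python
from typing import Callable, Dict, List

Rule = Callable[[str], bool]


class RuleBase:
    """Hypothetical updatable rule base with attribute-to-rule mappings."""

    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}
        self.mapping: Dict[str, List[str]] = {}

    def add(self, name: str, rule: Rule) -> None:
        if name in self.rules:
            raise KeyError(f"rule already exists: {name}")
        self.rules[name] = rule

    def modify(self, name: str, rule: Rule) -> None:
        if name not in self.rules:
            raise KeyError(f"no such rule: {name}")
        self.rules[name] = rule

    def delete(self, name: str) -> None:
        del self.rules[name]
        # Drop dangling references so no attribute maps to a deleted rule.
        for names in self.mapping.values():
            while name in names:
                names.remove(name)
```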
In this embodiment, as an optional implementation, the rule configuration module 23 includes a selecting unit, a query unit, and a data quality management rule base storage unit (not shown), wherein:
Selecting unit, configured to select each field attribute in the physical table metadata;
Query unit, configured to query, in the data quality management rule base stored by the data quality management rule base storage unit, the mapping between field attributes and data quality management rules, so as to obtain the data quality management rules mapped to the field attribute.
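Taken together, the selecting unit and query unit amount to a lookup from each field attribute of a table to its mapped rules. A minimal sketch, assuming the rule base stores the mapping as a dictionary of rule names; the function name `configure_rules` is hypothetical, and attributes with no mapping receive an empty rule list.

```python
from typing import Dict, List


def configure_rules(table_attributes: List[str],
                    attribute_rules: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Select every field attribute of a physical table and query the stored
    mapping for the rules it maps to (copying each list so later edits to the
    configuration do not alter the rule base)."""
    return {attr: list(attribute_rules.get(attr, [])) for attr in table_attributes}
```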
In this embodiment, as an optional implementation, the device further includes:
A data quality report generation module (not shown), configured to summarize the results into a data quality report after data quality has been audited. The data quality report is divided into two parts: a summary report and report details. The summary report includes the physical table name, audit start time, audit end time, total audit duration, total amount of audited data, amount of abnormal data, and pass rate. The report details include, for each metadata record, the names of the fields that failed and the data quality management rules they failed.
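The summary part of the report can be sketched as a small record with derived fields. The class `QualityReport` and its field names are hypothetical, and the pass rate is assumed to be the share of audited records with no violation, since the patent does not define it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class QualityReport:
    table_name: str
    start: datetime
    end: datetime
    total_records: int
    abnormal_records: int
    details: Dict[str, List[str]]  # record id -> ["field: failed rule", ...]

    @property
    def duration_seconds(self) -> float:
        return (self.end - self.start).total_seconds()

    @property
    def pass_rate(self) -> float:
        # Assumed definition: share of audited records with no violation.
        if self.total_records == 0:
            return 1.0
        return (self.total_records - self.abnormal_records) / self.total_records

    def summary(self) -> str:
        return (f"{self.table_name}: {self.start:%Y-%m-%d %H:%M:%S} to "
                f"{self.end:%Y-%m-%d %H:%M:%S} ({self.duration_seconds:.0f}s), "
                f"{self.total_records} audited, {self.abnormal_records} abnormal, "
                f"pass rate {self.pass_rate:.1%}")
```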
In this embodiment, as another optional implementation, the generated data quality report may also be sent to a preset responsible person for metadata correction. For example, all metadata quality problems are distributed to the persons responsible for the corresponding data sources. Each data source owner handles the problematic metadata and resubmits it, after which the step of extracting the fields in the obtained physical table metadata is executed again. In this way each data source owner learns about the problems in the data they are responsible for.
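Distributing problems to data source owners is essentially a group-by over the recorded failures. A minimal sketch, assuming each failure carries its data source and that a hypothetical `owner_of` table maps sources to responsible persons; collecting unmapped sources under `"unassigned"` is an invented convention.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def route_issues(failures: List[Tuple[str, str, List[str]]],
                 owner_of: Dict[str, str]) -> Dict[str, List[Tuple[str, List[str]]]]:
    """Group (source, field, failed rules) records by the source's owner.

    Sources with no registered owner are collected under "unassigned".
    """
    inbox: Dict[str, List[Tuple[str, List[str]]]] = defaultdict(list)
    for source, field_name, rules in failures:
        inbox[owner_of.get(source, "unassigned")].append((field_name, rules))
    return dict(inbox)
```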
In this embodiment, as another optional implementation, after the data source owner has handled the problematic metadata, the modified metadata is saved using a slowly changing dimension: each modified version of the metadata is retained and recorded as a new data row, so that the change history of the metadata is preserved.
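"Retaining each modification as a new data row" corresponds to what is commonly called a Type 2 slowly changing dimension. The sketch below assumes that variant, with validity timestamps on each row; the names `MetadataVersion` and `scd2_update` are hypothetical, and the patent does not say which columns it keeps.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class MetadataVersion:
    key: str                             # business key of the metadata item
    value: str                           # metadata content after this change
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current row


def scd2_update(history: List[MetadataVersion], key: str, value: str,
                when: datetime) -> None:
    """Type 2 slowly changing dimension: close the current row, append a new one."""
    for row in history:
        if row.key == key and row.valid_to is None:
            if row.value == value:
                return  # nothing changed; keep the current row as-is
            row.valid_to = when  # close the old version
    history.append(MetadataVersion(key, value, when))
```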
In this embodiment, as another optional implementation, the device further includes:
A metadata maintenance module (not shown), configured, after data quality has been audited, to also store the physical table metadata that passed the audit; to obtain the metadata that has exceeded its run cycle, according to the maintained life cycle of each physical table or the run cycle corresponding to the fields in the physical table metadata; and to push that metadata to the relevant data source owner (i.e., the person who originally submitted it). If the owner confirms cancellation, the metadata is cancelled and will no longer be audited or shared with other business systems. If the owner decides not to cancel it, the metadata is flagged as extended by one run cycle and will be pushed again after the next run cycle.
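The maintenance pass, which finds expired metadata, asks its owner, and then either cancels it or extends it by one cycle, might be sketched as follows. All names are hypothetical; the owner's decision is modelled as a `confirm_cancel` callback standing in for the push and confirmation step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class StoredMetadata:
    name: str
    owner: str             # person who originally submitted the metadata
    expires_at: datetime   # end of its life cycle / run cycle
    cycle: timedelta       # length of one run cycle
    cancelled: bool = False


def maintain(items: List[StoredMetadata], now: datetime,
             confirm_cancel: Callable[[StoredMetadata], bool]) -> List[str]:
    """Push expired metadata to its owner; cancel it or extend it by one cycle.

    Returns the names of the metadata cancelled in this pass.
    """
    cancelled = []
    for item in items:
        if item.cancelled or item.expires_at > now:
            continue  # already cancelled, or still within its cycle
        if confirm_cancel(item):
            item.cancelled = True  # no longer audited or shared downstream
            cancelled.append(item.name)
        else:
            item.expires_at += item.cycle  # pushed again after the next cycle
    return cancelled
```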
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A data quality management method, characterized by comprising:
describing objects comprising data and information resources using metadata technology to obtain physical table metadata, the physical table metadata comprising fields and field attributes;
creating a data quality management rule base, wherein the data quality management rules comprise any one or any combination of a length check rule, a non-empty check rule, a uniqueness check rule, a primary/foreign key check rule, a consistency check rule, a code check rule, a duplicate data check rule, and the like; and setting the mapping between field attributes and data quality management rules, wherein different field attributes map to the same or different data quality management rules, and the same data quality management rule may be mapped to different field attributes;
configuring, according to the created data quality management rule base, one or more data quality management rules for each field attribute;
setting a logical relation among the multiple data quality management rules configured for each field attribute;
setting a corresponding run cycle for each field;
extracting the fields in the obtained physical table metadata; obtaining, according to the field attribute corresponding to each extracted field, the data quality management rules and logical relation mapped to that field attribute; and auditing data quality according to the obtained data quality management rules and logical relation.
2. The method according to claim 1, characterized in that configuring one or more data quality management rules for each field attribute comprises:
selecting each field attribute in the physical table metadata, and querying, in the data quality management rule base, the mapping between field attributes and data quality management rules to obtain the data quality management rules mapped to the field attribute.
3. The method according to claim 1, characterized in that, if a field does not satisfy the data quality management rules mapped to the field attribute corresponding to the field, as combined under the logical relation, the field and the one or more data quality management rules it fails are recorded and saved.
4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
after data quality has been audited, summarizing the results to generate a data quality report, the data quality report being divided into a summary report and report details, wherein the summary report comprises the physical table name, audit start time, audit end time, total audit duration, total amount of audited data, amount of abnormal data, and pass rate, and the report details comprise, for each metadata record, the names of the fields that failed and the data quality management rules they failed.
5. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
sending the generated data quality report to a preset responsible person for metadata correction.
6. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
after data quality has been audited, storing the physical table metadata that passed the audit; obtaining the metadata that has exceeded its run cycle according to the maintained life cycle of each physical table or the run cycle corresponding to the fields in the physical table metadata; and pushing the metadata that has exceeded its run cycle to the relevant data source owner (i.e., the person who originally submitted it); if the relevant data source owner confirms cancellation, cancelling that metadata, whereupon the cancelled metadata is no longer audited or shared with other business systems; if the owner decides not to cancel it, flagging that metadata as extended by one run cycle, so that it is pushed again after the next run cycle.
7. A data quality management device, characterized by comprising a metadata description module, a rule base creation module, a rule configuration module, a logical relation setup module, a run cycle setup module, and an audit module, wherein:
Metadata description module, configured to describe objects comprising data and information resources using metadata technology to obtain physical table metadata, the physical table metadata comprising fields and field attributes;
Rule base creation module, configured to create a data quality management rule base, wherein the data quality management rules comprise any one or any combination of a length check rule, a non-empty check rule, a uniqueness check rule, a primary/foreign key check rule, a consistency check rule, a code check rule, a duplicate data check rule, and the like, and to set the mapping between field attributes and data quality management rules, wherein different field attributes map to the same or different data quality management rules, and the same data quality management rule may be mapped to different field attributes;
Rule configuration module, configured to configure one or more data quality management rules for each field attribute according to the created data quality management rule base;
Logical relation setup module, configured to set a logical relation among the multiple data quality management rules configured for each field attribute;
Run cycle setup module, configured to set a corresponding run cycle for each field;
Audit module, configured to extract the fields in the obtained physical table metadata; obtain, according to the field attribute corresponding to each extracted field, the data quality management rules and logical relation mapped to that field attribute; and audit data quality according to the obtained data quality management rules and logical relation.
8. The device according to claim 7, characterized in that the rule configuration module comprises a selecting unit, a query unit, and a data quality management rule base storage unit, wherein:
Selecting unit, configured to select each field attribute in the physical table metadata;
Query unit, configured to query, in the data quality management rule base stored by the data quality management rule base storage unit, the mapping between field attributes and data quality management rules, so as to obtain the data quality management rules mapped to the field attribute.
9. The device according to claim 7, characterized in that the device further comprises:
Data quality report generation module, configured to summarize the results into a data quality report after data quality has been audited, the data quality report being divided into a summary report and report details, wherein the summary report comprises the physical table name, audit start time, audit end time, total audit duration, total amount of audited data, amount of abnormal data, and pass rate, and the report details comprise, for each metadata record, the names of the fields that failed and the data quality management rules they failed.
10. The device according to claim 7, characterized in that the device further comprises:
Metadata maintenance module, configured, after data quality has been audited, to also store the physical table metadata that passed the audit; to obtain the metadata that has exceeded its run cycle according to the maintained life cycle of each physical table or the run cycle corresponding to the fields in the physical table metadata; and to push that metadata to the relevant data source owner (i.e., the person who originally submitted it); if the relevant data source owner confirms cancellation, the metadata is cancelled and is no longer audited or shared with other business systems; if the owner decides not to cancel it, the metadata is flagged as extended by one run cycle, so that it is pushed again after the next run cycle.
CN201810328531.1A 2018-04-13 2018-04-13 A kind of data quality management method and device Pending CN108595563A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810328531.1A CN108595563A (en) 2018-04-13 2018-04-13 A kind of data quality management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810328531.1A CN108595563A (en) 2018-04-13 2018-04-13 A kind of data quality management method and device

Publications (1)

Publication Number Publication Date
CN108595563A 2018-09-28

Family

ID=63622140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810328531.1A Pending CN108595563A (en) 2018-04-13 2018-04-13 A kind of data quality management method and device

Country Status (1)

Country Link
CN (1) CN108595563A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109542886A (en) * 2018-11-23 2019-03-29 山东浪潮云信息技术有限公司 A kind of data quality checking method of Government data
CN109783482A (en) * 2018-12-28 2019-05-21 远光软件股份有限公司 A kind of data violation monitoring method and device
CN109903149A (en) * 2019-04-16 2019-06-18 北京国电通网络技术有限公司 Generation method and generation device of audit model, audit method and audit system
CN109933578A (en) * 2019-03-21 2019-06-25 浪潮软件集团有限公司 A kind of configurable automated data detection method for quality and system
CN109992576A (en) * 2019-03-01 2019-07-09 苏州龙石信息科技有限公司 A kind of government data quality evaluation and abnormal data recovery technique based on big data technology
CN110019566A (en) * 2019-03-13 2019-07-16 平安信托有限责任公司 Data checking, device, computer equipment and storage medium based on data warehouse
CN110737650A (en) * 2019-09-27 2020-01-31 北京明略软件***有限公司 Data quality detection method and device
CN111061733A (en) * 2019-12-10 2020-04-24 北京明略软件***有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN111125075A (en) * 2019-12-17 2020-05-08 国网天津市电力公司电力科学研究院 Data management method and system for non-computable region
CN111597177A (en) * 2020-05-14 2020-08-28 重庆农村商业银行股份有限公司 Data governance method for improving data quality
CN111639077A (en) * 2020-05-15 2020-09-08 杭州数梦工场科技有限公司 Data management method and device, electronic equipment and storage medium
CN112182507A (en) * 2020-09-16 2021-01-05 支付宝(杭州)信息技术有限公司 Data quality measuring method, device and equipment
CN112667618A (en) * 2020-12-30 2021-04-16 湖南长城医疗科技有限公司 Public area sanitation platform quality control system and method
CN113127482A (en) * 2019-12-31 2021-07-16 奇安信科技集团股份有限公司 Data quality analysis method and device, computer equipment and storage medium
CN113792033A (en) * 2021-08-12 2021-12-14 北京中交兴路信息科技有限公司 Spark-based data quality checking method and device, storage medium and terminal
CN115292297A (en) * 2022-06-29 2022-11-04 江苏昆山农村商业银行股份有限公司 Method and system for constructing data quality monitoring rule of data warehouse

Citations (5)

Publication number Priority date Publication date Assignee Title
US20110246530A1 (en) * 2010-03-31 2011-10-06 Geoffrey Malafsky Method and System for Semantically Unifying Data
CN102571403A (en) * 2010-12-31 2012-07-11 北京亿阳信通软件研究院有限公司 Realization method and device for general data quality control adapter
CN103699693A (en) * 2014-01-10 2014-04-02 中国南方电网有限责任公司 Metadata-based data quality management method and system
CN105512283A (en) * 2015-12-04 2016-04-20 国网江西省电力公司信息通信分公司 Data quality management and control method and device
CN106708909A (en) * 2015-11-18 2017-05-24 阿里巴巴集团控股有限公司 Data quality detection method and apparatus

Cited By (23)

Publication number Priority date Publication date Assignee Title
CN109542886A (en) * 2018-11-23 2019-03-29 山东浪潮云信息技术有限公司 A kind of data quality checking method of Government data
CN109783482A (en) * 2018-12-28 2019-05-21 远光软件股份有限公司 A kind of data violation monitoring method and device
CN109783482B (en) * 2018-12-28 2021-08-17 远光软件股份有限公司 Data violation monitoring method and device
CN109992576A (en) * 2019-03-01 2019-07-09 苏州龙石信息科技有限公司 A kind of government data quality evaluation and abnormal data recovery technique based on big data technology
CN110019566A (en) * 2019-03-13 2019-07-16 平安信托有限责任公司 Data checking, device, computer equipment and storage medium based on data warehouse
CN109933578A (en) * 2019-03-21 2019-06-25 浪潮软件集团有限公司 A kind of configurable automated data detection method for quality and system
CN109903149A (en) * 2019-04-16 2019-06-18 北京国电通网络技术有限公司 Generation method and generation device of audit model, audit method and audit system
CN110737650A (en) * 2019-09-27 2020-01-31 北京明略软件***有限公司 Data quality detection method and device
CN111061733A (en) * 2019-12-10 2020-04-24 北京明略软件***有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN111061733B (en) * 2019-12-10 2024-01-19 北京明略软件***有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN111125075A (en) * 2019-12-17 2020-05-08 国网天津市电力公司电力科学研究院 Data management method and system for non-computable region
CN113127482B (en) * 2019-12-31 2024-03-26 奇安信科技集团股份有限公司 Data quality analysis method, device, computer equipment and storage medium
CN113127482A (en) * 2019-12-31 2021-07-16 奇安信科技集团股份有限公司 Data quality analysis method and device, computer equipment and storage medium
CN111597177A (en) * 2020-05-14 2020-08-28 重庆农村商业银行股份有限公司 Data governance method for improving data quality
CN111639077A (en) * 2020-05-15 2020-09-08 杭州数梦工场科技有限公司 Data management method and device, electronic equipment and storage medium
CN111639077B (en) * 2020-05-15 2024-03-22 杭州数梦工场科技有限公司 Data management method, device, electronic equipment and storage medium
CN112182507A (en) * 2020-09-16 2021-01-05 支付宝(杭州)信息技术有限公司 Data quality measuring method, device and equipment
CN112182507B (en) * 2020-09-16 2024-04-19 支付宝(杭州)信息技术有限公司 Data quality measurement method, device and equipment
CN112667618B (en) * 2020-12-30 2023-06-06 湖南长城医疗科技有限公司 Public area sanitary platform quality control system and method
CN112667618A (en) * 2020-12-30 2021-04-16 湖南长城医疗科技有限公司 Public area sanitation platform quality control system and method
CN113792033A (en) * 2021-08-12 2021-12-14 北京中交兴路信息科技有限公司 Spark-based data quality checking method and device, storage medium and terminal
CN115292297A (en) * 2022-06-29 2022-11-04 江苏昆山农村商业银行股份有限公司 Method and system for constructing data quality monitoring rule of data warehouse
CN115292297B (en) * 2022-06-29 2024-02-02 江苏昆山农村商业银行股份有限公司 Method and system for constructing data quality monitoring rule of data warehouse

Similar Documents

Publication Publication Date Title
CN108595563A (en) A kind of data quality management method and device
US10769159B2 (en) Systems and methods for data mining of historic electronic communication exchanges to identify relationships, patterns, and correlations to deal outcomes
US10339038B1 (en) Method and system for generating production data pattern driven test data
CN105765559B (en) Interactive case management system
US8769708B2 (en) Privileged document identification and classification system
US8799240B2 (en) System and method for investigating large amounts of data
CN105447184B (en) Information extraction method and device
US8612249B2 (en) Systems and methods for managing regulatory information
CN106682097A (en) Method and device for processing log data
US11775498B1 (en) Accelerated system and method for providing data correction
CN111382956A (en) Enterprise group relationship mining method and device
CN106682096A (en) Method and device for log data management
CN110178128A (en) It is indicated using the bitmap of optimization to manage extensive incidence set
US20110145005A1 (en) Method and system for automatic business content discovery
CN109657914A (en) Information-pushing method, device, computer equipment and storage medium
CN106708965A (en) Data processing method and apparatus
CN107748753A (en) It is a kind of based on double random extraction systems, method and device
CN107679977A (en) A kind of tax administration platform and implementation method based on semantic analysis
CN102855290A (en) Knowledge management method for mobile Internet
CN111415067A (en) Enterprise and personal credit rating system
US10430413B2 (en) Data information framework
RU47116U1 (en) DISTRIBUTED DOCUMENT CIRCUIT SUPPORT SYSTEM
CN113495978B (en) Data retrieval method and device
Ponomareva et al. Date preparation module of automated metallurgical products production system
JP2019537171A (en) System and method for efficiently delivering warning messages

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
DD01 Delivery of document by public notice

Addressee: Lin Xiuli

Document name: Notification of Publication and of Entering the Substantive Examination Stage of the Application for Invention

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180928