CN105808604A - Data compliance management method and system - Google Patents


Info

Publication number
CN105808604A
Authority
CN
China
Prior art keywords
data
close
dirty
rule set
submodule
Legal status
Granted
Application number
CN201410854455.XA
Other languages
Chinese (zh)
Other versions
CN105808604B (en)
Inventor
童廷洋
Current Assignee
Aisino Corp
Original Assignee
Aisino Corp
Application filed by Aisino Corp
Priority to CN201410854455.XA
Publication of CN105808604A
Application granted
Publication of CN105808604B
Status: Active


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a data compliance management method and system that address the prior-art problem of being unable to identify dirty data promptly and accurately. The method comprises: first, analyzing the category of the data and assigning the data to a corresponding data compliance set; then searching and checking the data in each compliance set to identify dirty data; and finally correcting the dirty data. With the disclosed method, dirty data in a network database is processed and, through data compliance processing, converted into data that meets data quality requirements. This enables data exchange between data nodes, ensures data quality, shortens the compliance processing cycle, improves compliance efficiency, and effectively ensures the consistency of data systems.

Description

Data compliance management method and system
Technical field
The invention belongs to the technical field of data quality management, and specifically relates to a data compliance management method and system.
Background technology
With the development and widespread use of network technology, the volume of network data keeps growing, and data management becomes an ever heavier task. Besides normal data, this large volume also contains defective data. Such data may fall outside the given range in the source system, be meaningless to the actual business, have an invalid format, use non-standard codes, or carry ambiguous business logic (any one or more of these). This is dirty data: data that does not meet requirements and is meaningless for the practical application of a system.
For example, dirty data in e-government systems can stop the system from carrying out production. It can also prevent querying, searching and locating data, and make it impossible to browse accurate data.
Data compliance therefore needs to be managed: dirty data must be processed into data that meets requirements. This is called data compliance processing, which converts dirty data into, or replaces it with, data that meets data quality requirements. The prior art typically achieves compliance by discarding, returning or zeroing dirty data. However, dirty data that has been discarded, returned or zeroed, together with the data from the same period, must then be uploaded again. This creates extra work, lengthens the processing cycle and lowers efficiency.
Summary of the invention
Embodiments of the present invention provide a data compliance management method and system that perform compliance processing on dirty data in network data. The dirty data is converted into data that meets data quality requirements, and compliance processing becomes more efficient.
According to one aspect of the invention, a data compliance management method is provided, the method comprising:
performing category analysis on data, and assigning the data to a corresponding data compliance set;
searching each data compliance set and identifying dirty data; and
correcting the dirty data.
In the above scheme, the method further comprises: establishing a first data compliance set, a second data compliance set, a third data compliance set and a fourth data compliance set, wherein
the first data compliance set is a set of fixed data restricted by constraints, the second data compliance set is a set of correctable data restricted by constraints, the third data compliance set is a set of status attribute data, and the fourth data compliance set is a set of unstructured data.
In the above scheme, performing category analysis on the data and assigning the data to a corresponding data compliance set comprises:
if the data is fixed data restricted by constraints, assigning the data to the first data compliance set;
if the data is correctable data restricted by constraints, assigning the data to the second data compliance set;
if the data is status attribute data, assigning the data to the third data compliance set;
if the data is unstructured data, assigning the data to the fourth data compliance set.
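The following minimal Python sketch illustrates the categorization step. It assumes each record already carries a kind tag. The `Record` class, its `kind` field and the `KIND_TO_SET` mapping are hypothetical, because the text does not say how a record's category is determined.

```python
from dataclasses import dataclass
from enum import Enum


class ComplianceSet(Enum):
    """The four data compliance sets."""
    FIXED = 1          # first set: fixed data restricted by constraints
    CORRECTABLE = 2    # second set: correctable data restricted by constraints
    STATUS = 3         # third set: status attribute data
    UNSTRUCTURED = 4   # fourth set: unstructured data (e.g. images)


@dataclass
class Record:
    name: str
    value: object
    kind: str  # hypothetical tag: "fixed", "correctable", "status" or "unstructured"


# Hypothetical mapping from a record's kind tag to its compliance set.
KIND_TO_SET = {
    "fixed": ComplianceSet.FIXED,
    "correctable": ComplianceSet.CORRECTABLE,
    "status": ComplianceSet.STATUS,
    "unstructured": ComplianceSet.UNSTRUCTURED,
}


def categorize(records):
    """Assign each record to its compliance set (category analysis)."""
    sets = {s: [] for s in ComplianceSet}
    for rec in records:
        sets[KIND_TO_SET[rec.kind]].append(rec)
    return sets
```

Keeping the four sets as enum members makes it easy to attach a different constraint set and correction rule to each one later.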
In the above scheme, searching each data compliance set and identifying dirty data comprises:
searching the data compliance set for data with the corresponding attribute, and moving that data into the corresponding compliance queue;
extracting, from a preset constraint set, the constraints corresponding to the compliance queue;
determining whether the data in the compliance queue satisfies the extracted constraints;
if the data satisfies the constraints, marking it as qualified data; if the data does not satisfy the constraints, marking it as dirty data.
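The search-and-mark steps can be sketched in Python as follows. This illustrates the idea under assumptions and is not the patented implementation: items are `(attribute, value)` pairs, and the constraint set is a hypothetical dict of predicate functions (`CONSTRAINTS`, with made-up age and sex rules).

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

Item = Tuple[str, object]               # (attribute name, value)
Constraint = Callable[[object], bool]   # returns True when the value complies


def identify_dirty(compliance_set: Iterable[Item],
                   constraints: Dict[str, Constraint],
                   attribute: str) -> Tuple[List[Item], List[Item]]:
    """Queue the items with `attribute`, check each one, and mark it."""
    # Search the set for data with the given attribute and enqueue it
    queue = deque(item for item in compliance_set if item[0] == attribute)
    # Extract the constraint that corresponds to this compliance queue
    check = constraints[attribute]
    qualified, dirty = [], []
    while queue:
        item = queue.popleft()
        # Mark the item as qualified data or dirty data
        (qualified if check(item[1]) else dirty).append(item)
    return qualified, dirty


# Hypothetical preset constraint set
CONSTRAINTS: Dict[str, Constraint] = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 150,
    "sex": lambda v: v in {"M", "F"},
}
```

For example, `identify_dirty([("age", 30), ("age", 210)], CONSTRAINTS, "age")` puts the first item in the qualified list and the second in the dirty list.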
In the above scheme, correcting the dirty data comprises:
when the dirty data is fixed data restricted by constraints, placing the fixed data in a manual correction queue, and marking it as qualified data after manual correction;
when the dirty data is correctable data restricted by constraints, correcting it according to the constraints and the basic data items, and marking it as qualified data after correction;
when the dirty data is status attribute data, correcting it to default data, and marking the corrected data as qualified data;
when the dirty data is unstructured data, correcting it with a compression algorithm, and marking it as qualified data after correction.
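The next Python sketch shows the four-way correction dispatch, with several assumptions. The function `correct_dirty` and its parameters are hypothetical. "Correcting according to the constraints" is modelled here as clamping a value into an allowed range, which is only one possible rule. `zlib` stands in for the image compression algorithm the text refers to, so the sketch runs without an imaging library.

```python
import zlib
from typing import List, Optional, Tuple


def correct_dirty(kind: str, value, manual_queue: List,
                  bounds: Optional[Tuple[float, float]] = None,
                  default=None):
    """Dispatch one dirty item to the correction rule for its kind.

    Returns (mark, value). The mark is "qualified", or "pending_manual"
    for fixed data waiting in the manual correction queue.
    """
    if kind == "fixed":
        # Fixed data is never changed automatically: queue it for a person.
        manual_queue.append(value)
        return "pending_manual", value
    if kind == "correctable":
        # Hypothetical rule: clamp the value into the constraint's range.
        lo, hi = bounds
        return "qualified", min(max(value, lo), hi)
    if kind == "status":
        # Status attribute data falls back to its preset default.
        return "qualified", default
    if kind == "unstructured":
        # Stand-in: zlib instead of the image (JPEG) compression in the text.
        return "qualified", zlib.compress(value)
    raise ValueError(f"unknown data kind: {kind!r}")
```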
According to another aspect of the present invention, a data compliance management system is also provided. The system comprises a category analysis module, a dirty data identification module and a dirty data correction module, wherein
the category analysis module is configured to perform category analysis on data and to assign the data to a corresponding data compliance set;
the dirty data identification module is connected to the category analysis module and is configured to search each data compliance set and identify dirty data;
the dirty data correction module is connected to the dirty data identification module and is configured to correct the dirty data.
In the above scheme, the system further comprises a data compliance set establishing module configured to establish a first data compliance set, a second data compliance set, a third data compliance set and a fourth data compliance set. The first data compliance set is a set of fixed data restricted by constraints, the second data compliance set is a set of correctable data restricted by constraints, the third data compliance set is a set of status attribute data, and the fourth data compliance set is a set of unstructured data.
In the above scheme, the category analysis module is connected to the data compliance set establishing module and is further configured to:
assign the data to the first data compliance set when the data is fixed data restricted by constraints;
assign the data to the second data compliance set when the data is correctable data restricted by constraints;
assign the data to the third data compliance set when the data is status attribute data;
assign the data to the fourth data compliance set when the data is unstructured data.
In the above scheme, the dirty data identification module further comprises a search submodule, a judgment submodule, a marking submodule and a constraint set submodule, wherein
the search submodule is configured to search the data compliance set for data with the corresponding attribute and move that data into the corresponding compliance queue;
the constraint set submodule is configured to preset constraints and to store the constraint set composed of those constraints;
the judgment submodule is connected to the constraint set submodule and the search submodule. It is configured to extract, from the constraint set submodule, the constraints corresponding to the compliance queue, and to determine whether the data in the compliance queue satisfies the extracted constraints;
the marking submodule is connected to the judgment submodule and is configured to mark the data evaluated by the judgment submodule. Data that satisfies the constraints is marked as qualified data, and data that does not satisfy the constraints is marked as dirty data.
In the above scheme, the dirty data correction module further comprises a data classification submodule, a first correction submodule, a second correction submodule, a third correction submodule and a fourth correction submodule, wherein
the data classification submodule is configured to classify the dirty data and route it as follows: fixed data restricted by constraints goes to the first correction submodule; correctable data restricted by constraints goes to the second correction submodule; status attribute data goes to the third correction submodule; unstructured data goes to the fourth correction submodule;
the first correction submodule is configured to store the manual correction queue, provide an interface for manual correction, and mark manually corrected data as qualified data;
the second correction submodule is configured to correct the correctable data restricted by constraints according to the constraints and the basic data items, and to mark it as qualified data after correction;
the third correction submodule is configured to correct the status attribute data to default data and to mark the corrected data as qualified data;
the fourth correction submodule is configured to correct the unstructured data with a compression algorithm and to mark it as qualified data after correction.
The technical solutions of the above embodiments work as follows. The data compliance management method first analyzes the category of the data and assigns the data to the corresponding data compliance sets. In this way it analyzes data compliance and formulates the data compliance sets. It then searches and checks the data in each compliance set to identify dirty data, and finally corrects the dirty data. With this method, dirty data in a network database is processed into data that meets requirements. In other words, compliance processing converts dirty data into data that satisfies data quality requirements. This enables data exchange between data nodes, ensures data quality, shortens the compliance processing cycle and improves compliance efficiency. In addition, because the data system is classified according to the data compliance sets, it can be reused, or moved, ported or plugged as a whole into other architectures, which effectively ensures the consistency of the data system.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of the data compliance management method of an embodiment of the present invention;
Fig. 2 is a flowchart of the data compliance management method of an embodiment of the present invention;
Fig. 3 is a flowchart of a preferred implementation of step S1 shown in Fig. 2;
Fig. 4 is a flowchart of a preferred implementation of step S2 shown in Fig. 2;
Fig. 5 is a flowchart of a preferred implementation of step S3 shown in Fig. 2;
Fig. 6 is a schematic structural diagram of the data compliance management system of an embodiment of the present invention;
Fig. 7 is a schematic diagram of the internal structure of the dirty data identification module of the embodiment shown in Fig. 6;
Fig. 8 is a schematic diagram of the internal structure of the dirty data correction module of the embodiment shown in Fig. 6.
Detailed description of the invention
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings. These descriptions are merely illustrative and are not intended to limit the scope of the invention. Descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concepts of the invention.
In the data compliance management method of the embodiments, a driver unit first takes data from the data record set into the compliance queue. It then extracts the constraints in the constraint set that correspond to the extracted data and judges the compliance of the data against those constraints. Defective data is verified and corrected. The method is particularly suitable for data processing in e-government management systems. The invention is described in detail below with reference to the drawings and specific embodiments.
For dirty data in an e-government system, the task of data compliance processing is to use data constraints to identify, filter or correct data that does not meet requirements, converting dirty data into data that meets data quality requirements.
Data that does not meet requirements falls into four main categories: incomplete data, erroneous data, abnormal data and duplicate data. Incomplete data lacks information that should be present, such as an organization name, a branch company name, or regional information. Erroneous data arises when the business system is not robust and writes input directly to the back-end database without validating it. Typical results are numeric data entered as full-width numeric characters, string data followed by a carriage return, incorrectly formatted dates, and out-of-range dates. Erroneous values include input errors and false content. Input errors are caused by carelessness of the staff entering the original data. False content is mostly caused by objective factors, for example a person's affiliated unit being recorded differently, or a person being promoted. Abnormal data refers to records that break a pattern followed by the values of one or several fields in the overwhelming majority of records, for example an age field exceeding the highest age on record. Abnormal data also includes inconsistencies that arise when the source system drops external constraints for performance reasons. Duplicate data ("duplicate records") means that the same real-world entity is represented by several records that are not exactly identical. Because these records differ in format or spelling, the database management system cannot recognize them as the same entity. In the narrow sense, if two records are equal or sufficiently similar in the values of certain fields, they are considered approximate duplicates of each other. Identifying duplicate records is the core of data compliance activities.
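A pairwise comparison can illustrate the narrow-sense definition of duplicates. In this sketch, the similarity measure (`difflib.SequenceMatcher.ratio`), the 0.9 threshold and the function names are assumptions. The text does not specify how similarity is measured.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def approximate_duplicates(records: List[Dict[str, str]],
                           fields: Sequence[str],
                           threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Return index pairs whose values in every listed field are equal
    or at least `threshold` similar."""
    pairs = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        if all(similarity(r1[f], r2[f]) >= threshold for f in fields):
            pairs.append((i, j))
    return pairs
```

Comparing every pair takes time quadratic in the number of records. For large tables, candidate pairs are usually narrowed first, for example by sorting or blocking on a key field, before a comparison like this is applied.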
Several other situations are also objects of data compliance processing. These include inconsistent data caused by a legal person or author changing unit, different units of measurement, and outdated addresses and postcodes. Data that needs compliance processing mainly shows format errors, inconsistency, duplication, errors, unreasonable business logic, or violations of business rules. Examples are an invalid ID card number, an invalid date field, a nonexistent operator number, or a sex value outside its allowed range.
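Checks of the kind listed above (invalid ID card number, invalid date, sex value outside its range) can be sketched as simple predicates. The ID check below applies the published check-digit rule for 18-digit resident ID numbers (GB 11643-1999) and validates the embedded birth date. The function names and the `M`/`F` sex codes are hypothetical. The ASCII-only pattern also rejects full-width digits, which is one of the erroneous-data cases mentioned earlier.

```python
import re
from datetime import datetime

# GB 11643-1999 check-digit weights and check characters (18-digit ID)
_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
_CHECK_CHARS = "10X98765432"
_ID_PATTERN = re.compile(r"[0-9]{17}[0-9Xx]")  # ASCII only: rejects full-width digits


def valid_date(text: str, fmt: str = "%Y-%m-%d") -> bool:
    """True if `text` is a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def valid_id_number(id_no: str) -> bool:
    """Check the checksum and embedded birth date of an 18-digit ID number."""
    if not _ID_PATTERN.fullmatch(id_no):
        return False
    total = sum(int(d) * w for d, w in zip(id_no[:17], _WEIGHTS))
    if id_no[17].upper() != _CHECK_CHARS[total % 11]:
        return False
    return valid_date(id_no[6:14], "%Y%m%d")   # digits 7-14 are YYYYMMDD


def valid_sex(code: str) -> bool:
    """Hypothetical code set; real systems define their own value range."""
    return code in {"M", "F"}
```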
In general, the data compliance processing of the embodiments has three stages. First stage: establish the data compliance sets and analyze data compliance. Second stage: search for and identify erroneous tuples and records, that is, identify dirty data. Third stage: correct the erroneous data. Each stage is implemented by the specific steps below.
Fig. 1 is a schematic diagram of the principle of the data compliance management method of an embodiment of the present invention.
As shown in Fig. 1, the data compliance management method of this embodiment comprises the following steps:
First stage includes:
Step S11: build the data record set.
Second stage includes:
Step S12: extract data from the data record set into the compliance queue, driving the data into the data compliance engine.
Step S13: take the corresponding constraints from the constraint set. The constraint set collects the constraints for three compliance grades. Grade one covers mandatory data items, i.e. fixed data such as primary attribute data, which is restricted by strict constraints. Errors in fixed data can be changed only after manual verification, by modifying the data manually through a corresponding interface. Grade two covers correctable data items restricted by constraints, such as age, interval, count and distance. Non-compliant data of this grade can be corrected to compliant data according to the constraints. Grade three covers status attribute data items, such as historical status attribute data. This kind of data normally has default values set in advance, for example at system initialization. Once erroneous data is identified, it is changed to the default value in the data record set.
Step S14: obtain data from the compliance queue. Constraints can generally be obtained through the compliance verification method for attribute data, or generated from business logic. For image data, this example generates the set of image data compliance constraints from business logic, industry specifications, statistical functions and the like, based on the characteristics of e-government applications. Each data item in a data record is then checked one by one against the constraints in the constraint set.
Step S15: if the data satisfies the constraints, mark it as qualified.
Step S16: if the data does not satisfy the constraints, mark it as defective.
Third stage includes:
Step S17: verify the non-compliant data and correct it according to the compliance verification method. Driven by the data engine, each corrected record is inserted back into the compliance queue and compliance processing runs again. The cycle repeats and gradually converges until the data satisfies the constraints.
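The cycle in step S17, where corrected records re-enter the compliance queue until they satisfy the constraints, can be sketched as a work queue. The `check` and `fix` callbacks are hypothetical placeholders. The `max_rounds` limit is an added assumption: the text says the cycle converges, but a bound stops a non-converging correction from looping forever.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def run_compliance(records: Iterable[T],
                   check: Callable[[T], bool],
                   fix: Callable[[T], T],
                   max_rounds: int = 10) -> Tuple[List[T], List[T]]:
    """Check each record and re-queue corrected records until they comply.

    Records still failing after `max_rounds` corrections are returned as
    unresolved instead of being retried forever.
    """
    queue = deque((rec, 0) for rec in records)   # (record, corrections so far)
    qualified, unresolved = [], []
    while queue:
        rec, rounds = queue.popleft()
        if check(rec):                 # S14/S15: satisfies the constraints
            qualified.append(rec)
        elif rounds >= max_rounds:     # give up; hand over for review
            unresolved.append(rec)
        else:                          # S16/S17: correct and re-queue
            queue.append((fix(rec), rounds + 1))
    return qualified, unresolved
```

For example, with `check=lambda x: x <= 100` and `fix=lambda x: x - 30`, the inputs `[50, 150, 200]` all end up qualified as `[50, 90, 80]`.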
Fig. 2 is a flowchart of the data compliance management method of an embodiment of the present invention.
As shown in Fig. 2, the data compliance management method of this embodiment comprises the following steps:
Step S1: perform category analysis on the data, and assign the data to the corresponding data compliance set.
Step S2: search each data compliance set and identify dirty data.
Step S3: correct the dirty data.
Preferably, as shown in Fig. 3, the method of the above technical solution may further comprise, before step S1:
Step S0: establish a first data compliance set, a second data compliance set, a third data compliance set and a fourth data compliance set.
The first data compliance set is a set of fixed data restricted by constraints, the second data compliance set is a set of correctable data restricted by constraints, the third data compliance set is a set of status attribute data, and the fourth data compliance set is a set of unstructured data.
Preferably, step S1 further comprises the following, as shown in Fig. 3:
Step S101: perform category analysis on the data.
Here the data is classified mainly from the compliance perspective, for example as fixed data restricted by constraints or as correctable data restricted by constraints.
Step S102: if the data is fixed data restricted by constraints, assign it to the first data compliance set.
Step S103: if the data is correctable data restricted by constraints, assign it to the second data compliance set.
Step S104: if the data is status attribute data, assign it to the third data compliance set.
Step S105: if the data is unstructured data, assign it to the fourth data compliance set.
Unstructured data here generally refers to image data.
Steps S102, S103 and S104 are parallel alternatives: each piece of data is assigned to the set that matches its type. The data handled in steps S102 to S104 is therefore structured data.
Preferably, step S2 further comprises the following, as shown in Fig. 4:
Step S201: search the data compliance set for data with the corresponding attribute, and move it into the corresponding compliance queue.
Step S202: extract, from the preset constraint set, the constraints corresponding to the compliance queue.
The preset constraint set is formed from preset constraints and serves as the reference for examining data that needs a compliance check. The preset constraints may be statistical values from historical data, i.e. statistics on the state of data compliance gathered over a period of compliance analysis. They may also be set as required, or generated from business logic.
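One way to derive a preset constraint from historical statistics is to take an interval around the historical mean. The mean ± k standard deviations rule and the function name are assumptions for illustration. The text only says that constraints may be statistical values of historical data.

```python
import statistics
from typing import Callable, Sequence, Tuple


def range_constraint_from_history(values: Sequence[float], k: float = 3.0
                                  ) -> Tuple[float, float, Callable[[float], bool]]:
    """Derive the interval mean +/- k * (population std. dev.) from history.

    Returns the lower bound, the upper bound and a predicate that checks
    whether a new value falls inside the interval.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    lo, hi = mean - k * sd, mean + k * sd
    return lo, hi, lambda v: lo <= v <= hi
```

A percentile-based interval (for example via `statistics.quantiles`) would be an equally plausible choice when the historical values are skewed.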
Step S203: determine whether the data in the compliance queue satisfies the extracted constraints. If it does, go to step S204; if it does not, go to step S205.
Step S204: mark the data as qualified data.
Step S205: mark the data as dirty data.
Preferably, step S3 further comprises the following, as shown in Fig. 5:
Step S301: determine the type of the dirty data.
This step usually determines whether the dirty data is attribute data or image data. Attribute data here is generally structured data. It usually includes fixed data restricted by constraints, correctable data restricted by constraints, and status attribute data. Image data is unstructured data.
Step S302: when the dirty data is fixed data restricted by constraints, place it in the manual correction queue, and mark it as qualified data after manual correction.
Step S303: when the dirty data is correctable data restricted by constraints, correct it according to the constraints and the basic data items, and mark it as qualified data after correction.
Step S304: when the dirty data is status attribute data, correct it to default data, and mark the corrected data as qualified data.
Step S305: when the dirty data is unstructured data, correct it with a compression algorithm, and mark it as qualified data after correction.
In this step, unstructured data usually refers to image data, and compliance correction of image data usually uses an image compression algorithm.
For e-government systems, image data is divided into two classes: images for printing and images for browser viewing. Data that meets the compliance requirements is written electronically to a chip and stored for later use. The compliant values for images used for browser viewing are a size of 2.5 x 3.5 cm, 24-bit color depth, a resolution of 254, and 250 x 351 pixels. Using an image compression algorithm, the image written electronically to the chip is compressed to JPEG format with a size of 2 KB. For printed image data that satisfies the compliant values, the uncompressed image size is 250 x 351 x 24 / 8 = 263250 bytes, about 257 KB. After compression with an image compression algorithm, the saved image is 51.4 KB.
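A little arithmetic checks the size figures above. This sketch assumes the resolution of 254 is in dots per inch, which agrees with 2.5 cm mapping to 250 px. At that resolution, 3.5 cm works out to 350 px, one fewer than the stated 351 px, so the size calculation uses the stated 250 x 351. The helper names are hypothetical.

```python
CM_PER_INCH = 2.54


def pixels(size_cm: float, dpi: float) -> int:
    """Pixel count along one edge of a print of `size_cm` at `dpi`."""
    return round(size_cm / CM_PER_INCH * dpi)


def uncompressed_bytes(width_px: int, height_px: int, bit_depth: int) -> int:
    """Raw bitmap size with `bit_depth` bits per pixel and no file header."""
    return width_px * height_px * bit_depth // 8


raw = uncompressed_bytes(250, 351, 24)
print(raw, round(raw / 1024))      # 263250 257
print(round(raw / 5 / 1024, 1))    # 51.4 (KB at a compression ratio of 5)
```

The 51.4 KB figure in the text matches the uncompressed size divided by the preferred compression ratio of 5 listed for Table 1.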
Table 1
Table 1 shows the set of image data compliance constraints of the embodiment. The constraints are as follows:
Horizontal pixel data element: 176 px to 320 px, with a preferred compliant value of 250 px.
Vertical pixel data element: 240 px to 448 px, with a preferred compliant value of 351 px.
The compliant pixel dimensions can be set to a compromise value based on business needs or historical usage. The valid interval can be the interval between the best and worst effects found by probability statistics. Exceeding the maximum of the interval brings little benefit to the printing step in production or to querying and browsing, and it also affects the values of other compliance attributes.
Horizontal size data element: 1.75 cm to 3.25 cm, with a preferred compliant value of 2.5 cm.
Vertical size data element: 2.45 cm to 4.55 cm, with a preferred compliant value of 3.5 cm.
Resolution data element: 176 to 320, with a preferred compliant value of 254.
Bit depth data element: 8, 16 or 24, with a preferred compliant value of 24. 24-bit is true color; values of 16 and 8 are also acceptable.
Uncompressed image size data element: 176 KB to 320 KB, with a preferred compliant value of 257 KB.
Image size and compression ratio are derived data. Making them compliant aids storage and display in general, and chip storage and display in particular.
Image compression ratio data element: 1 to 20, with a preferred compliant value of 5.
Compression ratio data element for electronic writing to the chip: 10 to 40, with a preferred compliant value of 26. The chip's electronic-write storage space is 2 KB. Statistics give a compromise compression ratio of 26, at which the whole image fits in the chip and the image reconstructed from the chip still meets business needs.
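The constraint set described for Table 1 can be expressed as data plus a checker. The values are those given in the text. The key names, the `RangeRule`/`ChoiceRule` classes and `violations` are hypothetical, and the ranges are treated as inclusive, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Union


@dataclass(frozen=True)
class RangeRule:
    lo: float
    hi: float
    preferred: float

    def check(self, value: float) -> bool:
        return self.lo <= value <= self.hi


@dataclass(frozen=True)
class ChoiceRule:
    allowed: FrozenSet[int]
    preferred: int

    def check(self, value: int) -> bool:
        return value in self.allowed


Rule = Union[RangeRule, ChoiceRule]

# Values as stated in the description of Table 1; key names are hypothetical.
IMAGE_CONSTRAINTS: Dict[str, Rule] = {
    "width_px": RangeRule(176, 320, 250),
    "height_px": RangeRule(240, 448, 351),
    "width_cm": RangeRule(1.75, 3.25, 2.5),
    "height_cm": RangeRule(2.45, 4.55, 3.5),
    "resolution": RangeRule(176, 320, 254),
    "bit_depth": ChoiceRule(frozenset({8, 16, 24}), 24),
    "uncompressed_kb": RangeRule(176, 320, 257),
    "compression_ratio": RangeRule(1, 20, 5),
    "chip_compression_ratio": RangeRule(10, 40, 26),
}


def violations(image: Dict[str, float]) -> List[str]:
    """Names of the data elements in `image` that break their constraint."""
    return [name for name, value in image.items()
            if name in IMAGE_CONSTRAINTS and not IMAGE_CONSTRAINTS[name].check(value)]


def preferred_profile() -> Dict[str, float]:
    """The preferred compliant value of every data element."""
    return {name: rule.preferred for name, rule in IMAGE_CONSTRAINTS.items()}
```

Keeping the constraints as plain data makes it straightforward to regenerate them from new statistics without touching the checking logic.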
When data is written electronically to the chip using an image compression algorithm, the compression ratio is generally between 10:1 and 40:1. The higher the compression ratio, the lower the quality, and the lower the ratio, the better the quality. A balance point can be found between image quality and file size, and a compression ratio of 26 is the compliant value. JPEG compression acts mainly on high-frequency information and preserves color information well, so it suits continuous-tone images.
With the data compliance management method of the embodiments, dirty data in a network database is processed into data that meets requirements. In other words, compliance processing converts dirty data into data that satisfies data quality requirements.
Reason system structure schematic diagram is regulated in the data conjunction that Fig. 6 is the embodiment of the present invention.
As it is shown in fig. 7, the data of the present embodiment are closed regulates reason system, including: category analysis module 11, dirty data identification module 12, dirty data correcting module 13;Wherein,
Described category analysis module 11 is for being analyzed the compliance of data, and closes in rule set for described data are included into corresponding data.
Described dirty data identification module 12 is connected with category analysis module 11, searches for for closing from described data respectively and identifies dirty data rule set.Described dirty data correcting module 13 is connected with dirty data identification module 12, is used for revising described dirty data.
The system may further include a data compliance set establishment module for establishing a first data compliance set, a second data compliance set, a third data compliance set, and a fourth data compliance set. The first data compliance set is the set of fixed data restricted by constraint conditions, the second data compliance set is the set of revisable data restricted by constraint conditions, the third data compliance set is the set of status attribute data, and the fourth data compliance set is the set of unstructured data.
Preferably, the category analysis module is further configured to: assign the data to the first data compliance set when the data are fixed data restricted by constraint conditions; assign the data to the second data compliance set when the data are revisable data restricted by constraint conditions; assign the data to the third data compliance set when the data are status attribute data; and assign the data to the fourth data compliance set when the data are unstructured data.
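The assignment rules above can be sketched as follows. The boolean attributes on `Record` (`constrained`, `revisable`, `status_attribute`, `unstructured`) are hypothetical stand-ins for however a real system detects each data type, and the order in which the checks are applied is an assumption, since the text does not say what happens when a record matches more than one rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List


class ComplianceSet(Enum):
    FIRST = 1   # fixed data restricted by constraint conditions
    SECOND = 2  # revisable data restricted by constraint conditions
    THIRD = 3   # status attribute data
    FOURTH = 4  # unstructured data


@dataclass
class Record:
    # Hypothetical flags used only to drive the classification.
    value: Any
    constrained: bool = False       # restricted by constraint conditions
    revisable: bool = False         # may be revised automatically
    status_attribute: bool = False
    unstructured: bool = False


def classify(record: Record) -> ComplianceSet:
    """Category analysis: pick the data compliance set for one record.

    The check order is an assumption; the source gives no precedence.
    """
    if record.unstructured:
        return ComplianceSet.FOURTH
    if record.status_attribute:
        return ComplianceSet.THIRD
    if record.constrained and record.revisable:
        return ComplianceSet.SECOND
    if record.constrained:
        return ComplianceSet.FIRST
    raise ValueError("record does not match any data compliance set")


def build_sets(records: List[Record]) -> Dict[ComplianceSet, List[Record]]:
    """Establish the four sets and place each record into its set."""
    sets: Dict[ComplianceSet, List[Record]] = {s: [] for s in ComplianceSet}
    for record in records:
        sets[classify(record)].append(record)
    return sets
```

Raising an error for unmatched records keeps unclassified data visible instead of silently dropping it; a real system might route such records to a separate review list.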
Fig. 7 is a schematic diagram of the internal structure of the dirty data identification module in the embodiment shown in Fig. 6.
As shown in Fig. 7, the dirty data identification module 12 further includes a search submodule 121, a judgment submodule 122, a constraint set submodule 123, and a marking submodule 124, where:
The search submodule 121 searches the established data compliance sets for data having the corresponding attributes and places that data into the corresponding compliance queue.
The constraint set submodule 123 presets constraint conditions and stores the constraint set formed by them.
The judgment submodule 122 is connected to the constraint set submodule 123 and the search submodule 121. It extracts from the constraint set submodule 123 the constraint conditions corresponding to the compliance queue and judges whether the data in the compliance queue satisfy the extracted constraint conditions.
The marking submodule 124 is connected to the judgment submodule 122 and marks the data judged by the judgment submodule 122: data that satisfy the constraint conditions are marked as qualified data, and data that do not satisfy them are marked as dirty data.
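One way to picture the cooperation of submodules 121 to 124 is the sketch below. The record format (a dict of attribute values), the constraint format (an attribute name mapped to a predicate), and the names `ConstraintSetSubmodule`, `search`, `judge`, and `identify` are all hypothetical; the text does not specify how constraint conditions are expressed.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical constraint format: attribute name -> predicate on its value.
Constraints = Dict[str, Callable[[Any], bool]]

QUALIFIED = "qualified"
DIRTY = "dirty"


class ConstraintSetSubmodule:
    """Presets constraint conditions and stores them per compliance queue."""

    def __init__(self) -> None:
        self._store: Dict[str, Constraints] = {}

    def preset(self, queue_name: str, constraints: Constraints) -> None:
        self._store[queue_name] = constraints

    def extract(self, queue_name: str) -> Constraints:
        return self._store.get(queue_name, {})


def search(records: List[dict], constraints: Constraints) -> List[dict]:
    """Search submodule: queue the records carrying every constrained attribute."""
    return [r for r in records if all(attr in r for attr in constraints)]


def judge(record: dict, constraints: Constraints) -> bool:
    """Judgment submodule: True if every extracted constraint holds."""
    return all(check(record[attr]) for attr, check in constraints.items())


def identify(records: List[dict], queue_name: str,
             store: ConstraintSetSubmodule) -> List[Tuple[dict, str]]:
    """Run search, extraction, judgment, and marking for one compliance queue."""
    constraints = store.extract(queue_name)
    queue = search(records, constraints)
    # Marking submodule: qualified if the record passes, dirty otherwise.
    return [(r, QUALIFIED if judge(r, constraints) else DIRTY) for r in queue]
```

In this sketch, records that lack a constrained attribute never enter the queue, mirroring the search step; a stricter design might mark such records dirty instead.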
Fig. 8 is a schematic diagram of the internal structure of the dirty data correction module in the embodiment shown in Fig. 6.
As shown in Fig. 8, the dirty data correction module 13 further includes a data classification submodule 131, a first correction submodule 132, a second correction submodule 133, a third correction submodule 134, and a fourth correction submodule 135.
The data classification submodule 131 classifies the dirty data: when the dirty data are fixed data restricted by constraint conditions, it sends them to the first correction submodule 132; when the dirty data are revisable data restricted by constraint conditions, it sends them to the second correction submodule 133; when the dirty data are status attribute data, it sends them to the third correction submodule 134; and when the dirty data are unstructured data, it sends them to the fourth correction submodule 135.
The first correction submodule 132 stores a manual correction data queue, provides an interface for manual correction, and marks manually corrected data as qualified data.
The second correction submodule 133 corrects the revisable data restricted by constraint conditions according to the constraint conditions and basic data items, and marks them as qualified data after correction.
The third correction submodule 134 corrects the status attribute data to default data and marks the corrected data as qualified data.
The fourth correction submodule 135 corrects the unstructured data using a compression algorithm and marks them as qualified data after correction.
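The routing performed by submodules 131 to 135 can be sketched as below. Everything here is assumed for illustration: the record layout (`kind`, `field`, `value`, `mark`), the dictionaries of constraint predicates, basic data items, and default values, and the fallback that sends a revisable record to the manual queue when no valid basic data item exists. `zlib` serves only as a lossless stand-in for the image compression algorithm named in the text, because the Python standard library has no JPEG encoder.

```python
import zlib
from collections import deque
from typing import Any, Callable, Deque, Dict

QUALIFIED = "qualified"


class DirtyDataCorrectionModule:
    """Routes dirty records to four hypothetical correction handlers."""

    def __init__(self,
                 constraints: Dict[str, Callable[[Any], bool]],
                 basic_data: Dict[str, Any],
                 defaults: Dict[str, Any]) -> None:
        self.constraints = constraints   # field -> predicate (assumed format)
        self.basic_data = basic_data     # field -> reference value (assumed)
        self.defaults = defaults         # field -> default status value (assumed)
        self.manual_queue: Deque[dict] = deque()

    def correct(self, record: dict) -> dict:
        """Data classification submodule: dispatch on the record's kind."""
        handlers = {
            "fixed": self._first,
            "revisable": self._second,
            "status": self._third,
            "unstructured": self._fourth,
        }
        return handlers[record["kind"]](record)

    def _first(self, record: dict) -> dict:
        # Fixed data is never changed automatically: queue it for a person.
        self.manual_queue.append(record)
        return record  # stays dirty until manual_fix() is called

    def manual_fix(self, record: dict, new_value: Any) -> dict:
        """Stand-in for the manual correction interface."""
        self.manual_queue.remove(record)
        record["value"] = new_value
        record["mark"] = QUALIFIED
        return record

    def _second(self, record: dict) -> dict:
        # Replace the value with the basic data item if it passes the constraint.
        field = record["field"]
        candidate = self.basic_data.get(field)
        check = self.constraints.get(field, lambda v: True)
        if candidate is not None and check(candidate):
            record["value"] = candidate
            record["mark"] = QUALIFIED
        else:
            self.manual_queue.append(record)  # assumed fallback, not in source
        return record

    def _third(self, record: dict) -> dict:
        # Reset the status attribute to its default value.
        record["value"] = self.defaults[record["field"]]
        record["mark"] = QUALIFIED
        return record

    def _fourth(self, record: dict) -> dict:
        # Lossless zlib stand-in for the image compression in the text.
        record["value"] = zlib.compress(record["value"])
        record["mark"] = QUALIFIED
        return record
```

Keeping fixed data out of any automatic path matches the text's requirement that it be corrected manually; only the manual interface can mark it qualified.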
With the data compliance management system of this embodiment, dirty data in a network database is processed and, through data compliance processing, converted into data that meets data quality requirements.
It should be noted that a person of ordinary skill in the art will understand that all or part of the steps and modules of the above embodiments may be implemented in hardware, or by a program instructing the relevant hardware, such as a programmable logic controller (PLC) or a central processing unit (CPU). The program may be stored in a computer-readable storage medium, which may include memory, a magnetic disk, or an optical disc such as a CD-ROM.
It should be understood that the above specific embodiments of the present invention are intended only to illustrate or explain the principles of the invention and do not limit it. Any modification, equivalent replacement, improvement, or the like made without departing from the spirit and scope of the invention shall therefore fall within its scope of protection. In addition, the appended claims are intended to cover all variations and modifications that fall within the scope and boundaries of the claims, or the equivalents of such scope and boundaries.

Claims (10)

1. A data compliance management method, characterized in that the method comprises:
performing category analysis on data and assigning the data to a corresponding data compliance set;
searching each data compliance set to identify dirty data; and
correcting the dirty data.
2. The data compliance management method according to claim 1, characterized in that the method further comprises: establishing a first data compliance set, a second data compliance set, a third data compliance set, and a fourth data compliance set; wherein
the first data compliance set is a set of fixed data restricted by constraint conditions, the second data compliance set is a set of revisable data restricted by constraint conditions, the third data compliance set is a set of status attribute data, and the fourth data compliance set is a set of unstructured data.
3. The data compliance management method according to claim 2, characterized in that performing category analysis on data and assigning the data to a corresponding data compliance set comprises:
if the data are fixed data restricted by constraint conditions, assigning the data to the first data compliance set;
if the data are revisable data restricted by constraint conditions, assigning the data to the second data compliance set;
if the data are status attribute data, assigning the data to the third data compliance set; and
if the data are unstructured data, assigning the data to the fourth data compliance set.
4. The data compliance management method according to claim 1, characterized in that searching each data compliance set to identify dirty data comprises:
searching the data compliance sets for data having the corresponding attributes and placing that data into the corresponding compliance queue;
extracting, from a preset constraint set, the constraint conditions corresponding to the compliance queue;
judging whether the data in the compliance queue satisfy the extracted constraint conditions; and
when the data satisfy the constraint conditions, marking the data as qualified data, and when the data do not satisfy the constraint conditions, marking the data as dirty data.
5. The data compliance management method according to any one of claims 1 to 4, characterized in that correcting the dirty data comprises:
when the dirty data are fixed data restricted by constraint conditions, placing the fixed data into a manual correction data queue and marking them as qualified data after manual correction;
when the dirty data are revisable data restricted by constraint conditions, correcting them according to the constraint conditions and basic data items and marking them as qualified data after correction;
when the dirty data are status attribute data, correcting the status attribute data to default data and marking the corrected data as qualified data; and
when the dirty data are unstructured data, correcting the unstructured data using a compression algorithm and marking them as qualified data after correction.
6. A data compliance management system, characterized in that the system comprises: a category analysis module, a dirty data identification module, and a dirty data correction module; wherein
the category analysis module is configured to perform category analysis on data and assign the data to a corresponding data compliance set;
the dirty data identification module is connected to the category analysis module and is configured to search each data compliance set to identify dirty data; and
the dirty data correction module is connected to the dirty data identification module and is configured to correct the dirty data.
7. The data compliance management system according to claim 6, characterized in that the system further comprises a data compliance set establishment module configured to establish a first data compliance set, a second data compliance set, a third data compliance set, and a fourth data compliance set; wherein the first data compliance set is a set of fixed data restricted by constraint conditions, the second data compliance set is a set of revisable data restricted by constraint conditions, the third data compliance set is a set of status attribute data, and the fourth data compliance set is a set of unstructured data.
8. The data compliance management system according to claim 7, characterized in that the category analysis module is connected to the data compliance set establishment module and is further configured to:
when the data are fixed data restricted by constraint conditions, assign the data to the first data compliance set;
when the data are revisable data restricted by constraint conditions, assign the data to the second data compliance set;
when the data are status attribute data, assign the data to the third data compliance set; and
when the data are unstructured data, assign the data to the fourth data compliance set.
9. The data compliance management system according to claim 6, characterized in that the dirty data identification module further comprises: a search submodule, a judgment submodule, a marking submodule, and a constraint set submodule; wherein
the search submodule is configured to search the data compliance sets for data having the corresponding attributes and place that data into the corresponding compliance queue;
the constraint set submodule is configured to preset constraint conditions and store the constraint set formed by the constraint conditions;
the judgment submodule is connected to the constraint set submodule and the search submodule, and is configured to extract from the constraint set submodule the constraint conditions corresponding to the compliance queue and judge whether the data in the compliance queue satisfy the extracted constraint conditions; and
the marking submodule is connected to the judgment submodule and is configured to mark the data judged by the judgment submodule: when the data satisfy the constraint conditions, the data are marked as qualified data, and when the data do not satisfy the constraint conditions, the data are marked as dirty data.
10. The data compliance management system according to any one of claims 6 to 9, characterized in that the dirty data correction module further comprises: a data classification submodule, a first correction submodule, a second correction submodule, a third correction submodule, and a fourth correction submodule; wherein
the data classification submodule is configured to classify the dirty data: when the dirty data are fixed data restricted by constraint conditions, the data are sent to the first correction submodule; when the dirty data are revisable data restricted by constraint conditions, the data are sent to the second correction submodule; when the dirty data are status attribute data, the data are sent to the third correction submodule; and when the dirty data are unstructured data, the data are sent to the fourth correction submodule;
the first correction submodule is configured to store a manual correction data queue, provide an interface for manual correction, and mark manually corrected data as qualified data;
the second correction submodule is configured to correct the revisable data restricted by constraint conditions according to the constraint conditions and basic data items, and mark them as qualified data after correction;
the third correction submodule is configured to correct the status attribute data to default data and mark the corrected data as qualified data; and
the fourth correction submodule is configured to correct the unstructured data using a compression algorithm and mark them as qualified data after correction.
CN201410854455.XA 2014-12-31 2014-12-31 Data compliance management method and system Active CN105808604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410854455.XA CN105808604B (en) 2014-12-31 2014-12-31 Data compliance management method and system

Publications (2)

Publication Number Publication Date
CN105808604A (en) 2016-07-27
CN105808604B CN105808604B (en) 2021-02-05

Family

ID=56464899

Country Status (1)

Country Link
CN (1) CN105808604B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101290622A (en) * 2007-04-20 2008-10-22 鸿富锦精密工业(深圳)有限公司 Database cleaning system and method
US20130055042A1 (en) * 2011-08-31 2013-02-28 Accenture Global Services Limited Data quality analysis and management system
CN103412956A (en) * 2013-08-30 2013-11-27 北京中科江南软件有限公司 Data processing method and system for heterogeneous data sources

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108572997A (en) * 2017-03-14 2018-09-25 北京宸信征信有限公司 A kind of the integration storage system and method for the multi-source data with network attribute
CN108572997B (en) * 2017-03-14 2020-08-18 北京宸信征信有限公司 Integrated storage system and method of multi-source data with network attributes
CN107341202B (en) * 2017-06-21 2018-06-08 平安科技(深圳)有限公司 Business datum table corrects appraisal procedure, device and the storage medium of danger level
CN107341202A (en) * 2017-06-21 2017-11-10 平安科技(深圳)有限公司 Appraisal procedure, device and the storage medium of business datum table amendment risk factor
CN111797080A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Model training method, data recovery device, storage medium and equipment
CN111241082A (en) * 2020-01-13 2020-06-05 贝壳技术有限公司 Data correction method and device
CN111260238B (en) * 2020-01-21 2023-09-29 南方电网能源发展研究院有限责任公司 Risk data filtering and storing method
CN111260238A (en) * 2020-01-21 2020-06-09 南方电网能源发展研究院有限责任公司 Risk data filtering and storing method
CN111611233A (en) * 2020-05-21 2020-09-01 国家卫星海洋应用中心 Data quality detection method and device and electronic equipment
CN111611233B (en) * 2020-05-21 2021-01-26 国家卫星海洋应用中心 Data quality detection method and device and electronic equipment
CN112215466A (en) * 2020-09-08 2021-01-12 支付宝(杭州)信息技术有限公司 Transaction data supervision processing method and device and electronic equipment
CN113947284A (en) * 2021-09-14 2022-01-18 广州市城市规划设计有限公司 Data compliance conversion method, device and system for homeland space planning
CN116226098A (en) * 2023-05-09 2023-06-06 北京尽微致广信息技术有限公司 Data processing method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN105808604B (en) 2021-02-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant