CN109491991A - Unsupervised automatic data cleaning method - Google Patents
Unsupervised automatic data cleaning method
- Publication number
- CN109491991A
- Authority
- CN
- China
- Prior art keywords
- data
- rule
- attribute
- predicate
- order predicate
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an unsupervised automatic data cleaning method comprising the following steps: A. data model learning: the dependencies between attributes are learned from raw data that may contain invalid data, and by finding implicit non-absolute (relatively weak) dependencies, a data model represented as a Bayesian network is obtained; B. generation of data cleaning rules: after the complete data model of the raw data (or of a sample of the raw data) is obtained, data cleaning rules are generated, specifically predicates and first-order predicate rules; C. a Markov logic network is generated from the predicates and first-order predicate rules generated in step B; D. inference rules are generated from the Markov logic network generated in step C, and the data are cleaned based on the inference results. The method can effectively improve the data quality of a company's business systems without expending large amounts of manpower and material resources, helping management make correct decisions.
Description
Technical field
The present invention relates to the technical field of data management, and in particular to an unsupervised automatic data cleaning method.
Background art
Real-world data typically require cleaning (data that need cleaning are referred to below as dirty data): they may, for example, be inconsistent, noisy, incomplete, or contain duplicate values. In business, erroneous data can cause very large economic losses. For example, incorrect customer information may cause a company to deliver the goods a customer bought to the wrong place, which not only increases the company's delivery costs but can also damage its image for a considerable time.
Among existing data cleaning methods, some require heavy human participation during cleaning, such as providing cleaning suggestions or confirming repairs; others need no human participation during cleaning but require the relevant cleaning rules to be formulated in advance. When the data rules are unknown or labor costs are prohibitive, existing data cleaning methods are not suitable. Given the state of current data cleaning methods, this patent addresses how to clean data, and thereby improve data quality, without predefined cleaning rules and without human participation.
Summary of the invention
The purpose of the present invention is to overcome the above deficiencies of the background art by providing an unsupervised automatic data cleaning method that learns rules from data based on statistical relational learning and cleans data based on probabilistic inference. It can effectively improve the efficiency and effectiveness of data cleaning and improve the data quality of a company's business systems without expending large amounts of manpower and material resources, increasing user satisfaction and helping management make correct decisions based on the improved data quality.
To achieve the above technical effects, the present invention adopts the following technical solution:
An unsupervised automatic data cleaning method, comprising the following steps:
A. Data model learning: learning the dependencies between attributes from raw data that may contain invalid data, and, by finding implicit non-absolute (relatively weak) dependencies, obtaining a data model represented as a Bayesian network;
B. Generation of data cleaning rules: after the complete data model of the raw data or of a sample of the raw data is obtained, generating data cleaning rules, specifically predicates and first-order predicate rules, i.e., first-order predicate logic expressions;
C. Generating a Markov logic network based on the predicates and first-order predicate rules generated in step B;
D. Generating inference rules based on the Markov logic network generated in step C, and cleaning the data based on the inference results.
Further, the step A is specifically included:
A1. treating repair data may be assessed and be sampled comprising the initial data of invalid data;
A2. the data set after raw data set or sampling is learnt, obtains being indicated with the form of Bayesian network
Data model structure;The structure of Bayesian network reflects dependence and degree of dependence between data attribute,
A3. the data set after raw data set or sampling is learnt to obtain the parameter of data model, specific shape
Formula is the conditional probability table of dependence;
A4. the parameter of the structure of merging data model and data model obtains complete data model.Further, institute
Step B is stated to specifically include:
B1. Defining relation constants that express relationships between entities;
B2. Generating the corresponding first-order predicate logic expressions from the complete data model obtained in step A4, specifically: generating predicates and first-order predicate rules from the learned Bayesian network, and formulating separate transformation rules that convert dependencies into first-order predicate logic expressions for the case of a single attribute pointing to one attribute and the case of multiple attributes pointing to one attribute.
Further, in step B2:
When a single attribute points to one attribute, i.e., there is a directed edge between attributes A1 and A2 pointing from A1 to A2, the dependency between A1 and A2 is formalized as the following first-order predicate logic:
where v is the value of tuples id1 and id2 on attribute A1;
When multiple attributes point to one attribute, i.e., attributes A1, A2, …, Ai all point to Aj, the dependency is formalized as the following first-order predicate logic:
where v1, v2, …, vi are the values of tuples id1 and id2 on attributes A1, A2, …, Ai.
Further, step C specifically includes:
C1. Dividing the generated first-order predicate rules into absolute rules and non-absolute rules according to whether each rule is a logically valid formula, i.e., has probability 1 under every interpretation;
C2. Calculating weights for the first-order predicate logic rules, with different weight calculation strategies for absolute and non-absolute rules: absolute rules are assigned a weight of positive infinity, and the weights of non-absolute rules are calculated using mutual information;
C3. For the first-order predicate rules generated in step B2, calculating the weight of each rule from the mutual information between the attributes the rule involves;
C4. Obtaining the Markov logic network of the raw data set or the sampled data set from the weight calculation results of step C3.
Further, the step C3 is specifically included:
C3.1 is related to the different situations of two attributes and multiple attributes for a first-order predicate logic rule, makes respectively
Fixed different regular weighing computation method;Wherein,
The case where being related to two attributes for a first-order predicate logic rule, using two attributes in raw data set
Or the mutual information on the data set after sampling carries out the calculating of regular weight;
The mutual information is the real number of a value range between zero and one, if attribute is perfectly correlated, mutual information is
1, if uncorrelated, mutual information 0 completely, herein if rule is related to two attributes, mutual information is two attribute variables
Joint probability density assembly average, be related to the weight of two attributes, such as weight in this, as first-order predicate logic rule
Higher, then correlation is strong, explanatory strong;Because first-order predicate logic rule is related to the discrete feature of attribute, mutual information is defined as:
Wherein, P (x, y) is joint probability distribution function, and p (x) and p (y) is marginal probability density function
C3.2 When calculating the rule weights, an exponential function is introduced to ensure that the weights are non-negative. Because the introduced exponential function is a non-negative real-valued function that characterizes a potential function over the attributes, it is equivalent to a weighted feature of those attributes and serves a normalizing role. The formula is as follows:
Further, step D specifically includes:
D1. Performing inference based on the Markov logic network generated in step C4, using Gibbs sampling from Markov chain Monte Carlo methods for rule inference: generating the Gibbs sampling inference rules from the Markov logic network and determining the weights of those rules;
D2. Constructing the Gibbs sampling inference model, using a factor graph as the inference model and determining the variables and factors of the factor graph, where the factors evaluate the relationships between variables;
D3. Constructing the possible worlds of the variables from the predicates generated in step B2;
D4. Performing inference over the possible worlds of the predicates from step D3 using the inference model constructed in step D2;
D5. Cleaning and repairing the raw data set based on the inference results of step D4.
Further, when repairing in step D5, the value with the highest expected probability is selected as the repaired value.
Compared with the prior art, the present invention has the following beneficial effects:
The unsupervised automatic data cleaning method of the invention is based on statistical relational learning and requires no human intervention during data cleaning, so it can greatly reduce the labor cost of data cleaning. Moreover, because rules are discovered automatically from raw data containing dirty data, there is no need to formulate data quality rules in advance. This unsupervised automatic data cleaning method can effectively improve the effectiveness of data cleaning and the accuracy of the data, while also improving the efficiency of data cleaning.
Brief description of the drawings
Fig. 1 is a framework diagram of the unsupervised automatic data cleaning method of the invention.
Detailed description of the embodiments
The invention is further described below with reference to an embodiment.
Embodiment:
As shown in Fig. 1, an unsupervised automatic data cleaning method can clean data when data quality patterns/rules are lacking and without human intervention, while ensuring the effectiveness and efficiency of data cleaning.
It specifically includes the following steps:
S10. Data model learning:
To find implicit patterns/rules, the dependencies between attributes must be learned from raw data that may contain invalid data. Because invalid data may be present, absolute or strong dependencies between the attributes of a data table do not necessarily exist; the data model is therefore obtained by finding implicit non-absolute (relatively weak) dependencies and representing them as a Bayesian network.
The key steps are as follows:
S101. Assess and sample the data to be repaired;
S102. Learn from the raw data set or the sampled data set to obtain the structure of the data model, represented as a Bayesian network;
S103. Learn from the raw data set or the sampled data set to obtain the parameters of the data model, specifically in the form of conditional probability tables of the dependencies;
S104. Merge the structure from step S102 and the parameters from step S103 to obtain the complete data model.
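Steps S102 to S104 can be illustrated with a minimal Python sketch. The patent does not name a structure learning algorithm, so this example assumes a Chow-Liu tree (a maximum spanning tree over pairwise mutual information) as a stand-in; general Bayesian network learners can give a node several parents, which a tree cannot. The function names, the record format (a list of dicts), and the choice of root attribute are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def mutual_information(rows, a, b):
    """Empirical mutual information (in nats) between columns a and b."""
    n = len(rows)
    joint = Counter((r[a], r[b]) for r in rows)
    pa = Counter(r[a] for r in rows)
    pb = Counter(r[b] for r in rows)
    # sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) p(y))), written with counts
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in joint.items())


def learn_structure(rows, attrs, root):
    """S102 stand-in: Chow-Liu tree, i.e. a maximum spanning tree over
    pairwise mutual information (Prim's algorithm), oriented away from root."""
    weight = {frozenset(pair): mutual_information(rows, *pair)
              for pair in combinations(attrs, 2)}
    in_tree, edges = {root}, []
    while len(in_tree) < len(attrs):
        parent, child = max(
            ((u, v) for u in in_tree for v in attrs if v not in in_tree),
            key=lambda e: weight[frozenset(e)])
        edges.append((parent, child))
        in_tree.add(child)
    return edges


def learn_parameters(rows, edges):
    """S103: conditional probability tables P(child | parents) from counts.
    Root attributes (no parents) get no table in this sketch."""
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    cpts = {}
    for child, ps in parents.items():
        counts = defaultdict(Counter)
        for r in rows:
            counts[tuple(r[p] for p in ps)][r[child]] += 1
        cpts[child] = {key: {val: k / sum(cnt.values()) for val, k in cnt.items()}
                       for key, cnt in counts.items()}
    return cpts


def learn_model(rows, attrs, root):
    """S104: merge structure and parameters into one model object."""
    edges = learn_structure(rows, attrs, root)
    return {"edges": edges, "cpts": learn_parameters(rows, edges)}
```

For example, `learn_parameters(rows, [("city", "state")])` estimates P(state | city) by counting; a city name that appears with two different states gets a split table, which is the kind of non-absolute dependency S10 is meant to capture.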
S20. Generation of data cleaning rules:
After the complete data model of the raw data or of a sample of the raw data is obtained, the data cleaning rules are generated.
Generating the data cleaning rules involves the following main steps:
S201. Define relation constants. A relation constant captures a relationship among multiple elements and is mainly used to express relationships between entities; in this step, relation constants such as "equivalence" and "matching" need to be defined.
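As a hedged illustration of S201, the two relation constants named above can be modeled as Python predicates. The patent does not say how "matching" is decided; this sketch assumes string similarity from the standard library's `difflib.SequenceMatcher` with a hypothetical threshold of 0.8, and the names `equal`, `match`, and `RELATIONS` are illustrative.

```python
from difflib import SequenceMatcher


def equal(a, b):
    """'Equivalence' relation constant: the two values are identical."""
    return a == b


def match(a, b, threshold=0.8):
    """'Matching' relation constant: the two values are near-identical strings.
    The similarity measure and the 0.8 threshold are illustrative assumptions."""
    ratio = SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()
    return ratio >= threshold


# Hypothetical registry of relation constants available to the rule generator.
RELATIONS = {"equal": equal, "match": match}
```

With these definitions, `match("New York", "Nwe York")` is true while `equal` on the same pair is false, so a rule can distinguish exact agreement from likely typos.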
S202. Generate the corresponding first-order predicate logic expressions from the data model.
A Bayesian network reflects the dependencies between the attributes of a relational table: if node N1 points to N2, then N2 depends to some extent on N1. On this basis, first-order predicate logic is constructed from the learned Bayesian network.
Suppose there is a directed edge between attributes A1 and A2 pointing from A1 to A2; the dependency between A1 and A2 can then be formalized as the following first-order predicate logic expression:
where v is the value of tuples id1 and id2 on attribute A1.
If multiple attributes point to one attribute, e.g., attributes A1, A2, …, Ai all point to Aj, the dependency between them can likewise be formalized as the following first-order predicate logic:
where v1, v2, …, vi are the values of tuples id1 and id2 on attributes A1, A2, …, Ai.
In this step, separate transformation rules that convert dependencies into first-order predicate logic expressions are formulated for the case of a single attribute pointing to one attribute and the case of multiple attributes pointing to one attribute, and the predicates and first-order predicate rules are generated automatically from the complete data model obtained in S104.
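The conversion in S202 can be sketched as follows. Because the first-order formulas themselves are not reproduced in this text, the rule shape below is an assumption inferred from the variable descriptions: if tuples id1 and id2 share the value v on every parent attribute, they should agree on the child attribute. `Rule`, `render`, and `rules_from_network` are hypothetical names, and `equal` in the rendered string stands for the "equivalence" relation constant of S201.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Rule(NamedTuple):
    parents: Tuple[str, ...]  # attributes on the left-hand side (A1 .. Ai)
    child: str                # attribute they point to (A2 or Aj)

    def render(self):
        """Render the rule in an assumed form: if id1 and id2 share each
        parent value, they should be `equal` on the child attribute."""
        premises = []
        for i, attr in enumerate(self.parents, start=1):
            v = "v" if len(self.parents) == 1 else f"v{i}"
            premises.append(f"{attr}(id1, {v}) ∧ {attr}(id2, {v})")
        return " ∧ ".join(premises) + f" ⇒ equal({self.child}(id1), {self.child}(id2))"


def rules_from_network(edges):
    """Group Bayesian-network edges by child: one parent gives the
    single-attribute case, several parents give the multi-attribute case."""
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    return [Rule(tuple(ps), child) for child, ps in parents.items()]
```

Keeping rules as data (`Rule`) rather than only as strings lets later steps evaluate them against the table; `render` is only for display.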
S30. Generate a Markov logic network based on the predicates and first-order predicate rules generated in step S202.
A Markov logic network defines a probability distribution over possible worlds; in the data cleaning scenario, a possible world is a possible repair of the erroneous data. A Markov logic network consists of first-order predicate logic rules and their corresponding weights. A weight reflects the degree to which a first-order predicate logic rule is satisfied: the larger the weight, the more strongly the rule is satisfied.
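For reference, the standard Markov logic network distribution (as defined by Richardson and Domingos) is shown below; the patent text does not write it out, so this is the textbook form, not a quotation from the patent:

```latex
P(X = x) = \frac{1}{Z}\exp\Big(\sum_{i} w_i\, n_i(x)\Big),
\qquad
Z = \sum_{x'}\exp\Big(\sum_{i} w_i\, n_i(x')\Big)
```

Here w_i is the weight of rule F_i and n_i(x) is the number of true groundings of F_i in world x. Letting w_i grow without bound for an absolute rule (S302) drives the probability of any world that violates one of its groundings toward zero, provided some world satisfies all of them, so absolute rules behave as hard constraints.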
S301. Divide the generated first-order predicate rules into absolute rules and non-absolute rules, according to whether each rule is a logically valid formula, i.e., has probability 1 under every interpretation.
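One hedged way to operationalize S301 on a finite table is to treat a rule as absolute only if no pair of tuples violates it, i.e., every pair that agrees on the parent attributes also agrees on the child attribute. This empirical test is an assumption (the patent speaks of validity under every interpretation), and `violation_ratio` and `is_absolute` are hypothetical helpers; on dirty data, few rules will pass it.

```python
from collections import Counter, defaultdict


def violation_ratio(rows, parents, child):
    """Fraction of tuple pairs that agree on all `parents` but disagree on
    `child`, among all pairs that agree on `parents`."""
    groups = defaultdict(Counter)
    for r in rows:
        groups[tuple(r[p] for p in parents)][r[child]] += 1
    agreeing = violating = 0
    for counts in groups.values():
        size = sum(counts.values())
        pairs = size * (size - 1) // 2                       # pairs agreeing on parents
        same_child = sum(k * (k - 1) // 2 for k in counts.values())
        agreeing += pairs
        violating += pairs - same_child
    return violating / agreeing if agreeing else 0.0


def is_absolute(rows, parents, child):
    """Assumed empirical reading of S301: absolute iff never violated."""
    return violation_ratio(rows, parents, child) == 0.0
```

Counting pairs per parent-value group avoids comparing every pair of tuples explicitly, which matters on large tables.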
S302. Calculate the weights of the first-order predicate logic rules.
Different weight calculation strategies are formulated for absolute and non-absolute rules. Absolute rules are assigned a weight of positive infinity. Non-absolute rules are only approximately satisfied, and their weights are calculated using mutual information. Each approximately satisfied first-order predicate logic rule reflects a dependency between attributes in the relational table, and the strength of that dependency is expressed by computing the mutual information between the attributes.
S303. For the first-order predicate rules generated in step S202, calculate the weight of each rule from the mutual information between the attributes the rule involves.
Different rule weight calculation methods are formulated for first-order logic rules involving two attributes and those involving multiple attributes.
For a first-order logic rule involving two attributes, the rule weight is calculated from the mutual information of the two attributes on the raw data set or on a sample of it.
Mutual information is treated here as a real number between 0 and 1: it is 1 if the attributes are perfectly correlated and 0 if they are completely uncorrelated (this range holds for a normalized mutual information; the unnormalized quantity is bounded above by the smaller of the two attributes' entropies). When a rule involves two attributes, the mutual information of the two attribute variables, i.e., the expectation under their joint distribution of the log-ratio between the joint probability and the product of the marginal probabilities, serves as the weight of the rule; the higher the weight, the stronger the correlation and the greater the explanatory power. Because the attributes involved in first-order predicate logic rules are discrete, the mutual information is defined as:
$$I(X;Y)=\sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$$
where p(x, y) is the joint probability distribution of the two attributes, and p(x) and p(y) are their marginal probability distributions.
When calculating rule weights, an exponential function is introduced so that the weights are non-negative (≥ 0), which also lets the resulting weights better reflect the dependencies between attributes. Because the introduced exponential function is a non-negative real-valued function that characterizes a potential function over the attributes, it is equivalent to a weighted feature of those attributes and serves a normalizing role. The formula is as follows:
At the same time, as the mutual information increases, the weight grows exponentially, which increases the influence of high-weight rules during data cleaning and improves the effectiveness of data cleaning.
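A minimal sketch of S302 and S303 for two-attribute rules follows. Two points are assumptions: the exponential formula is not reproduced in this text, so the weight is taken to be w = exp(NMI); and, to match the stated [0, 1] range, mutual information is normalized by the smaller of the two attributes' entropies, which yields 1 when one attribute functionally determines the other. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (nats) of a discrete column."""
    n = len(values)
    return -sum((k / n) * math.log(k / n) for k in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = sum p(x,y) log(p(x,y) / (p(x) p(y))), estimated from counts."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    return sum((k / n) * math.log(k * n / (px[x] * py[y]))
               for (x, y), k in Counter(zip(xs, ys)).items())


def normalized_mi(xs, ys):
    """Assumed normalization into [0, 1]: I(X;Y) / min(H(X), H(Y))."""
    denom = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


def rule_weight(xs, ys, absolute=False):
    """S302/S303: +inf for absolute rules, otherwise an assumed exp(NMI)."""
    if absolute:
        return math.inf
    return math.exp(normalized_mi(xs, ys))
```

Under this assumed formula, non-absolute weights fall in [1, e] and grow exponentially with the normalized mutual information. For a rule with several parent attributes, the parents' value tuples (for example `[(r["A1"], r["A2"]) for r in rows]`) can be passed as `xs`, treating them as one joint variable; that too is an assumption, since the multi-attribute method is not spelled out here.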
S304. Based on the weight calculation results of step S303, automatically obtain the Markov logic network of the raw data or of a sample of the raw data.
S40. Generate inference rules based on the Markov logic network generated in step S304, and clean the data based on the inference results.
It specifically includes the following steps:
S401. Perform inference based on the Markov logic network generated in step S304, using Gibbs sampling from Markov chain Monte Carlo methods for rule inference. Generate the Gibbs sampling inference rules from the Markov logic network and determine the weights of those rules.
S402. Construct the Gibbs sampling inference model.
A factor graph is used as the Gibbs sampling inference model. The variables and factors of the factor graph in the inference model are determined; the factors evaluate the relationships between variables.
S403. Construct the possible worlds of the variables from the predicates generated in step S202; the possible worlds are the basis of inference.
S404. Perform inference over the possible worlds of the predicates from step S403 using the inference model constructed in step S402.
S405. Clean and repair the raw data set based on the inference results of step S404. For each data item to be repaired, the value with the highest expected probability is selected as the repaired value.
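Steps S401 to S405 can be sketched with a small Gibbs sampler. This is an illustration under explicit assumptions, not the patent's implementation: the variables are the cells of one child attribute; each pair of tuples that agree on the parent attributes is a grounded rule factor that adds `rule_w` to the log-score when the two cells take the same value; and a hypothetical unary "observation" factor with weight `prior_w` (not described in the patent) favors each cell's observed value so that clean cells are not rewritten arbitrarily. The repaired value is the candidate sampled most often, i.e., the highest estimated marginal probability, which is one reading of "the value with the highest expected probability". All names and default values are illustrative.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import combinations


def build_factor_graph(rows, parents):
    """Ground the rule: every pair of tuples agreeing on all parent attributes
    becomes a pairwise factor between their child cells (S402/S403)."""
    groups = defaultdict(list)
    for i, r in enumerate(rows):
        groups[tuple(r[p] for p in parents)].append(i)
    neighbours = defaultdict(list)
    for members in groups.values():
        for i, j in combinations(members, 2):
            neighbours[i].append(j)
            neighbours[j].append(i)
    return neighbours


def gibbs_repair(rows, parents, child, rule_w=2.0, prior_w=2.0,
                 burn_in=200, sweeps=2000, seed=0):
    """S404/S405: Gibbs sampling over the child cells, then pick each cell's
    most frequently sampled value (highest estimated marginal)."""
    rng = random.Random(seed)
    observed = [r[child] for r in rows]
    domain = sorted(set(observed))        # candidate repairs: observed values
    neighbours = build_factor_graph(rows, parents)
    state = list(observed)                # start the chain from the dirty data
    counts = [Counter() for _ in rows]

    for sweep in range(burn_in + sweeps):
        for i in range(len(rows)):
            # Log-score of each candidate for cell i, other cells held fixed:
            # satisfied rule groundings plus the hypothetical observation factor.
            scores = [rule_w * sum(state[j] == v for j in neighbours[i])
                      + prior_w * (v == observed[i]) for v in domain]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            state[i] = rng.choices(domain, weights=weights)[0]
            if sweep >= burn_in:
                counts[i][state[i]] += 1

    repaired = []
    for r, c in zip(rows, counts):
        fixed = dict(r)
        fixed[child] = c.most_common(1)[0][0]
        repaired.append(fixed)
    return repaired
```

On a table where three tuples with zip 10001 say "New York" and a fourth says "Nwe York", `gibbs_repair(rows, ["zip"], "city")` repairs the fourth cell to "New York", while tuples in other zip groups keep their values. The relative size of `rule_w` and `prior_w` controls how readily observed values are overridden.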
In summary, the unsupervised automatic data cleaning method of the invention is based on statistical relational learning and requires no human intervention during data cleaning, so it can greatly reduce the labor cost of data cleaning; moreover, because rules are discovered automatically from raw data containing dirty data, there is no need to formulate data quality rules in advance. This unsupervised automatic data cleaning method can effectively improve the effectiveness of data cleaning and the accuracy of the data, while also improving the efficiency of data cleaning.
It can be understood that the above embodiment is merely an exemplary embodiment adopted to illustrate the principle of the present invention, and the present invention is not limited thereto. Those skilled in the art can make various changes and improvements without departing from the spirit and essence of the invention, and such changes and improvements are also regarded as falling within the protection scope of the invention.
Claims (8)
1. An unsupervised automatic data cleaning method, characterized by comprising the following steps:
A. data model learning: learning the dependencies between attributes from raw data that may contain invalid data, and, by finding implicit non-absolute (relatively weak) dependencies, obtaining a data model represented as a Bayesian network;
B. generation of data cleaning rules: after the complete data model of the raw data or of a sample of the raw data is obtained, generating data cleaning rules, specifically predicates and first-order predicate rules;
C. generating a Markov logic network based on the predicates and first-order predicate rules generated in step B;
D. generating inference rules based on the Markov logic network generated in step C, and cleaning the data based on the inference results.
2. The unsupervised automatic data cleaning method according to claim 1, characterized in that step A specifically includes:
A1. assessing and sampling the data to be repaired, i.e., the raw data that may contain invalid data;
A2. learning from the raw data set or the sampled data set to obtain the structure of the data model, represented as a Bayesian network;
A3. learning from the raw data set or the sampled data set to obtain the parameters of the data model, specifically in the form of conditional probability tables of the dependencies;
A4. merging the structure and the parameters of the data model to obtain the complete data model.
3. The unsupervised automatic data cleaning method according to claim 2, characterized in that step B specifically includes:
B1. defining relation constants that express relationships between entities;
B2. generating the corresponding first-order predicate logic expressions from the complete data model obtained in step A4, specifically: generating predicates and first-order predicate rules, i.e., first-order predicate logic expressions, from the learned Bayesian network, and formulating separate transformation rules that convert dependencies into first-order predicate logic expressions for the case of a single attribute pointing to one attribute and the case of multiple attributes pointing to one attribute.
4. The unsupervised automatic data cleaning method according to claim 3, characterized in that in step B2:
when a single attribute points to one attribute, i.e., there is a directed edge between attributes A1 and A2 pointing from A1 to A2, the dependency between A1 and A2 is formalized as the following first-order predicate logic:
where v is the value of tuples id1 and id2 on attribute A1;
when multiple attributes point to one attribute, i.e., attributes A1, A2, …, Ai all point to Aj, the dependency is formalized as the following first-order predicate logic:
where v1, v2, …, vi are the values of tuples id1 and id2 on attributes A1, A2, …, Ai.
5. The unsupervised automatic data cleaning method according to claim 3, characterized in that step C specifically includes:
C1. dividing the generated first-order predicate rules into absolute rules and non-absolute rules;
C2. calculating weights for the first-order predicate logic rules, with different weight calculation strategies for absolute and non-absolute rules: absolute rules are assigned a weight of positive infinity, and the weights of non-absolute rules are calculated using mutual information;
C3. for the first-order predicate rules generated in step B2, calculating the weight of each rule from the mutual information between the attributes the rule involves;
C4. obtaining the Markov logic network of the raw data set or the sampled data set from the weight calculation results of step C3.
6. The unsupervised automatic data cleaning method according to claim 5, characterized in that step C3 specifically includes:
C3.1 formulating different rule weight calculation methods for first-order predicate logic rules involving two attributes and for those involving multiple attributes, wherein:
for a first-order predicate logic rule involving two attributes, the rule weight is calculated from the mutual information of the two attributes on the raw data set or the sampled data set;
the mutual information is a real number in the range [0, 1]: it is 1 if the attributes are perfectly correlated and 0 if they are completely uncorrelated;
C3.2 when calculating the rule weights, introducing an exponential function so that the weights are non-negative.
7. The unsupervised automatic data cleaning method according to claim 6, characterized in that step D specifically includes:
D1. performing inference based on the Markov logic network generated in step C4, using Gibbs sampling from Markov chain Monte Carlo methods for rule inference: generating the Gibbs sampling inference rules from the Markov logic network and determining the weights of those rules;
D2. constructing the Gibbs sampling inference model, using a factor graph as the inference model and determining the variables and factors of the factor graph, where the factors evaluate the relationships between variables;
D3. constructing the possible worlds of the variables from the predicates generated in step B2;
D4. performing inference over the possible worlds of the predicates from step D3 using the inference model constructed in step D2;
D5. cleaning and repairing the raw data set based on the inference results of step D4.
8. The unsupervised automatic data cleaning method according to claim 7, characterized in that, when repairing in step D5, the value with the highest expected probability is selected as the repaired value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811325335.5A CN109491991B (en) | 2018-11-08 | 2018-11-08 | Unsupervised automatic data cleaning method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109491991A (en) | 2019-03-19 |
CN109491991B CN109491991B (en) | 2022-03-01 |
Family
ID=65695410
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811325335.5A Active CN109491991B (en) | 2018-11-08 | 2018-11-08 | Unsupervised automatic data cleaning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109491991B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20160115515A (en) * | 2015-03-27 | 2016-10-06 | Kumoh National Institute of Technology Industry-Academic Cooperation Foundation | System and method for predicting user behavior using mobile-based life logs |
CN105046559A (en) * | 2015-09-10 | 2015-11-11 | Hohai University | Bayesian network and mutual information-based client credit scoring method |
CN106094744A (en) * | 2016-06-04 | 2016-11-09 | Shanghai University | Method for determining target values of main operating parameters of a thermal power plant based on association rule mining |
CN106528634A (en) * | 2016-10-11 | 2017-03-22 | Wuhan University of Technology | Intelligent cleaning method and system for massive RFID (Radio Frequency Identification) data in workshop manufacturing processes |
CN106779087A (en) * | 2016-11-30 | 2017-05-31 | Fujian Yirong Information Technology Co., Ltd. | General-purpose machine learning data analysis platform |
US20180165554A1 (en) * | 2016-12-09 | 2018-06-14 | The Research Foundation For The State University Of New York | Semisupervised autoencoder for sentiment analysis |
CN108304668A (en) * | 2018-02-11 | 2018-07-20 | Hohai University | Flood forecasting method combining hydrological process data and historical prior data |
Non-Patent Citations (3)
Title |
---|
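Sushovan De et al.: "BayesWipe: A multimodal system for data cleaning and consistent query answering on structured bigdata", 2014 IEEE International Conference on Big Data (Big Data) * |
Duan Liang: "Data cleaning based on probabilistic graphical models", China Master's Theses Full-text Database, Information Science and Technology * |
Wang Yong: "Knowledge representation and automatic extraction based on first-order logic", China Doctoral Dissertations and Master's Theses Full-text Database (Master's), Information Science and Technology * |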
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117610541A (en) * | 2024-01-17 | 2024-02-27 | Zhejiang Lab | Author disambiguation method and device for large-scale data and readable storage medium |
CN117610541B (en) * | 2024-01-17 | 2024-06-11 | Zhejiang Lab | Author disambiguation method and device for large-scale data and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109491991B (en) | 2022-03-01 |
Similar Documents
Publication | Title |
---|---|
CN110335168B (en) | Method and system for optimizing power utilization information acquisition terminal fault prediction model based on GRU |
JP6176979B2 (en) | Project management support system |
CN110910004A (en) | Reservoir dispatching rule extraction method and system with multiple uncertainties |
CN112559963B (en) | Dynamic parameter identification method and device for power distribution network |
CN110083728B (en) | Method, device and system for optimizing automatic picture data cleaning quality |
CN111814342B (en) | Complex equipment reliability hybrid model and construction method thereof |
CN108536471A (en) | Method for identifying important modules in software structures based on complex networks |
CN107247666A (en) | Software defect number prediction method based on feature selection and ensemble learning |
CN106951963B (en) | Knowledge refining method and device |
CN109548029A (en) | Two-stage trust evaluation method for wireless sensor network nodes |
CN115775026B (en) | Federated learning method based on tissue similarity |
Wang et al. | On the use of time series and search based software engineering for refactoring recommendation |
CN109491991A (en) | Unsupervised automatic data cleaning method |
CN109947752A (en) | Automatic data cleaning method based on DeepDive |
Pedro et al. | Decision-maker preference modeling in interactive multiobjective optimization |
Pan et al. | Black-box test-coverage analysis and test-cost reduction based on a Bayesian network model |
CN117370744A (en) | Dynamic cleaning method and system for abnormal power consumption data of power consumers |
CN117575564A (en) | Extensible infrastructure network component maintenance and transformation decision evaluation method and system |
Lucchese et al. | Networks cardinality estimation using order statistics |
CN104360948A (en) | IEC 61850 configuration file engineering consistency test method based on fuzzy algorithm |
Borgelt | A conditional independence algorithm for learning undirected graphical models |
Munikoti et al. | Bayesian graph neural network for fast identification of critical nodes in uncertain complex networks |
Chan et al. | Using genetic programming for developing relationship between engineering characteristics and customer requirements in new products |
Ibraigheeth et al. | Software reliability prediction in various software development stages |
CN112699229A (en) | Self-adaptive question-pushing method based on deep learning model |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |