CN109491991A - Unsupervised automatic data cleaning method - Google Patents

Info

Publication number
CN109491991A
Authority
CN
China
Prior art keywords
data
rule
attribute
predicate
order predicate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811325335.5A
Other languages
Chinese (zh)
Other versions
CN109491991B (en)
Inventor
李玲
唐军
吴纯彬
于跃
陈秋宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd
Priority to CN201811325335.5A
Publication of CN109491991A
Application granted
Publication of CN109491991B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • G06F18/295Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an unsupervised automatic data cleaning method comprising the following steps: A. data model learning: dependencies between attributes are learned from raw data that may contain erroneous data, and by finding implicit, non-absolute or relatively weak dependencies, a data model expressed as a Bayesian network is obtained; B. data cleaning rule generation: once the complete data model of the raw data or of a sample of the raw data has been obtained, data cleaning rules are generated, specifically predicates and first-order predicate rules; C. a Markov logic network is generated from the predicates and first-order predicate rules generated in step B; D. inference rules are generated from the Markov logic network generated in step C, and the data are cleaned based on the inference results. The method can effectively improve the data quality of a company's business systems without consuming large amounts of manpower and material resources, which helps management make correct decisions.

Description

Unsupervised automatic data cleaning method
Technical field
The present invention relates to the technical field of data governance, and in particular to an unsupervised automatic data cleaning method.
Background art
Real-world data usually need cleaning (data that need cleaning are referred to below as dirty data); they may contain inconsistent, noisy, incomplete, or duplicate values. In business, erroneous data can cause very large economic losses. For example, incorrect customer information may cause goods purchased by a customer to be delivered to the wrong address, which not only increases the company's delivery costs but also damages the company's image for a considerable time.
Among existing data cleaning methods, some require heavy human involvement during cleaning, for example to provide cleaning suggestions or to confirm repairs; others need no human involvement during cleaning but require the relevant cleaning rules to be formulated in advance. When the data rules are unknown or the labor cost is unaffordable, existing data cleaning methods are not applicable. In view of this, the present patent addresses data cleaning without predefined cleaning rules and without human involvement, thereby improving data quality.
Summary of the invention
The purpose of the present invention is to overcome the above deficiencies in the background art by providing an unsupervised automatic data cleaning method that learns rules from the data by statistical relational learning and cleans the data by probabilistic inference. The method can effectively improve the efficiency and effect of data cleaning, improve the data quality of a company's business systems without consuming large amounts of manpower and material resources, increase user satisfaction, and help management make correct decisions on the basis of the improved data quality.
To achieve the above technical effect, the present invention adopts the following technical solution:
An unsupervised automatic data cleaning method, comprising the following steps:
A. Data model learning: dependencies between attributes are learned from raw data that may contain erroneous data; by finding implicit, non-absolute or relatively weak dependencies, a data model expressed as a Bayesian network is obtained;
B. Data cleaning rule generation: once the complete data model of the raw data or of a sample of the raw data has been obtained, data cleaning rules are generated, specifically predicates and first-order predicate rules, i.e. first-order predicate logic expressions;
C. A Markov logic network is generated from the predicates and first-order predicate rules generated in step B;
D. Inference rules are generated from the Markov logic network generated in step C, and the data are cleaned based on the inference results.
Further, step A specifically comprises:
A1. assessing and sampling the data to be repaired, i.e. the raw data that may contain erroneous data;
A2. learning from the raw data set or the sampled data set to obtain the structure of the data model, expressed as a Bayesian network; the structure of the Bayesian network reflects the dependencies between data attributes and their strength;
A3. learning from the raw data set or the sampled data set to obtain the parameters of the data model, in the form of conditional probability tables for the dependencies;
A4. merging the structure and the parameters of the data model to obtain the complete data model.
Further, step B specifically comprises:
B1. defining the relationship constants used to express relationships between entities;
B2. generating the corresponding first-order predicate logic expressions from the complete data model obtained in step A4: specifically, predicates and first-order predicate rules are generated from the learned Bayesian network, and separate conversion rules for turning a dependency into a first-order predicate logic expression are formulated for the case where a single attribute points to an attribute and the case where multiple attributes point to an attribute.
Further, in step B2:
When a single attribute points to an attribute, i.e. there is a directed edge from attribute A1 to attribute A2, the dependency between A1 and A2 is formalized as the following first-order predicate logic expression:
where v is the value of tuples id1 and id2 on attribute A1;
When multiple attributes point to an attribute, i.e. attributes A1, A2, …, Ai all point to Aj, their dependency is formalized as the following first-order predicate logic expression:
where v1, v2, …, vi are the values of tuples id1 and id2 on attributes A1, A2, …, Ai.
Further, step C specifically comprises:
C1. dividing the generated first-order predicate rules into absolute rules and non-absolute rules according to whether each rule is a logically valid formula, i.e. has probability 1 under every interpretation;
C2. computing weights for the first-order predicate logic rules, with a different weight calculation strategy for absolute and non-absolute rules: the weight of an absolute rule is set to positive infinity, and the weight of a non-absolute rule is computed from mutual information;
C3. for the first-order predicate rules generated in step B2, computing the weight of each rule from the mutual information between the attributes the rule involves;
C4. obtaining the Markov logic network of the raw data set or the sampled data set from the weights computed in step C3.
Further, step C3 specifically comprises:
C3.1 formulating different rule weight calculation methods for the case where a first-order predicate logic rule involves two attributes and the case where it involves multiple attributes; wherein,
when a first-order predicate logic rule involves two attributes, the rule weight is computed from the mutual information of the two attributes on the raw data set or the sampled data set;
the mutual information is a real number between 0 and 1: if the attributes are perfectly correlated the mutual information is 1, and if they are completely uncorrelated it is 0. When a rule involves two attributes, the mutual information is the statistical average over the joint probability density of the two attribute variables, and it serves as the weight of the first-order predicate logic rule involving the two attributes; the higher the weight, the stronger the correlation and the stronger the explanatory power. Because the attributes involved in first-order predicate logic rules are discrete, the mutual information is defined as:
where P(x, y) is the joint probability distribution function, and p(x) and p(y) are the marginal probability density functions;
C3.2 when computing rule weights, an exponential function is introduced to ensure that the weights are not less than 0. Because the exponential function is a non-negative real-valued function, it acts as a potential function characterizing the attributes, is equivalent to a weighted feature quantity of the attribute features, and serves as a normalization; the formula is as follows:
Further, step D specifically comprises:
D1. performing inference on the Markov logic network generated in step C4 using Gibbs sampling, a Markov chain Monte Carlo method: the rules for Gibbs sampling inference are generated from the Markov logic network, and the weights of these inference rules are determined;
D2. constructing the Gibbs sampling inference model: a factor graph is used as the inference model, and its variables and factors are determined, where the factors evaluate the relationships between variables;
D3. constructing the possible worlds of the variables from the predicates generated in step B2;
D4. performing inference over the possible worlds of step D3 with the inference model constructed in step D2;
D5. cleaning and repairing the raw data set based on the inference results of step D4.
Further, when repairing in step D5, the value with the maximum expectation is selected as the repaired value.
Compared with the prior art, the present invention has the following beneficial effects:
The unsupervised automatic data cleaning method of the invention is based on statistical relational learning and requires no human intervention during cleaning, so it can greatly reduce the labor cost of data cleaning. Because rules are discovered automatically from raw data containing dirty data, data quality rules need not be formulated in advance. The method can effectively improve the effect of data cleaning and the accuracy of the data, and can also improve the efficiency of data cleaning.
Brief description of the drawings
Fig. 1 is a framework diagram of the unsupervised automatic data cleaning method of the invention.
Detailed description
The invention is further described below with reference to an embodiment.
Embodiment:
As shown in Fig. 1, an unsupervised automatic data cleaning method cleans data when data quality patterns/rules are missing and without human intervention, while ensuring both the effect and the efficiency of data cleaning.
It specifically comprises the following steps:
S10. Data model learning:
To find implicit patterns/rules, the dependencies between attributes must be learned from raw data that may contain erroneous data. Because erroneous data may be present, absolute or strong dependencies between the attributes of a data table do not necessarily exist. The data model is therefore obtained by finding implicit, non-absolute or relatively weak dependencies and expressing them as a Bayesian network.
The main process of this step is as follows:
S101. Assess and sample the data to be repaired;
S102. Learn from the raw data set or the sampled data set to obtain the structure of the data model, in the form of a Bayesian network;
S103. Learn from the raw data set or the sampled data set to obtain the parameters of the data model, in the form of conditional probability tables for the dependencies;
S104. Merge the structure from step S102 and the parameters from step S103 to obtain the complete data model.
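As an illustration of the parameter learning in step S103, the following is a minimal sketch that estimates conditional probability tables by relative frequency. It assumes the network structure from step S102 is already given as a mapping from each attribute to its parents; the function name `learn_cpts`, the toy table, and the choice of plain frequency counting are assumptions made for illustration and are not specified in the patent.

```python
from collections import Counter


def learn_cpts(rows, parents):
    """Estimate P(child | parents) by relative frequency.

    rows:    list of dicts mapping attribute name -> value
    parents: dict mapping each attribute -> tuple of parent attributes
             (the network structure, assumed to be already learned)
    """
    cpts = {}
    for child, pa in parents.items():
        joint = Counter()   # counts of (parent values, child value)
        margin = Counter()  # counts of parent values alone
        for row in rows:
            key = tuple(row[p] for p in pa)
            joint[(key, row[child])] += 1
            margin[key] += 1
        cpts[child] = {
            (key, value): count / margin[key]
            for (key, value), count in joint.items()
        }
    return cpts


# Toy table (made up): zip mostly determines city, with one dirty tuple.
rows = [
    {"zip": "610000", "city": "Chengdu"},
    {"zip": "610000", "city": "Chengdu"},
    {"zip": "610000", "city": "Chengdu"},
    {"zip": "610000", "city": "Mianyang"},  # likely erroneous
    {"zip": "621000", "city": "Mianyang"},
]
structure = {"zip": (), "city": ("zip",)}  # assumed edge: zip -> city
cpts = learn_cpts(rows, structure)
print(cpts["city"][(("610000",), "Chengdu")])  # 0.75
```

The dirty tuple lowers the conditional probability to 0.75 rather than 1, which is the kind of non-absolute dependency the step is meant to capture.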
S20. Data cleaning rule generation:
Once the complete data model of the raw data or of a sample of the raw data has been obtained, the data cleaning rules are generated.
Data cleaning rule generation has the following main steps:
S201. Define relationship constants. A relationship constant captures a relationship among multiple elements and is mainly used to express relationships between entities; relationship constants such as "equal" and "match" need to be defined in this step.
S202. Generate the corresponding first-order predicate logic expressions from the data model.
A Bayesian network reflects the dependencies between attributes in a relational table: if node N1 points to N2, then N2 depends on N1 to some degree. On this basis, first-order predicate logic is constructed from the learned Bayesian network.
Suppose there is a directed edge from attribute A1 to attribute A2; the dependency between A1 and A2 can then be formalized as the following first-order predicate logic expression:
where v is the value of tuples id1 and id2 on attribute A1.
If multiple attributes point to one attribute, for example attributes A1, A2, …, Ai all point to Aj, the dependency between them can likewise be formalized as the following first-order predicate logic expression:
where v1, v2, …, vi are the values of tuples id1 and id2 on attributes A1, A2, …, Ai.
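The two expressions themselves are not reproduced in this text. One form consistent with the surrounding definitions is sketched below; it is an assumption, not the patent's confirmed notation, and the relation name eq (an "equal" relationship constant applied to the target attribute) is hypothetical.

```latex
% Assumed form only. eq_{A_2} and eq_{A_j} are hypothetical "equal"
% relations on the dependent attribute.

% Single attribute A_1 pointing to A_2:
\forall id_1, id_2, v:\;
  A_1(id_1, v) \wedge A_1(id_2, v) \Rightarrow \mathrm{eq}_{A_2}(id_1, id_2)

% Multiple attributes A_1, \dots, A_i pointing to A_j:
\forall id_1, id_2, v_1, \dots, v_i:\;
  \bigwedge_{k=1}^{i} \bigl( A_k(id_1, v_k) \wedge A_k(id_2, v_k) \bigr)
  \Rightarrow \mathrm{eq}_{A_j}(id_1, id_2)
```

Read informally: if two tuples agree on every parent attribute, they are expected to agree on the dependent attribute.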
In this step, separate conversion rules for turning a dependency into a first-order predicate logic expression are formulated for the case where a single attribute points to an attribute and the case where multiple attributes point to an attribute, and the predicates and first-order predicate rules are generated automatically from the complete data model obtained in S104.
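The automatic generation described in S202 can be sketched as a walk over the Bayesian network structure that emits one rule per node with parents. The sketch below assumes a rule shape in which two tuples agreeing on all parent attributes should agree on the child; the textual syntax (`^`, `=>`, `equal_<attribute>`) and the function name `rules_from_structure` are hypothetical and chosen only for illustration.

```python
def rules_from_structure(parents):
    """Turn each node that has parents into one first-order rule string.

    parents: dict mapping attribute -> tuple of parent attributes.
    The rule syntax is hypothetical, not taken from the patent.
    """
    rules = []
    for child, pa in sorted(parents.items()):
        if not pa:
            continue  # root nodes depend on nothing, so no rule is emitted
        body = " ^ ".join(
            f"{p}(id1, v{k}) ^ {p}(id2, v{k})"
            for k, p in enumerate(pa, start=1)
        )
        rules.append(f"{body} => equal_{child}(id1, id2)")
    return rules


# Assumed structure: zip -> city, and (zip, city) -> state.
structure = {"zip": (), "city": ("zip",), "state": ("zip", "city")}
rules = rules_from_structure(structure)
for rule in rules:
    print(rule)
```

The single-parent and multi-parent cases fall out of the same loop: the rule body simply has one conjunct pair per parent attribute.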
S30. Generate a Markov logic network from the predicates and first-order predicate rules generated in step S202.
A Markov logic network defines a probability distribution over possible worlds; in the data cleaning scenario, a possible world is a possible repair of the erroneous data. A Markov logic network consists of first-order predicate logic rules and their corresponding weights. A weight reflects the degree to which its first-order predicate logic rule is satisfied: the larger the weight, the higher the degree to which the rule is satisfied.
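For reference, the probability distribution over possible worlds is usually written as follows in the Markov logic literature. The patent text does not state this formula, so the symbols here are assumptions: w_i is the weight of rule i, n_i(x) is the number of true groundings of rule i in world x, and Z is the normalizing constant.

```latex
P(X = x) = \frac{1}{Z} \exp\Bigl( \sum_{i} w_i \, n_i(x) \Bigr),
\qquad
Z = \sum_{x'} \exp\Bigl( \sum_{i} w_i \, n_i(x') \Bigr)
```

Under this form, a world that violates a high-weight rule becomes much less probable than one that satisfies it, and a rule with infinite weight rules out every world that violates it.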
S301. Divide the generated first-order predicate rules into absolute rules and non-absolute rules, according to whether each rule is a logically valid formula, i.e. has probability 1 under every interpretation.
S302. Compute weights for the first-order predicate logic rules.
Different weight calculation strategies are formulated for absolute rules and non-absolute rules. The weight of an absolute rule is set to positive infinity. Non-absolute rules are only approximately satisfied, and their weights are computed from mutual information. Each approximately satisfied first-order predicate logic rule reflects a dependency between attributes in the relational table, and the strength of that dependency is expressed by the mutual information between the attributes.
S303. For the first-order predicate rules generated in step S202, compute the weight of each rule from the mutual information between the attributes the rule involves.
Different rule weight calculation methods are formulated for the case where a first-order logic rule involves two attributes and the case where it involves multiple attributes.
When a first-order logic rule involves two attributes, the rule weight is computed from the mutual information of the two attributes on the raw data set or on a sample of the raw data set.
The mutual information is a real number between 0 and 1: if the attributes are perfectly correlated the mutual information is 1, and if they are completely uncorrelated it is 0. When a rule involves two attributes, the mutual information is the statistical average over the joint probability density of the two attribute variables, and it serves as the weight of the first-order predicate logic rule involving the two attributes; the higher the weight, the stronger the correlation and the stronger the explanatory power. Because the attributes involved in first-order predicate logic rules are discrete, the mutual information is defined as:
where P(x, y) is the joint probability distribution function, and p(x) and p(y) are the marginal probability density functions.
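The defining formula is not reproduced in this text. The standard definition of mutual information for discrete variables, which matches the symbols named above, is shown below as a likely reading rather than a confirmed transcription.

```latex
I(X; Y) = \sum_{x} \sum_{y} P(x, y) \, \log \frac{P(x, y)}{p(x) \, p(y)}
```

Note that this quantity is bounded above by \min(H(X), H(Y)) rather than by 1, so the stated range of 0 to 1 suggests some normalization, for example dividing by \min(H(X), H(Y)); which normalization the patent intends is not stated here.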
When computing rule weights, an exponential function is introduced to ensure that the weights are greater than or equal to 0, which also lets the resulting weights better reflect the dependencies between attributes. Because the exponential function is a non-negative real-valued function, it acts as a potential function characterizing the attributes, is equivalent to a weighted feature quantity of the attribute features, and serves as a normalization; the formula is as follows:
Moreover, as the mutual information increases, the weight grows exponentially, which strengthens the role of high-weight rules in the cleaning process and improves the cleaning result.
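The weighting strategy of steps S302 and S303 can be sketched as follows. This is a minimal illustration under stated assumptions: mutual information is estimated from column frequencies, normalized by the smaller of the two entropies so that it lies between 0 and 1, and the weight is taken as the exponential of that value. The exact weight formula is not reproduced in this text, so `rule_weight` and its normalization are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Discrete mutual information (in nats) of two equal-length columns."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def entropy(xs):
    """Shannon entropy (in nats) of one column."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def rule_weight(xs, ys, absolute=False):
    """Hypothetical weighting: +inf for absolute rules, else exp(normalized MI)."""
    if absolute:
        return math.inf
    h = min(entropy(xs), entropy(ys))
    nmi = mutual_information(xs, ys) / h if h > 0 else 0.0
    return math.exp(nmi)


# Made-up columns for illustration.
zips = ["610000", "610000", "621000", "621000"]
cities = ["Chengdu", "Chengdu", "Mianyang", "Mianyang"]
coins = ["h", "t", "h", "t"]
print(rule_weight(zips, cities))  # perfectly dependent columns: about e
print(rule_weight(zips, coins))   # independent columns: 1.0
```

With this choice, the weight of a non-absolute rule ranges from 1 (no dependency) to e (perfect dependency), and is never negative.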
S304. From the weights computed in step S303, automatically obtain the Markov logic network of the raw data or of the sample of the raw data.
S40. Generate inference rules from the Markov logic network generated in step S304 and clean the data based on the inference results.
This specifically comprises the following steps:
S401. Perform inference on the Markov logic network generated in step S304 using Gibbs sampling, a Markov chain Monte Carlo method. The rules for Gibbs sampling inference are generated from the Markov logic network, and the weights of these inference rules are determined.
S402. Construct the Gibbs sampling inference model.
A factor graph is used as the Gibbs sampling inference model. The variables and factors of the factor graph are determined; the factors evaluate the relationships between variables.
S403. Construct the possible worlds of the variables from the predicates generated in step S202; these possible worlds are the basis of inference.
S404. Perform inference over the possible worlds of step S403 with the inference model constructed in step S402.
S405. Clean and repair the raw data set based on the inference results of step S404. For each datum to be repaired, the value with the maximum expectation is selected as the repaired value.
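Steps S401 to S405 can be illustrated with a small Gibbs sampler over a factor graph. The sketch below is an assumption-laden toy, not the patent's implementation: the variables are two uncertain cells, each factor is a weighted Boolean test on the current assignment, the weights are made up, and "maximum expectation" is interpreted as the value with the highest estimated marginal probability.

```python
import math
import random
from collections import Counter


def gibbs_repair(domains, factors, sweeps=2000, burn_in=200, seed=0):
    """Estimate per-variable marginals by Gibbs sampling and pick the mode.

    domains: dict variable -> list of candidate values
    factors: list of (weight, function(assignment) -> bool)
    """
    rng = random.Random(seed)
    state = {v: rng.choice(d) for v, d in domains.items()}
    counts = {v: Counter() for v in domains}

    def score(assignment):
        # Unnormalized log-probability: sum of weights of satisfied factors.
        return sum(w for w, f in factors if f(assignment))

    for sweep in range(sweeps):
        for var, dom in domains.items():
            # Conditional distribution of var given all other variables.
            scores = [score({**state, var: val}) for val in dom]
            top = max(scores)
            probs = [math.exp(s - top) for s in scores]
            state[var] = rng.choices(dom, weights=probs)[0]
        if sweep >= burn_in:
            for var in domains:
                counts[var][state[var]] += 1
    return {v: c.most_common(1)[0][0] for v, c in counts.items()}


# Two uncertain cells sharing a zip code; all weights are made up.
domains = {
    "t4.city": ["Chengdu", "Mianyang"],
    "t5.city": ["Chengdu", "Mianyang"],
}
factors = [
    (1.0, lambda a: a["t4.city"] == "Mianyang"),   # observed value of t4
    (1.0, lambda a: a["t5.city"] == "Chengdu"),    # observed value of t5
    (3.0, lambda a: a["t4.city"] == "Chengdu"),    # agreement with clean tuples
    (3.0, lambda a: a["t5.city"] == "Chengdu"),    # agreement with clean tuples
    (2.0, lambda a: a["t4.city"] == a["t5.city"]),  # same zip => same city
]
repaired = gibbs_repair(domains, factors)
print(repaired)
```

In this toy setting the rule-based factors outweigh the single observed "Mianyang" value, so the sampler spends most of its time in assignments where both cells are "Chengdu", and that value is chosen as the repair.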
In summary, the unsupervised automatic data cleaning method of the invention is based on statistical relational learning and requires no human intervention during cleaning, so it can greatly reduce the labor cost of data cleaning. Because rules are discovered automatically from raw data containing dirty data, data quality rules need not be formulated in advance. The method can effectively improve the effect of data cleaning and the accuracy of the data, and can also improve the efficiency of data cleaning.
It should be understood that the above embodiments are merely exemplary embodiments used to illustrate the principle of the invention, and the invention is not limited to them. Those of ordinary skill in the art can make various changes and modifications without departing from the spirit and essence of the invention, and such changes and modifications are also regarded as falling within the scope of protection of the invention.

Claims (8)

1. An unsupervised automatic data cleaning method, characterized by comprising the following steps:
A. Data model learning: dependencies between attributes are learned from raw data that may contain erroneous data; by finding implicit, non-absolute or relatively weak dependencies, a data model expressed as a Bayesian network is obtained;
B. Data cleaning rule generation: once the complete data model of the raw data or of a sample of the raw data has been obtained, data cleaning rules are generated, specifically predicates and first-order predicate rules;
C. A Markov logic network is generated from the predicates and first-order predicate rules generated in step B;
D. Inference rules are generated from the Markov logic network generated in step C, and the data are cleaned based on the inference results.
2. The unsupervised automatic data cleaning method according to claim 1, characterized in that step A specifically comprises:
A1. assessing and sampling the data to be repaired, i.e. the raw data that may contain erroneous data;
A2. learning from the raw data set or the sampled data set to obtain the structure of the data model, expressed as a Bayesian network;
A3. learning from the raw data set or the sampled data set to obtain the parameters of the data model, in the form of conditional probability tables for the dependencies;
A4. merging the structure and the parameters of the data model to obtain the complete data model.
3. The unsupervised automatic data cleaning method according to claim 2, characterized in that step B specifically comprises:
B1. defining the relationship constants used to express relationships between entities;
B2. generating the corresponding first-order predicate logic expressions from the complete data model obtained in step A4: specifically, predicates and first-order predicate rules, i.e. first-order predicate logic expressions, are generated from the learned Bayesian network, and separate conversion rules for turning a dependency into a first-order predicate logic expression are formulated for the case where a single attribute points to an attribute and the case where multiple attributes point to an attribute.
4. The unsupervised automatic data cleaning method according to claim 3, characterized in that, in step B2:
When a single attribute points to an attribute, i.e. there is a directed edge from attribute A1 to attribute A2, the dependency between A1 and A2 is formalized as the following first-order predicate logic expression:
where v is the value of tuples id1 and id2 on attribute A1;
When multiple attributes point to an attribute, i.e. attributes A1, A2, …, Ai all point to Aj, their dependency is formalized as the following first-order predicate logic expression:
where v1, v2, …, vi are the values of tuples id1 and id2 on attributes A1, A2, …, Ai.
5. The unsupervised automatic data cleaning method according to claim 3, characterized in that step C specifically comprises:
C1. dividing the generated first-order predicate rules into absolute rules and non-absolute rules;
C2. computing weights for the first-order predicate logic rules, with a different weight calculation strategy for absolute and non-absolute rules: the weight of an absolute rule is set to positive infinity, and the weight of a non-absolute rule is computed from mutual information;
C3. for the first-order predicate rules generated in step B2, computing the weight of each rule from the mutual information between the attributes the rule involves;
C4. obtaining the Markov logic network of the raw data set or the sampled data set from the weights computed in step C3.
6. The unsupervised automatic data cleaning method according to claim 5, characterized in that step C3 specifically comprises:
C3.1 formulating different rule weight calculation methods for the case where a first-order predicate logic rule involves two attributes and the case where it involves multiple attributes; wherein,
when a first-order predicate logic rule involves two attributes, the rule weight is computed from the mutual information of the two attributes on the raw data set or the sampled data set;
the mutual information is a real number between 0 and 1: if the attributes are perfectly correlated the mutual information is 1, and if they are completely uncorrelated it is 0;
C3.2 when computing rule weights, an exponential function is introduced to ensure that the weights are not less than 0.
7. The unsupervised automatic data cleaning method according to claim 6, characterized in that step D specifically comprises:
D1. performing inference on the Markov logic network generated in step C4 using Gibbs sampling, a Markov chain Monte Carlo method: the rules for Gibbs sampling inference are generated from the Markov logic network, and the weights of these inference rules are determined;
D2. constructing the Gibbs sampling inference model: a factor graph is used as the inference model, and its variables and factors are determined, where the factors evaluate the relationships between variables;
D3. constructing the possible worlds of the variables from the predicates generated in step B2;
D4. performing inference over the possible worlds of step D3 with the inference model constructed in step D2;
D5. cleaning and repairing the raw data set based on the inference results of step D4.
8. The unsupervised automatic data cleaning method according to claim 7, characterized in that, when repairing in step D5, the value with the maximum expectation is selected as the repaired value.
CN201811325335.5A 2018-11-08 2018-11-08 Unsupervised automatic data cleaning method Active CN109491991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811325335.5A CN109491991B (en) 2018-11-08 2018-11-08 Unsupervised automatic data cleaning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811325335.5A CN109491991B (en) 2018-11-08 2018-11-08 Unsupervised automatic data cleaning method

Publications (2)

Publication Number Publication Date
CN109491991A true CN109491991A (en) 2019-03-19
CN109491991B CN109491991B (en) 2022-03-01

Family

ID=65695410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811325335.5A Active CN109491991B (en) 2018-11-08 2018-11-08 Unsupervised automatic data cleaning method

Country Status (1)

Country Link
CN (1) CN109491991B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117610541A (en) * 2024-01-17 2024-02-27 之江实验室 Author disambiguation method and device for large-scale data and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046559A (en) * 2015-09-10 2015-11-11 河海大学 Bayesian network and mutual information-based client credit scoring method
KR20160115515A (en) * 2015-03-27 2016-10-06 금오공과대학교 산학협력단 A user behavior prediction System and Method for using mobile-based Life log
CN106094744A (en) * 2016-06-04 2016-11-09 上海大学 The determination method of thermoelectricity factory owner's operational factor desired value based on association rule mining
CN106528634A (en) * 2016-10-11 2017-03-22 武汉理工大学 Mass RFID (Radio Frequency Identification) data intelligent cleaning method and system oriented to workshop manufacturing process
CN106779087A (en) * 2016-11-30 2017-05-31 福建亿榕信息技术有限公司 A kind of general-purpose machinery learning data analysis platform
US20180165554A1 (en) * 2016-12-09 2018-06-14 The Research Foundation For The State University Of New York Semisupervised autoencoder for sentiment analysis
CN108304668A (en) * 2018-02-11 2018-07-20 河海大学 A kind of Forecasting Flood method of combination hydrologic process data and history priori data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
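SUSHOVAN DE et al.: "BayesWipe: A multimodal system for data cleaning and consistent query answering on structured bigdata", 《2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)》 *
DUAN LIANG (段亮): "Data cleaning based on probabilistic graphical models", 《China Master's Theses Full-text Database, Information Science and Technology》 *
WANG YONG (王勇): "Knowledge representation and automatic extraction based on first-order logic", 《China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology》 *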

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117610541A (en) * 2024-01-17 2024-02-27 之江实验室 Author disambiguation method and device for large-scale data and readable storage medium
CN117610541B (en) * 2024-01-17 2024-06-11 之江实验室 Author disambiguation method and device for large-scale data and readable storage medium

Also Published As

Publication number Publication date
CN109491991B (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN110335168B (en) Method and system for optimizing power utilization information acquisition terminal fault prediction model based on GRU
JP6176979B2 (en) Project management support system
CN110910004A (en) Reservoir dispatching rule extraction method and system with multiple uncertainties
CN112559963B (en) Dynamic parameter identification method and device for power distribution network
CN110083728B (en) Method, device and system for optimizing automatic picture data cleaning quality
CN111814342B (en) Complex equipment reliability hybrid model and construction method thereof
CN108536471A (en) Method for identifying important modules of a software structure based on complex networks
CN107247666A (en) Software defect number prediction method based on feature selection and ensemble learning
CN106951963B (en) Knowledge refining method and device
CN109548029A (en) Two-stage method for trust evaluation of nodes in wireless sensor networks
CN115775026B (en) Federal learning method based on tissue similarity
Wang et al. On the use of time series and search based software engineering for refactoring recommendation
CN109491991A (en) Unsupervised automatic data cleaning method
CN109947752A (en) Automatic data cleaning method based on DeepDive
Pedro et al. Decision-maker preference modeling in interactive multiobjective optimization
Pan et al. Black-box test-coverage analysis and test-cost reduction based on a Bayesian network model
CN117370744A (en) Dynamic cleaning method and system for abnormal power consumption data of power consumer
CN117575564A (en) Extensible infrastructure network component maintenance and transformation decision evaluation method and system
Lucchese et al. Networks cardinality estimation using order statistics
CN104360948A (en) IEC 61850 configuration file engineering consistency test method based on fuzzy algorithm
Borgelt A conditional independence algorithm for learning undirected graphical models
Munikoti et al. Bayesian graph neural network for fast identification of critical nodes in uncertain complex networks
Chan et al. Using genetic programming for developing relationship between engineering characteristics and customer requirements in new products
Ibraigheeth et al. Software reliability prediction in various software development stages
CN112699229A (en) Self-adaptive question-pushing method based on deep learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant