CN102346873A - Multi-standard information processing method of uncertain data - Google Patents

Multi-standard information processing method of uncertain data

Info

Publication number
CN102346873A
Authority
CN
China
Prior art keywords
criteria information
multiple criteria
uncertain data
wsd
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010102405413A
Other languages
Chinese (zh)
Other versions
CN102346873B (en
Inventor
黄震华
向阳
张波
陈千
王栋
刘立平
伍申申
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN 201010240541
Publication of CN102346873A
Application granted
Publication of CN102346873B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a multi-criteria information processing method for uncertain data, comprising the following steps: (1) a multi-criteria information query over the uncertain data is equivalently rewritten; (2) the multi-criteria information queries on the probabilistic relation components are optimized; (3) a query optimizer generates a multi-criteria information query execution plan on the probabilistic relation components; and (4) a query processor executes the multi-criteria information query over the uncertain data according to the plan generated in step (3), and the result is shown on a display. Compared with the prior art, the method has advantages such as improving the economic benefits and market competitiveness of an enterprise.

Description

A multi-criteria information processing method for uncertain data
Technical field
The present invention relates to a multi-criteria information processing method, and in particular to a multi-criteria information processing method for uncertain data.
Background art
Quickly extracting useful information from massive data, and using it to give every management level of an enterprise effective decision support, is an important means of improving the enterprise's economic benefits and market competitiveness. In recent years, researchers have provided high-quality decision support to enterprises mainly through online analytical processing (OLAP) and data mining. OLAP explores and drills into the whole body of enterprise data through a series of complex multidimensional online queries (such as Top-n, KNN, Rank, Range and iceberg queries) and returns summary information about the massive data, so that the user can complete the relevant analysis from a small amount of summary data. However, Professor R. Agrawal of the IBM Almaden Research Center pointed out at the SIGMOD international conference in 2000 that traditional multidimensional online queries require the user to supply a preference weight vector on the analysis space in advance, which is impractical in real applications. Traditional multidimensional online query techniques therefore cannot be applied to fields that require weight-free analysis. To support weight-free analysis for enterprise users, Professor S. Borzsonyi of the University of Passau, Germany, first proposed the concept and technique of the multi-criteria information query at the ICDE international conference in 2001. By defining a dominance operator on the analysis space, the multi-criteria information query returns the object tuples in the enterprise data that lie at the top of each dominance order chain. At present, multi-criteria information query techniques are widely used in fields such as intelligent analysis, city navigation systems, data mining and visualization, intelligent defense systems and geographic information systems.
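The dominance operator described above can be sketched as follows. This is a minimal illustration, assuming that smaller values are preferred on every dimension and that a simple window-based scan is acceptable; the `hotels` data and the function names are hypothetical and do not come from the patent.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(a: Sequence[float], b: Sequence[float], dims: Sequence[int]) -> bool:
    """True if a is no worse than b on every chosen dimension and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)

def multi_criteria_query(points: List[Point], dims: Sequence[int]) -> List[Point]:
    """Return the points that no other point dominates on `dims`."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p, dims) for w in window):
            continue  # p is dominated by a current candidate, discard it
        # p survives: drop the candidates that p dominates, then keep p
        window = [w for w in window if not dominates(p, w, dims)]
        window.append(p)
    return window

# Hypothetical data: (price, distance to the city center)
hotels = [(50.0, 8.0), (60.0, 2.0), (80.0, 1.0), (90.0, 3.0), (55.0, 9.0)]
result = multi_criteria_query(hotels, dims=[0, 1])
print(result)
```

No preference weights are supplied: the caller only chooses the analysis space through `dims`, which is the property the background section attributes to the multi-criteria information query.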
As enterprises' data collection needs deepen and grow, uncertain data has received wide attention. In most real industries (for example advanced manufacturing, logistics, finance, telecommunications and aerospace), limits on the precision of data acquisition equipment and the fuzziness and incompleteness of the data itself make uncertainty pervasive in enterprise data sources, so uncertain data plays a crucial role. Uncertain data requires probability distribution information on relation tables and attribute fields, together with possible worlds semantics. An uncertain database is therefore more complex than a traditional relational database in its data model, algebraic operation rules, functional dependencies, data storage and query semantics, and the multi-criteria information query techniques designed for traditional relational databases cannot be applied to it directly. For example, Professor L. Antova pointed out at the ICDE international conference in 2008 that the analysis cost of the BNL algorithm is PTIME on a traditional relational database, whereas its cost in the MayBMS uncertain database system developed at Cornell University is coNP-complete.
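To make possible worlds semantics concrete, the sketch below enumerates every possible world of a tiny uncertain relation and sums the probability of the worlds in which a tuple belongs to the query result. It assumes independent tuples that each carry an existence probability, which is a simplification chosen for illustration and not the patent's data model; the `rows` table is hypothetical.

```python
from itertools import product
from typing import Iterator, List, Tuple

# Hypothetical uncertain relation: (name, price, distance, existence probability).
# Assumption: tuples exist independently of each other.
Row = Tuple[str, float, float, float]
rows: List[Row] = [("a", 50.0, 8.0, 0.5), ("b", 60.0, 2.0, 0.8), ("c", 90.0, 3.0, 0.9)]

def dominates(x: Row, y: Row) -> bool:
    """x is no worse than y on price and distance, and better on one of them."""
    return x[1] <= y[1] and x[2] <= y[2] and (x[1] < y[1] or x[2] < y[2])

def possible_worlds(table: List[Row]) -> Iterator[Tuple[List[Row], float]]:
    """Yield (world, probability) for every subset of tuples (2**n worlds)."""
    for mask in product([False, True], repeat=len(table)):
        prob, world = 1.0, []
        for present, row in zip(mask, table):
            prob *= row[3] if present else 1.0 - row[3]
            if present:
                world.append(row)
        yield world, prob

def result_probability(target: Row, table: List[Row]) -> float:
    """Probability that `target` exists and is undominated in a random world."""
    return sum(prob for world, prob in possible_worlds(table)
               if target in world and not any(dominates(o, target) for o in world))

print(result_probability(rows[2], rows))
```

Tuple "c" is dominated only by "b", so it is in the result exactly when "c" exists and "b" does not. The enumeration grows as 2**n, which illustrates why the number of instances is exponential and why naive evaluation does not scale.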
At present, multi-criteria information query techniques for uncertain data have four main deficiencies:
(1) The prior art does not consider the underlying data representation system of uncertain data; it simply stores the uncertain data in a single relational table carrying probability distribution information, which is unrealistic in practice.
(2) The prior art designs multi-criteria information query algorithms on uncertain data only for a fixed analysis space, and the R-tree, kd-tree and AR-tree indexes it uses are all of scalar type and cannot be extended to scenarios with an arbitrary analysis space. In practice, however, a weight-free multi-criteria information query must serve any analysis space a user chooses.
(3) The prior art does not integrate the multi-criteria information query into the query optimizers of current mainstream uncertain databases (such as the U-Relational, ULDB and UDBMS databases). Consequently, when a multi-criteria information query on an uncertain database involves algebraic operations such as Conf, Merge and Ujoin, the query optimizer cannot provide an effective query execution plan, which seriously lengthens the user's waiting time.
(4) The prior art does not fully consider the efficiency of computing probability information for the multi-criteria information object set. The probability computation methods provided by the related art have #P-hard time complexity, so in real applications the time spent computing the probability information of the multi-criteria information objects alone is intolerable to users.
Summary of the invention
The object of the present invention is to overcome the defects of the above prior art by providing a multi-criteria information processing method for uncertain data that improves the economic benefits and market competitiveness of enterprises.
The object of the present invention can be achieved through the following technical scheme:
A multi-criteria information processing method for uncertain data, characterized in that it comprises the following steps:
(1) equivalently rewriting the multi-criteria information query over uncertain data;
(2) optimizing the multi-criteria information queries on the probabilistic relation components;
(3) generating, by a query optimizer, a multi-criteria information query execution plan on the probabilistic relation components;
(4) executing, by a query processor, the multi-criteria information query over the uncertain data according to the plan generated in step (3), and showing the result on a display.
Equivalently rewriting the multi-criteria information query over uncertain data in step (1) comprises the following steps:
1) a relation object with multiple probabilistic relation table instances is organized into a G-Tabset information table with constraint conditions
Figure BSA00000210437500031
;
2)
Figure BSA00000210437500032
is decomposed in polynomial time into a number of probabilistic relation components, each of which is the conjunction of several probabilistic relation table instances. Here the probabilistic relation component set is W = {WSD1, ..., WSDn}, where WSDi is the i-th probabilistic relation component, and the Datalog rule set is D = {DL1, ..., DLn}, where DLi has the form WSDi :- Insi1 ^ ... ^ Insim and states that the probabilistic relation component WSDi is formed by the conjunction of m probabilistic relation table instances.
The process of optimizing the multi-criteria information queries on the probabilistic relation components in step (2) is as follows:
After step (1), u multi-criteria information queries exist in the system
Figure BSA00000210437500033
where the input parameter of each query
Figure BSA00000210437500034
is a probabilistic relation component WSD'i ∈ W. The system does not directly compute the multi-criteria information query result sets on these u probabilistic relation components WSD'1, ..., WSD'u. Instead, it selects from the probabilistic relation component set W = {WSD1, ..., WSDn}, in a cost-based manner, an optimal set of v (v < u) probabilistic relation components WSD"1, ..., WSD"v, where the multi-criteria information object set of each WSD"i (1 ≤ i ≤ v) is used to answer the multi-criteria information queries on several of the components WSD'1, ..., WSD'u.
The process by which the query optimizer generates the multi-criteria information query execution plan on the probabilistic relation components in step (3) is as follows:
1) a set of correct equivalence transformation rules is designed for the execution order between the multi-criteria information query operation and the various uncertain relational operations;
2) the query optimizer obtains the multi-criteria information query execution plan based on the equivalence transformation rule set.
The process by which the query processor in step (4) executes the multi-criteria information query over the uncertain data according to the plan generated in step (3) is as follows:
1) the query processor organizes and indexes the uncertain relation objects so that the system can quickly obtain the multi-criteria information object set on any analysis space;
2) the query processor computes the existence probability of the multi-criteria information object set under possible worlds semantics.
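One way to organize objects so that any analysis space can be served is a regular grid, in which whole cells can be compared before individual objects are. The sketch below is only an assumption-laden illustration: it treats objects as certain points with smaller values preferred, and `build_grid`, `cell_dominates` and `surviving_cells` are hypothetical names rather than the patent's algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]
Point = Tuple[float, ...]

def build_grid(points: List[Point], cell_width: float) -> Dict[Cell, List[Point]]:
    """Assign each point to the grid cell given by its per-dimension index."""
    grid: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        grid[tuple(int(v // cell_width) for v in p)].append(p)
    return dict(grid)

def cell_dominates(a: Cell, b: Cell, dims: Sequence[int]) -> bool:
    """If a's index is strictly smaller on every chosen dimension, every point
    in cell a is strictly better than every point in cell b on `dims`."""
    return all(a[d] < b[d] for d in dims)

def surviving_cells(grid: Dict[Cell, List[Point]], dims: Sequence[int]) -> List[Cell]:
    """Cells not dominated as a whole; only their points need pairwise checks."""
    cells = list(grid)
    return [c for c in cells if not any(cell_dominates(o, c, dims) for o in cells)]

points = [(1.0, 1.0), (5.0, 5.0), (1.0, 7.0), (7.0, 1.0), (6.0, 6.0)]
grid = build_grid(points, cell_width=2.0)
print(sorted(surviving_cells(grid, dims=[0, 1])))
print(sorted(surviving_cells(grid, dims=[0])))
```

Because the grid keeps one index per dimension, the same structure answers queries on any subset `dims` without rebuilding. Points inside surviving cells still need object-level dominance checks.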
Compared with the prior art, the present invention takes improving the efficiency of multi-criteria information queries on enterprise uncertain data as its core. It designs a set of multi-criteria information query techniques and implementation algorithms that suit possible worlds semantics, effectively handle users' arbitrary analysis space requirements, and integrate seamlessly with the optimizer and processor of uncertain database products. This gives every management level of an enterprise effective decision support and can thereby improve the enterprise's economic benefits and market competitiveness.
Description of drawings
Fig. 1 is a flow chart of the present invention;
Fig. 2 is a schematic diagram of the hardware structure of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments.
Embodiment 1
As shown in Fig. 1 and Fig. 2, a multi-criteria information processing method for uncertain data comprises the following steps:
Step 1: the preprocessor 1 equivalently rewrites the multi-criteria information query over uncertain data.
Because current mainstream uncertain database products (such as the MayBMS, ULDB and UDBMS databases) are all based on the possible worlds model, they need to store an exponential number of probabilistic relation table instances for each relation object of a traditional database. To ease management and querying, these products store uncertain data in two stages. In the first stage, a relation object with multiple probabilistic relation table instances is organized into a G-Tabset information table with constraint conditions
Figure BSA00000210437500041
In the second stage,
Figure BSA00000210437500042
is decomposed in polynomial time into a number of probabilistic relation components, each of which is the conjunction of several probabilistic relation table instances. In many real applications, an enterprise usually does not store the large-scale probabilistic relation table instances, but only two kinds of equivalent data: first, a small-scale probabilistic relation component set W = {WSD1, ..., WSDn}; second, a Datalog rule set D = {DL1, ..., DLn}, where DLi has the form WSDi :- Insi1 ^ ... ^ Insim and states that the probabilistic relation component WSDi is formed by the conjunction of m probabilistic relation table instances.
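The two kinds of stored data can be pictured with a small sketch. It assumes that a rule is written as plain text in the form `WSDi :- Insi1 ^ ... ^ Insim`; the `DatalogRule` class, `parse_rule` function and the instance names are hypothetical and serve only to illustrate the structure.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

@dataclass(frozen=True)
class DatalogRule:
    head: str             # component name, for example "WSD1"
    body: FrozenSet[str]  # table instances joined by conjunction

    def __str__(self) -> str:
        return f"{self.head} :- {' ^ '.join(sorted(self.body))}"

def parse_rule(text: str) -> DatalogRule:
    """Parse 'WSD1 :- Ins11 ^ Ins12' into a head and a set of instances."""
    head, body = text.split(":-")
    return DatalogRule(head.strip(), frozenset(t.strip() for t in body.split("^")))

# Rule set D; the mapping below records, per component in W, which
# instances it stands for (the component data itself is not modeled here).
D: List[DatalogRule] = [
    parse_rule("WSD1 :- Ins11 ^ Ins12"),
    parse_rule("WSD2 :- Ins21 ^ Ins22 ^ Ins23"),
]
W: Dict[str, FrozenSet[str]] = {rule.head: rule.body for rule in D}

for rule in D:
    print(rule)
```

Keeping only W and D means the exponential set of table instances is never materialized; a query over instances has to be translated through the rules, which is the job of the rewriting step.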
When analyzing enterprise business information, the user, for ease of understanding, usually submits a multi-criteria information query SQ whose input parameters are probabilistic relation table instances, that is, a query of the form:
Figure BSA00000210437500043
whose semantics is to obtain the multi-criteria information object set on the input data Ins1 ^ ... ^ Insz. However, mainstream uncertain database products usually do not store the large-scale probabilistic relation table instances and keep only the probabilistic relation component set and the Datalog rule set. Therefore, so that the query optimizer can recognize and parse multi-criteria information query statements on uncertain data, the present invention designs, in the enterprise's intermediate application layer, an efficient equivalent rewriting algorithm that converts SQ equivalently into multi-criteria information queries on several probabilistic relation components.
Step 2: the multi-criteria information queries on the probabilistic relation components are optimized.
In the first step of the invention, the multi-criteria information query on a conjunction of probabilistic relation table instances is equivalently rewritten as a conjunction of multi-criteria information queries on several probabilistic relation components. After the equivalent rewriting, therefore, u multi-criteria information queries exist in the system
Figure BSA00000210437500051
where the input parameter of each query is a probabilistic relation component WSD'i ∈ W. To return the complete multi-criteria information query result set, a direct approach is to compute the result sets of the u queries on their probabilistic relation components one by one. Although this direct approach is simple to implement, it has two serious performance deficiencies. First, because the multi-criteria information query is CPU-intensive, the direct approach incurs a large CPU time overhead. Second, in enterprise-level applications each probabilistic relation component usually occupies a large amount of storage, so loading the u probabilistic relation components from disk into memory incurs a large I/O overhead.
To overcome these two performance deficiencies, the simple approach needs to be improved in the application layer of the enterprise system. The present invention designs a shared processing mechanism and proves theoretically that, in the optimal case, it saves 1/e ≈ 37% of the CPU time overhead and (e-1)/e ≈ 63% of the I/O overhead compared with the direct approach. The principle of the shared processing mechanism is as follows: the system does not directly compute the multi-criteria information query result sets on the u probabilistic relation components WSD'1, ..., WSD'u. Instead, it selects from the probabilistic relation component set W = {WSD1, ..., WSDn}, in a cost-based manner, an optimal set of v (v < u) probabilistic relation components WSD"1, ..., WSD"v, where the multi-criteria information object set of each WSD"i (1 ≤ i ≤ v) can be used to answer the multi-criteria information queries on several of the components WSD'1, ..., WSD'u. As a result, fewer multi-criteria information queries are executed and smaller probabilistic relation components serve as input data.
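The two percentages quoted above follow directly from the constant e, as the short check below shows. It only verifies the arithmetic; the underlying optimality proof is not reproduced in the text and is not attempted here.

```python
import math

# Fractions quoted for the optimal case of the shared processing mechanism.
cpu_saving = 1 / math.e              # share of CPU time saved
io_saving = (math.e - 1) / math.e    # share of I/O overhead saved

print(f"CPU saving: {cpu_saving:.4f}")
print(f"I/O saving: {io_saving:.4f}")
```

The two fractions are complementary: together they sum to 1.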
Step 3: the query optimizer 2 generates the multi-criteria information query execution plan on the probabilistic relation components.
After the system application layer obtains the v optimized probabilistic relation components WSD"1, ..., WSD"v, and before the multi-criteria information object set on each component WSD"i is computed, the query optimizer needs to produce an effective multi-criteria information query execution plan at the logical level. From the standpoint of minimizing execution cost, this plan must specify the execution order between the multi-criteria information query operation
Figure BSA00000210437500053
and the uncertain relational operations (such as Uselection, Conf, Merge and Ujoin).
Existing uncertain database query optimizers contain no set of equivalence transformation rules for the execution order between the multi-criteria information query operation and the uncertain relational operations. To obtain a correct multi-criteria information object set, the query optimizer therefore simply places the multi-criteria information query operation at the root of the left-deep conjunctive tree (Left-deep Conjunctive Tree) and derives the execution plan from the order of the operation nodes in the tree. Analysis shows a major drawback of such a plan: because the multi-criteria information query operation can run only after all uncertain relational operations have finished, the system must execute it on large-scale temporarily stored data, which makes the multi-criteria information query extremely inefficient.
To overcome this drawback of the query optimizer, the present invention designs a set of equivalence transformation rules for the execution order between the multi-criteria information query operation
Figure BSA00000210437500061
and the various uncertain relational operations, together with an assessment of the cost before and after each transformation, and proves the correctness of the rule set theoretically. In addition, given a multi-criteria information query
Figure BSA00000210437500062
and the uncertain relational operations it involves (such as Uselection, Conf, Merge and Ujoin), the present invention uses the equivalence transformation rule set to modify the left-deep conjunctive tree, thereby improving the multi-criteria information query execution plan provided by the query optimizer.
Step 4: the query processor 3 executes the multi-criteria information query over the uncertain data according to the plan generated in step (3), and the result is shown on the display 4.
The query optimizer submits the generated multi-criteria information query execution plan to the query processor. The query processor then executes the query at the physical level according to the plan and obtains the multi-criteria information object set and its existence probability under possible worlds semantics. If existing methods for multi-criteria information queries on uncertain data were integrated into the query processor, at least the following two problems would arise in practice:
1. Existing methods target only a fixed analysis space, and the R-tree, kd-tree and AR-tree indexes they use are all of scalar type. Because a scalar index structure maps the coordinates of a multidimensional space to one-dimensional real values, it loses most positional information and cannot be extended to scenarios with an arbitrary analysis space.
2. Existing methods do not fully consider the efficiency of computing the existence probability of the multi-criteria information object set under possible worlds semantics; their time complexity for obtaining the existence probability values is #P-hard. The time the system spends on this task alone is therefore intolerable to users.
So that the query processor can provide an effective physical implementation of the multi-criteria information query, given an arbitrary input data set Ψ and a multi-criteria information query on an analysis space U
Figure BSA00000210437500063
, the present invention designs efficient algorithms that quickly obtain from Ψ the multi-criteria information object set on U and its existence probability under possible worlds semantics.
Embodiment 2
After an enterprise user submits a multi-criteria information query SQ on an arbitrary analysis space to a heterogeneous uncertain database, the present invention, centered on the probabilistic relation component set W = {WSD1, ..., WSDn} and the Datalog rule set D = {DL1, ..., DLn} stored in the uncertain database, first equivalently rewrites SQ and generates multi-criteria information queries on u probabilistic relation components, where
Figure BSA00000210437500072
is the multi-criteria information query on the probabilistic relation component WSD'i ∈ W, and u ≤ n. Then the shared processing mechanism selects from the probabilistic relation component set W = {WSD1, ..., WSDn}, in a cost-based manner, an optimal set of v (v < u) probabilistic relation components WSD"1, ..., WSD"v, where the multi-criteria information object set of each WSD"i (1 ≤ i ≤ v) can be used to answer the multi-criteria information queries on several of the components WSD'1, ..., WSD'u. Next, the present invention defines an equivalence transformation rule set whose correctness is rigorously proven. The rule set contains equivalence transformation rules for the execution order between the multi-criteria information query operation
Figure BSA00000210437500073
and the various uncertain relational operations, and for each optimized multi-criteria information query
Figure BSA00000210437500074
the rule set is used to modify the left-deep conjunctive tree and obtain the best multi-criteria information query execution plan. Finally, for the best execution plan, the present invention efficiently generates the multi-criteria information object set and its existence probability under possible worlds semantics.
From the user's submission of a multi-criteria information query SQ on an arbitrary analysis space to the generation of the multi-criteria information object set and its existence probability under possible worlds semantics, the present invention is carried out in four steps.
For the first step (equivalently rewriting the multi-criteria information query over uncertain data), the present invention first adopts Datalog and first-order predicate logic as the rewriting description language and uses them to define the formal semantics of equivalent rewriting of multi-criteria information queries on uncertain data. The equivalent rewriting is then completed in two phases. In the first phase, based on the Datalog rule set D, the present invention uses the inverse rules technique proposed by Professor A. Levy of Princeton University to filter out, in polynomial time, the set M of probabilistic relation components irrelevant to the query SQ. In the second phase, the present invention first determines the minimum number u of probabilistic relation components needed to equivalently rewrite SQ. Then, on the component set W - M, with predicate isomorphism as the filtering feature, it uses the Apriori property to obtain all candidate subsets of probabilistic relation components whose cardinality equals u, and chooses any one subset Ω' = {WSD'1, ..., WSD'u} that satisfies the equivalent rewriting extension condition as the output of the algorithm, namely
Figure BSA00000210437500075
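The two-phase structure described above can be illustrated with a deliberately simplified sketch. It replaces the inverse rules technique, predicate isomorphism and Apriori-based candidate generation with plain set operations and brute-force enumeration, and it assumes that a rewriting is valid when the chosen components' rule bodies together equal the query's instances. All names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple

def rewrite_query(query_instances: FrozenSet[str],
                  rules: Dict[str, FrozenSet[str]]) -> Optional[Tuple[str, ...]]:
    """Return a smallest set of components whose bodies exactly cover the query."""
    # Phase 1 (simplified filter): drop components that mention an instance
    # outside the query, since they cannot take part in an exact rewriting.
    relevant = {name: body for name, body in rules.items() if body <= query_instances}
    # Phase 2: try increasing cardinalities u and return the first exact cover.
    names = sorted(relevant)
    for u in range(1, len(names) + 1):
        for subset in combinations(names, u):
            covered = frozenset().union(*(relevant[n] for n in subset))
            if covered == query_instances:
                return subset
    return None

# Hypothetical rule bodies: WSDi :- conjunction of the listed instances.
rules = {
    "WSD1": frozenset({"Ins1", "Ins2"}),
    "WSD2": frozenset({"Ins3"}),
    "WSD3": frozenset({"Ins2", "Ins3"}),
    "WSD4": frozenset({"Ins4", "Ins5"}),
}
query = frozenset({"Ins1", "Ins2", "Ins3"})
print(rewrite_query(query, rules))
```

Here WSD4 is filtered out in the first phase, no single component covers the query, and a pair of components does, so u = 2 in this toy case.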
For the second step (optimizing the multi-criteria information queries on the probabilistic relation components), to improve efficiency the present invention considers two cases for u. (i) When u ≤ 6, the system usually needs only a small cost to complete the multi-criteria information query optimization task. In this case, the present invention first constructs a weighted directed bipartite graph: the probabilistic relation component sets W and S are mapped to the vertex sets of the bipartite graph, and, based on the multi-criteria information query cost model, the cost information derived between the probabilistic relation components is mapped onto the vertex set and edge set of the bipartite graph. The present invention then converts the multi-criteria information query optimization problem on the probabilistic relation components equivalently into a minimum weighted set cover (Minimum Weighted Set Cover) problem on the bipartite graph, and from W obtains exactly the v optimal probabilistic relation components. (ii) According to graph theory, the minimum weighted set cover problem is NP-complete. Therefore, when u > 6, the weighted directed bipartite graph grows rapidly and the system needs a large cost to complete the optimization task. In this case, based on graph shortest-path theory, the present invention first introduces a virtual vertex and converts the weighted directed bipartite graph constructed in (i) into a Steiner weighted path graph in constant time. Then, building on OLAP materialized view selection techniques, it generates a directed Steiner tree in the path graph in polynomial time and thereby obtains an approximately optimal solution of the multi-criteria information query optimization problem. According to directed Steiner tree theory, the time complexity PT of the approximation algorithm and the optimization lower bound OB can be traded off through a positive number not less than 1.
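For the small case, the minimum weighted set cover formulation can be sketched by exhaustive search. The example assumes that each candidate component has a known cost and a known set of queries it can answer; these numbers, the candidate names and `min_weight_cover` are hypothetical, and the bipartite graph construction and the Steiner tree approximation are not modeled.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple

Candidate = Tuple[float, FrozenSet[str]]  # (cost, queries the component can answer)

def min_weight_cover(queries: FrozenSet[str],
                     candidates: Dict[str, Candidate]
                     ) -> Tuple[Optional[Tuple[str, ...]], float]:
    """Exhaustively find the cheapest set of candidates answering every query."""
    best_cost, best_pick = float("inf"), None
    names = sorted(candidates)
    for k in range(1, len(names) + 1):
        for pick in combinations(names, k):
            covered = frozenset().union(*(candidates[n][1] for n in pick))
            cost = sum(candidates[n][0] for n in pick)
            if queries <= covered and cost < best_cost:
                best_cost, best_pick = cost, pick
    return best_pick, best_cost

# u = 4 queries; five hypothetical candidate components with costs.
queries = frozenset({"q1", "q2", "q3", "q4"})
candidates = {
    "A": (5.0, frozenset({"q1", "q2"})),
    "B": (4.0, frozenset({"q3", "q4"})),
    "C": (10.0, frozenset({"q1", "q2", "q3", "q4"})),
    "D": (3.0, frozenset({"q2", "q3"})),
    "E": (3.0, frozenset({"q1"})),
}
pick, cost = min_weight_cover(queries, candidates)
print(pick, cost)
```

The search examines every subset of candidates, so its cost doubles with each added candidate. That growth is why an exact method is reasonable only for small u and an approximation is used otherwise.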
For the third step (the query optimizer generates the multi-criteria information query execution plan on the probabilistic relation components), the present invention completes two main tasks: (i) designing a set of correct equivalence transformation rules for the execution order between the multi-criteria information query operation and the various uncertain relational operations; (ii) obtaining an effective multi-criteria information query execution plan based on the equivalence transformation rule set. In (i), building on the relational algebra theory of relational databases, the present invention designs 39 equivalence transformation rules whose correctness is rigorously proven. These rules define the laws that hold between the multi-criteria information query operation and the various uncertain relational operations (such as Uselection, Merge, UProjection and Ujoin), for example the commutative law, the associative law, grouping rules and duplicate elimination rules, and through these laws they support equivalence transformations between different operation execution orders. In (ii), the present invention takes the simple multi-criteria information query execution plan provided by the query optimizer as the starting point for optimization. Using the different equivalence transformation rules on the left-deep conjunctive tree, it generates candidate operation execution sequences through strategies such as moving operation nodes up or down, merging or splitting operation nodes, and transforming operation nodes. It then computes the time overhead of each candidate sequence with a cost evaluator based on the multi-criteria information query cost model, and chooses the optimal sequence to produce the execution plan with minimum cost.
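The generate-and-cost loop in (ii) can be sketched as follows. Everything numeric here is an assumption: the `COMMUTE` table is a placeholder for the patent's rule set (which swaps are actually valid depends on the proven rules, which the text does not list), and the `STATS` cost model is hypothetical. A plan is written as the bottom-up order of operators along a left-deep tree, with `SQ` standing for the multi-criteria information query operation.

```python
from typing import Dict, FrozenSet, Set, Tuple

Plan = Tuple[str, ...]

# Hypothetical statistics per operator: (cost per input row, output/input ratio).
STATS: Dict[str, Tuple[float, float]] = {
    "Ujoin": (4.0, 2.0), "Uselection": (1.0, 0.5), "SQ": (1.5, 0.1),
}
# Placeholder rule table: operator pairs assumed to be swappable.
COMMUTE: Set[FrozenSet[str]] = {
    frozenset({"SQ", "Uselection"}), frozenset({"SQ", "Ujoin"}),
}

def plan_cost(plan: Plan, input_rows: float) -> float:
    """Sum each operator's cost on the rows that reach it."""
    rows, total = input_rows, 0.0
    for op in plan:
        unit, ratio = STATS[op]
        total += rows * unit
        rows *= ratio
    return total

def candidates(start: Plan) -> Set[Plan]:
    """All plans reachable by swapping adjacent operators allowed by COMMUTE."""
    seen, frontier = {start}, [start]
    while frontier:
        plan = frontier.pop()
        for i in range(len(plan) - 1):
            if frozenset(plan[i:i + 2]) in COMMUTE:
                swapped = plan[:i] + (plan[i + 1], plan[i]) + plan[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    frontier.append(swapped)
    return seen

def best_plan(start: Plan, input_rows: float) -> Plan:
    return min(candidates(start), key=lambda p: (plan_cost(p, input_rows), p))

start: Plan = ("Ujoin", "Uselection", "SQ")  # SQ at the root, executed last
best = best_plan(start, 1000.0)
print(best, plan_cost(best, 1000.0), plan_cost(start, 1000.0))
```

With these made-up numbers, moving `SQ` below the other operators shrinks the intermediate data early, which is the effect the text attributes to moving the query operation away from the root.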
In the fourth step (the query processor executes the multi-criteria information query over the uncertain data according to the plan generated in step (3) and shows the result on a display), the invention likewise completes two main pieces of work: (i) organizing and indexing the uncertain relation objects effectively, so that the system can quickly obtain the multi-criteria information object set on any analysis space; and (ii) efficiently computing the probability that a multi-criteria information object set exists under possible-world instance semantics. In (i), the invention designs a regular grid index structure (Regular Grid Index) to organize and index the uncertain relation objects; this index structure can serve multi-criteria information queries on any analysis space. The time cost of obtaining the multi-criteria information object set is then reduced significantly in two stages. In the first stage, the invention applies the minimum description length (MDL: Minimum Description Length) principle to automatically delete from the regular grid the possible-world instances that the user is not interested in. In the second stage, it uses the dominance and mutual-exclusion relations between cells to reduce the number of comparisons between possible-world instances in the regular grid. In (ii), the invention equivalently converts the computation of this existence probability into counting the true assignments (satisfying assignments) of a formula in disjunctive normal form (DNF), and designs two different methods to solve this problem. The first method uses the Davis-Putnam procedure from the field of artificial intelligence to obtain the exact number of true assignments. Given a DNF formula, the invention uses Davis-Putnam to decompose it into several independent sub-formulas that share no variables, and counts the true assignments exactly by recursion. The second method uses the Karp-Luby randomized algorithm to obtain, in polynomial time, an approximation of the number of true assignments with a guaranteed lower bound on precision. The Karp-Luby algorithm is based on the Monte Carlo idea and determines the approximation through N steps of random simulation.
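The two counting methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a DNF formula is represented as a list of clauses, each clause a dict that maps a variable name to the truth value it requires; `exact_count` branches recursively on one variable at a time in the Davis-Putnam style and omits the decomposition into variable-disjoint sub-formulas for brevity; `karp_luby` follows the standard Karp-Luby estimator and assumes at least one clause. The function names and the example formula are hypothetical.

```python
import random

def exact_count(clauses, variables):
    """Exact number of true assignments of a DNF, by branching on a variable."""
    if any(len(c) == 0 for c in clauses):  # an empty conjunction is always true
        return 2 ** len(variables)
    if not clauses:                        # no clause left, nothing satisfies
        return 0
    x = next(iter(clauses[0]))             # branch on a variable of the first clause
    rest = variables - {x}
    total = 0
    for value in (False, True):
        # Drop clauses that contradict x = value, remove x from the others.
        reduced = [{v: b for v, b in c.items() if v != x}
                   for c in clauses if c.get(x, value) == value]
        total += exact_count(reduced, rest)
    return total

def karp_luby(clauses, variables, steps, rng):
    """Monte Carlo estimate of the number of true assignments (Karp-Luby)."""
    n = len(variables)
    # Clause c alone is satisfied by 2^(n - |c|) assignments.
    weights = [2 ** (n - len(c)) for c in clauses]
    total_weight = sum(weights)
    hits = 0
    for _ in range(steps):
        # Pick a clause proportionally to its weight, then a uniform
        # assignment among those that satisfy it.
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        assignment = {v: rng.random() < 0.5 for v in sorted(variables)}
        assignment.update(clauses[i])
        # Count the sample only if i is the first clause it satisfies,
        # so each true assignment is counted exactly once in expectation.
        first = next(j for j, c in enumerate(clauses)
                     if all(assignment[v] == b for v, b in c.items()))
        hits += first == i
    return total_weight * hits / steps

# Example formula: (a AND b) OR (NOT a AND c) OR (b AND c)
formula = [{"a": True, "b": True}, {"a": False, "c": True}, {"b": True, "c": True}]
variables = frozenset("abc")
exact = exact_count(formula, variables)
estimate = karp_luby(formula, variables, steps=20000, rng=random.Random(0))
print(exact, estimate)
```

If every variable were independently true with probability 1/2, dividing either count by 2^n would give the probability that the formula holds. The possible-world probabilities in the text need not be uniform, so that last step is an assumption of this sketch only.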

Claims (5)

1. A multi-criteria information processing method for uncertain data, characterized in that it comprises the following steps:
(1) equivalently rewriting the multi-criteria information query over the uncertain data;
(2) optimizing the multi-criteria information queries over the probabilistic relation components;
(3) the query optimizer generates the multi-criteria information query execution plan over the probabilistic relation components;
(4) the query processor executes the multi-criteria information query over the uncertain data according to the plan generated in step (3), and shows the result on a display.
2. The multi-criteria information processing method for uncertain data according to claim 1, characterized in that the equivalent rewriting in step (1) of the multi-criteria information query over the uncertain data comprises the following steps:
1) organizing an uncertain relation object instance into a plurality of probabilistic relation tables with constraint conditions, the G-Tabset (formula image: Figure FSA00000210437400011);
2) decomposing (formula image: Figure FSA00000210437400012) by polynomial factorization into a number of probabilistic relation components, each of which is the conjunction of several probabilistic relation table instances, where the set of probabilistic relation components is W = {WSD1, ..., WSDn} and WSDi is the i-th probabilistic relation component; the Datalog rule set is D = {DL1, ..., DLn}, where DLi is WSDi :- Insi1 ^ ... ^ Insim, meaning that the probabilistic relation component WSDi is the conjunction of m probabilistic relation table instances.
3. The multi-criteria information processing method for uncertain data according to claim 1, characterized in that the process by which step (2) optimizes the multi-criteria information queries over the probabilistic relation components is as follows:
After step (1), the system generates u multi-criteria information queries (formula image: Figure FSA00000210437400013), where the input parameter of each query (formula image: Figure FSA00000210437400014) is a probabilistic relation component WSD'i ∈ W. The system does not directly compute the multi-criteria information query result sets over these u probabilistic relation components WSD'1, ..., WSD'u. Instead, it selects, on a cost basis, the optimal v (v < u) probabilistic relation components WSD''1, ..., WSD''v from the probabilistic relation component set W = {WSD1, ..., WSDn}, where the multi-criteria information object set over each component WSD''i (1 ≤ i ≤ v) is used to answer the multi-criteria information queries over several of the components WSD'1, ..., WSD'u.
4. The multi-criteria information processing method for uncertain data according to claim 1, characterized in that the process by which the query optimizer in step (3) generates the multi-criteria information query execution plan over the probabilistic relation components is as follows:
1) designing provably correct equivalence transformation rules between the execution order of the multi-criteria information query operation (formula image: Figure FSA00000210437400015) and that of the various uncertain-relation operations;
2) the query optimizer obtains the multi-criteria information query execution plan based on the equivalence transformation rule set.
5. The multi-criteria information processing method for uncertain data according to claim 1, characterized in that the process by which the query processor in step (4) executes the multi-criteria information query over the uncertain data according to the plan generated in step (3) is as follows:
1) the query processor organizes and indexes the uncertain relation objects, so that the system can quickly obtain the multi-criteria information object set on any analysis space;
2) the query processor computes the probability that the multi-criteria information object set exists under possible-world instance semantics.
CN 201010240541 2010-07-29 2010-07-29 Multi-standard information processing method of uncertain data Expired - Fee Related CN102346873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010240541 CN102346873B (en) 2010-07-29 2010-07-29 Multi-standard information processing method of uncertain data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010240541 CN102346873B (en) 2010-07-29 2010-07-29 Multi-standard information processing method of uncertain data

Publications (2)

Publication Number Publication Date
CN102346873A true CN102346873A (en) 2012-02-08
CN102346873B CN102346873B (en) 2013-08-14

Family

ID=45545526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010240541 Expired - Fee Related CN102346873B (en) 2010-07-29 2010-07-29 Multi-standard information processing method of uncertain data

Country Status (1)

Country Link
CN (1) CN102346873B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1905727A (en) * 2006-08-16 2007-01-31 亿阳信通股份有限公司 Method for network managing resource message access

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
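周傲英 (Zhou Aoying) et al.: "不确定性数据管理技术研究综述" (A survey of uncertain data management techniques), 《计算机学报》 (Chinese Journal of Computers) *
周逊 (Zhou Xun) et al.: "不确定数据上两种查询的分布式聚集算法" (Distributed aggregation algorithms for two kinds of queries on uncertain data), 《计算机研究与发展》 (Journal of Computer Research and Development) *
崔斌 (Cui Bin) et al.: "基于不确定数据的查询处理综述" (A survey of query processing on uncertain data), 《计算机应用》 (Journal of Computer Applications) *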

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014117297A1 (en) * 2013-01-31 2014-08-07 Hewlett-Packard Development Company, L.P. Approximate query processing
CN104750860A (en) * 2015-04-16 2015-07-01 东北大学 Data storage method of uncertain data
CN104750860B (en) * 2015-04-16 2017-11-10 东北大学 A kind of data storage method of uncertain data
CN105302858A (en) * 2015-09-18 2016-02-03 北京国电通网络技术有限公司 Distributed database system node-spanning check optimization method and system
CN105302858B (en) * 2015-09-18 2019-02-05 北京国电通网络技术有限公司 A kind of the cross-node enquiring and optimizing method and system of distributed data base system

Also Published As

Publication number Publication date
CN102346873B (en) 2013-08-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130814

Termination date: 20160729
