CN104239581A - Database-system-oriented replicated data provenance tracing method - Google Patents


Info

Publication number
CN104239581A
Authority
CN
China
Prior art keywords
origin
copy
copying
inquiry
data
Legal status
Pending
Application number
CN201410539143.XA
Other languages
Chinese (zh)
Inventor
许国艳
胡煜欣
罗章璇
Current Assignee
Hohai University HHU
Original Assignee
Hohai University HHU
Application filed by Hohai University HHU
Priority to CN201410539143.XA
Publication of CN104239581A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a database-system-oriented method for tracing the provenance of replicated (copied) data. The method comprises the steps of: (1) designing PC-CS, a provenance semantic model suited to copied data in a database system, and defining four types of the model according to the different copy situations, which solves the problem that the semantic model used in Perm cannot adequately express the provenance of copied data; (2) designing query rewrite rules under the PC-CS model on the basis of its relational representation, extended specifically by introducing copy expressions; and (3) rewriting an ordinary query, according to these rules, into a new query that carries PC-CS provenance, thereby tracing the provenance of copied data. The method takes both semantic completeness and storage savings into account and meets users' need to trace the provenance of copied data in database queries.

Description

Database-system-oriented method for tracing the provenance of copied data
Technical field
The present invention relates to the field of data provenance tracing, and more specifically to tracing the provenance of copied data in databases. It proposes a database-system-oriented method, based on query rewriting, for tracing the provenance of copied data. Concretely, it designs PC-CS, a semantic model for copy provenance tracing, together with query rewrite rules based on this model, and thereby realizes the tracing of copied-data provenance in a database system.
Background art
Data provenance is information about the entire processing history of data, comprising the sources of the data and all subsequent processing applied to it. B. Glavic and G. Alonso proposed Perm (Provenance Extension of the Relational Model), a system that uses query rewriting to track provenance inside a database system. Perm represents provenance formally as a relational schema: query rewrite rules add provenance information to the relational schema and produce a corresponding new query. Executing that new query yields both the same result as the given query and the provenance information, which realizes provenance tracing. Perm defines a provenance semantic model, PI-CS (Perm Influence Contribution Semantics), which describes provenance in relational form. PI-CS belongs to the influence-CS class of models: the provenance of an output data item consists of all input data used to produce it, and any input data item that influences the output item is considered part of its provenance. PI-CS defines semantics for a wide range of operators, so that provenance can be expressed as a uniform relational schema and, like ordinary data, be optimized and queried.
However, PI-CS in Perm does not distinguish between the different ways an output data item derives from its provenance: it cannot specify which input data the output was copied from.
Summary of the invention
Object of the invention: PI-CS in Perm cannot clearly express copy-CS provenance. To address this deficiency, the invention provides a provenance tracing method that is based on query rewriting, is applicable to Perm, and can express the provenance of copied data. First, the method provides PC-CS, a semantic model for provenance tracing, which classifies and defines the types of copy provenance in database systems. Then, on the basis of the relational representation of PC-CS, it designs query rewrite rules under the PC-CS model, extended specifically by introducing copy expressions. Finally, using these rewrite rules, an ordinary query is rewritten into a new query carrying PC-CS provenance; executing the new query traces the provenance of the copied data.
Technical solution: a database-system-oriented method for tracing the provenance of copied data, comprising:
PC-CS model:
Taking PI-CS as the basis, the concept of a copy map is introduced to design PC-CS, a semantic model that can describe copy-CS provenance, and a definition is given for each type.
1. Provenance classes in PC-CS
During query processing, a copy is a complete copy if an entire input tuple is copied, and a partial copy if only some attributes of an input tuple are copied. Independently, a copy is a direct copy if the input data are copied directly, and a transitive copy if they are copied indirectly. PC-CS therefore divides provenance into four cases: the Complete-Direct-Copy-CS model (CDC-CS), the Partial-Direct-Copy-CS model (PDC-CS), the Complete-Transitive-Copy-CS model (CTC-CS), and the Partial-Transitive-Copy-CS model (PTC-CS).
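The two classification axes can be illustrated with a small Python sketch. It assumes that, for one input tuple, we already know which of its attributes were copied into the result and whether the copy was direct; `classify_copy` and `CopyClass` are hypothetical names for illustration, not part of the patent or of Perm.

```python
from enum import Enum


class CopyClass(Enum):
    CDC_CS = "Complete-Direct-Copy-CS"
    PDC_CS = "Partial-Direct-Copy-CS"
    CTC_CS = "Complete-Transitive-Copy-CS"
    PTC_CS = "Partial-Transitive-Copy-CS"


def classify_copy(input_attrs: set, copied_attrs: set, direct: bool) -> CopyClass:
    """Classify how one input tuple was copied into a result tuple (hypothetical helper).

    input_attrs  -- all attributes of the input tuple
    copied_attrs -- the input attributes whose values appear in the result
    direct       -- True for a direct copy, False for a transitive (indirect) one
    """
    if not copied_attrs:
        raise ValueError("no attribute was copied, so this is not copy provenance")
    if not copied_attrs <= input_attrs:
        raise ValueError("copied attributes must belong to the input tuple")
    complete = copied_attrs == input_attrs  # whole tuple copied -> complete copy
    if direct:
        return CopyClass.CDC_CS if complete else CopyClass.PDC_CS
    return CopyClass.CTC_CS if complete else CopyClass.PTC_CS
```

For example, copying only `name` out of a tuple with attributes `name` and `city` directly into the result gives `CopyClass.PDC_CS`.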
2. Definition of the PC-CS model
If the following condition is satisfied:
then the set of witness lists CD/CT(q, t) is called the CDC-CS/CTC-CS provenance of result tuple t under query q;
If the following condition is satisfied:
then the set of witness lists PD/PT(q, t) is called the PDC-CS/PTC-CS provenance of result tuple t under query q.
Here q denotes a query, a an attribute of an input of q, t a result tuple, w a witness list of t, CM() a copy map, PI(q, t) the PI-CS provenance of t under q, w' a PI-CS witness list of t, and w'[i] the i-th element of w'.
Query rewrite rules under the PC-CS model
First, PC-CS provenance is given a relational representation. Next, a copy expression is designed for each of the direct copy map and the transitive copy map. These copy expressions are then added to the PI-CS rewrite rules to obtain the PC-CS rewrite rules. Finally, the correctness of the PC-CS rewrite rules is analyzed.
1. Relational representation of PC-CS provenance
For a query q over the base relations R_1, ..., R_n, the relational representation Q_{CD/CT/PD/PT} of the PC-CS provenance of q is:
$$Q_{CD/CT/PD/PT} = \{\, (t, w[1]', \ldots, w[n]')^{m} \mid t^{p} \in Q \wedge w^{m} \in CD(q,t)/CT(q,t)/PD(q,t)/PT(q,t) \,\}$$
$$w[i]' = \begin{cases} w[i] & \text{if } w[i] \neq \bot \\ \mathrm{null}(R_i) & \text{otherwise} \end{cases}$$
Here w[i] denotes the i-th element of witness list w; CD(q, t), CT(q, t), PD(q, t), and PT(q, t) denote the CDC-CS, CTC-CS, PDC-CS, and PTC-CS provenance, respectively, of result tuple t under query q; null(R_i) denotes a tuple of null values with the schema of R_i; and the superscripts p and m denote multiplicities under bag semantics. Q_{CD/CT/PD/PT} denotes the four provenance relations of query q: for each result tuple t and each witness list w, the tuple (t, w[1]', ..., w[n]') is added to Q_{CD/CT/PD/PT}.
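The construction of the provenance relation can be sketched in Python. This is a minimal illustration under stated assumptions: tuples are plain Python tuples, `None` stands in both for ⊥ and for SQL NULL, and the provenance attributes of all base relations are flattened after the result attributes. The helper names `provenance_rows` and `null_tuple` are hypothetical.

```python
from typing import Optional, Sequence

BOTTOM = None  # stands in for the marker ⊥ ("this relation contributed nothing")


def null_tuple(arity: int) -> tuple:
    """null(R_i): a tuple of NULLs with the arity of base relation R_i."""
    return (None,) * arity


def provenance_rows(
    t: tuple,
    witnesses: Sequence[Sequence[Optional[tuple]]],
    arities: Sequence[int],
) -> list:
    """Build the rows (t, w[1]', ..., w[n]') contributed by one result tuple t.

    Each witness list w has one entry per base relation R_1..R_n. An entry
    equal to BOTTOM is replaced by null(R_i); the provenance attributes are
    flattened after the result attributes. Repeated witness lists yield
    repeated rows, mirroring bag semantics.
    """
    rows = []
    for w in witnesses:
        if len(w) != len(arities):
            raise ValueError("a witness list needs one entry per base relation")
        padded = []
        for wi, arity in zip(w, arities):
            if wi is BOTTOM:
                padded.extend(null_tuple(arity))
            elif len(wi) != arity:
                raise ValueError("witness tuple does not match the relation's arity")
            else:
                padded.extend(wi)
        rows.append(tuple(t) + tuple(padded))
    return rows
```

A witness list in which the second relation contributed nothing, such as `[("s1", 1), None]`, is padded with two NULLs for that relation.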
2. Copy expressions for PC-CS provenance
Before the PC-CS rewrite rules are defined, the copy expression CM(q) is defined using C(q) and C*(a_i, x). The copy expression CM(q) expresses how the copy map of query q is tracked: C(q) denotes the copy-attribute mapping table of q, and C*(a_i, x) denotes the copy map of a projection expression. For a projection Π_A(q), if A contains a renaming y → z and the copy map of the projection's input attribute a_i contains y, then z is also contained in the copy map of a_i. The invention gives the direct and the transitive copy expression CM(q) of query q for each operator.
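The projection rule for C*(a_i, x) can be illustrated with a short Python sketch. It assumes a copy map is a dictionary from a base attribute a_i to the set of attribute names of q that hold a copy of it, and that the projection list A contains only plain attribute references with renamings y → z (computed expressions are not modeled). `project_copy_map` is a hypothetical name.

```python
def project_copy_map(copy_map: dict, projection: list) -> dict:
    """Propagate a copy map through a projection Pi_A(q) (hypothetical helper).

    copy_map   -- base attribute a_i -> set of q's attribute names holding a copy of a_i
    projection -- the projection list A as (y, z) pairs, read as "y -> z"
    Rule from the text: if y -> z is in A and y is in the copy map of a_i,
    then z is in the copy map of a_i. Attributes not projected are dropped.
    """
    return {
        base: {z for (y, z) in projection if y in holders}
        for base, holders in copy_map.items()
    }
```

Projecting `name` as `shopName` moves the copy of `shop.name` to `shopName`, while a base attribute whose holder is not projected ends up with an empty set.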
3. PC-CS provenance query rewrite rules
The PC-CS rewrite rules are obtained by introducing copy expressions on top of the PI-CS rules; they are an extension of the PI-CS model. Following the definition of CM(q) and the PI-CS rewrite rules, the rewrite adds an outermost projection that contains the C(q) attributes, which store the copy witnesses.
The PC-CS provenance rewrite operator C (one of CD, CT, PD, PT) maps a query q to the rewritten query
$$q^{CD/CT/PD/PT} = \Pi_{Q,\,P^{*}(q^{C})}(q^{C})$$
where Q denotes the result attributes of q, P*(·) denotes the projection expression, and the superscript C denotes the PC-CS provenance rewrite operator.
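The shape of the outermost projection Π_{Q, P*(q^C)}(q^C) can be sketched as SQL text generation in Python. This assumes q^C is already available as an SQL string and that P*(q^C) is supplied as ready-made (expression, output name) pairs; the alias `qc` and the column names in the example are hypothetical and need not match what Perm or the prototype generate. Identifiers are not quoted in this sketch.

```python
def wrap_outer_projection(inner_sql: str, result_attrs: list, prov_exprs: list) -> str:
    """Build q^{CD/CT/PD/PT} = Pi_{Q, P*(q^C)}(q^C) as SQL text (hypothetical helper).

    inner_sql    -- SQL text of the rewritten query q^C
    result_attrs -- the result attributes Q of the original query q
    prov_exprs   -- P*(q^C) as (expression, output_name) pairs over alias qc
    """
    select_list = [f"qc.{a}" for a in result_attrs]          # keep the original result
    select_list += [f"{expr} AS {name}" for expr, name in prov_exprs]  # add copy attributes
    return f"SELECT {', '.join(select_list)} FROM ({inner_sql}) AS qc"
```

With `inner_sql = "SELECT name FROM shop"`, `result_attrs = ["name"]` and one copy column `qc.c_shop_name`, this yields `SELECT qc.name, qc.c_shop_name AS c_shop_name FROM (SELECT name FROM shop) AS qc`.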
4. Correctness analysis of the PC-CS rewrite rules
By the definition of the PC-CS model, the PC-CS provenance of query q is the PI-CS provenance of q with the parts that do not participate in copying removed; the parts that do participate in copying are obtained through the copy expression CM(q) of q. Hence the only difference between the PC-CS and the PI-CS rewrite rules is the copy expression CM(q) of query q.
The correctness analysis of the PC-CS rewrite rules therefore reduces to analyzing whether the copy expression CM(q) of q correctly expresses its copy map; that is, whether, for an attribute a of a base table of q, a result tuple t of q, and a witness list w in the PI-CS provenance PI(q, t) of t under q, condition (1) holds:
$$\Pi_{C(a)}(q^{C}) = CM(q, a, w, t) \qquad (1)$$
where q^C is obtained through the copy expression CM(q) of q, and CM(q, a, w, t) is the defining expression of the copy map.
Implementation of copy provenance tracing based on the rewrite rules
Following the proposed database-system-oriented method for tracing copied-data provenance and based on the PC-CS query rewrite rules, the invention designs the architecture of a provenance tracing prototype system and the algorithmic functions of its main modules, and analyzes the corresponding algorithms.
By adopting the above technical solution, the present invention has the following beneficial effects:
Experiments show that the proposed provenance tracing method has a low storage requirement and little impact on the original system, and meets the need for tracing the provenance of copied data.
Brief description of the drawings
Fig. 1 shows the operators of the direct copy provenance semantic model;
Fig. 2 shows the operators of the transitive copy provenance semantic model;
Fig. 3 shows the definition of the direct copy expressions;
Fig. 4 shows the definition of the transitive copy expressions;
Fig. 5 shows the rewrite rules for the PC-CS model operators;
Fig. 6 is the framework diagram of the provenance tracing system;
Fig. 7 is the pseudocode of the traverseQueryTree function;
Fig. 8 is the pseudocode of the generateCM function;
Fig. 9 is the pseudocode of the rewriteQueryNodeCopy function;
Fig. 10 shows the main functions and flow of the provenance tracing module;
Fig. 11 shows the relational schema of the data and the SQL query statements;
Fig. 12 compares the results of the two kinds of query statements;
Fig. 13 compares the present invention with the Perm system at the attribute level;
Fig. 14 compares the present invention with the Perm system at the tuple level.
Detailed description of the embodiments
The present invention is further illustrated below with reference to specific embodiments. These embodiments are intended only to illustrate the invention, not to limit its scope; after reading this disclosure, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the claims of this application.
Copy expression operators of the PC-CS model
Following the classification of the PC-CS model, the copy map of each operator is expressed separately for the direct copy models (CDC-CS/PDC-CS) and for the transitive copy models (CTC-CS/PTC-CS).
In query expressions, the operators comprise nullary operators, unary operators, join operators, and set operators. The unary operators comprise projection, selection, and aggregation; the join operators comprise natural join, left outer join, right outer join, and full outer join; and the set operators comprise union, intersection, and difference. Based on Perm's relational algebra representation, the copy map expressions of each operator of the PC-CS model for the direct copy and the transitive copy case are defined in Fig. 1 and Fig. 2, respectively.
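Figures 1 to 4 are not reproduced here, so the following Python sketch encodes only the direct-copy behavior summarized in the correctness analysis later in this description (cases (1), (2), (4), (5), (6), and (11) to (13)); outer joins and transitive copies are left out. It assumes attribute names are unique across the inputs, and `Node`, `merge`, and `direct_copy_map` are hypothetical names, not the prototype's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str  # "rel", "select", "project", "agg", "join", "union", "intersect", "diff"
    children: list = field(default_factory=list)
    relation: str = ""  # "rel" only: name of the base relation
    attrs: list = field(default_factory=list)  # "rel": schema; "agg": grouping attributes
    renames: list = field(default_factory=list)  # "project": (y, z) pairs for y -> z


def merge(*maps: dict) -> dict:
    """Union of copy maps, attribute by attribute."""
    out: dict = {}
    for m in maps:
        for base, holders in m.items():
            out.setdefault(base, set()).update(holders)
    return out


def direct_copy_map(n: Node) -> dict:
    """Direct copy map: base attribute -> output attributes that copy it."""
    if n.op == "rel":  # case (1): every attribute copies itself
        return {f"{n.relation}.{a}": {a} for a in n.attrs}
    if n.op in ("select", "diff"):  # cases (2), (13): same as the (left) input
        return direct_copy_map(n.children[0])
    if n.op == "project":  # case (4): follow the renames y -> z
        cm = direct_copy_map(n.children[0])
        return {b: {z for (y, z) in n.renames if y in h} for b, h in cm.items()}
    if n.op == "agg":  # case (5): keep only grouping attributes
        group = set(n.attrs)
        return {b: h & group for b, h in direct_copy_map(n.children[0]).items()}
    if n.op in ("join", "union", "intersect"):  # cases (6), (11), (12): union of inputs
        return merge(*(direct_copy_map(c) for c in n.children))
    raise ValueError(f"operator not covered by this sketch: {n.op!r}")
```

For a tree shaped like Q5 (an aggregation grouped by `name` over a selection over the join of `shop`, `sales`, and `items`), `shop.name` is copied into the result attribute `name`, while `items.price` is not copied because `sum(price)` is computed rather than copied.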
Design of the PC-CS query rewrite rules
1. Copy expressions for PC-CS provenance
The invention defines the copy expression CM(q) using C(q) and C*(a_i, x); CM(q) expresses how the copy map of query q is tracked. For the copy map of a projection expression, C*(a_i, x) is introduced: for a projection Π_A(q), if A contains a renaming y → z and the copy map of the projection's input attribute a_i contains y, then z is also contained in the copy map of a_i. This case is represented as follows: the copy expression is obtained from the copy map and written using C(q) and C*(a_i, x). The definitions of the direct and the transitive copy expression CM(q) of query q under each operator are given below.
Based on Perm's relational algebra representation, the direct and transitive copy expressions of the PC-CS model are defined in Fig. 3 and Fig. 4, respectively.
2. PC-CS rewrite rules
The PC-CS rewrite rules are obtained by introducing the copy expression CM(q) on top of the PI-CS rules; they are an extension of the PI-CS model. The rewrite rules of the PC-CS semantic model designed by the invention are shown in Fig. 5.
3. Correctness analysis of the PC-CS rewrite rules
By the definition of the PC-CS model, the PC-CS provenance of query q is the PI-CS provenance of q with the parts that do not participate in copying removed; the parts that do participate in copying are obtained through the copy expression CM(q) of q. Hence the only difference between the PC-CS and the PI-CS rewrite rules is the copy expression CM(q) of q.
Correctness is therefore analyzed here by checking whether the copy expression CM(q) correctly expresses the copy map of q; that is, whether, for an attribute a of a base table of q, a result tuple t of q, and a witness list w in the PI-CS provenance PI(q, t) of t under q, condition (1) holds:
$$\Pi_{C(a)}(q^{C}) = CM(q, a, w, t) \qquad (1)$$
where q^C is obtained through the copy expression CM(q) of q, and CM(q, a, w, t) is the defining expression of the copy map.
The analysis proceeds operator by operator:
(1) Base relation: q = R
$$\Pi_{C(a)}(q^{C}) = \Pi_{C(a)}(R^{C}) = \Pi_{C(a)}\big(\Pi_{R,\; R \rightarrow N(R),\; CM(q)}(R)\big) = \Pi_{C(a)}\big(\Pi_{R,\; R \rightarrow N(R),\; \{a_1\} \rightarrow C(a_1), \ldots, \{a_n\} \rightarrow C(a_n)}(R)\big) = \{a\} = CM(q, a, w, t)$$
Condition (1) obviously holds.
(2) Selection, direct copy: q = σ_c(q_1)
The copy map CM(σ_c(q_1)) of the selection σ_c(q_1) is the same as first taking the copy map and then selecting, σ_c(CM(q_1)). Once the selection is removed, the argument is the same as for q = R.
(3) Selection, transitive copy: q = σ_c(q_1)
The transitive copy expression CM(σ_c(q_1)) of the selection contains those attributes C(a) of C(q_1) that satisfy the condition, and the set CM(q, a, w, t) contains the attributes that satisfy the corresponding condition. The if-condition on (x = y) with y ∈ CM(q_1, a, w, t) in the definition of CM(σ_c(q_1)) corresponds to the if-condition y ∈ C(a) in the copy expression, so both sets contain exactly the same elements x. The two expressions therefore correspond, and condition (1) holds.
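The transitive step for selection can be sketched in Python: if attribute y already holds a copy of a and the selection condition contains the conjunct x = y, then x holds the same value. This is a minimal sketch under assumptions that go beyond the text: only conjunctive attribute-to-attribute equalities are considered, equality is treated as symmetric, and the rule is applied repeatedly so that chains are followed. `transitive_copy_map` is a hypothetical name.

```python
def transitive_copy_map(copy_map: dict, equalities: list) -> dict:
    """Extend a copy map along equality conjuncts x = y of a condition c.

    copy_map   -- base attribute -> attributes of q1 holding a copy of it
    equalities -- the conjuncts of c in sigma_c(q1) of the form x = y
    If y already holds a copy and c contains x = y, x holds the same value,
    so x is added. Equalities are treated as symmetric and applied until
    nothing changes, so chains such as a = b, b = c are followed too.
    """
    result = {}
    for base, holders in copy_map.items():
        reached = set(holders)
        changed = True
        while changed:
            changed = False
            for x, y in equalities:
                if y in reached and x not in reached:
                    reached.add(x)
                    changed = True
                if x in reached and y not in reached:
                    reached.add(y)
                    changed = True
        result[base] = reached
    return result
```

With the condition `name = sName` from Q5, the copy of `shop.name` held by `name` is extended to `sName`.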
(4) Projection: q = Π_A(q_1)
In the copy expression CM(Π_A(q_1)) of the projection Π_A(q_1), if y is contained in the copy map of input attribute a of q_1, then under the renaming y → z the copy map of the projected attribute a contains z. This corresponds to the expression {x | (b → x) ∈ A ∧ b ∈ CM(q_1, a, w, y) ∧ y.A = t} in CM(Π_A(q_1), a, w, t). The conditional expression C that CM(Π_A(q_1)) uses for projections corresponds to the same case in CM(Π_A(q_1), a, w, t). The two expressions thus correspond, and condition (1) holds.
(5) Aggregation: q = α_{G,agg}(q_1)
The copy map of an aggregation removes from the copy map of its input all attributes that are not grouping attributes; correspondingly, its copy expression is the intersection of the input's copy expression with the set of grouping attributes of the aggregation. Condition (1) therefore holds.
(6) Natural join, direct copy
The direct copy map of a natural join is simply the union of the copy maps of its inputs, and its copy expression is likewise the combination of the copy expressions of its inputs. The two are therefore the same, and condition (1) holds.
(7) Natural join, transitive copy
The transitive copy map of a natural join comprises the directly copied attributes plus the attributes that satisfy the join condition C. The copy map and copy expression for the attributes satisfying C are the same as those of the selection σ_c(q_1), so the same result as for selection follows: the copy map and the copy expression of the transitive copy of a natural join are equivalent, and condition (1) holds.
(8) Left outer join
The copy map of a left outer join contains the attributes of the copy map of the natural join, minus some attributes. The removed attributes come from the right-hand input of the join and are removed when the witness w of that input does not satisfy the join condition C. Although the copy expression does not test this condition explicitly, if w does not satisfy C, the attributes representing w from the rewritten right-hand input are set to null in the result tuples of the rewritten left outer join. The copy map and the copy expression of the left outer join are therefore equivalent, and condition (1) holds.
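The null-padding argument for the left outer join can be illustrated at the tuple level in Python. Each row is paired with a hypothetical copy annotation (output attribute to the base attribute it copies); when a left row has no matching right row, both the right-hand values and their copy annotations become `None`, so no copy from the right input is claimed. The representation and the name `left_outer_join_with_copies` are assumptions for illustration only.

```python
from typing import Callable


def left_outer_join_with_copies(
    left: list, right: list, right_schema: list, cond: Callable[[dict, dict], bool]
) -> list:
    """Left outer join over (row, copy) pairs (hypothetical helper).

    row  -- attribute -> value
    copy -- attribute -> base attribute it copies (None = no copy)
    Right-hand rows that match contribute their values and copy annotations.
    A left row with no match is padded with NULL values and NULL copy
    annotations for every right-hand attribute, so no copy is claimed.
    """
    out = []
    for lrow, lcopy in left:
        matched = False
        for rrow, rcopy in right:
            if cond(lrow, rrow):
                matched = True
                out.append(({**lrow, **rrow}, {**lcopy, **rcopy}))
        if not matched:
            nulls = {a: None for a in right_schema}
            out.append(({**lrow, **nulls}, {**lcopy, **nulls}))
    return out
```

The right outer join of case (9) is the mirror image, and the full outer join of case (10) pads in both directions.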
(9) Right outer join
The copy map of a right outer join contains the attributes of the copy map of the natural join, minus some attributes. The removed attributes come from the left-hand input of the join and are removed when the witness w of that input does not satisfy the join condition C. Although the copy expression does not test this condition explicitly, if w does not satisfy C, the attributes representing w from the rewritten left-hand input are set to null in the result tuples of the rewritten right outer join. The copy map and the copy expression of the right outer join are therefore equivalent, and condition (1) holds.
(10) Full outer join
The full outer join combines cases (8) and (9); the analysis is analogous, and condition (1) holds.
(11) Union: q = q_1 ∪ q_2
Both the direct and the transitive copy map of a union are the union of the copy maps of its inputs, minus the attributes of an input that is ignored because its witness w[q_i] is ⊥. The copy expression of the union combines the copy maps of its inputs. If the witness w[q_i] is ⊥, the rewritten q produces an output tuple whose attributes comprise P(q) and C(q), with C(q) set to the null value ε. Thus attribute C(a) contains exactly the copy-map set of a, and condition (1) holds.
(12) Intersection: q = q_1 ∩ q_2
Both the direct and the transitive copy map of an intersection are the union of the copy maps of all its inputs, and the copy expression of an intersection combines the copy expressions of all its inputs. The expression produced by attribute C(a) is therefore the same as the copy-map set of a, and condition (1) holds.
(13) Difference: q = q_1 − q_2
The copy map of a difference q is exactly the copy map of its left-hand input q_1. The copy expression of the difference merges the copy attributes of q_1 with null values, so the copy map and the copy expression of the difference are equivalent, and condition (1) holds.
In summary, condition (1) holds in all cases (1) to (13), so the PC-CS rewrite rules are correct for all operators.
Implementation of copy provenance tracing based on the rewrite rules
The provenance tracing prototype system of the invention is implemented on top of the PostgreSQL database and is an extension of it. Fig. 6 shows the overall framework. When provenance tracing is requested, the provenance query statement is parsed into a query tree and passed to the provenance tracing module. This module applies the PC-CS provenance rewrite rules to the query tree; the resulting new query tree is then passed to the optimizer, which generates and executes an execution plan. Within the overall flow, rewriting the query tree is exactly the process that records the provenance of the query.
1. Main functions of the provenance tracing module
(1) traverseQueryTree function
traverseQueryTree is the entry point of the provenance tracing module. Its input is a query tree (a QueryNode structure) and its output is the new query tree after rewriting. It recursively visits each query node of the input tree once and checks whether the node carries the provenance rewrite flag. If the node has not yet been rewritten, it is rewritten; if it has already been marked, the function recursively visits the nodes in the node's range table, until no query nodes remain in the range table.
traverseQueryTree first rewrites query node q by calling rewriteCopyQueryNode directly, regardless of the subqueries in q's range table, and only then recursively rewrites the direct subqueries. The rewrite therefore proceeds top-down rather than bottom-up, because the top-down method does not change the stored state of the original query tree in memory. For example, consider an aggregation whose input is a projection. With bottom-up rewriting, the projection is rewritten first; but rewriting the aggregation afterwards requires the original projection again, so the original projection would have to be saved separately. To avoid this extra step, the provenance rewrite is performed top-down. Fig. 7 gives the pseudocode of traverseQueryTree.
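The top-down order can be sketched in Python. `QueryNode` here is a simplified, hypothetical stand-in (not PostgreSQL's internal query structure), and the placeholder `rewrite_copy_query_node` only marks a node as rewritten, so the sketch shows the traversal order rather than an actual PC-CS rewrite.

```python
from dataclasses import dataclass, field


@dataclass
class QueryNode:
    name: str
    range_table: list = field(default_factory=list)  # direct subqueries of this node
    rewritten: bool = False  # set once the node has been rewritten for provenance


def rewrite_copy_query_node(q: QueryNode) -> QueryNode:
    """Placeholder for rewriteCopyQueryNode: only marks the node as rewritten."""
    q.rewritten = True
    return q


def traverse_query_tree(q: QueryNode, visit_log: list) -> QueryNode:
    """Top-down rewrite: handle q itself first, then its direct subqueries."""
    if not q.rewritten:
        q = rewrite_copy_query_node(q)
        visit_log.append(q.name)
    for i, sub in enumerate(q.range_table):
        q.range_table[i] = traverse_query_tree(sub, visit_log)
    return q
```

For an aggregation over a projection, the aggregation is handled before the projection, and a second traversal of the same tree does nothing because both nodes are already marked.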
(2) generateCM function
The function generateCM produces the copy attributes of a query node. Its input is the query node to be rewritten, and its output is the copy-attribute expression of that node, stored in the data structure CopyInfo. It first obtains the copy-attribute expressions of all range-table entries of query node q by calling generateCM recursively. It then analyzes q to obtain its operator tree opTree, and derives the copy-attribute expression of q from the previously computed expressions of the range-table entries and the operator. Finally, it calls simplifyCM to simplify the copy-attribute expression using relational algebra. Fig. 8 gives the pseudocode of generateCM.
(3) rewriteCopyQueryNode function
rewriteCopyQueryNode is the entry point of the PC-CS provenance rewrite. Its input is a query node that requires PC-CS provenance rewriting, and its output is the rewritten node. rewriteCopyQueryNode first calls generateCM to produce the copy-attribute information, then calls rewriteQueryNodeCopy to perform the PC-CS provenance rewrite of the query node, and finally adds the copy attributes to the attributes of the rewritten node. rewriteQueryNodeCopy determines the operator of the query node and, following the rewrite rules for the different operators defined in Fig. 5, calls the corresponding function to rewrite the node. Fig. 9 gives the pseudocode of rewriteQueryNodeCopy.
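The call sequence of rewriteCopyQueryNode (generateCM, then the operator-specific rewrite, then appending the copy attributes) can be sketched in Python. The three rewrite function names mirror rewriteSPJQuery, rewriteAggQuery, and rewriteSetQuery from the processing flow, but their bodies here are placeholders; the dictionary-based node, the `kind` values, and the copy-attribute name in the example are hypothetical.

```python
from typing import Callable


def rewrite_spj_query(node: dict) -> dict:
    return {**node, "rewritten_by": "rewriteSPJQuery"}  # placeholder body


def rewrite_agg_query(node: dict) -> dict:
    return {**node, "rewritten_by": "rewriteAggQuery"}  # placeholder body


def rewrite_set_query(node: dict) -> dict:
    return {**node, "rewritten_by": "rewriteSetQuery"}  # placeholder body


REWRITERS = {"spj": rewrite_spj_query, "agg": rewrite_agg_query, "set": rewrite_set_query}


def rewrite_query_node_copy(node: dict) -> dict:
    """Pick the operator-specific rewrite from the node's kind."""
    rewriter = REWRITERS.get(node["kind"])
    if rewriter is None:
        raise ValueError(f"no rewrite rule for node kind {node['kind']!r}")
    return rewriter(node)


def rewrite_copy_query_node(node: dict, generate_cm: Callable[[dict], list]) -> dict:
    """generateCM first, then the PC-CS rewrite, then append the copy attributes."""
    copy_attrs = generate_cm(node)
    rewritten = rewrite_query_node_copy(node)
    rewritten["target_list"] = list(node.get("target_list", [])) + copy_attrs
    return rewritten
```

The `generate_cm` parameter stands in for generateCM so that the dispatch logic can be exercised on its own; the input node is left unmodified.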
2. Algorithm flow of the provenance tracing module
Fig. 10 shows the call relationships among the main algorithm functions of the provenance tracing module. The steps are as follows:
(1) traverseQueryTree calls rewriteCopyQueryNode to rewrite the first node of query tree q.
(2) rewriteCopyQueryNode first calls generateCM to produce the copy attributes of the query node.
(3) rewriteCopyQueryNode calls rewriteQueryNodeCopy, which performs the rewrite by calling, according to the node type, the SPJ rewrite function rewriteSPJQuery, the aggregation rewrite function rewriteAggQuery, or the set-operation rewrite function rewriteSetQuery.
(4) rewriteQueryNodeCopy calls the combining function generateInclusion to combine the copy attributes with the rewritten node into qC, which is returned to rewriteCopyQueryNode. generateCM simplifies the copy-attribute expression by calling simplifyCM.
(5) After receiving the rewritten node returned by rewriteCopyQueryNode, traverseQueryTree checks whether the query tree contains subqueries. If it does, traverseQueryTree is called recursively to rewrite the subquery nodes from top to bottom, until no subqueries remain.
3. Implementation analysis
The invention improves the PI-CS model at the implementation level and adds provenance information that records copied data. It can track the provenance of copied data in different situations and meets different users' requirements for provenance information.
(1) System feasibility analysis
Fig. 11 shows the relational schema of the data and the SQL query statements, comparing provenance query statements with standard SQL statements. Fig. 12 compares the results of the two kinds of query statements.
Query Q5:
select provenance name, sum(price) from shop, sales, items where name = sName and itemId = id
is converted into the relational algebra expression
$$q = \alpha_{name,\, sum(price)}\big(\sigma_{name = sName \wedge itemId = id}(shop \times sales \times items)\big)$$
and PTC-CS provenance rewriting is applied to q as follows:
$$P^{*}(q^{C}) = \mathrm{if}\,(C^{*}(a_1))\ \mathrm{then}\ (N(a_1))\ \mathrm{else}\ (\epsilon) \rightarrow N(a_1),\ \ldots,\ \mathrm{if}\,(C^{*}(a_n))\ \mathrm{then}\ (N(a_n))\ \mathrm{else}\ (\epsilon) \rightarrow N(a_n)$$
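The per-tuple meaning of P*(q^C) can be sketched in Python: each provenance attribute N(a_i) keeps its value when C*(a_i) holds for that tuple and is replaced by the empty value ε otherwise. In this sketch ε is represented as `None` (SQL NULL), the truth values of C*(a_i) are assumed to be precomputed per tuple, and `apply_p_star` and the attribute names in the example are hypothetical.

```python
EPSILON = None  # the empty value ε, rendered as SQL NULL


def apply_p_star(row: dict, copy_flags: dict) -> dict:
    """Evaluate P*(q^C) on one tuple of q^C (hypothetical helper).

    row        -- provenance attribute N(a_i) -> its value in this tuple
    copy_flags -- N(a_i) -> truth value of C*(a_i) for this tuple
    Implements "if C*(a_i) then N(a_i) else ε" for every attribute;
    attributes without a flag are treated as not copied.
    """
    return {attr: (value if copy_flags.get(attr, False) else EPSILON)
            for attr, value in row.items()}
```

In SQL, each item of P* corresponds to an expression of the form `CASE WHEN <C*(a_i)> THEN <N(a_i)> ELSE NULL END AS <N(a_i)>`.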
The theoretically computed part of the PTC-CS provenance Q_{PT} of query Q5 is expressed as follows:
Q_{PT}
Comparing the theoretical analysis of query Q5 with the actual execution shows that the invention executes normal SQL queries just like an ordinary database, and that, when a provenance query is executed, the provenance results obtained agree with those computed by the theoretical method above.
(2) System performance comparison
Fig. 13 shows the results of executing the same query statement on the provenance management system of the invention and on the Perm system. In these results, the provenance management system of the invention clearly indicates which attributes of the input data were copied, whereas Perm can only indicate, in general terms, that each attribute influenced the result. In the results of Fig. 14, the system of the invention clearly indicates which input tuples were copied, whereas Perm can only indicate that all tuples influenced the result. Thus, compared with PI-CS, the PC-CS model of the invention can state explicitly which result data were copied directly from the input during the data transformation performed by an SQL statement, whereas PI-CS can only state in general terms which input data influenced the result and cannot precisely identify the copied input data.

Claims (7)

1. A database-system-oriented method for tracing the provenance of copied data, characterized in that it comprises the following steps:
1) providing a semantic model PC-CS for copy-provenance tracing, specifically a classification and definitions of the types of copy provenance in database systems;
2) on the basis of the relational representation of the semantic model PC-CS, formulating a set of copy-provenance query rewrite rules;
3) performing copy-provenance tracing according to the query rewrite rules.
2. The database-system-oriented copy-provenance tracing method according to claim 1, characterized in that the semantic model PC-CS of step 1) is built according to the needs of copy-provenance tracing in database queries. On the basis of a refined classification of copy types, it introduces the concept of a copy mapping and expresses the copy mapping of each operator, thereby providing a semantic description of copy-provenance tracing. The step further comprises:
11) according to whether the copying that occurs is complete or partial, and direct or transitive, dividing the semantic model into four types: complete direct copy (CDC-CS), partial direct copy (PDC-CS), complete transitive copy (CTC-CS) and partial transitive copy (PTC-CS), and giving a definition for each type;
12) for expressing copy mappings, according to the classification in 11), providing direct-copy provenance semantic operators for the direct copy mapping of the PC-CS semantic model and transitive-copy provenance semantic operators for its transitive copy mapping; in each case, operators are defined for the empty and atomic operators, unary operators, join operators and set operators;
13) based on the direct-copy mapping provenance semantic operators and the transitive-copy mapping provenance semantic operators, designing a mapping expression for each operator: the direct-copy expression and the transitive-copy expression.
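The Python sketch below makes the four-way classification in 11) concrete. Its criteria are an interpretation, not the patent's formal definitions. "Complete" is taken to mean that every attribute of an input tuple is copied, and "partial" that only some are. "Direct" is taken to mean that the copy happens through a single operator, and "transitive" that it passes through a chain of operators.

```python
from enum import Enum
from typing import Optional, Set


class CopyType(Enum):
    CDC = "complete direct copy (CDC-CS)"
    PDC = "partial direct copy (PDC-CS)"
    CTC = "complete transitive copy (CTC-CS)"
    PTC = "partial transitive copy (PTC-CS)"


def classify(copied: Set[str], all_attrs: Set[str], hops: int) -> Optional[CopyType]:
    """Classify how an input tuple's attributes reach a result tuple.

    Assumed criteria: complete = every attribute in all_attrs is copied;
    direct = copied through a single operator (hops == 1); transitive = hops > 1.
    """
    if not copied:
        return None  # nothing copied: no copy provenance for this tuple
    complete = copied >= all_attrs  # superset test
    direct = hops == 1
    if complete:
        return CopyType.CDC if direct else CopyType.CTC
    return CopyType.PDC if direct else CopyType.PTC
```

For example, copying one of two attributes through three operators is classified as partial transitive copy (`CopyType.PTC`).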
3. The database-system-oriented copy-provenance tracing method according to claim 2, characterized in that:
If the following condition is satisfied:
then the witness set CD/CT(q, t) is called the CDC-CS/CTC-CS provenance of result tuple t under query q;
If the following condition is satisfied:
then the witness set PD/PT(q, t) is called the PDC-CS/PTC-CS provenance of result tuple t under query q;
where q denotes a query, a denotes an attribute of an input of query q, t denotes a result tuple, w denotes a witness of result tuple t, and CM() denotes a copy mapping.
4. The database-system-oriented copy-provenance tracing method according to claim 1, characterized in that the set of copy-provenance query rewrite rules formulated in step 2) adds, during query rewriting, an outermost projection that stores the copy-witness attributes. According to the user's query request, the query is rewritten layer by layer, starting from the outermost operator of the original query and applying the rewrite rule of the operator at the current layer, until the innermost operator has been rewritten. The rewritten query is then executed to obtain the provenance information of the original query's result data. The step further comprises:
21) designing the PC-CS query rewrite rules based on the copy expressions; the rewrite rules specifically cover unary operators, join operators, set operators and the provenance attributes, and a corresponding rewrite expression is designed for each;
22) analyzing the correctness of each copy-provenance query rewrite rule.
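The layer-by-layer rewrite can be sketched as a recursive walk that enters at the outermost operator and descends until it reaches the base relations. On the way back it assembles the rewritten expression and finally wraps it in an outermost projection. The per-operator rules below are assumptions in the PI-CS style. A base relation duplicates its attributes as provenance attributes `P(rel.attr)`. Selection passes them through, projection keeps them, and a cross product concatenates them. The `C(rel.attr)` copy-witness columns are hypothetical placeholders. The patent's actual rewrite expressions are not reproduced here.

```python
from typing import List, Tuple


def _join(items: List[str]) -> str:
    return ", ".join(items)


def rewrite(op: tuple) -> Tuple[str, List[str]]:
    """Rewrite op and, recursively, its inputs; return (expression, base attrs 'rel.attr')."""
    kind = op[0]
    if kind == "rel":  # innermost layer: duplicate each attribute as a provenance attribute
        _, name, attrs = op
        srcs = [name + "." + a for a in attrs]
        prov = ["P(" + s + ")" for s in srcs]
        return "Π[" + _join(attrs + prov) + "](" + name + ")", srcs
    if kind == "select":  # selection passes the provenance attributes through unchanged
        _, cond, child = op
        inner, srcs = rewrite(child)
        return "σ[" + cond + "](" + inner + ")", srcs
    if kind == "project":  # projection keeps its own columns plus the provenance attributes
        _, cols, child = op
        inner, srcs = rewrite(child)
        prov = ["P(" + s + ")" for s in srcs]
        return "Π[" + _join(cols + prov) + "](" + inner + ")", srcs
    if kind == "join":  # cross product concatenates the provenance attributes of both inputs
        _, left, right = op
        lexpr, lsrcs = rewrite(left)
        rexpr, rsrcs = rewrite(right)
        return "(" + lexpr + " × " + rexpr + ")", lsrcs + rsrcs
    raise ValueError("unsupported operator: " + str(kind))


def rewrite_query(q: tuple, out_cols: List[str]) -> str:
    """Add the outermost projection carrying provenance and copy-witness attributes."""
    inner, srcs = rewrite(q)
    prov = ["P(" + s + ")" for s in srcs]
    copy_attrs = ["C(" + s + ")" for s in srcs]  # hypothetical copy-witness columns
    return "Π[" + _join(out_cols + prov + copy_attrs) + "](" + inner + ")"
```

Calling `rewrite_query(("select", "a=b", ("rel", "r", ["a", "b"])), ["a"])` produces `Π[a, P(r.a), P(r.b), C(r.a), C(r.b)](σ[a=b](Π[a, b, P(r.a), P(r.b)](r)))`.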
5. The database-system-oriented copy-provenance tracing method according to claim 4, characterized in that:
C(q) and C*(a_i, x) are used to define the copy expression CM(q), which represents the copy mapping traced for query q;
The PC-CS rewrite rules are obtained by introducing the copy expressions on top of the PI-CS rules. Following the definition of CM(q) and the PI-CS rewrite rules, an outermost projection is added during rewriting. This projection contains the C(q) attributes that store the copy witness set and is denoted P*(q). In addition, the symbol C denotes the PC-CS provenance rewrite operator;
The PC-CS provenance rewrite operator C maps a query q to a rewritten query:
$$q^{CD/CT/PD/PT} = \Pi_{Q,\,P^*(q^C)}(q^C).$$
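The Python sketch below shows one way a transitive copy mapping might be composed from per-operator (direct) mappings, in the spirit of the copy expression CM(q). It makes several assumptions. The operator set is reduced to base relations, selection, projection and join. A projected column counts as copied only when it is a bare reference to an input attribute. Attribute names are disjoint across join inputs. The dataclass names are hypothetical, and this is not the patent's definition of CM(q).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple, Union


@dataclass
class Rel:  # base relation (atomic operator)
    name: str
    attrs: List[str]


@dataclass
class Select:  # σ: every input attribute is copied unchanged
    child: "Op"


@dataclass
class Project:  # Π: output column -> input attribute it copies, or None if computed
    child: "Op"
    cols: Dict[str, Optional[str]]


@dataclass
class Join:  # × / ⋈: attributes of both inputs are copied unchanged
    left: "Op"
    right: "Op"


Op = Union[Rel, Select, Project, Join]
CopyMap = Dict[str, Set[Tuple[str, str]]]


def copy_map(op: Op) -> CopyMap:
    """Map each output attribute of op to the (relation, attribute) pairs it copies."""
    if isinstance(op, Rel):
        # base relation: each attribute is a copy of itself
        return {a: {(op.name, a)} for a in op.attrs}
    if isinstance(op, Select):
        return copy_map(op.child)
    if isinstance(op, Project):
        inner = copy_map(op.child)
        # compose this operator's direct mapping with its input's (transitive) mapping
        return {out: set(inner[src]) if src is not None else set()
                for out, src in op.cols.items()}
    if isinstance(op, Join):
        # assumes the two inputs have disjoint attribute names
        return {**copy_map(op.left), **copy_map(op.right)}
    raise TypeError(f"unsupported operator: {op!r}")
```

For a projection that keeps `name` and computes a new column `total` over shop × sales × items, `copy_map` traces `name` back to `("shop", "name")` and maps `total` to the empty set.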
6. The database-system-oriented copy-provenance tracing method according to claim 1, characterized in that step 3), performing copy-provenance tracing according to the query rewrite rules, further comprises the following steps:
31) designing a copy-provenance tracing system framework based on the query-rewrite provenance tracing method;
32) designing the main functional structure of the provenance tracing module, which implements the rewriting of the query tree;
33) analyzing query operations and their results to demonstrate the validity and feasibility of the method.
7. The database-system-oriented copy-provenance tracing method according to claim 6, characterized in that the correctness with which the copy expression CM(q) expresses the copy mapping of query q is analyzed. That is, for an attribute a of a base table of query q, a result tuple t of query q, and a witness w in the PI-CS lineage PI(q, t) of query q and result tuple t, it is checked whether condition (1) is satisfied:
$$\Pi_{C(a)}(q^C) = CM(q, a, w, t) \qquad (1)$$
where $q^C$ is obtained from the copy expression CM(q) of q, and CM(q, a, w, t) is the defining expression of the copy mapping.
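Once the rewritten query's output and a reference implementation of CM(q, a, w, t) are available, condition (1) can be checked mechanically, as in the Python sketch below. Each row of q^C is assumed to carry the result tuple under the key `t` and its witness under `w`. It also carries one copy attribute per base attribute a under the key `C(a)`. All of these key names are hypothetical, and the reference mapping is passed in as a function rather than derived.

```python
from typing import Any, Callable, Iterable, List, Mapping


def check_condition_1(rows: Iterable[Mapping[str, Any]], attrs: List[str],
                      cm: Callable[[str, Any, Any], Any]) -> bool:
    """Return True iff every row satisfies Π_{C(a)}(q^C) = CM(q, a, w, t) for all a in attrs."""
    for row in rows:
        for a in attrs:
            # compare the stored copy attribute with the reference copy mapping
            if row[f"C({a})"] != cm(a, row["w"], row["t"]):
                return False
    return True


def example_cm(a: str, w: Mapping[str, Any], t: Any) -> Any:
    """Hypothetical reference mapping: only 'name' is copied from the witness."""
    return w[a] if a == "name" else None
```

A row whose `C(price)` column held a value instead of None would make the check fail. That failure signals a rewrite rule that records a copy where the reference mapping sees none.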
CN201410539143.XA 2014-10-13 2014-10-13 Database-system-oriented replicated data provenance tracing method Pending CN104239581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410539143.XA CN104239581A (en) 2014-10-13 2014-10-13 Database-system-oriented replicated data provenance tracing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410539143.XA CN104239581A (en) 2014-10-13 2014-10-13 Database-system-oriented replicated data provenance tracing method

Publications (1)

Publication Number Publication Date
CN104239581A true CN104239581A (en) 2014-12-24

Family

ID=52227640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410539143.XA Pending CN104239581A (en) 2014-10-13 2014-10-13 Database-system-oriented replicated data provenance tracing method

Country Status (1)

Country Link
CN (1) CN104239581A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881427A (en) * 2015-04-01 2015-09-02 北京科东电力控制系统有限责任公司 Data blood relationship analyzing method for power grid regulation and control running
CN105912595A (en) * 2016-04-01 2016-08-31 华南理工大学 Data origin collection method of relational databases
CN107016065A (en) * 2017-03-16 2017-08-04 陕西科技大学 It is customizable to rely on semantic effective origin filter method
CN108038170A (en) * 2017-12-07 2018-05-15 中国科学院电子学研究所苏州研究院 A kind of semantic track data base construction method based on expansion PostgreSQL

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
B. Glavic: "Perm: Efficient Provenance Support for Relational Databases", PhD thesis, University of Zurich *
B. Glavic et al.: "Using SQL for Efficient Generation and Querying of Provenance Information", In Search of Elegance in the Theory and Practice of Computation, Springer Berlin Heidelberg *

Similar Documents

Publication Publication Date Title
JP7273045B2 (en) Dimensional Context Propagation Techniques for Optimizing SQL Query Plans
US10565200B2 (en) Conversion of model views into relational models
JP6609262B2 (en) Mapping of attributes of keyed entities
CN103714129B (en) Dynamic data structure based on conditional plan and the construction device of relation and construction method
CN111712809A (en) Learning ETL rules by example
CN107491476B (en) Data model conversion and query analysis method suitable for various big data management systems
CN102693310A (en) Resource description framework querying method and system based on relational database
Patroumpas et al. Towards geospatial semantic data management: strengths, weaknesses, and challenges ahead
US20110022627A1 (en) Method and apparatus for functional integration of metadata
CN104239581A (en) Database-system-oriented replicated data provenance tracing method
De Virgilio et al. R2G: a Tool for Migrating Relations to Graphs.
CN105912721B (en) RDF data distributed semantic parallel inference method
CN114219089B (en) Construction method and equipment of new-generation information technology industry knowledge graph
KR101288208B1 (en) System of entity-relationship model reformulation of sparql query results on rdf data and the method
Mami et al. Generating realistic synthetic relational data through graph variational autoencoders
CN103092960A (en) Method for building software product feature tree model based on demand cluster
Bogorny et al. Semantic-based pruning of redundant and uninteresting frequent geographic patterns
Hamed et al. Query interface for smart city internet of things data marketplaces: A case study
CN116795859A (en) Data analysis method, device, computer equipment and storage medium
Li et al. Parallel search processing of tree-structured data in a big data environment
Wang et al. A survey on data cleaning methods in cyberspace
CN104572991A (en) Heterogeneous facet conversion-based component retrieval method in network forming software component library
Hosain et al. An algebraic language for semantic data integration on the hidden web
Shahzad et al. Automated Generation of Graphs from Relational Sources to Optimise Queries for Collaborative Filtering
CN111324673A (en) Evaluation system of forest ecosystem based on multisource heterogeneous data processing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141224
