CN101615190B - Safe XML keyword search method - Google Patents


Info

Publication number
CN101615190B
Authority
CN
China
Prior art keywords
node
slca
xml
schema
xml document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100558125A
Other languages
Chinese (zh)
Other versions
CN101615190A (en)
Inventor
杨卫东
李晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN2009100558125A
Publication of CN101615190A
Application granted
Publication of CN101615190B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of Extensible Markup Language (XML) keyword search and relates in particular to a secure XML keyword search method. By combining XML keyword search with XML security control, the invention studies, for the first time, XML keyword search under secure access control. The method comprises the following steps: on the basis of the smallest lowest common ancestor (SLCA) of the XML keywords and view-based secure access control rules, determining the XML keyword search result (SSLCA) that satisfies the access control rules; building a keyword index based on security views; and building a keyword search algorithm (SIL) on top of that index. The method achieves efficient and secure keyword search.

Description

Secure XML keyword search method
Technical field
The invention belongs to the technical field of Extensible Markup Language (XML) keyword search and relates in particular to a secure XML keyword search method. The method comprises: determining, on the basis of the smallest lowest common ancestor (SLCA) of the XML keywords and view-based secure access control rules, the XML keyword search result (SSLCA) that satisfies those rules; building a keyword index based on security views; and a keyword search algorithm built on this index.
Background
Keyword search is simple and easy to use, and it is widely used in information retrieval. In recent years, keyword search over databases using information retrieval techniques has become a research focus in the database field. Because XML has become the standard for information exchange on the Internet and between enterprises, XML keyword search has received wide attention [1-9]. Existing work on XML keyword search concentrates on the definition of search results and the related algorithms, and on ranking models for query results. One classic approach is the smallest lowest common ancestor (SLCA) method [1] [2] [3] [5] [9]. If an XML document is viewed as a tree, the SLCA method returns a set of smallest answer subtrees. A smallest answer subtree is a subtree that contains all the keywords and none of whose own subtrees contains all the keywords; the root of such a subtree is called an SLCA.
On the other hand, the large amount of XML data appearing on the network poses a considerable challenge for secure data access control, a problem that has long received attention. At present, one main line of research on XML data security is access control based on security views, and it has made some progress [13-18]. These methods use the structured query languages XQuery or XPath. They attach security specifications to the XML schema or the XML document to obtain a security view, and then use an access control mechanism to govern the user's access to the original data.
Example 1. Fig. 1 shows the tree of an XML document that records the departments, files and staff information of a company. Each node is represented by its label, and the number below the label is its Dewey code [11]. Fig. 2 is the schema of this XML document. The invention assumes that the following secure access rules are defined for the document in Fig. 1: (1) only the employees of a department may access that department's information; (2) an employee's age (Age) and salary (Salary) are confidential and completely invisible; (3) all files inside the department with DeptNo="#0001" are invisible; (4) in any other department, files with Grade=1 are classified and invisible to employees, while files of other grades are visible to the employees of that department.
Under these rules, suppose a user submits the three keywords "Computer", "Grade" and "Tom" to query this XML document. By the definition of an SLCA node in [2], the SLCA method obtains four SLCA nodes from Fig. 1 (the nodes marked in red in Fig. 1): File (0.0.0.0), File (0.1.0.0), File (0.1.0.1) and Staff (0.1.3.0). Clearly, not every subtree rooted at one of these four nodes satisfies the secure access constraints. For example, because its department number is "#0001", none of the information contained in File (0.0.0.0) should be visible.
Because XML is widely used for data exchange between applications and information sources, keyword search over XML data under secure access control deserves more attention. Such an application raises the following challenges:
1. Keyword search is simple and easy to use, whereas retrieval of XML data under secure access control has so far always relied on XQuery or XPath. Combining secure access control with keyword search is a challenge.
2. Keyword search requires an index over the keywords in the XML document. To combine secure access control with keyword search, an index must be built that supports efficient keyword search under secure access control.
It is therefore important to design a new algorithm that returns valid search results to the user under secure access control.
For keyword search over XML databases, most research is based on the SLCA method. XRANK [1], [2], [3], XSEarch [5] and [9] are the best known. [2] defines an SLCA node as follows: (1) the node or its descendants contain all the keywords; (2) none of its descendants is an SLCA. Returning the subtree rooted at an SLCA as the result implies a quality criterion: the quality of a result depends entirely on the height of the SLCA. [3] defines the notion of a meaningful LCA (MLCA), and [5] defines the notion of an interconnection relationship; both are in fact similar to SLCA. This work concentrates on keyword search algorithms and on the choice of returned results, but none of it considers XML security.
Work on XML secure access control includes [13-18]. The access control models in these papers all define a set of secure access rules to control the user's access to XML data, and these rules are in turn described formally by security specifications. [12, 13, 14] also introduce the notion of security views: a security view is the mapping of the security specification onto the XML schema. Queries over the XML data are structured queries against the security view [13, 14, 15, 18], which are then translated through the mapping into queries on the XML document. This work concentrates on methods for enforcing secure access control on XML data, and access to the XML data is always through structured query statements.
Summary of the invention
The object of the invention is to combine secure access control with XML keyword search and to propose an XML keyword search method based on secure access control.
The secure XML keyword search method proposed by the invention has the following steps:
1) Build an XML index based on the security specification of the schema. First, define a security specification on the schema of the XML document. Then traverse the XML document depth-first and encode the nodes with Dewey codes, taking the schema's security specification into account during the traversal. Once the XML document tree has been traversed, the index based on the schema's security specification is built. The index is an inverted index stored as key-value pairs: the key is a keyword, and the value is the list of Dewey codes of the nodes in the XML document that contain the keyword. In addition, each node in the value list maintains a list variable that stores information about the nodes conditionally related to that node.
2) Find the SSLCAs using the XML index based on the schema's security specification. At run time, the user submits a keyword query, and the Dewey codes of the smallest lowest common ancestor nodes of the keywords are computed from the inverted index. Because the SLCA nodes computed at this point do not necessarily satisfy the security specification, the information stored in each node's list variable is used to prune and eliminate SLCA nodes, yielding the final SSLCA nodes.
3) Finally, return the resulting SSLCA nodes to the user.
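The inverted index of step 1 can be illustrated with a minimal Python sketch. It covers only the keyword-to-Dewey-code mapping and leaves out the per-node list variable. The function name `build_inverted_index`, the sample document, and the choice to index both tag names and text words are assumptions made for illustration, not details taken from the patent.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_inverted_index(xml_text):
    """Map each keyword to the Dewey codes of the nodes that contain it."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)

    def visit(elem, dewey):
        # Assumption: a node "contains" its tag name and the words of its own text.
        words = {elem.tag}
        if elem.text and elem.text.strip():
            words.update(elem.text.split())
        for word in words:
            index[word].append(dewey)
        # Depth-first: the i-th child of a node extends its Dewey code with i.
        for i, child in enumerate(elem):
            visit(child, dewey + (i,))

    visit(root, (0,))
    return dict(index)


# Hypothetical miniature document, not the one in Fig. 1.
DOC = (
    "<Company><Dept><DeptNo>#0001</DeptNo>"
    "<Staff><Name>Tom</Name></Staff></Dept></Company>"
)
index = build_inverted_index(DOC)
print(index["Tom"])
```

Dewey codes are kept as integer tuples here so that ancestor tests reduce to prefix comparisons.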
The semantics of keyword search under a security specification are described below.
The invention defines the query submitted by the user as a set of k keywords, W = {w_i | i = 1, ..., k}, and defines the XML data as an XML document tree:
Definition 1: XML document tree. An XML document tree is represented as a 2-tuple T = (V, E), and value(dw) is an operation on the XML document tree, where:
V is the set of all nodes of the tree, and each node has a unique Dewey code;
E ⊆ V × V is the set of edges of the tree;
the function value(dw) returns the value of the node whose Dewey code is dw.
In addition, for two nodes v and v' of the XML tree, the formula v ≤ v' states that v is an ancestor of v' or that v and v' are the same node.
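With Dewey codes, the relation v ≤ v' becomes a prefix test, and the lowest common ancestor of two nodes is their longest common prefix. The following Python sketch shows both; the helper names `parse`, `is_ancestor_or_self` and `lca` are hypothetical.

```python
def parse(code):
    """Turn a dotted Dewey code such as '0.1.0.0' into a tuple of integers."""
    return tuple(int(part) for part in code.split("."))


def is_ancestor_or_self(v, w):
    """v <= w holds when the Dewey code of v is a prefix of the Dewey code of w."""
    return len(v) <= len(w) and w[:len(v)] == v


def lca(v, w):
    """The lowest common ancestor is the longest common prefix of the two codes."""
    common = []
    for a, b in zip(v, w):
        if a != b:
            break
        common.append(a)
    return tuple(common)


print(is_ancestor_or_self(parse("0.1"), parse("0.1.0.0")))
print(lca(parse("0.1.0.0"), parse("0.1.3.0")))
```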
Definition 2: SLCA (smallest lowest common ancestor). A node of an XML document tree is called an SLCA node of the keyword set W if it satisfies: (1) its label or the labels of its descendant nodes contain all the keywords, and (2) none of its descendant nodes is an SLCA node.
For the three keywords "Computer", "Grade" and "Tom" submitted by the user in Example 1, the SLCA nodes are File (0.0.0.0), File (0.1.0.0), File (0.1.0.1) and Staff (0.1.3.0).
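Definition 2 can be checked directly on small inputs. The Python sketch below is a brute-force illustration, not the IL algorithm of [2]: it forms the lowest common ancestor of every combination of one match per keyword and keeps the candidates that have no proper descendant among the candidates. The match lists are hypothetical and do not come from Fig. 1.

```python
from itertools import product


def lca(codes):
    """Longest common prefix of several Dewey codes (tuples of integers)."""
    first, *rest = codes
    n = len(first)
    for code in rest:
        n = min(n, len(code))
        for i in range(n):
            if code[i] != first[i]:
                n = i
                break
    return first[:n]


def slca(keyword_lists):
    """Brute-force SLCA: exponential in the number of keywords, for illustration only."""
    candidates = {lca(combo) for combo in product(*keyword_lists)}

    def has_descendant(c):
        return any(len(d) > len(c) and d[:len(c)] == c for d in candidates)

    return sorted(c for c in candidates if not has_descendant(c))


# Hypothetical Dewey codes of the nodes matching two keywords.
matches_a = [(0, 0, 0), (0, 1, 0, 0)]
matches_b = [(0, 0, 1), (0, 1, 0, 1), (0, 1, 1)]
print(slca([matches_a, matches_b]))
```

Every node that contains all keywords is an ancestor-or-self of some candidate, so the candidates without descendants are exactly the SLCA nodes.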
Definition 3: Security specification. A security specification S is a 2-tuple S = (D, ann), where D is the schema of an XML document and ann is a function defined on the directed edges of the schema: in ann(A, B), A is a node of the schema and B is a child node of A. When ann(A, B) is explicitly defined on the schema, it takes the form:
ann(A,B)=Y|[q]|N
Here Y, [q] and N indicate that the child node B of node A in the schema is accessible, conditionally accessible and inaccessible, respectively, and q is the access condition expressed in XPath. If ann(A, B) is not explicitly defined on the schema, node B inherits the accessibility of node A. Conversely, if ann(A, B) is explicitly defined, the accessibility of node B overrides that of node A.
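The inherit-or-override rule of Definition 3 can be sketched as a lookup that walks up the schema. The schema fragment, the annotation table and the function name `accessibility` below are hypothetical. The sketch also assumes a tree-shaped schema and an accessible root, neither of which is stated in the text.

```python
# Hypothetical schema fragment: each node type maps to its parent type.
PARENT = {
    "Dept": "Company",
    "Staff": "Dept",
    "Name": "Staff",
    "Age": "Staff",
}

# Explicit annotations ann(A, B), keyed by the schema edge (A, B).
ANN = {
    ("Company", "Dept"): "[q1]",
    ("Staff", "Age"): "N",
}


def accessibility(node_type):
    """Return 'Y', '[q]' or 'N' for a schema node type."""
    parent = PARENT.get(node_type)
    if parent is None:
        return "Y"  # assumption: the schema root is accessible
    explicit = ANN.get((parent, node_type))
    if explicit is not None:
        return explicit  # an explicit annotation overrides the parent
    return accessibility(parent)  # otherwise inherit from the parent


print(accessibility("Age"), accessibility("Name"), accessibility("Company"))
```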
Taking the schema in Fig. 2 as the object of study, the secure access rules of Example 1 can be formally defined as the security specification shown in Fig. 3.
Marking this security specification on the schema yields the security view shown in Fig. 4. q1, q2 and q3 denote access conditions. q1=Company/Dept[DeptNo=$login_DeptNo] states that only the employees of a department may access that department's internal information. q2="#0001"] states that all files in the department numbered "#0001" are invisible. q3=Company/Dept/Files/File[Grade>1] states that a File node is accessible only when the value of its child node Grade is greater than 1. Inaccessible nodes are drawn with dashed edges marked with the identifier "N". Unmarked nodes are either accessible or inherit the accessibility of their parent node.
Definition 4: Conditionally accessible node. On a schema with a security specification, if nodes of some type can be accessed only when a certain condition is satisfied, nodes of that type are called conditionally accessible nodes. Accordingly, in an XML document that conforms to the schema, all nodes of that type are conditionally accessible nodes.
Definition 5: SSLCA (secure smallest lowest common ancestor). On an XML document tree with secure access rules, an SLCA node of the keyword set W is called an SSLCA node if it satisfies all the secure access rules.
According to the definitions above and Fig. 4, nodes of type Dept, Files and File in Fig. 1 can be accessed only if they satisfy conditions q1, q2 and q3 respectively. Therefore Dept (0.0), Files (0.0.0), File (0.0.0.0), Dept (0.1), Files (0.1.0), File (0.1.0.0) and File (0.1.0.1) are conditionally accessible nodes, and the two nodes File (0.1.0.1) and Staff (0.1.3.0) are the SSLCA nodes.
Definition 6: Conditionally relative node. The condition that a conditionally accessible node must satisfy is expressed as an XPath statement, and the XPath statement records the path to the target nodes it reaches. The invention therefore calls the target nodes reached through the XPath condition of a conditionally accessible node the conditionally relative nodes of that node.
As shown in Fig. 1 and Fig. 4, Dept (0.0), Files (0.0.0) and File (0.0.0.0) are conditionally accessible nodes, and the conditions they must satisfy are q1, q2 and q3 respectively. From q1 and q2, the node conditionally related to both Dept (0.0) and Files (0.0.0) is the DeptNo (0.0.2) node under Dept (0.0); the node conditionally related to File (0.0.0.0) is the Grade (0.0.0.0.1) node under File (0.0.0.0).
The method of the invention is described further below.
1. Building the XML index based on the schema's security specification
Because the index of the invention is based on the schema's security specification, the conditions and variables relevant to the security specification must be recorded while traversing the XML document tree.
The procedure is as follows. Two pieces of information are maintained for each node v of the XML document: first, the Dewey code of the node, denoted v.deweyNo; second, for each conditionally accessible node, a list of its conditionally relative nodes, denoted v.list. While traversing the XML document tree, the following is done for each node v as its Dewey code is computed. If v is an inaccessible node, the identifier "N" is stored in v.list to indicate that v is inaccessible. If v is a conditionally accessible node, find in the schema the node type that is conditionally related to the type of v, and take the lowest common ancestor of these two node types, denoted M. Then, in the XML document, search bottom-up among the ancestors of v for the node of type M, denoted m, so that m ≤ v. The conditionally relative nodes of v must lie among the child nodes of m, so placing all conditionally relative nodes of v found among the children of m into v.list records all of them. If the child nodes of m have not yet been traversed when v is visited, the Dewey code of m is recorded in v, and when the other children of m are traversed later, the Dewey codes of all conditionally relative nodes of v among them are placed into v.list. After one complete traversal of the XML document tree, the index based on the schema's security specification is built.
With this method, after one complete traversal of the XML document tree, the Dewey codes of the tree are as shown in Fig. 1, and the list variables of the nodes that are subject to access control contain the following:
Dept(0.0).list={0.0.2}
Files(0.0.0).list={0.0.2}
File(0.0.0.0).list={0.0.0.0.1}
Age(0.0.3.0.1).list={N}
Salary(0.0.3.0.2).list={N}
Dept(0.1).list={0.1.2}
Files(0.1.0).list={0.1.2}
File(0.1.0.0).list={0.1.0.0.1}
File(0.1.0.1).list={0.1.0.1.1}
Age(0.1.3.0.1).list={N}
Salary(0.1.3.0.2).list={N}
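The list variables make it cheap to evaluate a condition at query time. The following minimal Python sketch checks condition q1 for a Dept node. The list entries are copied from the listing above and value(0.0.2)="#0001" comes from the worked example, but value(0.1.2)="#0002" and the names `V_LIST`, `VALUES` and `dept_visible` are assumptions made for illustration.

```python
# v.list entries, keyed by Dewey code, copied from the listing above.
V_LIST = {
    "0.0": ["0.0.2"],
    "0.1": ["0.1.2"],
    "0.0.3.0.1": ["N"],
}

# value(dw): "#0001" is stated in the text; "#0002" is an assumed value.
VALUES = {"0.0.2": "#0001", "0.1.2": "#0002"}


def value(dw):
    """Return the value of the node whose Dewey code is dw."""
    return VALUES[dw]


def dept_visible(dept_dewey, login_dept_no):
    """Check q1: a Dept node is visible only to users of the same department."""
    entries = V_LIST.get(dept_dewey, [])
    if "N" in entries:
        return False  # the node is marked inaccessible
    return all(value(dw) == login_dept_no for dw in entries)


print(dept_visible("0.0", "#0002"), dept_visible("0.1", "#0002"))
```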
2. Finding the SSLCAs using the XML index based on the security specification
To compute the SSLCAs with the index, the method first obtains, from the given keyword set W = {w_i | i = 1, ..., k}, the set L of SLCA nodes that match the keyword query, and then takes the SLCAs out of L one by one and checks whether the current SLCA satisfies the security specification.
The check has four steps:
(1) Determine whether the subtree rooted at the current SLCA node (hereafter "the SLCA subtree") contains inaccessible nodes. If it does not, jump directly to step (3); if it does, prune all inaccessible nodes from the SLCA subtree.
(2) Determine whether the pruned SLCA subtree still contains all the keywords. If it does, go to step (3); otherwise go directly to step (4).
(3) Use a bottom-up method to find the path from this SLCA node to the root node of the schema. Then, using the XML index that has been built, check top-down whether each node on the path from the root to this SLCA node, and each node of the SLCA subtree, satisfies its own security specification. If any node does not, go to step (4); if all do, keep this SLCA, which becomes an SSLCA.
(4) Delete the current SLCA from L.
Example 3. In Example 1 the user submitted the three keywords "Computer", "Grade" and "Tom". The SLCA method yields the set L = {File (0.0.0.0), File (0.1.0.0), File (0.1.0.1), Staff (0.1.3.0)} of four SLCA nodes. The SLCA node File (0.0.0.0) contains no inaccessible nodes, and the bottom-up method finds its path to the root: Company (0) → Dept (0.0) → Files (0.0.0) → File (0.0.0.0). Suppose the currently logged-in user has $login_DeptNo="#0002". Using the index and the top-down check, node Company (0) is accessible, and Dept (0.0) is a conditionally accessible node with Dept (0.0).list={0.0.2}. Since value (0.0.2)="#0001" while the user's $login_DeptNo="#0002", node Dept (0.0) violates condition q1 of the security specification in Fig. 4, so the SLCA node File (0.0.0.0) is deleted from L. Likewise, the SLCA node File (0.1.0.0) violates condition q3 in Fig. 4 and is also deleted from L. The SLCA node File (0.1.0.1) satisfies all the conditions in Fig. 4 and is kept. For the answer subtree rooted at the SLCA node Staff (0.1.3.0), all invisible nodes are pruned. The remainder of the subtree still contains all the keywords, and the bottom-up and top-down checks of the path from this node to the root find that it satisfies every security specification in Fig. 4. At this point only two SLCA nodes that satisfy the security specification remain in L, {File (0.1.0.1), Staff (0.1.3.0)}; both are SSLCA nodes, and the system finally returns to the user the XML fragments rooted at these two SSLCA nodes (the parts that remain after pruning).
In summary, the invention is the first to address secure access control in SLCA-based XML keyword search. It combines XML secure access control with XML keyword search and proposes a complete solution with the following distinctive features:
1. It formally defines SSLCA, the result of secure XML keyword search, and proposes a method for finding SSLCAs.
2. It provides a method for building an index based on the security specification of the schema.
3. It provides SIL, an algorithm that finds SSLCAs using this index.
4. Experiments show that the algorithm used by the invention is efficient.
In essence, XML keyword search under a security specification builds an index over an XML document annotated with a security specification, and then performs keyword search on that index to find the secure smallest lowest common ancestors (SSLCA).
Description of drawings
Fig. 1 illustrates the XML document.
Fig. 2 illustrates the XML schema.
Fig. 3 illustrates the security specification based on the schema.
Fig. 4 illustrates the mapping of the security specification onto the schema.
Fig. 5 compares the efficiency of the IL method and the SIL method: (a) results on data set T1, (b) results on data set T2, (c) results on data set T3, (d) results on DBLP.
Embodiment
The inventors implemented both the IL method of XKSearch [2] and the method of the invention (denoted the SIL method) in Java, and compared the two methods mainly on query efficiency. The experimental environment was an Intel Pentium 1.73 GHz CPU with 1 GB of memory.
XML keyword search under secure access control consists of two parts: the first builds the index based on the schema's security specification, and the second uses this index to compute the SSLCAs. The two algorithms are described in pseudocode below.
The pseudocode for building the index is as follows:
Input: XML document tree T, its root node r, and the schema-based security specification S
Output: the encoded XML document tree T' that conforms to the schema's security specification
r.deweyNo = 0;
recursiveTraverse(r);

Function recursiveTraverse(v)
    If v is a type of inaccessible node according to S
        add 'N' to v.list;
    If v is a type of conditionally accessible node
        add all of v's conditionally relative nodes to v.list;
    For each child node wi of v
        generate wi.deweyNo according to v.deweyNo;
        recursiveTraverse(wi);
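The index-building pseudocode can be made concrete with a minimal Python sketch. The `Node` class, the two rule tables and the toy tree are hypothetical. Each entry of `CONDITIONAL` maps a conditionally accessible type to the ancestor type M and the tag of its conditionally relative child. The sketch also simplifies the deferred bookkeeping described earlier: child Dewey codes depend only on position, so they are assigned before the lists are filled and nothing needs to be revisited.

```python
class Node:
    def __init__(self, tag, text="", children=()):
        self.tag, self.text = tag, text
        self.children = list(children)
        self.parent = None
        self.dewey = None
        self.list = []  # plays the role of v.list
        for child in self.children:
            child.parent = self


# Hypothetical rules derived from a security specification.
INACCESSIBLE = {"Age", "Salary"}
CONDITIONAL = {
    "Dept": ("Dept", "DeptNo"),
    "Files": ("Dept", "DeptNo"),
    "File": ("File", "Grade"),
}


def recursive_traverse(v):
    # Assign the children's Dewey codes first, so lookups below can use them.
    for i, w in enumerate(v.children):
        w.dewey = f"{v.dewey}.{i}"
    if v.tag in INACCESSIBLE:
        v.list.append("N")
    if v.tag in CONDITIONAL:
        m_type, relative_tag = CONDITIONAL[v.tag]
        m = v
        while m is not None and m.tag != m_type:  # bottom-up search for m
            m = m.parent
        if m is not None:
            v.list.extend(c.dewey for c in m.children if c.tag == relative_tag)
    for w in v.children:
        recursive_traverse(w)


# Toy tree, loosely modelled on the first department of Example 1.
grade = Node("Grade", "1")
file_ = Node("File", children=[Node("Name", "Computer"), grade])
files = Node("Files", children=[file_])
age = Node("Age", "30")
staff = Node("Staff", children=[Node("Name", "Tom"), age])
dept = Node("Dept", children=[files, staff, Node("DeptNo", "#0001")])
root = Node("Company", children=[dept])

root.dewey = "0"
recursive_traverse(root)
print(dept.list, files.list, file_.list, age.list)
```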
The pseudocode of the algorithm for computing the SSLCAs is as follows:
Input: the encoded XML document tree T and the keyword set W = {w_i | i = 1, ..., k}
Output: the set of SSLCA nodes
Get all SLCA nodes L for the keywords W
For each SLCA node s in L
    if s contains inaccessible nodes
        remove the inaccessible nodes from s;
        check whether s still contains all the keywords;
        if not
            delete s from L and continue with the next SLCA node;
    expand a path from s through all its ancestors up to the root;
    check each node on this path from the root to s;
    if some node does not satisfy its security specification
        delete s from L;
    else check each child node in s
        if some node does not satisfy its security specification
            delete s from L.
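The filtering pseudocode can be exercised on a toy tree with a minimal Python sketch. The SLCA nodes are passed in as already computed. The `Node` class, the predicate `satisfies` (which stands in for the index lookups and checks only the q1 and q3 conditions for an assumed login department "#0002") and all node contents are hypothetical.

```python
class Node:
    def __init__(self, tag, text="", children=()):
        self.tag, self.text = tag, text
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def visible_subtree(n, inaccessible):
    """Yield the nodes under n, pruning inaccessible nodes and their subtrees."""
    if n.tag in inaccessible:
        return
    yield n
    for child in n.children:
        yield from visible_subtree(child, inaccessible)


def contains_all(nodes, keywords):
    found = set()
    for n in nodes:
        found.update(k for k in keywords if k == n.tag or k in n.text.split())
    return found == set(keywords)


def path_from_root(n):
    path = []
    while n is not None:
        path.append(n)
        n = n.parent
    return path[::-1]


def filter_sslca(slcas, keywords, inaccessible, satisfies):
    result = []
    for s in slcas:
        visible = list(visible_subtree(s, inaccessible))  # step (1): prune
        if not contains_all(visible, keywords):           # step (2)
            continue                                      # step (4): drop s
        on_path_ok = all(satisfies(n) for n in path_from_root(s))
        in_subtree_ok = all(satisfies(n) for n in visible)
        if on_path_ok and in_subtree_ok:                  # step (3)
            result.append(s)
    return result


def satisfies(n, login="#0002"):
    if n.tag == "Dept":   # q1: only the user's own department
        return any(c.tag == "DeptNo" and c.text == login for c in n.children)
    if n.tag == "File":   # q3: only files with Grade greater than 1
        return any(c.tag == "Grade" and int(c.text) > 1 for c in n.children)
    return True


def make_file(grade):
    return Node("File", children=[Node("Title", "Computer"),
                                  Node("Author", "Tom"), Node("Grade", grade)])


f1, f2, f3 = make_file("2"), make_file("1"), make_file("3")
staff = Node("Staff", children=[Node("Name", "Tom"),
                                Node("Major", "Computer"), Node("Age", "30")])
dept1 = Node("Dept", children=[Node("DeptNo", "#0001"),
                               Node("Files", children=[f1])])
dept2 = Node("Dept", children=[Node("DeptNo", "#0002"),
                               Node("Files", children=[f2, f3]), staff])
root = Node("Company", children=[dept1, dept2])

sslcas = filter_sslca([f1, f2, f3, staff], ["Computer", "Tom"],
                      {"Age", "Salary"}, satisfies)
print([n.tag for n in sslcas])
```

In this toy run the file in the other department and the Grade 1 file are dropped, which mirrors the outcome of Example 3.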
XML data sets. To make the experimental results more realistic, the invention used both synthetic and real data sets.
Synthetic data sets: three data sets of different sizes, T1 (15.7M), T2 (23.5M) and T3 (35.2M), generated with IBM's XML document generator [10] from the invention's own schema (Fig. 2).
Real data set: the DBLP data set and schema provided at http://dblp.uni-trier.de/xml/; the DBLP data set used in the experiments is 47M.
Query efficiency comparison. In the experiments, keywords that occur 1000-1500 times in a data set were chosen at random. For different numbers of keywords, the IL method of [2] and the SIL method were compared on execution efficiency over the four data sets T1, T2, T3 and DBLP; the results are shown in Fig. 5. The figure shows that the execution time of the SIL method is always longer than that of the IL method. This is because, after the SLCA results are computed with the IL method, they must still be filtered to eliminate the SLCAs that do not satisfy the schema's security specification before the final SSLCAs are obtained. Although this step costs extra time, the increase is small relative to the original execution time.
References
[1] L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: Ranked Keyword Search over XML Documents. [C]//SIGMOD, 2003. San Diego, California, 2003.
[2] Y. Xu, Y. Papakonstantinou. Efficient Keyword Search for Smallest LCAs in XML Databases. [C]//SIGMOD, 2005. Baltimore, Maryland, USA, 2005: 527-538.
[3] Y. Li, C. Yu, and H. V. Jagadish. Schema-Free XQuery. [C]//Proceedings of the 30th VLDB Conference, 2004. Toronto, Canada, 2004: 72-83.
[4] V. Hristidis, N. Koudas, Y. Papakonstantinou, and D. Srivastava. Keyword Proximity Search in XML Trees. [C]//IEEE Transactions on Knowledge and Data Engineering, 2006: 525-539.
[5] S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv. XSEarch: A Semantic Search Engine for XML. [C]//Proceedings of the 29th VLDB Conference, Berlin, Germany, 2003.
[6] G. Li, J. Feng, J. Wang, and L. Zhou. Effective Keyword Search for Valuable LCAs over XML Documents. [C]//CIKM, 2007. Lisboa, Portugal, 2007: 1-10.
[7] Z. Liu and Y. Chen. Identifying Meaningful Return Information for XML Keyword Search. [C]//SIGMOD, 2007. Beijing, China, 2007: 329-340.
[8] Z. Liu and Yi Chen. Reasoning and Identifying Relevant Matches for XML Keyword Search. [C]//VLDB, 2008. Auckland, New Zealand, 2008.
[9] C. Sun, C. Chan, and A. K. Goenka. Multiway SLCA-based Keyword Search in XML Data. [C]//WWW, 2007. Banff, Alberta, Canada, 2007: 1043-1052.
[10] Angel Luis Diaz and Douglas Lovell. XML Generator. http://www.alphaworks.ibm.com/tech/xmlgenerator, [M]//September 1999.
[11] M. Dewey. Dewey Decimal Classification System. http://www.mtsu.edu/vvesper/dewey.html.
[12] Gabriel Kuper, Fabio Massacci, Nataliya Rassadko. Generalized XML Security Views. [C]//SACMAT, 2005.
[13] W. Fan, C.-Y. Chan, and M. Garofalakis. Secure XML querying with security views. [C]//SACMAT'05, Stockholm, Sweden, 2005: 77-84.
[14] S. Mohan, A. Sengupta, and Y. Wu. Access control for XML - a dynamic query rewriting approach. [C]//Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005: 1-12.
[15] S. Mohan, J. Klinginsmith, A. Sengupta, and Y. Wu. Access control for XML with enhanced security specifications. [C]//ICDE, 2006. Atlanta, Georgia, USA, 2006: 1-1.
[16] I. Fundulaki and M. Marx. Specifying access control policies for XML documents with XPath. [C]//SACMAT'04. New York, USA, 2004.
[17] Sriram Mohan, Yuqing Wu. IPAC - An Interactive Approach to Access Control for Semi-Structured Data. [C]//VLDB '06, Seoul, Korea, 2006: 1147-1150.
[18] Bogdan Cautis. Distributed Access Control: A Privacy conscious Approach. [C]//SACMAT'07, Sophia Antipolis, France, 2007: 61-70.

Claims (1)

1. A secure XML keyword search method, characterized in that the steps are as follows:
1) build an XML index based on the security specification of the schema: first, define a security specification on the schema of the XML document; then traverse the XML document depth-first and encode the nodes with Dewey codes, taking the schema's security specification into account during the traversal; once the XML document tree has been traversed, the index based on the schema's security specification is built; the index is an inverted index stored as key-value pairs, in which the key is a keyword and the value is the list of Dewey codes of the nodes in the XML document that contain the keyword; in addition, each node in the value list maintains a list variable that stores information about the nodes conditionally related to that node;
2) find the SSLCAs using the XML index based on the schema's security specification: at run time, the user submits a keyword query; the Dewey codes of the SLCA nodes of the keywords are computed from the inverted index; and the information stored in each node's list variable is used to prune and eliminate SLCA nodes, yielding the final SSLCA nodes;
3) finally, return the resulting SSLCA nodes to the user;
wherein the XML index based on the schema's security specification is built by the following steps:
1) two pieces of information are maintained for each node v of the XML document: first, the Dewey code of the node, denoted v.deweyNo; second, for each conditionally accessible node, a list of its conditionally relative nodes, denoted v.list;
2) while traversing the XML document tree, the following is done for each node v as its Dewey code is computed: if v is an inaccessible node, the identifier "N" is stored in v.list to indicate that v is inaccessible; if v is a conditionally accessible node, the node type conditionally related to the type of v is found in the schema, and the lowest common ancestor of these two node types is taken and denoted M; then, in the XML document, the node of type M is found bottom-up among the ancestors of v and denoted m, so that m<v, and the conditionally relative nodes of v must lie among the child nodes of m; placing all conditionally relative nodes of v found among the children of m into v.list records all of them; if the child nodes of m have not yet been traversed when v is visited, the Dewey code of m is recorded in v, and when the other children of m are traversed later, the Dewey codes of all conditionally relative nodes of v among them are placed into v.list; after one complete traversal of the XML document tree, the index based on the schema's security specification is built;
The method of computing the SSLCA nodes from the XML index based on the Schema safety instructions is as follows:
From the given keyword set W = {w_i | i = 1, ..., k}, find the set L of SLCA nodes that match the keyword query; then take the SLCA nodes out of L one by one and judge whether the SLCA node currently taken out satisfies the safety instructions; this is done in the following four steps:
Step (1): first judge whether the subtree rooted at the current SLCA node (the "SLCA subtree" for short) contains any inaccessible node; if it does not, jump directly to step (3); if it does, prune all inaccessible nodes from this SLCA subtree;
Step (2): then judge whether the pruned SLCA subtree still contains all keywords; if it does, go to step (3); otherwise go directly to step (4);
Step (3): using a bottom-up method, find the path from this SLCA node to the root node of the Schema; then, using a top-down method and the XML index that has been built, judge one by one whether each node on the path from the root node to this SLCA node, and each node of the SLCA subtree, satisfies its own safety instruction; if any node does not, go to step (4); if all do, keep this SLCA, which becomes an SSLCA;
Step (4): delete the current SLCA from L;
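The four steps above can be sketched as a filter over the set L. This is a simplified illustration under assumptions that are not in the original: the index maps each Dewey code (a tuple) to `None` for an unconditionally accessible node, to `["N"]` for an inaccessible node, or to the list of condition-related Dewey codes for a conditionally accessible node; `keyword_hits` stands in for the inverted index; and `condition_ok` is a hypothetical callback that evaluates a node's condition from its related nodes. Instead of deleting from L, the sketch collects the surviving nodes in a new list.

```python
def is_prefix(a, b):
    """True if Dewey code a is an ancestor-or-self of Dewey code b."""
    return b[:len(a)] == a

def filter_sslca(slcas, index, keyword_hits, condition_ok):
    def node_ok(code):
        v_list = index[code]
        if v_list is None:          # unconditionally accessible
            return True
        if v_list == ["N"]:         # inaccessible
            return False
        return condition_ok(code, v_list)  # conditionally accessible

    result = []
    for s in slcas:
        subtree = [c for c in index if is_prefix(s, c)]
        # Step (1): prune inaccessible nodes together with their subtrees.
        blocked = [c for c in subtree if index[c] == ["N"]]
        kept = {c for c in subtree
                if not any(is_prefix(b, c) for b in blocked)}
        # Step (2): the pruned subtree must still contain every keyword.
        if blocked and not all(any(h in kept for h in hits)
                               for hits in keyword_hits.values()):
            continue                # Step (4): drop this SLCA
        # Step (3): check the root-to-SLCA path top-down, then the subtree.
        path = [s[:n] for n in range(1, len(s))]
        if all(node_ok(c) for c in path) and all(node_ok(c) for c in kept):
            result.append(s)        # this SLCA becomes an SSLCA
    return result

# Hypothetical data: node (1,1,1) is conditional on (1,1,2); node (1,2,1) is inaccessible.
index = {(1,): None, (1, 1): None, (1, 1, 1): [(1, 1, 2)], (1, 1, 2): None,
         (1, 2): None, (1, 2, 1): ["N"], (1, 2, 2): None}
keyword_hits = {"xml": [(1, 1, 1), (1, 2, 1)], "search": [(1, 1, 2), (1, 2, 2)]}
values = {(1, 1, 2): "manager"}
allow_manager = lambda code, related: any(values.get(r) == "manager" for r in related)
sslcas = filter_sslca([(1, 1), (1, 2)], index, keyword_hits, allow_manager)
```

Here the SLCA (1, 2) is dropped because pruning its inaccessible child removes the only occurrence of one keyword, while (1, 1) survives because its conditional node passes the check.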
Wherein XML is the Extensible Markup Language, SLCA is the smallest lowest common ancestor, and SSLCA is the secure smallest lowest common ancestor; the related notions are defined as follows:
Definition 1: XML document tree. An XML document tree is represented by a 2-tuple T = (V, E), and value(dw) is an operation performed on the XML document tree, wherein:
V is the set of all nodes of the tree, and each node has a unique Dewey code;
E (given in the original as formula image FSB00000599606600021) is the set of edges of the tree;
the function value(dw) returns the value of the node whose Dewey code is dw;
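As an illustration of Definition 1, the following sketch assigns Dewey codes to the nodes of a small document and implements a value(dw) lookup. It assumes, beyond what the text states, that the root receives the code (1,), that children are numbered from 1 in document order, and that a node's value is its text content; the names `dewey_index` and `value` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def dewey_index(root):
    """Return V as a mapping from Dewey code (tuple of ints) to element."""
    nodes = {}

    def visit(elem, code):
        nodes[code] = elem
        for i, child in enumerate(elem, start=1):
            visit(child, code + (i,))

    visit(root, (1,))
    return nodes

def value(nodes, dw):
    """Return the text value of the node whose Dewey code is dw."""
    text = nodes[dw].text
    return text.strip() if text else ""

doc = ET.fromstring(
    "<dept><staff><name>Ann</name><salary>900</salary></staff></dept>"
)
V = dewey_index(doc)
# E follows from the codes: the parent of a node is its code minus the last component.
E = {(code[:-1], code) for code in V if len(code) > 1}
```

With this encoding, a node a is an ancestor of a node b exactly when the code of a is a proper prefix of the code of b, which is what makes ancestor tests cheap during the search.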
Definition 2: SLCA. A node of an XML document tree is called an SLCA node of the keyword set W if it satisfies: (1) all keywords are contained in its label or in the labels of its descendant nodes, and (2) none of its descendant nodes is itself an SLCA node (that is, no descendant node also satisfies condition (1));
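Definition 2 can be checked directly with a brute-force sketch over Dewey codes. This is only an illustration of the definition, not the indexed algorithm of the method; the function name `slca` and the sample inverted index (keyword to list of Dewey codes) are hypothetical.

```python
def slca(inverted, keywords):
    """Return the SLCA nodes of `keywords` as a sorted list of Dewey codes."""
    def is_prefix(a, b):
        return b[:len(a)] == a

    # Candidates: every ancestor-or-self of any keyword occurrence.
    candidates = set()
    for kw in keywords:
        for code in inverted.get(kw, []):
            for n in range(1, len(code) + 1):
                candidates.add(code[:n])

    # Condition (1): the subtree of the node contains every keyword.
    def covers(c):
        return all(any(is_prefix(c, occ) for occ in inverted.get(kw, []))
                   for kw in keywords)

    covering = {c for c in candidates if covers(c)}
    # Condition (2): no proper descendant also contains every keyword.
    return sorted(c for c in covering
                  if not any(c != d and is_prefix(c, d) for d in covering))

inverted = {"xml": [(1, 1, 1), (1, 2, 1)], "search": [(1, 1, 2), (1, 3)]}
found = slca(inverted, ["xml", "search"])
```

In this example both the root (1,) and the node (1, 1) contain the two keywords, but only (1, 1) is returned because it lies below the root.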
Definition 3: safety instruction. A safety instruction S is a 2-tuple S = (D, ann), where D is the Schema of an XML document and a function ann(A, B) is defined for each directed edge of the Schema, A being a node of the Schema and B a child node of node A; if the function ann(A, B) is explicitly defined on the Schema, it takes the following form:
ann(A, B) = Y | [q] | N
where Y, [q] and N indicate respectively that the child node B of node A in the Schema is accessible, conditionally accessible, or inaccessible, and q is a conditional access statement expressed in XPath; if the function ann(A, B) is not explicitly defined on the Schema, node B inherits the accessibility of node A; if the function ann(A, B) is explicitly defined, the accessibility of node B overrides the accessibility of node A;
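The inheritance and override rule of Definition 3 can be sketched as a top-down pass over the Schema. The sketch rests on assumptions not stated in the text: the Schema is a tree in which each node type has a single parent, the root type is accessible, and an inherited label is copied unchanged from the parent. The names `resolve_accessibility`, `children` and `ann`, and the sample Schema, are hypothetical.

```python
def resolve_accessibility(children, ann, root, root_label="Y"):
    """Return {node_type: label}, where label is "Y", "N" or a "[q]" string.

    children: node type -> list of child node types (the Schema edges).
    ann:      (A, B) -> label, only for explicitly annotated edges.
    """
    labels = {root: root_label}
    stack = [root]
    while stack:
        a = stack.pop()
        for b in children.get(a, []):
            # An explicit ann(A, B) overrides; otherwise B inherits from A.
            labels[b] = ann.get((a, b), labels[a])
            stack.append(b)
    return labels

children = {"dept": ["staff", "budget"],
            "staff": ["name", "salary"],
            "budget": ["amount"]}
ann = {("dept", "budget"): "N",
       ("staff", "salary"): "[role='manager']"}
labels = resolve_accessibility(children, ann, "dept")
```

Here amount becomes inaccessible by inheritance from budget, while salary carries its own XPath qualifier even though its parent staff is accessible.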
Definition 4: conditionally accessible node. If, on a Schema for which safety instructions have been defined, nodes of a certain type can be accessed only when a certain condition is satisfied, such nodes are called conditionally accessible nodes;
Definition 5: SSLCA. An SLCA node of the keyword set W on an XML document tree for which secure access rules have been defined is called an SSLCA node if this SLCA node satisfies all the secure access rules;
Definition 6: condition-related node. A target node that can be reached through the XPath conditional statement of a conditionally accessible node is called a node that is condition-related to that conditionally accessible node.
CN2009100558125A 2009-07-31 2009-07-31 Safe XML keyword search method Expired - Fee Related CN101615190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100558125A CN101615190B (en) 2009-07-31 2009-07-31 Safe XML keyword search method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100558125A CN101615190B (en) 2009-07-31 2009-07-31 Safe XML keyword search method

Publications (2)

Publication Number Publication Date
CN101615190A CN101615190A (en) 2009-12-30
CN101615190B true CN101615190B (en) 2011-12-07

Family

ID=41494832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100558125A Expired - Fee Related CN101615190B (en) 2009-07-31 2009-07-31 Safe XML keyword search method

Country Status (1)

Country Link
CN (1) CN101615190B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101794312A (en) * 2010-03-08 2010-08-04 上海交通大学 XML (Extensive Makeup Language) access control method based on security view
CN102087666B (en) * 2011-01-30 2012-10-31 华东师范大学 Indexes based on covering relationship between nodes and key words, constructing method and query method thereof
JP2013084074A (en) * 2011-10-07 2013-05-09 Sony Corp Information processing device, information processing server, information processing method, information extracting method and program
CN102867054A (en) * 2012-09-13 2013-01-09 江苏乐买到网络科技有限公司 XML (extensible markup language) keyword query method
CN103116654B (en) * 2013-03-06 2016-08-24 同方知网(北京)技术有限公司 A kind of XML data node code compression method
CN103279514A (en) * 2013-05-22 2013-09-04 河海大学 Method for inferring XML keyword query goal node type
CN103544281A (en) * 2013-10-23 2014-01-29 中安消技术有限公司 Method, device and system for retrieving keywords

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101089851A (en) * 2007-07-12 2007-12-19 复旦大学 XML flow buffer store manage method based on partial binary prefix code
CN101201834A (en) * 2007-11-01 2008-06-18 复旦大学 Method for searching XML data stream keyword based on document type definition
CN101241502A (en) * 2008-03-13 2008-08-13 复旦大学 XML document keyword searching and clustering method based on semantic distance model

Also Published As

Publication number Publication date
CN101615190A (en) 2009-12-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111207

Termination date: 20140731

EXPY Termination of patent right or utility model