CN105630881A - Data storage method and query method for RDF (Resource Description Framework) - Google Patents

Data storage method and query method for RDF (Resource Description Framework)

Info

Publication number
CN105630881A
Authority
CN
China
Prior art keywords
data
storage
query
rdf
triple
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510955821.5A
Other languages
Chinese (zh)
Other versions
CN105630881B (en)
Inventor
袁柳
张鸿洋
翟梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Normal University
Priority to CN201510955821.5A
Publication of CN105630881A
Application granted
Publication of CN105630881B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G06F16/835 Query processing
    • G06F16/8373 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data storage method and a query method for RDF (Resource Description Framework). The data storage method comprises the steps of designing an entity-oriented RDF data storage structure and storage mapping; converting the URIs (Uniform Resource Identifiers) and literals of the RDF data into 64-bit binary data; and storing the 64-bit binary data according to the designed storage structure. The data query method comprises the steps of parsing and converting a SPARQL query statement; estimating the cost of each single query for the multiple query triples in the SPARQL statement, according to an analysis of the whole data set and the join relationships among the queries; and finally generating a least-cost query process. The storage method greatly increases the speed of data comparison and reduces storage space; compared with the conventional approach of directly converting SPARQL into SQL, the query method greatly improves query efficiency and can be used in fields such as Web data management and Web semantic retrieval.

Description

Data storage method and query method for RDF
Technical field
The invention belongs to the technical field of Web data management, and specifically relates to an RDF data storage method and query method that reduce the storage space of RDF data and improve the efficiency of SPARQL queries.
Background
RDF (Resource Description Framework) is a framework proposed by the World Wide Web Consortium (W3C) for describing information on the World Wide Web; it provides an information description standard for the various applications on the Web. RDF describes resources on the Web as triples of subject S (Subject), predicate P (Predicate) and object O (Object). The subject generally denotes an information entity (or concept) on the Web by a Uniform Resource Identifier (URI), the predicate describes an attribute of the entity, and the object is the corresponding attribute value. This form of expression allows RDF to represent any identifiable information on the Web and to be exchanged between applications without losing semantic information. RDF has therefore become the standard for describing semantic data and is widely used to describe metadata, ontologies and the Semantic Web. As Semantic Web data keep growing, building systems that store and query these data efficiently has become essential to the spread of Semantic Web applications; because RDF is the basis for describing Semantic Web data, efficient storage and querying of RDF data has become a focus of Semantic Web research. Current approaches to storing and optimizing RDF data fall mainly into three kinds.
First, storage based on a relational database
Because RDF data can be regarded as a set of <Subject, Predicate, Object> triples, the most natural approach is to store the data directly in a triple table. Many relational RDF storage systems therefore use a relational database directly and store RDF data in a triple table or a similar design. The method comprises the following steps: (1) the RDF data are parsed into triples; (2) the URIs in the triples are encoded with an MD5 (Message Digest Algorithm 5) hash, and the first 64 bits of the MD5 hash are taken as the resource identifier; (3) the data are stored in the relational database in a three-column table, and the corresponding indexes are built. However, to run a SPARQL query this method must convert the SPARQL query into Structured Query Language (SQL), which requires several layers of conversion. Because RDF data differ greatly from relational data, storing RDF data in relational tables also requires mapping operations between tables. Both space utilization and query efficiency are therefore reduced.
Second, storage based on local binary files
RDF documents can be stored in files in a given format; on the Semantic Web a large number of RDF documents exist in RDF/XML form. RDF data differ considerably from relational data in structure, and RDF description syntax is much more complex than that of a relational database, but describing resources with RDF offers greater flexibility. Storing RDF documents in disk files can achieve better storage efficiency while still answering queries quickly. Some existing systems are designed around disk-based storage structures; they usually rely on general-purpose database techniques such as B-trees, B+ trees and hash tables. However, file-based storage is relatively costly to develop, and because RDF is the basis for describing Semantic Web data, supporting query and reasoning on top of the basic storage structure requires a large amount of additional work.
Third, storage based on main memory
As hardware has developed, memory has become cheaper and larger, and building memory-based RDF storage systems has become a research focus in recent years. Memory offers very fast access, so data can be operated on in real time and disk I/O overhead is avoided; a well-designed in-memory RDF storage structure can further improve the efficiency of query and analysis. However, this approach is not suitable for storing large-scale RDF data, and current schemes such as BRAHMS and BitMat do not support direct SPARQL queries. Memory-based RDF storage structures are thus still being studied and improved.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art described above and to provide, for RDF educational resources, an RDF data storage method that compares data quickly and reduces storage space.
The invention also provides an RDF data query method that matches the above storage method and supports fast queries, thereby improving the retrieval efficiency of RDF educational resources.
To achieve these objects, the technical solution adopted by the invention is as follows.
The RDF data storage method of the invention consists of the following steps:
(1) Design the entity-oriented storage structure for RDF data
(1.1) Using an entity-oriented approach, the data are stored in a relational database table of n rows and k columns, where k is the average number of predicates over all subjects in the RDF data and n is the sum of the row counts (line) required by all subjects. When the number of predicates of a single subject satisfies sum ≤ k, the required row count is line = 1; when sum > k, the subject is stored across multiple rows and the required row count is line = (sum/k) + 1;
(1.2) Once k is determined, each predicate is converted to a column subscript by the predicate mapping algorithm, giving a table structure of n rows and k columns;
The predicate of step (1.2) is converted to a column subscript as follows:
(1.2.1) The column subscript is computed with the predicate mapping algorithm, whose formula is:
$h_1 \oplus h_2 \oplus \cdots \oplus h_j(\mathrm{uri}) = i, \quad i \in [0, k]$
where $h_1, h_2, \ldots, h_j$ are the j hash functions and i is the column subscript;
(1.2.2) If no free subscript has been found after all j hash functions have been computed, a new row is opened and the data are stored at the subscript computed by $h_1$.
(2) Design the storage mapping for RDF data
A hash algorithm converts the URIs and literals of the RDF data into 64-bit binary data: for a URI the high 64 bits of the hash value are taken, and for a literal the low 64 bits. The converted binary data are stored in a hash index table whose rows are sorted in ascending order, so that lookups can map and convert quickly using binary search;
(3) RDF data storage
After the RDF data have been mapped and converted by the method of step (2), they are stored a first time in the table structure of step (1). The data stored in the table structure are then analyzed to create an analysis table S, which records the number of triples contained in each Subject and Object, together with the 20 most frequent URIs and the 20 most frequent literals and their frequencies. Then, following the table structure of step (1) and taking Object as the storage entity, the data stored in the table structure are mapped and converted by step (2) and stored a second time, which completes the storage of the RDF data.
An RDF data query method matched with the above RDF data storage method consists of the following steps:
(a.1) Extraction and conversion of variables
The triple basic graph patterns in the SPARQL query statement are decomposed and the number of variables in the query statement is determined as count; the URIs and literals in the query statement are each converted into 64-bit binary data using the mapping of step (2) of the storage method, and the variables are assigned the values -1 to -count;
(a.2) Conversion of basic graph patterns
According to the decomposition of the triple basic graph patterns in step (a.1), each basic graph pattern is converted into a triple query node structure, where the triple query node structure is:
Triple query node structure
{
Id of the node;
Id of the subject;
Id of the predicate;
Id of the object;
Storage-mode flag;
}
The storage-mode flag selects the first storage or the second storage of step (3) of the RDF data storage method;
For URIs and literals, the Ids of the subject, predicate and object are the respective 64-bit binary data; for variables, the Ids of the subject, predicate and object are the assigned values;
(a.3) Representation of query join operations
The triples decomposed from the basic graph patterns in step (a.1) are compared with one another. For triples that share a variable, a join relationship is established using the node Id of the structure in step (a.2) as the unique identifier, and the join relationship is converted into a join-operation edge structure, where the join-operation edge structure is:
Join-operation edge structure
{
Id of the node of the starting triple,
Id of the node of the ending triple,
Id of the shared variable
}
(a.4) Compute the query cost of each query
According to the triple query node structures obtained in step (a.2), cost analysis is performed with the cost algorithm on each join-operation edge structure obtained in step (a.3), giving a cost value c for each join-operation edge structure. The formula of the cost algorithm is:
$\mathrm{TMC}(t, m, S) \rightarrow c$
where t is the triple to be queried; m is the first storage or the second storage of step (3) of the RDF data storage method; and S is the analysis table;
(a.5) Generation of the query plan
The cost values c of all join-operation edge structures obtained in step (a.4) are sorted in ascending order, giving a node sequence ordered by cost. The node with the smallest c in the sequence is chosen as the start node, and the following nodes in the sequence are chosen in turn; if the variables in a node have not yet been queried, a join query is performed. This continues until the variables in all nodes have been queried, which completes the query of the statement.
Step (a.5) may be followed by step (a.6), which sets up a caching mechanism: the query statement entered by the user is hashed over the set of triple query node structures obtained in step (a.2), giving a hash result value. If this value exists in the cache list, the cached result is fetched directly and returned to the user; otherwise steps (a.3) to (a.5) are repeated, the result is stored on disk, and the corresponding address identifier and the hash result value are stored in the cache list.
The RDF data storage method and query method of the invention optimize the storage structure of the data and, for this structure, optimize SPARQL queries, providing fast retrieval and querying of RDF-based educational resources. Compared with the prior art, the invention has the following advantages:
(1) 64-bit binary data replace the original stored URIs and literals, which greatly increases the speed of data comparison and reduces storage space. For URIs and literals, the high 64 bits and low 64 bits of the hash value are taken respectively, so that a URI and a literal with the same character string can be distinguished. The stored records of the hash index are sorted, so that a lookup can locate the required record quickly by binary search.
(2) The storage structure for RDF data is entity-oriented (entry-oriented) and stores the data in two ways at once, with the subject (Subject) as the entity and with the object (Object) as the entity. The former enables efficient lookup of predicates (Predicate) from a subject (Subject), avoiding the large number of join operations that conventional storage requires at query time; the latter enables efficient lookup from a predicate (Predicate) to a subject (Subject).
(3) The SPARQL query statement is parsed and converted; for the multiple query triples in the SPARQL statement, the cost of each single query is estimated from an analysis of the whole data set and the join relationships among the queries, and a least-cost query flow is finally generated. Compared with the conventional approach of directly converting SPARQL into SQL, query efficiency is greatly improved.
(4) A caching mechanism is added to the query process: frequently queried data sets are cached. The cache list is kept in memory, and each row of the cache list contains a hash result value and an address identifier, which improves query efficiency.
(5) The data storage model and query optimization plan proposed by the invention can be extended to fields such as Web data management and Web semantic retrieval, and to the storage and retrieval of other RDF resource data.
Brief description of the drawings
Fig. 1 is a schematic diagram of the parsing and conversion of SPARQL in step (a.2) of the embodiment.
Fig. 2 illustrates the query tree generated from SPARQL in step (a.3) of the embodiment.
Fig. 3 is a schematic diagram of the cache model of step (a.6) of the embodiment.
Detailed description
The invention is further described below with reference to the drawings and an embodiment.
In this embodiment, the RDF data storage method is realized by the following steps:
(1) Design the entity-oriented storage structure for RDF data
The storage structure for RDF data is entity-oriented (entry-oriented): the data are stored in a relational database table of n rows and k columns, where k is the average number of predicates over all subjects in the RDF data and n is the sum of the row counts (line) required by all subjects.
(1.1) Determine the column count k and the required row count n of the table structure
When the number of predicates (Predicate) of a single subject (Subject) satisfies sum ≤ k, the required row count is line = 1; when sum > k, multiple row tuples are needed and the required row count is line = (sum/k) + 1;
For example, given the following data:
(CharlesFlint,born,1850)
(CharlesFlint,died,1934)
(CharlesFlint,founder,IBM)
(LarryPage,born,1973)
(LarryPage,founder,Google)
(LarryPage,board,Google)
(LarryPage,home,PaloAlto)
(Android,developer,Google)
(Android,version,4.1)
(Android,kernel,Linux)
(Android,preceded,4.0)
(Android,graphics,OpenGL)
The storage form is shown in Table 1:
Table 1. Storage table with Object as the entity
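The sizing rule of step (1.1) can be checked against the example data with a short Python sketch. The function names `rows_needed` and `table_shape` are hypothetical, and two points are assumptions not stated in the text: the average k is rounded down, and sum/k is read as integer division.

```python
from collections import defaultdict

# The twelve example triples from the text: (subject, predicate, object).
TRIPLES = [
    ("CharlesFlint", "born", "1850"), ("CharlesFlint", "died", "1934"),
    ("CharlesFlint", "founder", "IBM"),
    ("LarryPage", "born", "1973"), ("LarryPage", "founder", "Google"),
    ("LarryPage", "board", "Google"), ("LarryPage", "home", "PaloAlto"),
    ("Android", "developer", "Google"), ("Android", "version", "4.1"),
    ("Android", "kernel", "Linux"), ("Android", "preceded", "4.0"),
    ("Android", "graphics", "OpenGL"),
]

def rows_needed(pred_count: int, k: int) -> int:
    """Rows for one subject: 1 if sum <= k, else (sum / k) + 1.
    Integer division is an assumption about the formula."""
    return 1 if pred_count <= k else pred_count // k + 1

def table_shape(triples):
    """Return (k, n, predicates-per-subject) for a list of triples."""
    per_subject = defaultdict(int)
    for subject, _pred, _obj in triples:
        per_subject[subject] += 1
    # k: average predicates per subject (rounding down is an assumption).
    k = sum(per_subject.values()) // len(per_subject)
    n = sum(rows_needed(c, k) for c in per_subject.values())
    return k, n, dict(per_subject)

k, n, per_subject = table_shape(TRIPLES)
print(k, n, per_subject)
```

Under these assumptions the three subjects carry 3, 4 and 5 predicates, so k = 4; only Android exceeds k and needs a second row, giving n = 4.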
(1.2) Determine the subscript i at which each predicate (Predicate) is stored
Once k is determined, each predicate is converted to a column subscript by the predicate mapping algorithm. When several predicates of the same subject map to the same subscript, this is called a conflict; multiple hash functions are defined so that as many columns as possible are used and conflicts are avoided. If a conflict remains after all the hash functions have been computed, one more row tuple is added for this Subject. The predicate mapping function is:
$h_1 \oplus h_2 \oplus \cdots \oplus h_j(\mathrm{uri}) = i, \quad i \in [0, k]$
where $h_1, h_2, \ldots, h_j$ are the j hash functions and i is the column subscript.
If no free subscript has been found after all j hash functions have been computed, a new row is opened and the data are stored at the subscript computed by $h_1$.
With reference to Table 1, consider the triples whose Subject is Android. Assume these triples are inserted into the database one by one and j is set to 2, so that there are two hash functions h1 and h2. The process of computing the subscripts of pred is shown in Table 2:
Table 2. Process of computing predicate subscripts
developer: h1 yields subscript 1; subscript 1 is empty, so the predicate is placed there directly.
version: likewise placed at subscript 2.
kernel: h1 yields subscript 1, which is no longer free, so a conflict occurs; h2 is then computed and yields subscript 3, where the predicate is placed.
preceded: h1 yields subscript k, where the predicate is placed.
graphics: the subscripts 3 and 2 obtained from h1 and h2 both conflict, so a new row is created and the predicate is placed in pred3.
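The Android walk-through can be replayed with a small Python sketch. The text does not give the actual hash functions, so `H1` and `H2` below are lookup tables that simply return the subscripts quoted in the example; k = 4 and the function name `place_predicates` are also assumptions. The sketch reads the ⊕ in the formula as "try the hash functions in order until a free column is found", and probes only the most recently opened row, which is one possible interpretation.

```python
K = 4  # assumed column count for the example

# Stand-ins for h1 and h2: they return the subscripts quoted in the text.
H1 = {"developer": 1, "version": 2, "kernel": 1, "preceded": K, "graphics": 3}
H2 = {"kernel": 3, "graphics": 2}

def place_predicates(predicates, hash_fns):
    """Assign each predicate of one subject to a (row, column) slot."""
    rows = [{}]          # each row maps column subscript -> predicate
    placement = {}
    for pred in predicates:
        current = rows[-1]
        for h in hash_fns:
            col = h(pred)
            if col not in current:          # free column found
                current[col] = pred
                placement[pred] = (len(rows) - 1, col)
                break
        else:
            # Every hash function collided: open a new row and store the
            # predicate at the subscript computed by the first function.
            col = hash_fns[0](pred)
            rows.append({col: pred})
            placement[pred] = (len(rows) - 1, col)
    return placement

order = ["developer", "version", "kernel", "preceded", "graphics"]
placement = place_predicates(order, [H1.__getitem__, H2.__getitem__])
print(placement)
```

The resulting placement puts developer, version, kernel and preceded in the first row at subscripts 1, 2, 3 and k, and graphics in a second row at subscript 3, matching the steps listed for Table 2.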
(2) Design the storage mapping for RDF data
The terms in RDF triples are usually of two kinds: URIs and literals.
A hash algorithm converts URIs and literals into 64-bit binary data: for a URI the high 64 bits of the hash value are taken, and for a literal the low 64 bits, so that a URI and a literal with the same character string can be distinguished. The converted binary data are stored in a hash index table whose rows are sorted in ascending order, so that lookups can map and convert quickly using binary search;
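A minimal Python sketch of this mapping follows. The text does not name the hash algorithm used in this step, so MD5 (a 128-bit digest) is an assumption chosen only because it splits cleanly into two 64-bit halves; `term_id`, `build_index` and `lookup` are hypothetical names.

```python
import hashlib
from bisect import bisect_left

def term_id(term: str, is_literal: bool) -> int:
    """64-bit id: high 64 bits of the digest for a URI, low 64 for a literal.
    MD5 is an assumed stand-in for the unspecified hash algorithm."""
    digest = hashlib.md5(term.encode("utf-8")).digest()  # 16 bytes
    half = digest[8:] if is_literal else digest[:8]
    return int.from_bytes(half, "big")

def build_index(terms):
    """terms: iterable of (text, is_literal). Returns (id, text) rows
    sorted in ascending id order, as the hash index table requires."""
    return sorted({(term_id(text, lit), text) for text, lit in terms})

def lookup(index, ident):
    """Binary search for an id; returns the original text or None."""
    pos = bisect_left(index, (ident, ""))
    if pos < len(index) and index[pos][0] == ident:
        return index[pos][1]
    return None

index = build_index([("Google", False), ("Google", True), ("IBM", True)])
print(lookup(index, term_id("IBM", True)))
```

Because the same string yields different halves of the digest depending on its kind, the URI `Google` and the literal `Google` receive distinct ids in the index.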
(3) RDF data storage
After the RDF data have been mapped and converted by the method of step (2), they are stored a first time in the table structure of step (1). The data stored in the table structure are then analyzed to create an analysis table S, which records the number of triples contained in each Subject and Object, together with the 20 most frequent URIs and the 20 most frequent literals and their frequencies. Then, following the table structure of step (1) and taking Object as the storage entity, the data stored in the table structure are mapped and converted by step (2) and stored a second time, which completes the storage of the RDF data.
For the data in Table 1, the storage form is shown in Table 3:
Table 3. The data of Table 1 stored with Object as the entity
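The two storage passes and analysis table S can be sketched in Python as follows. This is an in-memory illustration only: the relational table layout and the 64-bit encoding are omitted, the `is_literal` callback is a hypothetical way to tell literals from URIs, and frequencies are counted over subject and object positions, which is an assumption.

```python
from collections import Counter, defaultdict

def store_twice(triples):
    """First storage keyed by subject, second storage keyed by object."""
    by_subject = defaultdict(list)   # subject -> [(predicate, object)]
    by_object = defaultdict(list)    # object  -> [(predicate, subject)]
    for s, p, o in triples:
        by_subject[s].append((p, o))
        by_object[o].append((p, s))
    return by_subject, by_object

def build_stats(triples, is_literal, top_n=20):
    """Analysis table S: triple counts per Subject and per Object, plus the
    top_n most frequent URIs and literals with their frequencies."""
    subject_counts = Counter(s for s, _p, _o in triples)
    object_counts = Counter(o for _s, _p, o in triples)
    terms = Counter(t for s, _p, o in triples for t in (s, o))
    uris = Counter({t: c for t, c in terms.items() if not is_literal(t)})
    literals = Counter({t: c for t, c in terms.items() if is_literal(t)})
    return {
        "subject_counts": subject_counts,
        "object_counts": object_counts,
        "top_uris": uris.most_common(top_n),
        "top_literals": literals.most_common(top_n),
    }

sample = [
    ("LarryPage", "founder", "Google"), ("LarryPage", "board", "Google"),
    ("Android", "developer", "Google"), ("Android", "version", "4.1"),
]
by_subject, by_object = store_twice(sample)
# Toy rule for the demo: terms starting with a digit are treated as literals.
stats = build_stats(sample, lambda term: term[0].isdigit())
print(stats["object_counts"]["Google"], stats["top_uris"][0])
```

Keeping both `by_subject` and `by_object` is what later lets a query node choose between access-by-Subject and access-by-Object.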
An efficient, fast query method for RDF data stored by the above method is realized by the following steps:
Take as an example a SPARQL statement containing six triple basic graph patterns (Basic Graph Pattern, BGP). The SPARQL query statement must first be converted so that the underlying stored data can be operated on conveniently; after conversion, the query cost of each triple is estimated, and finally a least-cost execution flow is formed. The specific steps are:
(a.1) Extraction and conversion of variables
The triple basic graph patterns (Basic Graph Pattern, BGP) of the SPARQL query statement are decomposed and the number of variables in the query statement is determined as count; the URIs and literals in the query statement are converted into 64-bit binary data using the mapping and conversion of step (2) of the above RDF data storage method, and the variables contained in the query statement are assigned the values -1 to -count;
For example, the following query:
SELECT ?x ?y WHERE {
?x home "PaloAlto" . //q1
?y founder "IBM" . //q2
?z founder "Google" . //q3
?x memberOf ?z . //q4
?z revenue ?y . //q5
?x developer ?y . //q6
}
Parsing the above query statement yields three variables ?x, ?y and ?z, which are encoded with the ids -1, -2 and -3; all other URIs and literals are looked up directly in the index table of step (2).
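The variable numbering of step (a.1) can be illustrated with a short Python sketch. The six patterns are written as plain tuples rather than parsed from SPARQL text, and the name `encode_variables` is hypothetical; numbering variables in order of first appearance is an assumption that happens to reproduce the ids given in the text.

```python
# The six basic graph patterns q1..q6 of the example query.
BGP = [
    ("?x", "home", "PaloAlto"),
    ("?y", "founder", "IBM"),
    ("?z", "founder", "Google"),
    ("?x", "memberOf", "?z"),
    ("?z", "revenue", "?y"),
    ("?x", "developer", "?y"),
]

def encode_variables(patterns):
    """Assign -1, -2, ..., -count to variables in order of first appearance."""
    var_ids = {}
    for triple in patterns:
        for term in triple:
            if term.startswith("?") and term not in var_ids:
                var_ids[term] = -(len(var_ids) + 1)
    return var_ids

var_ids = encode_variables(BGP)
count = len(var_ids)
print(count, var_ids)
```

Negative ids keep variables disjoint from the non-negative 64-bit ids of URIs and literals, so a single integer field can hold either kind of term.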
(a.2) Conversion of basic graph patterns
Referring to Fig. 1, according to the decomposition of the triple basic graph patterns (Basic Graph Pattern, BGP) in step (a.1), each basic graph pattern is converted into a triple query node structure, where the triple query node structure is:
Triple query node structure
{
Id of the node;
Id of the subject;
Id of the predicate;
Id of the object;
Storage-mode flag;
}
For URIs and literals, the Ids of the subject, predicate and object are the respective 64-bit binary data; for variables, the Ids of the subject, predicate and object are the assigned values;
The storage-mode flag selects either the first storage (access-by-Subject) or the second storage (access-by-Object) of step (3) of the above RDF data storage method. The first storage enables efficient lookup of predicates (Predicate) from a subject (Subject), avoiding the large number of join operations that conventional storage requires at query time; when the subject is unknown, the second storage mode can be chosen for the query.
Before a single-triple query is executed, the number of variables and constants in each triple, and the associations between the variables and constants of the triples, must first be determined; the query order can be decided from these relationships.
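One way to express the triple query node structure in Python is a small dataclass, shown below as a sketch. The names `QueryNode`, `Access` and `make_node` are hypothetical, the `term_id` argument stands in for the index lookup of step (2), and the rule for picking the storage mode (object access only when the subject is a variable and the object is a constant) is an assumption derived from the remark that the second storage is used when the subject is unknown.

```python
from dataclasses import dataclass
from enum import Enum

class Access(Enum):
    BY_SUBJECT = 1   # first storage
    BY_OBJECT = 2    # second storage

@dataclass(frozen=True)
class QueryNode:
    node_id: int
    subject_id: int
    predicate_id: int
    object_id: int
    access: Access

def make_node(node_id, triple, var_ids, term_id):
    """Build a node: variables take their negative id, constants their term id."""
    ids = [var_ids[t] if t.startswith("?") else term_id(t) for t in triple]
    subject_unknown = ids[0] < 0
    object_known = ids[2] >= 0
    access = Access.BY_OBJECT if subject_unknown and object_known else Access.BY_SUBJECT
    return QueryNode(node_id, ids[0], ids[1], ids[2], access)

# Toy term ids for the demo; a real system would use the 64-bit hash ids.
toy_ids = {"home": 10, "PaloAlto": 11, "memberOf": 12}
var_ids = {"?x": -1, "?z": -3}
q1 = make_node(1, ("?x", "home", "PaloAlto"), var_ids, toy_ids.__getitem__)
q4 = make_node(4, ("?x", "memberOf", "?z"), var_ids, toy_ids.__getitem__)
print(q1)
print(q4)
```

Counting the negative fields of a node gives the number of variables in that triple, which is the information the text says is needed before deciding the query order.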
(a.3) Representation of query join operations
All triples decomposed from the basic graph patterns in step (a.1) are compared with one another. For triples that share a variable, a join relationship is established using the node Id of the structure in step (a.2) as the unique identifier, and the join relationship is converted into a join-operation edge structure, where the join-operation edge structure is:
Join-operation edge structure
{
Id of the node of the starting triple,
Id of the node of the ending triple,
Id of the shared variable
}
This finally forms the join-operation structure shown in Fig. 2.
Through the conversion and processing above, the query statement has its variables encoded and collected, its basic graph patterns represented as triples, and its query joins represented as edges.
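The pairwise comparison of step (a.3) can be sketched in Python as below. The constant ids (100 and up) are made-up placeholders for the 64-bit term ids, the function name `join_edges` is hypothetical, and emitting one edge per shared variable for each pair of nodes is an assumption about how multiple shared variables are handled.

```python
from itertools import combinations

# Encoded q1..q6: variables are -1 (?x), -2 (?y), -3 (?z); constants are
# placeholder positive ids.
NODES = {
    1: (-1, 100, 200),
    2: (-2, 101, 201),
    3: (-3, 101, 202),
    4: (-1, 102, -3),
    5: (-3, 103, -2),
    6: (-1, 104, -2),
}

def variables(triple):
    """Variable ids (negative values) occurring in an encoded triple."""
    return {term for term in triple if term < 0}

def join_edges(nodes):
    """Return (start node id, end node id, shared variable id) tuples."""
    edges = []
    for (a, triple_a), (b, triple_b) in combinations(sorted(nodes.items()), 2):
        for var in sorted(variables(triple_a) & variables(triple_b), reverse=True):
            edges.append((a, b, var))
    return edges

edges = join_edges(NODES)
print(edges)
```

For the example query this produces nine edges, including (1, 4, -1) for the join of q1 and q4 on ?x and (4, 5, -3) for the join of q4 and q5 on ?z.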
(a.4) Compute the query cost of each query
According to the triple query node structures obtained in step (a.2), cost analysis is performed with the cost algorithm on each join-operation edge structure obtained in step (a.3), giving a cost value c for each join-operation edge structure. The formula of the cost algorithm is:
$\mathrm{TMC}(t, m, S) \rightarrow c$
where t is the triple to be queried; m is the first storage or the second storage of step (3) of the RDF data storage method; and S is the analysis table;
For example:
(?x founder Google)
access-by-Object is used for this triple, and the TMC function returns the number of triples contained in the Object as recorded in analysis table S.
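The text gives only the signature of TMC and one example, so the following Python sketch is an interpretation rather than the patented cost algorithm. It assumes that the cost is the triple count recorded in S for the known subject or object, and that the total triple count is used as a fallback when the lookup key is a variable; the fallback, the dictionary layout of `stats` and the mode strings are all assumptions.

```python
def tmc(triple, mode, stats):
    """Sketch of TMC(t, m, S) -> c using counts from analysis table S."""
    subject, _predicate, obj = triple
    if mode == "access-by-object":
        key, counts = obj, stats["object_counts"]
    elif mode == "access-by-subject":
        key, counts = subject, stats["subject_counts"]
    else:
        raise ValueError(f"unknown storage mode: {mode}")
    if key.startswith("?"):
        # Assumed fallback: an unbound key may match any stored triple.
        return stats["total"]
    return counts.get(key, 0)

# Counts taken from the twelve example triples listed earlier.
stats = {
    "subject_counts": {"CharlesFlint": 3, "LarryPage": 4, "Android": 5},
    "object_counts": {"Google": 3, "IBM": 1, "PaloAlto": 1},
    "total": 12,
}
print(tmc(("?x", "founder", "Google"), "access-by-object", stats))
```

With the example data, Google appears as the object of three triples, so the sketch assigns the pattern (?x founder Google) a cost of 3 under access-by-Object.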
(a.5) Generation of the query plan
The cost values c of all join-operation edge structures obtained in step (a.4) are sorted in ascending order, giving a node sequence ordered by cost. The node with the smallest c in the sequence is chosen as the start node, and the following nodes in the sequence are chosen in turn; if the variables in a node have not yet been queried, a join query is performed. This continues until the variables in all nodes have been queried, which completes the query of the statement.
Referring to Fig. 2, the query plan first takes the first triple query node as the starting point and then selects the fourth query node in the plan structure; according to the information provided by the plan, a join is performed on variable ?x, giving an intermediate result set over the two variables <?x ?z>. This intermediate result set is then joined with the fifth triple query node on variable ?z, giving an intermediate table over the three variables <?z ?x ?y>. Proceeding in the same way until all query statements have been executed yields the intermediate table <?z ?x ?y>. Finally, a SELECT operation is applied to the query result to extract the values of variables ?x and ?y.
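A greedy ordering consistent with the walk-through above can be sketched in Python. The cost values are invented for illustration (the text does not give them), the function name `plan` is hypothetical, and the selection rule (take the cheapest remaining node that shares a variable with those already bound, falling back to the cheapest remaining node) is one possible reading of step (a.5).

```python
# Encoded q1..q6 (variables: -1 ?x, -2 ?y, -3 ?z) and made-up cost values.
NODES = {
    1: (-1, 100, 200), 2: (-2, 101, 201), 3: (-3, 101, 202),
    4: (-1, 102, -3), 5: (-3, 103, -2), 6: (-1, 104, -2),
}
COST = {1: 1, 4: 2, 5: 3, 2: 4, 3: 5, 6: 6}

def plan(nodes, cost):
    """Return [(node id, join variable or None)] in execution order."""
    def variables(node_id):
        return {term for term in nodes[node_id] if term < 0}

    remaining = sorted(nodes, key=lambda node_id: (cost[node_id], node_id))
    first = remaining.pop(0)                 # cheapest node starts the plan
    order, bound = [(first, None)], set(variables(first))
    while remaining:
        # Prefer the cheapest node that joins on an already bound variable.
        nxt = next((n for n in remaining if variables(n) & bound), remaining[0])
        remaining.remove(nxt)
        shared = variables(nxt) & bound
        order.append((nxt, max(shared) if shared else None))
        bound |= variables(nxt)
    return order

print(plan(NODES, COST))
```

With these illustrative costs the plan starts at q1, joins q4 on ?x (-1) and then q5 on ?z (-3), which mirrors the order described for Fig. 2.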
(a.6) Set up the caching mechanism
During data querying, a caching mechanism caches query results (see Fig. 3) and thereby improves query efficiency. The specific operations are:
The query statement entered by the user is hashed over the set of triple query node structures obtained in step (a.2), giving a hash result value. If this value exists in the cache list, the cached result is fetched directly and returned to the user; otherwise steps (a.3) to (a.5) above are repeated, the result is stored on disk, and the corresponding address identifier and the hash result value are stored in the cache list. When the cache size exceeds the preset limit, the entries with the lowest query frequency are deleted.
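The cache list can be sketched in Python as a small class. This is a minimal illustration under several assumptions: SHA-256 over a sorted representation of the node set is a stand-in for the unspecified hash function, addresses are plain strings rather than real disk locations, eviction removes a single least-frequently-used entry, and the class name `QueryCache` is hypothetical.

```python
import hashlib

class QueryCache:
    """Cache list mapping a hash of the query node set to a result address."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}            # hash value -> [address, query frequency]

    @staticmethod
    def key(nodes):
        """Hash the set of (subject id, predicate id, object id) nodes."""
        canonical = repr(sorted(nodes)).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def get(self, nodes):
        entry = self.entries.get(self.key(nodes))
        if entry is None:
            return None              # miss: caller runs steps (a.3) to (a.5)
        entry[1] += 1
        return entry[0]

    def put(self, nodes, address):
        k = self.key(nodes)
        if k not in self.entries and len(self.entries) >= self.capacity:
            # Evict the entry with the lowest query frequency.
            coldest = min(self.entries, key=lambda h: self.entries[h][1])
            del self.entries[coldest]
        self.entries[k] = [address, 1]

cache = QueryCache(capacity=2)
cache.put([(-1, 10, 11)], "results/a")
cache.put([(-1, 12, 13)], "results/b")
print(cache.get([(-1, 10, 11)]))
```

Sorting the nodes before hashing means two queries that list the same patterns in a different order map to the same cache entry.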

Claims (4)

1. An RDF data storage method, characterized in that it consists of the following steps:
(1) Design the entity-oriented storage structure for RDF data
(1.1) Using an entity-oriented approach, the data are stored in a relational database table of n rows and k columns, where k is the average number of predicates over all subjects in the RDF data and n is the sum of the row counts (line) required by all subjects. When the number of predicates of a single subject satisfies sum ≤ k, the required row count is line = 1; when sum > k, the subject is stored across multiple rows and the required row count is line = (sum/k) + 1;
(1.2) Once k is determined, each predicate is converted to a column subscript by the predicate mapping algorithm, giving a table structure of n rows and k columns;
(2) Design the storage mapping for RDF data
A hash algorithm converts the URIs and literals of the RDF data into 64-bit binary data: for a URI the high 64 bits of the hash value are taken, and for a literal the low 64 bits. The converted binary data are stored in a hash index table whose rows are sorted in ascending order, so that lookups can map and convert quickly using binary search;
(3) RDF data storage
After the RDF data have been mapped and converted by the method of step (2), they are stored a first time in the table structure of step (1). The data stored in the table structure are then analyzed to create an analysis table S, which records the number of triples contained in each Subject and Object, together with the 20 most frequent URIs and the 20 most frequent literals and their frequencies. Then, following the table structure of step (1) and taking Object as the storage entity, the data stored in the table structure are mapped and converted by step (2) and stored a second time, which completes the storage of the RDF data.
2. The RDF data storage method according to claim 1, characterized in that the predicate of step (1.2) is converted to a column subscript as follows:
(1.2.1) The column subscript is computed with the predicate mapping algorithm, whose formula is:
$h_1 \oplus h_2 \oplus \cdots \oplus h_j(\mathrm{uri}) = i, \quad i \in [0, k]$
where $h_1, h_2, \ldots, h_j$ are the j hash functions and i is the column subscript;
(1.2.2) If no free subscript has been found after all j hash functions have been computed, a new row is opened and the data are stored at the subscript computed by $h_1$.
3. An RDF data query method matched to the RDF data storage method according to claim 1, characterized in that it consists of the following steps:
(a.1) Extraction and conversion of variables
The triple basic graph patterns in the SPARQL query statement are decomposed, and the number of variables in the query statement is determined as count. The URIs and literals in the query statement are each converted into 64-bit binary data using the mapping method of step (2) of the storage method, and the variables are assigned the values -1 to -count;
(a.2) Conversion of basic query graph patterns
According to the decomposition result of the triple basic graph patterns in step (a.1), each basic graph pattern is converted into a triple query node structure, where the triple query node structure is:
Triple query node structure
{
Id of the node;
Id of the subject;
Id of the predicate;
Id of the object;
Flag of the storage mode;
}
The storage mode flag selects either the first storage or the second storage of step (3) of the RDF data storage method;
For URIs and literals, the Ids of the subject, predicate and object are the respective 64-bit binary data; for variables, the Ids of the subject, predicate and object are the assigned values;
(a.3) Representation of query join operations
The triples decomposed from the basic graph patterns in step (a.1) are compared with one another. For triples that share a variable, a join relationship is established using the node Id of the structure in step (a.2) as the unique identifier, and the join relationship is converted into a join operation edge structure, where the join operation edge structure is:
Join operation edge structure
{
Id of the node of the start triple,
Id of the node of the end triple,
Id of the common variable
};
(a.4) Calculation of the query cost of each query
Based on the triple query node structures obtained in step (a.2), a cost analysis is performed with the cost algorithm on each join operation edge structure obtained in step (a.3), giving a cost value c for the join operation edge structure. The formula of the cost algorithm is:
$$TMC(t, m, S) \rightarrow c$$
where t is the triple to be queried; m is the first storage or the second storage of step (3) of the RDF data storage method; and S is the analysis table;
(a.5) Generation of the query plan
The cost values c of all join operation edge structures obtained in step (a.4) are sorted in ascending order, giving a node sequence ordered by cost value. The node with the smallest c value in the sequence is chosen as the start node, and the following nodes in the sequence are chosen in turn; if the variables in a node have not yet been queried, a join query is performed, until the variables in all nodes have been queried, which completes the query of the statement.
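Steps (a.1) to (a.5) can be illustrated with the following sketch. It is only one plausible reading of the claim. The class and function names are hypothetical, the rule used to pick the storage mode is an assumption, and the cost function is passed in by the caller because the claim gives only the signature of TMC(t, m, S), not its body.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class QueryNode:
    node_id: int
    subject: int    # 64-bit id for a URI or literal, negative id for a variable
    predicate: int
    obj: int
    mode: int       # 1 = first storage, 2 = second storage

    def variables(self):
        return {x for x in (self.subject, self.predicate, self.obj) if x < 0}


@dataclass(frozen=True)
class JoinEdge:
    start: int      # node id of the start triple
    end: int        # node id of the end triple
    variable: int   # id of the common variable


def encode_patterns(patterns, term_id):
    """Steps (a.1)-(a.2): variables get the ids -1..-count, other terms go through term_id."""
    var_ids, nodes = {}, []
    for node_id, pattern in enumerate(patterns):
        ids = []
        for term in pattern:
            if term.startswith("?"):
                if term not in var_ids:
                    var_ids[term] = -(len(var_ids) + 1)
                ids.append(var_ids[term])
            else:
                ids.append(term_id(term))
        s, p, o = ids
        # Assumed rule: use the Object-keyed (second) storage when the
        # subject is a variable and the object is a constant.
        mode = 2 if (s < 0 <= o) else 1
        nodes.append(QueryNode(node_id, s, p, o, mode))
    return nodes, var_ids


def build_edges(nodes):
    """Step (a.3): one join edge per pair of nodes and shared variable."""
    edges = []
    for a, b in combinations(nodes, 2):
        for var in sorted(a.variables() & b.variables(), reverse=True):
            edges.append(JoinEdge(a.node_id, b.node_id, var))
    return edges


def plan(nodes, edges, cost):
    """Steps (a.4)-(a.5): visit nodes in the order given by ascending edge cost."""
    order = []
    for edge in sorted(edges, key=cost):
        for node_id in (edge.start, edge.end):
            if node_id not in order:
                order.append(node_id)
    # Nodes that share no variable with any other node go last.
    order.extend(n.node_id for n in nodes if n.node_id not in order)
    return order
```

For the three patterns `?x knows ?y`, `?y name ?n` and `?x name "Bob"`, the variables receive the ids -1, -2 and -3, and two join edges result: one on `?y` between the first two patterns and one on `?x` between the first and third.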
4. The RDF data query method according to claim 3, characterized in that step (a.5) is followed by a step (a.6) of establishing a caching mechanism, specifically:
A hash operation is applied to the query statement entered by the user, based on the set of triple query node structures obtained in step (a.2), to obtain the result value of the hash function. If this value exists in the cache list, the cached result is taken out directly and returned to the user; otherwise, steps (a.3) to (a.5) are carried out, the result is stored on the hard disk, and the corresponding address identifier and the result value of the hash function are stored in the cache list.
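The caching step (a.6) might look like the sketch below. Everything concrete here is an assumption made for illustration: the `QueryCache` class, the use of SHA-256 over a canonical JSON form of the node set, JSON files as the on-disk format, and a file path as the "address identifier". The `evaluate` callback stands in for steps (a.3) to (a.5).

```python
import hashlib
import json
import os


class QueryCache:
    """Cache list mapping a hash of the query node set to a result file on disk."""

    def __init__(self, directory):
        self.directory = directory
        self.cache_list = {}  # hash value -> path of the stored result

    @staticmethod
    def key(nodes):
        """nodes: iterable of (subject, predicate, object, mode) integer tuples."""
        # Sorting makes the key independent of the order of the triple patterns.
        canonical = json.dumps(sorted(nodes))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_compute(self, nodes, evaluate):
        """Return the cached result, or run evaluate(nodes) and cache its result."""
        k = self.key(nodes)
        path = self.cache_list.get(k)
        if path is not None:
            with open(path, encoding="utf-8") as f:
                return json.load(f)
        result = evaluate(nodes)
        path = os.path.join(self.directory, k + ".json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(result, f)
        self.cache_list[k] = path
        return result
```

Because variable ids depend on the order in which variables first appear, two textually different but equivalent queries may still hash to different keys in this sketch.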
CN201510955821.5A 2015-12-18 2015-12-18 Data storage method and query method for RDF Active CN105630881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510955821.5A CN105630881B (en) 2015-12-18 2015-12-18 Data storage method and query method for RDF

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510955821.5A CN105630881B (en) 2015-12-18 2015-12-18 Data storage method and query method for RDF

Publications (2)

Publication Number Publication Date
CN105630881A (en) 2016-06-01
CN105630881B (en) 2019-04-09

Family

ID=56045814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510955821.5A Active CN105630881B (en) 2015-12-18 2015-12-18 Data storage method and query method for RDF

Country Status (1)

Country Link
CN (1) CN105630881B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050033768A1 (en) * 2003-08-08 2005-02-10 Sayers Craig P. Method and apparatus for identifying an object using an object description language
US20120117081A1 (en) * 2008-08-08 2012-05-10 Oracle International Corporation Representing and manipulating rdf data in a relational database management system
CN102521299A (en) * 2011-11-30 2012-06-27 华中科技大学 Method for processing data of resource description framework
CN103970820A (en) * 2014-01-23 2014-08-06 河海大学 Method and device for visualization of Web multimedia resource open annotation data
CN104462609A (en) * 2015-01-06 2015-03-25 福州大学 RDF data storage and query method combined with star figure coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
袁柳等: ""一种基于聚类模式的RDF数据聚类方法"", 《计算机科学》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110168533B (en) * 2016-12-15 2023-08-08 微软技术许可有限责任公司 Caching of sub-graphs and integrating cached sub-graphs into graph query results
CN110168533A (en) * 2016-12-15 2019-08-23 微软技术许可有限责任公司 Caching of sub-graphs and integrating cached sub-graphs into graph query results
US11748506B2 (en) 2017-02-27 2023-09-05 Microsoft Technology Licensing, Llc Access controlled graph query spanning
CN107066573A (en) * 2017-04-10 2017-08-18 北京工商大学 Data association access method based on three-dimensional table structure and application
CN107066573B (en) * 2017-04-10 2020-04-17 北京工商大学 Data association access method based on three-dimensional table structure and application
CN107229704A (en) * 2017-05-25 2017-10-03 深圳大学 A kind of resource description framework querying method and system based on KSP algorithms
CN108268580A (en) * 2017-07-14 2018-07-10 广东神马搜索科技有限公司 The answering method and device of knowledge based collection of illustrative plates
CN107480199A (en) * 2017-07-17 2017-12-15 深圳先进技术研究院 Query reconstruction method, device, equipment and storage medium of database
CN107480199B (en) * 2017-07-17 2020-06-12 深圳先进技术研究院 Query reconstruction method, device, equipment and storage medium of database
CN110019911A (en) * 2017-12-29 2019-07-16 苏州工业职业技术学院 Support the querying method and device of the knowledge mapping of Knowledge Evolvement
US11755569B2 (en) * 2018-01-18 2023-09-12 Universite Jean Monnet Saint Etienne Method for processing a question in natural language
CN109446358A (en) * 2018-08-27 2019-03-08 电子科技大学 A kind of chart database accelerator and method based on ID caching technology
CN109656946A (en) * 2018-09-29 2019-04-19 阿里巴巴集团控股有限公司 A kind of multilist relation query method, device and equipment
CN112287043B (en) * 2020-12-29 2021-06-18 成都数联铭品科技有限公司 Automatic graph code generation method and system based on domain knowledge and electronic equipment
CN112287043A (en) * 2020-12-29 2021-01-29 成都数联铭品科技有限公司 Automatic graph code generation method and system based on domain knowledge and electronic equipment
CN112732746A (en) * 2021-01-13 2021-04-30 首都师范大学 SPARQL endpoint association-based dynamic connection ordering method
CN114996370A (en) * 2022-08-03 2022-09-02 杰为软件***(深圳)有限公司 Data conversion and migration method from relational database to semantic triple

Also Published As

Publication number Publication date
CN105630881B (en) 2019-04-09

Similar Documents

Publication Publication Date Title
CN105630881A (en) Data storage method and query method for RDF (Resource Description Framework)
Ali et al. A survey of RDF stores & SPARQL engines for querying knowledge graphs
Özsu A survey of RDF data management systems
Tan et al. Enabling query processing across heterogeneous data models: A survey
Graux et al. Sparqlgx: Efficient distributed evaluation of sparql with apache spark
Stuckenschmidt et al. Index structures and algorithms for querying distributed RDF repositories
CN100550019C (en) OODB Object Oriented Data Base access method and system
Bereta et al. Representation and querying of valid time of triples in linked geospatial data
Etcheverry et al. Enhancing OLAP analysis with web cubes
US9141666B2 (en) Incremental maintenance of range-partitioned statistics for query optimization
Stadler et al. Sparklify: A scalable software component for efficient evaluation of sparql queries over distributed rdf datasets
US20060015809A1 (en) Structured-document management apparatus, search apparatus, storage method, search method and program
US11960479B2 (en) Processing iterative query constructs in relational databases
Banane et al. SPARQL2Hive: An approach to processing SPARQL queries on Hive based on meta-models
Gao et al. GLog: A high level graph analysis system using MapReduce
Theocharidis et al. SRX: efficient management of spatial RDF data
JP5844824B2 (en) SPARQL query optimization method
US20140067853A1 (en) Data search method, information system, and recording medium storing data search program
Botoeva et al. Ontology-based data access–Beyond relational sources
CN106445913A (en) MapReduce-based semantic inference method and system
Glake et al. Towards Polyglot Data Stores--Overview and Open Research Questions
Marathe et al. Integrating the Orca Optimizer into MySQL.
US20080301085A1 (en) Dynamic Database File Column Statistics for Arbitrary Union Combination
Awada et al. Cost Estimation Across Heterogeneous SQL-Based Big Data Infrastructures in Teradata IntelliSphere.
CN115391424A (en) Database query processing method, storage medium and computer equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant