CN104778277A - Redis-based distributed storage and query method for RDF (Resource Description Framework) data


Info

Publication number: CN104778277A
Application number: CN201510213313.XA
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 汪璟玢, 董书暕
Applicant and current assignee: Fuzhou University
Legal status: Pending
Keywords: node, predicate, subject, Redis, RDF
Classification: Information Retrieval, DB Structures and FS Structures Therefor

Abstract

The invention relates to a Redis-based method for the distributed storage and querying of RDF (Resource Description Framework) data. The storage method uses a Redis-based distributed RDF storage system that partitions data by the Type-P scheme. The query method judges the complexity of the statement to be queried and selects a query procedure accordingly, so that queries are answered quickly and effectively. Combining the distributed storage system with the optimized query method narrows the query scope and improves query efficiency. The method also works efficiently when a query has many triple patterns and complex semantics, and so meets the storage and query requirements of large volumes of RDF data.

Description

A Redis-based distributed storage and query method for RDF data
Technical field
The present invention relates to the technical field of RDF data storage and querying, and in particular to a Redis-based distributed storage and query method for RDF data.
Background art
At present, academia has developed a number of mature RDF (Resource Description Framework) management systems, such as Jena, Sesame and RDF-3X. Most of these systems store RDF data in a traditional centralized relational database: the data are laid out in the database according to some organizational scheme, a mature relational or object-relational database serves as the storage back end, and SPARQL queries are translated into SQL statements for execution.
With the rapid growth of RDF data, traditional RDF storage management systems based on relational databases can no longer cope with explosively growing, massive RDF data sets, and more and more researchers are turning to the mass storage and parallel computing capabilities of distributed systems to manage them. Distributed RDF storage and query systems usually store data in a distributed file system, either as files or as multiple index tables in a NoSQL database. On the query side they usually process the joins between SPARQL clauses with MapReduce, or implement query processing through the API that the database provides. This has been a research hotspot over the past two years, but the work is still at an early stage and no mature system has yet emerged. Storing RDF data in a traditional relational database system runs into many storage bottlenecks, and the schema-free nature of RDF data makes it hard to apply the query optimization strategies of a relational DBMS. Existing studies show that relational databases are less efficient than distributed databases at storing and querying massive RDF data; that file-system storage gives very low query efficiency on large-scale RDF data; and that memory-based storage, although fast for both storage and querying, is limited by memory capacity and therefore suits only small-scale RDF data.
Summary of the invention
The object of the present invention is to provide a Redis-based distributed storage and query method for RDF data, so as to solve the problem that storage and querying with an existing centralized Redis (Remote Dictionary Server) are limited by memory capacity.
To achieve the above object, the technical solution of the present invention is a Redis-based distributed storage method for RDF data, characterized in that it is implemented in the following steps:
S1: Provide a Redis-based distributed RDF data storage system comprising a management node (Manage Node), together with processing nodes (Process Node) and storage nodes (Storage Node) that work with the management node. The management node (Manage Node) provides an external interface and is responsible for receiving and parsing external RDF data;
S2: In the Redis instance of a storage node (Storage Node), first create, according to the definitions in the RDF ontology, a class database named after the class to which the subject belongs; within that class database, create for each property a property set named after the property, i.e. a predicate set. Then, according to the subject type and predicate of each triple obtained by parsing the RDF data, place the subjects that belong to the same class and share the same predicate into the corresponding predicate set of that class database without duplicates, and for each subject in the predicate set create an object set, named by the subject together with its predicate, that holds all objects corresponding to that subject and predicate. Next, create a reverse backup for each predicate: for the same predicate, create a reverse predicate set named after the reversed predicate. Place the objects of the parsed triples whose subjects belong to the same class and share the same predicate into this reverse predicate set without duplicates, and for each object in the reverse predicate set create a subject set, named by the object together with its predicate, that holds all subjects corresponding to that object and predicate.
In an embodiment of the present invention, each storage node (Storage Node) contains one Redis instance, and N class databases can be created in each Redis instance, where N is a positive integer.
In an embodiment of the present invention, the management node (Manage Node) accesses a class database in the Redis instance of each storage node (Storage Node) through that storage node's IP address, port and class database number.
In an embodiment of the present invention, within the storage system, the processing nodes (Process Node) communicate with the storage nodes (Storage Node) through the API provided by Redis.
In an embodiment of the present invention, within the storage system, the processing nodes (Process Node) and the storage nodes (Storage Node) are in a many-to-many relationship.
Further, a Redis-based distributed query method for RDF data is also provided, characterized in that it comprises the following steps:
S31: The management node (Manage Node) determines the Type corresponding to the query statement. If the class of the subject is known and the predicate is unknown, go to step S32; if the class of the subject is unknown and the predicate is known, go to step S33; if both the class of the subject and the predicate are known, go to step S34;
S32: Search the corresponding class databases in the Redis instance of each storage node (Storage Node) of the storage system;
S33: Obtain the domain of each predicate from the ontology file of the query statement, take the intersection of the domains of all predicates as the subject type of the query, thereby converting the query into one whose subject class and predicate are both known, and go to step S34;
S34: The management node (Manage Node) determines whether the subject or the object in the query statement is known. If either one is known, the management node performs the query directly, and the time complexity of this query is O(1). If both the subject and the object are unknown, go to step S35;
S35: The management node (Manage Node) looks up the registry of processing nodes (ProcessNode), divides the whole query task into as many subtasks as there are processing nodes registered in the registry, and distributes them to the processing nodes for querying. Each processing node queries the storage node (Storage Node) that corresponds to the query statement in its subquery task, and returns the result set to the management node once the query is complete.
In an embodiment of the present invention, for a query in which the subject or the object is known, the management node (Manage Node) divides the query into three phases: analyzing the query statement, locating the data set, and executing the query.
In an embodiment of the present invention, for query statements with many triple patterns and complex semantics, the management node (Manage Node) divides the query task into multiple subquery tasks and sends them to the processing nodes (ProcessNode) for execution.
In an embodiment of the present invention, for query statements with many triple patterns and complex semantics, the management node (Manage Node) generates the join strategy by means of a join selection strategy tree (SST). The selection strategy tree has one root node, the Decision node, which generates the join strategy. The level below the Decision node consists of the Pi nodes, one generated for the predicate of each query statement. Each Pi node has two child nodes, an Si (subject) node and an Oi (object) node: the Si node contains the subject instances of all triples whose predicate is Pi, and the Oi node contains the object instances of all triples whose predicate is Pi. Every node except the Decision node has its own weight. The notation is as follows: Pi is the i-th P node in the SST; Si is the S child node of the i-th P node; Oi is the O child node of the i-th P node; sj is the j-th s child node under the Si node; oj is the j-th o child node under the Oi node. The weight calculation formula is as follows (the formula is not reproduced in this text):
According to the weight value(Si) of the Si node and the weight value(Oi) of the Oi node of each Pi node, the SST obtains the query data set in the following steps:
S41: If value(Si) > value(Oi), go to step S42; if value(Si) < value(Oi), go to step S43; if value(Si) = value(Oi), go to either step S42 or step S43 at random;
S42: Store the object of the query statement as the key and the corresponding subject set as the value in a Map, and set value(Pi) = value(Oi);
S43: Store the subject of the query statement as the key and the corresponding object set as the value in a Map, and set value(Pi) = value(Si).
Compared with the prior art, the present invention has the following beneficial effects. The proposed Redis-based distributed RDF storage method and query method suit the schema-free nature of RDF data. RDF data are stored as key-value pairs in the Redis cache, which gives faster queries than file storage. For simple query statements, query time does not grow with data volume, and query efficiency is close to constant time. For complex query statements, the proposed storage method combined with the selection strategy tree (SST) join selection method also gives good query performance. Thanks to the distributed cluster design, a query need not be concerned with how many real Redis instances sit behind the store, so storage nodes can be expanded indefinitely by adding Redis Master servers in parallel. Together these narrow the query scope, improve query efficiency, and keep the method efficient even when a query has many triple patterns and complex semantics.
Brief description of the drawings
Fig. 1 is a schematic diagram of the system used by the Redis-based distributed RDF storage method of the present invention.
Fig. 2 is a schematic diagram of the storage layout of the Redis-based distributed RDF storage system of the present invention.
Fig. 3 is a schematic diagram of the Redis-based distributed RDF query method of the present invention.
Fig. 4 is a structural diagram of the selection strategy tree (SST) of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in detail below with reference to the accompanying drawings.
The invention provides a Redis-based distributed storage method for RDF data, implemented in the following steps:
S1: Provide a Redis-based distributed RDF data storage system. As shown in Fig. 1, the storage system comprises a management node (Manage Node), together with processing nodes (Process Node) and storage nodes (Storage Node) that work with the management node. The management node (Manage Node) provides an external interface, is responsible for receiving and parsing external RDF data, partitions the RDF data by the class to which the subject of each triple belongs, and stores each resulting class on the corresponding storage node (Storage Node). The management node also provides an external query interface, through which external systems can query the data;
S2: As shown in Fig. 2, in the Redis instance of a storage node (Storage Node), first create, according to the definitions in the RDF ontology, a class database named after the class to which the subject belongs, such as DB_Type_1 or DB_Type_n; within that class database, create for each property a property set named after the property, i.e. a predicate set, such as P1_Set in DB_Type_1. Then, according to the subject type and predicate of each triple obtained by parsing the RDF data, place the subjects that belong to the same class and share the same predicate into the corresponding predicate set of that class database without duplicates, and for each subject in the predicate set create an object set, named by the subject together with its predicate, that holds all objects corresponding to that subject and predicate; for example, S1 in P1_Set has the corresponding object set S1_P1_Set. Next, create a reverse backup for each predicate: for the same predicate, create a reverse predicate set named after the reversed predicate, such as P1_Reverse_Set. Place the objects of the parsed triples whose subjects belong to the same class and share the same predicate into this reverse predicate set without duplicates, and for each object in the reverse predicate set create a subject set, named by the object together with its predicate, that holds all subjects corresponding to that object and predicate; for example, O1 in P1_Reverse_Set has the corresponding subject set O1_P1_Reverse_Set.
With this method, the query scope is effectively narrowed and query efficiency improved for query statements in which either the subject or the object is known.
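As an illustration only, the layout of step S2 can be sketched in Python with in-memory sets standing in for Redis sets. The class `ClassDatabase`, the function `store_triple` and the sample triples are hypothetical; the key names follow the pseudocode later in this description (`subject_predicate` for object sets, `object_predicate_R` for reverse sets) rather than the figure labels such as S1_P1_Set, which is an assumption about how the two naming schemes correspond.

```python
from collections import defaultdict


class ClassDatabase:
    """In-memory stand-in for one Redis class database (e.g. DB_Type_1)."""

    def __init__(self):
        # key -> set of members, mimicking Redis SADD / SMEMBERS
        self.sets = defaultdict(set)

    def sadd(self, key, member):
        self.sets[key].add(member)

    def smembers(self, key):
        return set(self.sets.get(key, ()))


def store_triple(databases, subject_type, subject, predicate, obj):
    """Store one triple in the class database of its subject's class."""
    db = databases.setdefault(subject_type, ClassDatabase())
    db.sadd(predicate, subject)                # predicate set: distinct subjects
    db.sadd(f"{subject}_{predicate}", obj)     # object set for (subject, predicate)
    db.sadd(f"{predicate}_R", obj)             # reverse predicate set: distinct objects
    db.sadd(f"{obj}_{predicate}_R", subject)   # subject set for (object, predicate)


# Hypothetical sample data
databases = {}
store_triple(databases, "GraduateStudent", "Student1", "takesCourse", "Course0")
store_triple(databases, "GraduateStudent", "Student1", "takesCourse", "Course1")
store_triple(databases, "GraduateStudent", "Student2", "takesCourse", "Course0")
```

Because every member goes into a set, repeated subjects or objects are stored only once, which matches the "without duplicates" requirement of step S2.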
Further, as shown in Fig. 1 and Fig. 2, in this embodiment each storage node (Storage Node) contains one Redis instance, and N databases can be created in each Redis instance, where N is a positive integer. The management node (Manage Node) accesses a class database in the Redis instance of each storage node (Storage Node) through that storage node's IP address, port and database number. Throughout the storage system, the processing nodes (Process Node) communicate with the storage nodes (Storage Node) through the API provided by Redis. In this system the processing nodes (Process Node) and the storage nodes (Storage Node) are in a many-to-many relationship.
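The addressing scheme above (IP address, port and database number per class database) could be recorded in a small lookup table on the management node. The following is a minimal sketch under that assumption; the name `ClassDatabaseAddress`, the table `CLASS_LOCATIONS` and all addresses are hypothetical and do not come from the original text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassDatabaseAddress:
    """Where one class database lives: storage node IP, port, Redis database number."""
    host: str
    port: int
    db_index: int


# Hypothetical placement of three class databases on two storage nodes
CLASS_LOCATIONS = {
    "GraduateStudent": ClassDatabaseAddress("10.0.0.11", 6379, 0),
    "Faculty": ClassDatabaseAddress("10.0.0.11", 6379, 1),
    "Course": ClassDatabaseAddress("10.0.0.12", 6379, 0),
}


def locate(class_name):
    """Return the address of the class database that stores subjects of class_name."""
    return CLASS_LOCATIONS[class_name]
```

With a client library such as redis-py, an address of this shape maps directly onto a connection, for example `redis.Redis(host=addr.host, port=addr.port, db=addr.db_index)`; the choice of client is an assumption, since the description only says that the API provided by Redis is used.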
Further, a Redis-based distributed query method for RDF data is also provided. Once the data have been stored in the distributed system, the Type of the queried data set can be determined from the query statement, which locates the StorageNode holding the data; the predicate then locates the data set within it, so that the query runs against a reduced data set. As shown in Fig. 3, the method is implemented in the following steps:
S31: The management node (Manage Node) determines the Type corresponding to the query statement. If the class of the subject is known and the predicate is unknown, go to step S32; if the class of the subject is unknown and the predicate is known, go to step S33; if both the class of the subject and the predicate are known, go to step S34;
S32: Search the corresponding class databases in the Redis instance of each storage node (Storage Node) of the storage system;
S33: Obtain the domain of each predicate from the ontology file of the query statement, take the intersection of the domains of all predicates as the subject type of the query, thereby converting the query into one whose subject class and predicate are both known, and go to step S34;
S34: The management node (Manage Node) determines whether the subject or the object in the query statement is known. If either one is known, then because the storage method of this embodiment uses key-value storage, the management node performs the query directly, and the time complexity of this query is O(1). If both the subject and the object are unknown, go to step S35;
S35: The management node (Manage Node) looks up the registry of processing nodes (ProcessNode); in this embodiment the registry stores information such as the IP address of each processing node. The management node divides the whole query task into as many subtasks as there are processing nodes registered in the registry and distributes them to the processing nodes for querying. Each processing node queries the storage node (Storage Node) that holds the data for the query statement in its subquery task, and returns the result set to the management node once the query is complete.
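The branching in steps S31 to S35 can be sketched as a small routing function. This is an illustrative sketch only: the function names, the `DOMAINS` table and its contents are hypothetical, and the case where neither the class nor any predicate is known is not covered by the description, so the sketch simply rejects it.

```python
def infer_subject_types(predicates, domains):
    """S33: intersect the domains of all known predicates."""
    return set.intersection(*(set(domains[p]) for p in predicates))


def plan_query(subject_type, predicates, subject, obj, domains):
    """Return (step that executes the query, candidate subject classes)."""
    if subject_type is not None and not predicates:
        # S31 -> S32: class known, predicate unknown: search the class database
        return "S32", {subject_type}
    if subject_type is None and predicates:
        # S31 -> S33: class unknown, predicate known: derive the class from domains
        types = infer_subject_types(predicates, domains)
    elif subject_type is not None:
        # S31 -> S34: class and predicate both known
        types = {subject_type}
    else:
        # Not described in the source text
        raise ValueError("neither subject class nor predicate is known")
    if subject is not None or obj is not None:
        return "S34", types   # direct key lookup on the management node
    return "S35", types       # split into subtasks for the processing nodes


# Hypothetical predicate domains taken from an ontology
DOMAINS = {
    "takesCourse": {"Student"},
    "advisor": {"Student", "Person"},
}
```

For example, a query with predicates `takesCourse` and `advisor`, an unknown subject class and a known object resolves the class to `Student` through S33 and is then answered directly in S34.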
Without the storage and query methods proposed in this embodiment, a complex query statement requires finding all the relevant data and then performing the join. With the proposed storage method and the corresponding query method, a topology is established among the management node (Manage Node), the processing nodes (ProcessNode) and the storage nodes (Storage Node) when the storage system is set up. Each processing node therefore only needs to fetch, from the databases assigned to it, the data required for its own join and then perform the join. Finally, each processing node sends its query results to the management node (Manage Node), which gathers them. This makes effective use of distributed processing and also relieves the memory pressure on any single machine.
In this embodiment, for a query in which the subject or the object is known, the management node (ManageNode) divides the query into three phases: analyzing the query statement, locating the data set (i.e. the class database that holds it), and executing the query. Take Q1 as an example.
The query steps for Q1 are executed as follows:
S51: The ManageNode analyzes the query statement and obtains the class to which the result set belongs, type: GraduateStudent;
S52: Obtain the predicate, predicate: takesCourse; because the object is known, the predicate used is takesCourse_R;
S53: Obtain the storageNode holding the data set according to type;
S54: The ManageNode executes the query.
The specific implementation is as follows:
Begin
  // sparqlQuery: the SPARQL query statement to be executed
  // getDataBaseByType(): returns the storageNode database for a given Type
  // Pipeline: commands are sent without waiting for each response; the Redis server
  //   processes the batch and returns all the results to the client together
  // pl.smembers(key): returns the members of the Redis set stored at key
  type = sparqlQuery.getType();
  dataBase = getDataBaseByType(type);
  predicate = sparqlQuery.getPredicate();
  IF (subject is known)
    key = subject + "_" + predicate;
  ELSE  // object is known
    key = object + "_" + predicate + "_R";
  END IF
  pl = dataBase.getPipeline();
  Set<String> response = pl.smembers(key);
  pl.sync();
End
Here, response holds the result set of the query.
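The key construction in the pseudocode above can be tried out without a Redis server. The following Python sketch uses a plain dictionary of sets in place of a class database; `build_key`, `lookup` and the sample data are hypothetical names and values chosen for illustration.

```python
def build_key(predicate, subject=None, obj=None):
    """Build the set key for a query whose subject or object is known."""
    if subject is not None:
        return f"{subject}_{predicate}"      # forward set: objects of (subject, predicate)
    if obj is not None:
        return f"{obj}_{predicate}_R"        # reverse set: subjects of (object, predicate)
    raise ValueError("either subject or obj must be known")


def lookup(class_db, predicate, subject=None, obj=None):
    """Fetch the result set with a single key lookup."""
    return set(class_db.get(build_key(predicate, subject, obj), set()))


# Hypothetical contents of the GraduateStudent class database
graduate_student_db = {
    "Student1_takesCourse": {"Course0"},
    "Course0_takesCourse_R": {"Student1", "Student2"},
}

result = lookup(graduate_student_db, "takesCourse", obj="Course0")
```

In Redis itself, locating the key is a constant-time hash lookup, while SMEMBERS takes time proportional to the size of the set it returns, so the O(1) figure in step S34 is best read as describing the cost of locating the result set.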
In this embodiment, for query statements with many triple patterns and complex semantics, the management node (Manage Node) divides the query task into multiple subquery tasks and sends them to the processing nodes (ProcessNode) for execution. Take Q9 as an example.
The query steps for Q9 are as follows:
S61: The ManageNode obtains the sizes of the advisor, advisor_R, takesCourse and takesCourse_R sets in all Student storage nodes and of the teacherOf and teacherOf_R sets in all Faculty storage nodes, and uses them to build the selection strategy tree that generates the join selection strategy;
S62: The ManageNode looks up the ProcessNode registry, divides the whole query task into as many subtasks as there are registered ProcessNodes, packages each subtask together with the information about the storageNode holding the data that the subtask needs, and sends the packages to the registered ProcessNodes for computation;
S63: After a ProcessNode has completed its query, it returns the result set to the ManageNode.
Pseudocode of the ManageNode task division algorithm:
Begin
  // dataBase: database holding the subject of the first statement in the query plan
  // predicate: predicate of the first statement in the query plan
  // keySet: set of subjects shared by the first and second statements in the query plan
  // processNum: number of ProcessNodes connected to the ManageNode
  // dataMap: data that a ProcessNode needs in order to execute its subtask
  // separateSet(Set set, int num): splits set into num sets
  // DataPacket: packet exchanged between the ManageNode and a ProcessNode
  Pipeline pl = dataBase.getPipeline();   // data pipeline connected to the database
  keySet = pl.smembers(predicate);
  List<Set<String>> list = separateSet(keySet, processNum);
  FOR (int i = 0; i < processNum; i++)
    dataMap.put("keySet", list.get(i));
    DataPacket dataPacket = new DataPacket(DataPacket.search_type, dataMap);
    objectOutputStream[i].writeObject(dataPacket);
  END FOR
End
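The description does not say how separateSet distributes the keys. One simple possibility, shown here purely as an assumption, is a round-robin split over the sorted keys, which gives chunks whose sizes differ by at most one. The functions `separate_set` and `build_packets` and the dictionary-shaped packet are hypothetical stand-ins for separateSet and DataPacket.

```python
def separate_set(keys, num):
    """Split keys into num disjoint chunks of near-equal size (round-robin)."""
    ordered = sorted(keys)                       # fixed order keeps the split reproducible
    return [set(ordered[i::num]) for i in range(num)]


def build_packets(key_set, process_num):
    """Build one subtask packet per registered processing node."""
    return [
        {"type": "search", "keySet": chunk}
        for chunk in separate_set(key_set, process_num)
    ]


# Hypothetical shared subjects of the first two statements, split over 2 ProcessNodes
packets = build_packets({"s1", "s2", "s3", "s4", "s5"}, 2)
```

Each subject lands in exactly one packet, so the processing nodes work on disjoint parts of the join and their result sets can simply be combined by the management node.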
Pseudocode of the ProcessNode join algorithm:
Begin
  // dataBase_i: database holding data set i
  // predicate_i: predicate of the i-th statement in the query plan
  // keySet: set of subjects shared by the first and second statements in the query plan
  Pipeline pl_i = dataBase_i.getPipeline();   // data pipeline connected to the i-th database
  FOR (KEY : keySet)
    Set L1 = pl1.smembers(KEY + "_" + predicate1);
    Set L2 = pl2.smembers(KEY + "_" + predicate2);
    FOR (STRING1 : L1)
      FOR (STRING2 : L2)
        IF (pl3.sismember(STRING1 + "_" + predicate3, STRING2))
          // (KEY, STRING1, STRING2) matches all three statements; process the result
        END IF
      END FOR
    END FOR
  END FOR
End
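The nested-loop join above can be exercised on in-memory data. The sketch below mirrors the pseudocode with dictionaries of sets in place of the three pipelines; the function `join_subtask` and the sample student and faculty data are hypothetical, chosen to resemble the advisor, takesCourse and teacherOf predicates mentioned for Q9.

```python
def join_subtask(key_set, db1, db2, db3, p1, p2, p3):
    """Join three statements that share the subjects in key_set.

    For each shared subject, take its objects under p1 and p2, and keep the
    pairs (a, b) for which the triple (a, p3, b) is also stored.
    """
    results = []
    for key in sorted(key_set):
        l1 = db1.get(f"{key}_{p1}", set())
        l2 = db2.get(f"{key}_{p2}", set())
        for a in sorted(l1):
            for b in sorted(l2):
                # Membership test, like SISMEMBER on the set "a_p3"
                if b in db3.get(f"{a}_{p3}", set()):
                    results.append((key, a, b))
    return results


# Hypothetical data: students with advisors and courses, faculty with taught courses
student_db = {
    "Stu1_advisor": {"Prof1"},
    "Stu1_takesCourse": {"C1", "C2"},
    "Stu2_advisor": {"Prof2"},
    "Stu2_takesCourse": {"C1"},
}
faculty_db = {
    "Prof1_teacherOf": {"C1"},
    "Prof2_teacherOf": {"C3"},
}

matches = join_subtask({"Stu1", "Stu2"}, student_db, student_db, faculty_db,
                       "advisor", "takesCourse", "teacherOf")
```

Only Stu1 qualifies here, because Prof1 both advises Stu1 and teaches C1, a course that Stu1 takes.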
Further, in BGP (Basic Graph Pattern) matching, a method that reduces the time cost of query processing while keeping the query result correct is called a BGP optimization strategy. The SST (SelectivityStrategyTree) join selection strategy analyzes the query statements, obtains from the corresponding storage nodes the number of distinct subjects and the number of distinct objects in each relevant predicate set, and generates the selection strategy tree. In this embodiment, as shown in Fig. 4, for query statements with many triple patterns and complex semantics, the management node (Manage Node) generates the join strategy by means of the join selection strategy tree (SST). The SST has one root node, the decision node (Decision Node), which is responsible for generating the join strategy. The level below the decision node consists of predicate nodes (Predicate Node), one generated for the predicate of each query statement, excluding type, where type denotes the class to which a subject belongs. Each predicate node Pi has two child nodes: a subject node Si (Subject Node), which contains all subject instances whose predicate is Pi, and an object node Oi (Object Node), which contains all object instances whose predicate is Pi. For example, in Fig. 4 the subject node S1 of predicate node P1 contains all subject instances whose predicate is P1, and the object node O1 contains all object instances whose predicate is P1.
Further, in this embodiment, every node in the selection strategy tree (SST) except the Decision node has its own weight. The notation is as follows: Pi is the i-th P node in the SST; Si is the S child node of the i-th P node; Oi is the O child node of the i-th P node; sj is the j-th s child node under the Si node; oj is the j-th o child node under the Oi node. The weight calculation formula is as follows (the formula is not reproduced in this text):
According to the weight value(Si) of the Si node and the weight value(Oi) of the Oi node of each Pi node, the SST obtains the query data set in the following steps:
S41: If value(Si) > value(Oi), go to step S42; if value(Si) < value(Oi), go to step S43; if value(Si) = value(Oi), go to either step S42 or step S43 at random;
S42: Store the object of the query statement as the key and the corresponding subject set as the value in a Map, and set value(Pi) = value(Oi);
S43: Store the subject of the query statement as the key and the corresponding object set as the value in a Map, and set value(Pi) = value(Si).
Here, Map is a key-value container, used so that when the key is known the value can be found in O(1) time.
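Steps S41 to S43 can be sketched as follows. Because the weight formula is not available in this text, the sketch assumes, purely for illustration, that value(Si) and value(Oi) are the numbers of distinct subjects and distinct objects of predicate Pi; the function names and sample pairs are hypothetical. A tie is resolved to S42 here for reproducibility, which the description permits since either branch may be taken.

```python
def distinct_counts(pairs):
    """Assumed weights: number of distinct subjects and of distinct objects."""
    return len({s for s, _ in pairs}), len({o for _, o in pairs})


def build_join_map(pairs, value_s, value_o):
    """Apply S41-S43 to the (subject, object) pairs of one predicate Pi.

    Returns the Map and value(Pi).
    """
    mapping = {}
    if value_s >= value_o:
        # S42: object as key, corresponding subject set as value
        for s, o in pairs:
            mapping.setdefault(o, set()).add(s)
        return mapping, value_o
    # S43: subject as key, corresponding object set as value
    for s, o in pairs:
        mapping.setdefault(s, set()).add(o)
    return mapping, value_s


# Hypothetical advisor pairs: three students, two professors
advisor_pairs = [("Stu1", "Prof1"), ("Stu2", "Prof1"), ("Stu3", "Prof2")]
value_s, value_o = distinct_counts(advisor_pairs)
join_map, value_p = build_join_map(advisor_pairs, value_s, value_o)
```

Under this assumed weighting the Map is always keyed by the side with fewer distinct values, so it holds fewer entries and each known key still resolves to its value set in constant time.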
Once the weights of all predicate nodes (Predicate Node) in the selection strategy tree SST have been obtained, the SST sorts the predicate nodes by weight in ascending order. The two query statements whose predicate nodes have the smallest weights are joined first, the result is then joined with the next query statement, and so on until all query statements have been joined. In this embodiment, take Q9, the most complex query statement in LUBM, as an example.
Calculating by the above steps gives the corresponding join order: 2->1->3.
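The ordering rule above amounts to sorting the query statements by the weight of their predicate nodes. A minimal sketch follows; the function `join_order` is hypothetical, and the weights are invented values chosen only so that the resulting order has the same shape as the one quoted above, not figures from the original publication.

```python
def join_order(predicate_weights):
    """Order query statements by ascending value(Pi); ties fall back to statement number."""
    ranked = sorted(predicate_weights.items(), key=lambda item: (item[1], item[0]))
    return [statement for statement, _ in ranked]


# Hypothetical value(Pi) for three query statements
weights = {1: 500, 2: 120, 3: 900}
order = join_order(weights)
```

With these weights, statements 2 and 1 are joined first and the intermediate result is then joined with statement 3.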
The above are preferred embodiments of the present invention. Any change made in accordance with the technical solution of the present invention falls within the protection scope of the present invention, provided that the resulting function does not go beyond the scope of the technical solution.

Claims (9)

1. A Redis-based distributed storage method for RDF data, characterized in that it is implemented in the following steps:
S1: Provide a Redis-based distributed RDF data storage system comprising a management node (Manage Node), together with processing nodes (Process Node) and storage nodes (Storage Node) that work with the management node; the management node (Manage Node) provides an external interface and is responsible for receiving and parsing external RDF data;
S2: In the Redis instance of a storage node (Storage Node), first create, according to the definitions in the RDF ontology, a class database named after the class to which the subject belongs; within that class database, create for each property a property set named after the property, i.e. a predicate set; according to the subject type and predicate of each triple obtained by parsing the RDF data, place the subjects that belong to the same class and share the same predicate into the corresponding predicate set of that class database without duplicates, and for each subject in the predicate set create an object set, named by the subject together with its predicate, that holds all objects corresponding to that subject and predicate; then create a reverse backup for each predicate, namely, for the same predicate, create a reverse predicate set named after the reversed predicate; place the objects of the parsed triples whose subjects belong to the same class and share the same predicate into this reverse predicate set without duplicates, and for each object in the reverse predicate set create a subject set, named by the object together with its predicate, that holds all subjects corresponding to that object and predicate.
2. The Redis-based distributed RDF storage method according to claim 1, characterized in that each storage node (Storage Node) contains one Redis instance, and N class databases can be created in each Redis instance, where N is a positive integer.
3. The Redis-based distributed RDF storage method according to claim 1, characterized in that the management node (Manage Node) accesses a class database in the Redis instance of each storage node (Storage Node) through that storage node's IP address, port and class database number.
4. The Redis-based distributed RDF storage method according to claim 1, characterized in that, within the storage system, the processing nodes (Process Node) communicate with the storage nodes (Storage Node) through the API provided by Redis.
5. The Redis-based distributed RDF storage method according to claim 1, characterized in that, within the storage system, the processing nodes (Process Node) and the storage nodes (Storage Node) are in a many-to-many relationship.
6. A Redis-based RDF data distributed query method based on the Redis-based RDF distributed storage method according to any one of claims 1 to 5, characterized by comprising the following steps:
S31: the management node (Manage Node) determines the type (Type) of the query statement; if the class to which the subject of the query statement belongs is known and the predicate is unknown, proceed to step S32; if the class to which the subject belongs is unknown and the predicate is known, proceed to step S33; if both the class to which the subject belongs and the predicate are known, proceed to step S34;
S32: search the corresponding class database in the Redis instance of each storage node (Storage Node) of the storage system;
S33: obtain the domain of each predicate from the ontology file of the query statement, take the intersection of the domains of all predicates as the subject type of the query, thereby converting the query into the type in which both the class of the subject and the predicate are known, and proceed to step S34;
S34: the management node (Manage Node) determines whether the subject or the object in the query statement is known; if either the subject or the object is known, the management node (Manage Node) performs the query directly, and the time complexity of this query is O(1); if both the subject and the object are unknown, proceed to step S35;
S35: the management node (Manage Node) looks up the registry of processing nodes (Process Node), divides the whole query task into as many subtasks as there are processing nodes (Process Node) registered in the registry, and assigns the subtasks to the processing nodes (Process Node) for querying; each processing node (Process Node) queries the storage node (Storage Node) that corresponds to the query statement in its subquery task; after the query is completed, the processing node (Process Node) returns the result to the management node (Manage Node).
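The branching in steps S31 to S35 can be sketched as a small dispatcher. This is a hedged illustration: the function names, the step labels returned as strings, and the sample ontology domains are assumptions for the example, and the case where neither the class nor the predicate is known is left unspecified because the claim does not cover it.

```python
def subject_class_from_domains(predicates, domains):
    """S33: intersect the domains of all predicates to infer the subject class.

    Assumes at least one predicate is given.
    """
    return set.intersection(*(set(domains[p]) for p in predicates))


def next_step(class_known, predicate_known, subject_known=False, object_known=False):
    """Return the step the management node would proceed to."""
    if class_known and not predicate_known:
        return "S32"
    if predicate_known and not class_known:
        return "S33"
    if class_known and predicate_known:
        # S34: a bound subject or object allows a direct lookup.
        if subject_known or object_known:
            return "S34-direct"
        return "S35"
    return "unspecified"  # not covered by the claim


# Hypothetical ontology domains for two predicates.
domains = {
    "advisor": {"Student", "GraduateStudent"},
    "takesCourse": {"Student"},
}
inferred = subject_class_from_domains(["advisor", "takesCourse"], domains)
```

After step S33 the inferred class plays the role of the known subject class, so the query continues as in step S34.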
7. The Redis-based RDF distributed query method according to claim 6, characterized in that: for a query in which the subject or the object is known, the management node (Manage Node) divides the query into three phases: query statement analysis, data set location, and query execution.
8. The Redis-based RDF distributed query method according to claim 6, characterized in that: for a query statement with a large number of tuple patterns and complex semantics, the management node (Manage Node) divides the query task into multiple subquery tasks and sends them to the processing nodes (Process Node) for execution.
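One way to picture the division into subquery tasks is shown below. The source only states that the number of subtasks matches the number of registered processing nodes; the round-robin assignment, the `split_query` name, and the node identifiers are assumptions made for this sketch.

```python
def split_query(patterns, process_nodes):
    """Divide triple patterns into one subtask per registered processing node."""
    if not process_nodes:
        raise ValueError("no processing nodes registered")
    subtasks = {node: [] for node in process_nodes}
    for index, pattern in enumerate(patterns):
        # Round-robin assignment (an assumption, not stated in the claims).
        node = process_nodes[index % len(process_nodes)]
        subtasks[node].append(pattern)
    return subtasks


patterns = [
    ("?x", "advisor", "?y"),
    ("?x", "takesCourse", "?c"),
    ("?y", "teacherOf", "?c"),
]
plan = split_query(patterns, ["pn1", "pn2"])
```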
9. The Redis-based RDF distributed query method according to claim 6, characterized in that: for a query statement with a large number of tuple patterns and complex semantics, the management node (Manage Node) generates a join strategy by means of a join selection strategy tree (SST). The SST comprises one root node, the Decision node, which is used to generate the join strategy. The level below the Decision node consists of the Pi nodes, each generated from the predicate of one query statement. Each Pi node has two child nodes, an Si (subject) node and an Oi (object) node; the Si (subject) node contains all subject instances whose predicate is Pi, and the Oi (object) node contains all object instances whose predicate is Pi. Every node except the Decision node has its own weight. The symbols are defined as follows: Pi: the i-th P node in the SST; Si: the S child node of the i-th P node; Oi: the O child node of the i-th P node; sj: the j-th s child node under the Si node; oj: the j-th o child node under the Oi node. The weights are computed by the following formula:
Based on the weight value(Si) of the Si node and the weight value(Oi) of the Oi node of each Pi node, the SST obtains the query data set, specifically comprising the following steps:
S41: if value(Si) > value(Oi), proceed to step S42; if value(Si) < value(Oi), proceed to step S43; if value(Si) = value(Oi), proceed to step S42 or step S43 at random;
S42: store the object of the query statement as the key and the corresponding subject set as the value in a Map, and set value(Pi) = value(Oi);
S43: store the subject of the query statement as the key and the corresponding object set as the value in a Map, and set value(Pi) = value(Si).
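Steps S41 to S43 can be sketched as follows. The weights are passed in as plain numbers because the weight formula is not reproduced in the text; the function names and the representation of a Pi node as a list of (subject, object) pairs are assumptions for illustration only.

```python
import random


def choose_side(value_s, value_o, rng=random):
    """S41: return 'S42' or 'S43' by comparing value(Si) with value(Oi)."""
    if value_s > value_o:
        return "S42"
    if value_s < value_o:
        return "S43"
    return rng.choice(["S42", "S43"])  # tie: pick either step at random


def build_map(pairs, value_s, value_o, rng=random):
    """Build the Map and the resulting value(Pi) for one Pi node.

    pairs is a list of (subject, object) instances whose predicate is Pi.
    """
    step = choose_side(value_s, value_o, rng)
    result = {}
    if step == "S42":
        # Key by object, collect the subjects; value(Pi) = value(Oi).
        for s, o in pairs:
            result.setdefault(o, set()).add(s)
        return result, value_o
    # S43: key by subject, collect the objects; value(Pi) = value(Si).
    for s, o in pairs:
        result.setdefault(s, set()).add(o)
    return result, value_s


pairs = [("alice", "bob"), ("carol", "bob"), ("dave", "erin")]
by_object, weight_o = build_map(pairs, value_s=3, value_o=2)
by_subject, weight_s = build_map(pairs, value_s=1, value_o=2)
```

In both branches the Map is keyed by the side with the smaller weight, and value(Pi) takes that smaller weight.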
CN201510213313.XA 2015-04-30 2015-04-30 RDF (Resource Description Framework) data distributed storage and querying method based on Redis Pending CN104778277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510213313.XA CN104778277A (en) 2015-04-30 2015-04-30 RDF (Resource Description Framework) data distributed storage and querying method based on Redis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510213313.XA CN104778277A (en) 2015-04-30 2015-04-30 RDF (Resource Description Framework) data distributed storage and querying method based on Redis

Publications (1)

Publication Number Publication Date
CN104778277A true CN104778277A (en) 2015-07-15

Family

ID=53619741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510213313.XA Pending CN104778277A (en) 2015-04-30 2015-04-30 RDF (Resource Description Framework) data distributed storage and querying method based on Redis

Country Status (1)

Country Link
CN (1) CN104778277A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915380A (en) * 2012-11-19 2013-02-06 北京奇虎科技有限公司 Method and system for carrying out searching on data
CN102930054A (en) * 2012-11-19 2013-02-13 北京奇虎科技有限公司 Data search method and data search system
CN103475687A (en) * 2013-05-24 2013-12-25 北京网秦天下科技有限公司 Distributed method and distributed system for downloading website data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
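董书暕 et al.: "HMSST: An Efficient SPARQL Query Optimization Algorithm", 《计算机科学》 (Computer Science) *
邓海龙: "Massive RDF Management Based on Column Databases and Graph Caching", 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *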

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447156A (en) * 2015-11-30 2016-03-30 北京航空航天大学 Resource description framework distributed engine and incremental updating method
CN105760221A (en) * 2016-02-02 2016-07-13 中博信息技术研究院有限公司 Task dispatching system with distributed calculating frame
CN106156319A (en) * 2016-07-05 2016-11-23 北京航空航天大学 Telescopic distributed resource description framework data storage method and device
CN106528648B (en) * 2016-10-14 2019-10-15 福州大学 In conjunction with the distributed RDF keyword proximity search method of Redis memory database
CN106528648A (en) * 2016-10-14 2017-03-22 福州大学 Distributed keyword approximate search method for RDF in combination with Redis memory database
CN106790742A (en) * 2016-11-23 2017-05-31 北京锐安科技有限公司 A kind of method and device of IP matchings
CN109522053A (en) * 2017-09-20 2019-03-26 阿里巴巴集团控股有限公司 A kind of massive parallel processing and data processing method
CN108763451A (en) * 2018-05-28 2018-11-06 福州大学 Streaming RDF data parallel reasoning algorithm based on Spark Streaming
CN108763451B (en) * 2018-05-28 2022-03-11 福州大学 Streaming RDF data parallel reasoning algorithm based on Spark Streaming
CN109992658A (en) * 2019-04-09 2019-07-09 智言科技(深圳)有限公司 A kind of SPARQL inquiring structuring method of Knowledge driving
CN109992658B (en) * 2019-04-09 2023-04-11 智言科技(深圳)有限公司 Knowledge-driven SPARQL query construction method
CN110909111A (en) * 2019-10-16 2020-03-24 天津大学 Distributed storage and indexing method based on knowledge graph RDF data characteristics
CN110909111B (en) * 2019-10-16 2023-07-14 天津大学 Distributed storage and indexing method based on RDF data characteristics of knowledge graph
CN113312432A (en) * 2021-05-08 2021-08-27 北京旷视科技有限公司 Associated information processing method and device, computer storage medium and electronic equipment
CN113590647A (en) * 2021-07-29 2021-11-02 中国联合网络通信集团有限公司 SQL statement optimization method, device, equipment, storage medium and product
CN113590647B (en) * 2021-07-29 2024-02-23 中国联合网络通信集团有限公司 SQL sentence optimization method, device, equipment, storage medium and product

Similar Documents

Publication Publication Date Title
CN104778277A (en) RDF (Resource Description Framework) data distributed storage and querying method based on Redis
EP3365808B1 (en) Proxy databases
CN101436192B (en) Method and apparatus for optimizing inquiry aiming at vertical storage type database
CN107291807B (en) SPARQL query optimization method based on graph traversal
Lanti et al. The NPD Benchmark: Reality Check for OBDA Systems.
US10360269B2 (en) Proxy databases
US9798772B2 (en) Using persistent data samples and query-time statistics for query optimization
US7730055B2 (en) Efficient hash based full-outer join
US20190042624A1 (en) Computer-implemented method for improving query execution in relational databases normalized at level 4 and above
CN102999563A (en) Network resource semantic retrieval method and system based on resource description framework
US20100235344A1 (en) Mechanism for utilizing partitioning pruning techniques for xml indexes
CN103823846A (en) Method for storing and querying big data on basis of graph theories
CN103605750B (en) A kind of quick distributed data paging method
CN105550332A (en) Dual-layer index structure based origin graph query method
CN105357247A (en) Multi-dimensional cloud resource interval finding method based on hierarchical cloud peer-to-peer network
Abdelaziz et al. Query optimizations over decentralized RDF graphs
CN108804580B (en) Method for querying keywords in federal RDF database
Cappellari et al. A path-oriented rdf index for keyword search query processing
CN107609091B (en) Method for realizing cross-database multi-table combined query system
Wang et al. A provenance storage method based on parallel database
Rajith et al. JARS: join-aware distributed RDF storage
Zhu et al. Hydb: Access optimization for data-intensive service
Katchaounov et al. Scalable view expansion in a peer mediator system
He et al. A method of RDF fuzzy query based on no query language service with permutated breadth first search algorithm
Li et al. Query optimization for massive RDF data based on Spark

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20150715