CN108959318A - Distributed keyword query method based on RDF graph - Google Patents
Distributed keyword query method based on RDF graph
- Publication number
- CN108959318A (application CN201710376203.4A)
- Authority
- CN
- China
- Prior art keywords
- rdf
- vertex
- sentence
- subgraph
- crt
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a distributed keyword query method based on RDF graphs and belongs to the field of information retrieval. The method first converts the RDF data graph into an RDF sentence graph. It then partitions the RDF sentence graph using a constrained depth-first search together with simulated annealing, following the two basic principles of graph partitioning: minimizing the edge cut set and balancing the data across the resulting subgraphs. Finally, the partitioned RDF sentence graph is refined back into an RDF data graph to obtain the vertex cut set of the RDF graph, and keyword queries are answered efficiently using a backward search algorithm on the Hadoop distributed computing framework. While preserving the atomicity and semantic integrity of the RDF data, the invention overcomes the efficiency limits of traditional algorithms when partitioning large-scale data sets and greatly improves keyword query efficiency.
Description
Technical field
The present invention relates to a distributed keyword query method based on RDF graphs and belongs to the field of information retrieval.
Background art
Graphs are a ubiquitous data structure and are widely used in many fields. Keyword query over RDF graph structures is a current research focus: it lets users obtain query results efficiently without using a complex structured query language. Most current query algorithms are implemented in a centralized environment, that is, keyword queries can only be processed on a single machine. As RDF graphs keep growing, running keyword queries on a single machine becomes very time-consuming, so processing and storing graphs in a distributed environment is of great theoretical and practical significance.
The commonly used keyword query technique represents RDF data as a labeled directed graph in which vertices correspond to the subjects and objects of triples and predicates are edges. Modeling the data as a graph preserves both the associations between data items and their semantic information, so query processing over RDF data is usually transformed into a graph matching problem, namely locating a Steiner tree that contains the keywords on the RDF data graph. Because graph data is inherently connected and graph computation exhibits strong coupling, efficient parallel processing of a graph requires reducing the coupling between the subgraphs handled in the distributed system as much as possible, and effective graph partitioning is an important means of achieving this decoupling. Current graph partitioning algorithms follow two main principles: first, increase connectivity inside each subgraph and reduce connectivity between subgraphs; second, balance subgraph sizes so that the data volume of each subgraph is as even as possible and no large skew occurs. Kim et al. proposed the SBV-Cut method, which determines balance vertices by random walks and partitions the graph at those balance vertices so that each subgraph contains approximately the same number of vertices, and which keeps the number of cut vertices minimal using two properties, expansion and modularity.
However, although the subgraphs produced by this method contain the same (or a similar) number of vertices, the number of edges associated with each vertex differs, which leaves the data unbalanced across subgraphs. In addition, as graphs keep growing, traditional algorithms (such as KL, DFEP, and VSEP) are limited in the graph sizes they can handle and cannot meet the demands of explosive data growth.
Summary of the invention
To address the above shortcomings, the present invention proposes a distributed keyword query method based on RDF graphs. While keeping the data of the partitioned subgraphs balanced, the method partitions large graphs efficiently and supports fast keyword queries, thereby meeting users' query needs. The technical solution is implemented in the following steps:
(1) Convert the RDF graph into an RDF sentence graph:
In an RDF data set, each RDF triple is a basic statement of the RDF data and expresses one complete semantic unit about a resource on the Web, so the atomicity of each RDF triple must be guaranteed when the data set is partitioned. Moreover, a blank node only indicates that something exists and has no global identifier, so it can only be used locally; RDF triples that share the same blank node express the common context of that blank node, and separating such triples would destroy the context they express. An RDF sentence s is therefore composed of RDF triples and satisfies the following conditions:
Condition 1: any two RDF triples in s are connected, where two RDF triples are connected when they share the same blank node;
Condition 2: no RDF triple in s is connected to an RDF triple outside s;
The invention describes an RDF triple as (s, p, o), abbreviated t, and uses s(t), p(t), and o(t) to denote the subject, predicate, and object of the triple, respectively. The RDF directed graph and the RDF sentence graph are defined as follows:
RDF directed graph: let G = (V, E, L) denote a labeled RDF directed graph over the set T of RDF triples, where the vertex set V = { v | v ∈ s(T) ∪ o(T) } consists of the subjects and objects of the RDF triples, and the set of directed edges E = { <s(t), o(t)> | t ∈ T } consists of the predicates relating subjects to objects; each edge points from the subject vertex to the object vertex. L is the set of labels, L = L_v ∪ L_p, where L_v denotes the vertex labels and L_p denotes the predicate labels.
RDF sentence graph: let G_s = (S, E, l, w) denote an RDF sentence graph. It is a vertex-weighted undirected graph in which each node corresponds to one RDF sentence; S denotes the vertices and E denotes the edges. If s_i, s_j ∈ S with s_i ≠ s_j, and there exist t ∈ s_i and t′ ∈ s_j such that t.s = t′.s or t.o = t′.s, then s_i and s_j are associated, i.e. (s_i, s_j) ∈ E. l is a labeling function: for each s_i ∈ S, l(s_i) is the set of local labels of the subjects, predicates, or objects contained in the sentence. w is the vertex weight: for each s_i ∈ S, w(s_i) equals the number of RDF triples contained in the sentence.
(2) Edge partitioning algorithm based on the RDF sentence graph (REC):
The partitioning follows the two principles of graph partitioning. First, to reduce network traffic, the number of cut edges must be kept as small as possible. The invention therefore performs a depth-first traversal guided by the minimum-vertex-degree principle; because this approach easily gets trapped in a local optimum, simulated annealing is used to avoid that situation. Second, to balance data across subgraphs, the edges of the RDF graph are distributed evenly over the subgraphs, so that each RDF subgraph holds e edges. If the graph G_s is partitioned into k subgraphs, a function p assigns each vertex to its subgraph, with the subgraphs labeled {1, 2, ..., k}; the label j = p(s) means that sentence s belongs to subgraph S_j. The subgraphs S_j satisfy two conditions: their union is S, and S_i ∩ S_j = ∅ for i ≠ j. The specific steps of the edge partitioning algorithm on the RDF sentence graph are as follows:
Step A: input the number of edges per RDF subgraph
Based on the RDF sentence graph G_s, input the number e of edges per RDF subgraph;
Step B: set the "visited" flags
Find the vertices of minimum degree in the RDF sentence graph, put them into a set D, and give every vertex a "visited" flag;
Step C: traverse the vertices in D to partition the RDF sentence graph
C1: fix the visiting order of the vertices in D. If the total weight of all unvisited vertices in the RDF sentence graph satisfies |G_s| > e, select the next unvisited vertex from D in order;
C2: add the vertex to the set S and mark it as visited;
C3: if the total vertex weight of the current subgraph satisfies |S| < e and the set N(S, G_s) of vertices adjacent to S is not empty, randomly select a vertex v from N(S, G_s);
C4: if v is unvisited and |N(v, G_s) \ S|, the number of vertices adjacent to v that are not in S, is minimal (that is, the vertex has the smallest degree), jump to step C2;
C5: if the vertex weight of the current set S satisfies |S| < e and N(S, G_s) is empty, return to the most recently visited vertex and jump to step C4;
C6: if the vertex weight of the current set S satisfies |S| > e, jump to step C1;
Step D: search for the optimal solution by simulated annealing
D1: fix the visiting order of the vertices in D;
D2: compute the number of cut edges produced by this visiting order;
D3: randomly swap the visiting positions of two vertices in D. If the number of cut edges is now smaller than in step D2, the new result is better than the old one, and the new visiting order replaces the old one;
D4: finally, call the simulated annealing function;
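Step D can be read as a standard simulated-annealing loop over the visiting order of D. The sketch below is one possible reading, not the patent's code: the source only says that two positions are swapped and that a better order replaces the old one, so the geometric cooling schedule, the temperature parameters, the Metropolis acceptance rule, and the `cut_cost` callback (any function that returns the number of cut edges for a given order) are all assumptions.

```python
import math
import random


def anneal_order(order, cut_cost, t_start=1.0, t_end=1e-3, alpha=0.9, seed=0):
    """Search for a visiting order with few cut edges (steps D1 to D4, sketched)."""
    rng = random.Random(seed)
    current = list(order)              # D1: initial visiting order
    current_cost = cut_cost(current)   # D2: cut edges for that order
    best, best_cost = list(current), current_cost

    t = t_start
    while t > t_end and len(current) > 1:
        # D3: swap the visiting positions of two randomly chosen vertices.
        i, j = rng.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_cost = cut_cost(candidate)

        # Assumed Metropolis rule: always accept improvements, sometimes accept
        # worse orders so the search can leave a local optimum.
        delta = candidate_cost - current_cost
        if delta < 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost

        t *= alpha  # assumed geometric cooling
    return best, best_cost


# Toy cost used only to exercise the loop: distance from a target order.
target = ["S3", "S4", "S5", "S6"]
toy_cost = lambda o: sum(1 for a, b in zip(o, target) if a != b)
print(anneal_order(["S5", "S6", "S3", "S4"], toy_cost))
```

Passing the cost function in as a parameter keeps the annealing loop independent of how the partition itself is computed.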
(3) Keyword query using the Hadoop distributed computing framework
Step I: determine the vertex cut set of the partitioned RDF graph
The vertex cut set of the partitioned RDF graph is obtained by computing the intersection of the RDF subgraphs;
Step II: determine the directed tree
Definition of the query result tree (RT): let the vertex sets matching the keywords K be M(K) = { m_1, m_2, ..., m_s }. A query result is defined as a directed tree satisfying the following conditions:
1) the root of the tree is R;
2) for each set m_i, there is some vertex v_i in m_i with a directed path from R to v_i.
During keyword query, a query result tree that contains only some of the keywords is called a candidate result tree, abbreviated CRT.
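The two conditions of the result-tree definition can be checked with a short reachability test. This is a minimal sketch under assumptions: the tree is given as a hypothetical `children` mapping from a vertex to its child vertices, `matches` maps each keyword to its set of matching vertices m_i, and the code only checks coverage from the root, not that the input really is a tree. The vertex name `title-13` is a made-up stand-in for a title literal.

```python
from collections import deque


def reachable_from(children, root):
    """All vertices reachable from root along directed edges (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for child in children.get(v, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def covered_keywords(children, root, matches):
    """Keywords whose match set m_i contains a vertex reachable from the root."""
    reach = reachable_from(children, root)
    return {kw for kw, vertices in matches.items() if reach & vertices}


def is_result_tree(children, root, matches):
    """True if every keyword is covered; a CRT covers only some of them."""
    return covered_keywords(children, root, matches) == set(matches)


matches = {
    "paper-45": {"paper-45"},
    "paper-13": {"paper-13"},
    "OWL": {"title-13"},
}
full = {"paper-45": ["iswc"], "iswc": ["paper-13"], "paper-13": ["title-13"]}
partial = {"paper-45": ["iswc"]}
print(is_result_tree(full, "paper-45", matches))
print(covered_keywords(partial, "paper-45", matches))
```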
Step III: keyword query using the backward search algorithm
The process of implementing keyword query with MapReduce consists of the following four steps:
MR1: in the map stage, candidate result trees are first found with the backward search algorithm and represented as 4-tuples (CRT_i, K, B, S_i), where CRT_i is a tree rooted at R, K denotes the keywords contained in CRT_i, B denotes the cut vertices in the subgraph, and S_i denotes the subgraph containing CRT_i. The 4-tuple is then packed into a key-value pair <B, CRT_i>;
MR2: in the combiner stage, for two candidate result trees CRT_i and CRT_j in different subgraphs, if the cut vertices associated with the two CRTs satisfy V_i ∩ V_j ≠ ∅, CRT_i and CRT_j are put into the same combiner;
MR3: in the reduce stage, the CRTs in the same combiner are merged. If the merged result contains all the query keywords, the query result is output;
MR4: in the rank stage, because the RDF data set is very large, one query may have many matching results, while users are usually interested in only a small fraction of them, so a scoring function is needed to score the query results. This patent uses the commonly used approach of scoring by the compactness of the result and returns the Top-k query results.
Description of the drawings
Fig. 1: basic framework of the distributed keyword query method based on RDF graphs
Fig. 2: RDF example graph
Fig. 3: example of converting an RDF graph into an RDF sentence graph
Fig. 4: flow chart of the edge partitioning algorithm based on the RDF sentence graph
Fig. 5: flow chart of simulated annealing
Fig. 6: flow chart of keyword query implemented with MapReduce
Fig. 7: partitioned RDF graph
Fig. 8: comparison of the response times of different partitioning algorithms
Fig. 9: keyword query examples
Fig. 10: comparison of response times before and after parallelization
Specific embodiment
The technical solution of the invention is further described below with reference to an embodiment and the accompanying drawings.
Embodiment: this patent uses the real data set swetodblp (http://lsdis.cs.uga.edu/Projects/SemDis/Swetodblp), whose subject matter is publications in computer science. The data set contains 681636 triples in total, occupies 53.6 MB of storage, and has 1026375 edges and 373219 vertices.
Fig. 1 shows the flow chart of the distributed keyword query method based on RDF graphs; as the figure shows, the method comprises the following three stages:
First stage: convert the RDF graph in Fig. 2 into an RDF sentence graph, as shown in Fig. 3. As the figure shows, the unweighted directed RDF graph is converted into an undirected RDF sentence graph with vertex weights; the number at each vertex of the RDF sentence graph is the number of RDF triples contained in that sentence.
Second stage: the edge partitioning algorithm based on the RDF sentence graph, whose flow chart is shown in Fig. 4. In essence, the algorithm partitions the graph using the minimum-vertex-degree principle together with depth-first traversal. An example run of the algorithm on the RDF sentence graph in Fig. 3 follows.
Initial conditions: D = {S3, S4, S5, S6}; these 4 vertices have degree 1 in the graph. Set visited[Si] = false (i = 3, 4, 5, 6). Assume the graph is partitioned into 2 subgraphs, i.e. k = 2; then the number of edges per RDF subgraph is e = (2+1+1+1+1+3)/2 = 4.5, so each subgraph holds about 5 edges.
C1: assume the visiting order of the vertices in D is {S5, S6, S3, S4}. The total weight of the unvisited vertices in the graph is 9 > 5, so the traversal starts from S5.
C2: add S5 to the set S, giving S = {S5}, and set visited[S5] = true.
C3: |S| = 2 < 5 and N(S, G_s) = {S1}.
C4: add S1 to the set S, giving S = {S1, S5}, and set visited[S1] = true. Since |S| = 3 < 5, continue with R2 and R3. In N(S, G_s) = {S1, S5, S6, S2}, only S6 and S2 are unvisited, with |N(S6, G_s) \ S| = 0 and |N(S2, G_s) \ S| = |{S1, S3, S4} \ {S1, S5}| = |{S3, S4}| = 2. S6 has fewer associated vertices than S2, so S6 is added to S, giving S = {S1, S5, S6}, and visited[S6] = true.
C5: now N(S, G_s) \ S = {S1, S5, S6} \ {S1, S5, S6} = ∅ and |S| = 4 < 5, so the algorithm backs up to the most recently visited vertex S1. The only unvisited vertex adjacent to S1 is S2, so S2 is added to S, giving S = {S1, S5, S6, S2}, visited[S2] = true, and |S| = 5. The edges connecting vertices in S to the rest of the graph are then removed, namely the two edges S2-S3 and S2-S4. The unvisited vertices S3 and S4 have total weight 3+1 = 4 < 5, so the remaining graph needs no further partitioning. Two subgraphs are obtained, with vertex sets {S1, S5, S2, S6} and {S3, S4}, and the number of cut edges is 2.
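The walk-through above can be reproduced with a short Python sketch of step C. Several things here are assumptions rather than facts from the source: Fig. 3 is not available, so the adjacency lists and vertex weights are inferred from the running totals and neighbor sets quoted in the example; the function names are hypothetical; the next vertex is picked from all unvisited neighbors of the current subgraph (which subsumes the backtracking of C5); ties are broken by vertex name; and a subgraph is closed once its weight reaches e.

```python
def rec_partition(adj, weight, order, e):
    """Greedy min-degree growth of subgraphs (step C, sketched)."""
    visited = set()
    parts = []

    def unvisited_weight():
        return sum(w for v, w in weight.items() if v not in visited)

    for start in order:                      # C1: next unvisited vertex of D
        if start in visited:
            continue
        if unvisited_weight() <= e:          # remainder fits into one subgraph
            break
        part = [start]                       # C2: open a new subgraph
        visited.add(start)
        while sum(weight[v] for v in part) < e:
            frontier = {u for v in part for u in adj[v] if u not in visited}
            if not frontier:
                break
            # C4: prefer the neighbor with the fewest neighbors outside the subgraph.
            nxt = min(frontier, key=lambda u: (len(adj[u] - set(part)), u))
            part.append(nxt)
            visited.add(nxt)
        parts.append(part)

    rest = [v for v in weight if v not in visited]
    if rest:
        parts.append(rest)
    return parts


def cut_edges(adj, parts):
    """Number of edges whose endpoints lie in different subgraphs."""
    home = {v: i for i, part in enumerate(parts) for v in part}
    return sum(1 for v in adj for u in adj[v] if v < u and home[v] != home[u])


# Sentence graph inferred from the worked example (assumption, see above).
adj = {
    "S1": {"S2", "S5", "S6"}, "S2": {"S1", "S3", "S4"},
    "S3": {"S2"}, "S4": {"S2"}, "S5": {"S1"}, "S6": {"S1"},
}
weight = {"S1": 1, "S2": 1, "S3": 3, "S4": 1, "S5": 2, "S6": 1}

for order in (["S5", "S6", "S3", "S4"], ["S3", "S4", "S5", "S6"]):
    parts = rec_partition(adj, weight, order, e=5)
    print(order, parts, cut_edges(adj, parts))
```

With these inferred inputs, the order {S5, S6, S3, S4} yields the subgraphs {S1, S5, S6, S2} and {S3, S4} with 2 cut edges, matching the example, while the order {S3, S4, S5, S6} cuts only the edge S1-S2.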
Because the algorithm easily ends in a local optimum, simulated annealing is used to avoid this, as shown in Fig. 5. An example run is given below.
D1: the given visiting order of the set D is {S5, S6, S3, S4}, which is the initial solution.
D2: the objective is to minimize the number of cut edges. By the edge partitioning algorithm described above, the visiting order {S5, S6, S3, S4} yields 2 cut edges.
D3: a new solution is accepted if the number of cut edges decreases after the visiting order is adjusted. For example, suppose the adjusted visiting order of D is {S3, S4, S5, S6}. By the edge partitioning algorithm described above, the only cut edge is S1-S2, so the count is 1, and the two subgraphs are {S2, S3, S4} and {S1, S5, S6}. This partition is better than the one in step D2, so the new visiting order replaces the old one.
D4: simulated annealing keeps optimizing this process until the optimal solution is obtained. Because simulated annealing is widely used, it is not described in further detail here.
Third stage: the process of keyword query on the Hadoop distributed computing framework is shown in Fig. 6. Taking the RDF graph in Fig. 2 as an example, distributed keyword query with the backward search algorithm proceeds as follows.
After the RDF sentence graph has been partitioned, it must be refined back into an RDF graph. The edge partitioning algorithm of the second stage yields the two subgraph vertex sets {S2, S3, S4} and {S1, S5, S6}; the refined RDF subgraphs are shown in Fig. 7. The intersection of the two RDF subgraphs is the vertex {iswc}. Assume the query keywords are K = {paper-45, paper-13, OWL}.
MR1: the partitioned subgraphs are placed on 2 different nodes, and in the map stage the backward search algorithm is used to find the candidate result trees. The two CRTs are CRT1 = <{paper-45 -> isPartof -> iswc}, {paper-45}, {iswc}, {(L)}> and CRT2 = <{iswc -> hasPart -> paper-13 -> title -> Can OWL and Logic Programming Live Together Happily}, {paper-13, OWL}, {iswc}, {(R)}>, and the query results are stored as <key, value> pairs;
MR2: in the combiner stage, CRTs that have the same key but come from different subgraphs are put into the same combiner. CRT1 and CRT2 share the cut vertex key = {iswc} and lie in the left and right subgraphs of Fig. 7, respectively, so the two CRTs are put into the same combiner;
MR3: in the reduce stage, the CRTs in the same combiner are joined and merged. The merged result is RT = {paper-45 -> isPartof -> iswc -> hasPart -> paper-13 -> title -> Can OWL and Logic Programming Live Together Happily}; since the joined RT contains all the query keywords, the query result is output;
MR4: in the rank stage, the query results are sorted and the Top-k results are returned to the user.
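The four MapReduce steps of this example can be mimicked in plain Python to show the data flow. This is only an in-memory sketch, not Hadoop code: the `CRT` record, the function names, and the use of edge count as a stand-in for the "compactness" score are assumptions, and the backward search that produces the candidate trees in the map stage is taken as given.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class CRT:
    edges: frozenset         # (subject, predicate, object) triples of the tree
    keywords: frozenset      # K: keywords covered by the tree
    cut_vertices: frozenset  # B: cut vertices touched by the tree
    subgraph: str            # S_i: subgraph the tree was found in


def map_phase(crts):
    """MR1: emit one <cut vertex, CRT> pair per cut vertex of each CRT."""
    for crt in crts:
        for b in crt.cut_vertices:
            yield b, crt


def group_by_key(pairs):
    """MR2: collect CRTs that share a cut vertex."""
    groups = defaultdict(list)
    for key, crt in pairs:
        groups[key].append(crt)
    return groups


def reduce_phase(groups, query):
    """MR3: merge CRTs from different subgraphs; keep merges covering the query."""
    results = []
    for crts in groups.values():
        for a, b in combinations(crts, 2):
            if a.subgraph != b.subgraph and (a.keywords | b.keywords) >= query:
                results.append(a.edges | b.edges)
    return results


def rank(results, k):
    """MR4: assumed compactness score, fewer edges ranks higher."""
    return sorted(results, key=len)[:k]


TITLE = "Can OWL and Logic Programming Live Together Happily"
crt1 = CRT(frozenset({("paper-45", "isPartof", "iswc")}),
           frozenset({"paper-45"}), frozenset({"iswc"}), "L")
crt2 = CRT(frozenset({("iswc", "hasPart", "paper-13"), ("paper-13", "title", TITLE)}),
           frozenset({"paper-13", "OWL"}), frozenset({"iswc"}), "R")
query = frozenset({"paper-45", "paper-13", "OWL"})

top = rank(reduce_phase(group_by_key(map_phase([crt1, crt2])), query), k=1)
print(top)
```

Both candidate trees are keyed by the cut vertex `iswc`, so they meet in the same group and merge into a single three-edge result that covers all query keywords.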
To show the advantage of the REC partitioning algorithm of this patent, it is compared with the VSEP and DVCP algorithms, as shown in Fig. 8. The figure shows that REC has the shortest partitioning response time and DVCP the longest. The advantage of REC is that it compresses the RDF graph, which on the one hand guarantees the atomicity and semantic integrity of the RDF data and on the other hand reduces the number of vertices in the graph, shrinking the traversal space of the algorithm and improving partitioning efficiency. VSEP reduces the number of cut vertices by exchanging the subgraphs to which vertices belong; each iteration must pick two arbitrary vertices from the graph to exchange, giving a time complexity of O(n²) (where n is the number of vertices in the graph). As the RDF data grows, the number of vertices in the corresponding RDF graph becomes very large, which severely hurts partitioning efficiency. DVCP mainly exchanges the subgraphs to which edges belong so that the vertices associated with those edges span as few partitions as possible, thereby reducing the vertex cut. Since the number of edges in a graph is in practice far larger than the number of vertices, partitioning by edge exchange is less efficient than partitioning by vertex exchange.
To show the advantage of the query algorithm in a distributed environment, 10 groups of query examples are given, as shown in Fig. 9. The query algorithm is run on a single machine and on a cluster; the average response time of the 10 groups of queries for different numbers of cluster nodes is shown in Fig. 10. The figure shows that parallel query is far more efficient than single-machine query, and that the parallel query response time keeps falling as the number of nodes grows, although the improvement becomes smaller and smaller. When the number of nodes changes from 40 to 50, the query response time hardly changes at all. The experiment therefore also shows that, for parallel query, the number of nodes must be chosen appropriately to obtain the best parallel effect.
The above are only some embodiments of the present invention. It should be noted that variations made by those skilled in the art remain within the scope of the technical solution of the present invention.
Claims (4)
1. A distributed keyword query method based on RDF graphs, comprising the following steps:
Step (1): convert the RDF graph into an RDF sentence graph;
Step (2): apply the edge partitioning algorithm (REC) based on the RDF sentence graph;
Step (3): implement keyword query using the Hadoop distributed computing framework.
2. The distributed keyword query method based on RDF graphs according to claim 1, characterized in that in step (1) an RDF sentence s is composed of RDF triples and satisfies the following conditions:
Condition 1: any two RDF triples in s are connected, where two RDF triples are connected when they share the same blank node;
Condition 2: no RDF triple in s is connected to an RDF triple outside s;
An RDF triple is described as (s, p, o), abbreviated t, and s(t), p(t), and o(t) denote the subject, predicate, and object of the triple, respectively; the RDF directed graph and the RDF sentence graph are defined as follows:
RDF directed graph: let G = (V, E, L) denote a labeled RDF directed graph over the set T of RDF triples, where the vertex set V = { v | v ∈ s(T) ∪ o(T) } consists of the subjects and objects of the RDF triples, and the set of directed edges E = { <s(t), o(t)> | t ∈ T } consists of the predicates relating subjects to objects; each edge points from the subject vertex to the object vertex; L is the set of labels, L = L_v ∪ L_p, where L_v denotes the vertex labels and L_p denotes the predicate labels;
RDF sentence graph: let G_s = (S, E, l, w) denote an RDF sentence graph, which is a vertex-weighted undirected graph in which each node corresponds to one RDF sentence; S denotes the vertices and E denotes the edges; if s_i, s_j ∈ S with s_i ≠ s_j, and there exist t ∈ s_i and t′ ∈ s_j such that t.s = t′.s or t.o = t′.s, then s_i and s_j are associated, i.e. (s_i, s_j) ∈ E; l is a labeling function: for each s_i ∈ S, l(s_i) is the set of local labels of the subjects, predicates, or objects contained in the sentence; w is the vertex weight: for each s_i ∈ S, w(s_i) equals the number of RDF triples contained in the sentence.
3. The distributed keyword query method based on RDF graphs according to claim 1, characterized in that step (2) partitions the graph according to the two basic principles of graph partitioning. First, to reduce network traffic, the number of cut edges must be kept as small as possible; a depth-first traversal guided by the minimum-vertex-degree principle is therefore performed, and the partitioning result is optimized with a simulated annealing algorithm. Second, to balance data across subgraphs, the edges of the RDF graph are distributed evenly over the subgraphs, so that each RDF subgraph holds e edges. If the graph G_s is partitioned into k subgraphs, a function p assigns each vertex to its subgraph, with the subgraphs labeled {1, 2, ..., k}; the label j = p(s) means that sentence s belongs to subgraph S_j, where the subgraphs S_j satisfy two conditions: their union is S, and S_i ∩ S_j = ∅ for i ≠ j. The specific steps of the edge partitioning algorithm on the RDF sentence graph are as follows:
Step A: input the number of edges per RDF subgraph
Based on the RDF sentence graph G_s, input the number e of edges per RDF subgraph;
Step B: set the "visited" flags
Find the vertices of minimum degree in the RDF sentence graph, put them into a set D, and give every vertex a "visited" flag;
Step C: traverse the vertices in D to partition the RDF sentence graph
C1: fix the visiting order of the vertices in D. If the total weight of all unvisited vertices in the RDF sentence graph satisfies |G_s| > e, select the next unvisited vertex from D in order;
C2: add the vertex to the set S and mark it as visited;
C3: if the total vertex weight of the current subgraph satisfies |S| < e and the set N(S, G_s) of vertices adjacent to S is not empty, randomly select a vertex v from N(S, G_s);
C4: if v is unvisited and |N(v, G_s) \ S|, the number of vertices adjacent to v that are not in S, is minimal (that is, the vertex has the smallest degree), jump to step C2;
C5: if the vertex weight of the current set S satisfies |S| < e and N(S, G_s) is empty, return to the most recently visited vertex and jump to step C4;
C6: if the vertex weight of the current set S satisfies |S| > e, jump to step C1;
Step D: search for the optimal solution by simulated annealing
D1: fix the visiting order of the vertices in D;
D2: compute the number of cut edges produced by this visiting order;
D3: randomly swap the visiting positions of two vertices in D. If the number of cut edges is now smaller than in step D2, the new result is better than the old one, and the new visiting order replaces the old one;
D4: finally, call the simulated annealing function.
4. The distributed keyword query method based on RDF graphs according to claim 1, characterized in that step (3) is implemented as follows:
Step I: determine the vertex cut set of the partitioned RDF graph
The vertex cut set of the partitioned RDF graph is obtained by computing the intersection of the RDF subgraphs;
Step II: determine the directed tree
Definition of the query result tree (RT): let the vertex sets matching the keywords K be M(K) = { m_1, m_2, ..., m_s }. A query result is defined as a directed tree satisfying the following conditions:
1) the root of the tree is R;
2) for each set m_i, there is some vertex v_i in m_i with a directed path from R to v_i;
During keyword query, a query result tree that contains only some of the keywords is called a candidate result tree, abbreviated CRT;
Step III: keyword query using the backward search algorithm
The process of implementing keyword query with MapReduce consists of the following four steps:
MR1: in the map stage, candidate result trees are first found with the backward search algorithm and represented as 4-tuples (CRT_i, K, B, S_i), where CRT_i is a tree rooted at R, K denotes the keywords contained in CRT_i, B denotes the cut vertices in the subgraph, and S_i denotes the subgraph containing CRT_i; the 4-tuple is then packed into a key-value pair <B, CRT_i>;
MR2: in the combiner stage, for two candidate result trees CRT_i and CRT_j in different subgraphs, if the cut vertices associated with the two CRTs satisfy V_i ∩ V_j ≠ ∅, CRT_i and CRT_j are put into the same combiner;
MR3: in the reduce stage, the CRTs in the same combiner are merged; if the merged result contains all the query keywords, the query result is output;
MR4: in the rank stage, because the RDF data set is very large, one query may have many matching results, while users are usually interested in only a small fraction of them; the query results are therefore scored by the compactness of the result, and the Top-k query results are returned.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710376203.4A CN108959318A (en) | 2017-05-25 | 2017-05-25 | Distributed keyword query method based on RDF graph |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710376203.4A CN108959318A (en) | 2017-05-25 | 2017-05-25 | Distributed keyword query method based on RDF graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108959318A true CN108959318A (en) | 2018-12-07 |
Family
ID=64493947
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710376203.4A Pending CN108959318A (en) | 2017-05-25 | 2017-05-25 | Distributed keyword query method based on RDF graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108959318A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222240A (en) * | 2019-05-24 | 2019-09-10 | 华中科技大学 | A kind of space RDF data keyword query method based on summary figure |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140156633A1 (en) * | 2012-11-30 | 2014-06-05 | International Business Machines Corporation | Scalable Multi-Query Optimization for SPARQL |
CN104462253A (en) * | 2014-11-20 | 2015-03-25 | 武汉数为科技有限公司 | Topic detection or tracking method for network text big data |
CN104765875A (en) * | 2015-04-24 | 2015-07-08 | 海南易建科技股份有限公司 | Distributed processing method and system for passenger behavior data |
CN106227722A (en) * | 2016-09-12 | 2016-12-14 | 中山大学 | A kind of extraction method based on listed company's bulletin summary |
-
2017
- 2017-05-25 CN CN201710376203.4A patent/CN108959318A/en active Pending
Non-Patent Citations (2)
Title |
---|
李慧颖等: "基于关键词的RDF数据查询方法", 《东南大学学报 自然科学版》 * |
王振涛: "基于二分图的RDF关键词扩展查询算法研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222240A (en) * | 2019-05-24 | 2019-09-10 | 华中科技大学 | A kind of space RDF data keyword query method based on summary figure |
CN110222240B (en) * | 2019-05-24 | 2021-03-26 | 华中科技大学 | Abstract graph-based space RDF data keyword query method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | EAGRE: Towards scalable I/O efficient SPARQL query evaluation on the cloud | |
US9031994B1 (en) | Database partitioning for data processing system | |
CN106021457B (en) | RDF distributed semantic searching method based on keyword | |
US10803121B2 (en) | System and method for real-time graph-based recommendations | |
US20190236215A1 (en) | System and method for hierarchical distributed processing of large bipartite graphs | |
Wang et al. | Skyframe: a framework for skyline query processing in peer-to-peer systems | |
Chen et al. | MapReduce skyline query processing with a new angular partitioning approach | |
Yu et al. | Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system | |
CN109359115B (en) | Distributed storage method, device and system based on graph database | |
CN104699698A (en) | Graph query processing method based on massive data | |
US9558266B1 (en) | System and method for discovering groups whose members have a given attribute | |
Belesiotis et al. | Spatio-textual user matching and clustering based on set similarity joins | |
Sun et al. | Interactive spatial keyword querying with semantics | |
Yang et al. | An improved cop-kmeans clustering for solving constraint violation based on mapreduce framework | |
Wu et al. | Dynamic index construction with deep reinforcement learning | |
Cheng et al. | Distributed indexes design to accelerate similarity based images retrieval in airport video monitoring systems | |
Shrivastava et al. | Text document clustering based on phrase similarity using affinity propagation | |
Kalyvas et al. | Skyline and reverse skyline query processing in SpatialHadoop | |
Moutafis et al. | Algorithms for processing the group K nearest-neighbor query on distributed frameworks | |
CN108959318A (en) | Distributed keyword query method based on RDF graph | |
Tayal et al. | A new MapReduce solution for associative classification to handle scalability and skewness in vertical data structure | |
Niu et al. | Semi-supervised plsa for document clustering | |
CN116383247A (en) | Large-scale graph data efficient query method | |
Priyadarshi et al. | AWAPart: adaptive workload-aware partitioning of knowledge graphs | |
CN110209895A (en) | Vector index method, apparatus and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20181207 |