CN114020781B - Query task optimization method based on technological consultation large-scale graph data - Google Patents

Query task optimization method based on technological consultation large-scale graph data

Info

Publication number
CN114020781B
Authority
CN
China
Prior art keywords
nodes
query
node
optimization method
query task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111316037.1A
Other languages
Chinese (zh)
Other versions
CN114020781A (en
Inventor
鄂海红
宋美娜
梁静茹
刘雨薇
魏秋实
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202111316037.1A (CN114020781B)
Publication of CN114020781A
Priority to PCT/CN2022/087215 (WO2023077731A1)
Application granted
Publication of CN114020781B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a query task optimization method, a query task optimization system and a storage medium based on technological consultation large-scale graph data. An identification of a query task is obtained, and a corresponding query optimization method is selected according to that identification, where the query optimization methods include adjusting the graph traversal expansion order strategy, cardinality reduction, mode advance and materialized views. The graph database is then queried using the selected query optimization method, and the query result is output. Because the query optimization method is selected according to the identification of the query task, the flexibility of querying is improved. In addition, the query optimization methods improve the efficiency of query tasks in different scenarios of technological consultation large-scale graph data, reduce the computational complexity of the queries, and shorten query time.

Description

Query task optimization method based on technological consultation large-scale graph data
Technical Field
The application relates to the field of large-scale graph data query, and in particular to a query task optimization method, a query task optimization device and a storage medium based on technological consultation large-scale graph data.
Background
Querying graph data is one of the most fundamental problems in the field of knowledge graphs. Large-scale graph data therefore generally requires efficient query processing, so that a user can obtain a query result quickly.
At present, although query optimization technology for graph data has advanced considerably, some problems remain. For example, graph partitioning techniques used for graph query optimization can split graph data across multiple servers, but the servers then incur higher communication costs and processing overhead. In addition, most query optimization technologies are designed for the graph data of social networks and are not applicable to the graph data with complex topological structures found in technological consultation scenarios. Therefore, how to optimize query tasks on technological consultation large-scale graph data is a problem to be solved.
Disclosure of Invention
The application provides a query task optimization method, a query task optimization system and a storage medium based on technological consultation large-scale graph data, with the aim of optimizing query tasks on technological consultation large-scale graph data.
An embodiment of a first aspect of the present application provides a query task optimization method based on technological consultation large-scale graph data, including:
Acquiring an identification of a query task;
Selecting a corresponding query optimization method according to the identification of the query task, wherein the query optimization methods include adjusting the graph traversal expansion order strategy, cardinality reduction, mode advance and materialized views;
And querying the graph database by using the selected query optimization method, and outputting a query result.
An embodiment of a second aspect of the present application provides a query task optimization system based on technological consultation of large-scale graph data, including:
The acquisition module is used for acquiring the identification of the query task;
The selection module is used for selecting a corresponding query optimization method according to the identification of the query task, wherein the query optimization methods include adjusting the graph traversal expansion order strategy, cardinality reduction, mode advance and materialized views;
And the display module is used for querying the graph database by using the selected query optimization method and outputting a query result.
The embodiment of the third aspect of the application provides a computer storage medium, wherein the computer storage medium stores computer executable instructions; the computer executable instructions, when executed by a processor, are capable of implementing the method as described in the first aspect above.
An embodiment of the fourth aspect of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor is capable of implementing the method according to the first aspect.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
In the query task optimization method, the query task optimization system and the storage medium based on technological consultation large-scale graph data, an identification of a query task is obtained, and a corresponding query optimization method is selected according to that identification, where the query optimization methods include adjusting the graph traversal expansion order strategy, cardinality reduction, mode advance and materialized views. The graph database is then queried using the selected query optimization method, and the query result is output. Because the query optimization method is selected according to the identification of the query task, the flexibility of querying is improved. In addition, the query optimization methods improve the efficiency of query tasks in different scenarios of technological consultation large-scale graph data, reduce the computational complexity of the queries, and shorten query time.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a query task optimization method based on technological consultation large-scale graph data according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a query task optimization system based on technological consultation large-scale graph data according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
The following describes a query task optimization method and a query task optimization system based on technological consultation large-scale graph data according to an embodiment of the application with reference to the accompanying drawings.
Example 1
Fig. 1 is a flow chart of a query task optimization method based on technological consultation large-scale graph data according to an embodiment of the present application, and as shown in fig. 1, the method may include:
Step 101, obtaining the identification of the query task.
It should be noted that, in the embodiments of the present disclosure, the query task may involve organizations, talents, and industry chains. In the embodiment of the disclosure, an organization may be identified by the ID of a company, and talents may be persons.
In the embodiment of the disclosure, the identification of the query task may be obtained according to the content of the query task. By way of example, assuming that the query task is to view the companies and patents associated with a person, the identification of that query task is obtained.
Step 102, selecting a corresponding query optimization method according to the identification of the query task, wherein the query optimization methods include adjusting the graph traversal expansion order strategy, cardinality reduction, mode advance and materialized views.
In the embodiment of the disclosure, different identifiers correspond to different query optimization methods, and the corresponding query methods can be selected according to the identifiers of the query tasks.
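The mapping from identifications to optimization methods can be pictured as a simple lookup table. The following is a minimal sketch only: the identifier strings, the function names and the returned descriptions are hypothetical, since the patent does not state how identifications are encoded.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the four optimization methods named in the text.
def adjust_expansion_order(task: str) -> str:
    return f"{task}: bidirectional BFS expansion order"


def reduce_cardinality(task: str) -> str:
    return f"{task}: early deduplication"


def mode_advance(task: str) -> str:
    return f"{task}: filter pattern evaluated in advance"


def materialized_view(task: str) -> str:
    return f"{task}: read from precomputed view"


# Hypothetical identification -> optimization method table.
OPTIMIZERS: Dict[str, Callable[[str], str]] = {
    "expansion_order": adjust_expansion_order,
    "cardinality_reduction": reduce_cardinality,
    "mode_advance": mode_advance,
    "materialized_view": materialized_view,
}


def select_optimizer(task_id: str) -> Callable[[str], str]:
    """Return the optimization method registered for a query task identification."""
    try:
        return OPTIMIZERS[task_id]
    except KeyError:
        raise ValueError(f"no optimization method registered for {task_id!r}") from None
```

A table of this kind keeps the selection step independent of the individual methods, so a new identification can be supported by adding one entry.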
The query optimization methods in embodiments of the present disclosure may include adjusting the graph traversal expansion order strategy, cardinality reduction, mode advance and materialized views.
Further, in the embodiment of the present disclosure, adjusting the graph traversal expansion order strategy means designing a bidirectional BFS expansion order for the actual query scenarios of technological consultation. The search starts from both the start point and the end point; as soon as one direction reaches a position that the other direction has already searched (that is, a state has been visited from both directions), a shortest path connecting the start point and the end point has been found. The two searches meet at a point in the middle of the shortest path, so the number of nodes searched by the bidirectional BFS is of the order of 2 x nm/2+1.
Specifically, in an embodiment of the present disclosure, adjusting the graph traversal expansion order strategy may include the following steps:
S11, inputting a source entity node and a target entity node, together with an intermediate entity node type mtype and a path pattern pattern;
S12, initializing two node sets S1 and S2, wherein S1 is initialized to the input source entity node, and S2 is initialized to the input target entity node;
S13, calculating the expansion order of the bidirectional BFS from pattern and mtype, where pattern1 denotes the left expansion order and pattern2 denotes the right expansion order;
S14, if S1 or S2 is not empty, continuing to step S15; otherwise, performing step S111;
S15, S is the set of nodes expanded in the current layer;
S16, swapping S1 and S2, so that expansion alternates between the left end and the right end;
S17, expanding the next-layer neighbor nodes of each node in set S1 according to the pattern, denoted next_nodes;
S18, checking each node in next_nodes, and if the node is in set S, a path has been found and step S111 is performed;
S19, adding all nodes next_nodes expanded in this layer to set S, copying set S to S1, and storing the paths;
S110, returning to step S14;
S111, ending.
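The idea behind these steps can be sketched as a textbook bidirectional BFS. This is a simplified illustration under stated assumptions, not the patented procedure itself: the graph is an undirected adjacency dictionary, the path pattern and the intermediate node type mtype are omitted, and the names bidirectional_bfs and _join are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # assumed: undirected adjacency sets


def bidirectional_bfs(graph: Graph, source: Hashable, target: Hashable) -> Optional[List]:
    """Return one shortest path from source to target, or None if none exists."""
    if source == target:
        return [source]
    # parents[0] records how nodes were reached from the source side,
    # parents[1] how they were reached from the target side.
    parents = ({source: None}, {target: None})
    frontiers = [{source}, {target}]
    side = 0  # which end is expanded in this round (alternates, like step S16)
    while frontiers[0] and frontiers[1]:
        seen_here, seen_there = parents[side], parents[1 - side]
        next_nodes = set()
        for node in frontiers[side]:
            for neighbor in graph.get(node, ()):
                if neighbor in seen_here:
                    continue
                seen_here[neighbor] = node
                if neighbor in seen_there:  # the two searches have met
                    return _join(parents, neighbor)
                next_nodes.add(neighbor)
        frontiers[side] = next_nodes
        side = 1 - side
    return None


def _join(parents, meeting) -> List:
    """Stitch the two half-paths together at the meeting node."""
    path = []
    node = meeting
    while node is not None:  # walk back to the source
        path.append(node)
        node = parents[0][node]
    path.reverse()
    node = parents[1][meeting]
    while node is not None:  # walk forward to the target
        path.append(node)
        node = parents[1][node]
    return path
```

Expanding whole layers alternately from both ends keeps each frontier shallow, which is the reason the bidirectional order avoids the large intermediate results of a one-sided expansion.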
For example, in the embodiment of the disclosure, the query task gives an industry chain label tag and personnel information person. Starting from the tag, it queries the child industry chain labels of the tag, the patents belonging to those child industry chain labels, the companies that own the patents, and the persons associated with those companies through relations such as holding a position in or investing in the company. In the constructed technological consultation knowledge graph, the path industry chain label - child industry chain label - patent produces 146284 intermediate patent nodes. If these 146284 patents were expanded with a unidirectional BFS, the intermediate results would explode and query performance would be seriously affected.
With the bidirectional BFS graph traversal expansion order strategy of the embodiment of the present disclosure, the search proceeds from both the start point and the end point, that is, along the two directions industry chain label - child industry chain label - patent and person - company - patent. The 146284 intermediate patent nodes produced by the path industry chain label - child industry chain label - patent are put into a hash table. The traversal then starts from the person node in the reverse direction, and the path person - company - patent produces a set of results. Finally, this result set is intersected with the hash table to find the paths that satisfy the conditions and connect the start point and the end point; the time complexity of this step is only O(n).
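The meet-in-the-middle step of this example can be sketched with Python sets, which are hash tables. The edge dictionary layout and the relation names has_subtag, has_patent, works_at and owns_patent are assumptions made for illustration; the patent does not name the edge types this way.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed layout: (node, relation) -> set of neighbor nodes.
Edges = Dict[Tuple[str, str], Set[str]]


def follow(edges: Edges, nodes: Iterable[str], relation: str) -> Set[str]:
    """Expand one hop along a relation and return the de-duplicated neighbors."""
    result: Set[str] = set()
    for node in nodes:
        result |= edges.get((node, relation), set())
    return result


def patents_linking(edges: Edges, tag: str, person: str) -> Set[str]:
    """Patents reachable both from the industry chain tag and from the person."""
    # Left side: tag -> child tags -> patents, kept as a hash set.
    left = follow(edges, follow(edges, [tag], "has_subtag"), "has_patent")
    # Right side: person -> companies -> patents.
    right = follow(edges, follow(edges, [person], "works_at"), "owns_patent")
    # Each membership test is an average O(1) hash lookup,
    # so the intersection is linear in the size of the smaller set.
    return left & right
```

Every patent in the returned set identifies one or more paths connecting the tag and the person, without ever expanding the large patent set any further.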
Further, cardinality denotes the number of unique values after deduplication. For example, in embodiments of the present disclosure, column cardinality refers to the number of distinct values that a column contains. This number directly affects the effectiveness of model compression and the performance of the engine when scanning. Cardinality should therefore be kept as small as possible to reduce the time required for a query.
In an embodiment of the present disclosure, cardinality reduction may include the following steps:
S21, inputting a source entity node and a path pattern pattern;
S22, next_nodes is the node set of the next layer, initialized to the next-layer neighbor nodes of the source entity node expanded according to the pattern;
S23, de-duplicating next_nodes;
S24, q is a node queue, initialized to next_nodes;
S25, if q is not empty, continuing to step S26; otherwise, performing step S212;
S26, size is the number of nodes currently in the queue;
S27, if size is not zero, continuing to step S28; otherwise, performing step S211;
S28, popping the current node from the queue;
S29, expanding the next-layer neighbor nodes next_nodes of that node according to the pattern;
S210, adding next_nodes to queue q;
S211, if the pattern has been fully traversed, continuing to step S212; otherwise, performing step S25;
S212, ending.
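The effect of de-duplicating early can be sketched with a layer-by-layer queue expansion. This is an illustrative simplification: the edge dictionary, the relation names and the function expand_pattern are hypothetical, parallel edges are modeled as repeated entries in a neighbor list, and the sketch de-duplicates after every layer rather than only after the first expansion.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Assumed layout: (node, relation) -> neighbor list; repeats model parallel edges.
Edges = Dict[Tuple[str, str], List[str]]


def expand_pattern(edges: Edges, source: str, pattern: List[str],
                   dedup_early: bool = True) -> Tuple[Set[str], int]:
    """Follow the relations in `pattern` from `source`.

    Returns the final node set and the total number of nodes that were
    placed in the queue, which stands in for the traversal cost.
    """
    queue = deque([source])
    expanded = 0
    for relation in pattern:
        next_nodes: List[str] = []
        for _ in range(len(queue)):  # size = number of nodes in this layer
            node = queue.popleft()
            next_nodes.extend(edges.get((node, relation), []))
        if dedup_early:
            # Order-preserving deduplication before the next layer is expanded.
            next_nodes = list(dict.fromkeys(next_nodes))
        expanded += len(next_nodes)
        queue.extend(next_nodes)
    return set(queue), expanded
```

Both settings return the same final node set; only the number of intermediate nodes that have to be expanded differs.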
For example, in the embodiment of the disclosure, in the knowledge graph of the actual technological consultation scenario, multiple edges or edges of different types may exist between two nodes. For example, three relations, "company-investor"/"company-public stakeholder-person"/"company-tenninal", exist between a "company" node and a "person" node. When searching for the "person" nodes adjacent to a given company, the same "person" node may therefore be reached through more than one of these three relations, which produces duplicate nodes. These redundant nodes increase the cardinality: when the duplicated "person" nodes go on to search for their adjacent nodes, the same traversal is repeated, so the number of intermediate nodes and the query time both increase. Thus, in embodiments of the present disclosure, an optimization strategy of applying distinct in advance is used to reduce cardinality.
Specifically, in the embodiment of the present disclosure, the query task in the technological consultation scenario gives a person, and queries the companies associated with that person, the patents owned by those companies, and the industry chain labels to which the patents belong, and outputs the non-repeating (company, patent, industry chain label) tuples that match the path. The embodiment of the disclosure applies distinct in advance to reduce cardinality: the deduplication is moved forward to the point where duplicate nodes are generated, that is, it is performed immediately after the traversal from the "person" node reaches the "company" nodes. This reduces 201 company intermediate nodes containing duplicates to 131 distinct company nodes, which reduces the number of intermediate nodes generated and the subsequent traversal time.
Further, in the embodiment of the present disclosure, target data needs to be retrieved and screened according to business conditions; this process is the filtering of a data query. A large-scale graph query task contains a large number of filtering operations, and the filter conditions used, such as basic comparison operators (<, >, =), logical operations (AND, OR, NOT) and pattern matching, are necessary for obtaining accurate data.
In an embodiment of the present disclosure, mode advance may include the following steps:
S31, inputting a source entity node, a path pattern pattern and filter patterns filter_patterns;
S32, initializing the mode advance set filter_nodeset;
S33, q is a node queue, initialized to the input source entity node;
S34, if q is not empty, continuing to step S35; otherwise, performing step S313;
S35, initializing size to the number of nodes currently in the queue;
S36, if size is not zero, continuing to step S37; otherwise, performing step S312;
S37, popping the current node from the queue;
S38, expanding the next-layer neighbor nodes next_nodes of that node according to the pattern;
S39, judging whether the node type of the current next_nodes is the node type of filter_nodeset; if so, continuing to step S310; otherwise, performing step S311;
S310, traversing each node next_node in next_nodes, and filtering out next_node if it is in filter_nodeset;
S311, adding next_nodes to queue q;
S312, if the pattern has been fully traversed, continuing to step S313; otherwise, performing step S35;
S313, ending.
For example, in the embodiment of the present disclosure, the query task in the technological consultation scenario gives industry chain label information tag and queries the companies associated with the tag and the patents owned by those companies, subject to a filter condition: the company must not have abnormal business operations, that is, the pattern company - abnormal business must not exist. The non-repeating (company, patent) tuples are output.
In particular, mode advance in embodiments of the present disclosure replaces the traversal of a pattern with efficient set lookups. The company - abnormal business pattern is evaluated in advance, and the IDs of the companies associated with an abnormal business node are put into a hash table. The filter condition then only has to check whether a company is in the hash table; if it is not, the company has no abnormal business. Only 3292 set lookups, each of O(1) time complexity, are needed, which improves query efficiency.
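A minimal sketch of this replacement of a per-company traversal by a set lookup follows. The input layouts (a list of company - abnormal business edges, and dictionaries from tag to companies and from company to patents) and the function names are assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_filter_nodeset(abnormal_edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Evaluate the company - abnormal business pattern once, in advance.

    `abnormal_edges` is an assumed list of (company_id, abnormal_node_id) pairs.
    """
    return {company for company, _ in abnormal_edges}


def companies_and_patents(tag_companies: Dict[str, List[str]],
                          company_patents: Dict[str, List[str]],
                          filter_nodeset: Set[str],
                          tag: str) -> Set[Tuple[str, str]]:
    """Return distinct (company, patent) tuples for companies without abnormal business."""
    result: Set[Tuple[str, str]] = set()
    for company in tag_companies.get(tag, []):
        if company in filter_nodeset:  # average O(1) hash lookup, no graph traversal
            continue
        for patent in company_patents.get(company, []):
            result.add((company, patent))
    return result
```

Building the set costs one pass over the abnormal business edges; after that, each candidate company is checked with a single lookup instead of a pattern match in the graph.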
Further, in the embodiment of the disclosure, materialized views are mainly used to compute and store in advance the results of time-consuming operations such as table joins or aggregations, so that these operations can be avoided when the query task is executed later and the query result can be obtained quickly. In the technological consultation scenario, materialized views greatly improve query performance for hot queries whose identical results are frequently reused, because the data is read directly from the materialized view.
For example, in the embodiment of the disclosure, a query task in the technological consultation scenario gives industry chain label information tag, queries its child industry chain labels and the companies belonging to those child industry chain labels along a path that starts at the child industry chain label node, passes through patent nodes and ends at company nodes, and counts, for each company matching the pattern, the company information and the number of patents. Querying each company separately is very time-consuming. With the materialized view method in the embodiment of the disclosure, the patents owned by each company are obtained in advance, the industry chain label to which each patent belongs is determined, the number of patents under each industry chain label is aggregated, and the result is written into an attribute of the company - industry chain label edge. This precomputed materialized view improves query efficiency.
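The precomputation described here can be sketched as an aggregation keyed by the company - industry chain label pair. The dictionary layouts and function names are hypothetical, and an in-memory Counter stands in for the edge attribute that the text says is written to the graph.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_patent_count_view(company_patents: Dict[str, List[str]],
                            patent_tags: Dict[str, List[str]]) -> Counter:
    """Precompute the number of patents per (company, industry chain label) pair."""
    view: Counter = Counter()
    for company, patents in company_patents.items():
        for patent in patents:
            for tag in patent_tags.get(patent, []):
                view[(company, tag)] += 1
    return view


def patent_count(view: Counter, company: str, tag: str) -> int:
    """Answer the query from the precomputed view instead of traversing the graph."""
    return view.get((company, tag), 0)
```

The view has to be rebuilt or updated when patents or labels change, which is the usual trade-off of a materialized view: slower writes in exchange for fast repeated reads.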
Step 103, querying the graph database by using the query optimization method, and outputting a query result.
In the embodiment of the present disclosure, the query optimization method selected in step 102 is used to query the graph database, and the query result is output. In embodiments of the present disclosure, the query result may include the associations between nodes in the graph database.
In the query task optimization method based on technological consultation large-scale graph data, an identification of a query task is obtained, and a corresponding query optimization method is selected according to that identification, where the query optimization methods include adjusting the graph traversal expansion order strategy, cardinality reduction, mode advance and materialized views. The graph database is then queried using the selected query optimization method, and the query result is output. Because the query optimization method is selected according to the identification of the query task, the flexibility of querying is improved. In addition, the query optimization methods improve the efficiency of query tasks in different scenarios of technological consultation large-scale graph data, reduce the computational complexity of the queries, and shorten query time.
FIG. 2 is a schematic structural diagram of a query task optimization system based on technological consultation large-scale graph data according to an embodiment of the present application, where the system may include:
An obtaining module 201, configured to obtain an identifier of a query task;
The selection module 202 is configured to select a corresponding query optimization method according to the identification of the query task, where the query optimization methods include adjusting the graph traversal expansion order strategy, cardinality reduction, mode advance and materialized views;
And the display module 203 is configured to query the graph database by using the selected query optimization method, and output a query result.
In embodiments of the present disclosure, the query task may involve organizations, talents, and industry chains.
In the description of this specification, a reference to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided they do not contradict each other.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order from that shown or discussed, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (6)

1. A query task optimization method based on technological consultation large-scale graph data, characterized by comprising the following steps:
acquiring an identification of a query task;
selecting a corresponding query optimization method according to the identification of the query task, wherein the query optimization methods comprise adjusting the graph traversal expansion order strategy, cardinality reduction, pattern advance, and materialized views;
querying the graph database using the selected query optimization method, and outputting a query result;
wherein the query task comprises institution, talent, and industry chain queries;
wherein the cardinality reduction comprises:
S21, inputting a source entity node and a path pattern (pattern);
S22, next_nodes is the node set of the next layer, initialized to the next-layer neighbor nodes obtained by expanding the source entity node according to the pattern;
S23, de-duplicating next_nodes;
S24, q is a node queue, initialized to next_nodes;
S25, if q is not empty, continuing with step S26; otherwise, executing step S212;
S26, size is the number of nodes currently in the queue;
S27, if size is not zero, continuing with step S28; otherwise, executing step S211;
S28, popping the current node from the queue;
S29, expanding the next-layer neighbor nodes next_nodes of the node according to the pattern;
S210, adding next_nodes to the queue q;
S211, if the pattern has been fully traversed, continuing with step S212; otherwise, returning to step S25;
S212, ending.
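As an illustration only, steps S21 to S212 can be sketched in Python as a layer-by-layer traversal that de-duplicates each layer before it is expanded further. The adjacency layout (`graph[node][edge_label]`), the representation of the pattern as a list of edge labels, and the names `expand` and `cardinality_reduced_traversal` are assumptions made for this sketch and do not come from the claims; applying de-duplication to every layer, not only the first, is likewise an assumption.

```python
from collections import deque


def expand(graph, node, edge_label):
    """Neighbours of `node` over `edge_label`; the list may contain duplicates."""
    return graph.get(node, {}).get(edge_label, [])


def cardinality_reduced_traversal(graph, source, pattern):
    """Follow `pattern` (a list of edge labels) from `source`, one layer per
    label, de-duplicating every layer before it is expanded further."""
    if not pattern:
        return [source]
    # S22-S23: expand the first layer and de-duplicate it (order preserved).
    next_nodes = list(dict.fromkeys(expand(graph, source, pattern[0])))
    q = deque(next_nodes)                       # S24
    for edge_label in pattern[1:]:              # S211: stop when the pattern is used up
        if not q:                               # S25
            break
        seen = set()
        for _ in range(len(q)):                 # S26-S27: process exactly one layer
            node = q.popleft()                  # S28
            for nxt in expand(graph, node, edge_label):   # S29
                if nxt not in seen:             # assumed per-layer de-duplication
                    seen.add(nxt)
                    q.append(nxt)               # S210
    return list(q)


# Hypothetical data: an institution employs two experts who share one paper.
graph = {
    "org": {"employs": ["alice", "bob"]},
    "alice": {"authored": ["p1", "p2"]},
    "bob": {"authored": ["p2", "p3"]},
}
papers = cardinality_reduced_traversal(graph, "org", ["employs", "authored"])
```

In this toy graph the shared paper `p2` enters the queue once, so any later layer would expand it once instead of twice.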
2. The query task optimization method of claim 1, wherein adjusting the graph traversal expansion order strategy comprises:
S11, inputting a source entity node and a target entity node, and inputting an intermediate entity node type mtype and a path pattern (pattern);
S12, initializing two node sets S1 and S2, wherein S1 is initialized to the input source entity node and S2 is initialized to the input target entity node;
S13, calculating the expansion order of the bidirectional BFS from pattern and mtype, with pattern1 denoting the left expansion order and pattern2 denoting the right expansion order;
S14, if S1 or S2 is not empty, continuing with step S15; otherwise, executing step S111;
S15, S is the set of nodes expanded at the current layer;
S16, swapping S1 and S2 so that expansion alternates between the left end and the right end;
S17, expanding the next-layer neighbor nodes of each node in the set S1 according to the pattern, denoted next_nodes;
S18, checking each node in next_nodes, and if the node is in the set S, a path has been found and step S111 is executed;
S19, adding all nodes next_nodes expanded at this layer to the set S, copying the set S to S1, and storing the paths;
S110, returning to step S14;
S111, ending.
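A minimal Python sketch of the alternating two-ended expansion in steps S11 to S111 follows. It assumes that step S13 has already produced `pattern1` and `pattern2` as lists of edge labels, that reverse edges are stored under their own labels in a single adjacency dictionary, and that the two searches meet at the intermediate node once both patterns are used up. It returns the meeting nodes and omits the path storage of step S19. The function name and data layout are hypothetical.

```python
def bidirectional_search(adj, source, target, pattern1, pattern2):
    """Expand alternately from `source` (along pattern1) and from `target`
    (along pattern2); return the nodes where the two frontiers meet."""
    s1, s2 = {source}, {target}                  # S12
    p1, p2 = list(pattern1), list(pattern2)      # S13 assumed done by the caller
    while s1 and s2 and (p1 or p2):              # S14
        if not p1:                               # this end is finished: use the other
            s1, s2, p1, p2 = s2, s1, p2, p1
        label = p1.pop(0)
        # S17: expand every node of the current frontier by one layer.
        s1 = {n for node in s1 for n in adj.get(node, {}).get(label, ())}
        # S16: swap so that the opposite end is expanded next.
        s1, s2, p1, p2 = s2, s1, p2, p1
    return s1 & s2                               # S18: nodes reached from both ends


# Hypothetical data: which papers connect institution org_a and institution org_b?
adj = {
    "org_a": {"employs": ["alice"]},
    "org_b": {"employs": ["bob", "carol"]},
    "alice": {"authored": ["p1", "p2"]},
    "bob": {"authored": ["p2"]},
    "carol": {"authored": ["p3"]},
}
meeting = bidirectional_search(
    adj, "org_a", "org_b", ["employs", "authored"], ["employs", "authored"]
)
```

Each end only walks its own half of the path, so neither frontier has to cover the full path length from one side.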
3. The query task optimization method of claim 1, wherein the pattern advance comprises:
S31, inputting a source entity node, a path pattern (pattern), and filter patterns (filter_patterns);
S32, initializing the pattern advance set filter_nodeset;
S33, q is a node queue, initialized to the input source entity node;
S34, if q is not empty, continuing with step S35; otherwise, executing step S313;
S35, initializing size to the number of nodes currently in the queue;
S36, if size is not zero, continuing with step S37; otherwise, executing step S312;
S37, popping the current node from the queue;
S38, expanding the next-layer neighbor nodes next_nodes of the node according to the pattern;
S39, determining whether the node type of the current next_nodes is the node type of filter_nodeset; if so, continuing with step S310; otherwise, executing step S311;
S310, traversing each node next_node in the set next_nodes, and filtering out next_node if it is in the set filter_nodeset;
S311, adding next_nodes to the queue q;
S312, if the pattern has been fully traversed, continuing with step S313; otherwise, executing step S35;
S313, ending.
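Steps S31 to S313 can be illustrated with the following Python sketch, in which nodes are filtered as soon as a layer of the matching type is reached, before they are expanded further. The claim does not spell out how filter_nodeset is derived from filter_patterns, so the sketch assumes it is supplied as a precomputed set; the `node_type` dictionary, the adjacency layout, and the function name are also assumptions.

```python
from collections import deque


def traverse_with_early_filter(adj, node_type, source, pattern, filter_nodeset):
    """Follow `pattern` (a list of edge labels) from `source`, dropping nodes
    in `filter_nodeset` as soon as a layer of the matching type is reached."""
    filter_types = {node_type[n] for n in filter_nodeset}
    q = deque([source])                                        # S33
    for label in pattern:                                      # S312
        if not q:                                              # S34
            break
        for _ in range(len(q)):                                # S35-S36: one layer
            node = q.popleft()                                 # S37
            next_nodes = adj.get(node, {}).get(label, [])      # S38
            # S39: only layers whose node type matches need the membership test.
            if any(node_type.get(n) in filter_types for n in next_nodes):
                next_nodes = [n for n in next_nodes if n not in filter_nodeset]  # S310
            q.extend(next_nodes)                               # S311
    return list(q)


# Hypothetical data: the expert "bob" is excluded by a filter pattern.
adj = {
    "org": {"employs": ["alice", "bob"]},
    "alice": {"authored": ["p1", "p2"]},
    "bob": {"authored": ["p3"]},
}
node_type = {"org": "institution", "alice": "expert", "bob": "expert",
             "p1": "paper", "p2": "paper", "p3": "paper"}
result = traverse_with_early_filter(adj, node_type, "org",
                                    ["employs", "authored"], {"bob"})
```

Because `bob` is removed at the expert layer, his papers are never expanded, which is the saving the early filter is meant to provide in this sketch.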
4. A query task optimization system based on technological consultation large-scale graph data, the system comprising:
an acquisition module, used for acquiring the identification of the query task;
a selection module, used for selecting a corresponding query optimization method according to the identification of the query task, wherein the query optimization methods comprise adjusting the graph traversal expansion order strategy, cardinality reduction, pattern advance, and materialized views;
a display module, used for querying the graph database using the selected query optimization method and outputting a query result;
wherein the query task comprises institution, talent, and industry chain queries;
wherein the cardinality reduction comprises:
S21, inputting a source entity node and a path pattern (pattern);
S22, next_nodes is the node set of the next layer, initialized to the next-layer neighbor nodes obtained by expanding the source entity node according to the pattern;
S23, de-duplicating next_nodes;
S24, q is a node queue, initialized to next_nodes;
S25, if q is not empty, continuing with step S26; otherwise, executing step S212;
S26, size is the number of nodes currently in the queue;
S27, if size is not zero, continuing with step S28; otherwise, executing step S211;
S28, popping the current node from the queue;
S29, expanding the next-layer neighbor nodes next_nodes of the node according to the pattern;
S210, adding next_nodes to the queue q;
S211, if the pattern has been fully traversed, continuing with step S212; otherwise, returning to step S25;
S212, ending.
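The division into acquisition, selection, and display modules can be pictured as a small dispatch table keyed by the query task identification. The sketch below is hypothetical throughout: the claims do not state which optimization method is paired with which task, so the registry contents, the task identifiers, and the function name `run_query` are placeholders.

```python
def run_query(task_id, strategies, **query_args):
    """Select the optimization method registered for `task_id` and run it.

    `strategies` maps a task identification to a callable; the pairing of
    tasks and methods is supplied by the caller and is not fixed here.
    """
    try:
        strategy = strategies[task_id]           # selection module
    except KeyError:
        raise ValueError(f"no optimization method registered for {task_id!r}") from None
    return strategy(**query_args)                # query and return the result


# Placeholder registry: the lambdas stand in for real traversal functions.
strategies = {
    "talent": lambda source: [f"talent-result-for-{source}"],
    "institution": lambda source: [f"institution-result-for-{source}"],
}
answer = run_query("talent", strategies, source="org")
```

Keeping the pairing in a dictionary means a new task type, or a different method for an existing task, can be registered without touching the dispatch code.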
5. A computer storage medium, wherein the computer storage medium stores computer-executable instructions; the computer executable instructions, when executed by a processor, are capable of implementing the method of any of claims 1-3.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of claims 1-3 when the program is executed.
CN202111316037.1A 2021-11-08 2021-11-08 Query task optimization method based on technological consultation large-scale graph data Active CN114020781B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111316037.1A CN114020781B (en) 2021-11-08 2021-11-08 Query task optimization method based on technological consultation large-scale graph data
PCT/CN2022/087215 WO2023077731A1 (en) 2021-11-08 2022-04-15 Query task optimization method based on science and technology consultation large-scale graph data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111316037.1A CN114020781B (en) 2021-11-08 2021-11-08 Query task optimization method based on technological consultation large-scale graph data

Publications (2)

Publication Number Publication Date
CN114020781A CN114020781A (en) 2022-02-08
CN114020781B true CN114020781B (en) 2024-05-31

Family

ID=80062381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111316037.1A Active CN114020781B (en) 2021-11-08 2021-11-08 Query task optimization method based on technological consultation large-scale graph data

Country Status (2)

Country Link
CN (1) CN114020781B (en)
WO (1) WO2023077731A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114020781B (en) * 2021-11-08 2024-05-31 北京邮电大学 Query task optimization method based on technological consultation large-scale graph data
CN114880504B (en) * 2022-07-08 2023-03-31 支付宝(杭州)信息技术有限公司 Graph data query method, device and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291807A (en) * 2017-05-16 2017-10-24 中国科学院计算机网络信息中心 A kind of SPARQL enquiring and optimizing methods based on figure traversal
CN108038136A (en) * 2017-11-23 2018-05-15 上海斯睿德信息技术有限公司 The method for building up and graph inquiring method of Company Knowledge collection of illustrative plates based on graph model
CN110795456A (en) * 2019-10-28 2020-02-14 北京百度网讯科技有限公司 Map query method and device, computer equipment and storage medium
CN110941741A (en) * 2018-09-21 2020-03-31 百度在线网络技术(北京)有限公司 Path search processing method, device, server and storage medium for graph data
CN111241350A (en) * 2020-01-07 2020-06-05 平安科技(深圳)有限公司 Graph data query method and device, computer equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8326825B2 (en) * 2010-11-05 2012-12-04 Microsoft Corporation Automated partitioning in parallel database systems
US20190236188A1 (en) * 2018-01-31 2019-08-01 Salesforce.Com, Inc. Query optimizer constraints
GB201813561D0 (en) * 2018-08-21 2018-10-03 Shapecast Ltd Machine learning optimisation method
US11392623B2 (en) * 2019-12-11 2022-07-19 Oracle International Corporation Hybrid in-memory BFS-DFS approach for computing graph queries against heterogeneous graphs inside relational database systems
CN114020781B (en) * 2021-11-08 2024-05-31 北京邮电大学 Query task optimization method based on technological consultation large-scale graph data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
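Research on feature-index-based subgraph query methods for large-scale graphs; 高见野; Master's Theses Electronic Journal; 20190115; full text *
鄂海红 ; 宋美娜 ; 魏秋实 ; 乔晓东 ; 王涛 ; 王灏 ; 王震 ; 唐俊克 ; 马超童 ; 魏文定 ; 丛丽静 ; 郑云帆 ; 梁月梅 ; 康雯珺 ; 赵黛岩 ; 韩鹏昊 ; 张田宇 ; 田川 ; 谭泽华 ; 朱永波 ; 毕秋波 ; 胥香宇. Science and technology consulting big data - Science and technology consulting information model architecture, Part 5: Talent domain. 2020, full text. *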

Also Published As

Publication number Publication date
WO2023077731A1 (en) 2023-05-11
CN114020781A (en) 2022-02-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant