CN114647764A

CN114647764A - Graph structure query method and device and storage medium

Info

Publication number: CN114647764A
Application number: CN202210348471.6A
Authority: CN
Inventors: 李友焕; 郑航宇; 秦拯
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2022-04-01
Filing date: 2022-04-01
Publication date: 2022-06-21
Anticipated expiration: 2042-04-01
Also published as: CN114647764B

Abstract

The application provides a graph structure query method and related equipment, which can reduce the time consumed by graph structure queries. The method comprises the following steps: acquiring an input query set for a graph structure, wherein the input query set comprises at least one input query edge; querying the codes of a first vertex and a second vertex corresponding to a target query edge from a graph structure coding database, wherein the graph structure coding database comprises codes corresponding to a plurality of vertices including the two vertices of the target query edge, the target query edge is any one query edge in the input query set, and the coding type of each vertex in the plurality of vertices is direct coding or combined coding; determining the coding type of the first vertex and the coding type of the second vertex according to the code of the first vertex and the code of the second vertex; and determining the query result of the target query edge according to the coding type of the first vertex and the coding type of the second vertex.

Description

Graph structure query method and device and storage medium

[ technical field ]

The present application relates to the field of graph structures, and in particular, to a graph structure query method, device and storage medium.

[ background of the invention ]

As a flexible data structure, the graph structure can express complex real-world entity relationships in a concise form and is widely applied in various fields. For example, a financial platform describes information such as transfer transactions among users through vertex-edge relationships, and a network service provider abstracts the communication relationships among network nodes by means of a graph structure. A graph query extracts specific associated information by analyzing graph data, and mainly includes vertex-edge queries, path queries, subgraph queries and the like.

As the data scale expands, the numbers of edges and nodes in graph data become huge, so that queries on large graph data are inefficient. In particular, when complex query commands such as multi-hop queries and subgraph queries are processed, the query needs to traverse the neighbor nodes of the starting point and then, in turn, the neighbor nodes of those neighbor nodes, so that the cost of this level-by-level expansion is very high. In addition, in graph data abstracted from the real world, the number of neighbors of a node is far smaller than the total number of nodes, so with high probability a node has no edge to any given other node in the graph.

Therefore, most of the intermediate results in the query process are irrelevant. For example, when the common friends of any two people in a social network are queried, the final result is far smaller than the total number of friends, and the useless query cost increases the complexity of the graph query.

[ summary of the invention ]

The application provides a graph structure query method, a graph structure query device and a storage medium, which can reduce the time consumption of graph structure query.

A first aspect of the present application provides a method for querying a graph structure, including:

acquiring an input query set for a graph structure, wherein the input query set comprises at least one input query edge;

querying the codes of a first vertex and a second vertex corresponding to a target query edge from a graph structure coding database, wherein the graph structure coding database comprises codes corresponding to a plurality of vertices including the two vertices of the target query edge, the target query edge is any one query edge in the input query set, and the coding type of each vertex in the plurality of vertices is direct coding or combined coding;

determining the coding type of the first vertex and the coding type of the second vertex according to the coding of the first vertex and the coding of the second vertex;

and determining the query result of the target query edge according to the coding type of the first vertex and the coding type of the second vertex.

A second aspect of the present application provides a graph structure query apparatus, including:

an acquisition unit, configured to acquire an input query set for a graph structure, wherein the input query set comprises at least one input query edge;

a query unit, configured to query the codes of a first vertex and a second vertex corresponding to a target query edge from a graph structure coding database, wherein the graph structure coding database comprises codes corresponding to a plurality of vertices including the two vertices of the target query edge, the target query edge is any one query edge in the input query set, and the coding type of each vertex in the plurality of vertices is direct coding or combined coding;

a first determining unit, configured to determine the coding type of the first vertex and the coding type of the second vertex according to the code of the first vertex and the code of the second vertex;

and a second determining unit, configured to determine the query result of the target query edge according to the coding type of the first vertex and the coding type of the second vertex.

A third aspect of embodiments of the present application provides a computer device, which includes at least one connected processor, a memory and a transceiver, where the memory is configured to store program codes, and the processor is configured to call the program codes in the memory to perform the steps of the graph structure query method according to the first aspect.

A fourth aspect of the embodiments of the present application provides a computer storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the steps of the graph structure query method described in any one of the above aspects.

Compared with the related art, in the embodiment provided by the application, the graph data is pre-coded in a direct coding and combined coding mode, and the coded graph structure code is stored in the graph structure coding database, so that the coding types of two vertexes of a query edge can be determined when the query edge is input, and the query is carried out according to the coding types, so that the time and space efficiency are considered on the premise of ensuring the correctness of a query result, and the time consumption of graph structure query can be reduced.

[ description of the drawings ]

Fig. 1 is an architecture diagram of a graph structure query system according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of a graph structure query method according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the code composition of direct coding according to an embodiment of the present application;

Fig. 4 is a schematic diagram of the code composition of combined coding according to an embodiment of the present application;

Fig. 5 is a schematic virtual structure diagram of a graph structure query device according to an embodiment of the present application;

Fig. 6 is a schematic hardware structure diagram of a server according to an embodiment of the present application.

[ detailed description ]

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.

The application aims to provide a graph structure query method and related equipment, which pre-encode graph data through an efficient low-dimensional graph encoding algorithm and, at query time, filter out by decoding those queries in the query set for which no edge exists, thereby taking both time and space efficiency into account while ensuring the correctness of query results and greatly accelerating graph queries.

To achieve this purpose, the technical solution adopted by the application comprises three modules, namely an offline coding module, an online query module and an online updating module.

The offline encoding module encodes the graph nodes by pre-loading the graph data set and the configuration information and using a graph encoding algorithm.

The online query module decodes the codes of the queried vertices according to the query content, filters out the part of the query set for which no edge exists, and finally returns the query result.

The online updating module is used for maintaining the graph coding data, and updating the graph coding data simultaneously with the operations of updating, deleting, inserting and the like of the database.

Referring to fig. 1, fig. 1 is an architecture diagram of a graph structure query system according to an embodiment of the present application, including:

a graph query acceleration device 101, a database device 102, and an input-output device 103;

The input and output device 103 is responsible for data interaction with the graph query acceleration device 101 and the database device 102. In the initial stage, the graph data set is loaded into the offline coding module 101A in the graph query acceleration device 101 for coding. In the query stage, the query set is first input to the online query module 101B, which returns partial results; the database device 102 is then queried, and finally the query result obtained from the database device 102 is returned to the input and output device 103. In the update stage, the update data is first input to the online update module 101C for encoding and updating, and the database device 102 is then updated based on the encoded update data. The database device 102 includes, but is not limited to, a graph database, a relational database, and the like.

The following describes a method for querying a graph structure from the perspective of a graph structure querying device, which may be a server or a service unit in the server, and is not particularly limited.

Referring to fig. 2, fig. 2 is a schematic flowchart of a query method of a graph structure according to an embodiment of the present application, including:

201. An input query set for a graph structure is obtained.

In this embodiment, the graph structure query device may obtain an input query set for the graph structure, where the input query set includes at least one input query edge. For example, the input query set includes multiple query edges (v₁, v₂), where v₁ is one vertex of a query edge and v₂ is the other vertex of the query edge. For multiple query edges in the input query set, it is only necessary to invoke the processing of a single query edge (v₁, v₂) multiple times.

202. The codes of the first vertex and the second vertex corresponding to the target query edge are queried from the graph structure coding database.

In this embodiment, the graph structure query device may obtain the codes of the first vertex and the second vertex corresponding to the target query edge from a graph structure coding database, where the graph structure coding database stores the codes corresponding to multiple vertices including two vertices of the target query edge, and the target query edge is any one query edge in the input query set.

In the application, graph data are coded on the basis of k-core decomposition and added to the graph structure coding database. k-core is a commonly used algorithm for mining closely associated subgraph structures in a graph: given a graph G and a parameter k, the k-core decomposition algorithm partitions the graph to obtain a maximal subgraph in which the degree of every vertex is greater than or equal to k, that is, every vertex in the subgraph has at least k edges connected to other vertices in the subgraph. The coding mainly comprises the following two parts:

1. dividing the graph data through the k-core algorithm to obtain a division result, which reduces redundant information in the encoding process;

2. according to the division result, dividing the vertices into removed vertices and vertices of the remaining k-core subgraph, and coding each vertex in a coding mode that depends on its type, where the coding modes mainly comprise direct coding and combined coding.

The coding part first takes a coding length m as input, where one unit of coding length represents 32 bits, that is, the total code length of one vertex is 32 × m bits; the input parameter m is also used to calculate the parameter k of the k-core algorithm. The coding length m can be set manually or left at its default value of 5, in which case the code of each node occupies 160 bits (20 bytes), so that graph queries can be effectively accelerated with a small code length, taking both time and space into account. For convenience of explanation, the following description of the encoding process uses m = 5, but it applies equally to other values of m:

Step A1, performing vertex serial number mapping on the data point set to obtain a target data set corresponding to the data point set.

In this step, the graph structure query device may read the data point set and perform vertex serial number mapping on it, where ID denotes the original serial number of a vertex and id denotes its serial number after mapping: the first vertex read has id 1, the second vertex read has id 2, and so on until the point set has been read completely. If the size of the data point set is n, the vertex ids are 1 to n.
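As a minimal sketch of the mapping in step A1, original vertex IDs can be assigned ids 1 to n in reading order. Python is used purely for illustration; the function name `map_vertex_ids`, the sample IDs and the handling of repeated IDs are assumptions and do not come from the application.

```python
def map_vertex_ids(original_ids):
    """Assign ids 1..n to original vertex IDs in the order they are first read."""
    mapping = {}
    for original in original_ids:
        if original not in mapping:          # assumption: a repeated ID keeps its first id
            mapping[original] = len(mapping) + 1
    return mapping


mapping = map_vertex_ids(["u42", "u7", "u42", "u13"])
print(mapping)  # {'u42': 1, 'u7': 2, 'u13': 3}
```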

Step A2, calculating the maximum number of vertex identifier bits and the decomposition parameter corresponding to the target data set.

In this step, the graph structure query device may calculate the maximum number of vertex identifier bits b and the k-core decomposition parameter k by the following formulas:

b = [log₂ n],

where m is the code length, which can be determined by manual setting or by default setting.

Step A3, determining the vertex degree corresponding to each data in the target data set.

Step A4, determining the vertex identification of the first target vertex and the vertex identification of the neighbor vertex of the first target vertex, wherein the first target vertex is the vertex with the minimum vertex degree in the target data set.

In this step, the degree of a vertex is the number of edges connected to that vertex. The graph structure query device may calculate the degrees of all vertices in the data point set, sort them in ascending order, and obtain the vertex identifier of the vertex with the smallest degree (i.e., the first target vertex) and the vertex identifiers of its neighbor vertices, where a neighbor vertex of the first target vertex is a vertex connected to the first target vertex by an edge.

Step A5, directly coding the vertex identifier of the first target vertex and the vertex identifiers of the neighbor vertices of the first target vertex to obtain the code corresponding to the first target vertex.

In this step, the direct coding mode divides the code into three parts. The first part is a flag bit occupying 1 bit; if the first bit of the vertex code is 0, the direct coding mode is adopted. The second part lies within the remaining 32 × m − 1 bits and is the bit string formed by the ids of the neighbor vertices of the first target vertex sorted in ascending order, where each neighbor vertex id occupies b bits and is aligned backwards (backward alignment here means that the field length is fixed and, when the number of bits of a neighbor vertex id is less than that length, the id is filled in from the end of the field forwards), so that bits 2 to 2 + b − 1 represent the id of the smallest neighbor vertex. The third part is the rest of the bit string, which is filled with 0s.
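The direct coding layout can be sketched as follows. This is an illustration under stated assumptions rather than the application's implementation: the code is modeled as a Python string of '0' and '1' characters, each id is right-aligned in its own b-bit field, and the name `encode_direct` and the sample values are hypothetical.

```python
def encode_direct(neighbor_ids, b, m=5):
    """Return a 32*m-character bit string: flag '0', sorted b-bit ids, zero padding."""
    total_bits = 32 * m
    fields = []
    for v in sorted(neighbor_ids):
        if not 0 < v < (1 << b):
            raise ValueError(f"id {v} does not fit in {b} bits")
        fields.append(format(v, f"0{b}b"))  # right-aligned, zero-filled on the left
    body = "".join(fields)
    if 1 + len(body) > total_bits:
        raise ValueError("too many neighbors for direct coding")
    return ("0" + body).ljust(total_bits, "0")  # third part: remaining bits are 0


code = encode_direct([9, 3, 12], b=4)
print(len(code))    # 160
print(code[:13])    # 0001110011100
```

With b = 4, the flag bit is followed by 0011 (3), 1001 (9) and 1100 (12), and the remaining 147 bits are zero.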

Step A6, removing the first target vertex and the edges corresponding to the first target vertex from the target data set to obtain a first data set.

In this step, the graph structure query device may remove the vertex with the smallest degree together with the edges connected to it, and reduce by one the degree of each vertex that was connected to it, indicating that the remaining vertices are no longer connected to the removed vertex; the first data set is thereby obtained.

Step A7, based on the first data set, iteratively executing steps A3 to A6 until the vertex degree of each data in the target data set is greater than the decomposition parameter.
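The iteration over steps A3 to A7 can be sketched as a peeling loop. This is one reading of the steps, not the application's code: the graph is assumed to be undirected and free of self-loops, ties in degree are broken by the smaller id, and `peel` is a hypothetical name. Each removed vertex is recorded with the neighbors it still has at the moment of removal, since earlier removed vertices are no longer connected to it.

```python
def peel(adjacency, k):
    """Repeatedly remove the minimum-degree vertex while that degree is <= k.

    adjacency maps each vertex id to the set of its neighbor ids.
    Returns (removed, remaining): removed maps each removed vertex to the sorted
    neighbors it still had when removed; remaining is the residual adjacency.
    """
    adj = {v: set(ns) for v, ns in adjacency.items()}
    removed = {}
    while adj:
        v = min(adj, key=lambda u: (len(adj[u]), u))  # steps A3/A4: smallest degree
        if len(adj[v]) > k:                           # step A7: stop condition
            break
        removed[v] = sorted(adj[v])                   # step A5: ids to code directly
        for u in adj[v]:                              # step A6: drop v and its edges
            adj[u].discard(v)
        del adj[v]
    return removed, adj


graph = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
removed, remaining = peel(graph, k=1)
print(removed)    # {4: [1]}
print(remaining)  # {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
```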

Step A8, performing combined coding on the vertices in the target data set whose vertex degrees are greater than the decomposition parameter.

In this step, the vertices in the target data set whose degrees are greater than the decomposition parameter are coded in the combined coding mode. Because the degrees of the remaining vertices are all greater than k, not all neighbor ids can be written directly into the code. In the combined coding mode, a part of the neighbor ids is written directly into the code and the remaining neighbor ids are written into the code by hashing; how many neighbor ids are written directly is determined by a scoring method based on a sliding window and a greedy strategy.

First, the components of the combined code are explained.

The code in the combined coding mode is divided into five parts:

the first part is a flag bit occupying 1 bit, where 0 represents direct coding and 1 represents combined coding;

the second part occupies 2 bits and represents the neighbor id writing mode of the combined code. There are three modes, Left-most, Right-most and Middle, where Left-most means that the directly written ids contain the minimum neighbor id, Right-most means that the directly written ids contain the maximum neighbor id, and Middle means that the directly written ids contain neither the minimum nor the maximum neighbor id. If the two bits of this part are 0 and 0, the writing mode is Left-most; if they are 0 and 1, the writing mode is Middle; and if they are 1 and 1, the writing mode is Right-most;

the third part is the number of directly written neighbors and occupies log k bits;

the fourth part is the directly written part, a bit string formed by the ids of neighbor vertices of the vertex in ascending order, where each id occupies b bits; for example, if the value of the third part is 7, then 7 neighbor ids are written directly and the length of the fourth part is 7 × b bits;

the fifth part is the hash coding part, whose length is the number of all remaining code bits.

The following describes the encoding flow of the combinatorial coding in detail:

step B1, determining an identification sequence corresponding to a second target vertex and an identification sequence of a neighbor node corresponding to the second target vertex, wherein the second target vertex is any one of the vertexes in the target data set, of which the vertex degrees are greater than the decomposition parameters;

in step B1, a vertex and its neighbor node id sequence are input, and the pair

Step B2, determining the coding score of the corresponding code of the sliding window;

step B3, if the coding score is larger than a preset optimal score, determining the window state of the sliding window, wherein the window state comprises the size and the position of the sliding window;

step B4, moving the sliding window according to a first moving rule based on the size and the position of the sliding window, and iteratively executing the steps B2 to B3 until a preset termination condition is reached;

step B5, adjusting the size of the sliding window, and iteratively executing the steps B2 to B4 based on the adjusted sliding window until the size of the sliding window is larger than a preset value;

and step B6, coding based on the coding score corresponding to the target sliding window and the window state corresponding to the target sliding window to obtain the combined code corresponding to the second target vertex, wherein the target sliding window is the sliding window with the highest coding score.
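Steps B2 to B5 amount to enumerating window sizes and positions and keeping the best-scoring window, which can be sketched as below. The scoring function is passed in as a parameter; `toy_score` is a made-up stand-in used only to make the sketch runnable and is not the scoring function of the application. The names `best_window` and `max_size` are also hypothetical.

```python
def best_window(neighbor_ids, max_size, score):
    """Enumerate sliding windows over the sorted neighbor ids and keep the best one.

    score(ids, start, size) is a caller-supplied scoring function.
    Returns (best_score, start, size), with start a 0-based index into the sorted ids.
    """
    ids = sorted(neighbor_ids)
    best = (float("-inf"), 0, 0)
    for size in range(0, max_size + 1):                # step B5: grow the window
        for start in range(0, len(ids) - size + 1):    # step B4: slide right by one
            s = score(ids, start, size)                # step B2: score this window
            if s > best[0]:                            # step B3: record a better state
                best = (s, start, size)
    return best


def toy_score(ids, start, size):
    """Made-up score favoring large windows over tightly packed ids."""
    if size == 0:
        return 0
    window = ids[start:start + size]
    return 10 * size - (window[-1] - window[0])


print(best_window([2, 3, 4, 10, 20], max_size=3, score=toy_score))  # (28, 0, 3)
```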

That is, the graph structure query device may take the second target vertex and its neighbor node id sequence as input and arrange the neighbor node ids in ascending order. It then initializes a sliding window with a size of 0 neighbor nodes, with the leftmost end of the window initially pointing to the first position of the neighbor node sequence. Finally, it calculates the coding score of the code corresponding to the sliding window through a scoring function, and if that coding score is greater than the preset optimal score, it records the window state of the current sliding window, where the window state includes the size and the position of the sliding window.

The following describes the coding score of the code corresponding to the sliding window:

A hash function is applied to each neighbor id outside the sliding window, and the bit of the fifth part of the code at the position given by the function result is set to 1. The hash function is id % h, where h is the total number of hash bits: according to the hash function and the id of the neighbor node, bit id % h of the fifth part of the code is set to 1, and the score function is then called to calculate the score, as in the following example:

For example, if the length of the sliding window is 4 and the leftmost end of the sliding window points to the 5th position of the neighbor identification sequence, the hash function is applied to all neighbor ids except the 5th to 8th, and the bits at the corresponding positions of the fifth part of the code are set to 1. Since the fourth part of the code occupies 4 × b bits, the number of hash bits is h = 32 × m − 3 − log k − 4 × b.
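The construction of the hash part can be sketched as follows. The assumptions are that the hash part is a string of '0' and '1' characters indexed from 0, that `count_bits` stands for the "log k" length of the third part, and that `start` is the 0-based position of the window in the sorted id sequence; the function name and the sample values are hypothetical.

```python
def hash_part(neighbor_ids, start, w, b, count_bits, m=5):
    """Return the hash bit string for the neighbor ids outside ids[start:start + w]."""
    ids = sorted(neighbor_ids)
    h = 32 * m - 3 - count_bits - w * b    # bits left after parts one to four
    if h <= 0:
        raise ValueError("no room left for the hash part")
    bits = ["0"] * h
    for i, v in enumerate(ids):
        if not (start <= i < start + w):   # only ids outside the window are hashed
            bits[v % h] = "1"
    return "".join(bits)


part = hash_part([1, 2, 5, 7, 15], start=1, w=2, b=4, count_bits=3, m=1)
print(len(part))                                       # 18
print([i for i, c in enumerate(part) if c == "1"])     # [1, 7, 15]
```

In this small example m = 1, so the code has 32 bits and 32 − 3 − 3 − 2 × 4 = 18 of them remain for hashing the ids 1, 7 and 15.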

The specific score function is:

wherein f is the coding score corresponding to the sliding window; V_max is the maximum neighbor identifier within the sliding window among the neighbor nodes corresponding to the second target vertex, and V_min is the minimum neighbor identifier within the sliding window among the neighbor nodes corresponding to the second target vertex; if V_max is exactly equal to the maximum value in the neighbor id sequence, then V_max = n, and if V_min is exactly equal to the minimum value in the neighbor id sequence, then V_min = 0; w is the length of the sliding window; h is the length of the code positions corresponding to the hash function result (i.e., the length of the fifth part of the combined code, which is the number of bits of the hash part); the hash function result is the result obtained by applying the hash function to the identifiers of the neighbor nodes corresponding to the second target vertex that lie outside the sliding window; and h₀ is the number of 0s in the bit string corresponding to the hash function result.

Then the sliding window is moved to the right by one position and the above steps are repeated until the rightmost end of the sliding window reaches the rightmost end of the neighbor identification sequence.

The size of the sliding window is then increased by 1 and the above steps are repeated until the size of the sliding window is larger than t, wherein,

Finally, the graph structure query device performs coding according to the highest score and the window state of the target sliding window obtained in the above steps, and obtains the combined code corresponding to the second target vertex. The first part of the code is set to 1. If the target sliding window contains the minimum neighbor id, the writing mode is Left-most and the second part is set to 00; if the target sliding window contains the maximum neighbor id, the writing mode is Right-most and the second part is set to 11; otherwise the writing mode is Middle and the second part is set to 01. The third part is the size of the target sliding window. The fourth part is the continuous bit string formed by the vertex ids in the target sliding window, where each id occupies b bits and the ids are spliced into a bit string of b × w bits. The fifth part is the bit string obtained by passing the remaining neighbor ids through the hash function.
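Assembling the five parts for a chosen window might look like the sketch below. It assumes a window of at least one id, a bit-string representation as a Python str, and a third part of `count_bits` bits (standing for "log k"); the name `encode_combined` and the sample values are hypothetical.

```python
def encode_combined(neighbor_ids, start, w, b, count_bits, m=5):
    """Build the combined code for the window ids[start:start + w] (w >= 1 assumed)."""
    ids = sorted(neighbor_ids)
    if w < 1 or start < 0 or start + w > len(ids) or w >= (1 << count_bits):
        raise ValueError("invalid window")
    if start == 0:
        mode = "00"                      # Left-most: window holds the minimum id
    elif start + w == len(ids):
        mode = "11"                      # Right-most: window holds the maximum id
    else:
        mode = "01"                      # Middle
    count = format(w, f"0{count_bits}b")                         # third part
    direct = "".join(format(v, f"0{b}b") for v in ids[start:start + w])  # fourth part
    h = 32 * m - 3 - count_bits - w * b
    if h <= 0:
        raise ValueError("no room left for the hash part")
    hash_bits = ["0"] * h
    for v in ids[:start] + ids[start + w:]:                      # fifth part
        hash_bits[v % h] = "1"
    return "1" + mode + count + direct + "".join(hash_bits)


code = encode_combined([1, 2, 5, 7, 15], start=1, w=2, b=4, count_bits=3, m=1)
print(len(code))    # 32
print(code[:14])    # 10101000100101
```

The 14 leading bits read as flag 1, mode 01 (Middle), count 010 (two ids), then 0010 (2) and 0101 (5); the remaining 18 bits are the hash part.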

203. The coding type of the first vertex and the coding type of the second vertex are determined according to the code of the first vertex and the code of the second vertex.

In this embodiment, after obtaining the code of the first vertex and the code of the second vertex corresponding to the target query edge by querying from the graph structure coding database, the graph structure querying device may decode the code of the first vertex and the code of the second vertex, and then determine the coding types of the first vertex and the second vertex by looking up the identification bit of the first vertex and the identification bit of the second vertex, where if the identification bit is 0, the coding type of the vertex is direct coding, and if the identification bit is 1, the coding type of the vertex is combined coding.
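Reading the coding type from the flag bit is a one-line check, sketched here with the code modeled as a string of '0' and '1' characters (an assumption of this illustration; `coding_type` is a hypothetical name).

```python
def coding_type(code):
    """Return the coding type indicated by the first (flag) bit of a vertex code."""
    if not code or code[0] not in "01":
        raise ValueError("not a bit string")
    return "direct" if code[0] == "0" else "combined"


print(coding_type("0" + "0011" + "0" * 27))  # direct
print(coding_type("1" + "01" + "0" * 29))    # combined
```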

204. The query result of the target query edge is determined according to the coding type of the first vertex and the coding type of the second vertex.

In this embodiment, after determining the coding type of the first vertex and the coding type of the second vertex, the graph structure querying device may determine the query result of the target query edge according to the coding type of the first vertex and the coding type of the second vertex. It can be understood that the coding types of the first vertex and the second vertex include two coding types, direct coding and combined coding. For ease of understanding, the following description takes the first vertex as v₁ and the second vertex as v₂ as an example:

First, the coding type of the first vertex and the coding type of the second vertex are both direct coding, that is, both vertices of the target query edge are directly coded.

If the coding type of the first vertex and the coding type of the second vertex are both direct codes, the graph structure query device determines the query result of the target query edge according to the coding type of the first vertex and the coding type of the second vertex, and the query result comprises the following steps:

decoding the code of the first vertex to obtain a neighbor identification sequence corresponding to the first vertex;

if the neighbor identification sequence corresponding to the first vertex contains a second vertex, determining that the first vertex and the second vertex are in a neighbor relation;

if the neighbor identification sequence corresponding to the first vertex does not contain the second vertex, determining that the first vertex and the second vertex are in a non-neighbor relation;

decoding the code of the second vertex to obtain a neighbor identification sequence corresponding to the second vertex;

if the neighbor identification sequence corresponding to the second vertex contains the first vertex, determining that the second vertex and the first vertex are in a neighbor relation;

and if the neighbor identification sequence corresponding to the second vertex does not contain the first vertex, determining that the target query edge does not have a query result.
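The two-direction check for a pair of directly coded vertices can be sketched as follows, assuming the neighbor id lists have already been decoded from the two codes; `query_both_direct` and the sample values are hypothetical. The second check matters when a vertex was removed before the other one was coded, so that the edge is recorded only in the code of the vertex removed first.

```python
def query_both_direct(neighbors_of_v1, neighbors_of_v2, v1, v2):
    """Return True if the edge (v1, v2) is recorded in either direct code."""
    if v2 in neighbors_of_v1:       # neighbor relation of v2 with respect to v1
        return True
    return v1 in neighbors_of_v2    # otherwise check v1 with respect to v2


# Vertex 4 was coded with neighbor 1; vertex 1 was coded later with neighbors 2 and 3.
print(query_both_direct([2, 3], [1], v1=1, v2=4))  # True
print(query_both_direct([2], [3], v1=1, v2=4))     # False
```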

That is, if the coding type of the first vertex and the coding type of the second vertex are both direct coding, the graph structure query device first judges the neighbor relation of v₂ with respect to v₁ through the neighbor detection algorithm. If the returned result is a neighbor relation, the query of the target query edge returns that the edge exists. If the returned result is a non-neighbor relation, the graph structure query device then judges the neighbor relation of v₁ with respect to v₂ through the neighbor detection algorithm: if v₁ and v₂ are in a neighbor relation, the query of the target query edge returns that the edge exists; otherwise it returns that the edge does not exist. The neighbor detection algorithm is explained in detail below:

The neighbor detection algorithm uses the code of one vertex to determine its connection relation with another vertex. The return result of the neighbor detection algorithm is one of three types: 1. non-neighbor relation; 2. neighbor relation; 3. the neighbor relation cannot be determined.

The neighbor detection algorithm is described below taking the query of the neighbor relation of v₂ with respect to v₁ as an example, that is, judging from the code of v₁ whether v₂ is a neighbor of v₁. According to the coding type of v₁, there are two modes:

Direct coding:

The code corresponding to v₁ is retrieved and decoded to obtain the neighbor identification sequence corresponding to v₁. Decoding is easy because direct coding uses a continuous bit string, that is, bits 2 to 1 + b are the first vertex id, and so on. It is then judged whether the neighbor identification sequence corresponding to v₁ contains v₂: if not, v₂ is determined to be in a non-neighbor relation with respect to v₁; if so, v₂ is determined to be in a neighbor relation with respect to v₁. The coding structure of direct coding is shown in fig. 3.
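Neighbor detection on a direct code can be sketched as below. The sketch assumes a bit-string representation and relies on vertex ids starting at 1, so that an all-zero b-bit field is treated as the start of the zero padding; the function names are hypothetical.

```python
def decode_direct(code, b):
    """Read b-bit ids after the flag bit until an all-zero field (padding) is met."""
    ids = []
    for pos in range(1, len(code) - b + 1, b):
        v = int(code[pos:pos + b], 2)
        if v == 0:          # ids start at 1, so 0 marks the zero-filled part
            break
        ids.append(v)
    return ids


def detect_direct(code_v1, v2, b):
    """Return the neighbor relation of v2 with respect to v1 from v1's direct code."""
    return "neighbor" if v2 in decode_direct(code_v1, b) else "non-neighbor"


code_v1 = ("0" + "0011" + "1001" + "1100").ljust(160, "0")
print(decode_direct(code_v1, b=4))      # [3, 9, 12]
print(detect_direct(code_v1, 9, b=4))   # neighbor
print(detect_direct(code_v1, 5, b=4))   # non-neighbor
```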

Combined coding:

The code corresponding to v₁ is retrieved. First, the number of directly written neighbor ids in the code is obtained, namely the decimal number corresponding to bits 4 to 3 + log k of the code. If this decimal number is denoted by w, a bit string of length w × b is extracted starting from bit 4 + log k and divided equally into w substrings, each of length b; each substring is converted into decimal form, and together they form the neighbor id sequence known from the code of v₁.

Thereafter, the maximum id V_max and the minimum id V_min in this neighbor id sequence are determined, and bits 2 and 3 of the code corresponding to v₁ are examined: if they are 00, V_min is set to 0 and V_max is the maximum id in the neighbor id sequence corresponding to v₁; if they are 11, V_max is set to n, where n is the maximum id of a vertex in the data point set. Thereafter, v₂ is compared with V_max and V_min.

If V_min ≤ v₂ ≤ V_max, it is directly checked whether the neighbor id sequence corresponding to v₁ contains v₂: if not, the returned result is a non-neighbor relation; if so, the returned result is a neighbor relation.

If v₂ < V_min or V_max < v₂, the hash length h = 32 × m − 3 − log k − w × b is calculated, and v₂ is substituted into the hash function f(v) = v % h. If the result of the hash function is i, bit 3 + log k + w × b + i of the code of v₁ is examined: if it is 0, the returned result is a non-neighbor relation; if it is 1, the returned result is that the neighbor relation cannot be determined. The coding structure of combined coding is shown in fig. 4.
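Neighbor detection on a combined code can be sketched as follows. The assumptions are a bit-string representation, a third part of `count_bits` bits (standing for "log k"), at least one directly written id, and a hash part indexed from 0 starting right after the fourth part; the function name, the result labels and the sample code are hypothetical.

```python
def detect_combined(code_v1, v2, b, count_bits, n):
    """Return 'neighbor', 'non-neighbor' or 'undetermined' for v2 from v1's combined code."""
    mode = code_v1[1:3]
    w = int(code_v1[3:3 + count_bits], 2)              # number of directly written ids
    ids_start = 3 + count_bits
    known = [int(code_v1[ids_start + i * b: ids_start + (i + 1) * b], 2)
             for i in range(w)]
    v_min, v_max = min(known), max(known)
    if mode == "00":                                   # Left-most: no smaller neighbor
        v_min = 0
    elif mode == "11":                                 # Right-most: no larger neighbor
        v_max = n
    if v_min <= v2 <= v_max:
        return "neighbor" if v2 in known else "non-neighbor"
    hash_bits = code_v1[ids_start + w * b:]            # fifth part, h bits long
    return "undetermined" if hash_bits[v2 % len(hash_bits)] == "1" else "non-neighbor"


# 32-bit code: flag 1, mode 01, count 010, ids 2 and 5, hash bits set at 1, 7 and 15.
hash_bits = "".join("1" if i in (1, 7, 15) else "0" for i in range(18))
code_v1 = "1" + "01" + "010" + "0010" + "0101" + hash_bits
for v2 in (5, 3, 7, 9):
    print(v2, detect_combined(code_v1, v2, b=4, count_bits=3, n=15))
# 5 neighbor
# 3 non-neighbor
# 7 undetermined
# 9 non-neighbor
```

An "undetermined" result is the case in which the hash bit is set, so the edge can only be confirmed by a further lookup.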

Second, the coding type of the first vertex is direct coding and the coding type of the second vertex is combined coding, that is, one vertex of the target query edge is directly coded and the other vertex is coded in the combined mode.

If the coding type of the first vertex is direct coding and the coding type of the second vertex is combined coding, determining the query result of the target query edge according to the coding type of the first vertex and the coding type of the second vertex comprises the following steps:

determining a neighbor identification sequence of neighbor identifications in a code corresponding to the second vertex;

determining a maximum neighbor identifier and a minimum neighbor identifier according to a neighbor identifier sequence of the neighbor identifiers;

determining a target value of the maximum neighbor identifier and a target value of the minimum neighbor identifier according to the specific position parameter in the code corresponding to the second vertex;

comparing the vertex identification of the first vertex, the target value of the maximum neighbor identification and the target value of the minimum neighbor identification to obtain a comparison result;

determining a query result corresponding to the target query edge according to the comparison result;

That is, if one of v₁ and v₂ is directly coded and the other is combined-coded, assume that v₁ uses the direct coding mode and v₂ uses the combined coding mode. The graph structure query device can first judge the relation of v₂ with respect to v₁ through the neighbor detection algorithm: if v₂ and v₁ are determined to be in a neighbor relation, the query result corresponding to the target query edge is an edge existence result; if they are determined not to be in a neighbor relation, the query result corresponding to the target query edge is an edge-absent result.

It should be noted that, the detection of the combined code in the neighbor detection algorithm has been described in detail above, and details are not described here.

Third, the coding type of the first vertex and the coding type of the second vertex are both combined coding, that is, both vertices of the target query edge are combined-coded.

If the coding type of the first vertex and the coding type of the second vertex are both combined codes, determining the query result of the target query edge according to the coding type of the first vertex and the coding type of the second vertex comprises:

determining a first query result corresponding to the target query edge based on the code corresponding to the first vertex;

determining a second query result corresponding to the target query edge based on the code corresponding to the second vertex;

if the first query result and the second query result contain a neighbor relation between the first vertex and the second vertex, determining the query result of the target query edge as an edge existence result;

if the first query result and the second query result contain the non-neighbor relation between the first vertex and the second vertex, determining the query result of the target query edge as the result without the edge;

and if the first query result and the second query result do not contain the neighbor relation between the first vertex and the second vertex and do not contain the non-neighbor relation between the first vertex and the second vertex, determining the query result of the target query edge by querying the bottom-layer vertex information database.

That is, if v₁ and v₂ both use the combined coding mode, the graph structure query device judges, through the neighbor detection algorithm, the relation of v₂ with respect to v₁ (i.e. the first query result) and the relation of v₁ with respect to v₂ (i.e. the second query result). If either of the two query results is a neighbor relation, the query result corresponding to the target query edge is an edge existence result; if both query results are a non-neighbor relation, the query result corresponding to the target query edge is an edge-absent result; if the relation between the two vertices cannot be determined from the two query results, it is obtained by querying the underlying database that stores the vertex information.
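The way the two query results are combined can be sketched as follows. This is an illustrative sketch only: each result is modeled as True (neighbor), False (non-neighbor) or None (cannot be determined), the database lookup is a hypothetical callable, and sending the mixed case of one non-neighbor result and one undetermined result to the database is a conservative assumption.

```python
from typing import Callable, Optional


def resolve_edge(first: Optional[bool], second: Optional[bool],
                 lookup_database: Callable[[], bool]) -> bool:
    """Combine the two detection results for an edge whose vertices are
    both combined-coded; returns True if the edge exists."""
    if first is True or second is True:
        return True                # either code confirms the edge
    if first is False and second is False:
        return False               # both codes rule the edge out
    return lookup_database()       # otherwise fall back to the vertex database


print(resolve_edge(True, None, lambda: False))   # True
print(resolve_edge(False, False, lambda: True))  # False
print(resolve_edge(None, False, lambda: True))   # True
```

Passing the lookup as a callable means the underlying database is consulted only when the codes alone cannot settle the query.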

When the graph data is updated, the operations are divided into insertion and deletion according to their nature. The following describes the code maintenance for inserting and deleting a single edge; an update involving multiple edges only needs to be decomposed into single edges and performed once per edge. For convenience of explanation, the inserted or deleted edge is denoted (v₁, v₂).

In one embodiment, the graph structure query device further performs the following operations:

acquiring a target updating edge, wherein the target updating edge comprises a third vertex and a fourth vertex;

determining the coding type of a third vertex and the coding type of a fourth vertex;

and updating the target updating edge to the graph structure coding database according to the coding type of the third vertex and the coding type of the fourth vertex.

In this embodiment, the graph structure querying device may obtain a target update edge that includes a third vertex and a fourth vertex. The target update edge may be an edge to be added or an edge to be deleted, which is not specifically limited. The graph structure querying device may then determine the coding type of the third vertex and the coding type of the fourth vertex, and update the target update edge into the graph structure coding database according to the coding types of the two vertices. How to add and delete is described in detail below according to the coding types of the two vertices of the target update edge.

First, the target update edge is an edge newly added to the graph structure coding database.

Step C1, query the codes corresponding to the two vertices v₁ and v₂ of the target update edge and check the coding flag bits to determine the coding types of the two vertices of the target update edge.

Step C2, if one of v₁ and v₂ is directly coded and the number of neighbors in its code is smaller than the upper bound b of the number of neighbors (the calculation of b has been described in detail above and is not repeated here), that vertex is updated; if both vertices satisfy these conditions, one of them is randomly selected for updating. The specific update is as follows:

Let v₁ be the vertex satisfying the above two conditions. Parse the neighbor identification sequence of v₁ from its original code, insert v₂ into the neighbor identification sequence of v₁ while keeping the ids in ascending order, update the code in the direct coding mode (described in detail above and not repeated here), and then perform the update operation on the graph structure coding database based on the updated code of vertex v₁.
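The update in step C2 can be sketched as follows. This is a minimal illustration under assumptions: the code is a string of '0'/'1' characters with flag bit 0 followed by fixed-width id fields, and the names `id_bits` (width of one id field) and `max_neighbors` (upper bound on the number of neighbors) are hypothetical and kept separate for clarity.

```python
import bisect


def insert_neighbor_direct(code: str, id_bits: int, max_neighbors: int, v2: int) -> str:
    """Insert v2 into a directly coded neighbor sequence, keeping ascending order."""
    body = code[1:]                       # skip the type flag bit
    ids = [int(body[i:i + id_bits], 2) for i in range(0, len(body), id_bits)]
    if len(ids) >= max_neighbors:
        raise ValueError("direct code is full; step C2 does not apply")
    if v2 not in ids:
        bisect.insort(ids, v2)            # keeps the ids sorted
    return "0" + "".join(format(v, f"0{id_bits}b") for v in ids)


old_code = "0" + "0011" + "1001"          # neighbors 3 and 9
new_code = insert_neighbor_direct(old_code, 4, 3, 5)
print(new_code)                           # 0001101011001
```

The returned string would then be written back to the graph structure coding database as the new code of v₁.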

Step C3, if one of v₁ and v₂ is directly coded with the number of neighbors in its code equal to the upper bound b of the number of neighbors, and the other vertex uses combined coding.

Assume that vertex v₁ is directly coded; then only the code of vertex v₁ is updated. Determine the neighbor identification sequence of vertex v₁, insert vertex v₂ into the neighbor identification sequence of vertex v₁ while keeping the ids in ascending order, re-encode in the combined coding mode (described in detail above and not repeated here), and then perform the update operation on the graph structure coding database based on the updated code of vertex v₁.

Step C4, if both v₁ and v₂ are directly coded and the number of neighbors of each vertex is equal to the upper bound b of the number of neighbors.

Acquire the neighbor identification sequences of the two vertices. Add v₁ to the neighbor identification sequence of v₂ and encode it in the combined coding mode to obtain the coding score s₁ and the code e₁ of vertex v₂ after v₁ has been added to its neighbor identification sequence. Then add v₂ to the neighbor identification sequence of v₁ and apply the combined coding mode to obtain the coding score s₂ and the code e₂ of vertex v₁ after v₂ has been added. Compare the two scores and update only the vertex with the higher score: assuming s₁ > s₂, only the code of the vertex from which s₁ was obtained is updated, with e₁ replacing its original code.

Step C5, if both v₁ and v₂ use the combined coding mode, first query the neighbor identification sequence of v₁ from the graph structure coding database as the initial neighbor identification sequence. Then query the code of each neighbor vertex of v₁ and run the neighbor detection algorithm (described in detail above and not repeated here) on that code with vertex v₁ as input; if the returned result is a neighbor relation, remove that neighbor vertex from the neighbor identification sequence of v₁. Then execute the combined coding mode to obtain the coding score s₁ and the code e₁. Perform the same operation on v₂ to obtain the coding score s₂ and the code e₂. Compare the two scores; assuming s₁ > s₂, only the code of v₁ is updated, with e₁ replacing the original code of v₁.

Second, the target update edge is an edge deleted from the graph structure coding database.

Step D1, query the codes corresponding to the two vertices v₁ and v₂ of the target update edge and check the coding flag bits to determine the coding types of v₁ and v₂.

Step D2, if v₁ and v₂ are both directly coded, first decode the code of v₁ to obtain the neighbor id sequence corresponding to v₁. If this sequence contains v₂, delete v₂ from it, perform direct coding again on the resulting sequence, and update the code corresponding to v₁ with the code obtained after re-encoding;

if the neighbor id sequence corresponding to v₁ does not contain v₂, decode the code of v₂ to obtain the neighbor id sequence corresponding to v₂. If this sequence contains v₁, delete v₁ from it, re-encode v₂ using the sequence obtained after deletion, and update the code corresponding to v₂ with the code obtained after re-encoding.

Step D3, if one of v₁ and v₂ uses the direct coding mode and the other uses the combined coding mode, assume that v₁ is directly coded. First decode the code of v₁ to obtain the neighbor id sequence corresponding to v₁. If this sequence contains v₂, delete v₂ from it, perform direct coding again, and update the code corresponding to v₁. If the sequence does not contain v₂, directly perform the database update operation.

Step D4, if both v₁ and v₂ use the combined coding mode, first judge from the code of v₁ whether it can be determined that v₂ is not a neighbor of v₁. How to determine the neighbor relation between two vertices has been described in detail above and is not repeated here.

If it cannot be determined that v₂ is not a neighbor of v₁, query the neighbor identification sequence of v₁ from the database and delete v₂ from it. If the length of the neighbor identification sequence after deletion is less than or equal to k, re-encode v₁ in the direct coding mode; otherwise, re-encode v₁ in the combined coding mode. Then perform the same steps for vertex v₂. Finally, update the database based on the codes of the two vertices obtained after re-encoding.
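The choice between the two coding modes after a deletion can be sketched as follows. This sketch is illustrative only: the two encoders are passed in as hypothetical callables, and the stand-in encoders in the usage lines merely tag their input so the selected mode is visible.

```python
from typing import Callable, Sequence

Encoder = Callable[[Sequence[int]], str]


def reencode_after_delete(neighbors: Sequence[int], removed: int, k: int,
                          encode_direct: Encoder, encode_combined: Encoder) -> str:
    """Drop `removed` from the neighbor sequence and re-encode the vertex,
    using direct coding when at most k neighbors remain."""
    remaining = [v for v in neighbors if v != removed]
    encoder = encode_direct if len(remaining) <= k else encode_combined
    return encoder(remaining)


# Stand-in encoders (hypothetical) that only label the chosen mode.
def tag_direct(ids: Sequence[int]) -> str:
    return "direct:" + ",".join(map(str, ids))


def tag_combined(ids: Sequence[int]) -> str:
    return "combined:" + ",".join(map(str, ids))


print(reencode_after_delete([2, 5, 7, 9], 5, 3, tag_direct, tag_combined))
# direct:2,7,9
print(reencode_after_delete([2, 5, 7, 9, 11], 5, 3, tag_direct, tag_combined))
# combined:2,7,9,11
```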

In summary, it can be seen that, in the embodiment provided by the present application, the graph data is pre-encoded in a direct encoding and combined encoding manner, and the encoded graph structure code is stored in the graph structure encoding database, so that when the query edge is input, the encoding types of two vertices of the query edge can be determined, and the query is performed according to the encoding types, thereby considering both time and space efficiency on the premise of ensuring the correctness of the query result, and reducing the time consumption of graph structure query.

The present application is described above in terms of a graph structure query method, and is described below in terms of a graph structure query device.

Referring to fig. 5, fig. 5 is a schematic view of a virtual structure of a graph structure query device according to an embodiment of the present application, where the graph structure query device 500 includes:

an obtaining unit 501, configured to obtain an input query set for a graph structure, where the input query set includes at least one input query edge;

a querying unit 502, configured to query, from a graph structure encoding database, codes of a first vertex and a second vertex corresponding to a target query edge, where the graph structure encoding database includes codes corresponding to multiple vertices including the two vertices of the target query edge, the target query edge is any one query edge in the input query set, and a coding type of each vertex in the multiple vertices is direct coding or combined coding;

a first determining unit 503, configured to determine a coding type of the first vertex and a coding type of the second vertex according to the coding of the first vertex and the coding of the second vertex;

a second determining unit 504, configured to determine a query result of the target query edge according to the coding type of the first vertex and the coding type of the second vertex.

In a possible design, if the coding type of the first vertex and the coding type of the second vertex are both direct coding, the second determining unit 504 is specifically configured to:

if the neighbor identification sequence corresponding to the first vertex contains the second vertex, determining that the first vertex and the second vertex are in a neighbor relation;

In one possible design, if the coding type of the first vertex is direct coding and the coding type of the second vertex is combinatorial coding, the second determining unit 504 is further specifically configured to:

determining a maximum neighbor identifier and a minimum neighbor identifier according to the neighbor identifier sequence of the neighbor identifiers;

determining a target value of the maximum neighbor identifier and a target value of the minimum neighbor identifier according to a specific position parameter in a code corresponding to the second vertex;

and determining a query result corresponding to the target query edge according to the comparison result.

In a possible design, if the coding type of the first vertex and the coding type of the second vertex are both combination coding, the second determining unit 504 is further specifically configured to:

if the first query result and the second query result contain the neighbor relation between the first vertex and the second vertex, determining the query result of the target query edge as an edge existence result;

if the first query result and the second query result contain the non-neighbor relation between the first vertex and the second vertex, determining that the query result of the target query edge is an edge-absent result;

and if the first query result and the second query result do not contain the neighbor relation between the first vertex and the second vertex and do not contain the non-neighbor relation between the first vertex and the second vertex, determining the query result of the target query edge by querying a bottom layer vertex information database.

In one possible design, the first determining unit 503 is further configured to:

step 1, performing vertex serial number mapping on a data point set to obtain a target data set corresponding to the data point set;

step 2, calculating the maximum digit number of the vertex identification and the decomposition parameter corresponding to the target data set;

step 3, determining a vertex degree corresponding to each data in the target data set;

step 4, determining a vertex identification of a first target vertex and vertex identifications of neighbor vertices of the first target vertex, wherein the first target vertex is the vertex with the minimum vertex degree in the target data set;

step 5, directly coding the vertex identification of the first target vertex and the vertex identification of the neighbor vertex of the first target vertex to obtain a code corresponding to the target vertex;

step 6, removing the first target vertex and the edge corresponding to the first target vertex from the target data set to obtain a first data set;

step 7, based on the first data set, iteratively executing the steps 3 to 6 until the vertex degree of each data in the target data set is greater than the decomposition parameter;

and 8, carrying out combined coding on the vertexes with the vertex degrees larger than the decomposition parameters in the target data set.
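Steps 1 to 8 above can be sketched as a degree-based peeling loop. This is an illustration under assumptions, not the implementation of the application: the graph is given as an edge list over already mapped vertex ids, ties in vertex degree are broken by the smaller id, the actual bit-level encoding is omitted, and the function name `split_by_degree` is hypothetical.

```python
def split_by_degree(edges, k):
    """Peel minimum-degree vertices while their degree is at most k.

    Returns (direct, combined): `direct` maps each peeled vertex to the
    neighbors it had when peeled; `combined` maps each remaining vertex
    (degree > k) to its remaining neighbors.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    direct = {}
    while adj:
        v = min(adj, key=lambda x: (len(adj[x]), x))   # minimum vertex degree
        if len(adj[v]) > k:
            break                  # every remaining degree exceeds k
        direct[v] = sorted(adj[v])
        for u in adj.pop(v):       # remove the vertex and its edges
            adj[u].discard(v)
    return direct, {v: sorted(ns) for v, ns in adj.items()}


# Triangle 1-2-3 with a pendant vertex 4 attached to vertex 1, k = 1.
direct, combined = split_by_degree([(1, 2), (2, 3), (1, 3), (1, 4)], 1)
print(direct)    # {4: [1]}
print(combined)  # {1: [2, 3], 2: [1, 3], 3: [1, 2]}
```

In the example, the edge (1, 4) is recorded only in the direct code of vertex 4, so vertex 1 no longer lists vertex 4 when it is later combined-coded.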

In one possible design, the first determining unit 503 performs combinatorial coding on the vertices in the target data set with vertex degrees greater than the decomposition parameter by:

step 1, determining an identification sequence corresponding to a second target vertex and an identification sequence of a neighbor node corresponding to the second target vertex, wherein the second target vertex is any one of vertexes in the target data set, of which the vertex degrees are greater than the decomposition parameter;

step 2, determining the coding score of the code corresponding to the sliding window, wherein the sliding window is the sliding window with the size of zero neighbor nodes;

step 3, if the coding score is larger than a preset optimal score, determining the window state of the sliding window, wherein the window state comprises the size and the position of the sliding window;

step 4, moving the sliding window according to a first moving rule based on the size and the position of the sliding window, and iteratively executing the step 2 to the step 3 until a preset termination condition is reached;

step 5, adjusting the size of the sliding window, and iteratively executing the step 2 to the step 4 based on the adjusted sliding window until the size of the sliding window is larger than a preset value;

and 6, coding based on the coding score corresponding to the target sliding window and the window state corresponding to the target sliding window to obtain the combined code corresponding to the second target vertex, wherein the target sliding window is the sliding window with the highest coding score.

In one possible design, the encoding, by the first determining unit 503, based on the encoding score corresponding to the target sliding window and the window state corresponding to the target sliding window, to obtain the combined encoding corresponding to the second target vertex includes:

setting a first position code corresponding to the second target vertex to be 1;

determining a second position code according to the neighbor identification corresponding to the second target vertex contained in the target sliding window;

setting the size of the target sliding window to be a third position code;

setting a continuous bit string corresponding to the identifier of the neighbor node contained in the target sliding window as a fourth position code;

performing hash function processing on the remaining nodes in the neighbor nodes corresponding to the second target vertex to obtain a fifth position code;

the combined code corresponding to the second target vertex includes the first position code, the second position code, the third position code, the fourth position code and the fifth position code.
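The assembly of the five position codes can be sketched as follows. This is a hedged illustration: the code is modeled as a string of '0'/'1' characters, the two-bit second position code is taken as an input because its derivation is not restated here, the hash function v % h is the one used in the neighbor detection described earlier, and the names `encode_combined`, `id_bits`, `log_k` and `total_bits` are hypothetical.

```python
def encode_combined(flag: str, window_ids: list[int], rest_ids: list[int],
                    id_bits: int, log_k: int, total_bits: int) -> str:
    """Assemble a combined code from its five position codes."""
    w = len(window_ids)
    if w >= 2 ** log_k:
        raise ValueError("window size does not fit in log_k bits")
    h = total_bits - 3 - log_k - w * id_bits      # length of the hash region
    if h <= 0:
        raise ValueError("no room left for the hash region")
    hash_bits = ["0"] * h
    for v in rest_ids:
        hash_bits[v % h] = "1"                    # fifth position code
    return ("1"                                   # first: combined-coding flag
            + flag                                # second: position parameter
            + format(w, f"0{log_k}b")             # third: window size
            + "".join(format(v, f"0{id_bits}b") for v in window_ids)  # fourth
            + "".join(hash_bits))


# Window holds neighbors 3 and 5; neighbors 9 and 12 go into the hash region.
code = encode_combined("00", [3, 5], [9, 12], 4, 2, 32)
print(len(code))   # 32
print(code[:13])   # 1001000110101
```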

In one possible design, the apparatus further includes:

an updating unit 505, the updating unit 505 being configured to:

determining a coding type of the third vertex and a coding type of the fourth vertex;

Fig. 6 is a schematic structural diagram of a server according to the present application. As shown in fig. 6, the server 600 of this embodiment includes at least one processor 601, at least one network interface 604 or other user interface 603, a memory 605, and at least one communication bus 602. The server 600 optionally contains the user interface 603, which includes a display, a keyboard or a pointing device. The memory 605 may comprise a high-speed RAM, and may also include a non-volatile memory, such as at least one disk memory. The memory 605 stores execution instructions; when the server 600 runs, the processor 601 communicates with the memory 605 and calls the instructions stored in the memory 605 to execute the graph structure query method. The operating system 606 contains various programs for implementing basic services and for handling hardware-dependent tasks.

The server provided in the embodiment of the present application may execute the technical solution of the above embodiment of the graph structure query method; the implementation principle and the technical effect are similar and are not described herein again.

The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a computer, implements the method flows related to the graph structure query device in any of the above method embodiments. Correspondingly, the computer can be the graph structure query device.

The present application further provides a computer program or a computer program product including the computer program, which when executed on a computer, will make the computer implement the method flow related to the graph structure query device in any of the above method embodiments. Correspondingly, the computer can be the graph structure query device.

In the above-described embodiment corresponding to fig. 1, all or part of the implementation may be realized by software, hardware, firmware or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A method for querying a graph structure, comprising:

querying codes of a first vertex and a second vertex corresponding to a target query edge from a graph structure coding database, wherein the graph structure coding database stores codes corresponding to a plurality of vertices including two vertices of the target query edge, the target query edge is any one query edge in the input query set, and the coding type of each vertex in the plurality of vertices is direct coding or combined coding;

2. The method of claim 1, wherein if the coding type of the first vertex and the coding type of the second vertex are both direct codes, the determining the query result of the target query edge according to the coding type of the first vertex and the coding type of the second vertex comprises:

3. The method of claim 1, wherein if the coding type of the first vertex is direct coding and the coding type of the second vertex is combinatorial coding, the determining the query result of the target query edge according to the coding type of the first vertex and the coding type of the second vertex comprises:

and determining the query result corresponding to the target query edge according to the comparison result.

4. The method of claim 1, wherein if the coding type of the first vertex and the coding type of the second vertex are both combined codes, the determining the query result of the target query edge according to the coding type of the first vertex and the coding type of the second vertex comprises:

5. The method according to any one of claims 1 to 4, further comprising:

6. The method of claim 5, wherein the combinatorial encoding of vertices in the target dataset having a degree of vertices greater than the decomposition parameter comprises:

step 2, determining the coding score of the code corresponding to the sliding window;

7. The method of claim 6, wherein the encoding based on the encoding score corresponding to the target sliding window and the window state corresponding to the target sliding window to obtain the combined encoding corresponding to the second target vertex comprises:

setting the size of the target sliding window to be a third position code;

8. The method of any one of claims 1 to 4, 6 and 7, further comprising:

9. A graph structure query device, comprising:

the query device is used for querying codes of a first vertex and a second vertex corresponding to a target query edge from a graph structure coding database, wherein the graph structure coding database stores codes corresponding to a plurality of vertices including two vertices of the target query edge, the target query edge is any one query edge in the input query set, and the coding type of each vertex in the plurality of vertices is direct coding or combined coding;

10. A computer storage medium, comprising:

instructions which, when run on a computer, cause the computer to perform the steps of the graph structured query method of any one of claims 1 to 8.