CN112819154A - Pre-training model generation method and device applied to graph learning field - Google Patents

Pre-training model generation method and device applied to graph learning field

Info

Publication number
CN112819154A
Authority
CN
China
Prior art keywords
graph
node
sample
training
vector
Prior art date
Legal status
Granted
Application number
CN202110072779.8A
Other languages
Chinese (zh)
Other versions
CN112819154B (en)
Inventor
杨洋
邵平
王春平
徐晟尧
胥奇
陈磊
Current Assignee
Shanghai Shanghu Information Technology Co ltd
Zhejiang University ZJU
Original Assignee
Shanghai Shanghu Information Technology Co ltd
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Shanghai Shanghu Information Technology Co ltd, Zhejiang University ZJU filed Critical Shanghai Shanghu Information Technology Co ltd
Priority to CN202110072779.8A priority Critical patent/CN112819154B/en
Publication of CN112819154A publication Critical patent/CN112819154A/en
Application granted granted Critical
Publication of CN112819154B publication Critical patent/CN112819154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for generating a pre-training model applied to the field of graph learning. The method comprises: determining a first node and a first sample subgraph of the first node from a first graph structure sample, where the first graph structure sample is any one of the first training samples and the attributes of its nodes differ from the attributes of the nodes of a second graph structure sample in the first training samples; determining a graph element label vector of the first node and an initial representation vector of the first sample subgraph; obtaining a first feature representation vector and a graph element prediction vector of the first node; and training an initial pre-training model according to the graph element prediction vector and the graph element label vector until the pre-training model is obtained. The pre-training model is thus obtained from a plurality of graph structure samples with different attributes, so it can serve as a pre-training model for graph structures of every attribute.

Description

Pre-training model generation method and device applied to graph learning field
Technical Field
The invention relates to the field of graph learning, in particular to a method and a device for generating a pre-training model applied to the field of graph learning.
Background
A graph convolutional neural network learns representation vectors of the nodes in a graph by aggregating the features of neighboring nodes, and can handle downstream tasks such as node classification and graph classification. As a data structure, the graph has unique advantages for representing the association information between objects of study. For example, in biochemistry a molecule can be regarded as a graph in which atoms are nodes connected by chemical bonds; in an academic citation network, nodes represent scholars and cooperation between scholars forms the association information between nodes, i.e., the edges of the graph structure; in electronic commerce, users and commodities form a graph structure used for personalized recommendation.
However, the semantics represented by the nodes differ across graph structures. For example, in the graph structure of an academic citation network each node represents a scholar, while in the graph structure of a social network the nodes also carry the interests of users. In the prior art, a graph convolutional neural network model is usually trained directly on a data set without any pre-training. Even where pre-training is used, the pre-training model can only be obtained from graph structures with a single attribute, or with a family of attributes from one specific field, and the resulting pre-training model can only be applied to graph structures of that same attribute or field; it cannot be used for graph structures with other attributes. That is, a pre-training model cannot be obtained from graph structures with different attributes, and the obtained pre-training model cannot be used to train structures of different attributes into a final model.
Therefore, a model pre-training method is needed in which one pre-training model can be obtained from graph structures of every attribute, and a model for each attribute can then be derived from that pre-training model, so as to improve model training efficiency.
Disclosure of Invention
The embodiment of the invention provides a method and a device for generating a pre-training model applied to the field of graph learning, which are used for obtaining the pre-training model through a graph structure sample of each attribute, obtaining a graph neural network model of each attribute through the obtained pre-training model and accelerating the training efficiency of the graph neural network model of each attribute.
In a first aspect, an embodiment of the present invention provides a method for generating a pre-training model applied to the field of graph learning, including:
determining a first node from a first graph structure sample and determining a first sample subgraph of the first node; the first graph structure sample is any one of first training samples; the attributes of each node of the first graph structure sample are different from the attributes of each node of the second graph structure sample in the first training sample;
determining a graph element label vector of the first node and an initial representation vector of the first sample sub-graph according to the first graph structure sample;
inputting the initial expression vector of the first sample subgraph into an initial pre-training model to obtain a first feature expression vector; inputting the first feature expression vector into a first initial neural network model to obtain a graph element prediction vector of the first node;
and training the initial pre-training model according to the graph element prediction vector and the graph element label vector until the pre-training model is obtained.
In the above technical solution, nodes are determined from a plurality of graph structure samples and a sample subgraph is obtained for each node; graph element label vectors of the nodes and initial representation vectors of the sample subgraphs are then obtained, where the feature representation vector of a sample subgraph represents the structural information of the node in the graph structure sample. The initial representation vectors of the sample subgraphs are input into an initial pre-training model to obtain feature representation vectors, and the feature representation vectors are input into a first initial neural network model to obtain graph element prediction vectors of the nodes; finally the initial pre-training model is trained according to the graph element prediction vectors and the graph element label vectors until the pre-training model is obtained. Because graph elements represent the structural information of nodes in the graph structure samples, and the attributes of the nodes differ across the graph structure samples, a pre-training model obtained from feature representation vectors and graph element vectors that capture structural information is a pre-training model obtained from graph structure samples of every attribute; that is, it can be used to train graph structure samples of any attribute to obtain a graph neural network model of the corresponding attribute.
Optionally, determining a first node from the first graph structure sample and determining a first sample subgraph of the first node includes:
randomly selecting a node in the first graph structure sample as the first node;
and obtaining a first sample sub-graph of the first node through a random walk with restart strategy algorithm according to the connection relation among the nodes in the first graph structure sample.
Obtaining the sample subgraph of a node through the random walk with restart strategy algorithm can increase the generalization capability of the resulting pre-training model. Moreover, running the random walk algorithm several times yields several sample subgraphs of a node, which increases the number of sample subgraphs and thus the number of pre-training samples.
Optionally, determining a graph element tag vector of the first node according to the first graph structure sample includes:
and counting the number of the graph elements of each preset type of the first node according to the connection relation among the nodes in the first graph structure sample, and obtaining the graph element label vector of the first node according to the number of the graph elements of each preset type.
Optionally, determining an initial representation vector of the first sample sub-graph according to the first graph structure sample includes:
obtaining an initial representation vector of the first sample subgraph according to the following formula (1);
I - K^(-1/2) M K^(-1/2) = U Λ U^T    (1);
wherein I is an identity matrix; K is the degree matrix of the first node in the first graph structure sample; M is the adjacency matrix of the first node in the first graph structure sample; U is the initial representation vector of the first sample subgraph.
Optionally, training the initial pre-training model according to the primitive prediction vector and the primitive label vector until the pre-training model is obtained, includes:
determining a vector difference value according to the graph element prediction vector and the graph element label vector by the following formula (2);
L = (1/V) Σ_{u∈G} |c_u - f(G'_u)|    (2);
wherein V is a preset value; c_u is the graph primitive label vector of the first node; f(G'_u) is the graph element prediction vector of the first node; G is the first graph structure sample; L is the vector difference value;
updating the initial pre-training model and the first initial neural network model according to the vector difference value; and obtaining the pre-training model until the vector difference value meets a set condition.
Optionally, after obtaining the pre-training model, the method further includes:
constructing an initial model, wherein the initial model comprises the pre-training model and a second initial neural network model;
aiming at any third graph structure sample in a second training sample, inputting the third graph structure sample into the pre-training model to obtain a second feature expression vector; inputting the second feature expression vector into the second initial neural network model to obtain a predicted value of the third graph structure sample; nodes of each graph structure sample in the second training sample have the same attribute;
and training the initial model according to the predicted value and the label value of the third graph structure sample until a training end condition is met.
In the above technical solution, the pre-training model is obtained according to the graph structure sample of each attribute, that is, the pre-training model is equivalent to a comprehensive pre-training model, and the graph neural network model of the corresponding attribute is obtained by training according to the graph structure sample of each attribute through the pre-training model, so that the training efficiency of the graph neural network model of each attribute is increased, and the model training time is reduced.
Optionally, training the initial model according to the predicted value and the label value of the third graph structure sample, including:
updating the second initial neural network model according to the predicted value and the label value of the third graph structure sample; or updating the pre-training model and the second initial neural network model according to the predicted value and the label value of the third graph structure sample.
In this technical solution, training the initial model by updating only the second initial neural network model increases training efficiency and reduces training time, while training the initial model by updating both the pre-trained model and the second initial neural network model increases the accuracy of the trained initial model.
In a second aspect, an embodiment of the present invention provides an apparatus for generating a pre-training model applied in the graph learning field, including:
the selection module is used for determining a first node from a first graph structure sample and determining a first sample subgraph of the first node; the first graph structure sample is any one of first training samples; the attributes of each node of the first graph structure sample are different from the attributes of each node of the second graph structure sample in the first training sample;
a processing module, configured to determine, according to the first graph structure sample, a graph element tag vector of the first node and an initial representation vector of the first sample sub-graph;
inputting the initial expression vector of the first sample subgraph into an initial pre-training model to obtain a first feature expression vector; inputting the first feature expression vector into a first initial neural network model to obtain a graph element prediction vector of the first node;
and training the initial pre-training model according to the graph element prediction vector and the graph element label vector until the pre-training model is obtained.
Optionally, the selection module is specifically configured to:
randomly selecting a node in the first graph structure sample as the first node;
and obtaining a first sample sub-graph of the first node through a random walk with restart strategy algorithm according to the connection relation among the nodes in the first graph structure sample.
Optionally, the processing module is specifically configured to:
and counting the number of the graph elements of each preset type of the first node according to the connection relation among the nodes in the first graph structure sample, and obtaining the graph element label vector of the first node according to the number of the graph elements of each preset type.
Optionally, obtaining an initial representation vector of the first sample sub-graph according to the following formula (1);
I - K^(-1/2) M K^(-1/2) = U Λ U^T    (1);
wherein I is an identity matrix; K is the degree matrix of the first node in the first graph structure sample; M is the adjacency matrix of the first node in the first graph structure sample; U is the initial representation vector of the first sample subgraph.
Optionally, the processing module is specifically configured to:
determining a vector difference value according to the graph element prediction vector and the graph element label vector by the following formula (2);
L = (1/V) Σ_{u∈G} |c_u - f(G'_u)|    (2);
wherein V is a preset value; c_u is the graph primitive label vector of the first node; f(G'_u) is the graph element prediction vector of the first node; G is the first graph structure sample; L is the vector difference value;
updating the initial pre-training model and the first initial neural network model according to the vector difference value; and obtaining the pre-training model until the vector difference value meets a set condition.
Optionally, the processing module is further configured to:
after the pre-training model is obtained, constructing an initial model, wherein the initial model comprises the pre-training model and a second initial neural network model;
aiming at any third graph structure sample in a second training sample, inputting the third graph structure sample into the pre-training model to obtain a second feature expression vector; inputting the second feature expression vector into the second initial neural network model to obtain a predicted value of the third graph structure sample; nodes of each graph structure sample in the second training sample have the same attribute;
and training the initial model according to the predicted value and the label value of the third graph structure sample until a training end condition is met.
Optionally, the processing module is specifically configured to:
updating the second initial neural network model according to the predicted value and the label value of the third graph structure sample; or updating the pre-training model and the second initial neural network model according to the predicted value and the label value of the third graph structure sample.
In a third aspect, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing the above method for generating a pre-training model applied to the graph learning field according to the obtained program.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored, and the computer-executable instructions are configured to enable a computer to execute the above method for generating a pre-training model applied in the graph learning field.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of a diagram structure provided by an embodiment of the present invention;
FIG. 2 is a system architecture diagram according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a method for generating a pre-training model applied to the field of graph learning according to an embodiment of the present invention;
FIG. 4 is a diagram of a graphics primitive provided in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of a model calculation according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating an initial pre-training model calculation according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a device for generating a pre-training model applied to the graph learning field according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the prior art, graph structures are various. Taking diagram a in fig. 1 as an example, a represents a graph structure comprising nodes (e.g., A and B) and edges between the nodes; nodes represent the objects to be analyzed, and edges represent association relationships with certain attributes between the objects. Nodes differ across graph structures: the number of edges between nodes differs, the neighbors of the same node differ, nodes and edges have no fixed order, node attributes may be the same or different, edges may or may not be weighted, and a graph may be dynamic or static. The graph structure bridges real-world data sets and artificial intelligence technology, and has great research significance and practical value. Many problems in real-life scenarios can be converted into classical problems in the field of graph learning; finding a fraudster in a telecommunications network, for example, can be understood as an abnormal-node detection problem in a graph structure.
Graph neural network models are typically trained with a large number of graph structures as training data. In real application scenarios, however, it is often difficult to obtain sufficient training data, and pre-training techniques are often used to address the shortage. Pre-training has enjoyed success in computer vision and natural language processing, but pre-training in the field of graph learning remains a challenging problem. The reason is that objects have a uniform meaning in natural language processing and computer vision; for example, in natural language processing the same words and common phrases imply the same semantics across different texts. In the field of graph learning, however, the attributes of graph structures differ. Fig. 1 exemplarily shows such graph structures: diagram a in fig. 1 is a graph structure with social attributes, where nodes represent users and the attribute between user A and user B is the same hobby; diagram b in fig. 1 is a graph structure with academic attributes, where nodes represent users and the attribute between user A and user C is the same academic research direction. Pre-training across graph structures of different attributes is therefore not possible.
For graph structures with different attributes, a pre-trained model cannot be obtained from a plurality of graph structures in the prior art. Therefore, a method for generating a pre-training model is needed to reduce the training time of graph neural network models and improve their training efficiency.
Fig. 2 illustrates a system architecture to which an embodiment of the present invention is applicable, which includes a server 200, and the server 200 may include a processor 210, a communication interface 220, and a memory 230.
The communication interface 220 is used to obtain a graph structure sample of each attribute.
The processor 210 is a control center of the server 200, connects various parts of the entire server 200 using various interfaces and routes, performs various functions of the server 200 and processes data by operating or executing software programs and/or modules stored in the memory 230 and calling data stored in the memory 230. Alternatively, processor 210 may include one or more processing units.
The memory 230 may be used to store software programs and modules, and the processor 210 executes various functional applications and data processing by operating the software programs and modules stored in the memory 230. The memory 230 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to a business process, and the like. Further, memory 230 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
It should be noted that the structure shown in fig. 2 is only an example, and the embodiment of the present invention is not limited thereto.
Based on the above description, fig. 3 exemplarily shows a flow of a method for generating a pre-training model applied to the graph learning field, which can be performed by a device for generating a pre-training model applied to the graph learning field according to an embodiment of the present invention.
As shown in fig. 3, the process specifically includes:
step 310, a first node is determined from the first graph structure sample and a first sample subgraph of the first node is determined.
In this embodiment of the present invention, the first graph structure sample is any one of the first training samples, and the attributes of the nodes of the first graph structure sample differ from the attributes of the nodes of the second graph structure sample in the first training samples. In other words, the first training samples include a plurality of graph structures with different attributes, each constructed from a single attribute. For example, the first training samples include graph structure 1, graph structure 2 and graph structure 3: graph structure 1 is a graph structure whose node attribute is a social attribute, i.e., nodes of graph structure 1 joined by an edge have the same interest; graph structure 2 is a graph structure whose node attribute is an academic attribute, i.e., nodes of graph structure 2 joined by an edge have the same research direction; and graph structure 3 is a graph structure whose node attribute is a taste attribute, i.e., nodes of graph structure 3 joined by an edge have the same eating taste. The first graph structure sample is any one of graph structures 1, 2 and 3, and the second graph structure sample is any one of graph structures 1, 2 and 3 other than the first graph structure sample.
In the embodiment of the present invention, the first node may be determined as follows: the nodes are sorted by the number of edges of each node in the first graph structure and the first node is selected in order of increasing or decreasing edge count; or the first node is randomly selected from the first graph structure; or a node is selected according to its position in the graph structure, and so on.
The first sample sub-graph is a sub-graph formed by finding the nodes that have a direct or indirect association relation with the first node in the first graph structure sample. Fig. 5 exemplarily shows a schematic diagram of a model calculation: fig. 5(1) is a graph structure sample, the black node in fig. 5(2) (node A in fig. 5(3)) is the determined first node, and fig. 5(3) is the first sample sub-graph of the first node. Of course, the first sample sub-graph of the first node may also be a sub-graph formed by the four nodes A, B, C and D. In this embodiment, the first sample subgraph of the first node may also be determined according to the number of edges of the other nodes connected to the first node. For example, node x is connected to node y and node z, where node y has 5 edges and node z has 3 edges; since the number of edges of node y exceeds the number threshold (e.g., a threshold of 4), the sample subgraph of node x includes node x and node y.
Further, a node in the first graph structure sample is randomly selected as the first node, and the first sample subgraph of the first node is obtained through a random walk with restart strategy algorithm according to the connection relation between the nodes in the first graph structure sample. Obtaining the first sample subgraph of the first node in this way increases the generalization capability of the trained pre-training model.
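For illustration only, the random walk with restart sampling described above can be sketched as follows in Python, assuming the graph is held as an adjacency-list dict; the walk length and restart probability are illustrative choices not specified by the patent:

```python
import random

def rwr_sample_subgraph(adj, start, steps=64, restart_p=0.5):
    """Collect the node set of a sample subgraph around `start`
    by a random walk with restart."""
    visited = {start}
    cur = start
    for _ in range(steps):
        # with probability restart_p (or at a dead end), jump back to the seed node
        if random.random() < restart_p or not adj[cur]:
            cur = start
        else:
            cur = random.choice(list(adj[cur]))
        visited.add(cur)
    return visited
```

Running the sampler several times yields several different sample subgraphs for the same node, matching the augmentation described above.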
Step 320, determining a graph element label vector of the first node and an initial representation vector of the first sample sub-graph according to the first graph structure sample.
Graph elements (motifs) are the basic modules that make up a graph, describing the subgraph structure of a particular connection between different nodes. Graph elements have wide application in bioinformatics, neuroscience, biology, social networks and other fields. As frequently appearing subgraph structures in a graph, graph elements contain abundant information. For graph structures of different attributes, a graph element with a denser structure implies a more intimate relationship between nodes, such as Bob and his two close friends in a social network, or Bob and his two collaborators in an academic network. In contrast, a graph element with a looser structure implies the opposite.
Graph elements include a plurality of types; in the embodiment of the invention, undirected graph elements of orders 2 to 4 are selected. As shown in fig. 4, the preset types of graph elements include 15 types, where the black node is the determined first node.
In the embodiment of the invention, the graph element label vector of the first node is obtained according to the connection relation between the nodes in the graph structure sample. The graph primitive label vector represents a proportion of each graph primitive in the first graph structure sample of the first node, and may be specifically determined according to a weight of an edge connecting the first node and another node, or according to the number of the graph primitives.
Further, according to the connection relation among the nodes in the first graph structure sample, the number of graph elements of each preset type of the first node is counted, and the graph element label vector of the first node is obtained from these counts. The graph element label vector (motif count vector) c_u of a node u records how many times each type of graph element containing the node appears around it; c_u^(i) indicates the number of times the i-th graph element appears around node u. In the embodiment of the present invention, for each graph structure sample, the orca algorithm may be used to calculate the graph element label vector of each node in the graph structure sample.
As shown in FIG. 4, the number of graph elements of each preset type around the first node in the graph structure sample can be determined from the preset graph element types. For example, the graph element consisting of only two nodes in FIG. 4 is of type m_0; in diagram a of FIG. 1, the number of m_0-type graph elements of node A is 4. Similarly, the number of graph elements of every other preset type of node A can be determined.
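As a simplified illustration of such counting (the patent uses the orca algorithm to count all 15 motif types of orders 2 to 4), the following sketch counts only the order-2 element m_0 and the two order-3 shapes around a node u, and only wedges centred at u:

```python
from itertools import combinations

def motif_counts_low_order(adj, u):
    """Count low-order graph elements around node u.
    adj: dict mapping node -> set of neighbour nodes."""
    nbrs = adj[u]
    m0 = len(nbrs)                     # m_0: one count per edge incident to u
    wedges, triangles = 0, 0
    for a, b in combinations(nbrs, 2):
        if b in adj[a]:
            triangles += 1             # u, a, b form a closed triangle
        else:
            wedges += 1                # open wedge centred at u
    return [m0, wedges, triangles]     # a prefix of the full 15-dim label vector
```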
In the embodiment of the present invention, the initial representation vector of the first sample sub-graph may be obtained from the degree matrix of the first node in the first graph structure sample; for example, the initial representation vector of the first sample sub-graph is the inverse or the transpose of that degree matrix. It may also be obtained from the adjacency matrix of the first node in the first graph structure sample; for example, the initial representation vector is the inverse or the transpose of that adjacency matrix, or the product of the degree matrix and the adjacency matrix of the first node in the first graph structure sample.
Further, determining an initial representation vector of the first sample sub-graph according to the first graph structure sample, including:
obtaining an initial expression vector of the first sample subgraph according to the following formula (1);
I - K^(-1/2) M K^(-1/2) = U Λ U^T    (1);
wherein I is an identity matrix; K is the degree matrix of the first node in the first graph structure sample; M is the adjacency matrix of the first node in the first graph structure sample; U is the initial representation vector of the first sample subgraph.
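A minimal numpy sketch of formula (1), assuming M is given as a dense adjacency matrix of the sample subgraph and K is derived from it as the diagonal degree matrix:

```python
import numpy as np

def initial_representation(M):
    """Solve I - K^(-1/2) M K^(-1/2) = U Λ U^T and return U."""
    deg = M.sum(axis=1)
    k_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))  # K^(-1/2), guarding isolated nodes
    lap = np.eye(M.shape[0]) - k_inv_sqrt @ M @ k_inv_sqrt       # normalized Laplacian
    eigvals, U = np.linalg.eigh(lap)   # symmetric matrix, so eigh yields U Λ U^T
    return U
```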
Step 330, inputting the initial representation vector of the first sample sub-graph into an initial pre-training model to obtain a first feature representation vector; and inputting the first feature expression vector into a first initial neural network model to obtain a graph element prediction vector of the first node.
In the embodiment of the invention, the feature representation vector of the first node can be obtained by inputting the initial representation vector of the sample subgraph of the first node into the initial pre-training model; the feature representation vector represents the structural information of the first node in the graph structure sample, and the graph element prediction vector of the first node is then obtained through the first initial neural network model. As shown in fig. 5, node A is determined from the graph structure by randomly selecting a node, and a sample subgraph of node A is obtained through the random walk with restart strategy algorithm. The initial representation of the sample subgraph of node A is then determined according to formula (1) above, e.g., [3, 2, 2, 4, 3, 5, 1]. The initial representation vector of the sample subgraph of node A is input into the initial pre-training model to obtain the feature representation vector of node A, e.g., a 32-dimensional vector [2, 4, ..., 3, 2]. The feature representation vector of node A is input into the first initial neural network model to obtain the graph element prediction vector of node A; this reduces the dimension of the feature representation vector of node A so that the obtained graph element prediction vector has the same dimension as the graph element label vector of node A. The graph element prediction vector of node A represents the predicted number of graph elements of each preset type of node A.
Fig. 6 exemplarily shows the calculation of the initial pre-training model. As shown in fig. 6, the initial representation vector of the sample sub-graph is used as the input of the initial pre-training model to obtain a first-layer training value for each node in the sample sub-graph; for example, the first-layer training value of node B is obtained from node A and node C. Similarly, the first-layer training value of node A is obtained from the first-layer training values of nodes B, C, E and D, and training continues through the first-layer training values to obtain the second-layer training values of nodes A to G, until the final feature representation vector of node A is obtained.
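A small sketch of this layer-by-layer aggregation, assuming mean aggregation over neighbours (the patent does not fix the aggregation function, and trainable weights are omitted):

```python
import numpy as np

def layerwise_training_values(adj, h0, num_layers=2):
    """adj: dict node -> list of neighbours; h0: dict node -> initial vector.
    Each layer recomputes a node's value from its neighbours' previous-layer values."""
    h = dict(h0)
    for _ in range(num_layers):
        # isolated nodes keep their own value instead of averaging an empty set
        h = {u: np.mean([h[v] for v in adj[u]] or [h[u]], axis=0) for u in adj}
    return h
```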
It should be noted that the feature representation vector of the first node is input into the first initial neural network model so that the obtained graph element prediction vector of the first node has the same dimension as the graph element label vector of the first node counted in step 320. For example, as shown in fig. 5, the counted graph element label vector of the first node is 15-dimensional; the initial representation vector of the sample sub-graph of node A is input into the initial pre-training model to obtain a 64-dimensional feature representation vector, and the 64-dimensional feature representation vector is input into the first initial neural network model to obtain the 15-dimensional graph element prediction vector of the first node.
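The dimension alignment can be pictured with a PyTorch sketch, where a two-layer MLP stands in for the initial pre-training model (the patent does not fix its architecture) and a linear layer stands in for the first initial neural network model:

```python
import torch
import torch.nn as nn

FEAT_DIM, MOTIF_TYPES = 64, 15         # dimensions from the running example

class PretrainPipeline(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.encoder = nn.Sequential(  # stand-in for the initial pre-training model
            nn.Linear(in_dim, FEAT_DIM), nn.ReLU(),
            nn.Linear(FEAT_DIM, FEAT_DIM))
        self.head = nn.Linear(FEAT_DIM, MOTIF_TYPES)  # first initial neural network model

    def forward(self, u_init):
        feat = self.encoder(u_init)    # first feature representation vector (64-dim)
        return self.head(feat)         # graph element prediction vector (15-dim)
```

For the 7-dimensional initial representation in the example above, `PretrainPipeline(in_dim=7)` maps it to a 15-dimensional prediction matching the label vector.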
Step 340, training the initial pre-training model according to the graph element prediction vector and the graph element label vector until the pre-training model is obtained.
In the embodiment of the invention, a vector can be calculated from the graph primitive prediction vector and the graph primitive label vector, and a value determined from that vector is used to train the initial pre-training model. For example, the ratio of each dimension of the graph primitive prediction vector to the corresponding dimension of the graph primitive label vector gives a 15-dimensional ratio vector; the variance of the 15 values of the ratio vector is determined, and the variance is used to train the initial pre-training model.
Determining a vector difference value of the first node according to the graph base element prediction vector and the graph base element label vector of the first node, and training the initial pre-training model according to the vector difference value of the first node, specifically, determining the vector difference value according to the graph base element prediction vector and the graph base element label vector by the following formula (2);
L = (1/V) Σ_{u∈G} |c_u - f(G'_u)|    (2);
wherein V is a preset value; c_u is the graph primitive label vector of the first node; f(G'_u) is the graph element prediction vector of the first node; G is the first graph structure sample; L is the vector difference value. It should be noted that V may also be the number of nodes in the first graph structure sample. For example, if the number of nodes in the first graph structure sample is 1000, then V = 1000; c_u is the graph primitive label vector of the u-th node in the first graph structure sample, a 15-dimensional vector, and f(G'_u) is the graph element prediction vector of the u-th node, also a 15-dimensional vector. A 15-dimensional difference vector is obtained through |c_u - f(G'_u)|, its 15 entries are summed, and the vector difference value of the u-th node is obtained according to V.
And updating the initial pre-training model and the first initial neural network model according to the vector difference value until the vector difference value meets the set condition to obtain the pre-training model.
In the embodiment of the invention, the vector difference value of each node is used as the error term, or the loss value of the initial pre-training model, in the gradient descent method, so that back-propagation is performed through the initial pre-training model and the first initial neural network model until the loss function of the initial pre-training model meets the set condition and the pre-training model is obtained. Alternatively, the average of the vector difference values of a plurality of nodes can be used as the error term or loss value. For example, continuing the example in step 330, from formula (2) above the difference between the 15-dimensional graph primitive prediction vector and the 15-dimensional graph primitive label vector of the first node is determined, and the vector difference value of the first node is obtained by summation, e.g., 0.9; similarly, the vector difference value of the second node is determined to be 1.0, vector difference values are determined for 64 nodes in total, and their average is used as the error term or loss value of the initial pre-training model in the gradient descent method.
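Under the L1 reading of |c_u - f(G'_u)| above, the error term can be sketched as:

```python
import torch

def vector_difference_loss(pred, label, V):
    """pred, label: (num_nodes, 15) graph element prediction / label vectors;
    V: the preset value (e.g. the number of nodes in the graph sample)."""
    per_node = (pred - label).abs().sum(dim=1) / V  # vector difference value per node
    return per_node.mean()                          # averaged error term for gradient descent
```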
The invention utilizes the graph base elements as the labels of the training samples of the pre-training model, and the initial pre-training model trained by utilizing the graph base elements in the graph structure has the following two benefits.
(1) High-order structure information: different primitives represent different roles that the nodes assume at the architectural level. If the graph primitive label vectors of two nodes are relatively close, the two nodes will be considered to be structurally relatively similar.
(2) Structural level: the graph elements disclose the structural information of each node in the graph structure sample, and the invention captures the structural information by using an initial pre-training model, so that the trained pre-training model can distinguish the types and the number of the graph elements and can distinguish different semantics of each node in different graph elements.
In the embodiment of the present invention, before determining the first sample sub-graph of the first node, preprocessing is further performed on a graph structure in the first training sample. Specifically, the graph structure in the first training sample is subjected to a preset algorithm, such as an orca algorithm, to remove the weight of edges between nodes in the graph structure, so that the subsequent calculation amount is reduced, and the calculation is simplified.
In the embodiment of the invention, after the pre-training model is obtained, the final model can be obtained through the third graph structure sample and the pre-training model.
Further, an initial model is constructed comprising the pre-training model and a second initial neural network model. For any third graph structure sample in the second training sample, the third graph structure sample is input into the pre-training model to obtain a second feature representation vector, and the second feature representation vector is input into the second initial neural network model to obtain a predicted value of the third graph structure sample; the nodes of each graph structure sample in the second training sample have the same attribute. The initial model is trained according to the predicted value and the label value of the third graph structure sample until the training end condition is met.
In the embodiment of the invention, model training is carried out through the pre-training model and the second training sample with the same attribute, so that the graph neural network model is obtained, and the efficiency of model training is increased. For example, the attributes of the graph structure samples in the second training sample are all graph structure samples of communication attributes, for example, nodes are users, the attributes between the nodes are historical communication records, and a graph neural network model is obtained through the pre-training model and the graph structure samples of the communication attributes, and is used for finding out abnormal users in the graph structure of the communication attributes, where the abnormal users may be marketing users, fraudulent users, and the like.
It should be noted that, in the training process, the parameters of the pre-trained model and/or the second initial neural network model may be selectively updated for practical applications to obtain the graph neural network model.
And further, updating the second initial neural network model according to the predicted value and the label value of the third graph structure sample, or updating the pre-training model and the second initial neural network model according to the predicted value and the label value of the third graph structure sample.
In the embodiment of the invention, the initial model can be trained in two modes. In the first mode, only the second initial neural network model is updated when training the initial model, which increases training efficiency and reduces training time.
In the second mode, both the pre-training model and the second initial neural network model are updated when training the initial model, which increases the accuracy of the trained initial model.
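The two modes amount to choosing which parameters the optimizer is allowed to update; a sketch assuming PyTorch, where the attribute names `pretrained` and `task_head` and the learning rate are illustrative:

```python
import torch

def build_finetune_optimizer(model, update_pretrained):
    """Mode 1 (update_pretrained=False): update only the second initial
    neural network model. Mode 2 (True): update both models."""
    if not update_pretrained:
        for p in model.pretrained.parameters():
            p.requires_grad = False            # freeze the pre-training model
        params = model.task_head.parameters()
    else:
        params = model.parameters()
    return torch.optim.Adam(params, lr=1e-3)
```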
The trained initial model obtained in the embodiment of the invention can be used for various tasks, such as classifying the nodes in a graph structure (e.g., classifying the nodes in a communication graph structure into abnormal users and normal users) or classifying graph structures (e.g., classifying the graph structures of 120 movie posters into graph structures of action movies or graph structures of romance movies), drawing on the pooling operations of graph neural networks, such as the readout operation (Readout), to obtain an initial representation vector of the graph structure.
Embodiments of the present invention illustratively provide data proofs, as described below.
For node classification in graph structures, the embodiment of the present invention exemplarily provides two data sets, U1 and U2, together with the prior-art reference algorithms ProNE, NetMF, VERSE, GraphWave, GIN and GCC for node classification, and the reference algorithms Graph2vec, DGCNN, GIN, GCC and InfoGraph for graph structure classification.
In the data set U1, the nodes represent registered users and the edges represent call records between users; users who did not make payments on time are regarded as positive samples and users who paid on time as negative samples. The data set includes 1104 nodes (123 positive samples and 981 negative samples) and 1719 edges.
In the data set U2, each node represents a client and an edge represents a call log between two clients. Where the nodes for large customers are positive samples and the nodes for common customers are negative samples, the data set U2 includes 9953 nodes (1140 large customers, 8813 common customers) and 373923 edges.
For the graph structure classification, the embodiment of the invention exemplarily provides five data sets R1, R2, R3, R4 and R5.
Data set R1: contains two kinds of labels (action movies and romance movies), 1000 graphs in total, one label per graph.
Data set R2: contains three kinds of labels (comedy, romance and science-fiction movies), 1500 graphs in total, one label per graph.
Data set R3: two kinds of tags (question-answer community and discussion community) were included, totaling 2000 graphs. The nodes in each graph represent users, and the edges between two nodes represent that one user has communicated with another user in the comment.
Data set R4: contains five kinds of labels (global news community, video community, animal community, etc.), 5000 graphs in total.
Data set R5: contains three kinds of labels (high-energy physics, condensed-matter physics and astrophysics), 5000 graphs in total. Each graph is the ego-centric network of a particular researcher, and the label indicates the research area of that researcher.
The ProNE algorithm: the representative vectors of the nodes are learned by a scalable and efficient model that employs sparse matrix decomposition and propagates in spectral space. In the data proof in the present invention, the ProNE algorithm uses default parameters, i.e., setting θ to 0.5, μ to 0.2, and recursion step number 10.
NetMF algorithm: unifies word2vec-based graph embedding models, including common models such as DeepWalk, node2vec, LINE and PTE, under one matrix factorization framework. In the data proof in the present invention, the NetMF algorithm uses default parameters, e.g., 256 eigenpairs, using approximate normalized graph Laplacian decomposition to obtain a 128-dimensional representation vector.
VERSE algorithm: three kinds of similarity are considered when learning the representation vector: community structure, roles and structural equivalence. A representation vector of the nodes is then obtained through neural network learning. In the data proof of the present invention, the VERSE algorithm uses default parameters, i.e., α is set to 0.85 and the learning rate to 0.0025.
GraphWave algorithm: the expression vector of the node is obtained through a wavelet diffusion mode in thermodynamics in an unsupervised mode. In the data demonstration of the present invention, the GraphWave algorithm employs the precise mechanism, i.e., the number of samples is 50 and the thermal coefficient is 1000.
GIN algorithm: a graph neural network designed with additive (sum) pooling operations, possessing the same ability to distinguish different graph structures as the Weisfeiler-Leman test.
The GCC algorithm: and pre-training the model through a contrast learning task in unsupervised learning so as to capture the graph structure information.
Graph2vec algorithm: the graph is regarded as a document, the subgraphs of the graph are regarded as words, and the representation of the graph structure is learned.
DGCNN algorithm: uses a local graph convolution model together with a novel SortPooling layer.
InfoGraph algorithm: the representation vector is learned by maximizing the mutual information.
Table 1 below shows the data of each algorithm for node classification, where the MPT algorithm is the technical solution of the present invention.
TABLE 1
[Table 1: Accuracy, Precision, Recall and F1 of each algorithm for node classification on data sets U1 and U2]
Table 2 below shows data of each algorithm for the graph structure classification, where the MPT algorithm is a technical solution in the first mode of the present invention.
TABLE 2
Accuracy    R1      R2      R3      R4      R5
Graph2vec   0.6103  0.3467  0.7850  0.3793  0.7180
DGCNN       0.7100  0.5133  /       /       0.7040
InfoGraph   0.6945  0.4677  0.8095  0.4900  0.6878
GIN         0.7280  0.4705  0.7841  0.5014  0.7265
GCC         0.6726  0.4785  0.8483  0.4454  0.7562
MPT(1)      0.6717  0.4712  0.8524  0.5110  0.7456
Table 3 below shows data of each algorithm for the graph structure classification, where the MPT algorithm is a technical solution in the second mode of the present invention.
TABLE 3
Accuracy    R1      R2      R3      R4      R5
GCC         0.7080  0.4850  0.8640  0.4740  0.7900
MPT(2)      0.7177  0.4951  0.8660  0.4962  0.8048
In Table 1, Table 2 and Table 3, Accuracy is the accuracy of the algorithm, Precision is the precision of the algorithm, Recall is the recall of the algorithm (a measure of the algorithm's coverage), and F1 is the comprehensive evaluation index.
Wherein:
Accuracy = (TP + TN) / (TP + TN + FP + FN);
Precision = TP / (TP + FP);
Recall = TP / (TP + FN);
F1 = 2 × Precision × Recall / (Precision + Recall).
TP (True Positive) means the true label is positive and the model also judges it positive; FP (False Positive) means the true label is negative but the model judges it positive; FN (False Negative) means the true label is positive but the model judges it negative; TN (True Negative) means the true label is negative and the model judges it negative. F1 is calculated from the total TP, FN and FP.
Table 1 shows that the technical solution of the present invention achieves better results on every index on the two data sets. For the F1 index, the present invention exceeds the mean value of the baseline algorithms by 10.23% and 14.31% on the U1 and U2 data sets respectively. This demonstrates the effectiveness of the solution on the downstream tasks of the two data sets, in particular finding abnormal fraudulent users. For example, a fraudulent user in a graph structure will try to borrow money from everyone he knows, and the people he contacts often do not know one another, so more graph elements with a non-closed structure appear around a fraudulent user. Conversely, for a normal user, two people he telephones will most likely be in the same circle and also have call records with each other; for example, if a company employee calls two colleagues, the two colleagues will most likely also have telephone contact with each other, so more graph elements with a relatively closed structure appear around a normal user.
For the graph classification task, as shown in Table 2, the technical solution in the first mode of the present invention exceeds the other basic algorithms in the accuracy index by 5.72% and 12.55% on the R3 and R4 data sets respectively.
As shown in Table 3, the accuracy of the second mode of the present invention is better than that of the GCC algorithm.
In addition, training the initial model using the pre-trained model can reduce the time required for training.
Based on the same technical concept, fig. 7 exemplarily shows a schematic structural diagram of an apparatus for generating a pre-training model applied to the graph learning field, which may execute a flow of a method for generating a pre-training model applied to the graph learning field according to an embodiment of the present invention.
As shown in fig. 7, the apparatus specifically includes:
a selecting module 710 for determining a first node from a first graph structure sample and determining a first sample subgraph of the first node; the first graph structure sample is any one of first training samples; the attributes of each node of the first graph structure sample are different from the attributes of each node of the second graph structure sample in the first training sample;
a processing module 720, configured to determine, according to the first graph structure sample, a graph element tag vector of the first node and an initial representation vector of the first sample sub-graph;
inputting the initial expression vector of the first sample subgraph into an initial pre-training model to obtain a first feature expression vector; inputting the first feature expression vector into a first initial neural network model to obtain a graph element prediction vector of the first node;
and training the initial pre-training model according to the graph element prediction vector and the graph element label vector until the pre-training model is obtained.
Optionally, the selecting module 710 is specifically configured to:
randomly selecting a node in the first graph structure sample as the first node;
and obtaining a first sample sub-graph of the first node through a random walk with restart strategy algorithm according to the connection relation among the nodes in the first graph structure sample.
Optionally, the processing module 720 is specifically configured to:
and counting the number of the graph elements of each preset type of the first node according to the connection relation among the nodes in the first graph structure sample, and obtaining the graph element label vector of the first node according to the number of the graph elements of each preset type.
Optionally, obtaining an initial representation vector of the first sample sub-graph according to the following formula (1);
I - K^(-1/2) M K^(-1/2) = UΛU^T ………………………………(1);
wherein I is an identity matrix; K is a degree matrix of the first node in the first graph structure sample; M is an adjacency matrix of the first node in the first graph structure sample; U is the initial representation vector of the first sample subgraph.
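Formula (1) is the eigendecomposition of a normalized Laplacian built from the sampled subgraph. A direct numpy rendering, assuming M is available as a dense adjacency array of the first sample subgraph and K is its diagonal degree matrix, might look like:

    import numpy as np

    def initial_representation(M: np.ndarray) -> np.ndarray:
        # Build I - K^(-1/2) M K^(-1/2) and eigen-decompose it; the
        # eigenvector matrix U serves as the initial representation vector.
        degrees = M.sum(axis=1)
        k_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(degrees, 1e-12)))
        laplacian = np.eye(len(M)) - k_inv_sqrt @ M @ k_inv_sqrt
        eigvals, U = np.linalg.eigh(laplacian)   # laplacian = U Λ U^T
        return U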
Optionally, the processing module 720 is specifically configured to:
determining a vector difference value according to the graph element prediction vector and the graph element label vector by the following formula (2);
L = (1/V) Σ_{u=1}^{V} ‖f(G'_u) − c_u‖² ………………………………(2);
wherein V is a preset value; c_u is the graph element label vector of the first node; f(G'_u) is the graph element prediction vector of the first node; G is the first graph structure sample; L is the vector difference value;
updating the initial pre-training model and the first initial neural network model according to the vector difference value; and obtaining the pre-training model until the vector difference value meets a set condition.
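One pre-training update along these lines is sketched below, assuming formula (2) takes the squared-difference form reconstructed above and that the initial pre-training model and the first initial neural network model are PyTorch modules; all names are illustrative:

    import torch

    def pretrain_step(encoder, head, init_repr, label_vec, optimizer):
        # encoder: initial pre-training model; head: first initial NN model.
        feature = encoder(init_repr)      # first feature representation vector
        prediction = head(feature)        # graph element prediction vector
        loss = torch.mean((prediction - label_vec) ** 2)   # vector difference L
        optimizer.zero_grad()
        loss.backward()                   # update both models by backpropagation
        optimizer.step()
        return loss.item()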
Optionally, the processing module 720 is further configured to:
after the pre-training model is obtained, constructing an initial model, wherein the initial model comprises the pre-training model and a second initial neural network model;
for any third graph structure sample in a second training sample, inputting the third graph structure sample into the pre-training model to obtain a second feature representation vector; inputting the second feature representation vector into the second initial neural network model to obtain a predicted value of the third graph structure sample; nodes of each graph structure sample in the second training sample have the same attribute;
and training the initial model according to the predicted value and the label value of the third graph structure sample until a training end condition is met.
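A corresponding fine-tuning step might look as follows; the cross-entropy loss and the batched input are assumptions of this sketch, since the text above only requires comparing the predicted value with the label value:

    import torch
    import torch.nn.functional as F

    def finetune_step(pretrained, head2, batch, labels, optimizer):
        # pretrained: the pre-training model; head2: second initial NN model.
        features = pretrained(batch)   # second feature representation vectors
        preds = head2(features)        # predicted values, shape (batch, classes)
        loss = F.cross_entropy(preds, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()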
Optionally, the processing module 720 is specifically configured to:
updating the second initial neural network model according to the predicted value and the label value of the third graph structure sample; or updating the pre-training model and the second initial neural network model according to the predicted value and the label value of the third graph structure sample.
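In a framework such as PyTorch, the two update strategies reduce to which parameters the optimizer receives; the sketch below continues the illustrative names from the previous example:

    import torch

    # Strategy 1: update only the second initial neural network model;
    # the pre-training model stays frozen.
    for p in pretrained.parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam(head2.parameters(), lr=1e-3)

    # Strategy 2: update the pre-training model and the second model jointly,
    # typically with a smaller learning rate for the pre-trained part.
    for p in pretrained.parameters():
        p.requires_grad = True   # unfreeze before joint training
    optimizer = torch.optim.Adam([
        {"params": pretrained.parameters(), "lr": 1e-4},
        {"params": head2.parameters(), "lr": 1e-3},
    ])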
Based on the same technical concept, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing the above generation method of a pre-training model applied to the graph learning field according to the obtained program.
Based on the same technical concept, the embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored, and the computer-executable instructions are configured to enable a computer to execute the above generation method of a pre-training model applied in the graph learning field.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A generation method of a pre-training model applied to the field of graph learning is characterized by comprising the following steps:
determining a first node from a first graph structure sample and determining a first sample subgraph of the first node; the first graph structure sample is any one of first training samples; the attributes of each node of the first graph structure sample are different from the attributes of each node of the second graph structure sample in the first training sample;
determining a graph element label vector of the first node and an initial representation vector of the first sample sub-graph according to the first graph structure sample;
inputting the initial representation vector of the first sample subgraph into an initial pre-training model to obtain a first feature representation vector; inputting the first feature representation vector into a first initial neural network model to obtain a graph element prediction vector of the first node;
and training the initial pre-training model according to the graph element prediction vector and the graph element label vector until the pre-training model is obtained.
2. The method of claim 1, wherein determining a first node from a first graph structure sample and determining a first sample subgraph of the first node comprises:
randomly selecting a node in the first graph structure sample as the first node;
and obtaining a first sample subgraph of the first node through a random walk with restart strategy, according to the connection relation among the nodes in the first graph structure sample.
3. The method of claim 1, wherein determining a graph element label vector for the first node from the first graph structure sample comprises:
and counting the number of the graph elements of each preset type of the first node according to the connection relation among the nodes in the first graph structure sample, and obtaining the graph element label vector of the first node according to the number of the graph elements of each preset type.
4. The method of claim 1, wherein determining an initial representation vector for the first sample sub-graph from the first graph structure sample comprises:
obtaining an initial representation vector of the first sample subgraph according to the following formula (1);
I - K^(-1/2) M K^(-1/2) = UΛU^T ………………………………(1);
wherein I is an identity matrix; K is a degree matrix of the first node in the first graph structure sample; M is an adjacency matrix of the first node in the first graph structure sample; U is the initial representation vector of the first sample subgraph.
5. The method of claim 1, wherein training the initial pre-training model based on the primitive prediction vectors and the primitive label vectors until the pre-training model is obtained comprises:
determining a vector difference value according to the graph element prediction vector and the graph element label vector by the following formula (2);
L = (1/V) Σ_{u=1}^{V} ‖f(G'_u) − c_u‖² ………………………………(2);
wherein V is a preset value; c_u is the graph element label vector of the first node; f(G'_u) is the graph element prediction vector of the first node; G is the first graph structure sample; L is the vector difference value;
updating the initial pre-training model and the first initial neural network model according to the vector difference value; and obtaining the pre-training model until the vector difference value meets a set condition.
6. The method of any of claims 1 to 5, wherein after obtaining the pre-trained model, further comprising:
constructing an initial model, wherein the initial model comprises the pre-training model and a second initial neural network model;
for any third graph structure sample in a second training sample, inputting the third graph structure sample into the pre-training model to obtain a second feature representation vector; inputting the second feature representation vector into the second initial neural network model to obtain a predicted value of the third graph structure sample; nodes of each graph structure sample in the second training sample have the same attribute;
and training the initial model according to the predicted value and the label value of the third graph structure sample until a training end condition is met.
7. The method of claim 6, wherein training the initial model based on the predicted values and label values of the third graph structure samples comprises:
updating the second initial neural network model according to the predicted value and the label value of the third graph structure sample; or
updating the pre-training model and the second initial neural network model according to the predicted value and the label value of the third graph structure sample.
8. An apparatus for generating a pre-training model applied to the field of graph learning, comprising:
the selection module is used for determining a first node from a first graph structure sample and determining a first sample subgraph of the first node; the first graph structure sample is any one of first training samples; the attributes of each node of the first graph structure sample are different from the attributes of each node of the second graph structure sample in the first training sample;
a processing module, configured to determine, according to the first graph structure sample, a graph element tag vector of the first node and an initial representation vector of the first sample sub-graph;
inputting the initial representation vector of the first sample subgraph into an initial pre-training model to obtain a first feature representation vector; inputting the first feature representation vector into a first initial neural network model to obtain a graph element prediction vector of the first node;
and training the initial pre-training model according to the graph element prediction vector and the graph element label vector until the pre-training model is obtained.
9. A computing device, comprising:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to perform the method of any of claims 1 to 7 in accordance with the obtained program.
10. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202110072779.8A 2021-01-20 2021-01-20 Method and device for generating pre-training model applied to graph learning field Active CN112819154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110072779.8A CN112819154B (en) 2021-01-20 2021-01-20 Method and device for generating pre-training model applied to graph learning field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110072779.8A CN112819154B (en) 2021-01-20 2021-01-20 Method and device for generating pre-training model applied to graph learning field

Publications (2)

Publication Number Publication Date
CN112819154A true CN112819154A (en) 2021-05-18
CN112819154B CN112819154B (en) 2024-05-28

Family

ID=75858498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110072779.8A Active CN112819154B (en) 2021-01-20 2021-01-20 Method and device for generating pre-training model applied to graph learning field

Country Status (1)

Country Link
CN (1) CN112819154B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190236464A1 (en) * 2018-01-29 2019-08-01 EmergeX, LLC System and method for facilitating affective-state-based artificial intelligence
CN110232630A (en) * 2019-05-29 2019-09-13 腾讯科技(深圳)有限公司 The recognition methods of malice account, device and storage medium
CN110704062A (en) * 2019-09-27 2020-01-17 天津五八到家科技有限公司 Dependency management method, data acquisition method, device and equipment
CN111523315A (en) * 2019-01-16 2020-08-11 阿里巴巴集团控股有限公司 Data processing method, text recognition device and computer equipment
CN111582538A (en) * 2020-03-25 2020-08-25 清华大学 Community value prediction method and system based on graph neural network
US20200382612A1 (en) * 2019-05-29 2020-12-03 Adobe Inc. Interpretable user modeling from unstructured user data
CN112734034A (en) * 2020-12-31 2021-04-30 平安科技(深圳)有限公司 Model training method, calling method, device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIEZHONG QIU, QIBIN CHEN et al.: "GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training", 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2 July 2020 (2020-07-02) *
ZHOU CHAO: "Design and Implementation of a Graph-Based Anomaly Detection Framework for Online Transaction Data", China Masters' Theses Full-text Database (Information Science and Technology), 15 August 2020 (2020-08-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023155546A1 (en) * 2022-02-17 2023-08-24 腾讯科技(深圳)有限公司 Structure data generation method and apparatus, device, medium, and program product
CN115909016A (en) * 2023-03-10 2023-04-04 同心智医科技(北京)有限公司 System, method, electronic device, and medium for analyzing fMRI image based on GCN

Also Published As

Publication number Publication date
CN112819154B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
Yi et al. Deep matrix factorization with implicit feedback embedding for recommendation system
Yin et al. QoS prediction for service recommendation with features learning in mobile edge computing environment
Lin et al. Toward compact convnets via structure-sparsity regularized filter pruning
WO2019114434A1 (en) Graphical structure model-based method for transaction risk control, and device and equipment
Courty et al. Learning wasserstein embeddings
Li et al. MV-GCN: multi-view graph convolutional networks for link prediction
CN112819154B (en) Method and device for generating pre-training model applied to graph learning field
Wu et al. TSNN: Three‐stream combining 2D and 3D convolutional neural network for micro‐expression recognition
CN110008402A (en) A kind of point of interest recommended method of the decentralization matrix decomposition based on social networks
CN112214499A (en) Graph data processing method and device, computer equipment and storage medium
Muro et al. Link prediction and unlink prediction on dynamic networks
CN113656699A (en) User feature vector determination method, related device and medium
CN114417845B (en) Same entity identification method and system based on knowledge graph
CN114219084B (en) Sales visit display counterfeiting identification method and device in fast moving industry
Onderwater Detecting unusual user profiles with outlier detection techniques
CN114418767A (en) Transaction intention identification method and device
Ward et al. A practical guide to graph neural networks
Alghobiri et al. Using data mining algorithm for sentiment analysis of users’ opinions about bitcoin cryptocurrency
Hanafi et al. Word Sequential Using Deep LSTM and Matrix Factorization to Handle Rating Sparse Data for E‐Commerce Recommender System
CN113010772A (en) Data processing method, related equipment and computer readable storage medium
Nakis et al. Characterizing polarization in social networks using the signed relational latent distance model
Le et al. Improving deep matrix factorization with normalized cross entropy loss function for graph-based MOOC recommendation
CN116150462A (en) Vector construction method and device for target object and computer equipment
Hamilton Axiomatic explanations for visual search, retrieval, and similarity learning
CN112950222A (en) Resource processing abnormity detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant