WO2023000574A1 - Model training method, apparatus, device and readable storage medium - Google Patents

Model training method, apparatus, device and readable storage medium (一种模型训练方法、装置、设备及可读存储介质)

Info

Publication number
WO2023000574A1
WO2023000574A1 PCT/CN2021/134051 CN2021134051W WO2023000574A1 WO 2023000574 A1 WO2023000574 A1 WO 2023000574A1 CN 2021134051 W CN2021134051 W CN 2021134051W WO 2023000574 A1 WO2023000574 A1 WO 2023000574A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
graph
loss value
convolutional neural
chebyshev
Prior art date
Application number
PCT/CN2021/134051
Other languages
English (en)
French (fr)
Inventor
胡克坤
董刚
赵雅倩
李仁刚
Original Assignee
浪潮(北京)电子信息产业有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮(北京)电子信息产业有限公司 filed Critical 浪潮(北京)电子信息产业有限公司
Publication of WO2023000574A1 publication Critical patent/WO2023000574A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Definitions

  • the present application relates to the field of computer technology, in particular to a model training method, device, equipment and readable storage medium.
  • a graph neural network, simply put, is a deep learning architecture for graph-structured data; it combines end-to-end learning with inductive reasoning and is expected to resolve a series of bottlenecks, such as causal reasoning and interpretability, that traditional deep learning architectures cannot handle.
  • according to their implementation principle, graph convolutional neural networks can be divided into two types: those based on spatial methods and those based on spectral methods.
  • the former relies on an explicit information propagation mechanism on the graph and lacks interpretability;
  • the latter uses the Laplacian matrix of the graph as a tool, has a good theoretical basis, and is the mainstream direction of graph convolutional neural network research.
  • however, current graph convolutional neural networks based on spectral methods do not perform well when applied to graph vertex classification tasks; that is, existing vertex classification models based on graph convolutional neural networks perform poorly.
  • the purpose of the present application is to provide a model training method, apparatus, device and readable storage medium, so as to improve the performance of the vertex classification model.
  • the specific plan is as follows:
  • the present application provides a model training method, including:
  • the random walk and sampling are performed based on the adjacency matrix to obtain a positive point-wise mutual information matrix, including:
  • based on the adjacency matrix, a random walk of a preset length is performed from each vertex in the graph data set to obtain the context path of each vertex;
  • all context paths are randomly sampled to determine the number of co-occurrences of any two vertices, and a vertex co-occurrence count matrix is constructed;
  • based on the vertex co-occurrence count matrix, the vertex-context co-occurrence probabilities and the corresponding marginal probabilities are calculated, and each element of the positive pointwise mutual information matrix is determined.
  • the calculating the first loss value between the first training result and the label matrix includes:
  • based on the cross-entropy principle, the degree of difference between the probability distributions of the first training result and the label matrix is used as the first loss value.
  • said calculating a second loss value between said second training result and said first training result includes: calculating the differences between elements with the same coordinates in the second training result and the first training result, and taking the sum of squares of all the differences as the second loss value.
  • the determining of the target loss value based on the first loss value and the second loss value includes: inputting the first loss value and the second loss value into a loss function to output the target loss value, where the loss function is ls = ls_S + α·ls_U, ls being the target loss value, ls_S the first loss value, ls_U the second loss value, and α a constant adjusting the proportion of the second loss value in the target loss value.
  • if the target loss value does not meet the preset convergence condition, updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value, and iteratively training the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network until the target loss value meets the preset convergence condition;
  • the updating of the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value includes: updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value and sharing the updated parameters with the second Chebyshev graph convolutional neural network; or updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value and sharing the updated parameters with the first Chebyshev graph convolutional neural network; or computing new network parameters according to the target loss value and sharing the new network parameters with both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
  • both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network include L graph convolution layers, which are used to perform feature transformation and graph convolution operations on the input data;
  • the feature transformation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: Q_l = H_l · Θ_l^T;
  • the graph convolution operation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: H_{l+1} = σ( Σ_{k=0}^{K} θ_k · T_k(L̃) · Q_l );
  • where Q_l is the vertex feature matrix of the l-th graph convolution layer after feature transformation; H_l is the input data of the l-th graph convolution layer and H_{l+1} is its output data; Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolution layer; σ is a nonlinear activation function; K << n is the order of the polynomial; n is the number of vertices in the graph data set; θ_k are the polynomial coefficients; T_k(x) = 2x·T_{k-1}(x) − T_{k-2}(x), with T_0 = 1 and T_1 = x, are the Chebyshev polynomials; L is the Laplacian matrix of the graph data set, and L̃ = (2/λ_max)·L − I_n is the linearly rescaled Laplacian, where λ_max is the largest eigenvalue of L and I_n is the n×n identity matrix.
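  • as an illustration of the two-stage layer just described, the following is a minimal NumPy sketch of a single Chebyshev graph convolution layer; the function and variable names (cheby_layer, Theta, theta, L_tilde) are illustrative rather than taken from the patent, and since the exact layer formula in the original filing is given only as an image, the sketch follows the reconstruction above.

```python
import numpy as np

def cheby_layer(H, L_tilde, Theta, theta, activation=np.tanh):
    """One graph convolution layer: feature transformation followed by a
    K-th order Chebyshev graph convolution (K = len(theta) - 1).

    H       : (n, d_in)      input vertex features H_l
    L_tilde : (n, n)         rescaled Laplacian, L~ = 2 L / lambda_max - I
    Theta   : (d_out, d_in)  feature transformation matrix of this layer
    theta   : (K+1,)         Chebyshev coefficients theta_k
    """
    # Stage 1: feature transformation  Q_l = H_l . Theta_l^T
    Q = H @ Theta.T                      # (n, d_out)

    # Stage 2: Chebyshev recursion T_0 Q, T_1 Q, T_k = 2 L~ T_{k-1} - T_{k-2}
    T_prev, T_curr = Q, L_tilde @ Q
    out = theta[0] * T_prev
    if len(theta) > 1:
        out = out + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_next = 2 * (L_tilde @ T_curr) - T_prev
        out = out + theta[k] * T_next
        T_prev, T_curr = T_curr, T_next

    return activation(out)               # H_{l+1}
```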
  • the present application further provides a model training device, including:
  • the obtaining module is used to obtain the vertex feature matrix, adjacency matrix and label matrix constructed based on the graph data set;
  • a sampling module configured to perform random walk and sampling based on the adjacency matrix to obtain a positive point-by-point mutual information matrix
  • the first training module is used to input the vertex feature matrix and the adjacency matrix into the first Chebyshev graph convolutional neural network to output the first training result;
  • the second training module is used to input the vertex feature matrix and the positive point-wise mutual information matrix into the second Chebyshev graph convolutional neural network to output the second training result;
  • a first calculation module configured to calculate a first loss value between the first training result and the label matrix
  • a second calculation module configured to calculate a second loss value between the second training result and the first training result
  • a determining module configured to determine a target loss value based on the first loss value and the second loss value
  • a combination module configured to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets a preset convergence condition .
  • the present application provides a model training device, including: a memory configured to store a computer program; and a processor configured to execute the computer program to implement the model training method disclosed above.
  • the present application provides a readable storage medium for storing a computer program, wherein when the computer program is executed by a processor, the aforementioned disclosed model training method is implemented.
  • the present application provides a model training method, including: obtaining a vertex feature matrix, an adjacency matrix and a label matrix constructed based on a graph data set; performing random walk and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix; inputting the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result; inputting the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result; calculating a first loss value between the first training result and the label matrix; calculating a second loss value between the second training result and the first training result; determining a target loss value based on the first loss value and the second loss value; and, if the target loss value meets a preset convergence condition, combining the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model.
  • this application designs two Chebyshev graph convolutional neural networks: the first performs supervised training based on the vertex feature matrix, the adjacency matrix and the label matrix, while the second performs unsupervised training based on the vertex feature matrix, the positive pointwise mutual information matrix and the output of the first network during training; when the target loss value determined from the loss values of the two networks meets the preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model, so that a vertex classification model with better performance is obtained.
  • This scheme can give full play to the respective advantages of supervised training and unsupervised training, and improve the performance of the vertex classification model.
  • correspondingly, the model training device, equipment and readable storage medium provided by the present application have the same technical effects.
  • Fig. 1 is a schematic structural diagram of a graph convolutional neural network disclosed in the present application
  • Fig. 2 is a flow chart of a model training method disclosed in the present application
  • Fig. 3 is a schematic diagram of the data trend of a dual Chebyshev graph convolutional neural network disclosed in the present application;
  • FIG. 4 is a schematic diagram of a dual Chebyshev graph convolutional neural network disclosed in the present application.
  • FIG. 5 is a flow chart of a model construction and training method disclosed in the present application.
  • FIG. 6 is a schematic diagram of a model training device disclosed in the present application.
  • FIG. 7 is a schematic diagram of a model training device disclosed in the present application.
  • the graph vertex classification problem can be formalized as follows: given a graph G = (V, E), where V is the set of vertices, E is the set of edges, and V_L is a subset of V whose vertices have assigned labels, infer the label of each vertex in the remaining set V \ V_L.
  • a graph neural network usually consists of an input layer, one or more graph convolutional layers, and an output layer.
  • graph neural networks can be divided into graph convolutional neural networks, graph recurrent neural networks, graph autoencoders, graph generative networks, and spatiotemporal graph neural networks.
  • the graph convolutional neural network has attracted the attention of many researchers due to the great success of the traditional convolutional neural network in the fields of image processing and natural language understanding.
  • Figure 1 shows the structure of a typical graph convolutional neural network, which consists of an input layer (Input layer), two graph convolution layers (Gconv layer) and an output layer (Output layer).
  • the input layer reads the n*d-dimensional vertex attribute matrix X;
  • the graph convolution layers perform feature extraction on X and pass the result to the next graph convolution layer after transformation by a nonlinear activation function such as ReLU;
  • the output layer is the task layer, which completes a specific task such as vertex classification or clustering; the figure shows a vertex classification task layer, which outputs the category label Y of each vertex.
  • the present application provides a model training solution that can combine supervised and unsupervised learning to effectively improve the accuracy of classification, effectively reduce the computational complexity of the network, and improve classification efficiency.
  • referring to Fig. 2, an embodiment of the present application discloses a model training method, including:
  • each vertex v of G has d features, and the features of all vertices constitute the n*d-dimensional vertex feature matrix X.
  • the adjacency matrix of G is denoted as A, and the element A ij represents the weight of the connection edge between vertices i and j.
  • an n*C-dimensional label matrix Y is constructed.
  • where n = |V| denotes the number of vertices in the graph, C denotes the number of label categories, and the matrix element Y_ij indicates whether the category label of vertex i is j (j = 1, 2, ..., C): when vertex i has category label j, the j-th element of row i is set to 1 and the remaining elements of that row are set to 0; when vertex i has no category label, every element of the corresponding row is set to 0.
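  • purely as an illustration of the label-matrix construction described above, the following short Python sketch (with hypothetical names) builds the n*C matrix Y with one-hot rows for labeled vertices and all-zero rows for unlabeled vertices:

```python
import numpy as np

def build_label_matrix(n, num_classes, labeled):
    """Build the n x C label matrix Y.

    labeled : dict mapping vertex index i -> class label j (0-based)
              for the labeled subset V_L; unlabeled rows stay all-zero.
    """
    Y = np.zeros((n, num_classes))
    for i, j in labeled.items():
        Y[i, j] = 1.0          # one-hot row for a labeled vertex
    return Y

# e.g. 6 vertices, 3 classes, vertices 0 and 4 labeled
Y = build_label_matrix(6, 3, {0: 2, 4: 0})
```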
  • the Pubmed data set contains 19,717 scientific publications in 3 categories, with 44,338 citation links between them. The publications and the links between them form a citation network, and each publication in the network is described by a term frequency-inverse document frequency (TF-IDF) feature vector drawn from a dictionary of 500 terms.
  • the feature vectors of all documents form the feature matrix X.
  • the goal is to classify each document: 20 instances of each category are randomly sampled as labeled data, 1,000 instances are used as test data, and the rest are used as unlabeled data; the vertex label matrix Y is constructed accordingly, and the adjacency matrix A is constructed from the citation relationships between papers.
  • graph data sets can also be constructed from proteins, graphics and images, and other objects, so as to classify them.
  • based on the adjacency matrix A, a positive pointwise mutual information matrix encoding the globally consistent information of the graph can be constructed using random walk and random sampling techniques.
  • the adjacency matrix plays two roles in the random walk process: first, it represents the topological structure of the graph, from which it is known which vertices are connected, so that the walker can move from a vertex to its adjacent vertices; second, it is used to determine the random walk probability: as given by formula (1), the probability of walking from vertex v_i to a neighbor v_j is t_ij = pr(x(τ+1) = v_j | x(τ) = v_i) = A_ij / Σ_j A_ij; a vertex may have multiple neighbors, and in one random walk step the walker randomly picks one of them according to these probabilities.
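  • the following small Python sketch (illustrative only; the patent itself contains no code) shows how the adjacency matrix drives one random walk according to the transition probabilities t_ij = A_ij / Σ_j A_ij of formula (1):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_walk(A, start, length):
    """Walk `length` steps from `start`, choosing the next vertex with
    probability t_ij = A_ij / sum_j A_ij (formula (1)).
    Assumes every vertex has at least one weighted neighbor."""
    path = [start]
    for _ in range(length):
        weights = A[path[-1]]
        probs = weights / weights.sum()          # transition probabilities
        path.append(int(rng.choice(len(A), p=probs)))
    return path
```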
  • performing random walk and sampling based on the adjacency matrix to obtain the positive pointwise mutual information matrix includes: based on the adjacency matrix, performing a random walk of a preset length from each vertex in the graph data set to obtain the context path of each vertex; randomly sampling all context paths to determine the number of co-occurrences of any two vertices and constructing a vertex co-occurrence count matrix; and, based on the vertex co-occurrence count matrix, calculating the vertex-context co-occurrence probabilities and the corresponding marginal probabilities, and determining each element of the positive pointwise mutual information matrix.
  • the "co-occurrence probability of a vertex and a context" refers to the probability pr(v_i, ct_j) that a vertex v_i appears in a context ct_j, i.e. the probability that ct_j contains v_i; all such values together form a matrix (the vertex co-occurrence matrix).
  • the marginal probability of vertex v_i equals the sum of the elements in row i of this matrix divided by the sum of all elements of the matrix, and the marginal probability of context ct_j equals the sum of the elements in column j divided by the sum of all elements of the matrix.
  • the positive pointwise mutual information matrix, denoted P, encodes the global consistency information of the graph and can be determined as follows: the row vector p_{i,:} is the embedded representation of vertex v_i, the column vector p_{:,j} is the embedded representation of context ct_j, and p_ij represents the probability that vertex v_i appears in context ct_j. The matrix P can be obtained by random walks on the graph data set: regarding the context ct_j of vertex v_j as a path π_j rooted at v_j with length u, p_ij can be obtained by counting the frequency with which vertex v_i appears on the path π_j.
  • a random walk of length u steps is carried out from each vertex in the graph data set, yielding a path π that represents the context of that vertex.
  • random sampling is performed on the paths π to count the co-occurrences of any two vertices, yielding the vertex-context co-occurrence count matrix O (i.e. the vertex co-occurrence matrix).
  • in O, the element o_ij denotes the number of times vertex v_i appears in context ct_j, that is, on the path π_j rooted at vertex v_j; it is used in the subsequent calculation of p_ij.
  • accordingly, each element of the positive pointwise mutual information matrix P can be computed as p_ij = max( log( pr(v_i, ct_j) / (pr(v_i) · pr(ct_j)) ), 0 ), which determines the matrix P.
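  • the element-wise rule above can be implemented directly from the co-occurrence count matrix O; the sketch below is a minimal NumPy version, assuming O has already been collected by sampling the walks, with names chosen purely for illustration:

```python
import numpy as np

def ppmi_from_counts(O):
    """Positive pointwise mutual information matrix P from the
    vertex-context co-occurrence count matrix O:
    p_ij = max(log(pr(v_i, ct_j) / (pr(v_i) * pr(ct_j))), 0)."""
    total = O.sum()
    pr_joint = O / total                          # pr(v_i, ct_j)
    pr_v = O.sum(axis=1, keepdims=True) / total   # marginal pr(v_i)
    pr_ct = O.sum(axis=0, keepdims=True) / total  # marginal pr(ct_j)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(pr_joint / (pr_v * pr_ct))
    # zero counts give -inf (or nan); clip them to 0 as required
    return np.nan_to_num(np.maximum(pmi, 0.0), nan=0.0, posinf=0.0, neginf=0.0)
```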
  • the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are identical; both include L graph convolution layers, which are used to perform feature transformation and graph convolution operations on the input data;
  • the feature transformation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: Q_l = H_l · Θ_l^T;
  • the graph convolution operation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: H_{l+1} = σ( Σ_{k=0}^{K} θ_k · T_k(L̃) · Q_l );
  • where Q_l is the vertex feature matrix of the l-th graph convolution layer after feature transformation; H_l is the input data of the l-th graph convolution layer and H_{l+1} is its output data; Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolution layer; σ is a nonlinear activation function; K << n is the order of the polynomial; n is the number of vertices in the graph data set; θ_k are the polynomial coefficients; T_k(x) = 2x·T_{k-1}(x) − T_{k-2}(x), with T_0 = 1 and T_1 = x, are the Chebyshev polynomials; L is the Laplacian matrix of the graph data set, and L̃ = (2/λ_max)·L − I_n is the linearly rescaled Laplacian, where λ_max is the largest eigenvalue of L and I_n is the n×n identity matrix.
  • calculating the first loss value between the first training result and the label matrix includes: based on the cross-entropy principle, taking the degree of difference between the probability distributions of the first training result and the label matrix as the first loss value (i.e. the supervised loss).
  • calculating the second loss value between the second training result and the first training result includes: calculating the differences between elements with the same coordinates in the second training result and the first training result, and taking the sum of squares of all the differences as the second loss value (i.e. the unsupervised loss).
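  • a minimal sketch of the two loss terms and the resulting target loss is given below; the softmax normalization and the mean reduction over labeled vertices are assumptions added for illustration, since the text only specifies cross-entropy for the supervised loss and a sum of squared differences for the unsupervised loss:

```python
import numpy as np

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def supervised_loss(Z_A, Y, labeled_idx):
    """Cross-entropy between predicted and actual label distributions
    over the labeled vertices (first loss value, ls_S)."""
    probs = softmax(Z_A[labeled_idx])
    return -np.mean(np.sum(Y[labeled_idx] * np.log(probs + 1e-12), axis=1))

def unsupervised_loss(Z_P, Z_A):
    """Sum of squared differences between same-coordinate elements of the
    two networks' outputs (second loss value, ls_U)."""
    return np.sum((Z_P - Z_A) ** 2)

def target_loss(Z_A, Z_P, Y, labeled_idx, alpha=0.1):
    # ls = ls_S + alpha * ls_U
    return supervised_loss(Z_A, Y, labeled_idx) + alpha * unsupervised_loss(Z_P, Z_A)
```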
  • if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are trained iteratively until the target loss value meets the preset convergence condition.
  • updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value includes: updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value and sharing the updated parameters with the second Chebyshev graph convolutional neural network; or updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value and sharing the updated parameters with the first Chebyshev graph convolutional neural network; or computing new network parameters according to the target loss value and sharing the new parameters with both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
  • the first Chebyshev graph convolutional neural network performs supervised training based on the vertex feature matrix, adjacency matrix, and label matrix
  • the second Chebyshev graph convolutional neural network performs unsupervised training based on the vertex feature matrix, the positive pointwise mutual information matrix and the output of the first Chebyshev graph convolutional neural network during training; when the target loss value determined from the loss values of the two networks meets the preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model, and a vertex classification model with better performance is obtained.
  • This scheme can give full play to the respective advantages of supervised training and unsupervised training, and improves the performance of the vertex classification model.
  • the dual vertex classification model can also be called a dual Chebyshev graph convolutional neural network (DCGCN, Dual Chebyshev Graph Convolutional Neural Network).
  • the dual Chebyshev graph convolutional neural network includes two identical Chebyshev graph convolutional neural networks ChebyNet with shared parameters, and each ChebyNet consists of an input layer, L graph convolutional layers and an output layer.
  • ChebyNet A takes the adjacency matrix A, which encodes the local consistency information of the graph, and the vertex feature matrix X as input data, and outputs the vertex category label prediction matrix Z A;
  • ChebyNet P takes the positive pointwise mutual information matrix P, which encodes the global consistency information of the graph, and the vertex feature matrix X as input data, and outputs the vertex category label prediction matrix Z P.
  • ChebyNet A performs supervised learning based on the partially labeled graph vertices and achieves high prediction accuracy; under the guidance of the former (using its prediction result Z A), ChebyNet P performs unsupervised learning on the unlabeled graph vertices to improve prediction accuracy and obtain a better vertex classification model.
  • after ChebyNet A and ChebyNet P have been trained, Z A and Z P are consistent or differ negligibly, so either Z A or Z P can be used as the output of the dual Chebyshev graph convolutional neural network.
  • Figure 4 illustrates the structure of a dual Chebyshev graph convolutional neural network.
  • the convolutional layer in Figure 4 is the graph convolutional layer described below.
  • the input layer is mainly responsible for reading the graph data to be classified, including the vertex feature matrix X, the adjacency matrix A representing the topology of the graph, and the positive point-by-point mutual information matrix P that encodes the global consistency information of the graph.
  • definition of the l-th (1 ≤ l ≤ L) graph convolution layer: to reduce the number of network parameters, the graph convolution operation of the l-th layer is decomposed into two successive stages, feature transformation and graph convolution.
  • the feature transformation formula is: Q_l = H_l · Θ_l^T;
  • the graph convolution operation formula is: H_{l+1} = σ( Σ_{k=0}^{K} θ_k · T_k(L̃) · Q_l );
  • where Q_l is the vertex feature matrix of the l-th graph convolution layer after feature transformation; H_l is the input data of the l-th graph convolution layer and H_{l+1} is its output data, with H_1 being the vertex feature matrix X; Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolution layer; σ is a nonlinear activation function; K << n is the order of the polynomial; n is the number of vertices in the graph data set; θ_k are the polynomial coefficients; T_k(x) = 2x·T_{k-1}(x) − T_{k-2}(x), with T_0 = 1 and T_1 = x, are the Chebyshev polynomials; L is the Laplacian matrix of the graph data set, L̃ = (2/λ_max)·L − I_n is the linearly rescaled Laplacian, λ_max is the largest eigenvalue of L, and I_n is the n×n identity matrix.
  • in the spectral formulation, U is the matrix of eigenvectors obtained by the eigendecomposition of the Laplacian matrix of the graph G, U^{-1} is the inverse matrix of U, and Λ is the diagonal matrix of eigenvalues whose diagonal elements are λ_1, λ_2, ..., λ_n; the graph convolution kernel of the l-th layer is defined as a polynomial of Λ with coefficients θ_k.
  • this polynomial parameterization limits information to propagate at most K steps from each vertex, so only K + 1 parameters are required, which greatly reduces the complexity of the model training process. However, computing the convolution kernel matrix in this way involves the eigendecomposition of the graph Laplacian matrix, which is computationally expensive. Therefore, on this basis, the present embodiment uses Chebyshev polynomials to design an approximate computation scheme, approximating the kernel by Σ_{k=0}^{K} θ_k T_k(L̃), where L̃ maps the eigenvalues into [−1, 1].
  • the loss function of the dual Chebyshev graph convolutional neural network consists of two parts: the supervised learning loss ls S with labeled vertices and the unsupervised learning loss ls U for unlabeled vertices.
  • ChebyNet A takes the adjacency matrix A and the vertex feature matrix X as input for supervised learning, and compares the vertex label prediction result Z A with the known vertex label matrix Y to calculate the supervised learning loss.
  • ChebyNet P takes the positive point-wise mutual information matrix and vertex feature matrix X as input for unsupervised learning, and compares its prediction result Z P with ChebyNet A 's prediction result Z A to calculate the unsupervised learning loss.
  • accordingly, the loss function of the dual Chebyshev graph convolutional neural network can be expressed as ls = ls S + α·ls U, where α is a constant used to adjust the proportion of the unsupervised learning loss in the overall loss function.
  • the supervised learning loss function calculates the degree of difference between the actual label probability distribution and the predicted label probability distribution of the vertex based on the principle of cross entropy; the unsupervised learning loss function calculates the sum of squares of the difference between the same coordinate elements of Z P and Z A.
  • the network parameters can be initialized with random initialization from a normal distribution, Xavier initialization, He initialization, etc.; the network parameters include the feature transformation matrices Θ l and the convolution kernels F l.
  • the network parameters can be corrected and updated with stochastic gradient descent (SGD), momentum gradient descent (MGD), Nesterov momentum, AdaGrad, RMSprop, Adam (Adaptive Moment Estimation), batch gradient descent (BGD), etc., so as to optimize the loss function value.
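  • as a sketch of one of the listed update rules, the snippet below applies a momentum-SGD step to a dictionary of shared parameters; because ChebyNet A and ChebyNet P share their parameters, a single update of this dictionary updates both networks (the names and the source of the gradients are illustrative assumptions):

```python
import numpy as np

def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One momentum-SGD update on the shared parameter dictionary
    (e.g. the Theta_l matrices and Chebyshev coefficients). `grads` is
    assumed to come from differentiating the target loss ls = ls_S + a*ls_U."""
    for name, g in grads.items():
        velocity[name] = momentum * velocity[name] - lr * g
        params[name] += velocity[name]
    return params, velocity
```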
  • the training process of the dual Chebyshev graph convolutional neural network can be carried out with reference to Figure 5 and specifically includes: for the graph data set G, constructing the vertex feature matrix X, the positive pointwise mutual information matrix P encoding the global consistency information of the graph, the adjacency matrix A encoding the local consistency information of the graph, and the vertex label matrix Y; inputting the vertex feature matrix X and the adjacency matrix A into ChebyNet A, inputting the positive pointwise mutual information matrix P and the vertex feature matrix X into ChebyNet P, and updating the network parameters according to the above loss function so as to train ChebyNet A and ChebyNet P.
  • when the loss function value reaches a specified small value or the number of iterations reaches a specified maximum, training ends and the dual Chebyshev graph convolutional neural network is obtained.
  • at that point, for an unlabeled vertex i ∈ V U, the category j to which it should belong can be read from the vertex label matrix Y.
  • during training, the output feature matrix of each layer is calculated according to the definition of the graph convolution layer and the input feature matrix of that layer; according to the definition of the output layer, the probability Z j (1 ≤ j ≤ C) that each vertex belongs to each category j is predicted, and the loss function value is calculated according to the loss function defined above; for an unlabeled vertex v i ∈ V U, the category with the highest probability is taken as the latest category of the vertex so as to update the vertex label matrix Y.
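  • the final step of each iteration described above, writing the most probable class of each unlabeled vertex back into the label matrix Y, can be sketched as follows (illustrative names only):

```python
import numpy as np

def update_label_matrix(Y, Z, unlabeled_idx):
    """For each unlabeled vertex, take the most probable class from the
    prediction matrix Z and write it into the label matrix Y as a one-hot row."""
    Y = Y.copy()
    for i in unlabeled_idx:
        Y[i] = 0.0
        Y[i, int(Z[i].argmax())] = 1.0
    return Y
```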
  • the dual Chebyshev graph convolutional neural network is composed of two Chebyshev graph convolutional neural networks with the same structure and shared parameters.
  • the two networks perform supervised learning and unsupervised learning respectively, which improves the convergence rate and prediction accuracy of the network; at the same time, the graph convolution layer is defined based on the graph Fourier transform, and the graph convolution operation is divided into two stages, feature transformation and graph convolution, which reduces the number of network parameters;
  • based on spectral graph theory, the graph convolution kernel is defined as a polynomial convolution kernel, which ensures the locality of the graph convolution computation; to reduce the computational complexity, Chebyshev polynomials are used to approximate the graph convolution.
  • this embodiment provides a training method for a dual Chebyshev graph convolutional neural network, which can solve the problem of vertex classification.
  • first, the collected data set is modeled as a graph to obtain its adjacency matrix and vertex feature matrix; based on the adjacency matrix, a random walk of a specific length is carried out on the graph for each vertex, and the resulting walk sequences are sampled to obtain the positive pointwise mutual information matrix, which represents the context information of the vertices; the convolution operation is defined according to spectral graph theory, graph convolution layers for feature extraction and an output layer for the vertex classification task are constructed, and the Chebyshev graph convolutional neural network is built and trained; at the end of training, classification predictions for the unlabeled vertices in the graph are available.
  • compared with a classification system containing only a single graph convolutional neural network, this method can learn more graph topology information, including the local consistency and global consistency information of each vertex, thanks to the dual graph convolutional neural network design, which greatly improves the learning ability of the model; moreover, using the graph topology and the attribute features of vertices together, and combining supervised and unsupervised learning, effectively improves classification accuracy; approximating the graph convolution with Chebyshev polynomials avoids the expensive matrix eigendecomposition operation, which effectively reduces the computational complexity of the network and improves its classification efficiency.
  • a model training device provided in the embodiment of the present application is introduced below, and a model training device described below and a model training method described above may refer to each other.
  • referring to Fig. 6, an embodiment of the present application discloses a model training device, including:
  • the obtaining module 601, used to obtain the vertex feature matrix, adjacency matrix and label matrix constructed based on the graph data set;
  • the sampling module 602 is used to perform random walk and sampling based on the adjacency matrix to obtain a positive point-by-point mutual information matrix;
  • the first training module 603 is used to input the vertex feature matrix and the adjacency matrix into the first Chebyshev graph convolutional neural network to output the first training result;
  • the second training module 604 is used to input the vertex feature matrix and the positive point-by-point mutual information matrix into the second Chebyshev graph convolutional neural network to output the second training result;
  • the first calculation module 605 is used to calculate the first loss value between the first training result and the label matrix
  • a second calculation module 606, configured to calculate a second loss value between the second training result and the first training result
  • a determining module 607 configured to determine a target loss value based on the first loss value and the second loss value
  • the combination module 608 is configured to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets the preset convergence condition.
  • sampling module is specifically used for:
  • based on the adjacency matrix, a random walk of preset length is performed from each vertex in the graph data set to obtain the context path of each vertex;
  • all context paths are randomly sampled to determine the number of co-occurrences of any two vertices, and a vertex co-occurrence count matrix is constructed;
  • based on the vertex co-occurrence count matrix, the vertex-context co-occurrence probabilities and the corresponding marginal probabilities are calculated, and each element of the positive pointwise mutual information matrix is determined.
  • the first calculation module is specifically used for:
  • based on the cross-entropy principle, the degree of difference between the probability distributions of the first training result and the label matrix is used as the first loss value.
  • the second calculation module is specifically used for: calculating the differences between elements with the same coordinates in the second training result and the first training result, and taking the sum of squares of all the differences as the second loss value.
  • the determining module is specifically used for: inputting the first loss value and the second loss value into a loss function to output the target loss value, where the loss function is ls = ls S + α·ls U, ls being the target loss value, ls S the first loss value, ls U the second loss value, and α a constant adjusting the proportion of the second loss value in the target loss value.
  • if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are trained iteratively until the target loss value meets the preset convergence condition;
  • updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value includes:
  • updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value and sharing the updated parameters with the second Chebyshev graph convolutional neural network; or
  • updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value and sharing the updated parameters with the first Chebyshev graph convolutional neural network; or
  • computing new network parameters according to the target loss value and sharing the new parameters with both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
  • both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network include L graph convolution layers, which are used to perform feature transformation and graph convolution operations on the input data;
  • the feature transformation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: Q_l = H_l · Θ_l^T;
  • the graph convolution operation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: H_{l+1} = σ( Σ_{k=0}^{K} θ_k · T_k(L̃) · Q_l );
  • where Q_l is the vertex feature matrix of the l-th graph convolution layer after feature transformation; H_l is the input data of the l-th graph convolution layer and H_{l+1} is its output data; Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolution layer; σ is a nonlinear activation function; K << n is the order of the polynomial; n is the number of vertices in the graph data set; θ_k are the polynomial coefficients; T_k(x) = 2x·T_{k-1}(x) − T_{k-2}(x), with T_0 = 1 and T_1 = x, are the Chebyshev polynomials; L is the Laplacian matrix of the graph data set, and L̃ = (2/λ_max)·L − I_n is the linearly rescaled Laplacian, where λ_max is the largest eigenvalue of L and I_n is the n×n identity matrix.
  • this embodiment provides a model training device, which can give full play to the respective advantages of supervised training and unsupervised training, and improve the performance of the vertex classification model.
  • a model training device provided by an embodiment of the present application is introduced below; the model training device described below and the model training method and apparatus described above may refer to each other.
  • referring to Fig. 7, an embodiment of the present application discloses a model training device, including:
  • the memory 701, used to store a computer program;
  • the processor 702 is configured to execute the computer program, so as to implement the method disclosed in any of the foregoing embodiments.
  • a readable storage medium provided by an embodiment of the present application is introduced below.
  • the readable storage medium described below and the model training method, device, and equipment described above may refer to each other.
  • a readable storage medium is used to store a computer program, wherein the computer program implements the model training method disclosed in the foregoing embodiments when executed by a processor.
  • for the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, and details are not repeated here.
  • the software modules may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of readable storage medium known in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

A model training method, apparatus, device and readable storage medium. The method designs two Chebyshev graph convolutional neural networks: one performs supervised training based on a vertex feature matrix, an adjacency matrix and a label matrix, and the other performs unsupervised training based on the vertex feature matrix, a positive pointwise mutual information matrix and the output of the former network during training. When the target loss value determined from the loss values of the two networks meets a preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model, so that a vertex classification model with better performance is obtained. The method can give full play to the respective advantages of supervised training and unsupervised training and improves the performance of the vertex classification model.

Description

一种模型训练方法、装置、设备及可读存储介质
本申请要求在2021年7月21日提交中国专利局、申请号为202110825194.9、发明名称为“一种模型训练方法、装置、设备及可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别涉及一种模型训练方法、装置、设备及可读存储介质。
背景技术
随着云计算、物联网、移动通信和智能终端等信息技术的快速发展,以社交网络、社区和博客为代表的新型应用得到广泛使用。这些应用不断产生大量数据,方便用图来建模分析。其中,图的顶点表示个人或团体,连接边表示他们之间的联系;顶点上通常附有标签信息,用以表示所建模对象的年龄、性别、位置、兴趣爱好和宗教信仰,以及其他许多可能的特征。这些特征从各个方面反映了个人的行为偏好,理想情况下,每个社交网络用户都附有所有与自己特征相关的标签。但现实情况却并非如此。这是因为,用户出于保护个人隐私的目的,越来越多的社交网络用户在分享个人信息时,显得更加谨慎,导致社交网络媒体仅能搜集用户的部分信息。因此,如何根据已知用户的标签信息,推测剩余用户的标签,显得尤为重要和迫切。该问题即顶点分类问题。
针对传统机器学习方法难以处理图数据的不足,学术界和工业界逐渐兴起一股图神经网络的研究热潮。图神经网络,简单地说,是一种用于图结构数据的深度学习架构,它将端到端学习与归纳推理相结合,有望解决传统深度学习架构无法处理的因果推理、可解释性等一系列瓶颈问题。
根据实现原理的不同,图卷积神经网络可分为基于空间方法的和基于谱方法的两种类型。其中,前者利用图上显示的信息传播机制,缺乏可解释性;后者以图的拉普拉斯矩阵为工具,具有良好的理论基础,是图卷积神经网络研究的主流方向。但是,目前基于谱方法的图卷积神经网络在应用图顶点分类任务时,表现并不理想,即现有的基于图卷积神经网络的顶点分类模型性能不佳。
因此,如何提高顶点分类模型的性能,是本领域技术人员需要解决的问题。
发明内容
有鉴于此,本申请的目的在于提供一种模型训练方法、装置、设备及可读存储介质,以提高顶点分类模型的性能。其具体方案如下:
第一方面,本申请提供了一种模型训练方法,包括:
获取基于图数据集构建的顶点特征矩阵、邻接矩阵和标签矩阵;
基于所述邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵;
将所述顶点特征矩阵和所述邻接矩阵输入第一切比雪夫图卷积神经网络,以输出第一训练结果;
将所述顶点特征矩阵和所述正逐点互信息矩阵输入第二切比雪夫图卷积神经网络,以输出第二训练结果;
计算所述第一训练结果和所述标签矩阵之间的第一损失值;
计算所述第二训练结果和所述第一训练结果之间的第二损失值;
基于所述第一损失值和所述第二损失值确定目的损失值;
若所述目的损失值符合预设收敛条件,则将所述第一切比雪夫图卷积神经网络和所述第二切比雪夫图卷积神经网络组合为对偶顶点分类模型。
优选地,所述基于所述邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵,包括:
基于所述邻接矩阵,对所述图数据集中的每个顶点进行预设长度的随机游走,得到每个顶点的上下文路径;
对所有上下文路径进行随机采样,以确定任意两个顶点的共现次数,并构建顶点共现次数矩阵;
基于顶点共现次数矩阵,计算顶点与上下文共现概率和相应的边缘概率,并确定所述正逐点互信息矩阵中的每个元素。
优选地,所述计算所述第一训练结果和所述标签矩阵之间的第一损失值,包括:
基于交叉熵原理,将所述第一训练结果和所述标签矩阵之间的概率分布差异程度作为所述第一损失值。
优选地,所述计算所述第二训练结果和所述第一训练结果之间的第二损失值,包括:
计算所述第二训练结果和所述第一训练结果中具有相同坐标的元素的差值,并将所有差值的平方和作为所述第二损失值。
优选地,所述基于所述第一损失值和所述第二损失值确定目的损失值,包括:
将所述第一损失值和所述第二损失值输入损失函数,以输出所述目的损失值;
其中,所述损失函数为:ls=ls S+αls U,ls为所述目的损失值,ls S为所述第一损失值,ls U为所述第二损失值,α为调节第二损失值在目的损失值中所占比例的常数。
优选地,若所述目的损失值不符合预设收敛条件,则根据所述目的损失值更新所述第一切比雪夫图卷积神经网络和所述第二切比雪夫图卷积神经网络的网络参数,并对更新后的第一切比雪夫图卷积神经网络和更新后的第二切比雪夫图卷积神经网络进行迭代训练,直至所述目的损失值符合预设收敛条件;
其中,所述根据所述目的损失值更新所述第一切比雪夫图卷积神经网络和所述第二切比雪夫图卷积神经网络的网络参数,包括:
根据所述目的损失值更新所述第一切比雪夫图卷积神经网络的网络参数后,将更新后的该网络参数共享至所述第二切比雪夫图卷积神经网络;
根据所述目的损失值更新所述第二切比雪夫图卷积神经网络的网络参数后,将更新后的该网络参数共享至所述第一切比雪夫图卷积神经网络;
根据所述目的损失值计算得到新网络参数后,将所述新网络参数共享至所述第一切比雪夫图卷积神经网络和所述第二切比雪夫图卷积神经网络。
优选地,所述第一切比雪夫图卷积神经网络和所述第二切比雪夫图卷积神经网络均包括L层图卷积层,该L层图卷积层用于对输入数据进行特征变换和图卷积操作;
其中,第l(1≤l≤L)层图卷积层的特征变换公式为:
Figure PCTCN2021134051-appb-000001
第l(1≤l≤L)层图卷积层的图卷积操作公式为:
Figure PCTCN2021134051-appb-000002
其中,Q l为图卷积神经网络第l图卷积层经特征变换后的顶点特征矩阵;H l为图卷积神经网络的第l图卷积层的输入数据,H l+1为图卷积神经网络的第l图卷积层的输出数据;
Figure PCTCN2021134051-appb-000003
是图卷积神经网络的第l图卷积层需学习的特征变换矩阵的转置矩阵;σ为非线性激活函数;K<<n,为多项式的阶数;n为所述图数据集中的顶点个数;θ k是多项式的系数;T k(x)=2xT k-1(x)-T k-2(x),且T 0=1,T 1=x为切比雪夫多项式;
Figure PCTCN2021134051-appb-000004
为所述图数据集的拉普拉斯矩阵,
Figure PCTCN2021134051-appb-000005
为经过线性变换后的拉普拉斯矩阵。
第二方面,本申请提供了一种模型训练装置,包括:
获取模块,用于获取基于图数据集构建的顶点特征矩阵、邻接矩阵和标签矩阵;
采样模块,用于基于所述邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵;
第一训练模块,用于将所述顶点特征矩阵和所述邻接矩阵输入第一切比雪夫图卷积神经网络,以输出第一训练结果;
第二训练模块,用于将所述顶点特征矩阵和所述正逐点互信息矩阵输入第二切比雪夫图卷积神经网络,以输出第二训练结果;
第一计算模块,用于计算所述第一训练结果和所述标签矩阵之间的第一损失值;
第二计算模块,用于计算所述第二训练结果和所述第一训练结果之间的第二损失值;
确定模块,用于基于所述第一损失值和所述第二损失值确定目的损失值;
组合模块,用于若所述目的损失值符合预设收敛条件,则将所述第一切比雪夫图卷积神经网络和所述第二切比雪夫图卷积神经网络组合为对偶顶点分类模型。
第三方面,本申请提供了一种模型训练设备,包括:
存储器,用于存储计算机程序;
处理器,用于执行所述计算机程序,以实现前述公开的模型训练方法。
第四方面,本申请提供了一种可读存储介质,用于保存计算机程序,其中,所述计算机程序被处理器执行时实现前述公开的模型训练方法。
通过以上方案可知,本申请提供了一种模型训练方法,包括:获取基于图数据集构建的顶点特征矩阵、邻接矩阵和标签矩阵;基于所述邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵;将所述顶点特征矩阵和所述邻接矩阵输入第一切比雪夫图卷积神经网络,以输出第一训练结果;将所述顶点特征矩阵和所述正逐点互信息矩阵输入第二切比雪夫图卷积神经网络,以输出第二训练结果;计算所述第一训练结果和所述标签矩阵之间的第一损失值;计算所述第二训练结果和所述第一训练结果之间的第二损失值;基于所述第一损失值和所述第二损失值确定目的损失值;若所述目的损失值符合预设收敛条件,则将所述第一切比雪夫图卷积神经网络和所述第二切比雪夫图卷积神经网络组合为对偶顶点分类模型。
可见,本申请设计了两个切比雪夫图卷积神经网络,第一切比雪夫图卷积神经网络基于顶点特征矩阵、邻接矩阵、标签矩阵进行有监督训练,同时第二切比雪夫图卷积神经网络基于顶点特征矩阵、正逐点互信息矩阵和第一切比雪夫图卷积神经网络在训练过程中的输出,进行无监督训练;当基于二者的损失值所确定的目的损失值符合预设收敛条件时,将两个切比雪夫图卷积神经网络组合为对偶顶点分类模 型,从而训练得到了性能更佳的顶点分类模型。该方案能够充分发挥有监督训练和无监督训练各自的优势,提升了顶点分类模型的性能。
相应地,本申请提供的一种模型训练装置、设备及可读存储介质,也同样具有上述技术效果。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据提供的附图获得其他的附图。
图1为本申请公开的一种图卷积神经网络的结构示意图;
图2为本申请公开的一种模型训练方法流程图;
图3为本申请公开的一种对偶切比雪夫图卷积神经网络的数据走向示意图;
图4为本申请公开的一种对偶切比雪夫图卷积神经网络示意图;
图5为本申请公开的一种模型构建及训练方法流程图;
图6为本申请公开的一种模型训练装置示意图;
图7为本申请公开的一种模型训练设备示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
为方便理解本申请,先对图神经网络和图数据集进行介绍。
需要说明的是,用图对数据及数据之间的关系进行建模分析,具有重要的学术和经济价值。例如,(1)研究传染性疾病和思想观点等在社交网络中随着时间传播扩散的规律;(2)研究社交网络中的群体如何围绕特定利益或隶属关系形成社团,以及社团连接的强度;(3)社交网络根据“人以群分”的规律,发现具有相似兴趣的人,向他们建议或推荐新的链接或联系;(4)问答***将问题引导给最有相关经验的人;广告***向最有兴趣并愿意接受特定主题广告的个人显示广告。
因此需要根据已知用户的标签信息,推测剩余用户的标签,该问题即顶点分类问题,它可形式化地描述为:给定一个图G=(V,E),V表示顶点集合,E表示连接边的集合,V L是V的一个子集,V L中的顶点有指定的标签。图顶点分类问题解决的是:如何推断剩余顶点构成的集合V\V L中,每个顶点的标签。与传统分类问题不同,它不能直接应用传统机器学习中的分类方法,如支持向量机、k近邻、决策树和朴素贝叶斯,来解决。这是因为,传统分类方法通常假设对象是独立的,分类结果不精确。但在图顶点分类中,不同对象即顶点之间并非相互独立,相反,它们有着复杂的依赖关系,必须充分利用这些关系,来提高分类的质量。
图神经网络通常由输入层、一个或多个图卷积层,以及输出层组成。根据结构特点,图神经网络可分为图卷积神经网络、图递归神经网络、图自编码器、图生成网络和时空图神经网络。其中,图卷积神经网络由于传统的卷积神经网络在图像处理、自然语言理解等领域取得巨大成功而吸引众多学者的注意。
参见图1所示,图1展示了一个典型的图卷积神经网络的结构,它由一个输入层(Input layer)、两个图卷积层(Gconv layer),和一个输出层(Output layer)组成。其中,输入层读取n*d维的顶点属性矩阵X;图卷积层对X进行特征提取,经由非线性激活函数如ReLu变换后传递给下一个图卷积层;最后,输出层即任务层,完成特定的任务如顶点分类、聚类等;图中展示的是一个顶点分类任务层,输出每个顶点的类别标签Y。
但由于基于谱方法的图卷积神经网络在应用图顶点分类任务时,表现并不理想,其主要原因是:(1)拉普拉斯矩阵进行特征分解的计算开销较大,为O(n 3);(2)通过添加正则项定义的目标损失函数(ls=ls s+αls reg,ls S和ls reg分别表示有监督学习损失函数和基于图拓扑结构定义的正则项)依赖于“相邻顶点具有类似标签”的局部一致性假设,该假设会限制图神经网络模型的能力,因为图中的连接边并没有对节点间相似性进行编码,但其实它们可以包含附加信息的。
为此,本申请提供了一种模型训练方案,能够结合有监督和无监督学习,有效提高分类的准确度,并有效降低网络的计算复杂性,提高分类效率。
参见图2所示,本申请实施例公开了一种模型训练方法,包括:
S201、获取基于图数据集构建的顶点特征矩阵、邻接矩阵和标签矩阵。
假设待分类的图数据集为G=(V,E),V表示顶点集合,它分为少量具有类别标签的顶点集合V L和大部分无类别标签的顶点集合V U两部分,并满足V L∪V U=V,
Figure PCTCN2021134051-appb-000006
E表示连接边集合。除标签外,G的每个顶点v都拥有d个特征,所有顶点的特征构成了n*d维的顶点特征矩阵X。G的邻接矩阵记为A,元素A ij表示顶点i和j之间的连接边的权重。
根据已有标签的顶点集合V L,构建n*C维的标签矩阵Y。其中,n=|V|表示图中所有顶点个数,C表示所有顶点的标签类别数,矩阵元素Y ij表示顶点i的类别标签是否为j(j=1,2,…,C)。当顶点i已有类别标签时,置其第j列元素为1,其余列元素为0,即有:Y ij=1(k=j时)或0(k≠j时)。当顶点i为无类别标签时,将该行对应的每一列元素都置为0。
例如:基于Pubmed数据集构建图数据集。Pubmed数据集包含3个类别的19717种科学出版物,出版物之间含有44,338个引用链接。出版物及它们之间的链接形成引文网络,网络中的每个出版物都用词频-逆文本频率指数(Term Frequency-Inverse Document Frequency,TF-IDF)矢量描述特征向量,该矢量从具有500个术语的字典中得出。所有文档的特征向量组成特征矩阵X。目标是将每个文档归类,每个类别随机抽取20个实例作为标记数据,将1000个实例作为测试数据,其余用作未标记的数据;构建顶点标签矩阵Y。根据论文间的引用关系,构建其邻接矩阵A。根据A计算任意两个顶点间的转移概率;对每个顶点v j开展长度为u的随机游走得到路径π j;对π j随机采样计算顶点v i出现在路径π j上的频率P ij,进而得到正逐点互信息矩阵P。
当然,还可以基于蛋白质、图形图像等构建图数据集,以对蛋白质、图形图像等进行分类。
S202、基于邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵。
根据邻接矩阵A,基于随机游走和随机采样技术可以构造编码图全局一致信息的正逐点互信息矩阵。具体的,邻接矩阵在随机游走工程中有两种作用,第一,表征图拓扑结构,根据它可以知道哪些顶点之间有连接关系,可以从一个顶点游走到相邻的顶点;第二,用于确定随机游走的概率,详见公式(1),一个顶点可能有多个邻居,在一个随机游走步中,游走者可在它的所有邻居中随机挑一个。
在一种具体实施方式中,基于邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵,包括:基于邻接矩阵,对图数据集中的每个顶点进行预设长度的随机游走,得到每个顶点的上下文路径;对所有上下文路径进行随机采样,以确定任意两个顶点的共现次数,并构建顶点共现次数矩阵;基于顶点共现次数矩阵,计算顶点与上下文共现概率和相应的边缘概率,并确定正逐点互信息矩阵中的每个元素。
其中,“顶点与上下文共现概率”是指:某个顶点v i出现在某个上下文ct j中的概率pr(v i,ct j)。或者说,ct j中包含顶点v i的概率pr(v i,ct j)。在得到所有的顶点与上下文共现概率后,它们组成了一个矩阵,即顶点共现次数矩阵。顶点v i的边缘概率等于该矩阵中第i行元素的加和除以该矩阵中所有元素的加和。上下文ct j的边缘概率等于第j列元素的加和除以该矩阵中所有元素的加和。
正逐点互信息矩阵可以用P表示,其能够编码图全局一致性信息,具体可参照如下内容进行确定:
假设行向量pi, :是顶点v i的嵌入式表示,列向量p :,j是上下文ct j的嵌入式表示,而pi j表示顶点v i出现在上下文ct j中的概率,那么正逐点互信息矩阵P可通过对图数据集的随机游走获得。具体地说,将顶点v j的上下文ct j视为以v j为根节点、长度为u的路径π j,则p ij可通过计算顶点v i出现在路径π j上的频率得到。不失一般性,设某随机游走者时刻τ所在的图顶点编号为x(τ),且x(τ)=v i,则τ+1时刻游走到其邻居顶点v j的概率t ij用公式(1)表示为:t ij=pr(x(τ+1)=v j|x(τ)=v i)=A ij/∑ jA ij
按照公式(1)对图数据集中每个顶点开展长度为u步的随机游走,即可得到表征该顶点上下文的路径π,对π实施随机采样计算任意两个顶点的共现次数,得到顶点-上下文共现次数矩阵O(即顶点共现次数矩阵)。在该矩阵O中,元素o ij表示顶点v i出现在上下文ct j即以顶点v j为根节点的路径π j上的次数,它可用于随后计算p ij。基于顶点共现次数矩阵O计算顶点与上下文共现概率和相应的边缘概率。记顶点v i和上下文ct j的共现概率以及相应的边缘概率分别为pr(v i,ct j)、pr(v i)和pr(ctj),则有公式(2):
Figure PCTCN2021134051-appb-000007
结合公式(2),则正逐点互信息矩阵P中元素P ij的值可通过以下公式计算得到:p ij=max(log(pr(v i,ct j)/(pr(v i)pr(ct j)),0)。
据此即可确定正逐点互信息矩阵P中每个元素的值,从而确定正逐点互信息矩阵P。
S203、将顶点特征矩阵和邻接矩阵输入第一切比雪夫图卷积神经网络,以输出第一训练结果。
S204、将顶点特征矩阵和正逐点互信息矩阵输入第二切比雪夫图卷积神经网络,以输出第二训练结果。
在一种具体实施方式中,第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络完全相同,均包括L层图卷积层,该L层图卷积层用于对输入数据进行特征变换和图卷积操作;
其中,第l(1≤l≤L)层图卷积层的特征变换公式为:
Figure PCTCN2021134051-appb-000008
第l(1≤l≤L)层图卷积层的图卷积操作公式为:
Figure PCTCN2021134051-appb-000009
其中,Q l为图卷积神经网络第l图卷积层经特征变换后的顶点特征矩阵;H l为图卷积神经网络的第l图卷积层的输入数据,H l+1为图卷积神经网络的第l图卷积层的输出数据;
Figure PCTCN2021134051-appb-000010
是图卷积神经网络的第l图卷积层需学习的特征变换矩阵的转置矩阵;σ为非线性激活函数;K<<n,为多项式的阶数;n为图数据集中的顶点个数;θ k是多项式的系数;T k(x)=2xT k-1(x)-T k-2(x),且T 0=1,T 1=x为切比雪夫多项式;
Figure PCTCN2021134051-appb-000011
为图数据集的拉普拉斯矩阵,
Figure PCTCN2021134051-appb-000012
为经过线性变换后的拉普拉斯矩阵。
其中,
Figure PCTCN2021134051-appb-000013
λ max
Figure PCTCN2021134051-appb-000014
中最大的特征值,I n为n*n维的恒等矩阵。
S205、计算第一训练结果和标签矩阵之间的第一损失值。
在一种具体实施方式中,计算第一训练结果和标签矩阵之间的第一损失值,包括:基于交叉熵原理,将第一训练结果和标签矩阵之间的概率分布差异程度作为第一损失值(即有监督损失)。
S206、计算第二训练结果和第一训练结果之间的第二损失值。
在一种具体实施方式中,计算第二训练结果和第一训练结果之间的第二损失值,包括:计算第二训练结果和第一训练结果中具有相同坐标的元素的差值,并将所有差值的平方和作为第二损失值(即无监督损失)。
S207、基于第一损失值和第二损失值确定目的损失值。
在一种具体实施方式中,基于第一损失值和第二损失值确定目的损失值,包括:将第一损失值和第二损失值输入损失函数,以输出目的损失值;其中,损失函数为:ls=ls S+αls U,ls为目的损失值,ls S为第一损失值,ls U为第二损失值,α为调节第二损失值在目的损失值中所占比例的常数。
S208、若目的损失值符合预设收敛条件,则将第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络组合为对偶顶点分类模型。
在一种具体实施方式中,若目的损失值不符合预设收敛条件,则根据目的损失值更新第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络的网络参数,并对更新后的第一切比雪夫图卷积神经网络和更新后的第二切比雪夫图卷积神经网络进行迭代训练,直至目的损失值符合预设收敛条件。
其中,根据目的损失值更新第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络的网络参数,包括:根据目的损失值更新第一切比雪夫图卷积神经网络的网络参数后,将更新后的该网络参数共享至第二切比雪夫图卷积神经网络;或根据目的损失值更新第二切比雪夫图卷积神经网络的网络参数后,将更新后的该网络参数共享至第一切比雪夫图卷积神经网络;或根据目的损失值计算得到新网络参数后,将新网络参数共享至第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络。
可见,本实施例设计了两个切比雪夫图卷积神经网络,第一切比雪夫图卷积神经网络基于顶点特征矩阵、邻接矩阵、标签矩阵进行有监督训练,同时第二切比雪夫图卷积神经网络基于顶点特征矩阵、正逐点互信息矩阵和第一切比雪夫图卷积神经网络在训练过程中的输出,进行无监督训练;当基于二者的损失值所确定的目的损失值符合预设收敛条件时,将两个切比雪夫图卷积神经网络组合为对偶顶点分类模型,从而训练得到了性能更佳的顶点分类模型。该方案能够充分发挥有监督训练和无监督训练各自的优势,提升了顶点分类模型的性能。
基于上述实施例,需要说明的是,对偶顶点分类模型也可称为对偶切比雪夫图卷积神经网络(DCGCN,Dual Chebyshev Graph Convolutional Neural Network)。为训练得到对偶切比雪夫图卷积神经网络,需要首先确定网络结构、损失函数、初始化策略、网络参数更新方式等。
1、网络结构。
对偶切比雪夫图卷积神经网络包括两个完全相同的、共享参数的切比雪夫图卷积神经网络ChebyNet,每个ChebyNet都由输入层、L个图卷积层和输出层组成。
请参见图3,记两个ChebyNet分别为ChebyNet A和ChebyNet P。ChebyNet A以编码图局部一致性信息的邻接矩阵A和顶点特征矩阵X作为输入数据,输出顶点类别标签预测矩阵Z A;ChebyNet P以编码 图全局一致性信息的正逐点互信息矩阵P和顶点特征矩阵X为作为输入数据,输出顶点类别标签预测矩阵Z P
其中,ChebyNet A根据部分有标签的图顶点进行有监督学习,预测准确度较高;ChebyNet P在前者的指导下(利用其预测结果Z A)利用无标签的图顶点进行无监督学习,以提高预测准确度,获得更好的顶点分类模型。当ChebyNet A和ChebyNet P训练结束后,Z A和Z P一致或差别可忽略不计,因此可以Z A或Z P作为对偶切比雪夫图卷积神经网络的输出。
图4示意了对偶切比雪夫图卷积神经网络的结构。图4中的卷积层即下文所述的图卷积层。
其中,输入层主要负责读取待分类图数据,包括顶点特征矩阵X、表示图拓扑结构的邻接矩阵A、编码图全局一致性信息的正逐点互信息矩阵P。
第l(1≤l≤L)图卷积层定义:为减少网络参数,将第l隐藏层图卷积操作分解为特征变换和图卷积先后两个阶段。
其中,特征变换公式为:
Figure PCTCN2021134051-appb-000015
图卷积操作公式为:
Figure PCTCN2021134051-appb-000016
其中,Q l为图卷积神经网络第l图卷积层经特征变换后的顶点特征矩阵;H l为图卷积神经网络的第l图卷积层的输入数据,H l+1为图卷积神经网络的第l图卷积层的输出数据;
Figure PCTCN2021134051-appb-000017
是图卷积神经网络的第l图卷积层需学习的特征变换矩阵的转置矩阵;σ为非线性激活函数;K<<n,为多项式的阶数;n为图数据集中的顶点个数;θ k是多项式的系数;T k(x)=2xT k-1(x)-T k-2(x),且T 0=1,T 1=x为切比雪夫多项式;
Figure PCTCN2021134051-appb-000018
为图数据集的拉普拉斯矩阵,
Figure PCTCN2021134051-appb-000019
为经过线性变换后的拉普拉斯矩阵。其中,H 1为顶点特征矩阵X。
其中,
Figure PCTCN2021134051-appb-000020
λ max
Figure PCTCN2021134051-appb-000021
中最大的特征值,I n为n*n维的恒等矩阵。
需要说明的是,
Figure PCTCN2021134051-appb-000022
Figure PCTCN2021134051-appb-000023
(记为公式)简化得到,简化过程可参照如下内容:
其中,U是由对图G的拉普拉斯矩阵
Figure PCTCN2021134051-appb-000024
进行特征分解得到的特征向量所组成的矩阵;U -1是U的逆矩阵;Λ是特征值的对角阵,对角线上的各元素分别为λ 12,…,λ n
Figure PCTCN2021134051-appb-000025
是第l层图卷积层的图卷积核矩阵,并定义为:
Figure PCTCN2021134051-appb-000026
需要说明的是,θ k表示多项式的阶数,能够限制信息在每个顶点最多传播K步。因此仅需K+1个参数,大大降低了模型训练过程的复杂度。由于公式
Figure PCTCN2021134051-appb-000027
计算卷积核矩阵时涉及到图拉普拉斯矩阵的特征分解,计算开销大。因此本实施例在此基础上,借助切比雪夫多项式设计近似计算方案,并将
Figure PCTCN2021134051-appb-000028
近似为:
Figure PCTCN2021134051-appb-000029
其中,T k(x)=2xT k-1(x)-T k-2(x),且T 0=1,T 1=x为切比雪夫多项式,可循环递归求解;
Figure PCTCN2021134051-appb-000030
是一个对角阵,能将特征值对角阵映射到[-1,1]。
Figure PCTCN2021134051-appb-000031
代入
Figure PCTCN2021134051-appb-000032
即可得到
Figure PCTCN2021134051-appb-000033
其中,
Figure PCTCN2021134051-appb-000034
输出层定义为
Figure PCTCN2021134051-appb-000035
Z是一个n*C维的矩阵,其每个列向量Z j表示所有顶点属于类别j的概率,即它的第k(1≤k≤n)个元素表示顶点k属于类别j(j=1,2,…,C)的概率。
2、损失函数。
对偶切比雪夫图卷积神经网络的损失函数由带标签顶点有监督学习损失ls S和无标签顶点无监督学习损失ls U两部分组成。
其中,ChebyNet A以邻接矩阵A和顶点特征矩阵X为输入,进行有监督学习,并将顶点标签预测结果Z A和已知的顶点标签矩阵Y进行比较,计算有监督学习损失。ChebyNet P以正逐点互信息矩阵和顶点特征矩阵X作为输入,进行无监督学习,并将其预测结果Z P和ChebyNet A的预测结果Z A进行比较,计算无监督学习损失。据此,对偶切比雪夫图卷积神经网络的损失函数可以表示为:
Figure PCTCN2021134051-appb-000036
Figure PCTCN2021134051-appb-000037
其中,α是一个常数,用以调节无监督学习损失在整个损失函数中所占的比例。
其中,有监督学习损失函数基于交叉熵原理,计算顶点实际标签概率分布和预测标签概率分布的差异程度;无监督学习损失函数计算Z P和Z A相同坐标元素之间差值的平方和。
3、初始化策略。
网络参数的初始化策略可以选择正态分布随机初始化、Xavier初始化或He Initialization初始化等。网络参数包含特征变换矩阵Θ l和卷积核F l
4、网络参数更新方式。
可以按照随机梯度下降(StochasticGradientDescent,SGD)、动量梯度下降(MomentumGradientDescent,MGD)、NesterovMomentum、AdaGrad、RMSprop和Adam(AdaptiveMomentEstimation)或批量梯度下降(BatchGradientDescent,BGD)等,对网络参数进行修正和更新,以优化损失函数值。
确定网络结构、损失函数、初始化策略、网络参数更新方式等内容后,对偶切比雪夫图卷积神经网络的训练过程可参照图5进行,具体包括:对于图数据集G,构造顶点特征矩阵X、编码图全局一致性信息的正逐点互信息矩阵P、编码图局部一致性信息的邻接矩阵A、顶点标签矩阵Y;将顶点特征矩阵X和邻接矩阵A输入ChebyNet A,将正逐点互信息矩阵P和顶点特征矩阵X输入ChebyNet P,并按照上述损失函数更新网络参数,以训练ChebyNet A和ChebyNet P。若损失函数值达到一个指定的较小值或迭代次数达到指定的最大值时,训练结束,得到对偶切比雪夫图卷积神经网络。此时,对于无类别标签的顶点i∈V U,可根据顶点标签矩阵Y得到其应归属的类别j。
在训练过程中,根据图卷积层的定义,结合该层输入的特征矩阵,计算每一个层的输出特征矩阵;按照输出层的定义,预测所有顶点属于每一类别j的概率Z j(1≤j≤C),并根据前述定义的损失函数计算损失函数值;对于无标签顶点v i∈V U,取概率最大的那一类别作为该顶点的最新类别,来更新顶点标签矩阵Y。
在该方案中,对偶切比雪夫图卷积神经网络由两个同结构的、共享参数的切比雪夫图卷积神经网络组成,此二者分别进行有监督学习和无监督学习,可以提高网络的收敛速率和预测准确度;同时,基于图傅里叶变换定义图卷积层,将图卷积操作分为特征变换和图卷积两个阶段,可以减少网络参数量;基于谱图理论,定义图卷积核为多项式卷积核,保证了图卷积计算的局部性;为降低计算复杂度,利用切比雪夫多项式近似计算图卷积。
可见,本实施例提供了一种对偶切比雪夫图卷积神经网络的训练方法,能够解决顶点分类问题。首先,对搜集到的数据集进行图建模,得到其邻接矩阵和顶点特征矩阵;以邻接矩阵为基础,对于每个顶点,在图上开展特定长度的随机游走,通过对产生的游走序列采样得到正逐点互信息矩阵,该矩阵表征顶点的上下文信息;根据谱图理论定义卷积操作,构造用于特征提取的图卷积层和用于顶点分类任务的输出层,搭建并训练切比雪夫图卷积神经网络;训练结束时,即可得到图中未标记顶点的分类预测结果。
与仅含有单个图卷积神经网络的分类***相比,该方法因采用对偶图卷积神经网络的设计策略,可学习到更多图拓扑结构信息,包括每个顶点的局部一致性和全局一致性信息,大大提升了模型的学习能力;并且,同时利用图拓扑结构和顶点的属性特征,结合监督和无监督学习,有效提高了分类的准确度;借助切比雪夫多项式近似计算图卷积,避免运算代价高昂的矩阵特征分解操作,有效降低了网络的计算复杂性,提高了网络的分类效率。
下面对本申请实施例提供的一种模型训练装置进行介绍,下文描述的一种模型训练装置与上文描述的一种模型训练方法可以相互参照。
参见图6所示,本申请实施例公开了一种模型训练装置,包括:
获取模块601,用于获取基于图数据集构建的顶点特征矩阵、邻接矩阵和标签矩阵;
采样模块602,用于基于邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵;
第一训练模块603,用于将顶点特征矩阵和邻接矩阵输入第一切比雪夫图卷积神经网络,以输出第一训练结果;
第二训练模块604,用于将顶点特征矩阵和正逐点互信息矩阵输入第二切比雪夫图卷积神经网络,以输出第二训练结果;
第一计算模块605,用于计算第一训练结果和标签矩阵之间的第一损失值;
第二计算模块606,用于计算第二训练结果和第一训练结果之间的第二损失值;
确定模块607,用于基于第一损失值和第二损失值确定目的损失值;
组合模块608,用于若目的损失值符合预设收敛条件,则将第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络组合为对偶顶点分类模型。
在一种具体实施方式中,采样模块具体用于:
基于邻接矩阵,对图数据集中的每个顶点进行预设长度的随机游走,得到每个顶点的上下文路径;
对所有上下文路径进行随机采样,以确定任意两个顶点的共现次数,并构建顶点共现次数矩阵;
基于顶点共现次数矩阵,计算顶点与上下文共现概率和相应的边缘概率,并确定正逐点互信息矩阵中的每个元素。
在一种具体实施方式中,第一计算模块具体用于:
基于交叉熵原理,将第一训练结果和标签矩阵之间的概率分布差异程度作为第一损失值。
在一种具体实施方式中,第二计算模块具体用于:
计算第二训练结果和第一训练结果中具有相同坐标的元素的差值,并将所有差值的平方和作为第二损失值。
在一种具体实施方式中,确定模块具体用于:
将第一损失值和第二损失值输入损失函数,以输出目的损失值;
其中,损失函数为:ls=ls S+αls U,ls为目的损失值,ls S为第一损失值,ls U为第二损失值,α为调节第二损失值在目的损失值中所占比例的常数。
在一种具体实施方式中,若目的损失值不符合预设收敛条件,则根据目的损失值更新第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络的网络参数,并对更新后的第一切比雪夫图卷积神经网络和更新后的第二切比雪夫图卷积神经网络进行迭代训练,直至目的损失值符合预设收敛条件;
其中,根据目的损失值更新第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络的网络参数,包括:
根据目的损失值更新第一切比雪夫图卷积神经网络的网络参数后,将更新后的该网络参数共享至第二切比雪夫图卷积神经网络;
根据目的损失值更新第二切比雪夫图卷积神经网络的网络参数后,将更新后的该网络参数共享至第一切比雪夫图卷积神经网络;
根据目的损失值计算得到新网络参数后,将新网络参数共享至第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络。
在一种具体实施方式中,第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络均包括L层图卷积层,该L层图卷积层用于对输入数据进行特征变换和图卷积操作;
其中,第l(1≤l≤L)层图卷积层的特征变换公式为:
Figure PCTCN2021134051-appb-000038
第l(1≤l≤L)层图卷积层的图卷积操作公式为:
Figure PCTCN2021134051-appb-000039
其中,Q l为图卷积神经网络第l图卷积层经特征变换后的顶点特征矩阵;H l为图卷积神经网络的第l图卷积层的输入数据,H l+1为图卷积神经网络的第l图卷积层的输出数据;
Figure PCTCN2021134051-appb-000040
是图卷积神经网络的第l图卷积层需学习的特征变换矩阵的转置矩阵;σ为非线性激活函数;K<<n,为多项式的阶数;n为图数据集中的顶点个数;θ k是多项式的系数;T k(x)=2xT k-1(x)-T k-2(x),且T 0=1,T 1=x为切比雪夫多项式;
Figure PCTCN2021134051-appb-000041
为图数据集的拉普拉斯矩阵,
Figure PCTCN2021134051-appb-000042
为经过线性变换后的拉普拉斯矩阵。
其中,关于本实施例中各个模块、单元更加具体的工作过程可以参考前述实施例中公开的相应内容,在此不再进行赘述。
可见,本实施例提供了一种模型训练装置,该装置能够充分发挥有监督训练和无监督训练各自的优势,提升了顶点分类模型的性能。
下面对本申请实施例提供的一种模型训练设备进行介绍,下文描述的一种模型训练设备与上文描述的一种模型训练方法及装置可以相互参照。
参见图7所示,本申请实施例公开了一种模型训练设备,包括:
存储器701,用于保存计算机程序;
处理器702,用于执行所述计算机程序,以实现上述任意实施例公开的方法。
下面对本申请实施例提供的一种可读存储介质进行介绍,下文描述的一种可读存储介质与上文描述的一种模型训练方法、装置及设备可以相互参照。
一种可读存储介质,用于保存计算机程序,其中,所述计算机程序被处理器执行时实现前述实施例公开的模型训练方法。关于该方法的具体步骤可以参考前述实施例中公开的相应内容,在此不再进行赘述。
本申请涉及的“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述 的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法或设备固有的其它步骤或单元。
需要说明的是,在本申请中涉及“第一”、“第二”等的描述仅用于描述目的,而不能理解为指示或暗示其相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括至少一个该特征。另外,各个实施例之间的技术方案可以相互结合,但是必须是以本领域普通技术人员能够实现为基础,当技术方案的结合出现相互矛盾或无法实现时应当认为这种技术方案的结合不存在,也不在本申请要求的保护范围之内。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似部分互相参见即可。
结合本文中所公开的实施例描述的方法或算法的步骤可以直接用硬件、处理器执行的软件模块,或者二者的结合来实施。软件模块可以置于随机存储器(RAM)、内存、只读存储器(ROM)、电可编程ROM、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的可读存储介质中。
本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (10)

  1. 一种模型训练方法,其特征在于,包括:
    获取基于图数据集构建的顶点特征矩阵、邻接矩阵和标签矩阵;
    基于所述邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵;
    将所述顶点特征矩阵和所述邻接矩阵输入第一切比雪夫图卷积神经网络,以输出第一训练结果;
    将所述顶点特征矩阵和所述正逐点互信息矩阵输入第二切比雪夫图卷积神经网络,以输出第二训练结果;
    计算所述第一训练结果和所述标签矩阵之间的第一损失值;
    计算所述第二训练结果和所述第一训练结果之间的第二损失值;
    基于所述第一损失值和所述第二损失值确定目的损失值;
    若所述目的损失值符合预设收敛条件,则将所述第一切比雪夫图卷积神经网络和所述第二切比雪夫图卷积神经网络组合为对偶顶点分类模型。
  2. 根据权利要求1所述的模型训练方法,其特征在于,所述基于所述邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵,包括:
    基于所述邻接矩阵,对所述图数据集中的每个顶点进行预设长度的随机游走,得到每个顶点的上下文路径;
    对所有上下文路径进行随机采样,以确定任意两个顶点的共现次数,并构建顶点共现次数矩阵;
    基于顶点共现次数矩阵,计算顶点与上下文共现概率和相应的边缘概率,并确定所述正逐点互信息矩阵中的每个元素。
  3. 根据权利要求1所述的模型训练方法,其特征在于,所述计算所述第一训练结果和所述标签矩阵之间的第一损失值,包括:
    基于交叉熵原理,将所述第一训练结果和所述标签矩阵之间的概率分布差异程度作为所述第一损失值。
  4. 根据权利要求1所述的模型训练方法,其特征在于,所述计算所述第二训练结果和所述第一训练结果之间的第二损失值,包括:
    计算所述第二训练结果和所述第一训练结果中具有相同坐标的元素的差值,并将所有差值的平方和作为所述第二损失值。
  5. The model training method according to claim 1, wherein the determining a target loss value based on the first loss value and the second loss value comprises:
    inputting the first loss value and the second loss value into a loss function to output the target loss value;
    wherein the loss function is ls = ls_S + α·ls_U, in which ls is the target loss value, ls_S is the first loss value, ls_U is the second loss value, and α is a constant that adjusts the proportion of the second loss value in the target loss value.
  6. The model training method according to any one of claims 1 to 5, wherein
    if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are trained iteratively until the target loss value meets the preset convergence condition;
    wherein the updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value comprises:
    after the network parameters of the first Chebyshev graph convolutional neural network are updated according to the target loss value, sharing the updated network parameters with the second Chebyshev graph convolutional neural network;
    after the network parameters of the second Chebyshev graph convolutional neural network are updated according to the target loss value, sharing the updated network parameters with the first Chebyshev graph convolutional neural network;
    after new network parameters are calculated according to the target loss value, sharing the new network parameters with both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
  7. The model training method according to any one of claims 1 to 5, wherein the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network each comprise L graph convolution layers, and the L graph convolution layers are configured to perform a feature transformation and a graph convolution operation on input data;
    wherein the feature transformation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is:
    Q^l = H^l·(W^l)^T
    and the graph convolution operation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is:
    H^(l+1) = σ( Σ_{k=0}^{K−1} θ_k·T_k(L̃)·Q^l )
    wherein Q^l is the vertex feature matrix of the l-th graph convolution layer of the graph convolutional neural network after the feature transformation; H^l is the input data of the l-th graph convolution layer of the graph convolutional neural network, and H^(l+1) is the output data of the l-th graph convolution layer of the graph convolutional neural network; (W^l)^T is the transpose of the feature transformation matrix W^l to be learned by the l-th graph convolution layer of the graph convolutional neural network; σ is a non-linear activation function; K (K ≪ n) is the order of the polynomial; n is the number of vertices in the graph dataset; θ_k are the polynomial coefficients; T_k(x) = 2x·T_{k−1}(x) − T_{k−2}(x), with T_0 = 1 and T_1 = x, are the Chebyshev polynomials; and L̃ is the linearly transformed Laplacian matrix obtained from the Laplacian matrix of the graph dataset.
  8. A model training apparatus, characterized by comprising:
    an obtaining module configured to obtain a vertex feature matrix, an adjacency matrix and a label matrix constructed based on a graph dataset;
    a sampling module configured to perform random walks and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix;
    a first training module configured to input the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result;
    a second training module configured to input the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result;
    a first calculation module configured to calculate a first loss value between the first training result and the label matrix;
    a second calculation module configured to calculate a second loss value between the second training result and the first training result;
    a determination module configured to determine a target loss value based on the first loss value and the second loss value; and
    a combination module configured to, if the target loss value meets a preset convergence condition, combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model.
  9. A model training device, characterized by comprising:
    a memory configured to store a computer program; and
    a processor configured to execute the computer program to implement the model training method according to any one of claims 1 to 7.
  10. A readable storage medium, characterized in that the readable storage medium is configured to store a computer program, wherein the computer program, when executed by a processor, implements the model training method according to any one of claims 1 to 7.
PCT/CN2021/134051 2021-07-21 2021-11-29 一种模型训练方法、装置、设备及可读存储介质 WO2023000574A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110825194.9 2021-07-21
CN202110825194.9A CN113705772A (zh) 2021-07-21 2021-07-21 一种模型训练方法、装置、设备及可读存储介质

Publications (1)

Publication Number Publication Date
WO2023000574A1 true WO2023000574A1 (zh) 2023-01-26

Family

ID=78650163

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134051 WO2023000574A1 (zh) 2021-07-21 2021-11-29 一种模型训练方法、装置、设备及可读存储介质

Country Status (2)

Country Link
CN (1) CN113705772A (zh)
WO (1) WO2023000574A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705772A (zh) * 2021-07-21 2021-11-26 浪潮(北京)电子信息产业有限公司 一种模型训练方法、装置、设备及可读存储介质
CN114360007B (zh) * 2021-12-22 2023-02-07 浙江大华技术股份有限公司 人脸识别模型训练、人脸识别方法、装置、设备及介质
CN114528994A (zh) * 2022-03-17 2022-05-24 腾讯科技(深圳)有限公司 一种识别模型的确定方法和相关装置
CN114707641A (zh) * 2022-03-23 2022-07-05 平安科技(深圳)有限公司 双视角图神经网络模型的训练方法、装置、设备及介质
CN114490950B (zh) * 2022-04-07 2022-07-12 联通(广东)产业互联网有限公司 编码器模型的训练方法及存储介质、相似度预测方法及***
CN114943324B (zh) * 2022-05-26 2023-10-13 中国科学院深圳先进技术研究院 神经网络训练方法、人体运动识别方法及设备、存储介质
CN115858725B (zh) * 2022-11-22 2023-07-04 广西壮族自治区通信产业服务有限公司技术服务分公司 一种基于无监督式图神经网络的文本噪声筛选方法及***
CN116071635A (zh) * 2023-03-06 2023-05-05 之江实验室 基于结构性知识传播的图像识别方法与装置
CN116089652B (zh) * 2023-04-07 2023-07-18 中国科学院自动化研究所 视觉检索模型的无监督训练方法、装置和电子设备
CN116402554B (zh) * 2023-06-07 2023-08-11 江西时刻互动科技股份有限公司 一种广告点击率预测方法、***、计算机及可读存储介质
CN116431816B (zh) * 2023-06-13 2023-09-19 浪潮电子信息产业股份有限公司 一种文献分类方法、装置、设备和计算机可读存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200285944A1 (en) * 2019-03-08 2020-09-10 Adobe Inc. Graph convolutional networks with motif-based attention
CN112464057A (zh) * 2020-11-18 2021-03-09 苏州浪潮智能科技有限公司 一种网络数据分类方法、装置、设备及可读存储介质
CN112925909A (zh) * 2021-02-24 2021-06-08 中国科学院地理科学与资源研究所 一种考虑局部不变性约束的图卷积文献分类方法及***
CN113705772A (zh) * 2021-07-21 2021-11-26 浪潮(北京)电子信息产业有限公司 一种模型训练方法、装置、设备及可读存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHUANG CHENYI [email protected]; MA QIANG [email protected]: "Dual Graph Convolutional Networks for Graph-Based Semi-Supervised Classification", THE WEB CONFERENCE 2018, INTERNATIONAL WORLD WIDE WEB CONFERENCES STEERING COMMITTEE, REPUBLIC AND CANTON OF GENEVASWITZERLAND, 23 April 2018 (2018-04-23) - 27 April 2018 (2018-04-27), Republic and Canton of GenevaSwitzerland , pages 499 - 508, XP058652837, ISBN: 978-1-4503-5640-4, DOI: 10.1145/3178876.3186116 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364372A (zh) * 2020-10-27 2021-02-12 重庆大学 一种有监督矩阵补全的隐私保护方法
CN116109195A (zh) * 2023-02-23 2023-05-12 深圳市迪博企业风险管理技术有限公司 一种基于图卷积神经网络的绩效评估方法及***
CN116129206A (zh) * 2023-04-14 2023-05-16 吉林大学 图像解耦表征学习的处理方法、装置及电子设备
CN116405100A (zh) * 2023-05-29 2023-07-07 武汉能钠智能装备技术股份有限公司 一种基于先验知识的失真信号还原方法
CN116405100B (zh) * 2023-05-29 2023-08-22 武汉能钠智能装备技术股份有限公司 一种基于先验知识的失真信号还原方法
CN117391150A (zh) * 2023-12-07 2024-01-12 之江实验室 一种基于分层池化图哈希的图数据检索模型训练方法
CN117391150B (zh) * 2023-12-07 2024-03-12 之江实验室 一种基于分层池化图哈希的图数据检索模型训练方法
CN117540828A (zh) * 2024-01-10 2024-02-09 中国电子科技集团公司第十五研究所 作训科目推荐模型训练方法、装置、电子设备和存储介质
CN117540828B (zh) * 2024-01-10 2024-06-04 中国电子科技集团公司第十五研究所 作训科目推荐模型训练方法、装置、电子设备和存储介质
CN117971356A (zh) * 2024-03-29 2024-05-03 苏州元脑智能科技有限公司 基于半监督学习的异构加速方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN113705772A (zh) 2021-11-26

Similar Documents

Publication Publication Date Title
WO2023000574A1 (zh) 一种模型训练方法、装置、设备及可读存储介质
Bhagat et al. Node classification in social networks
CN110347932B (zh) 一种基于深度学习的跨网络用户对齐方法
Li et al. Restricted Boltzmann machine-based approaches for link prediction in dynamic networks
CN114048331A (zh) 一种基于改进型kgat模型的知识图谱推荐方法及***
CN112015868B (zh) 基于知识图谱补全的问答方法
Li et al. Image sentiment prediction based on textual descriptions with adjective noun pairs
WO2022252458A1 (zh) 一种分类模型训练方法、装置、设备及介质
Ma et al. Joint multi-label learning and feature extraction for temporal link prediction
Komkhao et al. Incremental collaborative filtering based on Mahalanobis distance and fuzzy membership for recommender systems
CN112131261B (zh) 基于社区网络的社区查询方法、装置和计算机设备
CN114943017B (zh) 一种基于相似性零样本哈希的跨模态检索方法
CN112925857A (zh) 基于谓语类型预测关联的数字信息驱动的***和方法
Wang et al. Efficient multi-modal hypergraph learning for social image classification with complex label correlations
Wang et al. Graph active learning for GCN-based zero-shot classification
Berton et al. Rgcli: Robust graph that considers labeled instances for semi-supervised learning
Wang et al. Link prediction in heterogeneous collaboration networks
Sun et al. Graph force learning
Zhou et al. Unsupervised multiple network alignment with multinominal gan and variational inference
Drakopoulos et al. Self organizing maps for cultural content delivery
CN113515519A (zh) 图结构估计模型的训练方法、装置、设备及存储介质
CN117349494A (zh) 空间图卷积神经网络的图分类方法、***、介质及设备
CN116861923A (zh) 多视图无监督图对比学习模型构建方法、***、计算机、存储介质及应用
Mishra et al. Unsupervised functional link artificial neural networks for cluster Analysis
Park et al. Multi-attributed graph matching with multi-layer random walks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21950812

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE