CN116610874A

CN116610874A - Cross-domain recommendation method based on knowledge graph and graph neural network

Info

Publication number: CN116610874A
Application number: CN202310660348.2A
Authority: CN
Inventors: 赵中楠; 周舟; 刘文靖
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2023-06-06
Filing date: 2023-06-06
Publication date: 2023-08-18

Abstract

The invention discloses a cross-domain recommendation method based on a knowledge graph and a graph neural network, comprising the following steps: S1: normalizing the initial parameters; S2: assigning intra-domain embeddings to users and items; S3: calculating the inter-domain initial embeddings of users and items; S4: calculating the information embeddings propagated on the cross-domain graph; S5: calculating the information embeddings aggregated on the cross-domain graph; S6: calculating the final inter-domain embeddings of users and items; S7: calculating a matching value between a user and an item and making recommendations. The invention integrates multi-domain information to model a cross-domain graph, learns features from both the semantic information and the structural information of the graph, calculates matching values from the learned intra-domain and inter-domain embeddings of users and items, and finally recommends items to users using Bayesian pairwise ranking. The method can be applied to various cross-domain recommendation scenarios, such as e-commerce recommendation.

Description

Cross-domain recommendation method based on knowledge graph and graph neural network

Technical field

The invention provides a cross-domain recommendation method based on a knowledge graph and a graph neural network. It targets the data sparsity and cold-start problems that are widespread in traditional recommendation methods, and performs cross-domain recommendation under the joint guidance of composite graph information drawn from the knowledge graph and the graph neural network, thereby achieving a good recommendation effect.

Background

A recommendation system is a method for filtering information: it uses user data to suggest items that a user is likely to need, which effectively alleviates information overload, and it is widely used in fields such as e-commerce, advertising, and movies. A knowledge graph is a general formal framework for describing semantic knowledge. A knowledge-graph-based recommendation system draws on real-world entities and uses the triplet structure of the knowledge graph to build links between recommended items, which makes the recommendations interpretable. A deep-learning-based recommendation system uses a deep learning model to learn more abstract and denser representations of users and items, and builds its prediction model on a deep neural network structure, which lets it learn cross features better and strengthens the expressive power and generalization of the recommendation model. A cross-domain recommendation method uses the rich information of a source domain to improve recommendation accuracy for users in a target domain; it can generally relieve the data sparsity and cold-start problems and can also mitigate the information cocoon problem.

Disclosure of Invention

Object of the invention: for a recommendation system, existing auxiliary information is large in scale, heterogeneous, sparse, and unevenly distributed. Existing recommendation methods generally use this auxiliary information only at the text learning stage and make insufficient use of the knowledge it contains, which leads to inaccurate recommendation results. To address these problems, the invention provides a cross-domain recommendation model that fuses multiple domains on the basis of composite graph information from the knowledge graph and the graph neural network, and realizes cross-domain recommendation over multiple domains by combining the semantic information and the structural information of the graph.

To achieve the above object, the method includes the following steps, as shown in Fig. 1, the basic flowchart of the invention:

S1: normalizing the initial parameters;

S2: assigning intra-domain embeddings to users and items;

S3: calculating the inter-domain initial embeddings of users and items;

S4: calculating the information embeddings propagated on the cross-domain graph;

S5: calculating the information embeddings aggregated on the cross-domain graph;

S6: calculating the final inter-domain embeddings of users and items;

S7: calculating a matching value between a user and an item and making recommendations.

The initial parameters in step S1 include the entity embedding dimension $d$, the regularization parameter $\lambda$, the number of samples $m$ per batch for batch gradient descent, the number of training iterations $T$, the attention layer depth $D$ of the graph neural network, the dimension $d'$ of each layer of the graph neural network, and the parameters $W$ of the graph neural network.

In step S2, each domain assigns an intra-domain embedding to each user and item in that domain according to the user ID and item ID. For domain $x$, the intra-domain embeddings of the users and items in that domain are denoted $E_u^{(x)}$ and $E_i^{(x)}$, where $x \in \{A, B, \ldots, X\}$.

In step S3, a domain knowledge graph is first constructed from the item information of each domain, and a graph is then constructed jointly from the users, the items, and the item knowledge graphs. A triplet score is applied to the entities in the graph to obtain the inter-domain initial embeddings of the entities in each triplet $(h, r, t)$. The calculation formula is as follows

where $W_r \in \mathbb{R}^{k \times d}$ is the transformation matrix of relation $r$, and $L_2$ regularization is applied to prevent overfitting;
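As a minimal sketch, assuming the triplet score takes the TransR-style form g(h, r, t) = ||W_r e_h + e_r - W_r e_t||_2^2 (an assumption suggested by the k×d transformation matrix and the L2 term, not a form confirmed by the text), the computation could look like the following. The helper names and the toy vectors are hypothetical.

```python
def mat_vec(W, v):
    """Multiply a k x d matrix (given as a list of rows) by a length-d vector."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def triplet_score(W_r, e_h, e_r, e_t):
    """Assumed TransR-style score: ||W_r e_h + e_r - W_r e_t||_2^2.

    Under this assumption, a lower score means a more plausible triplet.
    """
    proj_h = mat_vec(W_r, e_h)  # head projected into the relation space (length k)
    proj_t = mat_vec(W_r, e_t)  # tail projected into the relation space (length k)
    return sum((ph + r - pt) ** 2 for ph, r, pt in zip(proj_h, e_r, proj_t))


# Toy values (hypothetical): d = 3 entity dimensions, k = 2 relation dimensions.
W_r = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
e_h = [1.0, 2.0, 3.0]
e_r = [0.5, -1.0]
e_t = [1.5, 1.0, 7.0]
score = triplet_score(W_r, e_h, e_r, e_t)
```

In this toy case the projected head plus the relation vector lands exactly on the projected tail, so the score is zero.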

the inter-domain knowledge graph loss function of domain $x$ is defined as follows

$$L_{KG}^{(x)} = \sum_{(h, r, t, t') \in T} -\ln \sigma\big(g(h, r, t') - g(h, r, t)\big) \tag{2}$$

where $(h, r, t')$ is a pseudo-triplet constructed by randomly replacing the tail entity, $G$ is the constructed cross-domain graph, and $\sigma$ is a nonlinear activation function.
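The pairwise loss in formula (2) can be sketched as follows, taking σ to be the logistic sigmoid (an assumption; the text only calls it a nonlinear activation function). The function takes precomputed score pairs so that it does not depend on any particular form of g; the function names are hypothetical.

```python
import math


def neg_log_sigmoid(z):
    """Compute -ln(sigmoid(z)) in a numerically stable way."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def kg_loss(score_pairs):
    """Sum -ln(sigmoid(g_neg - g_pos)) over (g_pos, g_neg) pairs.

    g_pos is g(h, r, t) for an observed triplet and g_neg is g(h, r, t')
    for the pseudo-triplet built by replacing its tail entity.
    """
    return sum(neg_log_sigmoid(g_neg - g_pos) for g_pos, g_neg in score_pairs)
```

The loss shrinks as the pseudo-triplet's score rises above the observed triplet's score, which is the behavior expected when lower scores indicate more plausible triplets.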

In step S4, for the information embedding propagated on the cross-domain graph, the information embedding of the neighbor nodes is calculated as follows

where $\pi(h, r, t)$ is the attention coefficient of the embedding propagation layer; $N_h = \{(h, r, t) \mid (h, r, t) \in G\}$ denotes the set of all triplets with node $h$ as the head entity; the head entity $h$ and the tail entity $t$ satisfy $h, t \in E'$, where $E'$ is the set of all entities in the cross-domain graph $G$; and the relation $r$ satisfies $r \in R$, where $R$ is the set of all relations in the cross-domain graph $G$;

the initial attention coefficient is defined as follows

$$\pi(h, r, t) = (W_r e_t)^{T} \tanh(W_r e_h + e_r) \tag{4}$$

where $\tanh$ is a nonlinear activation function and $W_r$ is the transformation matrix for relation $r$ in the embedding layer;
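Formula (4) can be sketched directly in code. The helper names below are hypothetical, and the vectors are plain Python lists for readability.

```python
import math


def mat_vec(W, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def attention_score(W_r, e_h, e_r, e_t):
    """Unnormalized coefficient pi(h, r, t) = (W_r e_t)^T tanh(W_r e_h + e_r)."""
    proj_t = mat_vec(W_r, e_t)
    gate = [math.tanh(ph + r) for ph, r in zip(mat_vec(W_r, e_h), e_r)]
    return sum(pt * g for pt, g in zip(proj_t, gate))


# Toy check (hypothetical values): identity transformation in two dimensions.
identity = [[1.0, 0.0], [0.0, 1.0]]
pi_value = attention_score(identity, [0.0, 0.0], [1.0, 0.0], [2.0, 0.0])
```

Here `pi_value` equals 2·tanh(1), since only the first coordinate contributes.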

the normalized attention coefficient is defined as follows

where $(h, r', t')$ ranges over the set of all triplets.
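A minimal sketch of the normalization and propagation in step S4, assuming the coefficients are normalized with a softmax over the triplets that share head entity h and that the neighbor embedding is the resulting weighted sum of tail embeddings. Both forms are assumptions, and the function names are hypothetical.

```python
import math


def normalize_attention(raw_scores):
    """Softmax over the raw coefficients of one head entity's triplets."""
    peak = max(raw_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_embedding(raw_scores, tail_embeddings):
    """Attention-weighted sum of the tail-entity embeddings."""
    weights = normalize_attention(raw_scores)
    dim = len(tail_embeddings[0])
    return [sum(w * e_t[j] for w, e_t in zip(weights, tail_embeddings))
            for j in range(dim)]
```

With equal raw coefficients the weights are uniform, so the neighbor embedding reduces to the mean of the tail embeddings.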

In step S5, for the information embedding aggregated on the cross-domain graph, the aggregated embedding of an entity is calculated as follows

where the superscript $(n)$ indicates an $n$-th order embedding, the superscript $(n-1)$ indicates an $(n-1)$-th order embedding, and $f(\cdot)$ denotes the aggregator;

the aggregator is defined as follows

where LeakyReLU is an activation function, $W_r \in \mathbb{R}^{d' \times d}$ is a trainable weight matrix, and $d'$ is the vector dimension after the linear transformation.
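The following is a sketch of one possible aggregator, assuming a sum-based form LeakyReLU(W(e_h + e_{N_h})). The text only establishes that the aggregator uses LeakyReLU and a d'×d weight matrix, so the sum form, the negative slope of 0.2, and the function names are all assumptions.

```python
def leaky_relu(x, slope=0.2):
    """LeakyReLU with an assumed negative slope of 0.2."""
    return x if x > 0 else slope * x


def aggregate(W, e_h, e_nh, slope=0.2):
    """Assumed sum aggregator: LeakyReLU(W (e_h + e_Nh)).

    W has d' rows of length d; e_h is the entity's previous-order embedding
    and e_nh is its neighbor embedding from the propagation step.
    """
    summed = [a + b for a, b in zip(e_h, e_nh)]
    return [leaky_relu(sum(w * s for w, s in zip(row, summed)), slope)
            for row in W]
```

Other aggregators, such as one that concatenates the two embeddings before the linear transformation, would fit the same description; the choice here is only illustrative.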

The final inter-domain embeddings of users and items in step S6 are calculated as follows

where $\|$ denotes the concatenation operation;
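A small sketch of the concatenation, assuming the final inter-domain embedding joins the initial embedding with the output of every aggregation layer. Which embeddings are concatenated is an assumption, and the function name is hypothetical.

```python
def final_embedding(layer_embeddings):
    """Concatenate the embeddings of all orders into one vector."""
    joined = []
    for embedding in layer_embeddings:
        joined.extend(embedding)
    return joined


# With an initial dimension of 64 and layer dimensions [64, 32, 16],
# the concatenated vector would have 64 + 64 + 32 + 16 = 176 entries.
example = final_embedding([[0.0] * 64, [0.0] * 64, [0.0] * 32, [0.0] * 16])
```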

the inter-domain collaborative filtering loss function of domain $x$ is defined as follows

where $\Omega = \{(u, i, j) \mid (u, i) \in I^{+}, (u, j) \in I^{-}\}$ denotes the training set, $I^{+}$ denotes the positive samples in which user $u$ has interacted with item $i$, $I^{-}$ denotes the negative samples with no interaction, $\sigma$ is a nonlinear activation function, and the score in the formula is the scoring value between user $u$ and item $i$ in domain $x$.
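A sketch of a pairwise ranking loss over Ω, assuming the usual Bayesian pairwise form: the sum of -ln σ(ŷ(u, i) - ŷ(u, j)) over (u, i, j) in Ω with σ the logistic sigmoid. The exact form is an assumption, and the `score` callable stands in for the scoring value between a user and an item.

```python
import math


def bpr_loss(triples, score):
    """Sum -ln(sigmoid(score(u, i) - score(u, j))) over (u, i, j) triples.

    i is an item the user interacted with and j is an item with no interaction.
    """
    total = 0.0
    for u, i, j in triples:
        z = score(u, i) - score(u, j)
        # -ln(sigmoid(z)) written in a numerically stable form
        total += max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total


# Hypothetical scores for one user and two items.
toy_scores = {("u1", "i5"): 2.0, ("u1", "i2"): 0.5}
loss = bpr_loss([("u1", "i5", "i2")], lambda u, i: toy_scores[(u, i)])
```

The loss decreases as the positive item's score rises above the negative item's score.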

The matching value between a user and an item in step S7 is calculated as follows

where $x \in \{A, B, C, \ldots, X\}$;

the final loss function of domain $x$ is defined as follows

where the parameter set in the formula collects the parameters of the model, $E$ denotes the embedding vectors of all entities and relations, $W_r$ is the transformation matrix for a specific relation $r$, and $\lambda$ is the $L_2$ regularization parameter.
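One plausible shape for the final loss, assuming it sums the knowledge graph loss, the collaborative filtering loss, and an $L_2$ penalty on the parameters, is sketched below. The symbol $\Theta$ for the parameter set, the name $L_{CF}^{(x)}$ for the collaborative filtering loss, and the additive combination are assumptions, not forms confirmed by the text.

```latex
% Assumed additive combination of the two losses with L2 regularization.
L^{(x)} = L_{KG}^{(x)} + L_{CF}^{(x)} + \lambda \lVert \Theta \rVert_2^2,
\qquad \Theta = \{ E,\; W_r \ \forall r \in R,\; \ldots \}
```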

Drawings

FIG. 1 is a flow chart of the present invention.

FIG. 2 is a diagram of intra-domain embedding acquisition in an embodiment of the present invention.

FIG. 3 shows the process of obtaining inter-domain embeddings in an embodiment of the present invention.

Detailed description of the preferred embodiments

To describe the technical solution in the embodiments of the present invention clearly and completely, the invention is described in further detail below.

Take the Douban dataset as an example. It contains user-item interaction data and item data for three domains (movies, music, and books), and shared users exist across the domains, so this dataset is selected for model training and testing.

Step one: normalize the initial parameters of the model. The entity embedding dimension $d$ is 64, the regularization parameter $\lambda$ is 0.00001, the number of samples $m$ per batch for batch gradient descent is 1024, the number of training iterations $T$ is 100, the attention layer depth $D$ of the graph neural network is 3, the per-layer dimensions $d'$ of the graph neural network are [64, 32, 16], and the parameters $W$ of the graph neural network are initialized randomly.
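The values from step one can be collected in a small configuration object. This is only an illustrative sketch: the class and field names are hypothetical, and the consistency check between the depth and the number of layer dimensions is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperParams:
    """Initial parameters from step one of the embodiment (field names hypothetical)."""
    embed_dim: int = 64                # entity embedding dimension
    l2_lambda: float = 0.00001         # regularization parameter
    batch_size: int = 1024             # samples per batch for gradient descent
    iterations: int = 100              # number of training iterations
    depth: int = 3                     # attention layer depth of the GNN
    layer_dims: tuple = (64, 32, 16)   # dimension of each GNN layer

    def __post_init__(self):
        # Assumed sanity check: one dimension per attention layer.
        if len(self.layer_dims) != self.depth:
            raise ValueError("layer_dims must have one entry per layer")


params = HyperParams()
```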

Step two: intra-domain embedding is allocated for users and items.

For each domain in the Douban dataset, an intra-domain embedding is assigned to each user and item in that domain based on the user ID and item ID. For the movie domain, the intra-domain embeddings of its users and items are denoted $E_u^{(A)}$ and $E_i^{(A)}$; for the music domain, $E_u^{(B)}$ and $E_i^{(B)}$; and for the book domain, $E_u^{(C)}$ and $E_i^{(C)}$.
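A minimal sketch of ID-based intra-domain embedding assignment, assuming each ID simply receives its own randomly initialized vector that is later trained. The initialization range, the function names, and the sample IDs are hypothetical.

```python
import random


def init_table(ids, dim, rng):
    """Assign one randomly initialized vector of length dim to each ID."""
    return {key: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for key in ids}


def init_domain_embeddings(user_ids, item_ids, dim=64, seed=0):
    """Return the (E_u, E_i) lookup tables for a single domain."""
    rng = random.Random(seed)
    return init_table(user_ids, dim, rng), init_table(item_ids, dim, rng)


# Hypothetical IDs for the movie domain A.
E_u_A, E_i_A = init_domain_embeddings(["u1", "u2"], ["i1", "i2", "i5"])
```

Each domain would call `init_domain_embeddings` separately, so a shared user holds a distinct intra-domain embedding in every domain.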

Step three: calculate the inter-domain initial embeddings of users and items in the three domains of movies, music, and books.

For the movie domain, entity extraction and relation construction are performed on the movie title, director, brief introduction, drama, country of release, language, and actors; for the music domain, on the song title, singer, publishing company, and labels; and for the book domain, on the book title, author, brief introduction, translator, and publishing company. Finally, the item knowledge graphs of the three domains are constructed.
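The relation construction can be illustrated by turning one item's attributes into (head, relation, tail) triplets. The record below is a made-up placeholder rather than data from the dataset, and the function name and relation names are hypothetical.

```python
def item_triples(item_id, attributes):
    """Turn {relation: value or list of values} into (head, relation, tail) triplets."""
    triples = []
    for relation, values in attributes.items():
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-element list
        for value in values:
            triples.append((item_id, relation, value))
    return triples


# Placeholder movie record (hypothetical entity names).
movie_triples = item_triples("movie_1", {
    "director": "director_1",
    "actor": ["actor_1", "actor_2"],
    "language": "language_1",
})
```

Entities that appear as tails for several items, such as a shared director, are what link items to one another in the resulting knowledge graph.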

The item knowledge graphs are combined with the user-item interaction information to construct a graph structure that relies on the shared users; scores are calculated for the triplets on the graph, and training is performed with the loss function. As shown in Fig. 3, this yields the inter-domain initial embedding of user $u_1$ and the inter-domain initial embedding of item $i_5$.

Step four: information embedding propagated on the graph is calculated based on the cross-domain graph, as shown in part in fig. 3.

For user node $u_1$, whose neighbor nodes include $i_2$ and $i_3$, the neighbor embedding of $u_1$ is calculated as follows

The two initial attention coefficients are calculated as follows

$$\pi(u_1, r_1, i_2) = (W_r i_2)^{T} \tanh(W_r u_1 + r_1) \tag{2}$$

$$\pi(u_1, r_1, i_3) = (W_r i_3)^{T} \tanh(W_r u_1 + r_1) \tag{3}$$

The coefficients are then normalized as follows

For item node $i_5$, whose neighbor nodes include $u_3$, $u_4$, and $e_3$, the neighbor embedding of $i_5$ is calculated as follows

The three initial attention coefficients are calculated as follows

$$\pi(i_5, r_2, e_3) = (W_r e_3)^{T} \tanh(W_r i_5 + r_2) \tag{7}$$

$$\pi(i_5, r_3, u_3) = (W_r u_3)^{T} \tanh(W_r i_5 + r_3) \tag{8}$$

$$\pi(i_5, r_3, u_4) = (W_r u_4)^{T} \tanh(W_r i_5 + r_3) \tag{9}$$

Step five: the information embedding aggregated on the graph is calculated based on the cross-domain graph, as partially shown in fig. 3.

For user node $u_1$, the information embedding after aggregating its neighbor nodes is calculated as follows

For item node $i_5$, the information embedding after aggregating its neighbor nodes is calculated as follows

Step six: the final inter-domain embedding of the user and the item is calculated as partially shown in fig. 3.

For user node $u_1$ and item node $i_5$, the final inter-domain embeddings are calculated as follows

Step seven: calculate the matching value between the user and the item and make recommendations. For user node $u_1$ and item node $i_5$, the matching value is calculated as follows

Finally, the items are sorted by their matching values with the user to obtain the final recommendation result.
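The final ranking step can be sketched as a top-k selection over the matching values. Excluding items the user has already interacted with and the cutoff `k` are assumptions added for illustration, and the function name is hypothetical.

```python
def recommend(match_values, seen_items, k=10):
    """Return up to k unseen items, ordered by descending matching value."""
    candidates = [(item, value) for item, value in match_values.items()
                  if item not in seen_items]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:k]]


# Hypothetical matching values for one user.
top_items = recommend({"i1": 0.2, "i5": 0.9, "i3": 0.5}, seen_items={"i3"}, k=2)
```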

The present invention is not limited to the specific embodiments described above. The above are merely preferred embodiments of the invention, and any modification, equivalent substitution, or improvement made within the spirit and principles of the invention is intended to fall within its scope of protection.

Claims

1. A cross-domain recommendation method based on a knowledge graph and a graph neural network, characterized in that the method comprises the following steps:

S1: normalizing the initial parameters;

S2: assigning intra-domain embeddings to users and items;

S3: calculating the inter-domain initial embeddings of users and items;

S4: calculating the information embeddings propagated on the cross-domain graph;

S5: calculating the information embeddings aggregated on the cross-domain graph;

S6: calculating the final inter-domain embeddings of users and items;

S7: calculating a matching value between a user and an item and making recommendations.

2. The cross-domain recommendation method based on a knowledge graph and a graph neural network as claimed in claim 1, wherein: the initial parameters in step S1 include the entity embedding dimension $d$, the regularization parameter $\lambda$, the number of samples $m$ per batch for batch gradient descent, the number of training iterations $T$, the attention layer depth $D$ of the graph neural network, the dimension $d'$ of each layer of the graph neural network, and the parameters $W$ of the graph neural network.

3. The cross-domain recommendation method based on a knowledge graph and a graph neural network as claimed in claim 1, wherein: in step S2, each domain assigns an intra-domain embedding to each user and item in that domain according to the user ID and item ID, and for domain $x$, the intra-domain embeddings of the users and items in that domain are denoted $E_u^{(x)}$ and $E_i^{(x)}$, where $x \in \{A, B, \ldots, X\}$.

4. The cross-domain recommendation method based on a knowledge graph and a graph neural network as claimed in claim 1, wherein: in step S3, the inter-domain initial embeddings of users and items are calculated by first constructing a domain knowledge graph from the item information of each domain, then constructing a graph jointly from the users, the items, and the item knowledge graphs, and applying a triplet score to the entities in the graph to obtain the inter-domain initial embeddings of the entities, the calculation formula being as follows

where $W_r \in \mathbb{R}^{k \times d}$ is the transformation matrix of relation $r$, and $L_2$ regularization is applied to prevent overfitting;

$$L_{KG}^{(x)} = \sum_{(h, r, t, t') \in T} -\ln \sigma\big(g(h, r, t') - g(h, r, t)\big) \tag{2}$$

where $(h, r, t')$ is a pseudo-triplet constructed by randomly replacing the tail entity, $G$ is the constructed cross-domain graph, and $\sigma$ is a nonlinear activation function.

5. The cross-domain recommendation method based on a knowledge graph and a graph neural network as claimed in claim 1, wherein: in step S4, for the information embedding propagated on the cross-domain graph, the information embedding of the neighbor nodes is calculated as follows

the initial attention coefficient is defined as follows

$$\pi(h, r, t) = (W_r e_t)^{T} \tanh(W_r e_h + e_r) \tag{4}$$

the normalized attention coefficient is defined as follows

where $(h, r', t')$ ranges over the set of all triplets.

6. The cross-domain recommendation method based on a knowledge graph and a graph neural network as claimed in claim 1, wherein: in step S5, for the information embedding aggregated on the cross-domain graph, the aggregated embedding of an entity is calculated as follows

the aggregator is defined as follows

7. The cross-domain recommendation method based on a knowledge graph and a graph neural network as claimed in claim 1, wherein: the final inter-domain embeddings of users and items in step S6 are calculated as follows

where $\|$ denotes the concatenation operation;

where $\Omega = \{(u, i, j) \mid (u, i) \in I^{+}, (u, j) \in I^{-}\}$ denotes the training set, $I^{+}$ denotes the positive samples in which user $u$ has interacted with item $i$, $I^{-}$ denotes the negative samples with no interaction, $\sigma$ is a nonlinear activation function, and the score in the formula is the scoring value between user $u$ and item $i$ in domain $x$.

8. The cross-domain recommendation method based on a knowledge graph and a graph neural network as claimed in claim 1, wherein: the matching value between a user and an item in step S7 is calculated as follows

where $x \in \{A, B, C, \ldots, X\}$;

the final loss function of domain $x$ is defined as follows

where the parameter set in the formula collects the parameters of the model, $E$ denotes the embedding vectors of all entities and relations, $W_r$ is the transformation matrix for a specific relation $r$, and $\lambda$ is the $L_2$ regularization parameter.