CN114925270A - Session recommendation method and model


Info

Publication number: CN114925270A
Application number: CN202210497302.9A
Authority: CN (China)
Language: Chinese (zh)
Other versions: CN114925270B (granted)
Inventors: 曾碧卿, 池俊龙
Assignee (current and original): South China Normal University
Legal status: Granted; Active

Classifications

    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F 18/253 Fusion techniques of extracted features
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods


Abstract

The invention relates to a session recommendation method comprising the following steps. S1: construct a session graph from a session sequence. S2: aggregate L-order neighbor node features over the session graph to obtain an item embedding vector, where L is a positive integer. S3: capture the sequence information of the item embedding vector to obtain a session sequence feature vector. S4: compute the current interest and the global interest from the session sequence feature vector, and perform interest fusion on the current interest and the global interest to obtain a final session representation vector. S5: compute the click probability of candidate items from the final session representation vector, and output the K candidate items with the highest click probability as the final recommended items, where K is a positive integer. The session recommendation method is simple, efficient, accurate, stable in performance, and fast.

Description

Session recommendation method and model
Technical Field
The invention relates to the technical field of recommendation systems, in particular to a session recommendation method and a session recommendation model.
Background
The explosive growth of information on the internet has brought the problem of information overload, which holds back progress in the big-data era. Currently, the most effective way to address information overload is personalized recommendation, such as movie recommendation, music recommendation, and product recommendation.
Most traditional recommendation methods make recommendations based on user identity and long-term historical interaction records. In many scenarios, however, this information is unavailable. For example, some privacy-conscious users prefer to browse items on a website anonymously; some e-commerce platforms likewise allow anonymous access, requiring login only when purchasing. When a user browses anonymously, the website can hardly make accurate recommendations with traditional techniques based on long-term interaction history; it can only recommend globally popular items and other items that suit mass tastes, failing to deliver personalized recommendations for the user's individual interests and reducing user experience and site retention.
Therefore, Session-Based Recommendation (SBR) methods were proposed; they solve the recommendation problem for anonymous users well, improving user experience while increasing platform retention. A session refers to the series of interactions a user generates on a website from entering it to leaving it. Because the user's clicks on the website are time-ordered, the items clicked during this period form a session sequence. Session-based recommendation mines the user's latent interest preferences from the session sequence generated during the current session and predicts the item the user is likely to click next, achieving accurate personalized recommendation.
In related research on session-based recommendation algorithms, some earlier methods model the session sequence with Markov Chains (MC) or Recurrent Neural Networks (RNN) to capture the user's interest preferences. The Markov chain-based methods are traditional, simple, and fast, but perform poorly and suit only simple scenarios. The recurrent neural network-based methods can capture the dependencies of the session sequence and make better recommendations, but their serial structure is hard to parallelize, so they run inefficiently and are difficult to apply in session recommendation scenarios with high real-time requirements.
Recently, with the rise and popularity of Graph Neural Networks (GNNs), some scholars have proposed introducing GNNs into the session recommendation task, constructing the session sequence as a graph structure and then learning the item embeddings of the session with a GNN to model the user preference representation. Wu et al. (Wu S, Tang Y, Zhu Y, et al. Session-based recommendation with graph neural networks [C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 346-353.) proposed the SR-GNN model, the first to apply GNNs to the session recommendation task. However, existing GNN-based session recommendation methods, such as SR-GNN, FGNN (Qiu R, Li J, Huang Z, et al. Rethinking the item order in session-based recommendation with graph neural networks [C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019: 579-588.) and TAGNN (Target Attentive Graph Neural Network; Yu F, Zhu Y, Liu Q, et al. TAGNN: Target attentive graph neural networks for session-based recommendation [C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020: 1921-1924.), share two defects: they cannot perceive the position information of items in the session, and they cannot capture the long-term dependencies of the session sequence. The former manifests as the click order of the items in the session being unrecoverable from the session graph model; the latter manifests as long-term memory failure, with the sequence information at the start of the session almost forgotten. This leaves the GNN-based session recommendation model with weak expressive power and inaccurate session representations, reducing session recommendation performance and effectiveness.
The position information and long-term dependencies of the session sequence are crucial for accurately predicting user preferences, and researchers have therefore also worked on this aspect. For example, Xu et al. (Xu C, Zhao P, Liu Y, et al. Graph contextualized self-attention network for session-based recommendation [C]// IJCAI. 2019, 19: 3940-3946.) proposed the Graph Contextualized Self-Attention Network (GC-SAN) model, which combines a graph neural network with a self-attention mechanism to capture long-range dependencies in the session sequence; Ye et al. (Ye R, Zhang Q, Luo H. Cross-session aware temporal convolutional network for session-based recommendation [C]// 2020 International Conference on Data Mining Workshops (ICDMW). IEEE, 2020: 220-226.) proposed the Cross-session Aware Temporal Convolutional Network (CA-TCN) model, which uses a Temporal Convolutional Network (TCN) to overcome the loss of sequence position information and long-term dependencies in graph neural network methods.
However, the self-attention mechanism of the GC-SAN model cannot perceive sequence position information on its own, so additional positional encoding must be introduced; the effectiveness of the model is then limited by the quality of that positional encoding, making its performance insufficiently stable, and self-attention has quadratic time complexity O(N^2), incurring a large computational overhead. The CA-TCN model, on the other hand, captures only the local dependencies of the sequence with a single convolutional layer and can capture long-term dependencies only by stacking multiple layers; the longer the sequence, the more TCN layers must be stacked, leading to a complex computation process and a slow running speed.
Disclosure of Invention
Based on this, the invention aims to provide a session recommendation method that is simple, efficient, accurate, stable in performance, and fast.

The invention is realized through the following technical solution:
a session recommendation method comprising the steps of:
s1: constructing a conversation graph according to the conversation sequence;
s2: aggregating L-order neighbor node features according to the session graph to obtain an article embedding vector; wherein L is a positive integer;
s3: capturing sequence information of the object embedded vector to obtain a conversation sequence feature vector;
s4: calculating current interest and global interest according to the conversation sequence feature vector, and performing interest fusion calculation according to the current interest and the global interest to obtain a final conversation expression vector;
s5: calculating the click probability of all candidate articles according to the final session expression vector, and outputting K candidate articles with the highest click probability as final recommended articles; wherein K is a positive integer.
According to this session recommendation method, aggregating the L-order neighbor node features of the session graph learns the higher-order transition relations between items and yields more accurate item embedding vectors; meanwhile, the position information and long-term dependencies of the session sequence can be captured quickly, yielding more accurate session sequence feature vectors and thereby improving session recommendation performance and effectiveness.
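To make the composition of the five steps concrete, the following PyTorch-style sketch shows how S1-S5 chain together; every name in it (build_session_graph, gin, gmlp, fuse, item_emb) is an illustrative assumption standing in for the modules detailed below, not an identifier from the patent.

```python
import torch

def recommend(session, modules, k=20):
    # Schematic composition of steps S1-S5; `modules` is assumed to bundle
    # the networks described in the following sections (names illustrative).
    h0, a_in, a_out = modules.build_session_graph(session)    # S1: session graph
    h_f = modules.gin(h0, a_in, a_out)                        # S2: L-order aggregation
    h_s = modules.gmlp(h_f)                                   # S3: sequence information
    z_f = modules.fuse(h_s)                                   # S4: interest fusion
    probs = torch.softmax(modules.item_emb.weight @ z_f, 0)   # S5: click probabilities
    return probs.topk(k).indices                              # top-K recommended items
```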
Further, step S3 is specifically:

capturing the sequence information of the item embedding vector in one pass using a gated multilayer perceptron algorithm, thereby computing the session sequence feature vector; the gated multilayer perceptron performs the following steps:

S31: capturing the sequence position information of the item embedding vector in one pass with a spatial projection matrix, and fusing the sequence position information with the item embedding information to obtain a session sequence intermediate vector;

S32: applying a residual connection between the item embedding vector and the session sequence intermediate vector to obtain the session sequence feature vector;

S33: judging whether the number of computations of the session sequence feature vector has reached a preset number of cycles; if not, taking the session sequence feature vector as the item embedding vector and returning to step S31; if so, outputting the session sequence feature vector directly; the preset number of cycles is set to N, where N is a positive integer.
Further, the session sequence intermediate vector is computed as:

ĥ_f = LayerNorm(h_f) (10)

z_h = GeLU(ĥ_f W_3 + b_3) (11)

z_s = Dropout[SGU(z_h) W_4 + b_4] (12)

where z_s denotes the session sequence intermediate vector, z_h the session embedding vector, h_f the item embedding vector, and ĥ_f the layer-normalized item embedding vector; LayerNorm() denotes the LayerNorm regularization technique, Dropout() the Dropout regularization technique, GeLU() the GeLU activation function, and SGU(z_h) the spatial gating unit algorithm that learns sequence information from the session embedding vector; W_3 denotes a third parameter matrix, W_4 a fourth parameter matrix, b_3 a third parameter vector, and b_4 a fourth parameter vector;

SGU(z_h) is computed as:

[z_1; z_2] = z_h (13)

ẑ_2 = LayerNorm(z_2) (14)

z_p = W_s ẑ_2 + b_s (15)

z_r = z_1 ⊙ z_p (16)

where z_r denotes the session sequence representation vector; z_p is a sequence position vector containing the sequence position information of the session sequence; z_1 and z_2 denote the first-half and second-half session embedding vectors obtained by splitting the session embedding vector in half along the embedding dimension, both of which contain the item embedding information of the session sequence; ẑ_2 denotes the layer-normalized second-half session embedding vector; W_s denotes the spatial projection matrix and b_s a spatial parameter vector; ⊙ denotes the Hadamard product;

the session sequence feature vector is computed as:

ĥ_s = z_s + h_f (17)

where ĥ_s denotes the session sequence feature vector.
Further, step S2 is specifically:

aggregating the L-order neighbor node features in the session graph using a graph isomorphism network, thereby obtaining the item embedding vector; the graph isomorphism network performs the following steps:

S21: computing an l-order embedding vector from the (l-1)-order embedding vector of the session graph; when l = 1, the (l-1)-order embedding vector is the initial embedding vector;

S22: judging whether the number of computations of the l-order embedding vector has reached a preset order; if not, taking the l-order embedding vector as the (l-1)-order embedding vector and returning to step S21; if so, taking the l-order embedding vector as the L-order embedding vector and executing step S23; the preset order is set to L;

S23: computing the item embedding vector from the L-order embedding vector.

Further, the l-order embedding vector is computed as:

h^l = MLP{[A_in + A_out + I × (1 + ε)] × h^{l-1}} (6)

ĥ^l = Dropout[ReLU(h^l)] (7)

where ĥ^l denotes the l-order embedding vector, h^l the l-order embedding intermediate vector, and h^{l-1} the (l-1)-order embedding vector; A_out denotes the out-degree matrix of the adjacency matrix, A_in the in-degree matrix of the adjacency matrix, and I the identity matrix; ε denotes the initial mapping error; ReLU() denotes the ReLU activation function; Dropout() denotes the Dropout regularization technique; MLP() denotes the multilayer perceptron algorithm;

the item embedding vector is computed as:

g_h = σ(W_g [h^0; h^L]) (8)

h_f = g_h ⊙ h^0 + (1 - g_h) ⊙ h^L (9)

where h_f denotes the item embedding vector and h^L the L-order embedding vector; g_h denotes a gated embedding parameter determined jointly by the initial embedding vector and the L-order embedding vector; [;] denotes concatenation of the initial embedding vector and the L-order embedding vector; W_g denotes a gating parameter matrix; σ() denotes the sigmoid activation function; ⊙ denotes the Hadamard product.
Further, step S4 is specifically:

computing the current interest by average pooling of the session sequence feature vector and the global interest by sum pooling of the session sequence feature vector; then adaptively fusing the current interest and the global interest to compute the final session representation vector.

Further, the final session representation vector is computed as:

z_local = mean(ĥ_s) (18)

z_global = sum(ĥ_s) (19)

g_z = σ(W_5 z_local + W_6 z_global) (20)

z_f = g_z ⊙ z_local + (1 - g_z) ⊙ z_global (21)

where z_f denotes the final session representation vector and ĥ_s the session sequence feature vector; z_local denotes the current interest and z_global the global interest; g_z denotes a gated fusion parameter determined by the current interest and the global interest; mean() denotes average pooling and sum() denotes sum pooling; W_5 denotes a fifth parameter matrix and W_6 a sixth parameter matrix.
Further, the method also comprises a parameter optimization step:

S6: preprocessing the session sequences in a session dataset, randomly splitting the preprocessed session sequences into a training set and a test set at a ratio of 9:1, and randomly dividing the session sequences of the training set into several batches;

S7: initializing the network parameters of steps S2-S4 and executing steps S1-S5 batch by batch on the session sequences of the training set to optimize the network parameters; during optimization, the network parameters are optimized with an adaptive momentum algorithm and the loss is computed with a cross-entropy loss function;

the cross-entropy loss function is:

L(ŷ) = -Σ_{i=1}^{M} [y_i log(ŷ_i) + (1 - y_i) log(1 - ŷ_i)] (24)

where ŷ denotes the click probability; ŷ_i denotes the i-th value of the click probability, i.e., the probability that the i-th item is the next clicked item in the session sequence; y_i is the one-hot encoded value of the i-th item, either 1 or 0; y_i = 1 means the i-th item is a positive example, i.e., the next clicked item is the i-th item; y_i = 0 means the i-th item is a negative example, i.e., the next clicked item is not the i-th item;
S8: after each iteration, executing steps S1-S5 on the session sequences of the test set for a performance test, and repeating for several iterations until the performance no longer improves.
The invention also provides a session recommendation model, comprising a graph construction module, an item embedding information extraction module, a sequence information extraction module, an interest fusion module, and a prediction module, stacked in sequence;

the graph construction module constructs a session graph from a session sequence, the session graph consisting of an initial embedding vector and an adjacency matrix;

the item embedding information extraction module aggregates the L-order neighbor node features in the session graph to obtain an item embedding vector;

the sequence information extraction module captures the sequence information of the item embedding vector to obtain a session sequence feature vector;

the interest fusion module computes the current interest and the global interest from the session sequence feature vector, and performs interest fusion on the current interest and the global interest to obtain a final session representation vector;

the prediction module computes the click probability of all candidate items from the final session representation vector and outputs the K candidate items with the highest click probability as the final recommended items.

Further, the item embedding information extraction module comprises an L-layer graph isomorphism network and a highway network stacked in sequence; the l-th graph isomorphism network layer computes an l-order embedding vector from the (l-1)-order embedding vector of the session graph, where l ∈ [1, L] and L is a positive integer; when l = 1, the (l-1)-order embedding vector is the initial embedding vector; when l = L, the l-order embedding vector is the L-order embedding vector; the highway network computes the item embedding vector from the initial embedding vector and the L-order embedding vector;

the sequence information extraction module comprises N gated multilayer perceptron layers stacked in sequence; each gated multilayer perceptron comprises a first LayerNorm layer, a first linear mapping layer, a GeLU activation function layer, a spatial gating unit, a second linear mapping layer, and a residual connection layer, stacked in sequence; the first LayerNorm layer, the first linear mapping layer, and the GeLU activation function layer process the item embedding vector in turn to compute a session embedding vector; the spatial gating unit computes a session sequence intermediate vector from the session embedding vector; the second linear mapping layer and the residual connection layer process the session sequence intermediate vector in turn to compute the session sequence feature vector;

the spatial gating unit comprises a vector halving layer, a second LayerNorm layer, a linear mapping layer over the temporal dimension, and a gated fusion layer; the vector halving layer splits the session embedding vector into a first-half session embedding vector and a second-half session embedding vector; the second LayerNorm layer layer-normalizes the second-half session embedding vector; the linear mapping layer over the temporal dimension captures the sequence position information of the layer-normalized second-half session embedding vector in one pass with a spatial projection matrix to compute a sequence position vector; the gated fusion layer fuses the first-half session embedding vector with the sequence position vector to obtain the session sequence intermediate vector;

the interest fusion module comprises an average pooling layer, a sum pooling layer, and a gated fusion network; the average pooling layer computes the current interest from the session sequence feature vector, the sum pooling layer computes the global interest from the session sequence feature vector, and the gated fusion network fuses the current interest and the global interest to obtain the final session representation vector.
Compared with the prior art, the session recommendation method and model provided by the invention capture the sequence information of the session sequence in one pass on top of aggregating higher-order item transition relations, and have the following advantages:

(1) Stable performance and fast running speed: no additional positional encoding needs to be introduced, removing the limitation of model performance depending on the quality of the positional encoding and improving stability; meanwhile, the gated multilayer perceptron has linear time complexity O(N), so its computational overhead is smaller than that of the GC-SAN model and it runs faster.

(2) Simple, efficient, and accurate: a spatial projection matrix acts on the temporal dimension of the session sequence, capturing the sequence position information and long-term dependencies of the session sequence in one pass without stacking multiple convolutional layers; this is simple and efficient, and yields more accurate session representations, improving session recommendation performance and effectiveness.
For a better understanding and practice, the present invention is described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a flowchart illustrating the steps of a session recommendation method according to an embodiment of the present invention;

Fig. 2 is a diagram of the process of constructing a session graph according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a session graph and its adjacency matrix, out-degree matrix, and in-degree matrix according to an embodiment of the present invention;

Fig. 4 is a network architecture diagram of a session recommendation model according to an embodiment of the present invention;

Fig. 5 is a network architecture diagram of an item embedding information extraction module according to an embodiment of the present invention;

Fig. 6 is a network architecture diagram of a gated multilayer perceptron according to an embodiment of the present invention.
Detailed Description
The inventor finds that these session recommendation methods perform poorly because they focus only on modeling local item transition relations on the session graph and cannot capture sequence information, such as the position information and long-term dependencies of the session sequence, from the session graph. If the sequence information in the session graph could be further modeled, the performance and effectiveness of session recommendation could be significantly improved. Therefore, on top of aggregating the higher-order item transition relations of the session graph, the invention further captures the position information and long-term dependencies of the session sequence in the session graph, improving the accuracy and stability of session recommendation. The invention introduces an L-layer stacked Graph Isomorphism Network (GIN) to aggregate the L-order item transition relations in the session graph, and then introduces a gated MultiLayer Perceptron (gMLP) to capture the session sequence position information and long-term dependencies in one pass. This is illustrated below with a specific embodiment:
Please refer to fig. 1, which is a flowchart illustrating the steps of a session recommendation method according to an embodiment of the present invention. The session recommendation method comprises the following steps:

S1: constructing a session graph from a session sequence;

S2: aggregating L-order neighbor node features over the session graph to obtain an item embedding vector, where L is a positive integer;

S3: capturing the sequence information of the item embedding vector to obtain a session sequence feature vector;

S4: computing the current interest and the global interest from the session sequence feature vector, and computing a final session representation vector from the current interest and the global interest;

S5: computing the click probability of all candidate items from the final session representation vector, and outputting the K candidate items with the highest click probability as the final recommended items, where K is a positive integer.
In step S1, the session graph is constructed as follows: construct the session graph from the session sequence; embed all items of the session graph to obtain the initial embedding vector, and apply normalized weighting to all edges of the session graph to obtain the adjacency matrix.
In a session-based recommendation scenario, V = {v_1, v_2, ..., v_M} denotes the set of all items appearing in all session sequences, where v_I ∈ V denotes the I-th item in the item set, I ∈ [1, M]; M is a positive integer denoting the total number of items. A session sequence can then be represented as a list sorted by timestamp: s = [v_{s,1}, v_{s,2}, ..., v_{s,n}], where v_{s,j} ∈ V denotes the j-th item clicked by the user in the session sequence, j ∈ [1, n]; n is a positive integer denoting that n items were clicked in the session sequence. The goal of session recommendation is to predict the next item the user will click, i.e., the (n+1)-th item v_{s,n+1} of the session sequence, where v_{s,n+1} ∈ V.

Note that the item set is the set of all items appearing in all session sequences and typically contains hundreds of thousands or millions of items, whereas the length of a session sequence generally does not exceed 300, i.e., at most 300 items are clicked in a session sequence; since some items may repeat within the sequence, even fewer distinct items actually appear in a session sequence. Assuming m distinct items appear in a session sequence, we have m ≤ n < M.

Please refer to fig. 2, which illustrates the process of constructing a session graph in this embodiment. A session sequence can be modeled as a session graph: G_s = (V_s, E_s). In the session graph, V_s denotes the node set (the set of session items), where each node denotes an item v_{s,j} appearing in the session sequence; E_s denotes the edge set (the set of item transition relations), where each edge (v_{s,j-1}, v_{s,j}) denotes that in the session sequence the user clicked item v_{s,j-1} first and then item v_{s,j}, with j ∈ [2, n].
The nodes of the session graph are scalars; item embedding converts them into node vectors, turning the node set into the initial embedding vector from which features can be extracted. The initial embedding vector is obtained as follows: each node (item) of the node set is embedded into a unified embedding space to obtain its node vector, and the node vectors of all nodes together form the initial embedding vector of the session graph:

h^0 = {x_1, x_2, ..., x_m} (1)

where h^0 denotes the initial embedding vector of the session graph, consisting of the node vectors of all items in the initial session graph; x_i ∈ R^d denotes the node vector of the i-th node of the session graph; R denotes the set of real numbers and d the embedding dimension; m is the number of nodes, corresponding to the m items of the session sequence.

Similarly, the candidate items are embedded, converting each candidate item's label into an embedding vector; the candidate items are all items of the item set, and the embedding vector of the I-th item of the item set is denoted x_I, with x_I ∈ R^d.
the acquisition mode of the adjacency matrix is as follows: for the node i (i.e. the ith node), calculating the total number of incoming edges connected with the node i, taking the reciprocal of the total number of the incoming edges as the weight of the incoming edges connected with the node i, and then the incoming matrix is formed by the weights of the incoming edges of all the nodes, and in the same way, obtaining the outgoing matrix, and forming the adjacency matrix of the conversation graph by the incoming matrix and the outgoing matrix.
Then, the session graph may be represented as:
G′ s =(h 0 ,A) (2)
in formula (II), G' s Representing a session graph;
Figure BDA0003633998830000094
representing an adjacency matrix.
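As a concrete illustration of this construction, the sketch below builds the node list and the degree-normalized in/out matrices from one session sequence; it is a minimal sketch under the description above, and the function and variable names are illustrative, not from the patent.

```python
import torch

def build_session_graph(session):
    # Illustrative sketch; names are not from the patent.
    # Deduplicate clicked items into graph nodes, preserving first-click order.
    items = list(dict.fromkeys(session))
    idx = {v: i for i, v in enumerate(items)}
    m = len(items)
    a_in, a_out = torch.zeros(m, m), torch.zeros(m, m)
    for prev, cur in zip(session, session[1:]):    # each edge (v_{s,j-1}, v_{s,j})
        a_out[idx[prev], idx[cur]] = 1.0
        a_in[idx[cur], idx[prev]] = 1.0
    # Weight each edge by the reciprocal of the node's out-/in-degree.
    a_out = a_out / a_out.sum(dim=1, keepdim=True).clamp(min=1.0)
    a_in = a_in / a_in.sum(dim=1, keepdim=True).clamp(min=1.0)
    return items, a_in, a_out

# e.g. build_session_graph([3, 7, 3, 9]) yields 3 nodes and two 3x3 matrices
```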
In the prior art, the session graph is generally input to a GNN, such as a Gated Graph Neural Network (GGNN) or a Graph Convolutional Network (GCN), to capture the rich node connectivity information in the graph and automatically extract the features of the session graph. However, stacking multiple GNN layers easily causes over-smoothing, so most GNN-based session recommendation methods can only use a single GNN layer to aggregate first-order neighbor node features and cannot learn higher-order neighbor node features (second order and above are called higher-order neighbors); this limits the expressive power of the model, and the finally extracted item embedding vectors are not accurate enough.
In step S2, this embodiment uses a Graph Isomorphism Network (GIN) to aggregate the L-order neighbor node features in the session graph, thereby obtaining the item embedding vector.

The GIN performs the following steps:

S21: computing the l-order embedding vector from the (l-1)-order embedding vector of the session graph; when l = 1, the (l-1)-order embedding vector is the initial embedding vector;

S22: judging whether the number of computations of the l-order embedding vector has reached the preset order; if not, taking the l-order embedding vector as the (l-1)-order embedding vector and returning to step S21; if so, taking the l-order embedding vector as the L-order embedding vector and executing step S23; the preset order is set to L;

S23: computing the item embedding vector from the L-order embedding vector.
In step S21, the original computation of the l-order embedding vector is:

h^l = MLP{[A_out + I × (1 + ε)] × h^{l-1}} (3)

ĥ^l = Dropout[ReLU(h^l)] (4)

where ĥ^l denotes the l-order embedding vector, h^l the l-order embedding intermediate vector, and h^{l-1} the (l-1)-order embedding vector output by the previous layer; A_out denotes the out-degree matrix of the adjacency matrix; I denotes the identity matrix; ε denotes the initial mapping error; m denotes the number of nodes in the session graph; ReLU() denotes the ReLU activation function, which lets the GIN approximate arbitrary nonlinear functions and effectively strengthens its feature extraction capability; Dropout() denotes the Dropout regularization technique, used to prevent overfitting and speed up training; MLP() denotes the MultiLayer Perceptron (MLP) algorithm, also called an Artificial Neural Network (ANN), computed as:

MLP(H) = ReLU[BatchNorm(H W_1 + b_1)] W_2 + b_2 (5)

where MLP(H) denotes processing a vector H with the MLP algorithm, H referring to the [A_out + I × (1 + ε)] × h^{l-1} part of equation (3), i.e., H = [A_out + I × (1 + ε)] × h^{l-1}; W_1 denotes a first parameter matrix and W_2 a second parameter matrix; b_1 denotes a first parameter vector and b_2 a second parameter vector; BatchNorm() denotes the batch normalization technique, used to prevent overfitting.
Please refer to fig. 3, which illustrates a session graph and its adjacency, out-degree, and in-degree matrices in this embodiment. As equation (3) shows, the original computation of the l-order embedding vector uses only the out-degree matrix of the adjacency matrix, which is built from each node's out-degree information; that is, it does not consider both directions of the edges. This embodiment treats the forward and backward relations of an edge as two distinct relations and uses both the out-degree and in-degree matrices of the adjacency matrix to compute the l-order embedding vector, which is more conducive to learning the complex transition relations between items and thus yields more accurate item embedding vectors.
Accordingly, the adjacency matrix is divided into an in-degree matrix and an out-degree matrix according to the connection relations of the nodes in the session graph, and equation (3) is revised so that the l-order embedding vector is computed as:

h^l = MLP{[A_in + A_out + I × (1 + ε)] × h^{l-1}} (6)

ĥ^l = Dropout[ReLU(h^l)] (7)

where A_in denotes the in-degree matrix of the adjacency matrix.
in step S23, the calculation process of the item embedding vector is:
g h =σ(W g [h 0 ;h L ]) (8)
h f =g h ⊙h 0 +(1-g h )⊙h L (9)
in the formula, h f To representItem embedding vector, h L Representing an L-order embedding vector; g h Representing a gating parameter, which is jointly determined by an initial embedding vector and an L-order embedding vector; [;]representing splicing the initial embedded vector and the L-order embedded vector;
Figure BDA0003633998830000113
representing a gating parameter matrix for removing the spliced vectors
Figure BDA0003633998830000114
Is converted into
Figure BDA0003633998830000115
σ () represents a sigmoid activation function; an even line indicates a hadamard product.
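The following PyTorch sketch puts equations (5)-(9) together: one GIN aggregation layer and the gated (highway-style) combination of the initial and L-order embeddings. It assumes node features of shape (m, d); the class names, the learnable ε, and the default hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    # Sketch of one l-order aggregation step, eqs. (5)-(7); names illustrative.
    def __init__(self, d, dropout=0.5, eps=0.0):
        super().__init__()
        self.eps = nn.Parameter(torch.tensor(eps))  # initial mapping error ε (assumed learnable)
        self.fc1 = nn.Linear(d, d)                  # W_1, b_1
        self.bn = nn.BatchNorm1d(d)
        self.fc2 = nn.Linear(d, d)                  # W_2, b_2
        self.drop = nn.Dropout(dropout)

    def forward(self, h, a_in, a_out):              # h: (m, d) node features
        eye = torch.eye(h.size(0), device=h.device)
        agg = (a_in + a_out + eye * (1 + self.eps)) @ h         # eq. (6)
        h_l = self.fc2(torch.relu(self.bn(self.fc1(agg))))      # eq. (5)
        return self.drop(torch.relu(h_l))                       # eq. (7)

class HighwayGate(nn.Module):
    # Sketch of eqs. (8)-(9): gate between initial and L-order embeddings.
    def __init__(self, d):
        super().__init__()
        self.w_g = nn.Linear(2 * d, d, bias=False)  # gating parameter matrix W_g

    def forward(self, h0, hL):
        g = torch.sigmoid(self.w_g(torch.cat([h0, hL], dim=-1)))  # eq. (8)
        return g * h0 + (1 - g) * hL                              # eq. (9)
```

Stacking L GINLayer instances and then applying HighwayGate(h0, hL) reproduces the step S21-S23 loop described above.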
In step S3, this embodiment uses a gated MultiLayer Perceptron (gMLP) algorithm to capture the sequence information of the item embedding vector in one pass, thereby computing the session sequence feature vector.

The gMLP performs the following steps:

S31: capturing all the sequence position information of the item embedding vector in one pass with a spatial projection matrix, and fusing the sequence position information with the item embedding information to obtain the session sequence intermediate vector;

S32: applying a residual connection between the item embedding vector and the session sequence intermediate vector to obtain the session sequence feature vector;

S33: judging whether the number of computations of the session sequence feature vector has reached the preset number of cycles; if not, taking the session sequence feature vector as the item embedding vector and returning to step S31; if so, outputting the session sequence feature vector directly; the preset number of cycles is set to N, where N is a positive integer.
In step S31, the session sequence intermediate vector is computed as:

ĥ_f = LayerNorm(h_f) (10)

z_h = GeLU(ĥ_f W_3 + b_3) (11)

z_s = Dropout[SGU(z_h) W_4 + b_4] (12)

where z_s denotes the session sequence intermediate vector, z_h the session embedding vector, and ĥ_f the layer-normalized item embedding vector; LayerNorm() denotes the LayerNorm regularization technique, GeLU() the GeLU activation function, and SGU(z_h) the Spatial Gating Unit (SGU) algorithm that learns sequence information from the session embedding vector; W_3 denotes a third parameter matrix and W_4 a fourth parameter matrix; b_3 denotes a third parameter vector and b_4 a fourth parameter vector. W_3 and b_3 apply a linear mapping to ĥ_f, while W_4 and b_4 apply a linear mapping to SGU(z_h); meanwhile, the two regularization techniques LayerNorm and Dropout prevent the model from overfitting, and the GeLU activation function improves the model's nonlinear expressive power.
SGU(z_h) is computed as:

[z_1; z_2] = z_h (13)

ẑ_2 = LayerNorm(z_2) (14)

z_p = W_s ẑ_2 + b_s (15)

z_r = z_1 ⊙ z_p (16)

where z_r denotes the session sequence representation vector; z_p is a sequence position vector containing the sequence position information of the session sequence; z_1 and z_2 denote the first-half and second-half session embedding vectors obtained by splitting the session embedding vector in half along the embedding dimension, both of which contain the item embedding information of the session sequence; ẑ_2 denotes the layer-normalized second-half session embedding vector; W_s denotes the spatial projection matrix, whose shape is n × n with n the session sequence length (if the session sequence length is 50, the spatial projection matrix has shape 50 × 50); b_s denotes a spatial parameter vector. Equation (16) fuses the first-half session embedding vector z_1 and the sequence position vector z_p by gating (the Hadamard product), obtaining the session sequence representation vector z_r.
In step S32, the session sequence feature vector is computed as:

ĥ_s = z_s + h_f (17)

where ĥ_s denotes the session sequence feature vector.
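A PyTorch sketch of one gMLP layer, equations (10)-(17), follows; the input is the item embedding matrix of shape (n, d) for a session of length n. The hidden width d_ffn (here 2d) and all class names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatialGatingUnit(nn.Module):
    # Sketch of eqs. (13)-(16); names and widths are assumptions.
    def __init__(self, d_ffn, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(d_ffn // 2)
        self.proj = nn.Linear(seq_len, seq_len)      # spatial projection W_s, b_s

    def forward(self, z_h):                          # z_h: (n, d_ffn)
        z1, z2 = z_h.chunk(2, dim=-1)                # eq. (13): split along embedding dim
        z2 = self.norm(z2)                           # eq. (14)
        z_p = self.proj(z2.transpose(0, 1)).transpose(0, 1)  # eq. (15): acts on the n axis
        return z1 * z_p                              # eq. (16): Hadamard gating

class GatedMLPBlock(nn.Module):
    # Sketch of one gMLP layer, eqs. (10)-(12) and (17).
    def __init__(self, d, seq_len, d_ffn=None, dropout=0.5):
        super().__init__()
        d_ffn = d_ffn or 2 * d                       # hidden width is an assumption
        self.norm = nn.LayerNorm(d)
        self.fc1 = nn.Linear(d, d_ffn)               # W_3, b_3
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)
        self.fc2 = nn.Linear(d_ffn // 2, d)          # W_4, b_4
        self.drop = nn.Dropout(dropout)

    def forward(self, h_f):                          # h_f: (n, d)
        z_h = torch.nn.functional.gelu(self.fc1(self.norm(h_f)))  # eqs. (10)-(11)
        z_s = self.drop(self.fc2(self.sgu(z_h)))                  # eq. (12)
        return z_s + h_f                                          # eq. (17): residual
```

Stacking N GatedMLPBlock instances reproduces the step S31-S33 loop: each block's output becomes the next block's input.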
To better learn the user's preference representation, the user's current interest (short-term interest) is further distinguished from the global interest (long-term interest). Existing methods generally take the last item vector of the session sequence feature vector as the user's current interest, apply sum pooling over the whole session sequence as the user's global interest, and then fuse the two by concatenation plus a linear transformation. Such methods cannot adaptively select the information most useful for prediction from the current interest and the global interest, nor do they distinguish the importance of the two for prediction, which reduces the accuracy of session recommendation.
In step S4, this embodiment computes the current interest by average pooling (mean pooling) of the session sequence feature vector and the global interest by sum pooling, then adaptively fuses the current interest and the global interest to compute the final session representation vector.
The final session representation vector is computed as:

z_local = mean(ĥ_s) (18)

z_global = sum(ĥ_s) (19)

g_z = σ(W_5 z_local + W_6 z_global) (20)

z_f = g_z ⊙ z_local + (1 - g_z) ⊙ z_global (21)

where z_local denotes the user's current interest and z_global the user's global interest; g_z denotes the gated fusion parameter, determined by the current interest and the global interest; z_f denotes the final session representation vector; mean() denotes average pooling and sum() denotes sum pooling; W_5 denotes a fifth parameter matrix and W_6 a sixth parameter matrix.
Compared with obtaining the final session representation vector by concatenation and linear transformation, this embodiment uses a gated fusion network, which adaptively selects effective information from the current interest and the global interest for fusion. For a session recommendation, if the current interest is more important, the gated fusion parameter selects more feature information from the current interest (i.e., g_z > 1 - g_z); if the global interest is more important, it selects more feature information from the global interest (i.e., g_z < 1 - g_z). A more accurate final session representation vector is thus obtained, effectively improving the precision of session recommendation.
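A minimal PyTorch sketch of this gated fusion, equations (18)-(21); the class and parameter names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class InterestFusion(nn.Module):
    # Sketch of eqs. (18)-(21); names are illustrative.
    def __init__(self, d):
        super().__init__()
        self.w5 = nn.Linear(d, d, bias=False)        # W_5
        self.w6 = nn.Linear(d, d, bias=False)        # W_6

    def forward(self, h_s):                          # h_s: (n, d) session features
        z_local = h_s.mean(dim=0)                    # eq. (18): current interest
        z_global = h_s.sum(dim=0)                    # eq. (19): global interest
        g = torch.sigmoid(self.w5(z_local) + self.w6(z_global))  # eq. (20)
        return g * z_local + (1 - g) * z_global      # eq. (21): adaptive fusion
```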
In step S5, the click probability is obtained as follows: multiply the embedding vectors of the candidate items with the final session representation vector to obtain each candidate item's probability score, then normalize the probability scores with a softmax function to obtain each candidate item's probability value; the probability values of all candidate items together form the click probability of all candidate items. The computation is:

ŷ'_I = z_f^T x_I (22)

ŷ = softmax(ŷ') (23)

where ŷ denotes the click probability of all candidate items; ŷ_I, the I-th value of ŷ, denotes the probability that the I-th item is the next clicked item in the session sequence; T denotes the matrix transpose.
For a session sequence s, the goal of the session recommendation method is to compute the click probability ŷ of all candidate items and recommend to the user the K candidate items with the highest probability values as the final recommended items, where K is a positive integer.
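A sketch of this prediction step, equations (22)-(23), assuming a candidate embedding table of shape (M, d); the function name is illustrative.

```python
import torch

def predict_top_k(z_f, candidate_emb, k=20):
    # Sketch; names illustrative. candidate_emb: (M, d), z_f: (d,).
    scores = candidate_emb @ z_f                 # eq. (22): one score per candidate item
    probs = torch.softmax(scores, dim=0)         # eq. (23): normalize to probabilities
    return probs.topk(k).indices                 # K items with highest click probability
```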
An initial session recommendation model can be formed from steps S1-S5; this initial model must be trained to optimize its network parameters before the trained session recommendation model can be used for session recommendation. The session recommendation method therefore further comprises a parameter optimization step:
S6: preprocessing the session sequences in the session dataset, randomly splitting the preprocessed session sequences into a training set and a test set at a ratio of 9:1, and randomly dividing the session sequences of the training set into several batches;

S7: initializing the network parameters of steps S2-S4 and executing steps S1-S5 batch by batch on the session sequences of the training set to optimize the network parameters; during optimization, the network parameters are optimized with the adaptive momentum (Adam) algorithm and the loss is computed with a cross-entropy loss function;

where the network parameters include: the first parameter matrix W_1 and first parameter vector b_1, the second parameter matrix W_2 and second parameter vector b_2, the gating parameter matrix W_g, the third parameter matrix W_3 and third parameter vector b_3, the fourth parameter matrix W_4 and fourth parameter vector b_4, the spatial projection matrix W_s and spatial parameter vector b_s, the fifth parameter matrix W_5, and the sixth parameter matrix W_6.
The cross-entropy loss function is:

L(ŷ) = -Σ_{i=1}^{M} [y_i log(ŷ_i) + (1 - y_i) log(1 - ŷ_i)] (24)

where y_i is the one-hot encoded value of the i-th (ground-truth) item, either 1 or 0; y_i = 1 means the i-th item is a positive example, i.e., the next clicked item is the i-th item; y_i = 0 means the i-th item is a negative example, i.e., the next clicked item is not the i-th item.
S8: after each iteration (epoch), executing steps S1-S5 on the session sequences of the test set for a performance test, and repeating for several iterations until the performance no longer improves;

where one complete traversal of all session sequences in the training set is called one iteration; after each iteration, a performance test is run and the current test result and the corresponding network parameters are recorded; after several iterations, if the performance improves no further, steps S1-S5 have been trained to convergence (i.e., the best performance state).
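Steps S6-S8 amount to an ordinary supervised training loop. The sketch below is schematic: it assumes `model` bundles steps S1-S5 and returns unnormalized scores over all M items, and `evaluate` is a hypothetical helper computing HR@20 on the test set; all names are illustrative.

```python
import torch

def train(model, train_loader, test_loader, epochs=30):
    # Schematic loop; `evaluate` is a hypothetical HR@20 helper.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
    loss_fn = torch.nn.CrossEntropyLoss()          # cross-entropy over all M items
    best = 0.0
    for epoch in range(epochs):                    # one epoch = one iteration (S8)
        model.train()
        for sessions, targets in train_loader:     # batched session sequences (S6)
            opt.zero_grad()
            scores = model(sessions)               # steps S1-S5 (S7)
            loss_fn(scores, targets).backward()    # loss on the next-click labels
            opt.step()
        hr = evaluate(model, test_loader, k=20)    # per-epoch performance test (S8)
        if hr <= best:                             # stop once no longer improving
            break
        best = hr
```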
The performance tests use two common session recommendation metrics: Hit Rate (HR) and Mean Reciprocal Rank (MRR). HR@K is the proportion of cases where the labeled item (the item the user actually clicked) ranks among the top K recommended items and is often used to evaluate unranked recommendation; MRR is the mean of the reciprocal ranks of the desired items and is an evaluation of ranked recommendation.
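A sketch of the two metrics under the definitions above; HR@K counts a hit when the labeled item is in the top K, while MRR uses its reciprocal rank (taken as 0 when it falls outside the list). Names are illustrative.

```python
def hr_and_mrr_at_k(ranked_lists, targets, k=20):
    # ranked_lists[i]: item indices sorted by predicted click probability;
    # targets[i]: the item the user actually clicked next. Names illustrative.
    hits, rr = 0, 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = list(ranked[:k])
        if target in top_k:
            hits += 1
            rr += 1.0 / (top_k.index(target) + 1)  # reciprocal rank of the hit
    n = len(targets)
    return hits / n, rr / n                        # HR@K, MRR@K
```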
This embodiment also provides a session recommendation model constructed according to steps S1 to S8.
Please refer to fig. 4, which is a network architecture diagram of the session recommendation model of this embodiment. The session recommendation model comprises a graph construction module, an item embedding information extraction module, a sequence information extraction module, an interest fusion module, and a prediction module, stacked in sequence.

The graph construction module constructs the session graph from the session sequence; the session graph consists of the initial embedding vector and the adjacency matrix.

The item embedding information extraction module aggregates the L-order neighbor node features in the session graph to obtain the item embedding vector.
please refer to fig. 5, which is a network architecture diagram of the article embedded information extraction module according to the present embodiment. The article embedded information extraction module comprises an L-layer graph isomorphic network and a highway network which are sequentially stacked; the l-level graph isomorphic network calculates l-order embedded vectors according to the (l-1) -order embedded vectors of the session graph; l belongs to [1, L ], and L is a positive integer; when l is 1, the (l-1) order embedding vector is the initial embedding vector; when L is equal to L, the L-order embedded vector is the L-order embedded vector; and the expressway network calculates the article embedding vector according to the initial embedding vector and the L-order embedding vector.
The sequence information extraction module captures the sequence information of the item embedding vector to obtain the session sequence feature vector.

The sequence information extraction module comprises N gated multilayer perceptron layers stacked in sequence. Please refer to fig. 6, which is a network architecture diagram of the gated multilayer perceptron of this embodiment. Each gated multilayer perceptron comprises a first LayerNorm (layer normalization) layer, a first linear mapping layer, a GeLU activation function layer, a spatial gating unit, a second linear mapping layer, and a residual connection layer, stacked in sequence; the first LayerNorm layer, the first linear mapping layer, and the GeLU activation function layer process the item embedding vector in turn to compute the session embedding vector; the spatial gating unit computes the session sequence intermediate vector from the session embedding vector; the second linear mapping layer and the residual connection layer process the session sequence intermediate vector in turn to compute the session sequence feature vector.

Specifically, the spatial gating unit comprises a vector halving layer, a second LayerNorm layer, a linear mapping layer over the temporal dimension, and a gated fusion layer. The vector halving layer splits the session embedding vector into a first-half session embedding vector and a second-half session embedding vector; the second LayerNorm layer layer-normalizes the second-half session embedding vector; the linear mapping layer over the temporal dimension captures the sequence position information of the layer-normalized second-half session embedding vector in one pass with the spatial projection matrix to compute the sequence position vector; and the gated fusion layer fuses the first-half session embedding vector with the sequence position vector to obtain the session sequence intermediate vector.
The interest fusion module calculates the current interest and the global interest according to the session sequence feature vector, and performs interest fusion on the current interest and the global interest to obtain a final session representation vector;
the interest fusion module comprises an average pooling layer, an aggregation pooling layer and a gated fusion network. The average pooling layer calculates to obtain the current interest according to the session sequence feature vector, the aggregation pooling layer calculates to obtain the global interest according to the session sequence feature vector, and the gating fusion network fuses the current interest and the global interest to obtain the final session representation vector.
The prediction module calculates the click probability of all candidate items from the final session representation vector and outputs the K items with the highest click probability as the recommended items.
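A sketch of the prediction step; inner-product scoring against the item embedding table is an assumption of this sketch, as the text fixes only the input (the final session representation) and the output (the K most probable items).

```python
import torch

def recommend_top_k(z_f: torch.Tensor, item_emb: torch.Tensor, k: int = 20):
    """Score every candidate item against the final session representation,
    convert scores to click probabilities, and return the top-K items."""
    scores = z_f @ item_emb.t()                 # (batch, n_items) candidate scores
    probs = torch.softmax(scores, dim=-1)       # click probabilities
    top_probs, top_items = probs.topk(k, dim=-1)
    return top_items, top_probs
```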
This embodiment also provides a session recommendation system comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor; the processor implements the steps of the session recommendation method when executing the computer program.
This embodiment also provides a computer-readable storage medium storing a computer program which, when executed by a computer, implements the steps of the session recommendation method.
The following provides a specific application scenario to illustrate the technical effects of the present invention:
In this embodiment, the session recommendation model described above is implemented in the PyTorch framework through steps S1 to S8. The embedding dimension of the session recommendation model is set to 100, the number of graph isomorphic network (GIN) layers is set to 3, the number of gated multilayer perceptron (gMLP) layers is set to 2, and dropout is set to 0.5. Model construction is carried out on a server with a dual-socket CPU (Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz × 2), and an NVIDIA GTX 1080Ti GPU with 11GB of video memory is used for model training and testing.
The session data set adopts two public benchmark data sets, the Diginetica dataset and the RetailRocket dataset. The session sequences in the data sets are preprocessed as follows: first, all sessions of length 1 and items occurring fewer than 5 times are filtered out; then, the maximum session sequence length is set to 15 for the Diginetica dataset and 35 for the RetailRocket dataset, session sequences shorter than the maximum length are padded with 0, and session sequences longer than the maximum length are truncated; finally, session sequences and corresponding labels (i.e., samples and labels) are generated through sequence-splitting preprocessing. For example, for a session s = [v_{s,1}, v_{s,2}, ..., v_{s,n}], sequence-splitting preprocessing yields: ([v_{s,1}], v_{s,2}), ([v_{s,1}, v_{s,2}], v_{s,3}), ..., ([v_{s,1}, v_{s,2}, ..., v_{s,n-1}], v_{s,n}), where [v_{s,1}, v_{s,2}, ..., v_{s,n-1}] is the generated sequence and v_{s,n} is the next clicked item (i.e., the label of the sequence). The preprocessed session sequences are randomly divided into a training set and a test set at a ratio of 9:1, the session sequences in the training set are randomly divided into batches, and the batch size is set to 100.
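The preprocessing just described can be sketched as follows; the helper name preprocess and the choice to keep the most recent items when truncating are assumptions of this sketch.

```python
from collections import Counter

def preprocess(sessions, min_count=5, max_len=15):
    """Sequence-splitting preprocessing: drop rare items and length-1 sessions,
    expand each session into (prefix, next-click) pairs, and left-pad/truncate
    prefixes to max_len (15 for Diginetica, 35 for RetailRocket)."""
    counts = Counter(v for s in sessions for v in s)
    samples, labels = [], []
    for s in sessions:
        s = [v for v in s if counts[v] >= min_count]   # filter items seen < 5 times
        if len(s) < 2:
            continue                                   # filter sessions of length 1
        for t in range(1, len(s)):
            prefix = s[max(0, t - max_len):t]          # truncate over-long prefixes
            prefix = [0] * (max_len - len(prefix)) + prefix   # pad with 0
            samples.append(prefix)
            labels.append(s[t])                        # next clicked item = label
    return samples, labels
```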
The network parameters of the session recommendation model are initialized, the number of training iterations is set to 30, the session sequences in the training set are fed into the session recommendation model batch by batch for training, and the network parameters of the model are optimized with the Adam algorithm; the initial learning rate of the Adam algorithm is 0.001 and the weight decay rate is 1e-5.
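A sketch of the training loop under the stated settings (Adam, learning rate 0.001, weight decay 1e-5, 30 iterations); the model and train_loader interfaces are placeholders, not part of the original disclosure.

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=30):
    """Optimize with Adam (lr 1e-3, weight decay 1e-5) and cross-entropy loss
    over next-click labels, cf. Eq. (22) in the claims."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):                        # 30 training iterations
        for batch, labels in train_loader:             # batches of size 100
            optimizer.zero_grad()
            scores = model(batch)                      # (batch, n_items) click scores
            loss = criterion(scores, labels)
            loss.backward()
            optimizer.step()
```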
In this embodiment, the Top-K (K = 20) recommendation results are considered; that is, HR@20 and MRR@20 are used as evaluation indexes, and performance is compared with 9 prior-art session recommendation models. When the model reaches convergence, the experimental results of the session recommendation model of the present invention on the Diginetica and RetailRocket data sets are as shown in Table 1. As can be seen from Table 1, the session recommendation model of the present invention achieves the best recommendation effect on both data sets and both evaluation indexes. On the Diginetica data set, the session recommendation model attains an HR@20 of 52.77% and an MRR@20 of 18.25%, improvements of 3.79% and 2.5% respectively over the best prior-art model GC-SAN, a significant performance gain; on the RetailRocket data set, the session recommendation model attains an HR@20 of 53.92% and an MRR@20 of 29.35%, improvements of 0.8% and 1.95% respectively over the best prior-art model STAN, a good overall gain. This fully demonstrates the effectiveness of the key structures in the session recommendation model of the present invention, namely the graph isomorphic network and the gated multilayer perceptron: the multi-layer stacked graph isomorphic network can effectively learn the high-order transition relationships among items, thereby obtaining more accurate item embedding vectors; the multi-layer stacked gated multilayer perceptron rapidly captures the position information of the session sequence through the spatial gating unit and learns the long-term dependencies of the session sequence in combination with the multilayer perceptron, thereby enhancing the expressive power of the model and improving session recommendation performance.
TABLE 1 Comparison of the session recommendation model of the present invention with 9 prior-art session recommendation models
(Table 1 appears as an image in the original publication; its headline figures are quoted in the preceding paragraph.)
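For reference, the HR@20 and MRR@20 metrics used in Table 1 can be computed as sketched below; this is the standard formulation, supplied here as an assumption since the patent does not spell the metrics out.

```python
import torch

def hr_mrr_at_k(scores: torch.Tensor, labels: torch.Tensor, k: int = 20):
    """HR@K is the fraction of test cases whose true next item appears in the
    top-K; MRR@K averages the reciprocal rank of the true item, counting 0
    when it falls outside the top-K."""
    topk = scores.topk(k, dim=-1).indices              # (batch, k) item indices
    hits = topk == labels.unsqueeze(-1)                # (batch, k) boolean matrix
    hit_any = hits.any(dim=-1)                         # did the true item make top-K?
    ranks = hits.float().argmax(dim=-1) + 1            # 1-based rank of the first hit
    rr = torch.where(hit_any, 1.0 / ranks.float(),
                     torch.zeros_like(ranks, dtype=torch.float32))
    return hit_any.float().mean().item(), rr.mean().item()
```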
Compared with the prior art, the session recommendation method, model, system and storage medium provided by the present invention capture the sequence information of the session sequence in a single pass on top of aggregating high-order item transition relationships, and have the following advantages (a minimal end-to-end sketch follows the list):
(1) Stable performance and fast running speed: no extra positional encoding needs to be introduced, which removes the dependence of model performance on positional encoding and improves stability; meanwhile, the gated multilayer perceptron has linear time complexity O(N), so its computational cost is lower and its running speed higher than those of a self-attention network (SAN).
(2) Simple, efficient and accurate: a spatial projection matrix acts on the time-sequence dimension of the session sequence, so the sequence position information and the long-term dependencies of the session sequence can be captured in a single pass without stacking multiple convolutional layers; this is simple and efficient, and yields a more accurate session representation, thereby improving session recommendation performance and effect.
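Putting the pieces together, the following is a minimal end-to-end sketch reusing the classes from the sketches above (GINLayer, HighwayNetwork, GatedMLPBlock, InterestFusion), with the embodiment's hyperparameters (embedding dimension 100, 3 GIN layers, 2 gMLP layers); single-session batching and the scoring layer are simplifying assumptions.

```python
import torch
import torch.nn as nn

class SessionRecModel(nn.Module):
    """End-to-end assembly: GIN item encoder -> highway fusion -> stacked gMLP
    blocks -> interest fusion -> candidate scoring."""
    def __init__(self, n_items, dim=100, n_gin=3, n_gmlp=2, seq_len=15):
        super().__init__()
        self.embed = nn.Embedding(n_items, dim, padding_idx=0)
        self.gin = nn.ModuleList(GINLayer(dim) for _ in range(n_gin))
        self.highway = HighwayNetwork(dim)
        self.gmlp = nn.ModuleList(GatedMLPBlock(dim, 2 * dim, seq_len)
                                  for _ in range(n_gmlp))
        self.fuse = InterestFusion(dim)

    def forward(self, a_in, a_out, node_ids, seq_index):
        h0 = self.embed(node_ids)              # initial embedding vectors (n, dim)
        h = h0
        for layer in self.gin:                 # L-layer graph isomorphic network
            h = layer(a_in, a_out, h)
        h_f = self.highway(h0, h)              # item embedding vectors
        x = h_f[seq_index].unsqueeze(0)        # re-order nodes into the session: (1, seq, dim)
        for block in self.gmlp:                # N-layer gated multilayer perceptron
            x = block(x)
        z_f = self.fuse(x)                     # final session representation
        return z_f @ self.embed.weight.t()     # click scores over all candidate items
```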
The above-described embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention.

Claims (10)

1. A session recommendation method, characterized by comprising the following steps:
S1: constructing a session graph according to a session sequence;
S2: aggregating L-order neighbor node features according to the session graph to obtain an item embedding vector, wherein L is a positive integer;
S3: capturing the sequence information of the item embedding vector to obtain a session sequence feature vector;
S4: calculating a current interest and a global interest according to the session sequence feature vector, and performing interest fusion on the current interest and the global interest to obtain a final session representation vector;
S5: calculating the click probability of all candidate items according to the final session representation vector, and outputting the K candidate items with the highest click probability as the final recommended items, wherein K is a positive integer.
2. The session recommendation method according to claim 1, wherein step S3 specifically comprises:
capturing the sequence information of the item embedding vector in a single pass by means of a gated multilayer perceptron algorithm, thereby calculating the session sequence feature vector; wherein the gated multilayer perceptron performs the following steps:
S31: capturing the sequence position information of the item embedding vector in a single pass with a spatial projection matrix, and fusing the sequence position information with the item embedding information to obtain a session sequence intermediate vector;
S32: performing residual connection on the item embedding vector and the session sequence intermediate vector to obtain the session sequence feature vector;
S33: judging whether the number of computations of the session sequence feature vector has reached a preset number of cycles; if not, taking the session sequence feature vector as the item embedding vector and returning to step S31; if so, outputting the session sequence feature vector directly; wherein the preset number of cycles is set to N, and N is a positive integer.
3. The session recommendation method according to claim 2, wherein:
the session sequence intermediate vector is calculated as follows:
h̄_f = LayerNorm(h_f) (10)
z_h = GeLU(h̄_f W_3 + b_3) (11)
z_s = Dropout[SGU(z_h)W_4 + b_4] (12)
wherein z_s denotes the session sequence intermediate vector and z_h denotes the session embedding vector; h_f denotes the item embedding vector and h̄_f denotes the layer-normalized item embedding vector; LayerNorm(·) denotes the LayerNorm regularization technique, Dropout(·) denotes the Dropout regularization technique, GeLU(·) denotes the GeLU activation function, and SGU(z_h) denotes the spatial gating unit algorithm performing sequence-information learning on the session embedding vector; W_3 denotes a third parameter matrix, W_4 a fourth parameter matrix, b_3 a third parameter vector, and b_4 a fourth parameter vector;
the calculation process of SGU(z_h) is as follows:
[z_1; z_2] = z_h (13)
z̄_2 = LayerNorm(z_2) (14)
z_p = W_s z̄_2 + b_s (15)
z_r = z_1 ⊙ z_p (16)
wherein z_r denotes the session sequence representation vector; z_p denotes the sequence position vector containing the sequence position information of the session sequence; z_1 and z_2 denote the first half-session embedding vector and the second half-session embedding vector obtained by splitting the session embedding vector in half along the embedding dimension, both of which contain the item embedding information of the session sequence; z̄_2 denotes the layer-normalized second half-session embedding vector; W_s denotes the spatial projection matrix and b_s the spatial parameter vector; ⊙ denotes the Hadamard product;
the session sequence feature vector is calculated as:
ẑ_s = h_f + z_s (17)
wherein ẑ_s denotes the session sequence feature vector.
4. The session recommendation method according to any one of claims 1 to 3, wherein step S2 specifically comprises:
aggregating the L-order neighbor node features in the session graph with a graph isomorphic network, thereby obtaining the item embedding vector; wherein the graph isomorphic network performs the following steps:
S21: calculating the l-order embedding vector according to the (l-1)-order embedding vector of the session graph; when l is equal to 1, the (l-1)-order embedding vector is the initial embedding vector;
S22: judging whether the number of computations of the l-order embedding vector has reached a preset order; if not, taking the l-order embedding vector as the (l-1)-order embedding vector and returning to step S21; if so, taking the l-order embedding vector as the L-order embedding vector and executing step S23; wherein the preset order is set to L;
S23: calculating the item embedding vector according to the L-order embedding vector.
5. The session recommendation method according to claim 4, wherein:
the computation process of the l-order embedding vector is as follows:
h^l = MLP{[A_in + A_out + I×(1+ε)] × h^(l-1)} (6)
ĥ^l = Dropout[ReLU(h^l)] (7)
wherein ĥ^l denotes the l-order embedding vector, h^l denotes the l-order embedded intermediate vector, and h^(l-1) denotes the (l-1)-order embedded intermediate vector; A_out denotes the out-degree matrix of the adjacency matrix, A_in denotes the in-degree matrix of the adjacency matrix, and I denotes the identity matrix; ε denotes an initial mapping error; ReLU(·) denotes the ReLU activation function; Dropout(·) denotes the Dropout regularization technique; MLP(·) denotes the multilayer perceptron algorithm;
the item embedding vector is calculated as follows:
g_h = σ(W_g[h^0; h^L]) (8)
h_f = g_h ⊙ h^0 + (1 − g_h) ⊙ h^L (9)
wherein h_f denotes the item embedding vector, h^0 denotes the initial embedding vector, and h^L denotes the L-order embedding vector; g_h denotes a gated embedding parameter determined jointly by the initial embedding vector and the L-order embedding vector; [·;·] denotes concatenation of the initial embedding vector and the L-order embedding vector; W_g denotes the gating parameter matrix; σ(·) denotes the sigmoid activation function; ⊙ denotes the Hadamard product.
6. The session recommendation method according to claim 5, wherein step S4 specifically comprises:
obtaining the current interest through average pooling and the global interest through aggregation pooling according to the session sequence feature vector; then adaptively fusing the current interest and the global interest to calculate the final session representation vector.
7. The session recommendation method according to claim 6, wherein:
the calculation process of the final session representation vector is as follows:
z_local = Mean(ẑ_s) (18)
z_global = Sum(ẑ_s) (19)
g_z = σ(W_5 z_local + W_6 z_global) (20)
z_f = g_z ⊙ z_local + (1 − g_z) ⊙ z_global (21)
wherein z_f denotes the final session representation vector and ẑ_s denotes the session sequence feature vector; z_local denotes the current interest and z_global denotes the global interest; g_z denotes a gated fusion parameter determined by the current interest and the global interest; Mean(·) denotes average pooling and Sum(·) denotes aggregation pooling; W_5 denotes a fifth parameter matrix and W_6 a sixth parameter matrix.
8. The session recommendation method according to claim 7, further comprising the following parameter optimization steps:
S6: preprocessing the session sequences in a session data set, randomly dividing the preprocessed session sequences into a training set and a test set at a ratio of 9:1, and randomly dividing the session sequences in the training set into batches;
S7: initializing the network parameters of steps S2-S4, and executing steps S1-S5 on the session sequences of the training set batch by batch to optimize the network parameters; during optimization, the network parameters are optimized with an adaptive momentum algorithm, and the loss is calculated with a cross-entropy loss function;
the expression of the cross entropy loss function is:
L = −∑_i y_i log(ŷ_i) (22)
wherein ŷ denotes the click probability and ŷ_i denotes the i-th value of the click probability, namely the probability that the i-th item is the next clicked item in the session sequence; y_i is the one-hot encoding of the i-th item, taking the value 1 or 0; y_i = 1 means that the i-th item is the positive sample, i.e., the next clicked item is the i-th item; y_i = 0 means that the i-th item is a negative sample, i.e., the next clicked item is not the i-th item;
S8: after each iteration, executing steps S1-S5 on the session sequences of the test set for performance testing, and repeating for several iterations until the performance no longer improves.
9. A session recommendation model, characterized in that:
it comprises a graph construction module, an item embedding information extraction module, a sequence information extraction module, an interest fusion module and a prediction module which are stacked in sequence;
the graph construction module constructs a session graph according to a session sequence;
the item embedding information extraction module aggregates L-order neighbor node features in the session graph to obtain an item embedding vector;
the sequence information extraction module captures the sequence information of the item embedding vector to obtain a session sequence feature vector;
the interest fusion module calculates a current interest and a global interest according to the session sequence feature vector, and performs interest fusion on the current interest and the global interest to obtain a final session representation vector;
and the prediction module calculates the click probability of all candidate items according to the final session representation vector and outputs the K candidate items with the highest click probability as the final recommended items.
10. The session recommendation model according to claim 9, wherein:
the session graph is composed of an initial embedding vector and an adjacency matrix;
the item embedding information extraction module comprises an L-layer graph isomorphic network and a highway network stacked in sequence; the l-th graph isomorphic network layer calculates the l-order embedding vector according to the (l-1)-order embedding vector of the session graph, where l ∈ [1, L] and L is a positive integer; when l is equal to 1, the (l-1)-order embedding vector is the initial embedding vector; when l is equal to L, the l-order embedding vector is the L-order embedding vector; the highway network calculates the item embedding vector according to the initial embedding vector and the L-order embedding vector;
the sequence information extraction module comprises N layers of gated multilayer perceptrons stacked in sequence; each gated multilayer perceptron comprises a first LayerNorm layer, a first linear mapping layer, a GeLU activation function layer, a spatial gating unit, a second linear mapping layer and a residual connection layer stacked in sequence; the first LayerNorm layer, the first linear mapping layer and the GeLU activation function layer process the item embedding vector in turn to calculate a session embedding vector; the spatial gating unit calculates a session sequence intermediate vector according to the session embedding vector; the second linear mapping layer and the residual connection layer process the session sequence intermediate vector in turn to calculate the session sequence feature vector;
the spatial gating unit comprises a vector halving layer, a second LayerNorm layer, a linear mapping layer over the time-sequence dimension and a gated fusion layer; the vector halving layer splits the session embedding vector in half into a first half-session embedding vector and a second half-session embedding vector; the second LayerNorm layer performs layer normalization on the second half-session embedding vector; the linear mapping layer over the time-sequence dimension captures the sequence position information of the layer-normalized second half-session embedding vector in a single pass through a spatial projection matrix, thereby calculating a sequence position vector; the gated fusion layer fuses the first half-session embedding vector with the sequence position vector to obtain the session sequence intermediate vector;
the interest fusion module comprises an average pooling layer, an aggregation pooling layer and a gated fusion network; the average pooling layer calculates the current interest according to the session sequence feature vector, the aggregation pooling layer calculates the global interest according to the session sequence feature vector, and the gated fusion network fuses the current interest with the global interest to obtain the final session representation vector.