CN111506814A - Sequence recommendation method based on variational self-attention network - Google Patents

Sequence recommendation method based on variational self-attention network

Info

Publication number
CN111506814A
Authority
CN
China
Prior art keywords
self
attention
attention network
sequence
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010273754.XA
Other languages
Chinese (zh)
Other versions
CN111506814B (en)
Inventor
赵朋朋
赵静
周晓方
崔志明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202010273754.XA priority Critical patent/CN111506814B/en
Publication of CN111506814A publication Critical patent/CN111506814A/en
Application granted granted Critical
Publication of CN111506814B publication Critical patent/CN111506814B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Software Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The method introduces a variational autoencoder into a self-attention network to capture the potential preferences of the user. On the one hand, the obtained self-attention vectors are represented as densities through variational inference, and their variances can well represent the uncertainty of user preferences. On the other hand, self-attention networks are adopted to learn the inference process and the generation process of the variational autoencoder, so that long-term and short-term dependencies are well captured. The method therefore better captures the uncertainty and dynamics of user preferences and improves the accuracy of the recommendation results. In addition, the application also provides a sequence recommendation apparatus and device based on the variational self-attention network, whose technical effects correspond to those of the method.

Description

Sequence recommendation method based on variational self-attention network
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for sequence recommendation based on a variational self-attention network.
Background
In the era of information explosion, recommendation systems play an increasingly important role. The key to a recommendation system is the ability to accurately describe the interests and preferences of the user; however, these interests and preferences naturally change over time and are fraught with uncertainty. Sequence recommendation, which attempts to capture the dynamic preferences of users, has therefore become an attractive topic in both academia and industry.
FPMC is a classical method that linearly combines Markov chains and matrix factorization models to capture user preferences, but it is not sufficient to model high-level interactions because the weights of its different components are linearly fixed. Inspired by deep learning, many researchers have studied Recurrent Neural Networks (RNNs) in depth and applied them to sequence recommendation with success.
In recent years, self-attention networks (SANs) have been applied with good results to many natural language processing (NLP) tasks, such as machine translation, sentiment analysis and question answering. Compared with traditional RNNs and Convolutional Neural Networks (CNNs), the SAN also shows good performance and efficiency in sequence recommendation. For example, Kang et al. propose the self-attentive sequential recommendation (SASRec) model to capture the long-term and local dependencies of items in a sequence, which were previously typically modeled by RNNs and CNNs.
FIG. 1 is a schematic diagram of a deterministic recommendation method. In FIG. 1, $u$ is a user representation and $i_1, i_2, i_3, i_4$ are all item representations; the dashed ellipse represents the potential preferences of user $u$, where $i_1, i_2, i_3$ belong to different categories while $i_1$ and $i_4$ belong to the same category. As shown in FIG. 1, assume that user $u$ has interacted with items $i_1$ and $i_2$. When a deterministic method is used to learn the user's preferences, $u$ may be located between $i_1$ and $i_2$ in the latent feature space (2D mapping). If a recommendation is made based on the distance between $u$ and the candidate items, item $i_3$ may be recommended to user $u$ instead of the real item $i_4$ (same category as $i_1$), because the distance between $u$ and $i_3$ is smaller. Therefore, a fixed-point representation cannot capture uncertainty and is prone to producing incorrect recommendation results.
In summary, current sequence recommendation schemes represent users' potential preferences as fixed points in a latent feature space. A fixed-point vector lacks the capability of capturing the uncertainty and dynamics of user preferences that commonly exist in recommendation systems, so such schemes are greatly limited in capturing users' potential preferences, and the recommendation results are inaccurate.
Disclosure of Invention
The application aims to provide a variational self-attention network-based sequence recommendation method, device and equipment, which are used for solving the problem that the recommendation result is inaccurate because the current sequence recommendation scheme cannot capture the uncertainty and the dynamics of user preference. The specific scheme is as follows:
in a first aspect, the present application provides a variational self-attention network-based sequence recommendation method, including:
generating an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
inputting the input embedding matrix into an inference self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector;
determining latent variables according to the variational parameters by using a reparameterization method;
generating a representation of the historical interaction sequence from the latent variables by using a generation self-attention network, to serve as a user preference representation;
and determining, by using a prediction layer, the candidate item with the maximum interaction probability according to the user preference representation, to serve as a recommendation result.
Preferably, the generating an input embedding matrix according to the historical interaction sequence includes:
generating an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix is as follows:
$$\hat{I}_i = A_i + P_i$$
where $i \in (1, n)$, $n$ denotes the sequence length, $A_i$ represents the item embedding information of the $i$-th item, and $P_i$ represents the position embedding information of the $i$-th item.
Preferably, the inputting the input embedding matrix into an inference self-attention network to obtain a self-attention vector comprises:
determining an inference projection result according to the input embedding matrix and the inference projection matrices by using a projection layer of the inference self-attention network;
generating a self-attention vector from the inference projection result by using a first preset number of self-attention blocks in the inference self-attention network:
$$G_i^{h_1} \in \mathbb{R}^{n \times d}$$
where $h_1$ is the first preset number and $n$ represents the sequence length.
Preferably, the determining variational parameters according to the self-attention vector comprises:
determining, according to the self-attention vector, a mean and a variance of the approximate posterior distribution $q_\lambda(z \mid S^u)$, wherein the mean is
$$\mu_\lambda = l_1\left(G_i^{h_1}\right)$$
and the variance is
$$\sigma_\lambda = l_2\left(G_i^{h_1}\right)$$
where $l_1(\cdot)$ represents a linear transformation, $l_2(\cdot)$ represents another linear transformation, $\lambda$ represents the approximate parameters of the variational self-attention network, $S^u$ represents the historical interaction sequence, and $z$ represents the latent variable.
Preferably, the determining latent variables according to the variational parameters by using a reparameterization method comprises:
determining latent variables according to the variational parameters by using a reparameterization method, wherein the latent variables are:
$$z = \mu_\lambda + \sigma_\lambda \odot \epsilon$$
where $\epsilon$ represents a standard Gaussian variable.
Preferably, the generating a representation of the historical interaction sequence from the latent variables by using a generation self-attention network comprises:
determining a generation projection result according to the latent variables and the generation projection matrices by using a projection layer of the generation self-attention network;
generating, based on the conditional distribution $p_\theta(S^u \mid z)$ and by using a second preset number of self-attention blocks in the generation self-attention network, a representation of the historical interaction sequence from the generation projection result:
$$G_g^{h_2}$$
where $h_2$ is the second preset number and $\theta$ represents the true parameters of the variational self-attention network.
Preferably, the determining, by using a prediction layer, the candidate item with the maximum interaction probability according to the user preference representation to serve as a recommendation result comprises:
predicting, by using the prediction layer and according to a target formula, the candidate item with the maximum interaction probability in a candidate item set based on the user preference representation, to serve as the recommendation result, wherein the target formula is:
$$\hat{y}^{(u,t)} = \arg\max\left( G_{g,t}^{h_2} E^\top \right)$$
where $\hat{y}^{(u,t)}$ represents the predicted candidate item with the maximum interaction probability of user $u$ at time $t$, $G_{g,t}^{h_2}$ represents the $t$-th row of the user preference representation $G_g^{h_2}$, $E \in \mathbb{R}^{N \times d}$ is the embedding matrix of the candidate item set, $N$ represents the number of candidates in the candidate item set, and $d$ represents the vector dimension.
Preferably, the method further comprises the following steps:
optimizing the true parameters and approximate parameters of the variational self-attention network according to a loss function, wherein the loss function is:
$$\mathcal{L}(\theta, \lambda) = -\sum_{u \in U}\sum_{t} y^{(u,t)} \log \hat{y}^{(u,t)} + \frac{1}{2}\sum_{j}\left( \sigma_{\lambda j}^{2} + \mu_{\lambda j}^{2} - 1 - \log \sigma_{\lambda j}^{2} \right)$$
where $y^{(u,t)}$ represents the actual interacted item of user $u$ at time $t$, $\hat{y}^{(u,t)}$ represents the predicted candidate item with the maximum interaction probability of user $u$ at time $t$, $S^u$ represents the historical interaction sequence, $\theta$ and $\lambda$ represent the true parameters and approximate parameters of the variational self-attention network respectively, $\sigma_{\lambda j}$ represents the $j$-th row of $\sigma_\lambda$, and $\mu_{\lambda j}$ represents the $j$-th row of $\mu_\lambda$.
In a second aspect, the present application provides a sequence recommendation apparatus based on a variational self-attention network, including:
an embedding module: configured to generate an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
an inference module: configured to input the input embedding matrix into an inference self-attention network to obtain a self-attention vector, and determine variational parameters according to the self-attention vector;
a parameterization module: configured to determine latent variables according to the variational parameters by using a reparameterization method;
a generation module: configured to generate a representation of the historical interaction sequence from the latent variables by using a generation self-attention network, to serve as a user preference representation;
a prediction module: configured to determine, by using a prediction layer, the candidate item with the maximum interaction probability according to the user preference representation, to serve as a recommendation result.
In a third aspect, the present application provides a sequence recommendation device based on a variational self-attention network, including:
a memory: for storing a computer program;
a processor: for executing said computer program for carrying out the steps of the variational self-attention network based sequence recommendation method as described above.
The application provides a sequence recommendation method based on a variational self-attention network, comprising: generating, according to the historical interaction sequence, an input embedding matrix comprising item embedding information and position embedding information; inputting the input embedding matrix into an inference self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector; determining latent variables according to the variational parameters by using a reparameterization method; generating a representation of the historical interaction sequence from the latent variables by using a generation self-attention network, to serve as a user preference representation; and determining, by using a prediction layer, the candidate item with the maximum interaction probability according to the user preference representation, to serve as a recommendation result.
In conclusion, the method introduces a variational autoencoder into a self-attention network to capture the potential preferences of the user. On the one hand, the obtained self-attention vectors are represented as densities through variational inference, and their variances can well represent the uncertainty of user preferences. On the other hand, self-attention networks are adopted to learn the inference process and the generation process of the variational autoencoder, so that long-term and short-term dependencies are well captured. The method therefore better captures the uncertainty and dynamics of user preferences and improves the accuracy of the recommendation results.
In addition, the application also provides a sequence recommendation apparatus and device based on the variational self-attention network, whose technical effects correspond to those of the method and are not repeated here.
Drawings
For a clearer explanation of the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram provided by the present application for explaining the uncertainty of user preferences that a deterministic recommendation method cannot handle very well;
fig. 2 is a flowchart illustrating a first implementation of a sequence recommendation method based on a variational self-attention network according to an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating an implementation of a second embodiment of a sequence recommendation method based on a variational self-attention network provided in the present application;
fig. 4 is a schematic diagram of a variational self-attention network according to a second embodiment of a sequence recommendation method based on a variational self-attention network provided in the present application;
FIG. 5 is a functional block diagram of an embodiment of a sequence recommendation apparatus based on a variational self-attention network provided in the present application;
fig. 6 is a schematic structural diagram of an embodiment of a sequence recommendation device based on a variational self-attention network according to the present application.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Currently, sequence recommendation has become an extremely attractive topic in recommendation systems. Current sequence recommendation methods, including state-of-the-art self-attention-based methods, typically employ deterministic neural networks to represent users' potential preferences as fixed points in the latent feature space. However, a fixed-point vector lacks the ability to capture the uncertainty and dynamics of user preferences that are prevalent in recommendation systems, resulting in inaccurate recommendation results.
To address this problem, the present application provides a sequence recommendation method, apparatus and device based on a variational self-attention network: a variational autoencoder is introduced into the self-attention network to capture the potential preferences of the user, so that the uncertainty and dynamics of user preferences can be better captured and the accuracy of the recommendation results is improved.
First, the problem addressed by the present application is formulated. In this embodiment, the set of users is denoted as $U = \{u_1, u_2, \ldots, u_M\}$ and the set of items as $X = \{x_1, x_2, \ldots, x_N\}$. For each user $u \in U$, the interaction records of user $u$ are sorted in chronological order, yielding the sequential interactions of user $u$:
$$S^u = \left\{ S_1^u, S_2^u, \ldots, S_{|N_u|}^u \right\}$$
where $S_t^u \in X$ and $|N_u|$ represents the number of items that user $u$ has accessed. The object of the present application is, given $S^u$ as defined above, to predict the next item that the user may like by modeling $S^u$.
To address the above problem, and inspired by the Variational AutoEncoder (VAE), the present application utilizes a variational self-attention network (VSAN) to implement sequence recommendation, aiming to maximize the probability of the next item conditioned on the user's historical interaction sequence:
$$p\left( S_{t+1}^u \mid S_1^u, S_2^u, \ldots, S_t^u \right)$$
where $S_t^u$ represents the item that user $u$ interacts with at time $t$. Extending this goal to the entire training set, the conditional probabilities of all interacted items in all sequences are as follows:
$$\prod_{u \in U} \prod_{t=2}^{|N_u|} p\left( S_t^u \mid S_1^u, \ldots, S_{t-1}^u \right)$$
The emphasis of the model then shifts to how to model the joint probability $p(S^u)$. Following the VAE, the present application first assumes a continuous latent variable $z$ sampled from the standard normal distribution, i.e., $z \sim N(0, I)$. The historical interaction sequence $S^u$ is then modeled by the conditional distribution $p_\theta(S^u \mid z)$, parameterized by $\theta$. Thus, the joint probability $p_\theta(S^u)$ can be specified by the marginal distribution as follows:
$$p_\theta(S^u) = \int p_\theta(S^u \mid z)\, p_\theta(z)\, dz.$$
To optimize the parameters $\theta$, the best approach is to maximize the above equation. However, the true posterior distribution $p_\theta(z \mid S^u)$ is often complex and intractable. Thus, the present application introduces a relatively simple posterior distribution $q_\lambda(z \mid S^u)$ to approximate the true posterior distribution by means of variational inference, where $\lambda$ represents another set of parameters. For convenience of description, $\theta$ is referred to as the true parameters and $\lambda$ as the approximate parameters; correspondingly, $p_\theta(z \mid S^u)$ is called the true posterior distribution and $q_\lambda(z \mid S^u)$ the approximate posterior distribution.
By derivation and rearrangement, the relationship between the log-likelihood and the introduced posterior distribution is:
$$\log p_\theta(S^u) \geq \mathbb{E}_{q_\lambda(z \mid S^u)}\left[ \log p_\theta(S^u \mid z) \right] - \mathrm{KL}\left( q_\lambda(z \mid S^u) \,\|\, p_\theta(z) \right)$$
where KL represents the Kullback–Leibler divergence. The objective of the present application thus shifts to maximizing the two terms on the right-hand side of the above inequality, which together are referred to as the evidence lower bound objective (ELBO).
Finally, the present application models the VAE by means of two neural networks, namely an inference self-attention network and a generation self-attention network: the former infers the latent vector $z$ from $S^u$ through $q_\lambda(z \mid S^u)$, while the latter generates the corresponding user representation from the latent vector $z$ through $p_\theta(S^u \mid z)$; the learning process is governed by the ELBO described above.
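Before the step-by-step description, the overall data flow (inference network → variational parameters → reparameterized latent variable → generation network → prediction scores) can be summarized in code. The following is a minimal sketch assuming PyTorch: the stock single-head TransformerEncoderLayer merely stands in for the self-attention blocks described below, the softplus used to keep $\sigma_\lambda$ positive is an assumption, and all names are illustrative rather than taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VSAN(nn.Module):
    """Illustrative sketch of the variational self-attention network."""
    def __init__(self, num_items, d=64, n=50, h1=2, h2=2):
        super().__init__()
        self.item_emb = nn.Embedding(num_items + 1, d, padding_idx=0)  # A
        self.pos_emb = nn.Embedding(n, d)                              # P
        make = lambda: nn.TransformerEncoderLayer(
            d, nhead=1, dim_feedforward=d, batch_first=True)
        self.infer_san = nn.TransformerEncoder(make(), num_layers=h1)  # inference
        self.gen_san = nn.TransformerEncoder(make(), num_layers=h2)    # generation
        self.l1 = nn.Linear(d, d)   # linear map l1(.) to the mean mu_lambda
        self.l2 = nn.Linear(d, d)   # linear map l2(.) to the variance sigma_lambda

    def forward(self, seq):                        # seq: (B, n) item ids
        pos = torch.arange(seq.size(1), device=seq.device)
        x = self.item_emb(seq) + self.pos_emb(pos)         # input embedding
        g_i = self.infer_san(x)                            # inferred self-attention vector
        mu = self.l1(g_i)
        sigma = F.softplus(self.l2(g_i))                   # positivity transform (assumption)
        z = mu + sigma * torch.randn_like(sigma)           # reparameterization trick
        g_g = self.gen_san(z)                              # user preference representation
        scores = g_g @ self.item_emb.weight.T              # (B, n, num_items + 1)
        return scores, mu, sigma
```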
Referring to fig. 2, a first embodiment of a sequence recommendation method based on a variational self-attention network provided in the present application is described below, where the first embodiment includes:
s201, generating an input embedding matrix according to a historical interaction sequence, wherein the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
the historical interaction sequence refers to the item access records of the user in the past period, and the items in the sequence are arranged according to the access sequence. In the embedding layer, the input embedding matrix includes item embedding information, and the input embedding matrix of the present embodiment also includes location embedding information in consideration of ignoring location information of a historical interaction sequence from the attention network.
S202, inputting the input embedding matrix into an inference self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector;
The input embedding matrix is input into the inference self-attention network to obtain the variational parameters $\mu_\lambda$ and $\sigma_\lambda$ corresponding to the approximate posterior distribution $q_\lambda(z \mid S^u)$. Specifically, a self-attention vector is first obtained through a projection layer and several self-attention blocks; then, the mean and variance of the approximate posterior distribution $q_\lambda(z \mid S^u)$, i.e., the variational parameters, are estimated from the self-attention vector. Unlike conventional self-attention networks, the inference self-attention network in this embodiment is single-headed, and the number of self-attention blocks stacked in it is the first preset number.
S203, determining latent variables according to the variational parameters by using a reparameterization method;
Specifically, the latent variable $z$ is sampled according to $q_\lambda(z \mid S^u)$. However, direct sampling is stochastic and not differentiable with respect to $\mu_\lambda$ and $\sigma_\lambda$. Thus, this embodiment utilizes the reparameterization technique, which rewrites the latent variable $z$ as a deterministic function of $\mu_\lambda$ and $\sigma_\lambda$.
S204, generating a representation of the historical interaction sequence from the latent variables by using a generation self-attention network, to serve as a user preference representation;
it is worth mentioning that the above generation process focuses on the next item of the historical interaction sequence, and as a preferred embodiment, may also focus on a certain number of items that follow. Unlike the conventional self-attention network, the single-head is used in the embodiment to infer the self-attention network, and the number of self-attention blocks stacked in the embodiment to infer the self-attention network is the second preset number.
S205, determining a candidate item with the maximum interaction probability according to the user preference representation by utilizing a prediction layer to serve as a recommendation result.
This embodiment provides a sequence recommendation method based on a variational self-attention network. First, the input embedding is fed into the inference self-attention network, and the obtained self-attention vector is represented as a density by applying a Gaussian distribution, which is used to handle the uncertainty of user preferences. Next, the corresponding latent variables are obtained according to the variational parameters output by the inference self-attention network. Then, to capture the long-term and local dependencies of the user, another self-attention network is adopted to model the generation process, and the final user preference representation is generated based on the latent variables. Finally, the generated user preference representation is used to predict the item the user is likely to interact with next.
The second embodiment of the sequence recommendation method based on a variational self-attention network provided by the present application is described in detail below. The second embodiment is implemented on the basis of the foregoing first embodiment and expands on it to a certain extent.
Referring to fig. 3, the second embodiment specifically includes:
s301, calculating item embedding and position embedding of each item according to the historical interaction sequence on an embedding layer to obtain an input embedding matrix;
In this embodiment, the input includes item embedding and position embedding. First, the user's history sequence
$$S^u = \left\{ S_1^u, S_2^u, \ldots, S_{|N_u|}^u \right\}$$
is converted into a fixed-length sequence of $n$ interaction records, where $n$ represents the maximum sequence length that the variational self-attention network can model. From this sequence, a continuous item embedding matrix
$$A \in \mathbb{R}^{n \times d}$$
is constructed, and the input embedding matrix
$$\hat{I} \in \mathbb{R}^{n \times d}$$
is then obtained, where $d$ represents the embedding dimension. In addition, a learnable position matrix
$$P \in \mathbb{R}^{n \times d}$$
is added to the item embedding matrix $A$ to form the final input embedding.
In summary, the input embedding matrix of the embodiment includes item embedding information and position embedding information of each item in the history interaction sequence. Specifically, the input embedding matrix is:
$$\hat{I}_i = A_i + P_i$$
where $i \in (1, n)$, $n$ denotes the sequence length, $A_i$ represents the item embedding information of the $i$-th item, and $P_i$ represents the position embedding information of the $i$-th item.
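As a concrete illustration of this embedding layer, the short sketch below builds $\hat{I}_i = A_i + P_i$ for a toy history, assuming PyTorch; the `pad_or_truncate` helper, its left-padding convention, and all sizes are illustrative assumptions rather than the patent's code.

```python
import torch

def pad_or_truncate(items, n, pad_id=0):
    """Keep the most recent n interactions; left-pad shorter histories (assumption)."""
    items = items[-n:]
    return [pad_id] * (n - len(items)) + items

n, d, num_items = 5, 4, 10
A = torch.nn.Embedding(num_items + 1, d, padding_idx=0)   # item embeddings A
P = torch.nn.Embedding(n, d)                              # learnable position matrix P

history = [3, 7, 7, 2]                                    # S^u, oldest first
seq = torch.tensor([pad_or_truncate(history, n)])         # shape (1, n)
pos = torch.arange(n)
I_hat = A(seq) + P(pos)                                   # \hat{I}_i = A_i + P_i
print(I_hat.shape)                                        # torch.Size([1, 5, 4])
```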
S302, inputting the input embedding matrix into the inference self-attention network to obtain a self-attention vector, and determining the variational parameters of the approximate posterior distribution according to the self-attention vector, wherein the inference self-attention network comprises a first preset number of self-attention blocks;
After the final input embedding matrix is obtained, it is input into the inference self-attention network to output the variational parameters corresponding to the posterior distribution $q_\lambda(z \mid S^u)$. The left side of FIG. 4 shows the specific structure of the inference self-attention network. The self-attention operation is defined as follows:
$$D = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{QK^\top}{\sqrt{d}} \right) V$$
In the above formula,
$$Q = \hat{I}W^Q, \quad K = \hat{I}W^K, \quad V = \hat{I}W^V$$
where $W^Q, W^K, W^V \in \mathbb{R}^{d \times d}$ represent projection matrices (for distinction, the projection matrices here are referred to as inference projection matrices, and the projection matrices of the generation self-attention network below are referred to as generation projection matrices). To propagate low-level features to higher levels, this embodiment applies residual connections in the network; then, for fast and stable training of the neural network, layer normalization is employed; furthermore, a two-layer fully-connected network with the ReLU activation function is used to model interactions between different latent dimensions and endow the network with non-linearity. Finally, the whole inference process is as follows:
$$E = \mathrm{LayerNorm}(D + \hat{I}),$$
$$F = \mathrm{ReLU}(EW_1 + b_1)W_2 + b_2,$$
$$G_i = \mathrm{LayerNorm}(F + E),$$
where $W_1, W_2, b_1, b_2$ are all network parameters. For convenience and simplicity, the entire self-attention network described above is defined as:
$$G_i = \mathrm{SAN}(\hat{I}).$$
Through the above process, $G_i$ essentially integrates the embeddings of all previous items. To capture more complex item transitions, a first preset number $h_1$ of self-attention blocks may be stacked:
$$G_i^{h_1} = \mathrm{SAN}\left( G_i^{h_1 - 1} \right)$$
where
$$G_i^{1} = \mathrm{SAN}(\hat{I}).$$
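The block defined above transcribes almost line-for-line into code. The following sketch assumes PyTorch and omits a causal attention mask for brevity; the $\sqrt{d}$ scaling inside the softmax follows standard scaled dot-product attention as reconstructed above.

```python
import math
import torch
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    """One single-head block: attention, residual + LayerNorm, ReLU feed-forward."""
    def __init__(self, d):
        super().__init__()
        self.WQ = nn.Linear(d, d, bias=False)    # projection matrices W^Q, W^K, W^V
        self.WK = nn.Linear(d, d, bias=False)
        self.WV = nn.Linear(d, d, bias=False)
        self.ln1 = nn.LayerNorm(d)
        self.ln2 = nn.LayerNorm(d)
        self.ffn1 = nn.Linear(d, d)              # W1, b1
        self.ffn2 = nn.Linear(d, d)              # W2, b2
        self.d = d

    def forward(self, I_hat):                    # I_hat: (B, n, d)
        Q, K, V = self.WQ(I_hat), self.WK(I_hat), self.WV(I_hat)
        attn = torch.softmax(Q @ K.transpose(-2, -1) / math.sqrt(self.d), dim=-1)
        D = attn @ V                             # D = softmax(QK^T / sqrt(d)) V
        E = self.ln1(D + I_hat)                  # E = LayerNorm(D + I)
        F = self.ffn2(torch.relu(self.ffn1(E)))  # F = ReLU(E W1 + b1) W2 + b2
        return self.ln2(F + E)                   # G = LayerNorm(F + E)
```

Stacking $h_1$ such blocks, as in the formulas above, is then simply `for blk in blocks: x = blk(x)` over $h_1$ instances.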
The mean and variance of the posterior distribution $q_\lambda(z \mid S^u)$ are then estimated from the final self-attention vector as follows:
$$\mu_\lambda = l_1\left( G_i^{h_1} \right), \quad \sigma_\lambda = l_2\left( G_i^{h_1} \right)$$
where $l_1(\cdot)$ represents a linear transformation and $l_2(\cdot)$ represents another linear transformation. In this manner, the deterministic self-attention vector $G_i^{h_1}$ is represented as a Gaussian density rather than a traditional fixed point, and the variance of the Gaussian distribution can capture the uncertainty of user preference well.
S303, determining latent variables according to the variational parameters by using a reparameterization method;
Specifically, the latent variables are:
$$z = \mu_\lambda + \sigma_\lambda \odot \epsilon$$
where $\epsilon$ represents a standard Gaussian variable, whose role is to introduce noise.
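Steps S302 and S303 reduce to a few tensor operations, as the sketch below shows (assuming PyTorch). The softplus that keeps $\sigma_\lambda$ positive is an assumption, since the text only states that $l_2(\cdot)$ is a linear transformation; `G_final` is a random stand-in for the final self-attention vector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 64
l1 = nn.Linear(d, d)              # l1(.): linear transformation for the mean
l2 = nn.Linear(d, d)              # l2(.): linear transformation for the variance

G_final = torch.randn(1, 50, d)   # stands in for G_i^{h_1}
mu = l1(G_final)                  # mu_lambda
sigma = F.softplus(l2(G_final))   # sigma_lambda, kept positive (assumption)

eps = torch.randn_like(sigma)     # standard Gaussian noise, introduces stochasticity
z = mu + sigma * eps              # differentiable in mu and sigma
```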
S304, generating a representation of the historical interaction sequence according to the latent variables and the true conditional distribution by using the generation self-attention network, to serve as the user preference representation;
To define the generation process, this embodiment also uses a self-attention network, which generates the corresponding $S^u$ based on $p_\theta(S^u \mid z)$. Given the latent variable $z$, $p_\theta(S^u \mid z)$ is expressed as:
$$p_\theta(S^u \mid z) = \prod_{t=1}^{n} p_\theta\left( S_t^u \mid z \right)$$
where the user preference representation $G_g^{h_2}$ is generated from the final output of the self-attention layers. The structure of the generation self-attention network is shown on the right side of FIG. 4, with the projections as follows:
$$Q = zW_g^Q, \quad K = zW_g^K, \quad V = zW_g^V$$
where $W_g^Q, W_g^K, W_g^V \in \mathbb{R}^{d \times d}$ represent the generation projection matrices. Since the generation and inference self-attention networks differ only in their input, the details are not repeated here.
In summary, the process of S304 is as follows: determining a generation projection result according to the latent variables and the generation projection matrices by using the projection layer of the generation self-attention network; then, based on the conditional distribution $p_\theta(S^u \mid z)$ and using a second preset number of self-attention blocks in the generation self-attention network, generating the representation of the historical interaction sequence
$$G_g^{h_2}$$
from the generation projection result, where $h_2$ is the second preset number and $\theta$ represents the true parameters of the variational self-attention network.
It is worth mentioning that the above generation process only focuses on the next item in the user's history sequence. Preferably, one can also focus on the next $k$ items. The most direct method is to regard the generation target as a chronologically ordered collection:
$$\left\{ S_{t+1}^u, S_{t+2}^u, \ldots, S_{t+k}^u \right\}.$$
In this embodiment, for distinction, the output of the inference self-attention network is denoted as $G_i^{h_1}$ and the output of the generation self-attention network as $G_g^{h_2}$. The subscript $i$ has no practical meaning and merely takes the initial letter of inference; the subscript $g$ likewise has no practical meaning and merely takes the initial letter of generating; the superscripts $h_1$ and $h_2$ denote the numbers of self-attention blocks in the inference and generation self-attention networks, respectively.
S305, determining a candidate item with the maximum interaction probability according to the user preference representation by utilizing a prediction layer to serve as a recommendation result;
According to a target formula, the prediction layer predicts, based on the user preference representation, the candidate item with the maximum interaction probability in the candidate item set. The target formula is:
$$\hat{y}^{(u,t)} = \arg\max\left( G_{g,t}^{h_2} E^\top \right)$$
where $\hat{y}^{(u,t)}$ represents the predicted candidate item with the maximum interaction probability of user $u$ at time $t$, $G_{g,t}^{h_2}$ represents the $t$-th row of the user preference representation $G_g^{h_2}$, $E \in \mathbb{R}^{N \times d}$ is the embedding matrix of the candidate item set, $N$ represents the number of candidates in the candidate item set, and $d$ represents the vector dimension.
S306, optimizing the true parameters and the approximate parameters of the variational self-attention network according to a loss function.
In the evaluation phase, the mean of the variational distribution (i.e., $\mu_\lambda$) is used as the latent representation $z$ of $S^u$. Following the evidence lower bound (ELBO) described above, the loss function of the variational self-attention network in this example is:
$$\mathcal{L}(\theta, \lambda) = -\sum_{u \in U}\sum_{t} y^{(u,t)} \log \hat{y}^{(u,t)} + \frac{1}{2}\sum_{j}\left( \sigma_{\lambda j}^{2} + \mu_{\lambda j}^{2} - 1 - \log \sigma_{\lambda j}^{2} \right)$$
where $y^{(u,t)}$ represents the actual interacted item of user $u$ at time $t$, $\hat{y}^{(u,t)}$ represents the predicted candidate item with the maximum interaction probability of user $u$ at time $t$, $S^u$ represents the historical interaction sequence, $\theta$ and $\lambda$ represent the true parameters and approximate parameters of the variational self-attention network respectively, $\sigma_{\lambda j}$ represents the $j$-th row of $\sigma_\lambda$, and $\mu_{\lambda j}$ represents the $j$-th row of $\mu_\lambda$. By minimizing this loss function, the true parameters $\theta$ and the approximate parameters $\lambda$ can be jointly optimized.
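In code, this loss is the sum of a reconstruction term (cross-entropy between predicted and actual items) and the closed-form KL divergence between the diagonal Gaussian $q_\lambda(z \mid S^u)$ and the standard normal prior. A sketch assuming PyTorch; averaging over the batch is a choice of this sketch, not specified by the text.

```python
import torch
import torch.nn.functional as F

def vsan_loss(logits, targets, mu, sigma):
    # logits: (B, n, N) item scores; targets: (B, n) actual items y^(u,t)
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         targets.reshape(-1))
    # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions j
    kl = 0.5 * (sigma.pow(2) + mu.pow(2) - 1.0 - sigma.pow(2).log()).sum(-1)
    return ce + kl.mean()
```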
In the following, a sequence recommendation apparatus based on a variational self-attention network provided in an embodiment of the present application is introduced; the apparatus described below and the method described above may be referred to correspondingly.
As shown in fig. 5, the sequence recommendation apparatus based on a variational self-attention network according to this embodiment includes:
The embedding module 501: configured to generate an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
The inference module 502: configured to input the input embedding matrix into an inference self-attention network to obtain a self-attention vector, and determine variational parameters according to the self-attention vector;
The parameterization module 503: configured to determine latent variables according to the variational parameters by using a reparameterization method;
The generation module 504: configured to generate a representation of the historical interaction sequence from the latent variables by using a generation self-attention network, to serve as a user preference representation;
The prediction module 505: configured to determine, by using a prediction layer, the candidate item with the maximum interaction probability according to the user preference representation, to serve as a recommendation result.
The sequence recommendation apparatus based on a variational self-attention network of this embodiment is configured to implement the aforementioned sequence recommendation method based on a variational self-attention network, so specific implementations of the apparatus can be found in the foregoing embodiments of the method; for example, the embedding module 501, the inference module 502, the parameterization module 503, the generation module 504 and the prediction module 505 are respectively configured to implement steps S201, S202, S203, S204 and S205 of the method. Their specific embodiments are described in the corresponding partial embodiments and are not repeated here.
In addition, since the apparatus of this embodiment is used to implement the aforementioned method, its role corresponds to that of the method and is likewise not repeated here.
In addition, the present application also provides a sequence recommendation device based on a variational self-attention network, as shown in fig. 6, including:
the memory 100: for storing a computer program;
the processor 200: for executing said computer program for carrying out the steps of the variational self-attentional network based sequence recommendation method as described above.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above detailed descriptions of the solutions provided in the present application, and the specific examples applied herein are set forth to explain the principles and implementations of the present application, and the above descriptions of the examples are only used to help understand the method and its core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A sequence recommendation method based on a variational self-attention network is characterized by comprising the following steps:
generating an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
inputting the input embedding matrix into an inference self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector;
determining latent variables according to the variational parameters by using a reparameterization method;
generating a representation of the historical interaction sequence from the latent variables by using a generation self-attention network, to serve as a user preference representation;
and determining, by using a prediction layer, the candidate item with the maximum interaction probability according to the user preference representation, to serve as a recommendation result.
2. The method of claim 1, wherein generating an input embedding matrix from the historical sequence of interactions comprises:
generating an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix is as follows:
$$\hat{I}_i = A_i + P_i$$
where $i \in (1, n)$, $n$ denotes the sequence length, $A_i$ represents the item embedding information of the $i$-th item, and $P_i$ represents the position embedding information of the $i$-th item.
3. The method of claim 1, wherein the inputting the input embedding matrix into an inference self-attention network to obtain a self-attention vector comprises:
determining an inference projection result according to the input embedding matrix and the inference projection matrices by using a projection layer of the inference self-attention network;
generating a self-attention vector from the inference projection result by using a first preset number of self-attention blocks in the inference self-attention network:
$$G_i^{h_1} \in \mathbb{R}^{n \times d}$$
where $h_1$ is the first preset number and $n$ represents the sequence length.
4. The method of claim 3, wherein the determining variational parameters according to the self-attention vector comprises:
determining, according to the self-attention vector, a mean and a variance of the approximate posterior distribution $q_\lambda(z \mid S^u)$, wherein the mean is
$$\mu_\lambda = l_1\left( G_i^{h_1} \right)$$
and the variance is
$$\sigma_\lambda = l_2\left( G_i^{h_1} \right)$$
where $l_1(\cdot)$ represents a linear transformation, $l_2(\cdot)$ represents another linear transformation, $\lambda$ represents the approximate parameters of the variational self-attention network, $S^u$ represents the historical interaction sequence, and $z$ represents the latent variable.
5. The method of claim 4, wherein the determining latent variables according to the variational parameters by using a reparameterization method comprises:
determining latent variables according to the variational parameters by using a reparameterization method, wherein the latent variables are:
$$z = \mu_\lambda + \sigma_\lambda \odot \epsilon$$
where $\epsilon$ represents a standard Gaussian variable.
6. The method of claim 5, wherein the generating a representation of the historical interaction sequence from the latent variables by using a generation self-attention network comprises:
determining a generation projection result according to the latent variables and the generation projection matrices by using a projection layer of the generation self-attention network;
generating, based on the conditional distribution $p_\theta(S^u \mid z)$ and by using a second preset number of self-attention blocks in the generation self-attention network, a representation of the historical interaction sequence from the generation projection result:
$$G_g^{h_2}$$
where $h_2$ is the second preset number and $\theta$ represents the true parameters of the variational self-attention network.
7. The method of claim 6, wherein the determining, by using a prediction layer, the candidate item with the maximum interaction probability according to the user preference representation to serve as a recommendation result comprises:
predicting, by using the prediction layer and according to a target formula, the candidate item with the maximum interaction probability in a candidate item set based on the user preference representation, wherein the target formula is:
$$\hat{y}^{(u,t)} = \arg\max\left( G_{g,t}^{h_2} E^\top \right)$$
where $\hat{y}^{(u,t)}$ represents the predicted candidate item with the maximum interaction probability of user $u$ at time $t$, $G_{g,t}^{h_2}$ represents the $t$-th row of the user preference representation $G_g^{h_2}$, $E \in \mathbb{R}^{N \times d}$ is the embedding matrix of the candidate item set, $N$ represents the number of candidates in the candidate item set, and $d$ represents the vector dimension.
8. The method of any one of claims 1-7, further comprising:
optimizing the true parameters and approximate parameters of the variational self-attention network according to a loss function, wherein the loss function is:
$$\mathcal{L}(\theta, \lambda) = -\sum_{u \in U}\sum_{t} y^{(u,t)} \log \hat{y}^{(u,t)} + \frac{1}{2}\sum_{j}\left( \sigma_{\lambda j}^{2} + \mu_{\lambda j}^{2} - 1 - \log \sigma_{\lambda j}^{2} \right)$$
where $y^{(u,t)}$ represents the actual interacted item of user $u$ at time $t$, $\hat{y}^{(u,t)}$ represents the predicted candidate item with the maximum interaction probability of user $u$ at time $t$, $S^u$ represents the historical interaction sequence, $\theta$ and $\lambda$ represent the true parameters and approximate parameters of the variational self-attention network respectively, $\sigma_{\lambda j}$ represents the $j$-th row of $\sigma_\lambda$, and $\mu_{\lambda j}$ represents the $j$-th row of $\mu_\lambda$.
9. A sequence recommendation apparatus based on a variational self-attention network, comprising:
an embedding module: configured to generate an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
an inference module: configured to input the input embedding matrix into an inference self-attention network to obtain a self-attention vector, and determine variational parameters according to the self-attention vector;
a parameterization module: configured to determine latent variables according to the variational parameters by using a reparameterization method;
a generation module: configured to generate a representation of the historical interaction sequence from the latent variables by using a generation self-attention network, to serve as a user preference representation;
a prediction module: configured to determine, by using a prediction layer, the candidate item with the maximum interaction probability according to the user preference representation, to serve as a recommendation result.
10. A sequence recommendation device based on a variational self-attention network, comprising:
a memory: for storing a computer program;
a processor: for executing said computer program for carrying out the steps of the variational self-attention network based sequence recommendation method according to any one of claims 1 to 8.
CN202010273754.XA 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network Active CN111506814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010273754.XA CN111506814B (en) 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010273754.XA CN111506814B (en) 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network

Publications (2)

Publication Number Publication Date
CN111506814A true CN111506814A (en) 2020-08-07
CN111506814B CN111506814B (en) 2023-11-28

Family

ID=71864057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010273754.XA Active CN111506814B (en) 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network

Country Status (1)

Country Link
CN (1) CN111506814B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190354839A1 (en) * 2018-05-18 2019-11-21 Google Llc Systems and Methods for Slate Optimization with Recurrent Neural Networks
CN109359140A (en) * 2018-11-30 2019-02-19 苏州大学 A kind of sequence of recommendation method and device based on adaptive attention
CN110008409A (en) * 2019-04-12 2019-07-12 苏州市职业大学 Based on the sequence of recommendation method, device and equipment from attention mechanism
CN110245299A (en) * 2019-06-19 2019-09-17 中国人民解放军国防科技大学 Sequence recommendation method and system based on dynamic interaction attention mechanism

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446489B (en) * 2020-11-25 2023-05-05 天津大学 Dynamic network embedded link prediction method based on variation self-encoder
CN112446489A (en) * 2020-11-25 2021-03-05 天津大学 Dynamic network embedded link prediction method based on variational self-encoder
CN113160898A (en) * 2021-05-18 2021-07-23 北京信息科技大学 Prediction method and system for Gibbs free energy of iron-based alloy
CN113160898B (en) * 2021-05-18 2023-09-08 北京信息科技大学 Iron-based alloy Gibbs free energy prediction method and system
CN113688315A (en) * 2021-08-19 2021-11-23 电子科技大学 Sequence recommendation method based on no-information-loss graph coding
CN113688315B (en) * 2021-08-19 2023-04-18 电子科技大学 Sequence recommendation method based on no-information-loss graph coding
CN114154071A (en) * 2021-12-09 2022-03-08 电子科技大学 Emotion time sequence recommendation method based on attention mechanism
CN114154071B (en) * 2021-12-09 2023-05-09 电子科技大学 Emotion time sequence recommendation method based on attention mechanism
CN114912984A (en) * 2022-05-31 2022-08-16 重庆师范大学 Self-attention-based time scoring context-aware recommendation method and system
CN117236198A (en) * 2023-11-14 2023-12-15 中国石油大学(华东) Machine learning solving method of flame propagation model of blasting under sparse barrier
CN117236198B (en) * 2023-11-14 2024-02-27 中国石油大学(华东) Machine learning solving method of flame propagation model of blasting under sparse barrier
CN117251295A (en) * 2023-11-15 2023-12-19 成方金融科技有限公司 Training method, device, equipment and medium of resource prediction model
CN117251295B (en) * 2023-11-15 2024-02-02 成方金融科技有限公司 Training method, device, equipment and medium of resource prediction model

Also Published As

Publication number Publication date
CN111506814B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
CN111506814A (en) Sequence recommendation method based on variational self-attention network
Weisz et al. Sample efficient deep reinforcement learning for dialogue systems with large action spaces
CN109544306B (en) Cross-domain recommendation method and device based on user behavior sequence characteristics
CN110659744B (en) Training event prediction model, and method and device for evaluating operation event
De'Ath Boosted trees for ecological modeling and prediction
CN111582694A (en) Learning evaluation method and device
CN113254792B (en) Method for training recommendation probability prediction model, recommendation probability prediction method and device
US11663486B2 (en) Intelligent learning system with noisy label data
US20180285769A1 (en) Artificial immune system for fuzzy cognitive map learning
CN110377707B (en) Cognitive diagnosis method based on depth item reaction theory
CN114169869B (en) Attention mechanism-based post recommendation method and device
US11699108B2 (en) Techniques for deriving and/or leveraging application-centric model metric
CN116956116A (en) Text processing method and device, storage medium and electronic equipment
CN113609388A (en) Sequence recommendation method based on counterfactual user behavior sequence generation
Reehuis et al. Novelty and interestingness measures for design-space exploration
Jiang et al. On the solution of stochastic optimization problems in imperfect information regimes
CN114911969A (en) Recommendation strategy optimization method and system based on user behavior model
CN114461906A (en) Sequence recommendation method and device focusing on user core interests
CN111957053A (en) Game player matching method and device, storage medium and electronic equipment
CN111897943A (en) Session record searching method and device, electronic equipment and storage medium
CN116467466A (en) Knowledge graph-based code recommendation method, device, equipment and medium
CN112241447B (en) Learning situation data processing method and device, computer equipment and storage medium
WO2022167079A1 (en) An apparatus and method for training a parametric policy
Valença et al. Selecting variables with search algorithms and neural networks to improve the process of time series forecasting
Belacel et al. Scalable collaborative filtering based on splitting-merging clustering algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant