CN116108283B - Uncertainty perception contrast learning method for sequence recommendation - Google Patents


Info

Publication number: CN116108283B
Application number: CN202310388969.XA
Authority: CN (China)
Prior art keywords: sequence, user, uncertainty, loss, representation
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN116108283A
Inventors: 赵朋朋 (Zhao Pengpeng), 龙超 (Long Chao)
Current Assignee: Suzhou University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Suzhou University
Application filed by Suzhou University; priority to CN202310388969.XA (priority date 2023-04-13)
Publication of application CN116108283A; application granted; publication of grant CN116108283B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent


Abstract

The application discloses an uncertainty-aware contrastive learning method for sequence recommendation, which comprises the steps of: constructing a model embedding layer; determining a base sequence encoder; calculating an accidental (aleatoric) uncertainty-aware contrastive learning loss and a cognitive (epistemic) uncertainty-aware ensemble learning loss to obtain the final prediction output of the model; calculating the loss of the main recommendation task; and back-propagating gradients to update the model parameters. To alleviate the cognitive uncertainty of the model, the method integrates multiple independent sub-networks behind the encoder to capture the model's output at different representation levels. In addition, the method applies a diversity regularization term to prevent the outputs of the multiple sub-networks from converging.

Description

Uncertainty perception contrast learning method for sequence recommendation
Technical Field
The application relates to the technical field of sequence recommendation models, and in particular to an uncertainty-aware contrastive learning method for sequence recommendation.
Background
Recommendation systems are widely used on large e-commerce (e.g., Tmall, Amazon) and streaming-media (e.g., Netflix) platforms to provide fast and efficient personalized services. Among the various recommendation methods, sequential recommendation (SR) is receiving increasing attention because of its ability to dynamically model the real-time interests of users. It is mainly applied in e-commerce, predicting the item a user is most likely to click next from the click sequence the user has already generated.
The goal of SR is to recommend the next items of interest to the user by modeling the item-to-item transition patterns underlying the user's historical behavior sequence. Recent advances in sequential recommendation (e.g., Caser, GRU4Rec, and SASRec) have proposed a series of models that simulate the dynamic evolution of user interest with significant success. Existing sequential recommendation models assume that the dynamic interests of the user are deterministic, that is, the user embedding modeled in the latent space is a deterministic vector. However, in the real world, user interests are complex and diverse, and a deterministic embedding is insufficient to represent rich user intentions. It is therefore both critical and practical to consider how uncertainty affects a recommendation system.
In the e-commerce setting, the shopping intention of the user is not fixed: the user's dynamic interest can only be inferred from past interactions, and noise is occasionally introduced into the interaction sequence by the user. For example, when exploring electronic products, customers may be attracted to recommended popular but unrelated products, or may inadvertently order certain merchandise for various reasons, all of which can have an irreversible impact on the sequence modeling process. This inherent noise in the data itself is known as accidental (aleatoric) uncertainty.
On the other hand, a sequential recommendation model represents the continuously changing interest of the user as only a single embedded vector, which is insufficient for complicated and changeable user intentions. Not only does this deterministic modeling process introduce a certain bias into the final predictions, but the under-fitting caused by the inherent sparsity of recommendation datasets makes the single-vector modeling approach even less reliable. How to alleviate this cognitive (epistemic) uncertainty in sequential recommendation, caused by limited data and inadequate training of the recommendation model, remains an open problem.
Therefore, we propose a new recommendation model named UCL4SR (Uncertainty-aware Contrastive Learning for Sequential Recommendation). Specifically, UCL4SR uses contrastive learning to maintain the uniformity of the item representation distribution. To better exploit contrastive learning to mitigate the accidental uncertainty in the data, we introduce a learnable masking matrix to remove irrelevant correlations between items. To alleviate the cognitive uncertainty of the model, we integrate multiple independent sub-networks behind the encoder to capture the model's output at different representation levels. Further, a diversity regularization term is applied to prevent the outputs of the multiple sub-networks from converging. Extensive experiments on public datasets show that UCL4SR consistently outperforms state-of-the-art sequential recommendation methods and is highly robust to noise.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the application and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section, as well as in the abstract and the title of the application; such simplifications or omissions may not be used to limit the scope of the application.
The present application has been made in view of the above-described problems occurring in the prior art.
Therefore, the application provides an uncertainty-aware contrastive learning method for sequence recommendation, which reduces the cognitive uncertainty of a sequential recommendation model, captures the model's output at different representation levels, and prevents the outputs of multiple sub-networks from converging.
In order to solve the above technical problems, the application provides an uncertainty-aware contrastive learning method for sequence recommendation, comprising the following steps:
constructing a model embedding layer; determining a base sequence encoder; calculating the accidental uncertainty-aware contrastive learning loss and the cognitive uncertainty-aware ensemble learning loss to obtain the final prediction output of the model; calculating the loss of the main recommendation task; and back-propagating gradients to update the model parameters.
As a preferred scheme of the uncertainty-aware contrastive learning method for sequence recommendation of the application: the construction comprises representing the sets of users and items as $U$ and $I$, with $|U|$ users and $|I|$ items, and constructing the interaction sequence of user $u$ as $S_u = [v_1^u, v_2^u, \ldots, v_{|S_u|}^u]$, where $v_t^u$ denotes the $t$-th item user $u$ interacted with and $|S_u|$ denotes the length of the interaction sequence. A maximum sequence length $N$ is designated to unify the length of the input sequences: a sequence longer than $N$ is truncated, and a sequence shorter than $N$ is padded in front with a "pad" token. The embedding layer converts all item IDs on the e-commerce platform into embedded vectors using a lookup table: by maintaining an embedding matrix $E \in \mathbb{R}^{|I| \times D}$, the high-dimensional one-hot encodings are projected onto a low-dimensional dense representation, where $D$ is the embedding size and $|I|$ represents the number of all items, so $E$ is a matrix with $|I|$ rows and $D$ columns. The interaction sequence of a given user $u$, $S_u$, is indexed by item ID into the matrix to obtain the corresponding embedded representation $e^u \in \mathbb{R}^{N \times D}$. Input of the layer: the interacted item-ID sequence $S_u$ of each user. Output of the layer: the embedded representation $e^u \in \mathbb{R}^{N \times D}$ corresponding to the user's interaction sequence.
As a preferred scheme of the uncertainty-aware contrastive learning method for sequence recommendation of the application: the base sequence encoder comprises selecting SASRec as the backbone model and modeling the user interaction sequence by stacking multiple Transformer encoders. Given the sequence representation $H^{l-1}$ of the $(l-1)$-th layer, the output $H^l$ of the Transformer encoder at layer $l$ is as follows:

$$H^l = \mathrm{FFN}\left(\mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^h\right),$$

where $\mathrm{FFN}$ denotes a feed-forward neural network, $h$ denotes the number of attention heads, $W^h \in \mathbb{R}^{D \times D}$ is a projection matrix, and $D$ is the embedding size. The sum of the sequence embedding $E^0$ and the position encoding $P^0$ is used as the model input $H^0$; regularization strategies are omitted from the formula. The attention mechanism is defined as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right) V,$$

where $Q$, $K$, $V$ denote the query, key, and value, respectively, and $\sqrt{d}$ is a scaling factor. Base sequence encoder input: the $(l-1)$-th layer embedded representation $H^{l-1}$ corresponding to the user's interaction sequence. Base sequence encoder output: the $l$-th layer representation $H^l$ of the user interaction sequence obtained by encoding with the base sequence encoder.
As a preferred scheme of the uncertainty-aware contrastive learning method for sequence recommendation of the application: the attention mechanism comprises appending a trainable binary mask to each self-attention layer to prune noisy or task-irrelevant attention. A binary matrix $Z^{(l)} \in \{0,1\}^{n \times n}$ is introduced for the $l$-th self-attention layer, where $Z^{(l)}_{x,y}$ indicates whether an attention connection exists between item $x$ and item $y$; the $l$-th self-attention layer becomes:

$$M^{(l)} = A^{(l)} \odot Z^{(l)}, \qquad \mathrm{Attention}(Q^{(l)}, K^{(l)}, V^{(l)}) = M^{(l)} V^{(l)},$$

where $A^{(l)}$ is the original full attention and $M^{(l)}$ represents the resulting sparse attention. $Z^{(l)}$ can produce attention scores of exactly zero for irrelevant dependencies, thereby improving interpretability. The differentiable mask samples binary one-hot vectors from the input distribution by means of the Gumbel-softmax and applies the reparameterization technique to make the sampling differentiable, as follows:

$$Z^{(l)} = \mathrm{GumbelSoftmax}(H^l).$$

Input: the $(l-1)$-th layer embedded representation $H^{l-1}$ corresponding to the user's interaction sequence, fed to the sequence encoder with learnable masking. Output: the $l$-th layer result $Z^{(l)}$ obtained after learnable masking.
As a preferred scheme of the uncertainty-aware contrastive learning method for sequence recommendation of the application: the accidental uncertainty-aware contrastive learning comprises, given the interaction sequence of user $u$, obtaining two corresponding augmented sequences $S_i = \mathrm{aug}'(S_u)$ and $S_j = \mathrm{aug}''(S_u)$, where the augmentations $\mathrm{aug}'$ and $\mathrm{aug}''$ are randomly selected from suitable data augmentations; the user's item interaction sequence is randomly masked, deleted, or reordered to obtain the two enhanced versions $S_i$ and $S_j$ of the original interaction sequence.
The accidental uncertainty-aware contrastive learning further comprises feeding the two augmented sequences $S_i$ and $S_j$ to the encoder network to output the user representations $H_i$ and $H_j$, and transforming the user representations by $M$ independent sub-networks $\{g_m\}_{m=1}^{M}$, where $g_m$ represents an independent sub-network, implemented as a multi-layer perceptron. The $M$ different embedded vectors $\{g_m(H_i)\}_{m=1}^{M}$ are obtained by calculation, the conventional self-supervised loss is modified, and $H_i$ and $H_j$ are replaced by the averages $\bar{H}_i = \frac{1}{M}\sum_{m=1}^{M} g_m(H_i)$ and $\bar{H}_j = \frac{1}{M}\sum_{m=1}^{M} g_m(H_j)$; the embeddings generated by the $M$ sub-networks are averaged to improve the prediction performance. The final contrastive loss is expressed as follows:

$$L_{ssl} = -\log \frac{\exp\left(\mathrm{sim}(\bar{H}_i, \bar{H}_j)/\tau\right)}{\sum_{k=1}^{N} \exp\left(\mathrm{sim}(\bar{H}_i, \bar{H}_k)/\tau\right)},$$

where $\tau$ is the temperature controlling the contrastive learning strength, $\bar{H}_k$ is the overall representation of a sequence randomly extracted from the batch, the dot product is applied as the similarity function $\mathrm{sim}$, and $N$ represents the batch size. Input: the two representations $H_i$ and $H_j$ of the input sequence $S_u$ after data augmentation, fed respectively into the accidental uncertainty-aware contrastive learning module. Output: the accidental uncertainty contrastive learning loss $L_{ssl}$.
As a preferred scheme of the uncertainty-aware contrastive learning method for sequence recommendation of the application: the cognitive uncertainty-aware ensemble learning loss comprises, in order to alleviate the cognitive uncertainty generated by the model in the process of fitting the data, introducing the idea of ensemble learning to comprehensively consider multiple cognitive views of the model. A new loss function is designed to encourage diversity during sub-network training of the main recommendation task: the diversity regularization term $L_{div}$ is defined over the standard deviation of the embedded vectors $\{H_m\}_{m=1}^{M}$, where $H_m$ denotes the embedded vector output by the $m$-th sub-network, and the standard deviation is the square root of the variance:

$$\sigma = \sqrt{\mathrm{Var}(H_m) + \epsilon}, \qquad \mathrm{Var}(H_m) = \frac{1}{M}\sum_{m=1}^{M}\left(H_m - \bar{H}\right)^2,$$

where $\epsilon > 0$ is a small scalar that prevents numerical instability, $\bar{H}$ represents the average of the output representations of the $M$ sub-networks, and $M$ represents the total number of sub-networks.
As a preferred scheme of the uncertainty-aware contrastive learning method for sequence recommendation of the application: the cognitive uncertainty-aware ensemble learning loss further comprises a diversity regularization function expressed as follows:

$$L_{div} = \max(0, \alpha - \sigma).$$

The purpose of the diversity loss is to encourage divergence between the sub-networks by forcing the element-wise standard deviation to approach $\alpha$, thereby preventing the embeddings from collapsing into the same vector, where $\alpha$ is a manually set hyper-parameter given in the code. Input: the outputs of the input sequence $S_u$ after passing through the multiple sub-networks, fed into the cognitive uncertainty-aware ensemble learning module. Output: the cognitive uncertainty loss $L_{div}$.
As a preferred scheme of the uncertainty-aware contrastive learning method for sequence recommendation of the application: the prediction comprises taking the last-layer embedding of the Transformer encoder as the final prediction of the user preference and calculating its similarity with all item embeddings; the predicted probability of every item at the next position is obtained as follows:

$$\hat{y} = \mathrm{softmax}\!\left(h^L E^{\top}\right),$$

where $\hat{y}$ represents the probabilities of all items the user may interact with and $h^L$ is the final sequence representation. Input: the final sequence representation $h^L$ of sequence $S_u$ obtained after the preceding encoding, fed into the prediction layer. Output: the prediction layer outputs the probability $\hat{y}$ of the next clicked item among all items.
As a preferred scheme of the uncertainty-aware contrastive learning method for sequence recommendation of the application: the loss of the main recommendation task comprises, given that sequence recommendation aims at predicting the next item user $u$ will click according to the user's historical interaction sequence $S_u$ on the e-commerce platform, splitting the sequence $S_u$ into a set of sub-sequences and targets as follows:

$$\left\{\left(S_u^{1:1}, v_2^u\right), \left(S_u^{1:2}, v_3^u\right), \ldots, \left(S_u^{1:|S_u|-1}, v_{|S_u|}^u\right)\right\},$$

where $|S_u|$ denotes the length of $S_u$, the superscript indicates the intercepted range of the sequence, and $v_t^u$ is the target label of the prefix $S_u^{1:t-1}$. The loss of the main recommendation task is calculated using the cross-entropy loss as follows:

$$L_{rec} = -\sum_{u \in U}\sum_{t=2}^{|S_u|} \log P\!\left(v_t^u \mid S_u^{1:t-1}\right),$$

where $P(v_t^u \mid S_u^{1:t-1})$ represents the predicted probability of $v_t^u$ based on $S_u^{1:t-1}$. The losses of the two uncertainties are added to the loss of the recommendation task for joint optimization, so that the model is trained in a multi-task fashion; the final objective function of UCL4SR is as follows:

$$L = L_{rec} + \lambda_1 L_{ssl} + \lambda_2 L_{div},$$

where $\lambda_1$ and $\lambda_2$ are hyper-parameters, and the optimization problem in the formula is solved by a gradient-descent algorithm. Input: the final representation of the encoded user interaction sequence, fed into the candidate-set recommendation module. Output: the recommended items for the current user.
The beneficial effects of the application are as follows: the method is based on an uncertainty-aware contrastive learning sequence recommendation model (UCL4SR). Specifically, UCL4SR utilizes contrastive learning to maintain the uniformity of the item representation distribution. To better exploit contrastive learning to mitigate the accidental uncertainty in the data, a learnable masking matrix is introduced to remove irrelevant correlations between items. To alleviate the cognitive uncertainty of the model, multiple independent sub-networks are integrated after the encoder to capture the model's output at different representation levels. Further, a diversity regularization term is applied to prevent the outputs of the multiple sub-networks from converging. Through extensive experiments on public datasets, UCL4SR consistently outperforms state-of-the-art sequential recommendation methods and has strong noise immunity.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
fig. 1 is a schematic flow chart of an uncertainty aware contrast learning method for sequence recommendation according to an embodiment of the present application.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present application can be understood in detail, a more particular description of the application, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the application. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present application have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the application. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present application, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present application and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to fig. 1, a first embodiment of the present application provides an uncertainty aware contrast learning method for sequence recommendation, including:
s1: and constructing a model embedding layer.
Further, the construction includes representing the sets of users and items as $U$ and $I$, with $|U|$ users and $|I|$ items, and constructing the interaction sequence of user $u$ as $S_u = [v_1^u, v_2^u, \ldots, v_{|S_u|}^u]$, where $v_t^u$ denotes the $t$-th item user $u$ interacted with and $|S_u|$ the length of the interaction sequence.
Further, a maximum sequence length $N$ is designated to unify the length of the input sequences: a sequence longer than $N$ is truncated, and a sequence shorter than $N$ is padded in front with the "pad" token.
It should be noted that the embedding layer converts all item IDs on the e-commerce platform into embedded vectors using a lookup table: by maintaining an embedding matrix $E \in \mathbb{R}^{|I| \times D}$, the high-dimensional one-hot encodings are projected onto a low-dimensional dense representation, where $D$ is the embedding size and $|I|$ represents the number of all items, so $E$ is a matrix with $|I|$ rows and $D$ columns.
Further, the interaction sequence of a given user $u$ is indexed by item ID into the matrix to obtain the corresponding embedded representation $e^u \in \mathbb{R}^{N \times D}$.
It should be noted that the input of the layer is the interacted item-ID sequence $S_u$ of each user, and the output is the embedded representation $e^u \in \mathbb{R}^{N \times D}$ corresponding to the user's interaction sequence.
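Purely as an illustration, a minimal PyTorch sketch of such an embedding layer follows. The class and parameter names (EmbeddingLayer, num_items, max_len) are hypothetical, and reserving index 0 for the "pad" token is an assumption, since the patent does not fix the pad ID.

    import torch
    import torch.nn as nn

    class EmbeddingLayer(nn.Module):
        # Maintains the embedding matrix E in R^{|I| x D} plus position encodings.
        def __init__(self, num_items, embed_dim, max_len):
            super().__init__()
            # index 0 is assumed to hold the "pad" marker prepended to short sequences
            self.item_emb = nn.Embedding(num_items + 1, embed_dim, padding_idx=0)
            self.pos_emb = nn.Embedding(max_len, embed_dim)

        def forward(self, seq_ids):
            # seq_ids: (batch, N) item-ID sequences, truncated or padded to length N
            positions = torch.arange(seq_ids.size(1), device=seq_ids.device)
            # lookup by item ID, then add position encodings: H^0 = E^0 + P^0
            return self.item_emb(seq_ids) + self.pos_emb(positions)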
S2: a base sequence encoder is determined.
Further, the base sequence encoder includes selecting SASRec as the backbone model and modeling the user interaction sequence by stacking multiple Transformer encoders. Given the sequence representation $H^{l-1}$ of the $(l-1)$-th layer, the output $H^l$ of the Transformer encoder at layer $l$ is as follows:

$$H^l = \mathrm{FFN}\left(\mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^h\right),$$

where FFN denotes a feed-forward neural network, $h$ denotes the number of attention heads, $W^h \in \mathbb{R}^{D \times D}$ is a projection matrix, and $D$ is the embedding size. The sum of the sequence embedding $E^0$ and the position encoding $P^0$ is used as the model input $H^0$; regularization strategies are omitted from the formula. The attention mechanism is defined as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right) V,$$

where $Q$, $K$, $V$ denote the query, key, and value, respectively, and $\sqrt{d}$ is a scaling factor.
It should be noted that the base sequence encoder takes as input the $(l-1)$-th layer embedded representation $H^{l-1}$ corresponding to the user's interaction sequence, and outputs the $l$-th layer representation $H^l$ of the user interaction sequence.
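For illustration, a minimal sketch of one Transformer encoder layer in PyTorch is given below. Residual connections and layer normalization are shown explicitly even though the patent omits regularization strategies from its formulas, and the causal (left-to-right) attention mask typical of SASRec is an assumption.

    import torch
    import torch.nn as nn

    class TransformerBlock(nn.Module):
        # One layer of the stacked encoder: computes H^l from H^{l-1}.
        def __init__(self, embed_dim, num_heads):
            super().__init__()
            self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
            self.ffn = nn.Sequential(
                nn.Linear(embed_dim, embed_dim * 4), nn.ReLU(),
                nn.Linear(embed_dim * 4, embed_dim))
            self.norm1 = nn.LayerNorm(embed_dim)
            self.norm2 = nn.LayerNorm(embed_dim)

        def forward(self, h):
            # h: (batch, N, D) = H^{l-1}; the mask stops a position attending to the future
            n = h.size(1)
            causal = torch.triu(torch.ones(n, n, dtype=torch.bool, device=h.device), 1)
            a, _ = self.attn(h, h, h, attn_mask=causal)
            h = self.norm1(h + a)               # residual + norm (omitted in the formulas)
            return self.norm2(h + self.ffn(h))  # H^l = FFN(multi-head attention output)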
Still further, the attention mechanism includes appending a trainable binary mask to each self-attention layer to prune noisy or task-irrelevant attention. A binary matrix $Z^{(l)} \in \{0,1\}^{n \times n}$ is introduced for the $l$-th self-attention layer, where $Z^{(l)}_{x,y}$ indicates whether an attention connection exists between item $x$ and item $y$; the $l$-th self-attention layer becomes:

$$M^{(l)} = A^{(l)} \odot Z^{(l)}, \qquad \mathrm{Attention}(Q^{(l)}, K^{(l)}, V^{(l)}) = M^{(l)} V^{(l)},$$

where $A^{(l)}$ is the original full attention, $M^{(l)}$ represents the sparse attention, and $Z^{(l)}$ can produce attention scores of exactly zero for irrelevant dependencies, thereby improving interpretability. The differentiable mask samples binary one-hot vectors from the input distribution by means of the Gumbel-softmax and applies the reparameterization technique to make the sampling differentiable:

$$Z^{(l)} = \mathrm{GumbelSoftmax}(H^l).$$

It should be noted that the input is the $(l-1)$-th layer embedded representation $H^{l-1}$ corresponding to the user's interaction sequence, fed into the sequence encoder with learnable masking, and the output is the $l$-th layer result $Z^{(l)}$ obtained after learnable masking.
S3: calculating the accidental uncertainty-aware contrastive learning loss and the cognitive uncertainty-aware ensemble learning loss to obtain the final prediction output of the model.
It should be noted that the accidental uncertainty-aware contrastive learning includes, given the interaction sequence of user $u$, obtaining two corresponding augmented sequences $S_i = \mathrm{aug}'(S_u)$ and $S_j = \mathrm{aug}''(S_u)$, where $\mathrm{aug}'$ and $\mathrm{aug}''$ are randomly selected from suitable data augmentations: the user's item interaction sequence is randomly masked, deleted, or reordered to obtain the two enhanced versions $S_i$ and $S_j$ of the original interaction sequence.
Further, the accidental uncertainty-aware contrastive learning also includes feeding the two augmented sequences $S_i$ and $S_j$ to the encoder network to output the user representations $H_i$ and $H_j$, and transforming the user representations by $M$ independent sub-networks $\{g_m\}_{m=1}^{M}$, where $g_m$ represents an independent sub-network, implemented as a multi-layer perceptron.
It should be noted that $M$ different embedded vectors $\{g_m(H_i)\}_{m=1}^{M}$ are obtained by calculation, the conventional self-supervised loss is modified, and $H_i$ and $H_j$ are replaced by the averages $\bar{H}_i = \frac{1}{M}\sum_{m=1}^{M} g_m(H_i)$ and $\bar{H}_j = \frac{1}{M}\sum_{m=1}^{M} g_m(H_j)$; the embeddings generated by the $M$ sub-networks are averaged to improve the prediction performance. The final contrastive loss is expressed as follows:

$$L_{ssl} = -\log \frac{\exp\left(\mathrm{sim}(\bar{H}_i, \bar{H}_j)/\tau\right)}{\sum_{k=1}^{N} \exp\left(\mathrm{sim}(\bar{H}_i, \bar{H}_k)/\tau\right)},$$

where $\tau$ is the temperature controlling the contrastive learning strength, $\bar{H}_k$ is the overall representation of a sequence randomly extracted from the batch, the dot product is applied as the similarity function $\mathrm{sim}$, and $N$ denotes the batch size.
Further, the input consists of the two representations $H_i$ and $H_j$ of the input sequence after data augmentation, fed respectively into the accidental uncertainty-aware contrastive learning module, and the output is the accidental uncertainty contrastive learning loss $L_{ssl}$.
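As an illustration, a sketch of this loss with in-batch negatives follows. The function and argument names are hypothetical, subnets stands for the $M$ multi-layer perceptrons $g_m$, and treating the other sequences in the batch as the negatives $\bar{H}_k$ is our reading of the formula.

    import torch
    import torch.nn.functional as F

    def aleatoric_contrastive_loss(h_i, h_j, subnets, tau=0.5):
        # average the M sub-network outputs for each view: \bar{H}_i and \bar{H}_j
        z_i = torch.stack([g(h_i) for g in subnets]).mean(dim=0)  # (batch, D)
        z_j = torch.stack([g(h_j) for g in subnets]).mean(dim=0)
        logits = z_i @ z_j.t() / tau   # dot product as the similarity function sim
        labels = torch.arange(z_i.size(0), device=z_i.device)
        # diagonal entries are the positive pairs; other rows of the batch are negatives
        return F.cross_entropy(logits, labels)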
It should be noted that the cognitive uncertainty-aware ensemble learning loss includes, in order to alleviate the cognitive uncertainty generated by the model while fitting the data, introducing the idea of ensemble learning to comprehensively consider multiple cognitive views of the model.
Further, a new loss function is designed to encourage diversity during sub-network training of the main recommendation task. The diversity regularization term $L_{div}$ is defined over the standard deviation of the embedded vectors $\{H_m\}_{m=1}^{M}$, where $H_m$ denotes the output representation of the $m$-th sub-network; the standard deviation is the square root of the variance:

$$\sigma = \sqrt{\mathrm{Var}(H_m) + \epsilon}, \qquad \mathrm{Var}(H_m) = \frac{1}{M}\sum_{m=1}^{M}\left(H_m - \bar{H}\right)^2,$$

where $\epsilon > 0$ is a small scalar that prevents numerical instability, $\bar{H}$ denotes the average of the output representations of the $M$ sub-networks, and $M$ denotes the total number of sub-networks.
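A minimal sketch of the diversity term, combined with the hinge-style regularization function $L_{div} = \max(0, \alpha - \sigma)$ described above, might look as follows. Reading the regularizer as the hinge applied element-wise and then averaged is an assumption, and the default values of alpha and eps are illustrative only.

    import torch

    def diversity_loss(subnet_outputs, alpha=1.0, eps=1e-4):
        # subnet_outputs: list of M tensors H_m, each of shape (batch, D)
        stacked = torch.stack(subnet_outputs)       # (M, batch, D)
        var = stacked.var(dim=0, unbiased=False)    # variance across the M sub-networks
        sigma = torch.sqrt(var + eps)               # standard deviation; eps avoids instability
        return torch.relu(alpha - sigma).mean()     # hinge pushes sigma up toward alpha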
Still further, the prediction includes taking the last-layer embedding of the Transformer encoder as the final prediction of the user preference and calculating its similarity with all item embeddings; the predicted probability of every item at the next position is obtained as follows:

$$\hat{y} = \mathrm{softmax}\!\left(h^L E^{\top}\right),$$

where $\hat{y}$ represents the probabilities of all items the user may interact with.
It should be noted that the input is the final sequence representation $h^L$ of sequence $S_u$ obtained from the preceding encoding, fed into the prediction layer, and the output is the probability $\hat{y}$ of the next clicked item among all items.
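A sketch of the prediction layer follows; taking the representation at the last position of the encoded sequence as the user-preference vector is an assumption consistent with SASRec-style decoders.

    import torch.nn.functional as F

    def predict_next_item(h_final, item_emb_weight):
        # h_final: (batch, D) last-layer representation of each sequence
        # item_emb_weight: (|I|, D) embedding matrix E, reused as the output projection
        logits = h_final @ item_emb_weight.t()  # similarity with every item embedding
        return F.softmax(logits, dim=-1)        # \hat{y}: distribution over the next item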
S4: calculating the loss of the main recommendation task.
Further, calculating the loss of the main recommendation task includes the following: sequence recommendation aims at predicting the next item user $u$ will click according to the user's historical interaction sequence $S_u$ on the e-commerce platform; the sequence $S_u$ is split into a set of sub-sequences and targets as follows: $\{(S_u^{1:1}, v_2^u), (S_u^{1:2}, v_3^u), \ldots, (S_u^{1:|S_u|-1}, v_{|S_u|}^u)\}$, where $|S_u|$ denotes the length of $S_u$, the superscript indicates the intercepted range of the sequence, and $v_t^u$ is the target label of the prefix $S_u^{1:t-1}$.
Further, the loss of the main recommendation task is calculated using the cross-entropy loss as follows:

$$L_{rec} = -\sum_{u \in U}\sum_{t=2}^{|S_u|} \log P\!\left(v_t^u \mid S_u^{1:t-1}\right),$$

where $P(v_t^u \mid S_u^{1:t-1})$ represents the predicted probability of $v_t^u$ based on $S_u^{1:t-1}$.
It should be noted that the losses of the two uncertainties are added to the loss of the recommendation task for joint optimization, so that the model is trained in a multi-task fashion; the final objective function of UCL4SR is as follows:

$$L = L_{rec} + \lambda_1 L_{ssl} + \lambda_2 L_{div},$$

where $\lambda_1$ and $\lambda_2$ are hyper-parameters, and the optimization problem in the formula is solved by a gradient-descent algorithm; a sketch of one optimization step is given below.
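Purely as an illustration, one training step on the joint objective might be wired up as below. The model interface (the three *_loss methods) is hypothetical, since the patent fixes the objective $L = L_{rec} + \lambda_1 L_{ssl} + \lambda_2 L_{div}$ but not the code organization, and the default lambda values are placeholders rather than the patent's settings.

    def training_step(batch, model, optimizer, lambda1=0.1, lambda2=0.1):
        l_rec = model.recommendation_loss(batch)  # cross-entropy on the next item
        l_ssl = model.contrastive_loss(batch)     # accidental-uncertainty term L_ssl
        l_div = model.diversity_loss(batch)       # cognitive-uncertainty term L_div
        loss = l_rec + lambda1 * l_ssl + lambda2 * l_div
        optimizer.zero_grad()
        loss.backward()                           # S5: back-propagate gradients
        optimizer.step()                          # update the model parameters
        return loss.item()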
Further, the input is the final representation of the encoded user interaction sequence, fed into the candidate-set recommendation module, and the output is the recommended items for the current user.
S5: back-propagating the model gradients to update the model parameters.
Example 2
In order to verify the beneficial effects of the application, the uncertainty-aware contrastive learning method for sequence recommendation is scientifically demonstrated through experiments.
To evaluate model performance, experiments were performed on widely used recommendation datasets: Beauty, Amazon-Toys, and ML-1M. The Beauty and Toys datasets are real e-commerce data from the Amazon website, including the real click behavior of each user on the site.
ML-1M is a classic movie dataset in which the favorite movies of each user are recorded in detail; detailed statistics of the datasets are shown in Table 1.
TABLE 1 Dataset statistics
We adopt two of the most common performance evaluation metrics: Recall and Normalized Discounted Cumulative Gain (NDCG). Recall@N measures whether the ground-truth item appears among the first N recommendations. NDCG@N is a rank-aware metric that measures the position of the ground-truth item in the top-N recommendations. For each user, the items are sorted in descending order of prediction score to generate a recommendation list, the first N items are taken, and the top-N value of each evaluation metric is computed, as sketched after this paragraph. Since Recall@1 is equivalent to NDCG@1, we report Recall@{5,10,20} and NDCG@{5,10,20}. In addition, specific recommendation performance compared with common sequential recommendation models is shown in Tables 2 and 3, where in Table 3 the sixth column from the left is the strongest baseline CBiT, the seventh column is the performance of the proposed UCL4SR model, and the eighth column is the relative improvement of UCL4SR over CBiT.
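For reference, a sketch of how the two metrics can be computed for a single user with one ground-truth item follows; the function name and the per-user framing are our own.

    import numpy as np

    def recall_ndcg_at_n(ranked_item_ids, target_id, n=10):
        # ranked_item_ids: item IDs sorted by prediction score, descending
        top_n = list(ranked_item_ids[:n])
        if target_id not in top_n:
            return 0.0, 0.0                  # miss: both Recall@N and NDCG@N are 0
        rank = top_n.index(target_id)        # 0-based position of the ground-truth item
        return 1.0, 1.0 / np.log2(rank + 2)  # with a single relevant item, IDCG = 1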
TABLE 2 Recommendation performance comparison of some common sequential recommendation models
TABLE 3 Recommendation performance comparison of some common sequential recommendation models
According to Tables 2 and 3, our model not only surpasses traditional sequential recommendation models such as SASRec, GRU4Rec, and Caser, but also surpasses the latest contrastive-learning sequential recommendation models. Compared with models such as STOSA that model uncertainty in sequence recommendation, our model achieves a further improvement, which shows that in the e-commerce scenario it is necessary for a sequential recommendation model to consider both cognitive and accidental uncertainty, so that items inconsistent with the user's shopping intention can be automatically excluded from the clicked items and the influence of noisy items on the final prediction is weakened.
The uncertainty-aware contrastive learning method for sequence recommendation of the application is mainly aimed at the e-commerce market: it predicts the item a user is most likely to click next from the click sequence the user has generated. First, the model embedding layer is constructed and the base sequence encoder is determined; then the accidental uncertainty-aware contrastive learning loss and the cognitive uncertainty-aware ensemble learning loss are calculated to obtain the final prediction output of the model; then the loss of the main recommendation task is calculated; finally, the model parameters are updated by back-propagating gradients.
It should be noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application, which is intended to be covered in the scope of the claims of the present application.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the application can be implemented in various computer languages, for example the object-oriented programming language Java and the scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (3)

1. An uncertainty-aware contrastive learning method for sequence recommendation, characterized by comprising the following steps:
constructing a model embedding layer;
determining a base sequence encoder;
calculating the accidental uncertainty-aware contrastive learning loss, and calculating the cognitive uncertainty-aware ensemble learning loss, to obtain the final prediction output of the model;

calculating the loss of the main recommendation task;
updating the model parameters by back-propagating gradients;
the construction comprises the steps of representing a set of users and commodities as U and L, wherein the number of the users and the commodities is |U| and |L|, respectively, and constructing an interaction sequence of the user U
wherein ,the t commodity which is interacted by the user u is represented, t epsilon {1,2, …, |S u |},|S u The i represents the length of the interaction sequence;
designating a maximum sequence length N to unify the length of the input sequence, cutting when the length of the sequence is larger than N, and supplementing a 'pad' mark in front when the length of the sequence is smaller than N;
the embedding layer comprises converting all commodity IDs in the E-commerce platform into embedding vectors by using a lookup table, and maintaining an embedding matrix E R |I|×D The Gao Weidu thermal encoding is projected onto a low-dimensional dense representation, the embedding matrix being: e epsilon R |I|×D
Where D is the embedding size, |I| represents the number of all items, |I|xD represents that this is a matrix of |I| rows and columns;
the interaction sequence for a given user u is expressed asIndexing by commodity ID in matrix to obtain corresponding embedded matrix e u ∈R N×D
Input of matrix: interactive commodity ID sequence for each user
Output of the matrix: embedded representation e corresponding to interactive commodity sequence of user u ∈R N×D
the base sequence encoder comprises:

selecting SASRec as the backbone model, and modeling the user interaction sequence by stacking multiple Transformer encoders; given the sequence representation $H^{l-1}$ of the $(l-1)$-th layer, the output $H^l$ of the Transformer encoder at layer $l$ is as follows:

$$H^l = \mathrm{FFN}\left(\mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^h\right),$$

wherein $\mathrm{FFN}$ denotes a feed-forward neural network, $h$ denotes the number of heads, $W^h \in \mathbb{R}^{D \times D}$ represents a projection matrix, $Q^{(i)}$, $K^{(i)}$, $V^{(i)}$ represent the query, key, and value matrices of the $i$-th head, and $\mathrm{Concat}$ denotes splicing the head outputs along the embedding dimension; the sum of the sequence embedding $E^0$ and the position encoding $P^0$ is used as the model input $H^0$, and regularization strategies are omitted from the formula; the attention mechanism is defined as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right) V,$$

wherein $Q$, $K$, $V$ represent the query, key, and value, respectively, and $\sqrt{d}$ represents a scaling factor;

base sequence encoder input: the $(l-1)$-th layer embedded representation $H^{l-1}$ corresponding to the user's interaction sequence;

base sequence encoder output: the $l$-th layer representation $H^l$ of the user interaction sequence encoded by the base sequence encoder;
the attention mechanism comprises:

appending a trainable binary mask to each self-attention layer to prune noisy or task-irrelevant attention, and introducing a binary matrix $Z^{(l)} \in \{0,1\}^{n \times n}$ for the $l$-th self-attention layer, wherein $Z^{(l)}_{x,y}$ indicates whether an attention connection exists between item $x$ and item $y$, $x, y$ denote the $x$-th row and $y$-th column of the binary matrix, and $x, y \in \{1, 2, \ldots, n\}$; the $l$-th self-attention layer becomes:

$$M^{(l)} = A^{(l)} \odot Z^{(l)},$$

$$\mathrm{Attention}(Q^{(l)}, K^{(l)}, V^{(l)}) = M^{(l)} V^{(l)},$$

wherein $A^{(l)}$ is the original full attention, $M^{(l)}$ represents the sparse attention, and $Z^{(l)}$ can produce attention scores of exactly zero for irrelevant dependencies, thereby improving interpretability; the differentiable mask samples binary one-hot vectors from the input distribution by means of the Gumbel-softmax and applies the reparameterization technique to make it differentiable, as follows:

$$Z^{(l)} = \mathrm{GumbelSoftmax}(H^l),$$

wherein $Z^{(l)}$ represents the differentiable mask obtained after Gumbel-softmax sampling and reparameterization;

input: the $(l-1)$-th layer embedded representation $H^{l-1}$ corresponding to the user's interaction sequence, input to the sequence encoder with learnable masking;

output: the $l$-th layer result $Z^{(l)}$ after learnable masking;
the accidental uncertainty-aware contrastive learning comprises: given the interaction sequence of user $u$, obtaining two corresponding augmented sequences $S_i = \mathrm{aug}'(S_u)$ and $S_j = \mathrm{aug}''(S_u)$; the augmentation is obtained by randomly selecting $\mathrm{aug}'$, $\mathrm{aug}''$ from suitable data augmentations, randomly masking, deleting, or reordering the user's item interaction sequence to obtain two enhanced versions $S_i$ and $S_j$ of the original interaction sequence;

the accidental uncertainty-aware contrastive learning further comprises feeding the two augmented sequences $S_i$ and $S_j$ to the encoder network to output the user representations $H_i$ and $H_j$, and transforming the user representations by $M$ independent sub-networks $\{g_m\}_{m=1}^{M}$;

wherein $g_m$ represents an independent sub-network, implemented as a multi-layer perceptron;

obtaining $M$ different embedded vectors $\{g_m(H_i)\}_{m=1}^{M}$ by calculation, modifying the conventional self-supervised loss, and replacing $H_i$ and $H_j$ by the averages $\bar{H}_i = \frac{1}{M}\sum_{m=1}^{M} g_m(H_i)$ and $\bar{H}_j = \frac{1}{M}\sum_{m=1}^{M} g_m(H_j)$, $m = 1, \ldots, M$; the embeddings generated by the $M$ sub-networks are averaged to improve the prediction performance, and the final contrastive loss is expressed as follows:

$$L_{ssl} = -\log \frac{\exp\left(\mathrm{sim}(\bar{H}_i, \bar{H}_j)/\tau\right)}{\sum_{k=1}^{N} \exp\left(\mathrm{sim}(\bar{H}_i, \bar{H}_k)/\tau\right)},$$

wherein $\tau$ is the temperature controlling the contrastive learning strength, $\bar{H}_k$ is the overall representation of a sequence $k$ randomly extracted from the batch, the dot product is applied as the similarity function $\mathrm{sim}$, $N$ represents the total number of random sequences $k$, and $\exp$ represents the exponential function with natural base $e$;

input: the two representations $H_i$ and $H_j$ of the input sequence after data augmentation, input respectively into the accidental uncertainty-aware contrastive learning module;

output: the accidental uncertainty contrastive learning loss $L_{ssl}$;
the prediction comprises:

taking the last-layer embedding of the Transformer encoder as the final prediction of the user preference, and calculating its similarity with all item embeddings; the predicted probability of every item at the next position is obtained as follows:

$$\hat{y} = \mathrm{softmax}\!\left(h^L E^{\top}\right),$$

wherein $\hat{y}$ represents the probability of all items the user may interact with;

input: the final sequence representation $h^L$ of sequence $S_u$ obtained after the preceding encoding, input into the prediction layer;

output: the prediction layer outputs the probability $\hat{y}$ of the next clicked item among all items;
the calculating the loss of the main recommendation task comprises:

the goal of sequence recommendation is to predict the next item to be clicked by user $u$ according to the user's historical interaction sequence $S_u$ in the e-commerce platform; the sequence $S_u$ is split into a set of sub-sequences and targets as follows: $\left\{\left(S_u^{1:1}, v_2^u\right), \left(S_u^{1:2}, v_3^u\right), \ldots, \left(S_u^{1:|S_u|-1}, v_{|S_u|}^u\right)\right\}$, wherein $|S_u|$ represents the length of $S_u$, the superscript indicates the intercepted range of the sequence, and $v_t^u$ is the target label of the prefix $S_u^{1:t-1}$;

the loss of the main recommendation task is calculated using the cross-entropy loss as follows:

$$L_{rec} = -\sum_{u \in U}\sum_{t=2}^{|S_u|} \log P\!\left(v_t^u \mid S_u^{1:t-1}\right),$$

wherein $P(v_t^u \mid S_u^{1:t-1})$ represents the predicted probability of $v_t^u$ based on $S_u^{1:t-1}$;

the losses of the two uncertainties are added to the loss of the recommendation task for joint optimization, so as to perform multi-task learning of the model; the final objective function is as follows:

$$L = L_{rec} + \lambda_1 L_{ssl} + \lambda_2 L_{div},$$

wherein $\lambda_1$ and $\lambda_2$ are hyper-parameters, $L_{div}$ is the cognitive uncertainty loss, and the optimization problem in the formula is solved by a gradient-descent algorithm;

input: the final representation of the encoded user interaction sequence, input into the candidate-set recommendation module;

output: the recommended items for the current user.
2. The uncertainty-aware contrastive learning method for sequence recommendation according to claim 1, wherein: the cognitive uncertainty-aware ensemble learning loss comprises, in order to alleviate the cognitive uncertainty generated by the model in the process of fitting the data, introducing the idea of ensemble learning to comprehensively consider multiple cognitive views of the model;

designing a new loss function to encourage diversity during sub-network training of the main recommendation task, and defining the diversity regularization term $L_{div}$ over the standard deviation of the embedded vectors $\{H_m\}_{m=1}^{M}$, wherein $H_m$ represents the embedded vector output by the $m$-th sub-network, and the standard deviation is the square root of the variance $\sigma^2$:

$$\sigma = \sqrt{\mathrm{Var}(H_m) + \epsilon}, \qquad \mathrm{Var}(H_m) = \frac{1}{M}\sum_{m=1}^{M}\left(H_m - \bar{H}\right)^2,$$

wherein $\epsilon > 0$ is a small scalar that prevents numerical instability, $\bar{H}$ represents the average of the output representations of the $M$ sub-networks, and $M$ represents the total number of sub-networks.
3. The uncertainty-aware contrastive learning method for sequence recommendation according to claim 2, wherein: the cognitive uncertainty-aware ensemble learning loss further comprises a diversity regularization function expressed as follows:

$$L_{div} = \max(0, \alpha - \sigma),$$

the purpose of the diversity loss being to encourage divergence between the sub-networks by forcing the element-wise standard deviation to approach $\alpha > 0$, thereby preventing the embeddings from collapsing into the same vector, wherein $\alpha$ is a manually set hyper-parameter given in the code;

input: the outputs of the input sequence $S_u$ after passing through the multiple sub-networks, input into the cognitive uncertainty-aware ensemble learning module;

output: the cognitive uncertainty loss $L_{div}$.
CN202310388969.XA 2023-04-13 2023-04-13 Uncertainty perception contrast learning method for sequence recommendation Active CN116108283B (en)

Priority Applications (1)

Application Number: CN202310388969.XA; Priority Date: 2023-04-13; Filing Date: 2023-04-13; Title: Uncertainty perception contrast learning method for sequence recommendation

Publications (2)

Publication Number / Publication Date
CN116108283A / 2023-05-12
CN116108283B / 2023-10-13

Family

ID=86265907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310388969.XA Active CN116108283B (en) 2023-04-13 2023-04-13 Uncertainty perception contrast learning method for sequence recommendation

Country Status (1)

Country Link
CN (1) CN116108283B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127165A (en) * 2019-12-26 2020-05-08 纪信智达(广州)信息技术有限公司 Sequence recommendation method based on self-attention self-encoder
CN115082147A (en) * 2022-06-14 2022-09-20 华南理工大学 Sequence recommendation method and device based on hypergraph neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Commodity Classification Methods Based on BERT; 许孟孟 (Xu Mengmeng); China Master's Theses Full-text Database (Electronic Journal); pp. 1-81 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant