CN108959655B

CN108959655B - Self-adaptive online recommendation method for dynamic environment

Info

Publication number: CN108959655B
Application number: CN201810889330.9A
Authority: CN
Inventors: 张利军; 卢世银; 周志华
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2018-08-07
Filing date: 2018-08-07
Publication date: 2020-04-03
Anticipated expiration: 2038-08-07
Also published as: CN108959655A

Abstract

The invention discloses a dynamic-environment-oriented adaptive online recommendation method, in which the recommendation task is modeled as an online multi-class classification problem and recommendations are made with an adaptive online classification method. First, a historical data set of the application scenario is obtained. Then, a classifier and a loss function are selected, and the optimal parameters of the classifier on the historical data set are calculated and used as initial values. After that, in each round the recommended item is decided according to the prediction of the classifier, and the classifier parameters are updated by an adaptive method. The adaptive method includes a meta method and a plurality of expert methods. Compared with the prior art, the method performs online recommendation adaptively and is suitable for dynamic environments whose speed and amplitude of change cannot be predicted.

Description

Self-adaptive online recommendation method for dynamic environment

Technical Field

The invention relates to an online recommendation method in the field of data mining and machine learning, and in particular to a method for adaptive online recommendation in a dynamic environment, which can be applied to scenarios such as news recommendation, advertisement recommendation, and commodity recommendation.

Background

An online recommendation method learns user preferences from its interactions with users while it is making recommendations, and adjusts its recommendation strategy in real time to track those preferences. In each recommendation round, the method first observes the features of the user and of all candidate items, then chooses an item to recommend according to its current strategy, and finally updates the strategy according to the item the user actually selected. With the rapid growth in the amount of observable data and in hardware computing power, online recommendation methods have been widely applied in fields such as economics, education, games, and multimedia. For example, in internet advertising, when a user arrives the method decides which advertisement to deliver based on the features of the user and of all candidate advertisements, and updates the model after the user gives feedback (clicks on one of the advertisements) to improve subsequent delivery. In a news recommendation system, when a user arrives the method predicts the news category the user is interested in, based on the features of the user and of all candidate news, recommends news of that category, and updates the model after the user gives feedback (reads news of a certain category) to improve subsequent recommendations. In stock investment, at the beginning of each investment cycle the method predicts the upcoming market fluctuation from market features in order to recommend high-quality investment targets, and at the end of the cycle it updates the model according to the actual fluctuation to improve the return in the next cycle.

Traditional online recommendation methods mainly aim to reduce computational overhead while matching the performance of a static offline recommendation method. Many online recommendation methods have been proven to perform, on average, as well as the best offline recommendation method when the number of recommendation rounds is sufficiently large. However, in a dynamically changing environment a static offline method itself tends to perform poorly, so these theoretical guarantees lose their practical significance. Recently, some online recommendation methods with theoretical guarantees for dynamic environments have been proposed, but they all require the speed and amplitude of the environmental change to be known in advance, which limits their range of application. In many real-world scenarios, the changes in the environment faced by the recommendation method are difficult to control or estimate in advance. In stock investment, stock prices often change drastically when a major event occurs; in internet advertising and news recommendation systems, user traffic is full of randomness and contingency. An adaptive online recommendation method is therefore needed for highly variable dynamic environments that cannot be characterized in advance.

Disclosure of Invention

Purpose of the invention: existing online recommendation methods are only suitable for dynamic environments that change slowly and for which prior knowledge is available, whereas in many real-world scenarios the environment changes quickly and cannot be predicted in advance. To address this problem, the invention provides a dynamic-environment-oriented adaptive online recommendation method.

Technical solution: a dynamic-environment-oriented adaptive online recommendation method for application scenarios such as news recommendation, advertisement recommendation, and commodity recommendation. Specifically, a historical data set of the application scenario is first acquired. A classifier and a loss function are then selected, and the optimal parameters of the classifier on the historical data set are calculated and used as initial values. After that, in each round the recommended item is decided according to the prediction of the classifier, and the classifier parameters are updated by an adaptive method. The adaptive method includes a meta method and a plurality of expert methods. Each expert method is configured with a different learning rate, targeting one possible dynamic environment, and updates its decision by gradient descent in each round. The meta method receives the decisions of all expert methods in each round, assigns a weight to each expert method according to its recent recommendation performance in the dynamic environment, and finally combines the decisions of the expert methods according to these weights to determine the final recommended item.

A dynamic-environment-oriented adaptive online recommendation method comprises a meta method and expert methods.

The meta-method comprises the following specific steps:

Step 100, obtaining a historical data set $H = \{(x_i, y_i) \mid i = 1, 2, \ldots, m\}$ of the recommendation scenario, where $x_i$ denotes the vector formed by concatenating the user features and the features of all candidate items, and $y_i$ denotes the item actually selected by the user;

Step 101, selecting a classifier $c(x, w)$ and a loss function $l(p, y)$, where $x$ denotes the vector formed by concatenating the user features and the features of all candidate items, $y$ denotes the item actually selected by the user, $w$ denotes the parameters of the classifier, and $p$ denotes the recommended item output by the classifier;

Step 102, calculating, on the historical data set and based on the selected classifier and loss function, the optimal parameters within the feasible region $W$ of the classifier parameters;

Step 103, setting the step size parameter $\alpha$;

Step 104, setting the number $N$ of expert methods;

Step 105, setting the learning rate $\eta$ of each expert method;

Step 106, initializing the weight of each expert method;

Step 107, in each recommendation round $t = 1, 2, \ldots, T$, performing the following steps:

Step 108, obtaining the vector $x_t$ formed by concatenating the user features and the features of all candidate items;

Step 109, receiving the output of each expert method;

Step 110, calculating the classifier parameters $w_t$ by combining the outputs of the expert methods according to their weights, where $\eta$ denotes the learning rate of an expert method and $t$ denotes the index of the recommendation round;

Step 111, recommending the item $c(x_t, w_t)$ output by the classifier;

Step 112, obtaining the item $y_t$ actually selected by the user in this round;

Step 113, calculating the gradient $\nabla f_t(w_t)$ of the function $f_t(w) = l(c(x_t, w), y_t)$ at $w_t$;

Step 114, sending the gradient $\nabla f_t(w_t)$ to each expert method;

Step 115, constructing the surrogate loss function $s_t(\cdot)$;

Step 116, updating the weight of each expert method.
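The combination in step 110 and the weight update in step 116 can be illustrated with a small sketch. The formulas themselves are not reproduced in this text, so the sketch assumes an exponential-weights (Hedge-style) update driven by each expert's surrogate loss; the function names `combine` and `update_weights` and the update rule are assumptions for illustration, not the patent's definitions.

```python
import math


def combine(weights, expert_outputs):
    """Step 110 (sketch): weighted combination of the expert parameter vectors."""
    dim = len(expert_outputs[0])
    return [sum(p * w[k] for p, w in zip(weights, expert_outputs))
            for k in range(dim)]


def update_weights(weights, surrogate_losses, alpha):
    """Step 116 (assumed rule): shrink the weight of experts with a high
    surrogate loss, then renormalise so the weights sum to one."""
    scaled = [p * math.exp(-alpha * s)
              for p, s in zip(weights, surrogate_losses)]
    total = sum(scaled)
    return [v / total for v in scaled]


weights = [0.5, 0.5]
expert_outputs = [[1.0, 0.0], [0.0, 1.0]]
w_t = combine(weights, expert_outputs)
# The second expert suffered the larger surrogate loss, so it loses weight.
new_weights = update_weights(weights, [0.0, 1.0], alpha=1.0)
```

Under this assumed rule, a larger step size `alpha` makes the meta method react faster to recent expert performance, at the cost of noisier weights.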

The specific steps of each expert method are as follows:

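Step 200, initializing the output of the expert method;

Step 201, in each recommendation round $t = 1, 2, \ldots, T$, performing the following steps:

Step 202, sending the current output to the meta method;

Step 203, receiving the gradient $\nabla f_t(w_t)$ from the meta method;

Step 204, updating the output by a gradient descent step with learning rate $\eta$, followed by the projection $\Pi_W[\cdot]$ onto the feasible region $W$.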
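The expert workflow of steps 200 to 204 can be sketched as a small class. This is a minimal sketch, assuming the update is a projected gradient step; the class name `Expert`, its method names, and the box-shaped feasible region used in the usage lines are hypothetical choices for illustration.

```python
class Expert:
    """One expert method with a fixed learning rate (illustrative sketch)."""

    def __init__(self, eta, w_init, project):
        self.eta = eta            # learning rate of this expert
        self.w = list(w_init)     # step 200: initial output
        self.project = project    # projection onto the feasible region W

    def output(self):
        """Step 202: the parameter vector sent to the meta method."""
        return list(self.w)

    def update(self, grad):
        """Steps 203-204: take a gradient step, then project back into W."""
        stepped = [wi - self.eta * gi for wi, gi in zip(self.w, grad)]
        self.w = self.project(stepped)


# Hypothetical feasible region: the box [-1, 1]^d, projected coordinate-wise.
def clip_to_box(v):
    return [max(-1.0, min(1.0, x)) for x in v]


expert = Expert(eta=0.5, w_init=[0.0, 0.0], project=clip_to_box)
expert.update([-4.0, 1.0])   # gradient received from the meta method
result = expert.output()
```

Passing the projection in as a callable keeps the expert independent of the particular shape chosen for the feasible region.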

The classifiers selectable in step 101 include a conventional linear classifier $c(x, w) = w^{\top} x$, a softmax classifier, a neural network classifier, etc.; the selectable loss functions are all convex differentiable loss functions, including the squared loss $l(p, y) = (p - y)^2$, the hinge loss $l(p, y) = \max(0, 1 - yp)$, the cross-entropy loss $l(p, y) = -\sum_i y_i \log(p_i)$, and the like.
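The loss functions listed here can be written down directly. The following is a minimal sketch in plain Python; the function names are illustrative, and the `softmax` helper is included only so the cross-entropy loss has a probability vector to act on.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities (shifted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def squared_loss(p, y):
    """l(p, y) = (p - y)^2 for a scalar prediction p and target y."""
    return (p - y) ** 2


def hinge_loss(p, y):
    """l(p, y) = max(0, 1 - y p) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * p)


def cross_entropy_loss(p, y):
    """l(p, y) = -sum_i y_i log(p_i) for probabilities p and one-hot y."""
    return -sum(yi * math.log(pi) for pi, yi in zip(p, y) if yi > 0)


probs = softmax([0.0, 0.0])               # two equally likely items
ce = cross_entropy_loss(probs, [1, 0])    # loss when the first item is chosen
```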

The step size parameter $\alpha$ in step 103 is set as

where $D$ is the diameter of the feasible region $W$ of the classifier parameters, and $G$ is an arbitrary value such that the following holds:

The number $N$ of expert methods in step 104 is set as

The learning rate $\eta$ of each expert method in step 105 is set such that the learning rate of the $i$-th ($i = 1, 2, \ldots, N$) expert method is
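The exact expression for the learning rates is not reproduced in this text. As an illustration only, the sketch below assumes a geometric grid in which each expert's learning rate is double the previous one; the function name `learning_rate_grid` and the parameter `eta_min` are hypothetical. A doubling grid is a common way to cover a wide range of possible change rates with a small number of experts.

```python
def learning_rate_grid(eta_min, n_experts):
    """Assumed grid: eta_i = eta_min * 2**(i - 1) for i = 1, ..., N."""
    return [eta_min * 2 ** i for i in range(n_experts)]


etas = learning_rate_grid(eta_min=0.125, n_experts=4)
```

With such a grid, an expert with a small learning rate suits a slowly changing environment, while an expert with a large learning rate tracks abrupt changes.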

The surrogate loss function $s_t$ constructed in step 115 is specifically defined as

where $w_t$ denotes the classifier parameters in round $t$.
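The definition itself is not reproduced in this text. One construction that fits the surrounding description (a loss built from the gradient computed at $w_t$) is the linearization $s_t(w) = \langle \nabla f_t(w_t), w - w_t \rangle$. The sketch below assumes that form; it is an illustration, and whether it matches the patent's definition cannot be confirmed from this text.

```python
def surrogate_loss(grad_at_wt, w_t, w):
    """Assumed linearized surrogate: inner product of the gradient at w_t
    with the displacement (w - w_t)."""
    return sum(g * (a - b) for g, a, b in zip(grad_at_wt, w, w_t))


grad = [1.0, -2.0]
w_t = [0.0, 0.0]
at_current = surrogate_loss(grad, w_t, w_t)          # zero by construction
at_expert = surrogate_loss(grad, w_t, [0.5, 0.5])    # one expert's output
```

For a convex $f_t$, a loss of this form only needs the single gradient $\nabla f_t(w_t)$, so every expert can be scored without evaluating the original loss at each expert's own output.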

The projection operator $\Pi_W[\cdot]$ in step 204 is specifically defined as $\Pi_W[u] = \arg\min_{v \in W} \|u - v\|$, $u \in W$.
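For a concrete feasible region the projection has a closed form. The sketch below assumes, purely for illustration, that $W$ is a Euclidean ball of a given radius centered at the origin; the function name `project_onto_ball` and the choice of region are not taken from the patent.

```python
import math


def project_onto_ball(u, radius):
    """Euclidean projection onto {v : ||v|| <= radius} (assumed shape of W)."""
    norm = math.sqrt(sum(x * x for x in u))
    if norm <= radius:
        return list(u)            # already feasible: the nearest point is u
    scale = radius / norm
    return [x * scale for x in u]  # rescale onto the boundary of the ball


inside = project_onto_ball([0.3, 0.4], radius=1.0)
outside = project_onto_ball([3.0, 4.0], radius=1.0)
```

A point already inside the region is returned unchanged, and a point outside is pulled back along the ray through the origin, which is the closest point of the ball.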

Beneficial effects: compared with the prior art, the method performs online recommendation adaptively and is suitable for dynamic environments whose speed and amplitude of change cannot be predicted.

Drawings

FIG. 1 is a meta-method work flow diagram of the present invention;

FIG. 2 is a flow chart of the expert method of the present invention.

Detailed Description

The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.

Commodity recommendation on an e-commerce website is taken as an example.

The workflow of the meta method is shown in fig. 1. First, the purchase records $H = \{(x_i, y_i) \mid i = 1, 2, \ldots, m\}$ of all users of the website over a recent period are acquired, where $x_i$ denotes the vector formed by concatenating the features of the user and of all commodities, and $y_i$ denotes the commodity purchased by the user. The user features include gender, age, residence, income, education, etc., and the commodity features include price, sales, click-through rate, shopping cart conversion rate, etc.
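The concatenation that produces each $x_i$ can be sketched as follows. The feature values and their ordering are made-up placeholders, and the function name `build_feature_vector` is hypothetical.

```python
def build_feature_vector(user_features, item_features_list):
    """Concatenate the user's features with the features of every candidate."""
    x = list(user_features)
    for item_features in item_features_list:
        x.extend(item_features)
    return x


# Placeholder encoding: user = [gender, age]; commodity = [price, click-through rate].
user = [1.0, 34.0]
commodities = [[19.9, 0.12], [5.5, 0.30]]
x_i = build_feature_vector(user, commodities)
```

The resulting vector has one block for the user followed by one block per candidate commodity, so its length grows with the number of candidates.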

Next, the softmax classifier and the cross-entropy loss $l(p, y) = -\sum_i y_i \log(p_i)$, which are commonly used in this scenario, are selected. On the purchase record data set, the optimal classifier parameters are calculated based on the selected classifier and loss function; this can be done with a convex optimization method such as gradient descent.
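Fitting the initial parameters on the historical data can be illustrated with plain batch gradient descent. To keep the sketch short it swaps in a linear predictor with the squared loss rather than the softmax classifier with cross-entropy loss used in this example, and it does not enforce the feasible region $W$; the function name, step count, and learning rate are illustrative assumptions.

```python
def fit_initial_parameters(history, steps=200, lr=0.1):
    """Batch gradient descent on the mean squared loss of w^T x over history."""
    dim = len(history[0][0])
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in history:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for k in range(dim):
                grad[k] += 2.0 * err * x[k] / len(history)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Toy history whose exact minimiser is w = [2, -1].
history = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
w_1 = fit_initial_parameters(history)
```

The fitted vector would then serve as the common starting point of the online phase.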

Then, the number $T$ of recommendation rounds is determined, and the step size parameter $\alpha$ is set as

and the number $N$ of expert methods is set as

where $D$ is any value such that the following holds:

$G$ is any value such that the following holds:

and $W$ is the feasible region of the classifier parameters.

Then, the learning rate of each expert method is set: the learning rate of the $i$-th ($i = 1, 2, \ldots, N$) expert method is set to

The weight of each expert method is then initialized.

Finally, the online run of the recommendation rounds starts. In each recommendation round, the meta method first obtains the feature vectors of the user and of all candidate commodities in this round and concatenates them into $x_t$. The meta method then receives the output of each expert method, calculates the parameters $w_t$ of the softmax classifier, and recommends a commodity according to the output $c(x_t, w_t)$ of the softmax classifier. After that, the meta method obtains the commodity $y_t$ actually purchased by the user in this round, calculates the gradient $\nabla f_t(w_t)$ of the function $f_t(w) = l(c(x_t, w), y_t)$ at $w_t$, and sends it to all expert methods. Finally, the meta method constructs the surrogate loss function $s_t(\cdot)$ and updates the weight of each expert method.

The workflow of each expert method is shown in fig. 2. After initialization is completed, in each recommended round, the expert method first sends the output of the current round to the meta method, then receives gradient information from the meta method, and finally updates the output of the next round using gradient descent.
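The interplay of the meta method and the expert methods can be tried out in a toy simulation. The sketch below is a simplification under stated assumptions, none of which come from the patent text: the parameter is a single scalar in $W = [-1, 1]$, the per-round loss is $f_t(w) = (w - y_t)^2$ instead of softmax with cross-entropy, the initial weights are uniform, the weights follow an exponential-weights update on a linearized surrogate loss, and the learning rates and `alpha` are hand-picked.

```python
import math


def clip(v, lo=-1.0, hi=1.0):
    """Projection onto the assumed feasible interval W = [lo, hi]."""
    return max(lo, min(hi, v))


def run(targets, etas, alpha, w_init=0.0):
    experts = [w_init for _ in etas]             # step 200: expert outputs
    weights = [1.0 / len(etas)] * len(etas)      # step 106: uniform (assumed)
    decisions = []
    for y_t in targets:                          # step 107: one round per target
        # Steps 109-110: combine the expert outputs with the current weights.
        w_t = sum(p * w for p, w in zip(weights, experts))
        decisions.append(w_t)                    # step 111: act on w_t
        # Steps 112-113: observe y_t and take the gradient of (w - y_t)^2 at w_t.
        grad = 2.0 * (w_t - y_t)
        # Step 115: linearized surrogate loss of each expert (assumed form).
        losses = [grad * (w - w_t) for w in experts]
        # Step 116: exponential-weights update (assumed rule), renormalised.
        scaled = [p * math.exp(-alpha * s) for p, s in zip(weights, losses)]
        total = sum(scaled)
        weights = [v / total for v in scaled]
        # Steps 114, 203, 204: every expert takes a projected gradient step.
        experts = [clip(w - eta * grad) for w, eta in zip(experts, etas)]
    return decisions


# The environment switches abruptly halfway through the run.
targets = [0.8] * 200 + [-0.8] * 200
decisions = run(targets, etas=[0.01, 0.04, 0.16], alpha=0.5)
```

In this toy run the combined decision settles near the first target and then moves toward the second one after the switch, which is the behaviour the expert ensemble is meant to provide when the change cannot be anticipated.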

Claims

1. A dynamic-environment-oriented adaptive online recommendation method, characterized by comprising a meta method and expert methods;

the meta-method comprises the following specific steps:

Step 102, calculating, on the historical data set and according to the selected classifier and loss function, the optimal parameters within the feasible region $W$ of the classifier parameters;

Step 103, setting the step size parameter $\alpha$;

Step 104, setting the number $N$ of expert methods;

Step 105, setting the learning rate $\eta$ of each expert method;

Step 106, initializing the weight of each expert method;

Step 109, receiving the output of each expert method;

Step 110, calculating the classifier parameters $w_t$ by combining the outputs of the expert methods according to the weight that the expert method with learning rate $\eta$ holds in round $t$;

Step 112, obtaining the item $y_t$ actually selected by the user in this round;

Step 113, calculating the gradient $\nabla f_t(w_t)$ of the cost function $f_t(w) = l(c(x_t, w), y_t)$ of round $t$ at $w_t$;

Step 114, sending the gradient $\nabla f_t(w_t)$ to each expert method;

Step 115, constructing the surrogate loss function $s_t(\cdot)$;

Step 116, updating the weight of each expert method;

The specific steps of each expert method are as follows:

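Step 200, initializing the output of the expert method;

Step 201, in each recommendation round $t = 1, 2, \ldots, T$, where $T$ denotes the total number of rounds, performing the following steps:

Step 202, sending the current output to the meta method;

Step 203, receiving the gradient $\nabla f_t(w_t)$ from the meta method;

Step 204, updating the output by a gradient descent step with learning rate $\eta$ followed by the projection $\Pi_W[\cdot]$ onto the feasible region $W$, where $\Pi_W[\cdot]$ denotes the projection operator.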

2. The dynamic-environment-oriented adaptive online recommendation method of claim 1, wherein: the classifiers selectable in step 101 include a conventional linear classifier $c(x, w) = w^{\top} x$, a softmax classifier, and a neural network classifier; the selectable loss functions are all convex differentiable loss functions, including the squared loss $l(p, y) = (p - y)^2$, the hinge loss $l(p, y) = \max(0, 1 - yp)$, and the cross-entropy loss $l(p, y) = -\sum_i y_i \log(p_i)$.

3. The dynamic-environment-oriented adaptive online recommendation method of claim 1, wherein: the step size parameter $\alpha$ in step 103 is set as

where $T$ is the total number of rounds; $D$ is the diameter of the feasible region $W$ of the classifier parameters; and $G$ is an arbitrary value such that the following holds:

4. The dynamic-environment-oriented adaptive online recommendation method of claim 1, wherein: the number $N$ of expert methods in step 104 is set as

5. The dynamic-environment-oriented adaptive online recommendation method of claim 1, wherein: the learning rate $\eta$ of each expert method in step 105 is set such that the learning rate of the $i$-th ($i = 1, 2, \ldots, N$) expert method is

6. The dynamic-environment-oriented adaptive online recommendation method of claim 1, wherein: the surrogate loss function $s_t$ in step 115 is specifically defined as

7. The dynamic-environment-oriented adaptive online recommendation method of claim 1, wherein: the projection operator $\Pi_W[\cdot]$ in step 204 is specifically defined as $\Pi_W[u] = \arg\min_{v \in W} \|u - v\|$, $u \in W$.