CN110263257A

CN110263257A - Multi-source heterogeneous data mixing recommended models based on deep learning

Info

Publication number: CN110263257A
Application number: CN201910547320.1A
Authority: CN
Inventors: 冀振燕; 宋晓军; 赵颖斯; 皮怀雨; 李俊东
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2019-06-24
Filing date: 2019-06-24
Publication date: 2019-09-20
Anticipated expiration: 2039-06-24
Also published as: CN110263257B

Abstract

Deep learning has been widely used in image and audio identification, text classification and has indicated that fields, the recommender systems based on deep learning such as study also become the research hotspot of scholars in recent years.Deep learning model all achieves fabulous effect in the expression study of the specific datas such as image, text, avoids complicated Feature Engineering, and the abstract characteristics of the Nonlinear Multi time of available isomeric data indicate, overcome the heterogeneity of a variety of data.Currently, the deep learning recommended models of fusion scoring, comment and social networks not yet propose.This patent is based on deep learning algorithm, give the recommended flowsheet with stronger expansion, it analyzes different data and is suitble to the related algorithm used and principle, it is proposed the final loss function in conjunction with comment, scoring and social information according to the loss function of different data, improves the accuracy of recommendation results.

Description

Multi-source heterogeneous data mixing recommended models based on deep learning

Technical field

Deep learning has been widely used in image and audio identification, text classification and has indicated the fields such as study in recent years, Recommender system based on deep learning also becomes the research hotspot of scholars.Deep learning model is in certain numbers such as image, texts According to expression study in all achieve fabulous effect, avoid complicated Feature Engineering, available isomeric data it is non-thread Property multi-level abstract characteristics indicate, overcome the heterogeneity of a variety of data.Currently, merging scoring, comment and social networks Deep learning recommended models not yet propose.This patent is based on deep learning algorithm, gives pushing away with stronger expansion Recommend model.

Background technique

Current deep learning model cannot still be recommended in conjunction with scoring, comment and social network information.Because of multi-source The character representation of isomeric data still has difficulty, social information and other users and article interactive information and can not directly merge.If Can be using the expression of deep learning method study different isomerization data and unified into a deep learning model by them, it will solve Research is indicated in the shortcomings that selecting algorithms of different that needs on algorithm fusion, and using deep learning learning characteristic before certainly, will Significantly improve the accuracy of recommendation results.The advantages of in order to make full use of these three data, the spy that this patent fusion is scored, commented on During levying and social information being added to training, the multi-source heterogeneous data recommendation model based on deep learning is proposed.

For comment data, traditional topic model cannot accurately indicate the feature of text, and this patent passes through PV-DBOW mould Type learns to comment on the character representation of document, and PV-DBOW assumes the independence between the word in document, and use the document Predict the word each observed.PV-DBOW represents each document by a dense vector, this vector is trained to come pre- Survey the word in document.For score data, traditional matrix disassembling method faces Sparse and the lower difficulty of accuracy rate, This patent is scored using neural metwork training, can preferably embody the feature of user and article.For social network data, originally Patent to based on BPR to the social relationship information for increasing user in learning method so that the sampling of this method more adduction Reason, improves the accuracy of recommendation results.

Summary of the invention

The recommended models for being capable of handling multi-source heterogeneous data are proposed based on deep learning, model has accuracy height, can expand The advantages that malleability is strong.Model has selected the text fragment based on deep learning to indicate learning method, devises based on scoring study The neural network of user and article characteristics, and by social networks constraint based on pair study.Depth is based on due to existing The method of the text representation study of habit have been relatively mature, it is possible to directly use existing network, by its training result and Other Fusion Features are trained together, so that obtaining more accurate fusion feature indicates.Score data and text data are not It is same, it can directly learn to obtain the feature of user and article, so not needing the vector of study scoring indicates, without Practise substance feature.

Multi-source heterogeneous data recommendation model based on deep learning includes three kinds of data: comment, scoring and social networks, often The characteristics of kind data have itself, embodies the feature of user or article from different perspectives.Learn a variety of numbers by depth model According to vector indicate, the fusion feature of user or article is then obtained by concatenated method.Comment on feature instantiation user couple The attitude of article can also be used to the attribute for indicating article, the mark sheet that model passes through PV-DBOW algorithm study comment paragraph Show, then weighted superposition obtains user or the vector of article and indicates.Scoring is characterized in the overall evaluation of the user to article, embodies use Family can learn the satisfactory level of article using BPR to the nonlinear characteristic of user and article.Social networks embodies Friend relation between user affects the interaction between user and article indirectly, using social network relationships can reinforce to The constraint of family buying behavior, to further increase recommendation results accuracy.

The above method contains following steps:

(1) it Text character extraction: is indicated using the feature vector of PV-DBOW model learning text fragment.What model used It is distributed bag of words (DistributedBag-of-Words), which predicts paragraph using a paragraph vector In word obtained through stochastical sampling.

(2) score feature extraction: the neural network connected entirely using two layers learns scoring of the user to article.With text Unlike feature learning model, this method can directly obtain user and the feature vector of article and indicate, rather than directly mention Take the feature of scoring.

(3) consumer articles Fusion Features: the comment text feature acquired according to (1) can be commented what every user issued User characteristics are obtained by feature vector weighted sum, the comment feature vector weighted sum that article receives is obtained into article characteristics. The text of user and scoring Fusion Features are finally obtained into the fusion feature of user using fusion function, by the text of article and commented Fusion Features are divided to obtain the fusion feature of article.

(4) based on the optimization of BPR: sampling to obtain the triple with user preference based on social networks, according to Bayes Theoretical optimization obtains optimum model parameter.

(5) recommend: the feature vector of user and article is input in model by the model parameter acquired according to step (4) Recommend article for user.

In step (1) Text character extraction, including following four steps:

1. Text Pretreatment

Each paragraph is indicated using an one-hot vector, represents the column in paragraph matrix.It will be in comment text Word carries out duplicate removal, is added in dictionary, and each word is indicated using unique one-hot vector.After the completion of building, square is commented on Each column can uniquely correspond to a comment in battle array.

2. word samples

Model predicts the word in paragraph using a paragraph vector, wherein the word in paragraph by paragraph with Machine samples to obtain.Each word is counted as independent presence in paragraph, and the sequence of word does not influence the study knot of paragraph vector Fruit.

3. optimizing

Use step 1. in paragraph vector as input, the word for using step 2. to sample as export, constantly Iteration, training paragraph vector model.The model is constructed based on neural network, using softmax classifier, using under stochastic gradient Drop method acquires model parameter.

4. the character representation of comment text

After the completion of training, each column of paragraph matrix are the feature vector of every section of comment.It is 1. defined using step every The one-hot vector sum matrix multiple of section comment, can be obtained the character representation when previous paragraphs.

Step (2) scoring feature extraction includes following three step:

1. neural network constructs

The neural network that scoring Feature Selection Model is connected entirely based on two layers, uses ELU activation primitive.Neural network it is defeated Enter the result being multiplied for the feature of user with the characteristic element of article.Neural network output is scoring of the user to article.

2. consumer articles characteristic optimization

According to objective function, the scoring feature vector of user and article is continued to optimize using stochastic gradient descent method, is reduced Loss.When prediction scoring and practical scoring are closer to, the scoring feature of user and article is can be obtained in deconditioning.

Step (3) is merged the feature of step (1) and step (2) to obtain new fusion feature.

Scoring feature represents overall assessment of the user to article, simple clear, and contains the difference of user in commenting on Viewpoint, in further detail.Step (3) will be commented on and scoring feature merges, and more being enriched comprehensive user characteristics indicates. The method of fusion using will comment on feature vector and scoring feature vector connect, the character representation merged to Amount.

The optimization of step (4) based on BPR mainly comprises the steps of:

1. triple generates

Because the preference of user often has similitude with their good friend, the reflection in reality is exactly that user more holds The article for easily selecting its good friend to buy or have a preference for.The similitude of this preference between user is applied to the sampling of BPR model , by more reasonably constraining to sampling process, the triple for being more in line with user behavior is obtained, to improve subsequent mould Type training and the accuracy recommended.

2. model optimization

It is proposed that unified objective function carries out model optimization.The fusion function of multi-source heterogeneous data has been proposed above, Need to construct objective function according to fusion feature now, enable in learning process fusion feature more accurate representation user or The feature of article.Stochastic gradient descent method can be used to be solved, existing deep learning frame is all integrated with stochastic gradient Descent algorithm can obtain the feature vector of final user and article by called side Faku County.

Step (5) is that user recommends interested article.

Each user does not buy with other or the feature vector of article that does not browse is multiplied and the user can be obtained to the object The preference-score of product, score is higher to be represented user and is more possible to buy or browse the article.The score descending of all items is arranged Arrange the Top-N recommendation list for taking top n that can acquire user.

Detailed description of the invention

Fig. 1 is the mixed recommendation model flow figure based on multi-source heterogeneous data.

Specific embodiment

According to the method introduction in specification, implements the recommended models based on multi-source heterogeneous data and needs following steps:

(1) Text character extraction

1. Text Pretreatment

Use d_uvUser u is indicated to the comment text of article v, the word that comment text includes is indicated using w, pass through The feature vector of user and article that user learns the comment of article use u₁And v₁It indicates, the feature vector of paragraph makes Use d_uvIt indicates, term vector is indicated using w, the word of all comments is stored in dictionary V.The dimension of these feature vectors Number is all K.

2. word samples

For each paragraph, text filed, some words of stochastical sampling sampling from the region are randomly selected, as The result of training classifier.Text filed size and the number for choosing word in this region are manually set.

3. optimizing

Every section of comment can be all mapped in a random higher-dimension semantic space, then be carried out to the word for including in paragraph Prediction is optimized by study, and obtaining more accurate paragraph feature vector indicates.According to bag of words it is assumed that each word w exists Document d_uvThe probability of middle appearance is calculated using softmax:

Wherein w ' indicates that the whole words for belonging to dictionary V, exp are indicated using e as the exponential function at bottom.It can be with by this formula Acquire the probability that any word occurs in document.During actually there is word probability in maximization, the solution of gradient solution Expense is larger.In order to reduce the expense of calculating, often using the method for negative sampling in calculating process, in the word not occurred According to a predefined noise profile come sampling section word, approximate calculation is carried out as negative sample, rather than uses dictionary In all word.Based on the strategy of negative sampling, then the objective function of PV-DBOW is defined as:

The combination of all word with document is all added by above formula, whereinIt is word w in document d_uvMiddle appearance Number, if not occurring functional value be 0.What is represented is sigmoid function, and t is the number of negative sample,What is indicated is in noise profile P_VIn,Expectation.

4. the character representation of comment text

According to above-mentioned objective function, the character representation d of available document_uvAnd it is proposed in this paper based on conventional machines Similar in the recommended models of learning method, the feature vector of user and article can be indicated according to the feature vector of comment.No It crosses the character representation of user herein and article no longer to be calculated by the average of comment feature vector, but passes through subsequent Models Sets Learn at optimization.

By the feature vector weighting summation of all comments of user and normalization obtains the user characteristics factor:

Wherein D_uIndicate all comment numbers of user u, p'_ukIndicate total probability of the user on topic k, W_uvIndicate user u For the weight of i-th of comment of sending, p_ukIt is its normalized expression.The characterization factor of user u are as follows:

p_u=(p_u1,...,p_uK)

User characteristics factor dimension is K.The article characteristics factor can be used similar formula and calculate:

Wherein D_vIndicate all comment numbers that article receives, q'_vkIndicate total probability of the article on topic k, q_vkIt is that it is returned One expression changed, W_uvIndicate article v for the weight of u-th of the comment received.The characterization factor of article are as follows:

q_v=(q_v1,...,q_vK)

K is that the number of dimensions of article and user are consistent.

Wherein W_uvTo comment on d_uvIt, could be to the important journey of different comments by weight for the weight of user u and article i Degree makes differentiation, to construct reasonable user and article characteristics.

(2) scoring feature extraction

1. neural network constructs

The neural network connected entirely using two layers trains to obtain final user to article score.Use can be directly obtained The feature vector of family and article indicates.Define r_uiScoring of the user u to article i is indicated, then for the r that arbitrarily scores_uiHave User r_uWith corresponding article r_iIt is corresponding to it.Then available two layers of neural network prediction formula:

r_ui=φ (U₂·φ(U₁(r_u⊙r_i)+c₁)+c₂)

Wherein ⊙ indicates that element multiplication, φ (x) are ELU activation primitive, U₁、U₂、c₁And c₂To need the weight that learns and partially Poor parameter.

2. consumer articles characteristic optimization

Objective function is to predict scoring and square subtracted each other that really score, and Optimal Parameters make objective function minimum Obtaining optimal user and article indicates.

(3) consumer articles Fusion Features

The feature vector of user and article are constructed by the interactive information between user and article.It is proposed a fusion feature Function f (), it is assumed that the character representation that learn is x from scoring and text data₁,x₂, then then may be used by fusion function To obtain fused feature:

X=f (x₁,x₂)

Wherein x is fused feature.It is merged using simple series system, user can be enhanced and article is special The scalability of sign, this has great importance for the model based on multi-source heterogeneous data.So obtained by function f () Feature is

(4) based on the optimization of BPR

1. triple generates

The user is bought or browsed by each user u according to the purchase of user or browsing record and social networks The article crossed is defined as i, user is defined as j from not in contact with the article crossed, then the article that the user good friend was bought defines For p.All article set are defined as D in system, then user u purchase and the article set browsed are defined as D_u, user The article that good friend bought is defined as D_p.The article that user preference can most be represented is article D that user has bought first_i, It is secondary, according to the similitude of good friend's preference, user be likely to buy its good friend bought but the article D that does not buy of the user_p\ D_u.Finally, the article that user most unlikely buys be D (D_u∪D_p).User and article are constructed according to social network information Triple can be represented as training set, training set T:

T:=(u, i, j) | i ∈ (D_u∪D_p),j∈D\(D_u∪D_p)}

Wherein (u, i, j) is consumer articles triple, represents user u and is greater than article j for the preference of article i.Its Middle article i belongs to the article that user bought or the direct good friend of user bought, and it is to buy and it is straight that article j, which belongs to user, Connect the article that friend did not also buy.Thus the direct friend relation based on user constructs consumer articles triple, is used for The training of subsequent BPR model.

2. model optimization

It is greater than article j according to preference of the definition known users u of front to article i.It is indicated using function g () In conjunction with the loss function that user and article characteristics indicate, g () is defined as sigmoid function to calculate user couple herein herein Difference preference's degree of different articles, then there is g (u, i, j)=σ (u^Ti-u^Tj).So entirely merging pushing away for multi-source heterogeneous data The objective function for recommending model is defined as:

Wherein W is the weight parameter of every kind of model, indicates the weight ginseng of every comment of user in learning model in comment Number is different from, and needs to obtain by study.And in the model that scoring calculates, that acquire is directly the spy of user and article Sign, that is, weight parameter are set as 1, do not need to be updated by optimization object function.What wherein Θ was represented is mould The other parameters for needing to learn in type, Θ={ Θ₁,Θ₂}={ { w, d_uv},{U₁,U₂,c₁,c₂,r_u,r_i}}.λ is every kind of model Punishment parameter, their value is all on [0,1] section.It is added to negative sign before the objective function of Rating Model, because commenting The objective function of sub-model needs to minimize, and the objective function of overall model is to maximize.

(5) recommend

Personalized recommendation list can be multiplied to obtain by user with the feature vector of article:

S=u^Tv

Claims

1. proposing the recommended models for being capable of handling multi-source heterogeneous data based on deep learning, model has accuracy high, expansible The advantages that property is strong.The above method contains following steps:

(1) it Text character extraction: is indicated using the feature vector of PV-DBOW model learning text fragment；Model using point The bag of words (Distributed Bag-of-Words) of cloth, the model predicted using a paragraph vector in paragraph with The word that machine samples；

(2) score feature extraction: the neural network connected entirely using two layers learns scoring of the user to article；With text feature Unlike learning model, this method can directly obtain user and the feature vector of article and indicate, rather than directly extracts and comment The feature divided；

(3) consumer articles Fusion Features: the comment text feature acquired according to (1), the comment that every user can be issued are special Sign vector weighted sum obtains user characteristics, the comment feature vector weighted sum that article receives is obtained article characteristics, finally The text of user and scoring Fusion Features are obtained into the fusion feature of user using fusion function, the text of article and scoring is special Sign fusion obtains the fusion feature of article；

(4) based on the optimization of BPR: sampling to obtain the triple with user preference based on social networks, according to bayesian theory Optimization obtains optimum model parameter；

(5) recommend: the feature vector of user and article are input in model to use by the model parameter acquired according to step (4) Recommend article in family.

2. (1) Text character extraction step described in claim 1, Text Pretreatment therein uses d_uvTo indicate user To the comment text of article v, the word that comment text includes is indicated u using w, is learnt by user to the comment of article The feature vector of user and article uses u₁And v₁It indicates, the feature vector of paragraph uses d_uvIt indicates, term vector uses w to come It indicates, the word of all comments is stored in dictionary V；The number of dimensions of these feature vectors is all K.

3. (1) Text character extraction step described in claim 1, word sampling therein is for each paragraph, at random Choose text filed, some words of stochastical sampling sampling, the result as training classifier from the region；It is text filed Size and in this region choose word number manually set.

4. (1) Text character extraction step described in claim 1, every section of comment can be all mapped in optimization therein In the higher-dimension semantic space random to one, then the word for including in paragraph is predicted, is optimized by study, is obtained more Accurate paragraph feature vector indicates；According to bag of words it is assumed that each word w in document d_uvThe probability of middle appearance uses Softmax is calculated:

Wherein w ' expression belongs to whole words of dictionary V, and exp is indicated using e as the exponential function at bottom；It can be in the hope of by this formula The probability that any word occurs in document；During actually there is word probability in maximization, the solution expense of gradient solution It is larger；In order to reduce the expense of calculating, often using the method for negative sampling in calculating process, the basis in the word not occurred One predefined noise profile carrys out sampling section word, carries out approximate calculation as negative sample, rather than uses institute in dictionary Some words；Based on the strategy of negative sampling, then the objective function of PV-DBOW is defined as:

The combination of all word with document is all added by above formula, whereinIt is word w in document d_uvTime of middle appearance Number, functional value is 0 if not occurring；What is represented is sigmoid function, and t is the number of negative sample,What is indicated is in noise profile P_VIn,Expectation.

5. (1) Text character extraction step described in claim 1, the character representation of comment text therein specifically just like Lower feature: according to above-mentioned objective function, the character representation d of available document_uvAnd it is proposed in this paper based on conventional machines Similar in the recommended models of learning method, the feature vector of user and article can be indicated according to the feature vector of comment.But The character representation of user and article is no longer calculated by the average of comment feature vector herein, but passes through subsequent model integrated Optimize to learn.

Wherein D_uIndicate all comment numbers of user u, p '_ukIndicate total probability of the user on topic k, W_uvIndicate user u for The weight of i-th of the comment issued, p_ukIt is its normalized expression；The characterization factor of user u are as follows:

p_u=(p_u1..., p_uK)

User characteristics factor dimension is K；The article characteristics factor can be used similar formula and calculate:

Wherein D_vIndicate all comment numbers that article receives, q '_vkIndicate total probability of the article on topic k, q_vkIt is its normalization Expression, W_uvIndicate article v for the weight of u-th of the comment received；The characterization factor of article are as follows:

q_v=(q_v1..., q_vK)

K is that the number of dimensions of article and user are consistent；Wherein W_uvTo comment on d_uvFor the weight of user u and article i, pass through Weight could make differentiation to the significance level of different comments, to construct reasonable user and article characteristics.

The characteristic extraction step 6. (2) described in claim 1 score, neural network building therein use two layers of full connection Neural network train to obtain final user to article score；The feature vector table of user and article can be directly obtained Show；Define r_uiScoring of the user u to article i is indicated, then for the r that arbitrarily scores_uiThere is user r_uWith corresponding article r_i It is corresponding to it；Then available two layers of neural network prediction formula:

r_ui=φ (U₂·φ(U₁(r_u⊙r_i)+c₁)+c₂)

Wherein ⊙ indicates that element multiplication, φ (x) are ELU activation primitive, U₁、U₂、c₁And c₂For weight and the deviation ginseng for needing to learn Number.

The characteristic extraction step 7. (2) described in claim 1 score, consumer articles characteristic optimization objective function therein are Make objective function minimum that optimal user and object can be obtained for prediction scoring and square subtracted each other that really score, Optimal Parameters Product indicate.

8. (3) user characteristics fusion steps described in claim 1 are constructed by the interactive information between user and article The feature vector of user and article；It is proposed the function f () an of fusion feature, it is assumed that from scoring and text data in study to Character representation be x₁, x₂, then passing through fusion function then available fused feature:

X=f (x₁, x₂)

Wherein x is fused feature；It is merged using simple series system, user and article characteristics can be enhanced Scalability, this has great importance for the model based on multi-source heterogeneous data；The feature so obtained by function f () For

9. the Optimization Steps of (4) based on BPR described in claim 1, triple therein generates the purchase according to user Or the article that the user buys or browsed is defined as i, by user for each user u by browsing record and social networks It is defined as j from not in contact with the article crossed, then the article that the user good friend bought is defined as p；All article collection in system Conjunction is defined as D, then user u purchase and the article set browsed are defined as D_u, the article that user good friend bought is defined as D_p；The article that user preference can most be represented is article D that user has bought first_i；Secondly, according to the similitude of good friend's preference, User be likely to buy its good friend bought but the article D that does not buy of the user_p\D_u；Finally, user most unlikely buys Article be D (D_u∪D_p)；The triple of user and article is constructed as training set, the training set according to social network information T can be represented as:

T:=(u, i, j) | i ∈ (D_u∪D_p), j ∈ D (D_u∪D_p)}

Wherein (u, i, j) is consumer articles triple, represents user u and is greater than article j for the preference of article i；Wherein object Product i belongs to the article that user bought or the direct good friend of user bought, and it is to buy and it is directly good that article j, which belongs to user, The article that friend did not also buy；Thus the direct friend relation based on user constructs consumer articles triple, for subsequent The training of BPR model.

10. the Optimization Steps of (4) based on BPR described in claim 1, model optimization therein according to the definition of front Know that user u is greater than article j to the preference of article i；It indicates user and article characteristics is combined to indicate using function g () Loss function, by g () is defined as sigmoid function herein herein to calculate user to difference preference's degree of different articles, So there is g (u, i, j)=σ (u^Ti-u^Tj)；So the objective function for entirely merging the recommended models of multi-source heterogeneous data is defined Are as follows:

Wherein W is the weight parameter of every kind of model, indicates the weight parameter of every comment of user in learning model all in comment It is not identical, it needs to obtain by study；And in the model that scoring calculates, what is acquired is directly the feature of user and article, It is exactly that weight parameter is set as 1, does not need to be updated by optimization object function；What wherein Θ was represented is needed in model The other parameters to be learnt, Θ={ Θ₁, Θ₂}={ { w, d_uv, { U₁, U₂, c₁, c₂, r_u, r_i}}；λ is the punishment of every kind of model Parameter, their value is all on [0,1] section；It is added to negative sign before the objective function of Rating Model, because of Rating Model Objective function need to minimize, and the objective function of overall model be maximize.

11. (5) recommendation step described in claim 1, personalized recommendation list can pass through user and article Feature vector is multiplied to obtain:

S=u^Tv

Each user does not buy with other or the feature vector of article that does not browse is multiplied and the user can be obtained to the article Preference-score, score is higher to be represented user and is more possible to buy or browse the article；The score descending arrangement of all items is taken Top n can acquire the Tbp-N recommendation list of user.