CN101968798A - Community recommendation method based on on-line soft constraint LDA algorithm - Google Patents

Publication number
CN101968798A
Authority
CN
China
Prior art keywords
community
theme
user
model
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010284218
Other languages
Chinese (zh)
Inventor
俞能海
康雨洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN 201010284218 priority Critical patent/CN101968798A/en
Publication of CN101968798A publication Critical patent/CN101968798A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a community recommendation method based on an online soft-constraint latent Dirichlet allocation (LDA) algorithm, in the field of social networks. It aims to solve the over-fitting and heavy computation, caused by the limits of the observed data, that are common in current community recommendation methods. The invention uses the number of posts a user makes in each community as a soft constraint, computes the latent topic distribution of the communities with the LDA algorithm, handles newly added users with an incremental method, and updates the model in real time, so that it can run online. The method automatically computes the topic distribution of the communities without requiring user features or community features, and finally estimates the communities in which a user is most interested; the online update greatly improves computational efficiency. Compared with currently popular methods, the invention significantly improves accuracy and speed.

Description

Community recommendation method based on an online soft-constraint LDA algorithm
Technical field
The present invention relates to personalized recommendation methods, and in particular to a community recommendation method based on an online soft-constraint LDA algorithm.
Background
In recent years, social networks have emerged in large numbers. These sites give users tools for creating communities, so that like-minded users can gather and share information that interests them. As such social network services grow rapidly, users facing a massive number of communities on all kinds of topics may be at a loss: how can they efficiently select the communities that interest them? Community recommendation has therefore gradually become an important technique.
There are currently two common kinds of personalized recommendation method: content-based recommendation and collaborative recommendation. A content-based method first trains a preference model for each user from that user's comments or votes on objects, and then uses this preference model to recommend the new objects the user is likely to find most interesting. Collaborative recommendation rests on the assumption that similar users have similar preferences. When a collaborative method recommends objects to a user, it only needs to consult the preferences of users similar to that user. It therefore does not need the content information of the objects and can make recommendations even when no description of the objects is available.
Among existing community recommendation algorithms, there are two well-known methods based on collaborative recommendation: the ARM method and the binary LDA method. The ARM method computes the relations between communities from the number of users that different communities share. The binary LDA method computes the latent topics among communities from the community-user co-occurrence matrix. Both methods are prone to over-fitting caused by the limits of the observed data, and both are computationally expensive. Moreover, both ignore the strength of the relation between a user and a community, and neither can handle newly added users.
Summary of the invention
The object of the invention is to solve the over-fitting and heavy computation that existing community recommendation methods readily encounter because of the limits of the observed data.
To achieve this object, the invention provides a community recommendation method based on an online soft-constraint LDA algorithm, comprising three major steps: calculating the topic distribution, calculating the best candidate communities, and online updating.
The step of calculating the topic distribution is:
Step a: for each individual user, crawl the user's posting information in each community and count the number of posts in each, as the measure of the relation between the user and the community. The number of posts made by the i-th user $U_i$ in the j-th community $C_{i,j}$ in which that user participates is taken as the strength of the relation between $U_i$ and $C_{i,j}$, denoted $R_{i,j}$;
Step b: treat each user as a document and each community the user participates in as a word in that document, so that $R_{i,j}$ is the number of occurrences of community word $C_{i,j}$ in user document $U_i$. Build the user-topic distribution and topic-community distribution model with the LDA algorithm, and solve for the model parameters by Gibbs sampling. The detailed solving procedure is:
First, randomly assign a topic set to every community word that appears in the user documents; for example, community word $C_{i,j}$ is assigned the topic set $\{t_{i,j,k}\}$, $k = 1, \dots, R_{i,j}$.
Then update all topics with the following iterative formula until the model parameters converge:
$$P\left(t_{i,j,k} = t \mid T_{-(i,j,k)}, c\right) \propto \frac{n_{-(i,j,k),t}^{(c_{i,j})} + \beta}{n_{-(i,j,k),t}^{(\cdot)} + N_C \beta} \cdot \frac{n_{-(i,j,k),t}^{(u_i)} + \alpha}{n_{-(i,j,k),\cdot}^{(u_i)} + N_T \alpha}$$
where $T_{-(i,j,k)}$ denotes all remaining topic assignments after the current topic $t_{i,j,k}$ is removed, $n_{-(i,j,k),t}^{(c_{i,j})}$ denotes the total number of times community $C_{i,j}$ is assigned to topic $t$ (excluding the current assignment), $N_C$ and $N_T$ are the numbers of communities and topics, and $\alpha$ and $\beta$ are parameters set to empirical values;
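The iterative formula above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `topic_conditional` and the array layout (communities by topics, users by topics) are hypothetical choices, and the counts passed in are assumed to already exclude the assignment being updated.

```python
import numpy as np

def topic_conditional(n_ct, n_ut, c, u, alpha, beta):
    """Normalized conditional distribution over topics for one assignment.

    n_ct: (N_C, N_T) community-topic counts, current assignment removed.
    n_ut: (N_U, N_T) user-topic counts, current assignment removed.
    c, u: index of the community word and of the user document.
    """
    n_communities, n_topics = n_ct.shape
    # First factor: how strongly community c is tied to each topic.
    community_term = (n_ct[c] + beta) / (n_ct.sum(axis=0) + n_communities * beta)
    # Second factor: how strongly user u is tied to each topic.
    user_term = (n_ut[u] + alpha) / (n_ut[u].sum() + n_topics * alpha)
    p = community_term * user_term
    return p / p.sum()

# Hypothetical counts: 2 communities, 2 topics, 1 user.
n_ct = np.array([[3, 0], [1, 2]])
n_ut = np.array([[2, 1]])
p = topic_conditional(n_ct, n_ut, c=0, u=0, alpha=0.5, beta=0.1)
```

With these counts, topic 0 receives most of the probability mass because both the community and the user are already assigned to it more often.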
The step of calculating the best candidate communities is:
Step c: from the solved model, calculate the topic-community distribution $\phi$ and the user-topic distribution $\theta$ using the following formulas:
$$\hat{\phi}_t^{(c)} = \frac{n_t^{(c)} + \beta}{n_t^{(\cdot)} + N_C \beta}, \qquad \hat{\theta}_t^{(u)} = \frac{n_t^{(u)} + \alpha}{n_\cdot^{(u)} + N_T \alpha};$$
Step d: score each community to find the communities in which the user is most interested; the scoring criterion used to rank the communities is:
$$\xi = \sum_{t=1}^{N_T} \hat{\phi}_t^{(c)} \hat{\theta}_t^{(u)};$$
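Steps c and d can be sketched as follows. This is a minimal illustration, assuming the topic counts are held in two NumPy arrays; the function names, array layout, and sample counts are hypothetical and not taken from the patent.

```python
import numpy as np

def estimate_distributions(n_ct, n_ut, alpha, beta):
    """Estimate phi (communities x topics) and theta (users x topics).

    n_ct: (N_C, N_T) number of times each community is assigned to each topic.
    n_ut: (N_U, N_T) number of times each user is assigned to each topic.
    """
    n_communities, n_topics = n_ct.shape
    phi = (n_ct + beta) / (n_ct.sum(axis=0, keepdims=True) + n_communities * beta)
    theta = (n_ut + alpha) / (n_ut.sum(axis=1, keepdims=True) + n_topics * alpha)
    return phi, theta

def rank_communities(phi, theta, user):
    """Score every community for one user and return indices, best first."""
    scores = phi @ theta[user]          # xi = sum over topics of phi * theta
    return np.argsort(-scores, kind="stable"), scores

# Hypothetical counts: 3 communities, 2 topics, 2 users.
n_ct = np.array([[4, 0], [0, 3], [1, 1]])
n_ut = np.array([[5, 0], [1, 3]])
phi, theta = estimate_distributions(n_ct, n_ut, alpha=0.5, beta=0.1)
order, scores = rank_communities(phi, theta, user=0)
```

User 0 is assigned almost entirely to topic 0, so the community dominated by topic 0 is ranked first for that user.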
The online updating step is:
Step e: keeping the already trained model unchanged, train the model of newly added users separately, specifically as follows:
Keep the trained model unchanged and randomly assign topic sets only to the community words that appear in the newly added user documents; then iteratively update the topic assignments in the new user documents with the iterative formula. The iterative formula uses the already trained model, and only a small number of iterations is needed, which significantly improves efficiency;
Step f: merge the two parts of the model into a whole as the new model, and score each community again.
The beneficial effect of the invention is that, by using the number of posts a user makes in each community as a soft constraint and applying the LDA algorithm, the latent topic distribution of the communities can be computed automatically without user features or community features, and the communities in which a user is most interested can finally be estimated. An incremental method handles newly added users and updates the model in real time, so that the method runs online and computational efficiency is greatly improved.
To verify the effectiveness of the method, MySpace was used as the data set: information was collected on 409,093 users and the 10,814 communities in which they participate. For each user, one community was chosen at random from all the communities in which the user participates to form the test set, and all remaining data served as the training set. Both the binary LDA method and the method of the invention were then used to recommend communities to the users, and the rank of each test-set community in the recommendation results was examined: the higher the rank, the better the result. The experimental results show that the invention significantly improves the accuracy and speed of community recommendation, as shown in Fig. 1.
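The evaluation protocol described above can be sketched as follows. This is a hedged illustration only: the helper names `split_one_out` and `held_out_rank`, the seed, the handling of users with a single community, and the sample data are all assumptions, since the source does not specify these details.

```python
import random

def split_one_out(relations, seed=0):
    """Hold out one randomly chosen community per user.

    relations: {user: {community: post_count}}.
    Returns (train, test), where test maps each user to the held-out community.
    Assumption: users with a single community stay entirely in training.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for user, comms in relations.items():
        if len(comms) < 2:
            train[user] = dict(comms)
            continue
        held = rng.choice(sorted(comms))
        test[user] = held
        train[user] = {c: r for c, r in comms.items() if c != held}
    return train, test

def held_out_rank(scores, held_out):
    """1-based rank of the held-out community; rank 1 is the best result."""
    target = scores[held_out]
    return 1 + sum(1 for s in scores.values() if s > target)

# Hypothetical data and scores for illustration.
relations = {"u1": {"a": 2, "b": 1, "c": 4}, "u2": {"a": 1}}
train, test = split_one_out(relations)
example_scores = {"a": 0.5, "b": 0.3, "c": 0.2}
rank_of_b = held_out_rank(example_scores, "b")
```

Averaging `held_out_rank` over all test users gives one number per method, which is one possible way to compare two recommenders under this protocol.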
Description of drawings
Fig. 1 compares the results of the S-LDA method of the invention with those of the binary B-LDA method.
Fig. 2 is a schematic diagram of a system that uses the invention to recommend communities to users.
Fig. 3 is a flowchart of calculating the topic distribution.
Embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
Fig. 2 is a schematic diagram of a system that uses the invention to recommend communities to users. The system comprises a foreground crawler and a background computation unit. In the foreground, the crawler obtains from the network all the information the system needs to process. In the background computation unit, the method of the invention analyzes and computes on the data obtained by the foreground.
Before the computation process of the invention is described in detail, some remarks on this example are in order. All the data involved in this example come from MySpace, a well-known social networking website. The design of the crawler that collects these data and the method of managing them are outside the scope of the invention.
Given the user-community relation matrix thus obtained, the goal of the method is to compute the latent topic distribution of the communities, use this distribution to score and rank the communities, and finally recommend the highest-scoring communities to the user.
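The user-community relation described above can be built from raw post records along the following lines. This is a minimal sketch, assuming the crawled data arrive as one (user, community) pair per post; the function name `build_relation_counts` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

def build_relation_counts(post_records):
    """Count posts per (user, community) pair.

    post_records: iterable of (user_id, community_id) tuples, one per post.
    Returns {user_id: {community_id: R_ij}}, where R_ij is the post count
    used as the strength of the relation between user i and community j.
    """
    counts = defaultdict(Counter)
    for user_id, community_id in post_records:
        counts[user_id][community_id] += 1
    return {user: dict(comms) for user, comms in counts.items()}

# Hypothetical sample data: each tuple is one post.
posts = [("u1", "cooking"), ("u1", "cooking"), ("u1", "hiking"), ("u2", "hiking")]
relations = build_relation_counts(posts)
```

Keeping the counts rather than a 0/1 membership flag is what distinguishes the soft constraint from the binary co-occurrence matrix.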
The invention requires the number of latent topics to be set in advance. When the latent topic distribution is computed, the choice of the number of topics affects the accuracy of the final recommendation results; in practice it is adjusted within a certain range according to the actual situation.
The computation process of the invention is described in detail below.
As shown in Fig. 2, the invention comprises three major steps: calculating the topic distribution, calculating the best candidate communities, and online updating.
The process of calculating the topic distribution is described in detail first.
As shown in Fig. 3, the flow of calculating the topic distribution is:
Step 101: initialization. Randomly assign a topic set to every community word that appears in the user documents; for example, community word $C_{i,j}$ is assigned the topic set $\{t_{i,j,k}\}$, $k = 1, \dots, R_{i,j}$. The size of the topic set is $R_{i,j}$, which represents the strength of the relation between user $u_i$ and community $c_{i,j}$. Initialize the count arrays N1, N2, N3, and N4, which store the counts used in the iterative formula;
Step 102: set the initial value $i = 1$, indicating that the first user document is about to be processed;
Step 103: set the initial value $j = 1$, indicating that the first community word of user $u_i$ is about to be processed;
Step 104: set the initial value $k = 1$, indicating that the first topic in the topic set of community $c_{i,j}$ is about to be processed;
Step 105: for the current topic, decrement the values of N1[j], N2[i, k], and N3[k] by 1;
Step 106: using the iterative formula, select the topic $t$ with the highest probability as the new topic, replacing the original topic;
Step 107: for the new topic, increment the values of N1[j], N2[i, k], and N3[k] by 1;
Step 108: if $k < R_{i,j}$, set $k = k + 1$ and go to Step 105; otherwise continue;
Step 109: if $c_{i,j}$ is not the last community word of user $u_i$, set $j = j + 1$ and go to Step 104; otherwise continue;
Step 110: if $u_i$ is not the last user, set $i = i + 1$ and go to Step 103; otherwise finish.
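The flow of Steps 101 to 110 can be sketched as one pass over all topic assignments. This is only a sketch under stated assumptions: it does not reproduce the arrays N1 to N4 (their exact contents are not recoverable here) and instead keeps its own hypothetical count arrays `n_ct`, `n_ut`, and `n_t`. The `sample` flag is also an assumption: `sample=True` draws the new topic from the conditional distribution, as Gibbs sampling usually does, while `sample=False` takes the most probable topic, as Step 106 is worded.

```python
import numpy as np

def init_state(R, n_topics, rng):
    """Randomly assign R[i, j] topics to each (user, community) pair."""
    n_users, n_comms = R.shape
    assign = {}                                   # (i, j) -> array of topics
    n_ct = np.zeros((n_comms, n_topics), dtype=np.int64)
    n_ut = np.zeros((n_users, n_topics), dtype=np.int64)
    for i, j in zip(*np.nonzero(R)):
        topics = rng.integers(n_topics, size=int(R[i, j]))
        assign[(int(i), int(j))] = topics
        np.add.at(n_ct, (j, topics), 1)
        np.add.at(n_ut, (i, topics), 1)
    return assign, n_ct, n_ut, n_ct.sum(axis=0)

def sweep(assign, n_ct, n_ut, n_t, alpha, beta, rng, sample=True):
    """One pass over users (i), community words (j), and topic slots (k)."""
    n_comms, n_topics = n_ct.shape
    for (i, j), topics in assign.items():
        for k in range(len(topics)):
            old = topics[k]
            # Remove the current assignment from the counts.
            n_ct[j, old] -= 1
            n_ut[i, old] -= 1
            n_t[old] -= 1
            # The user denominator is constant in t, so it is omitted.
            p = (n_ct[j] + beta) / (n_t + n_comms * beta) * (n_ut[i] + alpha)
            p = p / p.sum()
            new = rng.choice(n_topics, p=p) if sample else int(np.argmax(p))
            # Record the new assignment.
            topics[k] = new
            n_ct[j, new] += 1
            n_ut[i, new] += 1
            n_t[new] += 1

# Hypothetical post-count matrix: 2 users, 3 communities.
rng = np.random.default_rng(0)
R = np.array([[3, 0, 1], [0, 2, 2]])
assign, n_ct, n_ut, n_t = init_state(R, n_topics=2, rng=rng)
for _ in range(20):
    sweep(assign, n_ct, n_ut, n_t, alpha=0.5, beta=0.1, rng=rng)
```

Because every decrement is paired with an increment, the counts always add up to the post counts in `R`, which is a convenient sanity check when experimenting with the sketch.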
The process of calculating the best candidate communities is described in detail below.
Step 201: from the solved model, calculate the topic-community distribution $\phi$ and the user-topic distribution $\theta$ using the following formulas:
$$\hat{\phi}_t^{(c)} = \frac{n_t^{(c)} + \beta}{n_t^{(\cdot)} + N_C \beta}, \qquad \hat{\theta}_t^{(u)} = \frac{n_t^{(u)} + \alpha}{n_\cdot^{(u)} + N_T \alpha};$$
Step 202: score each community to find the communities in which the user is most interested; the scoring criterion used to rank the communities is:
$$\xi = \sum_{t=1}^{N_T} \hat{\phi}_t^{(c)} \hat{\theta}_t^{(u)};$$
The online updating process is described in detail below.
Step 301: for newly added user documents, keep the trained model unchanged and assign topic sets only to the community words that appear in the newly added user documents;
Step 302: using the iterative formula, update only the newly assigned topic sets; the iterative formula can use the already trained model, and only a small number of iterations is needed;
Step 303: merge the two parts of the model into a whole as the new model, then go to Step 201.
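Steps 301 to 303 can be sketched as a fold-in procedure for a single new user. This is an illustration under stated assumptions rather than the patented procedure itself: the function name `fold_in_user`, the default of five iterations, and the choice to compute the community factor from the trained counts only are all hypothetical.

```python
import numpy as np

def fold_in_user(r_new, n_ct, alpha, beta, n_iters=5, rng=None):
    """Infer topic counts for one new user while the trained counts stay fixed.

    r_new: length-N_C array of the new user's post counts per community.
    n_ct:  (N_C, N_T) trained community-topic counts (never modified here).
    Returns (user_t, new_ct): the user's topic counts and the user's own
    community-topic counts, ready to be merged into the trained model.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_comms, n_topics = n_ct.shape
    n_t = n_ct.sum(axis=0)
    new_ct = np.zeros_like(n_ct)
    user_t = np.zeros(n_topics, dtype=np.int64)
    assign = {}
    for j in np.flatnonzero(r_new):               # Step 301: random topic sets
        topics = rng.integers(n_topics, size=int(r_new[j]))
        assign[int(j)] = topics
        np.add.at(new_ct, (j, topics), 1)
        np.add.at(user_t, topics, 1)
    for _ in range(n_iters):                      # Step 302: a few iterations
        for j, topics in assign.items():
            for k in range(len(topics)):
                old = topics[k]
                new_ct[j, old] -= 1
                user_t[old] -= 1
                # Community factor from the trained model, user factor from the new user.
                p = (n_ct[j] + beta) / (n_t + n_comms * beta) * (user_t + alpha)
                p = p / p.sum()
                new = rng.choice(n_topics, p=p)
                topics[k] = new
                new_ct[j, new] += 1
                user_t[new] += 1
    return user_t, new_ct

# Hypothetical trained counts (3 communities, 2 topics) and one new user.
trained = np.array([[4, 0], [0, 3], [1, 1]])
r_new = np.array([2, 0, 1])
user_t, new_ct = fold_in_user(r_new, trained, alpha=0.5, beta=0.1,
                              rng=np.random.default_rng(0))
merged = trained + new_ct                         # Step 303: merge the two parts
```

Only the new user's assignments are resampled, so the cost of an update depends on that user's post count rather than on the size of the whole data set.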
The above description of the invention is illustrative and not restrictive. Those skilled in the art will understand that many modifications, variations, or equivalents may be made within the spirit and scope defined by the claims, and all of them fall within the scope of protection of the invention.

Claims (7)

1. A community recommendation method based on an online soft-constraint LDA algorithm, characterized in that it comprises three major steps: calculating the topic distribution, calculating the best candidate communities, and online updating:
The step of calculating the topic distribution is:
Step a: for each individual user, crawl the user's posting information in each community and count the number of posts in each, as the measure of the strength of the relation between the user and the community;
Step b: build the user-topic distribution and topic-community distribution model with the LDA algorithm, and solve for the model parameters by Gibbs sampling;
The step of calculating the best candidate communities is:
Step c: from the solved model, calculate the topic-community distribution $\phi$ and the user-topic distribution $\theta$;
Step d: score each community to find the communities in which the user is most interested;
The online updating step is:
Step e: keeping the already trained model unchanged, train the model of newly added users separately;
Step f: merge the two parts of the model into a whole as the new model, and score each community again.
2. The step of calculating the topic distribution according to claim 1, characterized in that, in step a, the number of posts a user makes in a community is selected as the measure of the relation between the user and the community, and the number of posts made by the i-th user $U_i$ in the j-th community $C_{i,j}$ in which that user participates is taken as the strength of the relation between $U_i$ and $C_{i,j}$, denoted $R_{i,j}$.
3. The step of calculating the topic distribution according to claim 1, characterized in that, in step b, each user is treated as a document and each community the user participates in as a word in that document, so that $R_{i,j}$ is the number of occurrences of community word $C_{i,j}$ in user document $U_i$.
4. The step of calculating the topic distribution according to claim 1, characterized in that, in step b, the process of solving for the model parameters is specifically:
First, randomly assign a topic set to every community word that appears in the user documents; for example, community word $C_{i,j}$ is assigned the topic set $\{t_{i,j,k}\}$, $k = 1, \dots, R_{i,j}$.
Then update all topics with the following iterative formula until the model parameters converge:
$$P\left(t_{i,j,k} = t \mid T_{-(i,j,k)}, c\right) \propto \frac{n_{-(i,j,k),t}^{(c_{i,j})} + \beta}{n_{-(i,j,k),t}^{(\cdot)} + N_C \beta} \cdot \frac{n_{-(i,j,k),t}^{(u_i)} + \alpha}{n_{-(i,j,k),\cdot}^{(u_i)} + N_T \alpha}$$
where $T_{-(i,j,k)}$ denotes all remaining topic assignments after the current topic $t_{i,j,k}$ is removed, $n_{-(i,j,k),t}^{(c_{i,j})}$ denotes the total number of times community $C_{i,j}$ is assigned to topic $t$, and $\alpha$ and $\beta$ are parameters set to empirical values.
5. The step of calculating the best candidate communities according to claim 1, characterized in that, in step c, the topic-community distribution $\phi$ and the user-topic distribution $\theta$ are calculated with the following formulas:
$$\hat{\phi}_t^{(c)} = \frac{n_t^{(c)} + \beta}{n_t^{(\cdot)} + N_C \beta}, \qquad \hat{\theta}_t^{(u)} = \frac{n_t^{(u)} + \alpha}{n_\cdot^{(u)} + N_T \alpha}.$$
6. The step of calculating the best candidate communities according to claim 1, characterized in that, in step d, the scoring criterion used to rank the communities is:
$$\xi = \sum_{t=1}^{N_T} \hat{\phi}_t^{(c)} \hat{\theta}_t^{(u)}.$$
7. The online updating step according to claim 1, characterized in that, in step e, the method of training the model of newly added users separately is specifically:
Keep the trained model unchanged and randomly assign topic sets only to the community words that appear in the newly added user documents; then iteratively update the topic assignments in the new user documents with the iterative formula. The iterative formula uses the already trained model, and only a small number of iterations is needed, which significantly improves efficiency.
CN 201010284218 2010-09-10 2010-09-10 Community recommendation method based on on-line soft constraint LDA algorithm Pending CN101968798A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010284218 CN101968798A (en) 2010-09-10 2010-09-10 Community recommendation method based on on-line soft constraint LDA algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010284218 CN101968798A (en) 2010-09-10 2010-09-10 Community recommendation method based on on-line soft constraint LDA algorithm

Publications (1)

Publication Number Publication Date
CN101968798A true CN101968798A (en) 2011-02-09

Family

ID=43547955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010284218 Pending CN101968798A (en) 2010-09-10 2010-09-10 Community recommendation method based on on-line soft constraint LDA algorithm

Country Status (1)

Country Link
CN (1) CN101968798A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915335A (en) * 2012-09-17 2013-02-06 北京大学 Information associating method based on user operation record and resource content
CN102929894A (en) * 2011-08-12 2013-02-13 中国人民解放军总参谋部第五十七研究所 Online clustering visualization method of text
CN103365978A (en) * 2013-07-01 2013-10-23 浙江大学 Traditional Chinese medicine data mining method based on LDA (Latent Dirichlet Allocation) topic model
CN104572623A (en) * 2015-01-12 2015-04-29 上海交通大学 Efficient data summary and analysis method of online LDA model
CN105608116A (en) * 2015-12-14 2016-05-25 成都陌云科技有限公司 Interaction history data based personalized recommendation method
CN105989077A (en) * 2015-02-09 2016-10-05 北京字节跳动科技有限公司 Recommendation-based interest community user guide method
CN106919997A (en) * 2015-12-28 2017-07-04 航天信息股份有限公司 A kind of customer consumption Forecasting Methodology of the ecommerce based on LDA
CN107391637A (en) * 2017-07-10 2017-11-24 江苏省现代企业信息化应用支撑软件工程技术研发中心 For possessing the group recommending method of geographical social information
CN107562836A (en) * 2017-06-07 2018-01-09 北京航空航天大学 Method is recommended based on the answerer of topic model and machine learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080319974A1 (en) * 2007-06-21 2008-12-25 Microsoft Corporation Mining geographic knowledge using a location aware topic model
CN101706812A (en) * 2009-11-24 2010-05-12 清华大学 Method and device for searching documents
CN101710333A (en) * 2009-11-26 2010-05-19 西北工业大学 Network text segmenting method based on genetic algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《PNAS》 20040306 Griffiths T L,Steyvers M. Finding scientific topics 5228-5235 第101卷, 第1期 2 *
《自动化学报》 20091231 石晶,范猛,李万龙 基于LDA模型的主题分析 1586-1592 第35卷, 第12期 2 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929894A (en) * 2011-08-12 2013-02-13 中国人民解放军总参谋部第五十七研究所 Online clustering visualization method of text
CN102915335A (en) * 2012-09-17 2013-02-06 北京大学 Information associating method based on user operation record and resource content
CN102915335B (en) * 2012-09-17 2016-04-27 北京大学 Based on the information correlation method of user operation records and resource content
CN103365978A (en) * 2013-07-01 2013-10-23 浙江大学 Traditional Chinese medicine data mining method based on LDA (Latent Dirichlet Allocation) topic model
CN103365978B (en) * 2013-07-01 2017-03-29 浙江大学 TCM data method for digging based on LDA topic models
CN104572623A (en) * 2015-01-12 2015-04-29 上海交通大学 Efficient data summary and analysis method of online LDA model
CN105989077A (en) * 2015-02-09 2016-10-05 北京字节跳动科技有限公司 Recommendation-based interest community user guide method
CN105989077B (en) * 2015-02-09 2019-05-07 北京字节跳动科技有限公司 A kind of interest community user's bootstrap technique based on recommendation
CN105608116A (en) * 2015-12-14 2016-05-25 成都陌云科技有限公司 Interaction history data based personalized recommendation method
CN105608116B (en) * 2015-12-14 2019-03-15 成都陌云科技有限公司 Personalized recommendation method based on interactive history data
CN106919997A (en) * 2015-12-28 2017-07-04 航天信息股份有限公司 A kind of customer consumption Forecasting Methodology of the ecommerce based on LDA
CN107562836A (en) * 2017-06-07 2018-01-09 北京航空航天大学 Method is recommended based on the answerer of topic model and machine learning
CN107391637A (en) * 2017-07-10 2017-11-24 江苏省现代企业信息化应用支撑软件工程技术研发中心 For possessing the group recommending method of geographical social information

Similar Documents

Publication Publication Date Title
CN101968798A (en) Community recommendation method based on on-line soft constraint LDA algorithm
CN105260390B (en) A kind of item recommendation method based on joint probability matrix decomposition towards group
Olczyk A systematic retrieval of international competitiveness literature: a bibliometric study
CN106980692A (en) A kind of influence power computational methods based on microblogging particular event
CN108154430A (en) A kind of credit scoring construction method based on machine learning and big data technology
CN107577688A (en) Original article influence power analysis system based on media information collection
CN103731738A (en) Video recommendation method and device based on user group behavioral analysis
CN108334575A (en) A kind of recommendation results sequence modification method and device, electronic equipment
CN106570525A (en) Method for evaluating online commodity assessment quality based on Bayesian network
CN103699626A (en) Method and system for analysing individual emotion tendency of microblog user
CN102439597A (en) Parameter deducing method, computing device and system based on potential dirichlet model
CN108830416A (en) Ad click rate prediction framework and algorithm based on user behavior
CN107562836A (en) Method is recommended based on the answerer of topic model and machine learning
CN104636426A (en) Multi-factor comprehensive quantitative analysis and sorting method for academic influences of scientific research institutions
Mediero et al. Probabilistic calibration of a distributed hydrological model for flood forecasting
CN107292785A (en) One kind is set a question method and system
CN112612942B (en) Social big data-based fund recommendation system and method
CN105302880A (en) Content correlation recommendation method and apparatus
Papacharalampous et al. A review of machine learning concepts and methods for addressing challenges in probabilistic hydrological post-processing and forecasting
Zhai et al. Rice irrigation schedule optimization based on the AquaCrop model: study of the Longtouqiao irrigation district
CN102664744A (en) Group-sending recommendation method in network message communication
CN116911962A (en) Article selecting device and method based on data model
CN101739418A (en) Method for sequencing multi-index comprehensive weight audio-video album
Moerkerken et al. Which farmers adopt solar energy? A regression analysis to explain adoption decisions over time
CN112269932A (en) Big data-based small and medium enterprise resource integration processing system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110209