CN101968798A - Community recommendation method based on on-line soft constraint LDA algorithm - Google Patents
- Publication number
- CN101968798A
- Authority
- CN
- China
- Prior art keywords
- community
- theme
- user
- model
- calculating
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention relates to a community recommendation method based on an online soft-constraint latent Dirichlet allocation (LDA) algorithm, and belongs to the field of social networks. It aims to solve two problems common in current community recommendation methods: over-fitting caused by limited observed data, and high computational cost. The invention uses the number of posts a user makes in each community as a soft constraint, computes the latent topic distribution of the communities with the LDA algorithm, processes newly added users incrementally, and updates the model in real time, so that the method can run online. The method automatically computes the topic distribution of communities even when user features and community features are unavailable, and finally infers the communities the user is most interested in; the online processing greatly improves computational efficiency. Compared with currently popular methods, the invention noticeably improves both accuracy and speed.
Description
Technical field
The present invention relates to personalized recommendation methods, and in particular to a community recommendation method based on an online soft-constraint LDA algorithm.
Background technology
In recent years, social network sites have emerged in large numbers. These sites give users tools for creating communities, so that users with common interests can gather together and share information they care about. As these social network services grow rapidly, users confronted with a huge number of communities on every kind of topic easily become lost: how can they efficiently select the communities that interest them? Community recommendation has therefore gradually become a very important technique.
There are currently two common kinds of personalized recommendation methods: content-based recommendation and collaborative recommendation. A content-based method first uses each user's comments on, or votes for, objects to train a preference model for that user, and then uses this preference model to recommend the new objects the user is most likely to be interested in. Collaborative recommendation rests on the assumption that similar users have similar preferences: to recommend objects to a user, one only needs to consult the preferences of users similar to that user. This method therefore does not need the content information of the objects and can make recommendations even when descriptions of the objects are missing.
Among existing community recommendation algorithms, there are two well-known methods based on collaborative recommendation: the ARM method and the binary LDA method. The ARM method computes the relationships between communities from the number of users that different communities share. The binary LDA method computes the latent topics of communities from the community-user co-occurrence matrix. Both methods are prone to over-fitting caused by limited observed data and to very large computational cost. Moreover, both ignore the strength of the relationship between a user and a community, and neither can handle newly added users.
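The background describes the ARM method only as relating communities through the number of users they share. A minimal Python sketch of that overlap count follows; the function name and input format are hypothetical, and how the ARM method turns such counts into recommendations is not described here and is not reproduced. Membership is treated as binary, which is exactly the user-community strength information the two prior methods are said to ignore.

```python
from collections import defaultdict
from itertools import combinations


def community_overlap(memberships):
    """Count the users shared by each pair of communities.

    memberships: dict mapping a user id to the set of communities the user
    joined (binary membership: post counts are ignored).
    Returns a dict mapping (community_a, community_b), with
    community_a < community_b, to the number of shared users.
    """
    overlap = defaultdict(int)
    for communities in memberships.values():
        # Each pair of communities joined by the same user gains one shared user.
        for pair in combinations(sorted(communities), 2):
            overlap[pair] += 1
    return dict(overlap)


memberships = {"u1": {"c1", "c2"}, "u2": {"c1", "c2", "c3"}, "u3": {"c3"}}
print(community_overlap(memberships))
# {('c1', 'c2'): 2, ('c1', 'c3'): 1, ('c2', 'c3'): 1}
```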
Summary of the invention
The objective of the invention is to solve the problems that existing community recommendation methods easily run into: over-fitting caused by limited observed data, and very large computational cost.
To achieve this objective, the invention provides a community recommendation method based on an online soft-constraint LDA algorithm, comprising three main steps: computing the topic distribution, computing the best candidate communities, and online updating.
The step of computing the topic distribution is as follows:
Step a: for each individual user, crawl the user's posts in each community he belongs to and count the number of posts in each, as the measure of the user-community relationship. The number of posts made by the i-th user $U_i$ in the j-th community $C_{i,j}$ that he participates in is taken as the strength of the relationship between $U_i$ and $C_{i,j}$, denoted $R_{i,j}$;
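Step a amounts to counting posts per (user, community) pair. The sketch below assumes, hypothetically, that the crawler yields one (user_id, community_id) pair per crawled post; the actual crawler output format is outside the scope of the description.

```python
from collections import Counter


def relation_strength(posts):
    """Return R with R[(user, community)] = number of posts by user in community.

    posts: iterable of (user_id, community_id) pairs, one pair per crawled post.
    Pairs that never occur have strength 0 (Counter returns 0 for missing keys).
    """
    return Counter(posts)


posts = [("u1", "c1"), ("u1", "c1"), ("u1", "c2"), ("u2", "c2")]
R = relation_strength(posts)
print(R[("u1", "c1")], R[("u1", "c2")], R[("u2", "c1")])  # 2 1 0
```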
Step b: regard each user as a document and each community the user participates in as a word in that document, so that $R_{i,j}$ is the number of occurrences of community word $C_{i,j}$ in user document $U_i$. Use the LDA algorithm to build a model consisting of the user-topic distributions and the topic-community distributions, and solve the model parameters by Gibbs sampling, as follows:
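Under the document analogy of step b, each user's document repeats community word $C_{i,j}$ exactly $R_{i,j}$ times. The sketch contrasts this soft-constraint input with the binary input of the binary LDA baseline, where every joined community appears once; the function name and dictionary layout are assumptions for illustration.

```python
def user_documents(R, binary=False):
    """Turn relation strengths R[(user, community)] into LDA 'documents'.

    Each user becomes a document in which a community appears R[(user, c)]
    times (soft constraint). With binary=True every community the user
    joined appears once, as in the binary LDA baseline.
    """
    docs = {}
    for (user, community), count in sorted(R.items()):
        if count <= 0:
            continue  # the user never posted in this community
        repeats = 1 if binary else count
        docs.setdefault(user, []).extend([community] * repeats)
    return docs


R = {("u1", "c1"): 3, ("u1", "c2"): 1, ("u2", "c2"): 2}
print(user_documents(R))               # {'u1': ['c1', 'c1', 'c1', 'c2'], 'u2': ['c2', 'c2']}
print(user_documents(R, binary=True))  # {'u1': ['c1', 'c2'], 'u2': ['c2']}
```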
First, randomly assign a topic set to every community word appearing in all user documents; for example, community word $C_{i,j}$ is assigned the topic set $\{t_{i,j,1}, \dots, t_{i,j,R_{i,j}}\}$, one topic per occurrence.
Then update all topics with the iterative formula until the model parameters converge:
where $T_{-(i,j,k)}$ denotes all remaining topic assignments after the current topic $t_{i,j,k}$ is removed, the community-topic count gives the total number of times community $C_{i,j}$ has been assigned topic $t$, and $\alpha$ and $\beta$ are parameters set to empirical values;
The step of computing the best candidate communities is as follows:
Step c: using the solved model, compute the topic-community distribution φ and the user-topic distribution with the following formulas:
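A common choice for these estimates, assumed here because the formulas are not spelled out, is the standard LDA point estimate $\varphi_{t,c} = (n^{(c)}_t + \beta)/(n^{(\cdot)}_t + M\beta)$ and $\theta_{u,t} = (n^{(u)}_t + \alpha)/(n^{(u)}_{\cdot} + K\alpha)$. The function name `estimate_phi_theta` and the list-of-lists count layout are hypothetical.

```python
def estimate_phi_theta(n_ct, n_ut, alpha, beta):
    """Point estimates of phi (topic-community) and theta (user-topic).

    n_ct[c][t]: number of times community c is assigned topic t (all users)
    n_ut[u][t]: number of tokens in user u's document assigned topic t
    Returns phi[t][c] and theta[u][t]; each row is a probability distribution.
    """
    num_c = len(n_ct)
    num_t = len(n_ct[0])
    # Total number of tokens assigned to each topic across all communities.
    topic_totals = [sum(n_ct[c][t] for c in range(num_c)) for t in range(num_t)]
    phi = [[(n_ct[c][t] + beta) / (topic_totals[t] + num_c * beta)
            for c in range(num_c)]
           for t in range(num_t)]
    theta = [[(row[t] + alpha) / (sum(row) + num_t * alpha)
              for t in range(num_t)]
             for row in n_ut]
    return phi, theta


phi, theta = estimate_phi_theta([[2, 0], [0, 2]], [[2, 2]], alpha=1.0, beta=1.0)
print(phi, theta)  # [[0.75, 0.25], [0.25, 0.75]] [[0.5, 0.5]]
```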
Step d: score each community to find the communities the user is most interested in; the scoring criterion used to rank communities is:
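The scoring criterion is likewise not shown. A natural choice, assumed here and not confirmed by the text, is the LDA predictive probability $p(c \mid u) = \sum_t \theta_{u,t}\,\varphi_{t,c}$. Skipping communities the user has already joined is also an assumption, exposed as the hypothetical `joined` parameter.

```python
def rank_communities(theta_u, phi, joined=(), top_n=None):
    """Rank communities for one user by score(c) = sum_t theta_u[t] * phi[t][c].

    theta_u: the user's topic distribution (length K)
    phi: phi[t][c], the topic-community distribution
    joined: community indices to skip, e.g. ones the user already joined
    """
    num_c = len(phi[0])
    scores = {
        c: sum(theta_u[t] * phi[t][c] for t in range(len(theta_u)))
        for c in range(num_c)
        if c not in joined
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


phi = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
print(rank_communities([0.9, 0.1], phi))              # [0, 1, 2]
print(rank_communities([0.9, 0.1], phi, joined={0}))  # [1, 2]
print(rank_communities([0.2, 0.8], phi, top_n=1))     # [2]
```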
The online updating step is as follows:
Step e: keeping the already trained model unchanged, train a separate model for newly added users, specifically:
Keep the trained model unchanged, randomly assign a topic set to each community word appearing in the new user documents, and then iteratively update the topic distribution of the new user documents with the iterative formula. Because the trained model is used inside the iterative formula, only a small number of iterations is needed, which significantly improves efficiency;
Step f: merge the two parts into a single model, take it as the new model, and score each community again.
The beneficial effects of the invention are as follows. By using the number of posts each user makes in each community as a soft constraint and applying the LDA algorithm, the method automatically computes the latent topic distribution of the communities without requiring user features or community features, and finally infers the communities the user is most interested in. An incremental method handles newly added users and updates the model in real time, so that the method runs online and computational efficiency is greatly improved.
To verify the effectiveness of our method, we used MySpace as the data set and collected information on 409,093 users and the 10,814 communities they participated in. For each user, one community is chosen at random from all the communities the user participates in to form the test set, and all remaining data form the training set. Both the binary LDA method and our method are then used to recommend communities to the users, and we check how high the held-out test community ranks in the recommendation results: the higher the rank, the better the result. The experimental results, shown in Fig. 1, indicate that the invention significantly improves both the accuracy and the speed of community recommendation.
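The evaluation protocol above (hold out one community per user, then check how high it ranks) can be sketched as follows. The input layout and the handling of users who joined only one community are assumptions, since the text does not describe them; the ranked list itself would come from whichever model is being compared.

```python
import random


def leave_one_out_split(R_by_user, seed=0):
    """Hold out one randomly chosen community per user as the test item.

    R_by_user: dict user -> {community: post count}. Users with a single
    community stay entirely in training (an assumption; the source does not
    say how such users are handled).
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for user, counts in R_by_user.items():
        if len(counts) < 2:
            train[user] = dict(counts)
            continue
        held_out = rng.choice(sorted(counts))
        test[user] = held_out
        train[user] = {c: n for c, n in counts.items() if c != held_out}
    return train, test


def held_out_rank(ranked, held_out):
    """1-based rank of the held-out community; smaller is better."""
    return ranked.index(held_out) + 1


data = {"u1": {"a": 5, "b": 1, "c": 2}, "u2": {"b": 3}}
train, test = leave_one_out_split(data)
print(sorted(test))                         # ['u1']
print(held_out_rank(["b", "a", "c"], "a"))  # 2
```

When ranking for the test, the held-out community must not be excluded as "already joined", otherwise it can never appear in the list.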
Description of drawings
Fig. 1 compares the results of the S-LDA method of the invention with those of the binary B-LDA method.
Fig. 2 is a schematic diagram of a system that uses the invention to recommend communities to users.
Fig. 3 is a flowchart of computing the topic distribution.
Embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Fig. 2 is a schematic diagram of a system that uses the invention to recommend communities to users. The system comprises a front-end crawler and a back-end computing unit. At the front end, the crawler fetches from the network all the information the system needs to process. In the back-end computing unit, the method of the invention analyzes and computes on the data obtained by the front end.
Before the computation is described in detail, some remarks on this example. All data used in this example come from MySpace, a well-known social networking website. The design of the crawler that collects these data and the way the data are managed are outside the scope of the invention.
Given the user-community relationship matrix obtained in this way, the goal of our method is to compute the latent topic distribution of the communities, use it to score and rank the communities, and finally recommend the highest-scoring communities to the user.
The invention requires the number of latent topics to be set in advance. The choice of the number of topics affects the accuracy of the final recommendations, so in practice it is tuned within a certain range according to the application.
The computation of the invention is described in detail below.
As shown in Fig. 2, the invention comprises three main steps: computing the topic distribution, computing the best candidate communities, and online updating.
The computation of the topic distribution is described in detail below.
As shown in Fig. 3, the flow for computing the topic distribution is:
Step 101: initialization. Randomly assign a topic set to every community word appearing in all user documents; for example, community word $c_{i,j}$ is assigned the topic set $\{t_{i,j,1}, \dots, t_{i,j,R_{i,j}}\}$. The size of the topic set is $R_{i,j}$, which represents the strength of the relationship between user $u_i$ and community $c_{i,j}$. Initialize the count arrays N1, N2, N3 and N4, which store the count statistics used by the iterative formula;
Step 102: set i = 1, i.e. start processing the first user document;
Step 103: set j = 1, i.e. start processing the first community word of user $u_i$;
Step 104: set k = 1, i.e. start processing the first topic in the topic set of community $c_{i,j}$;
Step 105: remove the current topic $t_{i,j,k}$ from the counts by decrementing N1[j], N2[i, k] and N3[k] by 1;
Step 106: using the iterative formula, select the topic t with the highest probability as the new topic, replacing the original topic;
Step 107: add the new topic to the counts by incrementing N1[j], N2[i, k] and N3[k] by 1;
Step 108: if k < $R_{i,j}$, set k = k + 1 and go to step 105; otherwise continue;
Step 109: if $c_{i,j}$ is not the last community word of user $u_i$, set j = j + 1 and go to step 104; otherwise continue;
Step 110: if $u_i$ is not the last user, set i = i + 1 and go to step 103; otherwise finish.
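Steps 101-110 describe one sweep of per-token topic reassignment. The sketch below repeats such sweeps a fixed number of times. The count layout of N1-N3 is an assumption (the description names the arrays but not their exact indexing), and the weights use the standard collapsed Gibbs conditional, also assumed. `greedy=True` mirrors step 106 (keep the most probable topic); `greedy=False` samples, as in standard Gibbs sampling.

```python
import random


def gibbs_topics(docs, num_communities, num_topics, alpha, beta,
                 sweeps=50, greedy=False, seed=0):
    """Topic assignment for soft-constraint LDA, following steps 101-110.

    docs[i]: list of community ids (0..num_communities-1) for user i, with
    community j repeated R[i][j] times, so each position in docs[i] plays
    the role of one element t_{i,j,k} of a topic set.
    Count arrays (assumed layout; N4 is not used in this sketch):
      N1[c][t]  times community c is assigned topic t
      N2[i][t]  tokens of user i assigned topic t
      N3[t]     tokens assigned topic t overall
    """
    rng = random.Random(seed)
    N1 = [[0] * num_topics for _ in range(num_communities)]
    N2 = [[0] * num_topics for _ in docs]
    N3 = [0] * num_topics

    def update(i, c, t, delta):
        N1[c][t] += delta
        N2[i][t] += delta
        N3[t] += delta

    # Step 101: a random initial topic for every occurrence of every community.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for i, doc in enumerate(docs):
        for c, t in zip(doc, z[i]):
            update(i, c, t, +1)

    # Steps 102-110 form one sweep; here sweeps repeat a fixed number of times
    # instead of testing convergence explicitly.
    for _ in range(sweeps):
        for i, doc in enumerate(docs):
            for p, c in enumerate(doc):
                update(i, c, z[i][p], -1)  # step 105: remove current topic
                weights = [(N1[c][k] + beta) / (N3[k] + num_communities * beta)
                           * (N2[i][k] + alpha) for k in range(num_topics)]
                if greedy:  # step 106: keep the most probable topic
                    t = max(range(num_topics), key=weights.__getitem__)
                else:       # standard Gibbs sampling
                    t = rng.choices(range(num_topics), weights=weights)[0]
                z[i][p] = t
                update(i, c, t, +1)  # step 107: add the new topic back
    return z, N1, N2, N3


docs = [[0, 0, 1], [1, 1, 0], [2, 2]]
z, N1, N2, N3 = gibbs_topics(docs, num_communities=3, num_topics=2,
                             alpha=0.5, beta=0.1)
print(sum(N3))  # 8
```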
The process of computing the best candidate communities is described in detail below.
Step 201: using the solved model, compute the topic-community distribution φ and the user-topic distribution with the following formulas:
Step 202: score each community to find the communities the user is most interested in; the scoring criterion for ranking communities is:
The online updating process is described in detail below.
Step 301: for each newly added user document, keep the trained model unchanged and assign a topic set to each community word appearing in the new user document;
Step 302: update each newly assigned topic set with the iterative formula; the trained model can be used in the iterative formula, so only a few iterations are needed;
Step 303: merge the two parts into a single model, take it as the new model, and go to step 201.
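Steps 301-303 resemble the usual "fold-in" inference for new documents: the trained counts are read but not changed while only the new users' topics are resampled, and the counts are merged afterwards. The sketch assumes the hypothetical count layout N1[c][t], N2[i][t], N3[t], the standard collapsed Gibbs weights, and that new users only join communities already present in the trained model. It samples topics rather than taking the maximum.

```python
import random


def fold_in_new_users(new_docs, N1, N3, num_topics, alpha, beta,
                      sweeps=10, seed=0):
    """Steps 301-302: topics for new users while the trained model stays fixed.

    new_docs[i]: community ids of new user i, repeated by post count.
    N1[c][t], N3[t]: counts of the trained model, read but never modified.
    Returns topic assignments z_new and per-user topic counts N2_new.
    """
    rng = random.Random(seed)
    num_c = len(N1)
    # Step 301: random initial topics for the new users' community words.
    z_new = [[rng.randrange(num_topics) for _ in doc] for doc in new_docs]
    N2_new = [[0] * num_topics for _ in new_docs]
    for i, topics in enumerate(z_new):
        for t in topics:
            N2_new[i][t] += 1
    # Step 302: a few sweeps using the frozen trained counts in the formula.
    for _ in range(sweeps):
        for i, doc in enumerate(new_docs):
            for p, c in enumerate(doc):
                N2_new[i][z_new[i][p]] -= 1
                weights = [(N1[c][k] + beta) / (N3[k] + num_c * beta)
                           * (N2_new[i][k] + alpha) for k in range(num_topics)]
                t = rng.choices(range(num_topics), weights=weights)[0]
                z_new[i][p] = t
                N2_new[i][t] += 1
    return z_new, N2_new


def merge_models(N1, N2, N3, new_docs, z_new, N2_new):
    """Step 303: add the new users' counts to copies of the trained counts."""
    N1 = [row[:] for row in N1]
    N3 = list(N3)
    for doc, topics in zip(new_docs, z_new):
        for c, t in zip(doc, topics):
            N1[c][t] += 1
            N3[t] += 1
    return N1, N2 + N2_new, N3
```

After merging, scoring restarts from step 201 with the combined counts.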
The above description of the invention is illustrative rather than restrictive. Those skilled in the art will understand that many modifications, variations or equivalents may be made within the spirit and scope defined by the claims, and all of them fall within the scope of protection of the invention.
Claims (7)
1. A community recommendation method based on an online soft-constraint LDA algorithm, characterized by comprising three main steps: computing the topic distribution, computing the best candidate communities, and online updating, wherein:
the step of computing the topic distribution comprises:
Step a: for each individual user, crawl the user's posts in each community and count the number of posts in each, as the measure of the strength of the relationship between the user and the community;
Step b: build a model of user-topic distributions and topic-community distributions with the LDA algorithm, and solve the model parameters by Gibbs sampling;
the step of computing the best candidate communities comprises:
Step c: using the solved model, compute the topic-community distribution φ and the user-topic distribution;
Step d: score each community to find the communities the user is most interested in;
the online updating step comprises:
Step e: keeping the trained model unchanged, train a separate model for newly added users;
Step f: merge the two parts into a single model, take it as the new model, and score each community again.
2. The step of computing the topic distribution according to claim 1, characterized in that, in step a, the number of posts a user makes in a community is chosen as the measure of the user-community relationship: the number of posts made by the i-th user $U_i$ in the j-th community $C_{i,j}$ that he participates in is taken as the strength of the relationship between $U_i$ and $C_{i,j}$, denoted $R_{i,j}$.
3. The step of computing the topic distribution according to claim 1, characterized in that, in step b, each user is regarded as a document and each community the user participates in is regarded as a word in that document, and $R_{i,j}$ is the number of occurrences of community word $C_{i,j}$ in user document $U_i$.
4. The step of computing the topic distribution according to claim 1, characterized in that, in step b, the model parameters are solved as follows:
first, randomly assign a topic set to every community word appearing in all user documents; for example, community word $C_{i,j}$ is assigned the topic set $\{t_{i,j,1}, \dots, t_{i,j,R_{i,j}}\}$;
then update all topics with the iterative formula until the model parameters converge:
6. The step of computing the best candidate communities according to claim 1, characterized in that, in step d, the scoring criterion for ranking communities is:
7. The online updating step according to claim 1, characterized in that, in step e, the model for newly added users is trained separately as follows:
keep the trained model unchanged, randomly assign a topic set to each community word appearing in the new user documents, and then iteratively update the topic distribution of the new user documents with the iterative formula, using the trained model in the formula; only a small number of iterations is needed, which significantly improves efficiency.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010284218 CN101968798A (en) | 2010-09-10 | 2010-09-10 | Community recommendation method based on on-line soft constraint LDA algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101968798A (en) | 2011-02-09 |
Family ID: 43547955
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010284218 (CN101968798A, pending) | 2010-09-10 | 2010-09-10 | Community recommendation method based on on-line soft constraint LDA algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101968798A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102915335A (en) * | 2012-09-17 | 2013-02-06 | Peking University | Information associating method based on user operation record and resource content |
CN102929894A (en) * | 2011-08-12 | 2013-02-13 | 57th Research Institute of the General Staff of the Chinese People's Liberation Army | Online clustering visualization method of text |
CN103365978A (en) * | 2013-07-01 | 2013-10-23 | Zhejiang University | Traditional Chinese medicine data mining method based on LDA (Latent Dirichlet Allocation) topic model |
CN104572623A (en) * | 2015-01-12 | 2015-04-29 | Shanghai Jiao Tong University | Efficient data summary and analysis method of online LDA model |
CN105608116A (en) * | 2015-12-14 | 2016-05-25 | Chengdu Moyun Technology Co., Ltd. | Interaction history data based personalized recommendation method |
CN105989077A (en) * | 2015-02-09 | 2016-10-05 | Beijing ByteDance Technology Co., Ltd. | Recommendation-based interest community user guide method |
CN106919997A (en) * | 2015-12-28 | 2017-07-04 | Aisino Corporation | LDA-based customer consumption prediction method for e-commerce |
CN107391637A (en) * | 2017-07-10 | 2017-11-24 | Jiangsu Modern Enterprise Informatization Application Support Software Engineering Technology Research and Development Center | Group recommendation method with geographic social information |
CN107562836A (en) * | 2017-06-07 | 2018-01-09 | Beihang University | Answerer recommendation method based on topic model and machine learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080319974A1 (en) * | 2007-06-21 | 2008-12-25 | Microsoft Corporation | Mining geographic knowledge using a location aware topic model |
CN101706812A (en) * | 2009-11-24 | 2010-05-12 | Tsinghua University | Method and device for searching documents |
CN101710333A (en) * | 2009-11-26 | 2010-05-19 | Northwestern Polytechnical University | Network text segmenting method based on genetic algorithm |
2010
- 2010-09-10: application CN 201010284218 filed in China (CN101968798A), status: pending
Non-Patent Citations (2)
Title |
---|
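Griffiths T. L., Steyvers M. Finding scientific topics. PNAS, 2004-03-06, vol. 101, no. 1, pp. 5228-5235 |
Shi Jing, Fan Meng, Li Wanlong. Topic analysis based on LDA model (基于LDA模型的主题分析). Acta Automatica Sinica (自动化学报), 2009-12-31, vol. 35, no. 12, pp. 1586-1592 |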
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102929894A (en) * | 2011-08-12 | 2013-02-13 | 57th Research Institute of the General Staff of the Chinese People's Liberation Army | Online clustering visualization method of text |
CN102915335A (en) * | 2012-09-17 | 2013-02-06 | Peking University | Information associating method based on user operation record and resource content |
CN102915335B (en) * | 2012-09-17 | 2016-04-27 | Peking University | Information association method based on user operation records and resource content |
CN103365978A (en) * | 2013-07-01 | 2013-10-23 | Zhejiang University | Traditional Chinese medicine data mining method based on LDA (Latent Dirichlet Allocation) topic model |
CN103365978B (en) * | 2013-07-01 | 2017-03-29 | Zhejiang University | TCM data mining method based on LDA topic models |
CN104572623A (en) * | 2015-01-12 | 2015-04-29 | Shanghai Jiao Tong University | Efficient data summary and analysis method of online LDA model |
CN105989077A (en) * | 2015-02-09 | 2016-10-05 | Beijing ByteDance Technology Co., Ltd. | Recommendation-based interest community user guide method |
CN105989077B (en) * | 2015-02-09 | 2019-05-07 | Beijing ByteDance Technology Co., Ltd. | Recommendation-based user guidance method for interest communities |
CN105608116A (en) * | 2015-12-14 | 2016-05-25 | Chengdu Moyun Technology Co., Ltd. | Interaction history data based personalized recommendation method |
CN105608116B (en) * | 2015-12-14 | 2019-03-15 | Chengdu Moyun Technology Co., Ltd. | Personalized recommendation method based on interactive history data |
CN106919997A (en) * | 2015-12-28 | 2017-07-04 | Aisino Corporation | LDA-based customer consumption prediction method for e-commerce |
CN107562836A (en) * | 2017-06-07 | 2018-01-09 | Beihang University | Answerer recommendation method based on topic model and machine learning |
CN107391637A (en) * | 2017-07-10 | 2017-11-24 | Jiangsu Modern Enterprise Informatization Application Support Software Engineering Technology Research and Development Center | Group recommendation method with geographic social information |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN101968798A (en) | Community recommendation method based on on-line soft constraint LDA algorithm | |
CN105260390B (en) | Group-oriented item recommendation method based on joint probabilistic matrix factorization | |
Olczyk | A systematic retrieval of international competitiveness literature: a bibliometric study | |
CN106980692A (en) | Influence computation method based on specific microblog events | |
CN108154430A (en) | Credit scoring construction method based on machine learning and big data technology | |
CN107577688A (en) | Original article influence power analysis system based on media information collection | |
CN103731738A (en) | Video recommendation method and device based on user group behavioral analysis | |
CN108334575A (en) | Method and device for correcting the ranking of recommendation results, and electronic equipment | |
CN106570525A (en) | Method for evaluating online commodity assessment quality based on Bayesian network | |
CN103699626A (en) | Method and system for analysing individual emotion tendency of microblog user | |
CN102439597A (en) | Parameter deducing method, computing device and system based on potential dirichlet model | |
CN108830416A (en) | Ad click rate prediction framework and algorithm based on user behavior | |
CN107562836A (en) | Answerer recommendation method based on topic model and machine learning | |
CN104636426A (en) | Multi-factor comprehensive quantitative analysis and sorting method for academic influences of scientific research institutions | |
Mediero et al. | Probabilistic calibration of a distributed hydrological model for flood forecasting | |
CN107292785A (en) | Question-setting method and system | |
CN112612942B (en) | Social big data-based fund recommendation system and method | |
CN105302880A (en) | Content correlation recommendation method and apparatus | |
Papacharalampous et al. | A review of machine learning concepts and methods for addressing challenges in probabilistic hydrological post-processing and forecasting | |
Zhai et al. | Rice irrigation schedule optimization based on the AquaCrop model: study of the Longtouqiao irrigation district | |
CN102664744A (en) | Group-sending recommendation method in network message communication | |
CN116911962A (en) | Article selecting device and method based on data model | |
CN101739418A (en) | Method for sequencing multi-index comprehensive weight audio-video album | |
Moerkerken et al. | Which farmers adopt solar energy? A regression analysis to explain adoption decisions over time | |
CN112269932A (en) | Big data-based small and medium enterprise resource integration processing system |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20110209 |