CN102004774A - Personalized user tag modeling and recommendation method based on unified probability model - Google Patents

Info

Publication number
CN102004774A
Authority
CN
China
Prior art keywords
topic
label
user
model
mark
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010546780
Other languages
Chinese (zh)
Other versions
CN102004774B (en
Inventor
唐杰
张宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN2010105467801A priority Critical patent/CN102004774B/en
Publication of CN102004774A publication Critical patent/CN102004774A/en
Application granted granted Critical
Publication of CN102004774B publication Critical patent/CN102004774B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a personalized user tag modeling and recommendation method based on a unified probability model, comprising the following steps: S1, collecting statistics on users' tagging behavior on a social tagging site; S2, formally defining the user tagging problem; S3, establishing a topic model based on user tagging, the topic model being a unified probabilistic model called the UdT model; S4, establishing a tag recommendation framework based on the UdT model, the framework making recommendations by learning users' interests and using the semantic information contained in those interests; and S5, verifying the tag recommendation framework. Experimental results show that the method can effectively discover user interests and improve the accuracy of tag recommendation.

Description

Personalized user tag modeling and recommendation method based on a unified probability model
Technical field
The invention belongs to the field of Internet technology and relates in particular to techniques for learning, understanding, and recommending personalized user tags on social tagging websites. Specifically, it is a personalized user tag modeling and recommendation method based on a unified probability model.
Background art
Social tagging is a key feature of Web 2.0. It allows users to freely annotate resources such as web pages, scientific papers, and multimedia. Social tags help users organize and find information, and they are valuable for many practical applications, including web search, query expansion, personalized search, and the classification and clustering of web resources. With the emergence and rapid growth of social tagging websites, for example social sharing sites (Flickr, Picasa, YouTube, Plaxo), blogs (Blogger, WordPress, LiveJournal), wikis (Wikipedia, PBWiki), and microblogs (Twitter, Jaiku), tagging systems have become one of the important means of organizing rapidly growing community data.
Recently, tag recommendation has become a major focus of social tagging research. Tag recommendation means recommending the most relevant tags for a resource that a user shares. It serves two purposes. For the social tagging website, it enlarges the tag set of each resource and thus the index used when retrieving resources. For the user, as with other recommender systems, it improves the tagging experience and shortens the time spent thinking of tags. Tag recommendation in practice is complex and challenging. First, resource popularity on real social tagging websites follows a power law: the vast majority of resources have been tagged only once or twice, so a given resource may well have been tagged by only one user or by none. In that case collaborative filtering no longer applies, and it becomes necessary to explore the relationships between resources and the tags attached to other, similar resources. Second, different users may tag the same resource with different tags, depending on personal habit. A personalized tag recommendation system is therefore needed to improve the user experience and encourage users to tag more resources. Personalized tag recommendation draws on a user's tagging history, with the aim of recommending tags for a specific resource to a specific user.
Current personalized tag recommendation mainly uses two kinds of methods: (1) content-based methods and (2) graph-based methods. Content-based methods usually learn users' interests from textual information (web page content, scientific papers, tags, and resource descriptions) and can therefore make recommendations for new users and new resources. Graph-based methods usually rely on more assumptions and constraints than content-based methods, for example the assumption that every resource and user involved in a recommendation has already appeared in past data. This assumption usually cannot be satisfied in practice, because a tag recommendation system must still make reasonable recommendations when it knows nothing about the resource or the user. Comparing the two, content-based methods have the advantage of handling new users and new resources, but they are less accurate than graph-based methods. Graph-based methods are more accurate but apply only to existing users and resources and cannot handle new ones.
To make full use of the network structure of a social tagging system, the relationships among users, resources, and tags must be modeled. Much existing work models social tagging networks. For example, a social tagging system has been described as a tripartite network whose nodes are users, tags, and resources; this tripartite network is decomposed into a bipartite network and a unipartite network in order to learn its latent structure. Some researchers model the social tagging system as a tripartite network by adding a social dimension (the user), extending the traditional bipartite ontology model to a ternary one. Others propose a social tagging graph in which tags are treated as bridges connecting resources from heterogeneous domains, and design a semi-supervised classification algorithm on that graph. All of these methods study social tagging on a network graph. Another line of work simulates the social tagging process with a generative model. For example, Wu et al. designed a probabilistic generative model in which the three entities of a social tagging system (tags, resources, and users) are mapped into a common concept space represented by a multidimensional vector, each dimension corresponding to one category of knowledge. In addition, hierarchical Bayesian models based on LDA (Latent Dirichlet Allocation) and PLSA (Probabilistic Latent Semantic Analysis) have been used to model social tagging.
The rise of Web 2.0 has driven progress in tag recommendation. Some methods are based on users' historical tagging information. For example, AutoTag is a tag recommendation system designed by Gilad Mishne specifically for blogs. It first uses an information retrieval method to estimate the similarity between blogs and finds blogs similar to the one awaiting recommendation; it then ranks the tags attached to those similar blogs by frequency of use to obtain the recommended tags. The system also takes user information into account, and the information retrieval method it uses is fairly simple. Another approach is the FolkRank algorithm, which exploits the graph structure of the social tagging network and is an extension of the well-known PageRank algorithm. Some researchers learn a ranking of tags by tensor factorization and recommend accordingly; others use tensor dimensionality reduction for tag recommendation. These graph-based methods depend on a fairly dense social tagging network. Besides them, some semantics-based methods are also very effective, for example the algorithm designed by Wu et al. However, none of these methods considers the specific interests of individual users.
Xu et al. use collaborative tagging information for tag recommendation. Their method recommends tags that many users have attached to the target resource, and it tries to make the recommended tags cover every facet of the resource by minimizing conceptual overlap among them. Like the method used by the Del.icio.us website, this algorithm cannot handle new resources. Some researchers designed the P-tag algorithm, which automatically generates personalized tags for web pages; the generated tags relate not only to the text on the page but also to the contents of files on the viewer's desktop. Other researchers addressed tag recommendation on Flickr: when a user submits a picture together with some tags, the system automatically shows the user a ranked list of candidate tags, generated from the co-occurrence between the tags the user has entered and other tags. This method depends on the user entering some tags by hand before the system recommends further ones, so it cannot be applied at all to resources that no user has tagged. Moreover, because it considers only co-occurrence data, topic drift may occur. An interactive personalized tag recommendation system has also been introduced, again for Flickr, which gives special consideration to the user's own tagging data when recommending. Because this algorithm also depends on existing tags, it shares the shortcomings of the method above.
More and more researchers are turning their attention to user-dependent information, hoping to understand users better and to learn their latent interests and preferences from their tagging behavior. Some researchers have tried to use a user's earlier tagging information for recommendation: the tags a user has used before largely reveal the user's preferences and interests and are very helpful for recommendation. Some analyze users' browsing behavior to predict which tags a user would apply to a given picture. Some use a method based on hierarchical tag clustering for personalized tag recommendation. Others have studied efficient real-time tag recommendation systems, and still others have designed automatic tagging systems for text search and digital libraries.
Because the problem space is huge, efficiency is as important as accuracy. In the methods described above, graph partitioning is used to improve accuracy while reducing algorithmic complexity. In practice, data sets are very large and users expect recommendations in real time. How to deliver personalized recommendations efficiently is therefore a major challenge in this field. The dynamic nature of social tagging is a further research problem.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is how to provide a personalized user tag modeling and recommendation method for use on the Internet, one that defines personalized tagging behavior and predicts, from a user's tagging history, the tags that the user will apply to a given resource.
(2) Technical solution
To solve the above technical problem, the invention provides a personalized user tag modeling and recommendation method based on a unified probability model, comprising the following steps:
S1, collecting statistics on users' tagging behavior on a social tagging website;
S2, formally defining the user tagging problem;
S3, establishing a topic model based on user tagging; the model is a unified probability model called the UdT model, where a unified probability model is one that describes all of the modeling tasks within a single probability model;
S4, establishing a tag recommendation framework based on the UdT model, the framework making recommendations by learning users' interests and using the semantic information contained in those interests;
S5, verifying the tag recommendation framework.
Step S2 specifically comprises the following steps:
S21, formalizing a user's tagging behavior as a triple comprising three elements: user, tag, and resource;
S22, formally defining the topic distributions in the tagging problem. Specifically, a T-dimensional topic distribution vector θ_u ∈ R^T is established for each user u ∈ U, whose entries satisfy the constraint in formula image BSA00000348338300051 (as probabilities over topics, they sum to 1); each entry θ_uz is the probability that user u is interested in topic z. Likewise, a T-dimensional topic distribution vector θ ∈ R^T is established for each document d ∈ D, which may cover several topics; its entries satisfy the constraint in formula image BSA00000348338300052, and each entry θ_z is the probability that document d covers topic z. Here R^T denotes the space of T-dimensional real vectors;
S23, establishing a topic model based on user interests, in which a user's interests are described as a mixture of topics, with a different probability of interest for each topic. The model is represented by the multinomial distribution {p(t|θ_u)} over the tags t used by the user; the tags with the highest probability in {p(t|θ_u)} represent the topic semantically;
S24, establishing a topic model of documents, which consists of two multinomial distributions: the probability distribution {p(w|θ)} over words w and the probability distribution {p(t|θ)} over tags t, where θ denotes the multinomial topic distribution of document d.
Step S3 is specifically as follows.
Two classes of unknown parameters in the UdT model are estimated: (1) the topic distributions θ of the M documents, the topic distributions θ_u of user interests, the Bernoulli distributions λ of the M documents, and the word distributions φ of the T topics; (2) for each tag t_di, its associated coin-toss result s_di and assigned topic z_di, where the coin-toss result follows the Bernoulli distribution λ; for each word w_di in document d, its associated topic z'_di; and for each tag t_ui used by user u, its associated topic z_ui.
The parameters are estimated as follows. First, (a) the posterior distribution over topics z is estimated and used to estimate the topic distribution θ_u in the first generative process. Then, (b) the posterior distribution over coin-toss results s and topics z is estimated and used to obtain the parameters θ, λ, φ, and ψ of the second generative process, where ψ is the word distribution. The first generative process models the topic distribution of user interests; the second models the topic distribution of tagged documents.
In step S4, the UdT model is combined with a language model to establish the tag recommendation framework.
The UdT model is combined with the language model in one of the following two ways:
The scores computed by the two models are first normalized and then added according to their respective weights, so that tags appearing in the candidate set of only one model can still be found; or
The tags recommended by the UdT model are ranked first, and a number of the top-ranked tags are then selected and reranked with an information retrieval method.
(3) Beneficial effects
The present invention provides a user-based topic model (UdT) that simultaneously models users' interests and the topic distributions of documents. Compared with existing methods, the UdT model can automatically identify which tags depend on a user's particular interests and which are determined by the overall topic distribution of the resource. The UdT model is then applied to the tag recommendation problem, with two different combination strategies that use it to improve recommendation accuracy. Experimental results show that the proposed method can effectively discover users' interests and improve the accuracy of tag recommendation.
Brief description of the drawings
Fig. 1 is a diagram of the UdT model proposed by the present invention;
Fig. 2 is the framework of the tag recommendation system designed in the method of the invention;
Fig. 3 illustrates the starting point of the model design (an example of a social tagging website);
Fig. 4 is a diagram of the ACT model;
Fig. 5 is a precision chart on the Bibsonomy data set, where LM denotes recommending tags with the language model; ACT denotes recommending tags by combining the language model with the ACT model; UdT1 denotes recommending tags with the UdT model under combination strategy one; and UdT2 denotes recommending tags with the UdT model under combination strategy two;
Fig. 6 is a schematic diagram of tag recommendation performance as a function of the number of topics;
Fig. 7 is a diagram of the LDA model;
Fig. 8 compares the user-based topic model with the general topic model, where UdT denotes the model designed by the invention and UdT- denotes the topic model that does not take user interests into account;
Fig. 9 is a schematic diagram of a case study;
Fig. 10 is a flow chart of the method of an embodiment of the invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and specific embodiments.
Through statistical analysis of real data, the present invention studies users' tagging behavior and tagging purposes and formally defines the personalized tagging problem of social tagging websites. Tagging behavior is formalized as a triple, each user's interests are described as a topic distribution, and each tag attached to a resource is modeled either as a general tag or as a tag based on the user's particular interests; both are learned within one probabilistic generative process. A unified probability model (User-dependent Tagging Model, UdT model for short) is proposed to describe user-based tagging behavior; the model jointly estimates the general topic distribution and the user-specific topic distribution. A tag recommendation method based on the UdT model is then designed, with two recommendation strategies: a linear combination of scores, and reranking with a language model. Finally, the method is evaluated on real data from the Bibsonomy website against baseline algorithms (a basic language model and the Author-Conference-Topic (ACT) model). As shown in Fig. 10, the method of the invention comprises the following steps:
Step 1: collect statistics on users' personalized tagging behavior on a social tagging website;
Step 2: formally define the user tagging problem (which may be called the social tagging problem);
Step 2 comprises:
(1) Formalize a user's tagging behavior as a triple. A social tagging website has the following elements: users, tags, and resources. A user is denoted u ∈ U, where U is the set of all users; a tag is denoted t ∈ T, where T is the set of all tags; a resource is denoted r ∈ R, where R is the set of all resources. Users, tags, and resources form triples. The input to the tag recommendation problem (the users' tagging data) is D = {(u, r, t)}, u ∈ U, r ∈ R, t ∈ T, and the output (the recommended tags) is T(u, r) = arg max_t P(t | u, r). A single tagging act can thus be regarded as a triple (r_i, t_j, u_k), meaning that user u_k tagged resource r_i with the tag set t_j. To learn a user-based tagging model, the training set is therefore D = {(r_1, t_1, u_1), ..., (r_N, t_N, u_N)}, where t_i is the set of tags that user u_i applied to resource r_i and N is the total number of tagging acts. The social tagging problem is concerned with modeling and analyzing the tags already present on a social tagging website, whereas the tag recommendation problem is concerned with recommending tags for new resources on the basis of that analysis.
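The triple formalization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `TaggingAct`, the helper `tag_history`, and the toy users, resources, and tags are all hypothetical and do not come from the patent.

```python
from collections import defaultdict
from typing import FrozenSet, NamedTuple


class TaggingAct(NamedTuple):
    """One tagging act (r_i, t_j, u_k): user u_k tags resource r_i with tag set t_j."""
    resource: str
    tags: FrozenSet[str]
    user: str


# Hypothetical toy training set D; every identifier here is made up.
D = [
    TaggingAct("paper-1", frozenset({"lda", "topic-model"}), "alice"),
    TaggingAct("paper-2", frozenset({"tagging"}), "alice"),
    TaggingAct("paper-1", frozenset({"bayesian"}), "bob"),
]
N = len(D)  # total number of tagging acts


def tag_history(acts):
    """Collect, for each user, every tag that user has applied so far."""
    history = defaultdict(set)
    for act in acts:
        history[act.user] |= act.tags
    return dict(history)


history = tag_history(D)
```

A per-user tag history of this kind is the raw material from which a user-dependent model would learn interests; the model itself is not shown here.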
The other symbols used in the present invention and their meanings are listed in Table 1 below:
Table 1
(2) Formally define the topic distributions in the social tagging problem. In social tagging, a user usually has many different interests, each with corresponding topics. Formally, each user u ∈ U has an associated T-dimensional topic distribution vector θ_u ∈ R^T whose entries, being probabilities over the topics, sum to 1. Each entry θ_uz is the probability that user u is interested in topic z. Similarly, a document may contain information on several different topics, so each document d ∈ D also has an associated T-dimensional topic distribution vector θ ∈ R^T, whose entries satisfy the constraint in formula image BSA00000348338300101 (as probabilities over topics, they sum to 1). Each entry θ_z is the probability that document d covers topic z.
(3) Establish a topic model based on user interests. The topic model of user u's interests is represented by the multinomial distribution {p(t|θ_u)} over the tags the user has used. In the present invention, tagging behavior is regarded as an expression of the user's interests, so those interests can be studied through the tags the user has used. A user's interests are described as a mixture of topics, with a different probability of interest for each topic. The model assumes that the tags a user uses follow the tag distribution p(t|θ_u) associated with each topic. The tags with the highest probability in the distribution therefore represent the topic semantically.
(4) Establish a topic model of documents. Unlike the topic model of user interests, the topic model of a document consists of two multinomial distributions: the probability distribution {p(w|θ)} over words w and the probability distribution {p(t|θ)} over tags t. The model assumes that the words in a document are sampled from {p(w|θ)} and that the tags attached to the document are sampled from {p(t|θ)}.
Step 3: propose the user-based tagging model (User-dependent Tagging Model, UdT model for short), shown in Fig. 1. The model simultaneously models the topic distributions of documents, tags, and users, and it can distinguish a user's personalized tagging behavior from general tagging behavior. The basic idea of the UdT model is to model tagged documents and user interests simultaneously with two related generative processes.
The first generative process (the right half of Fig. 1) models the topic distribution of user interests. It proceeds as follows: (1) for each topic z, draw a tag distribution φ_z and a word distribution φ_z' for that topic, from Dirichlet distributions with priors β and β' respectively; (2) for each user u ∈ U, first draw a topic distribution θ_u for the user from a Dirichlet distribution with prior α_u; then, for each tag t_ui used by user u, draw a topic z_ui from θ_u and draw a word w_ui from the word distribution of that topic [formula image BSA00000348338300102].
The second generative process (the left half of Fig. 1) models the topic distribution of tagged documents. For each document d, first draw a topic distribution θ_d for the document from a Dirichlet distribution with prior α. Next, the value of s determines whether a tag is related to the user's personal interests or to the general topics of the document; s is governed by λ = P(s = 0 | d) ~ Beta(γ_u, γ). Then, for each tag t_di attached to document d, toss a coin s_di for the tag, where s_di ~ Bernoulli(λ). If s_di = 0, draw a topic z_ui from the user-based topic distribution θ_u and draw the tag t_di from the tag distribution of that topic; otherwise, draw a topic z_di from the general topic distribution θ_d and draw the tag t_di from the distribution [formula image BSA00000348338300112]. Finally, for each word w_di of document d, draw a topic z'_di from the distribution θ_d and then draw the word w_di from the distribution [formula image BSA00000348338300113]. In this way, the model uses a Bernoulli distribution to decide whether a tag attached to a document is based on the user's personal interests.
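The tag-generating part of the second process can be simulated with a short Python sketch. It is a minimal illustration, not the patented implementation: the function names, the toy sizes, and the hyperparameter values are assumptions, symmetric Dirichlet priors are assumed, and word generation is omitted.

```python
import random


def dirichlet(prior, size, rng):
    """Draw from a symmetric Dirichlet(prior) via normalized Gamma variates."""
    draws = [rng.gammavariate(prior, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def draw(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document_tags(num_tags, theta_u, theta_d, phi, lam, rng):
    """Generate (s, z, t) for each tag of one document tagged by one user.

    s = 0: the topic comes from the user's interests theta_u.
    s = 1: the topic comes from the document's topics theta_d.
    lam is P(s = 0 | d); phi[z] is the tag distribution of topic z.
    """
    result = []
    for _ in range(num_tags):
        s = 0 if rng.random() < lam else 1
        z = draw(theta_u if s == 0 else theta_d, rng)
        t = draw(phi[z], rng)
        result.append((s, z, t))
    return result


rng = random.Random(7)
NUM_TOPICS, NUM_TAGS = 3, 6                      # toy sizes (assumed)
phi = [dirichlet(0.5, NUM_TAGS, rng) for _ in range(NUM_TOPICS)]  # prior beta
theta_u = dirichlet(1.0, NUM_TOPICS, rng)        # prior alpha_u
theta_d = dirichlet(1.0, NUM_TOPICS, rng)        # prior alpha
lam = rng.betavariate(1.0, 1.0)                  # Beta(gamma_u, gamma)
sample = generate_document_tags(5, theta_u, theta_d, phi, lam, rng)
```

Each element of `sample` records the coin result, the topic index, and the tag index, which makes it easy to see how the coin switches a tag between the user-specific and the document-level topic distribution.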
Step 3 comprises:
Step 3-1: analyze the parameters to be estimated. To solve the UdT model, its unknown parameters must be estimated; once they are determined, the UdT model is obtained. The UdT model has two classes of unknown parameters: (1) the topic distributions θ of the M documents, the topic distributions θ_u of user interests, the Bernoulli distributions λ of the M documents, and the word distributions φ of the T topics; (2) for each tag t_di, its associated coin-toss result s_di and assigned topic z_di; for each word w_di in document d, its associated topic z'_di; and for each tag t_ui used by user u, its associated topic z_ui. Solving such a probability model exactly is usually infeasible. The present invention therefore does not estimate the model parameters directly. Instead, it first estimates (a) the posterior distribution over topics z and uses it to estimate the topic distribution θ_u in the first generative process; it then estimates (b) the posterior distribution over coin-toss results s and topics z and uses it to obtain the parameters θ, λ, φ, and ψ of the second generative process, where ψ is the word distribution.
For estimate (a), the sampling algorithm is similar to that of the LDA model. The difference is that LDA counts the probability distribution of topics occurring in each document, whereas here the counts describe the probability distribution of the topics sampled for each user. That is, the following posterior probability is used:
[formula image BSA00000348338300114]
where n_uz is the number of times topic z has been sampled from the multinomial topic distribution of user u; n_zt is the number of times tag t has been generated by topic z; and the superscript -ui on a count n^{-ui} indicates that the current instance is excluded. Superscripts and subscripts in the other formulas of the invention follow the same convention.
For estimate (b), the principle is the same as for (a): Gibbs sampling is used in both cases. The difference is that the coin-toss result s and the topic z must be sampled simultaneously. Accordingly, the posterior probability of a tag t sampled from a user-based topic z and that of a tag t sampled from a general document topic z are defined respectively as:
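$$P(z_{t_{di}}, s_{t_{di}} = 0 \mid \mathbf{t}_d, \mathbf{t}_u, \mathbf{z}^{-di}, \gamma, \gamma_u, \alpha_u, \beta) = \frac{n_{d0}^{-di} + \gamma_u}{n_{d0}^{-di} + n_{d1}^{-di} + \gamma_u + \gamma} \cdot \frac{n_{d0 z_{t_{di}}}^{-di} + n_{u z_{t_{di}}} + \alpha_u}{\sum_{z} \left( n_{d0z}^{-di} + n_{uz} + \alpha_u \right)} \cdot \frac{n_{z_{t_{di}} t_{di}}^{-di} + n_{z_{t_{di}} t_{di}}^{u} + \beta}{\sum_{t} \left( n_{z_{t_{di}} t}^{-di} + n_{z_{t_{di}} t}^{u} + \beta \right)}$$

$$P(z_{t_{di}}, s_{t_{di}} = 1 \mid \mathbf{t}_d, \mathbf{t}_u, \mathbf{z}^{-di}, \gamma, \gamma_u, \alpha, \beta) = \frac{n_{d1}^{-di} + \gamma}{n_{d0}^{-di} + n_{d1}^{-di} + \gamma_u + \gamma} \cdot \frac{n_{d1 z_{t_{di}}}^{-di} + \alpha}{\sum_{z} \left( n_{d1z}^{-di} + \alpha \right)} \cdot \frac{n_{z_{t_{di}} t_{di}}^{-di} + n_{z_{t_{di}} t_{di}}^{u} + \beta}{\sum_{t} \left( n_{z_{t_{di}} t}^{-di} + n_{z_{t_{di}} t}^{u} + \beta \right)}$$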
where n_d0 is the number of times document d has been sampled from the topic distribution based on user interests; n_d1 is the number of times document d has been sampled from the topic distribution of the general document content; and the superscript u on a count n^u indicates that the count is accumulated over all users. For example, [formula image BSA00000348338300125] is the number of times tag t has been assigned to topic z by all users.
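To make the two conditionals above concrete, the following Python sketch evaluates them for one tag occurrence from count tables. It is an illustration under assumptions, not the patented implementation: the function and argument names are hypothetical, the denominator of the user term is read as a sum over n_uz, and symmetric priors are assumed so that the sum over tags in the last factor equals the topic totals plus `num_tags * beta`.

```python
import random


def tag_conditional(n_d0, n_d1, n_d0z, n_d1z, n_uz, n_zt, n_zt_u, n_z, n_z_u,
                    gamma, gamma_u, alpha, alpha_u, beta, num_tags):
    """Unnormalized P(z, s | rest) for one tag occurrence.

    All counts exclude the occurrence being resampled.
    n_d0, n_d1         : tags of document d with s = 0 / s = 1
    n_d0z[z], n_d1z[z] : the same counts split by topic z
    n_uz[z]            : times topic z was sampled for the tagging user
    n_zt[z], n_zt_u[z] : times the current tag was generated by topic z
                         in the document process / across all users
    n_z[z], n_z_u[z]   : the same two counts summed over all tags
    """
    topics = range(len(n_uz))
    coin_den = n_d0 + n_d1 + gamma_u + gamma
    user_den = sum(n_d0z[z] + n_uz[z] + alpha_u for z in topics)
    doc_den = sum(n_d1z[z] + alpha for z in topics)
    weights = {}
    for z in topics:
        tag_term = (n_zt[z] + n_zt_u[z] + beta) / (n_z[z] + n_z_u[z] + num_tags * beta)
        weights[(0, z)] = ((n_d0 + gamma_u) / coin_den
                           * (n_d0z[z] + n_uz[z] + alpha_u) / user_den * tag_term)
        weights[(1, z)] = ((n_d1 + gamma) / coin_den
                           * (n_d1z[z] + alpha) / doc_den * tag_term)
    return weights


def sample_assignment(weights, rng=random):
    """Draw one (s, z) pair with probability proportional to its weight."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


# Toy counts for two topics and a two-tag vocabulary (all values are made up).
w = tag_conditional(n_d0=1, n_d1=1, n_d0z=[1, 0], n_d1z=[0, 1], n_uz=[2, 0],
                    n_zt=[1, 0], n_zt_u=[1, 0], n_z=[2, 1], n_z_u=[1, 0],
                    gamma=1.0, gamma_u=1.0, alpha=1.0, alpha_u=1.0, beta=1.0,
                    num_tags=2)
s, z = sample_assignment(w, random.Random(0))
```

One Gibbs sweep would call `tag_conditional` for every tag occurrence, after removing that occurrence from the counts, and then add it back under the sampled `(s, z)`.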
During the above parameter estimation, the algorithm must maintain an M × T count matrix (documents × topics), a T × K count matrix (topics × tags), an M × 2 count matrix (documents × coin values), and a |U| × T count matrix (users × topics), where |·| denotes set cardinality. With these counts, the algorithm can readily estimate the topic distribution θ_dz of each document, the topic distribution θ_uz of each user, and the tag distribution φ_zt of each topic using the following formulas:

$$\theta_{dz} = \frac{n_{dz} + \alpha}{\sum_{z'} \left( n_{dz'} + \alpha \right)}$$

$$\theta_{uz} = \frac{n_{uz} + \alpha_u}{\sum_{z'} \left( n_{uz'} + \alpha_u \right)}$$

$$\phi_{zt} = \frac{n_{zt}^{d} + n_{zt}^{u} + \beta}{\sum_{t'} \left( n_{zt'}^{d} + n_{zt'}^{u} + \beta \right)}$$
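All three estimates share one pattern: add the prior to each count and normalize over the row. A minimal Python sketch of that pattern follows; the helper name `smoothed_distribution` and the toy counts are assumptions for illustration only.

```python
def smoothed_distribution(counts, prior):
    """Turn one row of counts into probabilities: (n + prior) / sum(n' + prior)."""
    total = sum(counts) + prior * len(counts)
    return [(n + prior) / total for n in counts]


# Hypothetical count rows after sampling.
n_d = [3, 1]                 # topic counts for one document (row of the M x T matrix)
n_zt_doc = [2, 0, 1]         # tag counts for one topic from the document process
n_zt_user = [1, 1, 0]        # tag counts for the same topic across all users

theta_d = smoothed_distribution(n_d, prior=1.0)              # alpha = 1
phi_z = smoothed_distribution(
    [a + b for a, b in zip(n_zt_doc, n_zt_user)], prior=0.5  # beta = 0.5
)
```

For the tag distribution, the document-side and user-side counts are summed before smoothing, which mirrors the two count terms in the numerator of the last formula.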
Analysis of the algorithmic complexity of the UdT model gives a complexity of [formula image BSA00000348338300132], where L is the number of iterations of the Gibbs sampling algorithm, [formula image BSA00000348338300133] is the average number of tags used per user, and [formula image BSA00000348338300134] is the average number of words per document.
Step 4: establish a label recommendation framework based on the UdT model. The language model focuses on extracting keywords from the title or content, whereas the topic-based UdT model starts from learning the interests of users and recommends according to latent semantic information. The present invention combines the two methods to build the framework of the label recommendation system, as shown in Fig. 2.
Step 4 involves two combination strategies:
(1) Strategy one: UdT1. First, the scores computed by the two models are normalized, that is, ‖score1‖ = ‖score2‖, where score1 is the score computed by the language model and score2 is the score computed by the UdT model. If a label t appears in the candidate set of the language model but not in the training set, then score2[t] = 0; conversely, if a label t appears in the training set but not in the candidate set of the language model, then score1[t] = 0. The final score of label t is then:
$$\mathrm{score}[t] = \lambda_c \cdot \mathrm{score1}[t] + (1 - \lambda_c) \cdot \mathrm{score2}[t]$$
Here $\lambda_c$ is the weight used when adding the scores. Addition is used instead of multiplication because the candidate sets of the two models differ considerably, so many labels have a score1 or score2 value of 0, and a product would eliminate them. Adding the scores keeps the labels that appear in only one candidate set, which is the purpose of this combination strategy.
(2) Strategy two: UdT2. The second combination strategy proposed in the present invention first ranks the labels with the following formula:
$$P(t \mid u', r') = \sum_z P(t \mid z)\, \operatorname{avg}_{w \in r'} P(z \mid u)$$
where [formula image BSA00000348338300136], in which $\mathrm{word}_{num}$ denotes the number of words and r' denotes a resource. The top-ranked labels (the top 500) are then selected and re-ranked with the following formula:
$$P(t \mid r^*) = \frac{N_d}{N_d + \lambda} \cdot \frac{tf(t, d)}{N_d} + \left(1 - \frac{N_d}{N_d + \lambda}\right) \cdot \frac{tf(t, D')}{N'_D}$$
where $N_d$ is the number of different words in the description d of the given resource $r^*$, $tf(t, d)$ is the frequency (number of occurrences) of word t in the description d of the given resource $r^*$, $N'_D$ is the number of different words in the whole data set, and $tf(t, D')$ is the frequency (number of occurrences) of word t in the whole data set D'. λ is the Dirichlet smoothing factor, usually set to the average document length, which here is $N'_D / |D'|$. The UdT2 strategy proposed by the present invention has two advantages: 1. it enlarges the candidate set of the language model, mainly by extending keyword-based matching to topic-based matching; for example, the two labels "data mining" and "knowledge engineering" cannot be matched at the keyword level but can be matched at the topic level; 2. because of the sparsity of the data set, the UdT model on its own is not well suited to new users or new resources, and re-ranking the results of the UdT model with a simple information retrieval method can improve the accuracy of the final label recommendation.
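The two combination strategies can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the patent does not say which norm is used for normalization, so scaling by the maximum score is an assumption; the description length N_d is taken here as the total word count of the description; and all function and variable names are hypothetical.

```python
def max_normalize(scores):
    """Scale scores so the largest value is 1 (assumed normalization)."""
    peak = max(scores.values(), default=0.0)
    if peak <= 0:
        return dict(scores)
    return {t: s / peak for t, s in scores.items()}


def combine_udt1(score1, score2, lam_c):
    """UdT1: weighted sum; a label missing from one model scores 0 there."""
    s1, s2 = max_normalize(score1), max_normalize(score2)
    labels = set(s1) | set(s2)
    return {t: lam_c * s1.get(t, 0.0) + (1 - lam_c) * s2.get(t, 0.0)
            for t in labels}


def dirichlet_score(label, desc_tf, corpus_tf, corpus_len, lam):
    """P(t | r*) with Dirichlet smoothing against the whole data set."""
    n_d = sum(desc_tf.values())
    background = corpus_tf.get(label, 0) / corpus_len
    if n_d == 0:
        return background
    weight = n_d / (n_d + lam)
    return weight * desc_tf.get(label, 0) / n_d + (1 - weight) * background


def rerank_udt2(udt_scores, desc_tf, corpus_tf, corpus_len, lam, top_k=500):
    """UdT2: keep the top_k labels by UdT score, then re-rank them."""
    shortlist = sorted(udt_scores, key=udt_scores.get, reverse=True)[:top_k]
    return sorted(
        shortlist,
        key=lambda t: dirichlet_score(t, desc_tf, corpus_tf, corpus_len, lam),
        reverse=True,
    )


# Hypothetical toy inputs.
combined = combine_udt1({"mining": 2.0, "data": 1.0},
                        {"mining": 0.5, "ontology": 1.0}, lam_c=0.5)
ranking = rerank_udt2({"mining": 0.4, "web": 0.3, "ontology": 0.2},
                      desc_tf={"ontology": 2, "web": 1},
                      corpus_tf={"mining": 10, "web": 50, "ontology": 5},
                      corpus_len=1000, lam=10)
```

In the toy example, "data" and "ontology" each appear in only one candidate set yet still receive a nonzero combined score, which is the behavior the additive rule is meant to preserve.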
Step 5: verify and analyze the UdT model and the recommendation framework through experiments on real data.
Step 5 comprises:
Step 5-1: experimental design. This step includes selecting the training set and test set used in the experiments, building the database tables, defining the performance metrics for evaluating the UdT model, and specifying the baseline methods used for comparison;
Step 5-2: present the experimental results of label recommendation. This step presents the popular labels and the recommended labels for resources in the data set, the performance metrics of the different label recommendation methods, and plots analyzing how user interest affects the recommendation results;
Step 5-3: discussion and analysis of the experimental results. This step includes analyzing the effect of the number of topics, comparing the user-based topic model with the general topic model, studying example cases, and listing manual interpretations of the topic titles.
In one embodiment of the invention, a real data set from a social bookmark and publication sharing system (see www.bibsonomy.org) is taken as an example to illustrate how the UdT model and the recommendation framework are generated, and how they are used to recommend labels relevant to a given resource and to mine user-specific personalized information.
(1) Analysis of an example of the social tagging problem. Fig. 3 shows an example of a social tagging problem. The left column of the figure shows 4 different users; each box contains all the labels that user has used, and a standard LDA model applied to those labels yields two topics for each user: data mining and machine learning. The right column shows 5 different resources. The arrows in the middle of the figure indicate which user tagged which resource; for example, user 1 tagged resources 1 and 2, and user 3 tagged resources 2, 4, and 5. Each box on the right holds the content of a resource, consisting of its text and the labels applied to it. The same basic topic model, applied to the text of the resources, yields their distributions over the two topics data mining and machine learning. Based on the analysis of this example, the present invention aims to use a new user-based tagging model that simultaneously learns, from the applied labels, the specific interest of a user in a specific resource and the latent topic distribution of the resource, and then performs personalized label recommendation.
(2) Generate the UdT model.
Step (2) also involves the parameter estimation of the UdT model, which is derived as follows.
Derivation of the Gibbs sampling formula:
$$P(z_{ui} \mid z_{-ui}, t_u, \alpha_u, \beta) = \frac{n_{u z_{ui}}^{-ui} + \alpha_u}{\sum_z (n_{uz}^{-ui} + \alpha_u)} \cdot \frac{n_{z_{ui} t_{ui}}^{-ui} + \beta}{\sum_t (n_{z_{ui} t}^{-ui} + \beta)}$$
The joint distribution is:
$$p(t_u, z_u \mid \alpha_u, \beta) = p(t_u \mid z_u, \beta)\, p(z_u \mid \alpha_u)$$
According to the multinomial distribution that generates the labels $t_u$, we obtain:
$$p(t_u \mid z_u, \phi) = \prod_{i=1}^{N_u} \phi_{z_{ui} t_{ui}} = \prod_{z_u=1}^{T} \prod_{i=1}^{V_u} \phi_{z_{ui} t_{ui}}^{\,n_{z_{ui} t_{ui}}}$$
where $N_u$ is the number of labels used by user u, $\phi_{z_{ui}}$ is the multinomial distribution of topic $z_{ui}$ over labels, and $n_{z_{ui} t_{ui}}$ is the number of times label $t_{ui}$ is assigned to topic $z_{ui}$.
Integrating out φ gives:
$$p(t_u \mid z_u, \beta) = \int p(t_u \mid z_u, \phi)\, p(\phi \mid \beta)\, d\phi = \int \prod_{z_u=1}^{T} \frac{1}{\Delta(\beta)} \prod_{i=1}^{V_u} \phi_{z_{ui} t_{ui}}^{\,n_{z_{ui} t_{ui}} + \beta - 1}\, d\phi_z = \prod_{z_u=1}^{T} \frac{\Delta(n_{z_u} + \beta)}{\Delta(\beta)}$$
where $n_{z_u} = \{ n_{z_u t_{ui}} \}_{i=1}^{V_u}$ and $\Delta(\beta) = \dfrac{\prod_{i=1}^{k} \Gamma(\beta_i)}{\Gamma\left(\sum_{i=1}^{k} \beta_i\right)}$.
In the same way, the following formula can be derived:
$$p(z_u \mid \alpha_u) = \prod_{u=1}^{U} \frac{\Delta(n_u + \alpha_u)}{\Delta(\alpha_u)}, \qquad n_u = \{ n_{u z_u^{(i)}} \}_{i=1}^{T}$$
Multiplying the two formulas above gives:
$$p(z_u, t_u \mid \alpha_u, \beta) = \prod_{z_u=1}^{T} \frac{\Delta(n_{z_u} + \beta)}{\Delta(\beta)} \prod_{u=1}^{U} \frac{\Delta(n_u + \alpha_u)}{\Delta(\alpha_u)}$$
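From the joint distribution, the following conditional distribution is obtained:
$$p(z_{ui} \mid z_{-ui}, t_u, \alpha_u, \beta) = \frac{p(z_u, t_u \mid \alpha_u, \beta)}{p(z_{-ui}, t_u \mid \alpha_u, \beta)} = \frac{p(t_u \mid z_u, \beta)}{p(t_u \mid z_{-ui}, \beta)} \cdot \frac{p(z_u \mid \alpha_u)}{p(z_{-ui} \mid \alpha_u)} = \frac{\Delta(n_{z_u} + \beta) \cdot \Delta(n_u + \alpha_u)}{\Delta(n_{z_{-ui}} + \beta) \cdot \Delta(n_{-ui} + \alpha_u)}$$
$$= \frac{\dfrac{\Gamma(n_{z_u t_u^{(i)}} + \beta)}{\Gamma\left(\sum_{i=1}^{V_n} (n_{z_u t_u^{(i)}} + \beta)\right)} \cdot \dfrac{\Gamma(n_{u z_u^{(i)}} + \alpha_u)}{\Gamma\left(\sum_{i=1}^{T} (n_{u z_u^{(i)}} + \alpha_u)\right)}}{\dfrac{\Gamma(n_{z_u t_u^{(i)}} + \beta - 1)}{\Gamma\left(\sum_{i=1}^{V_n} (n_{z_u t_u^{(i)}} + \beta) - 1\right)} \cdot \dfrac{\Gamma(n_{u z_u^{(i)}} + \alpha_u - 1)}{\Gamma\left(\sum_{i=1}^{T} (n_{u z_u^{(i)}} + \alpha_u) - 1\right)}} = \frac{n_{u z_{ui}}^{-ui} + \alpha_u}{\sum_z (n_{uz}^{-ui} + \alpha_u)} \cdot \frac{n_{z_{ui} t_{ui}}^{-ui} + \beta}{\sum_t (n_{z_{ui} t}^{-ui} + \beta)}$$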
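The conditional distribution derived above can be turned into a collapsed Gibbs sampler for the user-label process. The Python sketch below is one possible reading, not the patent's implementation: labels are assumed to be integer ids, the function name and argument layout are hypothetical, and the factor $\sum_z (n_{uz}^{-ui} + \alpha_u)$ is dropped because it is the same for every candidate topic.

```python
import random


def gibbs_user_topics(user_labels, num_topics, num_labels,
                      alpha_u, beta, iters, seed=0):
    """Collapsed Gibbs sampling of a topic for every label a user has used."""
    rng = random.Random(seed)
    n_uz = [[0] * num_topics for _ in user_labels]        # user x topic
    n_zt = [[0] * num_labels for _ in range(num_topics)]  # topic x label
    n_z = [0] * num_topics                                # labels per topic

    def update(u, z, t, delta):
        n_uz[u][z] += delta
        n_zt[z][t] += delta
        n_z[z] += delta

    # Random initial assignment.
    assign = []
    for u, labels in enumerate(user_labels):
        zs = [rng.randrange(num_topics) for _ in labels]
        for z, t in zip(zs, labels):
            update(u, z, t, +1)
        assign.append(zs)

    for _ in range(iters):
        for u, labels in enumerate(user_labels):
            for i, t in enumerate(labels):
                # Remove the current assignment (the "-ui" counts).
                update(u, assign[u][i], t, -1)
                weights = [
                    (n_uz[u][k] + alpha_u)
                    * (n_zt[k][t] + beta) / (n_z[k] + num_labels * beta)
                    for k in range(num_topics)
                ]
                z_new = rng.choices(range(num_topics), weights=weights)[0]
                assign[u][i] = z_new
                update(u, z_new, t, +1)
    return assign, n_uz, n_zt


# Hypothetical toy data: 2 users, label ids in {0, 1, 2}.
user_labels = [[0, 1, 0], [2, 2, 1]]
assign, n_uz, n_zt = gibbs_user_topics(
    user_labels, num_topics=2, num_labels=3, alpha_u=0.5, beta=0.1, iters=20
)
```

After sampling, the rows of `n_uz` and `n_zt` are the counts from which the user topic distribution and the label distribution of each topic can be estimated.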
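Similarly, integrating out λ (where a coin toss gives $s_i = 0$ with probability λ) gives:
$$p(s \mid \gamma_u, \gamma) = \int p(s \mid \lambda)\, p(\lambda \mid \gamma_u, \gamma)\, d\lambda = \int \left( \prod_i \lambda^{(1 - s_i)} (1 - \lambda)^{s_i} \right) \cdot \frac{\lambda^{\gamma_u - 1} (1 - \lambda)^{\gamma - 1}}{\mathrm{Beta}(\gamma_u, \gamma)}\, d\lambda$$
$$= \int \lambda^{n_{d0}} (1 - \lambda)^{n_{d1}} \cdot \frac{\lambda^{\gamma_u - 1} (1 - \lambda)^{\gamma - 1}}{\mathrm{Beta}(\gamma_u, \gamma)}\, d\lambda = \frac{\int \lambda^{n_{d0} + \gamma_u - 1} (1 - \lambda)^{n_{d1} + \gamma - 1}\, d\lambda}{\mathrm{Beta}(\gamma_u, \gamma)}$$
$$= \frac{\mathrm{Beta}(n_{d0} + \gamma_u,\, n_{d1} + \gamma)}{\mathrm{Beta}(\gamma_u, \gamma)}$$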
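As a numeric sanity check of the closed form above, the following Python sketch evaluates the Beta-function ratio with log-gamma functions. It also shows that the ratio of two such marginals differing by one additional toss with s = 0 reduces to $(n_{d0} + \gamma_u) / (n_{d0} + n_{d1} + \gamma_u + \gamma)$, which is the coin factor of the sampling formula. The function names and the toy values are assumptions made for this illustration.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def coin_marginal(n_d0, n_d1, gamma_u, gamma):
    """p(s | gamma_u, gamma) = Beta(n_d0 + gamma_u, n_d1 + gamma) / Beta(gamma_u, gamma)."""
    return exp(log_beta(n_d0 + gamma_u, n_d1 + gamma) - log_beta(gamma_u, gamma))


def prob_next_toss_is_user(n_d0, n_d1, gamma_u, gamma):
    """Probability that one more toss gives s = 0, as a ratio of marginals."""
    return (coin_marginal(n_d0 + 1, n_d1, gamma_u, gamma)
            / coin_marginal(n_d0, n_d1, gamma_u, gamma))


# Hypothetical counts and priors.
ratio = prob_next_toss_is_user(n_d0=3, n_d1=2, gamma_u=0.5, gamma=1.5)
closed_form = (3 + 0.5) / (3 + 2 + 0.5 + 1.5)
```

Both `ratio` and `closed_form` evaluate to 0.5 for these values, up to floating-point error.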
Next, the update formula for [formula image BSA000003483383001612] is derived. The following conditional probability holds, where $n_{d0}$ is the number of times a topic of document d is sampled from the user-based topics, and $n_{d1}$ is the number of times a topic of document d is sampled from the general topics. The superscript u in $n^{u}$ indicates that all users are counted; for example, $n_{zt}^{u}$ denotes the total number of times label t is assigned to topic z over all users.
$$P(z_{t_{di}}, s_{t_{di}} = 0 \mid t_d, t_u, z_{-di}, \gamma, \gamma_u, \alpha_u, \beta) = p(s_{t_{di}} = 0 \mid \gamma, \gamma_u) \cdot p(z_{t_{di}} \mid s_{t_{di}} = 0, t_d, t_u, z_{-di}, \alpha_u, \beta)$$
$$= p(s_{t_{di}} = 0 \mid \gamma, \gamma_u) \cdot \frac{p(z_{t_d}, t_d, t_u \mid s_{t_{di}} = 0, \alpha_u, \beta)}{p(z_{-t_{di}}, t_d, t_u \mid s_{t_{di}} = 0, \alpha_u, \beta)}$$
$$= \frac{p(s_{-i}, s_{t_{di}} = 0 \mid \gamma, \gamma_u)}{p(s_{-i} \mid \gamma, \gamma_u)} \cdot \frac{p(z_{t_d} \mid s_{t_{di}} = 0, \alpha_u)}{p(z_{t_{-di}} \mid s_{t_{di}} = 0, \alpha_u)} \cdot \frac{p(t_d, t_u \mid s_{t_{di}} = 0, z_{t_d}, \beta)}{p(t_d, t_u \mid s_{t_{di}} = 0, z_{t_{-di}}, \beta)}$$
$$= \frac{B(n_{d0}^{-di} + \gamma_u + 1,\, n_{d1}^{-di} + \gamma)}{B(n_{d0}^{-di} + \gamma_u,\, n_{d1}^{-di} + \gamma)} \cdot \frac{\Delta(n_{d0z} + n_u + \alpha_u)}{\Delta(n_{d0}^{-zi} + n_u + \alpha_u)} \cdot \frac{\Delta(n_{zu} + n_{zt} + \beta)}{\Delta(n_{zu} + n_{z}^{-ti} + \beta)}$$
$$= \frac{\dfrac{(n_{d0}^{-di} + \gamma_u)!\,(n_{d1}^{-di} + \gamma - 1)!}{(n_{d0}^{-di} + \gamma_u + n_{d1}^{-di} + \gamma)!}}{\dfrac{(n_{d0}^{-di} + \gamma_u - 1)!\,(n_{d1}^{-di} + \gamma - 1)!}{(n_{d0}^{-di} + \gamma_u + n_{d1}^{-di} + \gamma - 1)!}} \cdot \frac{\dfrac{\Gamma(n_{d0 z_t^{(di)}} + n_{u z_t^{(di)}} + \alpha_u)}{\Gamma\left(\sum_{i=1}^{T} (n_{d0 z_t^{(di)}} + n_{u z_t^{(di)}} + \alpha_u)\right)}}{\dfrac{\Gamma(n_{d0 z_t^{(di)}} + n_{u z_t^{(di)}} + \alpha_u - 1)}{\Gamma\left(\sum_{i=1}^{T} (n_{d0 z_t^{(di)}} + n_{u z_t^{(di)}} + \alpha_u) - 1\right)}} \cdot \frac{\dfrac{\Gamma(n_{z_t t^{(ui)}} + n_{z_t t^{(di)}} + \beta)}{\Gamma\left(\sum_{i=1}^{V_t} (n_{z_t t^{(ui)}} + n_{z_t t^{(di)}} + \beta)\right)}}{\dfrac{\Gamma(n_{z_t t^{(ui)}} + n_{z_t t^{(di)}} + \beta - 1)}{\Gamma\left(\sum_{i=1}^{V_t} (n_{z_t t^{(ui)}} + n_{z_t t^{(di)}} + \beta) - 1\right)}}$$
$$= \frac{n_{d0}^{-di} + \gamma_u}{n_{d0}^{-di} + n_{d1}^{-di} + \gamma_u + \gamma} \cdot \frac{n_{d0 z_{t_{di}}}^{-di} + n_{u z_{t_{di}}} + \alpha_u}{\sum_z (n_{d0z}^{-di} + n_{uz} + \alpha_u)} \cdot \frac{n_{z_{t_{di}} t_{di}}^{-di} + n_{z_{t_{di}} t_{di}}^{u} + \beta}{\sum_t (n_{z_{t_{di}} t}^{-di} + n_{z_{t_{di}} t}^{u} + \beta)}$$
Likewise,
$$P(z_{t_{di}}, s_{t_{di}} = 1 \mid t_d, t_u, z_{-di}, \gamma, \gamma_u, \alpha_u, \beta)$$
$$= \frac{p(s_{-i}, s_{t_{di}} = 1 \mid \gamma, \gamma_u)}{p(s_{-i} \mid \gamma, \gamma_u)} \cdot \frac{p(z_{t_d} \mid s_{t_{di}} = 1, \alpha_u)}{p(z_{t_{-di}} \mid s_{t_{di}} = 1, \alpha_u)} \cdot \frac{p(t_d, t_u \mid s_{t_{di}} = 1, z_{t_d}, \beta)}{p(t_d, t_u \mid s_{t_{di}} = 1, z_{t_{-di}}, \beta)}$$
$$= \frac{B(n_{d0}^{-di} + \gamma_u,\, n_{d1}^{-di} + \gamma + 1)}{B(n_{d0}^{-di} + \gamma_u,\, n_{d1}^{-di} + \gamma)} \cdot \frac{\Delta(n_{d1z} + n_u + \alpha_u)}{\Delta(n_{d1}^{-zi} + n_u + \alpha_u)} \cdot \frac{\Delta(n_{zu} + n_{zt} + \beta)}{\Delta(n_{zu} + n_{z}^{-ti} + \beta)}$$
$$= \frac{n_{d1}^{-di} + \gamma}{n_{d0}^{-di} + n_{d1}^{-di} + \gamma_u + \gamma} \cdot \frac{n_{d1 z_{t_{di}}}^{-di} + n_{u z_{t_{di}}} + \alpha_u}{\sum_z (n_{d1z}^{-di} + n_{uz} + \alpha_u)} \cdot \frac{n_{z_{t_{di}} t_{di}}^{-di} + n_{z_{t_{di}} t_{di}}^{u} + \beta}{\sum_t (n_{z_{t_{di}} t}^{-di} + n_{z_{t_{di}} t}^{u} + \beta)}$$
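One sampling step for a label of a tagged document, drawing the topic z and the coin value s together, might look like the Python sketch below. It follows the two posteriors as stated in the parameter-estimation step (s = 0 uses the document and user counts with prior α_u, s = 1 uses the document-only counts with prior α). The dictionary layout, the function names, and the assumption that $n_{d0}$ and $n_{d1}$ equal the row sums of the per-topic counts are all choices made for this illustration.

```python
import random


def joint_weights(d, u, t, counts, hyper):
    """Unnormalized posterior weight of every (topic z, coin s) pair.

    All counts are assumed to already exclude the label being resampled
    (the "-di" superscript in the formulas).
    """
    n_d0z, n_d1z = counts["n_d0z"][d], counts["n_d1z"][d]
    n_uz = counts["n_uz"][u]
    n_zt_doc, n_zt_user = counts["n_zt_doc"], counts["n_zt_user"]
    num_topics, num_labels = len(n_uz), len(n_zt_doc[0])

    n_d0, n_d1 = sum(n_d0z), sum(n_d1z)
    coin_den = n_d0 + n_d1 + hyper["gamma_u"] + hyper["gamma"]
    topic0_den = n_d0 + sum(n_uz) + num_topics * hyper["alpha_u"]
    topic1_den = n_d1 + num_topics * hyper["alpha"]

    weights = {}
    for z in range(num_topics):
        label_term = ((n_zt_doc[z][t] + n_zt_user[z][t] + hyper["beta"])
                      / (sum(n_zt_doc[z]) + sum(n_zt_user[z])
                         + num_labels * hyper["beta"]))
        # s = 0: topic drawn from the user-interest topic distribution.
        weights[(z, 0)] = ((n_d0 + hyper["gamma_u"]) / coin_den
                           * (n_d0z[z] + n_uz[z] + hyper["alpha_u"]) / topic0_den
                           * label_term)
        # s = 1: topic drawn from the general document topic distribution.
        weights[(z, 1)] = ((n_d1 + hyper["gamma"]) / coin_den
                           * (n_d1z[z] + hyper["alpha"]) / topic1_den
                           * label_term)
    return weights


def sample_topic_and_coin(d, u, t, counts, hyper, rng):
    """Draw one (z, s) pair in proportion to its posterior weight."""
    weights = joint_weights(d, u, t, counts, hyper)
    pairs = list(weights)
    return rng.choices(pairs, weights=[weights[p] for p in pairs])[0]


# Hypothetical toy state: 1 document, 1 user, 2 topics, 2 labels.
counts = {
    "n_d0z": [[1, 0]], "n_d1z": [[0, 1]], "n_uz": [[2, 1]],
    "n_zt_doc": [[1, 0], [0, 1]], "n_zt_user": [[2, 0], [0, 1]],
}
hyper = {"gamma_u": 1.0, "gamma": 1.0, "alpha_u": 1.0, "alpha": 1.0, "beta": 1.0}
w = joint_weights(0, 0, 0, counts, hyper)
z, s = sample_topic_and_coin(0, 0, 0, counts, hyper, random.Random(0))
```

A full sampler would decrement the counts for the current label before calling `sample_topic_and_coin` and increment the counts of the sampled pair afterwards.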
(3) Select two combination strategies in the recommendation framework based on the generated UdT model: UdT1 and UdT2;
(4) Select the training set and the test set;
The training set and test set used in this embodiment are provided by the competition held by the ECML conference in 2009. The training set consists of all data on the Bibsonomy website before June 1, 2009. It contains 389,009 tagging actions in total, covering 56,386 different bookmark resources and 41,874 different bibtex resources. A total of 37,998 resources appear only once in the training set. The training set contains 2,271 different users and 37,880 different labels. A bookmark resource carries 4.234 labels on average, and a bibtex resource carries 3.588 labels on average.
The test set in this embodiment consists of all data from the Bibsonomy website from June 1, 2009 to July 1, 2009. The training set and the test set are disjoint, and the test set contains many resources and users that do not appear in the training set, which makes label recommendation harder. The test set contains 26,072 tagging actions in total, of which 8,361 concern bookmark resources and 17,711 concern bibtex resources. Only 1,265 different bookmark resources and 2,138 different bibtex resources in the test set also appear in the training set; all the others are new resources. A bookmark resource carries 3.9 labels on average, and a bibtex resource carries 4.086 labels on average.
(5) Select the development and runtime environment for the algorithm;
The basic machine learning algorithm is implemented with Microsoft Visual Studio 2005, and all runs are carried out on a server with a dual-core Intel Xeon processor (3.0 GHz), 8 GB of memory, and the Windows 2003 operating system.
(6) Determine the metrics for measuring label recommendation performance;
This embodiment uses precision, recall, and f-measure as the metrics of label recommendation performance, where p@n, r@n, and f@n denote the precision, recall, and f-measure when n labels are recommended. For a given user u and a given resource r, the correct label set is defined as $TAG(u, r)$ and the recommended label set is defined as $\tilde{T}(u, r)$. Precision, recall, and f-measure are then respectively:
$$\mathrm{precision}(\tilde{T}(u, r)) = \frac{1}{|U|} \sum_{u \in U} \frac{|TAG(u, r) \cap \tilde{T}(u, r)|}{|\tilde{T}(u, r)|}$$
$$\mathrm{recall}(\tilde{T}(u, r)) = \frac{1}{|U|} \sum_{u \in U} \frac{|TAG(u, r) \cap \tilde{T}(u, r)|}{|TAG(u, r)|}$$
$$\text{f-measure}(\tilde{T}(u, r)) = \frac{2 \times \mathrm{recall} \times \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}}$$
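The three metrics can be computed as in the short Python sketch below. It averages over a list of (correct labels, ranked recommendations) pairs, one per tagging action; treating each pair as one term of the average is an assumption of this sketch, as are the function name and the toy data.

```python
def precision_recall_f(posts, n):
    """Return (p@n, r@n, f@n) averaged over all posts.

    posts: list of (true_labels, ranked_recommendations) pairs.
    """
    if not posts:
        return 0.0, 0.0, 0.0
    p_sum = r_sum = 0.0
    for truth, ranked in posts:
        top = ranked[:n]
        hits = len(set(truth) & set(top))
        p_sum += hits / len(top) if top else 0.0
        r_sum += hits / len(truth) if truth else 0.0
    precision = p_sum / len(posts)
    recall = r_sum / len(posts)
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * recall * precision / (recall + precision)
    return precision, recall, f_measure


# Hypothetical toy data: two tagging actions.
posts = [
    ({"web", "data"}, ["web", "mining", "data"]),
    ({"ontology"}, ["web", "ontology", "semantic"]),
]
p_at_2, r_at_2, f_at_2 = precision_recall_f(posts, n=2)
```

For the toy data, the top two recommendations hit one correct label in each post, giving p@2 = 0.5, r@2 = 0.75, and f@2 = 0.6.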
(7) Select the baseline methods for comparison;
This embodiment applies the following methods and compares their label recommendation performance.
1) recommending labels with the language model;
2) recommending labels with combination strategy one applied to the language model and the ACT model (as shown in Fig. 4);
3) recommending labels with combination strategy one applied to the UdT model;
4) recommending labels with combination strategy two applied to the UdT model.
(7) Present the results;
Table 2 shows, for the 5 most popular bibtex resources and the 5 most popular bookmark resources in the Bibsonomy data set, the popular labels of each resource and the top five labels recommended by the UdT model.
Table 2
In the "recommended labels" column, a label shown in bold is a label recommended by the UdT model that is also a popular label. Table 2 shows that for popular resources, the popular labels applied to them are usually general words, whereas the label recommendation system using the UdT model usually recommends labels that are relevant to the resource. Besides the recommendations that are identical to the correct answer, many are semantically very similar to it. For example, "portable" and "ontology" in the second row can represent "portableontology"; "web" in the fifth row is closely related to "web20", and "compare" is similar to "comparison". In general, the UdT model recommends labels relevant to the given resource, and it is shown later that the label recommendation system using the UdT model can also mine user-specific personalized information. The precision of each label recommendation method is shown in Table 3, and the f-measure values are shown in Table 4. Table 3 shows that the UdT model improves the performance of existing label recommendation methods by up to 7.67%.
Table 3
In Table 3, P@1, P@3, P@5, P@7, and P@10 denote the precision when the top 1, 3, 5, 7, and 10 recommended labels are returned, respectively.
Table 4
Figure BSA00000348338300202
In Table 4, f@1, f@3, f@5, f@7, and f@10 denote the f-measure when the top 1, 3, 5, 7, and 10 recommended labels are returned, respectively.
(8) Analysis of the results
The results are analyzed from two aspects.
1. Effect of the number of topics. As in other probabilistic models, the number of topics in the UdT model can be set manually. To investigate how the number of topics affects overall label recommendation performance, the number of topics in the UdT model is set to 40, 50, 65, 80, 100, 200, 300, and 500 in turn; the f-measure values for the different numbers of topics are given in Table 4. Table 4 shows that the f-measure increases as the number of topics in the UdT model grows, and that the rate of increase gradually slows down. In addition, a larger number of topics reduces the time efficiency of the whole algorithm, so there is a trade-off between higher accuracy and lower computational cost. Taking this into account, the number of topics is set to 50 in the great majority of runs.
2. User-based topic model versus general topic model. If a user is new and has no past tagging information, the UdT model cannot obtain that user's latent preferences, and in that case it degenerates into a topic model similar to LDA (as shown in Fig. 7). The results obtained in the two settings are used to compare the user-based topic model with the traditional general topic model. The results are shown in Fig. 8, where UdT- denotes the method using the formula [formula image BSA00000348338300211] and UdT denotes the method using the formula [formula image BSA00000348338300212]. Table 5 shows representative labels and words from the experiment. Each topic is represented by the labels and words with the highest generation probability, and the topic titles are manual interpretations.
Table 5
Figure BSA00000348338300221
In addition, table 5 has been listed some topics that the UdT model obtains and about the label and the word of these topics, as can be found from Table 5: the label in a topic is very relevant with the word in the resource description, and they can represent this topic.
(8) Case study
Figs. 9-1 to 9-4 show another example. As Fig. 9-1 shows, in this example 5 users tagged the same resource but used different labels. Fig. 9-4 contains 10 topic distribution charts for the 5 users, two per user: the first is the "global topic distribution of the user" and the second is the "local topic distribution of the user". Each chart shows the probability of occurrence of each numbered topic; a high probability indicates a popular topic that the user uses often. This case study shows that the UdT model can indeed mine the personalized tagging behavior of users. The UdT model obtains the topic distribution of the resource (see Fig. 9-1); in this example, the most popular topic of the resource is topic #21 ("Data Mining"). The UdT model can further identify whether a label is a personalized choice of the user or is related to the overall topic distribution. Fig. 9-3 shows all the labels assigned to this resource. In Fig. 9-3, labels made up of words in solid boxes follow the user-based topic distribution, that is, they are personalized choices of the user, while labels made up of words in dashed boxes follow the general topic distribution and are related to the overall topic. Fig. 9-1 shows the 5 topic distributions based on user interest, and Fig. 9-2 shows the topic distributions of the labels used by the users. Fig. 9-4 also reveals some meaningful patterns: the topic distribution of the labels a user uses is strongly correlated with the topic distribution of that user's own interests. Another interesting observation is that expert users prefer more specific labels, while general users prefer more general labels. For instance, the resource in the figure is about data mining. User 778 is an "expert" on topic 10 ("data analysis") and used the labels "research" and "mining", whereas user 353 is interested in all kinds of topics and used the label "visualization", which is a general word with little relation to the overall topic of the resource.
Experimental results:
Using steps 1 to 5 of the present invention, a personalized user recommendation system comprising two combination strategies was created on the basis of the UdT model, and comparative experiments were carried out on experimental data. The results show that the topic model based on user tagging proposed by the present invention (the UdT model) can effectively mine the interests of users and improve the accuracy of label recommendation.
The above embodiments are intended only to illustrate the present invention and do not limit it. A person of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the present invention; all equivalent technical solutions therefore also fall within the scope of the present invention, and the scope of patent protection of the present invention is defined by the claims.

Claims (6)

1. A personalized user label modeling and recommendation method based on a unified probability model, characterized by comprising the following steps:
S1, collecting statistics on the tagging behavior of users on a social tagging website;
S2, formally defining the user tagging problem;
S3, establishing a topic model based on user tagging, the topic model being a unified probability model called the UdT model;
S4, establishing the framework of a label recommendation system based on the UdT model, the framework making recommendations by learning the interests of users and according to the semantic information contained in those interests;
S5, verifying the framework of the label recommendation system.
2. The method according to claim 1, characterized in that step S2 specifically comprises the following steps:
S21, formalizing a tagging action of a user as a triple, the triple comprising three elements: user, label, and resource;
S22, formally defining the topic distributions in the tagging problem; specifically, establishing a T-dimensional topic distribution vector $\theta_u \in R^T$ corresponding to a user $u \in U$, wherein the entries of the vector $\theta_u$ satisfy [formula image FSA00000348338200011] and each element $\theta_{uz}$ denotes the probability that user u is interested in topic z; and establishing a T-dimensional topic distribution vector $\theta \in R^T$ corresponding to a document $d \in D$ that involves different topics, wherein the entries of the vector $\theta$ satisfy [formula image FSA00000348338200012] and each element $\theta_z$ denotes the probability that document d involves topic z;
S23, establishing a topic model based on user interest, wherein the user interest is described as a combination of various topics with a different probability of interest for each topic, and the model is represented by a multinomial distribution $\{p(t \mid \theta_u)\}$ over the labels t used by the user, the label t with the largest probability in the distribution $\{p(t \mid \theta_u)\}$ representing the topic semantically;
S24, establishing a topic model of the document, the topic model of the document being composed of two multinomial distributions: the probability distribution $\{p(w \mid \theta)\}$ of the words w and the probability distribution $\{p(t \mid \theta)\}$ of the labels t, where $\theta$ denotes the multinomial distribution of the topics of document d.
3. The method according to claim 2, characterized in that step S3 is specifically:
estimating two classes of unknown parameters in the UdT model: (1) the topic distributions $\theta$ of the M documents, the topic distributions $\theta_u$ based on user interest, the Bernoulli distributions $\lambda$ of the M documents, and the word distributions $\phi$ of the T topics; (2) for each label $t_{di}$, the associated coin-toss result $s_{di}$ and the assigned topic $z_{di}$, the coin-toss result following the Bernoulli distribution $\lambda$; for each word $w_{di}$ in document d, the associated topic $z'_{di}$; and for each label $t_{ui}$ used by user u, the associated topic $z_{ui}$.
4. The method according to claim 3, characterized in that the method of estimating the two classes of unknown parameters in the UdT model is: first estimating (a), the posterior distribution over the topic z, and using it to estimate the topic distribution $\theta_u$ in the first generative process; then estimating (b), the posterior distribution over the coin-toss result s and the topic z, and using it to obtain the parameters $\theta$, $\lambda$, $\phi$, and $\psi$ in the second generative process, where $\psi$ is the distribution of words; the first generative process models the topic distribution of user interest, and the second generative process models the topic distribution of the tagged documents.
5. The method according to claim 4, characterized in that in step S4, the UdT model is combined with a language model to establish the framework of the label recommendation system.
6. The method according to claim 5, characterized in that the UdT model is combined with the language model as follows:
first normalizing the scores computed by the two models, and then adding the two scores according to their respective weights, so as to find the labels that appear in the candidate set of only one model; or
first ranking the labels recommended by the UdT model, and then selecting a certain number of top-ranked labels and re-ranking them with an information retrieval method.
CN2010105467801A 2010-11-16 2010-11-16 Personalized user tag modeling and recommendation method based on unified probability model Active CN102004774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105467801A CN102004774B (en) 2010-11-16 2010-11-16 Personalized user tag modeling and recommendation method based on unified probability model

Publications (2)

Publication Number Publication Date
CN102004774A true CN102004774A (en) 2011-04-06
CN102004774B CN102004774B (en) 2012-11-14

Family

ID=43812136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105467801A Active CN102004774B (en) 2010-11-16 2010-11-16 Personalized user tag modeling and recommendation method based on unified probability model

Country Status (1)

Country Link
CN (1) CN102004774B (en)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262653A (en) * 2011-06-09 2011-11-30 华中科技大学 Label recommendation method and system based on user motivation orientation
CN102831234A (en) * 2012-08-31 2012-12-19 北京邮电大学 Personalized news recommendation device and method based on news content and theme feature
CN102915358A (en) * 2012-10-16 2013-02-06 北京奇虎科技有限公司 Method and device for realizing navigation website
CN102915335A (en) * 2012-09-17 2013-02-06 北京大学 Information associating method based on user operation record and resource content
CN102955813A (en) * 2011-08-29 2013-03-06 ***通信集团四川有限公司 Information searching method and information searching system
CN103164463A (en) * 2011-12-16 2013-06-19 国际商业机器公司 Method and device for recommending labels
CN103177093A (en) * 2013-03-13 2013-06-26 北京开心人信息技术有限公司 General recommendation method and system based on object tags
CN103761532A (en) * 2014-01-20 2014-04-30 清华大学 Label space dimensionality reducing method and system based on feature-related implicit coding
CN104714959A (en) * 2013-12-12 2015-06-17 腾讯科技(深圳)有限公司 Application query method and application query device
CN105095202A (en) * 2014-04-17 2015-11-25 华为技术有限公司 Method and device for message recommendation
CN105740327A (en) * 2016-01-22 2016-07-06 天津中科智能识别产业技术研究院有限公司 Self-adaptive sampling method based on user preferences
CN106779858A (en) * 2016-12-26 2017-05-31 西安理工大学 A kind of product analysis method based on multidimensional perception information semantic level association
CN107016026A (en) * 2016-11-11 2017-08-04 阿里巴巴集团控股有限公司 A kind of user tag determination, information-pushing method and equipment
CN107038213A (en) * 2017-02-28 2017-08-11 华为技术有限公司 A kind of method and device of video recommendations
CN107153664A (en) * 2016-03-04 2017-09-12 同方知网(北京)技术有限公司 A kind of method flow that research conclusion is simplified based on the scientific and technical literature mark that assemblage characteristic is weighted
CN107391577A (en) * 2017-06-20 2017-11-24 中国科学院计算技术研究所 A kind of works label recommendation method and system based on expression vector
CN107729544A (en) * 2017-11-01 2018-02-23 广州优视网络科技有限公司 A kind of method and apparatus for recommending application
CN107766439A (en) * 2017-09-21 2018-03-06 汉鼎宇佑互联网股份有限公司 A kind of personalized recommendation method of fusion structure feature and implicit feedback
WO2018103718A1 (en) * 2016-12-08 2018-06-14 广州优视网络科技有限公司 Application recommendation method and apparatus, and server
CN108241621A (en) * 2016-12-23 2018-07-03 北京国双科技有限公司 The search method and device of legal knowledge
CN109241366A (en) * 2018-07-18 2019-01-18 华南师范大学 A kind of mixed recommendation system and method based on multitask deep learning
CN109933678A (en) * 2019-03-07 2019-06-25 合肥工业大学 Art work recommended method, device, readable medium and electronic equipment
CN110457711A (en) * 2019-08-20 2019-11-15 电子科技大学 A kind of social media event topic recognition methods based on descriptor
CN110727797A (en) * 2019-09-17 2020-01-24 北京三快在线科技有限公司 Label generation method and device, electronic equipment and computer readable medium
CN111415039A (en) * 2020-03-19 2020-07-14 北京航空航天大学 Flight delay mode analysis method based on non-negative tensor decomposition
CN112905786A (en) * 2019-12-04 2021-06-04 北京沃东天骏信息技术有限公司 Label recommendation method and device
CN113641791A (en) * 2021-08-12 2021-11-12 卓尔智联(武汉)研究院有限公司 Expert recommendation method, electronic device and storage medium
CN115563315A (en) * 2022-12-01 2023-01-03 东南大学 Active complex relation extraction method for continuous few-sample learning
CN116628179A (en) * 2023-05-30 2023-08-22 道有道科技集团股份公司 User operation data visualization and man-machine interaction recommendation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101321190A (en) * 2008-07-04 2008-12-10 清华大学 Recommend method and recommend system of heterogeneous network
CN101751448A (en) * 2009-07-22 2010-06-23 中国科学院自动化研究所 Commendation method of personalized resource information based on scene information
CN101882162A (en) * 2010-06-29 2010-11-10 北京搜狗科技发展有限公司 Method and system for transmitting network information

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262653A (en) * 2011-06-09 2011-11-30 华中科技大学 Label recommendation method and system based on user motivation orientation
CN102955813A (en) * 2011-08-29 2013-03-06 ***通信集团四川有限公司 Information searching method and information searching system
CN102955813B (en) * 2011-08-29 2015-11-25 ***通信集团四川有限公司 Information search method and system
US9134957B2 (en) 2011-12-16 2015-09-15 International Business Machines Corporation Recommending tags based on user ratings
CN103164463A (en) * 2011-12-16 2013-06-19 国际商业机器公司 Method and device for recommending labels
CN103164463B (en) * 2011-12-16 2017-03-22 国际商业机器公司 Method and device for recommending labels
CN102831234A (en) * 2012-08-31 2012-12-19 北京邮电大学 Personalized news recommendation device and method based on news content and theme feature
CN102831234B (en) * 2012-08-31 2015-04-22 北京邮电大学 Personalized news recommendation device and method based on news content and theme feature
CN102915335A (en) * 2012-09-17 2013-02-06 北京大学 Information associating method based on user operation record and resource content
CN102915335B (en) * 2012-09-17 2016-04-27 北京大学 Information association method based on user operation records and resource content
CN102915358B (en) * 2012-10-16 2015-11-25 北京奇虎科技有限公司 Navigation website implementation method and device
CN102915358A (en) * 2012-10-16 2013-02-06 北京奇虎科技有限公司 Method and device for realizing navigation website
CN105117482A (en) * 2012-10-16 2015-12-02 北京奇虎科技有限公司 Method and device for achieving website navigation
CN105117482B (en) * 2012-10-16 2019-05-31 北京奇虎科技有限公司 Method and device for achieving website navigation
CN103177093B (en) * 2013-03-13 2016-08-17 北京开心人信息技术有限公司 General recommendation method and system based on object tags
CN103177093A (en) * 2013-03-13 2013-06-26 北京开心人信息技术有限公司 General recommendation method and system based on object tags
CN104714959A (en) * 2013-12-12 2015-06-17 腾讯科技(深圳)有限公司 Application query method and application query device
CN103761532A (en) * 2014-01-20 2014-04-30 清华大学 Label space dimensionality reducing method and system based on feature-related implicit coding
US10891553B2 (en) 2014-04-17 2021-01-12 Huawei Technologies Co., Ltd. Method and apparatus for recommending message
CN105095202A (en) * 2014-04-17 2015-11-25 华为技术有限公司 Method and device for message recommendation
CN105095202B (en) * 2014-04-17 2018-10-30 华为技术有限公司 Method and device for message recommendation
CN105740327A (en) * 2016-01-22 2016-07-06 天津中科智能识别产业技术研究院有限公司 Self-adaptive sampling method based on user preferences
CN105740327B (en) * 2016-01-22 2019-04-19 天津中科智能识别产业技术研究院有限公司 Self-adaptive sampling method based on user preferences
CN107153664A (en) * 2016-03-04 2017-09-12 同方知网(北京)技术有限公司 Method flow for simplifying research conclusions based on scientific and technical literature marks weighted by combined features
CN107016026B (en) * 2016-11-11 2020-07-24 阿里巴巴集团控股有限公司 User tag determination method, information push method, user tag determination device, information push device
CN107016026A (en) * 2016-11-11 2017-08-04 阿里巴巴集团控股有限公司 User tag determination method, information push method, and equipment
WO2018103718A1 (en) * 2016-12-08 2018-06-14 广州优视网络科技有限公司 Application recommendation method and apparatus, and server
CN108241621A (en) * 2016-12-23 2018-07-03 北京国双科技有限公司 Legal knowledge retrieval method and device
CN108241621B (en) * 2016-12-23 2019-12-10 北京国双科技有限公司 Legal knowledge retrieval method and device
CN106779858A (en) * 2016-12-26 2017-05-31 西安理工大学 Product analysis method based on semantic-level association of multidimensional perception information
CN106779858B (en) * 2016-12-26 2019-10-25 西安理工大学 Product analysis method based on semantic-level association of multidimensional perception information
CN107038213A (en) * 2017-02-28 2017-08-11 华为技术有限公司 Method and device for video recommendation
CN107391577B (en) * 2017-06-20 2020-04-03 中国科学院计算技术研究所 Work label recommendation method and system based on expression vector
CN107391577A (en) * 2017-06-20 2017-11-24 中国科学院计算技术研究所 Work label recommendation method and system based on expression vector
CN107766439A (en) * 2017-09-21 2018-03-06 汉鼎宇佑互联网股份有限公司 Personalized recommendation method fusing structural features and implicit feedback
CN107729544B (en) * 2017-11-01 2021-06-22 阿里巴巴(中国)有限公司 Method and device for recommending applications
CN107729544A (en) * 2017-11-01 2018-02-23 广州优视网络科技有限公司 Method and device for recommending applications
CN109241366A (en) * 2018-07-18 2019-01-18 华南师范大学 Hybrid recommendation system and method based on multitask deep learning
CN109241366B (en) * 2018-07-18 2021-10-26 华南师范大学 Hybrid recommendation system and method based on multitask deep learning
CN109933678B (en) * 2019-03-07 2021-04-06 合肥工业大学 Artwork recommendation method and device, readable medium and electronic equipment
CN109933678A (en) * 2019-03-07 2019-06-25 合肥工业大学 Artwork recommendation method and device, readable medium and electronic equipment
CN110457711A (en) * 2019-08-20 2019-11-15 电子科技大学 Subject word-based social media event subject identification method
CN110457711B (en) * 2019-08-20 2021-02-02 电子科技大学 Subject word-based social media event subject identification method
CN110727797A (en) * 2019-09-17 2020-01-24 北京三快在线科技有限公司 Label generation method and device, electronic equipment and computer readable medium
CN112905786A (en) * 2019-12-04 2021-06-04 北京沃东天骏信息技术有限公司 Label recommendation method and device
CN111415039B (en) * 2020-03-19 2021-05-28 北京航空航天大学 Flight delay mode analysis method based on non-negative tensor decomposition
CN111415039A (en) * 2020-03-19 2020-07-14 北京航空航天大学 Flight delay mode analysis method based on non-negative tensor decomposition
CN113641791A (en) * 2021-08-12 2021-11-12 卓尔智联(武汉)研究院有限公司 Expert recommendation method, electronic device and storage medium
CN115563315A (en) * 2022-12-01 2023-01-03 东南大学 Active complex relation extraction method for continuous few-sample learning
CN116628179A (en) * 2023-05-30 2023-08-22 道有道科技集团股份公司 User operation data visualization and man-machine interaction recommendation method
CN116628179B (en) * 2023-05-30 2023-12-22 道有道科技集团股份公司 User operation data visualization and man-machine interaction recommendation method

Also Published As

Publication number Publication date
CN102004774B (en) 2012-11-14

Similar Documents

Publication Publication Date Title
CN102004774B (en) Personalized user tag modeling and recommendation method based on unified probability model
Al-Ghuribi et al. Multi-criteria review-based recommender system–the state of the art
Wang et al. A sentiment‐enhanced hybrid recommender system for movie recommendation: a big data analytics framework
Antons et al. Mapping the topic landscape of JPIM, 1984–2013: In search of hidden structures and development trajectories
Zhou et al. Preference-based mining of top-K influential nodes in social networks
Kong et al. Exploring dynamic research interest and academic influence for scientific collaborator recommendation
Wan et al. Aminer: Search and mining of academic social networks
Wu et al. Flame: A probabilistic model combining aspect based opinion mining and collaborative filtering
Zhu et al. Ranking user authority with relevant knowledge categories for expert finding
Li et al. Identifying influential scholars in academic social media platforms
Cleger-Tamayo et al. Top-N news recommendations in digital newspapers
Yu et al. TIIREC: A tensor approach for tag-driven item recommendation with sparse user generated content
Faisal et al. Expert ranking techniques for online rated forums
Tran et al. Hashtag recommendation approach based on content and user characteristics
Gedikli et al. Rating items by rating tags
Lin et al. Finding topic-level experts in scholarly networks
Gao et al. Seco-lda: Mining service co-occurrence topics for composition recommendation
Qian et al. Community-based user domain model collaborative recommendation algorithm
Zhao et al. Academic social network-based recommendation approach for knowledge sharing
Kim et al. Topic-Driven SocialRank: Personalized search result ranking by identifying similar, credible users in a social network
Rao et al. Product recommendation system from users reviews using sentiment analysis
Goel et al. Folksonomy-based user profile enrichment using clustering and community recommended tags in multiple levels
Li et al. Modeling topic and community structure in social tagging: The TTR‐LDA‐Community model
Kolomvatsos et al. An efficient recommendation system based on the optimal stopping theory
Pitsilis et al. Harnessing the power of social bookmarking for improving tag-based recommendations

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant