CN109977299B - Recommendation algorithm fusing project popularity and expert coefficient - Google Patents

Recommendation algorithm fusing project popularity and expert coefficient

Info

Publication number
CN109977299B
Authority
CN
China
Prior art keywords
user
similarity
items
project
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910128705.4A
Other languages
Chinese (zh)
Other versions
CN109977299A (en
Inventor
宋小磊
薛妍
王宾
陈春芳
贺小伟
侯榆青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University
Priority to CN201910128705.4A
Publication of CN109977299A
Application granted
Publication of CN109977299B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/251 Fusion techniques of input or preprocessed data

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a recommendation algorithm fusing project popularity and expert coefficient. User background is introduced into the algorithm as a feature for constructing similar subgroups; within each similar subgroup, a project heat coefficient and an expert opinion coefficient are introduced as corrections to the original scoring matrix; for a target user, the neighbor users are computed within the user's subgroup; and appropriate items are recommended to the target user according to the scores of the neighbor users. The algorithm not only improves the recommendation accuracy of the collaborative filtering algorithm but also reduces its amount of computation, and has important reference and application value in the field of personalized recommendation systems.

Description

Recommendation algorithm fusing project popularity and expert coefficient
Technical Field
The invention belongs to the field of personalized recommendation systems, and relates to a collaborative filtering recommendation algorithm fusing project popularity and expert coefficients based on a user background.
Background
With the development of the internet, the volume of information grows exponentially. Faced with this explosion of information, users are at a loss, which leads to problems such as low information utilization and the loss of useful information. How to screen out the content a user really wants from massive information is therefore a main problem currently facing business and technology. Recommendation systems, first proposed in the last century, are a main means of solving such problems and play an increasingly important role in alleviating information overload. After long-term research, the algorithms applied in recommendation systems are mainly classified into content-based recommendation algorithms, collaborative filtering recommendation algorithms, hybrid recommendation algorithms, popularity-based recommendation algorithms, and advanced non-traditional recommendation algorithms. Collaborative filtering models user preferences and item characteristics from history records such as ratings, clicks, and purchases, and is widely used in industry because it adapts well, has simple logic, and is easy to implement. Although collaborative filtering performs well in application compared with other algorithms, it has some drawbacks, such as cold start, data sparsity, and limited scalability.
Collaborative filtering creates recommendations by fitting a specific model to user behavior, which presupposes that some information about the user is already known; a system cold start caused by data sparsity is therefore inevitable. Its recommendations are also poor for long-tail products. In addition, a similarity-based collaborative filtering algorithm must relate every pair of users when computing similarity, which increases the running time of the system.
In summary, in personalized recommendation systems the collaborative filtering algorithm has several problems: it is extremely sensitive to data sparsity, it suffers from system cold start, and its recommendation accuracy and recommendation performance still need to be improved.
Disclosure of Invention
The invention aims to provide a recommendation algorithm fusing project popularity and expert coefficient, which is used for improving the accuracy of the recommendation algorithm and reducing errors and calculation cost.
In order to realize the task, the invention adopts the following technical scheme:
a recommendation algorithm fusing project popularity and expert coefficients comprises the following steps:
step 1, constructing a user background similar subgroup by adopting a clustering method according to user background information;
step 2, selecting project scoring data of all users in the user background similarity subgroup to form a project scoring matrix, and correcting the project scoring matrix by using a project heat scoring coefficient to reduce sparsity of the project scoring matrix and obtain a simplified project scoring matrix;
step 3, correcting the characteristics of the simplified project scoring matrix by using the expert recommendation coefficient to obtain an optimized project scoring matrix; calculating user score similarity by using the optimized project score matrix, performing linear fitting on the user score similarity and the background similarity to obtain total similarity, and constructing a user similarity matrix;
step 4, acquiring a candidate set for target user score prediction according to the user similarity matrix, and predicting the score of the target user for each project by a similarity-weighted average, so as to obtain a project recommendation result.
Further, step 1 specifically includes:
step 1.1, taking the basic information tables of all users as samples, constructing a user data set, and quantizing the user data set to obtain a sample set list, in which each point represents one quantized sample; two distance thresholds are chosen: T1 and T2;
step 1.2, randomly selecting a point P from the list, and calculating the distance between the point P and the centers of all subsets; if the subset does not exist currently, taking the point P as a subset center, deleting the point P from the list, and otherwise, turning to the step 1.3;
step 1.3, if the distance between the point P and the center of a certain subset is within T2, deleting the point P from the list and adding the point P into the subset;
step 1.4, if the distance between the point P and the center of a certain subset is between T1 and T2, adding the point P into the subset, but not deleting the point P from the list;
step 1.5, if the distances between the point P and the centers of all the subsets are beyond T1, taking the point P as the center of one subset, and deleting the point P from the list;
step 1.6, repeating steps 1.2 to 1.5 until the list is empty; when the list is empty, the coarse clustering process is complete and yields the different subsets, and all points in each subset are averaged to obtain the central point of each subset;
step 1.7, clustering the result of the coarse clustering by adopting a K-means algorithm, and clustering to generate a user background similar subgroup; and the initial value of the clustering of the K-means algorithm is the central point of each subset, and the K value of the K-means algorithm is the number of the subsets generated by rough clustering.
Further, the calculation formula of the item heat score coefficient in step 2 is as follows:
Figure RE-GDA0002052749310000031
where $i$ denotes an item in category $c$, $Z(c)$ denotes the set of items under category $c$, $P_{u,i}$ denotes user $u$'s score for movie $i$, $\sum_{i\in Z(c)} P_{u,i}$ denotes the total score of user $u$ in category $c$, $\sum P_u$ denotes the total score of user $u$ under all categories, $d(C)$ is the total number of item categories that user $u$ has reviewed, and $d(c)$ is the total number of items in category $c$ that user $u$ has reviewed.
Further, the step 3 of correcting the features of the simplified item scoring matrix by using the expert recommendation coefficient to obtain an optimized item scoring matrix includes:
the formula for calculating the expert recommendation coefficient is as follows:
Figure RE-GDA0002052749310000032
where $N_{u,c}$ denotes the number of evaluations by user $u$ of items of type $c$, $N_{u,C}$ denotes the number of evaluations by user $u$ of items of all types, $t_{u,c}$ denotes the time difference between the time at which user $u$ scored the item of type $c$ and the time of the user's first scoring record in the user data set, and $e$ is the base of the natural logarithm;
adding the expert recommendation coefficient into the simplified project scoring matrix to correct the characteristics of the project scoring matrix to obtain an optimized project scoring matrix, wherein the calculation formula is as follows:
$$R_{uc}(t_{uc}) = r_{u,c}(t_{uc}) \times R$$
in the above formula, $r_{u,c}(t_{uc})$ is the expert recommendation coefficient, $R$ is the project scoring matrix simplified in step 2, and $R_{uc}(t_{uc})$ is the optimized project scoring matrix.
Further, the step 3 of calculating the user score similarity by using the optimized project score matrix, and performing linear fitting on the user score similarity and the background similarity to obtain a total similarity, and constructing a user similarity matrix includes:
calculating the scoring similarity of the user by using the optimized project scoring matrix, wherein the calculation formula is as follows:
Figure RE-GDA0002052749310000033
where $u$ and $u_1$ denote any two users, $p_{u,c}$ denotes user $u$'s score for category-$c$ items in $R_{uc}(t_{uc})$, $\bar{p}_u$ denotes user $u$'s average score over all types of items in $R_{uc}(t_{uc})$, $p_{u_1,c}$ denotes user $u_1$'s score for category-$c$ items in $R_{uc}(t_{uc})$, $\bar{p}_{u_1}$ denotes user $u_1$'s average score over all types of items in $R_{uc}(t_{uc})$, and $C$ denotes the set of item categories;
and calculating the user background similarity between every two users in the same user background similar subgroup by using the user basic information table, wherein the calculation formula is as follows:
Figure RE-GDA0002052749310000044
where $r_u$ and $r_{u_1}$ denote the background attribute feature vectors of users $u$ and $u_1$, obtained by vectorizing the users' basic information;
carrying out linear fitting on the user scoring similarity and the user background similarity to obtain the total user similarity:
$$\mathrm{sim}_{UBICF}(u,u_1) = \lambda\,\mathrm{sim}_{UB}(u,u_1) + (1-\lambda)\,\mathrm{sim}_{IC}(u,u_1)$$
where the fusion parameter $\lambda \in [0,1]$;
according to the total user similarity formula, the total user similarity between every two users in the user background similarity subgroup is calculated, thereby forming the user similarity matrix $\mathrm{sim}_u$.
Further, the step 4 specifically includes:
step 4.1, for the target user $u'$, selecting the N users with the highest similarity to $u'$ from the user similarity matrix to form the candidate set $U_{neigh}$ for target user score prediction;
step 4.2, deleting the items already evaluated by the target user $u'$ from the items evaluated by the users in the candidate set $U_{neigh}$, the remaining items forming the item recommendation candidate set of the target user; the similarity-weighted average score of each item in the candidate set is taken as the predicted score of the target user $u'$ for that item, calculated as follows:
Figure RE-GDA0002052749310000046
where $u_1$ is any user in the candidate set $U_{neigh}$, $\mathrm{sim}_{UBICF}(u,u_1)$ is the total user similarity between the target user $u'$ and user $u_1$, $p_{u_1,i}$ denotes user $u_1$'s score for item $i$, $N(U_{neigh})$ denotes the number of users in the candidate set $U_{neigh}$ who have evaluated item $i$, and $p'_{u',i}$ denotes the predicted score of the target user $u'$ for item $i$;
step 4.3, sorting the items in the item candidate set by the target user's predicted scores, and recommending the M items with the highest predicted scores to the target user to obtain the item recommendation result.
Compared with the prior art, the invention has the following technical characteristics:
1. The method effectively improves recommendation accuracy on highly sparse samples, and alleviates the cold start problem of the recommendation system by introducing the concept of similar background subgroups.
2. The method corrects the original scoring matrix by fusing the item heat and the expert recommendation coefficient, which reduces the sparsity of the matrix and improves recommendation accuracy: the mean square error of the improved recommendation algorithm is about 30% lower than that of the traditional collaborative filtering algorithm, and its RMSE is about 20% lower than that of a cluster-based collaborative filtering algorithm.
3. By introducing the concept of similar background subgroups, the algorithm reduces the computational cost of finding the set of neighboring users.
Drawings
FIG. 1 is a schematic overall flow diagram of the process of the present invention;
FIG. 2 is a recommended accuracy curve obtained for different fusion parameters λ in a simulation experiment of the present invention;
FIG. 3 is a plot of the recommended accuracy at different values of N in a simulation experiment of the present invention;
FIG. 4 is a comparison of the recommendation accuracy of the algorithm of the invention (UBICF), the conventional user-based collaborative filtering algorithm (UCF), and the clustering-based collaborative filtering algorithm (CCF).
Detailed Description
Empirically, similar users generally make similar choices. Because a new user has little feature data to refer to, a collaborative filtering algorithm produces biased recommendations for such a user. The scheme therefore introduces the concept of similar subgroups: background-similar subgroups are generated by clustering users, and at the start of recommendation a user is first assigned to a similar-user subgroup, within which the subsequent recommendation is carried out. In addition, the data sets used in recommendation are usually highly sparse, which increases the amount of computation when a collaborative filtering algorithm is used.
Based on the above idea, the scheme proposes an improved collaborative filtering algorithm based on cluster optimization. It mainly targets the difficulty of recommending to unknown users, recommendation accuracy, and recommendation time. Clusters are generated from user background and expert opinion similarity: similar-user recommendation subgroups are generated by user clustering, a recommendation candidate list is generated within each subgroup by a K-nearest-neighbor algorithm, and a final Top-N recommendation list is then generated from the weighted prediction scores. Because every recommendation is made within a subgroup of similar users, both recommendation accuracy and recommendation speed improve. Moreover, because features such as user background and expert opinion are used when the user subgroups are generated, the resulting subgroups yield better recommendations than user subgroups obtained by traditional clustering alone. The specific steps are as follows:
a recommendation algorithm fusing project popularity and expert coefficient comprises the following steps:
step 1, constructing a user background similar subgroup by adopting a clustering method according to user background information.
The invention uses a combined clustering algorithm to cluster the user basic information table. Clustering proceeds in two stages: first, coarse clustering quickly and approximately divides the data into several subsets; then the points in the subsets are clustered again using an exact computation. The main idea of the clustering optimization is as follows: set K initial subset center points and a region radius for the sample data set, and efficiently divide the data into several overlapping subsets so that every object falls within the range covered by the subsets; recompute a new center point from the objects in the same region, and reassign each object to a region according to its distance to the new center points; repeat the process of dividing subsets and computing center points until the positions of the K center points no longer change. For a target user, the distance to the center of each background-similar user subgroup is then computed to obtain the similarity between the target user and each subgroup, and the samples are finally divided into different communities according to this similarity. The specific steps are as follows:
step 1.1, taking the basic information tables of all users as samples, constructing a user data set, and quantizing the user data set to obtain a sample set list, in which each point represents one quantized sample; two distance thresholds are chosen, T1 and T2, where T1 > T2; the values of T1 and T2 may be determined by cross-validation.
Step 1.2, randomly selecting a point P from the list, and calculating the distance between the point P and the centers of all subsets; if the subset does not exist currently, taking the point P as a subset center, deleting the point P from the list, and otherwise, turning to the step 1.3;
step 1.3, if the distance between the point P and the center of a certain subset is within T2, deleting the point P from the list and adding the point P into the subset;
In this case point P is considered close enough to this subset that it can no longer serve as the center of another subset.
Step 1.4, if the distance between the point P and the center of a certain subset is between T1 and T2, adding the point P into the subset, but not deleting the point P from the list;
this means that point P will participate in the next round of clustering;
step 1.5, if the distances between the point P and the centers of all subsets are beyond T1, taking the point P as the center of one subset, and deleting the point P from the list;
step 1.6, repeating steps 1.2 to 1.5 until the list is empty; when the list is empty, the coarse clustering process is complete and yields the different subsets; all points in each subset are then averaged to obtain the central point of each subset.
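The coarse clustering stage can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not the patented procedure verbatim: it uses the standard batch formulation of canopy clustering (pick a center, then sweep the remaining points) rather than the point-at-a-time wording of steps 1.2 to 1.5, so that termination is guaranteed. The function names `canopy_clustering` and `subset_centers`, the Euclidean distance, and the seeded random choice are all hypothetical choices made for the example.

```python
import math
import random


def canopy_clustering(points, t1, t2, seed=0):
    """Coarse clustering into overlapping subsets; requires t1 > t2.

    Assumption: Euclidean distance on already-quantized sample tuples.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    rng = random.Random(seed)
    remaining = list(points)          # the sample "list" of step 1.1
    subsets = []
    while remaining:
        # Randomly pick a point and make it the center of a new subset.
        center = remaining.pop(rng.randrange(len(remaining)))
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d <= t1:
                members.append(p)          # within T1: joins this subset
            if d > t2:
                still_remaining.append(p)  # outside T2: stays in the list
        remaining = still_remaining
        subsets.append(members)
    return subsets


def subset_centers(subsets):
    """Average all points of each subset to obtain its central point."""
    return [tuple(sum(col) / len(members) for col in zip(*members))
            for members in subsets]
```

Points that fall between T2 and T1 of a center join that subset but stay in the list, which is what allows the subsets to overlap.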
And after the coarse clustering is finished, obtaining the central point of each subset and the number of the subsets as basic parameters of the next fine clustering. The fine clustering process adopts a K-means algorithm:
step 1.7, clustering the result of the coarse clustering in the step 1.6 by adopting a K-means algorithm, and generating a user background similar subgroup through clustering; wherein, the initial value of the clustering of the K-means algorithm is the central point of each subset in the step 1.6, and the K value of the K-means algorithm is the number of the subsets generated by rough clustering in the step 1.6.
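A minimal sketch of the fine clustering stage follows. It assumes plain Lloyd-style K-means with Euclidean distance; the function name `kmeans`, the `max_iter` cap, and the rule of keeping an old center when a cluster becomes empty are illustrative assumptions. The initial centers are passed in explicitly, so K is simply the number of centers supplied, as step 1.7 describes.

```python
import math


def kmeans(points, init_centers, max_iter=100):
    """K-means seeded with given centers; K = len(init_centers)."""
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:   # assignments stable: converged
            break
        labels = new_labels
        # Recompute each center as the mean of its assigned points.
        for k in range(len(centers)):
            cluster = [p for p, lab in zip(points, labels) if lab == k]
            if cluster:            # assumption: keep old center if empty
                centers[k] = tuple(sum(col) / len(cluster)
                                   for col in zip(*cluster))
    return labels, centers
```

Seeding with the coarse-clustering centers avoids both choosing K by hand and the sensitivity of K-means to random initialization.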
In this embodiment, the public movie data set MovieLens-1M is taken as an example; the information tables in the data set are described in Table 1, Table 2, and Table 3:
Table 1. User basic information table in MovieLens
Attribute name    Description
UserID            Unique user identifier, numbered 1-6040
Gender            User gender, binary feature, value "F" or "M"
Age               User age, discrete feature, value 1-56
Occupation        User occupation, discrete feature, 21 categories
Zip-code          Postal code
Table 2. Movie basic information table in MovieLens
Attribute name    Description
MovieID           Unique movie identifier, numbered 1-3952
Title             Movie title
Genres            Movie genre; 18 different genres
Table 3. User-movie rating information table in MovieLens
Attribute name    Description
UserID            User number
MovieID           Movie number
Rating            User rating of the movie, in [1,5]
Time              Time at which the user rated the movie, as a timestamp
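As an illustration of how a row of Table 1 could become a quantized sample for step 1.1, the sketch below parses one user record and maps it to a numeric point. The `::` field separator is the one used by the MovieLens-1M `users.dat` file; the encoding in `quantize_user` (gender as 0/1, age and occupation code scaled to [0,1]) is purely a hypothetical choice, since the text does not specify how the attributes are quantized.

```python
def parse_user(line):
    """Parse one 'UserID::Gender::Age::Occupation::Zip-code' record."""
    user_id, gender, age, occupation, zip_code = line.strip().split("::")
    return {
        "UserID": int(user_id),
        "Gender": gender,
        "Age": int(age),
        "Occupation": int(occupation),
        "Zip-code": zip_code,
    }


def quantize_user(user):
    """Map a user record to a numeric point (hypothetical encoding)."""
    return (
        1.0 if user["Gender"] == "M" else 0.0,  # binary feature
        user["Age"] / 56.0,                     # age codes run from 1 to 56
        user["Occupation"] / 20.0,              # 21 occupation codes, 0-20
    )
```

Scaling every attribute to a similar range keeps a single attribute from dominating the distances compared against T1 and T2.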
Step 2, selecting the project scoring data of all users in the user background similarity subgroup to form a project scoring matrix, and correcting the project scoring matrix with the project heat scoring coefficient, so as to reduce its sparsity and obtain a simplified project scoring matrix.
In this embodiment, the items are movies. The user item scoring data are extracted from the user-movie rating information table of the public movie data set MovieLens-1M (Table 3), and the item scoring data of all users together form the item scoring matrix.
In the present scheme, the item heat mentioned is defined as:
A. the proportion of the user's scores for items of a particular type among the user's scores for all items;
B. the proportion of the user's effective scores among all items;
the calculation formula of the item heat score coefficient is as follows:
Figure RE-GDA0002052749310000081
where $i$ denotes an item in category $c$, $Z(c)$ denotes the set of items under category $c$, $P_{u,i}$ denotes user $u$'s score for movie $i$, $\sum_{i\in Z(c)} P_{u,i}$ denotes the total score of user $u$ under category $c$, $\sum P_u$ denotes the total score of user $u$ under all categories, $d(C)$ is the total number of item categories that user $u$ has reviewed, and $d(c)$ is the total number of items in category $c$ that user $u$ has reviewed.
According to the item heat scoring coefficient, the original highly sparse scoring matrix, whose dimension is the number of items, is reduced to a matrix of relatively low sparsity whose dimension is the number of item categories (note that, empirically, the number of items is larger than the number of item categories). Specifically, the project scoring matrix is processed with the item heat scoring coefficient formula, and the result is the simplified project scoring matrix corrected on the basis of item heat.
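The reduction from item columns to category columns can be sketched for a single user as follows. The sketch computes only the quantities the text defines: the per-category score total, the overall score total, d(c), and d(C), plus the score share described in definition A. It does not reproduce the heat coefficient formula itself, which is given only as an image. The function name, the dictionary layout, and the assumption that each item belongs to exactly one category are hypothetical simplifications.

```python
from collections import defaultdict


def category_statistics(ratings, item_category):
    """Aggregate one user's item scores by category.

    ratings:       {item: score} for a single user u
    item_category: {item: category}; one category per item (assumption)
    """
    total = sum(ratings.values())      # total score under all categories
    score_sum = defaultdict(float)     # total score under category c
    d_c = defaultdict(int)             # number of reviewed items in c
    for item, score in ratings.items():
        c = item_category[item]
        score_sum[c] += score
        d_c[c] += 1
    stats = {
        c: {
            "score_sum": score_sum[c],
            "score_share": score_sum[c] / total,  # definition A
            "d_c": d_c[c],
        }
        for c in score_sum
    }
    return stats, len(score_sum)       # second value is d(C)
```

For a user who rated two comedies 4 and 2 and one drama 4, the comedy share is 0.6 and the drama share is 0.4; the row now has one entry per category instead of one per movie.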
Step 3, correcting the characteristics of the simplified project scoring matrix by using an expert recommendation coefficient to obtain an optimized project scoring matrix; and calculating the user rating similarity by using the optimized project rating matrix, and performing linear fitting on the user rating similarity and the background similarity.
After the project scoring matrix has been corrected with the project popularity scoring coefficient, analysis of the actual movie recommendation scenario shows that a user's viewing choices are influenced by the guidance of expert users in the group. The scheme therefore introduces the expert recommendation coefficient as a feature for correcting the scoring matrix, and combines the user background similarity, the project popularity, and the expert recommendation similarity to calculate a new user similarity.
Step 3.1, defining the expert recommendation coefficient as:
A. the proportion of the number of comments of the user to the item of the specific type to the total comments;
B. time decay factor of user comment.
The formula for calculating the expert recommendation coefficient is:
Figure RE-GDA0002052749310000091
where $N_{u,c}$ denotes the number of evaluations by user $u$ of items of type $c$, $N_{u,C}$ denotes the number of evaluations by user $u$ of items of all types, $t_{u,c}$ denotes the time difference between the time at which user $u$ scored the item of type $c$ and the time of the user's first scoring record in the user data set, and $e$ is the base of the natural logarithm.
Then,
Figure RE-GDA0002052749310000092
denotes the share of user $u$'s evaluations given to items of type $c$, and
Figure RE-GDA0002052749310000093
is the decay factor of the user's comments, which indicates whether the user has been active recently. Under the influence of the decay factor, $r_{u,c}(t_{uc})$ is a function of time varying within (0,1); the more recent the time and the larger the number of comments, the larger the expert coefficient of user $u$ and the more useful it is for the next recommendation step.
Step 3.2, adding the expert recommendation coefficient into the simplified project scoring matrix to correct the characteristics of the project scoring matrix to obtain an optimized project scoring matrix;
Since the expert recommendation coefficient is a time-varying number within (0,1), it can be used as a coefficient of the simplified item scoring matrix. The matrix constructed by fusing the item popularity scoring coefficient and the expert recommendation coefficient is calculated as follows:
$$R_{uc}(t_{uc}) = r_{u,c}(t_{uc}) \times R$$
in the above formula, $r_{u,c}(t_{uc})$ is the expert recommendation coefficient, $R$ is the project scoring matrix simplified in step 2, and $R_{uc}(t_{uc})$ is the optimized item scoring matrix.
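Steps 3.1 and 3.2 can be sketched as below. The scaling of the simplified matrix by the coefficient follows the formula above, applied entry by entry. The combination of the two ingredients as a plain product, and the fact that the decay factor is supplied by the caller as a number in (0,1), are assumptions: the exact formula for the expert recommendation coefficient is given only as an image, so the sketch does not guess its decay term. All function names are hypothetical.

```python
def expert_coefficient(n_uc, n_uC, decay):
    """Expert coefficient from its two ingredients.

    n_uc:  evaluations by the user of items of type c
    n_uC:  evaluations by the user of items of all types
    decay: time decay factor in (0, 1), computed elsewhere (assumption)
    Assumption: the ingredients are combined as a product.
    """
    if n_uC <= 0:
        raise ValueError("the user must have at least one evaluation")
    if not 0.0 < decay < 1.0:
        raise ValueError("decay is expected to lie in (0, 1)")
    return (n_uc / n_uC) * decay


def apply_expert_coefficient(R, r):
    """Scale each entry of the simplified matrix: r[u][c] * R[u][c].

    R, r: {user: {category: value}} with matching keys.
    """
    return {
        u: {c: r[u][c] * value for c, value in row.items()}
        for u, row in R.items()
    }
```

Because the coefficient lies in (0,1), the correction only shrinks scores; categories in which a user is active and recent are shrunk least.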
Step 3.3, calculating the scoring similarity of the user by using the optimized project scoring matrix, wherein the calculation formula is as follows:
Figure RE-GDA0002052749310000094
where $u$ and $u_1$ denote any two users, $p_{u,c}$ denotes user $u$'s score for category-$c$ items in $R_{uc}(t_{uc})$, $\bar{p}_u$ denotes user $u$'s average score over all types of items in $R_{uc}(t_{uc})$, $p_{u_1,c}$ denotes user $u_1$'s score for category-$c$ items in $R_{uc}(t_{uc})$, $\bar{p}_{u_1}$ denotes user $u_1$'s average score over all types of items in $R_{uc}(t_{uc})$, and $C$ denotes the set of item categories.
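The quantities defined above (per-category scores and each user's average score) are the ingredients of a Pearson correlation, so the sketch below assumes that the scoring similarity has the Pearson form. This is an assumption, because the formula itself is reproduced only as an image. The function name and the convention of returning 0.0 for degenerate inputs are also illustrative choices.

```python
import math


def pearson_similarity(row_u, row_v):
    """Pearson-style similarity between two users' category scores.

    row_u, row_v: {category: score} rows of the optimized matrix.
    Assumption: sums run over categories present in both rows.
    """
    common = sorted(set(row_u) & set(row_v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(row_u.values()) / len(row_u)  # average over all types
    mean_v = sum(row_v.values()) / len(row_v)
    num = sum((row_u[c] - mean_u) * (row_v[c] - mean_v) for c in common)
    den_u = math.sqrt(sum((row_u[c] - mean_u) ** 2 for c in common))
    den_v = math.sqrt(sum((row_v[c] - mean_v) ** 2 for c in common))
    if den_u == 0.0 or den_v == 0.0:
        return 0.0                             # constant row: undefined
    return num / (den_u * den_v)
```

Subtracting each user's own mean removes the effect of users who systematically score high or low.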
Step 3.4, calculating the user background similarity between every two users in the same user background similar subgroup by using the user basic information table, wherein the calculation formula is as follows:
Figure RE-GDA0002052749310000103
where $r_u$ and $r_{u_1}$ denote the background attribute feature vectors of users $u$ and $u_1$, obtained by vectorizing the users' basic information.
Step 3.5, performing linear fitting on the user score similarity and the user background similarity to obtain the total user similarity:
$$\mathrm{sim}_{UBICF}(u,u_1) = \lambda\,\mathrm{sim}_{UB}(u,u_1) + (1-\lambda)\,\mathrm{sim}_{IC}(u,u_1)$$
where the fusion parameter $\lambda \in [0,1]$.
According to the total user similarity formula, the total user similarity between every two users in the user background similarity subgroup is calculated, thereby forming the user similarity matrix $\mathrm{sim}_u$.
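Steps 3.4 and 3.5 can be sketched as follows. The linear fusion with the parameter λ follows the formula given in the text. The background similarity is assumed here to be the cosine of the angle between the two feature vectors, which is a guess because that formula is reproduced only as an image. The function names and the dictionary-of-pairs representation of the similarity matrix are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors (assumed form)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def total_similarity(sim_ub, sim_ic, lam):
    """Linear fusion: lam * sim_UB + (1 - lam) * sim_IC, lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * sim_ub + (1.0 - lam) * sim_ic


def similarity_matrix(users, sim_ub, sim_ic, lam):
    """Total similarity for every ordered pair of distinct users.

    sim_ub, sim_ic: callables (u, v) -> float for the two similarities.
    """
    return {
        (u, v): total_similarity(sim_ub(u, v), sim_ic(u, v), lam)
        for u in users for v in users if u != v
    }
```

Because the matrix is built only within one background-similar subgroup, the number of pairs grows with the square of the subgroup size rather than of the whole user base.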
Step 4, acquiring a candidate set for target user score prediction according to the user similarity matrix, and predicting the score of the target user for each project by a similarity-weighted average, so as to obtain a project recommendation result.
Because the numbers of users and movies are severely imbalanced (the number of users is far smaller than the number of movies), the original user-movie scoring matrix has many blanks (i.e. movies the user has not scored). The final goal of the recommendation system is to determine whether the user would like the unscored movies: if so, the movie is added to the user's recommendation set; otherwise it is not.
After the user similarity matrix $\mathrm{sim}_u$ is obtained in step 3, each user has a set of neighbor users sorted by similarity, and the N most similar users are selected as the movie score prediction candidate set of the target user. The process of predicting a user's movie scores from the user similarity matrix $\mathrm{sim}_u$ and the users' item scoring matrix is as follows:
step 4.1, for the target user $u'$, selecting the N users with the highest similarity to $u'$ from the user similarity matrix to form the candidate set $U_{neigh}$ for target user score prediction; the size of N can be set as required;
step 4.2, deleting the items already evaluated by the target user $u'$ from the items evaluated by the users in the candidate set $U_{neigh}$, the remaining items forming the item recommendation candidate set of the target user; the similarity-weighted average score of each item in the candidate set is taken as the predicted score of the target user $u'$ for that item, calculated as follows:
Figure RE-GDA0002052749310000111
where $u_1$ is any user in the candidate set $U_{neigh}$, $\mathrm{sim}_{UBICF}(u,u_1)$ is the total user similarity between the target user $u'$ and user $u_1$, $p_{u_1,i}$ denotes user $u_1$'s score for item $i$, $N(U_{neigh})$ denotes the number of users in the candidate set $U_{neigh}$ who have evaluated item $i$, and $p'_{u',i}$ denotes the predicted score of the target user $u'$ for item $i$.
Step 4.3, sorting the items in the item candidate set by the target user's predicted scores, and recommending the M items with the highest predicted scores to the target user to obtain the item recommendation result; the specific value of M can be set as desired, for example, from 1 to 5.
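Steps 4.1 to 4.3 can be sketched end to end as below. The neighbor selection, candidate filtering, and Top-M ranking follow the text. The normalization of the weighted average (dividing by the sum of absolute similarities of the neighbors who rated the item) is an assumption, since the prediction formula is reproduced only as an image. The function name `recommend` and the dictionary inputs are hypothetical.

```python
def recommend(target, sim_row, ratings, n_neighbors, m_items):
    """Predict scores for unseen items and return the top M.

    sim_row: {other_user: total similarity to the target user}
    ratings: {user: {item: score}}
    """
    # Step 4.1: the N users most similar to the target.
    neighbors = sorted(
        (u for u in sim_row if u != target),
        key=lambda u: sim_row[u],
        reverse=True,
    )[:n_neighbors]

    # Step 4.2: items rated by neighbors but not by the target.
    seen = set(ratings.get(target, {}))
    candidates = {i for u in neighbors for i in ratings.get(u, {})} - seen

    predictions = {}
    for item in candidates:
        raters = [u for u in neighbors if item in ratings.get(u, {})]
        weight = sum(abs(sim_row[u]) for u in raters)
        if weight > 0:  # assumed normalization of the weighted average
            predictions[item] = sum(
                sim_row[u] * ratings[u][item] for u in raters
            ) / weight

    # Step 4.3: rank by predicted score (ties broken by item name).
    ranked = sorted(predictions.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:m_items]
```

With similarities of 0.9 and 0.5 and neighbor scores of 5 and 3 for the same item, the predicted score is (0.9 × 5 + 0.5 × 3) / 1.4, which is about 4.29.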
Figs. 2 to 4 show the results of different simulation experiments of the method of the present invention. The test results show that the fusion parameter λ has a relatively small influence on the recommendation accuracy of the present invention, whereas the value of N has a larger influence. As can be seen from Fig. 4, compared with similar algorithms, the root mean square error (RMSE) of the present invention is significantly lower under different values of N, which indicates that the recommendation accuracy of the present invention is greatly improved over the existing algorithms.

Claims (4)

1. A recommendation algorithm fusing project popularity and expert coefficient is characterized by comprising the following steps:
step 1, constructing a user background similar subgroup by adopting a clustering method according to user background information;
step 2, selecting project scoring data of all users in the user background similarity subgroup to form a project scoring matrix, and correcting the project scoring matrix by using a project heat scoring coefficient to reduce sparsity of the project scoring matrix and obtain a simplified project scoring matrix; the calculation formula of the item heat scoring coefficient is as follows:
Figure FDA0003835478060000011
where i represents an item in category c, Z(c) represents the set of items under category c, P_{u,i} represents the score of user u for movie i, Σ_{i∈Z(c)} P_{u,i} represents the total score given by user u in category c, ΣP_u represents the total score given by user u under all categories, d(C) is the total number of categories of items reviewed by user u, and d(c) is the total number of items reviewed by user u in category c;
step 3, correcting the characteristics of the simplified project scoring matrix by using an expert recommendation coefficient to obtain an optimized project scoring matrix; calculating user score similarity by using the optimized project score matrix, performing linear fitting on the user score similarity and the background similarity to obtain total similarity, and constructing a user similarity matrix; the method for correcting the characteristics of the simplified project scoring matrix by using the expert recommendation coefficient to obtain the optimized project scoring matrix comprises the following steps:
the formula for calculating the expert recommendation coefficient is as follows:
Figure FDA0003835478060000012
wherein N_{u,c} indicates the number of evaluations by user u of items of type c, N_{u,C} indicates the number of evaluations by user u of items of all types, t_{u,c} represents the time difference between the time at which user u scored the item of type c and the time of the user's first scoring record in the user data set, and e is the base of the natural logarithm;
adding the expert recommendation coefficient into the simplified project scoring matrix to correct the characteristics of the project scoring matrix to obtain an optimized project scoring matrix, wherein the calculation formula is as follows:
R_{uc}(t_{u,c}) = r_{u,c}(t_{u,c}) × R
in the above formula, r_{u,c}(t_{u,c}) is the expert recommendation coefficient, R is the project scoring matrix simplified in step 2, and R_{uc}(t_{u,c}) is the optimized project scoring matrix;
step 4, acquiring a candidate set for target user score prediction according to the user similarity matrix, and predicting the score of the target user for each project by using a similarity-weighted average, so as to obtain a project recommendation result.
2. The recommendation algorithm for fusing item popularity and expert coefficient according to claim 1, wherein the step 1 specifically comprises:
step 1.1, taking the basic information tables of all users as samples, constructing a user data set, and quantizing the user data set to obtain a sample set list, wherein each point in the set represents one quantized sample; two distance thresholds are chosen: T1 and T2;
step 1.2, randomly selecting a point P from the list, and calculating the distance between the point P and the centers of all subsets; if the subset does not exist currently, taking the point P as a subset center, deleting the point P from the list, and otherwise, turning to the step 1.3;
step 1.3, if the distance between the point P and the center of a certain subset is within T2, deleting the point P from the list and adding the point P into the subset;
step 1.4, if the distance between the point P and the center of a certain subset is between T1 and T2, adding the point P into the subset, but not deleting the point P from the list;
step 1.5, if the distances between the point P and the centers of all the subsets are beyond T1, taking the point P as the center of one subset, and deleting the point P from the list;
step 1.6, repeating steps 1.2 to 1.5 until the list is empty; when the list is empty, the coarse clustering process is complete and different subsets are obtained, and all points in each subset are averaged to obtain the central point of each subset;
step 1.7, clustering the result of the coarse clustering by adopting a K-means algorithm, and clustering to generate a user background similar subgroup; the initial value of the K-means algorithm is the central point of each subset, and the K value of the K-means algorithm is the number of the subsets generated by rough clustering.
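The two-stage clustering of steps 1.1 to 1.7 can be illustrated with the Python sketch below. It is a sketch under stated assumptions rather than the claimed procedure itself: the coarse stage uses the standard canopy formulation (pick a point as a new subset center, then sweep the remaining list), which uses the same two thresholds as steps 1.2 to 1.5 but different bookkeeping, and it assumes T1 > T2 and Euclidean distance. The function names `canopy_centers`, `mean_point`, and `k_means` are hypothetical.

```python
import math
import random


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def canopy_centers(points, t1, t2, seed=0):
    """Coarse clustering with two thresholds (assumes t1 > t2).

    Standard canopy variant, swapped in for steps 1.2-1.5: a randomly
    chosen point starts a subset; points within t1 join the subset, and
    points within t2 are also removed from the list.
    """
    rng = random.Random(seed)
    remaining = list(points)
    subsets = []
    while remaining:
        center = remaining.pop(rng.randrange(len(remaining)))
        members, still_open = [center], []
        for q in remaining:
            d = math.dist(center, q)
            if d <= t1:
                members.append(q)      # loosely or tightly bound to this subset
            if d > t2:
                still_open.append(q)   # may still seed or join another subset
        remaining = still_open
        subsets.append(members)
    # Step 1.6: the mean of each subset is its central point.
    return [mean_point(m) for m in subsets]


def k_means(points, centroids, iterations=20):
    """Step 1.7: Lloyd iterations seeded with the coarse centers (K = len(centroids))."""
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Recompute each centroid; keep the old one if its cluster is empty.
        for k in range(len(centroids)):
            cluster = [p for p, lab in zip(points, labels) if lab == k]
            if cluster:
                centroids[k] = mean_point(cluster)
    return labels, centroids
```

Seeding K-means with the coarse centers removes the need to choose K by hand, which is the design point of step 1.7.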
3. The recommendation algorithm fusing item popularity and expert coefficient according to claim 1, wherein the step 3 of calculating the user score similarity by using the optimized item score matrix, and performing linear fitting on the user score similarity and the background similarity to obtain the total similarity, and constructing the user similarity matrix comprises:
and calculating the scoring similarity of the user by using the optimized item scoring matrix, wherein the calculation formula is as follows:
Figure FDA0003835478060000021
wherein u and u_1 respectively represent any two users, p_{u,c} represents the score of user u for category-c items in R_{uc}(t_{u,c}),
Figure FDA0003835478060000031
represents the average score of user u over all types of items in R_{uc}(t_{u,c}),
Figure FDA0003835478060000032
represents the score of user u_1 for category-c items in R_{uc}(t_{u,c}),
Figure FDA0003835478060000033
represents the average score of user u_1 over all types of items in R_{uc}(t_{u,c}), and C represents the set of item categories;
and calculating the user background similarity between every two users in the same user background similar subgroup by using the user basic information table, wherein the calculation formula is as follows:
Figure FDA0003835478060000034
wherein r_u and
Figure FDA0003835478060000035
represent the background attribute feature vectors obtained after the basic information of users u and u_1 is subjected to radial quantization;
carrying out linear fitting on the user scoring similarity and the user background similarity to obtain the total user similarity:
sim_UBICF(u, u_1) = λ·sim_UB(u, u_1) + (1 − λ)·sim_IC(u, u_1)
wherein the fusion parameter λ ∈ [0, 1];
according to the user total similarity calculation formula, calculating the user total similarity between every two users in the user background similar subgroup, thereby forming the user similarity matrix sim_u.
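The linear fusion and the construction of the user similarity matrix can be sketched in Python as follows. This is a minimal sketch: the two component similarities are defined by formulas that appear only as images in the source, so they are passed in as caller-supplied functions. The names `fuse`, `build_similarity_matrix`, `sim_ub_fn`, and `sim_ic_fn` are hypothetical.

```python
def fuse(sim_ub, sim_ic, lam):
    """Total similarity: lam * sim_UB + (1 - lam) * sim_IC, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("fusion parameter lambda must lie in [0, 1]")
    return lam * sim_ub + (1.0 - lam) * sim_ic


def build_similarity_matrix(users, sim_ub_fn, sim_ic_fn, lam):
    """Pairwise total similarity for all users in one background subgroup.

    sim_ub_fn and sim_ic_fn are caller-supplied functions (u, v) -> float
    standing in for the two component similarities of the fusion formula.
    Returns a nested dict {u: {v: similarity}} without self-similarities.
    """
    matrix = {u: {} for u in users}
    for idx, u in enumerate(users):
        for v in users[idx + 1:]:
            s = fuse(sim_ub_fn(u, v), sim_ic_fn(u, v), lam)
            matrix[u][v] = s   # the fused similarity is stored symmetrically
            matrix[v][u] = s
    return matrix
```

With λ = 1 the total similarity reduces to sim_UB alone, and with λ = 0 it reduces to sim_IC alone, so λ controls how much each component contributes.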
4. The recommendation algorithm for fusing item popularity and expert coefficient according to claim 1, wherein the step 4 specifically comprises:
step 4.1, for the target user u′, selecting the N users with the highest similarity to the target user u′ from the user similarity matrix to form the candidate set U_neigh for target user score prediction;
step 4.2, deleting the items already evaluated by the target user u′ from the items evaluated by the users in the candidate set U_neigh, wherein the remaining items form the item recommendation candidate set of the target user; the similarity-weighted average score of each item in the candidate set is taken as the predicted score of the target user u′ for that item, and the calculation formula is as follows:
Figure FDA0003835478060000036
wherein u_1 is any user in the candidate set U_neigh, sim_UBICF(u, u_1) is the total user similarity between the target user u′ and user u_1,
Figure FDA0003835478060000037
represents the score of user u_1 for item i, N(U_neigh) represents the number of users in the candidate set U_neigh who have evaluated item i, and p′_{u′,i} represents the predicted score of the target user u′ for item i;
step 4.3, sorting the items in the item candidate set according to the target user's predicted scores, and recommending the M items with the highest predicted scores to the target user to obtain the item recommendation result.
CN201910128705.4A 2019-02-21 2019-02-21 Recommendation algorithm fusing project popularity and expert coefficient Active CN109977299B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910128705.4A CN109977299B (en) 2019-02-21 2019-02-21 Recommendation algorithm fusing project popularity and expert coefficient

Publications (2)

Publication Number Publication Date
CN109977299A CN109977299A (en) 2019-07-05
CN109977299B true CN109977299B (en) 2022-12-27

Family

ID=67077170

Country Status (1)

Country Link
CN (1) CN109977299B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490686A (en) * 2019-07-08 2019-11-22 西北大学 A kind of building of commodity Rating Model, recommended method and system based on Time Perception
CN110910215A (en) * 2019-11-20 2020-03-24 深圳前海微众银行股份有限公司 Product recommendation method, device, equipment and computer-readable storage medium
CN111191707B (en) * 2019-12-25 2023-06-06 浙江工商大学 LFM training sample construction method integrating time attenuation factors
CN111486345B (en) * 2020-03-10 2021-08-24 安徽科杰粮保仓储设备有限公司 Grain depot underground pipe network liquid leakage on-line monitoring and early warning method and device
CN113497831B (en) * 2021-06-30 2022-10-25 西安交通大学 Content placement method and system based on feedback popularity under mobile edge network

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102479202A (en) * 2010-11-26 2012-05-30 卓望数码技术(深圳)有限公司 Recommendation system based on domain expert
CN104317900A (en) * 2014-10-24 2015-01-28 重庆邮电大学 Multiattribute collaborative filtering recommendation method oriented to social network
CN105868237A (en) * 2015-12-09 2016-08-17 乐视网信息技术(北京)股份有限公司 Multimedia data recommendation method and server
CN106021329A (en) * 2016-05-06 2016-10-12 西安电子科技大学 A user similarity-based sparse data collaborative filtering recommendation method
CN108205682A (en) * 2016-12-19 2018-06-26 同济大学 It is a kind of for the fusion content of personalized recommendation and the collaborative filtering method of behavior
CN108647724A (en) * 2018-05-11 2018-10-12 国网电子商务有限公司 A kind of user's recommendation method and device based on simulated annealing
CN108804683A (en) * 2018-06-13 2018-11-13 重庆理工大学 Associate(d) matrix decomposes and the film of collaborative filtering recommends method
CN109166017A (en) * 2018-10-12 2019-01-08 平安科技(深圳)有限公司 Method for pushing, device, computer equipment and storage medium based on reunion class
CN109360057A (en) * 2018-10-12 2019-02-19 平安科技(深圳)有限公司 Information-pushing method, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10635733B2 (en) * 2017-05-05 2020-04-28 Microsoft Technology Licensing, Llc Personalized user-categorized recommendations

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Improved personalized recommendation based on user attributes clustering and score matrix filling;U Liji 等;《Computer Standards & Interfaces》;20171114;第57卷;59-67 *
基于hadoop的改进聚类协同过滤推荐算法研究;黎安能;《中国优秀硕士学位论文全文数据库信息科技辑》;20160615(第06期);I138-1516 *
基于用户评分和项目类偏好的协同过滤推荐算法;王宇飞 等;《软件导刊》;20161229;第15卷(第12期);25-29 *
结合用户背景信息的协同过滤推荐算法;吴一帆 等;《计算机应用》;20081101(第11期);2972-2974 *
高校学生就业推荐算法研究及应用;薛妍;《中国优秀硕士学位论文全文数据库信息科技辑》;20200415(第04期);I138-574 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant