CN110569374A - movie recommendation method based on improved collaborative filtering algorithm - Google Patents


Info

Publication number
CN110569374A
Authority
CN
China
Prior art keywords
movie
similarity
recommendation
user
collaborative filtering
Prior art date
Legal status
Granted
Application number
CN201910771272.4A
Other languages
Chinese (zh)
Other versions
CN110569374B (en)
Inventor
安俊秀
刘明月
靳宇倡
陈涛
孙琛恺
Current Assignee
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority to CN201910771272.4A
Publication of CN110569374A
Application granted
Publication of CN110569374B
Expired - Fee Related (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
              • G06F16/43 Querying
                • G06F16/435 Filtering based on additional data, e.g. user or group profiles
                • G06F16/438 Presentation of query results
              • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a movie recommendation method based on an improved collaborative filtering algorithm and belongs to the technical field of multimedia information processing. First, the categories and application scenarios of collaborative filtering algorithms are studied. For the movie domain, the item-based collaborative filtering algorithm (ItemCF) and a collaborative filtering algorithm based on the ALS model are selected. To address the item cold-start problem for movies, the item-based collaborative filtering algorithm is improved. To address the low recommendation precision of the ALS model, a hybrid model is proposed, which improves the accuracy of the recommendation algorithm while mitigating the effects of data sparsity.

Description

Movie recommendation method based on improved collaborative filtering algorithm
Technical Field
The invention belongs to the technical field of multimedia information processing, and particularly relates to a movie recommendation method based on an improved collaborative filtering algorithm.
Background Art
As the pace of life accelerates, people's incomes rise, but so does their daily stress. Movies are popular as a form of recreation, and the film industry is flourishing. New movies of every genre keep appearing. This produces a severe long-tail effect in recommendation systems, and the sheer volume of information degrades the user experience.
To address information overload, researchers have successively proposed two approaches: information retrieval and information filtering. The most representative information retrieval technology is the search engine, such as the well-known Baidu search and *** search. However, search engines have a serious drawback: they cannot solve the user's problem directly. The user must state a clear need, and the choice of search keywords is demanding. That choice directly affects both search efficiency and the accuracy of the results. When users' needs are vague or cannot be expressed correctly, they may search with inappropriate keywords and get results they do not want, which implicitly raises the bar for using the tool. To make up for these shortcomings, information filtering technology has attracted wide attention, and the recommendation system is a typical form of information filtering. Its aim is to push valuable information to users who have no clearly defined target, so that they can find content of interest more efficiently.
The core idea of a recommendation system is to connect users with information. By analyzing users' historical behavior, the system builds a preference model for each user. It then uses this model to infer the user's interests automatically and to guide the user's information discovery, so that personalized recommendations can be generated for different users. This improves user satisfaction and greatly reduces search cost. A recommendation system can also better surface long-tail items, helping users find items that interest them but would otherwise be hard to find.
The cold-start problem arises when the system has too few users, or too little user behavior data, to predict user interests. It is generally divided into user cold start and item cold start. The most critical part of a collaborative filtering recommendation algorithm is the similarity measure, and the traditional ItemCF algorithm evaluates the similarity between items by analyzing historical rating data. When a new movie is released, no user has watched it yet, so the movie recommendation system's logs contain no rating records for it. If only rating information is used, no movies similar to the new one can be found from users' behavior histories. The new movie therefore cannot be recommended to users who might be interested in it; this is the item cold-start problem.
In practice, each user watches only a limited number of movies, so there is no guarantee that every user rates every movie. In most cases only a small number of users have rated any given movie, which makes the user-item rating matrix sparse. The collaborative filtering recommendation algorithm based on the ALS model uses matrix factorization to fill in the rating matrix and make it dense, which alleviates the effect of sparse rating data to some extent. However, because many of the filled-in scores are model estimates rather than actual ratings, the accuracy of the recommendation results is low.
With the rapid development of the Internet, data is growing explosively. Recommendation systems are a main means of information filtering. Keeping them efficient, fast, and stable in a big-data environment, and keeping recommendation algorithms scalable to large-scale data, are urgent problems. Such problems are usually addressed by deeply optimizing the recommendation algorithm. However, the optimized algorithm is still limited to a single-machine environment and cannot overcome that environment's storage and computation bottlenecks.
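Below is a minimal pure-Python sketch of alternating least squares (ALS) on a toy rating dictionary, showing how matrix factorization can fill a sparse rating matrix. It is not the patent's implementation, which runs on Spark. The function names `als`, `predict`, and `train_rmse`, the rank, the regularization weight `reg`, and the toy ratings are all illustrative assumptions.

```python
import random

def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented copy, so a and b stay intact
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def als(ratings, n_users, n_items, rank=2, reg=0.1, iters=100, seed=0):
    """Factor the observed ratings {(user, item): score} into user factors P and item factors Q."""
    rnd = random.Random(seed)
    P = [[rnd.random() for _ in range(rank)] for _ in range(n_users)]
    Q = [[rnd.random() for _ in range(rank)] for _ in range(n_items)]
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for (u, i), r in ratings.items():
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    def update(fixed, groups, target):
        # For each row, solve the ridge regression (F^T F + reg*I) x = F^T r
        # over that row's observed ratings, with the other factor matrix held fixed.
        for idx, obs in groups.items():
            a = [[reg if p == q else 0.0 for q in range(rank)] for p in range(rank)]
            b = [0.0] * rank
            for j, r in obs:
                f = fixed[j]
                for p in range(rank):
                    b[p] += f[p] * r
                    for q in range(rank):
                        a[p][q] += f[p] * f[q]
            target[idx] = solve(a, b)

    for _ in range(iters):
        update(Q, by_user, P)  # fix item factors, solve user factors
        update(P, by_item, Q)  # fix user factors, solve item factors
    return P, Q

def predict(P, Q, u, i):
    """Estimated rating: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def train_rmse(P, Q, ratings):
    """Root-mean-square error on the observed ratings only."""
    se = sum((predict(P, Q, u, i) - r) ** 2 for (u, i), r in ratings.items())
    return (se / len(ratings)) ** 0.5

# Hypothetical toy data: 3 users x 3 movies, 6 of the 9 cells observed.
demo_ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 1.0, (2, 1): 2.0, (2, 2): 5.0}
P, Q = als(demo_ratings, n_users=3, n_items=3)
filled = [[predict(P, Q, u, i) for i in range(3)] for u in range(3)]
```

Each half-step holds one factor matrix fixed and solves a small ridge-regression system per user or per movie, which is why the method is called alternating least squares. `filled` then holds an estimate for every user-movie cell, including the unrated ones. As the text notes, these are modeled values, not real ratings.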
Disclosure of Invention
To address the cold-start problem that arises when the system has too few users or too little user behavior data to predict user interests, the invention provides a movie recommendation method based on an improved collaborative filtering algorithm.
In the disclosed method, the improved collaborative filtering algorithm builds on item-based collaborative filtering. Three representative movie attributes (genre, director, and actors) are selected and incorporated into the computation of item similarity. The resulting algorithm, the collaborative filtering algorithm fused with movie attributes, comprises the following steps:
(1) Construct a user-movie rating matrix from users' ratings of movies;
(2) Evaluate the similarity between movies from the rating matrix to obtain the rating similarity;
(3) From the movie attribute information, construct a movie-genre matrix to compute the genre similarity between movies, compute the director similarity and the actor similarity, and finally obtain the movie attribute similarity;
(4) Weight the rating similarity and the attribute similarity by the adjustment factors to obtain the similarity between movies;
(5) From the movie similarities, take the k movies most similar to the target movie as its nearest neighbors;
(6) Predict the user's rating of the target movie from the neighbor set, and sort the predicted ratings from high to low to form the final movie recommendation list presented to the user.
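Below is a minimal sketch of steps (1), (2), (5), and (6). The text does not specify the rating-similarity measure or the prediction formula. This sketch therefore assumes cosine similarity over rating columns and a similarity-weighted average of the user's ratings on the target's neighbors, both common ItemCF choices. All function names and the toy data are hypothetical. Any similarity function, such as the fused one from step (4), can be passed as `sim`.

```python
import math

def rating_similarity(ratings, i, j):
    """Assumed measure: cosine similarity of the rating columns of movies i and j.
    ratings: {user: {movie: score}}, i.e. the sparse user-movie rating matrix of step (1)."""
    dot = sum(r[i] * r[j] for r in ratings.values() if i in r and j in r)
    ni = math.sqrt(sum(r[i] ** 2 for r in ratings.values() if i in r))
    nj = math.sqrt(sum(r[j] ** 2 for r in ratings.values() if j in r))
    return dot / (ni * nj) if ni and nj else 0.0

def top_k_neighbors(sim, target, movies, k):
    """Step (5): the k movies most similar to the target."""
    others = [m for m in movies if m != target]
    return sorted(others, key=lambda m: sim(target, m), reverse=True)[:k]

def predict_rating(ratings, sim, user, target, movies, k):
    """Step (6): similarity-weighted average of the user's ratings on the target's neighbors."""
    num = den = 0.0
    for m in top_k_neighbors(sim, target, movies, k):
        if m in ratings[user]:
            s = sim(target, m)
            num += s * ratings[user][m]
            den += abs(s)
    return num / den if den else 0.0

def recommend(ratings, sim, user, movies, k, n):
    """Rank the user's unrated movies by predicted rating, highest first, and keep the top n."""
    unseen = [m for m in movies if m not in ratings[user]]
    scored = [(predict_rating(ratings, sim, user, m, movies, k), m) for m in unseen]
    return [m for _, m in sorted(scored, reverse=True)[:n]]

# Hypothetical toy data.
demo_ratings = {"u1": {"A": 5, "B": 4}, "u2": {"A": 4, "B": 5, "C": 2}, "u3": {"B": 1, "C": 5}}
demo_movies = ["A", "B", "C"]

def sim(i, j):
    return rating_similarity(demo_ratings, i, j)
```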
Each movie has attribute information such as director, actors, and genre. For fans who follow celebrities, the presence of a particular director or actor deepens their preference for a movie, and because tastes differ, many users prefer particular genres. Therefore, building on item-based collaborative filtering, three representative attributes (genre, director, and actors) are incorporated into the item similarity computation. The rating similarity and attribute similarity of two movies are then weighted to give their final similarity:

$$\mathrm{sim}(i,j) = \alpha\,\mathrm{sim}_{rat}(i,j) + \beta\,\mathrm{sim}_{attr}(i,j)$$

where $\alpha$ and $\beta$ are the weighting factors for the rating similarity and the movie attribute similarity, and $\alpha + \beta = 1$. When a new movie is added to the user-movie matrix, $\alpha$ is set to 0, so its similarity to other movies is computed entirely from movie attributes. This solves the item cold-start problem.
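Below is a minimal sketch of the weighted similarity and its cold-start rule, assuming the two component similarities are already computed. The function name, the `is_new_movie` flag, and the example value α = 0.6 are illustrative assumptions; the text does not give a default α.

```python
def fused_similarity(sim_rat, sim_attr, alpha, is_new_movie=False):
    """sim(i, j) = alpha * sim_rat(i, j) + beta * sim_attr(i, j), with beta = 1 - alpha.
    For a newly added movie with no ratings, alpha is forced to 0, so only attributes count."""
    if is_new_movie:
        alpha = 0.0
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * sim_rat + beta * sim_attr

# A rated movie pair blends both signals. A new movie falls back to attribute similarity alone.
warm = fused_similarity(0.8, 0.5, alpha=0.6)
cold = fused_similarity(0.0, 0.5, alpha=0.6, is_new_movie=True)
```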
Preferably, in step (3), the genre similarity is computed as follows. First, a movie-genre matrix is constructed, in which an entry is 1 if the movie belongs to the genre and 0 otherwise. Because each movie's genres are known, every entry is exactly 1 or 0. The genre similarity is

$$\mathrm{sim}_{type}(i,j) = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}$$

where the numerator $|N(i) \cap N(j)|$ is the number of genres shared by movies $i$ and $j$, and the denominator is the square root of the product of the number of genres of movie $i$ and the number of genres of movie $j$.
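Below is a minimal sketch that builds 0/1 rows of the movie-genre matrix and applies the genre-similarity formula. The genre list, movie names, and helper names are hypothetical.

```python
import math

def movie_genre_matrix(movie_genres, all_genres):
    """One 0/1 row per movie: 1 if the movie belongs to the genre, else 0."""
    return {m: [1 if g in gs else 0 for g in all_genres] for m, gs in movie_genres.items()}

def genre_similarity(row_i, row_j):
    """|N(i) ∩ N(j)| / sqrt(|N(i)| * |N(j)|) computed from two 0/1 rows."""
    shared = sum(a & b for a, b in zip(row_i, row_j))  # genres both movies belong to
    n_i, n_j = sum(row_i), sum(row_j)
    return shared / math.sqrt(n_i * n_j) if n_i and n_j else 0.0

GENRES = ["Action", "Animation", "Children", "Comedy", "Fantasy"]
demo = movie_genre_matrix(
    {"Movie A": {"Animation", "Children", "Comedy"}, "Movie B": {"Action", "Children", "Fantasy"}},
    GENRES,
)
```

Here the two movies share one of three genres each, so the similarity is 1 / sqrt(3 × 3) = 1/3.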
Director similarity: as a director makes more movies over time, the similarity between any two of that director's movies decreases. Taking this into account, the director similarity is computed by the following formula:
where $x$ is the number of movies the director has participated in, and $\gamma$ is an influence factor that weakens the director similarity as the number of movies grows.
Actor similarity: the actor feature of a movie is formed from its three lead actors. A movie's success or failure is closely tied to its actors' performances, and a movie easily attracts the attention of an actor's fans because audiences like that actor. Two movies featuring the same actor therefore have some similarity. However, a movie has many actors but only a few leads, so the three lead actors are selected to form the movie's actor feature. The similarity between actor features is measured with the Jaccard formula:

$$\mathrm{sim}_{actor}(i,j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|}$$

where $U_i$ and $U_j$ are the actor sets of movies $i$ and $j$.
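Below is a minimal sketch of the Jaccard actor similarity. It assumes each movie's cast list is ordered by billing, so the first three entries are the lead actors. The actor names are placeholders.

```python
def actor_similarity(actors_i, actors_j, top=3):
    """Jaccard similarity |U_i ∩ U_j| / |U_i ∪ U_j| of the lead-actor sets.
    Only the first `top` billed actors of each movie are kept (assumed billing order)."""
    u_i, u_j = set(actors_i[:top]), set(actors_j[:top])
    union = u_i | u_j
    return len(u_i & u_j) / len(union) if union else 0.0

# One shared lead out of five distinct leads gives a similarity of 1/5.
shared_lead = actor_similarity(["A1", "A2", "A3"], ["A1", "B2", "B3"])
# A4 is fourth-billed in the first movie, so it is not a lead and the overlap is ignored.
non_lead = actor_similarity(["A1", "A2", "A3", "A4"], ["A4"])
```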
After the genre, director, and actor similarities are obtained, the combined movie attribute similarity is defined as

$$\mathrm{sim}_{norm}(i,j) = \mathrm{sim}_{type}(i,j) + \mathrm{sim}_{dire}(i,j) + \mathrm{sim}_{actor}(i,j)$$

$\mathrm{sim}_{norm}(i,j)$ is then min-max normalized to obtain the final movie attribute similarity:

$$\mathrm{sim}_{attr}(i,j) = \frac{\mathrm{sim}_{norm}(i,j) - X_{min}}{X_{max} - X_{min}}$$

where $X_{max}$ and $X_{min}$ are the maximum and minimum values in the data set before normalization.
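Below is a minimal sketch of the sum-then-normalize step over all movie pairs. The director formula is not reproduced in this text, so `sim_dire` is taken as a precomputed input. Returning 0.0 when every pair has the same raw score (X_max = X_min) is an assumption, since the text does not cover that case. The pair values are hypothetical.

```python
def attribute_similarity(raw_scores):
    """raw_scores: {(i, j): (sim_type, sim_dire, sim_actor)} for every movie pair.
    sim_norm = sum of the three parts; sim_attr = (sim_norm - X_min) / (X_max - X_min)."""
    sim_norm = {pair: sum(parts) for pair, parts in raw_scores.items()}
    x_min, x_max = min(sim_norm.values()), max(sim_norm.values())
    span = x_max - x_min
    # Assumed convention: if all pairs tie, every normalized similarity is 0.
    return {pair: (v - x_min) / span if span else 0.0 for pair, v in sim_norm.items()}

demo = attribute_similarity({
    ("A", "B"): (1 / 3, 0.5, 0.2),
    ("A", "C"): (0.0, 0.0, 0.0),
    ("B", "C"): (1.0, 1.0, 1.0),
})
```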
Preferably, the improved collaborative filtering algorithm further mixes the collaborative filtering algorithm fused with movie attributes and the collaborative filtering algorithm based on the ALS model by linear weighting. The mixing model is

$$\delta \cdot \mathrm{AttrCF\_list} + (1-\delta) \cdot \mathrm{ALS\_list}$$

where $\delta$ is a weight value.
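Below is a minimal sketch of linear weighted mixing. It assumes each list is a mapping from movie to predicted score. A movie that appears in only one list contributes 0 from the other list; this is an assumption, because the text does not say how non-overlapping lists are merged. The function name and the scores are hypothetical.

```python
def blend(attrcf_scores, als_scores, delta, n):
    """Per-movie score delta * AttrCF + (1 - delta) * ALS; return the top-n movies.
    Movies missing from one list get 0 from that list (assumption)."""
    movies = set(attrcf_scores) | set(als_scores)
    mixed = {
        m: delta * attrcf_scores.get(m, 0.0) + (1 - delta) * als_scores.get(m, 0.0)
        for m in movies
    }
    # Highest mixed score first; ties broken alphabetically for a stable order.
    return sorted(mixed, key=lambda m: (-mixed[m], m))[:n]

attr_list = {"A": 4.5, "B": 3.0}
als_list = {"B": 4.8, "C": 4.0}
```

With δ = 0.6 the mixed scores are A = 2.7, B = 1.8 + 1.92 = 3.72, and C = 1.6. B ranks first because both algorithms recommend it.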
Users perform five main feedback operations on movies: browsing, favoriting, playing, complaining (posting negative comments), and rating. Different operations reflect different degrees of user attention to a movie, so each operation is given a different score. The score of each recommendation algorithm according to the user's feedback operations is

$$p_{uI} = \sum_{i \in I} p_{ui}$$

where $u$ is the user, $I$ is the recommendation list produced by the algorithm, $i$ is an item in the list, and $p_{ui}$ is the operation score of user $u$ on item $i$.
After the feedback score of each recommendation algorithm is obtained, the user's recognition degree of each algorithm is computed as follows:
where $u$ is the user, $I$ is the recommended content set of a given type, $i$ is an item in that set, $p_{ui}$ is the operation score of user $u$ on item $i$, $n$ is the number of feedback operations, and $C_I$ is the length of the recommended content set.
In this formula, when $p_{uI} = 0$, $T_{uI} = 0$; the user's recognition degree of each recommendation algorithm is then represented by the proportion of behavior operations in the recommendation list.
The recognition degree gives each recommendation algorithm's share of the total across all recommendation types. The final weight of each recommendation algorithm is obtained by the following formula:
The idea of the hybrid model is as follows. The collaborative filtering algorithm fused with movie attributes and the collaborative filtering algorithm based on the ALS model each produce a recommendation list. A weight is computed from users' feedback operations, and the two algorithms' results are combined by weighted averaging to produce the final recommendation result. This improves the accuracy of the recommendation model while mitigating data sparsity and yields better personalized recommendations.
The invention further provides the use of the improved collaborative filtering algorithm in a movie recommendation system.
Compared with the prior art, the technical solution of this application has the following beneficial effects:
(1) Three representative movie attributes (genre, director, and actors) are selected and incorporated into the item similarity computation. The rating similarity and attribute similarity are weighted to give the final similarity between any two movies, defined as $\mathrm{sim}(i,j) = \alpha\,\mathrm{sim}_{rat}(i,j) + \beta\,\mathrm{sim}_{attr}(i,j)$, where $\alpha$ and $\beta$ are the weighting factors for the rating similarity and the movie attribute similarity, and $\alpha + \beta = 1$.
(2) The categories and application scenarios of collaborative filtering algorithms are studied. For movie-domain scenarios, the item-based collaborative filtering algorithm and the collaborative filtering algorithm based on the ALS (Alternating Least Squares) model are selected. The item-based algorithm is improved to address the movie cold-start problem. To address the low recommendation precision of the ALS model, a hybrid model is designed, which improves the accuracy of the recommendation algorithm while mitigating the effects of data sparsity.
(3) Traditional recommendation algorithms run on a single machine. To overcome this limitation, the recommendation algorithm is combined with a big-data platform and the collaborative filtering algorithm is parallelized on Spark. This effectively addresses the scalability of collaborative filtering recommendation in a big-data environment and improves the algorithm's ability to process massive data.
(4) Using the improved collaborative filtering algorithm as a prototype, a movie recommendation system is designed and implemented on the Spark platform. It comprises three submodules: log collection, a recommendation engine, and user interaction. Log collection gathers user behavior in the system. The recommendation engine has an offline recommendation part and a real-time recommendation part. User interaction provides functions such as login and registration, a personal center, and rating and favoriting movies.
(5) Comparative experiments on the widely used MovieLens movie data set in a Spark cluster environment verify the effectiveness of the improved collaborative filtering algorithm, and the experimental results are analyzed and summarized.
Description of Drawings
FIG. 1 is a detailed flowchart of the collaborative filtering algorithm fused with movie attributes according to the present invention;
FIG. 2 is a flowchart of the hybrid model of the present invention;
FIG. 3 compares the running times on the Hadoop and Spark platforms according to the present invention;
FIG. 4 compares the speedup (acceleration ratio) for different data sizes according to the present invention;
FIG. 5 shows the precision, recall, and coverage of ItemCF at different values of k according to the present invention;
FIG. 6 is a bar chart of the RMSE values of the model of the present invention under different parameters.
Detailed Description
To help those skilled in the art better understand the embodiments of the present invention, the invention is described in further detail below with reference to specific examples.
Example 1
As shown in FIG. 1, in the movie recommendation method based on an improved collaborative filtering algorithm, three representative movie attributes (genre, director, and actors) are selected and incorporated into the item similarity computation of item-based collaborative filtering. The result is the collaborative filtering algorithm fused with movie attributes. The method comprises the following steps:
(1) Construct a user-movie rating matrix from users' ratings of movies;
(2) Evaluate the similarity between movies from the rating matrix to obtain the rating similarity;
(3) From the movie attribute information, construct a movie-genre matrix to compute the genre similarity between movies, compute the director similarity and the actor similarity, and finally obtain the movie attribute similarity;
(4) Weight the rating similarity and the attribute similarity by the adjustment factors to obtain the similarity between movies;
(5) From the movie similarities, take the k movies most similar to the target movie as its nearest neighbors;
(6) Predict the user's rating of the target movie from the neighbor set, and sort the predicted ratings from high to low to form the final movie recommendation list presented to the user.
Each movie has attribute information such as director, actors, and genre. For fans who follow celebrities, the presence of a particular director or actor deepens their preference for a movie, and because tastes differ, many users prefer particular genres. Building on item-based collaborative filtering, the three attributes (genre, director, and actors) are incorporated into the item similarity computation. The rating similarity and attribute similarity are then weighted to give the final similarity between any two movies:

$$\mathrm{sim}(i,j) = \alpha\,\mathrm{sim}_{rat}(i,j) + \beta\,\mathrm{sim}_{attr}(i,j)$$

where $\alpha$ and $\beta$ are the weighting factors for the rating similarity and the movie attribute similarity, and $\alpha + \beta = 1$. When a new movie is added to the user-movie matrix, $\alpha = 0$, so the similarity between movies is computed from their attributes alone. This solves the item cold-start problem.
Example 2
As shown in FIG. 1, the definition and computation of movie attribute similarity are explained in detail in three parts: genre similarity, director similarity, and actor similarity.
To compute the genre similarity, a movie-genre matrix is first constructed. An entry is 1 if the movie belongs to the genre and 0 otherwise, as shown in Table 1.
Table 1 Movie-genre matrix
Because each movie's genres are known, every entry in the movie-genre matrix is exactly 1 or 0, and the genre similarity is

$$\mathrm{sim}_{type}(i,j) = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}$$

where the numerator $|N(i) \cap N(j)|$ is the number of genres shared by movies $i$ and $j$, and the denominator is the square root of the product of the number of genres of movie $i$ and the number of genres of movie $j$.
Director similarity: as a director makes more movies over time, the similarity between any two of that director's movies decreases. Taking this into account, the director similarity is computed by the following formula:
where $x$ is the number of movies the director has participated in, and $\gamma$ is an influence factor that weakens the director similarity as the number of movies grows.
Actor similarity: a movie's success is closely tied to its actors' skill, and a movie easily attracts an actor's fans because audiences like that actor. Two movies featuring the same actor therefore have some similarity. However, a movie has many actors but only a few leads, so the three lead actors are selected to form the movie's actor feature. The similarity between actor features is measured with the Jaccard formula:

$$\mathrm{sim}_{actor}(i,j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|}$$

where $U_i$ and $U_j$ are the actor sets of movies $i$ and $j$.
After the genre, director, and actor similarities are obtained, the combined movie attribute similarity is defined as

$$\mathrm{sim}_{norm}(i,j) = \mathrm{sim}_{type}(i,j) + \mathrm{sim}_{dire}(i,j) + \mathrm{sim}_{actor}(i,j)$$

$\mathrm{sim}_{norm}(i,j)$ is then min-max normalized to obtain the final movie attribute similarity:

$$\mathrm{sim}_{attr}(i,j) = \frac{\mathrm{sim}_{norm}(i,j) - X_{min}}{X_{max} - X_{min}}$$

where $X_{max}$ and $X_{min}$ are the maximum and minimum values in the data set before normalization.
Example 3
As shown in FIG. 2, the invention adopts a linear weighted mixing strategy. It mixes the collaborative filtering algorithm fused with movie attributes with the collaborative filtering algorithm based on the ALS model, improving the accuracy of the recommendation model while mitigating data sparsity and achieving better personalized recommendations. The mixing model is

$$\delta \cdot \mathrm{AttrCF\_list} + (1-\delta) \cdot \mathrm{ALS\_list}$$

where $\delta$ is a weight value.
Users perform five main feedback operations on movies: browsing, favoriting, playing, complaining, and rating. Browsing, favoriting, playing, and complaining reflect the user's preference for a movie indirectly and are treated as implicit feedback. Rating directly reflects the user's preference for a movie and is treated as explicit feedback, which is used to compute the similarity between movies; δ is therefore determined mainly by implicit feedback. Different operations indicate different degrees of interest. For example, a user who only browses a movie without any other feedback is not very interested in it, whereas a user who plays a movie after browsing its details has some interest in it. To account for this, each feedback operation is given a different score. The score of each recommendation algorithm according to the user's feedback operations is

$$p_{uI} = \sum_{i \in I} p_{ui}$$

where $u$ is the user, $I$ is the recommendation list produced by the algorithm, $i$ is an item in the list, and $p_{ui}$ is the operation score of user $u$ on item $i$.
Suppose the scores of favoriting, playing, browsing, and complaining are $\omega_1$, $\omega_2$, $\omega_3$, and $\omega_4$, respectively. According to their importance, the operations are scored $\omega_1 = 2$, $\omega_2 = 1$, $\omega_3 = -1$, and $\omega_4 = -2$. Take a recommendation list of 5 items as an example; Table 2 shows the feedback statistics of each recommendation algorithm and the score of each user operation for the current month. For the collaborative filtering algorithm fused with movie attributes, the user favorited twice, played twice, browsed once, and complained once. For the collaborative filtering algorithm based on the ALS model, the user favorited once, played twice, and complained once. The monthly feedback score of the attribute-fused algorithm is therefore $2\omega_1 + 2\omega_2 + \omega_3 + \omega_4 = 4 + 2 - 1 - 2 = 3$, and that of the ALS-based algorithm is $\omega_1 + 2\omega_2 + \omega_4 = 2 + 2 - 2 = 2$.
Table 2 Feedback operation statistics and operation scores
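Below is a minimal sketch that reproduces the feedback-score arithmetic of the worked example, with ω1 to ω4 as given above. The recognition-degree formula T_uI is not reproduced in this text, so `proportional_delta` is a hypothetical stand-in. It sets δ to the attribute-fused algorithm's share of the two non-negative feedback scores; it is not the patent's weight formula.

```python
# Operation scores omega_1..omega_4 from the worked example. Rating is explicit feedback and is not scored here.
OPERATION_SCORES = {"favorite": 2, "play": 1, "browse": -1, "complain": -2}

def feedback_score(op_counts):
    """p_uI: sum of the operation scores over one algorithm's recommendation list,
    given how many times each operation occurred."""
    return sum(OPERATION_SCORES[op] * count for op, count in op_counts.items())

def proportional_delta(score_attrcf, score_als):
    """HYPOTHETICAL stand-in for the recognition-degree weighting (T_uI not reproduced):
    delta = AttrCF's share of the scores clamped at 0; 0.5 if both are non-positive."""
    a, b = max(score_attrcf, 0), max(score_als, 0)
    return a / (a + b) if a + b else 0.5

attrcf_score = feedback_score({"favorite": 2, "play": 2, "browse": 1, "complain": 1})
als_score = feedback_score({"favorite": 1, "play": 2, "complain": 1})
delta = proportional_delta(attrcf_score, als_score)
```

`attrcf_score` and `als_score` come out to 3 and 2, matching the text. Under the stand-in weighting, δ = 3 / (3 + 2) = 0.6.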
After the feedback score of each recommendation algorithm is obtained, the user's recognition degree of each algorithm is computed as follows:
where $u$ is the user, $I$ is the recommended content set of a given type, $i$ is an item in that set, $p_{ui}$ is the operation score of user $u$ on item $i$, $n$ is the number of feedback operations, and $C_I$ is the length of the recommended content set.
In this formula, when $p_{uI} = 0$, $T_{uI} = 0$; the user's recognition degree of each recommendation algorithm is then represented by the proportion of behavior operations in the recommendation list.
The recognition degree gives each recommendation algorithm's share of the total across all recommendation types. The final weight of each recommendation algorithm is obtained by the following formula:
The idea of the mixed model is that a collaborative filtering algorithm fusing film attributes and a collaborative filtering algorithm based on an ALS model respectively obtain a recommendation list, a weight value is obtained through feedback operation calculation, the results obtained by the two algorithms are weighted and averaged to calculate a final recommendation result, the accuracy of the recommendation model is improved on the basis of relieving data sparseness, and better individual recommendation effect is achievedAnd (5) fruit.
the application of the improved collaborative filtering based algorithm in a movie recommendation system is provided.
experimental example 1
as shown in fig. 3 and 4, the basic performance analysis of Spark platform comprises two sets of experiments: a first group: adopting different data volumes and respectively operating time on a Hadoop platform and a Spark platform; second group: and under the condition that the configuration environment of each child node is consistent, the acceleration ratio of the same data in different slave node numbers is ensured.
A first group: equally dividing 100 ten thousand scoring records in a data set into 10 parts on average, selecting 2 parts, 3 parts, 4 parts and 5 parts of the 10 ten thousand scoring records as the multiples of reference data each time by taking the 10 ten thousand scoring records as the reference, namely selecting experimental data with different scales of 10W-50W, respectively realizing an article-based collaborative filtering algorithm on a Hadoop cluster and a Spark cluster with the number of nodes of 4, increasing the scoring records each time to enable the data scale to be in an increasing state, wherein the running time is as shown in FIG. 3, and it can be seen from FIG. 3 that the computation efficiency of Spark is higher than that of Hadoop on platform on the data with different scales, in the experiment, the running time of the Hadoop platform is basically 3 times that of the Spark platform, and the time increase of the Hadoop platform is larger than that of the Spark platform along with the increase of the data amount, mainly because the intermediate result of Spark can be cached, the network transmission of the data between the nodes is reduced, hadoop only provides two operators, namely map and reduce, and the reduce task can be performed after the map task is completed every time, so that the calculation efficiency is low.
Second group: to verify how the number of nodes affects computational efficiency, the speedup ratio is used to compare the running efficiency of the algorithm on the same data as nodes are added, with every node configured identically. The running time of a WordCount program is measured on the Spark platform with different numbers of nodes for data sets of 100K and 20M scale, and the speedup ratios derived from these running times are shown in FIG. 4. As FIG. 4 shows, the speedup ratio rises as the number of nodes increases, and the improvement is more pronounced for the 20M data set than for the 100K data set. Hence, as the data scale grows, adding cluster nodes can improve the computational efficiency of the algorithm.
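The text does not spell out how the speedup ratio is computed. A common definition, assumed here, is S_n = T_base / T_n: the running time with the baseline number of nodes divided by the running time with n nodes. The following minimal Python sketch computes it; the function names and the timings are hypothetical placeholders, not the measurements behind FIG. 4.

```python
from typing import Dict


def speedup(baseline_time: float, parallel_time: float) -> float:
    """Speedup ratio S = T_baseline / T_n; values above 1 mean the larger cluster is faster."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return baseline_time / parallel_time


def speedup_table(times_by_nodes: Dict[int, float]) -> Dict[int, float]:
    """Speedup of each node count relative to the smallest node count measured."""
    base_time = times_by_nodes[min(times_by_nodes)]
    return {n: speedup(base_time, t) for n, t in sorted(times_by_nodes.items())}


if __name__ == "__main__":
    # HYPOTHETICAL timings in seconds for illustration only, not the FIG. 4 data.
    timings = {1: 120.0, 2: 70.0, 3: 50.0}
    for nodes, s in speedup_table(timings).items():
        print(f"{nodes} slave node(s): speedup {s:.2f}")
```

A speedup that grows more steeply for the larger data set, as reported for the 20M data, indicates that the fixed scheduling overhead is amortized better when there is more work per node.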
Experimental example 2
As shown in FIG. 5, optimal parameter selection for the item-based collaborative filtering algorithm (ItemCF): when generating a recommendation list, the ItemCF algorithm selects the k items most similar to the target item as its nearest neighbors, so the number of nearest neighbors affects the final recommendation quality. The accuracy, recall rate and coverage rate of ItemCF under different k values are calculated as shown in Table 3.
TABLE 3 ItemCF accuracy, recall, and coverage at different k values
As can be seen from Table 3 and FIG. 5, neither the accuracy nor the recall rate is linearly related to the k value. The recall rate stays relatively stable as k increases, without large fluctuations. The accuracy reaches its optimum at k = 10 and then decreases, although the overall decline is slow. The coverage rate changes most noticeably: it decreases clearly as k increases, and the difference between its highest and lowest values is about 13%. At k = 10, all three indices (accuracy, recall rate and coverage rate) are close to optimal.
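The three indices in Table 3 are not defined in this text. The sketch below assumes the definitions commonly used for top-N recommendation: "accuracy" is taken to mean precision (hits divided by the number of recommended items), recall is hits divided by the number of held-out test items, and coverage is the share of the catalogue that appears in at least one recommendation list. The function name `evaluate_top_n` is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def evaluate_top_n(recommended: Dict[str, Set[str]],
                   relevant: Dict[str, Set[str]],
                   all_items: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, coverage) of top-N lists against held-out test items."""
    hits = n_recommended = n_relevant = 0
    for user, truth in relevant.items():            # iterate over test users
        rec = recommended.get(user, set())
        hits += len(rec & truth)                    # recommended items the user actually chose
        n_recommended += len(rec)
        n_relevant += len(truth)
    catalogue = set(all_items)
    covered = set().union(*recommended.values()) & catalogue
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    coverage = len(covered) / len(catalogue) if catalogue else 0.0
    return precision, recall, coverage


if __name__ == "__main__":
    recs = {"u1": {"a", "b"}, "u2": {"c", "d"}}
    test = {"u1": {"a"}, "u2": {"c", "e"}}
    print(evaluate_top_n(recs, test, "abcdef"))
```

Under these definitions, coverage falls as k grows because larger neighborhoods pull recommendations toward the same popular movies, which is consistent with the trend reported for Table 3.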
Experimental example 3
As shown in FIG. 6, optimal parameter selection for the ALS model-based collaborative filtering algorithm: training the ALS model involves three important parameters, namely the number of latent features (rank), the regularization factor and the number of iterations. The data set is divided into a training set and a test set in the ratio 6:4, and the RMSE under different parameter combinations is calculated to measure model quality; the smaller the RMSE, the more accurate the model. The experiments are divided into four groups by the number of iterations, which takes the values [5, 10, 15, 20]; within each group the rank takes the values [10, 15, 20, 50] and the regularization factor takes the values [0.01, 0.05, 0.1]. The results are shown in Table 4, where numIters denotes the number of iterations, rank the number of latent features, and lambda the regularization factor.
TABLE 4 RMSE values of ALS model under different parameters
As can be seen from Table 4 and FIG. 6, in every group of experiments, when lambda is 0.01 the RMSE gradually increases as the number of iterations and the rank increase; when lambda is 0.1 the RMSE decreases as the number of iterations and the rank increase. The RMSE is smallest, and the model performs best, with 20 iterations, a rank of 50 and a regularization factor of 0.1.
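The parameter sweep above can be sketched in plain Python. This is a toy explicit-feedback ALS written for illustration, not Spark MLlib's implementation; it assumes the regularization term is scaled by each user's or item's rating count (the weighted-lambda scheme Spark also uses), and the function names are hypothetical. In Spark the corresponding settings are the `rank`, `maxIter` and `regParam` parameters of `pyspark.ml.recommendation.ALS`.

```python
import math
import random
from itertools import product
from typing import Dict, List, Tuple

Rating = Tuple[int, int, float]   # (user, item, rating)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def half_step(fixed: Dict[int, List[float]], groups: Dict[int, List[Tuple[int, float]]],
              rank: int, reg: float) -> Dict[int, List[float]]:
    """Re-solve every factor vector on one side while the other side is held fixed."""
    updated = {}
    for key, pairs in groups.items():
        a = [[0.0] * rank for _ in range(rank)]
        b = [0.0] * rank
        for other, rating in pairs:
            v = fixed[other]
            for p in range(rank):
                b[p] += rating * v[p]
                for q in range(rank):
                    a[p][q] += v[p] * v[q]
        for p in range(rank):
            a[p][p] += reg * len(pairs)   # lambda scaled by rating count (weighted-lambda)
        updated[key] = solve(a, b)
    return updated


def train_als(train: List[Rating], rank: int, reg: float, iters: int, seed: int = 0):
    by_user: Dict[int, List[Tuple[int, float]]] = {}
    by_item: Dict[int, List[Tuple[int, float]]] = {}
    for u, i, r in train:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    rng = random.Random(seed)
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users: Dict[int, List[float]] = {}
    for _ in range(iters):
        users = half_step(items, by_user, rank, reg)
        items = half_step(users, by_item, rank, reg)
    return users, items


def rmse(users, items, test: List[Rating]) -> float:
    """RMSE over test ratings whose user and item were both seen in training."""
    errors = [(r - sum(x * y for x, y in zip(users[u], items[i]))) ** 2
              for u, i, r in test if u in users and i in items]
    return math.sqrt(sum(errors) / len(errors)) if errors else float("nan")


def grid_search(ratings: List[Rating], iters_grid, rank_grid, reg_grid,
                train_frac: float = 0.6, seed: int = 0) -> Dict[Tuple[int, int, float], float]:
    data = list(ratings)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_frac)      # 6:4 train/test split by default
    train, test = data[:cut], data[cut:]
    return {(n, k, lam): rmse(*train_als(train, k, lam, n, seed), test)
            for n, k, lam in product(iters_grid, rank_grid, reg_grid)}


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy ratings, NOT MovieLens: 20 users x 15 items, about half observed.
    toy = [(u, i, float(rng.randint(1, 5)))
           for u in range(20) for i in range(15) if rng.random() < 0.5]
    scores = grid_search(toy, iters_grid=[5, 10], rank_grid=[2, 4], reg_grid=[0.01, 0.1])
    valid = {key: v for key, v in scores.items() if not math.isnan(v)}
    best = min(valid, key=valid.get)
    print("best (numIters, rank, lambda):", best, "RMSE = %.3f" % valid[best])
```

Test pairs whose user or item never occurs in the training split are skipped, which roughly mirrors Spark's `coldStartStrategy="drop"` option; on toy data the best combination is not expected to match Table 4.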
Experimental example 4
Determination of the weight alpha: alpha is the weighting adjustment factor between the item attribute similarity and the item rating similarity, and its value strongly influences the prediction result. To determine the optimal alpha, the accuracy, recall rate and RMSE are recorded for different values of alpha; the experimental results are shown in Table 5.
TABLE 5 Index values for different weighting factors
As shown in Table 5, the three indices are not linearly related to the adjustment factor alpha. When alpha is below 0.6, the accuracy and recall rate increase and the RMSE decreases as alpha grows; when alpha is above 0.6, the accuracy and recall rate decrease and the RMSE increases. All three indices are therefore optimal at alpha = 0.6.
Experimental example 5
The combined recommendation algorithm is compared with single recommendation algorithms: the item-based collaborative filtering algorithm, the recommendation algorithm fusing movie attributes, the ALS model-based collaborative filtering algorithm and the hybrid algorithm are compared on accuracy, recall rate and coverage rate. The experimental results are shown in Table 6; the item-based and the ALS model-based collaborative filtering algorithms both use their optimal parameters.
TABLE 6 Index values for different algorithms
As can be seen from Table 6, the ALS algorithm has a higher coverage rate than the other algorithms, but its accuracy and recall rate are extremely low. By mixing the algorithms according to user feedback information, the hybrid algorithm captures the user's interests more accurately and improves both accuracy and recall rate. Its coverage rate is 16% lower than that of the ALS model-based recommendation algorithm and 10% higher than that of the recommendation algorithm fusing movie attributes, so overall it stays within a relatively balanced range. The hybrid approach therefore alleviates the low accuracy of the ALS model.
Experimental examples 1 to 5 above all use the MovieLens dataset, which belongs to the prior art.
To run the experiments, a Spark cluster is built on a Hadoop platform. The cluster comprises 4 nodes: one Master node and three Slave nodes. Common Spark deployment modes include Spark on YARN, Standalone and Spark on Mesos; experimental examples 1 to 5 were all run in Standalone mode.
The above embodiments express only specific implementations of the present application. Although they are described in specific detail, they shall not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the method of the present application, and these all fall within the scope of the present application.

Claims (4)

1. A movie recommendation method based on an improved collaborative filtering algorithm, characterized in that: on the basis of the item-based collaborative filtering algorithm, three representative attributes of a movie, namely its type, director and actors, are selected and fused into the calculation of item similarity, giving a collaborative filtering algorithm fusing movie attributes, and the method comprises the following steps:
(1) Constructing a user-movie rating matrix from the users' ratings of movies;
(2) Evaluating the degree of similarity between movies from the rating matrix to obtain the rating similarity;
(3) Constructing a movie-type matrix from the movie attribute information to calculate the type similarity between movies, calculating the director similarity and the actor similarity, and finally obtaining the movie attribute similarity;
(4) Weighting the rating similarity and the attribute similarity by the adjustment factor to obtain the similarity between movies;
(5) Using the movie similarity information, taking the k movies most similar to the target movie as its nearest neighbors;
(6) Predicting the user's rating of the target movie from the resulting neighbor set, and sorting the predicted ratings from high to low to form the final movie recommendation result presented to the user.
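Steps (1) to (6) can be sketched as follows. This is a minimal illustration under several assumptions not stated in the claim: the rating similarity of step (2) is taken to be cosine similarity of the movies' rating columns, the adjustment factor alpha of step (4) is assumed to weight the rating similarity (with 1 - alpha on the attribute similarity), and step (6) uses the usual similarity-weighted average of the user's ratings of the neighbor movies. The attribute similarities are supplied as input, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]          # step (1): user -> {movie: rating}
AttrSim = Dict[Tuple[str, str], float]         # (movie_a, movie_b) -> attribute similarity


def rating_similarity(ratings: Ratings, i: str, j: str) -> float:
    """Step (2), assumed measure: cosine similarity of the rating columns of i and j."""
    dot = norm_i = norm_j = 0.0
    for user_ratings in ratings.values():
        r_i = user_ratings.get(i, 0.0)   # unrated entries count as 0
        r_j = user_ratings.get(j, 0.0)
        dot += r_i * r_j
        norm_i += r_i * r_i
        norm_j += r_j * r_j
    return dot / math.sqrt(norm_i * norm_j) if norm_i and norm_j else 0.0


def fused_similarity(ratings: Ratings, attr_sim: AttrSim, i: str, j: str,
                     alpha: float = 0.6) -> float:
    """Step (4), hypothetical weighting: alpha * rating sim + (1 - alpha) * attribute sim."""
    attr = attr_sim.get((i, j), attr_sim.get((j, i), 0.0))
    return alpha * rating_similarity(ratings, i, j) + (1 - alpha) * attr


def recommend(ratings: Ratings, attr_sim: AttrSim, user: str,
              k: int = 10, top_n: int = 10, alpha: float = 0.6) -> List[Tuple[str, float]]:
    """Steps (5)-(6): k nearest neighbors per unseen movie, weighted-average prediction."""
    movies = {m for user_ratings in ratings.values() for m in user_ratings}
    seen = ratings[user]
    predictions = []
    for target in movies - set(seen):
        # Step (5): the k movies most similar to the target movie.
        neighbors = sorted(
            ((fused_similarity(ratings, attr_sim, target, m, alpha), m)
             for m in movies if m != target),
            reverse=True)[:k]
        # Step (6): similarity-weighted average over neighbors the user has rated.
        num = sum(s * seen[m] for s, m in neighbors if m in seen)
        den = sum(abs(s) for s, m in neighbors if m in seen)
        if den > 0:
            predictions.append((num / den, target))
    predictions.sort(reverse=True)   # highest predicted rating first
    return [(movie, score) for score, movie in predictions[:top_n]]


if __name__ == "__main__":
    ratings = {"u1": {"A": 5, "B": 3},
               "u2": {"A": 4, "B": 2, "C": 5},
               "u3": {"B": 4, "C": 4}}
    attr = {("A", "C"): 0.8, ("B", "C"): 0.1}   # toy attribute similarities
    print(recommend(ratings, attr, "u1", k=2))
```

The default alpha = 0.6 follows the optimum reported in Table 5; recomputing similarities for every candidate is quadratic and is kept only for readability, whereas a real system would precompute the item-item similarity matrix.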
2. The movie recommendation method based on the improved collaborative filtering algorithm according to claim 1, wherein in step (3) the type similarity is calculated as follows, each entry of the movie-type matrix being either 1 or 0:
$$\mathrm{sim\_type}(i,j)=\frac{|N(i)\cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}$$
In the formula, the numerator |N(i) ∩ N(j)| is the number of types shared by movie i and movie j, and the denominator sqrt(|N(i)||N(j)|) is the square root of the product of the number of types of movie i and the number of types of movie j;
Movie director similarity: as time goes on, a director shoots more and more movies, and the similarity between those movies decreases; this effect is incorporated into the director similarity formula, as follows
Wherein x represents the number of movies the director has participated in, and γ represents the influence factor by which the number of movies weakens the director similarity;
Actor similarity: the three leading actors of a movie are selected as its actor feature, and the similarity between actor features is measured by the Jaccard formula:
$$\mathrm{sim\_actor}(i,j)=\frac{|U_i\cap U_j|}{|U_i\cup U_j|}$$
In the formula, U_i and U_j denote the sets of actors of movie i and movie j;
After the similarity results of the three attribute features (type, director and actors) are obtained, the movie attribute similarity is defined as
$$\mathrm{sim\_norm}(i,j)=\mathrm{sim\_type}(i,j)+\mathrm{sim\_dire}(i,j)+\mathrm{sim\_actor}(i,j)$$
sim_norm(i,j) is then normalized to obtain the final movie attribute similarity, denoted here sim_attr(i,j):
$$\mathrm{sim\_attr}(i,j)=\frac{\mathrm{sim\_norm}(i,j)-X_{\min}}{X_{\max}-X_{\min}}$$
In the formula, X_max and X_min are respectively the maximum and minimum values in the data set before normalization.
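The attribute similarity of claim 2 can be sketched as follows, assuming each movie is described by a set of types, a director and an ordered list of actors (first three used). The director formula with the influence factor γ is not reproduced in this text, so the sketch takes director similarity as a pluggable function and ships only a HYPOTHETICAL placeholder (1 for the same director, 0 otherwise); it is not the patent's formula. Min-max normalization is applied over all movie pairs, and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Dict, Set, Tuple


def type_similarity(types_i: Set[str], types_j: Set[str]) -> float:
    """sim_type: |N(i) & N(j)| / sqrt(|N(i)| * |N(j)|) on the 0/1 movie-type matrix."""
    if not types_i or not types_j:
        return 0.0
    return len(types_i & types_j) / math.sqrt(len(types_i) * len(types_j))


def actor_similarity(actors_i: Set[str], actors_j: Set[str]) -> float:
    """sim_actor: Jaccard similarity of the lead-actor sets U_i and U_j."""
    union = actors_i | actors_j
    return len(actors_i & actors_j) / len(union) if union else 0.0


def same_director(movie_i: dict, movie_j: dict) -> float:
    """HYPOTHETICAL placeholder for sim_dire; NOT the patent's gamma-weighted formula."""
    return 1.0 if movie_i["director"] == movie_j["director"] else 0.0


def attribute_similarity(movies: Dict[str, dict],
                         director_sim: Callable[[dict, dict], float] = same_director
                         ) -> Dict[Tuple[str, str], float]:
    """sim_norm = sim_type + sim_dire + sim_actor, then min-max normalized over all pairs."""
    raw = {}
    for i, j in combinations(sorted(movies), 2):
        mi, mj = movies[i], movies[j]
        raw[(i, j)] = (type_similarity(mi["types"], mj["types"])
                       + director_sim(mi, mj)
                       + actor_similarity(set(mi["actors"][:3]), set(mj["actors"][:3])))
    if not raw:
        return {}
    x_min, x_max = min(raw.values()), max(raw.values())
    span = x_max - x_min
    return {pair: (v - x_min) / span if span else 0.0 for pair, v in raw.items()}


if __name__ == "__main__":
    movies = {
        "m1": {"types": {"Crime", "Drama"}, "director": "d1", "actors": ["a", "b", "c"]},
        "m2": {"types": {"Drama"}, "director": "d1", "actors": ["b", "c", "d"]},
        "m3": {"types": {"Comedy"}, "director": "d2", "actors": ["e"]},
    }
    print(attribute_similarity(movies))
```

For this toy input, m1 and m2 share a type, a director and two of four distinct actors, so their pair receives the maximum normalized similarity of 1.0, while both pairs involving m3 receive 0.0.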
3. The movie recommendation method based on the improved collaborative filtering algorithm according to claim 1, wherein the method further comprises mixing the collaborative filtering algorithm fusing movie attributes with the collaborative filtering algorithm based on the ALS model by linear weighted mixing, the mixed model being
$$\delta\cdot\mathrm{Attrcf\_list}+(1-\delta)\cdot\mathrm{ALS\_list}$$
where δ represents a weight value;
The user has five main feedback operations on a movie: browsing, collecting (bookmarking), playing, commenting (complaining) and rating. Different feedback operations reflect different degrees of user attention to the movie, so different scores are assigned to them, and the score of each recommendation algorithm based on the user's feedback operations is calculated as follows
Wherein u represents user u, I represents the recommendation list produced by a recommendation algorithm, i represents an item in the recommendation list, and p_ui represents the operation score of user u on item i;
After the operation behavior score of each recommendation algorithm is obtained, the user's degree of recognition of each recommendation algorithm is obtained by the following formula
Wherein u represents a user, I represents the recommended content set of that type, i represents an item in the recommended content set, p_ui represents the operation score of user u on item i, n represents the number of feedback operation items, and C_I represents the length of the recommended content set;
In the formula, when p_uI = 0, T_uI = 0; the user's degree of recognition of each recommendation algorithm is thus represented by the proportion of behavioral operations relative to its recommendation list.
The degree of recognition gives the share of each recommendation algorithm in the total recommendations, and the final weight value of each recommendation algorithm is obtained by the following formula;
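The linear weighted mixing of claim 3 can be sketched as below. The feedback-score and weight formulas are not reproduced in this text, so three parts of the sketch are HYPOTHETICAL: the point values assigned to the five feedback operations, the recognition degree (taken as total feedback score on a list divided by the list length, which is 0 when there is no feedback), and the weight δ (taken as the attribute-based algorithm's share of the two recognition degrees). Only the final step, δ·Attrcf_list + (1 - δ)·ALS_list applied to per-movie scores, follows the stated mixed model; all names are hypothetical.

```python
from typing import Dict, List

# HYPOTHETICAL point values for the five feedback operations; the claim assigns
# different scores to them, but the values are not given in this text.
OPERATION_SCORES = {"browse": 1.0, "collect": 3.0, "play": 2.0, "comment": 2.0, "rate": 4.0}


def recognition(feedback: Dict[str, List[str]], rec_list: List[str]) -> float:
    """Hypothetical recognition degree: total feedback score on the list / list length."""
    if not rec_list:
        return 0.0
    total = sum(OPERATION_SCORES[op] for item in rec_list for op in feedback.get(item, []))
    return total / len(rec_list)


def mix_weight(t_attr: float, t_als: float, default: float = 0.5) -> float:
    """delta as the attribute-based algorithm's share of total recognition (assumed)."""
    total = t_attr + t_als
    return t_attr / total if total > 0 else default


def hybrid(attr_scores: Dict[str, float], als_scores: Dict[str, float],
           delta: float, top_n: int = 10) -> List[str]:
    """Linear weighted mix: delta * Attrcf_list + (1 - delta) * ALS_list, per movie."""
    movies = set(attr_scores) | set(als_scores)
    mixed = {m: delta * attr_scores.get(m, 0.0) + (1 - delta) * als_scores.get(m, 0.0)
             for m in movies}
    return sorted(mixed, key=lambda m: (-mixed[m], m))[:top_n]   # ties broken by id


if __name__ == "__main__":
    feedback = {"m1": ["browse", "rate"], "m4": ["play"]}
    delta = mix_weight(recognition(feedback, ["m1", "m2"]), recognition(feedback, ["m3", "m4"]))
    print(delta, hybrid({"m1": 4.5, "m2": 3.9}, {"m3": 4.2, "m4": 4.0}, delta))
```

A movie missing from one list contributes 0 for that algorithm, and the two score sets should be on comparable scales (for example both predicted ratings) for the weighted sum to be meaningful.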
4. The movie recommendation method based on the improved collaborative filtering algorithm according to any one of claims 1 to 3, characterized in that the improved collaborative filtering algorithm is applied in a movie recommendation system.
CN201910771272.4A 2019-08-20 2019-08-20 Movie recommendation method based on improved collaborative filtering algorithm Expired - Fee Related CN110569374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910771272.4A CN110569374B (en) 2019-08-20 2019-08-20 Movie recommendation method based on improved collaborative filtering algorithm

Publications (2)

Publication Number Publication Date
CN110569374A true CN110569374A (en) 2019-12-13
CN110569374B CN110569374B (en) 2022-03-18

Family

ID=68775748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910771272.4A Expired - Fee Related CN110569374B (en) 2019-08-20 2019-08-20 Movie recommendation method based on improved collaborative filtering algorithm

Country Status (1)

Country Link
CN (1) CN110569374B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140067826A1 (en) * 2012-09-06 2014-03-06 Todd Christopher Jackson Recommending users to add to groups in a social networking system
CN104462385A (en) * 2014-12-10 2015-03-25 山东科技大学 Personalized movie similarity calculation method based on user interest model
CN107368540A (en) * 2017-06-26 2017-11-21 北京理工大学 The film that multi-model based on user's self-similarity is combined recommends method
CN107943948A (en) * 2017-11-24 2018-04-20 中国科学院电子学研究所苏州研究院 A kind of improved mixing collaborative filtering recommending method
CN108415987A (en) * 2018-02-12 2018-08-17 大连理工大学 A kind of cold start-up solution that film is recommended

Also Published As

Publication number Publication date
CN110569374B (en) 2022-03-18

Similar Documents

Publication Publication Date Title
Steck Calibrated recommendations
Li et al. Using multidimensional clustering based collaborative filtering approach improving recommendation diversity
CN107833117B (en) Bayesian personalized sorting recommendation method considering tag information
CN105138653B (en) It is a kind of that method and its recommendation apparatus are recommended based on typical degree and the topic of difficulty
CN107220365A (en) Accurate commending system and method based on collaborative filtering and correlation rule parallel processing
CN109903138B (en) Personalized commodity recommendation method
Jiao et al. A novel learning rate function and its application on the SVD++ recommendation algorithm
CN104462383A (en) Movie recommendation method based on feedback of users' various behaviors
Anand et al. Folksonomy-based fuzzy user profiling for improved recommendations
CN110083764A (en) A kind of collaborative filtering cold start-up way to solve the problem
Leite Dantas Bezerra et al. Symbolic data analysis tools for recommendation systems
CN106846029B (en) Collaborative filtering recommendation algorithm based on genetic algorithm and novel similarity calculation strategy
CN110362755A (en) A kind of recommended method of the hybrid algorithm based on article collaborative filtering and correlation rule
Puntheeranurak et al. An Item-based collaborative filtering method using Item-based hybrid similarity
CN110059257B (en) Project recommendation method based on score correction
CN109977299A (en) A kind of proposed algorithm of convergence project temperature and expert's coefficient
Chen et al. A fuzzy matrix factor recommendation method with forgetting function and user features
Maneeroj et al. Hybrid recommender system using latent features
Ma et al. A collaborative filtering recommendation algorithm based on hierarchical structure and time awareness
Li et al. Multidimensional clustering based collaborative filtering approach for diversified recommendation
Liu et al. Recent advances in personal recommender systems
CN110569374B (en) Movie recommendation method based on improved collaborative filtering algorithm
CN110825965A (en) Improved collaborative filtering recommendation method based on trust mechanism and time weighting
Ito et al. A study on improvement of serendipity in item-based collaborative filtering using association rule
Souza Cabral et al. Combining multiple metadata types in movies recommendation using ensemble algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220318