CN108415928A

CN108415928A - Book recommendation method and system based on a weighted hybrid k-nearest neighbor algorithm

Info

Publication number: CN108415928A
Application number: CN201810049750.6A
Authority: CN
Inventors: 郝宁宁; 李媛鸣; 王川; 陈梦瑶; 石冰洁; 刘二宝; 祝晓雪; 高婧
Original assignee: Individual
Current assignee: Shandong Changxiangyun Education Technology Co ltd
Priority date: 2018-01-18
Filing date: 2018-01-18
Publication date: 2018-08-17
Anticipated expiration: 2038-01-18
Also published as: CN108415928B

Abstract

The invention discloses a book recommendation method and system based on a weighted hybrid k-nearest neighbor algorithm. The invention implements book recommendation based on a single collaborative filtering algorithm and, in addition, performs a weighted mix of the user-based and item-based k-nearest neighbor algorithms of collaborative filtering to realize hybrid recommendation. The invention applies recommendation technology to a book recommendation system in order to recommend to readers, in a personalized way, books they may be interested in, reducing the time readers spend looking for books of interest in a vast amount of book information. The recommendation algorithms applied are collaborative filtering algorithms, specifically the user-based k-nearest neighbor algorithm and the item-based k-nearest neighbor algorithm. Based on different readers' ratings of books, these algorithms recommend to each reader the books that reader may be interested in.

Description

Book recommendation method and system based on a weighted hybrid k-nearest neighbor algorithm

Technical field

The invention belongs to the technical field of book recommendation, and in particular relates to a book recommendation method and system based on a weighted hybrid k-nearest neighbor algorithm.

Background art

With the development of information technology and the internet, people have gradually moved from an era of information scarcity into an era of information overload. Often the problem we face is not a lack of material or information but an excess of it, which dazzles us and leaves us unsure how to choose. Faced with massive amounts of information, there are currently two problems: on the one hand, how to find content of real interest in the overloaded information; on the other hand, how information providers can get the information they offer noticed by interested people rather than drowned in the mass of information.

To solve the information overload problem, classified directories and search engines appeared. Both establish a match between information and users, and users can find information of interest by searching keywords. However, search engines also have limitations. First, the results they provide are usually not personalized: different people searching with the same keyword often get the same results, although tastes differ from person to person. A search engine therefore cannot filter information accurately for different users. Another limitation is that a search engine requires users to have a clear understanding of their own needs and to be able to express them in keywords. Sometimes, however, users have needs they are not yet aware of, and in that case a search engine is of no help. Although both tools can help users quickly find information they may be interested in, neither can provide personalized services for different users.

A recommendation system is another means of matching information with users. Unlike a search engine, a recommendation system does not need the user to enter keywords. It can actively mine the user's preferences from the user's past behavior records, help the user discover potential points of interest, and recommend related goods or information to the user. Because recommendations are made according to the characteristics of each user, it can satisfy personalized requirements and recommend to different users products that meet their individual needs, so that information is presented to users more accurately, while relying less on information actively entered by the user to filter information.

In the field of book recommendation, the weighted mixing of different recommendation algorithms is seldom considered. However, because different recommendation algorithms have different strengths and weaknesses, a single recommendation algorithm often fails to produce good book recommendations. The k-nearest neighbor algorithm is divided into the user-based k-nearest neighbor algorithm and the item-based k-nearest neighbor algorithm. Previous book recommendation has usually been implemented with either the user-based or the item-based algorithm, and such recommendation does not combine the advantages of both.

Summary of the invention

The object of the present invention is to provide a book recommendation method and system based on a weighted hybrid k-nearest neighbor algorithm. Based on different users' ratings of books, the algorithm recommends to each user, in a personalized way, books the user may be interested in. It considers the similarity of the objects involved in the system comprehensively and improves the accuracy of recommendation.

To achieve the above object, the present invention uses following technical scheme：

A book recommendation method based on a weighted hybrid k-nearest neighbor algorithm according to the present invention includes the following steps:

Step 1, randomly divide the users' historical book rating behavior data into M parts according to a uniform distribution; choose one part as the test set and use the remaining M-1 parts as the training set; build user-based and item-based k-nearest neighbor recommendation models on the training set of the users' historical book rating behavior data;

Step 2, use the k-nearest neighbor recommendation models to build user interest models on the training set of the users' historical book rating behavior data and generate recommended book lists; then, in combination with the test set of the users' historical book rating behavior data, calculate the precision and recall of the k-nearest neighbor algorithms when the number k of most similar users is at its initial value;

Step 3, successively update the value of k, the number of most similar users set in the algorithms of the k-nearest neighbor recommendation models, and calculate the book recommendation lists under the different values of k; also calculate the precision and recall of the user-based and item-based k-nearest neighbor algorithms for the different values of k;

Step 4, add the precision and recall corresponding to each value of k to obtain the performance index values of the user-based and item-based k-nearest neighbor algorithms; take the value of k corresponding to the maximum performance index value of the user-based k-nearest neighbor algorithm as a given user's optimal parameter k for the user-based k-nearest neighbor algorithm; similarly, take the value of k corresponding to the maximum of the performance index values of the item-based k-nearest neighbor algorithm as that user's optimal parameter k for the item-based k-nearest neighbor algorithm;

Step 5, input a given user's optimal performance index value for the user-based k-nearest neighbor algorithm and performance index value for the item-based k-nearest neighbor algorithm, and use these two values to assign that user's weights for the user-based and item-based k-nearest neighbor algorithms;

Step 6, input a given user's optimal parameter k for the user-based k-nearest neighbor algorithm and optimal parameter k for the item-based k-nearest neighbor algorithm, and use the k-nearest neighbor recommendation models of Step 1 to generate for the user a book recommendation list from the user-based k-nearest neighbor algorithm and a book recommendation list from the item-based k-nearest neighbor algorithm; then multiply the weights the user assigns to the user-based and item-based k-nearest neighbor algorithms by the number N of books in the book recommendation list to calculate, for the final hybrid recommendation list, how many recommended books come from the user-based k-nearest neighbor algorithm and how many come from the item-based k-nearest neighbor algorithm, and finally obtain the hybrid book recommendation list.

As a further improvement of the present invention, the k-nearest neighbor recommendation models in Step 1 are built as follows:

Step 1.1, process the training set of the users' historical book rating behavior data into an m*n user-book rating matrix R;

Step 1.2, calculate the similarity between users using the Pearson correlation coefficient, and calculate the similarity between items using a co-occurrence matrix;

Step 1.3, for each user, sort the similarities between that user and the other users in descending order; similarly, for each book, sort the similarities between that book and the other books in descending order;

Step 1.4, based on the similarity results and the algorithm parameter k, generate a candidate recommended book list; then use the predicted-rating formula to calculate the predicted rating of every book in the candidate list, sort the candidate list by predicted rating in descending order, and take the top P books of the sorted candidate list to form the final book recommendation list, thereby generating the recommended book list of the user-based k-nearest neighbor algorithm;

Step 1.5, similarly, generate the candidate recommended book list of the item-based k-nearest neighbor algorithm, calculate the predicted rating of every book in the candidate list, and sort the candidate list by predicted rating in descending order; take the top P books of the sorted candidate list to form the final book recommendation list, thereby generating the recommended book list of the item-based k-nearest neighbor algorithm.

As a further improvement of the present invention, the formula for the similarity between users in Step 1.2 is as follows:

In the above formula, P(u, v) denotes the similarity between user u and user v; I_u and I_v denote the sets of books rated by user u and user v respectively; r_ui and r_vi denote the ratings of book i by user u and user v respectively; and r̄_u and r̄_v denote the average book ratings of user u and user v respectively;
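The symbol definitions above match the standard Pearson correlation coefficient. A reconstruction under that reading is given below; restricting the sums to the books rated by both users (I_u ∩ I_v) is an assumption, since the definitions do not state the summation range:

```latex
P(u,v) =
\frac{\sum_{i \in I_u \cap I_v} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}
     {\sqrt{\sum_{i \in I_u \cap I_v} (r_{ui} - \bar{r}_u)^2}\,
      \sqrt{\sum_{i \in I_u \cap I_v} (r_{vi} - \bar{r}_v)^2}}
```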

The similarity between items is calculated using a co-occurrence matrix as the similarity measure, where element t_kn denotes the degree of association between book k and book n. The co-occurrence matrix T is calculated as follows:

When user u has rated books i, j and k, the value 1 is added to each element of the submatrix T_ijk formed by books i, j and k in the co-occurrence matrix T;

The above operation is performed for all users and the results are summed; the elements of the co-occurrence matrix T are then normalized, with the normalization formula as follows:

In the above formula, t'_ij denotes the similarity between book i and book j;

The similarity between items is thus obtained from the normalized co-occurrence matrix.

As a further improvement of the present invention, the formula for the predicted rating in Step 1.5 is as follows:

In the above formula, r̄_u and r̄_u′ are the average item ratings of user u and user u′, sim(u, u′) is the similarity between user u and user u′, and N is the set of items (neighbors) most similar to item i.

As a further improvement of the present invention, the precision and recall of the k-nearest neighbor algorithms in Step 2 are calculated as follows:

Step 2.1, precision

In the formula, Precision(U(u)) denotes the precision of the user-based k-nearest neighbor algorithm for user u, Precision(I(u)) denotes the precision of the item-based k-nearest neighbor algorithm for user u, R(U(u)) denotes the book recommendation list generated for user u by the user-based k-nearest neighbor recommendation algorithm, R(I(u)) denotes the book recommendation list generated for user u by the item-based k-nearest neighbor recommendation algorithm, T(u) denotes the list of items that user u has rated, and U denotes all users;

Step 2.2, recall

In the formula, Recall(U(u)) denotes the recall of the user-based k-nearest neighbor algorithm for user u, Recall(I(u)) denotes the recall of the item-based k-nearest neighbor algorithm for user u, R(U(u)) denotes the book recommendation list generated for user u by the user-based k-nearest neighbor recommendation algorithm, R(I(u)) denotes the book recommendation list generated for user u by the item-based k-nearest neighbor recommendation algorithm, T(u) denotes the list of items that user u has rated, and U denotes all users.
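The symbol definitions in Steps 2.1 and 2.2 are consistent with the usual per-user top-N definitions of precision and recall. A hedged reconstruction, assuming T(u) is taken from the test set and that the metrics are computed per user rather than summed over U, is:

```latex
\mathrm{Precision}(U(u)) = \frac{|R(U(u)) \cap T(u)|}{|R(U(u))|}, \qquad
\mathrm{Recall}(U(u)) = \frac{|R(U(u)) \cap T(u)|}{|T(u)|}

\mathrm{Precision}(I(u)) = \frac{|R(I(u)) \cap T(u)|}{|R(I(u))|}, \qquad
\mathrm{Recall}(I(u)) = \frac{|R(I(u)) \cap T(u)|}{|T(u)|}
```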

As a further improvement of the present invention, the weight formula in Step 5 is as follows:

In the formula, Weight(U(u)) and Weight(I(u)) denote user u's weights for the user-based and item-based k-nearest neighbor algorithms respectively; Pre(U(u)) and Re(U(u)) denote the precision and recall of the user-based k-nearest neighbor algorithm for user u; and Pre(I(u)) and Re(I(u)) denote the precision and recall of the item-based k-nearest neighbor algorithm for user u.
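One form that fits these definitions, and the statement in Step 4 that the performance index is precision plus recall, is a simple proportional split. This is an assumed reconstruction, not a confirmed copy of the formula:

```latex
\mathrm{Weight}(U(u)) =
\frac{\mathrm{Pre}(U(u)) + \mathrm{Re}(U(u))}
     {\mathrm{Pre}(U(u)) + \mathrm{Re}(U(u)) + \mathrm{Pre}(I(u)) + \mathrm{Re}(I(u))}

\mathrm{Weight}(I(u)) =
\frac{\mathrm{Pre}(I(u)) + \mathrm{Re}(I(u))}
     {\mathrm{Pre}(U(u)) + \mathrm{Re}(U(u)) + \mathrm{Pre}(I(u)) + \mathrm{Re}(I(u))}
```

Under this form the two weights sum to 1.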

A book recommendation system based on a weighted hybrid k-nearest neighbor algorithm includes a user and item information collection layer, a storage layer, a recommendation engine module and an interface layer;

The storage layer is used to store the data used and generated by the system, including the basic information of users and books and user behavior information;

The user and item information collection layer is connected to the storage layer and is responsible for entering and maintaining the basic information of users and books and user behavior information;

The recommendation engine module is connected to the storage layer and performs calculations on the users' historical behavior data on items to generate recommendation lists; the recommendation engine is built from the user-based k-nearest neighbor recommendation model and the item-based k-nearest neighbor recommendation model;

The interface layer is connected to the recommendation engine module and the storage layer and communicates with the front-end display unit. It passes the calculated data to the front-end display unit as required, and users' book ratings are collected by the front-end display unit and passed back. The interface layer provides the data required by calls from the front-end display unit and transfers the user behavior data sent by the front-end display unit to the storage layer for storage.

Compared with the prior art, the present invention has the following advantages:

The weighted hybrid k-nearest neighbor algorithm used by the present invention combines the advantages of the user-based and item-based k-nearest neighbor algorithms. It can generate recommendations from users' historical rating records of items and has higher recommendation performance. A single k-nearest neighbor algorithm considers the similarity of only one kind of object in the system, that is, either the similarity of users or the similarity of items, and does not measure the similarity of all the objects involved in the system. The weighted hybrid k-nearest neighbor algorithm considers both the similarity between users and the similarity between items: the book recommendation list computed by the final hybrid algorithm contains both books recommended on the basis of user similarity and books recommended on the basis of item similarity. The recommended book information computed by the weighted hybrid k-nearest neighbor algorithm therefore takes the similarity of the objects involved in the system into account more comprehensively and improves the accuracy of recommendation. A further advantage of the personalized weighted hybrid algorithm is personalized weighting, that is, different users assign different weights to the k-nearest neighbor algorithms. By analyzing the historical book rating records of different users, different weights are assigned for different users to the user-based and item-based k-nearest neighbor algorithms, and the weights can be adjusted continuously as a user's historical book rating records change, so that the weights of the weighted hybrid algorithm remain optimal. Another advantage of the personalized weighted hybrid algorithm is the optimization of the core parameter k of the k-nearest neighbor algorithm. By dividing the historical book rating records of different users into a training set and a test set, and on the basis of the recommendation performance indicators precision and recall, the optimal value of the parameter k is selected for each user, and the value of the core parameter k can be adjusted dynamically as the user's history changes, improving the performance of the recommendation algorithm.

The recommendation system of the present invention uses the weighted hybrid k-nearest neighbor algorithm and therefore combines the advantages of the user-based and item-based k-nearest neighbor algorithms. The algorithm makes comprehensive use of the similarity of the objects in the system and can dynamically adjust the value of the core parameter k and the weights as the user's history changes, giving better recommendation performance.

Description of the drawings

Fig. 1 is a flow chart of training the optimal parameter k in the book recommendation method and system based on a weighted hybrid k-nearest neighbor algorithm of the present invention;

Fig. 2 is a flow chart of the recommendation process of the book recommendation method and system of the present invention;

Fig. 3 is a flow chart of user-based collaborative filtering in the book recommendation method and system of the present invention;

Fig. 4 is a flow chart of item-based collaborative filtering in the book recommendation method and system of the present invention;

Fig. 5 is a flow chart of weight assignment in the book recommendation method and system of the present invention;

Fig. 6 is a flow chart of the hybrid recommendation implementation in the book recommendation method and system of the present invention;

Fig. 7 is a system framework diagram of the book recommendation method and system of the present invention;

Fig. 8 is an ER diagram of the book recommendation method and system of the present invention;

Fig. 9 is a design diagram of the book recommendation module of the book recommendation method and system of the present invention;

Fig. 10 is a block diagram of the book recommendation algorithm of the book recommendation method and system of the present invention;

Fig. 11 is the hybrid recommendation list for user cytun produced by the book recommendation method and system of the present invention.

Detailed description of the embodiments

The present invention is described in detail below with reference to the accompanying drawings:

As shown in Fig. 1, a book recommendation method based on a weighted hybrid k-nearest neighbor algorithm according to the present invention includes the following steps:

Step 1, first, the users' historical book rating behavior data is randomly divided into M parts according to a uniform distribution; one part is chosen as the test set and the remaining M-1 parts are used as the training set. User-based and item-based k-nearest neighbor recommendation models are built on the training set of the users' historical book rating behavior data, as shown in Fig. 2. The recommendation models are built as follows:

Step 1.1, process the training set of the users' historical book rating behavior data into an m*n user-book rating matrix R, where m is the number of users, n is the number of books, and r_ui denotes the rating of item i by user u.
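The split in Step 1 and the matrix in Step 1.1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the value `m_parts=8`, the fixed seed, the toy records and the choice of a nested dictionary in place of a dense m*n matrix are all assumptions.

```python
import random
from collections import defaultdict


def split_ratings(records, m_parts=8, seed=0):
    """Assign each (user, book, score) record to one of m_parts parts
    uniformly at random; part 0 is the test set, the rest the training set."""
    rng = random.Random(seed)
    train, test = [], []
    for record in records:
        if rng.randrange(m_parts) == 0:
            test.append(record)
        else:
            train.append(record)
    return train, test


def rating_matrix(records):
    """Build a sparse user-book rating matrix R as {user: {book: score}}."""
    matrix = defaultdict(dict)
    for user, book, score in records:
        matrix[user][book] = score
    return dict(matrix)


# Made-up toy data: 4 users, 40 distinct books, scores from 1 to 5.
records = [(f"user{i % 4}", f"book{i}", 1 + i % 5) for i in range(40)]
train, test = split_ratings(records)
R = rating_matrix(train)
```

Because each record is assigned independently, the parts are only approximately equal in size; an exact M-way partition would shuffle the records and slice them instead.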

Step 1.2, calculate the similarity between users; the similarity measure used is the Pearson correlation coefficient, calculated as follows:

In the above formula, P(u, v) denotes the similarity between user u and user v; I_u and I_v denote the sets of books rated by user u and user v respectively; r_ui and r_vi denote the ratings of book i by user u and user v respectively; and r̄_u and r̄_v denote the average book ratings of user u and user v respectively.
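A small sketch of the user similarity, assuming the standard Pearson form with sums over the books both users have rated and each mean taken over all of that user's ratings. Returning 0.0 when there is no overlap or no variance is a convention chosen here, not something the text specifies.

```python
from math import sqrt


def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation P(u, v) between two users.

    ratings_u, ratings_v: {book: score} for users u and v.
    """
    common = set(ratings_u) & set(ratings_v)  # I_u ∩ I_v
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    numerator = sum(
        (ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common
    )
    norm_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    norm_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return numerator / (norm_u * norm_v)


# Made-up ratings: v always rates twice as high as u, w rates in reverse.
u = {"a": 1, "b": 2, "c": 3}
v = {"a": 2, "b": 4, "c": 6}
w = {"a": 3, "b": 2, "c": 1}
sim_uv = pearson_similarity(u, v)  # close to +1
sim_uw = pearson_similarity(u, w)  # close to -1
```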

The similarity between items is calculated using a co-occurrence matrix as the similarity measure, where element t_kn denotes the degree of association between book k and book n. The co-occurrence matrix T is calculated as follows:

When user u has rated books i, j and k, the value 1 is added to each element of the submatrix T_ijk formed by books i, j and k in the co-occurrence matrix T.

In the normalization formula, t'_ij denotes the similarity between book i and book j. The similarity between items is thus obtained from the normalized co-occurrence matrix.
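The co-occurrence counting can be sketched as below. The normalization formula is not reproduced in this text, so the code uses t'_ij = t_ij / sqrt(t_ii * t_jj) purely as an assumed stand-in, where t_ii is the number of users who rated book i; the function name and toy data are also hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from math import sqrt


def cooccurrence_similarity(user_books):
    """Item-item similarity from a normalized co-occurrence matrix.

    user_books: {user: set of rated books}.
    Returns {(i, j): t'_ij} for every ordered pair i != j that co-occurs.
    """
    pair_counts = defaultdict(int)  # off-diagonal entries t_ij
    rated_by = defaultdict(int)     # diagonal entries t_ii
    for books in user_books.values():
        for book in books:
            rated_by[book] += 1
        for i, j in permutations(books, 2):
            pair_counts[(i, j)] += 1
    # ASSUMED normalization (cosine style); the source formula is missing.
    return {
        (i, j): count / sqrt(rated_by[i] * rated_by[j])
        for (i, j), count in pair_counts.items()
    }


# Made-up data: books a and b are rated together by two users.
item_sim = cooccurrence_similarity(
    {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"a"}}
)
```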

Step 1.3, for each user, sort the similarities between that user and the other users in descending order; similarly, for each book, sort the similarities between that book and the other books in descending order.

Step 1.4, as shown in Fig. 3, generate the recommended book list of the user-based k-nearest neighbor algorithm. Suppose a recommended book list is to be generated for user u. First, the initial value of the parameter k is set to 20, where k is the number of most similar users. The first 20 users in user u's similarity-sorted list are taken as the neighbor user set of user u. Then the books that these 20 neighbor users have rated but user u has not are taken as user u's candidate recommended book list, and a rating is predicted for each book in the candidate list. The formula for the predicted book rating is as follows:

In the above formula, r̂_ui denotes user u's predicted rating of book i, r̄_u and r̄_u′ are the average book ratings of user u and user u′, sim(u, u′) is the similarity between user u and user u′, r_u′i denotes the rating of book i by user u′, and M is the set of users most similar to user u.

The candidate recommended book list is sorted by predicted rating in descending order. The parameter N is set to 10, where N is the number of books in the book recommendation list. The first 10 books of the sorted candidate list form the final book recommendation list.
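Step 1.4 can be sketched as follows. The prediction rule used here, the target user's mean plus the similarity-weighted average of the neighbors' mean-centered ratings, is the common form that matches the symbols defined above; it is an assumption, as are the function name, the tie-breaking by book id and the toy data.

```python
def recommend_user_based(target, ratings, sims, k=20, n=10):
    """Top-n books for `target` from its k most similar users.

    ratings: {user: {book: score}}; sims: {other_user: sim(target, other)}.
    """
    def mean(user_ratings):
        return sum(user_ratings.values()) / len(user_ratings)

    # Neighbor set M: the k users most similar to the target.
    neighbours = sorted(
        (v for v in sims if v != target and v in ratings),
        key=lambda v: sims[v],
        reverse=True,
    )[:k]
    seen = ratings[target]
    # Candidates: books rated by a neighbor but not by the target.
    candidates = {b for v in neighbours for b in ratings[v] if b not in seen}
    mean_u = mean(seen)
    scored = []
    for book in candidates:
        num = den = 0.0
        for v in neighbours:
            if book in ratings[v]:
                num += sims[v] * (ratings[v][book] - mean(ratings[v]))
                den += abs(sims[v])
        if den > 0:
            scored.append((mean_u + num / den, book))
    # Highest predicted rating first; ties broken by book id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [book for _, book in scored[:n]]


# Made-up ratings and similarities for target user "u".
ratings = {
    "u": {"a": 4, "b": 2},
    "v": {"a": 5, "c": 5, "d": 1},
    "w": {"b": 1, "c": 4, "d": 3},
}
user_list = recommend_user_based("u", ratings, {"v": 0.9, "w": 0.3})
```

In this toy case book c is rated above average by both neighbors and book d below average by the closer neighbor, so c ranks ahead of d.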

Step 1.5, as shown in Fig. 4, generate the recommended book list of the item-based k-nearest neighbor algorithm.

Suppose a recommended book list is to be generated for user u. First, the initial value of the parameter k is set to 20, where k is the number of most similar books. All books that user u has not rated are taken as the candidate recommended book list. For a given candidate book i, the first 20 books in the similarity-sorted list of candidate book i against the books user u has rated are taken as the neighbor book set of that candidate, and a rating is predicted for the candidate. The formula for the predicted book rating is as follows:

In the above formula, r̂_ui denotes user u's predicted rating of book i, r̄_u is the average book rating of user u, sim(i, j) is the similarity between book i and book j, r_uj denotes the rating of book j by user u, and D is the set of books most similar to book i.

The predicted rating of every book in the candidate recommended book list is calculated according to the above steps, and the candidate list is then sorted by predicted rating in descending order. The parameter N is then set to 10, where N is the number of books in the book recommendation list. Finally, the first 10 books of the sorted candidate list form the final book recommendation list.
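A sketch of Step 1.5 under stated assumptions: the predicted rating is taken as the user's mean plus the similarity-weighted average of the user's mean-centered ratings on the neighbor books. The exact formula is not legible in this text, so this rule, the function name and the toy similarities are illustrative only.

```python
def recommend_item_based(user_ratings, all_books, sim, k=20, n=10):
    """Top-n unrated books for one user from item-item similarities.

    user_ratings: {book: score}; sim: {(i, j): similarity of books i and j}.
    """
    mean_u = sum(user_ratings.values()) / len(user_ratings)
    scored = []
    for i in all_books:
        if i in user_ratings:
            continue  # only books the user has not rated are candidates
        # Neighbor set D: the k rated books most similar to candidate i.
        neighbours = sorted(
            user_ratings, key=lambda j: sim.get((i, j), 0.0), reverse=True
        )[:k]
        num = sum(
            sim.get((i, j), 0.0) * (user_ratings[j] - mean_u)
            for j in neighbours
        )
        den = sum(abs(sim.get((i, j), 0.0)) for j in neighbours)
        if den > 0:
            scored.append((mean_u + num / den, i))
    # Highest predicted rating first; ties broken by book id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [book for _, book in scored[:n]]


# Made-up data: candidate c resembles the liked book a, d the disliked b.
toy_sim = {("c", "a"): 0.8, ("c", "b"): 0.2, ("d", "a"): 0.1, ("d", "b"): 0.9}
item_list = recommend_item_based({"a": 5, "b": 1}, ["a", "b", "c", "d"], toy_sim)
```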

Step 2, through the above steps, user interest models can be built on the training set of the users' historical book rating behavior data and recommended book lists generated. Then, in combination with the test set of the users' historical book rating behavior data, the degree of overlap between the predicted behavior and the actual behavior in the test set, that is, the precision and recall of the k-nearest neighbor algorithms, is calculated for the initial value k=20. The specific formulas are as follows:

Step 2.1, precision

In the above formula, Precision(U(u)) denotes the precision of the user-based k-nearest neighbor algorithm for user u, Precision(I(u)) denotes the precision of the item-based k-nearest neighbor algorithm for user u, R(U(u)) denotes the book recommendation list generated for user u by the user-based k-nearest neighbor recommendation algorithm, R(I(u)) denotes the book recommendation list generated for user u by the item-based k-nearest neighbor recommendation algorithm, T(u) denotes the list of items that user u has rated, and U denotes all users.

Step 2.2, recall

In the above formula, Recall(U(u)) denotes the recall of the user-based k-nearest neighbor algorithm for user u, Recall(I(u)) denotes the recall of the item-based k-nearest neighbor algorithm for user u, R(U(u)) denotes the book recommendation list generated for user u by the user-based k-nearest neighbor recommendation algorithm, R(I(u)) denotes the book recommendation list generated for user u by the item-based k-nearest neighbor recommendation algorithm, T(u) denotes the list of items that user u has rated, and U denotes all users.
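The two metrics can be sketched per user as below, assuming T(u) is the set of books the user rated in the test set and that precision and recall take their usual top-N form (hits divided by list length and by held-out count respectively). The function name and sample data are hypothetical.

```python
def precision_recall(recommended, held_out):
    """Per-user precision and recall of one recommendation list.

    recommended: the list R(U(u)) or R(I(u)); held_out: the set T(u).
    """
    hits = len(set(recommended) & set(held_out))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(held_out) if held_out else 0.0
    return precision, recall


# Made-up example: 2 of 4 recommended books appear among 3 held-out books.
precision, recall = precision_recall(["a", "b", "c", "d"], {"b", "d", "e"})
```

The same function serves both algorithms: it is called once with the user-based list and once with the item-based list.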

Step 3, successively update the core parameter k set in the algorithms of Step 1.4 and Step 1.5 of the recommendation models built in Step 1 to 30, 40 and 50, and calculate the book recommendation lists under the different values of k. Step 2 is then repeated to calculate the precision and recall of the user-based and item-based k-nearest neighbor algorithms for k=30, k=40 and k=50.

Step 4, the above three steps yield the precision and recall of the user-based and item-based k-nearest neighbor algorithms for k=20, k=30, k=40 and k=50, eight precision-recall pairs in total. The precision and recall corresponding to each of k=20, k=30, k=40 and k=50 are then added to obtain the performance index values of the user-based and item-based k-nearest neighbor algorithms. Next, the four performance index values of the user-based k-nearest neighbor algorithm under the different values of k are sorted in descending order; similarly, the four performance index values of the item-based k-nearest neighbor algorithm under the different values of k are sorted in descending order. Finally, the value of k corresponding to the maximum of the four performance index values of the user-based k-nearest neighbor algorithm is taken as the user's optimal parameter k for the user-based k-nearest neighbor algorithm. Similarly, the value of k corresponding to the maximum of the four performance index values of the item-based k-nearest neighbor algorithm is taken as the user's optimal parameter k for the item-based k-nearest neighbor algorithm.
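The selection in Step 4 amounts to an argmax over the candidate values of k. A minimal sketch follows; the function name is hypothetical and the precision and recall numbers are made up for illustration, not results from the patent.

```python
def best_k(metrics_by_k):
    """Pick the k whose precision + recall is largest.

    metrics_by_k: {k: (precision, recall)} for one user and one algorithm.
    Returns (optimal k, performance index value at that k).
    """
    k_opt = max(metrics_by_k, key=lambda k: sum(metrics_by_k[k]))
    return k_opt, sum(metrics_by_k[k_opt])


# Made-up metrics for one user under the user-based algorithm.
user_based_metrics = {
    20: (0.10, 0.05),
    30: (0.12, 0.08),
    40: (0.11, 0.06),
    50: (0.09, 0.07),
}
k_user, perf_user = best_k(user_based_metrics)
```

Taking the maximum directly gives the same result as sorting the four values in descending order and reading off the first. On ties, `max` keeps the first k encountered, which is a choice made here rather than one stated in the text.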

The purpose of the above four steps is to train the optimal algorithm parameter k for each user and to obtain the performance index value of the algorithm at the optimal k, that is, precision plus recall. Next, the performance index values obtained in these four steps are used to assign weights to the user-based and item-based k-nearest neighbor algorithms.

Step 5, input a given user's optimal performance index value for the user-based k-nearest neighbor algorithm and performance index value for the item-based k-nearest neighbor algorithm. These two values are used to assign that user's weights for the user-based and item-based k-nearest neighbor algorithms. The weight formula is as follows:

In the above formula, Weight(U(u)) and Weight(I(u)) denote user u's weights for the user-based and item-based k-nearest neighbor algorithms respectively; Pre(U(u)) and Re(U(u)) denote the precision and recall of the user-based k-nearest neighbor algorithm for user u; and Pre(I(u)) and Re(I(u)) denote the precision and recall of the item-based k-nearest neighbor algorithm for user u. The calculated results are rounded, and the final results are taken as the weights the user assigns to the user-based and item-based k-nearest neighbor algorithms. The flow chart of this step is shown in Fig. 5.
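One plausible reading of Step 5 is a proportional split of the two performance index values, sketched below. The proportional form, the equal split when both values are zero and the function name are assumptions; the rounding mentioned in the text is left out because its precision is not stated.

```python
def algorithm_weights(perf_user, perf_item):
    """Split a unit weight between the two algorithms in proportion to
    their performance index values (precision + recall at the optimal k)."""
    total = perf_user + perf_item
    if total == 0:
        return 0.5, 0.5  # assumed fallback when neither algorithm scores
    return perf_user / total, perf_item / total


# Made-up performance index values for one user.
weight_user, weight_item = algorithm_weights(0.3, 0.1)
```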

Step 6, as shown in Fig. 6, input a given user's optimal parameter k for the user-based k-nearest neighbor algorithm and optimal parameter k for the item-based k-nearest neighbor algorithm, and use the recommendation models of Step 1 to generate for the user a book recommendation list from the user-based k-nearest neighbor algorithm and a book recommendation list from the item-based k-nearest neighbor algorithm. Then, according to the weights the user assigns to the user-based and item-based k-nearest neighbor algorithms, calculate how many books in the final hybrid recommendation list come from the user-based k-nearest neighbor algorithm and how many come from the item-based k-nearest neighbor algorithm. The formulas are as follows:

N(U(u)) = Weight(U(u)) × N (13)

N(I(u)) = Weight(I(u)) × N (14)

In the above formulas, N(U(u)) denotes the number of books in the mixed book recommendation list that are produced by the user-based k-nearest neighbor algorithm, N(I(u)) denotes the number of books in the mixed book recommendation list that are produced by the item-based k-nearest neighbor algorithm, Weight(U(u)) and Weight(I(u)) denote the weights that user u assigns to the user-based and the item-based k-nearest neighbor algorithm, respectively, and N denotes the number of books in the book recommendation list. The results are rounded to the nearest integer, and the mixed book recommendation list is finally obtained.

The mixed recommendation list is generated as follows: the number N of books to recommend is multiplied by an algorithm's weight to obtain N′, and the top N′ books of that algorithm's recommendation list are taken into the final mixed recommendation list. Each of the two algorithms thus contributes its own number of books to the final mixed recommendation list.
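The mixing principle above can be sketched as follows. This is a hedged illustration, not the patented implementation: the function name `mix_recommendations` is hypothetical, the weights are divided by their sum so that integer weights such as 2 and 8 and fractional weights such as 0.2 and 0.8 give the same split, and the ordering, de-duplication and back-filling rules are assumptions that the text does not specify.

```python
def mix_recommendations(user_list, item_list, w_user, w_item, n):
    """Build a mixed top-n list from two ranked recommendation lists.

    Assumptions (not from the source): weights are normalised by their
    sum, user-based picks are listed first, duplicates are dropped, and
    gaps left by duplicates are back-filled from the remaining entries.
    """
    total = w_user + w_item
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_user = int(n * w_user / total + 0.5)  # N(U(u)), rounded half up
    n_item = n - n_user                     # N(I(u))
    mixed = []
    for book in user_list[:n_user] + item_list[:n_item]:
        if book not in mixed:
            mixed.append(book)
    # Back-fill from the unused tail of both lists if duplicates left gaps.
    for book in user_list[n_user:] + item_list[n_item:]:
        if len(mixed) >= n:
            break
        if book not in mixed:
            mixed.append(book)
    return mixed
```

For weights 2 and 8 and N = 10, this takes the top 2 books from the user-based list and the top 8 from the item-based list, which matches the counts given for user cytun in the embodiment.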

The present invention also provides a book recommendation system based on the weighted hybrid k-nearest neighbor algorithm. A recommender system deals with two important kinds of objects, users and items; its core is the recommendation engine, which links users and items together. The main job of the book recommendation system is to provide the user with a book recommendation list. In general, users, items and the recommendation engine together make up a complete recommender system. This section describes the overall system architecture and the model design of these three parts. The system framework is shown in Figure 7:

The storage layer stores the data used and generated by the system, mainly the basic information of users and books and the user behavior information.

The user and item information collection layer is responsible for entering and maintaining the basic information of users and books and the user behavior information.

The recommendation engine performs its computation on users' historical behavior data toward items and generates recommendation lists. This system uses collaborative filtering recommendation algorithms; specifically, the recommendation engine is built from a user-based k-nearest neighbor collaborative filtering algorithm and an item-based k-nearest neighbor collaborative filtering algorithm. Since each algorithm has its own strengths and weaknesses, the system introduces a hybrid recommendation method that combines the two k-nearest neighbor algorithms by weighting.

The interface layer handles communication between the system and the front end. Because the system runs in the background, computed data must be passed to the front end for display, and users' book ratings collected by the front end must be passed back. The interface layer therefore supplies the data that the front end requests and hands the user behavior data sent by the front end to the storage layer for storage.

The database uses SQLite to store user behavior data and the basic data of users and books. The data in this system's database centers on the user and book entities; there is a many-to-many relationship between users and books, and a many-to-many relationship between users and users. The attributes of these entities are shown in Figure 8:

Based on this ER diagram, the corresponding database tables can be designed:

(a) System object information collection

System objects comprise user information and basic book information. User information is collected in two parts: the first part is the information filled in at user registration; the second part is computed by the system background from existing user behavior data, and covers the values of the core parameter k of the item-based and the user-based k-nearest neighbor algorithm and the weights assigned to each algorithm when the two are mixed.

For a new user there is no user behavior data, so the background program cannot train the optimal algorithm parameters or assign the algorithm weights for that user. The recommendation simulation experiments showed that the optimal parameter k of most users is 50 and that the weights assigned to the item-based and the user-based k-nearest neighbor algorithm are 9 and 1, respectively. The system therefore sets a new user's optimal parameter k to 50 and the weights to 9 and 1 by default. The user attributes are shown in Table 1:

Table 1

Basic book information is collected with web crawler technology and generated through the Douban API. The book attributes are shown in Table 2:

Table 2

Collection of user behavior records:

What the algorithms of the recommender system use is the user behavior record, which in this system refers to users' rating records for books. A record is collected when a user looks up a book in the system's front end and rates it; the rating data is transmitted through the interface to the background database. The contents of the user-book rating table involved in user behavior records are shown in Table 3:

Table 3
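Tables 1 to 3 are not reproduced in this text, so the following is only a rough sketch of how the storage described above might look in SQLite. Every table and column name here (`users`, `books`, `ratings`, `k_user`, `k_item`, `w_user`, `w_item`, `score`) is hypothetical; only the defaults for new users (k = 50, item-based weight 9, user-based weight 1) and the many-to-many user-book relationship come from the description.

```python
import sqlite3

# Hypothetical schema: the real attributes are listed in Tables 1-3.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    k_user  INTEGER NOT NULL DEFAULT 50,  -- optimal k, user-based k-NN
    k_item  INTEGER NOT NULL DEFAULT 50,  -- optimal k, item-based k-NN
    w_user  INTEGER NOT NULL DEFAULT 1,   -- weight, user-based k-NN
    w_item  INTEGER NOT NULL DEFAULT 9    -- weight, item-based k-NN
);
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
-- Junction table for the many-to-many user-book rating relationship.
CREATE TABLE ratings (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    book_id INTEGER NOT NULL REFERENCES books(book_id),
    score   REAL NOT NULL,
    PRIMARY KEY (user_id, book_id)
);
"""

def open_store(path=":memory:"):
    """Open (or create) the SQLite store and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Storing the defaults in the schema means a newly registered user can receive recommendations before any per-user training has run; the background job would later overwrite the four parameter columns.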

The recommendation module mainly builds the user and item recommendation models, trains the core algorithm parameters, computes the algorithm weights, and recommends books that the user may be interested in. It is the core module of this book recommendation system, and its function is to produce book recommendation lists for users. The module involves two recommendation algorithms: the user-based k-nearest neighbor collaborative filtering algorithm and the item-based k-nearest neighbor collaborative filtering algorithm. The module can provide recommendations from either algorithm alone, and it also provides hybrid recommendation through the weighted mixing of the user-based and item-based k-nearest neighbor algorithms. The design of the recommendation module is shown in Figure 9:

The system first collects user behavior records and turns them into a user-item rating matrix on which the related computations are carried out, so as to make recommendations to the user. The design block diagram of the recommendation algorithm is shown in Figure 10.
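As an illustration of the first of these computations, the sketch below builds a sparse user-book rating matrix from rating records and computes a Pearson similarity between two users, which is the user similarity measure named in the claims. The similarity formula is not reproduced in this text, so the variant used here (sums over co-rated books, means over each user's full rating set) is an assumption, and the function names are hypothetical.

```python
from math import sqrt

def build_rating_matrix(records):
    """records: iterable of (user, book, rating) -> {user: {book: rating}}."""
    matrix = {}
    for user, book, rating in records:
        matrix.setdefault(user, {})[book] = rating
    return matrix

def pearson(ratings_u, ratings_v):
    """Pearson correlation between two users over the books both rated.

    Assumption: each mean is taken over that user's full rating set,
    which is one common variant; the source formula is not available.
    Returns 0.0 when there is no overlap or no rating variance.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)
```

A dictionary of dictionaries is used instead of a dense m×n array because most users rate only a small fraction of the books; a dense array would work equally well for a data set of the size used in the embodiment.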

Embodiment：

The following uses a data set of 129334 user book rating records in total, covering 265 books and 1968 users.

(1) First, the data set is randomly divided into 3 parts according to a uniform distribution; one part is chosen as the test set and the remaining 2 parts are used as the training set. Tables 4 and 5 below show the training set and the test set of user cytun.

Table 4

Table 5
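The split in (1) can be sketched as follows, assuming each rating record is assigned independently and uniformly at random to one of M parts, with one part held out as the test set. The function name `split_ratings` and the fixed seed are illustrative choices, not details from the source.

```python
import random

def split_ratings(records, m=3, seed=0):
    """Assign each record uniformly at random to one of m parts.

    Part 0 becomes the test set and the other m - 1 parts together form
    the training set, so the test set holds roughly 1/m of the records
    (the parts are equal in expectation, not exactly equal in size).
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for rec in records:
        if rng.randrange(m) == 0:
            test.append(rec)
        else:
            train.append(rec)
    return train, test
```

Calling `split_ratings(records, m=3)` corresponds to the 3-part split used in this embodiment.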

(2) Build the user-based and item-based k-nearest neighbor recommendation models on the training set of user cytun's historical book rating data and, using the test set of user cytun's historical book rating data, compute the precision and recall of the user-based and the item-based k-nearest neighbor algorithm for k=20, k=30, k=40 and k=50, giving 8 pairs of precision and recall in total. The results are shown in Tables 6 and 7: Table 6 gives the performance parameters of the user-based algorithm for user cytun, and Table 7 gives the performance parameters of the item-based algorithm for user cytun.

Table 6

Table 7
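The parameter selection behind Tables 6 and 7 can be sketched as below. The precision and recall formulas are not reproduced in this text, so the standard per-user top-N definitions are assumed (hits divided by the length of the recommendation list, and hits divided by the number of books in the user's test set). `recommend` is a hypothetical callable that returns the recommendation list for a given k.

```python
def precision_recall(recommended, test_books):
    """Per-user precision and recall of one recommendation list."""
    hits = len(set(recommended) & set(test_books))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(test_books) if test_books else 0.0
    return precision, recall

def best_k(recommend, test_books, candidates=(20, 30, 40, 50)):
    """Return (k, index) for the k with the largest precision + recall.

    `recommend(k)` must return the recommendation list produced with
    parameter k; on ties the first candidate with the maximum wins.
    """
    scored = []
    for k in candidates:
        p, r = precision_recall(recommend(k), test_books)
        scored.append((p + r, k))  # performance index as in Step 4
    score, k = max(scored, key=lambda pair: pair[0])
    return k, score
```

Running `best_k` once with the user-based recommender and once with the item-based recommender yields the two optimal k values and the two performance index values that the weighting step takes as input.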

Using user cytun's best performance index value for the user-based k-nearest neighbor algorithm and best performance index value for the item-based k-nearest neighbor algorithm, weights for the two algorithms are assigned to user cytun. According to the weight formula, the weight of the user-based k-nearest neighbor algorithm is 2 and the weight of the item-based k-nearest neighbor algorithm is 8.

Input user cytun's optimal parameter k for the user-based k-nearest neighbor algorithm and optimal parameter k for the item-based k-nearest neighbor algorithm, and use the recommendation models to generate for the user one book recommendation list from the user-based algorithm and one from the item-based algorithm. Then, according to the weights the user assigns to the two algorithms, compute how many books in the final mixed recommendation list come from each algorithm. According to the formulas, the user-based k-nearest neighbor algorithm contributes 2 books to the mixed book recommendation list and the item-based k-nearest neighbor algorithm contributes 8 books. The final mixed book recommendation list is shown in Figure 11.

The recommendation algorithm applied in the present invention is based on the weighted hybrid k-nearest neighbor algorithm; specifically, it uses the user-based k-nearest neighbor algorithm and the item-based k-nearest neighbor algorithm. From different readers' ratings of books, these algorithms recommend to each reader, in a personalized way, books that the reader may be interested in. In collaborative filtering, different parameter choices affect the quality of the recommendations differently. The present invention therefore also trains, by test experiments, the core parameter k of the k-nearest neighbor algorithms for each user, in order to improve the performance of the recommendation algorithm and obtain the best recommendation results. The present invention implements book recommendation based on a single collaborative filtering algorithm, and also mixes the user-based and the item-based k-nearest neighbor algorithm of collaborative filtering by weighting to implement hybrid recommendation. Finally, a personalized book recommendation system is designed that provides querying of historical book rating records, querying of book information and book recommendation, and recommendation simulation experiments are carried out on a data set.

The above is merely a preferred embodiment of the present invention and does not limit the scope of practice of the invention; all equivalent changes and modifications made in accordance with the content of the scope of the invention fall within the technical scope of the invention.

Claims

1. A book recommendation method based on a weighted hybrid k-nearest neighbor algorithm, characterized by comprising the following steps:

Step 1: randomly divide the user's historical book rating data into M parts according to a uniform distribution, choose one part as the test set and use the remaining M-1 parts as the training set; build user-based and item-based k-nearest neighbor recommendation models on the training set of the user's historical book rating data;

Step 2: using the k-nearest neighbor recommendation models, build a user interest model on the training set of the user's historical book rating data and generate a recommended book list; then, using the test set of the user's historical book rating data, compute the precision and recall of the k-nearest neighbor algorithm when the number k of most similar users is at its initial value;

Step 3: successively update the value of k, the number of most similar users set in the k-nearest neighbor recommendation models, and compute the book recommendation list for each value of k; and compute the precision and recall of the user-based and the item-based k-nearest neighbor algorithm for each value of k;

Step 4: add the precision and the recall corresponding to each value of k to obtain the performance index values of the user-based and the item-based k-nearest neighbor algorithm; take the value of k corresponding to the maximum performance index value of the user-based k-nearest neighbor algorithm as the user's optimal parameter k for the user-based k-nearest neighbor algorithm; likewise, take the value of k corresponding to the maximum of the performance index values of the item-based k-nearest neighbor algorithm as the user's optimal parameter k for the item-based k-nearest neighbor algorithm;

Step 5: input the user's best performance index value for the user-based k-nearest neighbor algorithm and best performance index value for the item-based k-nearest neighbor algorithm, and from these two values assign the user a weight for the user-based k-nearest neighbor algorithm and a weight for the item-based k-nearest neighbor algorithm;

Step 6: input the user's optimal parameter k for the user-based k-nearest neighbor algorithm and optimal parameter k for the item-based k-nearest neighbor algorithm, and use the k-nearest neighbor recommendation models from Step 1 to generate for the user a book recommendation list from the user-based k-nearest neighbor algorithm and a book recommendation list from the item-based k-nearest neighbor algorithm; then multiply the weights the user assigns to the user-based and the item-based k-nearest neighbor algorithm by the number N of books in the book recommendation list to compute how many books in the final mixed recommendation list come from the user-based k-nearest neighbor algorithm and how many come from the item-based k-nearest neighbor algorithm, and finally obtain the mixed book recommendation list.

2. The book recommendation method based on a weighted hybrid k-nearest neighbor algorithm according to claim 1, characterized in that the k-nearest neighbor recommendation models in Step 1 are built as follows:

Step 1.1: process the training set of the user's historical book rating data into an m×n user-book rating matrix R;

Step 1.2: compute the similarity between users using the Pearson correlation coefficient; compute the similarity between items using a co-occurrence matrix;

Step 1.3: for each user, sort the similarities between that user and the other users in descending order; likewise, for each book, sort the similarities between that book and the other books in descending order;

Step 1.4: from the similarity results and the algorithm parameter k, generate a candidate recommended book list; then use the predicted rating formula to compute the predicted rating of every book in the candidate list, sort the candidate list by predicted rating in descending order, and take the top P books of the sorted candidate list as the final book recommendation list, thereby generating the recommended book list of the user-based k-nearest neighbor algorithm;

Step 1.5: likewise, generate the candidate recommended book list of the item-based k-nearest neighbor algorithm, compute the predicted rating of every book in the candidate list, and sort the candidate list by predicted rating in descending order; take the top P books of the sorted candidate list as the final book recommendation list, thereby generating the recommended book list of the item-based k-nearest neighbor algorithm.

3. The book recommendation method based on a weighted hybrid k-nearest neighbor algorithm according to claim 2, characterized in that the similarity between users in Step 1.2 is computed as follows:

In the above formula, P(u, v) denotes the similarity between user u and user v, I_u and I_v denote the sets of books rated by user u and user v, respectively, r_ui and r_vi denote the ratings of book i by user u and by user v, respectively, and r̄_u and r̄_v denote the average ratings that user u and user v give to books;

The similarity between items is computed using a co-occurrence matrix T as the similarity measure, in which element t_kn denotes the degree of association between book k and book n. The co-occurrence matrix T is computed as follows:

When user u has rated books i, j and k, the value 1 is added to every element of the submatrix T_ijk formed by books i, j and k in the co-occurrence matrix T;

The above operation is carried out for all users, all results are summed, and the elements of the co-occurrence matrix T are normalized; the normalization formula is as follows:

The normalized co-occurrence matrix gives the similarity between items.

4. The book recommendation method based on a weighted hybrid k-nearest neighbor algorithm according to claim 2, characterized in that the predicted rating in Step 1.5 is computed as follows:

In the above formula, r̄_u and r̄_u′ denote the average ratings that user u and user u′ give to items, sim(u, u′) is the similarity between user u and user u′, and the items most similar to item i (its neighbors) form the set N.

5. The book recommendation method based on a weighted hybrid k-nearest neighbor algorithm according to claim 1, characterized in that the precision and recall of the k-nearest neighbor algorithm in Step 2 are computed as follows:

Step 2.1: precision

In the formula, Precision(U(u)) denotes the precision of the user-based k-nearest neighbor algorithm for user u, Precision(I(u)) denotes the precision of the item-based k-nearest neighbor algorithm for user u, R(U(u)) denotes the book recommendation list that the user-based k-nearest neighbor recommendation algorithm generates for user u, R(I(u)) denotes the book recommendation list that the item-based k-nearest neighbor recommendation algorithm generates for user u, T(u) denotes the list of items that user u has rated, and U denotes the set of all users;

Step 2.2: recall

In the formula, Recall(U(u)) denotes the recall of the user-based k-nearest neighbor algorithm for user u, Recall(I(u)) denotes the recall of the item-based k-nearest neighbor algorithm for user u, R(U(u)) denotes the book recommendation list that the user-based k-nearest neighbor recommendation algorithm generates for user u, R(I(u)) denotes the book recommendation list that the item-based k-nearest neighbor recommendation algorithm generates for user u, T(u) denotes the list of items that user u has rated, and U denotes the set of all users.

6. The book recommendation method based on a weighted hybrid k-nearest neighbor algorithm according to claim 1, characterized in that in Step 5 the weight formula is as follows:

In the formula, Weight(U(u)) and Weight(I(u)) denote the weights that user u assigns to the user-based and the item-based k-nearest neighbor algorithm, respectively; Pre(U(u)) and Re(U(u)) denote the precision and recall of the user-based k-nearest neighbor algorithm for user u; and Pre(I(u)) and Re(I(u)) denote the precision and recall of the item-based k-nearest neighbor algorithm for user u.

7. A book recommendation system based on a weighted hybrid k-nearest neighbor algorithm, characterized by comprising a user and item information collection layer, a storage layer, a recommendation engine module and an interface layer;

The storage layer is used to store the data used and generated by the system, including the basic information of users and books and the user behavior information;

The user and item information collection layer is connected to the storage layer and is responsible for entering and maintaining the basic information of users and books and the user behavior information;

The recommendation engine module is connected to the storage layer and is used to perform computation on users' historical behavior data toward items and generate recommendation lists; the recommendation engine is built from the user-based k-nearest neighbor recommendation model and the item-based k-nearest neighbor recommendation model;

The interface layer is connected to the recommendation engine module and the storage layer and communicates with a front-end display unit; it passes computed data to the front-end display unit and passes back the users' book ratings collected by the front-end display unit; the interface layer supplies the data requested by the front-end display unit and hands the user behavior data sent by the front-end display unit to the storage layer for storage.