CN113326433B - Personalized recommendation method based on ensemble learning

Personalized recommendation method based on ensemble learning

Info

Publication number
CN113326433B
CN113326433B (application number CN202110629501.6A)
Authority
CN
China
Prior art keywords
user
data
personalized recommendation
score
test
Prior art date
Legal status
Active
Application number
CN202110629501.6A
Other languages
Chinese (zh)
Other versions
CN113326433A
Inventor
段勇 (Duan Yong)
杨堃 (Yang Kun)
Current Assignee
Shenyang University of Technology
Original Assignee
Shenyang University of Technology
Priority date
Filing date
Publication date
Application filed by Shenyang University of Technology
Publication of CN113326433A
Application granted
Publication of CN113326433B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155: Bayesian classification
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N20/10: Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N20/20: Ensemble learning


Abstract

The invention relates to the fields of machine learning and recommendation systems, in particular to a personalized recommendation method based on ensemble learning. The data preprocessing module is mainly responsible for reintegrating the data features, addressing the difficulty of extracting complex features by constructing new features and applying manifold-learning dimensionality reduction; the model establishment and optimization module is mainly responsible for establishing a personalized ensemble-learning prediction model on the fused data and applying Bayesian optimization to the established prediction model, improving the accuracy of personalized recommendation; the personalized recommendation module is mainly responsible for obtaining the prediction model's output, producing the personalized recommendation result through a Top-N recommendation method, and verifying that result. The method improves the accuracy of personalized recommendation through ensemble learning; in addition, it fuses data features through manifold-learning dimensionality reduction, thereby addressing the difficulty of extracting complex features.

Description

Personalized recommendation method based on ensemble learning
Technical Field
The invention relates to the fields of machine learning and recommendation systems, in particular to a personalized recommendation method based on the manifold-learning algorithm LPP (Locality Preserving Projections) and the ensemble-learning algorithm GBDT (Gradient Boosting Decision Tree).
Background
In recent years, as internet and computer technology have kept evolving, the internet has brought an enormous volume of information data and, at the same time, aggravated the phenomenon of information overload. Although this expands the range of information resources available to users, quickly and effectively screening useful information out of huge data and improving the utilization efficiency of information has become a major difficulty in the development of the contemporary internet. Many existing web applications (e.g., web portals and search engines) are essentially ways of helping users filter information. However, these methods only meet users' mainstream demands, do not consider personalization, and have not solved the information overload problem well. Personalized recommendation, as an important means of information filtering, is an effective method for solving the information overload problem.
With the development of machine learning, applying machine learning methods in the field of recommendation algorithms has become a major trend. Personalized recommendation has likewise drawn on many machine learning methods, such as support vector machines, decision trees, neural networks, deep learning, clustering, dimensionality reduction, regression prediction, and ensemble learning. Machine-learning-based personalized recommendation can effectively address problems such as monotonous similarity calculation, high similarity-computation complexity, difficulty in mining users' latent interests, difficulty in exploiting user tag information and demographic information, and difficulty in extracting item features. User tag information, demographic information, and item feature information perform poorly in solving the cold-start problem, yet they are indispensable for capturing users' latent interests.
Disclosure of Invention
Object of the Invention
The invention provides a personalized recommendation method based on the Locality Preserving Projections algorithm and ensemble learning, which aims to solve the information overload problem in recommendation systems and improve the efficiency and precision of personalized recommendation.
Technical proposal
A personalized recommendation method based on ensemble learning, the method comprising:
step 1: analyzing the dimension attributes of the personalized recommendation data and dividing the data into "user-item-score" data; performing data association on the associated "user-item-score" dimensions;
step 2: after the processing is finished, analyzing the data type of each "user-item-score" dimension attribute and converting it into the data type required by the ensemble learning;
step 3: generating feature attributes from the score attribute among the "user-item-score" dimension attributes;
step 4: applying min-max normalization to all the obtained data, calculated as follows:
v' = (v - min) / (max - min) (1)
wherein v represents the original value of the data, v' the normalized value, min the minimum value of the column in which v is located, and max the maximum value of that column;
step 5: letting the "user-item-score" dataset A in the original space have m sample points x_1, x_2, ..., x_m, where each sample point x_i is an l-dimensional vector and i is an integer from 1 to m, and the matrix formed by the m samples as columns is X; performing dimensionality reduction on dataset A with the manifold-learning LPP method, the reduced dataset B being composed of sample points y_1, y_2, ..., y_m, where each sample point y_i is an n-dimensional vector and the matrix of the m samples as columns is Y, with l > n;
step 6: dividing the reduced dataset B into a training set Train and a test set Test at a ratio of 8:2, wherein the data matrix corresponding to the training set Train is Y';
step 7: establishing a personalized recommendation model with the ensemble-learning GBDT method;
step 8: optimizing the GBDT model parameters with a Bayesian method;
step 9: retraining the GBDT personalized recommendation model with the optimal hyperparameter combination obtained through the Bayesian optimization;
step 10: performing Top-N recommendation and effect verification according to the prediction results of the final personalized recommendation model on the test set.
In step 3, the number of times each user has scored items is counted, with the formula:
countRating(b) = |R(b)|, b = 1, 2, ..., d (2)
wherein b represents the b-th user in the "user-item-score" dataset A, d is the total number of users in dataset A, R(b) is the set of scores given by user b to items, and countRating(b) is the total number of items rated by user b.
Step 5 specifically includes the following steps:
step 5.1: constructing the graph: for each sample x_i in the "user-item-score" dataset A, calculating its Euclidean distance to every other sample x_j, with the formula:
d(x_i, x_j) = ||x_i - x_j|| (3)
wherein ε is a manually set threshold, generally taken as the mean of the pairwise sample distances, and m is the total number of samples in the dataset; if the Euclidean distance is smaller than ε, the two samples are considered very close and an edge is established between node i and node j of the graph;
step 5.2: determining the weights: if node i is connected to node j, the weight of the edge between them is calculated with the heat kernel function:
ω_ij = exp(-||x_i - x_j||² / t) (4)
wherein ω_ij represents the weight between node i and node j, x_i and x_j are samples in the "user-item-score" dataset A, and t is a manually set real number greater than 0;
step 5.3: calculating the projection matrix, with the formula:
X L X^T a = λ X D X^T a (5)
Let the solutions of the formula be a_0, a_1, ..., a_{l-1}, ordered so that their corresponding eigenvalues λ run from small to large; the projective transformation matrix is C = (a_0, a_1, ..., a_{l-1}), and the reduced sample point is y_i = C^T x_i.
Wherein X is the matrix X mentioned in step 5, and the adjacency matrix W is constructed from the weights ω_ij of step 5.2; the main diagonal of the diagonal matrix D holds the weighted degree of each vertex of the graph constructed in step 5.1, where the weighted degree of node i is the sum of the weights of all edges incident to that node, i.e. the sum of the corresponding row of elements of the adjacency matrix W; the Laplacian matrix L is defined as L = D - W.
Step 7 comprises the following steps:
step 7.1: the GBDT model is defined as follows:
f_K(Y') = Σ_{k=1}^{K} h_k(Y') (6)
wherein Y' is the matrix Y' mentioned in step 6, k represents the round of the score-prediction learner, and K the total number of rounds; f_k(Y') represents the score-prediction learner of the k-th round, and h_k(Y') the k-th CART (Classification and Regression Trees) decision regression tree;
step 7.2: constructing a CART decision regression tree, namely h(Y') in step 7.1;
step 7.3: the score-prediction learner adopts a forward stagewise algorithm; the model of step k is formed from the model of step k-1, i.e. the k-th round of the score-prediction learner is closely related to the learners of the previous k-1 rounds, with the formula:
f_k(Y') = f_{k-1}(Y') + β_k (7)
wherein f_k(Y') is the k-th-round score-prediction learner, f_{k-1}(Y') the (k-1)-th-round score-prediction learner, and β_k represents the residual produced in the k-th round;
step 7.4: continuing iterating in this way until the iterations are finished, completing the model establishment.
Step 7.2 includes the following steps:
step 7.21: dividing the preprocessed dataset B into regions H_1, H_2, ..., H_o, with output values p_1, p_2, ..., p_o respectively;
Step 7.22: recursively dividing each region into two sub-regions and determining an output value on each sub-region; selecting an optimal segmentation variable q and a segmentation point s according to the following formula;
p 1 for the region H divided in step 7.21 1 Output of p 2 For the region H divided in step 7.21 2 Output of u v And w v Respectively representing the characteristic attribute and the score of the data in the corresponding region, wherein the value of the vmax is the number of samples of the divided region; traversing the variable q, scanning the fixed segmentation variable q for a segmentation point s, and selecting a pair (q, s) enabling the upper expression to reach a minimum value; dividing the region with the selected pair (q, s) and determining a corresponding output value;
step 7.23: continuing to call the steps 7.21 and 7.22 for the two sub-areas until a stop condition is met;
step 7.24: repartitioning the input space into O regions H'_1, H'_2, ..., H'_O and generating the score-prediction CART decision regression tree, with the formula:
h(u) = Σ_{o=1}^{O} p_o · I(u ∈ H'_o) (9)
wherein h(u) is the score-prediction CART decision regression tree, H'_o are the divided regions, o is the subscript of a divided region and O the total number of divided regions; p_o is the fixed output value of the region divided in step 7.21, and q' and s' are the optimal solutions obtained by iterating steps 7.21 and 7.22.
Step 8 includes the following steps:
step 8.1: initializing the dataset D' = {(x'_1, y'_1), ..., (x'_n, y'_n)}, where y'_i = f'(x'_i); the objective function f'(x') is the mapping from the dimension attributes of the data to the score;
step 8.2: training the GBDT model with the selected hyperparameter combination x'_i and calculating f'(x'_i);
step 8.3: calculating the next hyperparameter combination x'_{i+1} with an acquisition function;
step 8.4: repeating steps 8.2 and 8.3 for T' iterations;
step 8.5: outputting the hyperparameter combination that optimizes the objective function f'(x').
Step 10 includes the following steps:
step 10.1: setting the value N, i.e. the number of items to recommend to each user, and defining the number of users as count;
step 10.2: for each user, denoting the true recommendation list generated on the test set Test as T(All); performing score prediction on the test set Test with the Bayesian-optimized GBDT recommendation model, and defining the result as the Test evaluation set;
step 10.3: sorting the Test evaluation set by score and recommending the top N items to each user, denoting the Top-N recommendation list obtained for each user as T(Test);
step 10.4: verifying the precision and recall results of the Test evaluation set;
step 10.5: calculating the length of T(Test);
step 10.6: calculating the length of T(All);
step 10.7: calculating the intersection T(U) between each user's Top-N recommendation list T(Test) and the true list T(All);
step 10.8: calculating precision: Precision = |T(U)| / |T(Test)|; accumulating the precision of each user and dividing the sum by count to obtain the average precision;
step 10.9: calculating recall: Recall = |T(U)| / |T(All)|; accumulating the recall of each user and dividing the sum by count to obtain the average recall.
Advantages and effects
1. The invention uses related techniques from the field of machine learning to address the information overload problem of contemporary society: through manifold learning it reduces the dimensionality of the data's feature attributes, which shortens model training time, improves the model's learning capability, and greatly improves recommendation efficiency.
2. Personalized recommendation is performed through ensemble learning, and the recommendation model is optimized with Bayesian optimization, improving recommendation accuracy so that useful information can be screened out of huge data more quickly and effectively, improving the utilization efficiency of information.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a flow chart of data feature preprocessing;
FIG. 3 is a personalized recommendation flow chart.
Detailed Description
The following description of embodiments of the invention, presented in conjunction with the accompanying drawings, is intended to give those skilled in the art a better understanding of the invention.
A personalized recommendation method based on manifold-learning LPP and ensemble-learning GBDT can improve the accuracy of personalized recommendation through ensemble learning; in addition, it fuses data features through manifold-learning dimensionality reduction, thereby addressing the difficulty of extracting complex features.
FIG. 1 is the general flow chart of the present invention, comprising the following 10 steps: steps 1-6 form the recommendation-data preprocessing section of FIG. 1; step 7 forms the personalized-recommendation-model construction section of FIG. 1; steps 8 and 9 form the model optimization section of FIG. 1; and step 10 forms the personalized recommendation section of FIG. 1.
The data preprocessing module is mainly responsible for reintegrating the data features, addressing the difficulty of extracting complex features by constructing new features and applying manifold-learning dimensionality reduction; the model establishment and optimization module is mainly responsible for establishing a personalized ensemble-learning prediction model on the fused data and applying Bayesian optimization to the established prediction model, improving the accuracy of personalized recommendation; the personalized recommendation module is mainly responsible for obtaining the prediction model's output, producing the personalized recommendation result through a Top-N recommendation method, and verifying that result.
The specific detailed steps are as follows:
recommended data preprocessing section:
FIG. 2 is a flow chart of the characteristic data preprocessing of the present invention, and the specific implementation steps are as follows:
step 1: analyzing the dimension attributes of the personalized recommendation data and dividing the data into "user-item-score" data; performing data association on the associated "user-item-score" dimensions.
Step 2: after the processing is completed, analyzing the data type of each dimension attribute and converting it into the data type required by the ensemble learning.
Step 3: generating feature attributes from the score attribute among the "user-item-score" dimension attributes, with the formula:
countRating(b) = |R(b)|, b = 1, 2, ..., d
wherein b represents the b-th user in the "user-item-score" dataset A, d is the total number of users, and R(b) is the set of scores given by user b to items.
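As an illustrative sketch only (not part of the patent text), the countRating statistic of step 3 can be computed directly from raw (user, item, score) triples; the record values below are hypothetical:

```python
from collections import Counter

def count_ratings(user_item_scores):
    """Count how many items each user has rated (the countRating of step 3).

    user_item_scores: iterable of (user, item, score) triples.
    Returns a dict mapping user -> number of rated items.
    """
    return dict(Counter(user for user, _, _ in user_item_scores))

# Hypothetical toy "user-item-score" records for illustration.
records = [("u1", "i1", 4.0), ("u1", "i2", 3.5), ("u2", "i1", 5.0)]
print(count_ratings(records))  # {'u1': 2, 'u2': 1}
```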
Step 4: applying min-max normalization to all the obtained data, calculated as follows:
v' = (v - min) / (max - min)
where v denotes the original value of the data, v' the normalized value, min the minimum value of the column in which v is located, and max the maximum value of that column.
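The min-max normalization of step 4 can be sketched as follows (assuming NumPy is available; the helper name is ours, not the patent's):

```python
import numpy as np

def min_max_normalize(col):
    """Column-wise min-max scaling: v' = (v - min) / (max - min)."""
    col = np.asarray(col, dtype=float)
    lo, hi = col.min(), col.max()
    if hi == lo:                      # constant column: map to 0 to avoid 0/0
        return np.zeros_like(col)
    return (col - lo) / (hi - lo)

scores = [1.0, 3.0, 5.0]
print(min_max_normalize(scores))      # 0, 0.5, 1
```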
Step 5: let the "user-item-score" dataset A in the original space have m sample points x_1, x_2, ..., x_m, where each sample point x_i is an l-dimensional vector and i is an integer from 1 to m, and arrange the m samples as the columns of a matrix X. Apply the manifold-learning LPP method to reduce the dimensionality of dataset A; the reduced dataset B is composed of sample points y_1, y_2, ..., y_m, each sample point y_i an n-dimensional vector, arranged as the columns of a matrix Y, where l > n. The specific steps are as follows:
Step 5.1: constructing the graph: for each sample x_i in the "user-item-score" dataset A, calculate its Euclidean distance to every other sample x_j, with the formula:
d(x_i, x_j) = ||x_i - x_j||
wherein ε is a manually set threshold, generally taken as the mean of the pairwise sample distances, and m is the total number of samples in the dataset; if the distance is smaller than ε, the two samples are considered very close, and an edge is established between node i and node j of the graph.
Step 5.2: determining the weights: if node i is connected to node j, the weight of the edge between them is calculated with the heat kernel function:
ω_ij = exp(-||x_i - x_j||² / t)
wherein ω_ij represents the weight between node i and node j, x_i and x_j are samples in the "user-item-score" dataset A, and t is a manually set real number greater than 0.
Step 5.3: calculating the projection matrix, with the formula:
X L X^T a = λ X D X^T a
Let the solutions of the formula be a_0, a_1, ..., a_{l-1}, ordered so that their corresponding eigenvalues λ run from small to large; the projective transformation matrix is C = (a_0, a_1, ..., a_{l-1}), and the reduced sample point is y_i = C^T x_i.
The adjacency matrix W is composed of the weights ω_ij of step 5.2. The main diagonal of the diagonal matrix D holds the weighted degree of each vertex of the graph constructed in step 5.1, where the weighted degree of node i is the sum of the weights of all edges incident to that node, i.e. the sum of the corresponding row of elements of the adjacency matrix W. The Laplacian matrix L is defined as L = D - W.
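Steps 5.1-5.3 can be sketched end to end as below, assuming NumPy and SciPy are available. Note that the sketch stores one sample per row and transposes internally, whereas the text arranges samples as columns of X; the ε-neighbourhood, heat-kernel weights, and generalized eigenproblem follow the formulas above:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_components, t=1.0):
    """Minimal LPP sketch following steps 5.1-5.3.

    X: (m, l) array, one sample per row. Returns the (m, n_components)
    reduced samples, i.e. the columns of the text's matrix Y, as rows.
    """
    m = X.shape[0]
    # Step 5.1: epsilon-neighbourhood graph, epsilon = mean pairwise distance.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    eps = dist[np.triu_indices(m, k=1)].mean()
    adj = (dist < eps) & ~np.eye(m, dtype=bool)
    # Step 5.2: heat-kernel weights on the connected edges.
    W = np.where(adj, np.exp(-dist**2 / t), 0.0)
    # Step 5.3: generalized eigenproblem X L X^T a = lambda X D X^T a.
    D = np.diag(W.sum(axis=1))
    L = D - W
    Xc = X.T                                 # columns are samples, as in the text
    A, B = Xc @ L @ Xc.T, Xc @ D @ Xc.T
    B += 1e-9 * np.eye(B.shape[0])           # small ridge for numerical stability
    vals, vecs = eigh(A, B)                  # eigenvalues returned in ascending order
    C = vecs[:, :n_components]               # projection matrix C = (a_0, ..., a_{n-1})
    return X @ C                             # y_i = C^T x_i, one per row

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                 # 20 synthetic 5-dimensional samples
Y = lpp(X, n_components=2)
print(Y.shape)  # (20, 2)
```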
Step 6: the reduced dataset B is divided into a training set Train and a test set Test at a ratio of 8:2, where the data matrix corresponding to the training set Train is Y'.
Constructing a personalized recommendation model part:
step 7: a personalized recommendation model is established with the ensemble-learning GBDT method; the process is shown schematically in FIG. 3, and the specific steps are as follows:
step 7.1: the GBDT model is defined as follows:
f_K(Y') = Σ_{k=1}^{K} h_k(Y')
where Y' is the matrix Y' mentioned in step 6, k represents the round of the score-prediction learner, and K the total number of iterations for constructing the score-prediction learner. f_k(Y') represents the score-prediction learner of the k-th round, and h_k(Y') the k-th CART decision regression tree.
Step 7.2: constructing a CART decision regression tree, namely h(Y') in step 7.1, with the following specific steps:
Step 7.21: dividing the preprocessed dataset B into regions H_1, H_2, ..., H_o, with output values p_1, p_2, ..., p_o respectively.
Step 7.22: each region is recursively divided into two sub-regions and the output value on each sub-region is determined. The optimal splitting variable q and split point s are selected according to the following formula:
min_{q,s} [ min_{p_1} Σ_{u_v ∈ H_1(q,s)} (w_v - p_1)² + min_{p_2} Σ_{u_v ∈ H_2(q,s)} (w_v - p_2)² ]
where p_1 is the output of the region H_1 divided in step 7.21, p_2 the output of the region H_2, u_v and w_v respectively represent the feature attributes and the score of the data in the corresponding region, and v runs up to the number of samples in the divided region. Traverse the splitting variable q and, with q fixed, scan for the split point s, selecting the pair (q, s) that minimizes the expression above. The regions are divided by the selected pair (q, s) and the corresponding output values are determined.
Step 7.23: steps 7.21 and 7.22 continue to be invoked on the two sub-regions until a stopping condition is met.
Step 7.24: repartitioning the input space into O regions H'_1, H'_2, ..., H'_O and generating the score-prediction CART decision regression tree, with the formula:
h(u) = Σ_{o=1}^{O} p_o · I(u ∈ H'_o)
where h(u) is the score-prediction CART decision regression tree, H'_o are the divided regions, o is the subscript of a divided region and O the total number of divided regions. p_o is the fixed output value of the region divided in step 7.21, and q' and s' are the optimal solutions obtained by iterating steps 7.21 and 7.22.
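The split search of step 7.22 can be illustrated for a single feature as below (an assumption: a full CART also traverses the splitting variable q across all features, while this sketch fixes one feature and scans only the split point s):

```python
import numpy as np

def best_split(u, w):
    """Exhaustive search for the split point s minimizing the squared-error
    criterion of step 7.22 for one fixed feature u with score targets w.
    Returns (s, p1, p2): the split point and the two region output values."""
    order = np.argsort(u)
    u, w = u[order], w[order]
    best = (None, None, None, np.inf)
    for v in range(1, len(u)):
        s = (u[v - 1] + u[v]) / 2           # candidate split between neighbours
        left, right = w[:v], w[v:]
        p1, p2 = left.mean(), right.mean()  # optimal constant outputs per region
        loss = ((left - p1) ** 2).sum() + ((right - p2) ** 2).sum()
        if loss < best[3]:
            best = (s, p1, p2, loss)
    return best[:3]

# Two clearly separated groups: the best split falls between 3.0 and 10.0.
u = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
w = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
s, p1, p2 = best_split(u, w)
print(s, p1, p2)  # 6.5 1.0 5.0
```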
Step 7.3: the score-prediction learner adopts a forward stagewise algorithm. The model of step k is formed from the model of step k-1, i.e. the k-th round of the score-prediction learner is closely related to the learners of the previous k-1 rounds, with the formula:
f_k(Y') = f_{k-1}(Y') + β_k
where f_k(Y') is the k-th-round score-prediction learner, f_{k-1}(Y') the (k-1)-th-round score-prediction learner, and β_k represents the residual produced in the k-th round.
Step 7.4: iteration continues in this way until the iterations are finished, completing the model establishment.
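Steps 7.1-7.4 (the forward stagewise construction) can be sketched as follows, assuming scikit-learn's DecisionTreeRegressor as the CART base learner; the synthetic target below stands in for the score data and is not from the patent:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X, y, n_rounds=200, lr=0.1, max_depth=3):
    """Forward-stagewise GBDT sketch (steps 7.1-7.4): each round fits a CART
    regression tree h_k to the residual left by the previous rounds."""
    base = float(y.mean())                    # f_0: constant initial learner
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        resid = y - pred                      # residual beta_k of round k-1
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, resid)
        trees.append(tree)
        pred += lr * tree.predict(X)          # f_k = f_{k-1} + lr * h_k
    return base, trees

def predict_gbdt(base, trees, X, lr=0.1):
    return base + lr * sum(t.predict(X) for t in trees)

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 3 * X[:, 0] + np.sin(5 * X[:, 1])         # synthetic stand-in "score" target
base, trees = fit_gbdt(X, y)
mse = float(np.mean((predict_gbdt(base, trees, X) - y) ** 2))
print(mse < 0.1 * np.var(y))                  # training error shrinks with the rounds
```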
Optimization model part:
step 8: the GBDT model parameters are optimized with a Bayesian method. The specific steps are as follows:
step 8.1: initializing the dataset D' = {(x'_1, y'_1), ..., (x'_n, y'_n)}, where y'_i = f'(x'_i); the objective function f'(x') is the mapping from the dimension attributes of the data to the score.
Step 8.2: the GBDT model is trained with the selected hyperparameter combination x'_i, and f'(x'_i) is calculated.
Step 8.3: the next hyperparameter combination x'_{i+1} is calculated with an acquisition function.
Step 8.4: steps 8.2 and 8.3 are repeated for T' iterations.
Step 8.5: the hyperparameter combination that optimizes the objective function f'(x') is output.
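Steps 8.1-8.5 can be sketched as a minimal Bayesian-optimization loop, assuming scikit-learn and SciPy; the Gaussian-process surrogate and expected-improvement acquisition are common choices but are our assumption, and the toy quadratic objective stands in for the GBDT validation error:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def bayes_opt(f, bounds, n_init=5, n_iter=15, seed=0):
    """Minimal Bayesian-optimization loop (steps 8.1-8.5): a GP surrogate plus
    an expected-improvement acquisition over a random candidate pool.
    f: objective to minimize; bounds: (low, high) per hyperparameter."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    X = rng.uniform(lo, hi, size=(n_init, len(bounds)))   # step 8.1: init D'
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):                               # steps 8.2-8.4
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = rng.uniform(lo, hi, size=(256, len(bounds)))
        mu, sd = gp.predict(cand, return_std=True)
        sd = np.maximum(sd, 1e-9)
        z = (y.min() - mu) / sd
        ei = (y.min() - mu) * norm.cdf(z) + sd * norm.pdf(z)  # acquisition (step 8.3)
        x_next = cand[np.argmax(ei)]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], float(y.min())                # step 8.5: best combination

# Toy objective standing in for GBDT validation error (an assumption).
best_x, best_y = bayes_opt(lambda x: (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2,
                           bounds=[(0.0, 1.0), (0.0, 1.0)])
print(best_y < 0.2)
```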
Step 9: the optimal hyperparameter combination obtained through the Bayesian optimization is selected to retrain the GBDT personalized recommendation model.
Personalized recommendation part:
step 10: top N recommendation and effect verification are carried out according to the prediction result of the finally obtained personalized recommendation model on the Test set Test, and the specific steps are as follows:
step 10.1: setting the value N, i.e. the number of items recommended to each user, and defining the number of users as count.
Step 10.2: for each user, the true recommendation list generated on the test set Test is denoted T(All); score prediction is performed on the test set Test with the Bayesian-optimized GBDT recommendation model, and the result is defined as the Test evaluation set.
Step 10.3: the Test evaluation set is sorted by score and the top N items are recommended to each user; the Top-N recommendation list obtained for each user is denoted T(Test).
Step 10.4: the precision and recall results of the Test evaluation set are verified.
Step 10.5: the length of T(Test) is calculated.
Step 10.6: the length of T(All) is calculated.
Step 10.7: the intersection T(U) between each user's Top-N recommendation list T(Test) and the true list T(All) is calculated.
Step 10.8: precision is calculated as Precision = |T(U)| / |T(Test)|; the precision of each user is accumulated and the sum divided by count to obtain the average precision.
Step 10.9: recall is calculated as Recall = |T(U)| / |T(All)|; the recall of each user is accumulated and the sum divided by count to obtain the average recall.
The technical features above constitute an embodiment of the invention; it has strong adaptability and implementation effect, and non-essential technical features can be added or removed according to actual needs to meet the requirements of different situations.

Claims (7)

1. A personalized recommendation method based on ensemble learning, the method comprising:
step 1: analyzing the dimension attributes of the personalized recommendation data and dividing the data into "user-item-score" data; performing data association on the associated "user-item-score" dimensions;
step 2: after the processing is finished, analyzing the data type of each "user-item-score" dimension attribute and converting it into the data type required by the ensemble learning;
step 3: generating feature attributes from the score attribute among the "user-item-score" dimension attributes;
step 4: applying min-max normalization to all the obtained data, calculated as follows:
v' = (v - min) / (max - min) (1)
wherein v represents the original value of the data, v' the normalized value, min the minimum value of the column in which v is located, and max the maximum value of that column;
step 5: letting the "user-item-score" dataset A in the original space have m sample points x_1, x_2, ..., x_m, where each sample point x_i is an l-dimensional vector and i is an integer from 1 to m, and the matrix formed by the m samples as columns is X; performing dimensionality reduction on dataset A with the manifold-learning Locality Preserving Projections algorithm, the reduced dataset B being composed of sample points y_1, y_2, ..., y_m, where each sample point y_i is an n-dimensional vector and the matrix of the m samples as columns is Y, with l > n;
step 6: dividing the reduced dataset B into a training set Train and a test set Test at a ratio of 8:2, wherein the data matrix corresponding to the training set Train is Y';
step 7: establishing a personalized recommendation model with the ensemble-learning gradient boosting decision tree method;
step 8: optimizing the gradient boosting decision tree model parameters with a Bayesian method;
step 9: retraining the gradient boosting decision tree personalized recommendation model with the optimal hyperparameter combination obtained through the Bayesian optimization;
step 10: performing Top-N recommendation and effect verification according to the prediction results of the final personalized recommendation model on the test set.
2. The personalized recommendation method based on ensemble learning according to claim 1, wherein: in step 3, the number of times each user has scored items is counted as follows:
CountRating(b) = |R(b)|, b = 1, 2, ..., d (2)
wherein b denotes the b-th user in the "user-item-score" data set A, d is the total number of users in data set A, R(b) is the set of scores given by user b to items, and CountRating(b) is the number of items that user b has rated.
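The per-user rating count CountRating can be sketched with a pandas group-by (toy data; the column names are invented for illustration):

```python
import pandas as pd

# toy "user-item-score" triples from data set A
data = pd.DataFrame({
    "user":  [1, 1, 2, 2, 2, 3],
    "item":  [10, 11, 10, 12, 13, 11],
    "score": [4, 5, 3, 2, 5, 1],
})
# CountRating(b): number of items each user b has rated
count_rating = data.groupby("user")["item"].count()
print(count_rating.to_dict())
```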
3. The personalized recommendation method based on ensemble learning according to claim 1, wherein: the step 5 specifically includes the following steps:
step 5.1: constructing a graph: computing the Euclidean distance between each sample x_i in the "user-item-score" data set A and every other sample x_j, as follows:
dist(x_i, x_j) = ||x_i − x_j|| (3)
wherein ε is a manually set threshold, typically the mean of the pairwise sample distances, and m is the total number of samples in the data set; if the Euclidean distance is smaller than ε, the two samples are considered very close and an edge is established between node i and node j of the graph;
step 5.2: determining weights: if node i is connected to node j, the weight of the edge between them is computed by the heat kernel function:
ω_ij = exp(−||x_i − x_j||² / t) (4)
wherein ω_ij denotes the weight between node i and node j, x_i and x_j are samples in the "user-item-score" data set A, and t is a manually set real number greater than 0;
step 5.3: computing the projection matrix by solving the generalized eigenvalue problem:
XLX^T a = λXDX^T a (5)
Let the solutions of the formula be a_0, a_1, ..., a_{l−1}, ordered by their corresponding eigenvalues λ from small to large; the projection transformation matrix is C = (a_0, a_1, ..., a_{n−1}), formed from the eigenvectors of the n smallest eigenvalues, and the dimension-reduced sample point is y_i = C^T x_i;
wherein X is the matrix X mentioned in step 5; the adjacency matrix W is constructed from the weights ω_ij of step 5.2; the main diagonal of the diagonal matrix D holds the weighted degree of each vertex of the graph constructed in step 5.1, the weighted degree of node i being the sum of the weights of all edges incident to that node, i.e. the sum of the elements of the corresponding row of the adjacency matrix W; the Laplacian matrix L is defined as L = D − W.
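Steps 5.1–5.3 can be sketched as follows, storing samples one per column as in step 5 (a simplified LPP sketch with a small regularization term added for numerical stability; function and variable names are choices of this example, not the patent's code):

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_components=2, eps=None, t=1.0):
    """Locality Preserving Projections: X is l x m, one sample per column."""
    diff = X[:, :, None] - X[:, None, :]        # pairwise differences, l x m x m
    dist = np.sqrt((diff ** 2).sum(axis=0))     # Euclidean distances (step 5.1)
    if eps is None:
        eps = dist.mean()                       # threshold: mean pairwise distance
    W = np.where(dist < eps, np.exp(-dist ** 2 / t), 0.0)  # heat kernel (step 5.2)
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))                  # weighted degree of each vertex
    L = D - W                                   # Laplacian matrix
    A = X @ L @ X.T
    B = X @ D @ X.T + 1e-9 * np.eye(X.shape[0])  # regularized for stability
    vals, vecs = eigh(A, B)                     # XLX^T a = lambda XDX^T a (step 5.3)
    C = vecs[:, :n_components]                  # eigenvectors of smallest eigenvalues
    return C.T @ X                              # y_i = C^T x_i

rng = np.random.default_rng(0)
X = rng.random((5, 30))                         # l = 5 dimensions, m = 30 samples
Y = lpp(X, n_components=2)
print(Y.shape)                                  # (2, 30)
```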
4. The personalized recommendation method based on ensemble learning according to claim 1, wherein: the step 7 comprises the following steps:
step 7.1: the gradient boosting decision tree model is defined as follows:
f_K(Y') = Σ_{k=1}^{K} h_k(Y') (6)
wherein Y' is the matrix Y' mentioned in step 6, k denotes the round of the score prediction learner, and K denotes the total number of rounds of the score prediction learner; f_k(Y') denotes the score prediction learner of the k-th round, and h_k(Y') denotes the k-th classification and regression decision tree;
step 7.2: constructing a classification and regression decision tree, i.e. h(Y') in step 7.1;
step 7.3: the score prediction learner adopts a forward stagewise algorithm; the model of step k is built from the model of step k−1, i.e. the k-th round of the score prediction learner depends on the learners of the previous k−1 rounds, as follows:
f_k(Y') = f_{k−1}(Y') + β_k (7)
wherein f_k(Y') is the k-th round score prediction learner, f_{k−1}(Y') is the (k−1)-th round score prediction learner, and β_k denotes the residual correction produced by the k-th round;
step 7.4: continuing the iteration until it is complete, at which point the model is established.
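The forward stagewise boosting of step 7.3 (f_k = f_{k−1} plus a residual correction) can be sketched with scikit-learn CART trees fitted to the residuals at each round. This is an illustrative sketch for squared loss with an added learning rate; the names and toy data are not from the patent:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X, y, n_rounds=50, lr=0.1, max_depth=3):
    """Forward stagewise GBDT: each round fits a CART tree h_k to the
    residuals of f_{k-1}, then updates f_k = f_{k-1} + lr * h_k."""
    f = np.full_like(y, y.mean(), dtype=float)    # f_0: constant prediction
    trees = []
    for _ in range(n_rounds):
        residual = y - f                          # negative gradient of squared loss
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        f = f + lr * h.predict(X)
        trees.append(h)
    return trees, y.mean(), lr

def predict_gbdt(model, X):
    trees, f0, lr = model
    return f0 + lr * sum(t.predict(X) for t in trees)

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 2 * X[:, 0] + X[:, 1] ** 2                    # toy score target
model = fit_gbdt(X, y)
mse = np.mean((predict_gbdt(model, X) - y) ** 2)
print(round(mse, 4))                              # training error shrinks each round
```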
5. The personalized recommendation method based on ensemble learning according to claim 4, wherein: the step 7.2 includes the following steps:
step 7.21: dividing the preprocessed data set B into regions H_1, H_2, ..., H_o, whose output values are p_1, p_2, ..., p_o respectively;
step 7.22: recursively dividing each region into two sub-regions and determining the output value on each sub-region; selecting the optimal splitting variable q and split point s according to the following formula:
min_{q,s} [ min_{p_1} Σ_{u_v ∈ H_1(q,s)} (w_v − p_1)² + min_{p_2} Σ_{u_v ∈ H_2(q,s)} (w_v − p_2)² ] (8)
wherein p_1 is the output of region H_1 divided in step 7.21, p_2 is the output of region H_2 divided in step 7.21, u_v and w_v respectively denote the feature attributes and the score of the data in the corresponding region, and v runs up to the number of samples in the divided region; traversing the variables q and, for each fixed splitting variable q, scanning the split points s, the pair (q, s) that minimizes the above expression is selected; the region is divided with the selected pair (q, s) and the corresponding output values are determined;
step 7.23: repeating steps 7.21 and 7.22 on the two sub-regions until a stopping condition is met;
step 7.24: the input space is finally partitioned into o regions H'_1, H'_2, ..., H'_o, generating the score prediction classification and regression decision tree:
h(u) = Σ_{O=1}^{o} p_O I(u ∈ H'_O) (9)
wherein h(u) is the score prediction classification and regression decision tree, H'_O are the divided regions, O denotes the region subscript and o the total number of divided regions; p_O is the fixed output value of the region divided in step 7.21, and q' and s' are the optimal solutions obtained by iterating steps 7.21 and 7.22.
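The exhaustive (q, s) search of step 7.22 — minimizing the summed squared error of the two candidate regions, each predicted by its mean output — can be sketched as follows (toy data and invented names, for illustration only):

```python
import numpy as np

def best_split(X, y):
    """For each feature q and split point s, minimize the sum of squared
    errors of the two regions; each region's prediction is its mean output."""
    best = (None, None, np.inf)
    for q in range(X.shape[1]):                   # traverse splitting variables q
        for s in np.unique(X[:, q]):              # scan split points s
            left, right = y[X[:, q] <= s], y[X[:, q] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (q, s, sse)
    return best

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
q, s, sse = best_split(X, y)
print(q, s, sse)                                  # clean split between 3 and 10
```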
6. The personalized recommendation method based on ensemble learning according to claim 1, wherein: the step 8 includes the following steps:
step 8.1: initializing a data set D' = {(x'_1, y'_1), ..., (x'_n, y'_n)}, wherein y'_i = f'(x'_i) and f'(x') is the mapping from the dimension attributes in the data to the score;
step 8.2: training the gradient boosting decision tree model with the selected hyperparameter combination x'_i and computing f'(x'_i);
step 8.3: computing the next hyperparameter combination x'_{i+1} with an acquisition function;
step 8.4: repeating steps 8.2 and 8.3 for T' iterations;
step 8.5: outputting the hyperparameter combination that optimizes the objective function f'(x').
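Steps 8.1–8.5 can be illustrated with a minimal one-dimensional Bayesian optimization loop: a Gaussian process surrogate plus an expected-improvement acquisition function. This is a generic sketch for minimization; the surrogate, kernel, and candidate grid are choices of this example and are not specified by the patent:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bayes_opt(f, bounds, n_init=5, n_iter=15, seed=0):
    """Minimize f on [lo, hi] with a GP surrogate and expected improvement."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, n_init).reshape(-1, 1)   # step 8.1: initial data set D'
    y = np.array([f(x[0]) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
    cand = np.linspace(lo, hi, 200).reshape(-1, 1)   # candidate hyperparameter values
    for _ in range(n_iter):                          # steps 8.2-8.4: iterate T' times
        gp.fit(X, y)                                 # step 8.2: fit surrogate model
        mu, sigma = gp.predict(cand, return_std=True)
        best = y.min()
        z = (best - mu) / np.maximum(sigma, 1e-9)
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # step 8.3: acquisition
        x_next = cand[np.argmax(ei)]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next[0]))
    return X[np.argmin(y), 0], y.min()               # step 8.5: best combination

x_best, y_best = bayes_opt(lambda x: (x - 2.0) ** 2, bounds=(0.0, 5.0))
print(round(float(x_best), 2), round(float(y_best), 4))
```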
7. The personalized recommendation method based on ensemble learning according to claim 1, wherein: the step 10 includes the following steps:
step 10.1: setting the value N, i.e. the number of items to recommend to each user, and defining the number of users as count;
step 10.2: for each user, recording the real recommendation list generated on the test set Test as T(All); performing score prediction on the test set Test with the Bayesian-optimized gradient boosting decision tree recommendation model, the obtained result being defined as the test evaluation set;
step 10.3: sorting the test evaluation set by score, recommending the first N items to each user, and recording the Top-N recommendation list obtained for each user as T(Test);
step 10.4: verifying the precision and recall results of the test evaluation set;
step 10.5: computing the length of T(Test);
step 10.6: computing the length of T(All);
step 10.7: computing the intersection T(U) between each user's Top-N recommendation list T(Test) and the real list T(All);
step 10.8: computing precision: Precision = |T(U)| / |T(Test)|; the precision of each user is accumulated and the sum divided by count to obtain the average precision;
step 10.9: computing recall: Recall = |T(U)| / |T(All)|; the recall of each user is accumulated and the sum divided by count to obtain the average recall.
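The per-user precision and recall averaging of steps 10.5–10.9 can be sketched as follows (toy lists; T(Test) is the Top-N list, T(All) the real list, T(U) their intersection — the dictionaries are invented example data):

```python
def topn_metrics(recommended, actual):
    """Average per-user Precision = |T(U)|/|T(Test)| and Recall = |T(U)|/|T(All)|."""
    precisions, recalls = [], []
    for user, rec in recommended.items():
        hit = set(rec) & set(actual.get(user, []))          # T(U)
        precisions.append(len(hit) / len(rec) if rec else 0.0)
        recalls.append(len(hit) / len(actual[user]) if actual.get(user) else 0.0)
    count = len(recommended)
    return sum(precisions) / count, sum(recalls) / count

recommended = {1: [10, 11], 2: [10, 12]}   # Top-N lists T(Test) per user
actual = {1: [10, 11, 13], 2: [14]}        # real lists T(All) per user
p, r = topn_metrics(recommended, actual)
print(p, r)                                # precision 0.5, recall ~0.333
```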
CN202110629501.6A 2021-03-26 2021-06-07 Personalized recommendation method based on ensemble learning Active CN113326433B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021103231807 2021-03-26
CN202110323180 2021-03-26

Publications (2)

Publication Number Publication Date
CN113326433A CN113326433A (en) 2021-08-31
CN113326433B true CN113326433B (en) 2023-10-10

Family

ID=77419834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110629501.6A Active CN113326433B (en) 2021-03-26 2021-06-07 Personalized recommendation method based on ensemble learning

Country Status (1)

Country Link
CN (1) CN113326433B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105843928A (en) * 2016-03-28 2016-08-10 西安电子科技大学 Recommendation method based on double-layer matrix decomposition
CN108763362A (en) * 2018-05-17 2018-11-06 浙江工业大学 Method is recommended to the partial model Weighted Fusion Top-N films of selection based on random anchor point
CN110109902A (en) * 2019-03-18 2019-08-09 广东工业大学 A kind of electric business platform recommender system based on integrated learning approach
CN110297978A (en) * 2019-06-28 2019-10-01 四川金蜜信息技术有限公司 Personalized recommendation algorithm based on integrated recurrence
CN110348580A (en) * 2019-06-18 2019-10-18 第四范式(北京)技术有限公司 Construct the method, apparatus and prediction technique, device of GBDT model
WO2020233245A1 (en) * 2019-05-20 2020-11-26 山东科技大学 Method for bias tensor factorization with context feature auto-encoding based on regression tree
CN112183946A (en) * 2020-09-07 2021-01-05 腾讯音乐娱乐科技(深圳)有限公司 Multimedia content evaluation method, device and training method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Nie Lisheng. "Personalized recommendation of learning resources based on behavior analysis." Computer Technology and Development, 2020, No. 7, full text. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant