CN108650532B

CN108650532B - Cable television on-demand program recommendation method and system

Info

Publication number: CN108650532B
Application number: CN201810241067.2A
Authority: CN
Inventors: 王妍; 柴剑平; 李波; 冯熙; 殷复莲; 江茜; 檀雷雷; 韩晶晶
Original assignee: Communication University of China
Current assignee: Communication University of China
Priority date: 2018-03-22
Filing date: 2018-03-22
Publication date: 2020-06-12
Anticipated expiration: 2038-03-22
Also published as: CN108650532A

Abstract

The invention provides a method and a system for recommending cable television on-demand programs. The method comprises the following steps: collecting user viewing behavior data and program metadata; using one part of the viewing behavior data for training and another part for testing; converting the training viewing behavior data into the users' program scores to form a user-program scoring matrix; standardizing the program metadata; obtaining a plurality of program candidate sets by applying a plurality of analysis methods to the scoring matrix and the metadata; and performing weighted combination on the program candidate sets to be recommended, judging the accuracy and/or recall rate of each weighted combination according to the test set, and generating a recommendation list by taking the weighted combination with high accuracy and/or recall rate as the recommendation result. The recommendation method and system provide personalized recommendation for users and improve recommendation precision and efficiency.

Description

Cable television on-demand program recommendation method and system

Technical Field

The invention relates to the technical field of cable televisions, in particular to a method and a system for recommending on-demand programs of a cable television.

Background

Recommendation systems are often used to address the problem of information overload by providing personalized services to users. Existing recommendation methods fall mainly into two categories, collaborative filtering methods and content-based recommendation methods, of which collaborative filtering is the most widely applied. Collaborative filtering methods are in turn roughly divided into memory-based methods, represented by neighbor recommendation based on user/item similarity, and model-based methods, represented by recommendation based on matrix decomposition.

In the big data era, the user behavior data show a massive growth trend, and the sparsity problem of a recommendation system is increasingly highlighted.

The sparsity problem means that the number of users and items in the system is very large while the overlap of behaviors among users is very small. Data sparsity is defined as the number of existing user actions on items as a percentage of all possible actions. Existing solutions to the sparsity problem include: extending diffusion methods from first-order correlation to second-order and higher-order correlation; adding default scores; iterative optimization methods; transfer similarity methods; and the like.
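The definition above can be illustrated with a minimal sketch. The function name `data_sparsity` and the toy data are hypothetical, and the value computed is the observed share of all possible user-item pairs, exactly as the paragraph defines it.

```python
def data_sparsity(interactions, n_users, n_items):
    """Share of all possible user-item pairs that have an observed action."""
    observed = len(set(interactions))  # count each (user, item) pair once
    return observed / (n_users * n_items)


# Hypothetical toy data: 3 users, 4 programs, 3 observed (user, program) pairs.
pairs = [("u1", "p1"), ("u1", "p2"), ("u3", "p4")]
density = data_sparsity(pairs, n_users=3, n_items=4)
print(density)
```

The smaller this value, the fewer behaviors users share, which is the situation the sparsity problem describes.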

In addition, a single recommendation method often does not achieve the desired results.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide a method and a system for recommending cable tv on-demand programs, which implement personalized recommendation for users and improve recommendation accuracy and efficiency.

According to one aspect of the present invention, there is provided a cable television on-demand program recommendation system, comprising:

an acquisition part comprising a first acquisition unit and a second acquisition unit, wherein the first acquisition unit acquires the viewing behavior data of cable television users, and the second acquisition unit crawls the metadata of the programs online;

a classification part which takes one part of the viewing behavior data acquired by the first acquisition unit as training viewing behavior data to form a training set, and takes the other part as test viewing behavior data to form a test set;

a data preprocessing part which converts the training viewing behavior data of the users in the training set into the users' scores for programs, wherein a score is the ratio of the user's viewing duration of a program to the playing duration of the program, and the scores of each user for each program form a user-program scoring matrix, and which standardizes the metadata acquired by the second acquisition unit;

a program candidate set obtaining part comprising a first analysis module, a second analysis module, a third analysis module and a fourth analysis module, wherein the first analysis module decomposes the user-program scoring matrix by a matrix decomposition method and generates a first program candidate set C1 to be recommended according to the values of the elements in the low-rank matrix; the second analysis module decomposes the user-program scoring matrix by a matrix decomposition method, calculates the user similarity and the movie program similarity, and generates a second program candidate set C2 to be recommended by applying a neighborhood recommendation model; the third analysis module calculates the user similarity and the program similarity according to the user-program scoring matrix, and generates a third program candidate set C3 to be recommended by applying a neighborhood recommendation model; and the fourth analysis module calculates the user similarity and the movie program similarity according to the movie metadata, and generates a fourth program candidate set C4 to be recommended by applying a neighborhood recommendation model; and

a recommendation list generating part which performs weighted combination on the plurality of program candidate sets to be recommended of the program candidate set obtaining part according to a plurality of strategies, or performs weighted combination on the different similarity calculation methods of the different program candidate sets by means of machine learning, judges the accuracy and/or recall rate of each weighted combination according to the test set separated by the classification part, and generates a recommendation list by taking the weighted combination with high accuracy and/or recall rate as the recommendation result.

The cable television on-demand program recommendation system, wherein the data preprocessing part comprises: a first data cleaning module for cleaning the training viewing behavior data of the training set; a second data cleaning module for cleaning the metadata collected by the second acquisition unit; and a conversion module for converting the cleaned training viewing behavior data and metadata, the conversion module comprising: a screening unit for screening users and programs and removing inactive users and cold programs; a viewing behavior conversion unit for converting the training viewing behavior data of a user into the user's scores for programs; a score transformation unit which transforms each score into an integer with a value of 0 or 1 by rounding; a scoring matrix construction unit for forming a user-program scoring matrix from the scores of each user for each program output by the score transformation unit; and a metadata processing unit that preprocesses the variables of the metadata of a program, the preprocessing including: judging the attribute of each variable, normalizing variables with a numerical attribute, and classifying variables with a character attribute.
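The conversion from viewing records to a 0/1 scoring matrix might be sketched as follows. The record layout, the summing of repeated sessions, the cap at 1.0 and the function name `build_score_matrix` are assumptions made for illustration; only the ratio of viewing duration to program duration and the rounding to 0 or 1 come from the text. An explicit 0.5 threshold is used because Python's built-in `round` rounds halves to the nearest even integer.

```python
from collections import defaultdict


def build_score_matrix(records, durations):
    """records: (user, program, seconds_watched); durations: program -> seconds."""
    watched = defaultdict(float)
    for user, program, seconds in records:
        watched[(user, program)] += seconds  # assumption: sessions are summed

    matrix = defaultdict(dict)
    for (user, program), seconds in watched.items():
        ratio = min(seconds / durations[program], 1.0)  # viewing / playing duration
        matrix[user][program] = 1 if ratio >= 0.5 else 0  # round half up to 0 or 1
    return {user: dict(row) for user, row in matrix.items()}


# Hypothetical toy data with one-hour programs.
durations = {"p1": 3600, "p2": 3600}
records = [("u1", "p1", 3000), ("u1", "p2", 600), ("u2", "p1", 1800)]
matrix = build_score_matrix(records, durations)
print(matrix)
```

Pairs with no record are simply absent from the nested dictionary, which keeps the sparse matrix compact.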

The cable television on-demand program recommendation system, wherein the first data cleaning module comprises: a first judging unit for judging whether records of the training viewing behavior data of the same user have the same start time, and sending the records of the same user with the same start time to a first screening unit; the first screening unit, which keeps the record with the latest end time and deletes the remaining records; a second judging unit for judging whether records of the training viewing behavior data of the same user have the same end time, and sending the records of the same user with the same end time to a second screening unit; the second screening unit, which keeps the record with the earliest start time and deletes the remaining records; a sorting unit for sorting the training viewing behavior data in descending order by user and start time; a third judging unit for judging whether adjacent records of the same user, as arranged by the sorting unit, overlap in recorded viewing time, and sending the overlapping records to a third screening unit; and the third screening unit, which deletes the later-ordered record among the overlapping records.
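The three cleaning passes described above can be sketched as one function. This is only an illustration under stated assumptions: records are hypothetical `(user, start, end)` tuples with numeric timestamps, and an overlapping record is compared against the most recently kept record of the same user.

```python
def clean_records(records):
    """records: list of (user, start, end) tuples with comparable timestamps."""
    # Pass 1: same user and same start time -> keep the record with the latest end.
    by_start = {}
    for user, start, end in records:
        key = (user, start)
        if key not in by_start or end > by_start[key][2]:
            by_start[key] = (user, start, end)

    # Pass 2: same user and same end time -> keep the record with the earliest start.
    by_end = {}
    for user, start, end in by_start.values():
        key = (user, end)
        if key not in by_end or start < by_end[key][1]:
            by_end[key] = (user, start, end)

    # Pass 3: sort by user and start time in descending order, then drop the
    # later-ordered record of any overlapping pair of the same user.
    ordered = sorted(by_end.values(), key=lambda r: (r[0], r[1]), reverse=True)
    kept = []
    for rec in ordered:
        prev = kept[-1] if kept else None
        # rec starts no later than prev, so they overlap if rec ends after prev starts.
        if prev and prev[0] == rec[0] and rec[2] > prev[1]:
            continue
        kept.append(rec)
    return kept


raw = [("u1", 0, 10), ("u1", 0, 20), ("u1", 5, 20),
       ("u1", 15, 30), ("u1", 40, 50), ("u2", 0, 5)]
cleaned = clean_records(raw)
print(cleaned)
```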

The cable television on-demand program recommendation system, wherein the second data cleaning module comprises: an edit distance obtaining unit for calculating the edit distance between the original on-demand program name and the crawled program name; a fourth judging unit for judging whether the edit distance is greater than a set threshold, and sending a signal to a fourth screening unit when the edit distance is greater than the set threshold; and the fourth screening unit, which deletes the metadata of any crawled program whose edit distance is greater than the set threshold.
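A minimal sketch of this metadata cleaning step follows. It assumes the edit distance is the Levenshtein distance (the text does not name a specific variant), and the triple layout and the function name `filter_metadata` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def filter_metadata(entries, threshold):
    """entries: (original_name, crawled_name, metadata) triples.

    Keeps entries whose names differ by at most `threshold` edits, i.e. deletes
    those whose edit distance is greater than the threshold.
    """
    return [e for e in entries if edit_distance(e[0], e[1]) <= threshold]


entries = [("Inception", "Inception", {"genre": "sci-fi"}),
           ("Inception", "Interstellar", {"genre": "sci-fi"})]
kept = filter_metadata(entries, threshold=2)
print(kept)
```

The threshold value is a tuning choice; a small value tolerates minor spelling or punctuation differences between the on-demand catalogue and the crawled source.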

The system for recommending cable television on-demand programs, wherein the program candidate set obtaining part further comprises a similarity obtaining module that calculates the user similarity and the program similarity, the similarity obtaining module comprising:

a similarity model construction unit that constructs a similarity model according to a similarity calculation method, the similarity calculation method including the Pearson correlation coefficient, the cosine similarity, the reciprocal squared distance similarity and the Jaccard similarity, wherein a first similarity model is constructed according to the following formula (1) using the Pearson correlation coefficient,

$$\mathrm{pearson}_{ij} = \frac{\sum_{u \in U(i) \cap U(j)} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u \in U(i) \cap U(j)} (r_{ui} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U(i) \cap U(j)} (r_{uj} - \bar{r}_j)^2}} \tag{1}$$

wherein $\mathrm{pearson}_{ij}$ is the Pearson correlation coefficient of program i and program j; $U(i)$ represents the set of users who have scored program i; $r_{ui}$ represents the score of user u for program i; and $\bar{r}_i$ represents the average score of all users for program i;

constructing a second similarity model according to the following formula (2) using the cosine similarity,

$$\mathrm{cosine}_{ij} = \frac{\sum_{u \in U(i) \cap U(j)} r_{ui}\, r_{uj}}{\sqrt{\sum_{u \in U(i)} r_{ui}^2}\,\sqrt{\sum_{u \in U(j)} r_{uj}^2}} \tag{2}$$

wherein $\mathrm{cosine}_{ij}$ is the cosine similarity of program i and program j;

constructing a third similarity model according to the following formula (3) using the Jaccard similarity,

$$\mathrm{jaccard}_{pq} = \frac{|U(p) \cap U(q)|}{|U(p) \cup U(q)|} \tag{3}$$

wherein $\mathrm{jaccard}_{pq}$ is the Jaccard similarity of user p and user q; $|U(p) \cap U(q)|$ represents the number of programs scored by both user p and user q; and $|U(p) \cup U(q)|$ represents the number of programs scored by at least one of user p and user q;

the neighbor set determining unit determines a neighbor set of each program according to the similarity between the programs and the similarity between the users by using a neighborhood recommendation model;

a neighbor score determining unit for determining, according to formula (4), the prediction scores of different users for the programs in the neighbor set of each program, wherein the quantity computed by formula (4) is the predicted score of user u for program i, R(u) is the set of programs on which user u has acted, S^k(i) is the set of the k programs most similar to program i, and sim(i, j) represents the similarity between program i and program j;

and the program candidate set determining unit selects a set number of programs as the program candidate set of the user according to the prediction scores of the neighbor sets of the programs by the user and the sequence of the prediction scores.
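Three of the similarity measures named above can be sketched in plain Python. The data layout (one dictionary of user scores per program, one set of programs per user) and the handling of empty inputs are assumptions; the measures themselves are the standard Pearson, cosine and Jaccard definitions, with the Pearson means taken over all users who scored each program, and the sketch is not necessarily the exact form used in the patent.

```python
from math import sqrt


def pearson(ri, rj):
    """ri, rj: dicts mapping user -> score for two programs."""
    common = ri.keys() & rj.keys()
    if not common:
        return 0.0
    mean_i = sum(ri.values()) / len(ri)  # average score of all users for program i
    mean_j = sum(rj.values()) / len(rj)
    num = sum((ri[u] - mean_i) * (rj[u] - mean_j) for u in common)
    den = (sqrt(sum((ri[u] - mean_i) ** 2 for u in common))
           * sqrt(sum((rj[u] - mean_j) ** 2 for u in common)))
    return num / den if den else 0.0


def cosine(ri, rj):
    """Cosine similarity of two sparse score vectors."""
    common = ri.keys() & rj.keys()
    num = sum(ri[u] * rj[u] for u in common)
    den = sqrt(sum(v * v for v in ri.values())) * sqrt(sum(v * v for v in rj.values()))
    return num / den if den else 0.0


def jaccard(programs_p, programs_q):
    """Jaccard similarity of the program sets scored by two users."""
    p, q = set(programs_p), set(programs_q)
    union = p | q
    return len(p & q) / len(union) if union else 0.0


sim_pearson = pearson({"a": 1, "b": 2, "c": 3}, {"a": 2, "b": 4, "c": 6})
sim_cosine = cosine({"a": 1, "b": 1}, {"a": 1, "b": 1})
sim_jaccard = jaccard({"p1", "p2"}, {"p2", "p3"})
```

With 0/1 scores, as produced by the rounding step, Pearson becomes undefined for programs whose scores are all equal; the sketch returns 0.0 in that case, which is a design choice rather than something the text specifies.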

The cable tv on-demand program recommendation system, wherein the recommendation list generating part includes a weighted combination unit, an accuracy calculating unit or/and a recall ratio calculating unit, and a recommendation list generating unit, wherein:

the weighted combination unit is used for carrying out weighted combination on a plurality of program candidate sets to be recommended of the program candidate set acquisition part according to a plurality of strategies or carrying out weighted combination on different similarity calculation methods of different program candidate set acquisition parts by utilizing a machine learning idea;

an accuracy calculating unit for calculating the accuracy of each weighted combination according to the following formula (5) based on the test viewing behavior data of the test set,

$$\mathrm{Precision} = \frac{\sum_{p} hit(p)}{n \cdot L} \tag{5}$$

wherein Precision is the accuracy of a weighted combination, n represents the number of users on the test set, hit(p) represents the number of elements in the intersection of the recommended program list of user p and the list of programs actually requested by user p on the test set, and L represents the length of the recommended list;

a recall ratio calculating unit for calculating the recall rate of each weighted combination according to the following formula (6) based on the test viewing behavior data of the test set,

$$\mathrm{Recall} = \frac{\sum_{p} hit(p)}{\sum_{p} test(p)} \tag{6}$$

wherein Recall is the recall rate of a weighted combination, hit(p) represents the number of elements in the intersection of the recommended program list of user p and the list of programs actually requested by user p on the test set, and test(p) represents the number of programs actually requested by user p on the test set;

and a recommendation list generation unit which generates a recommendation list by using a weighted combination with high accuracy or/and high recall rate as a recommendation result.
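The accuracy and recall evaluation might be computed as in the sketch below. It assumes both quantities are aggregated over all test users (total hits divided by n times L for accuracy, and by the total number of requested programs for recall); the function name and data layout are hypothetical.

```python
def precision_recall(recommended, actual, list_len):
    """recommended, actual: dicts mapping user -> list of programs."""
    users = list(actual)  # users present on the test set
    hits = sum(
        len(set(recommended.get(u, [])[:list_len]) & set(actual[u]))
        for u in users
    )
    precision = hits / (len(users) * list_len)
    recall = hits / sum(len(set(actual[u])) for u in users)
    return precision, recall


recommended = {"u1": ["a", "b"], "u2": ["c", "d"]}
actual = {"u1": ["a", "x", "y"], "u2": ["c", "d"]}
precision, recall = precision_recall(recommended, actual, list_len=2)
print(precision, recall)
```

The sketch does not guard against an empty test set; a production version would need to handle that case before dividing.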

According to another aspect of the present invention, there is provided a cable television on-demand program recommendation method, including:

step S1, collecting the viewing behavior data of cable television users, and crawling the metadata of the programs online;

step S2, using one part of the viewing behavior data as training viewing behavior data to form a training set, and using the other part as test viewing behavior data to form a test set;

step S3, converting the training viewing behavior data of the users in the training set into the users' scores for programs, wherein a score is the ratio of the user's viewing duration of a program to the playing duration of the program, and the scores of each user for each program form a user-program scoring matrix;

step S4, standardizing the metadata of the programs;

step S5, obtaining a plurality of program candidate sets by using a plurality of analysis methods according to the user-program scoring matrix and the standardized metadata, the analysis methods including two or more of the following: decomposing the user-program scoring matrix by a matrix decomposition method, and generating a first program candidate set C1 to be recommended according to the values of the elements in the low-rank matrix; decomposing the user-program scoring matrix by a matrix decomposition method, calculating the user similarity and the movie program similarity, and generating a second program candidate set C2 to be recommended by applying a neighborhood recommendation model; calculating the user similarity and the program similarity according to the user-program scoring matrix, and generating a third program candidate set C3 to be recommended by applying a neighborhood recommendation model; and calculating the user similarity and the movie program similarity according to the movie metadata, and generating a fourth program candidate set C4 to be recommended by applying a neighborhood recommendation model; and

step S6, performing weighted combination on the program candidate sets to be recommended according to a plurality of strategies, or performing weighted combination on the different similarity calculation methods of the different program candidate sets by means of machine learning, judging the accuracy and/or recall rate of each weighted combination according to the test set, and generating a recommendation list by taking the weighted combination with high accuracy and/or recall rate as the recommendation result.

The method for recommending cable television on-demand programs, wherein step S3 includes: cleaning the training viewing behavior data of the users; screening users and programs to remove inactive users and cold programs; converting the training viewing behavior data of each user into the user's scores for programs; and converting the scores into integers with a value of 0 or 1 by rounding, so that the scores of each user for each program form the user-program scoring matrix.

The method for recommending cable television on-demand programs, wherein the step of cleaning the training viewing behavior data of the users comprises: judging whether records of the same user have the same start time and, if so, keeping the record with the latest end time and deleting the remaining records; judging whether records of the same user have the same end time and, if so, keeping the record with the earliest start time and deleting the remaining records; sorting the training viewing behavior data in descending order by user and start time; and judging whether adjacent records of the same user overlap in recorded viewing time and, if so, deleting the later-ordered record among the overlapping records.

The method for recommending cable tv on-demand programs, wherein step S4 includes: cleaning the metadata of the program; preprocessing variables of metadata of the cleaned program, including: judging the attribute of the variable, normalizing the variable of the numerical attribute, and classifying the variable of the character attribute.
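The preprocessing of metadata variables can be sketched as below. Min-max scaling is one common way to normalize numerical attributes, and one-hot encoding is used here as an assumed way of representing classified character attributes; the text itself only states that such variables are classified. Both function names are hypothetical.

```python
def min_max(values):
    """Linearly map numeric values onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode category labels as 0/1 indicator vectors (sorted category order)."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical metadata columns: program duration in minutes and program genre.
scaled_durations = min_max([90, 120, 150])
encoded_genres = one_hot(["action", "sci-fi", "action"])
print(scaled_durations, encoded_genres)
```

After this step every metadata variable is numeric and on a comparable scale, so that similarity between programs can be computed on the metadata.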

The method for recommending cable television on-demand programs, wherein the step of cleaning the metadata of the program comprises: calculating the edit distance between the original on-demand program name and the crawled program name; and judging whether the edit distance is greater than a set threshold, and deleting the metadata of any crawled program whose edit distance is greater than the set threshold.

In step S5, the method for calculating the user similarity and the movie program similarity and generating a candidate set of programs to be recommended by using a neighborhood recommendation model includes:

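constructing a similarity model according to a similarity algorithm, the similarity algorithm including the Pearson correlation coefficient, the cosine similarity, the reciprocal squared distance similarity and the Jaccard similarity, wherein a first similarity model is constructed according to the following formula (1) using the Pearson correlation coefficient,

$$\mathrm{pearson}_{ij} = \frac{\sum_{u \in U(i) \cap U(j)} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u \in U(i) \cap U(j)} (r_{ui} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U(i) \cap U(j)} (r_{uj} - \bar{r}_j)^2}} \tag{1}$$

wherein $\mathrm{pearson}_{ij}$ is the Pearson correlation coefficient of program i and program j; $U(i)$ represents the set of users who have scored program i; $r_{ui}$ represents the score of user u for program i; and $\bar{r}_i$ represents the average score of all users for program i;

constructing a second similarity model according to the following formula (2) using the cosine similarity,

$$\mathrm{cosine}_{ij} = \frac{\sum_{u \in U(i) \cap U(j)} r_{ui}\, r_{uj}}{\sqrt{\sum_{u \in U(i)} r_{ui}^2}\,\sqrt{\sum_{u \in U(j)} r_{uj}^2}} \tag{2}$$

wherein $\mathrm{cosine}_{ij}$ is the cosine similarity of program i and program j;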

determining a neighbor set of each program according to the similarity between the programs and the similarity between the users by using a neighborhood recommendation model;

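determining, according to formula (4), the prediction scores of different users for the programs in the neighbor set of each program, wherein R(u) is the set of programs on which user u has acted, S^k(i) is the set of the k programs most similar to program i, and sim(i, j) represents the similarity between program i and program j;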

and selecting a set number of programs as program candidate sets of the user according to the prediction scores of the neighbor sets of the programs by the user and the sequence of the prediction scores.
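The neighborhood prediction and candidate selection can be sketched as follows. Formula (4) is not reproduced in this text, so the sketch assumes the common item-based form in which the predicted score of user u for program i is the sum of sim(i, j) times the user's score for j, taken over the programs j that lie both in R(u) and among the k programs most similar to i. All names are hypothetical.

```python
def predict_scores(user_scores, sim, k):
    """user_scores: program -> score for one user, i.e. R(u).

    sim: program -> {other program: similarity}.
    """
    predictions = {}
    for i, neighbours in sim.items():
        if i in user_scores:
            continue  # only predict for programs the user has not acted on
        top_k = sorted(neighbours.items(), key=lambda kv: kv[1], reverse=True)[:k]
        predictions[i] = sum(s * user_scores[j] for j, s in top_k if j in user_scores)
    return predictions


def candidate_set(predictions, n):
    """Select the n programs with the highest predicted scores (ties by name)."""
    ranked = sorted(predictions.items(), key=lambda kv: (-kv[1], kv[0]))
    return [program for program, _ in ranked[:n]]


sim = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9, "c": 0.5},
    "c": {"a": 0.1, "b": 0.5},
    "d": {"a": 0.2, "b": 0.8, "c": 0.3},
}
predictions = predict_scores({"a": 1, "b": 1}, sim, k=2)
candidates = candidate_set(predictions, n=2)
print(candidates)
```

Some variants divide the sum by the total similarity of the contributing neighbours; whether the patent normalizes in this way cannot be determined from the text.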

The method for recommending cable television on-demand programs, wherein the step S6 includes a weighted combination step, an accuracy calculation step or/and a recall ratio calculation step, and a recommendation list generation step, wherein:

a weighted combination step, which is to carry out weighted combination on a plurality of program candidate sets to be recommended of the program candidate set acquisition part according to a plurality of strategies or carry out weighted combination on different similarity calculation methods of different program candidate set acquisition parts by utilizing a machine learning idea;

an accuracy calculation step of calculating the accuracy of each weighted combination according to the test viewing behavior data of the test set and the following formula (5),

a recall ratio calculation step of calculating recall ratios of various weighted combinations according to the test audience behavior data of the test set and the following formula (6),

and a recommendation list generation step of generating a recommendation list by using a weighted combination with high accuracy or/and recall as a recommendation result.
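One possible reading of the weighted combination and selection steps is sketched below for a single user. The rank-based voting scheme, the fixed list of weight options and the function names are all assumptions; the text leaves the combination strategies open.

```python
def combine(candidate_sets, weights, n):
    """candidate_sets: ranked program lists; weights: one weight per list."""
    scores = {}
    for programs, weight in zip(candidate_sets, weights):
        for rank, program in enumerate(programs):
            # Assumption: earlier positions in a candidate set earn more votes.
            scores[program] = scores.get(program, 0.0) + weight * (len(programs) - rank)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [program for program, _ in ranked[:n]]


def best_weights(candidate_sets, weight_options, actual, n):
    """Pick the weight vector whose combined list is most accurate on the test data."""
    def precision(weights):
        hits = set(combine(candidate_sets, weights, n)) & set(actual)
        return len(hits) / n
    return max(weight_options, key=precision)


c1 = ["a", "b", "c"]   # e.g. a candidate set from matrix decomposition
c3 = ["d", "b", "e"]   # e.g. a candidate set from a neighborhood model
options = [(1, 0), (0.5, 0.5), (0, 1)]
mixed = combine([c1, c3], (0.5, 0.5), n=2)
chosen = best_weights([c1, c3], options, actual=["b", "d"], n=2)
print(mixed, chosen)
```

Here the programs the user actually requested on the test set decide which weighting is kept, mirroring the selection of the weighted combination with high accuracy.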

The method and the system for recommending the cable television on-demand programs are characterized in that a plurality of matrix decomposition methods are used for comparison and mixing so as to ensure certain recommendation precision and efficiency, and a personalized movie recommendation list is generated for a user. The method and the system for recommending the cable television on-demand programs can help network operators to provide targeted services for users and improve the on-demand experience of the users.

Drawings

Other objects and results of the present invention will become more apparent and more readily appreciated as the same becomes better understood by reference to the following description taken in conjunction with the accompanying drawings. In the drawings:

FIG. 1 is a block diagram of a cable TV on-demand program recommendation system according to the present invention;

FIG. 2 is a flowchart of a method for recommending cable TV on-demand programs according to the present invention.

Detailed Description

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Fig. 1 is a block diagram of a cable tv on-demand program recommendation system according to the present invention, and as shown in fig. 1, the cable tv on-demand program recommendation system includes:

the acquisition part 1 comprises a first acquisition unit 11 and a second acquisition unit 12, wherein the first acquisition unit 11 acquires viewing behavior data of cable television users, the viewing behavior data comprising the users' viewing behavior toward television programs, such as viewing start time, viewing end time, viewing duration, ratings and reviews, and the second acquisition unit 12 crawls the metadata of the programs online, the metadata comprising program name, director, lead actors, cast, country, era, region, genre, duration, rating, box office and the like;

the classification part 2 is used for forming a training set by taking one part of the audience behavior data acquired by the first acquisition unit as training audience behavior data and forming a test set by taking the other part of the audience behavior data as test audience behavior data;

the data preprocessing part 3 converts the training viewing behavior data of the users in the training set into the users' scores for programs, wherein a score is the ratio of the user's viewing duration of a program to the playing duration of the program, and the scores of each user for each program form a user-program scoring matrix; and standardizes the metadata acquired by the second acquisition unit, for example by data normalization, that is, mapping the metadata uniformly onto the interval [0, 1], for instance with min-max normalization (also called dispersion normalization), a linear transformation of the original metadata whose result falls within [0, 1];

the program candidate set obtaining part 4 is configured to obtain a plurality of program candidate sets by using a plurality of analysis methods according to the user-program scoring matrix and the standardized metadata, and includes a first analysis module 41, a second analysis module 42, a third analysis module 43, and a fourth analysis module 44, where the first analysis module 41 decomposes the user-program scoring matrix by using a matrix decomposition method, and generates a first program candidate set C1 to be recommended according to values of elements in the low-rank matrix; the second analysis module 42 adopts a matrix decomposition method to decompose the scoring matrix of the user-program, calculates the user similarity and the movie program similarity, and generates a second program candidate set C2 to be recommended by applying a neighborhood recommendation model; the third analysis module 43 calculates the user similarity and the program similarity according to the user-program scoring matrix, and generates a third program candidate set C3 to be recommended by using a neighborhood recommendation model; the fourth analysis module 44 calculates user similarity and movie program similarity according to the movie metadata, and generates a fourth program candidate set C4 to be recommended by using a neighborhood recommendation model;

and a recommendation list generating part 5 for performing weighted combination on the plurality of program candidate sets to be recommended of the program candidate set obtaining part 4 according to a plurality of strategies or performing weighted combination on different similarity calculation methods of different program candidate set obtaining parts by utilizing a machine learning concept, judging the accuracy or/and the recall ratio of each weighted combination according to the test set separated by the classification part, and generating a recommendation list by taking the weighted combination with high accuracy or/and recall ratio as a recommendation result, wherein the recommendation list comprises a user list, a program ordered list, a similar user list and a similar program list.
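The first analysis module's matrix decomposition can be illustrated with a small sketch. The text does not name a specific decomposition algorithm, so the stochastic gradient descent factorization below, its hyperparameters and its function names are assumptions chosen only to show how a low-rank reconstruction can yield a candidate set such as C1.

```python
import random


def factorize(ratings, n_users, n_items, rank=2, epochs=200, lr=0.05, reg=0.01, seed=0):
    """ratings: (user_index, item_index, score) triples. Returns factors P and Q."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # gradient step on user factor
                Q[i][f] += lr * (err * pu - reg * qi)  # gradient step on item factor
    return P, Q


def squared_error(ratings, P, Q):
    """Sum of squared reconstruction errors over the observed scores."""
    return sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings)


def recommend(P, Q, seen, user, n):
    """Rank unseen programs by their reconstructed low-rank score."""
    scores = {i: sum(a * b for a, b in zip(P[user], Q[i]))
              for i in range(len(Q)) if i not in seen}
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# Hypothetical 3-user, 3-program scoring data with 0/1 scores.
ratings = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1), (1, 2, 1), (2, 2, 1), (2, 1, 0)]
P0, Q0 = factorize(ratings, 3, 3, epochs=0)  # untrained factors for comparison
P, Q = factorize(ratings, 3, 3)
top = recommend(P, Q, seen={0, 1}, user=0, n=1)
```

The product of the two factor matrices plays the role of the low-rank matrix whose element values drive the candidate set.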

In an embodiment of the present invention, the data preprocessing part 3 removes invalid viewing records from the viewing behavior data, such as records with no viewing behavior and abnormal viewing records (for example, extreme behavior such as viewing that appears to be on continuously); matches the viewing behavior data against the metadata, and identifies and removes inconsistent information, for example by storing the users' viewing behavior data in a viewing library and the broadcast data of the programs in a broadcast library, and removing information that is inconsistent among the broadcast library, the viewing library and/or the program list; and converts the valid viewing behavior data into an appropriate form. Specifically, the data preprocessing part 3 includes:

the first data cleaning module 31 is used for cleaning the training audience behavior data of the training set;

the second data cleaning module 32 is used for cleaning the metadata collected by the second acquisition unit;

the conversion module 33 is configured to convert the cleaned training viewing behavior data and metadata, and includes: a screening unit 331 configured to screen users and programs to remove inactive users and cold programs; a viewing behavior conversion unit 332 that converts the training viewing behavior data of a user into the user's scores for programs; a score transformation unit 333 that transforms each score into an integer with a value of 0 or 1 by rounding; a scoring matrix construction unit 334 for forming a user-program scoring matrix from the scores of each user for each program output by the score transformation unit; and a metadata processing unit 335 that preprocesses the variables of the metadata of a program, the preprocessing including: judging the attribute of each variable, normalizing variables with a numerical attribute, and classifying variables with a character attribute. Preferably, variables with a character attribute are classified manually, for example by classifying movies into genres such as action, adventure and science fiction; manual classification is adopted because the quality of the crawled data may not be high.

The data preprocessing part 3 can improve the recommendation precision and efficiency by collecting and cleaning the program data requested by the user.

Preferably, the first data cleansing module 31 comprises:

the first judging unit 311 is configured to judge whether records of the training viewing behavior data of the same user have the same start time, and send the records of the same user with the same start time to the first screening unit;

the first screening unit 312 keeps the record with the latest end time, and deletes the remaining records;

the second judging unit 313 judges whether records of the training viewing behavior data of the same user have the same end time, and sends the records of the same user with the same end time to the second screening unit;

the second screening unit 314 keeps the record with the earliest start time, and deletes the remaining records;

the sorting unit 315 sorts the training viewing behavior data in descending order by user and start time;

the third judging unit 316 is configured to judge whether adjacent records of the same user, as arranged by the sorting unit, overlap in recorded viewing time, and send the overlapping records to the third screening unit;

the third screening unit 317 deletes the later-ordered record among the overlapping records.

In addition, preferably, the second data cleansing module 32 includes:

an edit distance obtaining unit 321 that calculates an edit distance between the original on-demand program name and the crawled program name;

a fourth determining unit 322, configured to determine whether the edit distance is greater than a set threshold, and send a signal to a fourth screening unit when the edit distance is greater than the set threshold;

the fourth filtering unit 323 deletes metadata of the crawled program whose edit distance is greater than a set threshold.

In one embodiment of the present invention, the program candidate set obtaining section 4 further includes:

the similarity obtaining module 45 calculates the user similarity and the program similarity, and includes:

the similarity model constructing unit 451, which constructs a similarity model according to a similarity calculation method, the similarity calculation methods including the Pearson correlation coefficient, cosine similarity, reciprocal squared-distance similarity, and Jaccard similarity, wherein a first similarity model is constructed using the Pearson correlation coefficient according to formula (1), in which the mean-rating term represents the average rating of all users for program i, and cosine_ij denotes the cosine similarity of program i and program j;

the neighbor set determining unit 452, which determines a neighbor set of each program according to the similarity between programs and the similarity between users, using a neighborhood recommendation model;

the neighborhood score determination unit 453, which determines, according to equation (4), the predicted scores of different users for the programs in the neighbor set of each program;

the program candidate set determining unit 454, which ranks the programs in the neighbor sets by the user's predicted scores from high to low and selects a set number of programs as the program candidate set of the user.
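The chain of units 451 to 454 can be illustrated with a minimal sketch. Formulas (1) to (4) are not reproduced in this text, so the sketch assumes the textbook definitions of the Pearson correlation coefficient and cosine similarity and a similarity-weighted average as the prediction rule; the function names, the neighbor count `k`, and the candidate count `n` are hypothetical. It also assumes a dense user-by-program matrix in which 0 means "not watched".

```python
import numpy as np

def pearson_item_similarity(R):
    """Pearson correlation between the program columns of a user-by-program matrix."""
    centered = R - R.mean(axis=0, keepdims=True)   # subtract each program's mean score
    norms = np.linalg.norm(centered, axis=0)
    norms[norms == 0] = 1.0                        # constant columns get similarity 0
    return (centered.T @ centered) / np.outer(norms, norms)

def cosine_item_similarity(R):
    """Cosine similarity between the program columns."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0
    return (R.T @ R) / np.outer(norms, norms)

def predict_scores(R, sim, k=2):
    """Assumed prediction rule: similarity-weighted average over each program's k nearest neighbors."""
    pred = np.zeros_like(R, dtype=float)
    for i in range(R.shape[1]):
        s = sim[i].copy()
        s[i] = -np.inf                             # a program is not its own neighbor
        neighbors = np.argsort(s)[::-1][:k]        # neighbor set of program i
        w = sim[i, neighbors]
        denom = np.abs(w).sum()
        if denom > 0:
            pred[:, i] = R[:, neighbors] @ w / denom
    return pred

def candidate_set(R, pred, user, n=2):
    """Top-n unwatched programs for one user, ranked by predicted score from high to low."""
    unseen = np.flatnonzero(R[user] == 0)
    ranked = unseen[np.argsort(pred[user, unseen])[::-1]]
    return ranked[:n].tolist()
```

Either similarity function can be passed to `predict_scores`, which is one way to realize the choice among similarity calculation methods described above.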

In one embodiment of the present invention, the recommendation list generating section 5 includes a weighted combination unit 51, an accuracy calculating unit 52 or/and recall ratio calculating unit 53, and a recommendation list generating unit 54, in which:

a weighted combination unit 51, which performs weighted combination of the plurality of program candidate sets to be recommended from the program candidate set obtaining part according to a plurality of strategies, or performs weighted combination of the different similarity calculation methods of the program candidate set obtaining part by means of machine learning;

an accuracy calculating unit 52 for calculating the accuracy of each weighted combination according to the following equation (5) based on the test viewing behavior data of the test set,

the recall ratio calculating unit 53 calculates the recall ratio of each weighted combination according to the following equation (6) based on the test viewing behavior data of the test set,

the recommendation list generation unit 54 generates a recommendation list by using a weighted combination with high accuracy and/or recall as a recommendation result.
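Equations (5) and (6) are not reproduced in this text. As a hedged illustration, the sketch below assumes the usual top-N definitions: accuracy (precision) is the share of recommended programs that the user actually watched in the test set, and recall is the share of watched test-set programs that were recommended. The function name and the dictionary layout are assumptions made for the example.

```python
def precision_recall(recommended, watched):
    """recommended: dict user -> list of program ids;
    watched: dict user -> set of program ids seen in the test set."""
    hits = sum(len(set(recs) & watched.get(user, set()))
               for user, recs in recommended.items())
    n_recommended = sum(len(set(recs)) for recs in recommended.values())
    n_watched = sum(len(programs) for programs in watched.values())
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_watched if n_watched else 0.0
    return precision, recall
```

Each weighted combination can be scored this way on the test set, and the combination with the highest accuracy or/and recall kept as the recommendation result.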

Fig. 2 is a flowchart of a method for recommending a cable tv on-demand program according to the present invention, and as shown in fig. 2, the method for recommending a cable tv on-demand program includes:

step S1, collecting the viewing behavior data of cable TV users, and crawling the metadata of the online programs;

step S2, using one part of the viewing behavior data as training viewing behavior data to form a training set, and using the other part as testing viewing behavior data to form a testing set;

step S3, converting the training viewing behavior data of the users in the training set into the users' scores for the programs, wherein a score is the ratio of the duration for which the user viewed the program to the broadcast duration of the program, namely

$$\text{score}(u, i) = \frac{\text{viewing duration of user } u \text{ for program } i}{\text{broadcast duration of program } i}$$

the scores of each user for each program form a user-program scoring matrix;

step S4, standardizing metadata of the program;

step S5, obtaining a plurality of program candidate sets by a plurality of analysis methods according to the user-program scoring matrix and the standardized metadata, the analysis methods including two or more of the following: decomposing the user-program scoring matrix by a matrix decomposition method, and generating a first program candidate set C1 to be recommended according to the values of the elements in the low-rank matrix; decomposing the user-program scoring matrix by a matrix decomposition method, calculating the user similarity and the program similarity (for example, using the Pearson correlation coefficient, cosine similarity, reciprocal squared-distance similarity, or Jaccard similarity), and generating a second program candidate set C2 to be recommended by applying a neighborhood recommendation model; calculating the user similarity and the program similarity according to the user-program scoring matrix, and generating a third program candidate set C3 to be recommended by applying a neighborhood recommendation model; calculating the user similarity and the program similarity according to the program metadata, and generating a fourth program candidate set C4 to be recommended by applying a neighborhood recommendation model;

step S6, performing weighted combination of the plurality of program candidate sets to be recommended according to a plurality of strategies, or performing weighted combination of the different similarity calculation methods of the program candidate set obtaining part by means of machine learning, judging the accuracy or/and recall rate of each weighted combination according to the test set, and generating a recommendation list by taking the weighted combination with high accuracy or/and recall rate as the recommendation result, wherein the recommendation list includes the user, an ordered list of programs, a list of similar users, and a list of similar programs.
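The first analysis method of step S5 (candidate set C1 from a low-rank matrix) can be sketched as follows. The text does not name a specific matrix decomposition method, so the sketch assumes a truncated singular value decomposition; the function name, the `rank`, and the candidate count `n` are hypothetical.

```python
import numpy as np

def low_rank_candidates(R, rank=2, n=2):
    """Approximate the user-program scoring matrix by a rank-limited matrix
    and pick, per user, the unwatched programs with the largest reconstructed values."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    R_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]      # low-rank reconstruction
    candidates = {}
    for user in range(R.shape[0]):
        unseen = np.flatnonzero(R[user] == 0)         # programs the user has not watched
        order = unseen[np.argsort(R_hat[user, unseen])[::-1]]
        candidates[user] = order[:n].tolist()
    return R_hat, candidates
```

A learned factorization (for example, one fitted by gradient descent on observed scores only) could be substituted for the SVD without changing how the candidate set is read off the low-rank matrix.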

In one embodiment of the present invention, step S3 includes:

step S31, cleaning the training audience behavior data of the user;

step S32, screening the users and programs to remove inactive users and cold programs; for example, inactive users may be users who have requested few programs or/and whose on-demand duration is short, or the users may be ranked by the number of programs requested and the last set percentage (e.g., 5%) of users treated as inactive; cold programs may be programs that have been requested few times and for a short duration, or the programs may be ranked by the number of times requested and the last set percentage (e.g., 5%) of programs treated as cold;

step S33, converting the training audience behavior data of the user into the program rating of the user;

and step S34, converting the scores into integers with a value of 0 or 1 by rounding, and forming the user-program scoring matrix from the scores of each user for each program.
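Steps S32 to S34 can be sketched as below. The record layout, the function name, and the choices to sum repeated viewings of the same program and to cap the ratio at 1 are assumptions made for the example; the 5% cut-off follows the example percentage given in step S32. Rounding is done half-up (0.5 becomes 1) rather than with Python's built-in `round`, which rounds halves to the nearest even integer.

```python
from collections import defaultdict

def build_score_matrix(records, durations, drop_fraction=0.05):
    """records: list of (user, program, seconds_watched);
    durations: dict program -> broadcast duration in seconds."""
    # Count on-demand requests per user and per program.
    user_counts, program_counts = defaultdict(int), defaultdict(int)
    for user, program, _ in records:
        user_counts[user] += 1
        program_counts[program] += 1

    def keep_active(counts):
        ranked = sorted(counts, key=lambda k: (-counts[k], str(k)))  # most requests first
        n_drop = int(len(ranked) * drop_fraction)                    # last set percentage
        return set(ranked[:len(ranked) - n_drop])

    users, programs = keep_active(user_counts), keep_active(program_counts)

    # Total viewing duration per remaining (user, program) pair.
    watched = defaultdict(float)
    for user, program, seconds in records:
        if user in users and program in programs:
            watched[(user, program)] += seconds

    # Score = viewing duration / broadcast duration, then rounded to 0 or 1.
    matrix = {}
    for (user, program), seconds in watched.items():
        ratio = min(seconds / durations[program], 1.0)
        matrix[(user, program)] = 1 if ratio >= 0.5 else 0
    return matrix
```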

Preferably, step S31 includes:

judging whether multiple pieces of training viewing behavior data of the same user have the same start time, and if so, keeping the training viewing behavior data with the latest end time and deleting the rest;

judging whether multiple pieces of training viewing behavior data of the same user have the same end time, and if so, keeping the training viewing behavior data with the earliest start time and deleting the rest;

sorting the training viewing behavior data in descending order by user and start time;

and judging whether two consecutive pieces of training viewing behavior data of the same user overlap in recorded viewing time, and if so, deleting, from the overlapping training viewing behavior data, the piece that comes later in the sorted order.
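The four cleaning operations of step S31 can be sketched as follows. The tuple layout and function name are hypothetical, and the sketch assumes that each record is compared with the most recently kept record of the same user, which is one reading of "two consecutive pieces".

```python
def clean_viewing_records(records):
    """records: iterable of (user, start, end) with comparable start and end times."""
    # Same user, same start time: keep the record with the latest end time.
    by_start = {}
    for user, start, end in records:
        key = (user, start)
        if key not in by_start or end > by_start[key][2]:
            by_start[key] = (user, start, end)

    # Same user, same end time: keep the record with the earliest start time.
    by_end = {}
    for user, start, end in by_start.values():
        key = (user, end)
        if key not in by_end or start < by_end[key][1]:
            by_end[key] = (user, start, end)

    # Sort in descending order by user and start time.
    ordered = sorted(by_end.values(), key=lambda r: (r[0], r[1]), reverse=True)

    # Drop the later-ordered record of any overlapping pair of the same user.
    cleaned = []
    for rec in ordered:
        last = cleaned[-1] if cleaned else None
        if last and last[0] == rec[0] and rec[2] > last[1]:
            continue  # rec ends after the kept record starts, so the two overlap
        cleaned.append(rec)
    return cleaned
```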

In one embodiment of the present invention, step S4 includes:

step S41, cleaning the metadata of the program;

step S42, preprocessing the variables of the cleaned program metadata, including: judging the attribute of each variable, normalizing variables with a numerical attribute, and classifying variables with a character attribute.
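Step S42 can be illustrated with a short sketch. The text does not say which normalization or classification scheme is used, so the sketch assumes min-max normalization for numerical variables and one-hot indicators for character variables; the function name and row layout are hypothetical.

```python
def preprocess_metadata(rows):
    """rows: list of dicts mapping variable name -> value (number or string)."""
    out = [dict() for _ in rows]
    for col in rows[0].keys():
        values = [row[col] for row in rows]
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                      for v in values)
        if numeric:
            lo, hi = min(values), max(values)
            span = (hi - lo) or 1.0                      # avoid division by zero
            for o, v in zip(out, values):
                o[col] = (v - lo) / span                 # min-max normalization to [0, 1]
        else:
            categories = sorted(set(map(str, values)))
            for o, v in zip(out, values):
                for c in categories:
                    o[f"{col}={c}"] = 1 if str(v) == c else 0   # one-hot indicator
    return out
```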

Preferably, step S41 includes:

calculating the edit distance between the original on-demand program name and the crawled program name;

and judging whether the edit distance is greater than a set threshold, and deleting the metadata of any crawled program whose edit distance is greater than the set threshold.
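The name-matching check of step S41 can be sketched with the Levenshtein edit distance computed by dynamic programming. The text does not specify the edit-distance variant or the threshold value, so both the Levenshtein choice and the default `threshold=2` are assumptions, as are the function names.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def filter_crawled_metadata(pairs, threshold=2):
    """pairs: list of (original_name, crawled_name, metadata).
    Keeps only entries whose names are within the threshold."""
    return [(orig, crawled, meta) for orig, crawled, meta in pairs
            if edit_distance(orig, crawled) <= threshold]
```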

In an embodiment of the present invention, in step S5, the method for calculating the user similarity and the program similarity and generating the candidate set of programs to be recommended by applying the neighborhood recommendation model includes:

constructing a similarity model in which the mean-rating term represents the average rating of all users for program i,

and cosine_ij is the cosine similarity of program i and program j.

In one embodiment of the present invention, step S6 includes a weighted combination step, an accuracy calculation step or/and a recall calculation step, and a recommendation list generation step, wherein:

a weighted combination step, in which the plurality of program candidate sets to be recommended from the program candidate set obtaining part are combined according to a plurality of strategies (such as intersection, union, and weighting), or the different similarity calculation methods of the program candidate set obtaining part are weighted and combined by means of machine learning;
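The intersection, union, and weighting strategies can be sketched together for a single user. The text does not define how weights are applied to ranked candidate sets, so the rank-based vote used here, the function name, and the parameters `require_all` and `n` are assumptions made for illustration.

```python
from collections import defaultdict

def combine_candidates(candidate_sets, weights=None, require_all=False, n=3):
    """candidate_sets: list of ranked program-id lists (e.g. C1 to C4) for one user.
    require_all=True keeps only programs present in every set (intersection);
    otherwise every program in any set is eligible (union)."""
    sets = [set(c) for c in candidate_sets]
    if not sets:
        return []
    pool = set.intersection(*sets) if require_all else set.union(*sets)
    weights = weights or [1.0] * len(candidate_sets)
    score = defaultdict(float)
    for w, ranked in zip(weights, candidate_sets):
        for pos, program in enumerate(ranked):
            if program in pool:
                # Higher-ranked programs in a set contribute a larger share of its weight.
                score[program] += w * (len(ranked) - pos) / len(ranked)
    return sorted(score, key=lambda p: (-score[p], str(p)))[:n]
```

Several weight vectors can be tried in turn and each resulting list evaluated against the test set, in line with the accuracy or/and recall comparison of step S6.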

Preferably, step S6 further includes: for a user whose candidate set of programs to be recommended is empty, selecting popular programs and well-reviewed programs as the recommendation result, wherein popular programs are obtained by sorting the programs by on-demand duration from longest to shortest and taking a set number of top-ranked programs; and well-reviewed programs are obtained by calculating a comprehensive score for each program from its box office, rating, number of awards won, and number of plays, a program with a high comprehensive score being a well-reviewed program.

The above contents show various embodiments of the cable tv on-demand program recommendation method and system of the present invention, but the present invention is not limited thereto, for example:

considering the difference of different user scoring scales, constructing a second similarity model according to the following formula (7) by utilizing cosine similarity,

wherein cosine_advanced_ij is the cosine similarity of program i and program j under the second similarity model;

as another example, the predictive scores of different users for programs in the neighbor set of programs are determined according to equation (8) below, taking into account the impact of global user behavior

wherein b_ui is an offset term, b_ui = b + b_u + b_i, b represents the mean of all user scores, b_u represents the deviation of the average score of user u from the global score, and b_i represents the deviation of the average score of program i from the global score.
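The offset term b_ui = b + b_u + b_i can be computed directly from the definitions above. Equation (8) itself is not reproduced in this text, so the prediction function below assumes the commonly used baseline-adjusted neighborhood form, in which similarity-weighted deviations from the offset are added to b_ui; the function names and the `mask` argument marking observed scores are hypothetical.

```python
import numpy as np

def baseline_offsets(R, mask):
    """R: user-by-program scores; mask: boolean array marking observed entries.
    Returns b, b_u, b_i and the full matrix of offsets b_ui = b + b_u + b_i."""
    b = R[mask].mean()                                   # mean of all observed scores

    def deviation(axis):
        counts = mask.sum(axis=axis)
        sums = np.where(mask, R, 0.0).sum(axis=axis)
        means = np.divide(sums, counts,
                          out=np.full(counts.shape, b, dtype=float),
                          where=counts > 0)              # rows/columns with no data fall back to b
        return means - b

    b_u, b_i = deviation(1), deviation(0)
    return b, b_u, b_i, b + b_u[:, None] + b_i[None, :]

def predict_with_baseline(R, mask, sim, u, i):
    """Assumed form: b_ui plus the similarity-weighted average of (r_uj - b_uj)
    over the other programs j that user u has scored."""
    _, _, _, b_ui = baseline_offsets(R, mask)
    rated = np.flatnonzero(mask[u])
    rated = rated[rated != i]
    w = sim[i, rated]
    denom = np.abs(w).sum()
    if denom == 0:
        return float(b_ui[u, i])
    return float(b_ui[u, i] + w @ (R[u, rated] - b_ui[u, rated]) / denom)
```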

Through effective data processing and analysis, the cable television on-demand program recommendation method and system can estimate how interested a user is in programs the user has not yet viewed, improve program delivery efficiency, and achieve precise marketing and personalized service.

The above embodiments only illustrate the invention and are not to be construed as limiting it. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention; all equivalent technical solutions therefore also fall within the scope of the invention, which is defined by the claims.

Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims

1. A cable television on demand program recommendation system, comprising:

the system comprises a collecting part and a display part, wherein the collecting part comprises a first collecting unit and a second collecting unit, the first collecting unit collects the viewing behavior data of a cable television user, and the second collecting unit crawls the metadata of the online program;

the classification part takes one part of the audience behavior data acquired by the first acquisition unit as training audience behavior data to form a training set, and takes the other part of the audience behavior data as testing audience behavior data to form a testing set;

the data preprocessing part converts the training viewing behavior data of the users in the training set into the users' scores for the programs, wherein a score is the ratio of the duration for which the user viewed the program to the broadcast duration of the program, and the scores of each user for each program form a user-program scoring matrix; and standardizes the metadata acquired by the second acquisition unit;

the program candidate set obtaining part comprises a first analysis module, a second analysis module, a third analysis module and a fourth analysis module, wherein the first analysis module adopts a matrix decomposition method to decompose a user-program scoring matrix, and generates a first program candidate set C1 to be recommended according to the values of elements in a low-rank matrix; the second analysis module adopts a matrix decomposition method to decompose a scoring matrix of the user-program, calculates the user similarity and the program similarity, and generates a second program candidate set C2 to be recommended by applying a neighborhood recommendation model; the third analysis module calculates the user similarity and the program similarity according to the user-program scoring matrix, and generates a third program candidate set C3 to be recommended by applying a neighborhood recommendation model; the fourth analysis module calculates the user similarity and the program similarity according to the metadata, and generates a fourth program candidate set C4 to be recommended by applying a neighborhood recommendation model;

and a recommendation list generating part which performs weighted combination of the plurality of program candidate sets to be recommended from the program candidate set obtaining part according to a plurality of strategies, or performs weighted combination of the different similarity calculation methods of the program candidate set obtaining part by means of machine learning, judges the accuracy or/and recall rate of each weighted combination according to the test set separated by the classification part, and generates a recommendation list by taking the weighted combination with high accuracy or/and recall rate as the recommendation result.

2. The cable tv on-demand program recommendation system according to claim 1, wherein the data preprocessing section comprises:

the first data cleaning module is used for cleaning the training audience rating behavior data of the training set;

the second data cleaning module is used for cleaning the metadata collected by the second collection unit;

the conversion module is used for converting the cleaned training audience behavior data and the metadata, and comprises a screening unit which is used for screening users and programs and removing inactive users and cold programs; the audience rating behavior conversion unit is used for converting training audience rating behavior data of the user into scores of the user on the programs; a score transformation unit which transforms the score into an integer with a value of 0 or 1 according to a rounding method; the scoring matrix construction unit is used for forming a user-program scoring matrix by scoring each program by each user through the scoring transformation unit; a metadata processing unit that preprocesses variables of metadata of a program, the preprocessing including: judging the attribute of the variable, normalizing the variable of the numerical attribute, and classifying the variable of the character attribute.

3. The cable tv on-demand program recommendation system of claim 2, wherein the first data cleansing module comprises:

the first judging unit judges whether multiple pieces of training viewing behavior data of the same user have the same start time, and sends the training viewing behavior data of the same user with the same start time to the first screening unit;

the first screening unit keeps the training viewing behavior data with the latest end time and deletes the rest;

the second judging unit judges whether multiple pieces of training viewing behavior data of the same user have the same end time, and sends the training viewing behavior data of the same user with the same end time to the second screening unit;

the second screening unit keeps the training viewing behavior data with the earliest start time and deletes the rest;

the sorting unit sorts the training viewing behavior data in descending order by user and start time;

the third judging unit judges whether two consecutive pieces of training viewing behavior data of the same user, as ordered by the sorting unit, overlap in recorded viewing time, and sends the overlapping training viewing behavior data to the third screening unit;

and the third screening unit deletes, from the overlapping training viewing behavior data, the piece that comes later in the sorted order.

4. The cable tv on-demand program recommendation system of claim 2, wherein the second data cleansing module comprises:

the editing distance obtaining unit is used for calculating the editing distance between the original on-demand program name and the crawled program name;

the fourth judging unit is used for judging whether the editing distance is greater than a set threshold value or not, and sending a signal to the fourth screening unit when the editing distance is greater than the set threshold value;

and the fourth screening unit deletes the metadata of the crawled program with the editing distance larger than the set threshold value.

5. The cable tv on-demand program recommendation system according to claim 1, wherein the program candidate set obtaining section further comprises:

the similarity obtaining module calculates the user similarity and the program similarity, and comprises:

a similarity model in which the mean-rating term represents the average rating of all users for program i,

and cosine_ij is the cosine similarity of program i and program j.

6. The cable tv-on-demand program recommendation system according to claim 1, wherein the recommendation list generation section includes a weighted combination unit, an accuracy calculation unit or/and a recall ratio calculation unit, and a recommendation list generation unit, wherein:

7. A method for recommending cable TV on-demand programs is characterized by comprising the following steps:

step S3, converting the training viewing behavior data of the users in the training set into the users' scores for the programs, wherein a score is the ratio of the duration for which the user viewed the program to the broadcast duration of the program, and the scores of each user for each program form a user-program scoring matrix;

step S4, standardizing metadata of the program;

step S5, obtaining a plurality of program candidate sets using a plurality of analysis methods according to the user-program scoring matrix and the normalized metadata, the analysis methods including two or more of the following methods: decomposing a user-program scoring matrix by adopting a matrix decomposition method, and generating a first program candidate set C1 to be recommended according to the values of elements in the low-rank matrix; decomposing a scoring matrix of the user-program by adopting a matrix decomposition method, calculating user similarity and program similarity, and generating a second program candidate set C2 to be recommended by applying a neighborhood recommendation model; calculating user similarity and program similarity according to the user-program scoring matrix, and generating a third program candidate set C3 to be recommended by applying a neighborhood recommendation model; calculating user similarity and program similarity according to the metadata, and generating a fourth program candidate set C4 to be recommended by applying a neighborhood recommendation model;

and step S6, performing weighted combination on the program candidate sets to be recommended according to a plurality of strategies or performing weighted combination on different similarity calculation methods of different program candidate set acquisition parts by utilizing a machine learning concept, judging the accuracy or/and recall rate of various weighted combinations according to the test set, and generating a recommendation list by taking the weighted combination with high accuracy or/and recall rate as a recommendation result.

8. The cable tv on-demand program recommendation method according to claim 7, wherein the step S3 comprises:

cleaning training audience behavior data of a user;

screening users and programs to remove inactive users and cold programs;

converting training audience rating behavior data of the user into scores of the user on programs;

and converting the scores into integers with a value of 0 or 1 by rounding, to form the user-program scoring matrix from the scores of each user for each program.

9. The cable tv on-demand program recommendation method of claim 8, wherein the method of cleansing trained viewing behavior data of the user comprises:

10. The cable tv on-demand program recommendation method according to claim 7, wherein the step S4 comprises:

cleaning the metadata of the program;

preprocessing variables of metadata of the cleaned program, including: judging the attribute of the variable, normalizing the variable of the numerical attribute, and classifying the variable of the character attribute.

11. The cable tv on-demand program recommendation method of claim 10, wherein the method of cleansing metadata of a program comprises:

12. The method of claim 7, wherein in step S5 the method of calculating the user similarity and the program similarity and generating the candidate set of programs to be recommended by applying the neighborhood recommendation model includes:

constructing a similarity model in which the mean-rating term represents the average rating of all users for program i,

and cosine_ij is the cosine similarity of program i and program j.

13. The cable tv on-demand program recommendation method according to claim 7, wherein the step S6 comprises a weighted combination step, an accuracy calculation step or/and a recall ratio calculation step and a recommendation list generation step, wherein: