CN108650532B - Cable television on-demand program recommendation method and system - Google Patents


Info

Publication number
CN108650532B
Authority
CN
China
Prior art keywords
program
user
similarity
behavior data
programs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810241067.2A
Other languages
Chinese (zh)
Other versions
CN108650532A (en)
Inventor
王妍
柴剑平
李波
冯熙
殷复莲
江茜
檀雷雷
韩晶晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China
Priority to CN201810241067.2A
Publication of CN108650532A
Application granted
Publication of CN108650532B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866 Management of end-user data
    • H04N21/25891 Management of end-user data being end-user preferences
    • H04N21/251 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/252 Processing of multiple end-users' preferences to derive collaborative data
    • H04N21/262 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N21/26258 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists for generating a list of items to be played back in a given order, e.g. playlist, or scheduling item distribution according to such list
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4662 Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
    • H04N21/4665 Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms involving classification methods, e.g. Decision trees
    • H04N21/4667 Processing of monitored end-user data, e.g. trend analysis based on the log file of viewer selections
    • H04N21/4668 Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
    • H04N21/47 End-user applications
    • H04N21/482 End-user interface for program selection
    • H04N21/4826 End-user interface for program selection using recommendation lists, e.g. of programs or channels sorted out according to their score

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Computing Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention provides a method and a system for recommending cable television on-demand programs. The method comprises the following steps: collecting user viewing behavior data and program metadata; using one part of the viewing behavior data for training and another part for testing; converting the training viewing behavior data into the users' program scores to form a user-program scoring matrix; standardizing the program metadata; obtaining a plurality of program candidate sets by applying a plurality of analysis methods to the scoring matrix and the metadata; and performing weighted combination on the program candidate sets to be recommended, evaluating the accuracy and/or recall rate of each weighted combination on the test set, and generating a recommendation list by taking the weighted combination with high accuracy and/or recall rate as the recommendation result. The recommendation method and system provide personalized recommendation for users and improve recommendation precision and efficiency.

Description

Cable television on-demand program recommendation method and system
Technical Field
The invention relates to the technical field of cable television, and in particular to a method and a system for recommending cable television on-demand programs.
Background
Recommendation systems are often used to address the problem of information overload and to provide personalized services to users. Existing recommendation methods fall mainly into two categories, collaborative filtering methods and content-based recommendation methods, of which collaborative filtering is the most widely applied. Specifically, collaborative filtering methods are roughly classified into memory-based methods, represented by neighborhood recommendation based on user/item similarity, and model-based methods, represented by recommendation based on matrix decomposition.
In the big data era, user behavior data are growing massively, and the sparsity problem of recommendation systems is becoming increasingly prominent.
The sparsity problem means that the number of users and items in the system is very large while the overlap of behaviors among users is very small. Data sparsity is defined as the number of existing user actions on items as a percentage of all possible actions. Existing solutions to the sparsity problem include: diffusion methods that extend first-order correlation to second-order and higher-order correlation; default scoring methods; iterative optimization methods; transfer similarity methods; and the like.
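As a small illustration of the ratio defined above, the following sketch computes the observed user-item actions as a fraction of all possible user-item pairs. The function name `interaction_ratio` and the sample data are hypothetical and are not part of the original disclosure.

```python
def interaction_ratio(actions):
    """Return observed (user, item) actions as a fraction of all possible pairs.

    `actions` is a set of (user, item) tuples; the names are illustrative only.
    """
    users = {user for user, _ in actions}
    items = {item for _, item in actions}
    possible = len(users) * len(items)
    return len(actions) / possible if possible else 0.0


# Hypothetical data: 3 users, 3 programs, 4 observed actions out of 9 possible.
observed = {("u1", "p1"), ("u1", "p2"), ("u2", "p2"), ("u3", "p3")}
ratio = interaction_ratio(observed)
```

A low ratio means that most user-item pairs have no recorded behavior, which is the situation the sparsity problem describes.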
In addition, a single recommendation method often does not achieve the desired results.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a method and a system for recommending cable television on-demand programs that provide personalized recommendation for users and improve recommendation accuracy and efficiency.
According to one aspect of the present invention, there is provided a cable television on-demand program recommendation system, comprising:
an acquisition part comprising a first acquisition unit and a second acquisition unit, wherein the first acquisition unit collects viewing behavior data of cable television users, and the second acquisition unit crawls metadata of online programs;
a classification part that takes one part of the viewing behavior data acquired by the first acquisition unit as training viewing behavior data to form a training set, and takes the other part as test viewing behavior data to form a test set;
a data preprocessing part that converts the training viewing behavior data of the users in the training set into the users' scores for programs, wherein a score is the ratio of the duration for which the user viewed the program to the broadcast duration of the program, and the scores of each user for each program form a user-program scoring matrix; the data preprocessing part also standardizes the metadata acquired by the second acquisition unit;
a program candidate set obtaining part comprising a first analysis module, a second analysis module, a third analysis module and a fourth analysis module, wherein the first analysis module decomposes the user-program scoring matrix by a matrix decomposition method and generates a first program candidate set C1 to be recommended according to the values of the elements in the low-rank matrix; the second analysis module decomposes the user-program scoring matrix by a matrix decomposition method, calculates the user similarity and the movie program similarity, and generates a second program candidate set C2 to be recommended by applying a neighborhood recommendation model; the third analysis module calculates the user similarity and the program similarity according to the user-program scoring matrix, and generates a third program candidate set C3 to be recommended by applying a neighborhood recommendation model; and the fourth analysis module calculates the user similarity and the movie program similarity according to the movie metadata, and generates a fourth program candidate set C4 to be recommended by applying a neighborhood recommendation model;
and a recommendation list generating part that performs weighted combination on the plurality of program candidate sets to be recommended from the program candidate set obtaining part according to a plurality of strategies, or performs weighted combination on the different similarity calculation methods used by the program candidate set obtaining part using a machine learning approach, evaluates the accuracy and/or recall rate of each weighted combination on the test set separated by the classification part, and generates a recommendation list by taking the weighted combination with high accuracy and/or recall rate as the recommendation result.
In the cable television on-demand program recommendation system, the data preprocessing part comprises: a first data cleaning module that cleans the training viewing behavior data of the training set; a second data cleaning module that cleans the metadata collected by the second acquisition unit; and a conversion module that converts the cleaned training viewing behavior data and metadata, the conversion module comprising: a screening unit that screens users and programs and removes inactive users and cold programs; a viewing behavior conversion unit that converts the users' training viewing behavior data into the users' scores for programs; a score transformation unit that rounds each score to an integer value of 0 or 1; a scoring matrix construction unit that forms the user-program scoring matrix from the scores produced by the score transformation unit for each user and each program; and a metadata processing unit that preprocesses the variables of the program metadata, the preprocessing including: determining the attribute type of each variable, normalizing variables with numerical attributes, and categorizing variables with character attributes.
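The conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_score_matrix`, the activity thresholds, the rounding cut-off of 0.5, the capping of the ratio at 1, and keeping the highest score when a user has several records for one program are all hypothetical choices that the text does not specify.

```python
from collections import defaultdict


def build_score_matrix(records, min_user_records=1, min_program_records=1):
    """Build a 0/1 user-program score matrix from viewing records.

    records: iterable of (user, program, watched_seconds, program_seconds).
    The thresholds and the 0.5 rounding cut-off are illustrative assumptions.
    """
    records = list(records)
    user_counts = defaultdict(int)
    program_counts = defaultdict(int)
    for user, program, _, _ in records:
        user_counts[user] += 1
        program_counts[program] += 1

    matrix = defaultdict(dict)
    for user, program, watched, duration in records:
        # Screening: skip inactive users and cold programs.
        if user_counts[user] < min_user_records:
            continue
        if program_counts[program] < min_program_records:
            continue
        if duration <= 0:
            continue
        # Score = viewing duration / broadcast duration, capped at 1 (assumption).
        ratio = min(watched / duration, 1.0)
        # Round to 0 or 1, treating ratios of 0.5 and above as 1 (assumption).
        score = 1 if ratio >= 0.5 else 0
        # Keep the highest score if the pair occurs more than once (assumption).
        matrix[user][program] = max(score, matrix[user].get(program, 0))
    return {user: dict(programs) for user, programs in matrix.items()}


# Hypothetical records: (user, program, seconds watched, program length in seconds).
sample = [
    ("u1", "p1", 1500, 3000),
    ("u1", "p2", 300, 3000),
    ("u2", "p1", 2900, 3000),
    ("u3", "p3", 100, 3000),
]
full = build_score_matrix(sample)
filtered = build_score_matrix(sample, min_program_records=2)
```

With `min_program_records=2`, programs viewed only once are treated as cold and dropped, so only `p1` remains in `filtered`.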
In the cable television on-demand program recommendation system, the first data cleaning module comprises: a first judging unit that determines whether training viewing behavior records of the same user have the same start time, and sends the records of the same user with the same start time to a first screening unit; the first screening unit, which keeps the record with the latest end time and deletes the rest; a second judging unit that determines whether training viewing behavior records of the same user have the same end time, and sends the records of the same user with the same end time to a second screening unit; the second screening unit, which keeps the record with the earliest start time and deletes the rest; a sorting unit that sorts the training viewing behavior records in descending order by user and start time; a third judging unit that determines whether consecutive records of the same user, as arranged by the sorting unit, overlap in recorded viewing time, and sends the overlapping records to a third screening unit; and the third screening unit, which deletes, among the overlapping records, the record that comes later in the sorted order.
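The three cleaning rules can be sketched as below. The record layout `(user, start, end)` and the function name `clean_viewing_records` are hypothetical. The sort follows the text as translated (start time descending within each user), so the record "later in the sorted order" is the one with the earlier start time; whether the original intends ascending order instead is not clear from the translation.

```python
def clean_viewing_records(records):
    """Apply the three cleaning rules to (user, start, end) records.

    Times may be any comparable values, e.g. seconds since midnight.
    """
    # Rule 1: same user and same start time -> keep the latest end time.
    by_start = {}
    for user, start, end in records:
        key = (user, start)
        if key not in by_start or end > by_start[key]:
            by_start[key] = end

    # Rule 2: same user and same end time -> keep the earliest start time.
    by_end = {}
    for (user, start), end in by_start.items():
        key = (user, end)
        if key not in by_end or start < by_end[key]:
            by_end[key] = start

    # Sort by user, and within a user by start time in descending order.
    deduped = [(user, start, end) for (user, end), start in by_end.items()]
    deduped.sort(key=lambda r: r[1], reverse=True)
    deduped.sort(key=lambda r: r[0])  # stable sort keeps the start-time order

    # Rule 3: drop a record that overlaps the previously kept record
    # of the same user (the later one in the sorted order is deleted).
    kept = []
    for user, start, end in deduped:
        if kept and kept[-1][0] == user and end > kept[-1][1]:
            continue
        kept.append((user, start, end))
    return kept


# Hypothetical records with duplicate starts, duplicate ends and an overlap.
raw = [
    ("u1", 0, 10),
    ("u1", 0, 25),
    ("u1", 5, 25),
    ("u1", 20, 40),
    ("u1", 50, 60),
    ("u2", 0, 30),
]
cleaned = clean_viewing_records(raw)
```

In this example `(u1, 0, 25)` survives the first two rules but is removed by the third because it overlaps `(u1, 20, 40)`, which precedes it in the sorted order.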
In the cable television on-demand program recommendation system, the second data cleaning module comprises: an edit distance obtaining unit that calculates the edit distance between the original on-demand program name and the crawled program name; a fourth judging unit that determines whether the edit distance is greater than a set threshold and, if so, sends a signal to a fourth screening unit; and the fourth screening unit, which deletes the metadata of crawled programs whose edit distance is greater than the set threshold.
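A minimal sketch of this metadata cleaning step is given below, assuming the edit distance is the Levenshtein distance (insertions, deletions and substitutions each cost 1). The text does not name the variant, and the function names, sample titles and threshold value are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def filter_crawled_metadata(pairs, threshold):
    """Keep crawled metadata whose name is within `threshold` edits of the original.

    pairs: iterable of (original_name, crawled_name, metadata).
    """
    return [(orig, crawled, meta) for orig, crawled, meta in pairs
            if edit_distance(orig, crawled) <= threshold]


# Hypothetical name pairs; the third one is a mismatched crawl result.
pairs = [
    ("The Wandering Earth", "The Wandering Earth", {"duration": 125}),
    ("Hero", "Heroes", {"duration": 99}),
    ("Hero", "Zootopia", {"duration": 108}),
]
kept_pairs = filter_crawled_metadata(pairs, threshold=2)
```

Entries whose crawled name differs from the on-demand name by more than the threshold are dropped, since they probably describe a different program.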
In the cable television on-demand program recommendation system, the program candidate set obtaining part further comprises a similarity obtaining module that calculates the user similarity and the program similarity, the similarity obtaining module comprising:
a similarity model construction unit that constructs a similarity model according to a similarity calculation method, the methods including the Pearson correlation coefficient, cosine similarity, reciprocal squared distance similarity, and Jaccard similarity, wherein a first similarity model is constructed from the Pearson correlation coefficient according to the following formula (1),
Figure BDA0001605245040000031
wherein $\mathrm{pearson}_{ij}$ is the Pearson correlation coefficient of program i and program j; $U(i)$ denotes the set of users who have scored program i, $r_{ui}$ denotes the rating of program i by user u,
Figure BDA0001605245040000032
representing the average rating of all users for program i;
constructing a second similarity model according to the following formula (2) by using the cosine similarity,
Figure BDA0001605245040000033
wherein $\mathrm{cosine}_{ij}$ is the cosine similarity of program i and program j;
constructing a third similarity model according to the following formula (3) by using the Jaccard similarity,
Figure BDA0001605245040000034
wherein $\mathrm{jaccard}_{pq}$ is the Jaccard similarity of user p and user q, $|U(p) \cap U(q)|$ denotes the number of programs scored by both user p and user q, and $|U(p) \cup U(q)|$ denotes the number of programs scored by at least one of user p and user q;
a neighbor set determining unit that determines a neighbor set of each program according to the similarity between programs and the similarity between users by using the neighborhood recommendation model;
a neighbor score determining unit that determines the predicted scores of different users for the programs in the neighbor set of each program according to the following formula (4),
Figure BDA0001605245040000035
wherein
Figure BDA0001605245040000036
is the predicted score of user u for program i, $R(u)$ is the set of programs on which user u has behavior records, $S_k(i)$ is the set of the k programs most similar to program i, and $\mathrm{sim}(i, j)$ denotes the similarity between program i and program j;
and a program candidate set determining unit that ranks the programs by the user's predicted scores for the neighbor sets of the programs and selects a set number of programs, in order of predicted score, as the program candidate set of the user.
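The units above can be sketched as follows. The formula images are not reproduced in this text, so the sketch uses the common textbook forms of the Pearson, cosine and Jaccard similarities and an unnormalized neighborhood prediction, the sum of sim(i, j) multiplied by the user's score for j over the programs j in both S_k(i) and R(u). These forms are assumptions that match the symbol definitions above and may differ in detail from formulas (1) to (4). All function names and the sample data are hypothetical.

```python
import math


def cosine_similarity(ratings, i, j):
    """Cosine similarity of programs i and j; ratings is {program: {user: score}}."""
    ri, rj = ratings[i], ratings[j]
    dot = sum(ri[u] * rj[u] for u in ri.keys() & rj.keys())
    norm = (math.sqrt(sum(v * v for v in ri.values()))
            * math.sqrt(sum(v * v for v in rj.values())))
    return dot / norm if norm else 0.0


def pearson_similarity(ratings, i, j):
    """Pearson correlation over users who scored both programs (assumed form)."""
    ri, rj = ratings[i], ratings[j]
    common = ri.keys() & rj.keys()
    if not common:
        return 0.0
    mean_i = sum(ri.values()) / len(ri)  # average score of all users for i
    mean_j = sum(rj.values()) / len(rj)
    num = sum((ri[u] - mean_i) * (rj[u] - mean_j) for u in common)
    den = (math.sqrt(sum((ri[u] - mean_i) ** 2 for u in common))
           * math.sqrt(sum((rj[u] - mean_j) ** 2 for u in common)))
    return num / den if den else 0.0


def jaccard_similarity(set_p, set_q):
    """|P ∩ Q| / |P ∪ Q| for the program sets of two users."""
    union = set_p | set_q
    return len(set_p & set_q) / len(union) if union else 0.0


def predict_score(ratings, user, i, k, sim=cosine_similarity):
    """Sum of sim(i, j) * r_uj over the k nearest programs j that the user scored."""
    others = [j for j in ratings if j != i]
    top_k = sorted(others, key=lambda j: sim(ratings, i, j), reverse=True)[:k]
    return sum(sim(ratings, i, j) * ratings[j][user]
               for j in top_k if user in ratings[j])


def candidate_set(ratings, user, k, n):
    """Top-n programs the user has not scored, ranked by predicted score."""
    unseen = [i for i in ratings if user not in ratings[i]]
    ranked = sorted(unseen, key=lambda i: predict_score(ratings, user, i, k),
                    reverse=True)
    return ranked[:n]


# Hypothetical 0/1 scores, indexed by program and then by user.
ratings = {
    "p1": {"u1": 1, "u2": 1},
    "p2": {"u1": 1, "u2": 1, "u3": 1},
    "p3": {"u3": 1},
}
candidates_u3 = candidate_set(ratings, "u3", k=2, n=1)
```

Passing `sim=pearson_similarity` to `predict_score` swaps the similarity model without changing the neighborhood logic, which is the kind of substitution the weighted combination of similarity methods relies on.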
In the cable television on-demand program recommendation system, the recommendation list generating part comprises a weighted combination unit, an accuracy calculating unit and/or a recall rate calculating unit, and a recommendation list generation unit, wherein:
the weighted combination unit performs weighted combination on the plurality of program candidate sets to be recommended from the program candidate set obtaining part according to a plurality of strategies, or performs weighted combination on the different similarity calculation methods used by the program candidate set obtaining part using a machine learning approach;
the accuracy calculating unit calculates the accuracy of each weighted combination according to the following formula (5), based on the test viewing behavior data of the test set,
Figure BDA0001605245040000037
wherein Precision is the accuracy of a weighted combination, n denotes the number of users in the test set, hit(p) denotes the number of elements in the intersection of the recommended program list of user p and the list of programs actually requested by user p in the test set, and L denotes the length of the recommendation list;
the recall rate calculating unit calculates the recall rate of each weighted combination according to the following formula (6), based on the test viewing behavior data of the test set,
Figure BDA0001605245040000041
wherein Recall is the recall rate of a weighted combination, hit(p) denotes the number of elements in the intersection of the recommended program list of user p and the list of programs actually requested by user p in the test set, and test(p) denotes the number of programs actually requested by user p in the test set;
and the recommendation list generation unit generates a recommendation list by taking the weighted combination with high accuracy and/or recall rate as the recommendation result.
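The two metrics can be sketched as follows. Because the formula images are not reproduced here, the sketch assumes the usual top-L forms implied by the symbol definitions: Precision as the sum of hit(p) divided by n times L, and Recall as the sum of hit(p) divided by the sum of test(p). The function name and the sample data are hypothetical.

```python
def precision_recall(recommended, actual, list_length):
    """Top-L precision and recall over the users of a test set.

    recommended: {user: ordered list of recommended programs}
    actual:      {user: set of programs actually requested in the test set}
    """
    users = list(actual)
    hits = sum(
        len(set(recommended.get(p, [])[:list_length]) & set(actual[p]))
        for p in users
    )
    n = len(users)
    total_actual = sum(len(set(actual[p])) for p in users)
    precision = hits / (n * list_length) if n and list_length else 0.0
    recall = hits / total_actual if total_actual else 0.0
    return precision, recall


# Hypothetical recommendation lists of length L = 2 and test-set requests.
recommended = {"u1": ["p1", "p2"], "u2": ["p3", "p4"]}
actual = {"u1": {"p1", "p5"}, "u2": {"p4"}}
precision, recall = precision_recall(recommended, actual, list_length=2)
```

Here each user has one hit, so precision is 2 / (2 × 2) and recall is 2 / 3.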
According to another aspect of the present invention, there is provided a cable television on-demand program recommendation method, including:
step S1, collecting viewing behavior data of cable television users, and crawling metadata of online programs;
step S2, taking one part of the viewing behavior data as training viewing behavior data to form a training set, and taking the other part as test viewing behavior data to form a test set;
step S3, converting the training viewing behavior data of the users in the training set into the users' scores for programs, wherein a score is the ratio of the duration for which the user viewed the program to the broadcast duration of the program, and the scores of each user for each program form a user-program scoring matrix;
step S4, standardizing the metadata of the programs;
step S5, obtaining a plurality of program candidate sets by applying a plurality of analysis methods to the user-program scoring matrix and the standardized metadata, the analysis methods including two or more of the following: decomposing the user-program scoring matrix by a matrix decomposition method, and generating a first program candidate set C1 to be recommended according to the values of the elements in the low-rank matrix; decomposing the user-program scoring matrix by a matrix decomposition method, calculating the user similarity and the movie program similarity, and generating a second program candidate set C2 to be recommended by applying a neighborhood recommendation model; calculating the user similarity and the program similarity according to the user-program scoring matrix, and generating a third program candidate set C3 to be recommended by applying a neighborhood recommendation model; calculating the user similarity and the movie program similarity according to the movie metadata, and generating a fourth program candidate set C4 to be recommended by applying a neighborhood recommendation model;
and step S6, performing weighted combination on the program candidate sets to be recommended according to a plurality of strategies, or performing weighted combination on the different similarity calculation methods used to obtain the program candidate sets using a machine learning approach, evaluating the accuracy and/or recall rate of each weighted combination on the test set, and generating a recommendation list by taking the weighted combination with high accuracy and/or recall rate as the recommendation result.
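The first analysis method in step S5 can be illustrated with a small latent factor model. The text does not say which matrix decomposition method is used, so the stochastic gradient descent training below, the hyperparameters, and the function names `factorize` and `first_candidate_set` are all assumptions made for illustration only.

```python
import random


def factorize(scores, n_users, n_items, rank=2, epochs=200,
              lr=0.05, reg=0.02, seed=0):
    """Fit low-rank factors P (users) and Q (programs) to the observed scores.

    scores: {(user_index, program_index): score}. Hyperparameters are illustrative.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), r in scores.items():
            pred = sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            err = r - pred
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def first_candidate_set(scores, P, Q, user, n):
    """Rank programs the user has not scored by the reconstructed matrix entry."""
    seen = {i for (u, i) in scores if u == user}
    preds = {
        i: sum(pf * qf for pf, qf in zip(P[user], Q[i]))
        for i in range(len(Q)) if i not in seen
    }
    return sorted(preds, key=preds.get, reverse=True)[:n]


# Hypothetical 0/1 scores for 3 users and 4 programs (program 3 is unscored).
scores = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 1): 1, (2, 2): 0}
P, Q = factorize(scores, n_users=3, n_items=4)
c1_user0 = first_candidate_set(scores, P, Q, user=0, n=2)
```

The product of the user factors and program factors plays the role of the low-rank matrix whose element values determine candidate set C1.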
In the method for recommending cable television on-demand programs, step S3 includes: cleaning the training viewing behavior data of the users; screening users and programs to remove inactive users and cold programs; converting the users' training viewing behavior data into the users' scores for programs; and rounding the scores to integer values of 0 or 1 to form the user-program scoring matrix from the scores of each user for each program.
In the method for recommending cable television on-demand programs, the step of cleaning the training viewing behavior data of the users includes: determining whether training viewing behavior records of the same user have the same start time, and if so, keeping the record with the latest end time and deleting the rest; determining whether training viewing behavior records of the same user have the same end time, and if so, keeping the record with the earliest start time and deleting the rest; sorting the training viewing behavior records in descending order by user and start time; and determining whether consecutive records of the same user overlap in recorded viewing time, and if so, deleting, among the overlapping records, the record that comes later in the sorted order.
In the method for recommending cable television on-demand programs, step S4 includes: cleaning the metadata of the programs; and preprocessing the variables of the cleaned metadata, including: determining the attribute type of each variable, normalizing variables with numerical attributes, and categorizing variables with character attributes.
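The preprocessing in step S4 can be sketched as below, assuming min-max scaling for numerical variables and integer category codes for character variables. The text does not state which normalization or categorization is used, so both choices, the function name and the sample rows are hypothetical.

```python
def standardize_metadata(rows):
    """Normalize numeric variables to [0, 1] and encode text variables as codes.

    rows: list of dicts that share the same keys.
    """
    if not rows:
        return []
    out = [dict(row) for row in rows]
    for key in rows[0]:
        values = [row[key] for row in rows]
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if numeric:
            # Numerical attribute: min-max normalization (assumption).
            lo, hi = min(values), max(values)
            span = hi - lo
            for row in out:
                row[key] = (row[key] - lo) / span if span else 0.0
        else:
            # Character attribute: map each distinct value to a category code.
            codes = {v: idx for idx, v in enumerate(sorted(set(map(str, values))))}
            for row in out:
                row[key] = codes[str(row[key])]
    return out


# Hypothetical metadata rows with one numerical and one character variable.
rows = [
    {"duration": 90, "type": "drama"},
    {"duration": 120, "type": "comedy"},
    {"duration": 150, "type": "drama"},
]
standardized = standardize_metadata(rows)
```

After this step every variable is numeric, so the similarity measures used for the metadata-based candidate set can operate on the rows directly.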
In the method for recommending cable television on-demand programs, the step of cleaning the metadata of the programs includes: calculating the edit distance between the original on-demand program name and the crawled program name; and determining whether the edit distance is greater than a set threshold, and deleting the metadata of crawled programs whose edit distance is greater than the set threshold.
In step S5 of the method for recommending cable television on-demand programs, calculating the user similarity and the movie program similarity and generating a program candidate set to be recommended by applying a neighborhood recommendation model includes:
constructing a similarity model according to a similarity calculation method, the methods including the Pearson correlation coefficient, cosine similarity, reciprocal squared distance similarity, and Jaccard similarity, wherein a first similarity model is constructed from the Pearson correlation coefficient according to the following formula (1),
Figure BDA0001605245040000051
wherein $\mathrm{pearson}_{ij}$ is the Pearson correlation coefficient of program i and program j; $U(i)$ denotes the set of users who have scored program i, $r_{ui}$ denotes the rating of program i by user u,
Figure BDA0001605245040000052
representing the average rating of all users for program i;
constructing a second similarity model according to the following formula (2) by using the cosine similarity,
Figure BDA0001605245040000053
wherein $\mathrm{cosine}_{ij}$ is the cosine similarity of program i and program j;
constructing a third similarity model according to the following formula (3) by using the Jaccard similarity,
Figure BDA0001605245040000054
wherein $\mathrm{jaccard}_{pq}$ is the Jaccard similarity of user p and user q, $|U(p) \cap U(q)|$ denotes the number of programs scored by both user p and user q, and $|U(p) \cup U(q)|$ denotes the number of programs scored by at least one of user p and user q;
determining a neighbor set of each program according to the similarity between programs and the similarity between users by using the neighborhood recommendation model;
determining the predicted scores of different users for the programs in the neighbor set of each program according to the following formula (4),
Figure BDA0001605245040000055
wherein
Figure BDA0001605245040000061
is the predicted score of user u for program i, $R(u)$ is the set of programs on which user u has behavior records, $S_k(i)$ is the set of the k programs most similar to program i, and $\mathrm{sim}(i, j)$ denotes the similarity between program i and program j;
and ranking the programs by the user's predicted scores for the neighbor sets of the programs and selecting a set number of programs, in order of predicted score, as the program candidate set of the user.
In the method for recommending cable television on-demand programs, step S6 includes a weighted combination step, an accuracy calculation step and/or a recall rate calculation step, and a recommendation list generation step, wherein:
the weighted combination step performs weighted combination on the plurality of program candidate sets to be recommended according to a plurality of strategies, or performs weighted combination on the different similarity calculation methods used to obtain the program candidate sets using a machine learning approach;
the accuracy calculation step calculates the accuracy of each weighted combination according to the following formula (5), based on the test viewing behavior data of the test set,
Figure BDA0001605245040000062
wherein Precision is the accuracy of a weighted combination, n denotes the number of users in the test set, hit(p) denotes the number of elements in the intersection of the recommended program list of user p and the list of programs actually requested by user p in the test set, and L denotes the length of the recommendation list;
the recall rate calculation step calculates the recall rate of each weighted combination according to the following formula (6), based on the test viewing behavior data of the test set,
Figure BDA0001605245040000063
wherein Recall is the recall rate of a weighted combination, hit(p) denotes the number of elements in the intersection of the recommended program list of user p and the list of programs actually requested by user p in the test set, and test(p) denotes the number of programs actually requested by user p in the test set;
and the recommendation list generation step generates a recommendation list by taking the weighted combination with high accuracy and/or recall rate as the recommendation result.
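The weighted combination and selection in step S6 can be sketched for a single user as follows. The text does not specify the combination strategies, so the linear weighting of per-set scores, the use of precision alone as the selection criterion, and the names `combine_candidates` and `best_weighting` are assumptions for illustration.

```python
def combine_candidates(candidate_scores, weights, n):
    """Merge scored candidate sets with the given weights and return the top n.

    candidate_scores: list of {program: score}, one dict per candidate set.
    """
    combined = {}
    for scores, weight in zip(candidate_scores, weights):
        for program, score in scores.items():
            combined[program] = combined.get(program, 0.0) + weight * score
    # Sort by combined score (descending), breaking ties by program name.
    return sorted(combined, key=lambda p: (-combined[p], p))[:n]


def best_weighting(candidate_scores, weight_options, actual, n):
    """Return the weight vector whose top-n list has the highest precision."""
    def precision(weights):
        recs = combine_candidates(candidate_scores, weights, n)
        return len(set(recs) & actual) / n
    return max(weight_options, key=precision)


# Hypothetical scored candidate sets for one user and that user's test-set requests.
c1 = {"p1": 0.9, "p2": 0.4}
c3 = {"p2": 0.8, "p3": 0.7}
options = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
requested = {"p2", "p3"}

mixed = combine_candidates([c1, c3], (0.5, 0.5), n=2)
chosen = best_weighting([c1, c3], options, requested, n=2)
```

In practice the evaluation would average over all test users and could use recall as well, as the accuracy and recall rate calculation steps describe; the single-user version keeps the example short.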
The method and the system for recommending cable television on-demand programs compare and mix a plurality of matrix decomposition methods so as to ensure recommendation precision and efficiency, and generate a personalized movie recommendation list for the user. They can help network operators provide targeted services to users and improve the users' on-demand experience.
Drawings
Other objects and results of the present invention will become more apparent and more readily appreciated as the same becomes better understood by reference to the following description taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 is a block diagram of a cable TV on-demand program recommendation system according to the present invention;
fig. 2 is a flowchart of a method for recommending cable tv on-demand programs according to the present invention.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a block diagram of a cable tv on-demand program recommendation system according to the present invention, and as shown in fig. 1, the cable tv on-demand program recommendation system includes:
the acquisition part 1 comprises a first acquisition unit 11 and a second acquisition unit 12, wherein the first acquisition unit 11 acquires viewing behavior data of cable television users, the viewing behavior data comprising each user's viewing behavior toward television programs, such as viewing start time, viewing end time, viewing duration, rating, and evaluation, and the second acquisition unit 12 crawls metadata of online programs, the metadata comprising program name, director, lead actors, cast, country, era, region, genre, duration, rating, box office, and the like;
the classification part 2 is used for forming a training set by taking one part of the audience behavior data acquired by the first acquisition unit as training audience behavior data and forming a test set by taking the other part of the audience behavior data as test audience behavior data;
the data preprocessing part 3 converts the training viewing behavior data of the users in the training set into the users' scores for the programs, wherein a score is the ratio of the time the user spent viewing a program to the broadcast duration of that program, and the scores of each user for each program form a user-program scoring matrix; it also standardizes the metadata acquired by the second acquisition unit, for example by data normalization, that is, by mapping the metadata uniformly onto the [0, 1] interval, for example by min-max normalization (dispersion normalization), a linear transformation of the original metadata whose result falls within the [0, 1] interval;
the program candidate set obtaining part 4 is configured to obtain a plurality of program candidate sets by using a plurality of analysis methods according to the user-program scoring matrix and the standardized metadata, and includes a first analysis module 41, a second analysis module 42, a third analysis module 43, and a fourth analysis module 44, where the first analysis module 41 decomposes the user-program scoring matrix by using a matrix decomposition method, and generates a first program candidate set C1 to be recommended according to values of elements in the low-rank matrix; the second analysis module 42 adopts a matrix decomposition method to decompose the scoring matrix of the user-program, calculates the user similarity and the movie program similarity, and generates a second program candidate set C2 to be recommended by applying a neighborhood recommendation model; the third analysis module 43 calculates the user similarity and the program similarity according to the user-program scoring matrix, and generates a third program candidate set C3 to be recommended by using a neighborhood recommendation model; the fourth analysis module 44 calculates user similarity and movie program similarity according to the movie metadata, and generates a fourth program candidate set C4 to be recommended by using a neighborhood recommendation model;
and a recommendation list generating part 5 for performing weighted combination on the plurality of program candidate sets to be recommended of the program candidate set obtaining part 4 according to a plurality of strategies or performing weighted combination on different similarity calculation methods of different program candidate set obtaining parts by utilizing a machine learning concept, judging the accuracy or/and the recall ratio of each weighted combination according to the test set separated by the classification part, and generating a recommendation list by taking the weighted combination with high accuracy or/and recall ratio as a recommendation result, wherein the recommendation list comprises a user list, a program ordered list, a similar user list and a similar program list.
In an embodiment of the present invention, the data preprocessing part 3 removes invalid viewing records from the viewing behavior data, such as records with no viewing behavior and abnormal viewing records (e.g., extreme viewing behavior such as a set that is always on), matches the viewing behavior data against the metadata, and identifies and removes inconsistent information; for example, the viewing behavior data of the users are stored in a viewing library, the broadcast data of the programs are stored in a broadcast library, and information that is inconsistent among the broadcast library, the viewing library, or/and the program list is removed. The valid viewing behavior data are then converted into an appropriate form. Specifically, the data preprocessing part 3 includes:
the first data cleaning module 31 is used for cleaning the training audience behavior data of the training set;
the second data cleaning module 32 is used for cleaning the metadata data collected by the second collection unit;
the conversion module 33 is configured to convert the cleaned training audience behavior data and metadata, and includes: a screening unit 331 that screens users and programs to remove inactive users and cold programs; a viewing behavior conversion unit 332 that converts the training viewing behavior data of each user into the user's score for each program; a score conversion unit 333 that converts each score into an integer with a value of 0 or 1 by rounding; a scoring matrix construction unit 334 that forms a user-program scoring matrix from the scores produced by the score conversion unit for each user and each program; and a metadata processing unit 335 that preprocesses the variables of the program metadata, the preprocessing including: judging the attribute of each variable, normalizing variables with numerical attributes, and classifying variables with character attributes. Preferably, variables with character attributes are classified manually, for example by classifying movies into genres such as action, adventure, and science fiction; manual classification is adopted because the quality of the crawled data may be low.
The data preprocessing part 3 can improve the recommendation precision and efficiency by collecting and cleaning the program data requested by the user.
Preferably, the first data cleansing module 31 comprises:
the first judging unit 311 is configured to judge whether the starting times of the training audience behavior data of the same user are the same, and send the training audience behavior data with the same starting time of the same user to the first screening unit;
the first screening unit 312 keeps the training audience behavior data with the latest end time and deletes the rest;
a second judging unit 313, which judges whether the end times of the training audience behavior data of the same user are the same, and sends the training audience behavior data with the same end time of the same user to the second screening unit;
the second screening unit 314 keeps the training audience behavior data with the earliest start time and deletes the rest;
the sorting unit 315 sorts the training audience behavior data in descending order by user and start time;
a third judging unit 316, configured to judge whether two adjacent pieces of training audience behavior data of the same user, as arranged by the sorting unit, overlap in viewing record time, and send the overlapping training audience behavior data to a third screening unit;
the third filtering unit 317 deletes, from the overlapping training audience behavior data, the piece that comes later in the sorted order.
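The cleaning rules of the first data cleansing module can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Record` type, its field names, and the use of integer timestamps are assumptions, and "later in the sorted order" is read literally against the descending sort described above.

```python
from typing import NamedTuple

class Record(NamedTuple):
    user: str
    start: int  # viewing start time (hypothetical unit, e.g. seconds)
    end: int    # viewing end time

def clean_records(records):
    # Rule 1: same user and same start time -> keep the record with the latest end time.
    by_start = {}
    for r in records:
        key = (r.user, r.start)
        if key not in by_start or r.end > by_start[key].end:
            by_start[key] = r
    # Rule 2: same user and same end time -> keep the record with the earliest start time.
    by_end = {}
    for r in by_start.values():
        key = (r.user, r.end)
        if key not in by_end or r.start < by_end[key].start:
            by_end[key] = r
    # Rule 3: sort by user, then by start time in descending order.
    ordered = sorted(by_end.values(), key=lambda r: (r.user, -r.start))
    # Rule 4: when a record overlaps the previously kept record of the same user,
    # drop the one that comes later in the sorted order.
    kept = []
    for r in ordered:
        if kept and kept[-1].user == r.user and r.end > kept[-1].start:
            continue
        kept.append(r)
    return kept
```

Because the sort is descending, the record dropped in an overlapping pair is the one with the earlier start time; an ascending sort would drop the later-starting record instead.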
In addition, preferably, the second data cleansing module 32 includes:
an edit distance obtaining unit 321 that calculates an edit distance between the original on-demand program name and the crawled program name;
a fourth determining unit 322, configured to determine whether the edit distance is greater than a set threshold, and send a signal to a fourth screening unit when the edit distance is greater than the set threshold;
the fourth filtering unit 323 deletes metadata of the crawled program whose edit distance is greater than a set threshold.
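The edit-distance check can be sketched with the standard Levenshtein distance. The text does not say which edit distance variant is used, so Levenshtein is an assumption here, as are the function names and the tuple layout of `pairs`.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]

def filter_crawled(pairs, threshold):
    """pairs: iterable of (original_name, crawled_name, metadata).
    Keeps only entries whose names differ by at most `threshold` edits."""
    return [(o, c, m) for o, c, m in pairs if edit_distance(o, c) <= threshold]
```

A small threshold tolerates punctuation or spacing differences between the on-demand catalogue and the crawled source while rejecting metadata that was matched to a different title.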
In one embodiment of the present invention, the program candidate set obtaining section 4 further includes:
the similarity obtaining module 45 calculates the user similarity and the program similarity, and includes:
the similarity model constructing unit 451 constructs a similarity model according to a similarity calculation method including a pearson correlation coefficient, a cosine similarity, a reciprocal squared distance similarity, and a Jaccard similarity, wherein a first similarity model is constructed according to the following formula (1) using the pearson correlation coefficient,
Figure BDA0001605245040000091
wherein pearson_ij is the Pearson correlation coefficient of program i and program j; U(i) represents the set of users who have scored program i, and r_ui indicates the rating of program i by user u,
Figure BDA0001605245040000092
representing the average rating of all users for program i;
constructing a second similarity model according to the following formula (2) by using the cosine similarity,
Figure BDA0001605245040000093
wherein cosine_ij is the cosine similarity of program i and program j;
constructing a third similarity model according to the following formula (3) by using the Jaccard similarity,
Figure BDA0001605245040000094
wherein jaccard_pq is the Jaccard similarity of user p and user q, |U(p) ∩ U(q)| represents the number of programs scored by both user p and user q, and |U(p) ∪ U(q)| represents the number of programs scored by at least one of user p and user q;
a neighbor set determining unit 452 that determines a neighbor set of each program according to the similarity between programs and the similarity between users using a neighborhood recommendation model;
the neighborhood score determination unit 453 determines the prediction scores of different users for the programs in the neighborhood set of each program according to the following equation (4)
Figure BDA0001605245040000095
wherein
Figure BDA0001605245040000096
is the predicted score of user u for program i, R(u) is the set of programs for which user u has behavior records, S_k(i) is the set of the k programs most similar to program i, and sim(i, j) represents the similarity between program i and program j;
the program candidate set determining unit 454 ranks the programs in the neighbor sets by the user's predicted scores and selects a set number of programs, in descending order of predicted score, as the program candidate set of the user.
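The three similarity models can be sketched as below. Formulas (1) to (3) appear only as figures in this text, so the code uses common textbook forms that agree with the symbol definitions given above; the data layout (`item_ratings[i]` maps each user in U(i) to r_ui) and the choice to sum the Pearson terms over users who scored both programs are assumptions.

```python
from math import sqrt

def pearson(item_ratings, i, j):
    """Pearson correlation of programs i and j over users who scored both."""
    ri, rj = item_ratings[i], item_ratings[j]
    common = ri.keys() & rj.keys()
    if not common:
        return 0.0
    mean_i = sum(ri.values()) / len(ri)  # average rating of program i
    mean_j = sum(rj.values()) / len(rj)
    num = sum((ri[u] - mean_i) * (rj[u] - mean_j) for u in common)
    den = (sqrt(sum((ri[u] - mean_i) ** 2 for u in common))
           * sqrt(sum((rj[u] - mean_j) ** 2 for u in common)))
    return num / den if den else 0.0

def cosine(item_ratings, i, j):
    """Cosine similarity of the rating vectors of programs i and j."""
    ri, rj = item_ratings[i], item_ratings[j]
    num = sum(ri[u] * rj[u] for u in ri.keys() & rj.keys())
    den = (sqrt(sum(v * v for v in ri.values()))
           * sqrt(sum(v * v for v in rj.values())))
    return num / den if den else 0.0

def jaccard(programs_p, programs_q):
    """Jaccard similarity of two users from the sets of programs they scored."""
    union = programs_p | programs_q
    return len(programs_p & programs_q) / len(union) if union else 0.0
```

All three functions return 0.0 when there is no overlap or a zero denominator, which keeps sparse users and programs from raising errors during neighbor selection.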
In one embodiment of the present invention, the recommendation list generating section 5 includes a weighted combination unit 51, an accuracy calculating unit 52 or/and recall ratio calculating unit 53, and a recommendation list generating unit 54, in which:
a weighted combination unit 51, which performs weighted combination on a plurality of program candidate sets to be recommended of the program candidate set acquisition parts according to a plurality of strategies or performs weighted combination on different similarity calculation methods of different program candidate set acquisition parts by using a machine learning idea;
an accuracy calculating unit 52 for calculating the accuracy of each weighted combination according to the following equation (5) based on the test viewing behavior data of the test set,
Figure BDA0001605245040000097
wherein Precision is the accuracy of a weighted combination, n represents the number of users in the test set, hit(p) represents the number of elements in the intersection of the recommended program list of user p and the list of programs actually requested by user p in the test set, and L represents the length of the recommended list;
the recall ratio calculating unit 53 calculates the recall ratio of each weighted combination according to the following equation (6) based on the test viewing behavior data of the test set,
Figure BDA0001605245040000101
wherein Recall is the recall rate of a weighted combination, hit(p) represents the number of elements in the intersection of the recommended program list of user p and the list of programs actually requested by user p in the test set, and test(p) represents the number of programs actually requested by user p in the test set;
the recommendation list generation unit 54 generates a recommendation list by using a weighted combination with high accuracy and/or recall as a recommendation result.
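The accuracy and recall calculations can be sketched as follows. Formulas (5) and (6) appear only as figures, so the code assumes the usual top-L forms implied by the definitions above: Precision = Σ hit(p) / (n · L) and Recall = Σ hit(p) / Σ test(p). The function name and dictionary layout are hypothetical.

```python
def precision_recall(recommended, actual, list_length):
    """recommended: {user: ranked list of programs}
    actual: {user: programs actually requested in the test set}
    list_length: L, the length of each recommendation list."""
    users = list(actual)
    n = len(users)
    # hit(p): size of the intersection of the top-L list and the requested list
    hits = sum(
        len(set(recommended.get(p, [])[:list_length]) & set(actual[p]))
        for p in users
    )
    total_test = sum(len(set(actual[p])) for p in users)  # sum of test(p)
    precision = hits / (n * list_length) if n and list_length else 0.0
    recall = hits / total_test if total_test else 0.0
    return precision, recall
```

For example, with two users, L = 2, and three hits out of five requested programs, the sketch gives a precision of 3/4 and a recall of 3/5.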
Fig. 2 is a flowchart of a method for recommending a cable tv on-demand program according to the present invention, and as shown in fig. 2, the method for recommending a cable tv on-demand program includes:
step S1, collecting the viewing behavior data of cable TV users, and crawling the metadata of the online programs;
step S2, using one part of the viewing behavior data as training viewing behavior data to form a training set, and using the other part as testing viewing behavior data to form a testing set;
step S3, converting the training audience rating behavior data of the users in the training set into scores of the users for the programs, wherein the scores are the ratio of the audience rating time of the users for the programs to the broadcasting time of the programs, namely
Figure BDA0001605245040000102
The scores of each user for each program form a user-program score matrix;
step S4, standardizing metadata of the program;
step S5, obtaining a plurality of program candidate sets using a plurality of analysis methods according to the user-program scoring matrix and the normalized metadata, the analysis methods including two or more of the following methods: decomposing a user-program scoring matrix by adopting a matrix decomposition method, and generating a first program candidate set C1 to be recommended according to the values of elements in the low-rank matrix; decomposing a scoring matrix of the user-program by adopting a matrix decomposition method, calculating user similarity and movie program similarity (for example, calculating the user similarity or the movie program similarity by using a Pearson correlation coefficient, cosine similarity, reciprocal squared distance similarity, Jaccard similarity and the like), and generating a second program candidate set C2 to be recommended by using a neighborhood recommendation model; calculating user similarity and program similarity according to the user-program scoring matrix, and generating a third program candidate set C3 to be recommended by applying a neighborhood recommendation model; calculating user similarity and movie program similarity according to movie metadata, and generating a fourth program candidate set C4 to be recommended by applying a neighborhood recommendation model;
step S6, carrying out weighted combination on a plurality of program candidate sets to be recommended according to a plurality of strategies or carrying out weighted combination on different similarity calculation methods of different program candidate set acquisition parts by utilizing a machine learning idea, judging the accuracy or/and recall rate of various weighted combinations according to a test set, and generating a recommendation list by taking the weighted combination with high accuracy or/and recall rate as a recommendation result, wherein the recommendation list comprises a user, a program ordered list, a similar user list and a similar program list.
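The first analysis method in step S5 (decomposing the user-program scoring matrix and ranking programs by the entries of the low-rank reconstruction) can be sketched as below. The text does not name a specific matrix decomposition algorithm; this sketch assumes a regularized factorization trained by stochastic gradient descent, and the hyperparameters, function names, and integer user and program indices are all illustrative.

```python
import random

def factorize(observed, n_users, n_items, rank=2, epochs=500,
              lr=0.05, reg=0.01, seed=0):
    """Fit R ~ P Q^T on observed (user, program, score) triples only."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - sum(P[u][f] * Q[i][f] for f in range(rank))
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(observed, P, Q):
    """Root-mean-square error of the reconstruction on the observed entries."""
    sq = [(r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2
          for u, i, r in observed]
    return (sum(sq) / len(sq)) ** 0.5

def candidate_set_c1(P, Q, seen, u, n):
    """Top-n programs user u has not seen, ranked by reconstructed score."""
    scores = {i: sum(a * b for a, b in zip(P[u], Q[i]))
              for i in range(len(Q)) if i not in seen}
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]
```

The rows of `P` and `Q` can also serve as dense user and program vectors for the similarity calculation used by the second analysis method.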
In one embodiment of the present invention, step S3 includes:
step S31, cleaning the training audience behavior data of the user;
step S32, filtering the users and programs to remove inactive users and cold programs; for example, inactive users may be users who have requested few programs or/and for a short total duration, or the users may be ranked by number of on-demand requests and the last set percentage (e.g., 5%) taken as inactive users; cold programs may be programs that have been requested few times and for a short total duration, or the programs may be ranked by number of requests and the last set percentage (e.g., 5%) taken as cold programs;
step S33, converting the training audience behavior data of the user into the program rating of the user;
and step S34, converting the scores into integers with a value of 0 or 1 by rounding, the scores of each user for each program forming the user-program scoring matrix.
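Steps S33 and S34 can be sketched as follows. The input layout, the summing of repeated viewings of the same program, the cap at 1.0, and rounding half up (a ratio of exactly 0.5 becomes 1) are assumptions made for illustration; the text only specifies the ratio and the rounding to 0 or 1.

```python
def build_score_matrix(viewings, durations):
    """viewings: iterable of (user, program, seconds_watched)
    durations: {program: broadcast duration in seconds}
    Returns {user: {program: 0 or 1}}."""
    watched = {}
    for user, program, seconds in viewings:
        key = (user, program)
        watched[key] = watched.get(key, 0) + seconds  # assumption: sum repeat viewings
    matrix = {}
    for (user, program), seconds in watched.items():
        # score = viewing time / broadcast duration, capped at 1.0 (assumption)
        ratio = min(seconds / durations[program], 1.0)
        # explicit half-up threshold; Python's round() would send 0.5 to 0
        matrix.setdefault(user, {})[program] = 1 if ratio >= 0.5 else 0
    return matrix
```

An explicit threshold is used instead of the built-in `round`, which rounds halves to the nearest even integer and would therefore map a ratio of exactly 0.5 to 0.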
Preferably, step S31 includes:
judging whether the start times of the training viewing behavior data of the same user are the same, and if so, keeping the training viewing behavior data with the latest end time and deleting the rest;
judging whether the end times of the training viewing behavior data of the same user are the same, and if so, keeping the training viewing behavior data with the earliest start time and deleting the rest;
sorting the training viewing behavior data in descending order by user and start time;
and judging whether two adjacent pieces of training viewing behavior data of the same user overlap in viewing record time, and if so, deleting, from the overlapping training viewing behavior data, the piece that comes later in the sorted order.
In one embodiment of the present invention, step S4 includes:
step S41, cleaning the metadata of the program;
step S42, preprocessing the variables of the cleaned program metadata, including: judging the attribute of each variable, normalizing variables with numerical attributes, and classifying variables with character attributes.
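The preprocessing in step S42 can be sketched with min-max normalization for numerical variables and a hand-built lookup table for character variables. The lookup table `GENRE_MAP`, its entries, and the fallback label "other" are hypothetical; they stand in for the manual classification mentioned earlier.

```python
def min_max(values):
    """Linearly map numerical values onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical manually curated mapping from raw crawled labels to genres.
GENRE_MAP = {
    "action": "action",
    "adventure": "adventure",
    "sci-fi": "science fiction",
    "scifi": "science fiction",
}

def classify_genre(raw):
    """Map a crawled genre string to a curated category."""
    return GENRE_MAP.get(raw.strip().lower(), "other")
```

For example, program durations of 90, 120, and 150 minutes map to 0.0, 0.5, and 1.0 respectively.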
Preferably, step S41 includes:
calculating the edit distance between the original on-demand program name and the crawled program name;
and judging whether the editing distance is greater than a set threshold value or not, and deleting the metadata of the crawled program of which the editing distance is greater than the set threshold value.
In an embodiment of the present invention, in step S5, the method for calculating the user similarity and the movie program similarity and generating the candidate set of programs to be recommended by using the neighborhood recommendation model includes:
constructing a similarity model according to a similarity algorithm including a pearson correlation coefficient, a cosine similarity, a reciprocal squared distance similarity, and a Jaccard similarity, wherein a first similarity model is constructed according to the following formula (1) using the pearson correlation coefficient,
Figure BDA0001605245040000111
wherein pearson_ij is the Pearson correlation coefficient of program i and program j; U(i) represents the set of users who have scored program i, and r_ui indicates the rating of program i by user u,
Figure BDA0001605245040000112
representing the average rating of all users for program i;
constructing a second similarity model according to the following formula (2) by using the cosine similarity,
Figure BDA0001605245040000113
wherein cosine_ij is the cosine similarity of program i and program j;
constructing a third similarity model according to the following formula (3) by using the Jaccard similarity,
Figure BDA0001605245040000121
wherein jaccard_pq is the Jaccard similarity of user p and user q, |U(p) ∩ U(q)| represents the number of programs scored by both user p and user q, and |U(p) ∪ U(q)| represents the number of programs scored by at least one of user p and user q;
determining a neighbor set of each program according to the similarity between the programs and the similarity between the users by using a neighborhood recommendation model;
determining the prediction scores of different users for the programs in the neighbor set of each program according to the following formula (4)
Figure BDA0001605245040000122
wherein
Figure BDA0001605245040000123
is the predicted score of user u for program i, R(u) is the set of programs for which user u has behavior records, S_k(i) is the set of the k programs most similar to program i, and sim(i, j) represents the similarity between program i and program j;
and ranking the programs in the neighbor sets by the user's predicted scores and selecting a set number of programs, in descending order of predicted score, as the program candidate set of the user.
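The neighborhood prediction and candidate selection can be sketched as below. Formula (4) appears only as a figure, so the sketch assumes a similarity-weighted average over S_k(i) ∩ R(u), normalized by the sum of absolute similarities; the unnormalized weighted sum is an equally common variant. The function names and the `sim` callback are hypothetical.

```python
def top_k_neighbors(sim, programs, i, k):
    """S_k(i): the k programs most similar to program i."""
    others = [j for j in programs if j != i]
    return sorted(others, key=lambda j: sim(i, j), reverse=True)[:k]

def predict(user_scores, sim, programs, i, k):
    """Predicted score of one user for program i.
    user_scores: {program: score} over R(u)."""
    neighbors = [j for j in top_k_neighbors(sim, programs, i, k)
                 if j in user_scores]  # S_k(i) intersected with R(u)
    den = sum(abs(sim(i, j)) for j in neighbors)
    if den == 0:
        return 0.0
    return sum(sim(i, j) * user_scores[j] for j in neighbors) / den

def candidate_set(user_scores, sim, programs, k, n):
    """Top-n unseen programs in descending order of predicted score."""
    unseen = [i for i in programs if i not in user_scores]
    scored = [(predict(user_scores, sim, programs, i, k), i) for i in unseen]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:n]]
```

Any of the similarity models from formulas (1) to (3) can be passed in as `sim`, which is how the different candidate sets differ while sharing the same neighborhood model.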
In one embodiment of the present invention, step S6 includes a weighted combination step, an accuracy calculation step or/and a recall calculation step, and a recommendation list generation step, wherein:
a weighted combination step, wherein a plurality of program candidate sets to be recommended of the program candidate set acquisition part are weighted and combined according to a plurality of strategies (such as intersection, union set, weighting and other strategies) or different similarity calculation methods of different program candidate set acquisition parts are weighted and combined by utilizing a machine learning idea;
an accuracy calculation step of calculating the accuracy of each weighted combination according to the test viewing behavior data of the test set and the following formula (5),
Figure BDA0001605245040000124
wherein Precision is the accuracy of a weighted combination, n represents the number of users in the test set, hit(p) represents the number of elements in the intersection of the recommended program list of user p and the list of programs actually requested by user p in the test set, and L represents the length of the recommended list;
a recall ratio calculation step of calculating recall ratios of various weighted combinations according to the test audience behavior data of the test set and the following formula (6),
Figure BDA0001605245040000125
wherein Recall is the recall rate of a weighted combination, hit(p) represents the number of elements in the intersection of the recommended program list of user p and the list of programs actually requested by user p in the test set, and test(p) represents the number of programs actually requested by user p in the test set;
and a recommendation list generation step of generating a recommendation list by using a weighted combination with high accuracy or/and recall as a recommendation result.
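One way to read the weighted combination and selection steps is sketched below. The rank-based voting scheme, the grid of candidate weights, and the use of accuracy alone as the selection criterion are assumptions; the text leaves the exact combination strategy open (intersection, union, weighting, and so on).

```python
def weighted_merge(candidate_lists, weights, n):
    """Merge ranked candidate lists (e.g. C1..C4) by weighted rank votes."""
    scores = {}
    for ranked, w in zip(candidate_lists, weights):
        for rank, program in enumerate(ranked):
            # earlier positions in a list contribute larger votes
            scores[program] = scores.get(program, 0.0) + w * (len(ranked) - rank)
    return sorted(scores, key=lambda p: (-scores[p], p))[:n]

def select_weights(per_user_lists, actual, weight_grid, n):
    """Pick the weight vector whose merged lists score the highest
    accuracy on the test set.
    per_user_lists: {user: [ranked list per candidate set]}
    actual: {user: programs actually requested in the test set}"""
    def precision(weights):
        hits = sum(
            len(set(weighted_merge(lists, weights, n)) & set(actual[u]))
            for u, lists in per_user_lists.items()
        )
        return hits / (len(per_user_lists) * n)
    return max(weight_grid, key=precision)
```

Setting a weight to zero still lets that list's programs appear with a score of 0, so a stricter strategy such as intersection would need an explicit filter rather than a weight.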
Preferably, step S6 further includes: for a user whose candidate set of movies to be recommended is empty, selecting popular programs and well-evaluated programs as the recommendation result, wherein popular programs are obtained by sorting the programs from longest to shortest on-demand duration and taking a set number of programs at the front of the ordering; program evaluation means that a comprehensive score of each program is calculated from its box office, rating, number of awards won, and number of plays, and a program with a high comprehensive score is a well-evaluated program.
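The fallback for users with an empty candidate set can be sketched as follows. The text does not say how the comprehensive score combines box office, rating, awards, and plays, so the weighted sum over pre-normalized features, the weights, and the way the two lists are merged are all assumptions.

```python
def popular_programs(total_duration, n):
    """Top-n programs by total on-demand duration, longest first."""
    return sorted(total_duration, key=lambda p: (-total_duration[p], p))[:n]

def evaluation_score(features, weights):
    """Comprehensive score as a weighted sum of features already scaled to [0, 1]."""
    return sum(weights[name] * features[name] for name in weights)

def fallback_recommendations(total_duration, features, weights, n):
    """Popular programs first, then well-evaluated ones, without duplicates."""
    popular = popular_programs(total_duration, n)
    best = sorted(
        features,
        key=lambda p: (-evaluation_score(features[p], weights), p),
    )[:n]
    merged = []
    for program in popular + best:
        if program not in merged:
            merged.append(program)
    return merged
```

Scaling the four features to [0, 1] before weighting keeps box office figures from dominating ratings, which are on a much smaller numeric scale.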
The above contents show various embodiments of the cable tv on-demand program recommendation method and system of the present invention, but the present invention is not limited thereto, for example:
considering the difference of different user scoring scales, constructing a second similarity model according to the following formula (7) by utilizing cosine similarity,
Figure BDA0001605245040000131
wherein cosine_advanced_ij is the cosine similarity of program i and program j;
as another example, the predictive scores of different users for programs in the neighbor set of programs are determined according to equation (8) below, taking into account the impact of global user behavior
Figure BDA0001605245040000132
wherein b_ui is an offset term, b_ui = b + b_u + b_i, b represents the mean of all user scores, b_u represents the deviation of the average score of user u from the global mean score, and b_i represents the deviation of the average score of program i from the global mean score.
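Formulas (7) and (8) appear only as figures in this text. For orientation, the following LaTeX gives one standard form that is consistent with the symbol definitions above: an adjusted cosine similarity that centers each rating on the user's own mean, and a neighborhood prediction with the offset term b_ui. The symbol \bar{r}_u (the mean score of user u) is introduced here as an assumption, and the exact forms in the original figures may differ.

```latex
% Assumed form of formula (7): adjusted cosine similarity
\mathrm{cosine\_advanced}_{ij} =
  \frac{\sum_{u \in U(i) \cap U(j)} (r_{ui} - \bar{r}_u)(r_{uj} - \bar{r}_u)}
       {\sqrt{\sum_{u \in U(i) \cap U(j)} (r_{ui} - \bar{r}_u)^2}\,
        \sqrt{\sum_{u \in U(i) \cap U(j)} (r_{uj} - \bar{r}_u)^2}}

% Assumed form of formula (8): neighborhood prediction with offset terms
\hat{r}_{ui} = b_{ui} +
  \frac{\sum_{j \in S_k(i) \cap R(u)} \mathrm{sim}(i,j)\,(r_{uj} - b_{uj})}
       {\sum_{j \in S_k(i) \cap R(u)} \mathrm{sim}(i,j)},
\qquad b_{ui} = b + b_u + b_i
```

Subtracting b_uj removes the part of each neighbor's score that is explained by global, user, and program averages, so the similarity weights act only on the residual preference.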
The method and the system for recommending the cable television on-demand programs can judge the degree of the user interested in the unviewed programs through effective data processing and analysis, improve the program delivery efficiency and achieve the aims of precise marketing and personalized service.
The above embodiments are only for illustrating the invention and are not to be construed as limiting the invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention, therefore, all equivalent technical solutions also belong to the scope of the invention, and the scope of the invention is defined by the claims.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (13)

1. A cable television on demand program recommendation system, comprising:
a collecting part comprising a first collecting unit and a second collecting unit, wherein the first collecting unit collects viewing behavior data of cable television users, and the second collecting unit crawls metadata of online programs;
the classification part takes one part of the audience behavior data acquired by the first acquisition unit as training audience behavior data to form a training set, and takes the other part of the audience behavior data as testing audience behavior data to form a testing set;
the data preprocessing part is used for converting the training audience rating behavior data of the users in the training set into scores of the users for the programs, wherein the scores are the ratio of the audience rating time of the users for the programs to the broadcasting time of the programs, and the scores of each user for each program form a user-program scoring matrix; standardizing the metadata acquired by the second acquisition unit;
the program candidate set obtaining part comprises a first analysis module, a second analysis module, a third analysis module and a fourth analysis module, wherein the first analysis module adopts a matrix decomposition method to decompose a user-program scoring matrix, and generates a first program candidate set C1 to be recommended according to the values of elements in a low-rank matrix; the second analysis module adopts a matrix decomposition method to decompose a scoring matrix of the user-program, calculates the user similarity and the program similarity, and generates a second program candidate set C2 to be recommended by applying a neighborhood recommendation model; the third analysis module calculates the user similarity and the program similarity according to the user-program scoring matrix, and generates a third program candidate set C3 to be recommended by applying a neighborhood recommendation model; the fourth analysis module calculates the user similarity and the program similarity according to the metadata, and generates a fourth program candidate set C4 to be recommended by applying a neighborhood recommendation model;
and a recommendation list generating unit which performs weighted combination on a plurality of program candidate sets to be recommended of the program candidate set obtaining unit according to a plurality of strategies or performs weighted combination on different similarity calculation methods of different program candidate set obtaining units by utilizing a machine learning concept, judges the accuracy or/and recall rate of each weighted combination according to the test set separated by the classification unit, and generates a recommendation list by taking the weighted combination with high accuracy or/and recall rate as a recommendation result.
2. The cable tv on-demand program recommendation system according to claim 1, wherein the data preprocessing section comprises:
the first data cleaning module is used for cleaning the training audience rating behavior data of the training set;
the second data cleaning module is used for cleaning the metadata data collected by the second collection unit;
the conversion module is used for converting the cleaned training audience behavior data and the metadata, and comprises a screening unit which is used for screening users and programs and removing inactive users and cold programs; the audience rating behavior conversion unit is used for converting training audience rating behavior data of the user into scores of the user on the programs; a score transformation unit which transforms the score into an integer with a value of 0 or 1 according to a rounding method; the scoring matrix construction unit is used for forming a user-program scoring matrix by scoring each program by each user through the scoring transformation unit; a metadata processing unit that preprocesses variables of metadata of a program, the preprocessing including: judging the attribute of the variable, normalizing the variable of the numerical attribute, and classifying the variable of the character attribute.
3. The cable tv on-demand program recommendation system of claim 2, wherein the first data cleansing module comprises:
the first judging unit is used for judging whether the initial time of the training audience rating behavior data of the same user is the same or not and sending the training audience rating behavior data with the same initial time of the same user to the first screening unit;
the first screening unit keeps the training audience rating behavior data with the latest end time and deletes the rest;
the second judging unit judges whether the end time of the training audience behavior data of the same user is the same or not, and sends the training audience behavior data with the same end time of the same user to the second screening unit;
the second screening unit keeps the training audience rating behavior data with the earliest start time and deletes the rest;
the sequencing unit is used for sequencing the audience rating behavior data of the training users in a descending order according to the users and the starting time;
the third judging unit judges whether the front and back training audience rating behavior data of the same user arranged by the sorting unit are overlapped in audience rating recording time, and sends the overlapped training audience rating behavior data to the third screening unit;
and the third screening unit deletes the training audience behavior data with the later sequence in the overlapped training audience behavior data.
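The three cleansing rules of this claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `View` record type, the integer timestamps, and the choice to compare each record with the last record that was kept are all hypothetical.

```python
from typing import NamedTuple

class View(NamedTuple):
    # Hypothetical record layout: one viewing record per row.
    user: str
    start: int  # start time, e.g. seconds since midnight
    end: int    # end time

def clean_views(views):
    # Rule 1: same user and same start time -> keep the latest end time.
    by_start = {}
    for v in views:
        key = (v.user, v.start)
        if key not in by_start or v.end > by_start[key].end:
            by_start[key] = v
    # Rule 2: same user and same end time -> keep the earliest start time.
    by_end = {}
    for v in by_start.values():
        key = (v.user, v.end)
        if key not in by_end or v.start < by_end[key].start:
            by_end[key] = v
    # Rule 3: sort in descending order by user and start time, then drop
    # the later-ordered record of any overlapping pair.
    ordered = sorted(by_end.values(), key=lambda v: (v.user, v.start), reverse=True)
    kept = []
    for v in ordered:
        prev = kept[-1] if kept else None
        # In descending order, prev starts after v; they overlap if v ends after prev starts.
        if prev is not None and prev.user == v.user and v.end > prev.start:
            continue
        kept.append(v)
    return kept

views = [View("a", 0, 10), View("a", 0, 20), View("a", 5, 20),
         View("a", 15, 30), View("a", 40, 50), View("b", 0, 5)]
cleaned = clean_views(views)
```

Because the order is descending, the record dropped from an overlapping pair is the one with the earlier start time.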
4. The cable tv on-demand program recommendation system of claim 2, wherein the second data cleansing module comprises:
the edit distance obtaining unit is used for calculating the edit distance between the original on-demand program name and the crawled program name;
the fourth judging unit is used for judging whether the edit distance is greater than a set threshold value, and sending a signal to the fourth screening unit when the edit distance is greater than the set threshold value;
and the fourth screening unit deletes the metadata of the crawled program whose edit distance is greater than the set threshold value.
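A minimal Python sketch of this metadata cleansing step is given below. It assumes that the edit distance is the Levenshtein distance, which the claim does not state explicitly, and the function names and the tuple layout are hypothetical.

```python
def edit_distance(a, b):
    # Levenshtein distance computed row by row with dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def filter_crawled(pairs, threshold):
    # pairs: (original_name, crawled_name, metadata) tuples; keep only
    # entries whose names are within the threshold.
    return [(o, c, m) for o, c, m in pairs if edit_distance(o, c) <= threshold]

pairs = [("News Hour", "News Hour HD", {"genre": "news"}),
         ("News Hour", "Cooking Show", {"genre": "food"})]
kept = filter_crawled(pairs, threshold=3)
```

The threshold value of 3 is chosen only for the example; the claim leaves the threshold open.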
5. The cable tv on-demand program recommendation system according to claim 1, wherein the program candidate set obtaining section further comprises:
the similarity obtaining module calculates the user similarity and the program similarity, and comprises the following steps:
a similarity model construction unit that constructs a similarity model according to a similarity calculation method including a Pearson correlation coefficient, a cosine similarity, a reciprocal squared distance similarity, and a Jaccard similarity, wherein a first similarity model is constructed according to the following formula (1) using the Pearson correlation coefficient,
$$\mathrm{pearson}_{ij} = \frac{\sum_{u \in U(i) \cap U(j)} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u \in U(i) \cap U(j)} (r_{ui} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U(i) \cap U(j)} (r_{uj} - \bar{r}_j)^2}} \quad (1)$$
wherein $\mathrm{pearson}_{ij}$ is the Pearson correlation coefficient of program i and program j; U(i) represents the set of users who scored program i; $r_{ui}$ indicates the score of user u for program i; and $\bar{r}_i$ represents the average score of all users for program i;
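constructing a second similarity model according to the following formula (2) by using the cosine similarity,
$$\mathrm{cosine}_{ij} = \frac{\sum_{u \in U(i) \cap U(j)} r_{ui}\, r_{uj}}{\sqrt{\sum_{u \in U(i)} r_{ui}^2}\,\sqrt{\sum_{u \in U(j)} r_{uj}^2}} \quad (2)$$
wherein $\mathrm{cosine}_{ij}$ is the cosine similarity of program i and program j;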
constructing a third similarity model according to the following formula (3) by using the Jaccard similarity,
$$\mathrm{jaccard}_{pq} = \frac{|U(p) \cap U(q)|}{|U(p) \cup U(q)|} \quad (3)$$
wherein $\mathrm{jaccard}_{pq}$ is the Jaccard similarity of user p and user q, |U(p) ∩ U(q)| represents the number of programs scored by both user p and user q, and |U(p) ∪ U(q)| represents the number of programs scored by at least one of user p and user q;
the neighbor set determining unit determines a neighbor set of each program according to the similarity between the programs and the similarity between the users by using a neighborhood recommendation model;
a neighbor score determining unit for determining the prediction scores of the programs in the neighbor set of each program for different users according to the following formula (4)
Figure FDA0002418969380000031
wherein $\hat{r}_{ui}$ is the predicted score of user u for program i, R(u) is the set of programs on which user u has behavior records, $S_k(i)$ is the set of k programs most similar to program i, and sim(i, j) represents the similarity between program i and program j;
and the program candidate set determining unit ranks the programs by the user's prediction scores for the programs in the neighbor sets and selects a set number of programs, in order of prediction score, as the program candidate set of the user.
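The three similarity measures named in formulas (1) to (3) can be sketched in Python as below. This is an illustration under assumptions: scores are held in plain dictionaries, the sums run over the users who scored both programs, and the function names are hypothetical rather than taken from the patent.

```python
import math

def pearson(ri, rj):
    # ri, rj: dict mapping user -> score for programs i and j.
    common = ri.keys() & rj.keys()
    if not common:
        return 0.0
    mean_i = sum(ri.values()) / len(ri)  # average score of program i
    mean_j = sum(rj.values()) / len(rj)
    num = sum((ri[u] - mean_i) * (rj[u] - mean_j) for u in common)
    den = (math.sqrt(sum((ri[u] - mean_i) ** 2 for u in common)) *
           math.sqrt(sum((rj[u] - mean_j) ** 2 for u in common)))
    return num / den if den else 0.0

def cosine(ri, rj):
    # Unscored entries are treated as zero, so only common users contribute to the numerator.
    common = ri.keys() & rj.keys()
    num = sum(ri[u] * rj[u] for u in common)
    den = (math.sqrt(sum(v * v for v in ri.values())) *
           math.sqrt(sum(v * v for v in rj.values())))
    return num / den if den else 0.0

def jaccard(programs_p, programs_q):
    # programs_p, programs_q: programs scored by user p and by user q.
    p, q = set(programs_p), set(programs_q)
    union = p | q
    return len(p & q) / len(union) if union else 0.0

ri = {"u1": 1.0, "u2": 0.0, "u3": 1.0}
rj = {"u1": 1.0, "u2": 0.0}
```

Each function returns 0.0 when the denominator vanishes, which is a convention of this sketch rather than something the claim specifies.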
6. The cable tv-on-demand program recommendation system according to claim 1, wherein the recommendation list generation section includes a weighted combination unit, an accuracy calculation unit or/and a recall ratio calculation unit, and a recommendation list generation unit, wherein:
the weighted combination unit is used for carrying out weighted combination on a plurality of program candidate sets to be recommended of the program candidate set acquisition part according to a plurality of strategies or carrying out weighted combination on different similarity calculation methods of different program candidate set acquisition parts by utilizing a machine learning idea;
an accuracy calculating unit for calculating the accuracy of each weighted combination according to the following formula (5) based on the test viewing behavior data of the test set,
$$\mathrm{Precision} = \frac{\sum_{p} \mathrm{hit}(p)}{n \times L} \quad (5)$$
wherein Precision is the accuracy of a weighted combination, n represents the number of users on the test set, hit(p) represents the number of elements in the intersection of the recommended program list of user p and the program list actually requested by user p on the test set, and L represents the length of the recommended list;
a recall ratio calculating unit for calculating the recall rate of each weighted combination according to the following formula (6) based on the test viewing behavior data of the test set,
$$\mathrm{Recall} = \frac{\sum_{p} \mathrm{hit}(p)}{\sum_{p} \mathrm{test}(p)} \quad (6)$$
wherein Recall is the recall rate of a weighted combination, hit(p) represents the number of elements in the intersection of the recommended program list of user p and the program list actually requested by user p on the test set, and test(p) represents the number of programs actually requested by user p on the test set;
and a recommendation list generation unit which generates a recommendation list by using a weighted combination with high accuracy or/and high recall rate as a recommendation result.
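The accuracy and recall measures defined through hit(p), test(p), n, and L can be computed as in the following Python sketch. The dictionary-based inputs and the function name are assumptions made for illustration.

```python
def precision_recall(recommended, requested, list_len):
    # recommended: user -> ordered list of recommended programs
    # requested:   user -> programs actually requested on the test set
    hits = 0
    total_requested = 0
    for p, wanted in requested.items():
        top = recommended.get(p, [])[:list_len]
        hits += len(set(top) & set(wanted))   # hit(p)
        total_requested += len(set(wanted))   # test(p)
    n = len(requested)                        # number of test users
    precision = hits / (n * list_len) if n and list_len else 0.0
    recall = hits / total_requested if total_requested else 0.0
    return precision, recall

recommended = {"p1": ["a", "b"], "p2": ["c", "d"]}
requested = {"p1": {"a", "x"}, "p2": {"c", "d", "e"}}
result = precision_recall(recommended, requested, list_len=2)
```

In this example three of the four recommended programs were actually requested, and three of the five requested programs were recommended.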
7. A method for recommending cable TV on-demand programs is characterized by comprising the following steps:
step S1, collecting the viewing behavior data of cable TV users, and crawling the metadata of the online programs;
step S2, using one part of the viewing behavior data as training viewing behavior data to form a training set, and using the other part as testing viewing behavior data to form a testing set;
step S3, converting the training audience rating behavior data of the users in the training set into scores of the users for the programs, wherein the scores are the ratio of the audience rating duration of the users for the programs to the broadcasting duration of the programs, and the scores of each user for each program form a user-program score matrix;
step S4, standardizing metadata of the program;
step S5, obtaining a plurality of program candidate sets using a plurality of analysis methods according to the user-program scoring matrix and the normalized metadata, the analysis methods including two or more of the following methods: decomposing a user-program scoring matrix by adopting a matrix decomposition method, and generating a first program candidate set C1 to be recommended according to the values of elements in the low-rank matrix; decomposing a scoring matrix of the user-program by adopting a matrix decomposition method, calculating user similarity and program similarity, and generating a second program candidate set C2 to be recommended by applying a neighborhood recommendation model; calculating user similarity and program similarity according to the user-program scoring matrix, and generating a third program candidate set C3 to be recommended by applying a neighborhood recommendation model; calculating user similarity and program similarity according to the metadata, and generating a fourth program candidate set C4 to be recommended by applying a neighborhood recommendation model;
and step S6, performing weighted combination on the program candidate sets to be recommended according to a plurality of strategies or performing weighted combination on different similarity calculation methods of different program candidate set acquisition parts by utilizing a machine learning concept, judging the accuracy or/and recall rate of various weighted combinations according to the test set, and generating a recommendation list by taking the weighted combination with high accuracy or/and recall rate as a recommendation result.
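The matrix decomposition route to the first candidate set C1 in step S5 can be illustrated with the Python sketch below. The claim does not name a particular decomposition algorithm; this sketch assumes a regularized factorization trained by stochastic gradient descent, and the hyperparameters (`k`, `steps`, `lr`, `reg`) and function names are hypothetical.

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=200, lr=0.05, reg=0.02, seed=0):
    # ratings: dict mapping (user_index, program_index) -> score.
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for (u, i), r in ratings.items():
            err = r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def squared_error(ratings, P, Q):
    # Sum of squared errors over the observed scores.
    return sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2
               for (u, i), r in ratings.items())

def mf_candidates(P, Q, u, seen, n):
    # Rank unseen programs by the reconstructed low-rank score for user u.
    scores = {i: sum(a * b for a, b in zip(P[u], Q[i]))
              for i in range(len(Q)) if i not in seen}
    return sorted(scores, key=scores.get, reverse=True)[:n]

ratings = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 2): 0, (2, 1): 1, (2, 2): 1}
P0, Q0 = factorize(ratings, 3, 3, steps=0)   # untrained starting point
P, Q = factorize(ratings, 3, 3)
c1 = mf_candidates(P, Q, u=0, seen={0, 1}, n=1)
```

The products of the rows of `P` and `Q` play the role of the elements of the low-rank matrix from which the candidate set is read.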
8. The cable tv on-demand program recommendation method according to claim 7, wherein the step S3 comprises:
cleaning training audience behavior data of a user;
screening users and programs to remove inactive users and cold programs;
converting training audience rating behavior data of the user into scores of the user on programs;
and converting the scores into integers with a value of 0 or 1 by rounding, so that the scores of each user for each program form a user-program scoring matrix.
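The screening, score conversion, and rounding steps can be sketched in Python as follows. The activity thresholds, the tuple layout, and the assumption of one record per user and program are hypothetical. Rounding is implemented as round-half-up, because Python's built-in `round` rounds halves to the nearest even integer.

```python
from collections import Counter

def build_matrix(views, min_user_views=2, min_program_views=2):
    # views: (user, program, watched_seconds, program_seconds) tuples.
    user_count = Counter(v[0] for v in views)
    program_count = Counter(v[1] for v in views)
    matrix = {}
    for user, program, watched, length in views:
        # Screening: skip inactive users and cold programs.
        if user_count[user] < min_user_views or program_count[program] < min_program_views:
            continue
        ratio = min(watched / length, 1.0)            # viewing duration / broadcast duration
        matrix[(user, program)] = int(ratio + 0.5)    # round half up to 0 or 1
    return matrix

views = [("u1", "p1", 1800, 3600), ("u1", "p2", 600, 3600),
         ("u2", "p1", 3600, 3600), ("u2", "p2", 3500, 3600),
         ("u3", "p3", 100, 3600)]
matrix = build_matrix(views)
```

Here user `u3` and program `p3` each appear only once and are screened out before scoring.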
9. The cable tv on-demand program recommendation method of claim 8, wherein the method of cleansing training audience rating behavior data of the user comprises:
judging whether the start times of the training audience rating behavior data of the same user are the same, and if so, selecting the training audience rating behavior data with the latest end time and deleting the rest of the training audience rating behavior data;
judging whether the end times of the training audience rating behavior data of the same user are the same, and if so, selecting the training audience rating behavior data with the earliest start time and deleting the rest of the training audience rating behavior data;
sorting the training audience rating behavior data of the users in descending order by user and start time;
and judging whether consecutive training audience rating behavior data of the same user overlap in recorded viewing time, and if so, deleting, from the overlapping training audience rating behavior data, the record that comes later in the sorted order.
10. The cable tv on-demand program recommendation method according to claim 7, wherein the step S4 comprises:
cleaning the metadata of the program;
preprocessing variables of metadata of the cleaned program, including: judging the attribute of the variable, normalizing the variable of the numerical attribute, and classifying the variable of the character attribute.
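The metadata preprocessing of this claim can be sketched in Python as below. The claim does not say how numerical variables are normalized or how character variables are classified; this sketch assumes min-max normalization and integer category codes, and the field names are hypothetical.

```python
def preprocess(rows):
    # rows: list of dicts that share the same metadata keys.
    out = [dict() for _ in rows]
    for key in rows[0]:
        values = [r[key] for r in rows]
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                      for v in values)
        if numeric:
            # Numerical attribute: min-max normalization to [0, 1].
            lo, hi = min(values), max(values)
            span = hi - lo
            for o, v in zip(out, values):
                o[key] = (v - lo) / span if span else 0.0
        else:
            # Character attribute: map each distinct value to a category code.
            codes = {c: n for n, c in enumerate(sorted(set(map(str, values))))}
            for o, v in zip(out, values):
                o[key] = codes[str(v)]
    return out

rows = [{"year": 2000, "genre": "drama"},
        {"year": 2010, "genre": "news"},
        {"year": 2020, "genre": "drama"}]
processed = preprocess(rows)
```

The attribute type is judged per variable across all rows, so a column with any non-numeric value is treated as a character attribute.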
11. The cable tv on-demand program recommendation method of claim 10, wherein the method of cleansing metadata of a program comprises:
calculating the edit distance between the original on-demand program name and the crawled program name;
and judging whether the editing distance is greater than a set threshold value or not, and deleting the metadata of the crawled program of which the editing distance is greater than the set threshold value.
12. The method of claim 7, wherein in step S5, the user similarity and the program similarity are calculated, and the method of generating the candidate set of programs to be recommended using the neighborhood recommendation model includes:
constructing a similarity model according to a similarity algorithm including a Pearson correlation coefficient, a cosine similarity, a reciprocal squared distance similarity, and a Jaccard similarity, wherein a first similarity model is constructed according to the following formula (1) using the Pearson correlation coefficient,
$$\mathrm{pearson}_{ij} = \frac{\sum_{u \in U(i) \cap U(j)} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u \in U(i) \cap U(j)} (r_{ui} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U(i) \cap U(j)} (r_{uj} - \bar{r}_j)^2}} \quad (1)$$
wherein $\mathrm{pearson}_{ij}$ is the Pearson correlation coefficient of program i and program j; U(i) represents the set of users who scored program i; $r_{ui}$ indicates the score of user u for program i; and $\bar{r}_i$ represents the average score of all users for program i;
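constructing a second similarity model according to the following formula (2) by using the cosine similarity,
$$\mathrm{cosine}_{ij} = \frac{\sum_{u \in U(i) \cap U(j)} r_{ui}\, r_{uj}}{\sqrt{\sum_{u \in U(i)} r_{ui}^2}\,\sqrt{\sum_{u \in U(j)} r_{uj}^2}} \quad (2)$$
wherein $\mathrm{cosine}_{ij}$ is the cosine similarity of program i and program j;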
constructing a third similarity model according to the following formula (3) by using the Jaccard similarity,
$$\mathrm{jaccard}_{pq} = \frac{|U(p) \cap U(q)|}{|U(p) \cup U(q)|} \quad (3)$$
wherein $\mathrm{jaccard}_{pq}$ is the Jaccard similarity of user p and user q, |U(p) ∩ U(q)| represents the number of programs scored by both user p and user q, and |U(p) ∪ U(q)| represents the number of programs scored by at least one of user p and user q;
determining a neighbor set of each program according to the similarity between the programs and the similarity between the users by using a neighborhood recommendation model;
determining the prediction scores of different users for the programs in the neighbor set of each program according to the following formula (4)
Figure FDA0002418969380000055
wherein $\hat{r}_{ui}$ is the predicted score of user u for program i, R(u) is the set of programs on which user u has behavior records, $S_k(i)$ is the set of k programs most similar to program i, and sim(i, j) represents the similarity between program i and program j;
and ranking the programs by the user's prediction scores for the programs in the neighbor sets, and selecting a set number of programs, in order of prediction score, as the program candidate set of the user.
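The neighborhood prediction and candidate selection can be sketched in Python as follows. The exact form of formula (4) is not reproduced in this text, so the sketch assumes a common item-based form: a similarity-weighted average of the user's scores over the programs that are both in R(u) and among the k nearest neighbors of program i. The normalization by the sum of similarities and all function names are assumptions.

```python
def top_k(sim, i, k):
    # S_k(i): the k programs most similar to program i.
    others = [j for j in sim[i] if j != i]
    return sorted(others, key=lambda j: sim[i][j], reverse=True)[:k]

def predict(sim, user_scores, i, k):
    # user_scores: program -> score for the programs in R(u).
    neighbours = [j for j in top_k(sim, i, k) if j in user_scores]
    norm = sum(abs(sim[i][j]) for j in neighbours)
    if norm == 0:
        return 0.0
    return sum(sim[i][j] * user_scores[j] for j in neighbours) / norm

def neighbour_candidates(sim, user_scores, k, n):
    # Rank the programs the user has not scored by predicted score.
    unseen = [i for i in sim if i not in user_scores]
    ranked = sorted(unseen, key=lambda i: predict(sim, user_scores, i, k), reverse=True)
    return ranked[:n]

sim = {"a": {"b": 0.9, "c": 0.1, "d": 0.5},
       "b": {"a": 0.9, "c": 0.2, "d": 0.4},
       "c": {"a": 0.1, "b": 0.2, "d": 0.8},
       "d": {"a": 0.5, "b": 0.4, "c": 0.8}}
user_scores = {"a": 1, "c": 0}
candidates = neighbour_candidates(sim, user_scores, k=2, n=1)
```

If the intended formula omits the normalization, the division by `norm` can be dropped; the ranking logic stays the same.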
13. The cable tv on-demand program recommendation method according to claim 7, wherein the step S6 comprises a weighted combination step, an accuracy calculation step or/and a recall ratio calculation step and a recommendation list generation step, wherein:
a weighted combination step, which is to carry out weighted combination on a plurality of program candidate sets to be recommended of the program candidate set acquisition part according to a plurality of strategies or carry out weighted combination on different similarity calculation methods of different program candidate set acquisition parts by utilizing a machine learning idea;
an accuracy calculation step of calculating the accuracy of each weighted combination according to the test viewing behavior data of the test set and the following formula (5),
$$\mathrm{Precision} = \frac{\sum_{p} \mathrm{hit}(p)}{n \times L} \quad (5)$$
wherein Precision is the accuracy of a weighted combination, n represents the number of users on the test set, hit(p) represents the number of elements in the intersection of the recommended program list of user p and the program list actually requested by user p on the test set, and L represents the length of the recommended list;
a recall ratio calculation step of calculating the recall rate of each weighted combination according to the test viewing behavior data of the test set and the following formula (6),
$$\mathrm{Recall} = \frac{\sum_{p} \mathrm{hit}(p)}{\sum_{p} \mathrm{test}(p)} \quad (6)$$
wherein Recall is the recall rate of a weighted combination, hit(p) represents the number of elements in the intersection of the recommended program list of user p and the program list actually requested by user p on the test set, and test(p) represents the number of programs actually requested by user p on the test set;
and a recommendation list generation step of generating a recommendation list by using a weighted combination with high accuracy or/and recall as a recommendation result.
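The weighted combination and selection steps can be illustrated with the Python sketch below. The claims leave the combination strategies open; this sketch assumes a linear weighting of per-set scores and a small grid search over weights, evaluated by hit count for a single user. The function names and the weight grid are hypothetical.

```python
from itertools import product

def combine(candidate_sets, weights, n):
    # candidate_sets: list of dicts mapping program -> score, one per method.
    total = {}
    for candidates, w in zip(candidate_sets, weights):
        for program, score in candidates.items():
            total[program] = total.get(program, 0.0) + w * score
    # Sort by combined score; ties are broken by program name for determinism.
    return sorted(total, key=lambda p: (-total[p], p))[:n]

def best_weights(candidate_sets, requested, n, grid=(0.0, 0.5, 1.0)):
    # Try every weight vector on the grid and keep the first one with the most hits.
    best, best_hits = None, -1
    for weights in product(grid, repeat=len(candidate_sets)):
        if sum(weights) == 0:
            continue
        hits = len(set(combine(candidate_sets, weights, n)) & set(requested))
        if hits > best_hits:
            best, best_hits = weights, hits
    return best

c1 = {"a": 0.9, "b": 0.8}
c2 = {"c": 0.7, "a": 0.2}
merged = combine([c1, c2], [0.5, 0.5], n=2)
chosen = best_weights([c1, c2], requested={"c"}, n=1)
```

In practice the hit count would be aggregated over all test users, for example through the accuracy and recall measures of formulas (5) and (6).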
CN201810241067.2A 2018-03-22 2018-03-22 Cable television on-demand program recommendation method and system Active CN108650532B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810241067.2A CN108650532B (en) 2018-03-22 2018-03-22 Cable television on-demand program recommendation method and system

Publications (2)

Publication Number Publication Date
CN108650532A CN108650532A (en) 2018-10-12
CN108650532B true CN108650532B (en) 2020-06-12

Family

ID=63744710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810241067.2A Active CN108650532B (en) 2018-03-22 2018-03-22 Cable television on-demand program recommendation method and system

Country Status (1)

Country Link
CN (1) CN108650532B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508407A (en) * 2019-01-14 2019-03-22 上海电机学院 The tv product recommended method of time of fusion and Interest Similarity
CN110147853A (en) * 2019-02-26 2019-08-20 国网吉林省电力有限公司 A kind of test teaching notes generation method and system for power grid regulation emulation training
CN110430471B (en) * 2019-07-24 2021-05-07 山东海看新媒体研究院有限公司 Television recommendation method and system based on instantaneous calculation
CN112365447B (en) * 2020-10-20 2022-08-19 四川长虹电器股份有限公司 Multidimensional movie and television scoring method
CN112836600B (en) * 2021-01-19 2023-12-22 新华智云科技有限公司 Video similarity calculation method and system
CN114222170A (en) * 2021-12-06 2022-03-22 深圳Tcl新技术有限公司 Television program recommendation method and device, computer equipment and storage medium
CN115034847A (en) * 2022-05-25 2022-09-09 山东大学 Product recommendation method, system, storage medium and equipment based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102780920A (en) * 2011-07-05 2012-11-14 上海奂讯通信安装工程有限公司 Television program recommending method and system
CN103106285A (en) * 2013-03-04 2013-05-15 中国信息安全测评中心 Recommendation algorithm based on information security professional social network platform
CN105430505A (en) * 2015-11-13 2016-03-23 云南大学 IPTV program recommending method based on combined strategy

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
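RESEARCH OF USERS' VIEWING HABITS BASED ON CLUSTERING METHOD; 江茜 et al.; 《Proceedings of CCIS2014》; 20141130; full text *
Research on audience segmentation and matrix factorization recommendation algorithms under the Spark framework; 周虹君 et al.; 《中国新通信》; 20161130; full text *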

Also Published As

Publication number Publication date
CN108650532A (en) 2018-10-12

Similar Documents

Publication Publication Date Title
CN108650532B (en) Cable television on-demand program recommendation method and system
CN103559206B (en) A kind of information recommendation method and system
CN110704674B (en) Video playing integrity prediction method and device
US9875441B2 (en) Question recommending method, apparatus and system
CN107341268B (en) Hot searching ranking method and system
CN101489107B (en) Collaborative filtering recommendation method based on population attribute keyword vector
CN107483982B (en) Anchor recommendation method and device
US20120323725A1 (en) Systems and methods for supplementing content-based attributes with collaborative rating attributes for recommending or filtering items
CN110337012B (en) Intelligent recommendation method and device based on Internet television platform
CN105653572A (en) Resource processing method and apparatus
CN102737029A (en) Searching method and system
CN106326391A (en) Method and device for recommending multimedia resources
CN101482884A (en) Cooperation recommending system based on user predilection grade distribution
KR101354721B1 (en) Search system and method of search service
CN109241451B (en) Content combination recommendation method and device and readable storage medium
CN111861550B (en) Family portrait construction method and system based on OTT equipment
KR20170079429A (en) A clustering based collaborative filtering method with a consideration of users' features and movie recommendation system using thereof
CN106604068B (en) A kind of method and its system of more new media program
CN115760202A (en) Product operation management system and method based on artificial intelligence
CN111581435A (en) Video cover image generation method and device, electronic equipment and storage medium
KR101780237B1 (en) Method and device for answering user question based on q&a data provided on online
CN114513687A (en) Server and media asset recommendation method
EP2151799A1 (en) Recommander method and system, in particular for IPTV
CN116861063B (en) Method for exploring commercial value degree of social media hot search
CN108804492B (en) Method and device for recommending multimedia objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant