CN113705873A - Construction method of film and television work scoring prediction model and scoring prediction method - Google Patents

Info

Publication number
CN113705873A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110948252.7A
Other languages
Chinese (zh)
Other versions
CN113705873B (en
Inventor
张树武
刘杰
王艺颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202110948252.7A priority Critical patent/CN113705873B/en
Publication of CN113705873A publication Critical patent/CN113705873A/en
Application granted granted Critical
Publication of CN113705873B publication Critical patent/CN113705873B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Artificial Intelligence (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for constructing a scoring prediction model for film and television works and a scoring prediction method. The construction method comprises: collecting attribute data of videos on a film and television platform; removing from the attribute data any data whose correlation with the video score is below the lower limit of a correlation threshold, to obtain retained data items; merging, according to a merging rule, data in the retained data items whose mutual correlation exceeds the upper limit of the correlation threshold; splicing the merged data with the original data in the retained data items whose correlation is below the upper limit of the correlation threshold, to construct a feature vector of the video; and one-hot encoding the feature vector, splicing it with the original data in the retained data items whose correlation is below the upper limit of the correlation threshold, and inputting the result into a preset model for training to obtain a score prediction model. The method can accurately predict how users of the platform from which the data set was collected will score a video, and provides a scientific means of prediction for investment in the film and television industry.

Description

Construction method of film and television work scoring prediction model and scoring prediction method
Technical Field
The invention relates to the technical field of film and television work scoring prediction, in particular to a construction method of a film and television work scoring prediction model and a scoring prediction method.
Background
In recent years, as living standards have risen, investment in the film and television industry has grown steadily. For film investors and distributors, however, rising production costs and intense competition have also greatly increased investment risk; for audiences, ubiquitous advertising and marketing make it increasingly difficult to choose a film worth watching. Research related to film and television works, such as recommendation of works and selection of artists, has therefore become a popular topic in the industry.
Scoring prediction for film and television works is a way of uncovering the value of a work in advance, and is also a technical prerequisite for recommendation work and research in this field.
At present, a key problem in building a scoring prediction model for film and television works is how to analyze massive data by means of data mining so as to screen out highly correlated data. Traditional scoring prediction models take data about film and television works gathered from platforms such as film and television communities and internet encyclopedias as their data source, and use machine learning methods to analyze the data and construct the prediction model. However, a film or television work involves a large number of data items, and raw data obtained from the internet is noisy and highly sparse. As a result, most such models must process a large volume of data during modeling and fail to select a system of prediction indicators that correlates strongly with public evaluation, so the prediction accuracy of traditional models is low and their results are hard to interpret.
Disclosure of Invention
The invention provides a method for constructing a scoring prediction model for film and television works and a scoring prediction method, which overcome the low interpretability of prediction results in the prior art caused by low correlation between the prediction items and the target prediction item, and achieve accurate and efficient scoring prediction for film and television works.
The invention provides a method for constructing a film and television work scoring prediction model, which comprises the following steps:
collecting attribute data of a video on a movie platform;
removing from the attribute data any data whose correlation with the video score is below a preset lower limit of a correlation threshold, to obtain retained data items;
merging, according to a merging rule, data in the retained data items whose mutual correlation is above a preset upper limit of the correlation threshold, until all correlations between data in the retained data items are below the upper limit of the correlation threshold;
splicing the merged data with the original data in the retained data items whose correlation is below the upper limit of the correlation threshold, to construct a feature vector of the video;
and encoding the feature vector, splicing it with the original data in the retained data items whose correlation is below the upper limit of the correlation threshold, and inputting the result into a preset model for training to obtain a score prediction model.
According to the construction method of the film and television work scoring prediction model, provided by the invention, the attribute data of a video on a film and television platform is collected through a web crawler;
wherein the attribute data includes movie feature attribute data and authoring personnel attribute data.
According to the construction method provided by the invention, removing from the attribute data the data whose correlation with the video score is below the preset lower limit of the correlation threshold, to obtain the retained data items, specifically comprises:
computing Pearson coefficients between the data in the attribute data;
and deleting from the attribute data the data whose Pearson coefficient with the video score is below the lower limit of the correlation threshold, to obtain the retained data items.
According to the construction method provided by the invention, merging according to the merging rule the data in the retained data items whose mutual correlation is above the preset upper limit of the correlation threshold, until all correlations between data in the retained data items are below the upper limit, specifically comprises:
taking the data in the retained data items whose mutual Pearson coefficient is above the upper limit of the correlation threshold as high-correlation feature data to be merged;
selecting, from the high-correlation feature data to be merged, the two data with the largest Pearson coefficient and merging their features;
deleting the merged data from the retained data items, and repeating the identification of high-correlation feature data and the feature merging until no data with a Pearson coefficient above the upper limit of the correlation threshold remain in the retained data items.
According to the construction method provided by the invention, encoding the feature vector, splicing it with the original data in the retained data items whose correlation is below the upper limit of the correlation threshold, and inputting the result into a preset model for training to obtain the score prediction model, specifically comprises:
applying one-hot encoding to the data of the feature vector and constructing a video feature data set;
dividing the video feature data set into a validation set, a training set and a test set according to a preset ratio;
performing hyper-parameter optimization on a preset extreme gradient boosting model using the validation set;
and training the extreme gradient boosting model optimized on the validation set with the training set and the test set, and evaluating the model by cross-validation, to obtain the score prediction model.
According to the construction method provided by the invention, the hyper-parameter optimization of the extreme gradient boosting model on the validation set is based on a grid search method combined with k-fold cross-validation.
The invention also provides a device for constructing the film and television work scoring prediction model, which comprises the following components:
the acquisition module is used for acquiring attribute data of a video on a movie platform;
the first processing module is used for removing data, of which the correlation with the video scores is smaller than a preset lower limit of a correlation threshold value, from the attribute data to obtain a reserved data item;
the second processing module is used for merging the data, obtained by the first processing module, of which the correlation among the data in the reserved data items is greater than the preset upper limit of the correlation threshold value according to a merging rule until the correlations among the data in the reserved data items are smaller than the upper limit of the correlation threshold value;
the construction module is used for splicing the merged data and the original data with correlation smaller than the upper limit of the correlation threshold in the reserved data item to construct a feature vector of the video;
and the training module is used for coding the characteristic vector constructed by the construction module, splicing the characteristic vector with data which is originally in the reserved data item and has the correlation smaller than the upper limit of the correlation threshold value, and inputting the data into a preset model for training so as to obtain a scoring prediction model.
The invention also provides a film and television work scoring prediction method applying the film and television work scoring prediction model, which comprises the following steps:
acquiring data used for constructing a feature vector contained in a video to be predicted;
inputting the data for constructing the feature vector into a score prediction model for score prediction, and outputting a score corresponding to the video to be predicted.
According to the scoring prediction method provided by the invention, before the data for constructing the feature vector is input into the score prediction model, it is further determined whether that data contains the creator attribute data: if so, the data is input into the score prediction model for score prediction; if not, the missing creator attribute data is initialized according to the following formula:
[Equation image not reproduced]
where Data_init is the initialized value of the missing data, N is the number of video genres to which the work to be predicted belongs, P_work is the position held by the creator whose data is missing, and the remaining symbol (an inline image in the original, not reproduced) denotes the historical data, for the missing data item, of creators who held position P_work in historical videos of the i-th genre contained in the attribute data.
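The initialization of missing creator data can be illustrated with a minimal sketch. Because the formula itself is not reproduced in this text, the rule used here (averaging, over the genres of the work, the mean historical value for creators in the same position) is only an assumption consistent with the symbol definitions above. The function name and the `history` lookup structure are hypothetical.

```python
def init_missing_creator_value(genres, position, history):
    """Estimate a missing creator data item from genre-level history.

    genres   -- genres of the work to be predicted (N = len(genres))
    position -- position of the missing creator, e.g. "dir"
    history  -- hypothetical lookup: (genre, position) -> list of historical
                values of the missing data item for creators in that position
    """
    genre_means = []
    for genre in genres:
        values = history.get((genre, position), [])
        if values:  # skip genres that have no history for this position
            genre_means.append(sum(values) / len(values))
    if not genre_means:
        raise ValueError("no historical data for this position and genres")
    # Assumed rule (not taken from the source formula): average the per-genre means.
    return sum(genre_means) / len(genre_means)


# Illustrative data only.
history = {("drama", "dir"): [7.0, 8.0], ("comedy", "dir"): [6.0]}
data_init = init_missing_creator_value(["drama", "comedy"], "dir", history)
```

Skipping genres without history is a design choice of this sketch; a stricter variant could divide by N regardless.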
The invention also provides an electronic device, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, and is characterized in that the processor realizes the steps of the construction method of the scoring prediction model of the film and television works or the scoring prediction method of the film and television works when executing the program.
According to the construction method and the scoring prediction method provided by the invention, correlation analysis of the collected data ensures that the data items used for prediction are highly correlated with the feature to be predicted, and removes redundant features and features with low prediction gain as far as possible. By constructing a video feature vector that is highly correlated with the video score, an accurate and efficient scoring prediction model is obtained. This addresses the difficulty of building an interpretable and accurate scoring predictor for a film and television platform when the data is highly sparse and the data items are numerous.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for constructing a film and television work scoring prediction model provided by the invention;
FIG. 2 is a schematic structural diagram of a device for constructing a film and television work scoring prediction model provided by the invention;
FIG. 3 is a schematic flow chart of a movie work scoring prediction method provided by the present invention;
FIG. 4 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for constructing the scoring prediction model for film and television works is described below with reference to FIG. 1. The method builds a film and television feature vector from video data of a film and television community platform and from historical data of film creators, and uses a machine learning algorithm to learn the characteristics of video scores from that data, so as to predict the scores of film and television works. It specifically comprises the following steps:
101. collecting attribute data of a video on a movie platform;
In this step, the data used for prediction is divided, according to its meaning, into film feature attribute data and attribute data of the creators who took part in making the film, and the corresponding data is collected from the film and television platform for each of these groups of data items.
The film feature attribute data can include: the video name, the release year, the list of the video's creators, the genre of the video, and the video's score on the target film and television platform.
The creator attribute data can include: the lead actor list, the director list, the screenwriter list, each artist's ID on the film and television community website, the score of each artist's recent work, and the average score of each artist's historical videos.
In one embodiment, according to the categories of data available on the selected target film and television platform, the video name, release year, creator list, genre and score on the target platform are classified as film feature attribute data. The resulting set of film feature attribute data is recorded as the first-dimension feature data item set SET_c1. The relevant film information is collected with a web crawler according to the data requirements of the film feature attribute data. The specific data items are shown in Table 1:
Table 1. Meaning of the data items of the film feature attribute data
[Table 1: image not reproduced]
According to the positions held by the video's creators, the data for the first several lead actors, the directors and the screenwriters are classified as attribute items of the film's creators. A web crawler is used, according to the data requirements of the creator attribute data, to obtain basic information about each creator, including but not limited to the creator's name, the position held, the artist's website ID, the score of the artist's recent work and the average score of the artist's historical videos. The collected data forms the second-dimension feature data item set SET_c2. The specific data items are shown in Table 2:
Table 2. Meaning of the data items of the creator attribute data
Data item    Meaning
p_name       Name of the participating creator
p_nnx        Score of the artist's recent work
p_tnx        Average score of the artist's historical videos
p_id         Artist's ID on the film and television community website
Here p denotes the position held by a participant in the video; in the actual data of this example p is one of dir (director), scr (screenwriter) and act (lead actor). x denotes the x-th artist holding the same position; for example, the recent work score of lead actor No. 2 is the data item act_nn2. When data is collected from other film and television community sites, the positions and film feature attribute data may include, but are not limited to, those listed above.
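The naming convention for score data items can be made concrete with a short sketch. It covers only the two indexed items from Table 2 (`nn` for the recent work score, `tn` for the historical average score) and the three positions named in the text. The helper names `score_item_name` and `parse_score_item` are hypothetical.

```python
import re

# Position prefixes and score kinds taken from the text; helper names are illustrative.
POSITIONS = {"dir": "director", "scr": "screenwriter", "act": "lead actor"}
KINDS = {"nn": "recent work score", "tn": "historical video average score"}
_ITEM_PATTERN = re.compile(r"^(dir|scr|act)_(nn|tn)(\d+)$")


def score_item_name(position, kind, x):
    """Build a data item name such as 'act_nn2' (x-th artist in a position)."""
    if position not in POSITIONS or kind not in KINDS or x < 1:
        raise ValueError("unknown position, kind, or non-positive index")
    return f"{position}_{kind}{x}"


def parse_score_item(name):
    """Split a data item name back into (position, kind, x)."""
    match = _ITEM_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a score data item: {name!r}")
    return match.group(1), match.group(2), int(match.group(3))
```

For instance, `score_item_name("act", "nn", 2)` yields the item name used in the example above, and `parse_score_item` reverses it.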
102. And removing the data of which the correlation with the video score is smaller than the preset lower limit of the correlation threshold value in the attribute data to obtain a reserved data item.
It can be understood that, for the data collected for video score prediction, the correlation between each data item and the video score directly affects the efficiency and accuracy of the prediction. Redundant data and data with low prediction gain therefore need to be removed during model construction, which improves the efficiency of model construction.
In one embodiment, the first-dimension feature data item set SET_c1 and the second-dimension feature data item set SET_c2 are first spliced to form the video feature data item set SET_cA, and the Pearson coefficient between each data item in SET_cA and the video score on the target film and television platform is calculated according to Formula 1:
[Formula 1: equation image not reproduced]
where P_XY is the Pearson coefficient, n is the sample size, X and Y are the sample values of the film feature attribute data and the creator attribute data respectively, and μ is the mathematical expectation.
Based on the calculated Pearson coefficients P_XY, data whose Pearson coefficient with the video score is below the preset lower limit of the correlation threshold are deleted from SET_cA. For example, with a lower limit of 0.3, data in SET_cA whose P_XY is less than 0.3 are deleted. The remaining data are the retained data items, which constitute the high-correlation initial video feature data item set SET_cA.
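The filtering step can be sketched in plain Python as follows. The sketch uses the standard sample form of the Pearson coefficient (covariance divided by the product of standard deviations) and compares its absolute value with the lower limit. Using the absolute value is an assumption of this sketch, since the text only states "less than 0.3". The function names and the sample columns are illustrative.

```python
import math


def pearson(xs, ys):
    """Standard Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / math.sqrt(var_x * var_y)


def filter_by_score_correlation(columns, scores, lower=0.3):
    """Keep columns whose |Pearson coefficient| with the score is >= lower."""
    return {
        name: values
        for name, values in columns.items()
        if abs(pearson(values, scores)) >= lower
    }


# Illustrative data: four videos, one informative and one uninformative column.
scores = [6.0, 7.0, 8.0, 9.0]
columns = {
    "dir_tn1": [5.5, 6.5, 8.0, 8.5],   # director's historical average score
    "year": [2001, 1999, 2003, 2000],  # release year
}
retained = filter_by_score_correlation(columns, scores, lower=0.3)
```

With this data only `dir_tn1` is retained, because the release year is almost uncorrelated with the score.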
103. And merging the data of which the correlation among the data in the reserved data items is greater than the upper limit of a preset correlation threshold value according to a merging rule until the correlation among the data in the reserved data items is less than the upper limit of the correlation threshold value.
In this step, feature engineering is applied to the prediction data items that were retained: data whose mutual correlation is above the preset upper limit of the correlation threshold, that is, data that are similar to one another, are merged, which further reduces redundancy. The video feature vector is then constructed by splicing the data produced by feature engineering with the retained prediction data items.
Specifically, based on the correlation analysis of the preceding steps, data in the high-correlation initial video feature data item set SET_cA whose mutual correlation coefficient is above the preset upper limit of the correlation threshold (for example, 0.6) are merged into new feature data, which form a new feature data set.
More specifically, each time features are merged, the pair of data with the largest Pearson coefficient is preferably selected as the feature pair to be merged, and the new feature set is recorded as SET_newFeature.
More specifically, each feature pair to be merged is combined into a feature vector according to Formula 2:
[Formula 2: equation image not reproduced]
where T is the merged data feature, A and B are the data features to be merged, FA and FB are the corresponding sample data to be merged, P_{A,T} is the Pearson coefficient between FA and the video score on the target film and television platform, and P_{B,T} is the Pearson coefficient between FB and the video score.
The data that have been merged into a new feature are deleted from SET_cA. The next group of high-correlation data is then found and merged, and this is repeated until, with the upper limit set to 0.6, every pair of data with a correlation coefficient greater than 0.6 has been merged.
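The iterative merging loop can be sketched as below. Because Formula 2 is not reproduced in this text, the merge rule used here (an average of the two features weighted by the magnitude of each feature's Pearson coefficient with the video score) is an assumption, not the patent's formula. The function names and the sample columns are likewise illustrative.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Standard Pearson correlation coefficient (0.0 for constant input)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def merge_pair(fa, fb, scores):
    """ASSUMED merge rule: average weighted by |correlation with the score|."""
    wa, wb = abs(pearson(fa, scores)), abs(pearson(fb, scores))
    if wa + wb == 0:
        wa = wb = 1.0  # fall back to a plain average
    return [(wa * a + wb * b) / (wa + wb) for a, b in zip(fa, fb)]


def merge_correlated(columns, scores, upper=0.6):
    """Merge the most correlated pair repeatedly until none exceeds `upper`.

    Returns (remaining, merged): the untouched columns and the new features.
    """
    remaining, merged = dict(columns), {}
    while True:
        best = None
        for a, b in combinations(remaining, 2):
            r = abs(pearson(remaining[a], remaining[b]))
            if r > upper and (best is None or r > best[0]):
                best = (r, a, b)
        if best is None:
            return remaining, merged
        _, a, b = best
        # Merged inputs are removed from the pool, as described in the text.
        merged[f"{a}+{b}"] = merge_pair(remaining.pop(a), remaining.pop(b), scores)


scores = [6.0, 7.0, 8.0, 9.0]
columns = {
    "act_tn1": [6.0, 7.0, 8.0, 9.0],
    "act_nn1": [6.2, 7.1, 7.9, 9.3],
    "year": [2001, 1999, 2003, 2000],
}
remaining, new_features = merge_correlated(columns, scores, upper=0.6)
```

The loop terminates because every pass removes two columns from the pool. Splicing `remaining` with `new_features` then gives the columns of the feature vector built in the next step.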
104. And splicing the merged data and the original data with correlation smaller than the upper limit of the correlation threshold in the reserved data items to construct a feature vector of the video.
Specifically, the video feature vector SET_cF is constructed by splicing the data in the video feature data item set SET_cA with the merged data.
105. And coding the feature vector, splicing the feature vector with the original data with the correlation smaller than the upper limit of the correlation threshold in the reserved data item, and inputting the data into a preset model to train so as to obtain a score prediction model.
Specifically, before training, the data in the constructed video feature vector SET_cF is one-hot encoded to construct the video feature data set S, which is divided into a validation set S_validation, a training set S_train and a test set S_test according to a predetermined ratio, for example 1:2:7. Hyper-parameter optimization is then performed on the preset extreme gradient boosting model, namely the XGBoost model, using the validation set S_validation and a grid search method combined with k-fold cross-validation.
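One-hot encoding and the three-way split can be illustrated with a small standard-library sketch. The row format (a list of dicts), the function names, and the mapping of the 1:2:7 ratio to validation, training and test in the order the text lists them are all assumptions of this sketch.

```python
import random


def one_hot(rows, column):
    """Replace a categorical column with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


def split_dataset(samples, ratios=(1, 2, 7), seed=0):
    """Shuffle, then split into (validation, training, test) by `ratios`."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    total = sum(ratios)
    n = len(shuffled)
    n_val = n * ratios[0] // total
    n_train = n * ratios[1] // total
    return (
        shuffled[:n_val],
        shuffled[n_val:n_val + n_train],
        shuffled[n_val + n_train:],  # test set takes the remainder
    )


rows = [{"genre": "drama", "year": 2001}, {"genre": "comedy", "year": 1999}]
encoded_rows = one_hot(rows, "genre")
s_validation, s_train, s_test = split_dataset(range(100))
```

In practice a data-frame library would usually handle the encoding; the explicit version here only shows what the transformation does.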
For example, in one embodiment, k = 10 is chosen for the k-fold cross-validation, and the following parameters are tuned in turn: the learning rate learning_rate, the maximum tree depth max_depth, the maximum step size of the weight change of a single tree max_delta_step, the maximum number of weak-learner iterations n_estimators, the proportion of the whole sample set subsampled to train the model subsample, the proportion of features sampled when building each tree colsample_bytree, the L1 regularization term on the weights reg_alpha, the L2 regularization term on the weights reg_lambda, and the positive/negative class balance weight scale_pos_weight.
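Grid search combined with k-fold cross-validation can be sketched generically as follows. The sketch is model-agnostic: `fit` and `predict` are caller-supplied callbacks, and the toy mean predictor with its `shrink` parameter is a hypothetical stand-in used only to keep the example self-contained. In a real pipeline the callbacks would wrap the XGBoost model, and the grid would hold the parameters listed above.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def grid_search(param_grid, X, y, fit, predict, k=10):
    """Return (best_params, best_mse) over every combination in param_grid."""
    if len(X) < k:
        raise ValueError("need at least k samples")
    names = sorted(param_grid)
    best_params, best_mse = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_mse = []
        for train, valid in k_fold_indices(len(X), k):
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            preds = predict(model, [X[i] for i in valid])
            errors = [(p - y[i]) ** 2 for p, i in zip(preds, valid)]
            fold_mse.append(sum(errors) / len(errors))
        mse = sum(fold_mse) / len(fold_mse)  # average validation error
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse


# Hypothetical stand-in model: predicts a shrunken mean of the training targets.
def fit_mean(X, y, shrink):
    return shrink * sum(y) / len(y)


def predict_mean(model, X):
    return [model] * len(X)


X = list(range(20))
y = [7.0] * 20
best_params, best_mse = grid_search(
    {"shrink": [0.5, 1.0, 1.5]}, X, y, fit_mean, predict_mean, k=10
)
```

Tuning parameters "in turn", as the embodiment describes, corresponds to calling `grid_search` repeatedly with a one-parameter grid while holding the values already chosen fixed.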
The training set S_train and the test set S_test are then fed into the XGBoost model with optimized hyper-parameters for training, and the model is evaluated by cross-validation, so as to establish the scoring prediction model for film and television works based on the video feature vector.
For example, in one embodiment, the goodness of fit R² and the mean squared error MSE are selected as evaluation metrics. To better illustrate the effect, the Random Forest algorithm, the adaptive boosting algorithm AdaBoost and the k-nearest neighbors algorithm KNN are compared with the extreme gradient boosting algorithm XGBoost used by the method. The measured model performance is shown in Table 3.
Table 3. Comparison of algorithm model performance
Algorithm model    Mean squared error (MSE)    Goodness of fit (R²)
XGBoost            0.6238                      0.7165
Random Forest      0.6922                      0.6854
AdaBoost           0.8117                      0.6314
KNN                0.8244                      0.6253
As can be seen from table 3, compared with other machine learning algorithms, the score prediction model implemented based on the XGBoost algorithm has better score prediction performance.
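For reference, the two metrics reported in Table 3 can be computed as in the sketch below, which uses their standard definitions. The sample values are illustrative only and are unrelated to the results in the table.

```python
def mean_squared_error(y_true, y_pred):
    """MSE: mean of squared differences between true and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Goodness of fit R^2 = 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


# Illustrative scores only.
y_true = [6.0, 7.0, 8.0, 9.0]
y_pred = [6.5, 7.0, 7.5, 9.0]
mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

A lower MSE and a higher R² both indicate a better fit, which is the sense in which XGBoost leads in Table 3.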
The device for constructing the scoring prediction model for film and television works provided by the present invention is described below with reference to FIG. 2. The device described below and the construction method described above may be referred to in correspondence with each other.
The construction device comprises an acquisition module 210, a first processing module 220, a second processing module 230, a construction module 240, and a training module 250, wherein:
the acquisition module 210 is configured to acquire attribute data of a video on a movie platform;
the first processing module 220 is configured to remove data, of the attribute data, whose correlation with the video score is smaller than a preset lower limit of a correlation threshold, to obtain a retained data item;
the second processing module 230 is configured to merge the data, which are obtained by the first processing module and have the correlation between the data in the reserved data items greater than the preset upper correlation threshold limit, according to a merge rule until the correlations between the data in the reserved data items are all smaller than the upper correlation threshold limit;
the construction module 240 is configured to splice the merged data with the original data with a correlation smaller than the upper limit of the correlation threshold in the retained data item to construct a feature vector of the video;
the training module 250 is configured to encode the feature vector constructed by the construction module, and input the encoded feature vector into a preset model for training after splicing with data, which is originally in the retained data item and has a correlation smaller than an upper limit of a correlation threshold, so as to obtain a score prediction model.
Specifically, the attribute data of the video is collected on a target video platform through a collection module, then the data with lower correlation with the video score in the collected attribute data is deleted through a first processing module, namely a part of data with little influence on the score prediction of the target video is removed, the redundancy of the data is reduced, then the data with higher correlation among the rest attribute data, namely the data with more similar data among the data, is merged through a second processing module, the redundancy of the attribute data is further reduced, the merged data and the previously reserved data with higher correlation with the video score are spliced through a construction module to construct the feature vector of the video, and finally the constructed feature vector is encoded through a training module and is spliced with the rest attribute data processed by the second processing module for model training, and then a grading prediction model is obtained.
By constructing feature vectors that are highly correlated with the video score, the device builds an accurate and efficient film and television work scoring prediction model. It effectively addresses a problem in existing research on predicting the scores of film and television works, namely that the low correlation between the predictor items and the target item in the prediction index system makes the prediction result hard to interpret, and it achieves accurate and efficient score prediction for film and television works.
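The work of the first and second processing modules can be illustrated with a minimal Python sketch. It is not the patented implementation: the merging rule is not spelled out in this passage, so the element-wise mean used by default below is a hypothetical stand-in, and the function names, column names and thresholds are illustrative assumptions. The sketch compares the signed Pearson coefficient with the thresholds, as the text reads literally.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def filter_by_score(columns, scores, lower):
    """Drop columns whose correlation with the score is below `lower`."""
    return {name: col for name, col in columns.items()
            if pearson(col, scores) >= lower}


def merge_correlated(columns, upper, merge=None):
    """Merge the most correlated pair repeatedly until no pair exceeds `upper`."""
    if merge is None:
        # Assumption: the merging rule is an element-wise mean of both columns.
        merge = lambda a, b: [(p + q) / 2 for p, q in zip(a, b)]
    cols = dict(columns)
    while len(cols) > 1:
        # Find the pair of reserved columns with the largest Pearson coefficient.
        (a, b), r = max(
            (((m, n), pearson(cols[m], cols[n])) for m, n in combinations(cols, 2)),
            key=lambda item: item[1],
        )
        if r <= upper:
            break  # every remaining pair is at or below the upper limit
        # Remove the two merged columns and keep their combination instead.
        cols[f"{a}+{b}"] = merge(cols.pop(a), cols.pop(b))
    return cols
```

Each pass removes two columns and adds one, so the loop always terminates. A different merging rule can be passed through the `merge` argument without changing the loop.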
Furthermore, the present invention also provides a scoring prediction method for a movie work, which applies the scoring prediction model for a movie work as described above, and is described below with reference to fig. 3, where the scoring prediction method for a movie work includes:
301. acquiring data used for constructing a feature vector contained in a video to be predicted;
302. inputting the data for constructing the feature vector into a score prediction model for score prediction, and outputting a score corresponding to the video to be predicted.
In a specific implementation, based on the information of the work to be predicted, the basic data required to construct the video feature vector is input into the trained scoring prediction model for prediction. First, the information of the work to be predicted is entered according to the data items contained in the video feature data item set SET_cA. In one embodiment, seven movies are selected on the target movie platform for prediction; after the information of each video to be predicted is obtained, the historical data of the corresponding creators in the database is retrieved according to the constructed SET_cA data item set.
Furthermore, if the input information of the work to be predicted provides complete attribute information for the creators, the data of the work to be predicted that is used for constructing the feature vector can be input directly into the trained film and television work scoring prediction model for prediction, and the result obtained is the final predicted score.
If the basic information provided for the work to be predicted has missing data items, the data is filled in according to the rule expressed by formula 3 below:
Figure BDA0003217608820000121
where Data_init is the initialization value of the blank data, N is the number of video types contained in the work to be predicted, P_work is the job position of the missing creator, and
Figure BDA0003217608820000122
is the historical data, for the missing data item, of the creators who held position P_work in the historical videos of the i-th category in the attribute data.
For example, if the basic information provided for the work to be predicted lacks the screenwriter information, and the movie covers three video theme types (drama, action and crime), the data filling rule of formula 3 can be written as formula 4 below:
Figure BDA0003217608820000131
Here, because the movie covers the three video theme types drama, action and crime, N is 3, and Data(i), with P_work being the screenwriter position, is the historical screenwriter data in the database for videos of the i-th category, where i ranges over the three video theme types drama, action and crime in this example. Finally, the calculated Data_init value is used as the initialization value of the missing data item when prediction starts. Calculating an initialization value for the blank data provides a preprocessing scheme for the cold start of prediction data, and thereby addresses the difficulty of establishing a highly interpretable and highly accurate score prediction for a film and television platform when data is scarce.
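The cold-start filling rule can be illustrated with a short Python sketch. Formulas 3 and 4 are only available as figures here, so the sketch assumes that the rule is an arithmetic mean, over the N video types of the work, of one historical value per type for the missing position. That reading, the nested-dictionary layout of the history, and the names `init_missing_value` and `fill_missing` are all assumptions made for illustration.

```python
def init_missing_value(history_by_type, position, video_types):
    """Initial value for a missing creator data item.

    Assumption: the value is the mean, over the work's video types, of the
    historical value recorded for `position` in each type.
    history_by_type is a hypothetical layout: {video_type: {position: value}}.
    """
    n = len(video_types)
    if n == 0:
        raise ValueError("the work must belong to at least one video type")
    return sum(history_by_type[t][position] for t in video_types) / n


def fill_missing(features, history_by_type, video_types):
    """Return a copy of `features` with every None entry initialized."""
    filled = dict(features)
    for position, value in features.items():
        if value is None:
            # Only blank items are initialized; provided data is left as is.
            filled[position] = init_missing_value(
                history_by_type, position, video_types
            )
    return filled
```

For the screenwriter example in the text, `video_types` would be the three types drama, action and crime, so N is 3 and only the screenwriter entry is filled in.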
Specifically, the comparison between the predicted scores and the true scores of the selected movies is shown in Table 4:
Table 4. Comparison of predicted and true movie scores
Figure BDA0003217608820000132
Figure BDA0003217608820000141
As can be seen from Table 4, the present film and television work scoring prediction method can still give a relatively accurate prediction even when historical data is missing.
Fig. 4 illustrates the physical structure of an electronic device. As shown in fig. 4, the electronic device may include a processor 410, a communication interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform the steps of the method for constructing a film and television work scoring prediction model or of the film and television work scoring prediction method.
The construction method of the film and television work scoring prediction model comprises the following steps:
101. collecting attribute data of a video on a movie platform; wherein the attribute data comprises movie feature attribute data and authoring personnel attribute data;
102. removing data of which the correlation with the video score is smaller than a preset lower limit of a correlation threshold value from the attribute data to obtain a reserved data item;
103. merging the data of which the correlation among the data in the reserved data items is greater than the preset upper limit of the correlation threshold value according to a merging rule until the correlation among the data in the reserved data items is smaller than the upper limit of the correlation threshold value;
104. splicing the merged data and the original data with correlation smaller than the upper limit of the correlation threshold in the reserved data items to construct a feature vector of the video;
105. and coding the feature vector, splicing the feature vector with the original data with the correlation smaller than the upper limit of the correlation threshold in the reserved data item, and inputting the data into a preset model to train so as to obtain a score prediction model.
The movie work scoring prediction method comprises the following steps:
301. acquiring data used for constructing a feature vector contained in a video to be predicted;
302. inputting the data for constructing the feature vector into a score prediction model for score prediction, and outputting a score corresponding to the video to be predicted.
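The encoding and training of step 105 can be illustrated with a dependency-free Python sketch. The claims name one-hot encoding and an extreme gradient boosting model tuned by grid search with k-fold cross-validation; to stay self-contained, this sketch swaps in a one-feature ridge regression as a stand-in model, so it shows only the encoding and the tuning loop, not the patented model. All function names and the parameter grid are illustrative assumptions.

```python
def one_hot(values):
    """One-hot encode categorical values; returns (categories, encoded rows)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, valid
        start += size


def fit_ridge_1d(xs, ys, lam):
    """Stand-in model: y = w * x with L2 penalty `lam` (closed form)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def grid_search(xs, ys, grid, k=3):
    """Return the penalty in `grid` with the lowest mean k-fold squared error."""
    def cv_error(lam):
        total = 0.0
        for train, valid in k_fold_indices(len(xs), k):
            w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
            total += sum((ys[i] - w * xs[i]) ** 2 for i in valid) / len(valid)
        return total / k
    return min(grid, key=cv_error)
```

With a real gradient boosting library, `fit_ridge_1d` would be replaced by that library's regressor and `grid` by a grid over its hyper-parameters; the fold construction and the selection of the best grid point would stay the same.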
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention further provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, enable the computer to execute the method for constructing a film and television work scoring prediction model or the film and television work scoring prediction method described above.
In still another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method for constructing a film and television work scoring prediction model or the film and television work scoring prediction method described above.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for constructing a film and television work scoring prediction model is characterized by comprising the following steps:
collecting attribute data of a video on a movie platform;
removing data of which the correlation with the video score is smaller than a preset lower limit of a correlation threshold value from the attribute data to obtain a reserved data item;
merging the data of which the correlation among the data in the reserved data items is greater than the preset upper limit of the correlation threshold value according to a merging rule until the correlation among the data in the reserved data items is smaller than the upper limit of the correlation threshold value;
splicing the merged data and the original data with correlation smaller than the upper limit of the correlation threshold in the reserved data items to construct a feature vector of the video;
and coding the feature vector, splicing the feature vector with the original data with the correlation smaller than the upper limit of the correlation threshold in the reserved data item, and inputting the data into a preset model to train so as to obtain a score prediction model.
2. The method for constructing a film and television work scoring prediction model according to claim 1, wherein the attribute data of the video on the film and television platform is collected through a web crawler;
wherein the attribute data includes movie feature attribute data and authoring personnel attribute data.
3. The method for constructing a scoring prediction model for film and television works according to claim 1, wherein the specific method for removing the data in the attribute data, which has a correlation with the video score smaller than a preset lower limit of a correlation threshold, to obtain the reserved data items comprises:
calculating the Pearson coefficients between the data in the attribute data;
and deleting, from the attribute data, the data whose Pearson coefficient with the video score is less than the lower limit of the correlation threshold, so as to obtain the reserved data item.
4. The method for constructing a scoring prediction model for film and television works according to claim 3, wherein the specific method for merging the data in the reserved data items whose correlations between data are greater than the preset upper threshold of correlation values according to a merging rule until the correlations between data in the reserved data items are less than the upper threshold of correlation values is as follows:
taking the data with the Pearson coefficient larger than the upper limit of the correlation threshold among the data in the reserved data items as high-correlation characteristic data to be merged;
selecting two data with the largest Pearson coefficient in the high correlation characteristic data to be merged for characteristic merging;
deleting the data subjected to feature merging in the reserved data item, and repeating the high-correlation feature data judgment and feature merging operation to be merged until no data with the Pearson coefficient larger than the upper limit of the correlation threshold exists in the reserved data item.
5. The method for constructing a scoring prediction model for film and television works as claimed in claim 1, wherein the specific method for encoding the feature vectors, splicing the encoded feature vectors with the original data with the correlation smaller than the upper limit of the correlation threshold in the reserved data items, and inputting the data into a preset model to train to obtain the scoring prediction model comprises the following steps:
constructing a video feature data set after performing one-hot encoding on the data of the feature vector;
dividing the video feature data set into a validation set, a training set and a test set according to a preset proportion;
carrying out hyper-parameter optimization on a preset extreme gradient boosting model by using the validation set;
and putting the training set and the test set into the extreme gradient boosting model optimized with the validation set for training, and performing model evaluation by using a cross-validation method to obtain the scoring prediction model.
6. The method for constructing a movie work scoring prediction model according to claim 5, wherein the extreme gradient boosting model is optimized by using the validation set based on a grid search method combining machine learning and k-fold cross validation.
7. A building device of a film and television work scoring prediction model is characterized by comprising:
the acquisition module is used for acquiring attribute data of a video on a movie platform;
the first processing module is used for removing data, of which the correlation with the video scores is smaller than a preset lower limit of a correlation threshold value, from the attribute data to obtain a reserved data item;
the second processing module is used for merging the data, obtained by the first processing module, of which the correlation among the data in the reserved data items is greater than the preset upper limit of the correlation threshold value according to a merging rule until the correlations among the data in the reserved data items are smaller than the upper limit of the correlation threshold value;
the construction module is used for splicing the merged data and the original data with correlation smaller than the upper limit of the correlation threshold in the reserved data item to construct a feature vector of the video;
and the training module is used for coding the characteristic vector constructed by the construction module, splicing the characteristic vector with data which is originally in the reserved data item and has the correlation smaller than the upper limit of the correlation threshold value, and inputting the data into a preset model for training so as to obtain a scoring prediction model.
8. A scoring prediction method for a movie work using the scoring prediction model for a movie work according to any one of claims 1 to 6, comprising:
acquiring data used for constructing a feature vector contained in a video to be predicted;
inputting the data for constructing the feature vector into a score prediction model for score prediction, and outputting a score corresponding to the video to be predicted.
9. The scoring prediction method for film and television works as claimed in claim 8, wherein before inputting the data for constructing feature vectors into a scoring prediction model for scoring prediction, it is further determined whether the data for constructing feature vectors contains creator attribute data: if yes, inputting the data for constructing the feature vector into a scoring prediction model for scoring prediction; if not, initializing the data lacking the attribute data of the creator according to the following formula:
Figure FDA0003217608810000031
where Data_init is the initialization value of the blank data, N is the number of video types contained in the work to be predicted, P_work is the job position of the missing creator, and
Figure FDA0003217608810000032
is the historical data, for the missing data item, of the creators who held position P_work in the historical videos of the i-th category in the attribute data.
10. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for constructing the scoring prediction model for film and television works according to any one of claims 1 to 6 or the scoring prediction method for film and television works according to any one of claims 8 to 9 when executing the program.
CN202110948252.7A 2021-08-18 2021-08-18 Construction method of film and television work score prediction model and score prediction method Active CN113705873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110948252.7A CN113705873B (en) 2021-08-18 2021-08-18 Construction method of film and television work score prediction model and score prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110948252.7A CN113705873B (en) 2021-08-18 2021-08-18 Construction method of film and television work score prediction model and score prediction method

Publications (2)

Publication Number Publication Date
CN113705873A true CN113705873A (en) 2021-11-26
CN113705873B CN113705873B (en) 2024-01-19

Family

ID=78653211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110948252.7A Active CN113705873B (en) 2021-08-18 2021-08-18 Construction method of film and television work score prediction model and score prediction method

Country Status (1)

Country Link
CN (1) CN113705873B (en)

Also Published As

Publication number Publication date
CN113705873B (en) 2024-01-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant