CN113705873A

CN113705873A - Construction method of film and television work scoring prediction model and scoring prediction method

Info

Publication number: CN113705873A
Application number: CN202110948252.7A
Authority: CN
Inventors: Zhang Shuwu (张树武); Liu Jie (刘杰); Wang Yiying (王艺颖)
Original assignee: Institute of Automation, Chinese Academy of Sciences
Current assignee: Institute of Automation, Chinese Academy of Sciences
Priority date: 2021-08-18
Filing date: 2021-08-18
Publication date: 2021-11-26
Anticipated expiration: 2041-08-18
Also published as: CN113705873B

Abstract

The invention provides a construction method of a film and television work scoring prediction model and a scoring prediction method. The construction method comprises the following steps: collecting attribute data of videos on a movie platform; removing from the attribute data the data whose correlation with the video score is smaller than a lower correlation threshold, to obtain retained data items; merging, according to a merging rule, the retained data items whose mutual correlation is greater than an upper correlation threshold; splicing the merged data with the original retained data whose correlation is smaller than the upper correlation threshold, to construct a feature vector of the video; and one-hot encoding the feature vector, splicing it with the original retained data whose correlation is smaller than the upper correlation threshold, and inputting the result into a preset model for training to obtain a score prediction model. The method can accurately predict the scores that users of the movie platform from which the data set was collected give to videos, and provides a scientific means of prediction for investment in the film industry.

Description

Construction method of film and television work scoring prediction model and scoring prediction method

Technical Field

The invention relates to the technical field of film and television work scoring prediction, in particular to a construction method of a film and television work scoring prediction model and a scoring prediction method.

Background

In recent years, as living standards have risen, investment in the film and television industry has steadily grown. For film investors and distributors, however, rising production costs and intense competition have also greatly increased the investment risk of a film; for audiences, ubiquitous advertising and marketing make it increasingly difficult to choose a film worth watching. Research on topics related to film and television works, such as recommendation and artist selection, has therefore become popular in the industry.

Score prediction for film and television works is a way of uncovering the value of a work in advance, and it is also a technical prerequisite for recommendation and other research on film and television works.

At present, when building a film and television work scoring prediction model, a key problem is how to analyze massive data with data mining techniques and screen out the highly correlated data. Traditional scoring prediction models take data about film and television works collected from sources such as film and television community platforms and online encyclopedias, and use machine learning methods to analyze the data and build the prediction model. However, a film has a large number of related data items, and raw data obtained from the internet are noisy and highly sparse. As a result, most existing models must process large volumes of data yet fail to select a set of predictors highly correlated with public ratings, so their predictions have limited accuracy and low interpretability.

Disclosure of Invention

The invention provides a construction method of a film and television work scoring prediction model and a scoring prediction method, which address the low interpretability of prediction results in prior-art score prediction caused by the low correlation between the predictor items and the target item, and achieve accurate and efficient scoring prediction of film and television works.

The invention provides a method for constructing a film and television work scoring prediction model, which comprises the following steps:

collecting attribute data of a video on a movie platform;

removing, from the attribute data, the data whose correlation with the video score is smaller than a preset lower correlation threshold, to obtain retained data items;

merging, according to a merging rule, the retained data items whose mutual correlation is greater than a preset upper correlation threshold, until the correlations among the retained data items are all smaller than the upper correlation threshold;

splicing the merged data with the original retained data whose correlation is smaller than the upper correlation threshold, to construct a feature vector of the video;

and encoding the feature vector, splicing it with the original retained data whose correlation is smaller than the upper correlation threshold, and inputting the result into a preset model for training to obtain a score prediction model.

According to the construction method provided by the invention, the attribute data of videos on a film and television platform are collected with a web crawler;

wherein the attribute data include film feature attribute data and creator attribute data.

According to the construction method provided by the invention, removing from the attribute data the data whose correlation with the video score is smaller than the preset lower correlation threshold, to obtain the retained data items, specifically comprises:

calculating Pearson coefficients among the data in the attribute data, including each item's coefficient with the video score;

and deleting from the attribute data the data whose Pearson coefficient with the video score is less than the lower correlation threshold, to obtain the retained data items.

According to the construction method provided by the invention, merging the retained data items whose mutual correlation is greater than the preset upper correlation threshold according to the merging rule, until the correlations among the retained data items are all smaller than the upper correlation threshold, specifically comprises:

taking the retained data items whose pairwise Pearson coefficient is greater than the upper correlation threshold as high-correlation feature data to be merged;

selecting the two items with the largest Pearson coefficient among the high-correlation feature data to be merged, and merging them into one feature;

deleting the merged items from the retained data items, and repeating the identification of high-correlation feature data and the feature merging until no pair of retained data items has a Pearson coefficient greater than the upper correlation threshold.

According to the construction method provided by the invention, encoding the feature vector, splicing it with the original retained data whose correlation is smaller than the upper correlation threshold, and inputting the result into the preset model for training to obtain the scoring prediction model specifically comprises:

one-hot encoding the data of the feature vector to construct a video feature data set;

dividing the video feature data set into a validation set, a training set and a test set according to a preset ratio;

optimizing the hyperparameters of a preset extreme gradient boosting (XGBoost) model with the validation set;

and training the validation-optimized XGBoost model with the training set and the test set, and evaluating the model by cross-validation to obtain the scoring prediction model.

According to the construction method provided by the invention, the hyperparameters of the XGBoost model are optimized on the validation set using grid search, a method from the machine learning field, combined with k-fold cross-validation.

The invention also provides a construction device for the film and television work scoring prediction model, which comprises:

an acquisition module, configured to acquire attribute data of videos on a movie platform;

a first processing module, configured to remove from the attribute data the data whose correlation with the video score is smaller than a preset lower correlation threshold, to obtain retained data items;

a second processing module, configured to merge, according to a merging rule, the retained data items obtained by the first processing module whose mutual correlation is greater than a preset upper correlation threshold, until the correlations among the retained data items are all smaller than the upper correlation threshold;

a construction module, configured to splice the merged data with the original retained data whose correlation is smaller than the upper correlation threshold, to construct a feature vector of the video;

and a training module, configured to encode the feature vector constructed by the construction module, splice it with the original retained data whose correlation is smaller than the upper correlation threshold, and input the result into a preset model for training to obtain the scoring prediction model.

The invention also provides a film and television work scoring prediction method applying the film and television work scoring prediction model, which comprises the following steps:

acquiring the data of the video to be predicted that are used to construct the feature vector;

inputting the data for constructing the feature vector into a score prediction model for score prediction, and outputting a score corresponding to the video to be predicted.

According to the scoring prediction method provided by the invention, before the data for constructing the feature vector are input into the scoring prediction model, it is further determined whether these data contain the creator attribute data: if so, the data are input into the scoring prediction model for score prediction; if not, the missing creator attribute data are initialized according to the following formula:

where Data_init is the initialization value of the blank data, N is the number of video types contained in the work to be predicted, P_work is the position held by the creator whose data are missing, and data(i)_{P_work} is the historical data, for the missing data item, of the creators who held position P_work in historical videos containing the i-th category in the attribute data.
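The formula itself is not reproduced in this text. Read from the symbol definitions above, one plausible form, stated here only as an assumption and not as the patented formula, averages the per-category historical values over the N categories of the work:

```latex
\mathrm{Data}_{\mathrm{init}} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{data}(i)_{P_{\mathrm{work}}}
```

Under this reading, data(i)_{P_work} would be a single summary value per category, for example the mean of the relevant historical data in category i, which is also an assumption.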

The invention also provides an electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the construction method of the film and television work scoring prediction model or of the film and television work scoring prediction method described above.

In the construction method and the scoring prediction method provided by the invention, correlation analysis of the collected data ensures that the data items used for prediction are highly correlated with the target to be predicted, and removes redundant features and features with low prediction gain as far as possible. A video feature vector highly correlated with the video score is then constructed, which yields an accurate and efficient scoring prediction model for film and television works. This overcomes the difficulty of building highly interpretable, highly accurate score prediction for a movie platform when the data are sparse and the data items numerous.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a method for constructing a film and television work scoring prediction model provided by the invention;

FIG. 2 is a schematic structural diagram of a device for constructing a film and television work scoring prediction model provided by the invention;

FIG. 3 is a schematic flow chart of a movie work scoring prediction method provided by the present invention;

FIG. 4 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The construction method of the film and television work scoring prediction model of the invention is described below with reference to FIG. 1. The method builds a film feature vector from video data on a film and television community platform and from the historical data of film creators, and uses a machine learning algorithm to learn scoring patterns from these data, thereby predicting the scores of film and television works. It specifically comprises the following steps:

101. collecting attribute data of a video on a movie platform;

in this step, the data used for prediction are divided, according to their meaning, into film feature attribute data and attribute data of the creators who participated in making the film, and the corresponding data on the film and television platform are collected for each group of attribute data items;

it is to be understood that the film feature attribute data can include: the video name, the release year, the list of the video's creators, the video's theme type, and the video's rating on the target movie platform;

the creator attribute data can include: the lead actor list, the director list, the screenwriter list, the artists' IDs on the film and television community website, the scores of the artists' recent works, and the average scores of the corresponding artists' historical videos.

In one embodiment, according to the data categories available on the selected target video platform, the video name, release year, list of creators, theme type, and rating of the video on the target platform are classified as film feature attribute data. This initial set of film feature attribute data is denoted as the first-dimension feature data set SET_c1. The related film information is collected with a crawler according to the requirements of the film feature attribute data; the specific data items are shown in Table 1:

Table 1 Meaning of the data items of the film feature attribute data

According to the positions of the video's creators, the lists of the first few lead actors, the directors and the screenwriters are classified as attribute items of the film's creators. Basic information about the creators is then collected with a crawler according to the requirements of the creator attribute data, including but not limited to the creators' names, the positions they held in the production, the artists' IDs on the community website, the scores of the artists' recent works, and the average scores of their historical videos. The collected data form the second-dimension feature data set SET_c2; the specific data items are shown in Table 2:

Table 2 Meaning of the data items of the film creator attribute data

Data item	Meaning of data item
p_name	Names of participating creators
p_nnx	Scores of the artist's recent works
p_tnx	Average score of the artist's historical videos
p_id	Artist ID on the film and television community website

Here p denotes the position of a participating creator; in the actual data of this example p is dir (director), scr (screenwriter) or act (actor), and x denotes the x-th artist in the same position. For example, the recent-work score of the second-billed lead actor is the data item act_nn2. When data are collected from other movie community websites, the positions and the film feature attribute data may include, but are not limited to, those listed above.
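As an illustration of this naming convention, the following minimal Python sketch generates the numeric creator-score column names p_nnx and p_tnx for a chosen number of creators per position. The function name and slot counts are hypothetical, and the identifier columns p_name and p_id are left out because they are not numeric features.

```python
from typing import Dict, List

POSITIONS = ("dir", "scr", "act")  # director, screenwriter, actor (order as in the text)
SCORE_ITEMS = ("nn", "tn")          # recent-work score, historical average score


def creator_score_columns(slots: Dict[str, int]) -> List[str]:
    """Build column names p_<item><x> for slots[p] creators at each position p.

    Hypothetical helper: positions missing from `slots` get no columns.
    """
    columns = []
    for position in POSITIONS:
        for x in range(1, slots.get(position, 0) + 1):
            for item in SCORE_ITEMS:
                columns.append(f"{position}_{item}{x}")
    return columns


print(creator_score_columns({"dir": 1, "act": 2}))
# ['dir_nn1', 'dir_tn1', 'act_nn1', 'act_tn1', 'act_nn2', 'act_tn2']
```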

102. Removing the data in the attribute data whose correlation with the video score is smaller than the preset lower correlation threshold, to obtain the retained data items.

For the collected data used to predict a video's score, the correlation between each data item and the video score directly affects the efficiency and accuracy of the prediction, so redundant data and data with low prediction gain need to be removed during model construction to improve its efficiency.

In one embodiment, the video feature data item set SET_cA is first constructed by splicing the first-dimension feature data set SET_c1 with the second-dimension feature data set SET_c2, and the Pearson coefficient between each data item in SET_cA and the video score on the target video platform is calculated according to the following formula 1:

where P_XY is the Pearson coefficient, n is the sample size, X and Y are the sample values of the two data items being compared (for example, a film feature or creator attribute data item and the video score), and μ denotes the mathematical expectation (mean) of the corresponding variable.
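Formula 1 is not reproduced in this text. The standard sample Pearson correlation coefficient, written with the symbols defined above, is given below; that formula 1 has exactly this form is an assumption:

```latex
P_{XY} = \frac{\sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y)}
              {\sqrt{\sum_{i=1}^{n} (X_i - \mu_X)^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \mu_Y)^2}},
\qquad \mu_X = \frac{1}{n}\sum_{i=1}^{n} X_i,\quad \mu_Y = \frac{1}{n}\sum_{i=1}^{n} Y_i
```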

Based on the calculated Pearson coefficients P_XY, the data whose Pearson coefficient with the video score is smaller than the preset lower correlation threshold are deleted from the video feature data item set SET_cA. For example, with a lower correlation threshold of 0.3, every data item in SET_cA whose P_XY is less than 0.3 is deleted, yielding the retained data items; these retained data constitute the high-correlation initial video feature data item set SET_cA.
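The filtering in step 102 can be sketched in a few lines of plain Python. This is a minimal illustration, not the patented implementation: the function names and sample values are hypothetical, and comparing the absolute value of the coefficient with the threshold is an assumption, since the text only says that items with a coefficient below 0.3 are deleted.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sd_x * sd_y)


def filter_by_score_correlation(features: Dict[str, List[float]],
                                scores: List[float],
                                lower: float = 0.3) -> Dict[str, List[float]]:
    """Keep only the feature columns whose |Pearson| with the score is >= lower."""
    return {name: column for name, column in features.items()
            if abs(pearson(column, scores)) >= lower}


# Illustrative values only.
scores = [6.1, 7.4, 8.0, 5.2, 6.8]
features = {
    "act_tn1": [6.0, 7.2, 7.9, 5.5, 6.6],        # tracks the score closely
    "year": [2015, 2017, 2016, 2018, 2019],      # weakly related to the score
}
print(sorted(filter_by_score_correlation(features, scores)))
```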

103. Merging, according to the merging rule, the retained data items whose mutual correlation is greater than the preset upper correlation threshold, until the correlations among the retained data items are all smaller than the upper correlation threshold.

In this step, feature engineering is applied to the prediction data items retained from the collected data: data whose mutual correlation is greater than the preset upper correlation threshold, i.e., data that are largely similar, are merged to further reduce redundancy. The video feature vector is then constructed by splicing the data produced by this feature engineering with the retained prediction data items.

Specifically, based on the correlation analysis in the previous step, data items in the high-correlation initial video feature data item set SET_cA whose mutual correlation coefficient is greater than the preset upper correlation threshold (for example, 0.6) are merged into new feature data, forming a new feature data set.

More specifically, during merging, the pair of data items with the largest Pearson coefficient is preferably selected as the feature pair to merge each time, and the resulting set of new features is denoted SET_newFeature.

More specifically, for each feature pair to be merged, the merging rule is given by the following equation 2:

where T is the merged data feature, A and B are the data features to be merged, F_A and F_B are the specific sample data to be merged, P_{A,T} is the Pearson coefficient between F_A and the video score on the target video platform, and P_{B,T} is the Pearson coefficient between F_B and the video score.

The merged data are then deleted from SET_cA, and the next pair of highly correlated data is searched for and merged; with the upper correlation threshold set to 0.6, this repeats until every pair with a correlation coefficient greater than 0.6 has been merged.
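A minimal Python sketch of the greedy merging loop in step 103 follows. Because equation 2 is not reproduced in this text, the merge rule used here, an average of F_A and F_B weighted by their absolute Pearson coefficients with the video score, is an assumption chosen to match the symbols listed above. Using absolute values and not feeding merged columns back into the pool are also assumptions, and all names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Columns = Dict[str, List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation coefficient (0.0 if either column is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def merge_correlated(features: Columns, scores: List[float],
                     upper: float = 0.6) -> Tuple[Columns, Columns]:
    """Greedily merge the most correlated pair until no pair exceeds `upper`.

    Returns (remaining, merged): the columns left in SET_cA and the new
    columns of SET_newFeature. Merged columns are not merged again.
    """
    remaining, merged = dict(features), {}
    while True:
        pairs = [(abs(pearson(remaining[a], remaining[b])), a, b)
                 for a, b in combinations(sorted(remaining), 2)]
        above = [p for p in pairs if p[0] > upper]
        if not above:
            return remaining, merged
        _, a, b = max(above)  # pair with the largest |Pearson|
        fa, fb = remaining.pop(a), remaining.pop(b)
        # ASSUMED merge rule: average weighted by each column's |Pearson| with the score.
        wa, wb = abs(pearson(fa, scores)), abs(pearson(fb, scores))
        if wa + wb == 0:
            wa = wb = 1.0  # fall back to a plain mean
        merged[f"{a}+{b}"] = [(wa * u + wb * v) / (wa + wb) for u, v in zip(fa, fb)]
```

Selecting the pair with the largest coefficient first mirrors the text's preference for merging the most similar data before anything else.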

104. Splicing the merged data with the original retained data whose correlation is smaller than the upper correlation threshold, to construct the feature vector of the video.

Specifically, the video feature vector SET_cF is constructed by splicing the data remaining in the video feature data item set SET_cA with the merged data.

105. Encoding the feature vector, splicing it with the original retained data whose correlation is smaller than the upper correlation threshold, and inputting the result into a preset model for training to obtain the score prediction model.

Specifically, before training, the data in the constructed video feature vector SET_cF are one-hot encoded to construct the video feature data set S, which is divided into a validation set S_validation, a training set S_train and a test set S_test according to a predetermined ratio, for example 1:2:7. The hyperparameters of a preset extreme gradient boosting model, namely the XGBoost model, are then optimized on the validation set S_validation using grid search, a method from the machine learning field, combined with k-fold cross-validation.
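The encoding, splicing and 1:2:7 split described above might look like the following sketch. The helper names and sample values are hypothetical, and a fixed random seed is used only to make the split reproducible.

```python
import random
from typing import List, Sequence, Tuple


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """One-hot encode a categorical column; returns (categories, encoded rows)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


def split_dataset(rows: Sequence, ratios=(1, 2, 7), seed: int = 0):
    """Shuffle rows, then split into (validation, train, test) by integer ratios."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    total = sum(ratios)
    n_val = len(rows) * ratios[0] // total
    n_train = len(rows) * ratios[1] // total
    return rows[:n_val], rows[n_val:n_val + n_train], rows[n_val + n_train:]


# Hypothetical example: a categorical column spliced with numeric SET_cF columns.
genres = ["drama", "action", "drama", "crime"]
numeric = [[7.1, 6.8], [5.9, 6.2], [8.0, 7.7], [6.5, 6.9]]
_, encoded = one_hot(genres)
S = [num + enc for num, enc in zip(numeric, encoded)]

val, train, test = split_dataset(range(100))
print(len(val), len(train), len(test))  # 10 20 70
```

A film belonging to several theme types would need a multi-hot variant that sets several positions to 1; the one_hot helper above handles a single category per sample.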

For example, in one embodiment, k = 10 is chosen for k-fold cross-validation, and the following parameters are tuned in turn: the learning rate learning_rate, the maximum tree depth max_depth, the maximum delta step allowed in each tree's weight estimation max_delta_step, the number of weak learners (boosting rounds) n_estimators, the fraction of training samples used to fit each tree subsample, the fraction of features sampled when building each tree colsample_bytree, the L1 regularization term on weights reg_alpha, the L2 regularization term on weights reg_lambda, and the positive/negative class balancing weight scale_pos_weight.
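The grid search with k-fold cross-validation can be sketched with the standard library alone. To keep the example self-contained, a one-dimensional k-nearest-neighbour regressor stands in for XGBoost; the function names and the grid are hypothetical, and the search shown is exhaustive over the whole grid. Tuning the parameters one at a time, as "in turn" suggests, would amount to calling grid_search_cv with a one-parameter grid while holding the others fixed. The folds are contiguous, so the data are assumed to be shuffled beforehand.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) for k contiguous folds covering range(n)."""
    if not 0 < k <= n:
        raise ValueError("need 1 <= k <= n")
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


FitPredict = Callable[[Dict, List, List, List], List[float]]


def grid_search_cv(fit_predict: FitPredict, grid: Dict[str, list],
                   X: List, y: List[float], k: int = 10) -> Tuple[Dict, float]:
    """Exhaustive grid search; returns the params with the lowest mean k-fold MSE."""
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        errors = []
        for train, val in k_fold_indices(len(X), k):
            preds = fit_predict(params, [X[i] for i in train],
                                [y[i] for i in train], [X[i] for i in val])
            errors.append(mse([y[i] for i in val], preds))
        score = sum(errors) / len(errors)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def knn_fit_predict(params, X_train, y_train, X_val):
    """Stand-in model (1-D k-nearest-neighbour regression), NOT XGBoost."""
    k = params["n_neighbors"]
    preds = []
    for x in X_val:
        nearest = sorted(range(len(X_train)), key=lambda i: abs(X_train[i] - x))[:k]
        preds.append(sum(y_train[i] for i in nearest) / len(nearest))
    return preds
```

With the xgboost and scikit-learn packages installed, the equivalent search is usually written as GridSearchCV(XGBRegressor(), param_grid, cv=10, scoring="neg_mean_squared_error"), where param_grid maps the parameter names listed above to candidate values.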

The training set S_train and the test set S_test are then fed into the hyperparameter-optimized XGBoost model for training, and the model is evaluated by cross-validation, yielding a video work scoring prediction model based on the video feature vector.

For example, in one embodiment, the goodness of fit R² and the mean squared error MSE are used as evaluation metrics. To show the effect more clearly, the Random Forest algorithm, the adaptive boosting algorithm AdaBoost and the k-nearest-neighbors algorithm KNN are compared with the extreme gradient boosting algorithm XGBoost used by this method; the resulting prediction performance is shown in Table 3.

Table 3 Comparison of algorithm model performance

Algorithm model	Mean squared error MSE	Goodness of fit R²
XGBoost	0.6238	0.7165
Random Forest	0.6922	0.6854
AdaBoost	0.8117	0.6314
KNN	0.8244	0.6253

As Table 3 shows, the XGBoost-based score prediction model has the lowest MSE and the highest R² of the four algorithms, i.e., better score prediction performance than the other machine learning algorithms.
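The two evaluation metrics in Table 3 have standard definitions, MSE = (1/n)Σ(y_i - ŷ_i)² and R² = 1 - SS_res/SS_tot; a minimal Python version is shown below. The sample scores are made up for illustration and are not the data behind Table 3.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Goodness of fit R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up scores for illustration only.
true_scores = [7.0, 8.0, 6.0, 9.0]
predicted = [7.5, 7.5, 6.5, 8.5]
print(mse(true_scores, predicted), r2(true_scores, predicted))  # 0.25 0.8
```

A lower MSE and a higher R² both indicate a better fit, which is why the two columns of Table 3 rank the algorithms in the same order.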

The construction device for the film and television work scoring prediction model provided by the invention is described below with reference to FIG. 2; the device described below and the construction method described above may be cross-referenced.

The construction device comprises an acquisition module 210, a first processing module 220, a second processing module 230, a construction module 240 and a training module 250, wherein:

the acquisition module 210 is configured to acquire attribute data of a video on a movie platform;

the first processing module 220 is configured to remove data, of the attribute data, whose correlation with the video score is smaller than a preset lower limit of a correlation threshold, to obtain a retained data item;

the second processing module 230 is configured to merge, according to a merging rule, the retained data items obtained by the first processing module whose mutual correlation is greater than the preset upper correlation threshold, until the correlations among the retained data items are all smaller than the upper correlation threshold;

the construction module 240 is configured to splice the merged data with the original data with a correlation smaller than the upper limit of the correlation threshold in the retained data item to construct a feature vector of the video;

the training module 250 is configured to encode the feature vector constructed by the construction module, splice it with the original retained data whose correlation is smaller than the upper correlation threshold, and input the result into a preset model for training to obtain the score prediction model.

Specifically, the acquisition module collects the attribute data of videos on the target video platform; the first processing module then deletes the collected attribute data that are weakly correlated with the video score, i.e., data with little influence on the score prediction of the target video, reducing redundancy; the second processing module merges the remaining attribute data that are highly correlated with one another, i.e., data that closely resemble each other, further reducing redundancy; the construction module splices the merged data with the previously retained data that are highly correlated with the video score to construct the video's feature vector; and finally the training module encodes the feature vector, splices it with the remaining attribute data processed by the second processing module, and trains the model to obtain the scoring prediction model.

By constructing feature vectors that are highly correlated with video scores, the device builds an accurate and efficient scoring prediction model for film and television works. It addresses the low interpretability of prediction results that existing research suffers from because the predictor items in the prediction index system correlate weakly with the target item, and achieves accurate and efficient score prediction.

Furthermore, the present invention also provides a scoring prediction method for a movie work, which applies the scoring prediction model for a movie work as described above, and is described below with reference to fig. 3, where the scoring prediction method for a movie work includes:

301. acquiring data used for constructing a feature vector contained in a video to be predicted;

302. inputting the data for constructing the feature vector into a score prediction model for score prediction, and outputting a score corresponding to the video to be predicted.

In a specific implementation, the basic data required to construct the video feature vector are derived from the information of the work to be predicted and input into the trained scoring prediction model. First, the information of the work is entered according to the data items in the video feature data item set SET_cA. In one embodiment, seven movies on the target movie platform were selected for prediction; after the information of each video to be predicted was obtained, the historical data of the corresponding creators in the database were retrieved according to the constructed SET_cA data item set.

Further, if the information of the work to be predicted includes complete creator attribute items, the data of the work used to construct the feature vector can be input directly into the trained scoring prediction model, and the output is the final predicted score.

If the basic information provided for the work to be predicted has missing data items, they are filled according to the rule expressed by the following formula 3:

where data(i)_{P_work} is the historical data, for the missing data item, of the creators who held position P_work in historical videos containing the i-th category in the attribute data, and Data_init, N and P_work are as defined for the prediction method above.

For example, if the basic information of the work to be predicted lacks screenwriter information and the movie belongs to three video theme types (drama, action and crime), then under the data filling rule equation 3 becomes the following equation 4:

where N = 3 because the movie belongs to three video theme types (drama, action and crime), and data(i)_{P_screenwriter} is the historical screenwriter data stored in the database for videos of the ith category, with i ranging over drama, action and crime in this example. The computed value Data_init then serves as the initialization value for the cold start of the data to be predicted. Computing initialization values for blank data in this way provides a preprocessing scheme for cold-start prediction, which addresses the difficulty, caused by severe data scarcity, of building a highly interpretable and highly accurate score predictor for a film and television platform.
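The cold-start fill can be sketched in Python as follows. Formula 3 itself is not reproduced in this text, so the sketch assumes it is the arithmetic mean of data(i)_{P_work} over the N theme types of the work, which is consistent with N = 3 in the drama/action/crime example. The function names, the dictionary layout of the creator history, and the score-like values in the usage note are hypothetical.

```python
def init_missing_item(history_by_type, theme_types):
    """Data_init for one missing creator item (assumed form of Formula 3).

    history_by_type maps each video theme type i to data(i)_{P_work}, the
    historical value stored for the missing creator position in videos of
    that type. Assumption: Formula 3 averages these values over the N theme
    types of the work to be predicted.
    """
    if not theme_types:
        raise ValueError("the work must belong to at least one theme type")
    n = len(theme_types)  # N in Formula 3
    return sum(history_by_type[t] for t in theme_types) / n


def fill_creator_items(record, creator_items, theme_types, history):
    """Return a copy of record in which every missing creator item is initialized.

    history maps a creator item (e.g. "screenwriter") to its per-theme-type
    history. If no creator item is missing, the record is returned unchanged
    and can be passed straight to the trained scoring prediction model.
    """
    filled = dict(record)
    for item in creator_items:
        if filled.get(item) is None:
            filled[item] = init_missing_item(history[item], theme_types)
    return filled
```

For the example above, calling `fill_creator_items({"director": 8.1, "screenwriter": None}, ["director", "screenwriter"], ["drama", "action", "crime"], {"screenwriter": {"drama": 7.5, "action": 6.0, "crime": 7.5}})` sets the screenwriter entry to (7.5 + 6.0 + 7.5) / 3 = 7.0, while a record with no missing creator items passes through unchanged.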

The predicted scores of the selected films are compared with their true scores in Table 4:

Table 4 Comparison of predicted and true film scores

As Table 4 shows, the film and television work scoring prediction method still gives relatively accurate predictions even when historical data are missing.

Fig. 4 illustrates the physical structure of an electronic device. As shown in Fig. 4, the device may include a processor 410, a communication interface 420, a memory 430 and a communication bus 440, through which the processor 410, the communication interface 420 and the memory 430 communicate with one another. The processor 410 may invoke logic instructions in the memory 430 to perform the method for constructing the film and television work scoring prediction model or the steps of the film and television work scoring prediction method.

The construction method of the film and television work scoring prediction model comprises the following steps:

101. collecting attribute data of videos on a movie platform, the attribute data comprising movie feature attribute data and creator attribute data;

102. removing data of which the correlation with the video score is smaller than a preset lower limit of a correlation threshold value from the attribute data to obtain a reserved data item;

103. merging the data of which the correlation among the data in the reserved data items is greater than the preset upper limit of the correlation threshold value according to a merging rule until the correlation among the data in the reserved data items is smaller than the upper limit of the correlation threshold value;

104. splicing the merged data and the original data with correlation smaller than the upper limit of the correlation threshold in the reserved data items to construct a feature vector of the video;
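Steps 102 to 104 can be sketched in plain Python as follows. The merging rule is not spelled out in this part of the text, so the element-wise mean used for merging, the comparison of correlations by absolute value, and all function names are assumptions for illustration; the Pearson coefficient is the correlation measure named in claim 3.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear correlation
    return cov / (sx * sy)


def select_and_merge(columns, score, lower, upper):
    """Steps 102-103: correlation filtering, then pairwise merging.

    columns maps an attribute name to its values across all videos.
    Assumptions: correlations are compared by absolute value, and the
    hypothetical merge rule replaces two columns by their element-wise mean.
    """
    # Step 102: drop attributes weakly correlated with the video score.
    kept = {k: v for k, v in columns.items() if abs(pearson(v, score)) >= lower}
    # Step 103: merge the most correlated pair until no pair exceeds `upper`.
    while len(kept) > 1:
        r, a, b = max(
            (abs(pearson(kept[p], kept[q])), p, q)
            for p, q in combinations(sorted(kept), 2)
        )
        if r <= upper:
            break
        col_a, col_b = kept.pop(a), kept.pop(b)
        kept[f"{a}+{b}"] = [(u + v) / 2 for u, v in zip(col_a, col_b)]
    return kept


def build_feature_vectors(kept):
    """Step 104: splice retained and merged columns into one vector per video."""
    if not kept:
        return []
    names = sorted(kept)
    n_videos = len(next(iter(kept.values())))
    return [[kept[name][i] for name in names] for i in range(n_videos)]
```

With score = [1, 2, 3, 4, 5] and columns a = [2, 4, 6, 8, 10], b = [1, 2, 3, 4, 6], c = [5, 1, 4, 2, 3], d = [1, 3, 2, 5, 4], the thresholds lower = 0.5 and upper = 0.9 drop c (|r| = 0.3) and merge a and b (r ≈ 0.99) into a single column a+b, leaving a+b and d as the two components of each feature vector.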

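The film and television work scoring prediction method comprises the following steps:

301. acquiring the data, contained in the video to be predicted, that are used to construct the feature vector;

302. inputting the data used to construct the feature vector into the scoring prediction model for score prediction, and outputting the score of the video to be predicted.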

In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention further provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer being capable of executing the method for constructing a film and television work scoring prediction model or the method for scoring a film and television work provided by the above methods.


In still another aspect, the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for constructing a film and television work scoring prediction model or the film and television work scoring prediction method described above.


The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for constructing a film and television work scoring prediction model is characterized by comprising the following steps:

collecting attribute data of a video on a movie platform;

2. The method for constructing a film and television work scoring prediction model according to claim 1, wherein the attribute data of the video on the film and television platform is collected through a web crawler;

3. The method for constructing a scoring prediction model for film and television works according to claim 1, wherein the specific method for removing the data in the attribute data, which has a correlation with the video score smaller than a preset lower limit of a correlation threshold, to obtain the reserved data items comprises:

calculating Pearson correlation coefficients between the data items in the attribute data;

4. The method for constructing a scoring prediction model for film and television works according to claim 3, wherein the specific method for merging the data in the reserved data items whose correlations between data are greater than the preset upper threshold of correlation values according to a merging rule until the correlations between data in the reserved data items are less than the upper threshold of correlation values is as follows:

5. The method for constructing a scoring prediction model for film and television works as claimed in claim 1, wherein the specific method for encoding the feature vectors, splicing the encoded feature vectors with the original data with the correlation smaller than the upper limit of the correlation threshold in the reserved data items, and inputting the data into a preset model to train to obtain the scoring prediction model comprises the following steps:

6. The method for constructing a movie work scoring prediction model according to claim 5, wherein the extreme gradient boosting model is optimized by using the validation set based on a grid search method combining machine learning and k-fold cross validation.
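Claim 6 tunes the extreme gradient boosting (XGBoost) model by grid search combined with k-fold cross-validation. In practice this is commonly done by wrapping `xgboost.XGBRegressor` in scikit-learn's `GridSearchCV`. To stay dependency-free, the sketch below implements the grid-search and k-fold loop directly and uses a hypothetical k-nearest-neighbour regressor as a stand-in for the boosting model. Folds are contiguous and unshuffled purely for determinism, and all names are illustrative.

```python
from itertools import product


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(fit, predict, X, y, param_grid, k=5):
    """Return the parameter combination with the lowest k-fold mean squared error."""
    names = sorted(param_grid)
    folds = k_fold_indices(len(X), k)
    best_params, best_mse = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        sq_errors = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            # Train on k-1 folds, evaluate on the held-out fold.
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            sq_errors.extend((predict(model, X[i]) - y[i]) ** 2 for i in fold)
        mse = sum(sq_errors) / len(sq_errors)
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse


# Hypothetical stand-in for the XGBoost regressor: k-nearest-neighbour averaging.
def knn_fit(X, y, n_neighbors):
    return X, y, n_neighbors


def knn_predict(model, x):
    X, y, n_neighbors = model
    nearest = sorted(
        range(len(X)), key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x))
    )[:n_neighbors]
    return sum(y[i] for i in nearest) / len(nearest)
```

Replacing `knn_fit` and `knn_predict` with functions that train and query an XGBoost regressor, and `param_grid` with grids over hyperparameters such as `max_depth` and `learning_rate`, would give a search of the kind described in claim 6.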

7. A building device of a film and television work scoring prediction model is characterized by comprising:

8. A scoring prediction method for a movie work using the scoring prediction model for a movie work according to any one of claims 1 to 6, comprising:

9. The scoring prediction method for film and television works as claimed in claim 8, wherein before inputting the data for constructing feature vectors into a scoring prediction model for scoring prediction, it is further determined whether the data for constructing feature vectors contains creator attribute data: if yes, inputting the data for constructing the feature vector into a scoring prediction model for scoring prediction; if not, initializing the data lacking the attribute data of the creator according to the following formula:

10. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for constructing the scoring prediction model for film and television works according to any one of claims 1 to 6 or the scoring prediction method for film and television works according to any one of claims 8 to 9 when executing the program.