WO2020156171A1 - Video publishing method, apparatus and device, and storage medium - Google Patents

Video publishing method, apparatus and device, and storage medium

Info

Publication number
WO2020156171A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
cover
candidate
training
attraction
Prior art date
Application number
PCT/CN2020/072191
Other languages
French (fr)
Chinese (zh)
Inventor
张树业
王俊东
张壮辉
梁德澎
岑洪杰
梁柱锦
Original Assignee
广州市百果园信息技术有限公司
Priority date: 2019-01-29
Filing date
Publication date
Application filed by 广州市百果园信息技术有限公司
Publication of WO2020156171A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4662 Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
    • H04N21/4666 Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8549 Creating video summaries, e.g. movie trailer

Definitions

  • The embodiments of the present application relate to the field of Internet technology, for example, to a video publishing method, apparatus, device, and storage medium.
  • The embodiments of the present application provide a video publishing method, apparatus, device, and storage medium, which realize intelligent determination of the video cover of a video to be published, improve the correlation between the video cover and user preferences, and increase the click-through rate and number of views of the video after it is published.
  • the embodiment of the application provides a video publishing method, which includes:
  • An embodiment of the present application provides a video publishing device, which includes:
  • a candidate cover acquisition module, configured to acquire two or more video frames as candidate covers in the video to be published;
  • a feature information determining module, configured to input each video frame serving as a candidate cover into different types of pre-constructed neural network models to obtain feature information of the candidate cover under the different attraction factors corresponding to those models, where the different attraction factors represent different factors that attract users to click on or watch the video to be published;
  • a feature vector determining module, configured to determine the feature vector of the candidate cover according to the feature information of the candidate cover under the different attraction factors; and
  • a video cover determination module, configured to determine the video cover of the to-be-published video according to the feature vectors respectively corresponding to the candidate covers of the two or more video frames.
  • An embodiment of the present application provides a device, which includes:
  • one or more processors;
  • a storage device configured to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the video publishing method described in any embodiment of the present application.
  • the embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the video publishing method described in any embodiment of the present application is implemented.
  • FIG. 1A is a flowchart of a video publishing method provided in Embodiment 1 of this application;
  • FIG. 1B is a schematic diagram of the principle of a video publishing process provided in Embodiment 1 of this application;
  • FIG. 2 is a schematic diagram of the construction process of different types of neural network models and a ranking model provided in Embodiment 2 of this application;
  • FIG. 3 is a schematic diagram of the principle of a neural network model training process provided in Embodiment 3 of this application;
  • FIG. 4 is a schematic diagram of the principle of a video publishing process provided in Embodiment 4 of this application;
  • FIG. 5 is a schematic structural diagram of a video publishing device provided in Embodiment 5 of this application.
  • FIG. 6 is a schematic structural diagram of a device provided in Embodiment 6 of this application.
  • The embodiments of this application focus on the different factors in a video cover that attract users to click on or watch the video, and analyze the feature information of candidate covers under different attraction factors, so as to describe the characteristics of each candidate cover from the multiple dimensions corresponding to the different attraction factors.
  • FIG. 1A is a flowchart of a video publishing method provided in Embodiment 1 of this application.
  • This embodiment can be applied to any smart terminal configured with video applications capable of publishing multiple types of videos.
  • the solution of the embodiment of this application can be applied to the problem of how to intelligently determine the video cover of the video to be published.
  • the video publishing method provided in this embodiment can be executed by the video publishing device provided in the embodiment of this application.
  • The device can be implemented in software and/or hardware and integrated into the equipment that executes the method.
  • That equipment can be any kind of smart terminal on which a video application capable of publishing multiple types of videos is installed, such as a smartphone, a tablet computer, or a notebook computer.
  • the method may include the following steps:
  • S110: Acquire two or more video frames as candidate covers in the video to be published.
  • The to-be-published video refers to video data that a user pre-generates and publishes to the Internet through multiple types of video applications for widespread dissemination and online social interaction, such as short videos recorded by users and online live videos.
  • Because the factors that attract users to click on or watch videos differ, this embodiment intelligently sets a dedicated video cover for each video to be published in advance, so as to meet the viewing preferences of users.
  • a video frame in the to-be-published video is used as the video cover of the to-be-published video.
  • A to-be-published video contains a large number of video frames, and consecutive video frames within a period of time contain repeated pictures. Processing every video frame would therefore involve a very large workload and many repeated operations on similar video images, making the selection of the video cover slow and inefficient. In this embodiment, the video cover is instead obtained by processing candidate covers, which improves the efficiency of determining the video cover.
  • A candidate cover refers to a video frame that meets certain conditions among all the video frames of the video to be published and can represent the characteristics of a category of similar video frames, such as the video frame with the best image quality in each category of similar video frames.
  • To determine the video cover of the video to be published, it is first necessary to select, from all the video frames of the video, two or more video frames that meet certain conditions and can represent the characteristics of multiple categories of similar video frames. These video frames serve as the candidate covers of the video to be published, and the candidate covers are subsequently processed to determine the video cover, which reduces the number of video frames to process and improves processing efficiency.
  • S120 Input each video frame as a candidate cover into different types of neural network models constructed in advance, and obtain feature information of the candidate cover under different attraction factors corresponding to the different types of neural network models constructed in advance.
  • the attraction factor refers to a factor that attracts users to click or watch the video to be published.
  • The attraction factors of a video to be published are relatively complicated. For the same user, the video covers of different videos attract clicks or views for different reasons, such as the novelty or creativity of the displayed content, how exciting the displayed content is, the age, appearance, and gender of the cover character, or the image clarity. For the same video, different users have different first impressions: the factor that attracts a first user to click or watch may be the appearance of the cover characters, while the factor that attracts a second user may be the plot revealed by the displayed content.
  • a corresponding neural network model can be trained in advance for the attraction factor.
  • Each neural network model is a deep machine learning model for which training parameters can be set in advance. The model is trained starting from these initial training parameters so that it acquires a certain feature extraction capability and, for each candidate cover, can accurately extract the feature information of the candidate cover under its attraction factor.
  • neural network models corresponding to the attraction factors can be pre-trained, so that different types of neural network models can be constructed in advance.
  • The attraction factor corresponding to each neural network model is a pre-mined factor that has a positive influence on users clicking on or watching the video. It reflects, to a certain extent, how strongly the video content shown in a candidate cover influences user clicks or views, combining the click-through rate or number of views with the content shown on the video cover to accurately measure the attractiveness of each candidate cover to users.
  • Each candidate cover is input into the different types of neural network models constructed in this embodiment, and the models process the candidate cover in parallel to obtain its feature information under the attraction factor of each model. The same processing is performed on every candidate cover, yielding the feature information of each candidate cover under the different attraction factors. In other words, the different types of neural network models produce, for the same candidate cover, attractiveness scores for the different attraction factors; these scores constitute the candidate cover's feature information under the different attraction factors and indicate, for each factor, how strongly the candidate cover attracts users to click on or watch the video to be published. A minimal sketch of this step is given below.
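The patent does not provide an implementation of this step. The following Python sketch is one assumed way to score candidate covers in parallel with per-factor models; the factor names ("clarity", "face", "excitement"), the toy backbone, and the scalar-score output are illustrative assumptions rather than the patent's design.

```python
import torch
from torch import nn

class FactorScorer(nn.Module):
    """Hypothetical per-factor model: maps a candidate-cover image to a
    scalar attractiveness score under one attraction factor."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1),
        )

    def forward(self, x):
        return self.backbone(x).squeeze(-1)  # one score per image

# Assumed attraction factors; the patent leaves the concrete set to mining.
factor_models = {"clarity": FactorScorer(), "face": FactorScorer(),
                 "excitement": FactorScorer()}

def feature_info(candidates: torch.Tensor) -> dict:
    """candidates: (N, 3, H, W) batch of candidate-cover frames.
    Returns {factor name: (N,) scores}, the per-factor feature information."""
    with torch.no_grad():
        return {name: model(candidates) for name, model in factor_models.items()}
```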
  • S130 Determine the feature vector of the candidate cover according to the feature information of the candidate cover under different attractive factors.
  • After the feature information of each candidate cover under the different attraction factors is obtained, in order to judge more completely and accurately how attractive each candidate cover is for clicking on or watching the video to be published, the feature information of the candidate cover under the different attraction factors is processed to obtain a feature vector of the candidate cover represented in the multiple dimensions corresponding to the different attraction factors. The feature vector describes, from multiple dimensions, the candidate cover's attractiveness for users to click on or watch the video to be published; the feature vector of each candidate cover is then processed to judge that attractiveness from multiple dimensions.
  • S140: Determine the video cover of the video to be published according to the feature vectors respectively corresponding to the candidate covers of the two or more video frames.
  • The video cover is an image representing the to-be-published video that is displayed in each type of video application after the video is published online.
  • The feature vectors of the two or more candidate covers can be analyzed separately; that is, the feature information of the multiple dimensions corresponding to the different attraction factors of the same candidate cover is fused and analyzed, so that the attractiveness of the candidate cover for users to click on or watch the video to be published is judged comprehensively from the perspective of multiple attraction factors. From the multiple candidate covers, the candidate cover most attractive for users to click on or watch is then selected as the final video cover of the video to be published.
  • Determining the video cover of the video to be published according to the feature vectors respectively corresponding to the candidate covers may include: inputting the feature vectors of the candidate covers corresponding to the two or more video frames into a pre-built ranking model to obtain a ranking score for each candidate cover, and determining the video cover of the video to be published according to the ranking scores of the candidate covers.
  • The ranking model is a machine learning model. By using a large number of training samples to train the model's parameters and neuron structure, it acquires a certain ability to judge attractiveness, so that it can accurately determine, from the multi-dimensional feature vector representing the different attraction factors, how attractive a candidate cover is for users to click on or watch the video to be published.
  • the ranking score refers to the attractiveness score for users to click or watch the video to be published.
  • The feature vector of each candidate cover is obtained from its feature information under the different attraction factors, and the feature vectors of the multiple candidate covers are input into the ranking model built in advance from a large number of training samples. The ranking model performs a fusion analysis of the feature information of the different attraction factors contained in the multiple dimensions of each feature vector and outputs the candidate cover's attractiveness ranking score for users clicking on or watching the video to be published. Once the ranking scores of the two or more candidate covers are obtained, the candidate cover with the highest ranking score is selected as the video cover of the video to be published. This video cover is the one most attractive for users to click on or watch, which increases the click-through rate and number of views of the video after it is published. A sketch of this scoring-and-selection step follows.
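Embodiment 2 states that the ranking model may be an MLP or SVM; the layer sizes and scalar output head in the sketch below are assumptions, offered only to illustrate the scoring and highest-score selection described above.

```python
import torch
from torch import nn

class RankingModel(nn.Module):
    """Assumed MLP ranking model: feature vector -> scalar ranking score."""
    def __init__(self, num_factors: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_factors, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, feature_vectors):  # (N, num_factors)
        return self.mlp(feature_vectors).squeeze(-1)  # (N,) ranking scores

def pick_cover(feature_vectors: torch.Tensor, model: RankingModel) -> int:
    """Return the index of the candidate cover with the highest ranking score."""
    with torch.no_grad():
        scores = model(feature_vectors)
    return int(torch.argmax(scores).item())
```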
  • In this way, two or more video frames are obtained as candidate covers from the video to be published; each candidate cover is input into the neural network models corresponding to the different attraction factors to obtain its feature information under those factors; feature vectors spanning the multiple attraction-factor dimensions are generated; and the video cover is determined from the feature vectors of the multiple candidate covers. This improves the correlation between the video cover and the user preferences corresponding to the different attraction factors, determines the video cover of the video to be published intelligently without manual selection, and solves the problem that selecting a video cover by a single factor cannot meet the diverse click and viewing needs of different users. The resulting video cover is more attractive to users, which improves the click-through rate and number of views of the video after it is published.
  • FIG. 2 is a schematic diagram of the construction process of the different types of neural network models and the ranking model provided in Embodiment 2 of this application. This embodiment is described on the basis of the above embodiment. As shown in FIG. 2, this embodiment mainly mines the attraction factors in a video cover that influence users to click on or watch the video, and explains the construction of the different types of neural network models and the ranking model.
  • A training attraction factor is a factor that may influence users to click on or watch a video. Training attraction factors can be preset according to the habits or preferences of many users; some have a positive influence on users clicking on or watching the video, while others have a negative influence. For example, the facial attractiveness of the cover character may be a positive factor, while scary or bloody content displayed on the cover may be a negative factor.
  • To mine the attraction factors in a cover that have a positive effect on users clicking on or watching the video, two or more preset training attraction factors are first obtained according to the habits or preferences of many users. A corresponding video cover is then obtained for each training attraction factor in turn, and the attractiveness of that video cover for users clicking on or watching the video is determined, so that the training attraction factors with a positive effect are filtered out of the two or more training attraction factors and used to construct the different types of neural network models required in this embodiment.
  • S220: Acquire the click index of the historical video corresponding to each training attraction factor after it is posted online.
  • The click index is an indicator that clearly reflects how attractive the video cover of a historical video is for users to click on or watch that video, namely the click-through rate or number of views of the historical video after it is published online with the corresponding video cover.
  • The click index corresponding to each training attraction factor must first be obtained, and multiple rounds of testing are used for this. A test sequence of training attraction factors is determined, the video cover of the historical video is re-determined for each training attraction factor in turn, and the historical video is published online with the cover determined at that time, so that one round of click index testing is performed for that training attraction factor.
  • Because the attractiveness of each training attraction factor for users clicking on or watching the video must be judged, it is first necessary to obtain an initial video cover of the historical video determined without any training attraction factor. The click index of the historical video published online with this initial cover serves as the reference baseline for judging the attractiveness of the training attraction factors.
  • In the first round of testing, a video frame is randomly selected from the historical video as its video cover, the historical video is published online with this cover, and the click index of the first round of testing is obtained.
  • Next, the training attraction factor for the current round of testing is selected according to the predetermined test sequence, and the neural network model pre-trained for that factor is obtained. Through the ranking model, the ranking score of each candidate cover is computed and the corresponding video cover for the second round of testing is determined. The historical video is published online with this cover, and the click index of the second round of testing is obtained.
  • The neural network models retained so far jointly constitute the different types of neural network models for the current test. Two or more candidate covers, selected in advance from all the video frames of the historical video, are input into these models to obtain the feature information of each candidate cover under the training attraction factors of the current and previous rounds of testing, from which the feature vector of each candidate cover is obtained. The feature vectors are input into the ranking model to obtain the ranking score of each candidate cover, and the corresponding video cover for the third round of testing is determined. The historical video is published online with this cover, and the click index of the third round of testing is obtained.
  • The later rounds of click index testing proceed cyclically in the same way, obtaining the click index corresponding to the training attraction factor in each round, until the click indexes of all training attraction factors have been obtained through the multiple rounds of testing. The training attraction factors that have a positive influence on users clicking on or watching the video are thereby determined, realizing effective mining of the attraction factors in the video cover.
  • The ranking model can also be trained with a large number of candidate covers: the feature information of each candidate cover under the different training attraction factors is obtained from the different types of neural network models, the feature vector of each candidate cover is determined, and the feature vectors are input into the ranking model for training, thereby improving the accuracy of the ranking scores output by the ranking model.
  • The ranking model in this embodiment can be a machine learning model such as a Multilayer Perceptron (MLP) or a Support Vector Machine (SVM), and any loss function can be used during training.
  • In one embodiment, a loss function constrained by the relative relationship between sample pairs is used. The loss depends on a preset, adjustable hyperparameter δ, a positive number fixed before model training; selecting a good hyperparameter improves machine learning performance. The penalty of the loss function is divided into two regimes according to the magnitude of the pairwise score difference u, namely u ≤ δ and u > δ.
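The formula itself did not survive extraction into this text. A standard pairwise hinge loss has exactly this two-regime structure, and the Python sketch below is offered as an assumed form rather than the patent's actual definition: pairs whose score difference exceeds the margin δ incur no penalty.

```python
def pairwise_hinge_loss(s_pos: float, s_neg: float, delta: float) -> float:
    """Assumed two-regime pairwise loss for ranking-model training.

    s_pos / s_neg: ranking scores of the more / less attractive cover in a
    labeled pair; delta > 0 is the preset hyperparameter. The pair is
    penalized when u <= delta and incurs no penalty when u > delta."""
    u = s_pos - s_neg
    return delta - u if u <= delta else 0.0
```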
  • S230: Determine whether the click index corresponding to the current training attraction factor is greater than the click index corresponding to the previous training attraction factor; if it is greater, execute S240, and if it is not greater, execute S250.
  • In each round of testing, the click index corresponding to the current training attraction factor is compared with the click index obtained in the previous round of testing, when the neural network model corresponding to the current factor had not yet been added. That is, the influence on the click index of the same historical video is compared between the video cover determined with the current factor's neural network model and the previously determined video cover, so as to judge how attractive the current training attraction factor is for users clicking on or watching the video.
  • If the click index corresponding to the current training attraction factor is greater than the click index corresponding to the previous training attraction factor, then publishing the historical video online with the video cover obtained after adding the current factor's neural network model increased the number of users clicking on or watching it; that is, the current training attraction factor has a positive influence on users clicking on or watching the historical video. In this case, the neural network model corresponding to this training attraction factor is kept as one component of the pre-constructed different types of neural network models, and the click index test of the next training attraction factor proceeds.
  • If the click index corresponding to the current training attraction factor is less than or equal to the click index corresponding to the previous training attraction factor, then after the historical video was published online with the video cover obtained after adding the current factor's neural network model, the number of users clicking on or watching it decreased or stayed the same; that is, the current training attraction factor has no positive influence on users clicking on or watching the historical video and may even have a negative one. In this case, the neural network model corresponding to this training attraction factor is discarded, it is not used as a component of the pre-constructed different types of neural network models, and the click index test of the next training attraction factor proceeds.
  • A discarded neural network model is also not used when constructing the different types of neural network models in the next round of click index testing; discarding the model corresponding to the current training attraction factor means that the current round of testing failed. The click index corresponding to a training attraction factor that failed its test is excluded, and subsequent comparisons are made against the click index of the most recent training attraction factor that passed, which improves the effectiveness of the mined attraction factors.
  • S260: Use the next training attraction factor as the current training attraction factor, and continue to execute S230 until the two or more training attraction factors have been traversed, obtaining the pre-constructed different types of neural network models.
  • After the attractiveness of the current training attraction factor for users clicking on or watching the video has been judged, the attractiveness of the next training attraction factor is judged through further rounds of testing; that is, the next training attraction factor becomes the current training attraction factor and the above judgment process continues until all the training attraction factors have been traversed, yielding the pre-constructed different types of neural network models.
  • The training attraction factors in this embodiment can also be mined repeatedly during the multiple rounds of click index testing, implementing an iterable attraction-factor mining process. When, in several consecutive rounds of click index testing, the click index corresponding to the training attraction factor is less than or equal to the click index of the previous round, no further positively influencing attraction factor can be mined; the mining and the click index testing then stop, and the different types of neural network models constructed from the neural network models of the positively influencing attraction factors required in this embodiment are obtained. A sketch of this greedy mining loop follows.
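The patent describes this procedure only at the level above. The Python sketch below is an assumed implementation; in particular, `publish_and_measure` is a hypothetical callable standing in for the whole publish-and-observe cycle of one testing round.

```python
def mine_attraction_factors(candidate_factors, pretrained_models,
                            publish_and_measure, baseline_click_index):
    """Greedy per-round mining of positively influencing attraction factors.

    candidate_factors: ordered test sequence of training attraction factors.
    pretrained_models: dict mapping factor -> pre-trained neural network model.
    publish_and_measure: hypothetical callable; given the currently kept
        models, it re-determines the historical video's cover, publishes the
        video online, and returns the measured click index for that round.
    baseline_click_index: click index of the randomly chosen first-round cover.
    """
    kept_models = {}
    best_click_index = baseline_click_index
    for factor in candidate_factors:
        trial_models = dict(kept_models)
        trial_models[factor] = pretrained_models[factor]
        click_index = publish_and_measure(trial_models)
        if click_index > best_click_index:
            kept_models = trial_models        # positive influence: keep model
            best_click_index = click_index
        # otherwise the factor's model is discarded and this round "fails"
    return kept_models
```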
  • The technical solution provided in this embodiment compares the click index of the historical video corresponding to the current training attraction factor after online publication with the click index corresponding to the previous training attraction factor, thereby effectively mining the training attraction factors.
  • FIG. 3 is a schematic diagram of the principle of a neural network model training process provided in Embodiment 3 of this application. This embodiment is described on the basis of the above embodiments and mainly explains the offline training of the neural network models corresponding to the different attraction factors.
  • When training the neural network model corresponding to each training attraction factor, it is first necessary to obtain the two or more preset training attraction factors and, for each training attraction factor, two or more historical video frames from the corresponding historical videos as historical candidate covers. The historical candidate covers in the historical videos are used as training samples, and the feature label of each historical candidate cover under the corresponding training attraction factor is determined. The feature label can be a pre-annotated score of the historical candidate cover under that training attraction factor.
  • S320 Input each historical candidate cover into a preset model corresponding to the training attraction factor to obtain historical feature information of the historical candidate cover under the training attraction factor.
  • Each historical candidate cover is directly input into the preset model set for the training attraction factor in this embodiment, and the preset model is trained to extract the feature information of historical candidate covers under that training attraction factor. The preset model analyzes each input historical candidate cover to determine its feature information under the training attraction factor; this feature information is then compared with the cover's feature label under the same factor, and the training parameters and neuron structure of the preset model are adjusted according to the comparison result, so that the preset model is trained iteratively.
  • S330: Determine the training loss according to the historical feature information and the feature labels of the historical candidate covers under the training attraction factor, and adjust the parameters of the preset model by stochastic gradient descent until the training loss is less than a predetermined loss threshold; the final preset model is used as the neural network model corresponding to the training attraction factor.
  • The feature information of a historical candidate cover under each training attraction factor obtained from the preset model is an estimated value. This estimate is compared with the cover's feature label under the training attraction factor, that is, the estimated and actual values of the feature information of the historical candidate cover are compared, so as to determine the training loss of the preset model for this round of training. The training loss clearly indicates how accurately the current preset model extracts features from historical candidate covers. Any loss function may be used to determine the training loss of this training, which is not limited here.
  • After the training loss over a batch of multiple historical candidate covers used as training samples is obtained, it is judged under the stochastic gradient descent procedure. If the training loss is greater than or equal to the established loss threshold, the preset model of this training is not yet accurate enough at extracting features from historical candidate covers, and it needs to be trained again: the parameters of the preset model are adjusted so that the training loss decreases, the next batch of multiple historical candidate covers is obtained as training samples, the corresponding training loss is determined again, and the parameters are adjusted again by stochastic gradient descent. This loop repeats until the training loss is below the established loss threshold, indicating that the preset model has reached sufficient accuracy for feature extraction in the historical candidate covers and needs no further training. The latest preset model is then used as the neural network model corresponding to the training attraction factor. The neural network model corresponding to each training attraction factor is trained in this way, yielding the different types of neural network models corresponding to the different attraction factors. A minimal training-loop sketch follows.
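The patent fixes neither the architecture nor the loss for this offline stage. As an illustrative sketch only, assuming the per-factor preset model regresses the annotated attractiveness score with a mean-squared-error loss:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_factor_model(model: nn.Module, covers: torch.Tensor,
                       labels: torch.Tensor, loss_threshold: float = 0.01,
                       lr: float = 0.01, max_epochs: int = 100) -> nn.Module:
    """Train one per-factor preset model by SGD until the loss threshold is met.

    covers: (M, 3, H, W) historical candidate covers; labels: (M,) annotated
    feature labels under this training attraction factor (assumed regression)."""
    loader = DataLoader(TensorDataset(covers, labels), batch_size=32, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(max_epochs):
        for batch_covers, batch_labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(batch_covers), batch_labels)
            loss.backward()
            optimizer.step()
            if loss.item() < loss_threshold:  # accurate enough: stop training
                return model
    return model
```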
  • Stochastic gradient descent is a very widely used algorithm in machine learning. It minimizes the loss of individual samples, so the loss obtained in a single iteration does not necessarily move toward the global optimum; the overall iterative direction, however, is toward the global optimal solution, and the final result usually lands near the global optimal solution. The latest preset model obtained in this way therefore achieves a certain accuracy in extracting features from historical candidate covers.
  • The technical solution provided in this embodiment trains the neural network model corresponding to each training attraction factor using the historical feature information and feature labels of historical candidate covers under that factor. This improves the accuracy of the feature information produced by each per-factor neural network model, captures the correlation between candidate cover features and the user preferences corresponding to the different attraction factors, and effectively improves the attractiveness of the video cover to users.
  • FIG. 4 is a schematic diagram of the principle of a video publishing process provided in Embodiment 4 of this application. This embodiment is described on the basis of the above embodiments. As shown in FIG. 4, this embodiment mainly explains how the candidate covers are obtained and how the video is published.
  • S410 Acquire multiple initial video frames that meet a preset condition in the video to be released.
  • The preset condition that video frames must meet in the initial screening is determined first. The preset condition may, for example, filter out dim or blurry frames. The initial video frames that meet the preset condition are selected from all the video frames of the video to be published, and the candidate covers are then filtered from these initial video frames. The preset-condition screening in this embodiment can use fast algorithms, so its overhead within the entire video publishing process is very small, which improves the processing rate. A sketch of such a fast screen is given below.
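The patent names no specific screening algorithm. A common fast choice, given here purely as an assumption, is to threshold the mean brightness and the variance of the Laplacian (a standard sharpness measure) with OpenCV; the thresholds and sampling stride are illustrative.

```python
import cv2
import numpy as np

def passes_preset_condition(frame: np.ndarray,
                            min_brightness: float = 40.0,
                            min_sharpness: float = 100.0) -> bool:
    """Fast screen rejecting dim or blurry frames (thresholds are assumptions)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if gray.mean() < min_brightness:  # too dim
        return False
    if cv2.Laplacian(gray, cv2.CV_64F).var() < min_sharpness:  # too blurry
        return False
    return True

def initial_video_frames(video_path: str, sample_stride: int = 10) -> list:
    """Decode a video and keep every sample_stride-th frame that passes."""
    capture = cv2.VideoCapture(video_path)
    kept, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % sample_stride == 0 and passes_preset_condition(frame):
            kept.append(frame)
        index += 1
    capture.release()
    return kept
```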
  • S420 Use a clustering algorithm to process multiple initial video frames to obtain two or more cluster sets.
  • A clustering algorithm can be used to process the initial video frames, classifying the video frames with similar characteristics among the initial video frames into the same category and obtaining two or more cluster sets, where each cluster set includes at least one initial video frame.
  • S430 Select a target video frame from each cluster set as a candidate cover of the video to be published.
  • One target video frame can be selected from each cluster set as that cluster set's candidate cover, so as to obtain the two or more candidate covers of the video to be published. Duplicate video frames are eliminated at this point, which reduces repeated operations on similar video images and improves the accuracy of determining the video cover of the video to be published. The sketch below illustrates this step.
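The patent does not fix the clustering algorithm or the per-cluster selection rule. As an assumed sketch, k-means on downscaled grayscale pixel features, with the frame nearest each centroid chosen as that cluster set's target frame:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def candidate_covers(frames: list, num_clusters: int = 5) -> list:
    """Cluster initial video frames and pick one target frame per cluster set."""
    # Assumed feature: flattened 32x32 grayscale pixels of each frame.
    features = np.stack([
        cv2.resize(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY), (32, 32)).ravel()
        for f in frames
    ]).astype(np.float64)
    k = min(num_clusters, len(frames))
    kmeans = KMeans(n_clusters=k, n_init=10)
    labels = kmeans.fit_predict(features)
    covers = []
    for cluster_id in range(k):
        members = np.where(labels == cluster_id)[0]
        # Target frame: the member closest to the cluster centroid.
        dists = np.linalg.norm(
            features[members] - kmeans.cluster_centers_[cluster_id], axis=1)
        covers.append(frames[members[np.argmin(dists)]])
    return covers
```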
  • S440 Input each video frame as a candidate cover into different types of neural network models constructed in advance, and obtain feature information of the candidate cover under different attraction factors corresponding to the different types of neural network models constructed in advance.
  • S450 Perform normalization processing on feature information of the candidate cover under different attraction factors to obtain a feature vector of the candidate cover.
  • The feature information of a candidate cover under the different attraction factors describes different features in different dimensions, so normalization can be used to preprocess the feature information of the candidate cover under the different attraction factors and obtain the candidate cover's feature vector. Let n be the index of the candidate cover, t the index of the attraction factor, and N the number of candidate covers; the feature information of each candidate cover under each attraction factor is normalized across the N candidate covers, and the normalized values across all the attraction factors form the normalized feature vector of each candidate cover.
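The normalization formula itself was lost from this text. A standard scheme consistent with the stated variables, offered only as an assumption, is a z-score per attraction factor computed across the N candidate covers:

```python
import numpy as np

def normalize_feature_vectors(feature_info: np.ndarray) -> np.ndarray:
    """feature_info: (N, T) array whose entry (n, t) is the feature information
    of candidate cover n under attraction factor t. Returns the (N, T) matrix
    of normalized feature vectors (per-factor z-score; an assumed scheme)."""
    mean = feature_info.mean(axis=0, keepdims=True)  # per-factor mean over N covers
    std = feature_info.std(axis=0, keepdims=True) + 1e-8  # avoid division by zero
    return (feature_info - mean) / std
```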
  • S460 Determine the video cover of the video to be published according to the feature vectors corresponding to the candidate covers corresponding to the two or more video frames.
  • When the video cover of the video to be published has been determined, the video to be published is published online with that video cover to increase its click-through rate and number of views.
  • The technical solution provided in this embodiment clusters the initial video frames that meet the preset condition in the video to be published and selects target video frames from the resulting cluster sets as candidate covers of the video to be published, which reduces the number of video frames processed when determining the video cover and increases the rate at which the video cover is determined. The video to be published is then published online with the determined video cover, which increases its click-through rate and number of views after publication.
  • FIG. 5 is a schematic structural diagram of a video publishing device provided in Embodiment 5 of the application.
  • The device may include: a candidate cover acquisition module 510, configured to acquire two or more video frames as candidate covers in the video to be published; a feature information determining module 520, configured to input each video frame serving as a candidate cover into the different types of pre-constructed neural network models and obtain the feature information of the candidate cover under the different attraction factors corresponding to those models, where the different attraction factors represent different factors that attract users to click on or watch the video to be published; a feature vector determining module 530, configured to determine the feature vector of the candidate cover according to its feature information under the different attraction factors; and a video cover determining module 540, configured to determine the video cover of the video to be published according to the feature vectors respectively corresponding to the candidate covers of the two or more video frames.
  • With this device, two or more video frames are obtained as candidate covers from the video to be published; each candidate cover is input into the neural network models corresponding to the different attraction factors to obtain its feature information under those factors; feature vectors spanning the multiple attraction-factor dimensions are generated; and the video cover is determined from the feature vectors of the multiple candidate covers. This improves the correlation between the video cover and the user preferences corresponding to the different attraction factors, determines the video cover of the video to be published intelligently without manual selection, and solves the problem that selecting a video cover by a single factor cannot meet the diverse click and viewing needs of different users. The resulting video cover is more attractive to users, which improves the click-through rate and number of views of the video after it is published.
  • FIG. 6 is a schematic structural diagram of a device provided in Embodiment 6 of this application. As shown in FIG. 6, the device includes a processor 60, a storage device 61, and a communication device 62.
  • the storage device 61 can be configured to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the video publishing method provided in the embodiments of the present application.
  • the processor 60 executes various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the storage device 61, that is, realizes the foregoing video publishing method.
  • The seventh embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the video publishing method in any of the foregoing embodiments can be implemented. The method may include: obtaining two or more video frames as candidate covers in a video to be published; inputting each video frame serving as a candidate cover into different types of pre-constructed neural network models and obtaining the feature information of the candidate cover under the different attraction factors corresponding to those models, where the different attraction factors represent different factors that attract users to click on or watch the video to be published; determining the feature vector of the candidate cover according to its feature information under the different attraction factors; and determining the video cover of the video to be published according to the feature vectors respectively corresponding to the candidate covers of the two or more video frames.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a video publishing method, apparatus and device, and a storage medium. The method comprises: acquiring two or more video frames, as candidate covers, in a video to be published; inputting each of the video frames as the candidate covers into different types of pre-constructed neural network models to obtain feature information of the candidate covers under different attraction factors corresponding to the different types of pre-constructed neural network models, wherein the different attraction factors represent different elements in attracting a user to click on or watch the video to be published; determining feature vectors of the candidate covers according to the feature information of the candidate covers under the different attraction factors; and determining, according to the feature vectors respectively corresponding to the candidate covers corresponding to the two or more video frames, a video cover of the video to be published.

Description

Video Publishing Method, Apparatus, Device, and Storage Medium
This application claims priority to the Chinese patent application with application number 201910087567.X, filed with the Chinese Patent Office on January 29, 2019, the entire content of which is incorporated into this application by reference.
Technical Field
The embodiments of the present application relate to the field of Internet technology, for example, to a video publishing method, apparatus, device, and storage medium.
Background
With the rapid development of Internet technology, short video tools and platforms have attracted a large number of mobile Internet users and occupy huge traffic. Users use such short video tools and platforms to publish large numbers of short videos for entertainment, sharing, and communication.
Before publishing a short video on a short video tool or platform, a user may carefully select a video frame from the short video as its cover. However, because of usage habits or ease of operation, quite a few users skip the manual cover-selection step and publish the short video directly; in this case the client usually adopts a simple strategy to determine the cover, such as directly selecting the first frame.
Data analysis shows that a self-selected cover brings a higher click-through rate and more views than the default cover; that is, a cover produced by a good strategy increases the click-through rate of a short video and the number of times users watch it. However, because of the huge volume of videos published every day, it is impossible to select covers for all short videos manually, which would consume enormous labor costs. Meanwhile, a simple cover-determination strategy selects the video cover in only a single way and cannot meet the diverse click and viewing needs of different users.
Summary
The embodiments of the present application provide a video publishing method, apparatus, device, and storage medium, which realize intelligent determination of the video cover of a video to be published, improve the correlation between the video cover and user preferences, and increase the click-through rate and number of views of the video after it is published.
The embodiments of the present application provide a video publishing method, which includes:
obtaining two or more video frames as candidate covers in a video to be published;
inputting each video frame serving as a candidate cover into different types of pre-constructed neural network models to obtain feature information of the candidate cover under the different attraction factors corresponding to the different types of pre-constructed neural network models, where the different attraction factors represent different factors that attract users to click on or watch the video to be published;
determining the feature vector of the candidate cover according to the feature information of the candidate cover under the different attraction factors; and
determining the video cover of the video to be published according to the feature vectors respectively corresponding to the candidate covers of the two or more video frames.
The embodiments of the present application provide a video publishing apparatus, which includes:
a candidate cover acquisition module, configured to acquire two or more video frames as candidate covers in the video to be published;
a feature information determining module, configured to input each video frame serving as a candidate cover into different types of pre-constructed neural network models to obtain feature information of the candidate cover under the different attraction factors corresponding to those models, where the different attraction factors represent different factors that attract users to click on or watch the video to be published;
a feature vector determining module, configured to determine the feature vector of the candidate cover according to the feature information of the candidate cover under the different attraction factors; and
a video cover determination module, configured to determine the video cover of the video to be published according to the feature vectors respectively corresponding to the candidate covers of the two or more video frames.
The embodiments of the present application provide a device, which includes:
one or more processors; and
a storage device configured to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the video publishing method described in any embodiment of the present application.
The embodiments of the present application provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the video publishing method described in any embodiment of the present application is implemented.
Brief Description of the Drawings
FIG. 1A is a flowchart of a video publishing method provided in Embodiment 1 of the present application;

FIG. 1B is a schematic diagram of the principle of a video publishing process provided in Embodiment 1 of the present application;

FIG. 2 is a schematic diagram of the principle of the construction process of neural network models of different types and a ranking model provided in Embodiment 2 of the present application;

FIG. 3 is a schematic diagram of the principle of a neural network model training process provided in Embodiment 3 of the present application;

FIG. 4 is a schematic diagram of the principle of a video publishing process provided in Embodiment 4 of the present application;

FIG. 5 is a schematic structural diagram of a video publishing apparatus provided in Embodiment 5 of the present application;

FIG. 6 is a schematic structural diagram of a device provided in Embodiment 6 of the present application.
Detailed Description
The present application is described below with reference to the drawings and embodiments. The specific embodiments described here are only used to explain the present application, not to limit it. For ease of description, the drawings show only the parts related to the present application rather than the entire structure.

The embodiments of the present application address the fact that different factors in a video cover attract users to click on or watch a video. The feature information of candidate covers under different attraction factors is analyzed, so that each candidate cover is described from multiple dimensions corresponding to the different attraction factors; the multi-dimensional features of each candidate cover in the video to be published are then analyzed comprehensively to satisfy the diversified clicking and viewing needs of different users, yielding the video cover that is most attractive for users to click on or watch the video to be published and increasing the click-through rate and the number of views of the video to be published after it is published.
Embodiment 1
FIG. 1A is a flowchart of a video publishing method provided in Embodiment 1 of the present application. This embodiment is applicable to any smart terminal configured with a video application capable of publishing multiple types of videos, and addresses the problem of how to intelligently determine the video cover of a video to be published. The video publishing method provided in this embodiment may be executed by the video publishing apparatus provided in the embodiments of the present application; the apparatus may be implemented in software and/or hardware and integrated into the device that executes the method, where the device may be any smart terminal configured with a video application capable of publishing multiple types of videos, such as a smartphone, a tablet computer, or a laptop computer.

Referring to FIG. 1A, the method may include the following steps.

S110: obtain two or more video frames from the video to be published as candidate covers.

In this embodiment, the video to be published refers to video data pre-generated by any user through any of multiple types of video applications and intended to be published to the Internet for wide dissemination, thereby enabling online social interaction, such as a short video recorded by the user or a live-streaming video. When a user selects one video to click on or watch from the large number of videos on the Internet, the user chooses the more attractive video according to the covers of those videos and ignores the less attractive ones. Therefore, aiming at the different factors that attract users to click on or watch videos, this embodiment intelligently sets a dedicated video cover for each video to be published before it is uploaded to the Internet, so as to match users' viewing preferences.

In this embodiment, one video frame within the video to be published serves as that video's cover. Since a video to be published contains a large number of video frames, and consecutive frames within a time period contain repeated pictures, selecting a specific frame as the cover by processing all video frames would involve an excessive processing load and many repeated operations on similar pictures, making the selection of the video cover time-consuming and inefficient. Therefore, in this embodiment the video cover is obtained by processing candidate covers, which improves the efficiency of determining the cover. A candidate cover refers to a video frame that, among all the video frames contained in the video to be published, satisfies certain conditions and can represent the features of similar video frames within a class, such as the frame with the best image quality in each class of similar frames.

Optionally, when determining the video cover of the video to be published, two or more video frames that satisfy certain conditions and can represent the features of multiple classes of similar video frames are first selected from all the frames of the video as candidate covers; the candidate covers are subsequently processed to determine the video cover, which reduces the number of frames to be processed and improves processing efficiency.
S120: input each video frame serving as a candidate cover into the pre-built neural network models of different types to obtain the feature information of the candidate cover under the different attraction factors corresponding to the pre-built neural network models of different types.
In this embodiment, an attraction factor represents a factor that attracts users to click on or watch the video to be published. In one embodiment, the attraction factors of a video to be published are relatively complicated. For the same user, the factors by which the covers of different videos attract that user to click or watch are not the same: they may be the novelty or creativity of the content shown on the cover, the degree of tension or excitement of that content, the age, facial attractiveness, and gender of the person on the cover, or the image clarity. For the same video, different users' first subjective impressions also differ: the factor attracting a first user to click or watch may be the facial attractiveness of the person on the cover, while the factor attracting a second user may be the plot revealed by the content shown on the cover. Therefore, in this embodiment a corresponding neural network model can be trained in advance for each attraction factor. The neural network model is a deep machine-learning model; corresponding training parameters can be set for it in advance, and the initially set training parameters are trained with video frames from a large number of historical videos, so that the model acquires a certain feature-extraction capability and can, for each candidate cover, accurately extract the cover's feature information under that attraction factor.

In this embodiment, for each different attraction factor, the neural network model of the type corresponding to that factor can be trained in advance, so that neural network models of different types are pre-built. The attraction factor corresponding to each neural network model is a pre-mined factor that has a positive influence on users clicking on or watching videos; it reflects, to a certain extent, how the video content shown in a candidate cover influences the number of clicks or views, combining the intrinsic relationship between users' click-through rate or view count and the content shown in the cover to accurately measure each candidate cover's attractiveness to users.

Optionally, when two or more candidate covers of the video to be published have been obtained, in order to judge the attractiveness of each candidate cover for users to click on or watch the video, as shown in FIG. 1B, each candidate cover may be input into the pre-built neural network models of different types of this embodiment, which process the candidate cover in parallel and output that cover's feature information under the corresponding different attraction factors; the same processing is performed on every candidate cover, yielding each cover's feature information under the different attraction factors. Processing a candidate cover with the neural network models of different types may produce, for the same cover, an attractiveness score under each attraction factor; this score represents the cover's feature information under that factor and is used to judge how strongly the candidate cover attracts users to click on or watch the video to be published under each attraction factor.
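For illustration only (not part of the original specification), the following minimal Python sketch shows the parallel per-factor scoring of step S120, assuming one trained model per attraction factor; the factor names and placeholder lambdas are hypothetical stand-ins for trained networks.

```python
# Sketch of S120: score every candidate cover under every attraction factor.
from typing import Callable, Dict, List, Sequence

Image = Sequence[float]           # a flattened video frame, for illustration
Model = Callable[[Image], float]  # maps a frame to a score for one factor

def score_candidates(candidates: List[Image],
                     factor_models: Dict[str, Model]) -> List[Dict[str, float]]:
    """Return, per candidate cover, its feature information under each factor."""
    return [{factor: model(frame) for factor, model in factor_models.items()}
            for frame in candidates]

# Hypothetical placeholder models; real ones would be trained neural networks.
factor_models: Dict[str, Model] = {
    "face_attractiveness": lambda frame: sum(frame) / len(frame),
    "image_clarity":       lambda frame: max(frame) - min(frame),
}
features = score_candidates([[0.2, 0.8, 0.5], [0.9, 0.1, 0.4]], factor_models)
print(features)  # one {factor: score} dict per candidate cover
```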
S130: determine the feature vector of each candidate cover according to the feature information of the candidate cover under the different attraction factors.

In one embodiment, once the feature information of each candidate cover under the different attraction factors is obtained, in order to judge more completely and accurately how attractive each candidate cover is for clicking on or watching the video to be published, corresponding feature processing may be performed on that feature information, yielding a feature vector of the candidate cover expressed in the multiple dimensions corresponding to the different attraction factors. This feature vector describes, from multiple dimensions, the cover's attractiveness for users to click on or watch the video; the feature vector of each candidate cover is subsequently processed so that this attractiveness is judged from multiple dimensions.

S140: determine the video cover of the video to be published according to the feature vectors respectively corresponding to the candidate covers corresponding to the two or more video frames.

In this embodiment, the video cover refers to the image representing the video to be published that is displayed in each type of video application after the video has been published online.
In one embodiment, after the feature vector of each candidate cover is obtained, the feature vectors of the two or more candidate covers may be analyzed separately; that is, the feature information corresponding to the different attraction factors of the same candidate cover in multiple dimensions is analyzed in a fused manner, and the cover's attractiveness for users to click on or watch the video is judged comprehensively from the perspective of multiple attraction factors, so that the candidate cover with the highest attractiveness is selected from the multiple candidates as the final video cover of the video to be published.

Optionally, in order to accurately measure each candidate cover's attractiveness for users to click on or watch the video, determining the video cover of the video to be published according to the feature vectors respectively corresponding to the candidate covers may include: inputting the feature vectors respectively corresponding to the candidate covers corresponding to the two or more video frames into a pre-built ranking model to obtain the ranking scores respectively corresponding to those candidate covers; and determining the video cover of the video to be published according to those ranking scores.

In one embodiment, the ranking model is a machine-learning model. By training the model's training parameters and neuron structure with a large number of training samples, it acquires a certain ability to judge attractiveness, so that from each candidate cover's multi-dimensional feature vector, expressed in terms of the different attraction factors, it accurately judges the cover's attractiveness for users to click on or watch the video to be published. The ranking score refers to this attractiveness score.

Optionally, in this embodiment the feature vector of each candidate cover is obtained from its feature information under the different attraction factors. The feature vectors of the multiple candidate covers are then respectively input into the ranking model pre-built with a large number of training samples; as shown in FIG. 1B, the ranking model performs a fused analysis of the feature information under the different attraction factors contained in the multiple dimensions of each feature vector, producing a ranking score that represents the cover's attractiveness for users to click on or watch the video. When the ranking scores of the two or more candidate covers have been obtained, the candidate cover with the highest ranking score is selected as the video cover of the video to be published; this cover is the most attractive one for users to click on or watch the video, which increases the click-through rate and the number of views of the video after publication.
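As a further illustration (again not part of the original specification), a minimal sketch of steps S130-S140: the per-factor feature information of each candidate cover is assembled into a feature vector, a ranking model maps each vector to a ranking score, and the highest-scoring candidate becomes the video cover. The linear scorer below is a hypothetical stand-in for a trained ranking model such as an MLP or SVM.

```python
# Sketch of S130-S140: assemble feature vectors, rank them, pick the best.
from typing import Dict, List

FACTORS = ["face_attractiveness", "image_clarity"]  # assumed factor order

def to_vector(info: Dict[str, float]) -> List[float]:
    """Assemble per-factor feature information into a feature vector."""
    return [info[f] for f in FACTORS]

def ranking_score(vec: List[float], weights: List[float]) -> float:
    """Hypothetical linear ranking model standing in for a trained MLP/SVM."""
    return sum(w * x for w, x in zip(weights, vec))

def pick_cover(all_info: List[Dict[str, float]], weights: List[float]) -> int:
    """Return the index of the candidate cover with the highest ranking score."""
    scores = [ranking_score(to_vector(info), weights) for info in all_info]
    return max(range(len(scores)), key=scores.__getitem__)

all_info = [{"face_attractiveness": 0.7, "image_clarity": 0.2},
            {"face_attractiveness": 0.4, "image_clarity": 0.9}]
print("selected candidate:", pick_cover(all_info, weights=[0.6, 0.4]))
```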
In the technical solution provided by this embodiment, two or more video frames are obtained from the video to be published as candidate covers; each candidate cover is input into the neural network models corresponding to different attraction factors to obtain its feature information under those factors; feature vectors spanning the dimensions of the multiple attraction factors are generated; and the corresponding video cover is determined from the feature vectors of the multiple candidate covers. This improves the correlation between the video cover and the user preferences corresponding to the different attraction factors, eliminates manual selection of the cover, realizes intelligent determination of the video cover of the video to be published, and solves the problem that selecting a cover by a single factor cannot satisfy the diversified clicking and viewing needs of different users; the resulting video cover is highly attractive to users, which increases the click-through rate and the number of views of the video to be published after it is published.
Embodiment 2
FIG. 2 is a schematic diagram of the principle of the construction process of neural network models of different types and a ranking model provided in Embodiment 2 of the present application. This embodiment is described on the basis of the above embodiment. As shown in FIG. 2, this embodiment mainly mines the attraction factors in video covers that influence users to click on or watch videos, and thereby explains the construction process of the neural network models of different types and the ranking model.

The method in this embodiment may include the following steps.

S210: obtain two or more preset training attraction factors.

In this embodiment, a training attraction factor refers to a factor, preset according to the habits or preferences of multiple users, that may influence users to click on or watch a video. There are factors with a positive influence on users clicking on or watching a video, and also factors with a negative influence; for example, the facial attractiveness of the person on the cover may be a positively influencing factor, while horrifying or bloody content shown on the cover may be a negatively influencing factor.

In this embodiment, when mining the attraction factors in covers that positively influence users to click on or watch videos, two or more training attraction factors preset according to the habits or preferences of multiple users may first be obtained. A corresponding video cover is then obtained for each training attraction factor in turn, and the attractiveness of each such cover for users to click on or watch the video is judged, so that the training attraction factors with a positive influence are screened out from the two or more training attraction factors to construct the neural network models of different types required in this embodiment.

S220: obtain, in turn, the click-metric amount of the historical video corresponding to each training attraction factor after the video is published online.

In this embodiment, a click metric refers to an indicator that clearly reflects the attractiveness of a historical video's cover for users to click on or watch that video; the click-metric amount refers to, for example, the click-through rate or the number of views of the historical video after it is published online with the corresponding cover.

Optionally, when two or more training attraction factors have been obtained, the click-metric amount of users clicking on or watching the video needs to be tested for each training attraction factor, for which the click-metric amount corresponding to each factor must first be obtained. In this embodiment, multiple rounds of testing are used to obtain the click-metric amount corresponding to each training attraction factor: the test order of the factors is first determined, the cover of the historical video is re-determined specifically for each training attraction factor, the historical video is published online with the cover so determined, and a round of click-metric testing of the historical video is thereby performed for that factor.

Exemplarily, since the attractiveness of each training attraction factor for users clicking on or watching the video needs to be judged, an initial cover of the historical video, determined without the participation of any training attraction factor, is first needed; the click-metric amount of the historical video published online with this cover serves as the reference for judging the attractiveness of the multiple training attraction factors. In the first round of click-metric testing, therefore, no neural network model is set: a video frame is randomly sampled from the historical video as the cover for the first round, the historical video is published online with this cover, and the click-metric amount of the first round is obtained.

In the second round of click-metric testing, the training attraction factor targeted by this round is selected according to the predetermined test order, and the neural network model pre-trained for that factor is obtained. The two or more candidate covers pre-selected from all the frames of the historical video are input into this model to obtain each candidate cover's feature information under that factor; this feature information is then input into the ranking model to obtain each candidate cover's ranking score, from which the cover for the second round is determined. The historical video is published online with this cover, and the click-metric amount of the second round is obtained.

In the third round of click-metric testing, the training attraction factor targeted by this round is again selected, the neural network model pre-trained for it is obtained, and together with the models of the factors already tested in the previous rounds it forms the neural network models of different types for this round. The two or more candidate covers pre-selected from all the frames of the historical video are input into these pre-built models of different types to obtain each candidate cover's feature information under the different training attraction factors of this round and the previous rounds; each cover's feature vector is derived from this feature information and input into the ranking model to obtain its ranking score, from which the cover for the third round is determined. The historical video is published online with this cover, and the click-metric amount of the third round is obtained. Subsequent rounds are tested cyclically in the same way, yielding the click-metric amount corresponding to the training attraction factor targeted in each round, until the click-metric tests for all training attraction factors have been completed through this multi-round testing, so that the training attraction factors with a positive influence on users clicking on or watching videos are determined and the attraction factors in video covers are mined effectively.
In addition, during the multiple rounds of click-metric testing, when the feature vectors of the candidate covers are input into the ranking model, the ranking model can also be trained with a large number of candidate covers: the feature information obtained from the neural network models of different types under the different training attraction factors determines each candidate cover's feature vector, and these feature vectors are then input into the ranking model for training, improving the accuracy of the ranking scores the model outputs. Optionally, the ranking model in this embodiment may be a machine-learning model such as a multilayer perceptron (MLP) or a support vector machine (SVM), and any loss function may be used during training. Exemplarily, this embodiment uses a loss function constrained by the relative relationship between adjacent rounds. The loss function is defined as:
$$l_{\mathrm{rank}}=\begin{cases}0, & u<0\\[2pt] 0.5u^{2}, & 0\le u\le\delta\\[2pt] \delta u-0.5\delta^{2}, & u>\delta\end{cases}\qquad u=S_{\mathrm{prev}}-S_{\mathrm{cur}}+\delta$$

where $S_{\mathrm{cur}}$ is the ranking score output by the ranking model for a candidate cover in the current round of click-metric testing, and $S_{\mathrm{prev}}$ is the ranking score output for the same candidate cover in the previous round. This embodiment requires $S_{\mathrm{cur}} > S_{\mathrm{prev}}$, ensuring that the training attraction factor tested in the current round has a positive influence on users clicking on or watching the video. $\delta$ is an adjustable hyperparameter preset for model training and is a positive number; choosing a better hyperparameter improves machine-learning performance. Accordingly, based on the relative magnitudes of $S_{\mathrm{cur}}$ and $S_{\mathrm{prev}}$, the penalty of the loss function is divided into two stages, $u \le \delta$ and $u > \delta$.

Here $0.5u^{2}$ is the penalty strength of the loss function in the first stage, $\delta u - 0.5\delta^{2}$ is the penalty strength in the second stage, and the judgment index $u$ measures the gap between the ranking scores of two adjacent rounds. When $u < 0$, $S_{\mathrm{cur}}$ is much larger than $S_{\mathrm{prev}}$ and the training requirement is satisfied, so $l_{\mathrm{rank}} = 0$ and no penalty is applied. When $0 \le u \le \delta$, $S_{\mathrm{cur}}$ is larger than $S_{\mathrm{prev}}$ but the difference is small; to preserve the accuracy of the ranking model a mild penalty is applied, $l_{\mathrm{rank}} = 0.5u^{2}$, and since $0 \le u \le \delta$ this quadratic term keeps the penalty small. When $u > \delta$, $S_{\mathrm{cur}}$ is smaller than $S_{\mathrm{prev}}$, which does not meet the training requirement, so the stronger penalty $l_{\mathrm{rank}} = \delta u - 0.5\delta^{2}$ is applied in this stage. Training the ranking model with this loss function thereby guarantees the accuracy of the ranking scores it outputs.
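For illustration only (not part of the original specification), the reconstructed loss is a Huber-style penalty on the score gap between adjacent test rounds; a minimal Python sketch, assuming the definitions of $S_{\mathrm{cur}}$, $S_{\mathrm{prev}}$, and $\delta$ above:

```python
def rank_loss(s_cur: float, s_prev: float, delta: float) -> float:
    """Relative-relationship ranking loss as reconstructed above.
    s_cur / s_prev: ranking scores of the same candidate cover in the
    current and previous test rounds; delta: positive hyperparameter."""
    u = s_prev - s_cur + delta
    if u < 0:        # s_cur exceeds s_prev by more than delta: no penalty
        return 0.0
    if u <= delta:   # s_cur >= s_prev but the margin is small: mild penalty
        return 0.5 * u * u
    return delta * u - 0.5 * delta ** 2  # s_cur < s_prev: strong linear penalty

print(rank_loss(s_cur=0.9, s_prev=0.4, delta=0.2))  # 0.0, margin large enough
print(rank_loss(s_cur=0.5, s_prev=0.4, delta=0.2))  # 0.005, mild penalty
print(rank_loss(s_cur=0.3, s_prev=0.4, delta=0.2))  # 0.04, strong penalty
```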
S230: judge whether the click-metric amount corresponding to the current training attraction factor is greater than the click-metric amount corresponding to the previous training attraction factor; if it is greater, execute S240; if it is not greater, execute S250.

In one embodiment, when the click-metric amount corresponding to the current training attraction factor is obtained in each round of the multi-round testing, in order to judge the attractiveness of the current training attraction factor, this click-metric amount may be compared with the click-metric amount corresponding to the previous training attraction factor, obtained in the previous round of click-metric testing before the neural network model corresponding to the current factor was added. That is, the influence on the click-metric amount of the same historical video is compared between the cover determined after adding the current factor's neural network model and the cover determined before adding it, so as to judge the current training attraction factor's attractiveness for users clicking on or watching the video.

S240: take the neural network model corresponding to the current training attraction factor as one of the pre-built neural network models of different types.

Optionally, if the click-metric amount corresponding to the current training attraction factor is greater than that corresponding to the previous factor, then after the historical video is published online with the cover obtained after adding the current factor's neural network model, the number of users clicking on or watching the historical video increases; that is, the current factor has a positive influence on users clicking on or watching the video. In this case, the neural network model corresponding to this factor is taken as one component of the pre-built neural network models of different types, and the click-metric test of the next training attraction factor is executed.

S250: discard the neural network model corresponding to the current training attraction factor.

Optionally, if the click-metric amount corresponding to the current training attraction factor is less than or equal to that corresponding to the previous factor, then after the historical video is published online with the cover obtained after adding the current factor's neural network model, the number of users clicking on or watching the video decreases or remains unchanged; that is, the current factor has no positive influence on users clicking on or watching the video, and may even have a negative influence. In this case, the neural network model corresponding to this factor is directly discarded and is not taken as a component of the pre-built neural network models of different types, and the click-metric test of the next training attraction factor is executed. Optionally, the currently discarded neural network model is not included among the models of different types constructed for the next round of click-metric testing; that is, when the current factor's model is discarded, the current round of testing has failed. When the next training attraction factor is then tested as the current factor, the click-metric amount corresponding to the factor that failed the test is excluded, and the comparison is made against the click-metric amount corresponding to the factor that passed the test in the preceding round, improving the effectiveness of the mined attraction factors.

S260: take the next training attraction factor as the current training attraction factor and continue to execute S230 until the two or more training attraction factors have been traversed, obtaining the pre-built neural network models of different types.

Optionally, after the current training attraction factor's attractiveness for users clicking on or watching the video has been judged, the next factor's attractiveness continues to be judged through the multi-round testing; that is, the next training attraction factor is taken as the current one and the above judgment process continues until the training attraction factors have been traversed, yielding the pre-built neural network models of different types. Setting the training attraction factors in this embodiment may involve mining them repeatedly during the multi-round click-metric testing, realizing an iterable attraction-factor mining procedure: when, over several consecutive rounds, the click-metric amount corresponding to the tested factor is less than or equal to that of the preceding round, no further positively influencing attraction factor can be mined and the mining stops. Through the multi-round click-metric testing, the neural network models of different types required in this embodiment, built from the models corresponding to the positively influencing attraction factors, are obtained.
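For illustration only (not part of the original specification), the multi-round mining of S220-S260 can be sketched as the loop below; run_click_test is a hypothetical stand-in for publishing the historical video with the cover selected by the currently accepted factor models and measuring the resulting click metric online.

```python
# Sketch of S220-S260: keep a factor only if adding its model raises the
# click metric over the best accepted round so far.
from typing import Callable, List

def mine_factors(candidate_factors: List[str],
                 run_click_test: Callable[[List[str]], float]) -> List[str]:
    accepted: List[str] = []
    baseline = run_click_test(accepted)   # round 1: randomly sampled cover
    for factor in candidate_factors:
        metric = run_click_test(accepted + [factor])
        if metric > baseline:             # positive influence: keep the factor
            accepted.append(factor)
            baseline = metric             # later rounds compare to this one
        # otherwise the factor's model is discarded and testing continues
    return accepted
```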
In the technical solution provided by this embodiment, by judging whether the click-metric amount of the historical video published online under the current training attraction factor exceeds the click-metric amount corresponding to the previous factor, the attraction factors with a positive influence on users clicking on or watching videos are mined effectively from the training attraction factors, and the neural network models corresponding to these factors are built into the neural network models of different types that are finally trained, effectively enhancing the attractiveness of video covers to users and increasing the click-through rate and the number of views of videos to be published after publication.
Embodiment 3
FIG. 3 is a schematic diagram of the principle of a neural network model training process provided in Embodiment 3 of the present application. This embodiment is described on the basis of the above embodiments and mainly explains the offline training process of the neural network models corresponding to the different attraction factors.

This embodiment may include the following steps.

S310: obtain two or more preset training attraction factors and, for each training attraction factor, two or more historical video frames from the corresponding historical video as historical candidate covers, and determine the feature label of each historical candidate cover under the corresponding training attraction factor.

In this embodiment, when training the neural network model corresponding to each training attraction factor, the two or more preset training attraction factors and, for each, the two or more historical video frames serving as historical candidate covers in the corresponding historical video are first obtained; the historical candidate covers serve as training samples, and the feature label of each historical candidate cover under the corresponding training attraction factor is determined. The feature label may be a pre-annotated score of the historical candidate cover under that factor.

S320: input each historical candidate cover into the preset model corresponding to the training attraction factor to obtain the historical feature information of the historical candidate cover under the training attraction factor.

In one embodiment, when the two or more historical candidate covers of the historical video corresponding to each training attraction factor have been obtained, each historical candidate cover is directly input into the preset model set for that factor in this embodiment. The preset model is trained on the feature information of historical candidate covers under that factor; through the relationship between the model's training parameters and its multiple neuron structures, it analyzes the input historical candidate cover and determines the cover's feature information under the factor, so that this feature information can subsequently be compared with the cover's feature label under the same factor, and the model's training parameters and neuron structure can be adjusted according to the comparison result, thereby training the preset model iteratively.

S330: determine the training loss according to the historical feature information and feature labels of the historical candidate covers under the training attraction factor, and adjust the parameters of the preset model by stochastic gradient descent so that the training loss falls below a given loss threshold; take the final preset model as the neural network model corresponding to the training attraction factor.
In one embodiment, the feature information of a historical candidate cover obtained under each training attraction factor is an estimated value. This feature information is compared with the cover's feature label under that factor, i.e., the estimated value and the actual value of the cover's feature information are compared, determining the training loss of the preset model in this training pass; the training loss clearly indicates how accurately the currently trained model extracts features from historical candidate covers. Optionally, any loss function may be used in this embodiment to determine the training loss, which is not limited here. Meanwhile, when the training loss of the multiple historical candidate covers used as training samples in a batch is obtained, the loss is judged under the stochastic-gradient-descent procedure. If the training loss of the current pass is greater than or equal to the given loss threshold, the preset model's accuracy in extracting features from historical candidate covers is still insufficient and training must continue: the model's parameters are adjusted so that the training loss decreases accordingly; the next batch of historical candidate covers is then taken as training samples, the corresponding training loss is determined again, and the parameters are adjusted again by stochastic gradient descent. This loop repeats until the training loss is below the given loss threshold, indicating that the preset model has reached a certain accuracy in extracting features from historical candidate covers and no further training is needed; the latest preset model is then taken as the neural network model corresponding to that training attraction factor. The neural network model corresponding to every training attraction factor is trained according to the above process, yielding the neural network models of different types corresponding to the different attraction factors. Stochastic gradient descent is a very widely used algorithm in machine learning; it minimizes the loss of each sample, and although the loss obtained in each iteration does not always move toward the global optimum, the overall direction of iteration is toward the globally optimal solution and the final result usually lies near it. Therefore, the latest preset model obtained can achieve a certain accuracy in extracting features from historical candidate covers.
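For illustration only (not part of the original specification), a minimal sketch of the batched training loop of S330 in PyTorch; the loss function, model architecture, and data loader are assumptions, since the specification leaves them open.

```python
import torch
from torch import nn

def train_factor_model(model: nn.Module, loader, loss_threshold: float,
                       lr: float = 0.01, max_epochs: int = 100) -> nn.Module:
    """Train one preset model (one attraction factor) by SGD until the
    mean batch loss falls below the given loss threshold."""
    criterion = nn.MSELoss()  # the specification allows any loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for frames, labels in loader:     # historical candidate covers + labels
            optimizer.zero_grad()
            loss = criterion(model(frames), labels)  # estimate vs. feature label
            loss.backward()
            optimizer.step()              # stochastic gradient descent step
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < loss_threshold:
            break                         # required accuracy reached
    return model
```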
In the technical solution provided by this embodiment, the neural network model corresponding to each training attraction factor is trained with the historical feature information and feature labels of historical candidate covers under that factor, improving the accuracy with which each factor's neural network model obtains feature information; the correlation between candidate cover features and the user preferences corresponding to the different attraction factors is thereby analyzed, effectively enhancing the attractiveness of video covers to users.
Embodiment 4
FIG. 4 is a schematic diagram of the principle of a video publishing process provided in Embodiment 4 of the present application. This embodiment is described on the basis of the above embodiments and, as shown in FIG. 4, mainly explains the process of obtaining candidate covers and the video publishing process.

This embodiment may include the following steps.

S410: obtain multiple initial video frames that satisfy a preset condition from the video to be published.

In this embodiment, when obtaining the candidate covers of the video to be published, the preset condition for the initial screening of all video frames can first be determined; the preset condition may be one that filters out dim, blurry, or solid-color frames. The initial video frames satisfying the preset condition are screened out of all the frames of the video, and the candidate covers are subsequently selected from these initial frames. The preset-condition screening in this embodiment can use a fast algorithm whose overhead within the overall video publishing procedure is very small, improving the processing rate.

S420: process the multiple initial video frames with a clustering algorithm to obtain two or more cluster sets.

In this embodiment, once the initial video frames of the video to be published are obtained, a clustering algorithm can be used to process them, grouping frames with similar features into one class and obtaining two or more cluster sets, each of which may include at least one initial video frame.

S430: select one target video frame from each cluster set as a candidate cover of the video to be published.

Optionally, after the two or more cluster sets are obtained, one target video frame can be selected from each cluster set as the candidate cover corresponding to that set, yielding the two or more candidate covers of the video to be published. Duplicate frames are thus excluded, repeated operations on similar pictures are reduced, and the accuracy of the video cover is improved.
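For illustration only (not part of the original specification), a minimal sketch of S410-S430: cheap brightness/variance checks stand in for the fast preset-condition screening, scikit-learn's KMeans stands in for the unspecified clustering algorithm, and the frame nearest each cluster centre is taken as that cluster's target frame. The thresholds and feature choices are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_covers(frames: np.ndarray, n_clusters: int = 5) -> np.ndarray:
    """frames: (num_frames, feature_dim) array of per-frame features.
    Returns one representative frame per cluster as candidate covers."""
    brightness = frames.mean(axis=1)
    variance = frames.var(axis=1)
    keep = (brightness > 0.1) & (variance > 0.01)  # drop dim / near-solid frames
    kept = frames[keep]
    km = KMeans(n_clusters=min(n_clusters, len(kept)), n_init=10).fit(kept)
    # the frame closest to each cluster centre represents that cluster
    reps = [kept[np.argmin(((kept - c) ** 2).sum(axis=1))]
            for c in km.cluster_centers_]
    return np.stack(reps)
```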
S440: input each video frame serving as a candidate cover into the pre-built neural network models of different types to obtain the feature information of the candidate cover under the different attraction factors corresponding to the pre-built neural network models of different types.

S450: normalize the feature information of the candidate covers under the different attraction factors to obtain the feature vector of each candidate cover.
Optionally, the feature information of a candidate cover under the different attraction factors describes different features in different dimensions. Taking the dimensional scale of each dimension into account, the feature information of the candidate covers under the different attraction factors can be preprocessed by normalization to obtain the feature vectors of the candidate covers. Exemplarily, let the feature information of each candidate cover under the different attraction factors be $f_n^{\,t}$, where $n$ is the index of the candidate cover and $t$ is the index of the attraction factor, so that $f_n^{\,t}$ denotes the feature information of the $n$-th candidate cover under the $t$-th attraction factor. The feature information is then normalized per attraction factor, for example by the following standardization:

$$\hat{f}_n^{\,t}=\frac{f_n^{\,t}-\mu^{t}}{\sigma^{t}},\qquad \mu^{t}=\frac{1}{N}\sum_{n=1}^{N}f_n^{\,t},\qquad \sigma^{t}=\sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(f_n^{\,t}-\mu^{t}\right)^{2}},$$

where $N$ is the number of candidate covers, $\mu^{t}$ is the mean of the $t$-th attraction factor's feature information over the candidate covers, and $\sigma^{t}$ is the corresponding standard deviation. The normalized feature vector of the $n$-th candidate cover is then $\hat{F}_n=(\hat{f}_n^{\,1},\hat{f}_n^{\,2},\dots,\hat{f}_n^{\,T})$, where $T$ is the number of attraction factors.
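For illustration only (not part of the original specification), a minimal sketch of the normalization of S450, assuming the standardization reconstructed above: each attraction-factor dimension is centred and scaled across the N candidate covers.

```python
import numpy as np

def normalize_features(f: np.ndarray) -> np.ndarray:
    """f: (N, T) array; f[n, t] is the feature information of candidate cover
    n under attraction factor t. Returns the N normalized feature vectors."""
    mu = f.mean(axis=0)            # per-factor mean over the candidate covers
    sigma = f.std(axis=0) + 1e-8   # per-factor std; epsilon avoids divide-by-zero
    return (f - mu) / sigma

f = np.array([[3.2, 0.7], [2.8, 0.9], [3.9, 0.4]])  # 3 covers, 2 factors
print(normalize_features(f))       # rows are the normalized feature vectors
```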
S460: determine the video cover of the video to be published according to the feature vectors respectively corresponding to the candidate covers corresponding to the two or more video frames.

S470: publish the video to be published online with the video cover.

Optionally, once the video cover of the video to be published is determined, the video is published online with that cover, increasing its click-through rate and number of views.

In the technical solution provided by this embodiment, the initial video frames of the video to be published that satisfy the preset condition are clustered, and a target video frame is selected from each of the resulting cluster sets as a candidate cover, which reduces the number of frames processed when determining the video cover and speeds up its determination; the video is then published online with that cover, increasing its click-through rate and number of views after publication.
Embodiment 5
图5为本申请实施例五提供的一种视频发布装置的结构示意图,如图5所示,该装置可以包括:候选封面获取模块510,设置为获取待发布视频中两个或两个以上作为候选封面的视频帧;特征信息确定模块520,设置为将作为候选封面的每个视频帧输入预先构建的不同类型的神经网络模型中,得到候选封面在预先构建的不同类型的神经网络模型对应的不同吸引因子下的特征信息,其中,该不同吸引因子表示吸引用户来点击或观看待发布视频的不同因素;特征向量确定模块530,设置为根据候选封面在不同吸引因子下的特征信息,确定候选封面的特征向量;视频封面确定模块540,设置为根据两个或两个以上视频帧对应的候选封面分别对应的特征向量,确定待发布视频的视频封面。FIG. 5 is a schematic structural diagram of a video publishing device provided in Embodiment 5 of the application. As shown in FIG. 5, the device may include: a candidate cover acquisition module 510, configured to acquire two or more of the videos to be published as The video frame of the candidate cover; the feature information determining module 520 is configured to input each video frame as a candidate cover into the different types of neural network models constructed in advance, and obtain the corresponding values of the candidate cover in the different types of neural network models constructed in advance. The feature information under different attraction factors, where the different attraction factors represent different factors that attract users to click or watch the video to be published; the feature vector determining module 530 is set to determine the candidate cover based on the feature information of the candidate cover under different attraction factors The feature vector of the cover; the video cover determining module 540 is configured to determine the video cover of the video to be published according to the feature vectors corresponding to the candidate covers corresponding to two or more video frames.
In the technical solution provided by this embodiment, two or more video frames are obtained from the to-be-published video as candidate covers, and each candidate cover is input into the neural network model corresponding to each attraction factor to obtain its feature information under the different attraction factors, from which a feature vector spanning multiple attraction-factor dimensions is generated. The video cover is then determined from the feature vectors of the multiple candidate covers. This strengthens the association between the video cover and the user preferences corresponding to the different attraction factors, removes the need to pick a cover manually, and realizes intelligent determination of the video cover of a to-be-published video. It solves the problem that a cover selected by a single factor cannot satisfy the diverse clicking and viewing preferences of different users; the resulting cover is more attractive to users, increasing the click-through rate and view count of the video after publication.
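The module pipeline of FIG. 5 can be sketched end to end as follows. This is a sketch under stated assumptions, not a definitive implementation: the per-factor models are assumed to be arbitrary callables that score a frame, and the final ranking step is a plain weighted sum standing in for the ranking model, whose form this embodiment does not specify:

```python
import numpy as np

class CoverSelector:
    """End-to-end sketch of modules 510-540: score each candidate cover
    under every attraction factor, normalize per factor, and pick the
    highest-ranked cover."""

    def __init__(self, factor_models, ranking_weights=None):
        # One model per attraction factor; each maps a frame to a score.
        self.factor_models = factor_models
        self.ranking_weights = ranking_weights

    def feature_vectors(self, candidate_covers):
        # Module 520: feature information of every cover under every factor.
        x = np.array([[model(frame) for model in self.factor_models]
                      for frame in candidate_covers])
        # Module 530: per-factor normalization yields the feature vectors.
        mu, sigma = x.mean(axis=0), x.std(axis=0)
        return (x - mu) / np.where(sigma == 0.0, 1.0, sigma)

    def select(self, candidate_covers):
        # Module 540: rank the covers and return the best one.
        v = self.feature_vectors(candidate_covers)
        w = (np.ones(v.shape[1]) if self.ranking_weights is None
             else np.asarray(self.ranking_weights))
        return candidate_covers[int(np.argmax(v @ w))]
```

For instance, with three hypothetical factor models, CoverSelector([clarity_model, face_model, scene_model]).select(frames) would return the chosen cover frame; the factor names here are illustrative only.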
Embodiment 6
FIG. 6 is a schematic structural diagram of a device provided in Embodiment 6 of the present application. As shown in FIG. 6, the device includes a processor 60, a storage apparatus 61, and a communication apparatus 62.
The storage apparatus 61, as a computer-readable storage medium, may be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the video publishing method provided in the embodiments of the present application. By running the software programs, instructions, and modules stored in the storage apparatus 61, the processor 60 executes the various functional applications and data processing of the device, that is, implements the video publishing method described above.
Embodiment 7
Embodiment 7 of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the video publishing method of any of the foregoing embodiments. The method may include: obtaining two or more video frames from a to-be-published video as candidate covers; inputting each video frame serving as a candidate cover into pre-built neural network models of different types, to obtain feature information of the candidate cover under different attraction factors corresponding to the pre-built neural network models of different types, where the different attraction factors represent different factors that attract users to click on or watch the to-be-published video; determining a feature vector of the candidate cover according to the feature information of the candidate cover under the different attraction factors; and determining the video cover of the to-be-published video according to the feature vectors respectively corresponding to the candidate covers of the two or more video frames.

Claims (10)

  1. A video publishing method, comprising:
    obtaining two or more video frames from a to-be-published video as candidate covers;
    inputting each video frame serving as a candidate cover into pre-built neural network models of different types, to obtain feature information of the candidate cover under different attraction factors corresponding to the pre-built neural network models of different types, wherein the different attraction factors represent different factors that attract users to click on or watch the to-be-published video;
    determining a feature vector of the candidate cover according to the feature information of the candidate cover under the different attraction factors; and
    determining a video cover of the to-be-published video according to the feature vectors respectively corresponding to the candidate covers of the two or more video frames.
  2. The method according to claim 1, before the obtaining two or more video frames from the to-be-published video as candidate covers, further comprising:
    obtaining two or more preset training attraction factors; and
    in a case where a click indicator amount corresponding to a current training attraction factor is greater than a click indicator amount corresponding to a previous training attraction factor, taking the neural network model corresponding to the current training attraction factor as one of the pre-built neural network models of different types, until the two or more training attraction factors have been traversed, to obtain the pre-built neural network models of different types.
  3. The method according to claim 1, before the obtaining two or more video frames from the to-be-published video as candidate covers, further comprising:
    obtaining a preset training attraction factor and two or more historical video frames from a historical video corresponding to the training attraction factor as historical candidate covers, and determining a feature label of each historical candidate cover under the training attraction factor;
    inputting each historical candidate cover into a preset model corresponding to the training attraction factor, to obtain historical feature information of the historical candidate cover under the training attraction factor; and
    determining a training loss according to the historical feature information and the feature label of the historical candidate cover under the training attraction factor, adjusting parameters of the preset model by stochastic gradient descent so that the training loss is less than a predetermined loss threshold, and taking the final preset model as the neural network model corresponding to the training attraction factor.
  4. The method according to claim 1, wherein the determining the video cover of the to-be-published video according to the feature vectors respectively corresponding to the candidate covers of the two or more video frames comprises:
    inputting the feature vectors respectively corresponding to the candidate covers of the two or more video frames into a pre-built ranking model, to obtain ranking scores respectively corresponding to the candidate covers of the two or more video frames; and
    determining the video cover of the to-be-published video according to the ranking scores respectively corresponding to the candidate covers of the two or more video frames.
  5. The method according to claim 1, after the determining the video cover of the to-be-published video, further comprising:
    publishing the to-be-published video online according to the video cover.
  6. The method according to claim 1, wherein the determining the feature vector of the candidate cover according to the feature information of the candidate cover under the different attraction factors comprises:
    normalizing the feature information of the candidate cover under the different attraction factors, to obtain the feature vector of the candidate cover.
  7. The method according to claim 1, wherein the obtaining two or more video frames from the to-be-published video as candidate covers comprises:
    obtaining a plurality of initial video frames that meet a preset condition from the to-be-published video;
    processing the plurality of initial video frames with a clustering algorithm, to obtain two or more cluster sets; and
    selecting one target video frame from each cluster set as a candidate cover of the to-be-published video.
  8. A video publishing apparatus, comprising:
    a candidate cover acquisition module, configured to obtain two or more video frames from a to-be-published video as candidate covers;
    a feature information determination module, configured to input each video frame serving as a candidate cover into pre-built neural network models of different types, to obtain feature information of the candidate cover under different attraction factors corresponding to the pre-built neural network models of different types, wherein the different attraction factors represent different factors that attract users to click on or watch the to-be-published video;
    a feature vector determination module, configured to determine a feature vector of the candidate cover according to the feature information of the candidate cover under the different attraction factors; and
    a video cover determination module, configured to determine a video cover of the to-be-published video according to the feature vectors respectively corresponding to the candidate covers of the two or more video frames.
  9. A device, comprising:
    one or more processors; and
    a storage apparatus, configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the video publishing method according to any one of claims 1-7.
  10. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the video publishing method according to any one of claims 1-7.
PCT/CN2020/072191 2019-01-29 2020-01-15 Video publishing method, apparatus and device, and storage medium WO2020156171A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910087567.XA CN111491202B (en) 2019-01-29 2019-01-29 Video publishing method, device, equipment and storage medium
CN201910087567.X 2019-01-29

Publications (1)

Publication Number Publication Date
WO2020156171A1

Family

ID=71794177

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/072191 WO2020156171A1 (en) 2019-01-29 2020-01-15 Video publishing method, apparatus and device, and storage medium

Country Status (2)

Country Link
CN (1) CN111491202B (en)
WO (1) WO2020156171A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112689187A (en) * 2020-12-17 2021-04-20 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN112800276B (en) * 2021-01-20 2023-06-20 北京有竹居网络技术有限公司 Video cover determining method, device, medium and equipment
CN113111222B (en) * 2021-03-26 2024-03-19 北京达佳互联信息技术有限公司 Short video template generation method, device, server and storage medium
CN113315984B (en) * 2021-05-21 2022-07-08 北京达佳互联信息技术有限公司 Cover display method, device, system, equipment and storage medium
CN113821678B (en) * 2021-07-21 2024-04-12 腾讯科技(深圳)有限公司 Method and device for determining video cover

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4958748B2 (en) * 2007-11-27 2012-06-20 キヤノン株式会社 Audio processing device, video processing device, and control method thereof
CN106599208B (en) * 2016-12-15 2022-05-06 腾讯科技(深圳)有限公司 Content sharing method and user client
CN107958030B (en) * 2017-11-17 2021-08-24 北京奇虎科技有限公司 Video cover recommendation model optimization method and device
CN108595493B (en) * 2018-03-15 2022-02-08 腾讯科技(深圳)有限公司 Media content pushing method and device, storage medium and electronic device
CN108650524B (en) * 2018-05-23 2022-08-16 腾讯科技(深圳)有限公司 Video cover generation method and device, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10123068B1 (en) * 2014-05-12 2018-11-06 Tunespotter, Inc. System, method, and program product for generating graphical video clip representations associated with video clips correlated to electronic audio files
CN104881798A (en) * 2015-06-05 2015-09-02 北京京东尚科信息技术有限公司 Device and method for personalized search based on commodity image features
CN106503693A (en) * 2016-11-28 2017-03-15 北京字节跳动科技有限公司 The offer method and device of video front cover
CN107832725A (en) * 2017-11-17 2018-03-23 北京奇虎科技有限公司 Video front cover extracting method and device based on evaluation index
CN109165301A (en) * 2018-09-13 2019-01-08 北京字节跳动网络技术有限公司 Video cover selection method, device and computer readable storage medium

Also Published As

Publication number Publication date
CN111491202A (en) 2020-08-04
CN111491202B (en) 2021-06-15

Similar Documents

Publication Publication Date Title
WO2020156171A1 (en) Video publishing method, apparatus and device, and storage medium
US10515443B2 (en) Utilizing deep learning to rate attributes of digital images
Wang et al. Revisiting video saliency: A large-scale benchmark and a new model
Shen et al. Fast video classification via adaptive cascading of deep models
Jin et al. Image aesthetic predictors based on weighted CNNs
JP5506722B2 (en) Method for training a multi-class classifier
CN110909205B (en) Video cover determination method and device, electronic equipment and readable storage medium
CN110737783A (en) method, device and computing equipment for recommending multimedia content
US20160110794A1 (en) E-commerce recommendation system and method
US20200241716A1 (en) Image display with selective depiction of motion
CN108229262B (en) Pornographic video detection method and device
CN111597446B (en) Content pushing method and device based on artificial intelligence, server and storage medium
CN109189889B (en) Bullet screen recognition model establishing method, device, server and medium
CN111090778A (en) Picture generation method, device, equipment and storage medium
CN108549857B (en) Event detection model training method and device and event detection method
CN111832952B (en) Education courseware pushing system
CN106874922B (en) Method and device for determining service parameters
CN112925924A (en) Multimedia file recommendation method and device, electronic equipment and storage medium
CN111432206A (en) Video definition processing method and device based on artificial intelligence and electronic equipment
WO2019232723A1 (en) Systems and methods for cleaning data
CN110390315B (en) Image processing method and device
CN113569081A (en) Image recognition method, device, equipment and storage medium
CN109214400A (en) Classifier training method, apparatus, equipment and computer readable storage medium
CN108665455B (en) Method and device for evaluating image significance prediction result
CN111309706A (en) Model training method and device, readable storage medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20749492

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20749492

Country of ref document: EP

Kind code of ref document: A1