WO2022188563A1 - Dynamic cover setting method and system - Google Patents

Dynamic cover setting method and system

Info

Publication number
WO2022188563A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
bullet screen
feature information
video file
frame
Prior art date
Application number
PCT/CN2022/072819
Other languages
English (en)
French (fr)
Inventor
时英选
Original Assignee
上海哔哩哔哩科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海哔哩哔哩科技有限公司
Publication of WO2022188563A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 21/266: Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N 21/26603: Channel or content management for automatically generating descriptors from content, e.g. when it is not made available by its provider, using content analysis techniques
    • H04N 21/435: Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/466: Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/4662: Learning process for intelligent management characterized by learning algorithms
    • H04N 21/4666: Learning process for intelligent management using neural networks, e.g. processing the feedback provided by the user
    • H04N 21/488: Data services, e.g. news ticker
    • H04N 21/4884: Data services for displaying subtitles
    • H04N 21/8549: Creating video summaries, e.g. movie trailer

Definitions

  • The embodiments of the present application relate to the field of computer technology, and in particular to a method, system, computer device, and computer-readable storage medium for setting a dynamic cover.
  • UGC (User Generated Content): a content production model in which every user publishes their own original content (e.g., video files) on an internet platform for other users to view.
  • The purpose of the embodiments of the present application is to provide a dynamic cover setting method, system, computer device, and computer-readable storage medium that solve the following problems in the technology known to the inventor: a poor cover experience and a low click-through rate.
  • An aspect of the embodiments of the present application provides a dynamic cover setting method. The method includes: determining a target video segment from a video file; and extracting the target video segment and obtaining a dynamic cover image of the video file according to the target video segment.
  • Optionally, determining the target video segment from the video file includes: acquiring multiple bullet screens of the video file, each bullet screen associated with a time point on the timeline of the video file; obtaining the bullet screen density distribution on the timeline according to the time point associated with each bullet screen; filtering out, according to the bullet screen density distribution, one or more video clips with the highest bullet screen density in the video file; and determining the one or more video clips, or the one or more video clips carrying bullet screens, as the target video segment.
  • Optionally, acquiring multiple bullet screens of the video file includes: acquiring all bullet screens of the video file; and filtering multiple invalid bullet screens out of all the bullet screens according to the bullet screen content of each one, to obtain the multiple bullet screens; the invalid bullet screens include bullet screens whose content is unrelated to the video content of the video file and/or bullet screens whose content is unrelated to the video picture of the video file.
  • Optionally, determining a target video segment from a video file includes: dividing the video file into M video clips, where M is a positive integer greater than 1; performing a quality score on each video clip; and determining the target video segment from the M video clips according to the quality score of each one.
  • Optionally, performing a quality score on each video clip includes: scoring each video clip according to the bullet screen feature information of the clip and/or the frame feature information of each frame in the clip, where the bullet screen feature information includes the bullet screen density.
  • Optionally, performing a quality score on each video clip includes: extracting the frame feature information of each frame in the i-th video clip, where 1 ≤ i ≤ M and i is a positive integer; and performing a quality score on the i-th video clip according to the frame feature information of each frame in it.
  • Optionally, performing a quality score on the i-th video clip according to the frame feature information of each frame includes: scoring the i-th video clip according to picture feature information and the frame feature information of each frame, where the picture feature information is the feature information of a target static picture, and the target static picture includes the static cover picture of the video file.
  • Optionally, performing a quality score on the i-th video clip according to the picture feature information and the frame feature information includes: inputting the frame feature information of each frame into an LSTM model in sequence according to the time order of the M frames, to obtain M output vectors through the LSTM model, the M output vectors corresponding one-to-one to the M frames; performing convolution and pooling operations on the vector matrix formed by the M output vectors to obtain a first feature vector; obtaining a second feature vector according to the picture feature information; concatenating the first feature vector and the second feature vector to obtain a concatenated vector; and performing a linear regression operation on the concatenated vector to obtain the quality score corresponding to the i-th video clip.
  • Another aspect of the embodiments of the present application provides a dynamic cover setting system, including: a determination module for determining a target video segment from a video file; and a setting module for extracting the target video segment and obtaining a dynamic cover image of the video file according to the target video segment.
  • Another aspect of the embodiments of the present application provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the following steps are implemented: determining a target video segment from a video file; and extracting the target video segment and obtaining a dynamic cover image of the video file according to the target video segment.
  • Another aspect of the embodiments of the present application provides a computer-readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, the following steps are implemented: determining a target video segment from a video file; and extracting the target video segment and obtaining a dynamic cover image of the video file according to the target video segment.
  • The dynamic cover setting method, system, device, and computer-readable storage medium provided by the embodiments of the present application can extract the key or highlight clip of a video file (that is, the target video segment) and obtain a dynamic cover image from that segment, which has the following advantages:
  • First: because the cover is a dynamic cover image, its animated display produces a good visualization effect and appears visually rich and colorful, improving the viewing experience and interest, attracting other users' attention, and increasing the click-through rate of the video file.
  • Second: because the target video segment comes from the video file itself, the dynamic cover image is strongly correlated with the video file, which optimizes the user's experience when browsing and selecting videos and prevents users from mistakenly clicking, due to a cover that does not match the video content, and then watching video content that does not meet their expectations, thereby avoiding wasted data traffic.
  • FIG. 1 schematically shows an application environment diagram of a dynamic cover setting method according to an embodiment of the present application;
  • FIG. 2 schematically shows a flowchart of a dynamic cover setting method according to Embodiment 1 of the present application;
  • FIG. 3 is a flowchart of the sub-steps of step S200 in FIG. 2;
  • FIG. 4 is another flowchart of the sub-steps of step S300 in FIG. 3;
  • FIG. 5 is an example diagram of bullet screen filtering;
  • FIG. 6 is another flowchart of the sub-steps of step S200 in FIG. 2;
  • FIG. 7 is a flowchart of the sub-steps of step S602 in FIG. 6;
  • FIG. 8 is another flowchart of the sub-steps of step S602 in FIG. 6;
  • FIG. 9 is another flowchart of the sub-steps of step S702 in FIG. 7;
  • FIG. 10 is a flowchart of the sub-steps of step S900 in FIG. 9;
  • FIG. 11 is an example diagram of identifying target video clips by artificial intelligence;
  • FIG. 12 schematically shows a block diagram of a dynamic cover setting system according to Embodiment 2 of the present application; and
  • FIG. 13 schematically shows a hardware architecture of a computer device suitable for implementing a dynamic cover setting method according to Embodiment 3 of the present application.
  • In the technology known to the inventor, video covers have the following defects:
  • First: they are all statically displayed, so the visualization effect is poor and visually monotonous and dull;
  • Second: the video cover does not match the video content, which often occurs in submissions published by clickbait cover and title creators;
  • The above defects waste the viewer's time and reduce the video viewing experience, which may result in a lower click-through rate for some video content.
  • The present application provides a number of embodiments to address the above defects; see below for details.
  • LSTM (Long Short-Term Memory): a type of recurrent neural network that controls the flow and loss of features by introducing a gate mechanism, enabling it to learn long-term dependencies.
  • Density distribution: also known as the probability density distribution; probability refers to the chance that an event occurs randomly. For example, for a uniform distribution function, the density equals the probability of an interval (the range of the event's values) divided by the length of the interval.
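  • As a worked instance of this definition: for a uniform distribution over a timeline of length T, the probability of an interval [t1, t2] is (t2 - t1)/T, so the density is f(x) = ((t2 - t1)/T) / (t2 - t1) = 1/T at every point of the timeline.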
  • Dynamic cover image: a video clip containing multiple frames.
  • A bullet screen is a subtitle that pops up and moves in a predetermined direction while a video is watched over the Internet.
  • There is no fixed English word for bullet screens; they are usually called: comment, danmaku, barrage, bullet screen, bullet-screen comment, etc.
  • Bullet screens allow video viewers to post comments or impressions, but unlike ordinary video-sharing sites, which display comments only in a dedicated comment area under the player, they appear on the video picture in real time as sliding subtitles, ensuring that all viewers can notice them.
  • Some bullet screen systems use scripting languages to provide specific bullet screen behaviors, such as making a bullet screen appear or disappear at a specific position, controlling its pop-up speed, its position, etc.
  • Bullet screens that appear fixed at the bottom or top of the picture can also be used as subtitles for videos that have none.
  • For example, each bullet screen can include associated information such as its content, the time point on the timeline it is associated with, and sender information (a field table is given in the description below).
  • FIG. 1 schematically shows an application environment according to an embodiment of the present application. As shown in FIG. 1:
  • The provider network 2 can connect to a plurality of mobile terminals 6 through the network 4.
  • the provider network 2 can provide content services.
  • Content services may include content streaming services such as Internet Protocol video streaming services.
  • Content streaming services may be configured to distribute content via various transport technologies.
  • Content services may be configured to provide content such as video, audio, textual data, combinations thereof, and the like.
  • Content may include content streams (e.g., video streams, audio streams, information streams), content files (e.g., video files, audio files, text files), and/or other data.
  • The provider network 2 may implement a bullet screen service configured to allow users to post and/or share comments associated with content, i.e., bullet screens.
  • A bullet screen is presented on the same screen as the content.
  • For example, a bullet screen can appear in an overlay above the content.
  • Bullet screens may be animated when displayed.
  • For example, a bullet screen can scroll (right to left, left to right, top to bottom, or bottom to top), and this animation effect can be implemented based on the transition property of CSS3 (cascading style sheets).
  • the provider network 2 may be located in a data center, such as a single site, or distributed in different geographic locations (eg, in multiple sites).
  • the provider network 2 may provide services via one or more networks 4 .
  • Network 4 includes various network devices such as routers, switches, multiplexers, hubs, modems, bridges, repeaters, firewalls, proxy devices and/or the like.
  • the network 4 may include physical links such as coaxial cable links, twisted pair cable links, fiber optic links, combinations thereof, and the like.
  • the network 4 may include wireless links, such as cellular links, satellite links, Wi-Fi links, and the like.
  • the provider network 2 may be configured to receive multiple messages.
  • the plurality of messages may include a plurality of bullet screens associated with the content.
  • The provider network 2 may be configured to manage messages for various content items. Users can browse content and access different content items to view comments for particular content, such as comments posted by other users for that content. Comments from users associated with a particular content item may be output to other users viewing that item. For example, all users accessing a content item (e.g., a video clip) can view the comments associated with it.
  • the input comment content can be output in real-time or near real-time.
  • the provider network 2 may be configured to process multiple messages, eg, various processing operations such as message storage, message screening, message push, and the like.
  • the message store is used to store a plurality of messages in a data store such as a database.
  • Message filtering can include rejecting or flagging messages that match filtering criteria.
  • filter criteria can specify terms and/or phrases such as profanity, hate speech, indecent language, etc.
  • Filter criteria can specify characters such as symbols, fonts, etc. Filter criteria can specify language, computer readable code patterns, etc.
  • the provider network 2 may perform natural language processing, subject recognition, pattern recognition, artificial intelligence, etc. to automatically characterize and/or group messages.
  • frequently occurring phrases or patterns may be identified as topics.
  • a database of topics associated with content may be maintained.
  • Themes may include genre (e.g., action, drama, comedy), personality (e.g., actor, actress, director), language, and the like.
  • the messages may be grouped based on characteristics of the client device and/or the user sending the message. Demographics, interests, history, and/or the like can be stored for multiple users to determine potential groupings of messages.
  • the provider network 2 may also identify highlights, pictures, etc. in the video file based on artificial intelligence.
  • the provider network 2 may be implemented by one or more computing nodes.
  • One or more compute nodes may include virtualized compute instances.
  • Virtualized computing instances may include virtual machines, such as emulations of computer systems, operating systems, servers, and the like.
  • A compute node may load a virtual machine based on a virtual image and/or other data defining the specific software (e.g., operating system, dedicated application, server) to be emulated. Different virtual machines may be loaded and/or terminated on one or more compute nodes as demand for different types of processing services changes.
  • a hypervisor can be implemented to manage the usage of different virtual machines on the same compute node.
  • a plurality of mobile terminals 6 may be configured to access the content and services of the provider network 2 .
  • The plurality of mobile terminals 6 may include any type of electronic device, such as mobile devices, tablet devices, laptops, workstations, virtual reality devices, gaming devices, set-top boxes, digital streaming devices, vehicle terminals, smart TVs, and the like.
  • The plurality of mobile terminals 6 may output (e.g., display, render, present) content (video, etc.) to the user.
  • the mobile terminal 6 may also identify highlights and the like in the video file based on artificial intelligence.
  • In an exemplary embodiment, the provider network 2 (or the mobile terminal 6) can extract a highlight clip of a video file and use it as the file's dynamic cover image, so as to improve the user experience and make the cover of the video file more interesting, thereby attracting the attention of other users and increasing the click-through rate of the video file.
  • In an exemplary embodiment, the provider network 2 can filter high-quality video files out of a large number of video files, extract highlight clips of those files, and use the highlight clips as their dynamic cover images, optimizing the user's experience when browsing and selecting videos and improving the click-through rate of high-quality video files.
  • The solution can be implemented by a computer device 1300, which can be the provider network 2 or one of its computing nodes, or the mobile terminal 6.
  • FIG. 2 schematically shows a flowchart of a dynamic cover setting method according to Embodiment 1 of the present application.
  • the dynamic cover setting method may include steps S200-S202, wherein:
  • Step S200: determine the target video segment from the video file.
  • The video file may be a video manuscript in any of various video formats, for example: the AVI (Audio Video Interleaved) format, H.264/AVC (Advanced Video Coding), the H.265/HEVC (High Efficiency Video Coding) format, etc.
  • The target video clip may be a highlight video clip in the video file.
  • In this embodiment, whether a video clip is a highlight can be judged from how actively a large audience comments on it, by artificial intelligence (e.g., a trained neural network model), or by other methods.
  • Step S202: extract the target video clip, and obtain a dynamic cover image of the video file according to the target video clip.
  • Once the target video clip is determined, the computer device 1300 may automatically trim the video file to obtain the target video clip and use it as material for producing the dynamic cover image.
  • First: the target video clip can be set directly as the dynamic cover image.
  • Second: the target video clip can be processed, and the processed video content used as the dynamic cover image. As an example, the processing may include adding video rendering effects (such as 2D stickers), compositing selected highlight frames, and the like.
  • Third: when the target video segment includes multiple sub-clips from different time segments, the sub-clips need to be combined: either composite the multiple sub-clips, or select one or more of them and composite those, or extract multiple key frames from the sub-clips and composite the key frames; the composite clip thus obtained is used as the dynamic cover image (a trimming and concatenation sketch follows this list).
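  • As a concrete illustration of the third way, below is a minimal Python sketch that trims sub-clips and concatenates them with the ffmpeg command-line tool; the file names and segment times are hypothetical, and error handling is omitted.

```python
import subprocess

def cut_clip(src, start_s, dur_s, dst):
    # Trim one sub-clip without re-encoding (stream copy).
    subprocess.run(["ffmpeg", "-y", "-ss", str(start_s), "-t", str(dur_s),
                    "-i", src, "-c", "copy", dst], check=True)

def concat_clips(clips, dst, list_file="clips.txt"):
    # Concatenate the sub-clips with ffmpeg's concat demuxer.
    with open(list_file, "w") as f:
        f.writelines(f"file '{c}'\n" for c in clips)
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                    "-i", list_file, "-c", "copy", dst], check=True)

# Hypothetical usage: two highlight sub-clips become one dynamic cover clip.
cut_clip("video.mp4", start_s=60, dur_s=5, dst="part1.mp4")
cut_clip("video.mp4", start_s=300, dur_s=5, dst="part2.mp4")
concat_clips(["part1.mp4", "part2.mp4"], dst="dynamic_cover.mp4")
```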
  • The dynamic cover image setting method provided by this embodiment can extract the key or highlight clip of the video file (that is, the target video clip) and obtain a dynamic cover image from it, which has the following advantages:
  • First: because the cover is dynamic, its animated display produces a good, visually rich effect, improves the viewing experience and interest, attracts other users' attention, and increases the video file's click-through rate.
  • Second: because the target video clip comes from the video file itself, the dynamic cover image is strongly correlated with the video file, which optimizes the user's experience when browsing and selecting videos and prevents users from mistakenly clicking on a cover that does not match the video content and watching video content that does not meet their expectations, thereby avoiding wasted data traffic.
  • Several solutions for implementing step S200 are provided below:
  • Method 1: search the video file for a highlight video clip (i.e., the target video clip) based on bullet screens.
  • In an exemplary embodiment, the step of determining the target video segment from the video file may include steps S300 to S306, wherein: step S300, acquire multiple bullet screens of the video file, each bullet screen associated with a time point on the timeline of the video file; step S302, obtain the bullet screen density distribution on the timeline according to the time point associated with each bullet screen; step S304, filter out, according to the bullet screen density distribution, one or more video clips with the highest bullet screen density in the video file; and step S306, determine the one or more video clips, or the one or more video clips carrying bullet screens, as the target video clip.
  • The timeline can be represented by a progress bar.
  • The applicant found through research that the time interval in which bullet screens are sent in a concentrated manner generally corresponds to a key node of the corresponding video file, and this key node generally corresponds to a highlight clip, a key clip, or a clip that easily attracts the attention of a large number of users.
  • By analyzing the bullet screen density distribution, this embodiment can accurately find a target video segment that effectively attracts attention.
  • In addition, when the video clips carrying their bullet screens are used as the target video clip, the information richness and user experience of the cover can be further improved.
  • In an exemplary embodiment, step S300 may include steps S400 to S402, wherein: in step S400, all bullet screens of the video file are acquired; in step S402, multiple invalid bullet screens are filtered out of all the bullet screens according to the bullet screen content of each one, to obtain the multiple bullet screens; the invalid bullet screens include bullet screens whose content is unrelated to the video content of the video file and/or bullet screens whose content is unrelated to the video picture of the video file.
  • This embodiment can improve the efficiency and accuracy of filtering the one or more video segments based on the bullet screen density distribution.
  • For example: "恰饭时间" ("mealtime", i.e., sponsored content) indicates an advertising moment, and "111" or "222" indicate bullet screen interactions with the uploader (the "up master", i.e., the content provider), etc.
  • Method 2: search the video file for a highlight video segment (i.e., the target video segment) based on quality scores.
  • In an exemplary embodiment, determining the target video segment from the video file in step S200 may include steps S600 to S604, wherein: step S600, divide the video file into M video segments, where M is a positive integer greater than 1; step S602, perform a quality score on each video segment; and step S604, determine the target video segment from the M video segments according to the quality score of each one.
  • The quality score of each video clip can be obtained in various ways, for example:
  • (1) Non-artificial-intelligence methods, such as weight-based evaluation:
  • Example 1: take evaluation dimensions such as the number of bullet screens associated with video clip A, the form of the bullet screens, and the types of the bullet screen senders; assign a weight coefficient to each evaluation dimension; and obtain the quality score of video clip A by weighted calculation.
  • Example 2: collect the following information for the time interval corresponding to video clip A: progress bar drag events (for example, drag-in events that drag the progress bar into the time interval, and drag-out events that drag it out of the interval), etc.; configure a positive weight coefficient for drag-in events and a negative weight coefficient for drag-out events; and multiply the number of drag-in events and the number of drag-out events by their respective weight coefficients to obtain the quality score of video clip A.
  • (2) Artificial intelligence methods: the inventor found that the excitement level or density of bullet screens is usually highly correlated with highlight video content in the same time interval; therefore, to a certain extent, the computer device 1300 can determine the target video segment according to the quality of the bullet screens or of the video segment itself.
  • In an exemplary embodiment, step S602 may also be implemented by the following step: step S700, perform a quality score on each video clip according to the bullet screen feature information of the clip and/or the frame feature information of each frame in the clip, where the bullet screen feature information includes the bullet screen density.
  • Of course, the bullet screen feature information may also include bullet screen content features and the like.
  • In this embodiment, the highlight video clip (i.e., the target video clip) in the video file can be located more accurately through the bullet screen information in each video clip, the frame feature information of each frame, or a combination of the two.
  • In an exemplary embodiment, in order to accurately locate the highlight video clip (i.e., the target video clip) in the video file, step S602 can be implemented by the following steps: step S800, extract the frame feature information of each frame in the i-th video segment, where 1 ≤ i ≤ M and i is a positive integer; and step S802, perform a quality score on the i-th video segment according to the frame feature information of each frame in it.
  • As an example, the computer device 1300 performs the following operations: extract the frame feature information (e.g., a feature vector) of each frame through a convolutional neural network or the like; and input the frame feature information of each frame into a trained quality scoring model, which outputs the quality score of the i-th video clip.
  • The quality scoring model may be a model obtained with various algorithms, for example based on the LSTM algorithm.
  • The inventor found that when an uploader (up master) uploads a video file, a representative static cover image is usually selected for it. Therefore, the computer device 1300 can, to a certain extent, determine the target video segment with reference to the static cover image.
  • In an exemplary embodiment, step S702 can also be implemented by the following step: step S900, perform a quality score on the i-th video clip according to picture feature information and the frame feature information of each frame, where the picture feature information is the feature information of a target static picture, and the target static picture includes the static cover picture of the video file.
  • In this way, the highlight video clip (i.e., the target video clip) in the video file can be located more precisely.
  • Step S900 may be implemented by various artificial intelligence models or a combination of artificial intelligence models.
  • In an exemplary embodiment, step S900 may be implemented by the following steps: step S1000, input the frame feature information of each frame into the LSTM model in sequence according to the time order of the M frames, to obtain M output vectors through the LSTM model, the M output vectors corresponding one-to-one to the M frames; step S1002, perform convolution and pooling operations on the vector matrix formed by the M output vectors to obtain a first feature vector; step S1004, obtain a second feature vector according to the picture feature information; step S1006, concatenate the first feature vector and the second feature vector to obtain a concatenated vector; and step S1008, perform a linear regression operation on the concatenated vector to obtain the quality score corresponding to the i-th video segment.
  • In this way, the long-term dependencies captured by the LSTM model can be exploited to learn the relationships between frames; combined with the feature information of the static cover image, the accuracy of determining the target video segment can be improved.
  • As an example, a convolution operation is performed on each frame (X_1, X_2, ..., X_M) in the i-th video segment through a CNN model, to obtain M feature vectors (that is, the frame feature information).
  • In this embodiment, the CNN model may include 256 convolution kernels.
  • Taking frame X_1 as an example, the 256 convolution kernels each perform a convolution operation on frame X_1 to generate the feature vector x_1 corresponding to frame X_1.
  • The feature vector x_1 is a one-dimensional vector containing 256 parameters, each of which is the convolution result obtained by one of the convolution kernels performing a convolution operation on frame X_1.
  • In this way, M feature vectors, i.e., x_1, x_2, ..., x_M, can be obtained through the CNN model.
  • In the LSTM model, each gate at time t operates on [x_t, h_{t-1}], the concatenation of the current input x_t and the previous hidden state h_{t-1}; σ is the sigmoid nonlinearity, σ(x) = 1/(1 + e^(-x)), and tanh is the hyperbolic tangent activation function.
  • Forget gate: f_t = σ(W_f · [x_t, h_{t-1}] + b_f). Here f_t ∈ [0, 1] decides whether the information C_{t-1} learned at time t-1 passes fully or partially; it represents the selection weight of the node at time t over the cell memory at time t-1. W_f is the weight matrix of the forget gate and b_f is its bias term.
  • Input gate: i_t = σ(W_i · [x_t, h_{t-1}] + b_i). Here i_t ∈ [0, 1] represents the selection weight of the node at time t over the current node information and decides which information should be kept. W_i is the weight matrix of the input gate and b_i is its bias term.
  • Candidate state: q_t = tanh(W_q · [x_t, h_{t-1}] + b_q), the new candidate value vector used to update the cell state. W_q is the weight matrix of the information to be updated and b_q is its bias term.
  • Output gate: o_t = σ(W_o · [x_t, h_{t-1}] + b_o). Here o_t is one of the output vectors at time t. W_o is the weight matrix of the output gate and b_o is its bias term.
  • Cell state update: C_t = f_t * C_{t-1} + i_t * q_t, where C_{t-1} is the previous cell state information, f_t * C_{t-1} represents the information to be deleted, and i_t * q_t represents the newly added information; C_t is the updated current cell state information.
  • Hidden state: h_t = o_t * tanh(C_t), the other output vector (the hidden state vector) at time t.
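  • For concreteness, below is a minimal NumPy sketch of a single LSTM time step implementing exactly the gate equations above; the 256-dimensional input matches the CNN features described here, while the hidden size and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step following the gate equations above.
    W maps gate name -> weight matrix (hidden, input + hidden);
    b maps gate name -> bias vector (hidden,)."""
    z = np.concatenate([x_t, h_prev])      # [x_t, h_{t-1}]
    f_t = sigmoid(W["f"] @ z + b["f"])     # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])     # input gate
    q_t = np.tanh(W["q"] @ z + b["q"])     # candidate cell state
    o_t = sigmoid(W["o"] @ z + b["o"])     # output gate
    C_t = f_t * C_prev + i_t * q_t         # drop old info, add new info
    h_t = o_t * np.tanh(C_t)               # hidden state / output vector
    return h_t, C_t

# Toy usage: 256-dim frame features, 128-dim hidden state (illustrative sizes).
rng = np.random.default_rng(0)
n_in, n_hid = 256, 128
W = {g: rng.normal(0.0, 0.1, (n_hid, n_in + n_hid)) for g in "fiqo"}
b = {g: np.zeros(n_hid) for g in "fiqo"}
h, C = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(10, n_in)):    # ten frames' feature vectors
    h, C = lstm_step(x_t, h, C, W, b)
```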
  • After the M output vectors are obtained, they form a vector matrix (an M x 256 matrix), on which the following operations are performed in sequence: Conv1d (one-dimensional convolution), Max Pool (pooling, taking the maximum value of each block), Conv1d (one-dimensional convolution), and AVE Pool (pooling, taking the average value of each block), to obtain the first feature vector.
  • As an example, when the quality score of the i-th video clip is above 0.85, the i-th video clip is considered to be a highlight clip.
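  • The following PyTorch sketch shows one possible shape of the whole scoring pipeline described above: per-frame features go through an LSTM, the output matrix through Conv1d with max pooling and then Conv1d with average pooling to give the first feature vector, which is concatenated with the static cover picture's feature vector and regressed to a score. The kernel sizes, hidden sizes, and the sigmoid used to keep the score in [0, 1] are illustrative assumptions; the patent fixes the sequence of operations but not a concrete architecture.

```python
import torch
import torch.nn as nn

class ClipQualityScorer(nn.Module):
    """Sketch: frame features -> LSTM -> Conv1d + MaxPool -> Conv1d + AvgPool
    -> concatenate with cover-picture features -> regression to a score."""

    def __init__(self, feat_dim=256, hidden_dim=256, pic_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.conv1 = nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1)
        self.pool1 = nn.MaxPool1d(kernel_size=2)
        self.conv2 = nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1)
        self.pool2 = nn.AdaptiveAvgPool1d(1)   # average over remaining positions
        self.regress = nn.Linear(hidden_dim + pic_dim, 1)

    def forward(self, frame_feats, pic_feat):
        # frame_feats: (batch, M, feat_dim), frame feature vectors in time order
        # pic_feat:    (batch, pic_dim), features of the static cover picture
        out, _ = self.lstm(frame_feats)              # (batch, M, hidden_dim)
        x = out.transpose(1, 2)                      # (batch, hidden_dim, M)
        x = self.pool1(torch.relu(self.conv1(x)))    # Conv1d + MaxPool
        x = self.pool2(torch.relu(self.conv2(x)))    # Conv1d + AvgPool
        first_vec = x.squeeze(-1)                    # the "first feature vector"
        joint = torch.cat([first_vec, pic_feat], dim=1)   # concatenated vector
        return torch.sigmoid(self.regress(joint)).squeeze(-1)

# Toy usage: one clip of M = 16 frames with 256-dim features (sizes illustrative).
scorer = ClipQualityScorer()
score = scorer(torch.randn(1, 16, 256), torch.randn(1, 256))
is_highlight = score.item() > 0.85               # threshold named in the text
```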
  • the dynamic cover setting system may be divided into one or more program modules, and one or more program modules are stored in a storage medium, and executed by one or more processors to complete the embodiments of the present application.
  • the program modules referred to in the embodiments of the present application refer to a series of computer-readable instruction segments capable of performing specific functions. The following description will specifically introduce the functions of each program module in the embodiments of the present application.
  • the dynamic cover setting system 1200 may include a determination module 1210 and a setting module 1220, wherein:
  • the determining module 1210 is configured to determine the target video segment from the video file.
  • the setting module 1220 is configured to extract the target video clip, and obtain the dynamic cover image of the video file according to the target video clip.
  • The determination module 1210 is further configured to: acquire multiple bullet screens of the video file, each bullet screen associated with a time point on the timeline of the video file; obtain the bullet screen density distribution on the timeline according to the time point associated with each bullet screen; filter out, according to the bullet screen density distribution, one or more video clips with the highest bullet screen density in the video file; and determine the one or more video clips, or the one or more video clips carrying bullet screens, as the target video clip.
  • The determination module 1210 is further configured to: acquire all bullet screens of the video file; and filter multiple invalid bullet screens out of all the bullet screens according to the bullet screen content of each one, to obtain the multiple bullet screens; the invalid bullet screens include bullet screens whose content is unrelated to the video content of the video file and/or bullet screens whose content is unrelated to the video picture of the video file.
  • The determination module 1210 is further configured to: divide the video file into M video clips, where M is a positive integer greater than 1; perform a quality score on each video clip; and determine the target video clip from the M video clips according to the quality score of each one.
  • The determination module 1210 is further configured to: perform a quality score on each video clip according to the bullet screen feature information of the clip and/or the frame feature information of each frame in the clip, where the bullet screen feature information includes the bullet screen density.
  • The determination module 1210 is further configured to: extract the frame feature information of each frame in the i-th video clip, where 1 ≤ i ≤ M and i is a positive integer; and perform a quality score on the i-th video clip according to the frame feature information of each frame in it.
  • The determination module 1210 is further configured to: perform a quality score on the i-th video clip according to picture feature information and the frame feature information of each frame, where the picture feature information is the feature information of a target static picture, and the target static picture includes the static cover picture of the video file.
  • The determination module 1210 is further configured to: input the frame feature information of each frame into the LSTM model in sequence according to the time order of the M frames, to obtain M output vectors through the LSTM model, the M output vectors corresponding one-to-one to the M frames; perform convolution and pooling operations on the vector matrix formed by the M output vectors to obtain a first feature vector; obtain a second feature vector according to the picture feature information; concatenate the first feature vector and the second feature vector to obtain a concatenated vector; and perform a linear regression operation on the concatenated vector to obtain the quality score corresponding to the i-th video clip.
  • FIG. 13 schematically shows a hardware architecture of a computer device 1300 suitable for implementing a dynamic cover setting method according to Embodiment 3 of the present application.
  • the computer device 1300 is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions.
  • It can be a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server, or cabinet server (including an independent server or a server cluster composed of multiple servers), etc.
  • The computer device 1300 at least includes, but is not limited to: a memory 1310, a processor 1320, and a network interface 1330, which can communicate with each other through a system bus, wherein:
  • The memory 1310 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 1310 may be an internal storage module of the computer device 1300 , such as a hard disk or memory of the computer device 1300 .
  • In other embodiments, the memory 1310 may also be an external storage device of the computer device 1300, such as a pluggable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like equipped on the computer device 1300.
  • the memory 1310 may also include both an internal storage module of the computer device 1300 and an external storage device thereof.
  • the memory 1310 is generally used to store the operating system and various application software installed in the computer device 1300 , such as program codes of the dynamic cover setting method and the like.
  • the memory 1310 may also be used to temporarily store various types of data that have been output or will be output.
  • the processor 1320 may be a central processing unit (Central Processing Unit, CPU for short), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 1320 is generally used to control the overall operation of the computer device 1300 , such as performing control and processing related to data interaction or communication with the computer device 1300 .
  • The processor 1320 is configured to execute the program code or process the data stored in the memory 1310.
  • the network interface 1330 which may include a wireless network interface or a wired network interface, is typically used to establish a communication link between the computer device 1300 and other computer devices.
  • the network interface 1330 is used to connect the computer device 1300 with an external terminal through a network, and establish a data transmission channel and a communication link between the computer device 1300 and the external terminal.
  • The network can be an intranet, the Internet, a Global System for Mobile communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • FIG. 13 only shows a computer device having components 1310-1330, but it should be understood that implementation of all of the shown components is not required, and that more or fewer components may be implemented instead.
  • The dynamic cover setting method stored in the memory 1310 can also be divided into one or more program modules and executed by one or more processors (the processor 1320 in this embodiment) to implement the embodiments of the present application.
  • Embodiments of the present application further provide a computer-readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, the following steps are implemented: determining a target video segment from a video file; and extracting the target video segment and obtaining a dynamic cover image of the video file according to the target video segment.
  • the computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the computer-readable storage medium may be an internal storage unit of a computer device, such as a hard disk or memory of the computer device.
  • In other embodiments, the computer-readable storage medium may also be an external storage device of a computer device, such as a pluggable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like equipped on the computer device.
  • the computer-readable storage medium may also include both the internal storage unit of the computer device and the external storage device thereof.
  • the computer-readable storage medium is generally used to store the operating system and various application software installed on the computer device, for example, the program code of the dynamic cover setting method in the embodiment.
  • the computer-readable storage medium can also be used to temporarily store various types of data that have been output or will be output.
  • Each module or step of the above embodiments of the present application can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps can be performed in an order different from that illustrated or described, or they can be made separately into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. As such, the embodiments of the present application are not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiments of the present application provide a dynamic cover setting method, the method including: determining a target video segment from a video file; and extracting the target video segment and obtaining a dynamic cover image of the video file according to the target video segment. The embodiments of the present application have the following advantages. First: because the cover is a dynamic cover image, its animated display produces a good visualization effect and appears visually rich and colorful, improving the viewing experience and interest, attracting the attention of other users, and increasing the click-through rate of the video file. Second: because the target video segment comes from the video file itself, the dynamic cover image is strongly correlated with the video file, which optimizes the user's experience when browsing and selecting videos and prevents users from mistakenly clicking, due to a cover that does not match the video content, and then watching video content that does not meet their expectations, thereby avoiding wasted data traffic.

Description

Dynamic cover setting method and system
This application claims the priority of the Chinese patent application No. 202110258999.X, entitled "Dynamic cover setting method and system" and filed on March 10, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a dynamic cover setting method, system, computer device, and computer-readable storage medium.
Background
With the development of multimedia technology, network platforms such as Bilibili have gradually developed a content production model in the form of UGC (User Generated Content). UGC encourages every user to present their own original content (e.g., video files) to other users through an internet platform. UGC makes everyone a potential content creator, so that massive numbers of videos can be produced quickly to enrich people's cultural life. However, the sheer volume of videos also means that each user's video file is easily drowned among them. Therefore, when publishing a video file, a user usually sets a video cover for it, so that other users can learn the content of the video file more intuitively, thereby increasing clicks.
Summary
The purpose of the embodiments of the present application is to provide a dynamic cover setting method, system, computer device, and computer-readable storage medium that solve the following problems in the technology known to the inventor: a poor cover experience and a low click-through rate.
An aspect of the embodiments of the present application provides a dynamic cover setting method, the method including: determining a target video segment from a video file; and extracting the target video segment and obtaining a dynamic cover image of the video file according to the target video segment.
Optionally, determining the target video segment from the video file includes: acquiring multiple bullet screens of the video file, each bullet screen associated with a time point on the timeline of the video file; obtaining the bullet screen density distribution on the timeline according to the time point associated with each bullet screen; filtering out, according to the bullet screen density distribution, one or more video clips with the highest bullet screen density in the video file; and determining the one or more video clips, or the one or more video clips carrying bullet screens, as the target video segment.
Optionally, acquiring multiple bullet screens of the video file includes: acquiring all bullet screens of the video file; and filtering multiple invalid bullet screens out of all the bullet screens according to the bullet screen content of each one, to obtain the multiple bullet screens; the invalid bullet screens include bullet screens whose content is unrelated to the video content of the video file and/or bullet screens whose content is unrelated to the video picture of the video file.
Optionally, determining a target video segment from a video file includes: dividing the video file into M video clips, where M is a positive integer greater than 1; performing a quality score on each video clip; and determining the target video segment from the M video clips according to the quality score of each one.
Optionally, performing a quality score on each video clip includes: scoring each video clip according to the bullet screen feature information of the clip and/or the frame feature information of each frame in the clip, where the bullet screen feature information includes the bullet screen density.
Optionally, performing a quality score on each video clip includes: extracting the frame feature information of each frame in the i-th video clip, where 1 ≤ i ≤ M and i is a positive integer; and performing a quality score on the i-th video clip according to the frame feature information of each frame in it.
Optionally, performing a quality score on the i-th video clip according to the frame feature information of each frame in it includes: scoring the i-th video clip according to picture feature information and the frame feature information of each frame, where the picture feature information is the feature information of a target static picture, and the target static picture includes the static cover picture of the video file.
Optionally, performing a quality score on the i-th video clip according to the picture feature information and the frame feature information of each frame includes: inputting the frame feature information of each frame into an LSTM model in sequence according to the time order of the M frames, to obtain M output vectors through the LSTM model, the M output vectors corresponding one-to-one to the M frames; performing convolution and pooling operations on the vector matrix formed by the M output vectors to obtain a first feature vector; obtaining a second feature vector according to the picture feature information; concatenating the first feature vector and the second feature vector to obtain a concatenated vector; and performing a linear regression operation on the concatenated vector to obtain the quality score corresponding to the i-th video clip.
Another aspect of the embodiments of the present application provides a dynamic cover setting system, including: a determination module for determining a target video segment from a video file; and a setting module for extracting the target video segment and obtaining a dynamic cover image of the video file according to the target video segment.
Another aspect of the embodiments of the present application provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the following steps are implemented:
determining a target video segment from a video file; and
extracting the target video segment, and obtaining a dynamic cover image of the video file according to the target video segment.
Another aspect of the embodiments of the present application provides a computer-readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, the following steps are implemented:
determining a target video segment from a video file; and
extracting the target video segment, and obtaining a dynamic cover image of the video file according to the target video segment.
The dynamic cover setting method, system, device, and computer-readable storage medium provided by the embodiments of the present application can extract the key or highlight clip of a video file (i.e., the target video segment) and obtain a dynamic cover image according to that segment, which has the following advantages:
First: because the cover is a dynamic cover image, its animated display produces a good visualization effect and appears visually rich and colorful, improving the viewing experience and interest, attracting other users' attention, and increasing the click-through rate of the video file.
Second: because the target video segment comes from the video file itself, the dynamic cover image is strongly correlated with the video file, which optimizes the user's experience when browsing and selecting videos and prevents users from mistakenly clicking, due to a cover that does not match the video content, and then watching video content that does not meet their expectations, thereby avoiding wasted data traffic.
Brief Description of the Drawings
FIG. 1 schematically shows an application environment diagram of a dynamic cover setting method according to an embodiment of the present application;
FIG. 2 schematically shows a flowchart of a dynamic cover setting method according to Embodiment 1 of the present application;
FIG. 3 is a flowchart of the sub-steps of step S200 in FIG. 2;
FIG. 4 is another flowchart of the sub-steps of step S300 in FIG. 3;
FIG. 5 is an example diagram of bullet screen filtering;
FIG. 6 is another flowchart of the sub-steps of step S200 in FIG. 2;
FIG. 7 is a flowchart of the sub-steps of step S602 in FIG. 6;
FIG. 8 is another flowchart of the sub-steps of step S602 in FIG. 6;
FIG. 9 is another flowchart of the sub-steps of step S702 in FIG. 7;
FIG. 10 is a flowchart of the sub-steps of step S900 in FIG. 9;
FIG. 11 is an example diagram of identifying a target video clip by artificial intelligence;
FIG. 12 schematically shows a block diagram of a dynamic cover setting system according to Embodiment 2 of the present application; and
FIG. 13 schematically shows a hardware architecture of a computer device suitable for implementing a dynamic cover setting method according to Embodiment 3 of the present application.
Detailed Description
In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
It should be noted that descriptions involving "first", "second", etc. in the embodiments of the present application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, a feature qualified with "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments can be combined with one another, but only on the basis that a person of ordinary skill in the art can realize the combination; when a combination of technical solutions is contradictory or cannot be realized, it should be considered that the combination does not exist and is not within the protection scope claimed by the present application.
In the technology known to the inventor, video covers have the following defects:
First: they are all statically displayed, so the visualization effect is poor and visually monotonous and dull;
Second: the video cover does not match the video content, which often occurs in submissions published by clickbait cover and title creators;
The above defects waste the viewer's time and reduce the video viewing experience, which may result in a lower click-through rate for some video content.
The present application provides a number of embodiments to address the above defects; see below for details.
In the description of the present application, it should be understood that the numeric labels before steps do not indicate the order in which the steps are performed; they are only used to facilitate the description of the present application and to distinguish the steps, and therefore cannot be construed as limiting the present application.
The following are explanations of terms used in this application:
LSTM (Long Short-Term Memory): a type of recurrent neural network that controls the flow and loss of features by introducing a gate mechanism, learning long-term dependencies.
Density distribution: also called probability density distribution; probability refers to the chance that an event occurs randomly. For example, for a uniform distribution function, the density equals the probability of an interval (the range of the event's values) divided by the length of the interval.
Dynamic cover image: a video clip containing multiple frames.
Bullet screen: a subtitle that pops up and moves in a predetermined direction while a video is being watched over the Internet. There is no fixed English word for it; it is usually called: comment, danmaku, barrage, bullet screen, bullet-screen comment, etc. Bullet screens allow viewers to post comments or impressions, but unlike ordinary video-sharing sites, which display comments only in a dedicated comment area under the player, they appear on the video picture in real time as sliding subtitles, ensuring that all viewers can notice them. Some bullet screen systems use scripting languages to provide specific bullet screen behaviors, such as making a bullet screen appear or disappear at a specific position, controlling its pop-up speed, its position, etc. In addition, bullet screens fixed at the bottom or top of the picture can also be used as subtitles for videos without subtitles.
For example, each bullet screen can include the following information:
[Table of bullet screen fields, rendered in the original publication as images PCTCN2022072819-appb-000001 and PCTCN2022072819-appb-000002.]
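The field table itself is published only as an image. As a rough, hypothetical sketch of the kind of record the surrounding text implies (a bullet screen id, an associated time point on the timeline, the comment content, and sender information are all referenced elsewhere in this description), a bullet screen record might look like the following; the exact fields and names are assumptions, not the patent's schema.

```python
from dataclasses import dataclass

@dataclass
class BulletScreen:
    """Hypothetical bullet screen record; fields inferred from this description."""
    danmaku_id: str        # bullet screen id (used to look up the sender)
    video_time_s: float    # associated time point on the video's timeline
    content: str           # the comment text shown on screen
    sender_id: str         # id of the user who sent it
    sender_level: int = 0  # sender's user level (used in one scoring example)
```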
图1示意性示出了根据本申请实施例的环境应用示意图。如图1所示:
提供商网络2可以通过网络4连接多个移动终端6。提供商网络2可以提供内容服务。
内容服务可以包括诸如互联网协议视频流服务之类的内容流服务。内容流服务可以被配置为经由各种传输技术来分发内容。内容服务可以被配置为提供诸如视频,音频,文本数据,其组合等的内容。内容可以包括内容流(例如,视频流,音频流,信息流),内容文件(例如,视频文件,音频文件,文本文件)和/或其他数据。
提供商网络2可以实现弹幕服务,该弹幕服务被配置为允许用户评论和/或共享与内容相关联的评论,即弹幕。弹幕以与内容一起呈现在同一屏幕上。例如,弹幕可以在内容上方的覆盖图中显示。弹幕在显示时可能会带有动画效果。例如,弹幕可以滚动(例如,从右到左,从左到右,从上到下,从下到上),这种动画效果可以基于CSS3(cascading style sheets,层叠样式表)的transition属性实现的。
提供商网络2可以位于诸如单个场所之类的数据中心,或者分布在不同的地理位置(例如,在多个场所)中。提供商网络2可以经由一个或多个网络4提供服务。网络4包括各种网络设备,例如路由器,交换机,多路复用器,集线器,调制解调器,网桥,中继器,防火墙,代理设备和/或类似。网络4可以包括物理链路,例如同轴电缆链路,双绞线电缆链路,光纤链路,其组合等。网络4可以包括无线链路,诸如蜂窝链路,卫星链路,Wi-Fi链路等。
The provider network 2 may be configured to receive a plurality of messages. The plurality of messages may include a plurality of bullet screens associated with the content.
The provider network 2 may be configured to manage the messages for various content items. Users may browse content and access different content items to view comments for specific content, such as comments published by other users for that specific content. Comments from users associated with a particular content item may be output to other users viewing that content item. For example, all users accessing a content item (e.g., a video clip) can view the comments associated with that content item. Entered comment content may be output in real time or near real time.
The provider network 2 may be configured to process the plurality of messages with various processing operations such as message storage, message filtering, and message pushing. Message storage is used to store the messages in a data store such as a database. Message filtering may include rejecting or flagging messages that match filtering criteria. The filtering criteria may specify terms and/or phrases, such as profanity, hate speech, or indecent language; they may specify characters, such as symbols or fonts; and they may specify languages, computer-readable code patterns, and so on.
The provider network 2 may perform natural language processing, topic identification, pattern recognition, artificial intelligence, and the like to automatically determine characteristics of the messages and/or group the messages. As an example, frequently occurring phrases or patterns may be identified as topics. As another example, a database of topics associated with the content may be maintained. Topics may include genres (e.g., action, drama, comedy), personalities (e.g., actors, actresses, directors), languages, and so on. Messages may be grouped based on characteristics of the client device and/or the user who sent them. Demographics, interests, history, and/or the like may be stored for multiple users to determine potential groupings of the messages. In other embodiments, the provider network 2 may also identify highlight segments, frames, and the like in video files based on artificial intelligence.
The provider network 2 may be implemented by one or more computing nodes. The one or more computing nodes may include virtualized compute instances. A virtualized compute instance may include a virtual machine, such as an emulation of a computer system, an operating system, a server, and so on. A computing node may load a virtual machine based on a virtual image and/or other data defining the specific software for the emulation (e.g., operating system, dedicated applications, server). Different virtual machines may be loaded and/or terminated on the one or more computing nodes as the demand for different types of processing services changes. A hypervisor may be implemented to manage the use of different virtual machines on the same computing node.
The plurality of mobile terminals 6 may be configured to access the content and services of the provider network 2. The plurality of mobile terminals 6 may include any type of electronic device, such as mobile devices, tablet devices, laptop computers, workstations, virtual reality devices, gaming devices, set-top boxes, digital streaming devices, vehicle terminals, smart TVs, and so on.
The plurality of mobile terminals 6 may output (e.g., display, render, present) content (video and the like) to users. In other embodiments, a mobile terminal 6 may also identify highlight segments and the like in video files based on artificial intelligence.
In an exemplary embodiment, the provider network 2 (or a mobile terminal 6) may extract a highlight segment of a video file and use the highlight segment as the video file's dynamic cover image, so as to improve the user experience, make the cover of the video file more engaging, attract the attention of other users, and increase the click-through rate of the video file.
In an exemplary embodiment, the provider network 2 may screen high-quality video files out of a massive number of video files, extract the highlight segments of the high-quality video files, and use the highlight segments as their dynamic cover images, improving the experience of users when browsing and selecting videos and increasing the click-through rate of the high-quality video files.
The dynamic cover image setting scheme is introduced below through multiple embodiments. The scheme can be implemented by a computer device 1300, and the computer device 1300 may be the provider network 2 or one of its computing nodes, or a mobile terminal 6.
Embodiment One
FIG. 2 schematically shows a flowchart of the dynamic cover setting method according to Embodiment One of the present application.
As shown in FIG. 2, the dynamic cover setting method may include steps S200 to S202, in which:
Step S200: determine a target video segment from a video file.
The video file may be a video submission in any of various video formats, for example: the AVI (Audio Video Interleaved) format, H.264/AVC (Advanced Video Coding), H.265/HEVC (High Efficiency Video Coding), and so on.
The target video segment may be a highlight video segment within the video file.
In this embodiment, whether a video segment is a highlight segment can be judged by how actively a large audience comments on it, by artificial intelligence (e.g., a trained neural network model), or by other means.
Step S202: extract the target video segment, and obtain a dynamic cover image of the video file according to the target video segment.
Once the target video segment has been determined, the computer device 1300 can automatically crop the video file to obtain the target video segment and use the target video segment as the material for producing the dynamic cover image.
First: the target video segment may be set directly as the dynamic cover image.
Second: the target video segment may be processed, and the processed video content used as the dynamic cover image. As an example, the processing may include adding video rendering effects (such as two-dimensional stickers), compositing some of the highlight frames, and so on.
Third: when the target video segment includes multiple sub-segments from different time intervals, the multiple sub-segments need to be composited; or one or more sub-segments are selected from them and the selected sub-segments are composited; or multiple key frames are extracted from the sub-segments and composited; the video segment obtained from the compositing is then used as the dynamic cover image.
Several ways of obtaining the dynamic cover image are listed above; it should be understood that they are not intended to limit the scope of protection of the present application.
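For illustration only, the cropping in step S202 could be sketched as follows with ffmpeg driven from Python; the file names, time values, and the stream-copy option are assumptions, not part of the disclosure:

```python
import subprocess

def extract_segment(src: str, start: float, duration: float,
                    dst: str = "dynamic_cover.mp4") -> str:
    """Cut [start, start + duration) out of src without re-encoding, so the
    resulting clip can serve as material for the dynamic cover image."""
    subprocess.run(
        ["ffmpeg", "-y",        # overwrite the output file if it exists
         "-ss", str(start),     # seek to the start of the target segment
         "-t", str(duration),   # keep only the segment duration
         "-i", src,
         "-c", "copy",          # stream copy: fast crop, no quality loss
         dst],
        check=True,
    )
    return dst
```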
The dynamic cover image setting method provided by the embodiments of the present application can extract a key or highlight segment of a video file (i.e., the target video segment) and derive the dynamic cover image from that target video segment, which brings the following advantages:
First: because the cover is a dynamic cover image, its animated display gives a good visual effect; it appears rich and varied, improves the viewing experience and its appeal, attracts the attention of other users, and increases the click-through rate of the video file.
Second: because the target video segment comes from the video file itself, the dynamic cover image is strongly correlated with the video file itself. This improves the experience of users when browsing and selecting videos, and prevents users from mistakenly clicking on a cover that does not match the video content and watching video that does not meet their expectations, thereby avoiding wasted data traffic.
Several schemes for implementing step S200 are provided below:
Approach 1:
Search for the highlight video segment in the video file (i.e., the target video segment) based on bullet screens.
In an exemplary embodiment, as shown in FIG. 3, the step of determining a target video segment from a video file may include steps S300 to S306, in which: step S300, obtain a plurality of bullet screens of the video file, each bullet screen being associated with a time point on the timeline of the video file; step S302, obtain the bullet screen density distribution on the timeline according to the time point on the timeline associated with each bullet screen; step S304, screen out, according to the bullet screen density distribution, the one or more video segments in the video file with the highest bullet screen density; and step S306, determine the one or more video segments, or the one or more video segments carrying bullet screens, as the target video segment. The timeline can be represented by a progress bar. The applicant found through research that the time intervals in which bullet screens are sent intensively generally correspond to key points of the video file, and such a key point generally corresponds to a highlight segment, a key segment, or a segment likely to draw the attention of many users. By analyzing the bullet screen density distribution, this embodiment can accurately locate a target video segment that effectively attracts attention. In addition, when a video segment carrying bullet screens is used as the target video segment, the information richness of the cover and the user experience can be further improved.
In an exemplary embodiment, as shown in FIG. 4, step S300 may include steps S400 to S402, in which: step S400, obtain all bullet screens of the video file; and step S402, filter out a plurality of invalid bullet screens from all the bullet screens according to the bullet screen content of each of them, to obtain the plurality of bullet screens; where the plurality of invalid bullet screens include bullet screens whose content is unrelated to the video content of the video file and/or bullet screens whose content is unrelated to the video frames of the video file. This embodiment improves the efficiency and accuracy of screening the one or more video segments based on the bullet screen density distribution.
For ease of understanding, an operational example is provided below with reference to FIG. 5:
① Obtain all bullet screens of video file A at the current moment.
② Perform a bullet screen filtering operation according to the content of each bullet screen.
For example: "恰饭时间" ("mealtime", i.e., sponsored content) marks an advertising moment, while "111" or "222" represent bullet screen interaction with the uploader (the content provider), and so on.
③ Analyze the bullet screen density distribution on the timeline, and select the several video segments with the highest bullet screen concentration according to the density distribution.
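A minimal sketch of steps ① to ③ might look as follows; the filtering patterns, the record fields, and the 5-second window are illustrative assumptions:

```python
import numpy as np

# Assumed patterns marking invalid bullet screens (step ②): advertising
# moments and interaction chatter with the uploader.
INVALID_PATTERNS = ("恰饭", "111", "222")

def filter_danmaku(danmaku):
    """Drop bullet screens unrelated to the video content or frames."""
    return [d for d in danmaku
            if not any(p in d["content"] for p in INVALID_PATTERNS)]

def densest_segments(danmaku, duration, window=5.0, top_k=3):
    """Histogram the remaining time points over the timeline (step ③) and
    return (start, end, count) for the top_k windows with the most bullet screens."""
    times = [d["time"] for d in filter_danmaku(danmaku)]
    bins = max(1, int(np.ceil(duration / window)))
    counts, edges = np.histogram(times, bins=bins, range=(0.0, duration))
    top = sorted(np.argsort(counts)[::-1][:top_k])
    return [(float(edges[i]), float(edges[i + 1]), int(counts[i])) for i in top]
```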
Approach 2:
Search for the highlight video segment in the video file (i.e., the target video segment) based on quality scores.
In an exemplary embodiment, as shown in FIG. 6, determining a target video segment from a video file in step S200 may include steps S600 to S604, in which: step S600, divide the video file into M video segments, M being a positive integer greater than 1; step S602, score the quality of each video segment; and step S604, determine the target video segment from the M video segments according to the quality score of each video segment. The quality scoring of each video segment can be implemented in many ways, for example:
(1) Non-AI approaches, such as weight-based evaluation:
Taking video segment A as an example, several means of scoring the quality of video segment A are introduced below merely by way of illustration:
Example 1: take evaluation dimensions such as the number of bullet screens associated with video segment A, the bullet screen form, and the type of bullet screen users; assign a weight coefficient to each evaluation dimension; and obtain the quality score of video segment A by weighted calculation.
For example: obtain the number of bullet screen senders whose user type is high-level user, and multiply the number of high-level users by a preset high weight coefficient. The point of this example is that, from the id of each bullet screen, it is determined whether its sender is a high-level user; the higher the proportion of high-level users among all bullet screen senders, the higher the quality score.
Example 2: obtain the following information for the time interval corresponding to video segment A: progress bar drag events (e.g., drag-in events that drag the progress bar into the time interval, and drag-out events that drag the progress bar out of the time interval); assign a positive weight coefficient to drag-in events and a negative weight coefficient to drag-out events; and multiply the number of drag-in events and the number of drag-out events by their respective weight coefficients to obtain the quality score of video segment A.
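As a sketch of the weight-based evaluation in Examples 1 and 2, a segment score can be computed as a weighted sum over such dimensions; the dimensions, field names, and weight values below are assumptions chosen for illustration:

```python
# Assumed evaluation dimensions and weights: high-level senders get a high
# positive weight (Example 1); drag-out events get a negative weight (Example 2).
WEIGHTS = {
    "danmaku_count": 0.2,
    "high_level_senders": 0.5,
    "seek_in_events": 0.4,
    "seek_out_events": -0.3,
}

def weighted_quality_score(stats: dict) -> float:
    """Weighted sum over the evaluation dimensions of one video segment."""
    return sum(weight * stats.get(key, 0) for key, weight in WEIGHTS.items())

score = weighted_quality_score({
    "danmaku_count": 120, "high_level_senders": 15,
    "seek_in_events": 30, "seek_out_events": 8,
})
```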
(2) AI-based approaches:
The inventor found that the excitement or density of the bullet screens is usually highly correlated with highlight video content in the same time interval, and may likewise be highly correlated with the content within that same time interval. Therefore, the computer device 1300 can, to a certain extent, determine the target video segment according to the quality of the bullet screens or of the video segments themselves.
In an exemplary embodiment, as shown in FIG. 7, step S602 can also be implemented by the following step: step S700, score the quality of each video segment according to the bullet screen feature information of each video segment and/or the frame feature information of each frame in each video segment, where the bullet screen feature information includes the bullet screen density. Of course, the bullet screen feature information may also include bullet screen content features and the like. In this embodiment, the bullet screen information in each video segment, the frame feature information of each frame, or a combination of the two makes it possible to search more accurately for the highlight video segment in the video file (i.e., the target video segment).
In an exemplary embodiment, in order to search accurately for the highlight video segment in the video file (i.e., the target video segment), as shown in FIG. 8, step S602 can be implemented by the following steps: step S800, extract the frame feature information of each frame in the i-th video segment, 1≤i≤M, i being a positive integer; and step S802, score the quality of the i-th video segment according to the frame feature information of each frame in the i-th video segment. As an example, the computer device 1300 performs the following operations: extract the frame feature information of each frame, such as a feature vector, through a convolutional neural network or the like; and input the frame feature information of each frame into a trained quality scoring model, which outputs the quality score of the i-th video segment. The quality scoring model can be a model based on any of various algorithms, for example the LSTM algorithm.
The inventor found that when uploaders upload a video file, they usually choose a representative static cover picture. Therefore, the computer device 1300 can, to a certain extent, refer to the static cover picture to determine the target video segment.
In an exemplary embodiment, as shown in FIG. 9, step S802 can also be implemented by the following step: step S900, score the quality of the i-th video segment according to picture feature information and the frame feature information of each frame, where the picture feature information is the feature information of a target static picture, and the target static picture includes the static cover picture of the video file. In this embodiment, introducing the static cover picture makes it possible to search more accurately for the highlight video segment in the video file (i.e., the target video segment).
Step S900 can be implemented by various artificial intelligence models or combinations of artificial intelligence models.
In an exemplary embodiment, as shown in FIG. 10, step S900 can be implemented by the following steps: step S1000, input the frame feature information of each frame into an LSTM model in the temporal order of the M frames, to obtain M output vectors through the LSTM model, the M output vectors corresponding to the M frames one-to-one; step S1002, perform convolution and pooling operations on the vector matrix formed by the M output vectors to obtain a first feature vector; step S1004, obtain a second feature vector according to the picture feature information; step S1006, concatenate the first feature vector with the second feature vector to obtain a concatenated vector; and step S1008, perform a linear regression operation on the concatenated vector to obtain the quality score corresponding to the i-th video segment. Because the LSTM model captures long-term dependencies, this embodiment can learn the relationships between the frames and, by combining them with the feature information of the static cover image, improve the accuracy of determining the target video segment.
For ease of understanding, an operational example is provided below with reference to FIG. 11:
① Use a CNN (Convolutional Neural Networks) model to perform convolution operations on each frame (X_1, X_2, ..., X_M) in the i-th video segment, to obtain M feature vectors (i.e., the frame feature information).
As an example, the CNN model may include 256 convolution kernels. Taking frame X_1 as an example, the 256 convolution kernels each perform a convolution operation on frame X_1, thereby generating a feature vector x_1 corresponding to frame X_1. The feature vector x_1 is a one-dimensional vector containing 256 parameters, each parameter being the convolution result obtained by one of the kernels operating on frame X_1. It follows that the CNN model yields M feature vectors, namely x_1, x_2, ..., x_M.
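One possible PyTorch reading of step ① is sketched below: a layer of 256 kernels whose response maps are each reduced to a single value, so that every frame yields a 256-dimensional feature vector. The kernel size, input resolution, and pooling choice are assumptions:

```python
import torch
import torch.nn as nn

# 256 convolution kernels; adaptive average pooling collapses each kernel's
# response map to one number, giving a single 256-d vector per frame.
frame_encoder = nn.Sequential(
    nn.Conv2d(3, 256, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),                       # -> (batch, 256)
)

frame = torch.randn(1, 3, 224, 224)     # one RGB frame, e.g. X_1
x_1 = frame_encoder(frame)              # feature vector with 256 parameters
```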
② Input the feature vectors x_1, x_2, ..., x_M into an LSTM model in temporal order; the LSTM model outputs M output vectors h_1, h_2, ..., h_M.
Taking the input vector x_t as an example, the working principle of the LSTM model is introduced:
Forget gate: f_t = σ(W_f · [x_t, h_{t-1}] + b_f)
Input gate:
i_t = σ(W_i · [x_t, h_{t-1}] + b_i)
q_t = tanh(W_q · [x_t, h_{t-1}] + b_q)
Output gate:
o_t = σ(W_o · [x_t, h_{t-1}] + b_o)
h_t = o_t * tanh(C_t)
Here, f_t determines whether the information C_{t-1} learned at time t-1 passes through fully or partially, with f_t ∈ [0, 1] representing the selection weight that the node at time t assigns to the cell memory of time t-1; W_f is the weight matrix of the forget gate, b_f is the bias term of the forget gate, h_{t-1} denotes the hidden-state information of node t-1, and the nonlinear function is σ(x) = 1/(1 + e^(-x)).
i_t represents the selection weight that the node at time t assigns to the current node's information and decides which information should be retained, with i_t ∈ [0, 1]; b_i is the bias term of the input gate and W_i is the weight matrix of the input gate.
q_t represents the new candidate value vector used to update the cell state; b_q is its bias term, W_q is the weight matrix of the information to be updated, and tanh is the hyperbolic tangent activation function.
o_t represents one of the output vectors at time t; b_o is the bias of the output gate, W_o is the weight matrix of the output gate, and [x_t, h_{t-1}] denotes the vector obtained by concatenating x_t and h_{t-1}.
h_t represents the other output vector at time t (the hidden state vector).
C_t is the updated current cell state, C_t = f_t * C_{t-1} + i_t * q_t, where C_{t-1} is the previous cell state; f_t * C_{t-1} represents the information to be discarded, and i_t * q_t represents the newly added information.
It should be noted that this embodiment can use any of various LSTM variants; the LSTM model above is only an example.
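The gate equations above can be written out directly; the NumPy sketch below implements exactly this recurrence (the dimensions of the weights are left to the caller):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev,
              W_f, b_f, W_i, b_i, W_q, b_q, W_o, b_o):
    """One LSTM step following the gate equations above; every weight
    matrix acts on the concatenation [x_t, h_{t-1}]."""
    z = np.concatenate([x_t, h_prev])
    f_t = sigmoid(W_f @ z + b_f)     # forget gate
    i_t = sigmoid(W_i @ z + b_i)     # input gate
    q_t = np.tanh(W_q @ z + b_q)     # candidate value vector
    o_t = sigmoid(W_o @ z + b_o)     # output gate
    c_t = f_t * c_prev + i_t * q_t   # C_t = f_t*C_{t-1} + i_t*q_t
    h_t = o_t * np.tanh(c_t)         # h_t = o_t * tanh(C_t)
    return h_t, c_t
```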
③ Form a vector matrix (an M×256 matrix) from the M output vectors h_1, h_2, ..., h_M, and apply to it, in sequence, conv1d (one-dimensional convolution) with max pooling (taking the maximum of each block), and then conv1d with average pooling (taking the mean of each block), to obtain the first feature vector.
④ Use another CNN model to extract features from the target static picture, obtaining a feature map corresponding to the target static picture (the picture feature information), and pass the picture feature information through two fully connected layers to obtain the second feature vector.
⑤ Concatenate the first feature vector with the second feature vector to obtain the concatenated vector.
⑥ Pass the concatenated vector through two fully connected layers for a linear computation, and obtain the quality score of the i-th video segment after Sigmoid processing, where the Sigmoid constrains the quality score to between 0 and 1.
When the quality score of the i-th video segment is 0.85 or above, the i-th video segment is considered a highlight video segment.
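Steps ① to ⑥ can be combined into a single scoring module. In the PyTorch sketch below, only the overall shape follows the example above (LSTM over the frame features, conv1d with max and average pooling, two fully connected branches, concatenation, and a Sigmoid-bounded score); all layer widths, kernel sizes, and pooling parameters are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentScorer(nn.Module):
    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.conv1 = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        # Two fully connected layers over the cover picture features (step ④).
        self.img_fc = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden))
        # Two fully connected layers over the concatenated vector (step ⑥).
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, frame_feats, img_feat):
        # frame_feats: (B, M, 256) CNN frame features in temporal order (step ②).
        out, _ = self.lstm(frame_feats)            # M output vectors h_1..h_M
        x = out.transpose(1, 2)                    # (B, hidden, M) for Conv1d
        x = F.max_pool1d(self.conv1(x), 2)         # conv1d + max pooling (step ③)
        x = F.avg_pool1d(self.conv2(x), 2)         # conv1d + average pooling
        first = x.mean(dim=2)                      # first feature vector
        second = self.img_fc(img_feat)             # second feature vector
        fused = torch.cat([first, second], dim=1)  # concatenated vector (step ⑤)
        return torch.sigmoid(self.head(fused)).squeeze(1)  # score in (0, 1)

scorer = SegmentScorer()
score = scorer(torch.randn(1, 30, 256), torch.randn(1, 256))
highlight = bool(score.item() >= 0.85)   # threshold used in the example above
```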
Embodiment Two
FIG. 12 schematically shows a block diagram of the dynamic cover setting system according to Embodiment Two of the present application. The dynamic cover setting system can be divided into one or more program modules, and the one or more program modules are stored in a storage medium and executed by one or more processors to implement the embodiments of the present application. A program module in the embodiments of the present application refers to a series of computer-readable instruction segments capable of performing a specific function; the following description specifically introduces the function of each program module in the embodiments of the present application.
As shown in FIG. 12, the dynamic cover setting system 1200 may include a determination module 1210 and a setting module 1220, in which:
the determination module 1210 is configured to determine a target video segment from a video file; and
the setting module 1220 is configured to extract the target video segment and obtain a dynamic cover image of the video file according to the target video segment.
Optionally, the determination module 1210 is further configured to: obtain a plurality of bullet screens of the video file, each bullet screen being associated with a time point on the timeline of the video file; obtain the bullet screen density distribution on the timeline according to the time point on the timeline associated with each bullet screen; screen out, according to the bullet screen density distribution, the one or more video segments in the video file with the highest bullet screen density; and determine the one or more video segments, or the one or more video segments carrying bullet screens, as the target video segment.
Optionally, the determination module 1210 is further configured to: obtain all bullet screens of the video file; and filter out a plurality of invalid bullet screens from all the bullet screens according to the bullet screen content of each of them, to obtain the plurality of bullet screens; where the plurality of invalid bullet screens include bullet screens whose content is unrelated to the video content of the video file and/or bullet screens whose content is unrelated to the video frames of the video file.
Optionally, the determination module 1210 is further configured to: divide the video file into M video segments, M being a positive integer greater than 1; score the quality of each video segment; and determine the target video segment from the M video segments according to the quality score of each video segment.
Optionally, the determination module 1210 is further configured to: score the quality of each video segment according to the bullet screen feature information of each video segment and/or the frame feature information of each frame in each video segment, where the bullet screen feature information includes the bullet screen density.
Optionally, the determination module 1210 is further configured to: extract the frame feature information of each frame in the i-th video segment, 1≤i≤M, i being a positive integer; and score the quality of the i-th video segment according to the frame feature information of each frame in the i-th video segment.
Optionally, the determination module 1210 is further configured to: score the quality of the i-th video segment according to picture feature information and the frame feature information of each frame, where the picture feature information is the feature information of a target static picture, and the target static picture includes the static cover picture of the video file.
Optionally, the determination module 1210 is further configured to: input the frame feature information of each frame into an LSTM model in the temporal order of the M frames, to obtain M output vectors through the LSTM model, the M output vectors corresponding to the M frames one-to-one; perform convolution and pooling operations on the vector matrix formed by the M output vectors to obtain a first feature vector; obtain a second feature vector according to the picture feature information; concatenate the first feature vector with the second feature vector to obtain a concatenated vector; and perform a linear regression operation on the concatenated vector to obtain the quality score corresponding to the i-th video segment.
Embodiment Three
FIG. 13 schematically shows a hardware architecture diagram of a computer device 1300 suitable for implementing the dynamic cover setting method according to Embodiment Three of the present application. In this embodiment, the computer device 1300 is a device capable of automatically performing numerical computation and/or information processing according to instructions that are preset or stored in advance. For example, it may be a smartphone, a tablet computer, a laptop computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server, or a server cluster composed of multiple servers). As shown in FIG. 13, the computer device 1300 includes at least, but is not limited to: a memory 1310, a processor 1320, and a network interface 1330, which can be communicatively linked to each other through a system bus. Of these:
The memory 1310 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and so on. In some embodiments, the memory 1310 may be an internal storage module of the computer device 1300, such as the hard disk or internal memory of the computer device 1300. In other embodiments, the memory 1310 may also be an external storage device of the computer device 1300, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) equipped on the computer device 1300. Of course, the memory 1310 may also include both the internal storage module of the computer device 1300 and its external storage device. In this embodiment, the memory 1310 is generally used to store the operating system and the various kinds of application software installed on the computer device 1300, such as the program code of the dynamic cover setting method. In addition, the memory 1310 can also be used to temporarily store various kinds of data that have been output or are to be output.
The processor 1320 may, in some embodiments, be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 1320 is generally used to control the overall operation of the computer device 1300, for example performing control and processing related to data exchange or communication with the computer device 1300. In this embodiment, the processor 1320 is used to run the program code stored in the memory 1310 or to process data.
The network interface 1330 may include a wireless network interface or a wired network interface, and the network interface 1330 is generally used to establish communication links between the computer device 1300 and other computer devices. For example, the network interface 1330 is used to connect the computer device 1300 to an external terminal through a network, and to establish data transmission channels, communication links, and the like between the computer device 1300 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile communication (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It should be pointed out that FIG. 13 shows only a computer device with the components 1310-1330, but it should be understood that not all of the illustrated components are required to be implemented; more or fewer components may be implemented instead.
In this embodiment, the dynamic cover setting method stored in the memory 1310 can also be divided into one or more program modules and executed by one or more processors (the processor 1320 in this embodiment), so as to implement the embodiments of the present application.
Embodiment Four
The embodiments of the present application also provide a computer-readable storage medium having computer-readable instructions stored thereon, where the computer-readable instructions, when executed by a processor, implement the following steps:
determining a target video segment from a video file; and
extracting the target video segment, and obtaining a dynamic cover image of the video file according to the target video segment.
In this embodiment, the computer-readable storage medium includes flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and so on. In some embodiments, the computer-readable storage medium may be an internal storage unit of a computer device, such as the hard disk or internal memory of the computer device. In other embodiments, the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) equipped on the computer device. Of course, the computer-readable storage medium may also include both the internal storage unit of a computer device and its external storage device. In this embodiment, the computer-readable storage medium is generally used to store the operating system and the various kinds of application software installed on the computer device, such as the program code of the dynamic cover setting method in the embodiments. In addition, the computer-readable storage medium can also be used to temporarily store various kinds of data that have been output or are to be output.
Obviously, those skilled in the art should understand that each module or step of the embodiments of the present application described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices; optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; and in some cases, the steps shown or described can be performed in an order different from the one given here, or they can be made into individual integrated circuit modules, or several of the modules or steps among them can be made into a single integrated circuit module. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application; any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A dynamic cover setting method, the method comprising:
    determining a target video segment from a video file; and
    extracting the target video segment, and obtaining a dynamic cover image of the video file according to the target video segment.
  2. The dynamic cover setting method according to claim 1, wherein the determining a target video segment from a video file comprises:
    obtaining a plurality of bullet screens of the video file, each bullet screen being associated with a time point on a timeline of the video file;
    obtaining a bullet screen density distribution on the timeline according to the time point on the timeline associated with each bullet screen;
    screening out, according to the bullet screen density distribution, one or more video segments in the video file with the highest bullet screen density; and
    determining the one or more video segments, or the one or more video segments carrying bullet screens, as the target video segment.
  3. The dynamic cover setting method according to claim 2, wherein the obtaining a plurality of bullet screens of the video file comprises:
    obtaining all bullet screens of the video file; and
    filtering out a plurality of invalid bullet screens from all the bullet screens according to the bullet screen content of each bullet screen among all the bullet screens, to obtain the plurality of bullet screens; wherein the plurality of invalid bullet screens comprise: bullet screens whose bullet screen content is unrelated to the video content of the video file, and/or bullet screens whose bullet screen content is unrelated to the video frames of the video file.
  4. The dynamic cover setting method according to any one of claims 1 to 3, wherein determining a target video segment from a video file comprises:
    dividing the video file into M video segments, M being a positive integer greater than 1;
    scoring the quality of each video segment; and
    determining the target video segment from the M video segments according to the quality score of each video segment.
  5. The dynamic cover setting method according to claim 4, wherein the scoring the quality of each video segment comprises:
    scoring the quality of each video segment according to bullet screen feature information of each video segment and/or frame feature information of each frame in each video segment; wherein the bullet screen feature information comprises a bullet screen density.
  6. The dynamic cover setting method according to claim 4 or 5, wherein the scoring the quality of each video segment comprises:
    extracting frame feature information of each frame in an i-th video segment, 1≤i≤M, i being a positive integer; and
    scoring the quality of the i-th video segment according to the frame feature information of each frame in the i-th video segment.
  7. The dynamic cover setting method according to claim 6, wherein the scoring the quality of the i-th video segment according to the frame feature information of each frame in the i-th video segment comprises:
    scoring the quality of the i-th video segment according to picture feature information and the frame feature information of each frame;
    wherein the picture feature information is feature information of a target static picture, and the target static picture comprises a static cover picture of the video file.
  8. The dynamic cover setting method according to claim 7, wherein the scoring the quality of the i-th video segment according to picture feature information and the frame feature information of each frame comprises:
    inputting the frame feature information of each frame into an LSTM model in a temporal order of the M frames, to obtain M output vectors through the LSTM model, the M output vectors corresponding to the M frames one-to-one;
    performing convolution and pooling operations on a vector matrix formed by the M output vectors to obtain a first feature vector;
    obtaining a second feature vector according to the picture feature information;
    concatenating the first feature vector with the second feature vector to obtain a concatenated vector; and
    performing a linear regression operation on the concatenated vector to obtain a quality score corresponding to the i-th video segment.
  9. A dynamic cover setting system, comprising:
    a determination module, configured to determine a target video segment from a video file; and
    a setting module, configured to extract the target video segment and obtain a dynamic cover image of the video file according to the target video segment.
  10. The dynamic cover setting system according to claim 9, wherein the determination module is further configured to:
    obtain a plurality of bullet screens of the video file, each bullet screen being associated with a time point on a timeline of the video file;
    obtain a bullet screen density distribution on the timeline according to the time point on the timeline associated with each bullet screen;
    screen out, according to the bullet screen density distribution, one or more video segments in the video file with the highest bullet screen density; and
    determine the one or more video segments, or the one or more video segments carrying bullet screens, as the target video segment.
  11. The dynamic cover setting system according to claim 10, wherein the determination module is further configured to:
    obtain all bullet screens of the video file; and
    filter out a plurality of invalid bullet screens from all the bullet screens according to the bullet screen content of each bullet screen among all the bullet screens, to obtain the plurality of bullet screens; wherein the plurality of invalid bullet screens comprise: bullet screens whose bullet screen content is unrelated to the video content of the video file, and/or bullet screens whose bullet screen content is unrelated to the video frames of the video file.
  12. A computer device, comprising a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    determining a target video segment from a video file; and
    extracting the target video segment, and obtaining a dynamic cover image of the video file according to the target video segment.
  13. The computer device according to claim 12, wherein the determining a target video segment from a video file comprises:
    obtaining a plurality of bullet screens of the video file, each bullet screen being associated with a time point on a timeline of the video file;
    obtaining a bullet screen density distribution on the timeline according to the time point on the timeline associated with each bullet screen;
    screening out, according to the bullet screen density distribution, one or more video segments in the video file with the highest bullet screen density; and
    determining the one or more video segments, or the one or more video segments carrying bullet screens, as the target video segment.
  14. The computer device according to claim 13, wherein the obtaining a plurality of bullet screens of the video file comprises:
    obtaining all bullet screens of the video file; and
    filtering out a plurality of invalid bullet screens from all the bullet screens according to the bullet screen content of each bullet screen among all the bullet screens, to obtain the plurality of bullet screens; wherein the plurality of invalid bullet screens comprise: bullet screens whose bullet screen content is unrelated to the video content of the video file, and/or bullet screens whose bullet screen content is unrelated to the video frames of the video file.
  15. The computer device according to any one of claims 12 to 14, wherein determining a target video segment from a video file comprises:
    dividing the video file into M video segments, M being a positive integer greater than 1;
    scoring the quality of each video segment; and
    determining the target video segment from the M video segments according to the quality score of each video segment.
  16. The computer device according to claim 15, wherein the scoring the quality of each video segment comprises:
    scoring the quality of each video segment according to bullet screen feature information of each video segment and/or frame feature information of each frame in each video segment; wherein the bullet screen feature information comprises a bullet screen density.
  17. The computer device according to claim 15 or 16, wherein the scoring the quality of each video segment comprises:
    extracting frame feature information of each frame in an i-th video segment, 1≤i≤M, i being a positive integer; and
    scoring the quality of the i-th video segment according to the frame feature information of each frame in the i-th video segment.
  18. The computer device according to claim 17, wherein the scoring the quality of the i-th video segment according to the frame feature information of each frame in the i-th video segment comprises:
    scoring the quality of the i-th video segment according to picture feature information and the frame feature information of each frame;
    wherein the picture feature information is feature information of a target static picture, and the target static picture comprises a static cover picture of the video file.
  19. The computer device according to claim 18, wherein the scoring the quality of the i-th video segment according to picture feature information and the frame feature information of each frame comprises:
    inputting the frame feature information of each frame into an LSTM model in a temporal order of the M frames, to obtain M output vectors through the LSTM model, the M output vectors corresponding to the M frames one-to-one;
    performing convolution and pooling operations on a vector matrix formed by the M output vectors to obtain a first feature vector;
    obtaining a second feature vector according to the picture feature information;
    concatenating the first feature vector with the second feature vector to obtain a concatenated vector; and
    performing a linear regression operation on the concatenated vector to obtain a quality score corresponding to the i-th video segment.
  20. A computer-readable storage medium storing computer-readable instructions, the computer-readable instructions being executable by at least one processor to cause the at least one processor to perform the steps of the dynamic cover setting method according to any one of claims 1 to 8.
PCT/CN2022/072819 2021-03-10 2022-01-19 Dynamic cover setting method and system WO2022188563A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110258999.XA CN115086709A (zh) 2021-03-10 2021-03-10 Dynamic cover setting method and system
CN202110258999.X 2021-03-10

Publications (1)

Publication Number Publication Date
WO2022188563A1 (zh)

Family

ID=83226339

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072819 WO2022188563A1 (zh) 2021-03-10 2022-01-19 Dynamic cover setting method and system

Country Status (2)

Country Link
CN (1) CN115086709A (zh)
WO (1) WO2022188563A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107071587A (zh) * 2017-04-25 2017-08-18 腾讯科技(深圳)有限公司 Method and device for acquiring a video segment
CN107707967A (zh) * 2017-09-30 2018-02-16 咪咕视讯科技有限公司 Method and device for determining a video file cover, and computer-readable storage medium
CN108650524A (zh) * 2018-05-23 2018-10-12 腾讯科技(深圳)有限公司 Video cover generation method and apparatus, computer device, and storage medium
CN110191357A (zh) * 2019-06-28 2019-08-30 北京奇艺世纪科技有限公司 Method and device for evaluating the highlight degree of video segments and generating a dynamic cover
CN110324662A (zh) * 2019-06-28 2019-10-11 北京奇艺世纪科技有限公司 Video cover generation method and device
CN111225236A (zh) * 2020-01-20 2020-06-02 北京百度网讯科技有限公司 Method and device for generating a video cover, electronic device, and computer-readable storage medium
CN112069952A (zh) * 2020-08-25 2020-12-11 北京小米松果电子有限公司 Video segment extraction method, video segment extraction device, and storage medium
TW202103021A (zh) * 2019-07-12 2021-01-16 大陸商信泰光學(深圳)有限公司 Video cover editing method and system, and computer program product thereof

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018040059A1 (en) * 2016-09-02 2018-03-08 Microsoft Technology Licensing, Llc Clip content categorization
CN109286850B (zh) * 2017-07-21 2020-11-13 Tcl科技集团股份有限公司 Bullet-screen-based video annotation method and terminal
CN109729435A (zh) * 2017-10-27 2019-05-07 优酷网络技术(北京)有限公司 Method and device for extracting video segments
CN107995535B (zh) * 2017-11-28 2019-11-26 百度在线网络技术(北京)有限公司 Method, apparatus, and device for displaying a video, and computer storage medium
CN108595493B (zh) * 2018-03-15 2022-02-08 腾讯科技(深圳)有限公司 Media content pushing method and apparatus, storage medium, and electronic apparatus
WO2019179496A1 (en) * 2018-03-22 2019-09-26 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and system for retrieving video temporal segments
CN111277892B (zh) * 2020-01-20 2022-03-22 北京百度网讯科技有限公司 Method, apparatus, server, and medium for selecting video segments
CN111309951A (zh) * 2020-01-23 2020-06-19 北京达佳互联信息技术有限公司 Advertising copy acquisition method and apparatus, and storage medium
CN111767461B (zh) * 2020-06-24 2024-02-06 北京奇艺世纪科技有限公司 Data processing method and apparatus
CN111782603A (zh) * 2020-06-29 2020-10-16 掌阅科技股份有限公司 Video book cover display method, computing device, and computer storage medium
CN112087665B (zh) * 2020-09-17 2023-01-13 掌阅科技股份有限公司 Live video preview method, computing device, and computer storage medium
CN112100442B (zh) * 2020-11-13 2021-02-26 腾讯科技(深圳)有限公司 User tendency identification method, apparatus, device, and storage medium
CN112423127A (zh) * 2020-11-20 2021-02-26 上海哔哩哔哩科技有限公司 Video loading method and device

Also Published As

Publication number Publication date
CN115086709A (zh) 2022-09-20

Similar Documents

Publication Publication Date Title
US11061962B2 (en) Recommending and presenting comments relative to video frames
US20210342385A1 (en) Interactive method and system of bullet screen easter eggs
CN111432235A (zh) 2020-07-17 Live streaming video generation method and apparatus, computer-readable medium, and electronic device
CN111310041B (zh) 2021-10-29 Image-text publishing method, model training method, apparatus, and storage medium
JP7240505B2 (ja) 2023-03-15 Voice packet recommendation method, apparatus, electronic device, and program
CN114095749B (zh) 2023-06-30 Recommendation and live-streaming interface display method, computer storage medium, and program product
US11706496B2 (en) Echo bullet screen
WO2021103366A1 (zh) 2021-06-03 Bullet screen processing method and system based on a WeChat mini-program
JP7337172B2 (ja) 2023-09-01 Voice packet recommendation method, apparatus, electronic device, and program
CN107515870B (zh) 2023-09-12 Search method and apparatus, and apparatus for searching
CN111552884A (zh) 2020-08-18 Method and device for content recommendation
CN112182281B (zh) 2024-05-07 Audio recommendation method, apparatus, and storage medium
US11843843B2 (en) Bullet screen key content jump method and bullet screen jump method
CN111259245A (zh) 作品推送方法、装置及存储介质
CN114286181A (zh) 一种视频优化方法、装置、电子设备和存储介质
WO2022188563A1 (zh) 动态封面设置方法和***
CN112711945B (zh) 广告召回方法和***
CN117150053A (zh) 多媒体信息推荐模型训练方法、推荐方法及装置
CN111193795B (zh) 信息推送方法及装置、电子设备和计算机可读存储介质
CN114996435A (zh) 基于人工智能的信息推荐方法、装置、设备及存储介质
US9357178B1 (en) Video-revenue prediction tool
CN114630194B (zh) 2024-05-10 Bullet screen jump link method, system, device, and computer-readable storage medium
US12010405B2 (en) Generating video summary
CN113766257B (zh) 2023-04-18 Live streaming data processing method and apparatus
CN113806542B (zh) 2024-05-07 Text analysis method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22766090

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22766090

Country of ref document: EP

Kind code of ref document: A1
