CN111491209A - Video cover determining method and device, electronic equipment and storage medium - Google Patents

Video cover determining method and device, electronic equipment and storage medium

Info

Publication number
CN111491209A
CN111491209A (application CN202010269593.7A)
Authority
CN
China
Prior art keywords
video
frame
scene
target
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010269593.7A
Other languages
Chinese (zh)
Inventor
孙玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202010269593.7A priority Critical patent/CN111491209A/en
Publication of CN111491209A publication Critical patent/CN111491209A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/443OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
    • H04N21/4431OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB characterized by the use of Application Program Interface [API] libraries
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Embodiments of the invention provide a video cover determining method and apparatus, an electronic device, and a storage medium, wherein the method comprises: determining key information corresponding to a scene to be displayed, the key information being used to represent semantic information of the scene to be displayed; searching a frame feature database corresponding to a target video for target frame feature description information matching the key information; and determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information. By matching the key information of the scene against the frame feature description information, a video frame highly relevant to the scene is displayed as the video cover, increasing the diversity of video cover presentation and thereby improving the success rate of video recommendation.

Description

Video cover determining method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of internet, and in particular, to a method and an apparatus for determining a video cover, an electronic device, and a storage medium.
Background
A video cover is used to present video content at a glance, so that a user on a video website or in a video APP can judge whether the content interests them before watching the video.
In the prior art, once the video cover of a given video is determined, it does not change. For example, a video APP may carry the movie "Operation Red Sea" with the movie's poster as its video cover, and the cover remains the same no matter in what scene the user encounters the video.
Suppose a user sees, on a video website, the hotspot "Zhang Yi and Hai Qing show their true hearts after hardship", derived from the movie "Operation Red Sea", becomes interested, and clicks it. The hotspot scene then recommends and displays several videos related to the hotspot, including "Operation Red Sea", which stars Zhang Yi and Hai Qing. However, from the poster-style cover of "Operation Red Sea", the user cannot tell how the video relates to the hotspot "Zhang Yi and Hai Qing show their true hearts after hardship", and so does not know whether to click and watch. When a video's cover shows no intuitive relationship to the scene in which the video appears, the user cannot pre-judge whether the content is of interest and may not click to watch it, reducing the success rate of video recommendation.
Disclosure of Invention
In view of at least one of the above technical problems in the prior art, embodiments of the present invention provide a method and an apparatus for determining a video cover, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present invention provides a method for determining a cover of a video, including:
determining key information corresponding to a scene to be displayed, wherein the key information is used for representing semantic information of the scene to be displayed;
searching target frame feature description information matched with the key information in a frame feature database corresponding to a target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
and determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
Optionally, determining a video cover of the target video in the scene to be displayed based on a frame index corresponding to the target frame feature description information, including:
determining a target video frame from the target video based on a frame index corresponding to the target frame feature description information;
and determining a video cover of the target video in the scene to be displayed according to the target video frame.
Optionally, the video cover is captured from the target video frame using FFmpeg.
Optionally, determining a video cover of the target video in the scene to be displayed based on a frame index corresponding to the target frame feature description information, including:
sending the frame index corresponding to the target frame feature description information to a terminal;
receiving a video cover of the target video in the scene to be displayed, which is sent by a terminal; wherein the video cover is determined by the terminal based on a target video frame determined from the target video by the frame index.
Optionally, the video cover is obtained by the terminal by intercepting a target video frame through an API provided by a local operating system.
Optionally, the frame feature description information in the frame feature database corresponding to the target video is obtained based on an AI identification manner and/or a manual editing manner.
Optionally, the frame feature description information includes one or a combination of the following information: person names, role names, places, events, and lines.
In a second aspect, an embodiment of the present invention provides a video cover determination apparatus, including:
the key information determining module is used for determining key information corresponding to a scene to be displayed, and the key information is used for representing semantic information of the scene to be displayed;
the matching module is used for searching the target frame feature description information matched with the key information in a frame feature database corresponding to the target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
and the cover determining module is used for determining the video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the video cover determination method according to the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the video cover determination method according to the first aspect.
According to the video cover determining method and apparatus, electronic device, and storage medium provided above, by matching the key information of the scene against the frame feature description information, a video frame highly relevant to the scene is displayed as the video cover, increasing the diversity of video cover presentation and thereby improving the success rate of video recommendation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for determining a cover of a video according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart of a video cover determination method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a video cover determination apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a video cover determination method according to an embodiment of the present invention, as shown in fig. 1, the method includes:
s110, determining key information corresponding to a scene to be displayed, wherein the key information is used for representing semantic information of the scene to be displayed;
specifically, the scene to be shown in the embodiment of the present invention may refer to a use scene containing specific information in a video website or a video APP. The scene may be a scene that a website or an APP shows according to certain hot spot information, or a scene that a user uses a specific search keyword when searching for a video, and the present invention is not particularly limited. For example, in the current popular movie in red sea, the video APP shows the hotspot information "zhangyaqing after having trouble seeing the real heart" in the website for the user to click on according to the impressive scenario in the movie. After clicking the hotspot, the user is in a scene to be displayed, namely 'Zhanghaiqing is difficult to see true heart'.
Specifically, the scene to be displayed in the embodiment of the present invention contains certain semantic information that can be read by a server or a terminal, whether the scene is presented according to hotspot information, formed by a user searching with specific keywords, or any other scene to be displayed that those skilled in the art can understand and that is not enumerated here. The key information to be determined in this step therefore represents the semantic information of the scene to be displayed, and may take the form of a few keywords that best express that semantic information. For example, when the user is in the scene to be displayed "Zhang Yi and Hai Qing show their true hearts after hardship", the key information corresponding to the scene may be the names of the two actors, "Zhang Yi" and "Hai Qing", because these two keywords are most likely the information the user focuses on.
Further, the key information corresponding to the scene to be displayed may be determined with an automatic text summarization method from Natural Language Processing (NLP), using a background algorithm to perform text analysis, content induction, and automatic summary generation; or with word segmentation and dictionary lookup from Natural Language Understanding (NLU), using an artificial intelligence algorithm to understand the text semantics and extract information with specific semantics, such as actors, roles, and locations.
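As a minimal, hypothetical sketch of the dictionary-lookup approach described above (the entity dictionary and function names below are illustrative assumptions, not part of the patent):

```python
# Hypothetical sketch: extract key information from a hotspot scene's text
# by dictionary lookup against known entity names (actors, roles, places).
# The entity dictionary is illustrative; a real system would load it from
# the video platform's metadata.

ENTITY_DICT = {
    "Zhang Yi": "person_name",
    "Hai Qing": "person_name",
    "Yang Rui": "role_name",
    "Xia Nan": "role_name",
}

def extract_key_information(scene_text: str) -> list[str]:
    """Return entity keywords found in the scene's text, longest first."""
    found = []
    for entity in sorted(ENTITY_DICT, key=len, reverse=True):
        if entity in scene_text:
            found.append(entity)
    return found

keys = extract_key_information(
    "Zhang Yi and Hai Qing show their true hearts after hardship")
# keys == ["Zhang Yi", "Hai Qing"]
```

A production system would typically replace this lookup with a proper NLU pipeline, but the output shape — a short list of scene keywords — is the same.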
S120, searching target frame feature description information matched with the key information in a frame feature database corresponding to the target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
specifically, the video in the embodiment of the present invention is composed of a plurality of video frames, and for each video frame, information indicating the feature of the video frame, that is, the frame index and the frame feature description information in this step, may be extracted from the video frame. When the frame feature data are stored, the frame indexes and the frame feature descriptors of a plurality of video frames can be combined into a frame feature database corresponding to the video, and the frame feature database can be used as a carrier for storage.
In particular, the frame index may be used to characterize the temporal information of a video frame, i.e., the start time of the frame within the whole video, which can be used to locate the frame in the video. The frame feature description information refers to feature information extracted from the specific content of the video frame, and includes one or a combination of the following: person names, role names, places, events, and lines. A person name is the name of an actor appearing in the video frame; a role name is the name of a character appearing in the video frame; a place is where the plot in the video frame takes place; an event is a summary of what happens in the video frame's plot; and a line is dialogue spoken by a character in the video frame.
For example, for a video frame in the video "Operation Red Sea", the corresponding frame feature description information may include the following:
person names: Zhang Yi, Hai Qing;
role names: Yang Rui, Xia Nan;
place: a harbor wharf;
event: after the mission ends, Yang Rui and Xia Nan chat;
line: "Aren't we already comrades-in-arms?"
As can be seen, the frame feature description information includes feature information of multiple dimensions of the corresponding video frame, all related to the frame's content. In the embodiment of the invention, to match the displayed video cover to the scene to be displayed, i.e., to select which video frame of the video is shown as the scene-matched cover, the key information corresponding to the scene can be matched against the frame feature description information of the video frames. Specifically, for a given scene, the frame feature database is searched for frame feature description information matching the scene's key information; the match result identifies the video frame whose frame feature description information contains the key information of the scene to be displayed.
For example, when the user is in the scene to be displayed "Zhang Yi and Hai Qing show their true hearts after hardship", the key information corresponding to the scene may be the names of the two actors, "Zhang Yi" and "Hai Qing". When the frame feature database of the video "Operation Red Sea" is searched with the key information "Zhang Yi" and "Hai Qing", the person-name information in the frame feature description information of one of the video frames is found to contain both "Zhang Yi" and "Hai Qing". That frame feature description information is the matching result.
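The lookup in step S120 could be sketched as below. This is a hedged sketch under an assumed matching policy (all key information must appear in a record's person names); the patent only requires that a match be found, not this particular policy:

```python
# Sketch of step S120: scan the frame feature database and return the
# first record whose person names contain all of the scene's key
# information. The "all keys must appear" policy is an assumption.

def find_matching_frame(database, key_info):
    """database: list of dicts with 'frame_index' and 'person_names'."""
    for record in database:
        if all(k in record["person_names"] for k in key_info):
            return record
    return None

db = [
    {"frame_index": "00:05:02", "person_names": ["Zhang Yi"]},
    {"frame_index": "01:30:14", "person_names": ["Zhang Yi", "Hai Qing"]},
]
match = find_matching_frame(db, ["Zhang Yi", "Hai Qing"])
# match["frame_index"] == "01:30:14"
```

The returned frame index is what step S130 uses to locate the frame image that becomes the cover.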
S130, determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
Specifically, the frame index and the frame feature description information have a corresponding relationship, and therefore, after matching with appropriate frame feature description information, the corresponding frame index information can be obtained. According to the time information of the video frame in the whole video represented by the frame index information, a frame image corresponding to the frame index can be obtained, wherein the frame image refers to an image picture corresponding to the video frame and belongs to the specific image content of the video frame.
Specifically, the following two methods for determining the video cover of the target video in the scene to be shown according to the frame index are provided.
The first method comprises the following steps: determining a target video frame from the target video based on a frame index corresponding to the target frame feature description information; and determining a video cover of the target video in the scene to be displayed according to the target video frame.
Specifically, the method may be that the server directly searches for video frame data locally stored by the server according to the frame index, finds a frame image corresponding to the frame index as a video cover by means of dynamic interception, and stores the frame image as a picture file with a specific format on the server for subsequent display in a scene to be displayed. The technology for dynamically intercepting and storing the frame picture can adopt an FFmpeg instruction, and the intercepting instruction is shown as the following form:
ffmpeg -i inputfile.avi -r 1 -ss 01:30:14 -vframes 120 -s 4cif -f image2 image-%05d.jpeg
the method has the advantages that the frame image is searched locally in the server, transmission of network data is not needed, and the method is simple and efficient to operate.
The second method is that the frame index corresponding to the target frame feature description information is sent to a terminal; receiving a video cover of the target video in the scene to be displayed, which is sent by a terminal; wherein the video cover is determined by the terminal based on a target video frame determined from the target video by the frame index.
In the embodiment of the present invention, receiving the frame index sent by the server means that, guided by the frame index information, the user terminal locates the specific frame image to be subsequently displayed as the video cover in the scene to be displayed. That is, the frame index received by the user terminal refers to the frame index of the frame image to be displayed for the video in the scene to be displayed.
Specifically, the frame index determining process described in the embodiment of the present invention may be executed at a server, and the main steps at least include: the server side determines key information corresponding to a scene to be displayed, and searches frame feature description information matched with the key information in a frame feature database.
Specifically, the frame index is sent to the user terminal; after receiving the frame index, the user terminal obtains the corresponding frame image from the server and calls an API provided by its local operating system, such as Android or iOS, to capture it. The captured video cover is then displayed in the scene to be displayed.
The method has the advantages that the method supports the user to carry out personalized operation, and can intercept frame images with corresponding sizes, parameters and picture styles as video covers according to the actual requirements of the user.
After the frame image corresponding to the frame index is obtained, it needs to be displayed in the scene to be displayed, specifically as the video cover of the video. Taking the scene "Zhang Yi and Hai Qing show their true hearts after hardship" as an example, several videos are recommended in the scene, including "Operation Red Sea". In the method of the embodiment of the invention, the video cover of "Operation Red Sea" is no longer the default movie poster, but the specific frame image containing the actors Zhang Yi and Hai Qing determined by the method of the embodiment of the invention.
According to the video cover determining method provided by the embodiment of the invention, the video cover with higher relevance to the scene is displayed through matching the key information of the scene with the frame characteristic description information, so that the diversity of video cover display is increased, and the success rate of video recommendation is improved.
Based on any of the above embodiments, fig. 2 is a schematic flow chart of a video cover determination method according to an embodiment of the present invention, before step S110, or between step S110 and step S120, the method further includes:
and S100, acquiring the frame feature description information based on an AI identification mode and/or a manual editing mode.
Specifically, before constructing the frame feature database, the embodiment of the present invention needs to extract frame feature information of a plurality of video frames, in particular, frame feature description information related to the content of the video frames. The embodiment of the invention at least provides three methods for extracting the frame feature description information of a plurality of video frames in a video, namely an AI identification mode, a manual editing mode and a mode combining AI identification and manual editing.
Specifically, in the AI recognition mode, since a video often contains a large number of video frames, it is impractical to extract frame feature information from every frame. Therefore, frame pictures are sampled at a preset time interval, and the frame index of each sampled video frame is determined. The frame feature description information of each video frame is then recognized with intelligent recognition technologies such as face recognition, image recognition, and text recognition, and includes at least one or a combination of: person names, role names, places, events, and lines; the specific recognition technology is not limited in the embodiments of the present invention. The advantage of the AI recognition mode is that a large amount of frame feature information can be generated in a short time with high efficiency; its disadvantage is that, due to the limitations of the recognition algorithms used, misrecognition may occur, making some frame feature description information inaccurate.
Specifically, in the manual editing mode, an editor sets a preset sampling interval in an auxiliary tool, which collects frame pictures of the video frames; the frame index of each video frame is determined at this point. The editor then reads the content of each frame picture and manually fills in the corresponding frame feature description information, such as person names, role names, places, events, and lines. The advantage of the manual editing mode is that the manually filled frame feature description information can be made as accurate as possible; its disadvantage is that the editor must spend substantial effort filling in the information, which is inefficient.
Specifically, the characteristics of the two methods are integrated in a mode of combining AI identification and manual editing. Firstly, generating frame characteristic information by adopting an AI identification mode, and then providing an artificial verification auxiliary system for an editor, wherein the system can provide a visual browsing and editing interface for the generated frame characteristic data for the editor. The editor can manually edit the frame feature data online. In the process, the manually edited frame feature data has higher priority due to higher accuracy, and the manually edited data cannot be overwritten by data identified by the subsequent AI. Therefore, the mode of combining AI identification and manual editing gives consideration to the respective advantages of the two methods, and simultaneously ensures certain accuracy and efficiency.
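The priority rule in the combined mode — manual edits are never overwritten by later AI recognition — could be sketched as below. The per-field "manually edited" flag is an assumed bookkeeping mechanism, not specified in the patent:

```python
# Minimal sketch of the combined mode: AI-recognized frame feature data
# is merged into a record, but any field an editor has already touched
# keeps its manual value. Each field maps to (value, manually_edited).

def merge_frame_features(existing: dict, ai_update: dict) -> dict:
    """existing: {field: (value, manually_edited)}; ai_update: {field: value}."""
    merged = dict(existing)
    for field_name, ai_value in ai_update.items():
        _, manually_edited = merged.get(field_name, (None, False))
        if not manually_edited:          # manual edits take priority
            merged[field_name] = (ai_value, False)
    return merged

record = {"person_names": (["Zhang Yi"], True),   # fixed by an editor
          "place": ("harbour", False)}
updated = merge_frame_features(record, {"person_names": ["Zhang Y?"],
                                        "place": "harbor wharf"})
# person_names keeps the editor's value; place takes the AI value
```

This captures the document's point that the combined mode retains AI throughput while guaranteeing that verified data is never regressed.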
Among the three methods for extracting frame feature description information of a plurality of video frames in a video provided in the embodiment of the present invention, any one may be selected according to the actual application environment; the embodiment of the present invention is not specifically limited in this respect.
According to the video cover determining method provided by the embodiment of the present invention, the frame feature description information of a plurality of video frames in the video is extracted by providing an AI identification mode, a manual editing mode, and a mode combining AI identification and manual editing, so that when extracting frame feature description information the user is offered multiple choices with different accuracies and efficiencies.
Based on any of the above embodiments, fig. 3 is a schematic structural diagram of a video cover determination apparatus provided in an embodiment of the present invention, where the apparatus includes:
a key information determining module 310, configured to determine key information corresponding to a scene to be displayed, where the key information is used to represent semantic information of the scene to be displayed;
specifically, the scene to be shown in the key information determining module 310 refers to a use scene containing specific information in a video website or a video APP. The scene may be a scene that a website or an APP shows according to certain hot spot information, or a scene that a user uses a specific search keyword when searching for a video, and the present invention is not particularly limited.
Specifically, whether it is a scene displayed according to hot spot information, a scene formed by a user's specific search keyword, or another scene to be displayed that can be understood by those skilled in the art but is not listed here, the scene to be displayed in the embodiment of the present invention contains certain semantic information, and that semantic information may be read by a server or a terminal. The key information corresponding to the scene to be displayed, which is to be determined in the embodiment of the present invention, therefore refers to the summary of that semantic information, and may be specifically represented as several keywords that best represent the semantic information of the scene.
A matching module 320, configured to search, in a frame feature database corresponding to a target video, target frame feature description information matched with the key information; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
specifically, the video in the embodiment of the present invention is composed of a plurality of video frames, and for each video frame, information indicating a feature of the video frame, that is, frame feature data, may be extracted from the video frame. When the frame feature data are saved, the frame feature data of a plurality of frames can be formed into a frame feature database, and the frame feature data can be stored by using a computer database as a carrier.
Specifically, the frame feature data corresponding to each video frame is composed of at least two parts: a frame index and frame feature description information. The frame index represents the time information of the video frame, i.e. the starting time of the video frame in the whole video, and can be used to locate the video frame in the video. The frame feature description information refers to feature information extracted from the specific content of the video frame.
It can be seen that the frame feature description information includes feature information of multiple dimensions of the corresponding video frame, and this feature information is related to the content of the video frame. In order to match the displayed video cover with the scene to be displayed, that is, to select which video frame in the video should be displayed as the cover matching the scene, the matching module 320 may match the key information corresponding to the scene to be displayed against the frame feature description information of the video frames. Specifically, for a given scene, the frame feature description information in the frame feature database may be searched and matched with the key information of the scene; the matching result indicates which video frame's frame feature description information includes the key information of the scene to be displayed.
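A minimal sketch of the lookup performed by the matching module 320 is given below. The record layout, field names, and example contents are hypothetical, chosen only to illustrate matching scene keywords against frame feature description information.

```python
# Hypothetical frame feature database: each record pairs a frame index
# (start time of the frame in seconds) with frame feature description
# information, here modelled as a set of keywords.
frame_db = [
    {"frame_index": 0.0,  "desc": {"opening", "street", "crowd"}},
    {"frame_index": 12.5, "desc": {"hero", "castle", "battle"}},
    {"frame_index": 30.0, "desc": {"hero", "castle", "victory"}},
]

def match_frames(db, scene_keywords):
    """Return records whose description contains every scene keyword."""
    wanted = set(scene_keywords)
    return [rec for rec in db if wanted <= rec["desc"]]

# scene key information: two keywords extracted from the scene to be displayed
hits = match_frames(frame_db, ["hero", "castle"])
```

In practice the frame feature description information would carry the dimensions named earlier (person name, role name, place, event, lines), and matching could be fuzzy rather than an exact subset test.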
A cover determining module 330, configured to determine, based on the frame index corresponding to the target frame feature description information, a video cover of the target video in the scene to be displayed.
Specifically, since the frame index and the frame feature description information have a corresponding relationship, after matching suitable frame feature description information, the corresponding frame index information can be obtained. According to the time information represented by the frame index, i.e. the position of the video frame in the whole video, the frame image corresponding to the frame index can be obtained, where the frame image refers to the image picture corresponding to the video frame, that is, the specific image content of the video frame.
After the frame image corresponding to the frame index is obtained, it needs to be displayed in the scene to be displayed. Specifically, the frame image in the cover determining module 330 is displayed in the scene to be displayed as the video cover of the video.
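Claim 3 mentions capturing the cover from the target video frame using FFmpeg. As an illustration, the command below seeks to the start time given by the frame index and extracts a single frame; the file names are placeholders, and invoking FFmpeg this way is one possible realization, not the patent's prescribed one.

```python
def ffmpeg_cover_command(video_path, timestamp_s, cover_path):
    """Build an ffmpeg command that grabs one frame at the given start time."""
    return [
        "ffmpeg",
        "-ss", str(timestamp_s),   # seek to the frame's start time
        "-i", video_path,
        "-frames:v", "1",          # capture exactly one video frame
        cover_path,
    ]

cmd = ffmpeg_cover_command("target_video.mp4", 12.5, "cover.jpg")
```

The resulting command list could be run with `subprocess.run(cmd)` on a host where FFmpeg is installed; for the terminal-side variant of claims 4 and 5, the same frame index would instead be passed to the terminal, which captures the frame through its local operating system API.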
According to the video cover determining apparatus provided by the embodiment of the present invention, by matching the key information of the scene with the frame feature description information, a video cover with higher relevance to the scene is displayed, which increases the diversity of video cover display and improves the success rate of video recommendation.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 4, the electronic device may include: a processor (processor)410, a communication Interface 420, a memory (memory)430 and a communication bus 440, wherein the processor 410, the communication Interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform the following method: determining key information corresponding to a scene to be displayed, wherein the key information is used for representing semantic information of the scene to be displayed; searching target frame feature description information matched with the key information in a frame feature database corresponding to a target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index; and determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented to perform the video cover determination method provided in the foregoing embodiments when executed by a processor, and for example, the method includes: determining key information corresponding to a scene to be displayed, wherein the key information is used for representing semantic information of the scene to be displayed; searching target frame feature description information matched with the key information in a frame feature database corresponding to a target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index; and determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for video cover determination, comprising:
determining key information corresponding to a scene to be displayed, wherein the key information is used for representing semantic information of the scene to be displayed;
searching target frame feature description information matched with the key information in a frame feature database corresponding to a target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
and determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
2. The method for determining the video cover according to claim 1, wherein determining the video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information comprises:
determining a target video frame from the target video based on a frame index corresponding to the target frame feature description information;
and determining a video cover of the target video in the scene to be displayed according to the target video frame.
3. The method of claim 2, wherein the video cover is captured from the target video frame by using FFmpeg techniques.
4. The method for determining the video cover according to claim 1, wherein determining the video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information comprises:
sending the frame index corresponding to the target frame feature description information to a terminal;
receiving a video cover of the target video in the scene to be displayed, which is sent by a terminal; wherein the video cover is determined by the terminal based on a target video frame determined from the target video by the frame index.
5. The method of claim 4, wherein the video cover is captured from the target video frame by the terminal through an API provided by a local operating system.
6. The method for determining the cover of video according to any one of claims 1 to 5, wherein the frame feature description information in the frame feature database corresponding to the target video is obtained based on AI identification and/or manual editing.
7. The method of any of claims 1-5, wherein the frame feature description information comprises one or a combination of the following information: person names, role names, places, events, and lines.
8. A video cover determination apparatus, comprising:
the key information determining module is used for determining key information corresponding to a scene to be displayed, and the key information is used for representing semantic information of the scene to be displayed;
the matching module is used for searching the target frame feature description information matched with the key information in a frame feature database corresponding to the target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
and the cover determining module is used for determining the video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the video cover determination method according to any one of claims 1 to 7 are implemented when the program is executed by the processor.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the video cover determination method according to any one of claims 1 to 7.
CN202010269593.7A 2020-04-08 2020-04-08 Video cover determining method and device, electronic equipment and storage medium Pending CN111491209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010269593.7A CN111491209A (en) 2020-04-08 2020-04-08 Video cover determining method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111491209A true CN111491209A (en) 2020-08-04

Family

ID=71798659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010269593.7A Pending CN111491209A (en) 2020-04-08 2020-04-08 Video cover determining method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111491209A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112118467A (en) * 2020-09-17 2020-12-22 咪咕文化科技有限公司 Video cover showing method, electronic equipment and storage medium
CN113965798A (en) * 2021-10-25 2022-01-21 北京百度网讯科技有限公司 Video information generating and displaying method, device, equipment and storage medium
CN115474084A (en) * 2022-08-10 2022-12-13 北京奇艺世纪科技有限公司 Method, device, equipment and storage medium for generating video cover image

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804452A (en) * 2017-04-28 2018-11-13 合信息技术(北京)有限公司 Multimedia resource cover methods of exhibiting and device
CN108833938A (en) * 2018-06-20 2018-11-16 上海连尚网络科技有限公司 Method and apparatus for selecting video cover
CN108845742A (en) * 2018-06-22 2018-11-20 腾讯科技(深圳)有限公司 The acquisition methods and device and computer readable storage medium of a kind of image frame
US10163173B1 (en) * 2013-03-06 2018-12-25 Google Llc Methods for generating a cover photo with user provided pictures
CN109257645A (en) * 2018-09-11 2019-01-22 传线网络科技(上海)有限公司 Video cover generation method and device
CN110337011A (en) * 2019-07-17 2019-10-15 百度在线网络技术(北京)有限公司 Method for processing video frequency, device and equipment
CN110446063A (en) * 2019-07-26 2019-11-12 腾讯科技(深圳)有限公司 Generation method, device and the electronic equipment of video cover



Similar Documents

Publication Publication Date Title
KR101810578B1 (en) Automatic media sharing via shutter click
CN103631819B (en) A kind of method and system of picture name
CN110781347A (en) Video processing method, device, equipment and readable storage medium
WO2019021088A1 (en) Navigating video scenes using cognitive insights
CN111491209A (en) Video cover determining method and device, electronic equipment and storage medium
CN112883258B (en) Information recommendation method and device, electronic equipment and storage medium
CN113079417B (en) Method, device and equipment for generating bullet screen and storage medium
CN103050025A (en) Mobile terminal learning method and learning system thereof
CN114342353A (en) Method and system for video segmentation
JP2022088304A (en) Method for processing video, device, electronic device, medium, and computer program
WO2022194102A1 (en) Image processing method and apparatus, computer device, storage medium, and program product
US20170132267A1 (en) Pushing system and method based on natural information recognition, and a client end
US20240037142A1 (en) Systems and methods for filtering of computer vision generated tags using natural language processing
CN111263186A (en) Video generation, playing, searching and processing method, device and storage medium
CN104462590A (en) Information searching method and device
CN107547922B (en) Information processing method, device, system and computer readable storage medium
US20210151038A1 (en) Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media
CN110019906B (en) Method and apparatus for displaying information
CN112580599A (en) Video identification method and device and computer readable storage medium
CN109791545B (en) Contextual information for resources including display of images
CN114186074A (en) Video search word recommendation method and device, electronic equipment and storage medium
CN110825954A (en) Keyword recommendation method and device and electronic equipment
CN115098729A (en) Video processing method, sample generation method, model training method and device
CN110661693A (en) Methods, computing device-readable storage media, and computing devices facilitating media-based content sharing performed in a computing device
CN115019138A (en) Video subtitle erasing, model training and interaction method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200804