CN111491209A - Video cover determining method and device, electronic equipment and storage medium - Google Patents

Video cover determining method and device, electronic equipment and storage medium

Info

Publication number
CN111491209A
CN111491209A (application CN202010269593.7A)
Authority
CN
China
Prior art keywords
video
frame
scene
target
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010269593.7A
Other languages
Chinese (zh)
Inventor
孙玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202010269593.7A priority Critical patent/CN111491209A/en
Publication of CN111491209A publication Critical patent/CN111491209A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/443OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
    • H04N21/4431OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB characterized by the use of Application Program Interface [API] libraries
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Embodiments of the invention provide a video cover determining method and apparatus, an electronic device, and a storage medium, wherein the method comprises: determining key information corresponding to a scene to be displayed, the key information being used to represent semantic information of the scene to be displayed; searching a frame feature database corresponding to a target video for target frame feature description information matching the key information; and determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information. By matching the key information of the scene against the frame feature description information, a video frame highly relevant to the scene is displayed as the video cover, increasing the diversity of video cover presentation and thereby improving the success rate of video recommendation.

Description

Video cover determining method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of internet, and in particular, to a method and an apparatus for determining a video cover, an electronic device, and a storage medium.
Background
A video cover is used to present video content at a glance, so that a user on a video website or in a video APP can judge whether the content interests them before watching the video.
In the prior art, once the video cover of a given video is determined, it does not change. For example, a video APP may carry the movie "Operation Red Sea" with the movie's poster as its video cover, and the cover remains the same no matter in what scene the user encounters the video.
Suppose a user sees, on a video website, the hotspot "Zhang Yi and Hai Qing show their true hearts after hardship", derived from the movie "Operation Red Sea", becomes interested, and clicks it. The hotspot scene then recommends and displays several videos related to the hotspot, including "Operation Red Sea", which stars Zhang Yi and Hai Qing. However, from the poster-style cover of "Operation Red Sea", the user cannot tell how the video relates to the hotspot "Zhang Yi and Hai Qing show their true hearts after hardship", and so does not know whether to click and watch. When a video's cover shows no intuitive relationship to the scene in which the video appears, the user cannot pre-judge whether the content is of interest and may not click to watch it, reducing the success rate of video recommendation.
Disclosure of Invention
In view of at least one of the above technical problems in the prior art, embodiments of the present invention provide a method and an apparatus for determining a video cover, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present invention provides a method for determining a cover of a video, including:
determining key information corresponding to a scene to be displayed, wherein the key information is used for representing semantic information of the scene to be displayed;
searching target frame feature description information matched with the key information in a frame feature database corresponding to a target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
and determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
Optionally, determining a video cover of the target video in the scene to be displayed based on a frame index corresponding to the target frame feature description information, including:
determining a target video frame from the target video based on a frame index corresponding to the target frame feature description information;
and determining a video cover of the target video in the scene to be displayed according to the target video frame.
Optionally, the video cover is captured from the target video frame using FFmpeg.
Optionally, determining a video cover of the target video in the scene to be displayed based on a frame index corresponding to the target frame feature description information, including:
sending the frame index corresponding to the target frame feature description information to a terminal;
receiving a video cover of the target video in the scene to be displayed, which is sent by a terminal; wherein the video cover is determined by the terminal based on a target video frame determined from the target video by the frame index.
Optionally, the video cover is obtained by the terminal by intercepting a target video frame through an API provided by a local operating system.
Optionally, the frame feature description information in the frame feature database corresponding to the target video is obtained based on an AI identification manner and/or a manual editing manner.
Optionally, the frame feature description information includes one or a combination of the following information: person names, role names, places, events, and lines.
In a second aspect, an embodiment of the present invention provides a video cover determination apparatus, including:
the key information determining module is used for determining key information corresponding to a scene to be displayed, and the key information is used for representing semantic information of the scene to be displayed;
the matching module is used for searching the target frame feature description information matched with the key information in a frame feature database corresponding to the target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
and the cover determining module is used for determining the video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the video cover determination method according to the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the video cover determination method according to the first aspect.
According to the video cover determining method and apparatus, electronic device, and storage medium provided above, by matching the key information of the scene against the frame feature description information, a video frame highly relevant to the scene is displayed as the video cover, increasing the diversity of video cover presentation and thereby improving the success rate of video recommendation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for determining a cover of a video according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart of a video cover determination method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a video cover determination apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a video cover determination method according to an embodiment of the present invention, as shown in fig. 1, the method includes:
s110, determining key information corresponding to a scene to be displayed, wherein the key information is used for representing semantic information of the scene to be displayed;
specifically, the scene to be shown in the embodiment of the present invention may refer to a use scene containing specific information in a video website or a video APP. The scene may be a scene that a website or an APP shows according to certain hot spot information, or a scene that a user uses a specific search keyword when searching for a video, and the present invention is not particularly limited. For example, in the current popular movie in red sea, the video APP shows the hotspot information "zhangyaqing after having trouble seeing the real heart" in the website for the user to click on according to the impressive scenario in the movie. After clicking the hotspot, the user is in a scene to be displayed, namely 'Zhanghaiqing is difficult to see true heart'.
Specifically, the scene to be displayed in the embodiment of the present invention contains certain semantic information that can be read by a server or a terminal, whether the scene is presented according to hotspot information, formed by a user searching with specific keywords, or any other scene to be displayed that those skilled in the art can understand and that is not enumerated here. The key information to be determined in this step therefore represents the semantic information of the scene to be displayed, and may take the form of a few keywords that best express that semantic information. For example, when the user is in the scene to be displayed "Zhang Yi and Hai Qing show their true hearts after hardship", the key information corresponding to the scene may be the names of the two actors, "Zhang Yi" and "Hai Qing", because these two keywords are most likely the information the user focuses on.
Further, the key information corresponding to the scene to be displayed may be determined with an automatic text summarization method from Natural Language Processing (NLP), using a background algorithm to perform text analysis, content induction, and automatic summary generation; or with word segmentation and dictionary lookup from Natural Language Understanding (NLU), using an artificial intelligence algorithm to understand the text semantics and extract information with specific semantics, such as actors, roles, and locations.
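As a minimal, hypothetical sketch of the dictionary-lookup approach described above (the entity dictionary and function names below are illustrative assumptions, not part of the patent):

```python
# Hypothetical sketch: extract key information from a hotspot scene's text
# by dictionary lookup against known entity names (actors, roles, places).
# The entity dictionary is illustrative; a real system would load it from
# the video platform's metadata.

ENTITY_DICT = {
    "Zhang Yi": "person_name",
    "Hai Qing": "person_name",
    "Yang Rui": "role_name",
    "Xia Nan": "role_name",
}

def extract_key_information(scene_text: str) -> list[str]:
    """Return entity keywords found in the scene's text, longest first."""
    found = []
    for entity in sorted(ENTITY_DICT, key=len, reverse=True):
        if entity in scene_text:
            found.append(entity)
    return found

keys = extract_key_information(
    "Zhang Yi and Hai Qing show their true hearts after hardship")
# keys == ["Zhang Yi", "Hai Qing"]
```

A production system would typically replace this lookup with a proper NLU pipeline, but the output shape — a short list of scene keywords — is the same.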
S120, searching target frame feature description information matched with the key information in a frame feature database corresponding to the target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
specifically, the video in the embodiment of the present invention is composed of a plurality of video frames, and for each video frame, information indicating the feature of the video frame, that is, the frame index and the frame feature description information in this step, may be extracted from the video frame. When the frame feature data are stored, the frame indexes and the frame feature descriptors of a plurality of video frames can be combined into a frame feature database corresponding to the video, and the frame feature database can be used as a carrier for storage.
In particular, the frame index may be used to characterize the temporal information of a video frame, i.e., the start time of the frame within the whole video, which can be used to locate the frame in the video. The frame feature description information refers to feature information extracted from the specific content of the video frame, and includes one or a combination of the following: person names, role names, places, events, and lines. A person name is the name of an actor appearing in the video frame; a role name is the name of a character appearing in the video frame; a place is where the plot in the video frame takes place; an event is a summary of what happens in the video frame's plot; and a line is dialogue spoken by a character in the video frame.
For example, for a video frame in the video "Operation Red Sea", the corresponding frame feature description information may include the following:
person names: Zhang Yi, Hai Qing;
role names: Yang Rui, Xia Nan;
place: a harbor wharf;
event: after the mission ends, Yang Rui and Xia Nan chat;
line: "Aren't we already comrades-in-arms?"
As can be seen, the frame feature description information includes feature information of multiple dimensions of the corresponding video frame, all related to the frame's content. In the embodiment of the invention, to match the displayed video cover to the scene to be displayed, i.e., to select which video frame of the video is shown as the scene-matched cover, the key information corresponding to the scene can be matched against the frame feature description information of the video frames. Specifically, for a given scene, the frame feature database is searched for frame feature description information matching the scene's key information; the match result identifies the video frame whose frame feature description information contains the key information of the scene to be displayed.
For example, when the user is in the scene to be displayed "Zhang Yi and Hai Qing show their true hearts after hardship", the key information corresponding to the scene may be the names of the two actors, "Zhang Yi" and "Hai Qing". When the frame feature database of the video "Operation Red Sea" is searched with the key information "Zhang Yi" and "Hai Qing", the person-name information in the frame feature description information of one of the video frames is found to contain both "Zhang Yi" and "Hai Qing". That frame feature description information is the matching result.
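The lookup in step S120 could be sketched as below. This is a hedged sketch under an assumed matching policy (all key information must appear in a record's person names); the patent only requires that a match be found, not this particular policy:

```python
# Sketch of step S120: scan the frame feature database and return the
# first record whose person names contain all of the scene's key
# information. The "all keys must appear" policy is an assumption.

def find_matching_frame(database, key_info):
    """database: list of dicts with 'frame_index' and 'person_names'."""
    for record in database:
        if all(k in record["person_names"] for k in key_info):
            return record
    return None

db = [
    {"frame_index": "00:05:02", "person_names": ["Zhang Yi"]},
    {"frame_index": "01:30:14", "person_names": ["Zhang Yi", "Hai Qing"]},
]
match = find_matching_frame(db, ["Zhang Yi", "Hai Qing"])
# match["frame_index"] == "01:30:14"
```

The returned frame index is what step S130 uses to locate the frame image that becomes the cover.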
S130, determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
Specifically, the frame index and the frame feature description information have a corresponding relationship, and therefore, after matching with appropriate frame feature description information, the corresponding frame index information can be obtained. According to the time information of the video frame in the whole video represented by the frame index information, a frame image corresponding to the frame index can be obtained, wherein the frame image refers to an image picture corresponding to the video frame and belongs to the specific image content of the video frame.
Specifically, the following two methods for determining the video cover of the target video in the scene to be shown according to the frame index are provided.
The first method comprises the following steps: determining a target video frame from the target video based on a frame index corresponding to the target frame feature description information; and determining a video cover of the target video in the scene to be displayed according to the target video frame.
Specifically, the method may be that the server directly searches for video frame data locally stored by the server according to the frame index, finds a frame image corresponding to the frame index as a video cover by means of dynamic interception, and stores the frame image as a picture file with a specific format on the server for subsequent display in a scene to be displayed. The technology for dynamically intercepting and storing the frame picture can adopt an FFmpeg instruction, and the intercepting instruction is shown as the following form:
ffmpeg -i inputfile.avi -r 1 -ss 01:30:14 -vframes 120 -s 4cif -f image2 image-%05d.jpeg
the method has the advantages that the frame image is searched locally in the server, transmission of network data is not needed, and the method is simple and efficient to operate.
The second method is that the frame index corresponding to the target frame feature description information is sent to a terminal; receiving a video cover of the target video in the scene to be displayed, which is sent by a terminal; wherein the video cover is determined by the terminal based on a target video frame determined from the target video by the frame index.
In the embodiment of the present invention, receiving the frame index sent by the server means that, guided by the frame index information, the user terminal locates the specific frame image to be subsequently displayed as the video cover in the scene to be displayed. That is, the frame index received by the user terminal refers to the frame index of the frame image to be displayed for the video in the scene to be displayed.
Specifically, the frame index determining process described in the embodiment of the present invention may be executed at a server, and the main steps at least include: the server side determines key information corresponding to a scene to be displayed, and searches frame feature description information matched with the key information in a frame feature database.
Specifically, the frame index is sent to the user terminal; after receiving the frame index, the user terminal obtains the corresponding frame image from the server and calls an API provided by its local operating system, such as Android or iOS, to capture it. The captured video cover is then displayed in the scene to be displayed.
The method has the advantages that the method supports the user to carry out personalized operation, and can intercept frame images with corresponding sizes, parameters and picture styles as video covers according to the actual requirements of the user.
After the frame image corresponding to the frame index is obtained, it needs to be displayed in the scene to be displayed, specifically as the video cover of the video. Taking the scene "Zhang Yi and Hai Qing show their true hearts after hardship" as an example, several videos are recommended in the scene, including "Operation Red Sea". In the method of the embodiment of the invention, the video cover of "Operation Red Sea" is no longer the default movie poster, but the specific frame image containing the actors Zhang Yi and Hai Qing determined by the method of the embodiment of the invention.
According to the video cover determining method provided by the embodiment of the invention, the video cover with higher relevance to the scene is displayed through matching the key information of the scene with the frame characteristic description information, so that the diversity of video cover display is increased, and the success rate of video recommendation is improved.
Based on any of the above embodiments, fig. 2 is a schematic flow chart of a video cover determination method according to an embodiment of the present invention, before step S110, or between step S110 and step S120, the method further includes:
and S100, acquiring the frame feature description information based on an AI identification mode and/or a manual editing mode.
Specifically, before constructing the frame feature database, the embodiment of the present invention needs to extract frame feature information of a plurality of video frames, in particular, frame feature description information related to the content of the video frames. The embodiment of the invention at least provides three methods for extracting the frame feature description information of a plurality of video frames in a video, namely an AI identification mode, a manual editing mode and a mode combining AI identification and manual editing.
Specifically, in the AI recognition mode, since a video often contains a large number of video frames, it is impractical to extract frame feature information from every frame. Therefore, frame pictures are sampled at a preset time interval, and the frame index of each sampled video frame is determined. The frame feature description information of each video frame is then recognized with intelligent recognition technologies such as face recognition, image recognition, and text recognition, and includes at least one or a combination of: person names, role names, places, events, and lines; the specific recognition technology is not limited in the embodiments of the present invention. The advantage of the AI recognition mode is that a large amount of frame feature information can be generated in a short time with high efficiency; its disadvantage is that, due to the limitations of the recognition algorithms used, misrecognition may occur, making some frame feature description information inaccurate.
Specifically, in the manual editing mode, an editor sets a preset sampling interval in an auxiliary tool, which collects frame pictures of the video frames; the frame index of each video frame is determined at this point. The editor then reads the content of each frame picture and manually fills in the corresponding frame feature description information, such as person names, role names, places, events, and lines. The advantage of the manual editing mode is that the manually filled frame feature description information can be made as accurate as possible; its disadvantage is that the editor must spend substantial effort filling in the information, which is inefficient.
Specifically, the characteristics of the two methods are integrated in a mode of combining AI identification and manual editing. Firstly, generating frame characteristic information by adopting an AI identification mode, and then providing an artificial verification auxiliary system for an editor, wherein the system can provide a visual browsing and editing interface for the generated frame characteristic data for the editor. The editor can manually edit the frame feature data online. In the process, the manually edited frame feature data has higher priority due to higher accuracy, and the manually edited data cannot be overwritten by data identified by the subsequent AI. Therefore, the mode of combining AI identification and manual editing gives consideration to the respective advantages of the two methods, and simultaneously ensures certain accuracy and efficiency.
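The priority rule in the combined mode — manual edits are never overwritten by later AI recognition — could be sketched as below. The per-field "manually edited" flag is an assumed bookkeeping mechanism, not specified in the patent:

```python
# Minimal sketch of the combined mode: AI-recognized frame feature data
# is merged into a record, but any field an editor has already touched
# keeps its manual value. Each field maps to (value, manually_edited).

def merge_frame_features(existing: dict, ai_update: dict) -> dict:
    """existing: {field: (value, manually_edited)}; ai_update: {field: value}."""
    merged = dict(existing)
    for field_name, ai_value in ai_update.items():
        _, manually_edited = merged.get(field_name, (None, False))
        if not manually_edited:          # manual edits take priority
            merged[field_name] = (ai_value, False)
    return merged

record = {"person_names": (["Zhang Yi"], True),   # fixed by an editor
          "place": ("harbour", False)}
updated = merge_frame_features(record, {"person_names": ["Zhang Y?"],
                                        "place": "harbor wharf"})
# person_names keeps the editor's value; place takes the AI value
```

This captures the document's point that the combined mode retains AI throughput while guaranteeing that verified data is never regressed.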
Among the three methods for extracting frame feature description information of a plurality of video frames in a video provided in the embodiment of the present invention, any one may be selected according to the actual application environment; the embodiment of the present invention is not specifically limited in this respect.
According to the video cover determining method provided by the embodiment of the present invention, the frame feature description information of a plurality of video frames in the video is extracted by providing an AI identification mode, a manual editing mode, and a mode combining AI identification and manual editing, so that when extracting frame feature description information the user is offered multiple choices with different accuracies and efficiencies.
Based on any of the above embodiments, fig. 3 is a schematic structural diagram of a video cover determination apparatus provided in an embodiment of the present invention, where the apparatus includes:
a key information determining module 310, configured to determine key information corresponding to a scene to be displayed, where the key information is used to represent semantic information of the scene to be displayed;
specifically, the scene to be shown in the key information determining module 310 refers to a use scene containing specific information in a video website or a video APP. The scene may be a scene that a website or an APP shows according to certain hot spot information, or a scene that a user uses a specific search keyword when searching for a video, and the present invention is not particularly limited.
Specifically, whether it is a scene displayed according to hot spot information, a scene formed by a user's specific search keyword, or another scene to be displayed that can be understood by those skilled in the art but is not listed here, the scene to be displayed in the embodiment of the present invention contains certain semantic information, and that semantic information may be read by a server or a terminal. The key information corresponding to the scene to be displayed, which is to be determined in the embodiment of the present invention, therefore refers to the summary of that semantic information, and may be specifically represented as several keywords that best represent the semantic information of the scene.
A matching module 320, configured to search, in a frame feature database corresponding to a target video, target frame feature description information matched with the key information; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
specifically, the video in the embodiment of the present invention is composed of a plurality of video frames, and for each video frame, information indicating a feature of the video frame, that is, frame feature data, may be extracted from the video frame. When the frame feature data are saved, the frame feature data of a plurality of frames can be formed into a frame feature database, and the frame feature data can be stored by using a computer database as a carrier.
Specifically, the frame feature data corresponding to each video frame is composed of at least two parts: a frame index and frame feature description information. The frame index represents the time information of the video frame, i.e. the starting time of the video frame in the whole video, and can be used to locate the video frame in the video. The frame feature description information refers to feature information extracted from the specific content of the video frame.
It can be seen that the frame feature description information includes feature information of multiple dimensions of the corresponding video frame, and this feature information is related to the content of the video frame. In order to match the displayed video cover with the scene to be displayed, that is, to select which video frame in the video should be displayed as the cover matching the scene, the matching module 320 may match the key information corresponding to the scene to be displayed against the frame feature description information of the video frames. Specifically, for a given scene, the frame feature description information in the frame feature database may be searched and matched with the key information of the scene; the matching result indicates which video frame's frame feature description information includes the key information of the scene to be displayed.
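A minimal sketch of the lookup performed by the matching module 320 is given below. The record layout, field names, and example contents are hypothetical, chosen only to illustrate matching scene keywords against frame feature description information.

```python
# Hypothetical frame feature database: each record pairs a frame index
# (start time of the frame in seconds) with frame feature description
# information, here modelled as a set of keywords.
frame_db = [
    {"frame_index": 0.0,  "desc": {"opening", "street", "crowd"}},
    {"frame_index": 12.5, "desc": {"hero", "castle", "battle"}},
    {"frame_index": 30.0, "desc": {"hero", "castle", "victory"}},
]

def match_frames(db, scene_keywords):
    """Return records whose description contains every scene keyword."""
    wanted = set(scene_keywords)
    return [rec for rec in db if wanted <= rec["desc"]]

# scene key information: two keywords extracted from the scene to be displayed
hits = match_frames(frame_db, ["hero", "castle"])
```

In practice the frame feature description information would carry the dimensions named earlier (person name, role name, place, event, lines), and matching could be fuzzy rather than an exact subset test.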
A cover determining module 330, configured to determine, based on the frame index corresponding to the target frame feature description information, a video cover of the target video in the scene to be displayed.
Specifically, since the frame index and the frame feature description information have a corresponding relationship, after matching suitable frame feature description information, the corresponding frame index information can be obtained. According to the time information represented by the frame index, i.e. the position of the video frame in the whole video, the frame image corresponding to the frame index can be obtained, where the frame image refers to the image picture corresponding to the video frame, that is, the specific image content of the video frame.
After the frame image corresponding to the frame index is obtained, it needs to be displayed in the scene to be displayed. Specifically, the frame image in the cover determining module 330 is displayed in the scene to be displayed as the video cover of the video.
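Claim 3 mentions capturing the cover from the target video frame using FFmpeg. As an illustration, the command below seeks to the start time given by the frame index and extracts a single frame; the file names are placeholders, and invoking FFmpeg this way is one possible realization, not the patent's prescribed one.

```python
def ffmpeg_cover_command(video_path, timestamp_s, cover_path):
    """Build an ffmpeg command that grabs one frame at the given start time."""
    return [
        "ffmpeg",
        "-ss", str(timestamp_s),   # seek to the frame's start time
        "-i", video_path,
        "-frames:v", "1",          # capture exactly one video frame
        cover_path,
    ]

cmd = ffmpeg_cover_command("target_video.mp4", 12.5, "cover.jpg")
```

The resulting command list could be run with `subprocess.run(cmd)` on a host where FFmpeg is installed; for the terminal-side variant of claims 4 and 5, the same frame index would instead be passed to the terminal, which captures the frame through its local operating system API.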
According to the video cover determining apparatus provided by the embodiment of the present invention, by matching the key information of the scene with the frame feature description information, a video cover with higher relevance to the scene is displayed, which increases the diversity of video cover display and improves the success rate of video recommendation.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 4, the electronic device may include: a processor (processor)410, a communication Interface 420, a memory (memory)430 and a communication bus 440, wherein the processor 410, the communication Interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform the following method: determining key information corresponding to a scene to be displayed, wherein the key information is used for representing semantic information of the scene to be displayed; searching target frame feature description information matched with the key information in a frame feature database corresponding to a target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index; and determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented to perform the video cover determination method provided in the foregoing embodiments when executed by a processor, and for example, the method includes: determining key information corresponding to a scene to be displayed, wherein the key information is used for representing semantic information of the scene to be displayed; searching target frame feature description information matched with the key information in a frame feature database corresponding to a target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index; and determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for video cover determination, comprising:
determining key information corresponding to a scene to be displayed, wherein the key information is used for representing semantic information of the scene to be displayed;
searching target frame feature description information matched with the key information in a frame feature database corresponding to a target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
and determining a video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
2. The method for determining the video cover according to claim 1, wherein determining the video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information comprises:
determining a target video frame from the target video based on a frame index corresponding to the target frame feature description information;
and determining a video cover of the target video in the scene to be displayed according to the target video frame.
3. The method of claim 2, wherein the video cover is captured from the target video frame by using FFmpeg techniques.
4. The method for determining the video cover according to claim 1, wherein determining the video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information comprises:
sending the frame index corresponding to the target frame feature description information to a terminal;
receiving a video cover of the target video in the scene to be displayed, which is sent by a terminal; wherein the video cover is determined by the terminal based on a target video frame determined from the target video by the frame index.
5. The method of claim 4, wherein the video cover is captured from the target video frame by the terminal through an API provided by a local operating system.
6. The method for determining the cover of video according to any one of claims 1 to 5, wherein the frame feature description information in the frame feature database corresponding to the target video is obtained based on AI identification and/or manual editing.
7. The method of any of claims 1-5, wherein the frame feature description information comprises one or a combination of the following information: person names, role names, places, events, and lines.
8. A video cover determination apparatus, comprising:
the key information determining module is used for determining key information corresponding to a scene to be displayed, and the key information is used for representing semantic information of the scene to be displayed;
the matching module is used for searching the target frame feature description information matched with the key information in a frame feature database corresponding to the target video; wherein the frame feature database comprises a frame index of a video frame of the target video and frame feature description information corresponding to the frame index and used for representing semantic information of the video frame indicated by the frame index;
and the cover determining module is used for determining the video cover of the target video in the scene to be displayed based on the frame index corresponding to the target frame feature description information.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the video cover determination method according to any one of claims 1 to 7 are implemented when the program is executed by the processor.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the video cover determination method according to any one of claims 1 to 7.
CN202010269593.7A 2020-04-08 2020-04-08 Video cover determining method and device, electronic equipment and storage medium Pending CN111491209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010269593.7A CN111491209A (en) 2020-04-08 2020-04-08 Video cover determining method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111491209A true CN111491209A (en) 2020-08-04

Family

ID=71798659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010269593.7A Pending CN111491209A (en) 2020-04-08 2020-04-08 Video cover determining method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111491209A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112118467A (en) * 2020-09-17 2020-12-22 咪咕文化科技有限公司 Video cover showing method, electronic equipment and storage medium
CN113965798A (en) * 2021-10-25 2022-01-21 北京百度网讯科技有限公司 Video information generating and displaying method, device, equipment and storage medium
CN115474084A (en) * 2022-08-10 2022-12-13 北京奇艺世纪科技有限公司 Method, device, equipment and storage medium for generating video cover image

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804452A (en) * 2017-04-28 2018-11-13 合信息技术(北京)有限公司 Multimedia resource cover methods of exhibiting and device
CN108833938A (en) * 2018-06-20 2018-11-16 上海连尚网络科技有限公司 Method and apparatus for selecting video cover
CN108845742A (en) * 2018-06-22 2018-11-20 腾讯科技(深圳)有限公司 The acquisition methods and device and computer readable storage medium of a kind of image frame
US10163173B1 (en) * 2013-03-06 2018-12-25 Google Llc Methods for generating a cover photo with user provided pictures
CN109257645A (en) * 2018-09-11 2019-01-22 传线网络科技(上海)有限公司 Video cover generation method and device
CN110337011A (en) * 2019-07-17 2019-10-15 百度在线网络技术(北京)有限公司 Method for processing video frequency, device and equipment
CN110446063A (en) * 2019-07-26 2019-11-12 腾讯科技(深圳)有限公司 Generation method, device and the electronic equipment of video cover



Similar Documents

Publication Publication Date Title
KR101810578B1 (en) Automatic media sharing via shutter click
CN103631819B (en) A kind of method and system of picture name
CN110781347A (en) Video processing method, device, equipment and readable storage medium
WO2019021088A1 (en) Navigating video scenes using cognitive insights
CN111491209A (en) Video cover determining method and device, electronic equipment and storage medium
CN112883258B (en) Information recommendation method and device, electronic equipment and storage medium
CN113079417B (en) Method, device and equipment for generating bullet screen and storage medium
CN103050025A (en) Mobile terminal learning method and learning system thereof
CN114342353A (en) Method and system for video segmentation
JP2022088304A (en) Method for processing video, device, electronic device, medium, and computer program
WO2022194102A1 (en) Image processing method and apparatus, computer device, storage medium, and program product
US20170132267A1 (en) Pushing system and method based on natural information recognition, and a client end
US20240037142A1 (en) Systems and methods for filtering of computer vision generated tags using natural language processing
CN111263186A (en) Video generation, playing, searching and processing method, device and storage medium
CN104462590A (en) Information searching method and device
CN107547922B (en) Information processing method, device, system and computer readable storage medium
US20210151038A1 (en) Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media
CN110019906B (en) Method and apparatus for displaying information
CN112580599A (en) Video identification method and device and computer readable storage medium
CN109791545B (en) Contextual information for resources including display of images
CN114186074A (en) Video search word recommendation method and device, electronic equipment and storage medium
CN110825954A (en) Keyword recommendation method and device and electronic equipment
CN115098729A (en) Video processing method, sample generation method, model training method and device
CN110661693A (en) Methods, computing device-readable storage media, and computing devices facilitating media-based content sharing performed in a computing device
CN115019138A (en) Video subtitle erasing, model training and interaction method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200804