CN114390369A - Dynamic cover generation method, device, equipment and storage medium - Google Patents


Info

Publication number
CN114390369A
Authority
CN
China
Prior art keywords: picture, video, title, content analysis, analysis result
Legal status
Pending
Application number
CN202011143646.7A
Other languages
Chinese (zh)
Inventor
张好 (Zhang Hao)
王志豪 (Wang Zhihao)
刘洛麒 (Liu Luoqi)
Current Assignee
Beijing Hongxiang Technical Service Co Ltd
Original Assignee
Beijing Hongxiang Technical Service Co Ltd
Application filed by Beijing Hongxiang Technical Service Co Ltd
Priority to CN202011143646.7A
Publication of CN114390369A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8549 Creating video summaries, e.g. movie trailer
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a dynamic cover generation method, a device, equipment and a storage medium. The method comprises the steps of performing frame extraction on a target video to obtain a video picture set; obtaining picture visual quality information corresponding to each video picture in the set and performing picture content analysis on each video picture to obtain a content analysis result; selecting a starting picture from the target video according to the picture visual quality information and the content analysis result; and reading a video sequence of a preset length with the starting picture as a starting point and generating a dynamic cover according to the video sequence. Because the starting picture is selected according to the picture visual quality information and the content analysis result of each frame, and the dynamic cover is generated from the video sequence read at that picture, high-quality dynamic covers can be generated quickly, efficiently and automatically.

Description

Dynamic cover generation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of multimedia technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a dynamic cover.
Background
When displaying long-form and short-form videos, video products attract users to click by means of a dynamic cover. At present, most dynamic covers are still manually edited by users or platform operators with video editing tools, which severely limits the production efficiency of User-Generated Content (UGC), Professionally Generated Content (PGC) and other content. How to quickly and efficiently generate high-quality dynamic covers automatically has therefore become an urgent problem to be solved.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method, a device, equipment and a storage medium for generating a dynamic cover, and aims to solve the technical problem of how to quickly and efficiently realize automatic generation of a high-quality dynamic cover.
In order to achieve the above object, the present invention provides a dynamic cover generation method, which comprises the following steps:
performing frame extraction on a target video to obtain a video picture set;
acquiring picture visual quality information corresponding to each video picture in the video picture set, and performing picture content analysis on each video picture to obtain a content analysis result;
selecting a starting picture from the target video according to the picture visual quality information and the content analysis result;
and reading a video sequence with a preset length from the target video by taking the starting picture as a starting point, and generating a dynamic cover according to the video sequence.
Optionally, the step of performing frame extraction on the target video to obtain a video picture set includes:
performing frame extraction on a target video to obtain a to-be-processed picture set;
performing intro (leader) and outro (trailer) detection on the target video, and determining the pictures to be removed according to the detection result;
and removing the picture to be removed from the picture set to be processed to obtain a video picture set.
Optionally, the step of obtaining the picture visual quality information corresponding to each video picture in the video picture set includes:
acquiring picture definition and/or picture brightness corresponding to each video picture in the video picture set;
and determining picture visual quality information corresponding to each video picture according to the picture definition and/or the picture brightness.
Optionally, the step of performing picture content analysis on each video picture to obtain a content analysis result includes:
acquiring video title information of the target video;
extracting title features corresponding to the video title information and picture features corresponding to all video pictures;
calculating the title matching degree between each video picture and the video title information according to the title characteristics and the picture characteristics;
and taking the title matching degree as a content analysis result.
Optionally, before the step of using the title matching degree as a content analysis result, the method further includes:
extracting video title entities from the video title information, and acquiring entity concept matching degrees between each video picture and the video title entities;
calculating the image-text matching degree corresponding to each video picture according to the title matching degree and the entity concept matching degree;
correspondingly, the step of using the title matching degree as a content analysis result comprises the following steps:
and taking the image-text matching degree as a content analysis result.
Optionally, before the step of obtaining the entity concept matching degree between each video picture and the video title entity, the method further includes:
carrying out knowledge graph analysis on field objects contained in the video title entity to obtain a graph analysis result;
identifying each video picture by adopting an image target identification technology to obtain a picture object contained in each video picture;
and determining entity concept matching degree between each video picture and the video title entity according to the picture object and the map analysis result.
Optionally, the step of obtaining the picture visual quality information corresponding to each video picture in the video picture set, and performing picture content analysis on each video picture to obtain a content analysis result includes:
traversing each video picture in the video picture set, and performing key frame analysis on each traversed frame of video picture to obtain a key frame picture;
acquiring picture visual quality information corresponding to each key frame picture;
performing picture content analysis on the key frame pictures to obtain content analysis results corresponding to the key frame pictures;
the step of selecting a starting picture from the target video according to the picture visual quality information and the content analysis result includes:
sequencing the key frame pictures according to the picture visual quality information corresponding to each key frame picture and the content analysis result corresponding to each key frame picture;
and selecting a starting picture from the obtained key frame pictures according to the sequencing result.
In addition, to achieve the above object, the present invention further provides a dynamic cover generation apparatus, including:
the video frame extracting module is used for extracting frames of the target video to obtain a video picture set;
the picture analysis module is used for acquiring picture visual quality information corresponding to each video picture in the video picture set, and performing picture content analysis on each video picture to obtain a content analysis result;
the picture selection module is used for selecting a starting picture from the target video according to the picture visual quality information and the content analysis result;
and the cover generation module is used for reading a video sequence with a preset length from the target video by taking the starting picture as a starting point and generating a dynamic cover according to the video sequence.
In addition, to achieve the above object, the present invention also provides dynamic cover generation equipment, including: a memory, a processor, and a dynamic cover generation program stored on the memory and executable on the processor, the dynamic cover generation program being configured to implement the steps of the dynamic cover generation method described above.
In addition, to achieve the above object, the present invention further provides a storage medium having a dynamic cover generation program stored thereon, wherein the dynamic cover generation program, when executed by a processor, implements the steps of the dynamic cover generation method as described above.
The method provided by the invention obtains a video picture set by performing frame extraction on a target video; obtains picture visual quality information corresponding to each video picture in the video picture set and performs picture content analysis on each video picture to obtain a content analysis result; selects a starting picture from the target video according to the picture visual quality information and the content analysis result; and reads a video sequence of a preset length from the target video with the starting picture as a starting point, generating a dynamic cover according to the video sequence. Because the starting picture is selected according to the picture visual quality information and the content analysis result of each frame, and the dynamic cover is generated from the video sequence read at that picture, high-quality dynamic covers can be generated quickly, efficiently and automatically.
Drawings
FIG. 1 is a schematic diagram of a dynamic cover generation device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a method for generating dynamic covers according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of a method for generating dynamic covers according to the present invention;
FIG. 4 is a flowchart illustrating a method for generating dynamic covers according to a third embodiment of the present invention;
FIG. 5 is a block diagram of a first embodiment of the dynamic cover creation apparatus of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a dynamic cover generation device in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the dynamic cover generation apparatus may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, it may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory; optionally, the memory 1005 may also be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in FIG. 1 does not constitute a limitation of the dynamic cover generation apparatus, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a data storage module, a network communication module, a user interface module, and a dynamic cover page generation program.
In the dynamic cover generation apparatus shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The dynamic cover generation apparatus of the present invention calls, through the processor 1001, the dynamic cover generation program stored in the memory 1005, and executes the dynamic cover generation method provided by the embodiments of the present invention.
An embodiment of the present invention provides a dynamic cover generation method, and referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of the dynamic cover generation method according to the present invention.
In this embodiment, the dynamic cover generation method includes the following steps:
step S10: performing frame extraction on a target video to obtain a video picture set;
it should be noted that the execution subject of the method of this embodiment may be a computing service device with image processing, network communication and program running functions, such as a smart phone, a tablet computer or a personal computer, or an application program or a product platform capable of providing video product services. The present embodiment and the following embodiments are described below by taking a product platform as an example.
In this step, the target video may be an audio/video file received by the product platform from which a dynamic cover needs to be generated; the video may be uploaded to the product platform by a user. Frame extraction may extract the picture frames (image frames) of the target video in a set manner, where the set manner may be random extraction or uniform extraction (for example, at a certain time interval or every certain number of frames), which is not limited in this embodiment.
In specific implementation, when the product platform receives a target video serving as a basis for generating a dynamic cover page, the target video can be subjected to frame extraction according to a set mode to obtain a video picture set.
It should be understood that a video generally contains segments, such as the intro and the outro, that are unrelated or only weakly related to its main content and usually cannot represent its main idea. Therefore, to ensure that the finally generated dynamic cover represents the core content or main idea of the video, and to reduce the video-processing workload of the product platform in practical applications, step S10 in this embodiment may specifically include:
step S101: performing frame extraction on a target video to obtain a to-be-processed picture set;
step S102: carrying out film head and film tail detection on the target video, and determining a picture to be removed according to a detection result;
It should be noted that, in this embodiment, detection of the beginning and the end of a video may be implemented with a pre-trained image feature model (or algorithm) in combination with statistical results.
The image feature model is used to extract a feature description vector for each picture; the feature similarity of adjacent pictures is then computed from these vectors, and adjacent pictures whose feature similarity is greater than or equal to a set threshold are identified as belonging to the same scene segment. Meanwhile, the product platform in this embodiment can also measure, over a large number of videos, the proportion of the total duration taken up by the intro and the outro, and then set intro and outro duration thresholds according to the statistics: a scene segment whose ending time is less than or equal to the intro duration threshold is determined to be an intro segment, and a scene segment whose starting time falls within the outro duration threshold measured back from the end of the video is determined to be an outro segment. For example, if the statistics show that the intro accounts for 8% of the total video duration and the outro accounts for 5%, then for a one-minute video the intro and outro duration thresholds are about 5 seconds and 3 seconds respectively; a scene segment that ends within the first 5 seconds of the video is considered an intro segment, and a scene segment that starts within the last 3 seconds is considered an outro segment.
In this embodiment, the pictures to be removed are the pictures determined, according to the detection result, to belong to the intro or the outro. Such pictures, which generally contain material such as content introductions, are concentrated at the head or tail of the video, and they need to be removed before a dynamic cover is generated.
Step S103: and removing the picture to be removed from the picture set to be processed to obtain a video picture set.
In a specific implementation, the product platform performs preliminary screening on the picture set to be processed according to the above manner, and then obtains a video picture set.
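For illustration only, the following Python sketch shows one way the scene splitting and intro/outro filtering described above could be implemented. The cosine-similarity scene split and the 8% / 5% duration ratios are assumptions carried over from the example in this embodiment, not values fixed by the claims.

```python
import numpy as np

def split_scenes(features, sim_threshold=0.9):
    # features: (n_frames, d) array of per-frame feature description vectors.
    # Adjacent frames whose cosine similarity is >= sim_threshold are treated
    # as belonging to the same scene segment.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = (normed[:-1] * normed[1:]).sum(axis=1)       # similarity of frames i, i+1
    boundaries = np.where(sims < sim_threshold)[0] + 1  # scene-change positions
    return np.split(np.arange(len(features)), boundaries)

def drop_intro_outro(segments, fps, total_sec, head_ratio=0.08, tail_ratio=0.05):
    # Drop scene segments that end inside the statistical intro window or
    # start inside the statistical outro window.
    head_sec, tail_sec = total_sec * head_ratio, total_sec * tail_ratio
    kept = []
    for seg in segments:
        start_t, end_t = seg[0] / fps, seg[-1] / fps
        if end_t <= head_sec:                # intro segment
            continue
        if start_t >= total_sec - tail_sec:  # outro segment
            continue
        kept.append(seg)
    return kept
```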
Step S20: acquiring picture visual quality information corresponding to each video picture in the video picture set, and performing picture content analysis on each video picture to obtain a content analysis result;
It should be noted that the picture visual quality information may be information or parameters capable of characterizing the visual effect of a picture, such as its definition (sharpness), brightness and resolution. In a specific implementation, to simplify acquisition of the picture visual quality information while keeping it reliable, the product platform can acquire the picture definition and/or picture brightness corresponding to each video picture in the video picture set, and then determine the picture visual quality information corresponding to each video picture according to the picture definition and/or the picture brightness.
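By way of a hedged illustration, two commonly used proxies for these signals are the variance of the Laplacian (sharpness) and the mean grayscale intensity (brightness); the patent itself does not prescribe any particular metric:

```python
import cv2

def picture_visual_quality(frame_bgr):
    # Returns simple per-frame quality measurements; higher Laplacian
    # variance means a sharper picture, mean intensity ranges 0..255.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    brightness = float(gray.mean())
    return {"sharpness": sharpness, "brightness": brightness}
```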
In this embodiment, the picture content analysis may analyze the degree of correlation (or matching degree) between the image elements contained in a picture and the video title or theme, or analyze the visual highlight degree of the picture. The visual highlight degree can be obtained through a pre-trained deep learning regression model: the trained model scores the highlight degree of each frame in the video to obtain a corresponding visual highlight result.
It should be appreciated that whether an unfamiliar video attracts a user to click and watch depends largely on whether its summary information (e.g., video title, cover, synopsis) is sufficiently eye-catching. As far as the cover is concerned, a dynamic cover is more attractive to users than a static one, and the visual effect it presents can be characterized by picture attributes. This embodiment therefore gives priority to the visual quality and highlight degree of the pictures, and selects pictures whose visual quality and highlight degree meet the requirements from the video as the source pictures of the dynamic cover, so that the finally generated dynamic cover both expresses the core content of the video and attracts users.
In specific implementation, after the product platform acquires the video picture set, the product platform can also acquire picture visual quality information such as definition, brightness, resolution and the like corresponding to each video picture in the video picture set; and simultaneously, analyzing the content of each video picture to obtain a content analysis result.
Step S30: selecting a starting picture from the target video according to the picture visual quality information and the content analysis result;
It should be understood that after the product platform acquires the picture visual quality information and the content analysis result corresponding to each video picture in the video picture set, it can select a starting picture from the target video based on them.
As one implementation, in this embodiment the starting picture may be selected as follows: first, sort the video pictures in descending order of quality according to the picture visual quality information (such as definition, brightness and/or resolution) to obtain a first sorting result; then sort them in descending order according to the content analysis result (such as the matching degree with the video title and/or the visual highlight degree) to obtain a second sorting result; finally, select a starting picture from the target video according to the two sorting results. For example, if the candidate pictures determined by the product platform according to the first sorting result are A, B and C, and those determined according to the second sorting result are C, D and E, candidate picture C can be selected as the starting picture. This is merely an example and does not limit the manner of selecting the starting picture.
Further, the picture visual quality information and the content analysis result may contain parameters with different dimensions; for example, image resolution is measured in dpi while picture brightness is measured in cd/m². If parameters with different dimensions are fed directly into a numerical computation, the weights of some parameters are suppressed, distorting the result and introducing large errors. Therefore, in order to effectively quantify the quality of each video picture through the picture visual quality information, the product platform of this embodiment first normalizes the numerical data contained in the picture visual quality information and the content analysis result, and then performs the corresponding sorting operations on the normalized data.
As another implementation, the starting picture in this embodiment may also be selected as follows: score each video picture according to the picture visual quality information (such as definition, brightness and/or resolution) to obtain a first scoring result; score each video picture according to the content analysis result (such as the matching degree with the video title and/or the visual highlight degree) to obtain a second scoring result; then compute the total score of each frame from the first and second scoring results, sort the total scores in descending order, and select the first-ranked video picture as the starting picture. This is merely an example and does not limit the manner of selecting the starting picture.
It should be noted that, in the above embodiment, the total score of each frame of video picture may be calculated by weighted summation, or two scores may be directly added, which is not limited in this embodiment.
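A minimal sketch of this scoring variant follows: each signal is min-max normalized so that differently scaled units (e.g. dpi vs. cd/m²) become comparable, and the frames are then ranked by a weighted total score. The equal weights are illustrative, not prescribed by the patent.

```python
import numpy as np

def pick_start_frame(quality_scores, content_scores, w_quality=0.5, w_content=0.5):
    # quality_scores / content_scores: per-frame score arrays of equal length.
    def minmax(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)
    total = w_quality * minmax(quality_scores) + w_content * minmax(content_scores)
    return int(np.argmax(total))  # index of the top-ranked (starting) frame
```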
Step S40: and reading a video sequence with a preset length from the target video by taking the starting picture as a starting point, and generating a dynamic cover according to the video sequence.
It should be understood that after determining a starting picture that best represents the core content or main idea of the target video and has high picture visual quality, the product platform may take it as the starting frame and read a video sequence of a preset length from the target video in playback order, for example continuously reading several frames or several seconds forward from the starting frame, and then generate the dynamic cover from that video sequence.
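As a sketch of this final step, the snippet below seeks to the starting frame and writes out the following preset-length sequence with OpenCV; writing an MP4 clip is an assumption here, since the patent only specifies reading a preset-length sequence forward from the starting picture.

```python
import cv2

def make_dynamic_cover(video_path, start_frame, n_frames=75, out_path="cover.mp4"):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)  # seek to the starting picture
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    for _ in range(n_frames):  # preset length, e.g. 3 seconds at 25 fps
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(frame)
    cap.release()
    writer.release()
```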
In this embodiment, a video picture set is obtained by performing frame extraction on a target video; picture visual quality information corresponding to each video picture in the set is obtained and picture content analysis is performed on each video picture to obtain a content analysis result; a starting picture is selected from the target video according to the picture visual quality information and the content analysis result; and a video sequence of a preset length is read from the target video with the starting picture as a starting point, from which a dynamic cover is generated. Because the starting picture is selected according to the picture visual quality information and the content analysis result of every frame, a high-quality dynamic cover is generated quickly, efficiently and automatically.
Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of a dynamic cover generation method according to the present invention.
Based on the first embodiment, in this embodiment, the step of performing picture content analysis on each video picture in the step S20 to obtain a content analysis result may include:
step S201: acquiring video title information of the target video;
it should be noted that the video title information may be the names of the subjects corresponding to the videos, such as "game videos of basketball team a and basketball team B", and the video title information includes characters or fields representing the main content and subject of the video.
Step S202: extracting title features corresponding to the video title information and picture features corresponding to all video pictures;
it should be noted that the title features may be features capable of characterizing the main content and subject matter of the video title, such as keywords or keywords in the video title. The picture features are features of image elements in the video picture.
In specific implementation, the product platform can perform text feature recognition on video title information to extract corresponding title features, and perform image feature recognition on video pictures to extract corresponding picture features.
Step S203: calculating the title matching degree between each video picture and the video title information according to the title characteristics and the picture characteristics;
in a specific implementation, after the title features and the picture features are obtained, the product platform may first detect whether vectorization is required for the title features and the picture features, if so, firstly vectorize the title features and the picture features, and then calculate the similarity between the vectorized features by using a similarity calculation method (e.g., a cosine similarity calculation method), so as to determine the title matching degree between the video picture and the video title information.
Step S204: and taking the title matching degree as a content analysis result.
In practical application, after the product platform calculates the title matching degree, the title matching degree can be used as a content analysis result. According to the method and the device, the title matching degree between the video pictures and the video title information is calculated, and then the title matching degree is used as a content analysis result, so that the matching degree of the finally screened initial pictures and the target video is high, namely the matching degree of the pictures and the titles is high.
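As a hedged illustration of steps S202 and S203, a joint text-image embedding model can serve as the feature extractor, with cosine similarity as the matching measure. CLIP is an assumed choice here; the patent names neither a model nor a specific similarity function beyond giving cosine similarity as an example.

```python
import torch
import clip  # https://github.com/openai/CLIP, an assumed feature extractor
from PIL import Image

model, preprocess = clip.load("ViT-B/32")

def title_match_degree(title_text: str, frame: Image.Image) -> float:
    # Embed the title and the picture, then return their cosine similarity.
    with torch.no_grad():
        text_vec = model.encode_text(clip.tokenize([title_text]))
        img_vec = model.encode_image(preprocess(frame).unsqueeze(0))
    text_vec = text_vec / text_vec.norm(dim=-1, keepdim=True)
    img_vec = img_vec / img_vec.norm(dim=-1, keepdim=True)
    return float((text_vec @ img_vec.T).item())  # in [-1, 1]
```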
Furthermore, in practice, using only the title matching degree as the picture content analysis result makes the result too one-dimensional to accurately reflect the actual content of a picture. Therefore, before step S204, this embodiment further includes:
step S2031: extracting video title entities from the video title information, and acquiring entity concept matching degrees between each video picture and the video title entities;
it should be noted that the video title entity may be a keyword set corresponding to the video title, for example, the video title entity of the match video of basketball team a and basketball team B may be { basketball team a, basketball team B, match }.
In a specific implementation, after the video title entity is extracted, the product platform can calculate the entity concept matching degree between each video picture and the video title entity.
It should be understood that image object recognition compares stored information (information already held in memory) with current information (information currently perceived) to recognize an image. In this embodiment, the product platform can recognize each video picture through image object recognition technology to obtain the picture objects contained in it.
In this embodiment, the calculation method of the entity concept matching degree may be: carrying out knowledge graph analysis on field objects contained in the video title entity to obtain a graph analysis result; identifying each video picture by adopting an image target identification technology to obtain a picture object contained in each video picture; and determining entity concept matching degree between each video picture and the video title entity according to the picture object and the map analysis result.
It should be understood that a knowledge graph, also referred to as a knowledge domain visualization or knowledge domain mapping map, is a family of graphs that show how bodies of knowledge develop and how they are structurally related. In this embodiment, performing knowledge graph analysis on the title entity yields the associations between the field objects contained in the video title entity; for example, analyzing the field object "basketball team A" yields graph analysis results related to it, such as the team logo, player names, player images and match information of basketball team A.
In practical application, the product platform can determine the entity concept matching degree between each video picture and the video title entity according to the map analysis results of the picture object and the video title entity.
For example, suppose picture a is a shot of player A1 taken during a match between basketball team A and basketball team B, and the product platform performs knowledge graph analysis on the video title entity {basketball team A, basketball team B, match} to obtain the graph analysis result {basketball team A, team logo of basketball team A, player names of basketball team A (A1, A2, A3, A4, A5), team logo of basketball team B, player names of basketball team B (B1, B2, B3, B4, B5), basketball, basketball hoop} (the player names here are placeholders). The picture objects identified in picture a by the image object recognition technology are {team logo of basketball team A, player A1 of basketball team A, basketball, basketball hoop}; the entity concept matching degree is then determined from the identified picture objects of picture a and the graph analysis result of the video title entity {basketball team A, basketball team B, match}.
In this embodiment, the entity concept matching degree may be calculated by first determining the objects that appear both in the graph analysis result and among the picture objects, and then computing the proportion of these repeated objects within the graph analysis result. For example, the entity concept matching degree of picture a is: (number of repeated objects: team logo of basketball team A, player A1, basketball, basketball hoop) / (total number of objects in the graph analysis result) = 4/8 = 0.5.
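Written out directly, the overlap-ratio computation from the example above is:

```python
def entity_concept_match(picture_objects, graph_objects):
    # Repeated objects divided by the total number of objects in the
    # knowledge-graph analysis result, as in the 4/8 = 0.5 example above.
    graph_set = set(graph_objects)
    repeated = set(picture_objects) & graph_set
    return len(repeated) / len(graph_set) if graph_set else 0.0
```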
Step S2032: calculating the image-text matching degree corresponding to each video picture according to the title matching degree and the entity concept matching degree;
In a specific implementation, the product platform can determine the image-text matching degree corresponding to a video picture after calculating the title matching degree and the entity concept matching degree. In this embodiment, the image-text matching degree may be a direct summation of the title matching degree and the entity concept matching degree, or a weighted summation, where the weight of each summand may be set according to actual needs; this embodiment does not specifically limit it.
Correspondingly, in this embodiment, after the product platform calculates the image-text matching degree, the image-text matching degree can be used as a content analysis result.
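A one-line sketch of the weighted-summation variant; the weights are illustrative and, per this embodiment, may be set according to actual needs:

```python
def image_text_match(title_match, entity_match, w_title=0.6, w_entity=0.4):
    # Weighted combination of the two matching degrees; with equal weights
    # this reduces to a scaled version of the direct summation.
    return w_title * title_match + w_entity * entity_match
```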
Further, consider that the dynamic cover should show the core content of the video as much as possible, and the most exciting scenes in a video are usually the ones that best show the core content and attract users to watch, such as face-offs between basketball stars in a basketball game, or fight scenes between the leading characters in a martial-arts action video. Therefore, to ensure that the finally generated dynamic cover achieves this effect, in this embodiment, after obtaining the image-text matching degree corresponding to each video picture, the product platform further calculates the visual highlight degree corresponding to each video picture through a preset image highlight model, and accordingly takes the visual highlight degree and the image-text matching degree together as the content analysis result.
It should be noted that the preset image highlight model may be a pre-trained model for scoring the visual highlight degree of a picture. To that end, the dynamic cover generation method provided by this embodiment further includes:
step S1: acquiring an initial deep learning regression model to be trained and a preset amount of video data;
it should be noted that the product platform of this embodiment can train the deep learning regression model to obtain the final preset image fineness model. The specific value of the preset number is not limited in this embodiment.
Step S2: performing highlight-frame labeling on the video data to obtain video data samples;
it should be noted that the highlight frame labeling can be realized through a human-computer interaction interface provided by the product platform, and the staff of the product platform can label each frame of picture according to the highlight degree of the picture displayed by the video, and the highlight degree can be divided into: the grade may be different, and the present embodiment does not specifically limit the grade.
Furthermore, since manual labeling is laborious and time-consuming, the labeling workload should be reduced. In this embodiment, the product platform may obtain the user viewing logs corresponding to the video data, clean the video data according to the user viewing logs to obtain the video data to be labeled, and then perform highlight-frame labeling on it to obtain the video data samples.
It should be noted that the user viewing log may be a log of the user's operation behavior while watching a video, for example taking a screenshot of a certain frame, repeatedly playing a certain section, or clicking the progress bar to skip a certain section. In practice, the product platform can therefore first clean the video data according to the user viewing logs to eliminate segments that users do not pay attention to, do not like, or that otherwise do not meet the highlight-frame labeling requirements, thereby obtaining the video data to be labeled.
Further, in order to clean the video data accurately, the product platform in this embodiment may also read the video picture click information, the video playing behavior information and/or the video watching duration information in the user viewing logs, and then clean the video data according to this information to obtain the video data to be labeled.
In a specific implementation, after obtaining the video data to be labeled in the above manner, the product platform can perform highlight-frame labeling on it to obtain the video data samples.
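A minimal sketch of the log-based cleaning step, under the assumption that the viewing logs can be reduced to per-segment skip counts; segments skipped by most viewers are dropped before labeling.

```python
def clean_by_viewing_log(segments, skip_counts, viewer_count, skip_ratio=0.8):
    # Keep only segments skipped by fewer than `skip_ratio` of viewers.
    if viewer_count == 0:
        return list(segments)
    return [seg for seg, skips in zip(segments, skip_counts)
            if skips / viewer_count < skip_ratio]
```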
Step S3: training the initial deep learning regression model through supervised learning according to the video data samples to obtain a preset image highlight model for scoring the visual highlight degree of pictures.
In a specific implementation, after the product platform obtains the video data samples, the initial deep learning regression model can be trained through supervised learning to obtain a preset image highlight model for scoring the visual highlight degree of pictures.
It should be noted that, in this embodiment, the quantized scores corresponding to different highlight grades differ; for example, the score for a picture of an ordinary highlight grade may be set to 70, while the score for a picture of a higher highlight grade may be set to 90.
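The following is a hedged sketch of such supervised training: a regression head on a pretrained CNN backbone, fitted to frames labeled with quantized highlight scores (e.g. 70 and 90 as above). The backbone choice, loss and hyperparameters are assumptions, not requirements of the patent.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, 1)  # single highlight score
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_epoch(loader):
    # loader yields (frame_batch, highlight_score_batch) pairs built from
    # the labeled video data samples.
    backbone.train()
    for frames, scores in loader:
        optimizer.zero_grad()
        pred = backbone(frames).squeeze(1)
        loss = loss_fn(pred, scores.float())
        loss.backward()
        optimizer.step()
```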
According to the embodiment, the video title information of the target video is obtained, the title characteristics corresponding to the video title information and the picture characteristics corresponding to each video picture are extracted, the title matching degree between each video picture and the video title information is calculated according to the title characteristics and the picture characteristics, and then the title matching degree is used as a content analysis result, so that the matching degree between a video sequence finally obtained according to the initial picture and the target video can be ensured to be higher, and the generated dynamic cover can accurately represent the main content of the target video.
Referring to fig. 4, fig. 4 is a flowchart illustrating a method for generating a dynamic cover according to a third embodiment of the present invention.
Based on the foregoing embodiments, in this embodiment, after the step S10, the method further includes:
step S20': traversing each video picture in the video picture set, and performing key frame analysis on each traversed frame of video picture to obtain a key frame picture;
it should be understood that if the dynamic cover generation method provided in the first embodiment is used: the method has the advantages that all video pictures in the target video are subjected to picture content analysis, and picture visual quality information is obtained, so that the calculation amount of a product platform is very large, and the generation efficiency of the dynamic cover page is not high enough. It is considered that the core content of a piece of video can actually be embodied by some core pictures, or key frames (frames capable of describing the main content of a shot) in the piece of video. Therefore, in this embodiment, the product platform can use the key frame in the video picture set as the search basis for the starting picture, quickly and efficiently search the final starting picture, and then generate the dynamic cover.
It should be noted that, in this embodiment, key frame analysis may determine whether a picture is a key frame according to its correlation with the video title or theme, or according to its visual highlight degree.
As another implementation, the key frame analysis in this embodiment may also be implemented in the following manner:
step S4: traversing each video picture in the video picture set, and acquiring image characteristic information corresponding to the traversed current video picture;
it should be noted that, in order to accurately acquire the key frame pictures in the video picture set, the present embodiment may perform key frame analysis on each frame picture in the set in a traversal manner.
Step S5: acquiring image characteristic information corresponding to a previous frame of video picture of the current video picture;
it should be understood that image feature information is data information that characterizes a picture feature or characteristic.
Step S6: calculating the frame similarity between the current video picture and the previous frame of video picture according to the image characteristic information corresponding to the current video picture and the image characteristic information corresponding to the previous frame of video picture;
it should be understood that the key frames should be representative, and not only should represent features of the video subject matter, but also should be different depending on the features. Therefore, the selection of the key frames generally adopts a conservative principle, i.e. "how much to do not little". Meanwhile, in the case where the representative feature is not specific, the repeated (or redundant) frame is generally removed. When selecting the key frame, generally, the dissimilarity between the pictures is preferably considered, that is, the similarity between the frames is taken as a measurement basis, and each time the key frame is searched, the minimum similarity between each key frame and the picture of the previous frame is ensured, so that the key frame has the maximum information content.
Step S7: and when the frame similarity is smaller than a preset similarity threshold, taking the current video picture as a key frame picture.
In a specific implementation, when the product platform detects that the frame similarity between the current video picture and the previous frame is smaller than the preset similarity threshold (whose value is adjustable), it can take the current video picture as a key frame picture.
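A sketch of the traversal in steps S4 to S7: each frame's feature vector is compared with the previous frame's, and a frame is kept as a key frame when the similarity drops below the adjustable threshold. Normalized color histograms stand in for the unspecified image feature information.

```python
import cv2
import numpy as np

def frame_feature(frame_bgr):
    # 8x8x8 BGR color histogram, L2-normalized, as the per-frame feature.
    hist = cv2.calcHist([frame_bgr], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256]).flatten()
    return hist / (np.linalg.norm(hist) + 1e-8)

def key_frames(frames, sim_threshold=0.85):
    keys, prev = [], None
    for idx, frame in enumerate(frames):
        feat = frame_feature(frame)
        if prev is None or float(feat @ prev) < sim_threshold:
            keys.append(idx)  # similarity below threshold: new key frame
        prev = feat
    return keys
```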
Step S201': acquiring picture visual quality information corresponding to each key frame picture;
Step S202': performing picture content analysis on the key frame pictures to obtain content analysis results corresponding to the key frame pictures;
the method for analyzing the picture content of the key frame picture and obtaining the picture visual quality information corresponding to the key frame picture in this embodiment may refer to the above embodiments, and will not be described herein again.
Accordingly, the step S30 includes:
step S301': sequencing the key frame pictures according to the picture visual quality information corresponding to each key frame picture and the content analysis result corresponding to each key frame picture;
step S302': and selecting a starting picture from the obtained key frame pictures according to the sequencing result.
In a specific implementation, the product platform may score each key frame picture according to the picture visual quality information (such as definition, brightness and/or resolution) to obtain a third scoring result; score each key frame picture according to the content analysis result (such as the matching degree with the video title and/or the visual highlight degree) to obtain a fourth scoring result; then compute the total score of each key frame from the third and fourth scoring results, sort the total scores in descending order, and finally select the first-ranked key frame picture as the starting picture.
In this embodiment, each video picture in the video picture set is traversed and key frame analysis is performed on each traversed frame to obtain the key frame pictures; picture visual quality information corresponding to each key frame picture is acquired; picture content analysis is performed on the key frame pictures to obtain their content analysis results; the key frame pictures are sorted according to their picture visual quality information and content analysis results; and a starting picture is selected from the key frame pictures according to the sorting result. The most critical picture in the video is thus screened out quickly and accurately, and the video sequence used to generate the dynamic cover is then determined from that picture, so the finally generated video cover is more representative.
In addition, an embodiment of the present invention further provides a storage medium, where a dynamic cover generation program is stored, and the dynamic cover generation program, when executed by a processor, implements the steps of the dynamic cover generation method described above.
Referring to fig. 5, fig. 5 is a block diagram illustrating a first embodiment of the dynamic cover generation apparatus according to the present invention.
As shown in fig. 5, the dynamic cover generation apparatus provided in the embodiment of the present invention includes:
the video frame extracting module 501 is configured to perform frame extraction on a target video to obtain a video picture set;
the picture analysis module 502 is configured to obtain picture visual quality information corresponding to each video picture in the video picture set, and perform picture content analysis on each video picture to obtain a content analysis result;
a picture selection module 503, configured to select a starting picture from the target video according to the picture visual quality information and the content analysis result;
a cover generation module 504, configured to read a video sequence with a preset length from the target video with the start picture as a starting point, and generate a dynamic cover according to the video sequence.
In this embodiment, a video picture set is obtained by performing frame extraction on a target video; picture visual quality information corresponding to each video picture in the set is obtained and picture content analysis is performed on each video picture to obtain a content analysis result; a starting picture is selected from the target video according to the picture visual quality information and the content analysis result; and a video sequence of a preset length is read from the target video with the starting picture as a starting point, from which a dynamic cover is generated. Because the starting picture is selected according to the picture visual quality information and the content analysis result of every frame, a high-quality dynamic cover is generated quickly, efficiently and automatically.
Based on the first embodiment of the dynamic cover generation apparatus of the present invention, a second embodiment of the dynamic cover generation apparatus of the present invention is provided.
In this embodiment, the video frame extracting module is further configured to perform frame extraction on the target video to obtain a picture set to be processed; perform intro (leader) and outro (trailer) detection on the target video, and determine the pictures to be removed according to the detection result; and remove the pictures to be removed from the picture set to be processed to obtain a video picture set.
Further, the picture analysis module is further configured to obtain picture sharpness and/or picture brightness corresponding to each video picture in the video picture set; and determining picture visual quality information corresponding to each video picture according to the picture definition and/or the picture brightness.
Further, the picture analysis module is further configured to obtain video title information of the target video; extracting title features corresponding to the video title information and picture features corresponding to all video pictures; calculating the title matching degree between each video picture and the video title information according to the title characteristics and the picture characteristics; and taking the title matching degree as a content analysis result.
Further, the picture analysis module is further configured to extract video title entities from the video title information, and obtain entity concept matching degrees between each video picture and the video title entities; calculating the image-text matching degree corresponding to each video picture according to the title matching degree and the entity concept matching degree; and taking the image-text matching degree as a content analysis result.
Further, the picture analysis module is further configured to perform a knowledge graph analysis on the field objects included in the video title entity to obtain a graph analysis result; identifying each video picture by adopting an image target identification technology to obtain a picture object contained in each video picture; and determining entity concept matching degree between each video picture and the video title entity according to the picture object and the map analysis result.
Further, the picture analysis module is further configured to calculate a visual highlight degree corresponding to each video picture through a preset image highlight model, and to take the visual highlight degree and the image-text matching degree as the content analysis result.
Further, the dynamic cover generation apparatus further includes: a model training module, configured to acquire an initial deep learning regression model to be trained and a preset amount of video data; perform highlight-frame labeling on the video data to obtain video data samples; and train the initial deep learning regression model through supervised learning according to the video data samples to obtain a preset image highlight model for scoring the visual highlight degree of pictures.
Further, the model training module is further configured to obtain the user viewing logs corresponding to the video data; clean the video data according to the user viewing logs to obtain the video data to be labeled; and perform highlight-frame labeling on the video data to obtain the video data samples.
Furthermore, the model training module is further configured to read video picture click information, video playing behavior information and/or video watching duration information in the user watching log; and cleaning the video data according to the video picture clicking information, the video playing behavior information and/or the video watching duration information to obtain the video data to be marked.
Further, the picture analysis module is further configured to: traverse each video picture in the video picture set and perform key frame analysis on each traversed picture to obtain key frame pictures; acquire the picture visual quality information corresponding to each key frame picture; perform picture content analysis on the key frame pictures to obtain the content analysis result corresponding to each key frame picture; rank the key frame pictures according to their picture visual quality information and content analysis results; and select the starting picture from the key frame pictures according to the ranking result.
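A sketch of the ranking and selection step; the equal weighting of visual quality and content analysis is an assumption, since the specification does not fix how the two are combined.

    def select_starting_picture(key_frame_pictures, quality_scores, content_scores):
        """Rank key frame pictures by a combined score; return the best one."""
        ranked = sorted(
            zip(key_frame_pictures, quality_scores, content_scores),
            key=lambda item: 0.5 * item[1] + 0.5 * item[2],
            reverse=True,
        )
        return ranked[0][0]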
Further, the picture analysis module is further configured to: traverse each video picture in the video picture set and acquire the image feature information corresponding to the traversed current video picture; acquire the image feature information corresponding to the video picture one frame before the current video picture; calculate the frame similarity between the two pictures according to their image feature information; and, when the frame similarity is smaller than a preset similarity threshold, take the current video picture as a key frame picture.
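A sketch of the key frame test, using color-histogram correlation as the frame similarity measure (one common choice; the specification does not fix which image feature is used).

    import cv2

    def find_key_frames(pictures, sim_threshold=0.9):
        """Keep pictures that differ enough from the immediately preceding frame."""
        keys, prev_hist = [], None
        for picture in pictures:
            hist = cv2.calcHist([picture], [0, 1, 2], None,
                                [8, 8, 8], [0, 256, 0, 256, 0, 256])
            hist = cv2.normalize(hist, hist).flatten()
            if prev_hist is None or \
               cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < sim_threshold:
                keys.append(picture)   # dissimilar enough: key frame picture
            prev_hist = hist
        return keys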
For other embodiments or specific implementations of the dynamic cover generation apparatus of the present invention, reference may be made to the above method embodiments, which are not repeated here.
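Composing the sketches above end to end gives one possible reading of the overall pipeline; the 25-picture preset length and the reuse of the sampled pictures (rather than re-reading the original video) are simplifications, the encoder callables remain hypothetical, and for brevity only the title matching degree stands in for the content analysis result.

    def generate_dynamic_cover(video_path, title, text_encoder, image_encoder,
                               preset_length=25):
        """From target video to a short sequence anchored at the starting picture."""
        pictures = extract_video_pictures(video_path)     # frame extraction
        keys = find_key_frames(pictures)                  # key frame analysis
        quality = [visual_quality(p) for p in keys]       # visual quality info
        content = title_matching_degrees(title, keys,
                                         text_encoder, image_encoder)
        start = select_starting_picture(keys, quality, content)
        idx = next(i for i, p in enumerate(pictures) if p is start)
        return pictures[idx:idx + preset_length]          # dynamic cover frames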
The invention discloses A1, a dynamic cover generation method, which comprises the following steps:
performing frame extraction on a target video to obtain a video picture set;
acquiring picture visual quality information corresponding to each video picture in the video picture set, and performing picture content analysis on each video picture to obtain a content analysis result;
selecting a starting picture from the target video according to the picture visual quality information and the content analysis result;
and reading a video sequence with a preset length from the target video by taking the starting picture as a starting point, and generating a dynamic cover according to the video sequence.
A2. The dynamic cover generation method as described in A1, wherein the step of performing frame extraction on the target video to obtain a video picture set comprises:
performing frame extraction on a target video to obtain a to-be-processed picture set;
performing film head and film tail detection on the target video, and determining the pictures to be removed according to the detection result;
and removing the pictures to be removed from the to-be-processed picture set to obtain a video picture set.
A3. The dynamic cover generation method as described in A1, wherein the step of acquiring the picture visual quality information corresponding to each video picture in the video picture set comprises:
acquiring the picture sharpness and/or picture brightness corresponding to each video picture in the video picture set;
and determining the picture visual quality information corresponding to each video picture according to the picture sharpness and/or the picture brightness.
A4. The dynamic cover generation method as described in any one of A1 to A3, wherein the step of performing picture content analysis on each video picture to obtain a content analysis result comprises:
acquiring video title information of the target video;
extracting the title features corresponding to the video title information and the picture features corresponding to each video picture;
calculating the title matching degree between each video picture and the video title information according to the title features and the picture features;
and taking the title matching degree as a content analysis result.
A5. The dynamic cover generation method as described in A4, wherein before the step of taking the title matching degree as a content analysis result, the method further comprises:
extracting video title entities from the video title information, and acquiring the entity concept matching degree between each video picture and the video title entities;
calculating the image-text matching degree corresponding to each video picture according to the title matching degree and the entity concept matching degree;
correspondingly, the step of taking the title matching degree as a content analysis result comprises:
taking the image-text matching degree as a content analysis result.
A6. The dynamic cover generation method as described in A5, wherein before the step of acquiring the entity concept matching degree between each video picture and the video title entities, the method further comprises:
performing knowledge graph analysis on the domain objects contained in the video title entities to obtain a graph analysis result;
recognizing each video picture by an image target recognition technique to obtain the picture objects contained in each video picture;
and determining the entity concept matching degree between each video picture and the video title entities according to the picture objects and the graph analysis result.
A7. The dynamic cover generation method as described in A5, wherein before the step of taking the image-text matching degree as a content analysis result, the method further comprises:
calculating the visual highlight degree corresponding to each video picture through a preset image highlight model;
correspondingly, the step of taking the image-text matching degree as a content analysis result comprises:
taking the visual highlight degree and the image-text matching degree as a content analysis result.
A8. The dynamic cover generation method as described in A7, further comprising:
acquiring an initial deep learning regression model to be trained and a preset amount of video data;
performing highlight frame marking on the video data to obtain video data samples;
and training the initial deep learning regression model through supervised learning according to the video data samples to obtain a preset image highlight model for scoring the visual highlight degree of a picture.
A9. The dynamic cover generation method as described in A8, wherein the step of performing highlight frame marking on the video data to obtain video data samples comprises:
acquiring the user viewing logs corresponding to the video data;
cleaning the video data according to the user viewing logs to obtain the video data to be marked;
and performing highlight frame marking on the video data to be marked to obtain video data samples.
A10. The dynamic cover generation method as described in A9, wherein the step of cleaning the video data according to the user viewing logs to obtain the video data to be marked comprises:
reading the video picture click information, video playing behavior information and/or video viewing duration information in the user viewing logs;
and cleaning the video data according to the video picture click information, the video playing behavior information and/or the video viewing duration information to obtain the video data to be marked.
A11. The dynamic cover generation method as described in A1, wherein the step of acquiring the picture visual quality information corresponding to each video picture in the video picture set and performing picture content analysis on each video picture to obtain a content analysis result comprises:
traversing each video picture in the video picture set, and performing key frame analysis on each traversed video picture to obtain key frame pictures;
acquiring the picture visual quality information corresponding to each key frame picture;
performing picture content analysis on the key frame pictures to obtain the content analysis result corresponding to each key frame picture;
the step of selecting a starting picture from the target video according to the picture visual quality information and the content analysis result comprises:
ranking the key frame pictures according to the picture visual quality information and the content analysis result corresponding to each key frame picture;
and selecting the starting picture from the key frame pictures according to the ranking result.
A12. The dynamic cover generation method as described in A11, wherein the step of traversing each video picture in the video picture set and performing key frame analysis on each traversed video picture to obtain key frame pictures comprises:
traversing each video picture in the video picture set, and acquiring the image feature information corresponding to the traversed current video picture;
acquiring the image feature information corresponding to the video picture immediately preceding the current video picture;
calculating the frame similarity between the current video picture and the preceding video picture according to their image feature information;
and when the frame similarity is smaller than a preset similarity threshold, taking the current video picture as a key frame picture.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or system that comprises that element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, a magnetic disk, or an optical disc) and includes instructions for causing a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods of the embodiments of the present invention.
The above description covers only preferred embodiments of the present invention and is not intended to limit its scope; all equivalent structural or process transformations made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, fall within the scope of protection of the present invention.

Claims (10)

1. A dynamic cover generation method is characterized by comprising the following steps:
performing frame extraction on a target video to obtain a video picture set;
acquiring picture visual quality information corresponding to each video picture in the video picture set, and performing picture content analysis on each video picture to obtain a content analysis result;
selecting a starting picture from the target video according to the picture visual quality information and the content analysis result;
and reading a video sequence with a preset length from the target video by taking the starting picture as a starting point, and generating a dynamic cover according to the video sequence.
2. The dynamic cover generation method as claimed in claim 1, wherein the step of performing frame extraction on the target video to obtain a video picture set comprises:
performing frame extraction on a target video to obtain a to-be-processed picture set;
performing film head and film tail detection on the target video, and determining the pictures to be removed according to the detection result;
and removing the pictures to be removed from the to-be-processed picture set to obtain a video picture set.
3. The dynamic cover generation method as claimed in claim 1, wherein the step of acquiring the picture visual quality information corresponding to each video picture in the video picture set comprises:
acquiring the picture sharpness and/or picture brightness corresponding to each video picture in the video picture set;
and determining the picture visual quality information corresponding to each video picture according to the picture sharpness and/or the picture brightness.
4. The dynamic cover generation method as claimed in any one of claims 1 to 3, wherein the step of performing picture content analysis on each video picture to obtain a content analysis result comprises:
acquiring video title information of the target video;
extracting the title features corresponding to the video title information and the picture features corresponding to each video picture;
calculating the title matching degree between each video picture and the video title information according to the title features and the picture features;
and taking the title matching degree as a content analysis result.
5. The dynamic cover generation method as claimed in claim 4, wherein before the step of taking the title matching degree as a content analysis result, the method further comprises:
extracting video title entities from the video title information, and acquiring the entity concept matching degree between each video picture and the video title entities;
calculating the image-text matching degree corresponding to each video picture according to the title matching degree and the entity concept matching degree;
correspondingly, the step of taking the title matching degree as a content analysis result comprises:
taking the image-text matching degree as a content analysis result.
6. The dynamic cover generation method as claimed in claim 5, wherein before the step of acquiring the entity concept matching degree between each video picture and the video title entities, the method further comprises:
performing knowledge graph analysis on the domain objects contained in the video title entities to obtain a graph analysis result;
recognizing each video picture by an image target recognition technique to obtain the picture objects contained in each video picture;
and determining the entity concept matching degree between each video picture and the video title entities according to the picture objects and the graph analysis result.
7. The dynamic cover generation method as claimed in claim 1, wherein the step of acquiring the picture visual quality information corresponding to each video picture in the video picture set and performing picture content analysis on each video picture to obtain a content analysis result comprises:
traversing each video picture in the video picture set, and performing key frame analysis on each traversed video picture to obtain key frame pictures;
acquiring the picture visual quality information corresponding to each key frame picture;
performing picture content analysis on the key frame pictures to obtain the content analysis result corresponding to each key frame picture;
the step of selecting a starting picture from the target video according to the picture visual quality information and the content analysis result comprises:
ranking the key frame pictures according to the picture visual quality information and the content analysis result corresponding to each key frame picture;
and selecting the starting picture from the key frame pictures according to the ranking result.
8. A dynamic cover generation apparatus, comprising:
the video frame extraction module is used for performing frame extraction on the target video to obtain a video picture set;
the picture analysis module is used for acquiring picture visual quality information corresponding to each video picture in the video picture set, and performing picture content analysis on each video picture to obtain a content analysis result;
the picture selection module is used for selecting a starting picture from the target video according to the picture visual quality information and the content analysis result;
and the cover generation module is used for reading a video sequence with a preset length from the target video by taking the starting picture as a starting point and generating a dynamic cover according to the video sequence.
9. A dynamic cover generation apparatus, the apparatus comprising: a memory, a processor, and a dynamic cover generation program stored on the memory and executable on the processor, the dynamic cover generation program configured to implement the steps of the dynamic cover generation method of any of claims 1 to 7.
10. A storage medium having stored thereon a dynamic cover generation program, which when executed by a processor implements the steps of the dynamic cover generation method of any of claims 1 to 7.
CN202011143646.7A 2020-10-22 2020-10-22 Dynamic cover generation method, device, equipment and storage medium Pending CN114390369A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011143646.7A CN114390369A (en) 2020-10-22 2020-10-22 Dynamic cover generation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114390369A true CN114390369A (en) 2022-04-22

Family

ID=81194874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011143646.7A Pending CN114390369A (en) 2020-10-22 2020-10-22 Dynamic cover generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114390369A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116777914A (en) * 2023-08-22 2023-09-19 腾讯科技(深圳)有限公司 Data processing method, device, equipment and computer readable storage medium
CN116777914B (en) * 2023-08-22 2023-11-07 腾讯科技(深圳)有限公司 Data processing method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination