CN111143613B - Method, system, electronic device and storage medium for selecting video cover - Google Patents

Method, system, electronic device and storage medium for selecting video cover Download PDF

Info

Publication number
CN111143613B
Authority
CN
China
Prior art keywords
image
video
target
category
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911395856.2A
Other languages
Chinese (zh)
Other versions
CN111143613A (en)
Inventor
成丹妮
罗超
吉聪睿
胡泓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd filed Critical Ctrip Computer Technology Shanghai Co Ltd
Priority to CN201911395856.2A priority Critical patent/CN111143613B/en
Publication of CN111143613A publication Critical patent/CN111143613A/en
Application granted granted Critical
Publication of CN111143613B publication Critical patent/CN111143613B/en

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/738 Presentation of query results
    • G06F16/739 Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Abstract

The invention discloses a method, a system, an electronic device and a storage medium for selecting a video cover, wherein the method comprises the following steps: extracting multi-frame target images from a target video; identifying the image category corresponding to each frame of target image; determining one of the identified image categories as the video category of the target video; and selecting one frame from the target images corresponding to the video category as the cover of the target video. Based on an understanding of the video content, the invention uses a representative image that fits the video category as the video cover, which not only displays the video information accurately but also allows users to browse and screen videos quickly, thereby improving user stickiness and, in turn, the booking conversion rate on an OTA platform.

Description

Method, system, electronic device and storage medium for selecting video cover
Technical Field
The present invention relates to the field of computers, and in particular, to a method, a system, an electronic device, and a storage medium for selecting a video cover.
Background
With the continuous enrichment of Internet information and the continuous upgrading of Internet technology, traditional text and picture information can no longer meet users' needs for browsing information, which has driven the rapid development of video information technology. For example, for an OTA (Online Travel Agency), an accurate and aesthetically pleasing presentation of video information can greatly increase user stickiness and thus the booking conversion rate on the OTA platform. As the first impression of the video content, the video cover strongly influences the user's willingness to click, especially when the video display area is limited. Current OTA platforms usually use the first or last frame of the video as the video cover, which hides the highlights of the video, makes it difficult to attract users' interest, and results in a poor user experience.
Disclosure of Invention
The invention aims to overcome the defect in the prior art that the first or last frame of a video is used as the video cover, and provides a method, a system, an electronic device and a storage medium for selecting a video cover.
The invention solves the above technical problem through the following technical solutions:
a method of selecting a video cover, the method comprising:
extracting multi-frame target images from the target video;
identifying an image category corresponding to the target image of each frame;
determining one of the identified image categories as a video category of the target video;
and selecting one frame from the target images corresponding to the video category as the cover of the target video.
Preferably, after the step of extracting the multi-frame target image from the target video, the method further comprises:
filtering the extracted multi-frame target image according to the filtering condition; wherein the filtering conditions include:
at least one of: the brightness of the target image being smaller than a first threshold, the sharpness of the target image being smaller than a second threshold, and the color singleness of the target image being greater than a third threshold.
Preferably, the step of identifying the image category corresponding to the target image for each frame includes:
identifying the image category corresponding to the target image of each frame according to the image identification model;
the input of the image recognition model is the target image, and the output of the image recognition model is the image category corresponding to the target image;
and/or,
the step of selecting a frame from the target images corresponding to the video category as the cover of the target video includes:
determining a target image corresponding to the video category as a candidate image;
evaluating the image score corresponding to each frame of candidate image according to an image scoring model;
determining the candidate image with the highest image score as the cover of the target video;
and the input of the image scoring model is the candidate image, and the output of the image scoring model is the image score corresponding to the candidate image.
Preferably, the step of determining one of the identified image categories as the video category of the target video includes:
determining the image category with the largest number of corresponding target images as the video category;
or,
the step of determining one of the identified image categories as a video category of the target video includes:
acquiring comment information corresponding to the target video;
determining the image category matched with the comment information in the identified image categories as a candidate category;
and determining the candidate category with the largest number of corresponding target images as the video category.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing any of the methods of selecting a video cover described above when the computer program is executed.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the methods of selecting a video cover described above.
A system for selecting a video cover, the system comprising:
the extraction module is used for extracting multi-frame target images from the target video;
the identification module is used for identifying the image category corresponding to each frame of the target image;
a determining module, configured to determine one of the identified image categories as a video category of the target video;
and the selection module is used for selecting one frame from the target images corresponding to the video category as the cover of the target video.
Preferably, the system further comprises:
the filtering module is used for filtering the extracted multi-frame target image according to the filtering condition; wherein the filtering conditions include:
at least one of: the brightness of the target image being smaller than a first threshold, the sharpness of the target image being smaller than a second threshold, and the color singleness of the target image being greater than a third threshold.
Preferably, the identification module is specifically configured to identify, according to an image identification model, an image category corresponding to the target image of each frame;
the input of the image recognition model is the target image, and the output of the image recognition model is the image category corresponding to the target image;
and/or,
the selection module comprises:
the first determining unit is used for determining that the target image corresponding to the video category is a candidate image;
the image scoring unit is used for evaluating the image score corresponding to each frame of candidate image according to the image scoring model;
the second determining unit is used for determining the candidate image with the highest image score as the cover of the target video;
and the input of the image scoring model is the candidate image, and the output of the image scoring model is the image score corresponding to the candidate image.
Preferably, the determining module is specifically configured to determine, as the video category, an image category with the largest number of corresponding target images;
or,
the determining module includes:
the acquisition unit is used for acquiring comment information corresponding to the target video;
a third determination unit configured to determine, as a candidate category, an image category matching the comment information among the identified image categories;
and a fourth determining unit, configured to determine, as the video category, a candidate category with the largest number of corresponding target images.
The invention has the following positive effects: based on an understanding of the video content, the invention uses a representative image that fits the video category as the video cover, which not only displays the video information accurately but also allows users to browse and screen videos quickly, thereby improving user stickiness and, in turn, the booking conversion rate on an OTA platform.
Drawings
Fig. 1 is a flowchart of a method of selecting a video cover according to embodiment 1 of the present invention.
Fig. 2 is a schematic diagram of the hardware structure of an electronic device according to embodiment 2 of the present invention.
Fig. 3 is a block diagram of a system for selecting a video cover according to embodiment 4 of the present invention.
Detailed Description
The invention is further illustrated by means of the following examples, which are not intended to limit the scope of the invention.
Example 1
The present embodiment provides a method for selecting a video cover, and fig. 1 shows a flowchart of the present embodiment. Referring to fig. 1, the method of the present embodiment includes:
s101, extracting multi-frame target images from target videos.
In this embodiment, considering that the target video contains a large amount of information with high feature dimensionality, the computational load can be reduced by frame extraction. Specifically, for a video with a frame rate of n frames per second and a duration of t, the number of target images to extract can be set to f (f < n×t), and the target video is then sampled at uniform intervals to obtain f frames of target images.
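As a minimal sketch of this uniform-interval extraction, the following Python snippet samples f frames from a video. OpenCV and the helper name extract_target_images are assumptions for illustration; the patent does not prescribe any library:

```python
import cv2  # assumed decoder library; the patent does not name one
import numpy as np

def extract_target_images(video_path: str, f: int):
    """Uniformly sample f frames from the target video (hypothetical helper)."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # roughly n * t frames
    assert f < total, "f must be smaller than the total frame count"
    # f evenly spaced frame indices across the whole video
    indices = np.linspace(0, total - 1, num=f, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```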
S102, filtering the extracted multi-frame target image according to the filtering condition.
In this embodiment, in order to further reduce the computational load of processing the target video, target images that perform poorly on objective indexes may be filtered out according to the filtering conditions, where the objective indexes may include, but are not limited to, brightness, sharpness, color singleness, and the like.
For example, in this embodiment, the filtering condition may include that the brightness of the target image is smaller than a first threshold, where the first threshold may be set in a customized manner according to the practical application, and the brightness may be calculated according to the following formula:
Luminance(I_rgb) = 0.2126·I_r + 0.7152·I_g + 0.0722·I_b

In the above formula, I_rgb denotes the color image, and I_r, I_g and I_b denote the red, green and blue channels of the color image, respectively.
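A sketch of this brightness filter, under the assumption that the per-pixel luminance is averaged over the image before comparison with the first threshold (the patent text gives only the per-pixel formula):

```python
import numpy as np

def luminance(img_rgb: np.ndarray) -> float:
    """Mean Rec. 709 luminance of an RGB image with channels in [0, 255]."""
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    return float(np.mean(0.2126 * r + 0.7152 * g + 0.0722 * b))

def too_dark(img_rgb: np.ndarray, first_threshold: float) -> bool:
    # A frame whose brightness falls below the first threshold is filtered out
    return luminance(img_rgb) < first_threshold
```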
For another example, in this embodiment, the filtering condition may include that the sharpness of the target image is smaller than a second threshold, where the second threshold may be set in a customized manner according to the practical application. The sharpness is computed from the gradients of the grayscale image (the formula itself is not reproduced in the text), where I_gray denotes the grayscale map of the color image, and δ_x and δ_y denote the gradient maps in the x and y directions of the image, respectively.
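Since the sharpness formula is omitted in the text, the sketch below uses the mean gradient magnitude of the grayscale image as one plausible reading of the description; this metric is an assumption, not the patent's confirmed formula:

```python
import numpy as np

def sharpness(img_gray: np.ndarray) -> float:
    """Mean gradient magnitude over the grayscale image I_gray (assumed metric)."""
    delta_y, delta_x = np.gradient(img_gray.astype(np.float64))  # gradients along y and x
    return float(np.mean(np.sqrt(delta_x ** 2 + delta_y ** 2)))

# A frame is filtered out when sharpness(frame) < second_threshold.
```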
For another example, in this embodiment, the filtering condition may include that the color singleness of the target image is greater than a third threshold, where the third threshold may be set in a customized manner according to the practical application. The color singleness is computed from the gray-level histogram hist(·) (the formula itself is not reproduced in the text): the gray levels are sorted by their pixel proportion, and the fraction of all pixels accounted for by the top 5% of gray levels is taken to characterize the color singleness.
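A sketch of the color-singleness filter under the reading above, where gray levels are sorted by pixel proportion and the share held by the top 5% of levels is the singleness score; all names are illustrative:

```python
import numpy as np

def color_singleness(img_gray: np.ndarray) -> float:
    """Fraction of all pixels covered by the top 5% most frequent gray levels."""
    hist, _ = np.histogram(img_gray, bins=256, range=(0, 256))
    top_k = max(1, int(0.05 * 256))    # top 5% of the 256 gray levels
    top = np.sort(hist)[::-1][:top_k]  # most frequent gray levels first
    return float(top.sum() / hist.sum())

# A frame is filtered out when color_singleness(frame) > third_threshold.
```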
S103, identifying the image category corresponding to each frame of target image.
In this embodiment, specifically, the image category corresponding to each frame of target image may be identified according to an image recognition model, where the input of the image recognition model is a frame of target image and the output is the image category corresponding to that frame of target image.
Specifically, the image recognition model of this embodiment may include 159 network layers and adopt 7 dense blocks, where the size of the feature map within each dense block is unchanged and different convolution layers within a dense block are connected by skip connections to ensure the transfer of feature information. The activation function of the last network layer may be a softmax function with N neurons, the output value p_i of neuron i (i being a positive integer less than or equal to N) lying between 0 and 1, and the network weights can be updated during training by back-propagating a cross-entropy loss. For each image category i, a threshold τ_i can be set; when p_i ≥ τ_i, the target image of that frame is considered to contain the label of image category i.
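The per-category thresholding step can be sketched as follows; probs stands for the network's output p_1..p_N for one frame and taus for the thresholds τ_i, both illustrative names not taken from the patent:

```python
import numpy as np

def frame_labels(probs: np.ndarray, taus: np.ndarray, categories: list) -> list:
    """Return every image category i whose output probability satisfies p_i >= tau_i."""
    return [c for c, p, t in zip(categories, probs, taus) if p >= t]

# e.g. frame_labels(np.array([0.7, 0.2, 0.1]),
#                   np.array([0.5, 0.5, 0.5]),
#                   ["swimming pool", "front desk", "transition frame"])
# -> ["swimming pool"]
```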
In this embodiment, a set of image categories is obtained after the extracted multi-frame target images are respectively input into the image recognition model. For example, the output of the image recognition model may include transition frame, front desk, swimming pool, appearance, and so on, where the transition frame label indicates that the frame carries no content that meaningfully represents an image category, while front desk, swimming pool, appearance, and the like denote actual image categories. The tag sequence obtained by processing a target video may, for instance, be: appearance, transition frame, front desk, front desk, hall, transition frame, transition frame, swimming pool, swimming pool, swimming pool, transition frame.
S104, determining one of the identified image categories as a video category of the target video.
Specifically, in this embodiment, the image category with the largest number of corresponding target images may be determined as the video category. For example, in the tag sequence of image categories shown above, the image category with the largest number of corresponding target images is swimming pool, so swimming pool can be determined as the video category of the target video.
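A minimal sketch of this majority vote over the per-frame labels, skipping transition frames since they carry no category meaning; the helper name is illustrative:

```python
from collections import Counter

def video_category(labels: list) -> str:
    """Pick the image category backed by the largest number of frames."""
    counts = Counter(l for l in labels if l != "transition frame")
    return counts.most_common(1)[0][0]

# With the tag sequence above, swimming pool (3 frames) wins over
# front desk (2 frames), appearance and hall (1 frame each).
```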
In this embodiment, the video category may also be determined in combination with comment information corresponding to the target video, where the comment information may include, but is not limited to, comments, descriptions, and the like. Specifically, step S104 may include: acquiring the comment information corresponding to the target video; determining the image categories that match the comment information among the identified image categories as candidate categories; and determining the candidate category with the largest number of corresponding target images as the video category. For example, when the obtained comment information is "this hotel is great, the front desk staff are enthusiastic, the swimming pool is large, and the rooms are comfortable", matching the keywords against the tag sequence of image categories shown above yields front desk and swimming pool as the matching image categories; since the number of target images corresponding to swimming pool is greater than the number corresponding to front desk, swimming pool can be determined as the video category of the target video.
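The comment-guided variant can be sketched as simple keyword matching between category names and the comment text, falling back to the plain majority vote when nothing matches (the fallback is an assumption, not stated in the patent):

```python
from collections import Counter

def video_category_from_comments(labels: list, comment: str) -> str:
    """Majority vote restricted to categories whose keyword occurs in the comment."""
    counts = Counter(l for l in labels if l != "transition frame")
    matched = {c: n for c, n in counts.items() if c in comment}
    chosen = matched or counts  # assumed fallback when no category matches
    return max(chosen, key=chosen.get)
```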
S105, selecting one frame from target images corresponding to the video categories as a cover of the target video.
In this embodiment, step S105 may include: determining the target images corresponding to the video category as candidate images; evaluating the image score corresponding to each frame of candidate image according to an image scoring model; and determining the candidate image with the highest image score as the cover of the target video, where the input of the image scoring model is a candidate image and the output is the image score corresponding to the candidate image.
Specifically, in this embodiment, the quality of each frame of image may first be evaluated and scored manually to construct a training set. For example, 1000 frames may be randomly extracted from 100 videos and scored by 3 art designers from the perspectives of picture color, composition, and the like, where the score range includes 1, 2, 3, 4 and 5, and the average of the 3 scores, rounded, is taken as the image score of the frame. The image scoring model is then trained on this training set; the image scoring model in this embodiment may include 43 network layers and adopt Res blocks, where the size of the feature map within each Res block is unchanged and different convolution layers within a Res block are connected by skip connections to ensure the transfer of feature information. The activation function of the last network layer may be a softmax function with 5 neurons, the output value p_i of each neuron lying between 0 and 1 and representing the probabilities of the five score-level categories; the network weights can be updated during training by back-propagating a cross-entropy loss. For each score-level category i, the image scoring model outputs the probability p_i, from which the image score of the target image is computed. In this embodiment, after the image score corresponding to each frame of candidate image is determined, if several candidate images share the highest image score, the first of these frames may be selected as the cover of the target video.
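Because the scoring formula is also omitted in the text, the sketch below takes the expectation of the five score levels under the softmax probabilities, one natural reading of "an image score representing the target image"; the tie-break on the earliest frame follows the description above:

```python
import numpy as np

def image_score(probs: np.ndarray) -> float:
    """Assumed expected score over the five levels: sum of i * p_i for i = 1..5."""
    return float(np.dot(np.arange(1, 6), probs))

def pick_cover(candidate_probs: list) -> int:
    """Index of the cover frame: highest score, earliest frame on ties."""
    scores = [image_score(p) for p in candidate_probs]
    return int(np.argmax(scores))  # argmax returns the first maximum
```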
Based on an understanding of the video content, this embodiment uses a representative image that fits the video category as the video cover. This not only displays the video information accurately but also allows users to browse and screen videos quickly, which improves user stickiness and, in turn, the booking conversion rate on the OTA platform. Moreover, images with poor objective quality are filtered out and the remaining images are scored on picture aesthetics, so that both the content and the quality of the video are considered and the finally selected cover is the most representative, high-quality frame, providing a better visual experience and increasing the user's click-through rate on the video.
Example 2
The present embodiment provides an electronic device, which may take the form of a computing device (for example, a server device) comprising a memory, a processor and a computer program stored on the memory and executable on the processor, where the processor implements the method of selecting a video cover provided in embodiment 1 when executing the computer program.
Fig. 2 shows a schematic hardware structure of the present embodiment, and as shown in fig. 2, the electronic device 9 specifically includes:
at least one processor 91, at least one memory 92, and a bus 93 for connecting the different system components (including the processor 91 and the memory 92), wherein:
the bus 93 includes a data bus, an address bus, and a control bus.
The memory 92 includes volatile memory such as Random Access Memory (RAM) 921 and/or cache memory 922, and may further include Read Only Memory (ROM) 923.
Memory 92 also includes a program/utility 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor 91 executes various functional applications and data processing, such as the method of selecting a video cover provided in embodiment 1 of the present invention, by running the computer program stored in the memory 92.
The electronic device 9 may further communicate with one or more external devices 94 (e.g., keyboard, pointing device, etc.). Such communication may occur through an input/output (I/O) interface 95. Also, the electronic device 9 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through a network adapter 96. The network adapter 96 communicates with other modules of the electronic device 9 via the bus 93. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in connection with the electronic device 9, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module according to embodiments of the present application; conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Example 3
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of selecting a video cover provided by embodiment 1.
More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible embodiment, the invention may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to carry out the steps of the method of selecting a video cover in embodiment 1.
The program code for carrying out the invention may be written in any combination of one or more programming languages, and may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
Example 4
The present embodiment provides a system for selecting a video cover, and fig. 3 shows a schematic block diagram of the present embodiment. Referring to fig. 3, the system of the present embodiment includes:
and the extraction module 1 is used for extracting multi-frame target images from the target video.
In this embodiment, considering that the target video contains a large amount of information with high feature dimensionality, the computational load can be reduced by frame extraction. Specifically, for a video with a frame rate of n frames per second and a duration of t, the number of target images to extract can be set to f (f < n×t), and the target video is then sampled at uniform intervals to obtain f frames of target images.
And the filtering module 2 is used for filtering the extracted multi-frame target image according to the filtering condition.
In this embodiment, in order to further reduce the computational load of processing the target video, target images that perform poorly on objective indexes may be filtered out according to the filtering conditions, where the objective indexes may include, but are not limited to, brightness, sharpness, color singleness, and the like.
For example, in this embodiment, the filtering condition may include that the brightness of the target image is smaller than a first threshold, where the first threshold may be set in a customized manner according to the practical application, and the brightness may be calculated according to the following formula:
Luminance(I_rgb) = 0.2126·I_r + 0.7152·I_g + 0.0722·I_b

In the above formula, I_rgb denotes the color image, and I_r, I_g and I_b denote the red, green and blue channels of the color image, respectively.
For another example, in this embodiment, the filtering condition may include that the sharpness of the target image is smaller than a second threshold, where the second threshold may be set in a customized manner according to the practical application. The sharpness is computed from the gradients of the grayscale image (the formula itself is not reproduced in the text), where I_gray denotes the grayscale map of the color image, and δ_x and δ_y denote the gradient maps in the x and y directions of the image, respectively.
For another example, in this embodiment, the filtering condition may include that the color singleness of the target image is greater than a third threshold, where the third threshold may be set in a customized manner according to the practical application. The color singleness is computed from the gray-level histogram hist(·) (the formula itself is not reproduced in the text): the gray levels are sorted by their pixel proportion, and the fraction of all pixels accounted for by the top 5% of gray levels is taken to characterize the color singleness.
And the identification module 3 is used for identifying the image category corresponding to each frame of target image.
In this embodiment, the identification module 3 may specifically identify, according to an image recognition model, the image category corresponding to each frame of target image, where the input of the image recognition model is a frame of target image and the output is the image category corresponding to that frame of target image.
Specifically, the image recognition model of this embodiment may include 159 network layers and adopt 7 dense blocks, where the size of the feature map within each dense block is unchanged and different convolution layers within a dense block are connected by skip connections to ensure the transfer of feature information. The activation function of the last network layer may be a softmax function with N neurons, the output value p_i of neuron i (i being a positive integer less than or equal to N) lying between 0 and 1, and the network weights can be updated during training by back-propagating a cross-entropy loss. For each image category i, a threshold τ_i can be set; when p_i ≥ τ_i, the target image of that frame is considered to contain the label of image category i.
In this embodiment, a set of image categories is obtained after the extracted multi-frame target images are respectively input into the image recognition model. For example, the output of the image recognition model may include transition frame, front desk, swimming pool, appearance, and so on, where the transition frame label indicates that the frame carries no content that meaningfully represents an image category, while front desk, swimming pool, appearance, and the like denote actual image categories. The tag sequence obtained by processing a target video may, for instance, be: appearance, transition frame, front desk, front desk, hall, transition frame, transition frame, swimming pool, swimming pool, swimming pool, transition frame.
A determining module 4, configured to determine one of the identified image categories as a video category of the target video.
Specifically, in this embodiment, the image category with the largest number of corresponding target images may be determined as the video category. For example, in the tag sequence of image categories shown above, the image category with the largest number of corresponding target images is swimming pool, so swimming pool can be determined as the video category of the target video.
In this embodiment, the video category may also be determined in combination with comment information corresponding to the target video, where the comment information may include, but is not limited to, comments, descriptions, and the like. Specifically, the determining module 4 may include an obtaining unit for acquiring the comment information corresponding to the target video, a third determining unit for determining the image categories that match the comment information among the identified image categories as candidate categories, and a fourth determining unit for determining the candidate category with the largest number of corresponding target images as the video category. For example, when the obtained comment information is "this hotel is great, the front desk staff are enthusiastic, the swimming pool is large, and the rooms are comfortable", matching the keywords against the tag sequence of image categories shown above yields front desk and swimming pool as the matching image categories; since the number of target images corresponding to swimming pool is greater than the number corresponding to front desk, swimming pool can be determined as the video category of the target video.
And the selection module 5 is used for selecting one frame from the target images corresponding to the video categories as the cover of the target video.
In this embodiment, the selection module 5 may include a first determining unit for determining the target images corresponding to the video category as candidate images, an image scoring unit for evaluating the image score corresponding to each frame of candidate image according to an image scoring model, and a second determining unit for determining the candidate image with the highest image score as the cover of the target video, where the input of the image scoring model is a candidate image and the output is the image score corresponding to the candidate image.
Specifically, in this embodiment, the quality of each frame of image may first be evaluated and scored manually to construct a training set. For example, 1000 frames may be randomly extracted from 100 videos and scored by 3 art designers from the perspectives of picture color, composition, and the like, where the score range includes 1, 2, 3, 4 and 5, and the average of the 3 scores, rounded, is taken as the image score of the frame. The image scoring model is then trained on this training set; the image scoring model in this embodiment may include 43 network layers and adopt Res blocks, where the size of the feature map within each Res block is unchanged and different convolution layers within a Res block are connected by skip connections to ensure the transfer of feature information. The activation function of the last network layer may be a softmax function with 5 neurons, the output value p_i of each neuron lying between 0 and 1 and representing the probabilities of the five score-level categories; the network weights can be updated during training by back-propagating a cross-entropy loss. For each score-level category i, the image scoring model outputs the probability p_i, from which the image score of the target image is computed. In this embodiment, after the image score corresponding to each frame of candidate image is determined, if several candidate images share the highest image score, the first of these frames may be selected as the cover of the target video.
Based on an understanding of the video content, this embodiment uses a representative image that fits the video category as the video cover. This not only displays the video information accurately but also allows users to browse and screen videos quickly, which improves user stickiness and, in turn, the booking conversion rate on the OTA platform. Moreover, images with poor objective quality are filtered out and the remaining images are scored on picture aesthetics, so that both the content and the quality of the video are considered and the finally selected cover is the most representative, high-quality frame, providing a better visual experience and increasing the user's click-through rate on the video.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that these are merely examples and that the scope of the invention is defined by the appended claims. Those skilled in the art may make various changes and modifications to these embodiments without departing from the principle and spirit of the invention, and all such changes and modifications fall within the scope of the invention.

Claims (8)

1. A method of selecting a video cover, the method comprising:
extracting multi-frame target images from the target video;
filtering the extracted multi-frame target image according to the filtering condition; wherein the filtering conditions include:
at least one of: the brightness of the target image being less than a first threshold, the sharpness of the target image being less than a second threshold, and the color singleness of the target image being greater than a third threshold;
identifying an image category corresponding to the target image of each frame;
determining one of the identified image categories as a video category of the target video;
the step of determining one of the identified image categories as a video category of the target video includes:
acquiring comment information corresponding to the target video;
determining the image category matched with the comment information in the identified image categories as a candidate category;
determining the candidate category with the largest number of corresponding target images as the video category;
and selecting one frame from the target images corresponding to the video category as the cover of the target video.
2. The method of selecting a video cover as claimed in claim 1, wherein the step of identifying an image category corresponding to the target image for each frame includes:
identifying the image category corresponding to the target image of each frame according to the image identification model;
the input of the image recognition model is the target image, and the output of the image recognition model is the image category corresponding to the target image;
and/or,
the step of selecting a frame from the target images corresponding to the video category as the cover of the target video includes:
determining a target image corresponding to the video category as a candidate image;
evaluating the image score corresponding to each frame of candidate image according to an image scoring model;
determining the candidate image with the highest image score as the cover of the target video;
and the input of the image scoring model is the candidate image, and the output of the image scoring model is the image score corresponding to the candidate image.
3. The method of selecting a video cover as recited in claim 1, wherein the step of determining one of the identified image categories as the video category of the target video comprises:
and determining the image category with the largest number of corresponding target images as the video category.
4. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of selecting a video cover as claimed in any one of claims 1-3 when the computer program is executed.
5. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of selecting a video cover as claimed in any one of claims 1 to 3.
6. A system for selecting a video cover, the system comprising:
the extraction module is used for extracting multi-frame target images from the target video;
the filtering module is used for filtering the extracted multi-frame target image according to the filtering condition; wherein the filtering conditions include:
at least one of: the brightness of the target image being less than a first threshold, the sharpness of the target image being less than a second threshold, and the color singleness of the target image being greater than a third threshold;
the identification module is used for identifying the image category corresponding to each frame of the target image;
a determining module, configured to determine one of the identified image categories as a video category of the target video;
the determining module includes:
the acquisition unit is used for acquiring comment information corresponding to the target video;
a third determination unit configured to determine, as a candidate category, an image category matching the comment information among the identified image categories;
a fourth determining unit, configured to determine, as the video category, a candidate category having the largest number of corresponding target images;
and the selection module is used for selecting one frame from the target images corresponding to the video category as the cover of the target video.
7. The system for selecting a video cover as recited in claim 6, wherein the identification module is specifically configured to identify, according to an image recognition model, the image category corresponding to each frame of the target image;
the input of the image recognition model is the target image, and the output of the image recognition model is the image category corresponding to the target image;
and/or,
the selection module comprises:
the first determining unit is used for determining that the target image corresponding to the video category is a candidate image;
the image scoring unit is used for evaluating the image score corresponding to each frame of candidate image according to the image scoring model;
the second determining unit is used for determining the candidate image with the highest image score as the cover of the target video;
and the input of the image scoring model is the candidate image, and the output of the image scoring model is the image score corresponding to the candidate image.
8. The system for selecting a video cover as recited in claim 6, wherein the determination module is specifically configured to determine the image category that corresponds to the greatest number of target images as the video category.
CN201911395856.2A 2019-12-30 2019-12-30 Method, system, electronic device and storage medium for selecting video cover Active CN111143613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911395856.2A CN111143613B (en) 2019-12-30 2019-12-30 Method, system, electronic device and storage medium for selecting video cover

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911395856.2A CN111143613B (en) 2019-12-30 2019-12-30 Method, system, electronic device and storage medium for selecting video cover

Publications (2)

Publication Number Publication Date
CN111143613A CN111143613A (en) 2020-05-12
CN111143613B true CN111143613B (en) 2024-02-06

Family

ID=70521857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911395856.2A Active CN111143613B (en) 2019-12-30 2019-12-30 Method, system, electronic device and storage medium for selecting video cover

Country Status (1)

Country Link
CN (1) CN111143613B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831615B (en) * 2020-05-28 2024-03-12 北京达佳互联信息技术有限公司 Method, device and system for generating video file
CN111601160A (en) * 2020-05-29 2020-08-28 北京百度网讯科技有限公司 Method and device for editing video
CN111918130A (en) * 2020-08-11 2020-11-10 北京达佳互联信息技术有限公司 Video cover determining method and device, electronic equipment and storage medium
WO2022087826A1 (en) * 2020-10-27 2022-05-05 深圳市大疆创新科技有限公司 Video processing method and apparatus, mobile device, and readable storage medium
CN112363660B (en) * 2020-11-09 2023-03-24 北京达佳互联信息技术有限公司 Method and device for determining cover image, electronic equipment and storage medium
CN113794890B (en) * 2021-07-30 2023-10-24 北京达佳互联信息技术有限公司 Data processing method, device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105611413A (en) * 2015-12-24 2016-05-25 小米科技有限责任公司 Method and device for adding video clip class markers
CN107918656A (en) * 2017-11-17 2018-04-17 北京奇虎科技有限公司 Video front cover extracting method and device based on video title
CN108650524A (en) * 2018-05-23 2018-10-12 腾讯科技(深圳)有限公司 Video cover generation method, device, computer equipment and storage medium
CN109146921A (en) * 2018-07-02 2019-01-04 华中科技大学 A kind of pedestrian target tracking based on deep learning
CN109271542A (en) * 2018-09-28 2019-01-25 百度在线网络技术(北京)有限公司 Cover determines method, apparatus, equipment and readable storage medium storing program for executing
CN110263743A (en) * 2019-06-26 2019-09-20 北京字节跳动网络技术有限公司 The method and apparatus of image for identification
CN110390025A (en) * 2019-07-24 2019-10-29 百度在线网络技术(北京)有限公司 Cover figure determines method, apparatus, equipment and computer readable storage medium
CN110399848A (en) * 2019-07-30 2019-11-01 北京字节跳动网络技术有限公司 Video cover generation method, device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8990223B2 (en) * 2012-06-29 2015-03-24 Rovi Guides, Inc. Systems and methods for matching media content data

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105611413A (en) * 2015-12-24 2016-05-25 小米科技有限责任公司 Method and device for adding video clip class markers
CN107918656A (en) * 2017-11-17 2018-04-17 北京奇虎科技有限公司 Video front cover extracting method and device based on video title
CN108650524A (en) * 2018-05-23 2018-10-12 腾讯科技(深圳)有限公司 Video cover generation method, device, computer equipment and storage medium
CN109146921A (en) * 2018-07-02 2019-01-04 华中科技大学 A kind of pedestrian target tracking based on deep learning
CN109271542A (en) * 2018-09-28 2019-01-25 百度在线网络技术(北京)有限公司 Cover determines method, apparatus, equipment and readable storage medium storing program for executing
CN110263743A (en) * 2019-06-26 2019-09-20 北京字节跳动网络技术有限公司 The method and apparatus of image for identification
CN110390025A (en) * 2019-07-24 2019-10-29 百度在线网络技术(北京)有限公司 Cover figure determines method, apparatus, equipment and computer readable storage medium
CN110399848A (en) * 2019-07-30 2019-11-01 北京字节跳动网络技术有限公司 Video cover generation method, device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Development Status and Trends of Short Video in the Context of Media Convergence; Huang Chuxin; People's Tribune · Academic Frontiers (Issue 23); pp. 42-49 *

Also Published As

Publication number Publication date
CN111143613A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN111143613B (en) Method, system, electronic device and storage medium for selecting video cover
CN111696112B (en) Automatic image cutting method and system, electronic equipment and storage medium
US10671895B2 (en) Automated selection of subjectively best image frames from burst captured image sequences
CN113395578B (en) Method, device, equipment and storage medium for extracting video theme text
CN112119388A (en) Training image embedding model and text embedding model
CN112634296A (en) RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism
CN112074828A (en) Training image embedding model and text embedding model
CN107908641A (en) A kind of method and system for obtaining picture labeled data
WO2019118236A1 (en) Deep learning on image frames to generate a summary
CN113761253A (en) Video tag determination method, device, equipment and storage medium
CN111612010A (en) Image processing method, device, equipment and computer readable storage medium
CN111259245B (en) Work pushing method, device and storage medium
WO2022156534A1 (en) Video quality assessment method and device
CN113301382A (en) Video processing method, device, medium, and program product
CN114723652A (en) Cell density determination method, cell density determination device, electronic apparatus, and storage medium
CN110704650A (en) OTA picture tag identification method, electronic device and medium
CN113627342B (en) Method, system, equipment and storage medium for video depth feature extraction optimization
CN109960745A (en) Visual classification processing method and processing device, storage medium and electronic equipment
CN115964560A (en) Information recommendation method and equipment based on multi-mode pre-training model
CN108229263B (en) Target object identification method and device and robot
CN117795551A (en) Method and system for automatically capturing and processing user images
CN115630188A (en) Video recommendation method and device and electronic equipment
CN114372580A (en) Model training method, storage medium, electronic device, and computer program product
CN103729532A (en) Information supplying method and device based on images of fruits and vegetables
US20210157826A1 (en) Data model proposals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant