CN111143613B - Method, system, electronic device and storage medium for selecting video cover - Google Patents
- Publication number
- CN111143613B (application CN201911395856.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- video
- target
- category
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/738—Presentation of query results
- G06F16/739—Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
Abstract
The invention discloses a method, a system, an electronic device and a storage medium for selecting a video cover, wherein the method comprises the following steps: extracting multi-frame target images from a target video; identifying the image category corresponding to each frame of target image; determining one of the identified image categories as the video category of the target video; and selecting one frame from the target images corresponding to the video category as the cover of the target video. Based on an understanding of the video content, the invention uses a representative image matching the video category as the video cover, which not only displays the video information accurately but also allows users to browse and screen videos quickly, thereby improving user stickiness and in turn the booking conversion rate on an OTA platform.
Description
Technical Field
The present invention relates to the field of computers, and in particular, to a method, a system, an electronic device, and a storage medium for selecting a video cover.
Background
With the continuous enrichment of internet information and the continuous upgrading of internet technology, traditional text and picture information can no longer meet users' needs for browsing information, which has driven the rapid development of video information technology. For an OTA (online travel agency), for example, accurate and attractive presentation of video information can greatly increase user stickiness and thus the booking conversion rate. As the first thing a user sees of the video content, the video cover strongly influences the user's willingness to click, especially when the video display area is limited. Current OTA platforms usually use the first or last frame of the video as its cover, which hides the highlights of the video, makes it hard to attract users' interest, and results in a poor user experience.
Disclosure of Invention
The invention aims to overcome the defect in the prior art that the first or last frame of a video is used as the video cover, and provides a method, a system, an electronic device and a storage medium for selecting a video cover.
The invention solves the technical problems by the following technical scheme:
a method of selecting a video cover, the method comprising:
extracting multi-frame target images from the target video;
identifying an image category corresponding to the target image of each frame;
determining one of the identified image categories as a video category of the target video;
and selecting one frame from the target images corresponding to the video categories as the cover of the target video.
Preferably, after the step of extracting the multi-frame target image from the target video, the method further comprises:
filtering the extracted multi-frame target image according to the filtering condition; wherein the filtering conditions include:
at least one of: the brightness of the target image being smaller than a first threshold, the definition of the target image being smaller than a second threshold, and the color singleness of the target image being larger than a third threshold.
Preferably, the step of identifying the image category corresponding to the target image for each frame includes:
identifying the image category corresponding to the target image of each frame according to the image identification model;
the input of the image recognition model is the target image, and the output of the image recognition model is the image category corresponding to the target image;
and/or,
the step of selecting a frame from the target images corresponding to the video categories as the cover of the target video includes:
determining a target image corresponding to the video category as a candidate image;
evaluating the image scores corresponding to the candidate images of each frame according to an image score model;
determining the candidate image with the highest image score as the cover of the target video;
and the input of the image scoring model is the candidate image, and the output of the image scoring model is the image score corresponding to the candidate image.
Preferably, the step of determining one of the identified image categories as a video category of the target video includes:
determining the image category with the largest number of corresponding target images as the video category;
or,
the step of determining one of the identified image categories as a video category of the target video includes:
acquiring comment information corresponding to the target video;
determining the image category matched with the comment information in the identified image categories as a candidate category;
and determining the candidate category with the largest number of corresponding target images as the video category.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing any of the methods of selecting a video cover described above when the computer program is executed.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs any of the steps of the method of selecting a video cover described above.
A system for selecting a video cover, the system comprising:
the extraction module is used for extracting multi-frame target images from the target video;
the identification module is used for identifying the image category corresponding to each frame of the target image;
a determining module, configured to determine one of the identified image categories as a video category of the target video;
and the selection module is used for selecting one frame from the target images corresponding to the video categories as the cover of the target video.
Preferably, the system further comprises:
the filtering module is used for filtering the extracted multi-frame target image according to the filtering condition; wherein the filtering conditions include:
at least one of: the brightness of the target image being smaller than a first threshold, the definition of the target image being smaller than a second threshold, and the color singleness of the target image being larger than a third threshold.
Preferably, the identification module is specifically configured to identify, according to an image identification model, an image category corresponding to the target image of each frame;
the input of the image recognition model is the target image, and the output of the image recognition model is the image category corresponding to the target image;
and/or,
the selection module comprises:
the first determining unit is used for determining that the target image corresponding to the video category is a candidate image;
the image scoring unit is used for evaluating the image scores corresponding to the candidate images of each frame according to the image scoring model;
the second determining unit is used for determining that the candidate image with the highest image score is the front cover of the target video;
and the input of the image scoring model is the candidate image, and the output of the image scoring model is the image score corresponding to the candidate image.
Preferably, the determining module is specifically configured to determine, as the video category, an image category with the largest number of corresponding target images;
or,
the determining module includes:
the acquisition unit is used for acquiring comment information corresponding to the target video;
a third determination unit configured to determine, as a candidate category, an image category matching the comment information among the identified image categories;
and a fourth determining unit, configured to determine, as the video category, a candidate category with the largest number of corresponding target images.
The invention has the following positive effects: based on an understanding of the video content, the invention uses a representative image matching the video category as the video cover, which not only displays the video information accurately but also allows users to browse and screen videos quickly, thereby improving user stickiness and in turn the booking conversion rate on an OTA platform.
Drawings
Fig. 1 is a flowchart of a method of selecting a video cover according to embodiment 1 of the present invention.
Fig. 2 is a schematic hardware structure of an electronic device according to embodiment 2 of the present invention.
Fig. 3 is a block diagram of a system for selecting a video cover according to embodiment 4 of the present invention.
Detailed Description
The invention is further illustrated by means of the following examples, which are not intended to limit the scope of the invention.
Example 1
The present embodiment provides a method for selecting a video cover, and fig. 1 shows a flowchart of the present embodiment. Referring to fig. 1, the method of the present embodiment includes:
s101, extracting multi-frame target images from target videos.
In this embodiment, considering that the target video contains a large amount of information and has high feature dimensionality, the computational load can be reduced by frame extraction. Specifically, for a video with a frame rate of n frames per second and a duration of t, the number of target images to extract may be set to f (f < n×t), and the video is then sampled at uniform intervals to obtain f target images.
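The uniform-interval sampling described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function name and parameters are hypothetical.

```python
def uniform_frame_indices(n: int, t: float, f: int) -> list[int]:
    """Return f frame indices spread evenly over a video with frame
    rate n and duration t (i.e. n*t total frames), per the constraint
    f < n*t stated in the text."""
    total = int(n * t)
    assert f < total, "f must be smaller than the total frame count"
    step = total / f
    # One index per interval of length `step`, starting at frame 0.
    return [int(i * step) for i in range(f)]

# e.g. a 10-second, 25 fps video sampled down to 5 target images
indices = uniform_frame_indices(n=25, t=10.0, f=5)
```

The selected indices would then be decoded from the video to obtain the f target images.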
S102, filtering the extracted multi-frame target image according to the filtering condition.
In this embodiment, to further reduce the computational load, target images with poor objective indicators may be filtered out according to the filtering conditions, where the objective indicators may include, but are not limited to, brightness, definition (sharpness), color singleness, and the like.
For example, in this embodiment, the filtering condition may include the brightness of the target image being smaller than a first threshold, where the first threshold may be set as required by the practical application, and the brightness may be calculated according to the following formula:
Luminance(I_rgb) = 0.2126·I_r + 0.7152·I_g + 0.0722·I_b
In the above formula, I_rgb denotes the color image, and I_r, I_g and I_b denote the red, green and blue channels of the color image, respectively.
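The brightness filter above can be sketched directly from the formula. The mean over all pixels and the threshold value are illustrative assumptions; the patent only fixes the channel weights.

```python
import numpy as np

def luminance(i_rgb: np.ndarray) -> float:
    """Mean weighted luminance of an H x W x 3 RGB image, using the
    0.2126/0.7152/0.0722 channel weights given in the text."""
    r, g, b = i_rgb[..., 0], i_rgb[..., 1], i_rgb[..., 2]
    return float(np.mean(0.2126 * r + 0.7152 * g + 0.0722 * b))

def too_dark(i_rgb: np.ndarray, first_threshold: float = 40.0) -> bool:
    """Filtering condition: luminance below the first threshold.
    The default threshold is a placeholder, not from the patent."""
    return luminance(i_rgb) < first_threshold
```

A frame for which `too_dark` returns true would be removed before category recognition.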
For example, in this embodiment, the filtering condition may include the definition (sharpness) of the target image being smaller than a second threshold, where the second threshold may be set as required by the practical application, and the sharpness may be calculated according to the following formula:
In the above formula, I_gray denotes the grayscale map of the color image, and δ_x and δ_y denote the gradient maps of the image in the x and y directions, respectively.
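The sharpness formula itself is not reproduced in the text above; a common gradient-based measure consistent with the symbols it describes (grayscale map, x- and y-direction gradient maps) is the mean gradient magnitude, sketched here as an assumed instantiation rather than the patent's exact formula.

```python
import numpy as np

def sharpness(i_gray: np.ndarray) -> float:
    """Assumed gradient-based sharpness: mean magnitude of the x- and
    y-direction gradient maps of the grayscale image I_gray."""
    dy, dx = np.gradient(i_gray.astype(float))  # delta_y, delta_x maps
    return float(np.mean(np.hypot(dx, dy)))
```

A perfectly flat frame scores 0; blurrier frames score lower than crisp ones and would fall under the second threshold.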
For another example, in this embodiment, the filtering condition may include the color singleness of the target image being greater than a third threshold, where the third threshold may be set as required by the practical application, and the color singleness may be calculated according to the following formula:
In the above formula, hist(·), which is used to characterize the color singleness, denotes the gray-level histogram: the gray levels are sorted by their pixel proportion, and the combined proportion of all pixels accounted for by the top 5% of gray levels is taken as the color singleness.
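Under that reading of the description (the exact formula is not reproduced in the text), the color-singleness check can be sketched as follows. The histogram binning and the interpretation of "top 5%" are assumptions.

```python
import numpy as np

def color_singleness(i_gray: np.ndarray) -> float:
    """Assumed reading: sort the 256 gray levels by their share of
    pixels and return the fraction covered by the top 5% of levels.
    A near-monochrome frame scores close to 1."""
    hist, _ = np.histogram(i_gray, bins=256, range=(0, 256))
    shares = np.sort(hist)[::-1] / i_gray.size  # descending pixel shares
    top = max(1, int(round(0.05 * 256)))        # top 5% of gray levels
    return float(shares[:top].sum())
```

Frames whose score exceeds the third threshold (e.g. almost uniformly colored transition frames) would be filtered out.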
S103, identifying the image category corresponding to each frame of target image.
In this embodiment, specifically, the image category corresponding to each frame of target image may be identified according to an image recognition model, where the input of the image recognition model is a frame of target image, and the output is the image category corresponding to that frame of target image.
Specifically, the image recognition model of this embodiment may comprise 159 network layers and use 7 dense blocks, where the size of the feature map within each dense block is unchanged and different convolution layers within a dense block are connected by skip connections to preserve the transfer of feature information. The activation function of the last network layer may be a softmax function with N neurons, where the output value p_i of neuron i (i being a positive integer no greater than N) lies between 0 and 1; the network weights may be updated during training by back-propagating a cross-entropy loss. For each image category i, a threshold τ_i may be set; when p_i ≥ τ_i, the frame of target image is considered to contain the label of that image category.
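The per-category thresholding at the output of the model can be sketched as follows. This shows only the τ_i decision rule described above, not the 159-layer network itself; the function name is illustrative.

```python
import numpy as np

def frame_labels(p, thresholds):
    """Given the model outputs p_1..p_N (each in [0, 1]) and per-category
    thresholds tau_1..tau_N, return the indices i with p_i >= tau_i,
    i.e. the category labels assigned to this frame."""
    p = np.asarray(p)
    tau = np.asarray(thresholds)
    return [i for i in range(len(p)) if p[i] >= tau[i]]
```

With suitable thresholds, a frame can carry one label, several labels, or none at all.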
In this embodiment, a set of image categories is obtained after the extracted target images are respectively input into the image recognition model. For example, the output of the model may include transition frame, front desk, swimming pool, appearance, etc., where the transition frame label indicates that the frame carries no meaningful image category, while front desk, swimming pool, appearance, etc. represent actual image categories. The tag sequence obtained by processing the target video may, for instance, be: appearance, transition frame, front desk, front desk, hall, transition frame, transition frame, swimming pool, swimming pool, swimming pool, transition frame.
S104, determining one of the identified image categories as a video category of the target video.
Specifically, in this embodiment, the image category with the largest number of corresponding target images may be determined as the video category. For example, in the tag sequence shown above, the category with the most corresponding target images is the swimming pool, so the swimming pool may be determined as the video category of the target video.
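The majority vote over frame categories can be sketched as follows. Skipping transition frames is an assumption consistent with the statement that they carry no category meaning; the category names reuse the example above.

```python
from collections import Counter

def video_category(frame_categories):
    """Return the image category with the most frames, ignoring
    'transition frame' labels, which carry no category meaning."""
    counts = Counter(c for c in frame_categories if c != "transition frame")
    return counts.most_common(1)[0][0]

tags = ["appearance", "transition frame", "front desk", "front desk",
        "hall", "swimming pool", "swimming pool", "swimming pool"]
```

For the example tag sequence, the swimming pool (3 frames) outvotes the front desk (2 frames).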
In this embodiment, the video category may also be determined using comment information associated with the target video, where the comment information may include, but is not limited to, user comments, descriptions, and the like. Specifically, step S104 may include: acquiring the comment information corresponding to the target video; determining the image categories that match the comment information among the identified image categories as candidate categories; and determining the candidate category with the largest number of corresponding target images as the video category. For example, if the obtained comment is "the hotel is great, the front desk is enthusiastic in service, the swimming pool is large, and the room is comfortable", keyword matching against the tag sequence shown above yields the front desk and the swimming pool as matching categories; since the swimming pool corresponds to more target images than the front desk, the swimming pool may be determined as the video category of the target video.
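The comment-matching variant can be sketched as follows. Matching by substring is a simplifying assumption standing in for the keyword matching the text describes.

```python
from collections import Counter

def category_from_comments(frame_categories, comment):
    """Keep only categories whose name appears in the comment text
    (a stand-in for keyword matching), then return the most frequent
    of those categories, or None if nothing matches."""
    mentioned = {c for c in set(frame_categories) if c in comment}
    counts = Counter(c for c in frame_categories if c in mentioned)
    return counts.most_common(1)[0][0] if counts else None
```

For the example comment, "front desk" and "swimming pool" match, and the swimming pool wins on frame count.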
S105, selecting one frame from target images corresponding to the video categories as a cover of the target video.
In this embodiment, step S105 may include: determining the target images corresponding to the video category as candidate images; evaluating the image score of each frame of candidate image according to an image scoring model; and determining the candidate image with the highest image score as the cover of the target video, where the input of the image scoring model is a candidate image and its output is the image score corresponding to that candidate image.
Specifically, in the present embodiment, the quality of each frame may first be scored manually to construct a training set. For example, 1000 frames may be randomly extracted from 100 videos and scored by 3 art designers based on picture color, composition, and other aspects, with scores ranging over 1, 2, 3, 4 and 5; the rounded average of the 3 scores is taken as the image score of that frame. An image scoring model is then trained on this training set. The image scoring model in this embodiment may comprise 43 network layers and use Res blocks, where the size of the feature map within each Res block is unchanged and different convolution layers within a Res block are connected by skip connections to preserve the transfer of feature information. The activation function of the last network layer may be a softmax function with 5 neurons, where the output value p_i of each neuron lies between 0 and 1 and represents the probability of one of the five score grades. The network weights may be updated during training by back-propagating a cross-entropy loss. Since the output of the model for each grade category i is the probability p_i, the image score of the target image may be expressed as Σ_{i=1}^{5} i·p_i. In this embodiment, after the image score of each candidate image is determined, if several candidate images share the highest image score, the first of those frames may be selected as the cover of the target video.
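The expected-score computation and the first-frame tie-break described above can be sketched as follows; the data layout (a list of frame/probability pairs) is illustrative.

```python
def image_score(p):
    """Expected score over the five grade classes: sum of i * p_i for
    i = 1..5, where p holds the softmax probabilities of the grades."""
    assert len(p) == 5 and abs(sum(p) - 1.0) < 1e-6
    return sum(i * pi for i, pi in enumerate(p, start=1))

def pick_cover(candidates):
    """Return the first candidate frame achieving the highest image
    score; candidates is a list of (frame_id, probabilities) pairs
    in frame order, so ties resolve to the earliest frame."""
    best = max(image_score(p) for _, p in candidates)
    for frame, p in candidates:
        if abs(image_score(p) - best) < 1e-9:
            return frame
```

A frame with all its probability mass on grade 5 scores 5.0 and beats any lower-grade distribution.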
According to this embodiment, a representative image matching the video category is used as the video cover based on an understanding of the video content, which not only displays the video information accurately but also allows users to browse and screen videos quickly, thereby improving user stickiness and in turn the booking conversion rate on an OTA platform. Moreover, by filtering out images with poor objective indicators and scoring images on picture aesthetics, both the content and the quality of the video are taken into account, so that the finally chosen cover is the most representative and of high quality, providing a better visual experience and increasing the click-through rate of the video.
Example 2
The present embodiment provides an electronic device, which may take the form of a computing device (for example, a server), comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor may implement the method for selecting a video cover provided in embodiment 1.
Fig. 2 shows a schematic hardware structure of the present embodiment, and as shown in fig. 2, the electronic device 9 specifically includes:
at least one processor 91, at least one memory 92, and a bus 93 for connecting the different system components (including the processor 91 and the memory 92), wherein:
the bus 93 includes a data bus, an address bus, and a control bus.
The memory 92 includes volatile memory such as Random Access Memory (RAM) 921 and/or cache memory 922, and may further include Read Only Memory (ROM) 923.
Memory 92 also includes a program/utility 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor 91 executes various functional applications and data processing such as the method of selecting a video cover provided in embodiment 1 of the present invention by running a computer program stored in the memory 92.
The electronic device 9 may further communicate with one or more external devices 94 (e.g., keyboard, pointing device, etc.). Such communication may occur through an input/output (I/O) interface 95. Also, the electronic device 9 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through a network adapter 96. The network adapter 96 communicates with other modules of the electronic device 9 via the bus 93. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in connection with the electronic device 9, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of an electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module according to embodiments of the present application. Conversely, the features and functions of one unit/module described above may be further divided into ones that are embodied by a plurality of units/modules.
Example 3
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of selecting a video cover provided by embodiment 1.
More specifically, among others, readable storage media may be employed including, but not limited to: portable disk, hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.
In a possible embodiment, the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps of the method of implementing the selection of a video cover in embodiment 1, when said program product is run on the terminal device.
Wherein the program code for carrying out the invention may be written in any combination of one or more programming languages, which program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on the remote device or entirely on the remote device.
Example 4
The present embodiment provides a system for selecting a video cover, and fig. 3 shows a schematic block diagram of the present embodiment. Referring to fig. 3, the system of the present embodiment includes:
and the extraction module 1 is used for extracting multi-frame target images from the target video.
In this embodiment, considering that the target video contains a large amount of information and has high feature dimensionality, the computational load can be reduced by frame extraction. Specifically, for a video with a frame rate of n frames per second and a duration of t, the number of target images to extract may be set to f (f < n×t), and the video is then sampled at uniform intervals to obtain f target images.
And the filtering module 2 is used for filtering the extracted multi-frame target image according to the filtering condition.
In this embodiment, to further reduce the computational load, target images with poor objective indicators may be filtered out according to the filtering conditions, where the objective indicators may include, but are not limited to, brightness, definition (sharpness), color singleness, and the like.
For example, in this embodiment, the filtering condition may include the brightness of the target image being smaller than a first threshold, where the first threshold may be set as required by the practical application, and the brightness may be calculated according to the following formula:
Luminance(I_rgb) = 0.2126·I_r + 0.7152·I_g + 0.0722·I_b
In the above formula, I_rgb denotes the color image, and I_r, I_g and I_b denote the red, green and blue channels of the color image, respectively.
For example, in this embodiment, the filtering condition may include the definition (sharpness) of the target image being smaller than a second threshold, where the second threshold may be set as required by the practical application, and the sharpness may be calculated according to the following formula:
In the above formula, I_gray denotes the grayscale map of the color image, and δ_x and δ_y denote the gradient maps of the image in the x and y directions, respectively.
For another example, in this embodiment, the filtering condition may include the color singleness of the target image being greater than a third threshold, where the third threshold may be set as required by the practical application, and the color singleness may be calculated according to the following formula:
In the above formula, hist(·), which is used to characterize the color singleness, denotes the gray-level histogram: the gray levels are sorted by their pixel proportion, and the combined proportion of all pixels accounted for by the top 5% of gray levels is taken as the color singleness.
And the identification module 3 is used for identifying the image category corresponding to each frame of target image.
In this embodiment, the identification module 3 may specifically identify, according to an image recognition model, the image category corresponding to each frame of target image, where the input of the image recognition model is a frame of target image, and the output is the image category corresponding to that frame of target image.
Specifically, the image recognition model of this embodiment may comprise 159 network layers and use 7 dense blocks, where the size of the feature map within each dense block is unchanged and different convolution layers within a dense block are connected by skip connections to preserve the transfer of feature information. The activation function of the last network layer may be a softmax function with N neurons, where the output value p_i of neuron i (i being a positive integer no greater than N) lies between 0 and 1; the network weights may be updated during training by back-propagating a cross-entropy loss. For each image category i, a threshold τ_i may be set; when p_i ≥ τ_i, the frame of target image is considered to contain the label of that image category.
In this embodiment, inputting the extracted multi-frame target images into the image recognition model yields a set of image categories. The model output may include, for example, transition frame, front desk, swimming pool, and exterior, where "transition frame" indicates that the frame of target image carries no meaningful image category, while front desk, swimming pool, exterior, and the like denote actual image categories. The tag sequence obtained by processing the target video might read: exterior, transition frame, front desk, hall, transition frame, transition frame, swimming pool, swimming pool, transition frame.
A determining module 4, configured to determine one of the identified image categories as a video category of the target video.
Specifically, in this embodiment the image category with the largest number of corresponding target images may be determined as the video category. In the tag sequence shown above, that category is swimming pool, so swimming pool may be determined as the video category corresponding to the target video.
In this embodiment, the video category may also be determined in combination with comment information corresponding to the target video, where the comment information may include, but is not limited to, user comments and descriptions. Specifically, the determining module 4 may include an obtaining unit for obtaining the comment information corresponding to the target video, a third determining unit for determining the identified image categories that match the comment information as candidate categories, and a fourth determining unit for determining the candidate category with the largest number of corresponding target images as the video category. For example, if the obtained comment reads "This hotel is great: the front desk staff are enthusiastic, the swimming pool is large, and the rooms are comfortable", keyword matching against the tag sequence shown above yields front desk and swimming pool as candidate categories; since more target images correspond to swimming pool than to front desk, swimming pool is determined as the video category corresponding to the target video.
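The two-stage selection (comment matching, then majority count) can be sketched as below. Plain substring matching and the fallback to the overall majority when no category matches the comment are assumptions:

```python
from collections import Counter

def pick_video_category(frame_labels: list, comment: str) -> str:
    """Choose the video category from per-frame labels and comment text.

    Among identified categories that appear in the comment, take the one
    with the most frames; fall back to the plain majority if none match.
    Substring matching and the fallback are assumptions.
    """
    # Transition frames carry no image category, so exclude them.
    counts = Counter(l for l in frame_labels if l != "transition frame")
    matched = {c: n for c, n in counts.items() if c in comment}
    pool = matched or counts
    return max(pool, key=pool.get)
```

With the example tag sequence and comment above, both "front desk" and "swimming pool" match, and swimming pool wins on frame count.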
The selection module 5 is used for selecting one frame from the target images corresponding to the video category as the cover of the target video.
In this embodiment, the selection module 5 may include a first determining unit for determining the target images corresponding to the video category as candidate images, an image scoring unit for evaluating the image score of each frame of candidate image according to an image scoring model, and a second determining unit for determining the candidate image with the highest image score as the cover of the target video, where the input of the image scoring model is a candidate image and the output is the image score corresponding to that candidate image.
Specifically, in this embodiment, image quality may first be scored manually to construct a training set. For example, 1000 frames may be randomly extracted from 100 videos and scored by 3 annotators on aspects such as picture color and composition, with scores taken from {1, 2, 3, 4, 5}; the rounded average of the 3 scores is the image score of that frame. The image scoring model is then trained on this set. The model in this embodiment may comprise 43 network layers built from Res blocks; within each Res block the size of the feature map is unchanged, and the convolution layers inside a Res block are connected by skip connections to preserve the flow of feature information. The activation function of the last layer may be a softmax function with 5 neurons, whose output values p_i (each between 0 and 1) represent the probabilities of the five score grades; the network weights are updated during training by back-propagating a cross-entropy loss. With p_i denoting the output probability of the image scoring model for score grade i, the image score of the target image can be taken as the expected value Σ_{i=1}^{5} i·p_i. In this embodiment, after the image score of each frame of candidate image has been determined, if several candidate images share the highest image score, the first frame among them may be selected as the cover of the target video.
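The scoring and tie-breaking steps above can be sketched as follows; the expected-value formula Σ i·p_i is a reconstruction of the elided equation, and the probability vectors stand in for the output of the 43-layer Res-block network:

```python
import numpy as np

def image_score(p: np.ndarray) -> float:
    """Expected score over the five grade classes: sum_i i * p_i, i = 1..5."""
    return float(np.dot(np.arange(1, 6), p))

def pick_cover(candidate_probs: list) -> int:
    """Index of the cover frame among the candidates.

    `candidate_probs` holds one softmax vector per candidate frame, in
    frame order. The frame with the highest expected score wins; on a
    tie, np.argmax returns the first maximum, i.e. the earliest frame,
    matching the tie-break described above.
    """
    scores = [image_score(p) for p in candidate_probs]
    return int(np.argmax(scores))
```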
According to this embodiment, a representative image that matches the video category is selected as the video cover based on an understanding of the video content. This displays the video information accurately, lets users browse and screen videos quickly, and can improve user stickiness and the booking conversion rate in the OTA scenario. Images with poor objective-index performance are filtered out, and the remaining images are scored on picture aesthetics, so both the content and the quality of the video are taken into account; the cover finally determined is the most representative and of the highest quality, providing a better visual experience and raising the click-through rate of the video.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of the invention, but such changes and modifications fall within the scope of the invention.
Claims (8)
1. A method of selecting a video cover, the method comprising:
extracting multi-frame target images from the target video;
filtering the extracted multi-frame target image according to the filtering condition; wherein the filtering conditions include:
at least one of the brightness of the target image being less than a first threshold, the sharpness of the target image being less than a second threshold, and the color single degree of the target image being greater than a third threshold;
identifying an image category corresponding to the target image of each frame;
determining one of the identified image categories as a video category of the target video;
the step of determining one of the identified image categories as a video category of the target video includes:
acquiring comment information corresponding to the target video;
determining the image category matched with the comment information in the identified image categories as a candidate category;
determining the candidate category with the largest number of corresponding target images as the video category;
and selecting one frame from the target images corresponding to the video categories as the cover of the target video.
2. The method of selecting a video cover as claimed in claim 1, wherein the step of identifying an image category corresponding to the target image for each frame includes:
identifying the image category corresponding to the target image of each frame according to the image identification model;
the input of the image recognition model is the target image, and the output of the image recognition model is the image category corresponding to the target image;
and/or the number of the groups of groups,
the step of selecting a frame from the target images corresponding to the video categories as the cover of the target video includes:
determining a target image corresponding to the video category as a candidate image;
evaluating the image scores corresponding to the candidate images of each frame according to an image score model;
determining the candidate image with the highest image score as the cover of the target video;
and the input of the image scoring model is the candidate image, and the output of the image scoring model is the image score corresponding to the candidate image.
3. The method of selecting a video cover as recited in claim 1, wherein the step of determining one of the identified image categories as the video category of the target video comprises:
and determining the image category with the largest number of corresponding target images as the video category.
4. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of selecting a video cover as claimed in any one of claims 1-3 when the computer program is executed.
5. A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of selecting a video cover as claimed in any one of claims 1 to 3.
6. A system for selecting a video cover, the system comprising:
the extraction module is used for extracting multi-frame target images from the target video;
the filtering module is used for filtering the extracted multi-frame target image according to the filtering condition; wherein the filtering conditions include:
at least one of the brightness of the target image being less than a first threshold, the sharpness of the target image being less than a second threshold, and the color single degree of the target image being greater than a third threshold;
the identification module is used for identifying the image category corresponding to each frame of the target image;
a determining module, configured to determine one of the identified image categories as a video category of the target video;
the determining module includes:
the acquisition unit is used for acquiring comment information corresponding to the target video;
a third determination unit configured to determine, as a candidate category, an image category matching the comment information among the identified image categories;
a fourth determining unit, configured to determine, as the video category, a candidate category having the largest number of corresponding target images;
and the selection module is used for selecting one frame from the target images corresponding to the video categories as the cover of the target video.
7. The system for selecting a video cover as recited in claim 6, wherein the identification module is specifically configured to identify an image category corresponding to the target image for each frame based on an image recognition model;
the input of the image recognition model is the target image, and the output of the image recognition model is the image category corresponding to the target image;
and/or the number of the groups of groups,
the selection module comprises:
the first determining unit is used for determining that the target image corresponding to the video category is a candidate image;
the image scoring unit is used for evaluating the image scores corresponding to the candidate images of each frame according to the image scoring model;
the second determining unit is used for determining that the candidate image with the highest image score is the front cover of the target video;
and the input of the image scoring model is the candidate image, and the output of the image scoring model is the image score corresponding to the candidate image.
8. The system for selecting a video cover as recited in claim 6, wherein the determination module is specifically configured to determine the image category that corresponds to the greatest number of target images as the video category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911395856.2A CN111143613B (en) | 2019-12-30 | 2019-12-30 | Method, system, electronic device and storage medium for selecting video cover |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911395856.2A CN111143613B (en) | 2019-12-30 | 2019-12-30 | Method, system, electronic device and storage medium for selecting video cover |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111143613A CN111143613A (en) | 2020-05-12 |
CN111143613B true CN111143613B (en) | 2024-02-06 |
Family
ID=70521857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911395856.2A Active CN111143613B (en) | 2019-12-30 | 2019-12-30 | Method, system, electronic device and storage medium for selecting video cover |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111143613B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111831615B (en) * | 2020-05-28 | 2024-03-12 | 北京达佳互联信息技术有限公司 | Method, device and system for generating video file |
CN111601160A (en) * | 2020-05-29 | 2020-08-28 | 北京百度网讯科技有限公司 | Method and device for editing video |
CN111918130A (en) * | 2020-08-11 | 2020-11-10 | 北京达佳互联信息技术有限公司 | Video cover determining method and device, electronic equipment and storage medium |
WO2022087826A1 (en) * | 2020-10-27 | 2022-05-05 | 深圳市大疆创新科技有限公司 | Video processing method and apparatus, mobile device, and readable storage medium |
CN112363660B (en) * | 2020-11-09 | 2023-03-24 | 北京达佳互联信息技术有限公司 | Method and device for determining cover image, electronic equipment and storage medium |
CN113794890B (en) * | 2021-07-30 | 2023-10-24 | 北京达佳互联信息技术有限公司 | Data processing method, device, electronic equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105611413A (en) * | 2015-12-24 | 2016-05-25 | 小米科技有限责任公司 | Method and device for adding video clip class markers |
CN107918656A (en) * | 2017-11-17 | 2018-04-17 | 北京奇虎科技有限公司 | Video front cover extracting method and device based on video title |
CN108650524A (en) * | 2018-05-23 | 2018-10-12 | 腾讯科技(深圳)有限公司 | Video cover generation method, device, computer equipment and storage medium |
CN109146921A (en) * | 2018-07-02 | 2019-01-04 | 华中科技大学 | A kind of pedestrian target tracking based on deep learning |
CN109271542A (en) * | 2018-09-28 | 2019-01-25 | 百度在线网络技术(北京)有限公司 | Cover determines method, apparatus, equipment and readable storage medium storing program for executing |
CN110263743A (en) * | 2019-06-26 | 2019-09-20 | 北京字节跳动网络技术有限公司 | The method and apparatus of image for identification |
CN110390025A (en) * | 2019-07-24 | 2019-10-29 | 百度在线网络技术(北京)有限公司 | Cover figure determines method, apparatus, equipment and computer readable storage medium |
CN110399848A (en) * | 2019-07-30 | 2019-11-01 | 北京字节跳动网络技术有限公司 | Video cover generation method, device and electronic equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8990223B2 (en) * | 2012-06-29 | 2015-03-24 | Rovi Guides, Inc. | Systems and methods for matching media content data |
- 2019-12-30: Application CN201911395856.2A filed; patent CN111143613B granted (status: Active)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105611413A (en) * | 2015-12-24 | 2016-05-25 | 小米科技有限责任公司 | Method and device for adding video clip class markers |
CN107918656A (en) * | 2017-11-17 | 2018-04-17 | 北京奇虎科技有限公司 | Video front cover extracting method and device based on video title |
CN108650524A (en) * | 2018-05-23 | 2018-10-12 | 腾讯科技(深圳)有限公司 | Video cover generation method, device, computer equipment and storage medium |
CN109146921A (en) * | 2018-07-02 | 2019-01-04 | 华中科技大学 | A kind of pedestrian target tracking based on deep learning |
CN109271542A (en) * | 2018-09-28 | 2019-01-25 | 百度在线网络技术(北京)有限公司 | Cover determines method, apparatus, equipment and readable storage medium storing program for executing |
CN110263743A (en) * | 2019-06-26 | 2019-09-20 | 北京字节跳动网络技术有限公司 | The method and apparatus of image for identification |
CN110390025A (en) * | 2019-07-24 | 2019-10-29 | 百度在线网络技术(北京)有限公司 | Cover figure determines method, apparatus, equipment and computer readable storage medium |
CN110399848A (en) * | 2019-07-30 | 2019-11-01 | 北京字节跳动网络技术有限公司 | Video cover generation method, device and electronic equipment |
Non-Patent Citations (1)
Title |
---|
融合背景下的短视频发展状况及趋势 (Development Status and Trends of Short Video in the Context of Media Convergence); 黄楚新 (Huang Chuxin); 人民论坛・学术前沿 (People's Tribune · Academic Frontiers), No. 23, pp. 42-49 *
Also Published As
Publication number | Publication date |
---|---|
CN111143613A (en) | 2020-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111143613B (en) | Method, system, electronic device and storage medium for selecting video cover | |
CN111696112B (en) | Automatic image cutting method and system, electronic equipment and storage medium | |
US10671895B2 (en) | Automated selection of subjectively best image frames from burst captured image sequences | |
CN113395578B (en) | Method, device, equipment and storage medium for extracting video theme text | |
CN112119388A (en) | Training image embedding model and text embedding model | |
CN112634296A (en) | RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism | |
CN112074828A (en) | Training image embedding model and text embedding model | |
CN107908641A (en) | A kind of method and system for obtaining picture labeled data | |
WO2019118236A1 (en) | Deep learning on image frames to generate a summary | |
CN113761253A (en) | Video tag determination method, device, equipment and storage medium | |
CN111612010A (en) | Image processing method, device, equipment and computer readable storage medium | |
CN111259245B (en) | Work pushing method, device and storage medium | |
WO2022156534A1 (en) | Video quality assessment method and device | |
CN113301382A (en) | Video processing method, device, medium, and program product | |
CN114723652A (en) | Cell density determination method, cell density determination device, electronic apparatus, and storage medium | |
CN110704650A (en) | OTA picture tag identification method, electronic device and medium | |
CN113627342B (en) | Method, system, equipment and storage medium for video depth feature extraction optimization | |
CN109960745A (en) | Visual classification processing method and processing device, storage medium and electronic equipment | |
CN115964560A (en) | Information recommendation method and equipment based on multi-mode pre-training model | |
CN108229263B (en) | Target object identification method and device and robot | |
CN117795551A (en) | Method and system for automatically capturing and processing user images | |
CN115630188A (en) | Video recommendation method and device and electronic equipment | |
CN114372580A (en) | Model training method, storage medium, electronic device, and computer program product | |
CN103729532A (en) | Information supplying method and device based on images of fruits and vegetables | |
US20210157826A1 (en) | Data model proposals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |