CN115412765A - Video highlight determining method and device, electronic equipment and storage medium - Google Patents

Video highlight determining method and device, electronic equipment and storage medium

Info

Publication number
CN115412765A
CN115412765A (application CN202211054859.1A)
Authority
CN
China
Prior art keywords
segment
video
clip
highlight
alternative
Prior art date
Legal status
Granted
Application number
CN202211054859.1A
Other languages
Chinese (zh)
Other versions
CN115412765B (English)
Inventor
Hou Jiayun (侯佳芸)
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202211054859.1A priority Critical patent/CN115412765B/en
Publication of CN115412765A publication Critical patent/CN115412765A/en
Application granted granted Critical
Publication of CN115412765B publication Critical patent/CN115412765B/en
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the present application provide a method and apparatus for determining a video highlight, an electronic device, and a storage medium. The method includes: obtaining a first candidate segment and a second candidate segment of a target video, where the first candidate segment is a video segment carrying a set action tag and the second candidate segment is a video segment whose highlight degree satisfies a highlight condition; and determining a highlight segment of the target video based on the first candidate segment and the second candidate segment. With this technical solution, the highlight segment determined from the first and second candidate segments fuses the set action tag with the highlight-degree feature, so the determined highlight segment is more accurate, which helps users locate videos of interest.

Description

Video highlight determining method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer application technologies, and in particular, to a method and an apparatus for determining a highlight of a video, an electronic device, and a storage medium.
Background
Today, watching videos has become a common way for people to entertain and relax. Many video providers extract highlight segments from videos for video distribution or information delivery, so that users can learn the highlights of the corresponding videos from those segments.
Therefore, how to accurately determine the highlight segments in a video, so as to help users locate videos of interest, is a technical problem urgently needing a solution from those skilled in the art.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method, an apparatus, an electronic device, and a storage medium for determining a highlight segment of a video, so as to determine the highlight segment accurately and help users locate videos of interest. The specific technical solution is as follows:
in a first aspect, a method for determining a highlight segment of a video is provided, including:
obtaining a first candidate segment and a second candidate segment of a target video, wherein the first candidate segment is a video segment carrying a set action tag, and the second candidate segment is a video segment whose highlight degree satisfies a highlight condition;
determining a highlight segment of the target video based on the first candidate segment and the second candidate segment.
In a specific embodiment of the present application, the determining a highlight segment of the target video based on the first candidate segment and the second candidate segment includes:
determining whether the first candidate segment and the second candidate segment have an overlapping portion on the time axis of the target video;
and if an overlapping portion exists, determining the highlight segment of the target video according to the first candidate segment and the second candidate segment that overlap on the time axis.
In a specific embodiment of the present application, the determining a highlight of the target video according to the first alternative clip and the second alternative clip having overlapping portions on the time axis includes:
respectively taking the second alternative clips with overlapped parts on the time axis and the corresponding first alternative clips as a clip pair;
and determining a highlight clip of the target video according to the time of the second alternative clip and the first alternative clip in each clip pair on the time axis.
In a specific embodiment of the present application, the determining a highlight clip of the target video according to the time of the second candidate clip and the first candidate clip in each clip pair on the time axis includes:
for each clip pair, if the start time of the first alternative clip in the current clip pair on the time axis is before the start time of the second alternative clip in the current clip pair on the time axis, acquiring a first to-be-used clip in the target video, where the start time of the first to-be-used clip is the start time of the second alternative clip in the current clip pair on the time axis, and the duration of the first to-be-used clip is a first duration;
determining whether the first to-be-used segment has the action tag;
determining the segment pair corresponding to the first standby segment with the action tag as a standby segment pair;
for each standby segment pair, if the highlight degree of the second candidate segment in the current standby segment pair is higher than a first highlight threshold, or its highlight degree is among the top N1 highest, determining the second candidate segment in the current segment pair as the highlight segment of the target video, wherein N1 is a positive integer.
In a specific embodiment of the present application, the determining a highlight clip of the target video according to the time of the second candidate clip and the first candidate clip in each clip pair on the time axis includes:
for each of the clip pairs, if a start time of the first alternative clip in the current clip pair on the timeline is later than a start time of the second alternative clip in the current clip pair on the timeline, acquiring a second standby clip in the target video, the start time of the second standby clip being a start time of the first alternative clip in the current clip pair on the timeline, and the end time of the second standby clip being an end time of the second alternative clip in the current clip pair on the timeline;
and if the highlight degree of the second candidate segment in the current segment pair is higher than a second highlight threshold, or its highlight degree is among the top N2 highest, determining the second standby segment as the highlight segment of the target video, wherein N2 is a positive integer.
In a specific embodiment of the present application, in a case that it is determined that there is no overlapping portion of the first alternative clip and the second alternative clip on the time axis, the method further includes:
determining the probability that each first candidate segment has the action tag, determining the first candidate segment with the probability greater than a probability threshold value as a highlight segment of the target video, or determining the first candidate segment corresponding to the top N3 probabilities as the highlight segment of the target video, wherein N3 is a positive integer;
alternatively,
and determining a second candidate segment whose highlight degree is higher than a third highlight threshold as the highlight segment of the target video, or determining the second candidate segments corresponding to the top N4 highlight degrees as highlight segments of the target video, wherein N4 is a positive integer.
In a specific embodiment of the present application, before the determining the highlight segment of the target video based on the first candidate segment and the second candidate segment, the method further includes:
determining the dynamic rate of each second candidate segment, wherein the dynamic rate characterizes how much the image changes between video frames;
and rejecting second candidate segments whose dynamic rate is less than the average dynamic rate of the target video.
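For illustration only, the dynamic-rate filter above can be sketched in Python, assuming each frame is simplified to a flat list of pixel intensities and each segment to a list of frames (real implementations would operate on decoded video frames):

```python
def dynamic_rate(frames):
    """Mean absolute per-pixel change between consecutive frames of a segment."""
    diffs = [
        sum(abs(p - c) for p, c in zip(prev, cur)) / len(prev)
        for prev, cur in zip(frames, frames[1:])
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0

def reject_static(segments, video_average):
    """Keep only second candidate segments whose dynamic rate reaches the video average."""
    return [s for s in segments if dynamic_rate(s) >= video_average]
```

A segment of identical frames yields a dynamic rate of 0 and is dropped, which matches the intent of removing near-static segments.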
In one embodiment of the present application, after the determining the highlight segment of the target video, the method further includes:
detecting whether a target object is present within the initial portion, of a second duration, of the highlight segment;
if the target object is absent, searching forward along the time axis of the target video, from the starting time of the highlight segment, for the first time at which the target object appears;
and updating the starting time of the highlight segment to that time.
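A minimal sketch of this start-time adjustment, where `has_target(t)` stands in for a real object detector queried at time `t` (the detector itself, and the step granularity, are assumptions for illustration):

```python
def adjust_start(highlight, has_target, step=1.0):
    """Move the highlight's start forward to the first time the target object appears.

    highlight:  (start, end) tuple in seconds.
    has_target: callable(t) -> bool, a stand-in for an object detector.
    """
    start, end = highlight
    t = start
    while t < end and not has_target(t):
        t += step
    return (t, end)
```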
In one embodiment of the present application, after the determining the highlight segment of the target video, the method further includes:
determining whether the duration of the highlight meets a first duration requirement;
and if the duration of the highlight segment does not meet the first duration requirement, truncating or supplementing the highlight segment based on the first duration requirement.
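The truncate-or-supplement step can be sketched as follows, assuming the first duration requirement is a `[min_dur, max_dur]` window and clips are `(start, end)` tuples in seconds; the exact adjustment policy (truncate at the end, pad forward) is an assumption, since the text does not fix one:

```python
def fit_duration(clip, min_dur, max_dur, video_end):
    """Truncate a highlight that is too long, or pad a short one forward,
    clamped to the end of the video."""
    start, end = clip
    dur = end - start
    if dur > max_dur:
        return (start, start + max_dur)      # truncation operation
    if dur < min_dur:
        return (start, min(start + min_dur, video_end))  # supplement operation
    return clip
```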
In a specific embodiment of the present application, the first alternative segment includes one or more first video segments of a first video segment set of the target video, the first video segment set is obtained by:
segmenting the target video according to shots to obtain a plurality of shot segments;
and combining the shot segments according to a second duration requirement to obtain the first video segment set, wherein the duration of each first video segment in the first video segment set meets the second duration requirement.
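One possible greedy combination of shot segments into first video segments is sketched below; the patent does not specify the merge strategy, so this greedy policy (and the `[min_dur, max_dur]` form of the second duration requirement) is an assumption:

```python
def merge_shots(shots, min_dur, max_dur):
    """Greedily combine consecutive shot (start, end) segments so that each
    merged segment's duration lands within [min_dur, max_dur] where possible.
    A trailing run of shots that cannot reach min_dur is kept as-is."""
    merged, cur = [], None
    for start, end in shots:
        if cur is None:
            cur = [start, end]
        elif (end - cur[0]) <= max_dur:
            cur[1] = end                 # extend the current merged segment
        else:
            merged.append(tuple(cur))    # would exceed max_dur: flush
            cur = [start, end]
        if cur[1] - cur[0] >= min_dur:
            merged.append(tuple(cur))    # duration requirement met
            cur = None
    if cur is not None:
        merged.append(tuple(cur))
    return merged
```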
In a specific embodiment of the present application, the second candidate segment includes one or more second video segments in a second video segment set of the target video, and the highlight degree of each second video segment in the set is determined by:
for each second video segment in the second video segment set, inputting the current second video segment into a pre-trained classification model to obtain the probability, output by the model, that the current second video segment is a highlight positive example;
and determining the highlight degree of the current second video segment according to that probability.
In one embodiment of the present application, the classification model is obtained by training in advance through the following steps:
obtaining a training sample set, wherein the training sample set comprises a plurality of sample pairs, each consisting of a highlight positive example and a negative example;
training a pre-constructed initial model by using the training sample set, and terminating training until a set training termination condition is reached to obtain the classification model;
wherein, the training of the pre-constructed initial model by using the training sample set comprises:
for each sample pair, inputting the current sample pair into a pre-constructed initial model;
determining a model loss according to the current sample pair and the initial model's output for the current sample pair;
and adjusting parameters of the initial model according to the model loss.
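The training loop above can be illustrated with a deliberately tiny stand-in: a linear scorer trained with a logistic pairwise loss. The patent specifies neither the model architecture nor the loss function, so both choices here are assumptions made purely for illustration:

```python
import math

def train_pairwise(pairs, features, lr=0.1, epochs=50):
    """Train a linear scorer w.x so that each highlight positive example
    scores higher than its paired negative example.
    Loss per pair: -log(sigmoid(score_pos - score_neg))."""
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for pos, neg in pairs:
            xp, xn = features[pos], features[neg]
            margin = sum(wi * (p - n) for wi, p, n in zip(w, xp, xn))
            grad = -1.0 / (1.0 + math.exp(margin))  # d(loss)/d(margin)
            w = [wi - lr * grad * (p - n) for wi, p, n in zip(w, xp, xn)]
    return w

def score(w, x):
    """Linear highlight score of a segment's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

After training, positive examples should outrank their paired negatives; a real system would replace the linear scorer with a video classification network.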
In one embodiment of the present application, the method further includes:
obtaining second viewing data of the target video;
the determining the highlight degree of the current second video segment according to the probability that it is a highlight positive example comprises:
determining the highlight degree of the current second video segment according to the second viewing data and the probability that the current second video segment is a highlight positive example.
In a second aspect, there is provided a video highlight determination apparatus, comprising:
an obtaining module, configured to obtain a first candidate segment and a second candidate segment of the target video, wherein the first candidate segment is a video segment carrying a set action tag, and the second candidate segment is a video segment whose highlight degree satisfies the highlight condition;
a determining module for determining a highlight segment of the target video based on the first candidate segment and the second candidate segment.
In a third aspect, an electronic device is provided, which includes a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the method for determining a highlight reel of a video according to the first aspect when executing a program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the steps of the method for determining a highlight reel of a video according to the first aspect.
In a fifth aspect, a computer program product is provided, the computer program product comprising computer instructions stored in a computer readable storage medium and adapted to be read and executed by a processor to cause an electronic device having the processor to perform the steps of the video highlight determination method of the first aspect.
With the technical solution provided by the embodiments of the present application, after the first candidate segment and the second candidate segment of the target video are obtained, the highlight segment of the target video is determined based on both. Because the first candidate segment is a video segment carrying a set action tag and the second candidate segment is selected by highlight degree, the resulting highlight segment fuses the action-tag and highlight-degree features and is therefore more accurate. Using the determined highlight segment for video distribution or information delivery helps users locate videos of interest and improves both the click-through rate and the probability that users watch the target video.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of an implementation of a method for determining a highlight of a video according to an embodiment of the present application;
FIG. 2 is a schematic flowchart illustrating another implementation of the method for determining a highlight of a video according to the embodiment of the present application;
fig. 3 is a schematic structural diagram of an apparatus for determining highlight of video in this embodiment;
fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The core of this application is a method for determining video highlight segments, applicable to scenarios such as video recommendation and distribution and information delivery. For example, when a target video is recommended to a user, its highlight segment can be accurately determined with the technical solution provided by the embodiments of this application and then used for video distribution, so that the user learns the highlight of the target video from the segment. This helps the user locate videos of interest and improves the click-through rate and the probability that the user watches the video.
The technical solution is especially suitable for determining highlight segments of animation and children's videos. Compared with film, variety, and live-action videos, these differ considerably in imagery, production style, and subject matter; because the determined highlight segment fuses the set action tags with the highlight-degree features, determination accuracy can be improved for them as well.
Referring to fig. 1, a flowchart of an implementation of a method for determining a highlight of a video according to an embodiment of the present application is shown, where the method may include the following steps:
s110: and obtaining a first alternative segment and a second alternative segment of the target video, wherein the first alternative segment is a video segment with a set action label, and the second alternative segment is a video segment with the highlight meeting a preset highlight condition.
In the embodiment of the present application, the target video may be any one of videos, such as any one of videos to be recommended for distribution. After the target video is determined, a first alternative segment and a second alternative segment of the target video can be obtained. Specifically, a video segment having a set action tag in the target video may be used as the first candidate segment, and a video segment having a highlight meeting the highlight condition in the target video may be used as the second candidate segment. The highlight condition may be a preset condition, for example, when the highlight of a certain video segment is higher than a highlight threshold, the highlight of the certain video segment may be considered to satisfy the highlight condition, or the highlight of the certain video segment is within the top N highest highlights, and the highlight of the certain video segment may be considered to satisfy the highlight condition. I.e., the second alternative fragment has a higher chroma of saturation.
In one embodiment, the target video may be segmented to obtain a first set of video segments and a second set of video segments of the target video.
The first video segment set and the second video segment set can be obtained through the same segmentation mode or different segmentation modes. For example, a first video segment set is obtained by shot segmentation, and a second video segment set is obtained by sliding window segmentation. The first video clip set comprises at least one first video clip, and the second video clip set comprises at least one second video clip.
For the first alternative segment, the determination may be made according to a first video segment in the first video segment set having a set action tag.
The action tags set in the embodiment of the present application may include multiple types, such as special effects, hugging, eating and drinking, running, holding hands, dancing, applause, and emotional excitement (for example crying, laughing, or surprise), so as to suit videos on a variety of subjects.
It may be determined whether each first video segment in the first video segment set carries a set action tag. Specifically, the probability that each first video segment carries each type of action tag can be obtained through a pre-trained TimeSformer model based on spatio-temporal self-attention, and the action tags can be identified from these probabilities. For each first video segment, the action tag with the highest probability may be taken as its tag. For example, first video segment A in the set carries a holding-hands action tag while first video segment B carries a running action tag. Different first video segments may carry the same or different tag types; some first video segments carry a set action tag and some do not.
From a first video segment of the first set of video segments having a set action tag, a first candidate segment may be determined. Each first video segment with the set action tag may be determined as a first candidate segment, or when different first video segments with the set action tag are consecutive, the consecutive first video segments with the set action tag may be merged first, and then the merged video segment may be determined as a first candidate segment.
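Merging consecutive tagged segments, as described above, amounts to interval merging; a sketch assuming segments are `(start, end)` tuples on the time axis:

```python
def merge_consecutive(segments):
    """Merge (start, end) segments whose boundaries touch or overlap,
    so that consecutive tagged segments become one candidate segment."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```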
For the second candidate segment, the determination may be made according to the highlight degree of each second video segment in the second video segment set.
The highlight degree of each second video segment in the set may be determined first, for example according to user viewing behavior, a preset highlight scoring algorithm, and the like.
A second candidate segment may then be selected by highlight degree: second video segments whose highlight degree exceeds a set highlight threshold may be chosen, or the top N second video segments ranked from high to low highlight degree. Each qualifying second video segment may become a second candidate segment on its own, or, when qualifying segments are consecutive, they may first be merged and the merged segment taken as a second candidate segment.
The highlight threshold may be set and adjusted according to actual conditions, for example the mean or median of the highlight degrees of all second video segments.
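Selecting second candidate segments by a highlight threshold or by top-N rank can be sketched as follows; the threshold is passed in externally (e.g. the mean or median mentioned above), and segment identifiers are assumed for illustration:

```python
def select_candidates(scores, threshold=None, top_n=None):
    """scores: {segment_id: highlight degree}.
    Select by absolute threshold if given, otherwise by top-N rank."""
    if threshold is not None:
        return [s for s, v in scores.items() if v > threshold]
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]
```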
S120: based on the first candidate segment and the second candidate segment, a highlight segment of the target video is determined.
After the first candidate segment and the second candidate segment of the target video are obtained, the highlight segment of the target video can be determined from them. Because the first candidate segment is a video segment carrying a set action tag and the second candidate segment is a video segment whose highlight degree meets the highlight condition, that is, a segment with a high highlight degree, the determined highlight segment fuses the action-tag and highlight-degree features, improving determination accuracy.
By applying the method provided by the embodiment of the present application, after the first candidate segment and the second candidate segment of the target video are obtained, the highlight segment is determined based on both. The first candidate segment carries a set action tag and the second candidate segment is selected by highlight degree, so the determined highlight segment combines both features and is more accurate; using it for video distribution or information delivery helps users locate videos of interest and improves the click-through rate and the probability that users watch the video.
In one embodiment of the present application, determining the highlight segment of the target video based on the first candidate segment and the second candidate segment may include the following steps:
the method comprises the following steps: determining whether the first alternative clip and the second alternative clip have a superposition part on the time axis of the target video, and if so, executing the second step;
step two: and determining a highlight of the target video according to the first alternative clip and the second alternative clip with overlapped parts on the time axis.
For convenience of description, the above two steps are combined for illustration.
In the embodiment of the present application, after the first candidate segment and the second candidate segment of the target video are obtained, it may be determined whether they have an overlapping portion on the time axis of the target video, that is, whether their time ranges intersect. For example, if a second candidate segment occupies 2 min 5 s to 2 min 10 s of the target video's time axis and a first candidate segment occupies 1 min 50 s to 2 min 8 s, the two overlap on the time axis, namely between 2 min 5 s and 2 min 8 s.
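The overlap test from the example can be written directly as an interval-intersection check (times in seconds):

```python
def overlaps(a, b):
    """Return True if two (start, end) intervals on the time axis overlap."""
    return max(a[0], b[0]) < min(a[1], b[1])

# Example from the text: first candidate 1:50-2:08, second candidate 2:05-2:10.
first = (110, 128)
second = (125, 130)
print(overlaps(first, second))  # True; the overlap spans 125-128 s (2:05-2:08)
```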
If the first candidate segment and the second candidate segment overlap on the time axis, the highlight segment of the target video can be determined from the overlapping first and second candidate segments, which further ensures that the determined highlight segment fuses the action-tag and highlight-degree features and improves determination accuracy.
In an embodiment of the present application, determining a highlight of a target video according to a first candidate clip and a second candidate clip having overlapping portions on a time axis may include the following steps:
the first step is as follows: respectively taking the second alternative clips with the overlapped parts on the time axis and the corresponding first alternative clips as a clip pair;
the second step is that: and determining the highlight of the target video according to the time of the second alternative clip and the first alternative clip in each clip pair on the time axis.
For convenience of description, the above two steps are combined for illustration.
In the embodiment of the present application, when the first candidate segment and the second candidate segment have overlapping portions on the time axis, each second candidate segment and its corresponding first candidate segment may be taken as one clip pair. If one second candidate segment overlaps one first candidate segment on the time axis, those two form a clip pair; if one second candidate segment overlaps several first candidate segments, it forms a clip pair with each of them.
And determining the highlight of the target video according to the time of the second alternative clip and the first alternative clip in each clip pair on the time axis. The determined highlight clip can contain part or all of time in the second alternative clip and the first alternative clip with the overlapped part on the time axis, the characteristics of the action label and the highlight are fused, and the highlight clip determination accuracy is improved.
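Forming clip pairs from overlapping candidates, per the description above, can be sketched as (segments as `(start, end)` tuples):

```python
def build_pairs(first_candidates, second_candidates):
    """Pair every second candidate segment with each first candidate segment
    it overlaps on the time axis."""
    pairs = []
    for sec in second_candidates:
        for fir in first_candidates:
            if max(fir[0], sec[0]) < min(fir[1], sec[1]):  # intervals intersect
                pairs.append((fir, sec))
    return pairs
```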
In an embodiment of the present application, determining the highlight of the target video according to the time of the second alternative clip and the first alternative clip in each clip pair on the time axis may include the following steps:
the method comprises the following steps: for each clip pair, if the starting time of a first alternative clip in the current clip pair on the time axis is before the starting time of a second alternative clip in the current clip pair on the time axis, acquiring a first to-be-used clip in the target video, wherein the starting time of the first to-be-used clip is the starting time of the second alternative clip in the current clip pair, and the duration of the first to-be-used clip is a first duration;
step two: determining whether the first to-be-used segment has an action tag;
step three: determining a segment pair corresponding to a first to-be-used segment with an action label as a to-be-used segment pair;
step four: for each to-be-used segment pair, if the highlight degree of the second alternative segment in the current to-be-used segment pair is higher than a first highlight-degree threshold, or is among the top N1 highlight degrees, determining the second alternative segment in the current pair as a highlight segment of the target video, wherein N1 is a positive integer.
For convenience of description, the above four steps are combined for illustration.
In the embodiment of the present application, a second candidate clip and a corresponding first candidate clip, which have overlapping portions on a time axis, are respectively used as a clip pair, so that a plurality of clip pairs can be obtained, where each clip pair includes a first candidate clip and a second candidate clip.
For each clip pair, it may be determined whether the start time in the time axis of the first alternative clip in the current clip pair is before the start time in the time axis of the second alternative clip in the current clip pair. The current segment pair refers to the segment pair for which the current operation is directed.
If the start time of the first alternative clip in the current clip pair is before that of the second alternative clip on the time axis, the early portion of the first alternative clip may be considered insufficiently highlight-worthy, and the first to-be-used clip of the first duration may be taken from the target video starting at the start time of the second alternative clip. That is, the start time of the first to-be-used clip is the start time of the second alternative clip on the time axis, which removes the early portion of the first alternative clip. The first duration may be predetermined, such as 5 seconds. Because the two clips of the current clip pair overlap on the time axis, if the first alternative clip starts before the second, the first alternative clip's end time must fall after the second clip's start time, and may fall either before or after the second clip's end time.
After the first to-be-used segment is acquired, whether the first to-be-used segment has the set action tag or not can be determined. If the first to-be-used segment has the set action tag, the first to-be-used segment can be considered to still meet the determination requirement of the highlight segment. If the first to-be-used segment does not have the set action tag, the first to-be-used segment can be considered not to meet the determination requirement of the highlight segment, and the first to-be-used segment can be ignored.
The segment pair corresponding to the first to-be-used segment with the action tag may be determined as a to-be-used segment pair.
For each to-be-used segment pair, it may be determined whether the highlight degree of the second alternative segment in the current to-be-used segment pair is higher than the first highlight-degree threshold, or is among the top N1 highlight degrees. If so, the second alternative segment may be considered to have a high highlight degree; and since the current to-be-used segment pair corresponds to a first to-be-used segment carrying an action tag, the second alternative segment in the pair also carries the action tag and can be determined as a highlight segment of the target video. This further guarantees that the determined highlight segment has an action tag and a high highlight degree. The first highlight-degree threshold and N1 can be set and adjusted according to actual conditions.

The current to-be-used segment pair refers to the to-be-used segment pair for which the current operation is directed.
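The four steps above can be sketched in code. This is a minimal illustration, not the claimed implementation: the tuple-based clip representation `(start, end)` / `(start, end, highlight_degree)`, the `action_tag_fn` detector, and the parameter names are all assumptions made for the sketch.

```python
def pick_highlights_case1(pairs, first_duration, action_tag_fn,
                          highlight_threshold, top_n1):
    """pairs: list of (first_clip, second_clip), where first_clip is
    (start, end) and second_clip is (start, end, highlight_degree)."""
    # Steps 1-3: for pairs whose first alternative clip starts earlier, probe
    # the first to-be-used clip and keep the pair only if it has an action tag.
    usable = []
    for first, second in pairs:
        if first[0] < second[0]:
            probe = (second[0], second[0] + first_duration)
            if action_tag_fn(probe):
                usable.append((first, second))
    # Step 4: keep second alternative clips above the threshold or in the top N1.
    scores = sorted((s[2] for _, s in usable), reverse=True)
    cutoff = scores[top_n1 - 1] if len(scores) >= top_n1 else float("-inf")
    return [s for _, s in usable
            if s[2] > highlight_threshold or s[2] >= cutoff]
```

For example, with a detector that only tags clips starting at or before t=3, only a pair whose probe window begins by t=3 survives the action-tag check, and its second alternative clip is returned.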
In an embodiment of the present application, determining the highlight of the target video according to the time of the second alternative clip and the first alternative clip in each clip pair on the time axis may include the following steps:
the first step is as follows: for each clip pair, if the start time of the first alternative clip in the current clip pair on the time axis is later than that of the second alternative clip, acquiring a second to-be-used segment in the target video, wherein the start time of the second to-be-used segment is the start time of the first alternative clip in the current clip pair on the time axis, and its end time is the end time of the second alternative clip in the current clip pair on the time axis;

the second step is that: if the highlight degree of the second alternative segment in the current segment pair is higher than a second highlight-degree threshold, or is among the top N2 highlight degrees, determining the second to-be-used segment as a highlight segment of the target video, wherein N2 is a positive integer.
For convenience of description, the above two steps are combined for illustration.
In the embodiment of the present application, a second candidate clip and a corresponding first candidate clip, which have overlapping portions on a time axis, are respectively used as a clip pair, so that a plurality of clip pairs can be obtained, where each clip pair includes a first candidate clip and a second candidate clip.
For each clip pair, it may be determined whether the start time of the first alternative clip in the current clip pair on the time axis is after that of the second alternative clip. If so, the video clip running from the second alternative clip's start time to the first alternative clip's start time can be considered to have no action tag; the first alternative clip's start time on the time axis can therefore be taken as the starting point and the second alternative clip's end time as the ending point to obtain the second to-be-used segment of the target video. This ensures that the second to-be-used segment has an action tag. The current clip pair refers to the clip pair for which the current operation is directed. Because the first and second alternative clips in the current clip pair overlap on the time axis, if the first alternative clip starts after the second, its start time must fall before the second clip's end time, while its end time may fall either before or after the second clip's end time.
The highlight degree of the second alternative segment in the current segment pair may be compared with those of the second alternative segments in the other segment pairs; if it is higher than a second highlight-degree threshold, or is among the top N2 highlight degrees, the second to-be-used segment may be determined as a highlight segment of the target video. This further guarantees that the determined highlight segment has an action tag and a high highlight degree.

The second highlight-degree threshold and N2 can be set and adjusted according to actual conditions. The first and second highlight-degree thresholds may be the same or different, and N1 and N2 may be the same or different.
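The two steps above admit a similar sketch; again the tuple representation and parameter names are illustrative assumptions, not the claimed implementation.

```python
def pick_highlights_case2(pairs, highlight_threshold, top_n2):
    """pairs: list of ((f_start, f_end), (s_start, s_end, highlight_degree))."""
    candidates = []
    for first, second in pairs:
        if first[0] > second[0]:
            # Drop the untagged lead-in: the second to-be-used segment runs from
            # the first alternative clip's start to the second alternative clip's end.
            standby = (first[0], second[1])
            candidates.append((standby, second[2]))
    # Keep segments whose second alternative clip is above the threshold or in the top N2.
    scores = sorted((sc for _, sc in candidates), reverse=True)
    cutoff = scores[top_n2 - 1] if len(scores) >= top_n2 else float("-inf")
    return [seg for seg, sc in candidates
            if sc > highlight_threshold or sc >= cutoff]
```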
In an embodiment of the present application, in the case that it is determined that there is no overlapping portion of the first alternative clip and the second alternative clip on the time axis, the method may further include the steps of:
determining the probability that each first candidate segment has an action tag, and determining the first candidate segments whose probability is greater than a probability threshold, or those corresponding to the top N3 probabilities, as highlight segments of the target video, wherein N3 is a positive integer;
alternatively,
and determining a second alternative segment whose highlight degree is higher than a third highlight-degree threshold as a highlight segment of the target video, or determining the second alternative segments corresponding to the top N4 highlight degrees as highlight segments of the target video, wherein N4 is a positive integer.
In the embodiment of the application, after the first candidate clip and the second candidate clip of the target video are obtained, it is determined whether they have an overlapping portion on the time axis. If there is no overlapping portion, the highlight segment of the target video can be determined through the action tags or the highlight degrees.
Specifically, the probability that each first candidate segment has an action tag may be determined, and the first candidate segments whose probability exceeds the probability threshold, or those corresponding to the top N3 probabilities, determined as highlight segments of the target video. This probability can be obtained when determining whether each first candidate segment has an action tag. For each first candidate segment, the higher the probability that it has an action tag, the more evident the action in it can be considered; determining the higher-probability first candidate segments as highlight segments therefore makes the determined highlights richer in action.
Alternatively, the second candidate segments whose highlight degree is higher than the third highlight-degree threshold, or those corresponding to the top N4 highlight degrees, can be determined as highlight segments of the target video, so that the determined highlight segments have a high highlight degree and more easily attract users.
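For the no-overlap case, the two alternatives above can be sketched as one hypothetical helper; the dict representation and the combined threshold-or-top-N rule are assumptions made for illustration.

```python
def pick_without_overlap(first_probs, second_scores, prob_threshold,
                         score_threshold, n3, n4, use_action_probs=True):
    """first_probs: {segment: P(action tag)};
    second_scores: {segment: highlight degree}."""
    if use_action_probs:
        # Select by action-tag probability: above threshold or in the top N3.
        top = sorted(first_probs, key=first_probs.get, reverse=True)[:n3]
        return [seg for seg in first_probs
                if first_probs[seg] > prob_threshold or seg in top]
    # Otherwise select by highlight degree: above threshold or in the top N4.
    top = sorted(second_scores, key=second_scores.get, reverse=True)[:n4]
    return [seg for seg in second_scores
            if second_scores[seg] > score_threshold or seg in top]
```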
In an embodiment of the present application, before determining the highlight segment of the target video based on the first candidate segment and the second candidate segment, the method may further include the steps of:
the method comprises the following steps: determining the dynamic rate of each second candidate segment, wherein the dynamic rate characterizes how much the images change between video frames;
step two: and rejecting second alternative segments with the dynamic rate smaller than the average dynamic rate of the target video.
For convenience of description, the above two steps are combined for illustration.
In one embodiment of the present application, after the second candidate segments are obtained, the dynamic rate of each second candidate segment may be determined. The dynamic rate characterizes how much the images change between video frames, and may be computed, for example, from the image difference between consecutive frames. The average dynamic rate of the target video may be determined at the same time, or before or after; it may be the average of the dynamic rates of all second candidate segments of the target video.
The dynamic rate of each second candidate segment is then compared with the average dynamic rate of the target video. For each second candidate segment, if its dynamic rate is higher than the average, the inter-frame image changes of that segment may be considered large, making it more likely to arouse the user's interest.

Second candidate segments whose dynamic rate is lower than the average dynamic rate of the target video can therefore be eliminated before the highlight segments are determined based on the first and second candidate segments, ensuring that the determined highlight segments have a high dynamic rate.
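A minimal sketch of the dynamic-rate filter, assuming frames are given as flat lists of pixel values and using the mean absolute inter-frame difference as the dynamic rate (one possible reading of "image difference value between video frames").

```python
def dynamic_rate(frames):
    """frames: list of equal-length pixel-value lists.
    Returns the mean absolute pixel difference between consecutive frames."""
    if len(frames) < 2:
        return 0.0
    total = 0.0
    count = 0
    for prev, cur in zip(frames, frames[1:]):
        for a, b in zip(prev, cur):
            total += abs(a - b)
            count += 1
    return total / count

def keep_dynamic_segments(segments):
    """Drop segments whose dynamic rate is below the average over all segments."""
    rates = [dynamic_rate(s) for s in segments]
    avg = sum(rates) / len(rates)
    return [s for s, r in zip(segments, rates) if r >= avg]
```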
In one embodiment of the present application, after determining the highlight segments of the target video, the method may further comprise the steps of:
the first step is as follows: detecting whether a target object exists within the opening second duration of the highlight segment, and if the target object does not exist, executing the second step;

the second step is that: searching, from the start time of the highlight segment on the time axis of the target video, for a time at which the target object exists;

the third step: updating the start time of the highlight segment to the time at which the target object exists.
For convenience of description, the above three steps are combined for illustration.
In the embodiment of the present application, after the highlight segment of the target video is determined, it may be detected whether a target object exists within the opening second duration of the highlight segment. The target object may be a primary and/or secondary character in the target video, and may be detected with a character-recognition algorithm, such as detecting whether a cartoon character is present. The second duration can be set and adjusted according to actual conditions, such as 5 seconds.

It can be understood that when a user watches a video, a target object such as a character is more likely to arouse the user's interest. If the target object exists within the opening second duration of the highlight segment, no further processing is needed. If it does not, the time at which the target object appears can be searched for, moving later in time from the start time of the highlight segment on the time axis, and the start time of the highlight segment is then updated to the first found time at which the target object exists. In this way the target object appears at the beginning of the adjusted highlight segment, further improving the probability that the user watches it.
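The start-time adjustment can be sketched as follows; `has_target` stands in for the (assumed) character-detection step, and the one-second search step is an arbitrary choice for the sketch.

```python
def adjust_start(start, end, has_target, probe_duration, step=1.0):
    """has_target(t0, t1) -> bool is an assumed detector over the window [t0, t1).
    Returns the (possibly shifted) start time of the highlight segment."""
    if has_target(start, start + probe_duration):
        return start  # the target object already appears in the opening window
    t = start
    while t < end:
        if has_target(t, t + step):
            return t  # first later time at which the target object appears
        t += step
    return start  # target never found: leave the start time unchanged
```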
In one embodiment of the present application, after determining the highlight segments of the target video, the method may further comprise the steps of:
the method comprises the following steps: determining whether the duration of the highlight meets a first duration requirement;
step two: if the duration of the highlight segment does not meet the first duration requirement, performing a truncation or supplementation operation on it based on the first duration requirement.
For convenience of description, the above two steps are combined for illustration.
In the embodiment of the application, the duration of the highlight segment is constrained: if it is too short, too little highlight content is presented to the user and the user's interest in watching the target video cannot be effectively raised; if it is too long, too much highlight content is presented, which easily reduces the probability that the user goes on to watch the target video.
Therefore, in the embodiment of the present application, after determining the highlight of the target video, it may also be determined whether the duration of the highlight meets the first duration requirement. The first time period requirement may be predetermined, such as a requirement time period in the range of 30 seconds to 2 minutes.
If the duration of the highlight segment does not meet the first duration requirement, it is adjusted. If the duration is greater than the maximum of the first duration requirement, the highlight segment can be truncated so that its duration is less than or equal to that maximum; for example, a 1-minute clip can be cut starting from the highlight segment's start time on the time axis and used as the highlight segment. If the duration is less than the minimum of the first duration requirement, the highlight segment can be supplemented so that its duration is greater than or equal to that minimum; for example, a video clip following the highlight segment's end time on the time axis can be appended, and the extended clip used as the highlight segment.
Adjusting the highlight segment according to its duration and the first duration requirement keeps it from being too long or too short, which effectively improves the probability that the user watches the target video.
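A minimal sketch of the duration adjustment, assuming the highlight segment is represented by `(start, end)` times in seconds.

```python
def enforce_duration(start, end, min_len, max_len):
    """Truncate or extend the highlight segment so its duration
    lies within [min_len, max_len]."""
    length = end - start
    if length > max_len:
        return start, start + max_len  # truncate: keep the opening part
    if length < min_len:
        return start, start + min_len  # supplement: extend past the original end
    return start, end
```

With the 30-second-to-2-minute requirement mentioned above, a 3-minute segment is cut to 2 minutes and a 10-second segment is extended to 30 seconds.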
In one embodiment of the present application, the first alternative segment includes one or more first video segments of a first set of video segments of the target video, and the first set of video segments may be obtained by:
the first step is as follows: segmenting a target video according to shots to obtain a plurality of shot segments;
the second step is that: and merging the lens segments according to the second time length requirement to obtain a first video segment set, wherein the time length of each first video segment in the first video segment set meets the second time length requirement.
For convenience of description, the above two steps are combined for illustration.
In the embodiment of the application, the target video can be segmented by shot to obtain a plurality of shot segments. Specifically, the segmentation may use algorithms such as image similarity, histogram statistics, or kernel temporal segmentation (KTS).

However, rapid shot switching often makes the motion discontinuous and the action tag difficult to determine. Therefore, in the embodiment of the present application, after the plurality of shot segments is obtained, the shot segments may be merged according to the second duration requirement to obtain the first video segment set. For example, if the second duration requirement is 5 to 6 seconds and the durations of the obtained shot segments are [1s, 3s, 2s, 5.1s, 4s, 1s], the first video segment set [6s, 5.1s, 5s] can be obtained after merging.
The duration of each first video clip in the first set of video clips satisfies the second duration requirement. This facilitates determination of the action tag.
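The merging step can be sketched greedily; this reproduces the [1s, 3s, 2s, 5.1s, 4s, 1s] → [6s, 5.1s, 5s] example, though the greedy strategy itself is an assumption (the embodiment does not fix a merging algorithm).

```python
def merge_shots(durations, min_len, max_len):
    """Greedily merge consecutive shot durations until each merged segment
    reaches min_len, without letting a merge exceed max_len."""
    merged, acc = [], 0.0
    for d in durations:
        if acc and acc + d > max_len:
            merged.append(acc)  # close the current group before it overflows
            acc = 0.0
        acc += d
        if acc >= min_len:
            merged.append(acc)
            acc = 0.0
    if acc:
        merged.append(acc)  # trailing remainder, possibly shorter than min_len
    return merged
```

Note the trailing remainder may fall short of the minimum duration; a fuller implementation would fold it into the previous group or discard it.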
In one embodiment of the present application, the second candidate segments include one or more second video segments from a second video segment set of the target video, and the highlight degree of each second video segment in the set may be determined by the following steps:

the method comprises the following steps: obtaining first viewing data of the target video;

step two: determining the highlight degree of each second video segment in the second video segment set according to the first viewing data.
For convenience of description, the above two steps are combined for illustration.
In the embodiment of the application, if the target video already has substantial user viewing behavior, such as playing, pausing, replaying, fast-forwarding, and sending bullet comments, the highlight degree of each second video segment can be determined from the viewing data.

First viewing data of the target video, i.e. data recording the above viewing behaviors, may be obtained first. Based on the first viewing data, the highlight degree of each segment in the second video segment set can be determined.

Specifically, the number of viewers of each second video segment in the second video segment set may be determined from the first viewing data, and the highlight degree of each second video segment determined from its viewer count, for example according to a preset correspondence between viewer counts and highlight degrees.

In this way the highlight degree of each second video segment can be quickly determined from the viewing data.

Furthermore, when the second candidate segments are determined from the highlight degrees of the second video segments, the highlight-degree threshold can be set from the viewer counts. For example, the highlight degree corresponding to the average viewer count over the second video segment set may be used as the threshold, so that the second candidate segments are exactly the second video segments whose viewer counts exceed the average; or the highlight degree corresponding to the median viewer count may be used as the threshold, so that the second candidate segments are those whose viewer counts exceed the median.
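The viewer-count thresholds described above can be sketched directly; the dict representation of segments and counts is an assumption.

```python
import statistics

def pick_by_viewers(viewer_counts, use_median=False):
    """viewer_counts: {segment_id: number of viewers}. Keep the segments
    whose viewer count exceeds the mean (or median) count over the set."""
    counts = list(viewer_counts.values())
    threshold = statistics.median(counts) if use_median else sum(counts) / len(counts)
    return [seg for seg, n in viewer_counts.items() if n > threshold]
```

The mean is pulled up by outliers while the median is not, so the two thresholds can select quite different segment sets.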
In one embodiment of the present application, the second video segments in the second video segment set do not include the opening and ending credits.

It will be appreciated that the opening and ending credits of a video are generally watched frequently, but treating them as highlight segments is of little value. Therefore, before the highlight degree of each second video segment is determined from the first viewing data, the segments belonging to the opening and ending credits may be removed from the second video segment set; that is, the second video segments in the set do not include the opening or ending credits. This effectively avoids the finally determined highlight segment being the opening or ending of the target video and improves the accuracy of highlight determination.
In one embodiment of the present application, the second candidate segments include one or more second video segments from the second video segment set of the target video, and the highlight degree of each second video segment in the set may also be determined by the following steps:
the first step is as follows: for each second video segment in the second video segment set, inputting the current second video segment into a pre-trained classification model to obtain the probability, output by the model, that the current second video segment is a highlight positive example;

the second step is that: determining the highlight degree of the current second video segment according to the probability that it is a highlight positive example.
For convenience of description, the above two steps are combined for illustration.
When the target video is not yet online or has only been online a short time, there is little viewing data for it, and the highlight degree of the second video segments is hard to determine from user viewing behavior. Therefore, the embodiment of the application provides another way of determining the highlight degree.
For each second video segment in the second video segment set, the current second video segment can be input into a pre-trained classification model to obtain the probability, output by the model, that it is a highlight positive example. The classification model takes a second video segment as input and outputs the probability that it is a highlight positive example; the higher that probability, the more highlight-worthy the segment. The current second video segment is the second video segment for which the current operation is directed.

The highlight degree of the current second video segment may then be determined from this probability. Specifically, a correspondence between the probability of being a highlight positive example and the highlight degree can be preset, and the highlight degree of each second video segment determined from that correspondence and the probability output by the classification model.

The classification model thus allows the highlight degree of each second video segment to be determined accurately, providing a basic guarantee for the subsequent determination of highlight segments.
In one embodiment of the present application, the classification model may be obtained by pre-training by:
the method comprises the following steps: obtaining a training sample set, wherein the training sample set comprises a plurality of sample pairs, and each sample pair comprises a highlight positive example and a negative example;
step two: training a pre-constructed initial model by using a training sample set, and terminating training until a set training termination condition is reached to obtain a classification model;
the method for training the pre-constructed initial model by using the training sample set comprises the following steps:
inputting the current sample pair into a pre-constructed initial model for each sample pair;
determining model loss according to the current sample pair and the output result of the current sample pair output by the initial model;
and adjusting parameters of the initial model according to the model loss.
In the embodiment of the present application, a training sample set may be obtained in advance; it comprises a plurality of sample pairs, each containing a highlight positive example and a negative example. Specifically, a plurality of historical videos and the highlight segment of each may be obtained in advance. For each historical video, its highlight segment can be extracted as the positive example, and a clip of a set duration, such as 5 or 10 seconds, randomly extracted from the remainder of the video as the negative example; the positive and negative examples form a sample pair, and the sample pairs together form the training sample set.
The pre-constructed initial model is trained with the training sample set, and training terminates when a set termination condition is reached, yielding the classification model. The initial model may be a TimeSformer model, with its parameters in their initial state.
The training termination condition may be that the training frequency reaches a preset frequency threshold, or may be that the model precision reaches a set precision threshold.
During model training, each sample pair may be input into the pre-constructed initial model, which may comprise a backbone network and a classification network. Features of the positive and negative examples in the current sample pair are extracted by the Transformer backbone of the initial model and fed into the classification network, which finally produces the output result for the current sample pair.
The model loss may be determined from the current sample pair and the initial model's output for it. The model loss may include a cross-entropy classification loss and a ranking loss; specifically, it may be their sum or a weighted sum.
Specifically, for the positive example in the current sample pair, the positive/negative prediction output by the initial model can be compared with the ground-truth label of the positive example, and likewise for the negative example, yielding the cross-entropy classification loss.
For the positive and negative examples in the current sample pair, the scores output by the initial model are obtained and the ranking loss is determined from them; the scores may correspond to the probabilities.
For example, the ranking loss L_p(s+, s-) can be determined according to the following formula:

L_p(s+, s-) = max(0, 1 - h(s+) + h(s-))

where s+ denotes the highlight positive example, s- denotes the negative example, h(s+) and h(s-) denote the scores the initial model outputs for the positive and negative examples, and the subscript p denotes the sample pair. The ranking loss drives the score of the positive example above that of the negative example, ideally by a margin close to 1.
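The ranking loss, and its combination with a cross-entropy term into the model loss described above, can be written out directly; the probability and score arguments are illustrative.

```python
import math

def ranking_loss(h_pos, h_neg, margin=1.0):
    """Margin ranking loss for one sample pair: zero once the positive
    example's score exceeds the negative example's by at least `margin`."""
    return max(0.0, margin - h_pos + h_neg)

def model_loss(p_pos, p_neg, h_pos, h_neg, w=1.0):
    """Cross-entropy classification loss plus a weighted ranking loss
    (w = 1.0 gives the plain sum)."""
    cross_entropy = -math.log(p_pos) - math.log(1.0 - p_neg)
    return cross_entropy + w * ranking_loss(h_pos, h_neg)
```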
After the model loss is determined, the parameters of the initial model can be adjusted according to the model loss so as to reduce it, so that the model loss converges steadily and the accuracy of the initial model keeps improving.

The classification model obtained by this pre-training facilitates the subsequent determination of the highlight degree.
In one embodiment of the present application, the method may further comprise the steps of:
obtaining second watching data of the target video;
determining the highlight degree of the current second video segment according to the probability that it is a highlight positive example may include the following step:

determining the highlight degree of the current second video segment according to the second viewing data and the probability that it is a highlight positive example.
In the embodiment of the present application, the second viewing data of the target video may be obtained.
For each second video segment in the second video segment set, the current segment is input into the pre-trained classification model; after the probability that it is a highlight positive example is obtained, its highlight degree is determined from both the second viewing data and that probability. Specifically, a first reference highlight degree may be determined from the second viewing data, a second reference highlight degree from the probability of being a highlight positive example, and their average used as the highlight degree of the current second video segment. Combining the user's viewing behavior with the classification model's prediction makes the determined highlight degree more accurate, providing a basic guarantee for the subsequent determination of highlight segments.
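The averaging described above can be sketched in one line; the linear probability-to-score mapping is an assumed stand-in for the preset correspondence.

```python
def combined_highlight(view_score, model_prob, prob_to_score=lambda p: 100.0 * p):
    """Average the viewing-data reference highlight degree with the one
    mapped from the model's highlight probability."""
    return (view_score + prob_to_score(model_prob)) / 2.0
```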
As shown in fig. 2, a specific process for video highlight determination is as follows:
firstly, a first video clip set and a second video clip set of a target video are obtained;
then determining whether each first video clip in the first video clip set has a set action tag, and determining the highlight degree of each second video clip in the second video clip set;
determining a first alternative segment according to the first video segments having the set action tag in the first video segment set, and determining a second alternative segment according to the highlight degree of each second video segment in the second video segment set;
then determining a highlight segment of the target video based on the first alternative segment and the second alternative segment;
and finally, adjusting the determined highlight segment: whether a target object exists within the initial second duration of the highlight segment is detected; if the target object does not exist, a time at which the target object exists is searched for from the starting time of the highlight segment onward along the time axis, and the starting time of the highlight segment is updated to the time at which the target object exists. Whether the duration of the highlight segment meets the first duration requirement is also determined: if the duration of the highlight segment is greater than the maximum value of the first duration requirement, an interception operation is performed on the highlight segment so that the intercepted duration is less than or equal to the maximum value of the first duration requirement; and if the duration of the highlight segment is less than the minimum value of the first duration requirement, a supplement operation is performed on the highlight segment so that the supplemented duration is greater than or equal to the minimum value of the first duration requirement.
After the second candidate segments are determined, and before the highlight segment of the target video is determined based on the first candidate segments and the second candidate segments, the dynamic rate of each second candidate segment may further be determined, and any second candidate segment whose dynamic rate is smaller than the average dynamic rate of the target video may be removed.
The highlight segment determined by the embodiment of the application combines the features of the action tag and the highlight degree, and the determined highlight segment is further adjusted, so that the determined highlight segment is more likely to raise the user's viewing interest and can increase the probability that the user watches the target video.
It should be noted that the times referred to in this embodiment are times on the time axis of the target video.
Corresponding to the above method embodiments, the present application further provides a video highlight determining apparatus, and the video highlight determining apparatus described below and the video highlight determining method described above may be referred to in correspondence with each other.
Referring to fig. 3, the video highlight determination apparatus 300 may include the following modules:
an obtaining module 310, configured to obtain a first alternative segment and a second alternative segment of a target video, where the first alternative segment is a video segment having a set action tag, and the second alternative segment is a video segment whose highlight degree meets a highlight degree condition;
a determining module 320 for determining a highlight segment of the target video based on the first candidate segment and the second candidate segment.
By applying the apparatus provided by the embodiment of the application, after the first alternative segment and the second alternative segment of the target video are obtained, the highlight segment of the target video is determined based on the first alternative segment and the second alternative segment. Because the first alternative segment is a video segment having the set action tag and the second alternative segment is a video segment whose highlight degree meets the highlight degree condition, the highlight segment determined based on the two combines both the set action tag and the highlight degree feature, and is therefore more accurate. Using the determined highlight segment for video distribution or information delivery helps users locate videos of interest, which improves the click-through rate of the video and the probability that users watch it.
In a specific embodiment of the present application, the determining module 320 is configured to:
determining whether the first alternative clip and the second alternative clip have an overlapping portion on the time axis of the target video;
and if an overlapping portion exists, determining the highlight segment of the target video according to the first alternative clip and the second alternative clip that have the overlapping portion on the time axis.
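A minimal sketch of the overlap test on the time axis (the strict comparisons are our assumption; clips that merely touch at an endpoint are treated as non-overlapping):

```python
def overlaps(a_start: float, a_end: float, b_start: float, b_end: float) -> bool:
    """Return True when the two clips share time on the target video's time axis."""
    return a_start < b_end and b_start < a_end
```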
In one embodiment of the present application, the determining module 320 is configured to:
taking each second alternative clip having an overlapping portion on the time axis and its corresponding first alternative clip as a clip pair;
and determining the highlight segment of the target video according to the times of the second alternative clip and the first alternative clip in each clip pair on the time axis.
In a specific embodiment of the present application, the determining module 320 is configured to:
for each clip pair, if the starting time of the first alternative clip in the current clip pair on the time axis is before the starting time of the second alternative clip in the current clip pair on the time axis, acquiring a first to-be-used segment in the target video, where the starting time of the first to-be-used segment is the starting time of the second alternative clip in the current clip pair on the time axis, and the duration of the first to-be-used segment is a first duration;
determining whether the first to-be-used segment has an action tag;
determining the clip pair corresponding to a first to-be-used segment having an action tag as a to-be-used clip pair;
and for each to-be-used clip pair, if the highlight degree of the second alternative segment in the current to-be-used clip pair is higher than a first highlight degree threshold, or the highlight degree of the second alternative segment in the current to-be-used clip pair is among the top N1 highlight degrees, determining the second alternative segment in the current clip pair as the highlight segment of the target video, where N1 is a positive integer.
In a specific embodiment of the present application, the determining module 320 is configured to:
for each clip pair, if the starting time of the first alternative clip in the current clip pair on the time axis is later than the starting time of the second alternative clip in the current clip pair on the time axis, acquiring a second to-be-used segment in the target video, where the starting time of the second to-be-used segment is the starting time of the first alternative clip in the current clip pair on the time axis, and the ending time of the second to-be-used segment is the ending time of the second alternative clip in the current clip pair on the time axis;
and if the highlight degree of the second alternative segment in the current clip pair is higher than a second highlight degree threshold, or the highlight degree of the second alternative segment in the current clip pair is among the top N2 highlight degrees, determining the second to-be-used segment as the highlight segment of the target video, where N2 is a positive integer.
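The two clip-pair cases above can be sketched together, purely for illustration. Everything here is a simplification of ours: the `Clip` type, the `first_duration` parameter, the `has_action_tag` detector callable, and a single `threshold` standing in for the first/second highlight degree thresholds (the top-N alternative is omitted).

```python
from dataclasses import dataclass

@dataclass
class Clip:
    start: float          # start time on the target video's time axis
    end: float            # end time on the time axis
    degree: float = 0.0   # highlight degree (meaningful for second candidates)

def select_from_pair(first, second, first_duration, has_action_tag, threshold):
    """Pick a highlight segment from one (first, second) alternative-clip pair."""
    if first.start < second.start:
        # First to-be-used segment: starts where the second clip starts and
        # lasts the first duration; it must still carry the action tag.
        if (has_action_tag(second.start, second.start + first_duration)
                and second.degree > threshold):
            return second
    elif first.start > second.start:
        # Second to-be-used segment: from the first clip's start to the
        # second clip's end, kept when the highlight degree is high enough.
        if second.degree > threshold:
            return Clip(first.start, second.end, second.degree)
    return None
```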
In a specific embodiment of the present application, the determining module 320 is further configured to:
in the case where it is determined that the first alternative clips and the second alternative clips have no overlapping portion on the time axis, determining the probability that each first alternative clip has an action tag, and determining the first alternative clips whose probability is greater than a probability threshold as highlight segments of the target video, or determining the first alternative clips corresponding to the top N3 probabilities as highlight segments of the target video, where N3 is a positive integer;
alternatively,
determining the second alternative segments whose highlight degree is higher than a third highlight degree threshold as highlight segments of the target video, or determining the second alternative segments corresponding to the top N4 highlight degrees as highlight segments of the target video, where N4 is a positive integer.
In a specific embodiment of the present application, the system further includes a rejection module, configured to:
before the highlight segment of the target video is determined based on the first alternative segments and the second alternative segments, determining the dynamic rate of each second alternative segment, where the dynamic rate represents the degree of image change between video frames;
and rejecting the second alternative segments whose dynamic rate is smaller than the average dynamic rate of the target video.
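A toy sketch of the dynamic-rate filter, for illustration only: representing each frame by a single scalar (e.g. average brightness) and using the mean absolute frame-to-frame difference as the "dynamic rate" are our simplifications.

```python
def dynamic_rate(frame_values):
    """Mean absolute difference between consecutive frames, each frame
    collapsed here to one scalar value for illustration."""
    diffs = [abs(b - a) for a, b in zip(frame_values, frame_values[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0

def reject_static_segments(segment_frames, video_average):
    """Keep only segments whose dynamic rate reaches the video's average."""
    return [f for f in segment_frames if dynamic_rate(f) >= video_average]
```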
In a specific embodiment of the present application, the apparatus further includes a first adjusting module, configured to:
after the highlight segment of the target video is determined, detecting whether a target object exists within the initial second duration of the highlight segment;
if the target object does not exist, searching, from the starting time of the highlight segment onward along the time axis of the target video, for a time at which the target object exists;
and updating the starting time of the highlight segment to the time at which the target object exists.
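A sketch of the start-time adjustment, for illustration only; the one-second step and the `target_present` detector callable are our assumptions, and we read the search as scanning toward later times until the target object appears.

```python
def adjust_start(start_time, end_time, target_present):
    """Move the highlight segment's start to the first time at which the
    target object is present, scanning in one-second steps.

    target_present: callable(t) -> bool, an assumed per-second detector.
    Returns the original start unchanged if the object is already present,
    or end_time if it never appears inside the segment.
    """
    t = start_time
    while t < end_time and not target_present(t):
        t += 1
    return t
```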
In a specific embodiment of the present application, the apparatus further includes a second adjusting module, configured to:
after the highlight segment of the target video is determined, determining whether the duration of the highlight segment meets a first duration requirement;
and if the duration of the highlight segment does not meet the first duration requirement, performing an interception operation or a supplement operation on the highlight segment based on the first duration requirement.
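The duration check can be sketched as a clamp, for illustration only; keeping the starting time fixed and intercepting or supplementing only at the end is our assumption about how the two operations are applied.

```python
def fit_duration(start, end, min_len, max_len):
    """Intercept (truncate) or supplement (extend) a highlight segment so
    that its duration satisfies the first duration requirement [min_len, max_len]."""
    duration = end - start
    if duration > max_len:      # interception operation
        return start, start + max_len
    if duration < min_len:      # supplement operation
        return start, start + min_len
    return start, end
```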
In a specific embodiment of the present application, the first alternative segment includes one or more first video segments in a first video segment set of the target video, and the obtaining module 310 is configured to obtain the first video segment set by:
segmenting the target video by shots to obtain a plurality of shot segments;
and merging the shot segments according to a second duration requirement to obtain the first video segment set, where the duration of each first video segment in the first video segment set meets the second duration requirement.
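A greedy sketch of the shot-merging step, for illustration only; treating the second duration requirement as a single minimum length and folding a short tail into the last merged segment are our assumptions.

```python
def merge_shots(shot_durations, min_len):
    """Merge consecutive shot segments until each merged segment reaches
    min_len (our stand-in for the second duration requirement)."""
    merged, acc = [], 0
    for d in shot_durations:
        acc += d
        if acc >= min_len:
            merged.append(acc)
            acc = 0
    if acc and merged:
        merged[-1] += acc   # fold a short tail into the last segment
    elif acc:
        merged.append(acc)
    return merged
```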
In a specific embodiment of the present application, the second alternative segment includes one or more second video segments of a second video segment set of the target video, and the obtaining module 310 is configured to determine the highlight of each second video segment of the second video segment set by:
for each second video clip in the second video clip set, inputting the current second video clip into a pre-trained classification model to obtain the probability, output by the classification model, that the current second video clip is a highlight positive example;
and determining the highlight degree of the current second video clip according to the probability that the current second video clip is a highlight positive example.
In one embodiment of the present application, the second determining module 330 is further configured to obtain the classification model through pre-training by:
obtaining a training sample set, wherein the training sample set comprises a plurality of sample pairs, and each sample pair comprises a highlight positive example and a negative example;
training a pre-constructed initial model by using the training sample set, and terminating the training when a set training termination condition is reached, so as to obtain the classification model;
the method for training the pre-constructed initial model by using the training sample set comprises the following steps:
inputting the current sample pair into a pre-constructed initial model for each sample pair;
determining model loss according to the current sample pair and the output result of the current sample pair output by the initial model;
and adjusting parameters of the initial model according to the model loss.
In one embodiment of the present application, the model loss includes a cross-entropy classification loss and a ranking loss.
In a specific embodiment of the present application, the apparatus further includes a fourth determining module, configured to:
obtaining second viewing data of the target video;
an obtaining module 310 configured to:
determining the highlight degree of the current second video clip according to the second viewing data and the probability that the current second video clip is a highlight positive example.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
The embodiment of the present application further provides an electronic device, as shown in fig. 4, which includes a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with one another through the communication bus 404,
a memory 403 for storing a computer program;
the processor 401, when executing the program stored in the memory 403, implements the following steps:
obtaining a first alternative segment and a second alternative segment of a target video, wherein the first alternative segment is a video segment having a set action tag, and the second alternative segment is a video segment whose highlight degree meets a highlight degree condition;
based on the first candidate segment and the second candidate segment, a highlight segment of the target video is determined.
The communication bus 404 of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 404 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 402 is used for communication between the above electronic device and other devices.
The memory 403 may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one magnetic disk memory. Optionally, the memory 403 may also be at least one storage device located remotely from the aforementioned processor.
The processor 401 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment provided by the present application, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the steps of the video highlight determination method of any of the above embodiments.
In a further embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the video highlight determination method of any of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on differences from other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (16)

1. A method for video highlight determination, comprising:
obtaining a first alternative segment and a second alternative segment of a target video, wherein the first alternative segment is a video segment having a set action tag, and the second alternative segment is a video segment whose highlight degree meets a highlight degree condition;
determining a highlight segment of the target video based on the first candidate segment and the second candidate segment.
2. The method of claim 1, wherein the determining a highlight segment of the target video based on the first candidate segment and the second candidate segment comprises:
determining whether the first alternative clip and the second alternative clip have an overlapping portion on a time axis of the target video;
and if an overlapping portion exists, determining the highlight segment of the target video according to the first alternative clip and the second alternative clip that have the overlapping portion on the time axis.
3. The method for determining video highlights according to claim 2, wherein the determining the highlight segment of the target video according to the first and second alternative clips having an overlapping portion on the time axis comprises:
taking each second alternative clip having an overlapping portion on the time axis and its corresponding first alternative clip as a clip pair;
and determining the highlight segment of the target video according to the times of the second alternative clip and the first alternative clip in each clip pair on the time axis.
4. The method according to claim 3, wherein determining the highlight of the target video according to the time of the second candidate clip and the first candidate clip in each clip pair on the time axis comprises:
for each segment pair, if the starting time of the first alternative segment in the current segment pair on the time axis is before the starting time of the second alternative segment in the current segment pair on the time axis, acquiring a first to-be-used segment in the target video, where the starting time of the first to-be-used segment is the starting time of the second alternative segment in the current segment pair on the time axis, and the duration of the first to-be-used segment is a first duration;
determining whether the first to-be-used segment has the action tag;
determining the segment pair corresponding to the first to-be-used segment having the action tag as a to-be-used segment pair;
and for each to-be-used segment pair, if the highlight degree of the second alternative segment in the current to-be-used segment pair is higher than a first highlight degree threshold, or the highlight degree of the second alternative segment in the current to-be-used segment pair is among the top N1 highlight degrees, determining the second alternative segment in the current segment pair as the highlight segment of the target video, wherein N1 is a positive integer.
5. The method according to claim 3, wherein determining the highlight of the target video according to the time of the second candidate clip and the first candidate clip in each clip pair on the time axis comprises:
for each of the clip pairs, if the starting time of the first alternative clip in the current clip pair on the time axis is later than the starting time of the second alternative clip in the current clip pair on the time axis, acquiring a second to-be-used segment in the target video, the starting time of the second to-be-used segment being the starting time of the first alternative clip in the current clip pair on the time axis, and the ending time of the second to-be-used segment being the ending time of the second alternative clip in the current clip pair on the time axis;
and if the highlight degree of the second alternative segment in the current clip pair is higher than a second highlight degree threshold, or the highlight degree of the second alternative segment in the current clip pair is among the top N2 highlight degrees, determining the second to-be-used segment as the highlight segment of the target video, wherein N2 is a positive integer.
6. The method of claim 2, wherein in the case where it is determined that there is no overlapping portion of the first candidate clip and the second candidate clip on the time axis, further comprising:
determining the probability that each first candidate segment has the action tag, and determining the first candidate segments whose probability is greater than a probability threshold as highlight segments of the target video, or determining the first candidate segments corresponding to the top N3 probabilities as highlight segments of the target video, wherein N3 is a positive integer;
alternatively,
determining the second candidate segments whose highlight degree is higher than a third highlight degree threshold as highlight segments of the target video, or determining the second candidate segments corresponding to the top N4 highlight degrees as highlight segments of the target video, wherein N4 is a positive integer.
7. The method of claim 1, wherein prior to said determining a highlight segment of said target video based on said first candidate segment and said second candidate segment, further comprising:
determining the dynamic rate of each second alternative segment, wherein the dynamic rate represents the degree of image change between video frames;
rejecting the second candidate segments with a dynamic rate less than the average dynamic rate of the target video.
8. The method of claim 1, further comprising, after said determining a highlight segment of the target video:
detecting whether a target object exists within the initial second duration of the highlight segment;
if the target object does not exist, searching, from the starting time of the highlight segment onward along the time axis of the target video, for a time at which the target object exists;
and updating the starting time of the highlight segment to the time at which the target object exists.
9. The method of claim 1, further comprising, after said determining a highlight segment of said target video:
determining whether the duration of the highlight meets a first duration requirement;
and if the duration of the highlight does not meet the first duration requirement, performing interception operation or supplement operation on the highlight based on the first duration requirement.
10. The method of claim 1, wherein the first candidate segment comprises one or more first video segments of a first set of video segments of the target video, the first set of video segments being obtained by:
the target video is segmented according to shots, and a plurality of shot segments are obtained;
and combining the shot segments according to a second duration requirement to obtain the first video segment set, wherein the duration of each first video segment in the first video segment set meets the second duration requirement.
11. The method of any of claims 1 to 10, wherein the second candidate segment comprises one or more second video segments of a second video segment set of the target video, the highlight degree of each second video segment in the second video segment set being determined by:
for each second video clip in the second video clip set, inputting the current second video clip into a pre-trained classification model to obtain the probability, output by the classification model, that the current second video clip is a highlight positive example;
and determining the highlight degree of the current second video clip according to the probability that the current second video clip is a highlight positive example.
12. The method of claim 11, wherein the classification model is obtained by pre-training by:
obtaining a training sample set, wherein the training sample set comprises a plurality of sample pairs, and each sample pair comprises a highlight positive example and a negative example;
training a pre-constructed initial model by using the training sample set, and terminating training until a set training termination condition is reached to obtain the classification model;
wherein, the training of the pre-constructed initial model by using the training sample set comprises:
for each sample pair, inputting the current sample pair into a pre-constructed initial model;
determining model loss according to the current sample pair and an output result of the current sample pair output by the initial model;
and adjusting parameters of the initial model according to the model loss.
13. The method of video highlight determination according to claim 11, further comprising:
obtaining second viewing data of the target video;
the determining the highlight degree of the current second video clip according to the probability that the current second video clip is a highlight positive example comprises:
determining the highlight degree of the current second video clip according to the second viewing data and the probability that the current second video clip is a highlight positive example.
14. A video highlight determination apparatus, comprising:
an obtaining module, configured to obtain a first alternative segment and a second alternative segment of a target video, wherein the first alternative segment is a video segment having a set action tag, and the second alternative segment is a video segment whose highlight degree meets a highlight degree condition;
a determining module for determining a highlight segment of the target video based on the first candidate segment and the second candidate segment.
15. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the video highlight determination method according to any one of claims 1 to 13 when executing a program stored on a memory.
16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method for video highlight determination according to any of claims 1 to 13.
CN202211054859.1A 2022-08-31 2022-08-31 Video highlight determination method and device, electronic equipment and storage medium Active CN115412765B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211054859.1A CN115412765B (en) 2022-08-31 2022-08-31 Video highlight determination method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115412765A true CN115412765A (en) 2022-11-29
CN115412765B CN115412765B (en) 2024-03-26

Family

ID=84163759


Country Status (1)

Country Link
CN (1) CN115412765B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102427507A (en) * 2011-09-30 2012-04-25 Beihang University Football video highlight automatic synthesis method based on event model
CN104994425A (en) * 2015-06-30 2015-10-21 Beijing QIYI Century Science and Technology Co Ltd Video labeling method and device
CN109977735A (en) * 2017-12-28 2019-07-05 Youku Network Technology (Beijing) Co., Ltd. Method and device for extracting sports highlight segments
CN110191357A (en) * 2019-06-28 2019-08-30 Beijing QIYI Century Science and Technology Co Ltd Video clip highlight degree assessment and dynamic cover generation method and device
US20200213686A1 (en) * 2017-09-21 2020-07-02 Tencent Technology (Shenzhen) Company Limited Video highlight determination method and apparatus, storage medium, and electronic device
CN111669656A (en) * 2020-06-19 2020-09-15 Beijing QIYI Century Science and Technology Co Ltd Method and device for determining highlight degree of video clip
US20200322647A1 (en) * 2019-04-02 2020-10-08 Wangsu Science & Technology Co., Ltd. Method, apparatus, server, and storage medium for generating live broadcast video of highlight collection
CN112511854A (en) * 2020-11-27 2021-03-16 Liu Yahong Live video highlight generation method, device, medium and equipment
US20210201038A1 (en) * 2019-12-30 2021-07-01 Alibaba Group Holding Limited Method and apparatus for video processing
CN113194359A (en) * 2021-04-27 2021-07-30 Wuhan Xingxun Intelligent Technology Co., Ltd. Method, device, equipment and medium for automatically capturing baby highlight video clips
CN113365147A (en) * 2021-08-11 2021-09-07 Tencent Technology (Shenzhen) Co., Ltd. Music beat point-based video editing method, device, equipment and storage medium
WO2021184852A1 (en) * 2020-03-16 2021-09-23 Ping An Technology (Shenzhen) Co., Ltd. Action region extraction method, device and apparatus, and computer-readable storage medium
US20220078530A1 (en) * 2019-12-17 2022-03-10 Tencent Technology (Shenzhen) Company Limited Video labeling method and apparatus, device, and computer-readable storage medium
CN114329072A (en) * 2021-12-23 2022-04-12 Beijing SenseTime Technology Development Co., Ltd. Video processing method and device, electronic equipment and storage medium
CN114845149A (en) * 2021-02-01 2022-08-02 Tencent Technology (Beijing) Co., Ltd. Video clip editing method, video recommendation method, device, equipment and medium

Also Published As

Publication number Publication date
CN115412765B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
CN111460219B (en) Video processing method and device and short video platform
CN112929744B (en) Method, apparatus, device, medium and program product for segmenting video clips
KR102068790B1 (en) Estimating and displaying social interest in time-based media
US11914639B2 (en) Multimedia resource matching method and apparatus, storage medium, and electronic apparatus
CN113613065B (en) Video editing method and device, electronic equipment and storage medium
CN111861572B (en) Advertisement putting method and device, electronic equipment and computer readable storage medium
CN110475154B (en) Network television video playing method and device, network television and computer medium
CN111258435B (en) Comment method and device for multimedia resources, electronic equipment and storage medium
CN111279709B (en) Providing video recommendations
CN110287375B (en) Method and device for determining video tag and server
CN111314732A (en) Method for determining video label, server and storage medium
CN112511854A (en) Live video highlight generation method, device, medium and equipment
CN112291634B (en) Video processing method and device
WO2022087826A1 (en) Video processing method and apparatus, mobile device, and readable storage medium
TWI725375B (en) Data search method and data search system thereof
CN113259754B (en) Video generation method, device, electronic equipment and storage medium
CN113423014A (en) Playing information pushing method and device, terminal equipment and storage medium
Tsao et al. Thumbnail image selection for VOD services
CN115412765A (en) Video highlight determining method and device, electronic equipment and storage medium
CN113573097A (en) Video recommendation method and device, server and storage medium
CN108882024B (en) Video playing method and device and electronic equipment
CN114302231B (en) Video processing method and device, electronic equipment and storage medium
CN115080792A (en) Video association method and device, electronic equipment and storage medium
CN115190357A (en) Video abstract generation method and device
CN112770176B (en) Video frame determination method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant