WO2022198934A1

WO2022198934A1 - Method and apparatus for generating video synchronized to beat of music

Info

Publication number: WO2022198934A1
Application number: PCT/CN2021/117668
Authority: WO
Inventors: 汪谷
Original assignee: 北京达佳互联信息技术有限公司
Priority date: 2021-03-24
Filing date: 2021-09-10
Publication date: 2022-09-29
Also published as: CN113099297B; CN113099297A

Abstract

The present disclosure relates to the technical field of video processing, and provides a method and apparatus for generating a video synchronized to the beat of music, an electronic device, and a storage medium. The method comprises: acquiring candidate frame pictures in a multimedia resource and position information corresponding to the candidate frame pictures; determining matching target music and target frame pictures on the basis of the candidate frame pictures and the corresponding position information; performing image processing on the target frame pictures to obtain the images synchronized to the beat of the music in the target frame pictures; and generating a target video synchronized to the beat of the music on the basis of the multimedia resource, the target music, and the images synchronized to the beat of the music.

Description

Method and apparatus for generating a beat-synced video

Cross-Reference to Related Applications

This application is based on, and claims priority to, Chinese patent application No. 202110314586.9, filed on March 24, 2021, the entire content of which is incorporated herein by reference.

Technical Field

The present disclosure relates to video processing technology, and in particular to a method and apparatus for generating a beat-synced video, an electronic device, and a storage medium.

Background

With the rapid development of Internet technology, short videos are used more and more widely, and beat-synced videos in particular have become increasingly popular. A beat-synced video is a video in which special-effect pictures are added at specific time points, which gives it a good playback effect.

In the related art, a beat-synced video is generally made by opening the original video in relatively advanced editing software, manually searching for the target frames, cutting out the subject of each target frame, and then turning the cutouts into an animation based on a music template.

Summary

The present disclosure provides a method and apparatus for generating a beat-synced video, an electronic device, and a storage medium.

According to a first aspect of the embodiments of the present disclosure, a method for generating a beat-synced video is provided, including:

acquiring candidate frames in a multimedia resource and position information corresponding to the candidate frames;

determining matching target music and target frames based on the candidate frames and the position information corresponding to the candidate frames;

performing image processing on the target frames to obtain the beat-synced image in each target frame; and

generating a target beat-synced video according to the multimedia resource, the target music, and the beat-synced images.

In one embodiment, the multimedia resource includes several frames, and acquiring the candidate frames in the multimedia resource and the position information corresponding to the candidate frames includes: acquiring the several frames in the multimedia resource and the position information of each frame within the multimedia resource; and selecting, from the several frames, the frames that satisfy a preset condition as the candidate frames.

In one embodiment, the preset condition is a preset quality score based on image quality, and selecting the frames that satisfy the preset condition as the candidate frames includes: obtaining a quality score for each frame based on its image quality; and taking the frames whose quality score satisfies the preset quality score as the candidate frames.
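The quality-based selection can be sketched as a simple filter. This is only an illustration: the `Frame` type, the score range, and the reading of "satisfies the preset quality score" as "greater than or equal to a threshold" are assumptions, and the disclosure does not say how the quality score itself is computed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame record: position in seconds plus a quality score."""
    position: float
    quality: float  # assumed to lie in [0, 1]; the scoring model is not specified


def select_candidate_frames(frames: list[Frame], min_quality: float) -> list[Frame]:
    """Keep the frames whose quality score reaches the preset threshold."""
    return [f for f in frames if f.quality >= min_quality]


frames = [Frame(0.5, 0.91), Frame(1.0, 0.42), Frame(2.5, 0.77)]
candidates = select_candidate_frames(frames, min_quality=0.7)
```

Each surviving frame keeps its position, which the later matching step needs.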

In one embodiment, determining the matching target music and target frames based on the candidate frames and the corresponding position information includes: acquiring the total duration of the multimedia resource; screening out, from a beat-synced music library, candidate music that matches the total duration of the multimedia resource, where each piece of candidate music has several beat points and each beat point has a beat time point; and determining the matching target music and target frames based on the beat time points.
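A minimal sketch of the duration screening follows. The `Music` record and the `tolerance` parameter are hypothetical; the disclosure only says that the candidate music "matches" the total duration and does not define how close the two durations must be.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Music:
    """Hypothetical library entry for one piece of beat-synced music."""
    name: str
    duration: float                # seconds
    beat_times: tuple[float, ...]  # beat time points, seconds from the start


def filter_by_duration(library: list[Music], total_duration: float,
                       tolerance: float = 1.0) -> list[Music]:
    """Keep music whose duration is within `tolerance` seconds of the resource's."""
    return [m for m in library if abs(m.duration - total_duration) <= tolerance]


library = [
    Music("a", 15.0, (1.0, 2.0, 3.0)),
    Music("b", 15.5, (1.0, 2.5, 4.0, 5.0)),
    Music("c", 30.0, (2.0, 4.0)),
]
candidate_music = filter_by_duration(library, total_duration=15.2)
```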

In one embodiment, the multimedia resource is a video including several frames, and the position information corresponding to a candidate frame is the time point at which the candidate frame is located in the video. Determining the matching target music and target frames based on the beat time points includes: determining the number of matched beat points of each piece of candidate music by matching its beat time points against the time point of each candidate frame in the video; taking the candidate music with the largest number of matched beat points as the target music; and taking the candidate frames that match the beat time points in the target music as the target frames.
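The matching step can be sketched as counting, for each piece of candidate music, how many beat time points have a candidate frame nearby. The `tolerance` window, the first-wins tie-breaking, and the dictionary layout are assumptions made for illustration; the disclosure does not specify them.

```python
from __future__ import annotations


def match_beats(beat_times: list[float], frame_times: list[float],
                tolerance: float = 0.05) -> tuple[int, list[float]]:
    """Return (number of matched beat points, matched candidate frame times)."""
    matched_beats = 0
    matched_frames: list[float] = []
    for beat in beat_times:
        hits = [t for t in frame_times if abs(t - beat) <= tolerance]
        if hits:
            matched_beats += 1
            closest = min(hits, key=lambda t: abs(t - beat))
            if closest not in matched_frames:
                matched_frames.append(closest)
    return matched_beats, matched_frames


def choose_target_music(candidates: dict[str, list[float]],
                        frame_times: list[float]) -> tuple[str | None, list[float]]:
    """Pick the candidate music with the most matched beat points (first wins ties)."""
    best_name, best_count, best_frames = None, -1, []
    for name, beats in candidates.items():
        count, frames = match_beats(beats, frame_times)
        if count > best_count:
            best_name, best_count, best_frames = name, count, frames
    return best_name, best_frames


frame_times = [1.0, 2.52, 4.0, 6.1]
candidates = {"a": [1.0, 2.0, 3.0], "b": [1.0, 2.5, 4.0, 5.0]}
target_music, target_frames = choose_target_music(candidates, frame_times)
```

In this example music "b" matches three beat points against one for "a", so it becomes the target music and the three matching frames become the target frames.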

In one embodiment, the multimedia resource is an image set including several frames, and acquiring the total duration of the multimedia resource includes: acquiring the number of frames in the image set; calculating the total display duration of the image set based on the preset display duration of each frame and the number of frames in the image set; and determining the total display duration of the image set as the total duration of the multimedia resource.
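For an image set the arithmetic is a single multiplication. The sketch below also derives a display time point for each frame, assuming that every frame uses the same preset duration and that a frame's display time point is the moment it first appears; both the function name and that second convention are assumptions.

```python
from __future__ import annotations


def image_set_timing(frame_count: int,
                     seconds_per_frame: float) -> tuple[float, list[float]]:
    """Return (total display duration, display time point of each frame)."""
    total_duration = frame_count * seconds_per_frame
    display_times = [i * seconds_per_frame for i in range(frame_count)]
    return total_duration, display_times


total, display_times = image_set_timing(frame_count=4, seconds_per_frame=2.0)
```

With four images shown for two seconds each, the total duration is eight seconds.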

In one embodiment, the position information corresponding to a candidate frame is the display time point of the candidate frame in the image set. Determining the matching target music and target frames based on the beat time points includes: determining the number of matched beat points of each piece of candidate music by matching its beat time points against the display time point of each candidate frame in the image set; taking the candidate music with the largest number of matched beat points as the target music; and taking the candidate frames that match the beat time points in the target music as the target frames.

In one embodiment, the method further includes: acquiring the number of target frames; and, if the number of target frames is less than a target value, selecting further candidate frames as target frames until the number of target frames reaches the target value.
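The top-up step can be sketched as follows. The disclosure does not say which remaining candidates are picked first; choosing them in descending order of quality score is an assumption, as is the `(time, quality)` tuple layout.

```python
from __future__ import annotations

FrameInfo = tuple[float, float]  # hypothetical layout: (time in seconds, quality score)


def top_up_targets(targets: list[FrameInfo], candidates: list[FrameInfo],
                   target_count: int) -> list[FrameInfo]:
    """Add unused candidates, best quality first, until target_count is reached."""
    result = list(targets)
    unused = sorted((c for c in candidates if c not in result),
                    key=lambda c: c[1], reverse=True)
    for frame in unused:
        if len(result) >= target_count:
            break
        result.append(frame)
    return sorted(result)  # back in time order


candidates = [(1.0, 0.9), (2.5, 0.6), (4.0, 0.8), (6.1, 0.7)]
targets = top_up_targets([(1.0, 0.9)], candidates, target_count=3)
```

If there are not enough candidates, this sketch simply returns as many target frames as it could collect.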

In one embodiment, performing image processing on the target frame to obtain the beat-synced image in the target frame includes: identifying the subject in the target frame; and extracting the subject as the beat-synced image of the target frame.

In one embodiment, extracting the subject as the beat-synced image of the target frame includes: if there are multiple subjects in the target frame, obtaining the proportion of the target frame occupied by each subject; and extracting the subject with the largest proportion as the beat-synced image of the target frame.
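Choosing among several subjects reduces to comparing area ratios. In this sketch the per-subject pixel areas are taken as given, for example from a segmentation mask; the recognition model that produces them is outside the scope of the example and the names are hypothetical.

```python
from __future__ import annotations


def dominant_subject(subject_areas: dict[str, int], frame_area: int) -> str:
    """Return the subject that occupies the largest share of the frame."""
    proportions = {name: area / frame_area for name, area in subject_areas.items()}
    return max(proportions, key=lambda name: proportions[name])


# Hypothetical pixel areas for two subjects detected in a 1920x1080 frame.
subject = dominant_subject({"person": 120_000, "dog": 45_000},
                           frame_area=1920 * 1080)
```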

In one embodiment, generating the target beat-synced video according to the multimedia resource, the target music, and the beat-synced images includes: merging the multimedia resource and the target music to obtain a merged video; determining the display time period of each beat-synced image in the video according to the beat-synced image and the position information of its corresponding target frame; and inserting the corresponding beat-synced image into the video based on its display time period to generate the target beat-synced video.

In one embodiment, the display time period of a beat-synced image in the video includes a display start time point and a display end time point of the beat-synced image in the video. Determining the display time period of the beat-synced image in the video according to the beat-synced image and the position information of its corresponding target frame includes: determining the time point of the target frame in the video based on the position information of the target frame corresponding to the beat-synced image; determining that time point as the display end time point of the beat-synced image in the video; and, based on a preset beat time configuration, determining either the start time point of the video or the time point in the video of the target frame corresponding to the previous beat-synced image as the display start time point of the beat-synced image in the video.
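The display periods can be sketched as below. The boolean `start_from_previous` stands in for the "preset beat time configuration", whose actual form the disclosure does not describe; using the video start for the first image, which has no predecessor, is likewise an assumption.

```python
from __future__ import annotations


def display_periods(target_times: list[float], start_from_previous: bool = True,
                    video_start: float = 0.0) -> list[tuple[float, float]]:
    """Return (display start, display end) for each beat-synced image.

    The end is the time point of the image's own target frame. The start is
    either the previous target frame's time point or the start of the video.
    """
    periods = []
    previous = video_start
    for time_point in sorted(target_times):
        start = previous if start_from_previous else video_start
        periods.append((start, time_point))
        previous = time_point
    return periods


chained = display_periods([1.0, 2.5, 4.0])
from_start = display_periods([1.0, 2.5, 4.0], start_from_previous=False)
```

In the chained mode each image appears as soon as the previous one has landed on its target frame; in the other mode every image is on screen from the beginning of the video.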

In one embodiment, the method further includes: playing the target beat-synced video in response to a play instruction for the target beat-synced video; when the display start time point of a beat-synced image in the video arrives, displaying the beat-synced image in a special-effect mode at any position on the playback picture; and ending the display of the beat-synced image once it coincides with the corresponding position in its target frame.

According to a second aspect of the embodiments of the present disclosure, an apparatus for generating a beat-synced video is provided, including:

a candidate frame acquisition module, configured to acquire candidate frames in a multimedia resource and position information corresponding to the candidate frames;

a matching module, configured to determine matching target music and target frames based on the candidate frames and the position information corresponding to the candidate frames;

an image processing module, configured to perform image processing on the target frames to obtain the beat-synced image in each target frame; and

a beat-synced video generation module, configured to generate a target beat-synced video according to the multimedia resource, the target music, and the beat-synced images.

In one embodiment, the multimedia resource includes several frames, and the candidate frame acquisition module is configured to: acquire the several frames in the multimedia resource and the position information of each frame within the multimedia resource; and select, from the several frames, the frames that satisfy a preset condition as the candidate frames.

In one embodiment, the preset condition is a preset quality score based on image quality, and the candidate frame acquisition module is further configured to: obtain a quality score for each frame based on its image quality; and take the frames whose quality score satisfies the preset quality score as the candidate frames.

In one embodiment, the matching module includes: a total duration acquisition unit, configured to acquire the total duration of the multimedia resource; a screening unit, configured to screen out, from a beat-synced music library, candidate music that matches the total duration of the multimedia resource, where each piece of candidate music has several beat points and each beat point has a beat time point; and a determining unit, configured to determine the matching target music and target frames based on the beat time points.

In one embodiment, the multimedia resource is a video including several frames, and the position information corresponding to a candidate frame is the time point at which the candidate frame is located in the video. The determining unit is configured to: determine the number of matched beat points of each piece of candidate music by matching its beat time points against the time point of each candidate frame in the video; take the candidate music with the largest number of matched beat points as the target music; and take the candidate frames that match the beat time points in the target music as the target frames.

In one embodiment, the multimedia resource is an image set including several frames, and the total duration acquisition unit is configured to: acquire the number of frames in the image set; calculate the total display duration of the image set based on the preset display duration of each frame and the number of frames in the image set; and determine the total display duration of the image set as the total duration of the multimedia resource.

In one embodiment, the position information corresponding to a candidate frame is the display time point of the candidate frame in the image set. The determining unit is configured to: determine the number of matched beat points of each piece of candidate music by matching its beat time points against the display time point of each candidate frame in the image set; take the candidate music with the largest number of matched beat points as the target music; and take the candidate frames that match the beat time points in the target music as the target frames.

In one embodiment, the matching module is configured to: acquire the number of target frames; and, if the number of target frames is less than a target value, select further candidate frames as target frames until the number of target frames reaches the target value.

In one embodiment, the image processing module is configured to: identify the subject in the target frame; and extract the subject as the beat-synced image of the target frame.

In one embodiment, the image processing module is configured to: if there are multiple subjects in the target frame, obtain the proportion of the target frame occupied by each subject; and extract the subject with the largest proportion as the beat-synced image of the target frame.

In one embodiment, the beat-synced video generation module includes: a merging unit, configured to merge the multimedia resource and the target music to obtain a merged video; a display time period determining unit, configured to determine the display time period of each beat-synced image in the video according to the beat-synced image and the position information of its corresponding target frame; and a target beat-synced video generation unit, configured to insert the corresponding beat-synced image into the video based on its display time period to generate the target beat-synced video.

In one embodiment, the display time period of a beat-synced image in the video includes a display start time point and a display end time point of the beat-synced image in the video. The display time period determining unit is configured to: determine the time point of the target frame in the video based on the position information of the target frame corresponding to the beat-synced image; determine that time point as the display end time point of the beat-synced image in the video; and, based on a preset beat time configuration, determine either the start time point of the video or the time point in the video of the target frame corresponding to the previous beat-synced image as the display start time point of the beat-synced image in the video.

In one embodiment, the apparatus further includes a display module, configured to: play the target beat-synced video in response to a play instruction for the target beat-synced video; when the display start time point of a beat-synced image in the video arrives, display the beat-synced image in a special-effect mode at any position on the playback picture; and end the display of the beat-synced image once it coincides with the corresponding position in its target frame.

According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor, where the processor is configured to execute the instructions so that the electronic device performs the method for generating a beat-synced video described in any embodiment of the first aspect.

According to a fourth aspect of the embodiments of the present disclosure, a storage medium is provided. When the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method for generating a beat-synced video described in any embodiment of the first aspect.

According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided. The program product includes a computer program stored in a readable storage medium, and at least one processor of a device reads the computer program from the readable storage medium and executes it, so that the device performs the method for generating a beat-synced video described in any embodiment of the first aspect.

In the above method for generating a beat-synced video, candidate frames in a multimedia resource and their corresponding position information are acquired; matching target music and target frames are determined based on the candidate frames and the corresponding position information; image processing is performed on the target frames to obtain the beat-synced image in each target frame; and a target beat-synced video is then generated from the multimedia resource, the target music, and the beat-synced images. Because the present disclosure automatically matches the target music and the target frames based on the candidate frames and their position information, the generation efficiency and matching accuracy of beat-synced videos are greatly improved.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure; they do not unduly limit the present disclosure.

FIG. 1 is a diagram of an application environment of a method for generating a beat-synced video according to an exemplary embodiment.

FIG. 2 is a flowchart of a method for generating a beat-synced video according to an exemplary embodiment.

FIG. 3 is a schematic flowchart of the step of acquiring candidate frames according to an exemplary embodiment.

FIG. 4 is a schematic flowchart of the step of determining the target music and target frames according to an exemplary embodiment.

FIG. 5 is a schematic flowchart of the step of generating the beat-synced video according to an exemplary embodiment.

FIG. 6 is a block diagram of an apparatus for generating a beat-synced video according to an exemplary embodiment.

FIG. 7 is a block diagram of an electronic device according to an exemplary embodiment.

Detailed Description

To help those of ordinary skill in the art better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.

It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present disclosure described herein can be practiced in orders other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims.

The method for generating a beat-synced video provided by the present disclosure can be applied to the application environment shown in FIG. 1, in which the terminal 102 interacts with the server 104 through a network. In some embodiments, a target application served by the server 104 is installed on the terminal 102, and the terminal 102 can exchange data with the server 104 through the target application to implement functions such as data transmission and message exchange. The target application may be a video playback application, a short-video playback application, a social application with a video playback function, an information browsing application, or the like. Accordingly, the terminal 102 can generate and display a beat-synced video based on a multimedia resource through the target application.

The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers. In the embodiments of the present disclosure, the method for generating a beat-synced video may be applied to the terminal 102. In some embodiments, when a user needs to make a beat-synced video, the user can initiate a corresponding editing instruction to the target application through the terminal 102. The target application responds to the editing instruction through the terminal 102: it acquires candidate frames and their corresponding position information from the multimedia resource that serves as the material for the beat-synced video, determines matching target music and target frames based on the candidate frames and the corresponding position information, performs image processing on the target frames to obtain the beat-synced image in each target frame, and then generates a target beat-synced video from the multimedia resource, the target music, and the beat-synced images. Because the present disclosure automatically matches the target music and the target frames based on the candidate frames and their position information, the generation efficiency and matching accuracy of beat-synced videos are greatly improved.

FIG. 2 is a flowchart of a method for generating a beat-synced video according to an exemplary embodiment. As shown in FIG. 2, the method is described by taking its application to the terminal 102 in FIG. 1 as an example, and includes the following steps.

In step S210, candidate frames in a multimedia resource and position information corresponding to the candidate frames are acquired.

Here, the multimedia resource is the material used to make the beat-synced video, including but not limited to text, pictures, photos, sound, animation, and film. A beat-synced video is a video in which special-effect pictures are added at specific time points. The candidate frames are frames extracted from the multimedia resource for generating the special-effect pictures; that is, the frames that finally produce the special-effect pictures are selected from the candidate frames. The position information is the time distribution information of the corresponding candidate frame within the multimedia resource. For example, when the multimedia resource is an animation, the candidate frames are one or more frames extracted from the frames of the animation according to certain rules for generating special-effect pictures; if the total duration of the animation is 2 minutes, the position information is the display time point of each extracted frame within the animation. When the multimedia resource is a static image set, the candidate frames are one or more frames extracted from the frames of the image set according to certain rules for generating special-effect pictures; if the image set contains N frames, the position information may be the position of each extracted frame in the order of the image set, or it may be the display time point of each extracted frame in the image set, determined from that position and the preset display time of each frame. In the embodiments of the present disclosure, to make a beat-synced video, the user can initiate, through the terminal, an editing instruction to the target application for the multimedia resource that serves as the material for the beat-synced video; the target application responds to the editing instruction through the terminal and acquires the candidate frames and the position information corresponding to the candidate frames from the multimedia resource.

In step S220, the matching target music and target frames are determined based on the candidate frames and the position information corresponding to the candidate frames.

Here, the target music is beat music, selected from a beat music library, that matches the multimedia resource and carries beat point information. The beat point information includes the beat points and the time points at which they occur; a beat point is also commonly called a rhythm point or drum beat. The target frames are the frames, selected from the candidate frames and matching the target music, that are finally used to generate the special-effect pictures. In the embodiments of the present disclosure, beat music carries beat point information such as the beat points and their time points, and each candidate frame has corresponding position information, i.e. the time distribution information determined above. Therefore, based on the total duration of the multimedia resource, the time points at which beat points occur in the beat music are matched against the time distribution information of the candidate frames, yielding the matching target music and target frames.

In step S230, image processing is performed on the target frames to obtain the beat-synced images in the target frames.

Here, the image processing includes image recognition, matting of the recognized image, and stylization of the matted image (including cartoonization, outline extraction, and the like). A beat-synced image is a special-effect picture in the beat-synced video. In some embodiments, the beat-synced image is obtained by performing image processing on a target frame, for example by recognizing and matting the main subject in the target frame and then stylizing the matted image.

In step S240, a target beat-synced video is generated from the multimedia resource, the target music, and the beat-synced images.

In the embodiments of the present disclosure, based on the time distribution information corresponding to the beat-synced images, the multimedia resource, the target music, and the beat-synced images are aligned in time and merged to obtain the synthesized target beat-synced video. While the target beat-synced video is playing, whenever a beat point occurs in the target music, the beat-synced image for the corresponding time point is displayed as a special-effect picture.

In the above method for generating a beat-synced video, candidate frames in the multimedia resource and their corresponding position information are acquired; the matching target music and target frames are determined based on the candidate frames and their position information; image processing is performed on the target frames to obtain the beat-synced images in the target frames; and the target beat-synced video is then generated from the multimedia resource, the target music, and the beat-synced images. Because the present disclosure automatically matches the target music and target frames based on the candidate frames and their position information, it greatly improves both the efficiency of generating beat-synced videos and the accuracy of the matching.

In an exemplary embodiment, the multimedia resource includes several frames. As shown in FIG. 3, acquiring the candidate frames in the multimedia resource and their corresponding position information in step S210 can be implemented through the following steps:

In step S312, the several frames in the multimedia resource and the position information of each frame within the multimedia resource are acquired.

Here, the position information may be the time distribution information of the corresponding frame within the multimedia resource. In the embodiments of the present disclosure, taking a video as the multimedia resource, the video is split into its individual frames, yielding each frame of the video together with its position information, where the position information may be the time point at which the frame is located in the video. For example, splitting a 5-second video yields its frames, each with a corresponding time point in the video; if a given frame corresponds to the time point 3.86 seconds in the video, the time point information obtained for that frame is 3.86 seconds. In this way, splitting the video produces the frames of the video and the time point information for each frame. Taking a static image set as the multimedia resource, each frame in the image set and its ordinal position in the image set are obtained according to a certain sorting rule; the display time point of each frame in the image set may also be obtained from that ordinal position and a preset display time for each frame. In this way, the frames of the image set and the display time point for each frame are obtained.
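As a minimal sketch of the frame-splitting result described above, the following assigns a time point to every frame of a video. The function name `frame_timestamps` and the assumption of a constant frame rate are illustrative only and are not specified in the disclosure.

```python
def frame_timestamps(frame_count: int, fps: float) -> list[float]:
    """Return the time point (in seconds) of each frame.

    Assumption: the video has a constant frame rate, so frame i is
    located at i / fps seconds. Real containers may store explicit
    per-frame timestamps instead.
    """
    if frame_count < 0 or fps <= 0:
        raise ValueError("frame_count must be >= 0 and fps must be > 0")
    return [i / fps for i in range(frame_count)]


# A 5-second video at a hypothetical 25 frames per second has 125 frames.
timestamps = frame_timestamps(125, 25)
```

Each returned value plays the role of the position information for the frame at the same index.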

In step S314, frames that satisfy a preset condition are selected from the several frames as candidate frames.

Here, the preset condition may be a quality score preset on the basis of image quality. In the embodiments of the present disclosure, a quality score can be obtained for each frame based on its image quality, and the frames whose quality score meets the preset quality score are taken as candidate frames. In some embodiments, a score is assigned to the image quality of each frame according to a preset assignment rule, yielding the quality score for each frame. For example, the preset assignment rule may be based on the image content of the frame and the proportion of the frame it occupies: a frame that contains the subject's frontal face may be given a corresponding quality score or weight; a frame that contains the subject's whole body may also be given a corresponding quality score or weight; and a frame in which the main subject occupies at least a certain proportion of the whole picture may likewise be given a corresponding quality score or weight. The quality scores or weights assigned can be set according to the actual scenario.

For example, under the preset condition above, when the subject in a frame is a person, image recognition is performed on the frame to determine whether it contains the person's frontal face, whether it contains the person's whole body, and whether the main person occupies at least a certain proportion of the whole picture, and the quality score of the frame is determined accordingly.

Further, to improve the image quality of the candidate frames, the frames that meet the preset quality score may be screened again. In some embodiments, based on image quality, blurry frames and frames with abnormal exposure are excluded from the frames that reach the preset quality score, and the remaining frames, which both reach the preset quality score and have good image quality, are taken as the candidate frames.
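The scoring and screening described above can be sketched as follows. The weights, the thresholds, and the `Frame` fields are hypothetical values chosen for illustration; the disclosure leaves them to be set according to the actual scenario, and the recognition step that would fill in the fields is not shown.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    time: float            # time point in the multimedia resource, in seconds
    has_front_face: bool   # result of a (not shown) recognition step
    has_full_body: bool
    subject_ratio: float   # share of the picture occupied by the main subject
    is_blurry: bool = False
    bad_exposure: bool = False


# Hypothetical weights and thresholds, not taken from the disclosure.
FACE_POINTS = 4
BODY_POINTS = 3
RATIO_POINTS = 3
MIN_SUBJECT_RATIO = 0.2
SCORE_THRESHOLD = 6


def quality_score(frame: Frame) -> int:
    """Add up the points for each criterion the frame satisfies."""
    score = 0
    if frame.has_front_face:
        score += FACE_POINTS
    if frame.has_full_body:
        score += BODY_POINTS
    if frame.subject_ratio >= MIN_SUBJECT_RATIO:
        score += RATIO_POINTS
    return score


def select_candidates(frames: list[Frame]) -> list[Frame]:
    """Keep frames that reach the threshold and are neither blurry nor badly exposed."""
    return [
        f for f in frames
        if quality_score(f) >= SCORE_THRESHOLD
        and not f.is_blurry
        and not f.bad_exposure
    ]
```

Integer points are used here only to keep the comparison against the threshold exact; fractional weights would work equally well.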

In the above embodiment, the several frames in the multimedia resource and the position information of each frame within the multimedia resource are acquired, and the frames that satisfy the preset condition are selected from them as candidate frames. This provides the basis for the special-effect pictures needed later to generate the beat-synced video, improving both video generation efficiency and video quality.

In an exemplary embodiment, as shown in FIG. 4, determining the matching target music and target frames based on the candidate frames and their position information in step S220 can be implemented through the following steps:

In step S422, the total duration of the multimedia resource is obtained, and candidate music matching that total duration is selected from the beat music library.

Here, the beat music library stores a number of pieces of beat music. Each piece has a corresponding playback duration and carries several items of beat point information, each including a beat point and the time point at which it occurs. Because the beat music is normally played as the background music of the multimedia resource once the beat-synced video has been generated, the playback durations of the two should be comparable. Accordingly, in the embodiments of the present disclosure, the total duration of the multimedia resource is obtained and candidate music matching that total duration is selected from the beat music library. In other words, the candidate music is the beat music in the library whose duration matches the total duration of the multimedia resource, where the total duration of the multimedia resource is the total display duration of its frames. For example, if the total duration of the multimedia resource is X seconds, beat music with a playback duration between X seconds and 1.5X seconds can be selected from the library as candidate music, and the target music and target frames are then determined in the subsequent steps.
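The duration filter in this example can be sketched as below. The `Track` type and the inclusive treatment of both bounds are assumptions made for illustration; the disclosure only gives the range of X seconds to 1.5X seconds as an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    name: str
    duration: float                # playback duration in seconds
    beat_times: tuple[float, ...]  # time points at which beat points occur


def filter_by_duration(
    library: list[Track],
    total_duration: float,
    upper_factor: float = 1.5,
) -> list[Track]:
    """Keep tracks whose duration lies in [X, upper_factor * X] for X = total_duration."""
    upper = upper_factor * total_duration
    return [t for t in library if total_duration <= t.duration <= upper]
```

Music longer than the multimedia resource can be trimmed when the two are merged, which is one plausible reason for allowing durations above X but not below it.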

In step S424, the matching target music and target frames are determined based on the beat time points.

In the embodiments of the present disclosure, after the matching candidate music has been determined through the above step, the matching target music and target frames can further be determined based on the beat time points in the candidate music and the time distribution information of the candidate frames.

In some embodiments, taking as an example a multimedia resource that is a video comprising several frames, the position information of a candidate frame is the time point at which that frame is located in the video. The beat time points of each candidate music are matched against the time points of the candidate frames in the video to determine the number of matched beat points for each candidate music; the candidate music with the most matched beat points is taken as the target music, and the candidate frames that match the beat time points in the target music are taken as the target frames. A beat-synced video is a video with special-effect pictures added at specific time points. Beat music contains beat points, and the time points at which the beat points occur are usually the specific time points in the beat-synced video, while the special-effect pictures are the candidate frames matching those time points. The target music is the beat music, determined from the candidate music, that is finally used to generate the beat-synced video. The target frames are the frames, selected from the candidate frames and matching the target music, that are finally used to generate the special-effect pictures. For example, for a given candidate music, each beat point and the time point at which it occurs can be obtained and then matched against the time points of the candidate frames; if the time point of a beat point is the same as the time point of a candidate frame, the candidate music has one successfully matched beat point. On this basis, the number of beat points in each candidate music that match the time point of a candidate frame is obtained, i.e. the number of successfully matched beat points in each candidate music.

Then, based on the number of successfully matched beat points in each candidate music, the candidate music with the most successfully matched beat points is taken as the target music. The candidate frames that match the beat point information in the target music are taken as the target frames: if the time point at which a beat point occurs in the target music is exactly the same as the time point of a candidate frame, that candidate frame is a target frame. It can be understood that, because the beat music is continuous and contains multiple beat points, there may also be multiple target frames.
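The selection of the target music and target frames can be sketched as follows. The function names and the `tolerance` parameter are assumptions: the disclosure speaks of time points being the same, which corresponds to the default `tolerance=0.0`, while a small positive tolerance is an illustrative extension for inexact timestamps.

```python
def count_matched_beats(beat_times, frame_times, tolerance=0.0):
    """Count beat points whose time coincides with some candidate frame's time."""
    return sum(
        1 for bt in beat_times
        if any(abs(bt - ft) <= tolerance for ft in frame_times)
    )


def pick_target(candidates, frame_times, tolerance=0.0):
    """Pick the candidate music with the most matched beat points.

    `candidates` maps a track name to its beat time points.
    Returns (track name, target frame time points), or None when no
    candidate music has any matched beat point.
    """
    best_name, best_hits = None, 0
    for name, beat_times in candidates.items():
        hits = count_matched_beats(beat_times, frame_times, tolerance)
        if hits > best_hits:
            best_name, best_hits = name, hits
    if best_name is None:
        return None
    beats = candidates[best_name]
    target_frames = [
        ft for ft in frame_times
        if any(abs(ft - bt) <= tolerance for bt in beats)
    ]
    return best_name, target_frames
```

When several candidates tie, this sketch keeps the first one encountered; the disclosure does not say how ties are broken.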

As another example, take a multimedia resource that is an image set comprising several frames. The total duration of the multimedia resource is obtained first: in some embodiments, the number of frames in the image set is obtained, the total display duration of the image set is calculated from the preset display duration of each frame and the number of frames, and this total display duration is taken as the total duration of the multimedia resource. The position information of each frame is then its display time point in the image set: the ordinal position of each frame in the image set can be determined according to a certain sorting rule, and the display time point of each frame is determined from that ordinal position and the preset display time of each frame. In the same way as for a video, the beat time points are matched against the display time points of the candidate frames in the image set to determine the number of matched beat points for each candidate music; the candidate music with the most matched beat points is taken as the target music, and the candidate frames that match the beat time points in the target music are taken as the target frames.
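The arithmetic for an image set can be sketched as below. It assumes, for illustration only, that every frame has the same preset display duration and that a frame's display time point is the moment its display begins; the disclosure does not fix either convention.

```python
def image_set_timeline(frame_count: int, display_seconds: float):
    """Return (total duration, display time point of each frame in order).

    Assumptions: a uniform display duration per frame, and the display
    time point of frame i is the start of its slot, i * display_seconds.
    """
    if frame_count < 0 or display_seconds <= 0:
        raise ValueError("frame_count must be >= 0 and display_seconds > 0")
    total = frame_count * display_seconds
    starts = [i * display_seconds for i in range(frame_count)]
    return total, starts


# Four pictures shown for a hypothetical 2 seconds each.
total, starts = image_set_timeline(4, 2.0)
```

The returned total serves as the total duration used to screen candidate music, and the time points are matched against beat time points exactly as for video frames.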

In the above embodiment, the total duration of the multimedia resource is obtained, candidate music matching that total duration is selected from the beat music library, and the matching target music and target frames are then determined based on the beat time points in the candidate music. Because the candidate music is matched automatically from the total duration of the multimedia resource, and the target music and target frames are then determined automatically by matching the time points of the beat points in the candidate music against the time points of the candidate frames, the efficiency of generating beat-synced videos is greatly improved compared with the conventional approach of finding target frames manually.

In an exemplary embodiment, to further improve the quality of the beat-synced video, the method also includes: obtaining the number of target frames that match the beat point information in the target music and, if that number is less than a target value, randomly selecting further candidate frames from the candidate frames to serve as target frames as well, until the number of target frames reaches the target value. The target value can be determined from the total duration of the multimedia resource; generally speaking, a beat-synced video looks good when a special-effect picture appears once every two seconds during playback. The target value can of course also be a preset fixed value, such as 5 frames or 7 frames, meaning that the beat-synced video should contain at least that many special-effect frames (i.e. target frames). This avoids a poor-quality beat-synced video caused by too few target frames.
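A minimal sketch of this top-up step follows. Deriving the target value as one frame per two seconds follows the rule of thumb above; the function names, the injectable random generator, and the behaviour when too few candidates remain are assumptions for illustration.

```python
import random


def target_count(total_duration: float, interval: float = 2.0) -> int:
    """One special-effect frame per `interval` seconds of the resource."""
    return int(total_duration // interval)


def pad_target_frames(targets, candidates, target_value, rng=None):
    """Top up `targets` with randomly chosen unused candidates.

    Stops early if the candidates run out, so the result may still be
    shorter than `target_value`.
    """
    rng = rng or random.Random()
    result = list(targets)
    remaining = [c for c in candidates if c not in result]
    missing = target_value - len(result)
    if missing > 0:
        result.extend(rng.sample(remaining, min(missing, len(remaining))))
    return result
```

Passing a seeded `random.Random` makes the random choice reproducible, which is convenient when testing.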

In an exemplary embodiment, certain extreme cases are handled separately. For example, if in every candidate music the number of beat points matching the candidate frames and their time point information is zero, then no beat music matches the candidate frames and their time point information. In that case, the beat music with the highest score can be obtained through an online recommendation model and used as the target music, which avoids ending up with no beat music because nothing matched.

In an exemplary embodiment, performing image processing on the target frame in step S230 to obtain the beat-synced image in the target frame includes: recognizing the main subject in the target frame and extracting the main subject as the beat-synced image in the target frame. The main subject can be determined from the material of the multimedia resource: when the multimedia resource is material involving people, the main subject may be a person; when the multimedia resource is material involving animals, the main subject may be an animal. In the embodiments of the present disclosure, taking material involving people as an example, the person in the target frame is recognized and then matted out, and the matted person is used as the beat-synced image in the target frame.

In an exemplary embodiment, when a target frame contains multiple main subjects, the proportion of the target frame occupied by each main subject can be obtained, and the main subject with the largest proportion is extracted as the beat-synced image in the target frame.
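Choosing among several subjects can be sketched as follows. Using the bounding-box area as the measure of each subject's proportion is an assumption; the disclosure does not say how the proportion is measured, and a segmentation mask area would be an equally plausible choice.

```python
def largest_subject(subjects, frame_width, frame_height):
    """Return (label, ratio) for the subject occupying the largest share.

    `subjects` is a list of (label, (x, y, w, h)) bounding boxes in
    pixels. Returns None when the list is empty.
    """
    frame_area = frame_width * frame_height
    if frame_area <= 0:
        raise ValueError("frame dimensions must be positive")
    best = None
    for label, (_x, _y, w, h) in subjects:
        ratio = (w * h) / frame_area
        if best is None or ratio > best[1]:
            best = (label, ratio)
    return best
```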

Further, to enhance the special-effect pictures in the beat-synced video, the matted person may also be stylized, for example by cartoonizing it or extracting its outline, so that the processed beat-synced image has a better visual effect.

In an exemplary embodiment, as shown in FIG. 5, generating the target beat-synced video from the multimedia resource, the target music, and the beat-synced images in step S240 can be implemented through the following steps:

In step S542, the multimedia resource and the target music are merged to obtain a merged video.

Because the total duration of the multimedia resource and the duration of the target music are comparable, the multimedia resource and the target music can be aligned by duration and merged to obtain the merged video.

In step S544, the display time period of each beat-synced image in the video is determined from the beat-synced image and the position information of the target frame corresponding to the beat-synced image.

A beat-synced image is obtained by performing image processing on a target frame, and each target frame has corresponding position information, so the beat-synced image obtained from it also has corresponding position information. A beat-synced image is also a special-effect picture added to the video at a specific time point. On this basis, the display time period of the beat-synced image in the video can be determined from the beat-synced image and its corresponding position information.

Here, the display time period of a beat-synced image in the video comprises the display start time point and the display end time point of the beat-synced image in the video. The display end time point is usually the time point corresponding to the beat-synced image in the multimedia resource, while the display start time point can be determined from a preset beat timing configuration. For example, suppose the first beat-synced image of the multimedia resource is taken from the time point 03:10:24, the second from 05:10:23, …, and the nth from 45:10:23. If the preset beat timing configuration is that all n beat-synced images are overlaid from the beginning, the corresponding display time periods are [0, 03:10:24], [0, 05:10:23], …, [0, 45:10:23]. That is, the display start time point of every beat-synced image is the start time point of the multimedia resource, and the display end time point of each beat-synced image is its corresponding time point in the multimedia resource. During the overlay period of each beat-synced image, the display start position of the image on the screen can be anywhere in the picture, and the display end position is the original position of the image in the multimedia resource. Throughout the display time period the beat-synced image is shown in special-effect mode, and its display ends in response to the beat-synced image coinciding with the corresponding position in its target frame. It should be noted that this display mode shows all the beat-synced images as soon as the beat-synced video starts playing, and the display is limited by the size of the picture. This mode therefore places a limit on the number of beat-synced images, which is usually the maximum number that can be shown on the screen while keeping the picture visually pleasing.

In an exemplary embodiment, again suppose the first beat-synced image of the multimedia resource is taken from the time point 03:10:24, the second from 05:10:23, …, and the nth from 45:10:23. If the preset beat timing configuration allows only one beat-synced image to be overlaid at a time, the display time period of the first beat-synced image is [0, 03:10:24], that of the second is [03:10:24, 05:10:23], …, and that of the nth is [t_{n-1}, 45:10:23], where t_{n-1} is the time point of the (n-1)th beat-synced image. That is, the display end time point of each beat-synced image is its corresponding time point in the multimedia resource, and the display start time point of each beat-synced image is the time point corresponding to the previous beat-synced image. During the overlay period of each beat-synced image, the display start position of the image on the screen can be anywhere in the picture, and the display end position is the original position of the image in the multimedia resource; that is, when the beat-synced image coincides with the corresponding position in its target frame, its display ends. Within its display time period, each beat-synced image can move around the picture to achieve a better visual effect.
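The two beat timing configurations can be compared with a short sketch that turns the time points of the beat-synced images into display time periods. Time points are plain seconds here, and the function name and the `one_at_a_time` flag are assumptions for illustration.

```python
def display_windows(beat_image_times, one_at_a_time):
    """Return a (start, end) display time period for each beat-synced image.

    one_at_a_time=False: every image is overlaid from the start of the
    resource until its own time point.
    one_at_a_time=True: each image starts at the previous image's time
    point, so only one image is overlaid at any moment.
    """
    times = sorted(beat_image_times)
    if not one_at_a_time:
        return [(0.0, t) for t in times]
    starts = [0.0] + times[:-1]
    return list(zip(starts, times))


# Beat-synced images taken from hypothetical time points 3 s, 5 s and 9 s.
overlapping = display_windows([3.0, 5.0, 9.0], one_at_a_time=False)
sequential = display_windows([3.0, 5.0, 9.0], one_at_a_time=True)
```

In both modes the end of each period is the image's own time point in the multimedia resource; only the start differs.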

In step S546, based on the display time period of each beat-synced image in the video, the corresponding beat-synced image is inserted into the video to generate the target beat-synced video.

In some embodiments, based on the display time periods determined above, each beat-synced image is inserted into the corresponding time period of the video to generate the target beat-synced video. The beat-synced video is thus generated automatically from the multimedia resource without manual operation, which greatly improves the efficiency of generating beat-synced videos.

It should be understood that although the steps in the flowcharts of FIG. 1 to FIG. 5 are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 1 to FIG. 5 may comprise multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different moments, and they need not be executed sequentially; they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

FIG. 6 is a block diagram of an apparatus for generating a beat-synchronized video according to an exemplary embodiment. Referring to FIG. 6, the apparatus includes a candidate frame picture acquisition module 602, a matching module 604, an image processing module 606 and a beat-synchronized video generation module 608.

The candidate frame picture acquisition module 602 is configured to acquire candidate frame pictures in a multimedia resource and position information corresponding to the candidate frame pictures;

The matching module 604 is configured to determine matching target music and target frame pictures based on the candidate frame pictures and the position information corresponding to the candidate frame pictures;

The image processing module 606 is configured to perform image processing on the target frame pictures to obtain the beat-synchronized images in the target frame pictures;

The beat-synchronized video generation module 608 is configured to generate a target beat-synchronized video according to the multimedia resource, the target music and the beat-synchronized images.

In an exemplary embodiment, the multimedia resource includes several frame pictures; the candidate frame picture acquisition module is configured to: acquire the frame pictures in the multimedia resource and the position information of each frame picture within the multimedia resource; and select, from the frame pictures, those that satisfy a preset condition as the candidate frame pictures.

In an exemplary embodiment, the preset condition is a preset quality score based on image quality; the candidate frame picture acquisition module is further configured to: obtain, based on the image quality of each frame picture, a quality score for that frame picture; and take the frame pictures whose quality scores satisfy the preset quality score as the candidate frame pictures.
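A minimal sketch of this selection step is shown below. The `Frame` type, its field names, and the reading of "satisfies the preset quality score" as "greater than or equal to a threshold" are assumptions made for illustration; how the quality score itself is computed is left outside the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    time_point: float     # position of the frame in the multimedia resource (seconds)
    quality_score: float  # hypothetical image-quality score, higher is better


def select_candidate_frames(frames: List[Frame], min_score: float) -> List[Frame]:
    """Keep the frames whose quality score reaches the preset score.

    Assumes "satisfies the preset quality score" means score >= min_score.
    """
    return [frame for frame in frames if frame.quality_score >= min_score]
```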

In an exemplary embodiment, the matching module includes: a total duration acquisition unit configured to acquire the total duration of the multimedia resource; a screening unit configured to screen out, from a beat-point music library, candidate music matching the total duration of the multimedia resource, the candidate music having several beat points, each with a beat time point; and a determining unit configured to determine the matching target music and target frame pictures based on the beat time points.

In an exemplary embodiment, the multimedia resource is a video including several frame pictures, and the position information corresponding to a candidate frame picture is the time point at which the candidate frame picture is located in the video; the determining unit is configured to: determine, for each candidate music, the number of matched beat points by matching the beat time points of the candidate music against the time points at which the candidate frame pictures are located in the video; take the candidate music with the largest number of matched beat points as the target music; and take the candidate frame pictures that match the beat time points in the target music as the target frame pictures.
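The matching described above can be sketched as follows. The matching criterion is not specified in this passage, so the sketch assumes that a beat point "matches" when some candidate frame lies within a small tolerance of it; the names `matched_frames`, `pick_target_music` and `tolerance` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def matched_frames(
    beat_times: List[float], frame_times: List[float], tolerance: float
) -> List[float]:
    """For each beat time point, return the nearest candidate frame time
    within the tolerance, if any (assumed matching criterion)."""
    matches = []
    for beat in beat_times:
        close = [t for t in frame_times if abs(t - beat) <= tolerance]
        if close:
            matches.append(min(close, key=lambda t: abs(t - beat)))
    return matches


def pick_target_music(
    candidates: Dict[str, List[float]], frame_times: List[float], tolerance: float = 0.1
) -> Tuple[Optional[str], List[float]]:
    """Choose the candidate music with the most matched beat points and
    return its name together with the matching frame time points."""
    best_name: Optional[str] = None
    best_frames: List[float] = []
    for name, beat_times in candidates.items():
        frames = matched_frames(beat_times, frame_times, tolerance)
        if best_name is None or len(frames) > len(best_frames):
            best_name, best_frames = name, frames
    return best_name, best_frames
```

In this sketch ties are resolved in favor of the first candidate encountered, which is a design choice of the example rather than something the text prescribes.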

In an exemplary embodiment, the multimedia resource is an image set including several frame pictures; the total duration acquisition unit is configured to: obtain the number of frame pictures in the image set; calculate the total display duration of the image set from the preset display duration of each frame picture and the number of frame pictures in the image set; and determine the total display duration of the image set as the total duration of the multimedia resource.
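As a worked illustration of this calculation, the sketch below multiplies the number of frame pictures by a single preset per-frame display duration. The function names are hypothetical, and `display_time_points` additionally assumes that each picture's display time point is the moment at which it starts being shown, which the text does not state.

```python
from typing import List


def image_set_total_duration(frame_count: int, per_frame_seconds: float) -> float:
    """Total display duration of the image set: count times per-frame duration."""
    return frame_count * per_frame_seconds


def display_time_points(frame_count: int, per_frame_seconds: float) -> List[float]:
    """Assumed display time point of each picture: the start of its slot."""
    return [index * per_frame_seconds for index in range(frame_count)]
```

For instance, an image set of 10 pictures shown for 2.0 s each would have a total duration of 20.0 s.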

In an exemplary embodiment, the position information corresponding to a candidate frame picture is the display time point of the candidate frame picture in the image set; the determining unit is configured to: determine, for each candidate music, the number of matched beat points by matching the beat time points of the candidate music against the display time points of the candidate frame pictures in the image set; take the candidate music with the largest number of matched beat points as the target music; and take the candidate frame pictures that match the beat time points in the target music as the target frame pictures.

In an exemplary embodiment, the matching module is configured to: obtain the number of target frame pictures; and, when the number of target frame pictures is less than a target value, select further candidate frame pictures as target frame pictures until the number of target frame pictures reaches the target value.
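One possible way to top up the target frame pictures is sketched below. The text does not say which candidates are chosen first, so the sketch assumes that the remaining candidates with the highest quality score are taken; the tuple layout `(time_point, quality_score)` and the function name are hypothetical.

```python
from typing import List, Tuple

FrameInfo = Tuple[float, float]  # (time_point, quality_score), hypothetical layout


def top_up_target_frames(
    targets: List[FrameInfo], candidates: List[FrameInfo], target_count: int
) -> List[FrameInfo]:
    """Add unused candidates until the number of targets reaches target_count.

    Assumption: remaining candidates are taken in descending quality order.
    If there are too few candidates, all of them are used.
    """
    result = list(targets)
    already_chosen = set(result)
    remaining = sorted(
        (c for c in candidates if c not in already_chosen),
        key=lambda c: c[1],
        reverse=True,
    )
    for candidate in remaining:
        if len(result) >= target_count:
            break
        result.append(candidate)
    return result
```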

In an exemplary embodiment, the image processing module is configured to: identify a subject object in the target frame picture; and extract the subject object as the beat-synchronized image in the target frame picture.

In an exemplary embodiment, the image processing module is configured to: when multiple subject objects are present in the target frame picture, obtain the proportion of the target frame picture occupied by each subject object; and extract the subject object with the largest proportion as the beat-synchronized image in the target frame picture.
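The selection among several subject objects can be sketched as below. The sketch assumes that subject detection has already produced a pixel area per subject (for example from a segmentation mask); the labels, the function name and the area-based definition of "proportion" are assumptions for illustration.

```python
from typing import Dict


def pick_main_subject(subject_areas: Dict[str, int], frame_area: int) -> str:
    """Return the label of the subject occupying the largest share of the frame.

    subject_areas maps a hypothetical subject label to its pixel area;
    frame_area is the pixel area of the whole target frame picture.
    """
    if not subject_areas:
        raise ValueError("no subject objects were detected")
    ratios = {label: area / frame_area for label, area in subject_areas.items()}
    return max(ratios, key=lambda label: ratios[label])
```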

In an exemplary embodiment, the beat-synchronized video generation module includes: a merging unit configured to merge the multimedia resource and the target music to obtain a merged video; a display time period determining unit configured to determine the display time period of each beat-synchronized image in the video according to the beat-synchronized image and the position information of its corresponding target frame picture; and a target beat-synchronized video generation unit configured to insert the corresponding beat-synchronized image into the video based on its display time period in the video, to generate the target beat-synchronized video.

In an exemplary embodiment, the display time period of a beat-synchronized image in the video includes a display start time point and a display end time point of the beat-synchronized image in the video; the display time period determining unit is configured to: determine the time point of the target frame picture in the video based on the position information of the target frame picture corresponding to the beat-synchronized image; determine that time point as the display end time point of the beat-synchronized image in the video; and, based on the preset beat-point time configuration, determine either the start time point of the video or the time point in the video of the target frame picture corresponding to the preceding beat-synchronized image as the display start time point of the beat-synchronized image in the video.

In an exemplary embodiment, the apparatus further includes a presentation module configured to: play the target beat-synchronized video in response to a play instruction for the target beat-synchronized video; display the beat-synchronized image in a special effect mode at an arbitrary position of the playback picture in response to the display start time point of the beat-synchronized image in the video being reached; and end the display of the beat-synchronized image in response to the beat-synchronized image coinciding with the corresponding position in its target frame picture.
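One simple way to realize such a display is to move the beat-synchronized image from its starting position to its original position over the display time period. The sketch below uses linear interpolation, which is only one possible special effect and is an assumption of this example; the function and parameter names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def overlay_position(
    t: float, start_time: float, end_time: float, start_pos: Point, original_pos: Point
) -> Point:
    """Position of the overlaid image at playback time t.

    The image starts at start_pos when its display period begins and reaches
    original_pos (its place in the target frame picture) when the period ends,
    moving along a straight line in between (assumed motion model).
    """
    if end_time <= start_time:
        return original_pos
    progress = (t - start_time) / (end_time - start_time)
    progress = min(max(progress, 0.0), 1.0)  # clamp outside the display period
    x = start_pos[0] + (original_pos[0] - start_pos[0]) * progress
    y = start_pos[1] + (original_pos[1] - start_pos[1]) * progress
    return (x, y)
```

Once the returned position equals `original_pos`, the overlay coincides with the subject in the target frame picture and can be removed.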

With regard to the apparatus in the above embodiments, the manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.

FIG. 7 is a block diagram of a device 700 for the method for generating a beat-synchronized video according to an exemplary embodiment. For example, the device 700 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.

Referring to FIG. 7, the device 700 may include one or more of the following components: a processing component 702, a memory 704, a power supply component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.

The processing component 702 generally controls the overall operation of the device 700, such as operations associated with display, phone calls, data communication, camera operation, and recording. The processing component 702 may include one or more processors 720 that execute instructions to perform all or some of the steps of the methods described above. In addition, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.

The memory 704 is configured to store various types of data to support operation of the device 700. Examples of such data include instructions for any application or method operating on the device 700, contact data, phonebook data, messages, pictures, videos, and the like. The memory 704 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.

The power supply component 706 provides power to the various components of the device 700. The power supply component 706 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 700.

The multimedia component 708 includes a screen that provides an output interface between the device 700 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). When the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 708 includes a front camera and/or a rear camera. When the device 700 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a microphone (MIC) configured to receive external audio signals when the device 700 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 704 or sent via the communication component 716. In some embodiments, the audio component 710 also includes a speaker for outputting audio signals.

The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 714 includes one or more sensors that provide status assessments of various aspects of the device 700. For example, the sensor component 714 may detect the open/closed state of the device 700 and the relative positioning of components, such as the display and keypad of the device 700. The sensor component 714 may also detect a change in position of the device 700 or of one of its components, the presence or absence of user contact with the device 700, the orientation or acceleration/deceleration of the device 700, and a change in temperature of the device 700. The sensor component 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 714 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 716 is configured to facilitate wired or wireless communication between the device 700 and other devices. The device 700 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 716 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the device 700 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.

In an exemplary embodiment, a computer-readable storage medium including instructions is also provided, such as the memory 704 including instructions, where the instructions are executable by the processor 720 of the device 700 to perform the above method. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

In an exemplary embodiment, a computer program product is also provided. The computer program product includes a computer program stored in a readable storage medium; at least one processor of a device reads the computer program from the readable storage medium and executes it, causing the device to perform the method for generating a beat-synchronized video described in the above embodiments.

All embodiments of the present disclosure may be implemented individually or in combination with other embodiments, and all such implementations are regarded as falling within the scope of protection claimed by the present disclosure.

Claims

A method for generating a beat-synchronized video, comprising:

acquiring candidate frame pictures in a multimedia resource and position information corresponding to the candidate frame pictures;

determining matching target music and target frame pictures based on the candidate frame pictures and the position information corresponding to the candidate frame pictures;

performing image processing on the target frame pictures to obtain beat-synchronized images in the target frame pictures; and

generating a target beat-synchronized video according to the multimedia resource, the target music and the beat-synchronized images.
The method according to claim 1, wherein the multimedia resource includes several frame pictures, and the acquiring candidate frame pictures in a multimedia resource and position information corresponding to the candidate frame pictures comprises:

acquiring the frame pictures in the multimedia resource and the position information of each frame picture within the multimedia resource; and

selecting, from the frame pictures, frame pictures that satisfy a preset condition as the candidate frame pictures.
The method according to claim 2, wherein the preset condition is a preset quality score based on image quality, and the selecting, from the frame pictures, frame pictures that satisfy a preset condition as the candidate frame pictures comprises:

obtaining, based on the image quality of each frame picture, a quality score corresponding to that frame picture; and

taking the frame pictures whose quality scores satisfy the preset quality score as the candidate frame pictures.
The method according to claim 1, wherein the determining matching target music and target frame pictures based on the candidate frame pictures and the position information corresponding to the candidate frame pictures comprises:

acquiring the total duration of the multimedia resource, and screening out, from a beat-point music library, candidate music matching the total duration of the multimedia resource, the candidate music having several beat points, each beat point having a beat time point; and

determining the matching target music and target frame pictures based on the beat time points.
The method according to claim 4, wherein the multimedia resource is a video including several frame pictures, the position information corresponding to a candidate frame picture is the time point at which the candidate frame picture is located in the video, and the determining the matching target music and target frame pictures based on the beat time points comprises:

determining, for each candidate music, the number of matched beat points by matching the beat time points of the candidate music against the time points at which the candidate frame pictures are located in the video; and

taking the candidate music with the largest number of matched beat points as the target music, and taking the candidate frame pictures that match the beat time points in the target music as the target frame pictures.
The method according to claim 4, wherein the multimedia resource is an image set including several frame pictures, and the acquiring the total duration of the multimedia resource comprises:

obtaining the number of frame pictures in the image set; and

calculating the total display duration of the image set based on the preset display duration of each frame picture and the number of frame pictures in the image set, and determining the total display duration of the image set as the total duration of the multimedia resource.
The method according to claim 6, wherein the position information corresponding to a candidate frame picture is the display time point of the candidate frame picture in the image set, and the determining the matching target music and target frame pictures based on the beat time points comprises:

determining, for each candidate music, the number of matched beat points by matching the beat time points of the candidate music against the display time points of the candidate frame pictures in the image set; and

taking the candidate music with the largest number of matched beat points as the target music, and taking the candidate frame pictures that match the beat time points in the target music as the target frame pictures.
The method according to claim 5 or 7, further comprising:

obtaining the number of target frame pictures; and

when the number of target frame pictures is less than a target value, selecting further candidate frame pictures as target frame pictures until the number of target frame pictures reaches the target value.
The method according to any one of claims 1 to 7, wherein the performing image processing on the target frame pictures to obtain beat-synchronized images in the target frame pictures comprises:

identifying a subject object in the target frame picture; and

extracting the subject object as the beat-synchronized image in the target frame picture.
The method according to claim 9, wherein the extracting the subject object as the beat-synchronized image in the target frame picture comprises:

when multiple subject objects are present in the target frame picture, obtaining the proportion of the target frame picture occupied by each subject object; and

extracting the subject object with the largest proportion as the beat-synchronized image in the target frame picture.
The method according to any one of claims 1 to 7, wherein the generating a target beat-synchronized video according to the multimedia resource, the target music and the beat-synchronized images comprises:

merging the multimedia resource and the target music to obtain a merged video;

determining the display time period of each beat-synchronized image in the video according to the beat-synchronized image and the position information of the target frame picture corresponding to the beat-synchronized image; and

inserting the corresponding beat-synchronized image into the video based on the display time period of the beat-synchronized image in the video, to generate the target beat-synchronized video.
The method according to claim 11, wherein the display time period of the beat-synchronized image in the video includes a display start time point and a display end time point of the beat-synchronized image in the video, and the determining the display time period of each beat-synchronized image in the video according to the beat-synchronized image and the position information of the target frame picture corresponding to the beat-synchronized image comprises:

determining the time point of the target frame picture in the video based on the position information of the target frame picture corresponding to the beat-synchronized image;

determining the time point in the video of the target frame picture corresponding to the beat-synchronized image as the display end time point of the beat-synchronized image in the video; and

determining, based on a preset beat-point time configuration, either the start time point of the video or the time point in the video of the target frame picture corresponding to the beat-synchronized image preceding the current beat-synchronized image as the display start time point of the beat-synchronized image in the video.
The method according to claim 12, further comprising:

playing the target beat-synchronized video in response to a play instruction for the target beat-synchronized video;

displaying the beat-synchronized image in a special effect mode at an arbitrary position of the playback picture in response to the display start time point of the beat-synchronized image in the video being reached; and

ending the display of the beat-synchronized image in response to the beat-synchronized image coinciding with the corresponding position in the target frame picture corresponding to the beat-synchronized image.
An apparatus for generating a beat-synchronized video, comprising:

a candidate frame picture acquisition module configured to acquire candidate frame pictures in a multimedia resource and position information corresponding to the candidate frame pictures;

a matching module configured to determine matching target music and target frame pictures based on the candidate frame pictures and the position information corresponding to the candidate frame pictures;

an image processing module configured to perform image processing on the target frame pictures to obtain beat-synchronized images in the target frame pictures; and

a beat-synchronized video generation module configured to generate a target beat-synchronized video according to the multimedia resource, the target music and the beat-synchronized images.
The apparatus according to claim 14, wherein the multimedia resource includes several frame pictures, and the candidate frame picture acquisition module is configured to:

acquire the frame pictures in the multimedia resource and the position information of each frame picture within the multimedia resource; and

select, from the frame pictures, frame pictures that satisfy a preset condition as the candidate frame pictures.
The apparatus according to claim 15, wherein the preset condition is a preset quality score based on image quality, and the candidate frame picture acquisition module is further configured to:

obtain, based on the image quality of each frame picture, a quality score corresponding to that frame picture; and

take the frame pictures whose quality scores satisfy the preset quality score as the candidate frame pictures.
The apparatus according to claim 14, wherein the matching module comprises:

a total duration acquisition unit configured to acquire the total duration of the multimedia resource;

a screening unit configured to screen out, from a beat-point music library, candidate music matching the total duration of the multimedia resource, the candidate music having several beat points, each beat point having a beat time point; and

a determining unit configured to determine the matching target music and target frame pictures based on the beat time points.
The apparatus according to claim 17, wherein the multimedia resource is a video including several frame pictures, the position information corresponding to a candidate frame picture is the time point at which the candidate frame picture is located in the video, and the determining unit is configured to:

determine, for each candidate music, the number of matched beat points by matching the beat time points of the candidate music against the time points at which the candidate frame pictures are located in the video; and

take the candidate music with the largest number of matched beat points as the target music, and take the candidate frame pictures that match the beat time points in the target music as the target frame pictures.
The apparatus according to claim 17, wherein the multimedia resource is an image set comprising several frame pictures; the total duration obtaining unit is configured to:

obtain the number of frame pictures in the image set; and

calculate the total display duration of the image set based on a preset display duration of each frame picture and the number of frame pictures in the image set, and determine the total display duration of the image set as the total duration of the multimedia resource.
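The duration calculation above reduces to one multiplication, sketched below. Integer milliseconds are used to avoid floating-point drift. The per-image start times are an additional assumption about how a "display time point" could be derived when every image is shown for the same preset duration; the claim itself only specifies the total.

```python
from typing import List, Tuple


def image_set_timeline(frame_count: int, per_frame_ms: int) -> Tuple[int, List[int]]:
    """Return (total display duration, per-image start times) in milliseconds.

    Assumption: images are shown back to back, each for `per_frame_ms`.
    """
    if frame_count < 0 or per_frame_ms <= 0:
        raise ValueError("frame_count must be >= 0 and per_frame_ms must be > 0")
    total_ms = frame_count * per_frame_ms
    start_times_ms = [index * per_frame_ms for index in range(frame_count)]
    return total_ms, start_times_ms


# Four images shown for 1.5 seconds each.
total_ms, start_times_ms = image_set_timeline(frame_count=4, per_frame_ms=1500)
```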
The apparatus according to claim 19, wherein the position information corresponding to a candidate frame picture is the display time point at which the candidate frame picture is located in the image set; the determining unit is configured to:

determine, based on matching between the beat time points of the candidate music and the display time point at which each candidate frame picture is located in the image set, the number of beat points of each piece of candidate music; and

take the candidate music with the largest number of matched beat points as the target music, and take the candidate frame pictures that match the beat time points in the target music as the target frame pictures.
The apparatus according to claim 18 or 20, wherein the matching module is configured to:

obtain the number of target frame pictures; and

in a case where the number of target frame pictures is less than a target value, select further candidate frame pictures from the candidate frame pictures as target frame pictures until the number of target frame pictures reaches the target value.
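The top-up rule above might look like the following sketch. The claim does not say in which order the remaining candidates are drawn, so taking them in their existing order is an assumption, and `top_up_target_frames` is a hypothetical name.

```python
from typing import List, Sequence


def top_up_target_frames(
    target_frames: Sequence[float],
    candidate_frames: Sequence[float],
    target_value: int,
) -> List[float]:
    """Add unused candidates until there are `target_value` target frames.

    Frames are identified here by their time points. If the candidates
    run out first, the result is simply shorter than `target_value`.
    """
    result = list(target_frames)
    already_chosen = set(result)
    for frame in candidate_frames:
        if len(result) >= target_value:
            break
        if frame not in already_chosen:
            result.append(frame)
            already_chosen.add(frame)
    return result


topped_up = top_up_target_frames([1.0, 2.0], [1.0, 2.0, 3.0, 4.0], target_value=3)
```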
The apparatus according to any one of claims 14 to 20, wherein the image processing module is configured to:

identify a subject object in the target frame picture; and

extract the subject object as the beat-synchronized image in the target frame picture.
The apparatus according to claim 22, wherein the image processing module is configured to:

in a case where a plurality of subject objects exist in the target frame picture, obtain the proportion of the target frame picture occupied by each subject object; and

extract the subject object with the largest proportion as the beat-synchronized image in the target frame picture.
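Selecting the subject with the largest proportion can be sketched as below. Subject detection itself is out of scope here: the `Subject` record and its `pixel_area` field are hypothetical, and measuring "proportion" as subject pixel area divided by frame area is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Subject:
    """Hypothetical detection result for one subject object."""
    label: str
    pixel_area: int


def pick_main_subject(
    subjects: Sequence[Subject], frame_area: int
) -> Optional[Subject]:
    """Return the subject occupying the largest share of the frame, if any."""
    if not subjects or frame_area <= 0:
        return None
    return max(subjects, key=lambda subject: subject.pixel_area / frame_area)


detected = [Subject("person", 52_000), Subject("dog", 18_500), Subject("car", 9_000)]
main_subject = pick_main_subject(detected, frame_area=1920 * 1080)
```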
The apparatus according to any one of claims 14 to 20, wherein the beat-synchronized video generation module comprises:

a merging unit, configured to merge the multimedia resource and the target music to obtain a merged video;

a display time period determining unit, configured to determine the display time period of the beat-synchronized image in the video according to the beat-synchronized image and the position information of the target frame picture corresponding to the beat-synchronized image; and

a target beat-synchronized video generating unit, configured to insert the corresponding beat-synchronized image into the video based on the display time period of the beat-synchronized image in the video, to generate the target beat-synchronized video.
The apparatus according to claim 24, wherein the display time period of the beat-synchronized image in the video comprises a display start time point and a display end time point of the beat-synchronized image in the video; the display time period determining unit is configured to:

determine, based on the position information of the target frame picture corresponding to the beat-synchronized image, the time point of the target frame picture in the video;

determine the time point, in the video, of the target frame picture corresponding to the beat-synchronized image as the display end time point of the beat-synchronized image in the video; and

determine, based on a preset beat time configuration, either the start time point of the video or the time point, in the video, of the target frame picture corresponding to the beat-synchronized image that precedes the current beat-synchronized image as the display start time point of the beat-synchronized image in the video.
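The start and end rules above can be sketched as follows. The boolean `chain_from_previous` is a hypothetical stand-in for the "preset beat time configuration", whose actual form the claim does not specify; when it is true, each image (after the first) starts at the previous target frame's time point, and otherwise every image starts at the start of the video.

```python
from typing import List, Sequence, Tuple


def display_periods(
    target_frame_times: Sequence[float],
    video_start: float = 0.0,
    chain_from_previous: bool = True,
) -> List[Tuple[float, float]]:
    """Return (display start, display end) for each beat-synchronized image.

    The end is always the time point of the image's own target frame.
    The first image always starts at `video_start`, since it has no
    predecessor.
    """
    periods = []
    previous_end = video_start
    for end in sorted(target_frame_times):
        start = previous_end if chain_from_previous else video_start
        periods.append((start, end))
        previous_end = end
    return periods


chained = display_periods([5.0, 2.0, 8.0])
from_video_start = display_periods([5.0, 2.0, 8.0], chain_from_previous=False)
```

With the chained configuration the periods tile the timeline without overlap; with the other configuration several images may be on screen at once.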
The apparatus according to claim 25, wherein the apparatus further comprises a presentation module configured to:

in response to a playback instruction for the target beat-synchronized video, play the target beat-synchronized video;

in response to arrival of the display start time point of the beat-synchronized image in the video, display the beat-synchronized image in a special-effect mode at an arbitrary position of the playback picture; and

in response to the beat-synchronized image coinciding with the corresponding position in the target frame picture corresponding to the beat-synchronized image, end the display of the beat-synchronized image.
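One way to picture this behavior is an overlay that moves from its arbitrary starting position toward the subject's position in the target frame and disappears when the two coincide. The claim does not specify the special effect or the motion path, so the linear interpolation below, and all names in it, are purely illustrative assumptions.

```python
from typing import Tuple

Point = Tuple[float, float]


def overlay_position(
    start_pos: Point,
    target_pos: Point,
    start_time: float,
    end_time: float,
    now: float,
) -> Point:
    """Linearly interpolate the overlay position at playback time `now`.

    Assumption: the overlay travels in a straight line and arrives at
    `target_pos` exactly at `end_time` (the display end time point).
    """
    if end_time <= start_time:
        return (float(target_pos[0]), float(target_pos[1]))
    progress = (now - start_time) / (end_time - start_time)
    progress = min(1.0, max(0.0, progress))  # clamp outside the period
    x = start_pos[0] + (target_pos[0] - start_pos[0]) * progress
    y = start_pos[1] + (target_pos[1] - start_pos[1]) * progress
    return (x, y)


def display_finished(current_pos: Point, target_pos: Point) -> bool:
    """The display ends once the overlay coincides with its target position."""
    return current_pos == target_pos


halfway = overlay_position((0, 0), (100, 50), start_time=2.0, end_time=4.0, now=3.0)
arrived = overlay_position((0, 0), (100, 50), start_time=2.0, end_time=4.0, now=4.0)
```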
An electronic device, comprising:

a processor; and

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the following steps:

acquiring candidate frame pictures in a multimedia resource and position information corresponding to the candidate frame pictures;

determining matching target music and target frame pictures based on the candidate frame pictures and the position information corresponding to the candidate frame pictures;

performing image processing on the target frame pictures to obtain beat-synchronized images in the target frame pictures; and

generating a target beat-synchronized video according to the multimedia resource, the target music, and the beat-synchronized images.
A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the following steps:

acquiring candidate frame pictures in a multimedia resource and position information corresponding to the candidate frame pictures;

determining matching target music and target frame pictures based on the candidate frame pictures and the position information corresponding to the candidate frame pictures;

performing image processing on the target frame pictures to obtain beat-synchronized images in the target frame pictures; and

generating a target beat-synchronized video according to the multimedia resource, the target music, and the beat-synchronized images.
A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the following steps:

acquiring candidate frame pictures in a multimedia resource and position information corresponding to the candidate frame pictures;

determining matching target music and target frame pictures based on the candidate frame pictures and the position information corresponding to the candidate frame pictures;

performing image processing on the target frame pictures to obtain beat-synchronized images in the target frame pictures; and

generating a target beat-synchronized video according to the multimedia resource, the target music, and the beat-synchronized images.