WO2021175040A1 - Video processing method and related device - Google Patents

Video processing method and related device

Info

Publication number
WO2021175040A1
WO2021175040A1 (PCT/CN2021/073333)
Authority
WO
WIPO (PCT)
Prior art keywords
video
data
feature
videos
frame
Prior art date
Application number
PCT/CN2021/073333
Other languages
French (fr)
Chinese (zh)
Inventor
尹康 (Yin Kang)
Original Assignee
Guangdong OPPO Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong OPPO Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Publication of WO2021175040A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/75Clustering; Classification

Definitions

  • This application relates to the technical field of data deduplication, in particular to a video processing method and related devices.
  • The video deduplication algorithms in common use today are based on key-point matching, but extracting image features from key points is too cumbersome, and the clustering algorithms used for feature matching, such as k-means, require parameters such as the number of categories to be set manually in advance, so the accuracy of the final deduplication cannot be guaranteed.
  • To address this, this application proposes a video processing method and related devices that can accurately cluster the repeated videos in a video data set through an efficient feature extraction algorithm and then deduplicate the clustered repeated videos, greatly improving the accuracy of video deduplication.
  • The first aspect of the embodiments of the present application provides a video processing method, whose steps are detailed below.
  • The second aspect of the embodiments of the present application provides a video processing device. The device includes a processing unit and a communication unit, wherein the processing unit is configured to: extract N video feature data of the N videos included in a video data set, N being a positive integer; obtain matching degree data for every two of the N video feature data; divide the N videos into M video clusters based on the matching degree data, M being a positive integer less than or equal to N; and perform deduplication processing on the M video clusters one by one based on a preset deduplication rule to obtain a deduplicated video data set that includes M videos.
  • The third aspect of the embodiments of the present application provides an electronic device, including an application processor, a communication interface, and a memory, which are connected to one another. The memory is used to store a computer program; the computer program includes program instructions, and the application processor is configured to invoke the program instructions to execute all or part of the steps of the method described in the first aspect of the embodiments of the present application.
  • The fourth aspect of the embodiments of the present application provides a computer storage medium that stores a computer program. The computer program includes program instructions which, when executed by a processor, cause the processor to execute all or part of the steps of the method described in the first aspect of the embodiments of the present application.
  • The fifth aspect of the embodiments of the present application provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program; the computer program is operable to cause a computer to execute part or all of the steps described in any method of the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
  • First, N video feature data of the N videos included in the video data set are extracted, N being a positive integer. Next, matching degree data is obtained for every two of the N video feature data. Then, based on the matching degree data, the N videos are divided into M video clusters, M being a positive integer less than or equal to N. Finally, deduplication processing is performed on the M video clusters one by one based on a preset deduplication rule, yielding a deduplicated video data set that includes M videos.
  • In this way, the repeated videos in the video data set can be clustered accurately through an efficient feature extraction algorithm and the clustered repeated videos then deduplicated, greatly improving the accuracy of video deduplication.
  • FIG. 1 is a system architecture diagram of a video processing method provided by an embodiment of this application.
  • FIG. 2 is a schematic flowchart of a video processing method provided by an embodiment of this application.
  • FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • FIG. 4 is a block diagram of functional units of a video processing device provided by an embodiment of the application.
  • The electronic devices involved in the embodiments of the application may be electronic devices with communication capabilities, including various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices, or other processing devices connected to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on.
  • Figure 1 is a system architecture diagram of a video processing method provided by an embodiment of the application, comprising a video acquisition module 110, a matching module 120, a classification module 130, and a deduplication processing module 140, which are connected to one another. The video acquisition module 110 acquires a video data set composed of the videos to be processed and sends it to the matching module 120; the matching module 120 matches the videos to be processed in the received video data set against one another and sends the matching results to the classification module 130; the classification module 130 classifies the videos according to the matching results to obtain multiple video clusters, each containing either a single video or several repeated videos; finally, the deduplication processing module 140 deduplicates each video cluster to obtain a deduplicated video data set, completing the video deduplication.
  • The training data of a neural network model may contain a large amount of duplicate data; using all of it to train the model is inefficient and can reduce the model's accuracy, so deduplicating a large body of training data and automatically selecting the data that trains best is very important. The system architecture in the embodiment of the present application can be applied to the scenario of screening training data for neural network models related to video processing.
  • Through this architecture, the repeated videos in the video data set can be clustered accurately through an efficient feature extraction algorithm and the clustered repeated videos then deduplicated, greatly improving the accuracy of video deduplication.
  • FIG. 2 is a schematic flowchart of a video processing method provided by an embodiment of the present application, which specifically includes the following steps:
  • Step 201: Extract N video feature data of the N videos included in the video data set.
  • The video data set is a set of N videos to be processed, N being any positive integer, and each video can be processed to extract its corresponding video feature data. For ease of understanding, the extraction of the video feature data is explained below for an arbitrary video.
  • First, the video is read frame by frame to obtain each frame of image data, and the feature vector of each frame is then extracted with a perceptual hash algorithm (PHA); the extraction for a single frame proceeds as follows. The frame is a color image with three color channels, red, green, and blue. The three-channel RGB frame is first converted into a single-channel grayscale image, whose size is normalized to 32×32 pixels through a bilinear interpolation algorithm to improve extraction efficiency; a discrete cosine transform (DCT) is then applied to obtain a 32×32 coefficient matrix, and the 64 coefficients in the 8×8 area at the upper-left corner of the matrix are quantized to obtain a binary image.
  • Finally, the 64-dimensional vector obtained by flattening the binary image is used as the feature vector of that frame, and each frame of image data is processed in the same way to obtain its feature vector. The video feature data of the video can then be derived from these per-frame feature vectors, as sketched below.
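  • The per-frame extraction described above can be written as a minimal Python sketch using OpenCV and NumPy. This is illustrative code, not code from the patent; in particular, the patent's exact quantization rule is given only as a formula image, so thresholding each coefficient against the block mean (the usual perceptual-hash choice) is an assumption:

```python
import cv2
import numpy as np

def frame_feature_vector(frame_bgr: np.ndarray) -> np.ndarray:
    """64-dimensional binary feature vector for one color frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)       # 3 color channels -> 1 gray channel
    small = cv2.resize(gray, (32, 32),
                       interpolation=cv2.INTER_LINEAR)       # bilinear normalization to 32x32
    coeffs = cv2.dct(np.float32(small))                      # 32x32 DCT coefficient matrix
    block = coeffs[:8, :8].flatten()                         # 64 coefficients, upper-left 8x8 area
    # Quantize to a binary vector; the mean threshold is an assumption (see lead-in).
    return (block > block.mean()).astype(np.uint8)
```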
  • Optionally, the video feature data may be a feature sequence. A feature sequence can be understood as the ordered collection of the feature vectors of every frame of a video, obtained by concatenating the per-frame feature vectors; different videos correspond to different feature sequences. When generating a feature sequence, the feature vectors can be down-sampled based on the application scenario, i.e., one feature vector is taken for concatenation every 2 frames, 4 frames, and so on. Because videos differ in frame count, the lengths of their feature sequences may also differ; a sketch of the construction follows.
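  • Continuing the previous sketch, the sequence construction can be written as follows; the `step` parameter is illustrative and realizes the every-2-frames or every-4-frames down-sampling mentioned above:

```python
def video_feature_sequence(path: str, step: int = 1) -> list:
    """Read a video frame by frame and concatenate per-frame feature vectors."""
    cap = cv2.VideoCapture(path)
    sequence, index = [], 0
    while True:
        ok, frame = cap.read()            # frames are returned in presentation order
        if not ok:
            break
        if index % step == 0:             # down-sample: keep every step-th frame
            sequence.append(frame_feature_vector(frame))
        index += 1
    cap.release()
    return sequence                       # length varies with the video's frame count
```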
  • Optionally, the video feature data may instead be a single video feature vector, a multi-dimensional vector composed from the image feature vectors of every frame. As before, each RGB frame is converted to a single-channel grayscale image, normalized to 32×32 pixels by bilinear interpolation, and transformed with a discrete cosine transform (DCT) to obtain a 32×32 coefficient matrix; the 64 coefficients in the 8×8 area at the upper-left corner of each coefficient matrix are then subjected to a special quantization to obtain a special binary image. The 64-dimensional vectors obtained by flattening these specially quantized binary images are superimposed across all frames, and the superimposed 64-dimensional vector is finally subjected to the ordinary quantization described above to generate the video feature vector, which reflects the content information of the corresponding video.
  • By extracting the N video feature data of the N videos in the video data set in either of these two ways, two kinds of video feature data can be produced, covering multiple video processing scenarios and greatly improving the flexibility of subsequent video processing.
  • Step 202: Obtain the matching degree data of every two of the N video feature data.
  • The matching degree data represents the similarity between every two of the N video feature data and can be taken as the similarity between every two of the N videos. When the video feature data are feature sequences, their lengths may differ, so conventional measures such as the Euclidean distance between sequences cannot be applied; instead, a matching function computes the length of the longest common subsequence of every two of the N feature sequences. Denoting the video data set by V = {v1, v2, …, vN}, i.e., N videos v, and any two feature sequences by Fi (with n frame vectors) and Fj (with m frame vectors), where m and n may be equal or different, the matching function match(Fi, Fj) determines their longest common subsequence, whose length serves as the matching degree data; a sketch of the computation follows.
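  • The matching function can be sketched as standard longest-common-subsequence dynamic programming. Treating two frame vectors as equal when their 64 bits match exactly is an assumption, since the patent does not spell out the frame-level equality test:

```python
def lcs_length(fi: list, fj: list) -> int:
    """Length k of the longest common subsequence of two feature sequences."""
    n, m = len(fi), len(fj)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            if np.array_equal(fi[a - 1], fj[b - 1]):     # frame hashes considered equal
                dp[a][b] = dp[a - 1][b - 1] + 1
            else:
                dp[a][b] = max(dp[a - 1][b], dp[a][b - 1])
    return dp[n][m]
```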
  • When the video feature data are video feature vectors, the Manhattan distance between every two video feature vectors is computed and used as the matching degree data; the specific Manhattan distance calculation can use existing algorithms and is not detailed here beyond the brief sketch below.
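  • The Manhattan distance itself is a one-liner; a sketch for two fixed-length video feature vectors (the cast to a signed type avoids wraparound if the vectors are stored as unsigned integers):

```python
def manhattan_distance(u: np.ndarray, v: np.ndarray) -> float:
    """L1 distance used as the matching degree data for video feature vectors."""
    u = np.asarray(u, dtype=np.int32)
    v = np.asarray(v, dtype=np.int32)
    return float(np.abs(u - v).sum())
```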
  • Step 203: Divide the N videos into M video clusters based on the matching degree data.
  • Each video cluster includes at least one video: multiple videos with repeated content are classified into the same video cluster, and a video with no duplicate forms a video cluster by itself.
  • Optionally, when the matching degree data is the length of the longest common subsequence, a preset length threshold can be set. If the longest common subsequence of any two videos is longer than the preset length threshold, the two videos corresponding to that subsequence form a repeated video set; each of the N videos therefore needs to be matched pairwise with the others to obtain the lengths of the corresponding longest common subsequences.
  • The output cluster set C of this procedure includes M video clusters, and an N-dimensional flag vector records whether each video has already been added to some video cluster; once a video has been added to a cluster, whether its longest common subsequences with the other videos exceed the preset length threshold no longer needs to be checked.
  • Specifically, taking the first video as an example, whether the length of the longest common subsequence of the first and second videos exceeds the preset length threshold is checked. If it does, the first and second videos form a repeated video set and are both placed in the first video cluster; otherwise they are different videos and do not belong to the same cluster. The same check is then run in turn between the first video and the third, fourth, and subsequent videos up to the Nth: whenever the length exceeds the threshold, that video also forms a repeated video set with the first video and is placed in the first video cluster, and whenever it does not, that video does not belong to the same cluster as the first. In this way, all videos among the N videos that form a repeated video set with the first video are screened out and placed in the first video cluster. After the first cluster has been determined, the second cluster is determined in the same way, by checking the second video against the third through Nth videos, and so on, until the N videos have been divided into M video clusters; a sketch of this grouping follows.
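  • The grouping walkthrough above admits a compact sketch. This is one plausible reading of the patent's pseudocode (which is available only as an image), with the `assigned` list standing in for the N-dimensional flag vector:

```python
def cluster_by_lcs(sequences: list, threshold: int) -> list:
    """Greedily divide N videos into M clusters by pairwise LCS length."""
    n = len(sequences)
    assigned = [False] * n                 # flag vector: already in some cluster?
    clusters = []
    for i in range(n):
        if assigned[i]:
            continue                       # skip videos already placed in a cluster
        cluster, assigned[i] = [i], True
        for j in range(i + 1, n):
            if not assigned[j] and lcs_length(sequences[i], sequences[j]) > threshold:
                cluster.append(j)          # forms a repeated video set with video i
                assigned[j] = True
        clusters.append(cluster)           # a singleton forms its own cluster
    return clusters                        # M clusters, M <= N
```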
  • Optionally, when the matching degree data is the Manhattan distance between video feature vectors, the N videos can be divided into M video clusters by applying the density-based hierarchical clustering algorithm HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) to the pairwise Manhattan distances. Compared with the matching-function method, HDBSCAN clusters faster, while the matching-function method is more accurate, so the method of dividing video clusters can be switched flexibly according to the application requirements; a usage sketch follows.
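  • For the vector-based path, a usage sketch with the third-party `hdbscan` package; `min_cluster_size=2` is an assumption chosen so that a pair of duplicates can form a cluster, and videos labeled -1 (noise) would be treated as singleton clusters:

```python
import hdbscan  # third-party package implementing HDBSCAN

def cluster_by_hdbscan(feature_vectors: np.ndarray) -> np.ndarray:
    """Cluster N fixed-length video feature vectors under the Manhattan metric."""
    clusterer = hdbscan.HDBSCAN(min_cluster_size=2, metric='manhattan')
    return clusterer.fit_predict(feature_vectors)   # one label per video; -1 = noise
```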
  • Step 204: Perform deduplication processing on the M video clusters one by one based on a preset deduplication rule to obtain a deduplicated video data set.
  • The deduplicated video data set includes M videos, i.e., only one video is retained per video cluster. The preset deduplication rule may include at least one item of deduplication index data, which may be any one or any combination of video-related data such as a video duration index, a video editing count index, a video image quality index, a video format index, and a video quality index; different index data are selected for different application scenarios. The video duration index may require, for example, the longest or the shortest duration; the editing count index, the fewest or the most edits; the image quality index, the sharpest or the blurriest picture; the format index, a format such as MP4 or AVI; and the quality index, the highest or the lowest video quality.
  • For example, if the deduplication index data is the video duration index with the longest-duration requirement, the preset deduplication rule retains, in each video cluster, the video with the longest duration and deletes the others, yielding the deduplicated video data set. Similarly, when the deduplication index data is any one or any combination of the editing count, image quality, format, and quality indexes, each video cluster is deduplicated according to the corresponding preset rule to obtain the corresponding deduplicated video data set, which is not repeated here.
  • The video data set can have a mapping relationship with the deduplication index data in the preset deduplication rules: the index data can be changed manually, or the index data that best fits the video data set can be selected automatically according to the data set, with no specific limitation here. Deduplicating the M video clusters one by one in this way adapts flexibly to different application scenarios and applies the most appropriate deduplication to the video data set, greatly improving the accuracy and versatility of video deduplication; a sketch of the longest-duration rule follows.
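  • A sketch of the longest-duration rule used in the example above. `durations` is an illustrative mapping from video index to duration; any other deduplication index (edit count, image quality, format, quality) would slot in as a different key function:

```python
def deduplicate(clusters: list, durations: dict) -> list:
    """Keep exactly one video per cluster: the one with the longest duration."""
    return [max(cluster, key=lambda idx: durations[idx]) for cluster in clusters]
```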
  • FIG. 3 is a schematic structural diagram of an electronic device 300 provided in an embodiment of the application, including an application processor 301, a communication interface 302, and a memory 303, which are connected to one another through a bus 304. The bus 304 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and can be divided into an address bus, a data bus, a control bus, and so on; for ease of presentation, only one thick line is used in FIG. 3, but this does not mean there is only one bus or one type of bus.
  • The memory 303 is used to store a computer program; the computer program includes program instructions, and the application processor 301 is configured to call the program instructions to perform the steps of the foregoing method embodiment.
  • The video feature data includes a feature sequence; in terms of extracting the N video feature data of the N videos in the video data set, the instructions in the program are specifically used to concatenate the feature vector of each frame of image data to obtain a feature sequence corresponding to each video, the feature sequence being used to represent the content features of the video.
  • In terms of obtaining the matching degree data, the instructions in the program are specifically used to determine the length of each longest common subsequence as the matching degree data of every two of the N video feature data.
  • In terms of dividing the N videos into the M video clusters, the instructions in the program are specifically used to divide the N videos included in all repeated video sets into the M video clusters.
  • In terms of the deduplication processing, the instructions in the program are specifically used to take the M videos retained in the M video clusters as the deduplicated video data set.
  • In terms of extracting the per-frame feature vectors, the instructions in the program are specifically used to select the 64 coefficients in the 8×8 area at the upper-left position of each coefficient matrix for quantization, obtaining the 64-dimensional vector of each frame of image data.
  • In terms of generating the feature sequences, the instructions in the program are specifically used to arrange the 64-dimensional vectors in order of their timestamps to generate the feature sequence corresponding to each video.
  • To realize the above functions, an electronic device includes hardware structures and/or software modules corresponding to each function. This application can be implemented in the form of hardware or a combination of hardware and computer software; whether a certain function is executed by hardware or by computer-software-driven hardware depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
  • The embodiments of the present application may divide the electronic device into functional units according to the foregoing method examples; for example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit can be implemented in the form of hardware or of a software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative and is only a logical function division; other division methods are possible in actual implementation.
  • FIG. 4 is a block diagram of the functional units of a video processing device 400 provided by an embodiment of the present application. The video processing device 400 is applied to an electronic device and includes a processing unit 401, a communication unit 402, and a storage unit 403; the processing unit 401 is configured to perform any step of the above method embodiment and, when performing data transmission such as sending, may optionally invoke the communication unit 402 to complete the corresponding operation. A detailed description is given below.
  • The processing unit 401 is configured to extract N video feature data of the N videos included in a video data set, N being a positive integer.
  • The video feature data includes a feature sequence; in terms of extracting the N video feature data of the N videos in the video data set, the processing unit 401 is specifically configured to concatenate the feature vector of each frame of image data to obtain a feature sequence corresponding to each video, the feature sequence being used to represent the content features of the video.
  • In terms of obtaining the matching degree data, the processing unit 401 is specifically configured to determine the length of each longest common subsequence as the matching degree data of every two of the N video feature data.
  • In terms of dividing the N videos into the M video clusters, the processing unit 401 is specifically configured to divide the N videos included in all repeated video sets into the M video clusters.
  • The processing unit 401 is specifically configured to perform deduplication processing on the M video clusters one by one based on a preset deduplication rule to obtain a deduplicated video data set, taking the M videos retained in the M video clusters as the deduplicated video data set.
  • In terms of extracting the per-frame feature vectors, the processing unit 401 is specifically configured to select the 64 coefficients in the 8×8 area at the upper-left position of each coefficient matrix for quantization, obtaining the 64-dimensional vector of each frame of image data.
  • In terms of generating the feature sequences, the processing unit 401 is specifically configured to arrange the 64-dimensional vectors in order of their timestamps to generate the feature sequence corresponding to each video.
  • An embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any method recorded in the above method embodiment; the computer includes an electronic device.
  • The embodiments of the present application also provide a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps of the methods described in the above method embodiment. The computer program product may be a software installation package, and the computer includes an electronic device.
  • The disclosed device may be implemented in other ways. The device embodiments described above are only illustrative: the division of the units is only a logical function division, and other divisions are possible in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • The displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms. The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed across multiple network units; some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the various embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or of a software functional unit.
  • If the above integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable memory. The technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can thus be embodied in the form of a software product: the computer software product is stored in a memory and includes a number of instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the foregoing methods of the various embodiments of the present application. The aforementioned memory includes media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
  • The program can be stored in a computer-readable memory, and the memory can include a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and so on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a video processing method and a related device, comprising: first, extracting N video feature data of the N videos included in a video data set, N being a positive integer; next, obtaining matching degree data for every two of the N video feature data; then, dividing the N videos into M video clusters on the basis of the matching degree data, M being a positive integer less than or equal to N; and finally, performing deduplication processing on the M video clusters one by one on the basis of a preset deduplication rule to obtain a deduplicated video data set, the deduplicated video data set comprising M videos. Repeated videos in the video data set can be accurately clustered by means of an efficient feature extraction algorithm, and the clustered repeated videos are then deduplicated, greatly improving the accuracy of video deduplication.

Description

Video processing method and related device

Technical Field

This application relates to the technical field of data deduplication, and in particular to a video processing method and related devices.

Background

With the development of technology, deep learning theory has become the mainstream solution in basic image-processing fields such as image classification and object detection, and it has also received wide attention in the field of video processing. Building a model for video processing requires a large amount of training data, and the scale and quality of the training data set directly affect the speed and accuracy of model construction. However, the collection of video data inevitably introduces a large amount of duplicate data, so to improve the performance of video processing models based on deep learning, it is necessary to deduplicate the data set in advance.

The video deduplication algorithms in common use today are based on key-point matching, but extracting image features from key points is too cumbersome, and the clustering algorithms used for feature matching, such as k-means, require parameters such as the number of categories to be set manually in advance, so the accuracy of the final deduplication cannot be guaranteed.

Summary of the Invention

In view of the above problems, this application proposes a video processing method and related devices that can accurately cluster the repeated videos in a video data set through an efficient feature extraction algorithm and then deduplicate the clustered repeated videos, greatly improving the accuracy of video deduplication.
The first aspect of the embodiments of the present application provides a video processing method, including:

extracting N video feature data of the N videos included in a video data set, N being a positive integer;

acquiring matching degree data for every two of the N video feature data;

dividing the N videos into M video clusters based on the matching degree data, M being a positive integer less than or equal to N; and

performing deduplication processing on the M video clusters one by one based on a preset deduplication rule to obtain a deduplicated video data set, the deduplicated video data set including M videos.
The second aspect of the embodiments of the present application provides a video processing device. The device includes a processing unit and a communication unit, wherein the processing unit is configured to: extract N video feature data of the N videos included in a video data set, N being a positive integer; acquire matching degree data for every two of the N video feature data; divide the N videos into M video clusters based on the matching degree data, M being a positive integer less than or equal to N; and perform deduplication processing on the M video clusters one by one based on a preset deduplication rule to obtain a deduplicated video data set that includes M videos.

The third aspect of the embodiments of the present application provides an electronic device, including an application processor, a communication interface, and a memory, which are connected to one another. The memory is used to store a computer program; the computer program includes program instructions, and the application processor is configured to invoke the program instructions to execute all or part of the steps of the method described in the first aspect of the embodiments of the present application.

The fourth aspect of the embodiments of the present application provides a computer storage medium that stores a computer program. The computer program includes program instructions which, when executed by a processor, cause the processor to execute all or part of the steps of the method described in the first aspect of the embodiments of the present application.

The fifth aspect of the embodiments of the present application provides a computer program product, the computer program product including a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to execute part or all of the steps described in any method of the first aspect of the embodiments of the present application. The computer program product may be a software installation package.

By implementing the above embodiments, the following beneficial effects can be obtained.

In the above video processing method and related devices, first, N video feature data of the N videos included in a video data set are extracted, N being a positive integer; next, matching degree data is obtained for every two of the N video feature data; then, based on the matching degree data, the N videos are divided into M video clusters, M being a positive integer less than or equal to N; finally, deduplication processing is performed on the M video clusters one by one based on a preset deduplication rule, yielding a deduplicated video data set that includes M videos. The repeated videos in the video data set can thus be clustered accurately through an efficient feature extraction algorithm and the clustered repeated videos then deduplicated, greatly improving the accuracy of video deduplication.
Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative work.

FIG. 1 is a system architecture diagram of a video processing method provided by an embodiment of this application;

FIG. 2 is a schematic flowchart of a video processing method provided by an embodiment of this application;

FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of this application;

FIG. 4 is a block diagram of the functional units of a video processing device provided by an embodiment of this application.

Detailed Description
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of the application are described below clearly and completely in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.

The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish different objects rather than to describe a specific order. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion: a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally also includes other steps or units inherent to the process, method, product, or device.

Reference herein to an "embodiment" means that a specific feature, structure, or characteristic described in conjunction with the embodiment may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

The electronic devices involved in the embodiments of the application may be electronic devices with communication capabilities, including various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices, or other processing devices connected to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on.

The embodiments of the application are described in detail below.
FIG. 1 is a system architecture diagram of a video processing method provided by an embodiment of the application, comprising a video acquisition module 110, a matching module 120, a classification module 130, and a deduplication processing module 140, which are connected to one another. The video acquisition module 110 acquires a video data set composed of the videos to be processed and sends it to the matching module 120; the matching module 120 matches the videos to be processed in the received video data set against one another and sends the matching results to the classification module 130; the classification module 130 classifies the videos according to the matching results to obtain multiple video clusters, each containing either a single video or several repeated videos; finally, the deduplication processing module 140 deduplicates each video cluster to obtain a deduplicated video data set, completing the video deduplication.

It should be noted that the training data of a neural network model may contain a large amount of duplicate data; using all of it to train the model is inefficient and can reduce the model's accuracy, so deduplicating a large body of training data and automatically selecting the data that trains best is very important. The system architecture in the embodiment of the present application can be applied to the scenario of screening training data for neural network models related to video processing.

Through the above system architecture, the repeated videos in the video data set can be clustered accurately through an efficient feature extraction algorithm and the clustered repeated videos then deduplicated, greatly improving the accuracy of video deduplication.

A video processing method in an embodiment of the present application is described in detail below in conjunction with FIG. 2, a schematic flowchart of the method, which specifically includes the following steps.

Step 201: Extract N video feature data of the N videos included in the video data set.

The video data set is a set of N videos to be processed, N being any positive integer, and each video can be processed to extract its corresponding video feature data. For ease of understanding, the extraction of the video feature data is explained below for an arbitrary video.
First, the video is read frame by frame to obtain each frame of image data, and the feature vector of each frame is then extracted with a perceptual hash algorithm (PHA); the extraction for a single frame proceeds as follows. The frame is a color image with three color channels, red, green, and blue. The three-channel RGB frame is first converted into a single-channel grayscale image, whose size is normalized to 32×32 pixels through a bilinear interpolation algorithm to improve extraction efficiency; a discrete cosine transform (DCT) is then applied to obtain a 32×32 coefficient matrix, and the 64 coefficients in the 8×8 area at the upper-left corner of the matrix are subjected to ordinary quantization to obtain a binary image. (The ordinary quantization rule is given in the original as formula image PCTCN2021073333-appb-000001 and is not reproduced here.)

Finally, the 64-dimensional vector obtained by flattening the binary image is used as the feature vector of that frame; each frame of image data is processed in the same way to obtain its feature vector. As described above, after the feature vector of every frame of a video has been obtained, the video feature data of the video can be derived from these feature vectors.
Optionally, the video feature data may be a feature sequence, which can be understood as the collection of the feature vectors of every frame of a video, obtained by concatenating the per-frame feature vectors. Specifically, an empty list is initialized first; the timestamp of each frame of image data is obtained; the 64-dimensional vectors of the frames are then arranged in timestamp order and appended to the list, giving the feature sequence of the video. These steps are repeated until the N feature sequences of the N videos are obtained, different videos corresponding to different feature sequences. It should be noted that, when generating a feature sequence, the feature vectors can be down-sampled based on the application scenario, i.e., one feature vector is taken for concatenation every 2 frames, 4 frames, and so on; since videos differ in frame count, the lengths of their feature sequences may also differ.
Optionally, the video feature data may be a video feature vector, a multi-dimensional vector composed from the image feature vectors of every frame. Each three-channel RGB frame is converted into a single-channel grayscale image, whose size is normalized to 32×32 pixels through a bilinear interpolation algorithm to improve extraction efficiency; a discrete cosine transform (DCT) is then applied to obtain a 32×32 coefficient matrix, and the 64 coefficients in the 8×8 area at the upper-left corner of each coefficient matrix are subjected to a special quantization to obtain a special binary image. (The special quantization rule is given in the original as formula image PCTCN2021073333-appb-000002 and is not reproduced here.) The 64-dimensional vectors obtained by flattening the specially quantized binary images of all frames are then superimposed, and the superimposed 64-dimensional vector is finally subjected to the ordinary quantization described above to generate the video feature vector, which reflects the content information of the corresponding video.

By extracting the N video feature data of the N videos in the video data set in either of these two ways, two kinds of video feature data can be produced, covering multiple video processing scenarios and greatly improving the flexibility of subsequent video processing.
Step 202: Obtain the matching degree data of every two of the N video feature data.

The matching degree data represents the similarity between every two of the N video feature data and can be taken as the similarity between every two of the N videos.
Optionally, when the video feature data are feature sequences, their lengths may differ, so conventional measures such as the Euclidean distance between sequences cannot be used to compute the similarity of every two feature sequences; instead, a matching function computes the length of the longest common subsequence of every two of the N feature sequences. For example, let the video data set be V = {v1, v2, …, vN}, i.e., there are N videos v in the data set V, and denote any two different feature sequences by Fi = {fi1, fi2, …, fin} and Fj = {fj1, fj2, …, fjm}; that is, Fi consists of the n feature vectors f of n frames of image data and Fj of the m feature vectors of m frames, where m and n may be equal or different. The matching function match(Fi, Fj) determines the longest common subsequence F* = (f*1, f*2, …, f*k) of the two sequences, where F* ∈ Fi, F* ∈ Fj, F* is the longest of all their common subsequences, and k is its length. The length of the longest common subsequence of every two feature sequences is computed in turn until all N feature sequences have been processed; by the combination formula C(N, 2) = N(N-1)/2, this yields N(N-1)/2 longest-common-subsequence lengths.
Optionally, when the video feature data are video feature vectors, the Manhattan distance between every two video feature vectors is computed and used as the matching degree data; the specific Manhattan distance calculation can use existing algorithms and is not repeated here.

Step 203: Divide the N videos into M video clusters based on the matching degree data.

Here M is a positive integer less than or equal to N, and each video cluster may include at least one video: multiple videos with repeated content are classified into the same video cluster, and a video with no duplicate forms a video cluster by itself.

Optionally, when the matching degree data is the length of the longest common subsequence, a preset length threshold can be set. If the longest common subsequence of any two videos is longer than the preset length threshold, the two videos corresponding to that subsequence form a repeated video set; each of the N videos therefore needs to be matched pairwise with the others to obtain the lengths of the corresponding longest common subsequences.
This procedure can be described in pseudocode using the above notation for Fi and Fj. (The pseudocode is given in the original as image PCTCN2021073333-appb-000005 and is not reproduced here.)
其中,上述输出的视频聚类簇集合C包括M个视频聚类簇,上述N维标志向量的作用是判断该视频是否已经加入了某一个视频聚类簇,若已经加入了某一个视频聚类簇,则可以不再次判断其与其他视频的最长公共子序列是否大于上述预设长度阈值。Among them, the above-mentioned output video cluster cluster set C includes M video clusters, and the above-mentioned N-dimensional flag vector is used to determine whether the video has been added to a certain video cluster, if a certain video cluster has been added Cluster, it is not necessary to judge again whether the longest common subsequence between it and other videos is greater than the above-mentioned preset length threshold.
具体的,以第一个视频举例来说,可以判断第一个视频与第二个视频的最长公共子序列的长度是否大于上述预设长度阈值,若第一个视频与第二个视频的最长公共子序列的长度大于上述预设长度阈值,则说明第一个视频与第二个视频为重复视频集,需要将第一个视频与第二个视频划分为第一视频聚类簇,若第一个视频与第二个视频的最长公共子序列的长度小于或等于上述预设长度阈值,则第一个视频和第二个视频为不同的视频,不属于同一视频聚类簇;之后依次判断第一个视频与第三个视频、第四个视频直到第N个视频之间的最长公共子序列的长度是否大于预设长度阈值,若第一个视频与第三个视频的最长公共子序列的长度大于上述预设长度阈值,则说明第一个视频与第三个视频也为重复视频集,且第一个视频、第二个视频和第三个视频都为重复视频集,需要将第三个视频划分至上述第一视频聚类簇,若第一个视频与第三个视频的最长公共子序列的长度小于或等于上述预设长度阈值,则第一个视频与第三个视频不属于同一视频聚类簇;以此类推筛选出N个视频中与第一个视频为重复视频集的视频,并将其划分至上述第一视频聚类簇;上述第一视频聚类簇确定后,可以继续确定第二视频聚类簇,即判断第二个视频与第三个视频到第N个视频之间的最长公共子序列的长度是否大于上述预设长度阈值,如上所述筛选出N个视频中与第二个视频为重复视频集的视频,并将其划分至第二视频聚类簇,直到将上述N个视频划分为M个视频聚类簇。Specifically, taking the first video as an example, it can be determined whether the length of the longest common subsequence between the first video and the second video is greater than the aforementioned preset length threshold. The length of the longest common subsequence is greater than the above preset length threshold, it means that the first video and the second video are repeated video sets, and the first video and the second video need to be divided into the first video cluster. If the length of the longest common subsequence between the first video and the second video is less than or equal to the aforementioned preset length threshold, the first video and the second video are different videos and do not belong to the same video cluster; Then determine in turn whether the length of the longest common subsequence between the first video and the third video, and the fourth video to the Nth video is greater than the preset length threshold. The length of the longest common subsequence is greater than the above preset length threshold, it means that the first video and the third video are also repeated video sets, and the first video, the second video, and the third video are all repeated videos If the length of the longest common subsequence between the first video and the third video is less than or equal to the aforementioned preset length threshold, the third video needs to be divided into the first video cluster. It does not belong to the same video cluster as the third video; by analogy, the videos that are duplicate video sets with the first video among the N videos are screened out, and are divided into the first video cluster; the first After the video cluster is determined, you can continue to determine the second video cluster, that is, determine whether the length of the longest common subsequence between the second video and the third video to the Nth video is greater than the aforementioned preset length threshold As described above, the second video among the N videos is screened out as the videos of the repeated video set, and the second video is divided into the second video clusters, until the above N videos are divided into M video clusters.
Optionally, when the matching degree data is the Manhattan distance between video feature vectors, the N videos may be divided into M video clusters by applying the hierarchical density-based clustering algorithm HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) to the Manhattan distance between every two of the N video feature vectors. It should be noted that, compared with partitioning the video clusters by means of the matching function, HDBSCAN speeds up the clustering, while the matching-function approach is more accurate; the partitioning method can therefore be switched flexibly according to the requirements of the application.
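For illustration only (the application does not provide an implementation), clustering N fixed-length video feature vectors under the Manhattan metric with the Python hdbscan package might look like the sketch below; the package choice, the min_cluster_size value, and the handling of noise points are assumptions:

```python
# Illustrative sketch only: hdbscan and its parameters are one possible choice,
# not the application's specified implementation.
import numpy as np
import hdbscan

def cluster_with_hdbscan(video_feature_vectors):
    X = np.asarray(video_feature_vectors, dtype=np.float64)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=2, metric="manhattan")
    labels = clusterer.fit_predict(X)
    # Label -1 marks videos HDBSCAN treats as noise; for deduplication purposes
    # each such video can be regarded as its own single-member cluster.
    return labels
```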
It can be seen that dividing the N videos into M video clusters based on the matching degree data allows duplicate videos to be assigned quickly and accurately to their corresponding clusters, greatly improving the accuracy of video deduplication.
Step 204: Perform deduplication on the M video clusters one by one based on a preset deduplication rule to obtain a deduplicated video data set.
The deduplicated video data set includes M videos; that is, only one video is retained in each video cluster. The preset deduplication rule may include at least one item of deduplication index data, and the deduplication index data may be any one or any combination of video-related data such as a video duration index, a video edit-count index, a video picture-quality index, a video format index, and a video quality index, with different index data selected for different application scenarios. The video duration index may be a duration constraint such as the longest or the shortest video duration; the video edit-count index may be an editing constraint such as the fewest or the most edits; the video picture-quality index may be a picture-quality constraint such as the sharpest or the blurriest picture; the video format index may be a format constraint such as the MP4 or the AVI format; and the video quality index may be a quality constraint such as the highest or the lowest video quality.
For example, if the deduplication index data is the video duration index and the constraint is the longest video duration, the preset deduplication rule is to retain the video with the longest duration in each video cluster and delete the remaining videos, yielding the deduplicated video data set. Similarly, when the deduplication index data is any one or any combination of video-related data such as the video edit-count index, the video picture-quality index, the video format index, or the video quality index, each video cluster is likewise deduplicated according to the corresponding preset deduplication rule to obtain the corresponding deduplicated video data set, which is not repeated here.
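A minimal sketch of the longest-duration rule from the example above; the per-video metadata layout and the duration field name are assumptions made for illustration:

```python
# Sketch of the example rule above: keep the longest-duration video per cluster.
# `videos` maps a video index to a metadata record; "duration" is an assumed key.

def deduplicate(clusters, videos):
    kept = []
    for cluster in clusters:
        best = max(cluster, key=lambda idx: videos[idx]["duration"])
        kept.append(best)           # one retained video per cluster -> M videos
    return kept
```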
There may be a mapping relationship between the video data set and the deduplication index data in the preset deduplication rule. The deduplication index data may be changed manually, or the index data best suited to the video data set may be selected automatically according to the data set; no specific limitation is imposed here.
It can be seen that deduplicating the M video clusters one by one based on the preset deduplication rule to obtain the deduplicated video data set allows the most suitable deduplication to be applied to the video data set across different application scenarios, greatly improving the accuracy and generality of video deduplication.
The following describes an electronic device 300 in an embodiment of the present application with reference to FIG. 3. FIG. 3 is a schematic structural diagram of an electronic device 300 provided in an embodiment of the present application, which includes an application processor 301, a communication interface 302, and a memory 303. The application processor 301, the communication interface 302, and the memory 303 are connected to one another through a bus 304. The bus 304 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is drawn in FIG. 3, but this does not mean that there is only one bus or only one type of bus. The memory 303 is used to store a computer program comprising program instructions, and the application processor 301 is configured to invoke the program instructions to perform the following steps:
extracting N video feature data of N videos included in a video data set, N being a positive integer;
acquiring matching degree data of every two video feature data among the N video feature data;
dividing the N videos into M video clusters based on the matching degree data, M being a positive integer less than or equal to N;
performing deduplication on the M video clusters one by one based on a preset deduplication rule to obtain a deduplicated video data set, the deduplicated video data set including M videos.
In a possible embodiment, the video feature data includes a feature sequence, and in extracting the N video feature data of the N videos in the video data set, the instructions in the program are specifically used to perform the following operations:
acquiring each frame of image data of each video;
extracting a feature vector of each frame of image data through a perceptual hash algorithm;
concatenating the feature vectors of the frames of image data to obtain the feature sequence corresponding to each video, the feature sequence being used to represent the content features of the video.
In a possible embodiment, in acquiring the matching degree data of every two video feature data among the N video feature data, the instructions in the program are specifically used to perform the following operations:
obtaining the longest common subsequence between every two of the N feature sequences based on a matching function;
determining the length of each longest common subsequence as the matching degree data of the corresponding two video feature data among the N video feature data.
In a possible embodiment, in dividing the N videos into M video clusters based on the matching degree data, the instructions in the program are specifically used to perform the following operations:
determining the two videos corresponding to each longest common subsequence whose length is greater than a preset length threshold as one duplicate video set;
acquiring coincident video information between the duplicate video sets, the coincident video information being used to indicate whether the same video exists between the duplicate video sets;
dividing the N videos included in all the duplicate video sets into the M video clusters according to the coincident video information.
In a possible embodiment, in performing deduplication on the M video clusters one by one based on the preset deduplication rule to obtain the deduplicated video data set, the instructions in the program are specifically used to perform the following operations:
acquiring the preset deduplication rule corresponding to the video data set, the preset deduplication rule including deduplication index data;
screening out and retaining the video in each video cluster that satisfies the deduplication index data;
using the M videos retained in the M video clusters as the deduplicated video data set.
In a possible embodiment, the feature vector includes a 64-dimensional vector, and in extracting the feature vector of each frame of image data through the perceptual hash algorithm, the instructions in the program are specifically used to perform the following operations:
converting each frame of image data into 32×32-pixel grayscale image data;
processing the grayscale image data through a discrete cosine transform to obtain a 32×32 coefficient matrix;
selecting the 64 coefficients in the 8×8 region at the upper-left of each coefficient matrix for quantization to obtain the 64-dimensional vector of each frame of image data.
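A minimal sketch of the per-frame feature extraction just described, using OpenCV as one possible toolkit; quantizing each coefficient against the median of the 64 selected coefficients is an assumed detail, since the text states only that the coefficients are quantized:

```python
# Sketch of the per-frame perceptual-hash feature described above. OpenCV is
# one way to do it; the median-based quantization is an assumption.
import cv2
import numpy as np

def frame_feature_vector(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (32, 32))                   # 32x32 grayscale image
    coeffs = cv2.dct(np.float32(gray))                  # 32x32 DCT coefficient matrix
    block = coeffs[:8, :8].flatten()                    # 64 coefficients, upper-left 8x8
    return (block > np.median(block)).astype(np.uint8)  # 64-dimensional binary vector
```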
In a possible embodiment, in concatenating the feature vectors of the frames of image data to obtain the feature sequence corresponding to each video, the instructions in the program are specifically used to perform the following operations:
acquiring the timestamp of each frame of image data;
arranging the 64-dimensional vectors in order of the timestamps to generate the feature sequence corresponding to each video.
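The timestamp-ordered assembly of the per-frame vectors into a feature sequence might be sketched as follows; representing the input as (timestamp, vector) pairs is an assumption:

```python
# Sketch: order the per-frame 64-dimensional vectors by timestamp to form the
# video's feature sequence. The (timestamp, vector) pairing is an assumed shape.

def build_feature_sequence(stamped_vectors):
    ordered = sorted(stamped_vectors, key=lambda pair: pair[0])  # ascending timestamps
    return [tuple(vec) for _, vec in ordered]                    # concatenated sequence
```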
The foregoing mainly introduces the solutions of the embodiments of the present application from the perspective of the method-side execution process. It can be understood that, in order to implement the above functions, the electronic device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments provided herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
The embodiments of the present application may divide the electronic device into functional units according to the foregoing method examples. For example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative and is merely a division by logical function; other division manners are possible in actual implementation.
FIG. 4 is a block diagram of the functional units of a video processing apparatus 400 provided by an embodiment of the present application. The video processing apparatus 400 is applied to an electronic device and includes a processing unit 401, a communication unit 402, and a storage unit 403. The processing unit 401 is configured to perform any step in the foregoing method embodiments, and when performing data transmission such as sending, may optionally invoke the communication unit 402 to complete the corresponding operation. A detailed description follows.
The processing unit 401 is configured to: extract N video feature data of N videos included in a video data set, N being a positive integer;
acquire matching degree data of every two video feature data among the N video feature data;
divide the N videos into M video clusters based on the matching degree data, M being a positive integer less than or equal to N;
perform deduplication on the M video clusters one by one based on a preset deduplication rule to obtain a deduplicated video data set, the deduplicated video data set including M videos.
In a possible embodiment, the video feature data includes a feature sequence, and in extracting the N video feature data of the N videos in the video data set, the processing unit 401 is specifically configured to:
acquire each frame of image data of each video;
extract a feature vector of each frame of image data through a perceptual hash algorithm;
concatenate the feature vectors of the frames of image data to obtain the feature sequence corresponding to each video, the feature sequence being used to represent the content features of the video.
In a possible embodiment, in acquiring the matching degree data of every two video feature data among the N video feature data, the processing unit 401 is specifically configured to:
obtain the longest common subsequence between every two of the N feature sequences based on a matching function;
determine the length of each longest common subsequence as the matching degree data of the corresponding two video feature data among the N video feature data.
In a possible embodiment, in dividing the N videos into M video clusters based on the matching degree data, the processing unit 401 is specifically configured to:
determine the two videos corresponding to each longest common subsequence whose length is greater than a preset length threshold as one duplicate video set;
acquire coincident video information between the duplicate video sets, the coincident video information being used to indicate whether the same video exists between the duplicate video sets;
divide the N videos included in all the duplicate video sets into the M video clusters according to the coincident video information.
In a possible embodiment, in performing deduplication on the M video clusters one by one based on the preset deduplication rule to obtain the deduplicated video data set, the processing unit 401 is specifically configured to:
acquire the preset deduplication rule corresponding to the video data set, the preset deduplication rule including deduplication index data;
screen out and retain the video in each video cluster that satisfies the deduplication index data;
use the M videos retained in the M video clusters as the deduplicated video data set.
In a possible embodiment, the feature vector includes a 64-dimensional vector, and in extracting the feature vector of each frame of image data through the perceptual hash algorithm, the processing unit 401 is specifically configured to:
convert each frame of image data into 32×32-pixel grayscale image data;
process the grayscale image data through a discrete cosine transform to obtain a 32×32 coefficient matrix;
select the 64 coefficients in the 8×8 region at the upper-left of each coefficient matrix for quantization to obtain the 64-dimensional vector of each frame of image data.
In a possible embodiment, in concatenating the feature vectors of the frames of image data to obtain the feature sequence corresponding to each video, the processing unit 401 is specifically configured to:
acquire the timestamp of each frame of image data;
arrange the 64-dimensional vectors in order of the timestamps to generate the feature sequence corresponding to each video.
An embodiment of the present application further provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps of any method described in the foregoing method embodiments; the computer includes an electronic device.
An embodiment of the present application further provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform some or all of the steps of any method described in the foregoing method embodiments. The computer program product may be a software installation package, and the computer includes an electronic device.
It should be noted that, for brevity of description, the foregoing method embodiments are all expressed as a series of action combinations; however, those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application, some steps may be performed in another order or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that all or some of the steps in the methods of the foregoing embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable memory, and the memory may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application based on the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (20)

  1. A video processing method, wherein the method comprises:
    extracting N video feature data of N videos included in a video data set, N being a positive integer;
    acquiring matching degree data of every two video feature data among the N video feature data;
    dividing the N videos into M video clusters based on the matching degree data, M being a positive integer less than or equal to N;
    performing deduplication on the M video clusters one by one based on a preset deduplication rule to obtain a deduplicated video data set, the deduplicated video data set comprising M videos.
  2. The method according to claim 1, wherein the video feature data comprises a feature sequence, and extracting the N video feature data of the N videos in the video data set comprises:
    acquiring each frame of image data of each video;
    extracting a feature vector of each frame of image data through a perceptual hash algorithm;
    concatenating the feature vectors of the frames of image data to obtain the feature sequence corresponding to each video, the feature sequence being used to represent the content features of the video.
  3. The method according to claim 2, wherein acquiring the matching degree data of every two video feature data among the N video feature data comprises:
    obtaining the longest common subsequence between every two of the N feature sequences based on a matching function;
    determining the length of each longest common subsequence as the matching degree data of the corresponding two video feature data among the N video feature data.
  4. The method according to claim 3, wherein dividing the N videos into M video clusters based on the matching degree data comprises:
    determining the two videos corresponding to each longest common subsequence whose length is greater than a preset length threshold as one duplicate video set;
    acquiring coincident video information between the duplicate video sets, the coincident video information being used to indicate whether the same video exists between the duplicate video sets;
    dividing the N videos included in all the duplicate video sets into the M video clusters according to the coincident video information.
  5. The method according to claim 1, wherein the video feature data comprises a video feature vector, and extracting the N video feature data of the N videos in the video data set comprises:
    acquiring each frame of image data of each video;
    extracting an image feature vector of each frame of image data through a perceptual hash algorithm;
    superimposing the image feature vectors of the frames of image data to form the video feature vector.
  6. The method according to claim 5, wherein acquiring the matching degree data of every two video feature data among the N video feature data comprises:
    acquiring Manhattan distance data between every two of the N video feature vectors;
    determining each Manhattan distance data as the matching degree data of the corresponding two video feature data among the N video feature data.
  7. The method according to any one of claims 1 to 6, wherein performing deduplication on the M video clusters one by one based on the preset deduplication rule to obtain the deduplicated video data set comprises:
    acquiring the preset deduplication rule corresponding to the video data set, the preset deduplication rule comprising deduplication index data;
    screening out and retaining the video in each video cluster that satisfies the deduplication index data;
    using the M videos retained in the M video clusters as the deduplicated video data set.
  8. The method according to claim 2, wherein the feature vector comprises a 64-dimensional vector, and extracting the feature vector of each frame of image data through the perceptual hash algorithm comprises:
    converting each frame of image data into 32×32-pixel grayscale image data;
    processing the grayscale image data through a discrete cosine transform to obtain a 32×32 coefficient matrix;
    selecting the 64 coefficients in the 8×8 region at the upper-left of each coefficient matrix for quantization to obtain the 64-dimensional vector of each frame of image data.
  9. The method according to claim 6, wherein concatenating the feature vectors of the frames of image data to obtain the feature sequence corresponding to each video comprises:
    acquiring the timestamp of each frame of image data;
    arranging the 64-dimensional vectors in order of the timestamps to generate the feature sequence corresponding to each video.
  10. A video processing apparatus, wherein the apparatus comprises a processing unit and a communication unit, wherein:
    the processing unit is configured to: extract N video feature data of N videos included in a video data set, N being a positive integer; acquire matching degree data of every two video feature data among the N video feature data; divide the N videos into M video clusters based on the matching degree data, M being a positive integer less than or equal to N; and perform deduplication on the M video clusters one by one based on a preset deduplication rule to obtain a deduplicated video data set, the deduplicated video data set comprising M videos.
  11. The video processing apparatus according to claim 10, wherein the video feature data comprises a feature sequence, and in extracting the N video feature data of the N videos in the video data set, the processing unit is configured to:
    acquire each frame of image data of each video;
    extract a feature vector of each frame of image data through a perceptual hash algorithm;
    concatenate the feature vectors of the frames of image data to obtain the feature sequence corresponding to each video, the feature sequence being used to represent the content features of the video.
  12. The video processing apparatus according to claim 11, wherein, in acquiring the matching degree data of every two video feature data among the N video feature data, the processing unit is configured to:
    obtain the longest common subsequence between every two of the N feature sequences based on a matching function;
    determine the length of each longest common subsequence as the matching degree data of the corresponding two video feature data among the N video feature data.
  13. The video processing apparatus according to claim 12, wherein, in dividing the N videos into M video clusters based on the matching degree data, the processing unit is configured to:
    determine the two videos corresponding to each longest common subsequence whose length is greater than a preset length threshold as one duplicate video set;
    acquire coincident video information between the duplicate video sets, the coincident video information being used to indicate whether the same video exists between the duplicate video sets;
    divide the N videos included in all the duplicate video sets into the M video clusters according to the coincident video information.
  14. The video processing apparatus according to claim 10, wherein the video feature data comprises a video feature vector, and in extracting the N video feature data of the N videos in the video data set, the processing unit is configured to:
    acquire each frame of image data of each video;
    extract an image feature vector of each frame of image data through a perceptual hash algorithm;
    superimpose the image feature vectors of the frames of image data to form the video feature vector.
  15. The video processing apparatus according to claim 14, wherein, in acquiring the matching degree data of every two video feature data among the N video feature data, the processing unit is configured to:
    acquire Manhattan distance data between every two of the N video feature vectors;
    determine each Manhattan distance data as the matching degree data of the corresponding two video feature data among the N video feature data.
  16. The video processing apparatus according to any one of claims 10 to 15, wherein, in performing deduplication on the M video clusters one by one based on the preset deduplication rule to obtain the deduplicated video data set, the processing unit is configured to:
    acquire the preset deduplication rule corresponding to the video data set, the preset deduplication rule comprising deduplication index data;
    screen out and retain the video in each video cluster that satisfies the deduplication index data;
    use the M videos retained in the M video clusters as the deduplicated video data set.
  17. The video processing apparatus according to claim 11, wherein the feature vector comprises a 64-dimensional vector, and in extracting the feature vector of each frame of image data through the perceptual hash algorithm, the processing unit is configured to:
    convert each frame of image data into 32×32-pixel grayscale image data;
    process the grayscale image data through a discrete cosine transform to obtain a 32×32 coefficient matrix;
    select the 64 coefficients in the 8×8 region at the upper-left of each coefficient matrix for quantization to obtain the 64-dimensional vector of each frame of image data.
  18. The video processing apparatus according to claim 15, wherein, in concatenating the feature vectors of the frames of image data to obtain the feature sequence corresponding to each video, the processing unit is configured to:
    acquire the timestamp of each frame of image data;
    arrange the 64-dimensional vectors in order of the timestamps to generate the feature sequence corresponding to each video.
  19. An electronic device, comprising an application processor, a communication interface, and a memory, the application processor, the communication interface, and the memory being connected to one another, wherein the memory is used to store a computer program comprising program instructions, and the application processor is configured to invoke the program instructions to perform the method according to any one of claims 1 to 9.
  20. A computer storage medium, wherein the computer storage medium stores a computer program comprising program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 9.
PCT/CN2021/073333 2020-03-02 2021-01-22 Video processing method and related device WO2021175040A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010136223.6A CN111274446A (en) 2020-03-02 2020-03-02 Video processing method and related device
CN202010136223.6 2020-03-02

Publications (1)

Publication Number Publication Date
WO2021175040A1 true WO2021175040A1 (en) 2021-09-10

Family

ID=71002835

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073333 WO2021175040A1 (en) 2020-03-02 2021-01-22 Video processing method and related device

Country Status (2)

Country Link
CN (1) CN111274446A (en)
WO (1) WO2021175040A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113938649A (en) * 2021-09-24 2022-01-14 成都智元汇信息技术股份有限公司 Alarm message duplicate removal method and device
CN113965772A (en) * 2021-10-29 2022-01-21 北京百度网讯科技有限公司 Live video processing method and device, electronic equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274446A (en) * 2020-03-02 2020-06-12 Oppo广东移动通信有限公司 Video processing method and related device
CN114268750A (en) * 2021-12-14 2022-04-01 咪咕音乐有限公司 Video processing method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103631786A (en) * 2012-08-22 2014-03-12 腾讯科技(深圳)有限公司 Clustering method and device for video files
CN103678702A (en) * 2013-12-30 2014-03-26 优视科技有限公司 Video duplicate removal method and device
CN104008139A (en) * 2014-05-08 2014-08-27 北京奇艺世纪科技有限公司 Method and device for creating video index table and method and device for recommending video
US8953836B1 (en) * 2012-01-31 2015-02-10 Google Inc. Real-time duplicate detection for uploaded videos
CN108307240A (en) * 2018-02-12 2018-07-20 北京百度网讯科技有限公司 Video recommendation method and device
CN108875062A (en) * 2018-06-26 2018-11-23 北京奇艺世纪科技有限公司 A kind of determination method and device repeating video
CN111274446A (en) * 2020-03-02 2020-06-12 Oppo广东移动通信有限公司 Video processing method and related device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492127A (en) * 2018-11-12 2019-03-19 网易传媒科技(北京)有限公司 Data processing method, device, medium and calculating equipment
CN110222511B (en) * 2019-06-21 2021-04-23 杭州安恒信息技术股份有限公司 Malicious software family identification method and device and electronic equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8953836B1 (en) * 2012-01-31 2015-02-10 Google Inc. Real-time duplicate detection for uploaded videos
CN103631786A (en) * 2012-08-22 2014-03-12 腾讯科技(深圳)有限公司 Clustering method and device for video files
CN103678702A (en) * 2013-12-30 2014-03-26 优视科技有限公司 Video duplicate removal method and device
CN104008139A (en) * 2014-05-08 2014-08-27 北京奇艺世纪科技有限公司 Method and device for creating video index table and method and device for recommending video
CN108307240A (en) * 2018-02-12 2018-07-20 北京百度网讯科技有限公司 Video recommendation method and device
CN108875062A (en) * 2018-06-26 2018-11-23 北京奇艺世纪科技有限公司 A kind of determination method and device repeating video
CN111274446A (en) * 2020-03-02 2020-06-12 Oppo广东移动通信有限公司 Video processing method and related device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113938649A (en) * 2021-09-24 2022-01-14 成都智元汇信息技术股份有限公司 Alarm message duplicate removal method and device
CN113965772A (en) * 2021-10-29 2022-01-21 北京百度网讯科技有限公司 Live video processing method and device, electronic equipment and storage medium
CN113965772B (en) * 2021-10-29 2024-05-10 北京百度网讯科技有限公司 Live video processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111274446A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
WO2021175040A1 (en) Video processing method and related device
CN109493350B (en) Portrait segmentation method and device
CN110348294B (en) Method and device for positioning chart in PDF document and computer equipment
WO2020238515A1 (en) Image matching method and apparatus, device, medium, and program product
WO2020024744A1 (en) Image feature point detecting method, terminal device, and storage medium
US11714921B2 (en) Image processing method with ash code on local feature vectors, image processing device and storage medium
CN104661037B (en) The detection method and system that compression image quantization table is distorted
WO2019238125A1 (en) Information processing method, related device, and computer storage medium
WO2022166258A1 (en) Behavior recognition method and apparatus, terminal device, and computer-readable storage medium
CN108960041B (en) Image feature extraction method and device
CN112581355A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN112614110B (en) Method and device for evaluating image quality and terminal equipment
US11934958B2 (en) Compressing generative adversarial neural networks
WO2021051562A1 (en) Facial feature point positioning method and apparatus, computing device, and storage medium
CN116797510A (en) Image processing method, device, computer equipment and storage medium
WO2021164329A1 (en) Image processing method and apparatus, and communication device and readable storage medium
CN111143619B (en) Video fingerprint generation method, search method, electronic device and medium
Du et al. Image hashing for tamper detection with multiview embedding and perceptual saliency
EP4275152A1 (en) Method of training a neural network configured for converting 2d images into 3d models
CN110619362B (en) Video content comparison method and device based on perception and aberration
CN113743533A (en) Picture clustering method and device and storage medium
CN113191376A (en) Image processing method, image processing device, electronic equipment and readable storage medium
CN112487943A (en) Method and device for removing duplicate of key frame and electronic equipment
CN112598074B (en) Image processing method and device, computer readable storage medium and electronic equipment
WO2023071577A1 (en) Feature extraction model training method and apparatus, picture searching method and apparatus, and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21763545

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21763545

Country of ref document: EP

Kind code of ref document: A1