WO2019137185A1 - Image screening method and apparatus, storage medium and computer device

Info

Publication number: WO2019137185A1
Application number: PCT/CN2018/122841
Authority: WO
Inventors: 刁梁; 陈昕; 周华; 朱欤
Original assignee: 美的集团股份有限公司
Priority date: 2018-01-09
Filing date: 2018-12-21
Publication date: 2019-07-18
Also published as: CN108228844A; CN108228844B

Abstract

An image screening method and apparatus, a storage medium and a computer device. The method comprises: acquiring a first image set (201); extracting a feature vector of each picture in the first image set (202); based on the feature vector of each picture in the first image set, grouping each picture in the first image set into groups (203); determining a cluster center corresponding to each group of images, and determining the distance between the cluster center corresponding to each group of images and a reference center (204); and based on the distance between the cluster center corresponding to each group of images and the reference center, deleting, from the first image set, one or more groups of images which meet a pre-set condition, so as to obtain a second image set (205).

Description

Picture screening method and apparatus, storage medium, and computer device

Cross-reference to related applications

The present application is based on, and claims priority to, Chinese Patent Application No. 201810017485.3, filed on January 9, 2018, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to picture processing technology, and in particular to a picture screening method and apparatus, a storage medium, and a computer device.

Background

With the rapid development of artificial intelligence and big data technology, more and more products are becoming intelligent. Compared with non-intelligent products, intelligent products generally offer more powerful functions and a more comfortable user experience. Data is the foundation of intelligent products and their applications, so mining accurate data is of great significance for them.

Pictures are an important data type in big data technology. However, because the pictures on the Internet are enormous in number and highly varied, users who crawl the pictures they need from the Internet often also retrieve junk pictures, which seriously affects artificial intelligence applications. How to identify these junk pictures is therefore an urgent problem to be solved.

Summary

To solve the above technical problem, embodiments of the present application provide a picture screening method and apparatus, a storage medium, and a computer device.

The picture screening method provided by an embodiment of the present application includes:

acquiring a first picture set;

extracting a feature vector of each picture in the first picture set;

grouping the pictures in the first picture set into groups based on the feature vectors of the pictures in the first picture set;

determining a cluster center corresponding to each group of pictures, and determining the distance between the cluster center corresponding to each group of pictures and a reference center; and

deleting, based on the distance between the cluster center corresponding to each group of pictures and the reference center, one or more groups of pictures that satisfy a preset condition from the first picture set, to obtain a second picture set.

In an embodiment of the present application, grouping the pictures in the first picture set into groups based on the feature vectors of the pictures in the first picture set includes:

clustering the feature vectors of the pictures in the first picture set, and grouping the pictures in the first picture set into groups based on the clustering result.

In an embodiment of the present application, clustering the feature vectors of the pictures in the first picture set and grouping the pictures in the first picture set into groups based on the clustering result includes:

setting the number of cluster centers;

clustering the feature vectors of the pictures in the first picture set; and

grouping the pictures in the first picture set into groups, where the number of groups is the same as the number of cluster centers.

In an embodiment of the present application, determining the cluster center corresponding to each group of pictures includes:

determining, based on the clustering result, the cluster center corresponding to each group of pictures.

In an embodiment of the present application, the method further includes:

calculating the reference center based on the cluster centers corresponding to the groups of pictures.

In an embodiment of the present application, deleting, based on the distance between the cluster center corresponding to each group of pictures and the reference center, one or more groups of pictures that satisfy the preset condition from the first picture set to obtain the second picture set includes:

deleting from the first picture set the one or more groups of pictures whose cluster centers are at a distance from the reference center greater than or equal to a preset threshold, to obtain the second picture set;

or, sorting the distances between the cluster centers corresponding to the groups of pictures and the reference center in descending order, and determining the M groups of pictures with the largest distances, M being a positive integer; and

deleting the M groups of pictures from the first picture set, to obtain the second picture set.

The picture screening apparatus provided by an embodiment of the present application includes:

an acquiring unit, configured to acquire a first picture set;

an extracting unit, configured to extract a feature vector of each picture in the first picture set;

a grouping unit, configured to group the pictures in the first picture set into groups based on the feature vectors of the pictures in the first picture set;

a distance determining unit, configured to determine a cluster center corresponding to each group of pictures, and to determine the distance between the cluster center corresponding to each group of pictures and a reference center; and

a screening unit, configured to delete, based on the distance between the cluster center corresponding to each group of pictures and the reference center, one or more groups of pictures that satisfy a preset condition from the first picture set, to obtain a second picture set.

In an embodiment of the present application, the grouping unit is configured to cluster the feature vectors of the pictures in the first picture set, and to group the pictures in the first picture set into groups based on the clustering result.

In an embodiment of the present application, the grouping unit includes:

a setting subunit, configured to set the number of cluster centers;

a clustering subunit, configured to cluster the feature vectors of the pictures in the first picture set; and

a dividing subunit, configured to group the pictures in the first picture set into groups, where the number of groups is the same as the number of cluster centers.

In an embodiment of the present application, the grouping unit is further configured to determine, based on the clustering result, the cluster center corresponding to each group of pictures.

In an embodiment of the present application, the apparatus further includes:

a reference center calculating unit, configured to calculate the reference center based on the cluster centers corresponding to the groups of pictures.

In an embodiment of the present application, the screening unit is configured to delete from the first picture set the one or more groups of pictures whose cluster centers are at a distance from the reference center greater than or equal to a preset threshold, to obtain the second picture set.

In an embodiment of the present application, the screening unit is configured to sort the distances between the cluster centers corresponding to the groups of pictures and the reference center in descending order, and to determine the M groups of pictures with the largest distances, M being a positive integer; and to delete the M groups of pictures from the first picture set, to obtain the second picture set.

The storage medium provided by an embodiment of the present application stores computer-executable instructions which, when executed by a processor, implement the picture screening method described above.

The computer device provided by an embodiment of the present application includes a memory, a processor, and computer-executable instructions stored in the memory and executable on the processor; the processor implements the picture screening method described above when executing the computer-executable instructions.

In the technical solution of the embodiments of the present application, a first picture set is acquired; a feature vector of each picture in the first picture set is extracted; the pictures in the first picture set are grouped into groups based on their feature vectors; a cluster center corresponding to each group of pictures is determined, and the distance between the cluster center corresponding to each group of pictures and a reference center is determined; and, based on those distances, one or more groups of pictures that satisfy a preset condition are deleted from the first picture set to obtain a second picture set. With this technical solution, computer vision technology is first used to process the crawled first picture set and obtain the feature vector of each picture in it. A clustering algorithm is then applied to the feature vectors, which groups the pictures in the first picture set. Finally, the junk pictures in the first picture set are cleaned out automatically. Pictures are thus cleaned automatically, providing an accurate source of picture data for artificial intelligence applications.

Brief description of the drawings

FIG. 1 is a schematic diagram of the hardware entities that exchange information in an embodiment of the present application;

FIG. 2 is a first schematic flowchart of the picture screening method according to an embodiment of the present application;

FIG. 3 is a second schematic flowchart of the picture screening method according to an embodiment of the present application;

FIG. 4 is a third schematic flowchart of the picture screening method according to an embodiment of the present application;

FIG. 5 is a fourth schematic flowchart of the picture screening method according to an embodiment of the present application;

FIG. 6 is a first schematic structural diagram of the picture screening apparatus according to an embodiment of the present application;

FIG. 7 is a second schematic structural diagram of the picture screening apparatus according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of the computer device according to an embodiment of the present application.

Detailed description

To provide a more thorough understanding of the features and technical content of the embodiments of the present application, the implementation of the embodiments is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the embodiments of the present application.

FIG. 1 is a schematic diagram of the hardware entities that exchange information in an embodiment of the present application. FIG. 1 includes a picture screening apparatus and server 1 to server n, where the picture screening apparatus exchanges information with the servers over a wired or wireless network. In one example, the picture screening apparatus is provided in a terminal, which may be, for example, a mobile phone, a desktop computer, a PC, or an all-in-one machine. The terminal provides at least the following two functions: 1) providing a user interface (UI, User Interface) to the user; 2) crawling pictures from server 1 to server n and performing the picture screening process. In another example, the picture screening apparatus is provided in a server, which crawls pictures from server 1 to server n and performs the picture screening process. In addition, this server can exchange information with a user-facing client: it receives the user's request to crawl pictures and perform the picture screening process, and it can send data such as picture screening results to the user's client, while the client is responsible for providing the UI to the user.

The example of FIG. 1 is only one system architecture that implements the embodiments of the present application, and the embodiments are not limited to the system structure shown in FIG. 1. The embodiments of the present application are proposed on the basis of this system architecture.

FIG. 2 is a first schematic flowchart of the picture screening method according to an embodiment of the present application. As shown in FIG. 2, the picture screening method includes the following steps:

Step 201: Acquire a first picture set.

In the embodiments of the present application, the first picture set may be acquired in, but is not limited to, the following way: a keyword (or key phrase) entered by the user is obtained, and pictures matching the keyword are crawled from websites (or databases) of various types. For example, if the keyword is "air conditioner", pictures matching "air conditioner" are crawled from websites of various types. Here, a picture matching "air conditioner" may be a picture that shows an air conditioner or a picture that contains the text "air conditioner". In one implementation, the website types can be set by the user, for example commercial websites, educational websites, entertainment websites, and so on, so that pictures matching the keyword are crawled selectively according to website type. In another implementation, the website type is not restricted, and pictures can be crawled from any website for which access is permitted.

In the above solution, the first picture set is the collection of pictures of one category that match the keyword, and it includes multiple pictures matching the keyword. However, the first picture set is likely to contain some junk pictures, which need to be deleted from it. For example, the first picture set includes picture 1, picture 2, picture 3, picture 4, and picture 5, of which picture 1 and picture 5 are junk pictures that need to be deleted from the first picture set. The embodiments of the present application delete junk pictures through the following steps.

Step 202: Extract a feature vector of each picture in the first picture set.

In the embodiments of the present application, computer vision technology is used to extract the feature vector of each picture in the first picture set. Here, computer vision technology refers to using a computer in place of the human eye to recognize and process pictures.

Further, the embodiments of the present application use deep learning (DL) technology to extract the feature vector of each picture in the first picture set. Deep learning can automatically learn feature vector representations from large amounts of data. The convolutional neural network (CNN) is an application of deep learning in the image domain. Its special structure of locally shared weights gives it distinct advantages in image processing, and its layout is closer to that of an actual biological neural network.

In image processing, a picture is represented as a vector of pixels. For example, a 1000×1000 picture can be represented as a vector of 1,000,000 elements. The vector data of the picture is input into a deep learning model, and after a series of operations (such as filtering, convolution, weighting, and adding a bias), the feature vector of the picture is obtained.
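The pixel-vector representation described above can be illustrated with a minimal sketch. It assumes NumPy and a randomly generated single-channel picture; the array contents and the names `picture` and `pixel_vector` are hypothetical and are not part of the application. The deep learning model that would consume this vector is not shown.

```python
import numpy as np

# Hypothetical 1000x1000 single-channel picture with random pixel values.
rng = np.random.default_rng(0)
picture = rng.integers(0, 256, size=(1000, 1000), dtype=np.uint8)

# Flatten the 2-D pixel grid into one long vector of pixels.
pixel_vector = picture.reshape(-1)

print(pixel_vector.shape)  # (1000000,)
```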

For example, the feature vector of picture 1 is P1, the feature vector of picture 2 is P2, the feature vector of picture 3 is P3, the feature vector of picture 4 is P4, and the feature vector of picture 5 is P5.

Step 203: Group the pictures in the first picture set into groups based on the feature vectors of the pictures in the first picture set.

In the embodiments of the present application, the feature vector of a picture characterizes that picture. The closer the feature vectors of two pictures are to each other, the more similar the two pictures are; the farther apart the feature vectors are, the less similar the two pictures are.

Suppose there are two feature vectors X and Y, each containing n-dimensional features, specifically X = (x1, x2, x3, ..., xn) and Y = (y1, y2, y3, ..., yn). The distance between X and Y can be calculated by, but is not limited to, the following methods:

Method 1: Calculate the Euclidean distance between X and Y.

Specifically, the Euclidean distance between X and Y is

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Method 2: Calculate the Manhattan distance between X and Y.

Specifically, the Manhattan distance between X and Y is

$$d(X, Y) = \sum_{i=1}^{n} |x_i - y_i|$$

Method 3: Calculate the Minkowski distance between X and Y.

Specifically, the Minkowski distance between X and Y is

$$d(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{p} \right)^{1/p}$$

where p ≥ 1 is the order of the distance; p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.

Method 4: Calculate the cosine similarity of X and Y.

Specifically, the cosine similarity of X and Y is

$$\cos(X, Y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \, \sqrt{\sum_{i=1}^{n} y_i^2}}$$
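The four measures named in Methods 1 to 4 can be sketched in plain Python as follows. This is an illustration using the standard textbook definitions, not code from the application; the function names and the order parameter `p` are assumptions made for the example.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Generalized distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product divided by the product of the two vector norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


print(euclidean([0, 0], [3, 4]))  # 5.0
print(manhattan([0, 0], [3, 4]))  # 7
```

Note that cosine similarity grows as vectors become more alike, whereas the three distances shrink, so a clustering routine that expects a distance would need to use, for example, one minus the cosine similarity.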

Using any one of the above methods, the embodiments of the present application can cluster the feature vectors of the pictures in the first picture set, and group the pictures in the first picture set into groups based on the clustering result.

Take K-means clustering as an example. K-means clusters around several points in space (for example, N points) as centers, assigning each object to the center closest to it. Applied to the embodiments of the present application, the objects being clustered are feature vectors, and the clustering process roughly includes:

1) Initialization: set the number of cluster centers to N.

Select (or manually specify) N feature vectors as the cluster centers.

2) Cluster the feature vectors of the pictures in the first picture set.

2.1) Gather the other feature vectors to the cluster centers on a nearest-center basis, obtaining N classes.

2.2) Calculate the center position of each class.

2.3) Use the center positions calculated in 2.2) as the new cluster centers, and repeat 2.1) to 2.3) until the positions of the cluster centers converge.

It can be seen that the cluster center corresponding to each group of pictures can be determined from the clustering result.

3) Group the pictures in the first picture set into groups, where the number of groups is the same as the number of cluster centers.

For example, the number of cluster centers is set to 20. After the feature vectors of the pictures are clustered, all the pictures are divided into 20 groups according to the clustering result, and 20 cluster centers are obtained.
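The K-means procedure in steps 1) to 3) can be sketched as follows. This is a minimal illustration assuming NumPy, Euclidean distance, and random selection of the initial centers; the function name `k_means` and its parameters `n_iters` and `seed` are hypothetical, and a production system would more likely rely on an existing clustering library.

```python
import numpy as np


def k_means(features: np.ndarray, n_clusters: int, n_iters: int = 100, seed: int = 0):
    """Cluster row vectors in `features`; return (labels, centers)."""
    rng = np.random.default_rng(seed)
    # 1) Initialization: pick n_clusters distinct feature vectors as centers.
    chosen = rng.choice(len(features), size=n_clusters, replace=False)
    centers = features[chosen]
    for _ in range(n_iters):
        # 2.1) Assign every feature vector to its nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2.2) Recompute each center as the mean of its group
        #      (an empty group keeps its previous center).
        new_centers = np.array([
            features[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        # 2.3) Stop once the centers no longer move.
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # 3) labels[i] is the group of picture i; there are n_clusters groups.
    return labels, centers


features = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centers = k_means(features, n_clusters=2)
print(labels, centers)
```

In the toy data above, the first three vectors end up in one group and the last three in the other; which group receives label 0 depends on the random initialization.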

Step 204: Determine a cluster center corresponding to each group of pictures, and determine the distance between the cluster center corresponding to each group of pictures and a reference center.

In the embodiments of the present application, the cluster center of each group of pictures represents the overall features of that group. A reference center O can be calculated from the cluster centers corresponding to the groups of pictures.

For example, there are 10 groups of pictures whose cluster centers are O1, O2, O3, O4, O5, O6, O7, O8, O9, and O10, and the reference center O is the mean of these 10 cluster centers. Note that the cluster center of a group may be the mean of the feature vectors in that group. For example, if a group contains the feature vectors P1, P2, and P3, the cluster center of the group is (P1+P2+P3)/3.

In the embodiments of the present application, after the cluster center corresponding to each group of pictures is determined, the distance between each cluster center and the reference center is calculated.

For example, there are 10 cluster centers, O1, O2, O3, O4, O5, O6, O7, O8, O9, and O10. The distance from each of these 10 cluster centers to the reference center O can be calculated by, but is not limited to, the four distance calculation methods in step 203.
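As an illustration of step 204, the sketch below computes the reference center as the mean of the cluster centers and then the Euclidean distance from each cluster center to it. NumPy, the function names, and the three example centers are assumptions made for this example; any of the distance measures from step 203 could be substituted.

```python
import numpy as np


def reference_center(cluster_centers: np.ndarray) -> np.ndarray:
    """Reference center O: the mean of all cluster centers."""
    return cluster_centers.mean(axis=0)


def distances_to_reference(cluster_centers: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Euclidean distance from every cluster center to the reference center."""
    return np.linalg.norm(cluster_centers - reference, axis=1)


# Hypothetical cluster centers O1, O2, O3 in a 2-D feature space.
cluster_centers = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
O = reference_center(cluster_centers)
distances = distances_to_reference(cluster_centers, O)

print(O)          # [4. 0.]
print(distances)  # [4. 2. 6.]
```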

Step 205: Based on the distance between the cluster center corresponding to each group of pictures and the reference center, delete one or more groups of pictures that satisfy a preset condition from the first picture set, to obtain a second picture set.

In the embodiments of the present application, the preset condition serves to specify that the one or more groups of pictures relatively far from the reference center are deleted from the first picture set. The groups of pictures that satisfy the preset condition may also be called junk pictures. The feature vectors of these junk pictures are relatively far from those of the other pictures, so their similarity to the other pictures is low. After these junk pictures are deleted from the first picture set, a second picture set of more uniform type is obtained. The technical solution of the embodiments of the present application implements picture screening through an automated computer process, greatly reducing the cost of manual cleaning.

FIG. 3 is a second schematic flowchart of the picture screening method according to an embodiment of the present application. As shown in FIG. 3, the picture screening method includes the following steps:

Step 301: Acquire a first picture set.

Step 302: Extract a feature vector of each picture in the first picture set.

Further, the embodiments of the present application use DL technology to extract the feature vector of each picture in the first picture set. Deep learning can automatically learn feature vector representations from large amounts of data. The CNN is an application of deep learning in the image domain. Its special structure of locally shared weights gives it distinct advantages in image processing, and its layout is closer to that of an actual biological neural network.

Step 303: Group the pictures in the first picture set into groups based on the feature vectors of the pictures in the first picture set.

The embodiments of the present application cluster the feature vectors of the pictures in the first picture set, and group the pictures in the first picture set into groups based on the clustering result.

Step 304: Determine a cluster center corresponding to each group of pictures, and determine the distance between the cluster center corresponding to each group of pictures and a reference center.

Step 305: Delete from the first picture set the one or more groups of pictures whose cluster centers are at a distance from the reference center greater than or equal to a preset threshold, to obtain a second picture set.

In the embodiments of the present application, the greater the distance between a cluster center and the reference center, the higher the probability that the group of pictures corresponding to that cluster center consists of junk pictures. Conversely, the smaller the distance, the lower that probability.

In the embodiments of the present application, a threshold is set. If the distance between a cluster center and the reference center is greater than or equal to the threshold, the group of pictures corresponding to that cluster center is regarded as junk pictures and is deleted from the first picture set, yielding a second picture set of more uniform type. The technical solution of the embodiments of the present application implements picture screening through an automated computer process, greatly reducing the cost of manual cleaning.
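The threshold rule of step 305 can be sketched as follows. The dictionary layout (group identifier mapped to a list of picture file names), the function name, and the example values are hypothetical and chosen only for illustration.

```python
def delete_groups_by_threshold(groups, distances, threshold):
    """Keep only the groups whose center-to-reference distance is below threshold.

    groups:    dict mapping group id -> list of pictures in that group
    distances: dict mapping group id -> distance of its cluster center
               from the reference center
    """
    kept = {}
    for group_id, pictures in groups.items():
        if distances[group_id] >= threshold:
            continue  # distance >= threshold: treat the whole group as junk
        kept[group_id] = pictures
    return kept


groups = {"g1": ["a.jpg", "b.jpg"], "g2": ["c.jpg"], "g3": ["d.jpg", "e.jpg"]}
distances = {"g1": 0.4, "g2": 2.5, "g3": 1.0}

second_set = delete_groups_by_threshold(groups, distances, threshold=1.0)
print(sorted(second_set))  # ['g1']
```

Because the comparison is "greater than or equal to", group `g3`, whose distance equals the threshold exactly, is deleted along with `g2`.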

FIG. 4 is a third schematic flowchart of the picture screening method according to an embodiment of the present application. As shown in FIG. 4, the picture screening method includes the following steps:

Step 401: Acquire a first picture set.

Step 402: Extract a feature vector of each picture in the first picture set.

Step 403: Group the pictures in the first picture set into groups based on the feature vectors of the pictures in the first picture set.

Step 404: Determine a cluster center corresponding to each group of pictures, and determine the distance between the cluster center corresponding to each group of pictures and a reference center.

Step 405: Sort the distances between the cluster centers corresponding to the groups of pictures and the reference center in descending order, and determine the M groups of pictures with the largest distances, M being a positive integer; delete the M groups of pictures from the first picture set, to obtain a second picture set.

In the embodiments of the present application, the distances between the cluster centers of the groups of pictures and the reference center are sorted in descending order, and the M groups of pictures with the largest distances are deleted from the first picture set, yielding a second picture set of more uniform type. For example, there are 5 groups of pictures whose cluster centers are O1, O2, O3, O4, and O5, and the distances between these 5 cluster centers and the reference center are S1, S2, S3, S4, and S5, respectively. Sorted in descending order, the distances are S2, S4, S3, S5, S1. If 2 groups of pictures need to be deleted, the two groups corresponding to O2 and O4 are deleted from the first picture set. The technical solution of the embodiments of the present application implements picture screening through an automated computer process, greatly reducing the cost of manual cleaning.
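The sort-and-delete rule of step 405 can be sketched as follows. The dictionary layout, the function name, and the numeric distances are hypothetical; the values are chosen so that, as in the example above, the groups of O2 and O4 are the two farthest from the reference center.

```python
def delete_farthest_groups(groups, distances, m):
    """Delete the m groups whose cluster centers are farthest from the reference center.

    groups:    dict mapping group id -> list of pictures in that group
    distances: dict mapping group id -> distance of its cluster center
               from the reference center
    """
    # Sort the group ids by distance, largest first.
    ranked = sorted(distances, key=distances.get, reverse=True)
    farthest = set(ranked[:m])
    return {gid: pics for gid, pics in groups.items() if gid not in farthest}


groups = {
    "O1": ["p1.jpg"],
    "O2": ["p2.jpg"],
    "O3": ["p3.jpg"],
    "O4": ["p4.jpg"],
    "O5": ["p5.jpg"],
}
distances = {"O1": 0.2, "O2": 3.1, "O3": 1.4, "O4": 2.6, "O5": 0.9}

second_set = delete_farthest_groups(groups, distances, m=2)
print(sorted(second_set))  # ['O1', 'O3', 'O5']
```

Unlike the threshold rule, this variant always removes exactly M groups, which may be preferable when a suitable absolute threshold is hard to choose in advance.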

FIG. 5 is a fourth schematic flowchart of the picture screening method according to an embodiment of the present application. As shown in FIG. 5, the picture screening method includes the following steps:

Step 501: Acquire a keyword and crawl pictures that match the keyword to form a first picture set.

Step 502: Extract the feature vector of each picture in the first picture set.

Step 503: Set the number of cluster centers to N.

Step 504: Cluster the feature vectors of the pictures, and divide the pictures into N groups based on the clustering result.

Step 505: Determine the cluster center corresponding to each group of pictures based on the clustering result, and calculate a reference center based on the cluster centers.

Step 506: Calculate the distance between each cluster center and the reference center.

Step 507: Sort the distances between the cluster centers and the reference center from largest to smallest.

Step 508: Delete from the first picture set the M groups of pictures corresponding to the M cluster centers farthest from the reference center, to obtain a second picture set.
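Steps 503 to 508 can be sketched end to end as follows. This is a minimal sketch, not the applicant's implementation. It assumes the feature vectors of step 502 are already available as plain tuples, uses a simple k-means routine as the clustering algorithm, uses Euclidean distance, and takes the reference center to be the mean of the cluster centers. None of these choices is fixed by the steps above, and all function names and sample values are hypothetical.

```python
from math import dist

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))

def kmeans(vectors, n_clusters, iterations=20):
    """Plain k-means with evenly spaced initial centers; returns (centers, labels)."""
    if len(vectors) < n_clusters:
        raise ValueError("need at least as many vectors as clusters")
    step = len(vectors) // n_clusters
    centers = [vectors[i * step] for i in range(n_clusters)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest center.
        labels = [min(range(n_clusters), key=lambda k: dist(v, centers[k]))
                  for v in vectors]
        # Move each center to the mean of its members (unchanged if it has none).
        for k in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return centers, labels

def screen_pictures(features, n_clusters, m):
    """features maps picture name -> feature vector; returns the names that are kept."""
    names = list(features)
    vectors = [features[name] for name in names]
    centers, labels = kmeans(vectors, n_clusters)       # steps 503 to 505
    reference = mean(centers)                           # step 505 (assumed: mean of centers)
    distances = [dist(c, reference) for c in centers]   # step 506
    ranked = sorted(range(n_clusters),
                    key=lambda k: distances[k], reverse=True)  # step 507
    deleted = set(ranked[:m])                           # step 508
    return [name for name, lab in zip(names, labels) if lab not in deleted]

# Hypothetical feature vectors: four similar pictures and two outliers.
features = {
    "cat_0": (0.0, 0.0), "cat_1": (0.2, 0.1),
    "cat_2": (1.0, 1.0), "cat_3": (1.1, 0.9),
    "ad_0": (9.0, 9.0), "ad_1": (9.2, 8.9),
}
second_set = screen_pictures(features, n_clusters=3, m=1)
print(second_set)
```

In this toy data the two outlier pictures form the group whose cluster center is farthest from the reference center, so that group is the one deleted. Note that with only two clusters the mean of the centers is equidistant from both, so this particular choice of reference center is informative only when N is at least 3.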

FIG. 6 is a first schematic structural diagram of a picture screening apparatus according to an embodiment of the present application. As shown in FIG. 6, the picture screening apparatus includes:

an acquiring unit 601, configured to acquire a first picture set;

an extracting unit 602, configured to extract the feature vector of each picture in the first picture set;

a grouping unit 603, configured to group the pictures in the first picture set based on the feature vector of each picture in the first picture set;

a distance determining unit 604, configured to determine the cluster center corresponding to each group of pictures, and to determine the distance between each group's cluster center and a reference center;

a screening unit 605, configured to delete, based on the distance between each group's cluster center and the reference center, one or more groups of pictures that meet a preset condition from the first picture set, to obtain a second picture set.

Those skilled in the art should understand that the functions implemented by the units of the picture screening apparatus shown in FIG. 6 can be understood with reference to the foregoing description of the picture screening method. The functions of these units may be implemented by a program running on a processor or by specific logic circuits.

FIG. 7 is a second schematic structural diagram of a picture screening apparatus according to an embodiment of the present application. As shown in FIG. 7, the picture screening apparatus includes:

an acquiring unit 701, configured to acquire a first picture set;

an extracting unit 702, configured to extract the feature vector of each picture in the first picture set;

a grouping unit 703, configured to group the pictures in the first picture set based on the feature vector of each picture in the first picture set;

a distance determining unit 704, configured to determine the cluster center corresponding to each group of pictures, and to determine the distance between each group's cluster center and a reference center;

a screening unit 705, configured to delete, based on the distance between each group's cluster center and the reference center, one or more groups of pictures that meet a preset condition from the first picture set, to obtain a second picture set.

In an embodiment, the grouping unit 703 is configured to cluster the feature vectors of the pictures in the first picture set, and to group the pictures in the first picture set based on the clustering result.

In an embodiment, the grouping unit 703 includes:

a setting subunit 7031, configured to set the number of cluster centers;

a clustering subunit 7032, configured to cluster the feature vectors of the pictures in the first picture set;

a dividing subunit 7033, configured to divide the pictures in the first picture set into groups, where the number of groups is the same as the number of cluster centers.

In an embodiment, the grouping unit 703 is further configured to determine the cluster center corresponding to each group of pictures based on the clustering result.

In an embodiment, the apparatus further includes:

a reference center calculation unit 706, configured to calculate the reference center based on the cluster centers corresponding to the groups of pictures.

In an embodiment, the screening unit 705 is configured to delete from the first picture set the one or more groups of pictures whose cluster centers are at a distance from the reference center greater than or equal to a preset threshold, to obtain a second picture set.
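The threshold-based preset condition can be sketched as follows. The function name `drop_groups_beyond_threshold`, the group names, the coordinates, and the threshold value are hypothetical, and Euclidean distance is assumed. The example places one cluster center exactly at the threshold to show that the comparison is "greater than or equal to".

```python
from math import dist

def drop_groups_beyond_threshold(groups, centers, reference, threshold):
    """Keep only groups whose cluster center is closer to the reference than threshold."""
    return {
        name: pics
        for name, pics in groups.items()
        # A group is deleted when its distance is >= threshold, so keep it only if < threshold.
        if dist(centers[name], reference) < threshold
    }

# Hypothetical data: distances to the reference center are 0, 5 and 10.
centers = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}
groups = {"A": ["a1.jpg", "a2.jpg"], "B": ["b1.jpg"], "C": ["c1.jpg"]}
reference = (0.0, 0.0)

second_set = drop_groups_beyond_threshold(groups, centers, reference, threshold=5.0)
print(second_set)
```

Unlike deleting a fixed number M of groups, a threshold may delete any number of groups, including none, depending on how the cluster centers are spread.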

In another embodiment, the screening unit 705 is configured to sort the distances between the cluster centers of the groups and the reference center from largest to smallest, and to determine the M groups of pictures with the largest distances, where M is a positive integer; and to delete the M groups of pictures from the first picture set to obtain a second picture set.

Those skilled in the art should understand that the functions implemented by the units of the picture screening apparatus shown in FIG. 7 can be understood with reference to the foregoing description of the picture screening method. The functions of these units may be implemented by a program running on a processor or by specific logic circuits.
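As one illustration of units realized by a program running on a processor, the apparatus can be modeled as an object whose units are interchangeable callables. This is a sketch under assumptions, not the structure disclosed in the figures. The class name, field names, stub units, and sample data are hypothetical; each cluster center is assumed to be the mean of its group's feature vectors, and the reference center the mean of the cluster centers.

```python
from dataclasses import dataclass
from math import dist
from typing import Callable, Dict, List, Tuple

Vector = Tuple[float, ...]

def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

@dataclass
class PictureScreeningApparatus:
    acquire: Callable[[], List[str]]                            # acquiring unit
    extract: Callable[[str], Vector]                            # extracting unit
    group: Callable[[Dict[str, Vector]], Dict[int, List[str]]]  # grouping unit
    meets_condition: Callable[[float], bool]                    # preset condition of the screening unit

    def run(self) -> List[str]:
        first_set = self.acquire()
        features = {pic: self.extract(pic) for pic in first_set}
        groups = self.group(features)
        # Distance determining unit (assumed: center = mean of the group's vectors,
        # reference center = mean of the cluster centers).
        centers = {g: mean([features[p] for p in pics]) for g, pics in groups.items()}
        reference = mean(list(centers.values()))
        # Screening unit: delete groups whose distance meets the preset condition.
        deleted = {g for g, c in centers.items()
                   if self.meets_condition(dist(c, reference))}
        return [p for g, pics in groups.items() if g not in deleted for p in pics]

# Stub units with made-up data, for illustration only.
FEATURES = {"a.jpg": (0.0, 0.0), "b.jpg": (0.0, 1.0), "c.jpg": (1.0, 0.0), "d.jpg": (9.0, 9.0)}
LABELS = {"a.jpg": 0, "b.jpg": 0, "c.jpg": 1, "d.jpg": 2}

def group_by_label(features: Dict[str, Vector]) -> Dict[int, List[str]]:
    """Stand-in for a real clustering step: groups pictures by a fixed label table."""
    groups: Dict[int, List[str]] = {}
    for pic in features:
        groups.setdefault(LABELS[pic], []).append(pic)
    return groups

apparatus = PictureScreeningApparatus(
    acquire=lambda: list(FEATURES),
    extract=FEATURES.__getitem__,
    group=group_by_label,
    meets_condition=lambda d: d >= 6.0,
)
second_set = apparatus.run()
print(second_set)
```

Passing the preset condition in as a callable lets the same object cover either screening variant: a threshold test as shown here, or a rule that marks the M largest distances.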

If the above apparatus of the embodiments of the present application is implemented in the form of software functional modules and sold or used as a standalone product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.

Correspondingly, an embodiment of the present application further provides a storage medium in which computer-executable instructions are stored; when executed by a processor, the computer-executable instructions implement the above picture screening method of the embodiments of the present application.

FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in FIG. 8, the computer device includes a memory 801, a processor 802, and computer-executable instructions stored in the memory 801 and executable on the processor 802. When executing the computer-executable instructions, the processor 802 implements the following method steps:

acquiring a first picture set;

The above description of the computer device is similar to the description of the method; its beneficial effects are the same as those of the method and are not repeated here.

The technical solutions described in the embodiments of the present application may be combined arbitrarily, provided that they do not conflict.

In the several embodiments provided by the present application, it should be understood that the disclosed method and smart device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of another form.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may all be integrated into one second processing unit, each unit may serve separately as a single unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The foregoing is only a specific implementation of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application.

Claims

1. A picture screening method, the method comprising:

acquiring a first picture set;

extracting a feature vector of each picture in the first picture set;

grouping the pictures in the first picture set based on the feature vector of each picture in the first picture set;

determining a cluster center corresponding to each group of pictures, and determining a distance between the cluster center corresponding to each group of pictures and a reference center;

deleting, based on the distance between the cluster center corresponding to each group of pictures and the reference center, one or more groups of pictures that meet a preset condition from the first picture set, to obtain a second picture set.
2. The picture screening method according to claim 1, wherein the grouping the pictures in the first picture set based on the feature vector of each picture in the first picture set comprises:

clustering the feature vectors of the pictures in the first picture set, and grouping the pictures in the first picture set based on a clustering result.
3. The picture screening method according to claim 2, wherein the clustering the feature vectors of the pictures in the first picture set, and grouping the pictures in the first picture set based on a clustering result, comprises:

setting a number of cluster centers;

clustering the feature vectors of the pictures in the first picture set;

dividing the pictures in the first picture set into groups, wherein the number of groups is the same as the number of cluster centers.
4. The picture screening method according to claim 2 or 3, wherein the determining a cluster center corresponding to each group of pictures comprises:

determining, based on the clustering result, the cluster center corresponding to each group of pictures.
5. The picture screening method according to claim 4, wherein the method further comprises:

calculating the reference center based on the cluster centers corresponding to the groups of pictures.
6. The picture screening method according to claim 1, wherein the deleting, based on the distance between the cluster center corresponding to each group of pictures and the reference center, one or more groups of pictures that meet a preset condition from the first picture set, to obtain a second picture set, comprises:

deleting from the first picture set the one or more groups of pictures whose cluster centers are at a distance from the reference center greater than or equal to a preset threshold, to obtain the second picture set.
7. The picture screening method according to claim 1, wherein the deleting, based on the distance between the cluster center corresponding to each group of pictures and the reference center, one or more groups of pictures that meet a preset condition from the first picture set, to obtain a second picture set, comprises:

sorting the distances between the cluster centers corresponding to the groups of pictures and the reference center from largest to smallest, and determining the M groups of pictures with the largest distances, wherein M is a positive integer;

deleting the M groups of pictures from the first picture set to obtain the second picture set.
8. A picture screening apparatus, the apparatus comprising:

an acquiring unit, configured to acquire a first picture set;

an extracting unit, configured to extract a feature vector of each picture in the first picture set;

a grouping unit, configured to group the pictures in the first picture set based on the feature vector of each picture in the first picture set;

a distance determining unit, configured to determine a cluster center corresponding to each group of pictures, and to determine a distance between the cluster center corresponding to each group of pictures and a reference center;

a screening unit, configured to delete, based on the distance between the cluster center corresponding to each group of pictures and the reference center, one or more groups of pictures that meet a preset condition from the first picture set, to obtain a second picture set.
9. The picture screening apparatus according to claim 8, wherein the grouping unit is configured to cluster the feature vectors of the pictures in the first picture set, and to group the pictures in the first picture set based on a clustering result.
10. The picture screening apparatus according to claim 9, wherein the grouping unit comprises:

a setting subunit, configured to set a number of cluster centers;

a clustering subunit, configured to cluster the feature vectors of the pictures in the first picture set;

a dividing subunit, configured to divide the pictures in the first picture set into groups, wherein the number of groups is the same as the number of cluster centers.
11. The picture screening apparatus according to claim 9 or 10, wherein the grouping unit is further configured to determine, based on the clustering result, the cluster center corresponding to each group of pictures.
12. The picture screening apparatus according to claim 11, wherein the apparatus further comprises:

a reference center calculation unit, configured to calculate the reference center based on the cluster centers corresponding to the groups of pictures.
13. The picture screening apparatus according to claim 8, wherein the screening unit is configured to delete from the first picture set the one or more groups of pictures whose cluster centers are at a distance from the reference center greater than or equal to a preset threshold, to obtain the second picture set.
14. The picture screening apparatus according to claim 8, wherein the screening unit is configured to sort the distances between the cluster centers corresponding to the groups of pictures and the reference center from largest to smallest, and to determine the M groups of pictures with the largest distances, wherein M is a positive integer; and to delete the M groups of pictures from the first picture set to obtain the second picture set.
15. A storage medium having computer-executable instructions stored thereon, wherein the computer-executable instructions, when executed by a processor, implement the method steps according to any one of claims 1 to 7.
16. A computer device, comprising a memory, a processor, and computer-executable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-executable instructions, implements the method steps according to any one of claims 1 to 7.