CN111046209B - Image clustering retrieval system

Info

Publication number: CN111046209B
Application number: CN201911249152.4A
Authority: CN
Inventors: 张峰; 李淼; 赵婷
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2019-12-09
Filing date: 2019-12-09
Publication date: 2023-07-25
Anticipated expiration: 2039-12-09
Also published as: CN111046209A

Abstract

The invention belongs to the field of image processing and software, relates in particular to an image clustering retrieval system, and aims to solve two problems: low clustering accuracy caused by the randomness and uncertainty of image shooting, and high retrieval difficulty when images can only be searched along a time axis. The system comprises an image retrieval module, a text retrieval module, a browsing information retrieval module and a database. The database is configured to acquire and store a plurality of images from a specific activity and, through unsupervised clustering, to obtain a cluster image set for each target person together with the color labels of set body parts and the feature vector of that person. The image retrieval module is configured to match an input person image against the cluster image sets in the database. The text retrieval module is configured to acquire the matching cluster image set in the database based on input color information. The browsing information retrieval module is configured to acquire the matching cluster image set in the database according to images whose attention degree is greater than a set threshold. The invention improves clustering accuracy through unsupervised clustering and reduces retrieval difficulty by offering different retrieval modes.

Description

Image clustering retrieval system

Technical Field

The invention belongs to the field of image processing and software, and particularly relates to an image clustering retrieval system.

Background

With the rapid development of network and multimedia technologies, digital information such as sound, graphics, images, video and animation has expanded rapidly. Images attract particular attention as a medium with rich content and intuitive presentation. In real life a large number of images are generated at every moment, and how to find the images that meet a user's needs among them is a problem researchers need to solve. For example, skiers want to keep the highlights of their runs, so many ski resorts photograph skiers on the slopes and upload the pictures to skiing software, from which skiers can download them as needed. However, as the number of skiers grows rapidly, so does the number of uploaded images, and the software on the market only supports searching along a general time axis. This makes it much harder for users of the skiing software to find their own images, and users may even miss the images they want because they browse too quickly. There is therefore a strong need for an image clustering retrieval system that efficiently clusters and filters all images and feeds them back to the user.

Ski-resort images have a number of characteristics that significantly affect image clustering. For example: because of the shooting angle and distance, the target person is small in the field of view; changes in illumination over the course of a day in the snow season strongly affect image quality; ski hats and goggles occlude the face, which makes facial recognition difficult, so other features must be extracted for clustering; and because the number of skiers is unknown, the images cannot be clustered into a fixed number of categories. For cluster retrieval of ski-resort images, the key is to extract effective features of the skiers and to feed back suitable images according to the user's requirements or operating habits. The invention therefore provides an image clustering retrieval system.

Disclosure of Invention

In order to solve the problems in the prior art, namely the low clustering accuracy caused by the randomness and uncertainty of image shooting in specific activities and the high retrieval difficulty caused by searching images along a time axis, the invention provides an image clustering retrieval system which comprises one or more clients and a server; the client is connected with the server and comprises an image retrieval module and/or a text retrieval module and/or a browsing information retrieval module; the server comprises a database;

the database is configured to acquire and store a plurality of images of a plurality of target persons in a specific activity, perform image clustering on the target persons through an unsupervised clustering method, acquire a cluster image set of each target person, and acquire the color labels of set parts of the corresponding target person in each cluster image set and the feature vector corresponding to the target person;

the image retrieval module is configured to match an input person image with the cluster image set of each target person in the database to obtain the class of images with the highest matching degree;

the text retrieval module is configured to acquire from the database the cluster image set of the target person matching the input color information of the set part;

the browsing information retrieval module is configured to acquire the attention degree of each image browsed by the user according to a set attention degree calculation rule and, based on the images whose attention degree is greater than a set threshold, to acquire from the database the cluster image set of the target person matching those images.

In some preferred embodiments, "image clustering of target persons by an unsupervised clustering method" is performed by:

extracting feature vectors of a plurality of images of a plurality of target persons respectively, and performing dimension reduction processing to obtain low-dimension feature vectors corresponding to the images;

acquiring the cluster centers of the feature vectors of each class of images in the database and calculating the distance between each low-dimensional feature vector and each cluster center; if the distance is smaller than a preset distance threshold, assigning the image to that class and updating the value of the corresponding cluster center; otherwise, taking the value of the low-dimensional feature vector as a new cluster center.

In some preferred embodiments, the method of extracting feature vectors of a plurality of images of a plurality of target persons and performing dimension reduction processing includes:

extracting the feature vectors of the images of the target persons based on a convolutional neural network, and performing dimension reduction processing on the feature vectors of the images through an autoencoder;

acquiring the color histograms of all parts of the target persons in the images of the plurality of target persons based on a convolutional neural network, and performing dimension reduction processing on the color histograms through the principal component analysis (PCA) algorithm.

In some preferred embodiments, the method of "obtaining the color label of the set portion of the target person and the feature vector corresponding to the target person in each clustered image set" includes:

obtaining, based on the convolutional neural network, the color histograms corresponding to the set parts of the target person in each cluster image set, and obtaining a color label from each set part and its corresponding color histogram;

acquiring the cluster center of all images corresponding to the target person in the database, and taking that cluster center as the feature vector corresponding to the target person.

In some preferred embodiments, the text retrieval module acquires, based on the input color information of the set part, the cluster image set of the target person matching that information, and the method comprises the following steps:

acquiring color information of a set part of an image to be retrieved by a user in a text or voice input mode;

and acquiring a cluster image set of the target person matched with the color information in the database according to the color information.

In some preferred embodiments, the method for acquiring the attention degree of the user browsing the image according to the set attention degree calculation rule in the browsing information retrieval module includes:

acquiring the attention degree of an image browsed by the user according to the time the user spends browsing the image and/or the frequency with which the user clicks it and/or whether the user downloads, follows or likes it.

The invention has the beneficial effects that:

The invention improves clustering accuracy through unsupervised clustering and reduces retrieval difficulty by offering different retrieval modes. Feature vectors of the images in the image set are extracted and reduced in dimension in two ways, namely a convolutional neural network combined with an autoencoder and a convolutional neural network combined with the principal component analysis algorithm, and the image categories, the color labels and the feature vectors corresponding to the target persons are obtained through unsupervised clustering. This overcomes the shortcomings of a single clustering mode and improves clustering accuracy. For retrieval, three methods are provided according to the requirements and operating habits of users, namely image retrieval, text retrieval and browsing information retrieval, which reduces retrieval difficulty and improves user satisfaction.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings.

FIG. 1 is a schematic diagram of a system architecture of an image cluster retrieval system of one embodiment of the invention;

FIG. 2 is a flow diagram of a first unsupervised clustering scheme in accordance with one embodiment of the present invention;

FIG. 3 is a flow diagram of a second unsupervised clustering scheme in accordance with one embodiment of the present invention;

FIG. 4 is a schematic diagram of a retrieval process of an image cluster retrieval system of one embodiment of the invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other.

The image clustering retrieval system of the invention, as shown in FIG. 1, comprises one or more clients and a server; the client is connected with the server and comprises an image retrieval module and/or a text retrieval module and/or a browsing information retrieval module; the server comprises a database.

In order to describe the image clustering retrieval system of the present invention more clearly, each module in one embodiment of the system is described in detail below with reference to the accompanying drawings.

1. The database is configured to acquire and store a plurality of images of a plurality of target persons in a specific activity, perform image clustering on the target persons through an unsupervised clustering method, acquire a cluster image set of each target person, and acquire the color labels of set parts of the corresponding target person in each cluster image set and the feature vector corresponding to the target person.

The invention mainly aims at clustering images from specific activities. In this embodiment, ski-resort images are preferably used as the objects of clustering and retrieval, and the images are clustered in an unsupervised manner. Because of the randomness and uncertainty of the skiers in the captured images, the features of each skier cannot be extracted in advance to match against the images, so the images can only be clustered without supervision. Once all images corresponding to the same person have been found, the feature vector corresponding to that target person is obtained. At the same time, the images of the target person can be labeled as required, for example "red ski suit, black ski pants, blue ski boots", to obtain the color label corresponding to the target person, that is, the color label of that class of images.

The present embodiment provides two methods of unsupervised clustering, and the two clustering methods can be roughly summarized into two steps: firstly, extracting features of lower dimensionality of an image by performing dimension reduction treatment on the images to be clustered; and secondly, clustering the extracted features by using a clustering algorithm. Next, two unsupervised clustering methods are described.

The first unsupervised clustering method: as shown in FIG. 2, a neural network pre-trained on a large classification data set, such as VGG16, ResNet or InceptionNet, is adopted. The ski images to be clustered (the images in the database) are input into the pre-trained neural network model, and the output of the last convolutional layer of the network is extracted as the feature vector of each image. This step already reduces the dimensionality of the input image; for example, a 224×224×3 RGB input image can be reduced to 1024 dimensions.

An autoencoder is then trained on the dimension-reduced feature vectors of the images in order to extract feature vectors of even lower dimension. The autoencoder is trained without supervision and consists of an encoding part and a decoding part. The encoding part compresses the input into a lower-dimensional feature vector; the decoding part does the opposite, restoring the original input data from the compressed feature vector through neuron layers symmetrical to those of the encoding part. The lower-dimensional feature vector produced by the encoding part can therefore be regarded as a good representation of the original 224×224×3 RGB image.

Then the distances between the low-dimensional feature vectors obtained for the images are calculated; the distance may be the Euclidean distance, the Manhattan distance, the cosine of the included angle, or the like. The distance between the feature vectors of different images is compared with a preset distance threshold to determine whether the images show the same skier. When certain images are determined to show the same skier, the cluster center of their feature vectors can be determined. When a new image arrives, it is input into the feature extraction network, and the extracted features are compared with the existing cluster centers. If the difference from some cluster center is smaller than the set threshold, the image is assigned to the category of that cluster center and the value of the cluster center is updated. Otherwise, if the difference between the features of the image and every cluster center is greater than the set threshold, the image is considered to belong to a new category, and its feature vector is taken as a new cluster center. This continues until all images have a clustering result. In FIG. 2 and FIG. 3, Y indicates that the difference between the extracted features and a cluster center is smaller than the set threshold, and N indicates that it is greater than the set threshold.
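
The threshold-based clustering flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `IncrementalClusterer`, the running-mean update of the cluster center and the threshold value are assumptions, since the description only states that the center is "updated" and that the threshold is preset.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus the cosine of the angle between a and b (1.0 for zero vectors)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


class IncrementalClusterer:
    """Threshold-based online clustering (hypothetical helper class)."""

    def __init__(self, threshold, distance=euclidean):
        self.threshold = threshold
        self.distance = distance
        self.centers = []  # one center vector per cluster
        self.counts = []   # number of images assigned to each cluster

    def assign(self, feature):
        """Return the cluster index for `feature`, creating a cluster if needed."""
        if self.centers:
            dists = [self.distance(feature, c) for c in self.centers]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] < self.threshold:
                # Branch "Y": same person. Assumption: update the center as a running mean.
                n = self.counts[best] + 1
                self.centers[best] = [
                    c + (f - c) / n for c, f in zip(self.centers[best], feature)
                ]
                self.counts[best] = n
                return best
        # Branch "N": no center is close enough, so the feature seeds a new cluster.
        self.centers.append(list(feature))
        self.counts.append(1)
        return len(self.centers) - 1


clusterer = IncrementalClusterer(threshold=1.0)
features = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
labels = [clusterer.assign(f) for f in features]
```

Because no category count is fixed in advance, the number of clusters grows with the number of distinct skiers, which matches the requirement stated in the Background. Passing `distance=manhattan` or `distance=cosine_distance` swaps the metric; the threshold would then need to be chosen for that metric.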

The second unsupervised clustering method: as shown in FIG. 3, a trained neural network performs human parsing on the skier in each image (the images in the database) to obtain body parts such as the head, upper body, legs and feet, and the color histogram of each part of the skier is then computed for each image. Color histograms obtained in this way have a high dimension, so the principal component analysis (PCA) algorithm can be used to reduce the dimension of the histogram data, which lowers the amount of computation and facilitates subsequent classification. The dimension-reduced histogram data can also be regarded as a low-dimensional feature vector of the original image, and the subsequent clustering flow is the same as in the first unsupervised clustering method. In addition, all successfully classified images can be labeled according to the color of each body part, for example "red ski suit, black ski pants, blue ski boots". In the present invention, human parsing is preferably performed through a deep neural network in a sample-class manner or through traditional computer vision; in other embodiments, other human parsing methods may be selected.
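
The PCA step of this second method can be illustrated with a small standard-library sketch. It assumes that the per-image histograms are already available as equal-length lists of numbers; the function name `pca_reduce` and the use of power iteration with deflation are choices made for this example, and a production system would more likely rely on a numerical library.

```python
import math
import random


def pca_reduce(rows, n_components, iterations=100, seed=0):
    """Project `rows` onto their top principal components.

    Sketch only: power iteration with deflation on the covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / max(n - 1, 1)
            for j in range(d)] for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _k in range(n_components):
        # Arbitrary positive start vector, normalized to unit length.
        v = [rng.random() + 0.1 for _ in range(d)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _step in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the variance explained by this component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


# Toy "histograms" that all lie on one line in 3-D space.
histograms = [[t, 2.0 * t, 3.0 * t] for t in (0.0, 1.0, 2.0, 3.0, 4.0)]
reduced = pca_reduce(histograms, n_components=1)
```

In this toy input a single component carries all the variance, so the one-dimensional output preserves the distances between the original rows. The reduced vectors can then be passed to the same threshold-based clustering used in the first method.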

Image clustering is carried out on the target persons by the two unsupervised clustering methods to obtain a cluster image set of each target person, together with the color labels of the set parts of the corresponding target person in each cluster image set and the feature vector corresponding to the target person. The feature vector corresponding to a target person is the cluster center of all images of that target person in the database.
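
As a rough illustration of how a color label might be derived from the pixels of each parsed body part, consider the sketch below. The human parsing step is assumed to have already produced a list of RGB pixels per part; the palette, the bin count and the function names are hypothetical, since the description does not specify the color vocabulary or the histogram layout.

```python
from collections import Counter

# Hypothetical reference palette; the source does not list the color vocabulary.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def color_histogram(pixels, bins_per_channel=4):
    """Quantize RGB pixels and return a normalized histogram as a flat list."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1.0
    total = len(pixels) or 1
    return [h / total for h in hist]


def dominant_color_name(pixels):
    """Name the palette color that is nearest to the largest number of pixels."""
    def nearest(pixel):
        return min(
            PALETTE,
            key=lambda name: sum((p - q) ** 2 for p, q in zip(pixel, PALETTE[name])),
        )
    votes = Counter(nearest(p) for p in pixels)
    return votes.most_common(1)[0][0]


def color_label(parts):
    """Build a label such as 'red upper body, black legs' from per-part pixels."""
    return ", ".join(f"{dominant_color_name(px)} {part}" for part, px in parts.items())


parts = {
    "upper body": [(250, 10, 10)] * 8 + [(20, 20, 20)] * 2,
    "legs": [(5, 5, 5)] * 10,
    "feet": [(10, 10, 240)] * 10,
}
label = color_label(parts)
hist = color_histogram(parts["upper body"])
```

The histogram is the feature that would be fed to PCA and clustering, while the label is the human-readable tag stored with the cluster for text retrieval.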

After all clustered images have been stored in the database, the user can retrieve images as required, and the system can also recommend images of interest according to the user's operating habits. Retrieval is therefore divided into image retrieval, text retrieval and browsing information retrieval, as shown in FIG. 4. The retrieval modules generally run on the client, which may be a computer, a mobile phone, a portable networked device, or the like.

2. The image retrieval module is configured to match the input person image with the cluster image set of each target person in the database and obtain the class of images with the highest matching degree.

In this embodiment, the user may upload an image of himself or herself in skiing equipment. The server extracts a feature vector from the uploaded image using the same feature extraction algorithm that was used during clustering, compares the feature vector with the cluster centers, finds the cluster center closest to it, and feeds back all images belonging to that cluster center to the user.
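
The nearest-center lookup described above can be sketched as follows, assuming the query feature vector has already been extracted and reduced in the same way as the stored cluster centers. The dictionary layout, the cluster identifiers and the file names are hypothetical, and Euclidean distance is only one of the metrics the description allows.

```python
import math


def nearest_cluster(query, centers):
    """Return (cluster_id, distance) of the center closest to `query`."""
    best_id, best_dist = None, math.inf
    for cluster_id, center in centers.items():
        dist = math.dist(query, center)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_id, best_dist = cluster_id, dist
    return best_id, best_dist


def retrieve_by_image(query, centers, images_by_cluster):
    """Return all stored images of the cluster whose center is nearest to `query`."""
    cluster_id, _ = nearest_cluster(query, centers)
    return images_by_cluster.get(cluster_id, [])


# Hypothetical database contents.
centers = {"skier_a": [0.1, 0.0], "skier_b": [5.1, 5.0]}
images_by_cluster = {"skier_a": ["a1.jpg", "a2.jpg"], "skier_b": ["b1.jpg"]}

result = retrieve_by_image([4.8, 5.3], centers, images_by_cluster)
```

Comparing against one center per person keeps the lookup cost proportional to the number of clusters and not to the number of stored images.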

The image retrieval module has many application scenarios in real life. For example, in a marathon or another activity, an athlete can stand in front of a large kiosk-style screen equipped with a camera; the screen captures an image of the person and then displays the images of that person taken during the current activity. Alternatively, an access control device that works with the image retrieval module can be installed at the entrance and exit of the lounge of a specific activity. A camera or other photographing equipment captures images of people entering and leaving to decide whether they are allowed to pass, which improves the safety of the activity.

3. The text retrieval module is configured to acquire a clustering image set of the target person matched with the input color information of the set part in the database.

In this embodiment, the color information of the set part of the image to be retrieved is obtained from the user through text or voice input; other input methods may also be used as long as the final result can be converted into text information.

The cluster image set of the target person matching the color information is then acquired from the database. For example, the user can manually enter descriptive text such as "white ski suit, black ski pants, black ski boots" to search for target images; the server matches the entered text against the color labels in the database and feeds back all images with the highest matching degree to the user.
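
One simple way to match entered text against stored color labels is sketched below. It assumes the input has the form "color part, color part, ..." and that each cluster stores its label as a mapping from part to color; the parsing rule, the match score (number of agreeing parts) and all names are illustrative assumptions, since the description does not define how the matching degree is computed.

```python
def parse_query(text):
    """Turn 'white jacket, black pants' into {'jacket': 'white', 'pants': 'black'}."""
    query = {}
    for item in text.split(","):
        words = item.strip().lower().split(" ", 1)
        if len(words) == 2:
            color, part = words
            query[part.strip()] = color
    return query


def match_score(query, label):
    """Count the body parts whose queried color equals the stored color label."""
    return sum(1 for part, color in query.items() if label.get(part) == color)


def retrieve_by_text(text, labels, images_by_cluster):
    """Return the images of every cluster that shares the highest match score."""
    query = parse_query(text)
    scores = {cid: match_score(query, label) for cid, label in labels.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return []  # nothing in the database matches any queried part
    hits = []
    for cid, score in scores.items():
        if score == best:
            hits.extend(images_by_cluster.get(cid, []))
    return hits


# Hypothetical database contents.
labels = {
    "skier_a": {"jacket": "white", "pants": "black", "boots": "black"},
    "skier_b": {"jacket": "red", "pants": "black", "boots": "blue"},
}
images_by_cluster = {"skier_a": ["a1.jpg", "a2.jpg"], "skier_b": ["b1.jpg"]}

result = retrieve_by_text("white jacket, black pants, black boots",
                          labels, images_by_cluster)
```

Returning every cluster that ties for the best score reflects that several skiers may wear similar outfits, so the user can pick from the candidates.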

The text retrieval module has many application scenarios in real life. For example, in a sports event or another activity, when the host or the audience calls out the name or number of a player, or the clothing the player wears on each body part, the large screen on site quickly retrieves the pictures of that player and displays them, which enlivens the atmosphere on site and increases the interactivity of the participants.

4. The browsing information retrieval module is configured to acquire the attention degree of each image browsed by the user according to a set attention degree calculation rule and, based on the images whose attention degree is greater than a set threshold, to acquire from the database the cluster image set of the target person matching those images.

In this embodiment, the feature center of the images the user pays attention to is extracted according to the user's browsing and clicking behavior, and all photos of the cluster center in the database closest to that feature center are then fed back to the user. The attention degree calculation rule is set as follows: (1) record how long the user browses each photo, extract features from the photos browsed for a long time, and find the feature center; (2) record the user's click operations, extract features from the photos the user pays attention to, and find the feature center; (3) extract features from the photos the user downloads, follows or likes, and find the feature center.
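
The three rules above can be combined into a single attention score, as in the following sketch. The weighted-sum form, the weight values, the threshold and the `BrowseEvent` record are all assumptions made for illustration; the description only names the signals (browsing time, clicks, and download, follow or like actions) and a set threshold.

```python
from dataclasses import dataclass


@dataclass
class BrowseEvent:
    """Browsing record for one image (hypothetical structure)."""
    image_id: str
    view_seconds: float = 0.0
    clicks: int = 0
    downloaded: bool = False
    followed: bool = False
    liked: bool = False


def attention(event):
    """Weighted sum of the browsing signals; the weights are illustrative only."""
    return (0.1 * event.view_seconds
            + 0.5 * event.clicks
            + 2.0 * event.downloaded
            + 1.0 * event.followed
            + 1.0 * event.liked)


def feature_center(events, features, threshold):
    """Mean feature vector of the images whose attention exceeds `threshold`."""
    selected = [features[e.image_id] for e in events if attention(e) > threshold]
    if not selected:
        return None
    dim = len(selected[0])
    return [sum(v[j] for v in selected) / len(selected) for j in range(dim)]


events = [
    BrowseEvent("img1", view_seconds=30.0, liked=True),
    BrowseEvent("img2", view_seconds=2.0),
    BrowseEvent("img3", clicks=4, downloaded=True),
]
# Hypothetical low-dimensional feature vectors of the browsed images.
features = {"img1": [1.0, 0.0], "img2": [9.0, 9.0], "img3": [3.0, 2.0]}

center = feature_center(events, features, threshold=1.0)
```

The resulting feature center can then be compared with the stored cluster centers, in the same way as a query image, to find the cluster whose photos are fed back to the user.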

It should be noted that the image clustering retrieval system provided in the foregoing embodiment is illustrated only with the above division of functional modules. In practical applications, the functions may be allocated to different functional modules as required; that is, the modules or steps in the embodiment of the present invention may be further decomposed or combined. For example, the modules in the foregoing embodiment may be combined into one module or further split into multiple sub-modules to complete all or part of the functions described above. The names of the modules and steps in the embodiments of the present invention are used merely to distinguish the modules or steps and are not to be construed as unduly limiting the present invention.

The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will fall within the scope of the present invention.

Claims

1. An image clustering retrieval system comprises one or more clients and a server; the client is connected with the server and is characterized in that in the cluster retrieval system, the client comprises an image retrieval module and/or a text retrieval module and/or a browsing information retrieval module; the server comprises a database;

image clustering is carried out on the target persons by an unsupervised clustering method, which comprises the following steps:

the method comprises the steps of respectively extracting feature vectors of a plurality of images of a plurality of target persons, and performing dimension reduction processing, wherein the method comprises the following steps:

extracting feature vectors of a plurality of images of a plurality of target persons based on a convolutional neural network, and performing dimension reduction processing on the feature vectors of the images through an autoencoder; acquiring color histograms of all parts of the target persons in a plurality of images of the plurality of target persons based on a convolutional neural network, and performing dimension reduction processing on the color histograms through the principal component analysis (PCA) algorithm;

acquiring cluster centers of feature vectors of various images in the database, calculating the distance between each low-dimensional feature vector and each cluster center, judging the same class if the distance is smaller than a preset distance threshold value, and updating the value of the cluster center corresponding to the class of images; otherwise, taking the value of the low-dimensional feature vector as a new clustering center;

2. The image clustering search system according to claim 1, wherein the method of acquiring the color label of the set part of the corresponding target person and the feature vector corresponding to the target person in each clustered image set is as follows:

3. The image clustering search system according to claim 1, wherein the text search module obtains a cluster image set of a target person matched with the input set part based on the input color information, and the method comprises:

4. The image clustering search system according to claim 1, wherein the browsing information search module "obtains the attention degree of the user browsing the image according to the set attention degree calculation rule" comprises the following steps: