CN115129915A - Repeated image retrieval method, device, equipment and storage medium - Google Patents

Repeated image retrieval method, device, equipment and storage medium

Info

Publication number
CN115129915A
Authority
CN
China
Prior art keywords
image
feature
descriptor
retrieved
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110326462.2A
Other languages
Chinese (zh)
Inventor
苗锋
蔡道楠
靖振宇
刘聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Soyoung Technology Beijing Co Ltd
Original Assignee
Soyoung Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Soyoung Technology Beijing Co Ltd filed Critical Soyoung Technology Beijing Co Ltd
Priority to CN202110326462.2A priority Critical patent/CN115129915A/en
Publication of CN115129915A publication Critical patent/CN115129915A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 - Indexing; Data structures therefor; Storage structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/901 - Indexing; Data structures therefor; Storage structures
    • G06F16/9024 - Graphs; Linked lists
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a repeated image retrieval method, device, equipment and storage medium, wherein the method includes: acquiring a feature vector and a first local feature descriptor of an image to be retrieved; acquiring image identifiers corresponding to a preset number of nearest neighbor images of the image to be retrieved according to a preset feature index and the feature vector; respectively acquiring a second local feature descriptor of each nearest neighbor image according to each image identifier; and determining, from the nearest neighbor images, an image that is repeated with the image to be retrieved according to the first local feature descriptor and each second local feature descriptor. The method and device first coarsely screen a number of nearest neighbor images out of a massive image set and then determine the repeated images through local feature descriptor matching, which narrows the retrieval range and improves retrieval efficiency. Moreover, because the coarse screening relies on a deep learning feature embedding model trained on manually modified similar images, manually processed images can be accurately recalled, which improves the accuracy of repeated image retrieval.

Description

Repeated image retrieval method, device, equipment and storage medium
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a repeated image retrieval method, a repeated image retrieval device, repeated image retrieval equipment and a storage medium.
Background
With the rapid development of the mobile internet, large websites pay more and more attention to building content platforms. A content platform usually holds a huge amount of image content, and retrieving repeated images from such a massive collection has become a difficult problem.
At present, the related art provides a retrieval method for repeated images. The method extracts the local feature descriptors of the image to be retrieved and of each image in an image library, matches the local feature descriptors of the image to be retrieved against those of each image in the image library, and determines any image whose number of matched local feature descriptors exceeds a threshold value as an image repeated with the image to be retrieved.
However, matching local feature descriptors is a time-consuming operation, so retrieving repeated images from the large number of images in an image library purely by local feature descriptor matching is inefficient.
Disclosure of Invention
The application provides a repeated image retrieval method, a device, equipment and a storage medium, wherein a plurality of nearest neighbor images are roughly screened from a large number of images, repeated images are determined from the images through local feature descriptor matching, the retrieval range is narrowed, and the retrieval efficiency is improved.
An embodiment of a first aspect of the present application provides a repeated image retrieval method, including:
acquiring a feature vector and a first local feature descriptor of an image to be retrieved;
acquiring image identifications corresponding to a preset number of nearest neighbor images of the image to be retrieved according to a preset feature index and the feature vector;
respectively acquiring a second local feature descriptor of each nearest neighbor image according to each image identifier;
and determining an image which is repeated with the image to be retrieved from each nearest neighbor image according to the first local feature descriptor and each second local feature descriptor.
In some embodiments of the present application, the obtaining, according to a preset feature index and the feature vector, image identifiers corresponding to a preset number of nearest neighbor images of the image to be retrieved includes:
searching a preset number of nearest neighbor feature vectors of the feature vectors and an image number corresponding to each nearest neighbor feature vector in a preset feature index;
and respectively determining the image identifier of each nearest neighbor image corresponding to the image to be retrieved according to a preset image information base and each acquired image number.
In some embodiments of the present application, the preset feature index includes a temporary index and a graph structure index; before the obtaining of the image identifiers corresponding to the preset number of nearest neighbor images of the image to be retrieved according to the preset feature index and the feature vector, the method further includes:
acquiring a feature vector of an image to be put into a warehouse, and storing the feature vector of the image to be put into the warehouse in the temporary index;
and when the total number of the vectors currently stored in the temporary index is greater than or equal to a preset number, storing all the vectors stored in the temporary index into the graph structure index in batches.
In some embodiments of the application, the obtaining, according to each image identifier, a second local feature descriptor of each nearest neighbor image respectively includes:
according to each image identifier, respectively acquiring a descriptor identifier corresponding to each nearest neighbor image from a preset image information base;
and respectively acquiring a second local feature descriptor of each nearest neighbor image from a preset descriptor library according to each descriptor identifier.
In some embodiments of the present application, the determining, from each of the nearest neighbor images, an image that is duplicate to the image to be retrieved according to the first local feature descriptor and each of the second local feature descriptors includes:
respectively carrying out matching operation on the first local feature descriptor and each second local feature descriptor, and determining the number of descriptors matched by the first local feature descriptor and each second local feature descriptor;
selecting second local feature descriptors with the number of matched descriptors larger than a preset threshold value from each second local feature descriptor;
and determining the nearest neighbor image corresponding to the selected second local feature descriptor as an image which is repeated with the image to be retrieved.
In some embodiments of the present application, before the obtaining the feature vector and the first local feature descriptor of the image to be retrieved, the method further includes:
constructing a structure of a deep learning feature embedded model;
acquiring a training set, wherein the training set comprises a plurality of similar image groups, and the similar image groups comprise original images and a plurality of similar images obtained by transforming the original images;
and training the deep learning feature embedded model according to the training set.
In some embodiments of the present application, the structure for constructing the deep learning feature embedded model includes:
connecting an image encoder with a distance determination module;
connecting the distance determination module with a loss determination module.
In some embodiments of the present application, the training the deep learning feature embedded model according to the training set includes:
acquiring a plurality of similar image groups from the training set;
respectively extracting a feature vector of each image in each similar image group through the image encoder;
respectively calculating an intra-group distance value between the feature vector of the original image and the feature vector of each similar image in the same similar image group and respectively calculating an inter-group distance value between the feature vector of the original image and the feature vector of each image in other similar image groups by the distance determining module;
selecting a maximum intra-group distance value from each of the intra-group distance values, and selecting a minimum inter-group distance value from each of the inter-group distance values;
and calculating the loss value of the current training period through the loss determining module according to the maximum intra-group distance value and the minimum inter-group distance value.
In some embodiments of the present application, the method further comprises:
if the number of the current trained cycles is larger than or equal to the preset number, determining the model parameter corresponding to the training cycle with the minimum loss value and the structure of the deep learning feature embedded model as a trained deep learning feature embedded model;
and if the number of the current training cycles is less than the preset number, adjusting model parameters according to the loss value of the current training cycle, and training the next cycle according to the adjusted model parameters.
In some embodiments of the present application, the obtaining a feature vector and a first local feature descriptor of an image to be retrieved includes:
receiving a repeated image retrieval request of a user;
if the repeated image retrieval request comprises an image identifier of an image to be retrieved, acquiring a feature vector corresponding to the image to be retrieved from a preset feature index according to the image identifier, and acquiring a first local feature descriptor corresponding to the image to be retrieved from a preset descriptor library; the feature vectors in the preset feature index are extracted through the trained deep learning feature embedded model;
if the repeated image retrieval request does not comprise the image identifier of the image to be retrieved, downloading the image to be retrieved according to the URL of the image to be retrieved, which is included in the repeated image retrieval request; extracting a feature vector of the image to be retrieved through the trained deep learning feature embedding model; and extracting a first local feature descriptor of the image to be retrieved.
In some embodiments of the present application, the method further comprises:
acquiring an image to be warehoused according to image basic information corresponding to the image to be warehoused;
extracting a feature vector of the image to be put in storage through the trained deep learning feature embedded model;
extracting a local feature descriptor of the image to be put in storage;
storing the feature vector of the image to be put in storage and the image identification included by the image basic information in the preset feature index;
storing the local feature descriptors of the images to be warehoused in a preset descriptor library to obtain descriptor identifications corresponding to the images to be warehoused;
and storing the image basic information and the descriptor identification in a preset image information base.
An embodiment of a second aspect of the present application provides a duplicate image retrieval apparatus, including:
the characteristic acquisition module is used for acquiring a characteristic vector and a first local characteristic descriptor of the image to be retrieved;
the nearest neighbor determining module is used for acquiring image identifications corresponding to a preset number of nearest neighbor images of the image to be retrieved according to a preset feature index and the feature vector; respectively acquiring a second local feature descriptor of each nearest neighbor image according to each image identifier;
and the repeated image determining module is used for determining an image which is repeated with the image to be retrieved from each nearest neighbor image according to the first local feature descriptor and each second local feature descriptor.
Embodiments of the third aspect of the present application provide an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the method of the first aspect.
An embodiment of a fourth aspect of the present application provides a computer-readable storage medium, on which a computer program is stored, the program being executed by a processor to implement the method of the first aspect.
The technical scheme provided in the embodiment of the application at least has the following technical effects or advantages:
in the embodiment of the application, according to the feature vector and the preset feature index of the image to be retrieved, the nearest neighbor images with the preset number are roughly screened out from a large number of images. And then determining images which are repeated with the image to be retrieved from the nearest neighbor images in a local feature descriptor matching mode. The method has the advantages that the rough screening is carried out based on the feature vectors, the retrieval range of the repeated images is greatly reduced, the repeated images are retrieved from the nearest neighbor images with less number in a local feature descriptor matching mode, the time of retrieving the repeated images is shortened, and the retrieval efficiency of the repeated images is improved.
Further, the coarse screening is performed with the trained deep learning feature embedded model, so the proportion of repeated images among the coarsely screened nearest neighbor images is high and the repeated images are ranked higher in the results. The deep learning feature embedded model is trained with similar images obtained through manual modification and transformation; for any two similar images, the distance between the feature vectors extracted by the model is very small, which helps improve the precision of nearest neighbor image retrieval, allows manually processed images to be accurately recalled in the coarse screening stage, and improves the accuracy of repeated image retrieval. The vectors are stored with a temporary index and a graph structure index: when a vector to be put in storage is received, it is stored in the temporary index, which supports fast vector updating; because the amount of data in the temporary index is small, accurate retrieval results can be obtained quickly through brute-force search. When the number of vectors in the temporary index reaches a preset threshold value, all vectors in the temporary index are stored into the graph structure index in one batch, realizing online real-time updating of the vectors. The graph structure index still offers good query performance over massive data. Building the index hierarchically from these two index libraries improves vector storage efficiency and meets the business requirements of fast vector storage and query.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings.
In the drawings:
FIG. 1 is a flow chart illustrating a method for duplicate image retrieval according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a deep learning feature embedding model provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of another structure of a deep learning feature embedding model provided by an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a structural unit included in the residual network 50 (ResNet-50) according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a residual network 50 provided in an embodiment of the present application for extracting feature vectors of an image;
FIG. 6 is a flow chart illustrating a method for duplicate image retrieval according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a duplicate image retrieval apparatus according to an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating an electronic device according to an embodiment of the present application;
fig. 9 is a schematic diagram of a storage medium according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.
A duplicate image retrieval method, apparatus, device and storage medium according to an embodiment of the present application are described below with reference to the accompanying drawings.
Some duplicate image retrieval methods have been proposed in the related art. For example, content-based image retrieval searches for other images whose content is the same as or similar to that of the image to be retrieved; its returned results differ substantially from those of repeated image retrieval, because images with the same or similar content are not necessarily repeated images, so this method is not suitable for direct use in repeated image retrieval. As another example, methods based on local feature descriptor matching have also been proposed, but the number of local feature descriptors extracted from different images is inconsistent, and extracting and matching the descriptors takes a certain amount of time, so applying such a method to massive image collections is very time-consuming; complex optimization algorithms and trade-offs must be adopted before real-time matching and retrieval of repeated images becomes feasible. Compared with these two methods, the method based on perceptual hash similarity has the main disadvantage that homologous images cannot be recalled after being manually processed by rotating, cropping, adding mosaics and the like.
Based on the above problems in the related art, embodiments of the present application provide a repeated image retrieval method, in which an image is transformed through various manual processing operations such as rotation, scaling, translation, clipping, mirroring, watermarking, mosaic, etc., a training set including a plurality of similar image groups is made, and a deep learning feature embedding model is trained using the training set. The intra-group distance between two images belonging to the same group of similar images is taken into account in the training process, as well as the inter-group distance between two images belonging to different groups of similar images. By utilizing the model to extract the characteristic vector of the image and performing repeated image retrieval based on the extracted characteristic vector, the images which are repeated with the image to be retrieved can be ranked more forward in the retrieval result, and the manually processed and modified images can be recalled.
According to the embodiment of the application, the preset number of nearest neighbor images corresponding to the image to be retrieved are roughly screened out through the deep learning feature embedding model, then the repeated images are determined from the preset number of nearest neighbor images in a local feature descriptor matching mode, the image range matched by the local feature descriptors is greatly reduced, the situation that the local feature descriptors are directly applied to massive images is avoided, the calculated amount is obviously reduced, and the calculation resources are saved.
Referring to fig. 1, the embodiment of the present application trains a deep learning feature embedding model through the following operations of steps S1-S3, which specifically includes:
s1: and constructing a structure of the deep learning feature embedded model.
As shown in fig. 2, the image encoder is connected to the distance determination module, and the distance determination module is connected to the loss determination module. Wherein the image encoder is used for extracting the feature vector of the image. The distance determining module is used for calculating a distance value between the feature vectors of the two images, wherein the distance value can be Euclidean distance, cosine distance and the like. And the loss determining module is used for calculating the loss value of each training period according to the distance value determined by the distance determining module.
Since the distance determining module calculates the distance value between the feature vectors of the two images, in order to improve the model training speed, two identical image encoders may be further disposed in the structure of the deep learning feature embedding model, and the feature vectors of the two images are extracted simultaneously by the two identical image encoders, as shown in fig. 3, and both of the two identical image encoders are connected to the distance determining module.
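To make the wiring concrete, the following is a minimal sketch of such a structure, assuming PyTorch; the class names, the Euclidean distance choice, and the margin value are illustrative assumptions rather than details given in the patent.

```python
import torch
import torch.nn as nn

class DistanceModule(nn.Module):
    """Distance determination module: pairwise distances between embeddings."""
    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Euclidean distance; a cosine distance could be used instead.
        return torch.cdist(a, b, p=2)

class LossModule(nn.Module):
    """Loss determination module: triplet-style loss from the hardest distances."""
    def __init__(self, margin: float = 0.5):  # margin value is an assumption
        super().__init__()
        self.margin = margin
    def forward(self, d_ap: torch.Tensor, d_an: torch.Tensor) -> torch.Tensor:
        return torch.clamp(d_ap - d_an + self.margin, min=0.0)

class FeatureEmbeddingModel(nn.Module):
    """Image encoder -> distance module -> loss module, as in Figs. 2 and 3.
    A single shared encoder plays the role of the two identical image encoders."""
    def __init__(self, encoder: nn.Module, margin: float = 0.5):
        super().__init__()
        self.encoder = encoder
        self.distance = DistanceModule()
        self.loss = LossModule(margin)
```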
In the embodiment of the present application, the image encoder may be the residual network 50 (ResNet-50). Fig. 4 shows a structural unit included in the residual network 50. The structural unit employs a ReLU (Rectified Linear Unit) activation function and includes three convolutional layers: in order, a convolutional layer with 1 × 1 convolution kernels and 64 channels, a convolutional layer with 3 × 3 convolution kernels and 64 channels, and a convolutional layer with 1 × 1 convolution kernels and 256 channels. The residual network 50 includes a plurality of the structural units shown in fig. 4, and the entire network structure of the residual network 50 is shown in table 1.
TABLE 1
[Table 1: overall layer configuration of the residual network 50, provided as an image in the original publication]
As shown in fig. 5, a 128-dimensional feature vector is extracted from an image processed by the residual network 50. In the embodiment of the present application, the image encoder may adopt the residual network 50 or any other network capable of extracting the feature vector of an image; the embodiment of the present application does not limit the specific network adopted by the image encoder, and the specific structure of the image encoder may be determined according to requirements in practical applications.
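For illustration only, such an encoder could be obtained from a standard ResNet-50 implementation by replacing its classification head with a 128-dimensional embedding layer; the use of torchvision and the input resolution below are assumptions, since the patent only gives the layer table as an image.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_encoder(embedding_dim: int = 128) -> nn.Module:
    """ResNet-50 backbone whose final fully connected layer is replaced so that
    every input image is mapped to a 128-dimensional feature vector (cf. Fig. 5)."""
    backbone = models.resnet50(weights=None)  # pretrained weights are optional
    backbone.fc = nn.Linear(backbone.fc.in_features, embedding_dim)
    return backbone

encoder = build_encoder()
dummy = torch.randn(1, 3, 224, 224)          # 224 x 224 input size is an assumption
print(encoder(dummy).shape)                  # torch.Size([1, 128])
```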
After the structure of the deep learning feature embedded model is constructed by this step, the model is trained by the operations of steps S2 and S3 as follows.
S2: acquiring a training set, wherein the training set comprises a plurality of similar image groups, and the similar image groups comprise original images and a plurality of similar images obtained by transforming the original images.
Firstly, a large number of original images are obtained, and each original image is modified and transformed manually, such as adding mosaic, sticker, watermark and the like in the original image, or rotating or cutting the original image. And obtaining a plurality of similar images corresponding to the original image after manual modification and transformation. For each original image, the original image and a plurality of corresponding similar images form a similar image group. And combining the obtained multiple similar image groups into a training set.
S3: and training the built deep learning feature embedded model according to the training set.
In the current training period, a plurality of similar image groups are acquired from the training set. The number of acquired similar image groups may be the batch size specified for the deep learning feature embedding model. The acquired similar image groups are input into the built deep learning feature embedded model. The feature vector of each image in each similar image group is extracted by the image encoder, and the extracted feature vectors are input into the distance determination module. The distance determination module calculates an intra-group distance value between the feature vector of the original image and the feature vector of each similar image in the same similar image group, and an inter-group distance value between the feature vector of the original image and the feature vector of each image in the other similar image groups. A maximum intra-group distance value is selected from the intra-group distance values, and a minimum inter-group distance value is selected from the inter-group distance values. The selected maximum intra-group distance value and the selected minimum inter-group distance value are input into the loss determining module, which calculates the loss value of the current training period according to the maximum intra-group distance value and the minimum inter-group distance value.
In the embodiment of the present application, the loss determination module calculates the loss value of each training period by using the ternary loss function shown in formula (1).
Loss=max(d(a,p)-d(a,n)+margin,0)…(1)
In formula (1), Loss is the loss value; the input of the ternary loss function is a triplet < a, p, n >, where a is an original image in a similar image group, p is a similar image similar to the original image a, and n is an image dissimilar to the original image a; d(a, p) is the maximum intra-group distance value, d(a, n) is the minimum inter-group distance value, and margin is a preset margin coefficient.
The above ternary loss function makes the distance between the feature vectors of images within the same similar image group smaller and the distance between the feature vectors of images belonging to different similar image groups larger. The calculated intra-group distance values and inter-group distance values may be Euclidean distances, cosine distances, or the like.
To facilitate understanding of the training process, the following example is provided. Suppose 50,000 original images are acquired, and each original image undergoes operations such as random mosaic, rotation, cropping, sticker overlay and watermarking to generate 9 similar images, so the total number of images grows to 500,000. The 10 images consisting of an original image and its 9 corresponding similar images form a similar image group. Assuming that 3 similar image groups are processed per training period, 30 images are processed per period: 3 similar image groups are randomly selected from all similar image groups, giving 30 images. For each original image, its feature vector is obtained through the image encoder, the feature vectors of the remaining 9 images in the similar image group to which it belongs are obtained through the image encoder, and the maximum Euclidean distance between the feature vectors of the 9 similar images and the feature vector of the original image is computed to obtain d(a, p). The minimum Euclidean distance between the feature vector of the original image and the feature vectors of the images in the other similar image groups is found to obtain d(a, n). The loss value for the current training period is then calculated by formula (1) above.
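The per-period computation in the example above can be sketched as follows; Euclidean distance is assumed, and the function signature, the margin value, and the averaging over the groups of a batch are illustrative assumptions (compare formula (1)).

```python
import torch

def training_period_loss(encoder, groups, margin: float = 0.5) -> torch.Tensor:
    """groups: list of image tensors, one per similar image group, each of shape
    (G, 3, H, W); index 0 in each group is the original image a, the remaining
    G - 1 entries are its manually transformed similar images."""
    embeddings = [encoder(g) for g in groups]              # one (G, 128) tensor per group
    losses = []
    for i, emb in enumerate(embeddings):
        anchor = emb[0:1]                                  # original image of this group
        d_ap = torch.cdist(anchor, emb[1:]).max()          # maximum intra-group distance d(a, p)
        others = torch.cat([e for j, e in enumerate(embeddings) if j != i])
        d_an = torch.cdist(anchor, others).min()           # minimum inter-group distance d(a, n)
        losses.append(torch.clamp(d_ap - d_an + margin, min=0.0))
    return torch.stack(losses).mean()                      # loss value of the current period
```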
After the loss value of the current training period is calculated in the above manner, the recorded number of training periods is incremented by one, and the incremented count is compared with the preset number. If the number of completed training periods is greater than or equal to the preset number, the model parameters corresponding to the training period with the minimum loss value, together with the structure of the deep learning feature embedded model, are taken as the trained deep learning feature embedded model. If the number of completed training periods is less than the preset number, the model parameters are adjusted according to the loss value of the current training period, and training continues for the next period with the adjusted parameters, following the training process above, until the number of training periods reaches the preset number and a trained model is obtained. The preset number may be, for example, 500 or 1000.
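A sketch of the period counting and model selection just described; sample_similar_image_groups is a hypothetical helper that draws a batch of similar image groups from the training set, and the preset number of 500 is one of the example values mentioned above.

```python
import copy
import torch

def train(model, optimizer, training_set, preset_number: int = 500):
    """Keeps the parameters from the training period with the minimum loss value."""
    best_loss, best_state = float("inf"), None
    for period in range(preset_number):
        groups = sample_similar_image_groups(training_set)  # hypothetical batch sampler
        loss = training_period_loss(model.encoder, groups)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                    # adjust parameters for the next period
        if loss.item() < best_loss:
            best_loss = loss.item()
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)                       # trained deep learning feature embedded model
    return model
```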
After the deep learning feature embedded model is trained in the above manner, the deep learning feature embedded model is applied to the repeated image retrieval in the embodiment of the application. The execution subject of the embodiment of the application is a server capable of providing repeated image retrieval. The server provides repeated image retrieval service based on a plurality of databases such as a preset image information base, a preset feature index and a preset descriptor base.
The server maintains a content image queue, and when receiving the image basic information corresponding to the image to be put in storage, the server inserts the image basic information into the tail of the content image queue. The server acquires a certain number of image basic information from the head of the content image queue each time, and performs parallel processing on the acquired image basic information.
Specifically, for each piece of image basic information, whether the image corresponding to the image basic information has already been put in storage is judged according to the image basic information. When the server puts an image in storage, it assigns the image an image identifier that uniquely identifies it. If the image basic information includes an image identifier, the corresponding image is determined to have been put in storage already, and no storage operation is performed. If the image basic information does not include an image identifier, it is determined that the image has not been put in storage. The server downloads the image to be warehoused according to the URL included in the image basic information, assigns an image identifier to it, and extracts its feature vector through the trained deep learning feature embedded model. The image identifier and feature vector corresponding to the image to be put in storage are then stored into the preset feature index. In the embodiment of the application, the preset feature index includes a temporary index and a graph structure index. The temporary index stores feature vectors in the form of a list; it holds relatively little feature vector data, and feature vectors are searched in the temporary index by brute force. The temporary index can be built quickly and supports fast vector updating and retrieval. The graph structure index is a hierarchical graph-structured feature index built with the HNSW (Hierarchical Navigable Small World) algorithm. All feature vectors in the graph structure index are stored in a hierarchical graph structure, in which layer 0 contains the feature vectors of all images, and the number of feature vectors stored in each higher layer decreases in turn, following an exponentially decaying probability distribution. When the graph structure is constructed, the highest layer of the node corresponding to a newly added feature vector is obtained through an exponentially decaying probability function; the feature vector is present at every layer from that highest layer downward, and during retrieval each layer is queried in turn from top to bottom.
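As an illustration, such a two-level index could be realized with the faiss library (a library choice that is an assumption, not something stated in the patent): a flat index serves as the brute-force-searchable temporary index, and an HNSW index serves as the graph structure index.

```python
import faiss

DIM = 128                                   # dimensionality of the embedding vectors
temporary_index = faiss.IndexFlatL2(DIM)    # list-like store, searched by brute force
graph_index = faiss.IndexHNSWFlat(DIM, 32)  # HNSW graph; 32 links per node is an assumption
```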
The preset feature index processes the feature vectors of images to be put in storage in real time. For an image to be put in storage, after the server obtains its feature vector with the trained deep learning feature embedded model, the feature vector is stored in the temporary index. When the total number of vectors currently stored in the temporary index is greater than or equal to the preset number, all vectors stored in the temporary index are stored into the graph structure index in one batch. After the feature vector of the image to be put in storage is stored in the temporary index, a feedback value is obtained; this feedback value is the vector number, i.e. the position of the feature vector within the temporary index. The server obtains the total number of vectors stored in the current graph structure index, calculates the sum of the vector number and that total, and uses the sum as the image number corresponding to the image to be put in storage. The image number represents the storage position of the feature vector of the image to be put in storage within the preset feature index.
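Continuing the faiss-based sketch, the warehousing step described here, including the batch flush into the graph structure index and the image-number bookkeeping, might look like this (the flush threshold is illustrative):

```python
import numpy as np

FLUSH_THRESHOLD = 10_000                    # preset number of vectors; value is illustrative

def store_feature_vector(feature: np.ndarray) -> int:
    """Stores one feature vector and returns its image number, i.e. the sum of the
    vector number in the temporary index and the total stored in the graph index."""
    vector_number = temporary_index.ntotal                  # feedback value from the temporary index
    image_number = graph_index.ntotal + vector_number
    temporary_index.add(feature.reshape(1, -1).astype("float32"))
    if temporary_index.ntotal >= FLUSH_THRESHOLD:
        # store all vectors of the temporary index into the graph structure index in one batch
        batch = temporary_index.reconstruct_n(0, temporary_index.ntotal)
        graph_index.add(batch)
        temporary_index.reset()
    return image_number
```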
For the image to be put in storage, the server also extracts its local feature descriptors, stores them in the preset descriptor library, and obtains a feedback value, which is called the descriptor identifier corresponding to the image; the descriptor identifier indicates the storage position of the local feature descriptors of the image within the preset descriptor library. The embodiment of the application may extract the local feature descriptors of an image with the Scale-Invariant Feature Transform (SIFT) algorithm. Specifically, the image is scaled to multiple scales, each of the scaled images is traversed pixel by pixel, and potential interest points that are invariant to scale changes are found using a difference-of-Gaussian function. A function curve is fitted according to the positions of the interest points. The distance of each interest point from the fitted curve is determined, and interest points whose distance from the curve exceeds a threshold are deleted; interest points closer to the curve are more stable. One or more orientations are assigned to the position of each remaining interest point based on the local gradient directions of the image, and the local image gradients are computed in the neighborhood corresponding to the selected image scale and the orientation of each interest point. Each computed gradient is then transformed into a representation that tolerates relatively large local shape deformation and illumination variation, yielding a plurality of local feature descriptors for the image.
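For reference, OpenCV's SIFT implementation extracts descriptors of exactly this kind; the snippet below is only a sketch, since the patent describes the algorithm itself rather than any particular library.

```python
import cv2

def extract_local_descriptors(image_path: str):
    """Returns the SIFT keypoints and 128-dimensional local feature descriptors of an image."""
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(image, None)
    return keypoints, descriptors            # descriptors: (num_keypoints, 128) array, or None
```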
And for the image to be put in storage, the server also stores the image to be put in storage in a local memory, and determines the storage address of the image to be put in storage. Then, the server correspondingly stores the image basic information such as the storage address, the image identifier, the URL (Uniform Resource Locator) and the like of the image to be put in storage, and the image number and the descriptor identifier corresponding to the image to be put in storage in a preset image information base.
In the embodiment of the application, the server maintains the content image queue and realizes real-time storage of the newly added image through parallel processing of a plurality of images to be stored in a storage according to the mode.
The image to be warehoused is stored in the local memory in the mode, the characteristic vector and the image number corresponding to the image to be warehoused are stored in the preset characteristic index, the local characteristic descriptor and the descriptor identification corresponding to the image to be warehoused are stored in the preset descriptor library, the image basic information, the image number and the descriptor identification corresponding to the image to be warehoused are correspondingly stored in the preset image information library, and then the whole process of warehousing the image to be warehoused is completed. The server provides repeated image retrieval service based on a preset image information base, a preset feature index, a preset descriptor base, a large number of images stored in a local memory and the trained deep learning feature embedded model.
Referring to fig. 6, the repeated image retrieval method specifically includes the following steps:
step 101: and acquiring a feature vector and a first local feature descriptor of the image to be retrieved.
When a calling party needs to perform repeated image retrieval, it sends a repeated image retrieval request to the server. The server receives the repeated image retrieval request and judges whether it includes the image identifier of the image to be retrieved. The image identifier is assigned to the image to be retrieved by the server when the image is put in storage and uniquely identifies it. If the repeated image retrieval request includes the image identifier of the image to be retrieved, the feature vector corresponding to the image to be retrieved is obtained from the preset feature index according to its image number, the descriptor identifier of the image to be retrieved is obtained from the preset image information base, and the first local feature descriptor corresponding to the image to be retrieved is obtained from the preset descriptor library according to the descriptor identifier. The feature vectors in the preset feature index are extracted through the trained deep learning feature embedded model.
And if the repeated image retrieval request does not comprise the image identifier of the image to be retrieved, downloading the image to be retrieved according to the URL of the image to be retrieved, which is included in the repeated image retrieval request. And extracting the feature vector of the image to be retrieved through the trained deep learning feature embedded model. And calling an SIFT algorithm to extract a first local feature descriptor of the image to be retrieved.
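The two branches of this step can be summarized with the following sketch; the request fields and the helper functions (looking up the image information base, loading a stored vector or descriptor set, and encoding a downloaded image) are hypothetical names used only for illustration.

```python
import urllib.request

def acquire_query_features(request: dict):
    """Returns (feature vector, first local feature descriptor) for the image to be retrieved."""
    if "image_id" in request:                                 # request carries an image identifier
        record = lookup_image_info(request["image_id"])       # hypothetical: preset image information base
        vector = load_feature_vector(record["image_number"])  # hypothetical: preset feature index
        descriptors = load_descriptors(record["descriptor_id"])  # hypothetical: preset descriptor library
    else:                                                     # download by URL and extract on the fly
        path, _ = urllib.request.urlretrieve(request["url"])
        vector = encode_image(path)                           # trained deep learning feature embedded model
        _, descriptors = extract_local_descriptors(path)      # SIFT, as sketched earlier
    return vector, descriptors
```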
Step 102: and acquiring image identifications corresponding to the nearest neighbor images with preset number of the images to be retrieved according to the preset feature index and the feature vector of the images to be retrieved.
According to the feature vector of the image to be retrieved, a preset number of nearest neighbor feature vectors of the feature vector and the image number corresponding to each nearest neighbor feature vector are searched in the preset feature index. The preset feature index includes a temporary index and a graph structure index. The temporary index stores the feature vectors in the form of a list, and the graph structure index is a graph-structured index built with the HNSW algorithm. When searching for the nearest neighbor feature vectors, the temporary index and the graph structure index are searched at the same time. The temporary index is searched by brute force; the retrieved nearest neighbor feature vectors are sorted by their distance to the feature vector, and a first preset number of nearest neighbor feature vectors with the smallest distance to the feature vector of the image to be retrieved are selected from the sorted sequence. Retrieval in the graph structure index starts from the highest layer and traverses each layer from top to bottom, retrieving the nearest neighbor feature vectors of the feature vector at each layer. The nearest neighbor feature vectors retrieved from the layers are sorted by their distance to the feature vector, and a second preset number of nearest neighbor feature vectors with the smallest distance to the feature vector of the image to be retrieved are selected from the sorted sequence. The first preset number and the second preset number may both be 100 or 200, and they may be equal or unequal.
Combining a first preset number of nearest neighbor feature vectors obtained by retrieving from the temporary index and a second preset number of nearest neighbor feature vectors obtained by retrieving from the graph structure index, and selecting a preset number of nearest neighbor feature vectors with the minimum distance from the feature vectors of the image to be retrieved from all the combined nearest neighbor feature vectors. The predetermined number may be 100 or 200. The embodiment of the present application does not limit the specific value of the preset number, and the specific value of the preset number may be set according to a requirement in practical application.
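Continuing the faiss-based sketch, the simultaneous search of the two indexes and the merge of their candidates might look like the following; the k values are the example values given above.

```python
import numpy as np

def search_nearest_neighbors(query: np.ndarray, k_temp: int = 100, k_graph: int = 100, k: int = 100):
    """Returns the k (distance, image number) pairs closest to the query vector."""
    q = query.reshape(1, -1).astype("float32")
    candidates = []
    if temporary_index.ntotal > 0:                           # brute-force search of the temporary index
        d, idx = temporary_index.search(q, min(k_temp, temporary_index.ntotal))
        # vectors still in the temporary index were numbered after the last graph flush
        candidates += [(dist, int(i) + graph_index.ntotal) for dist, i in zip(d[0], idx[0])]
    if graph_index.ntotal > 0:                               # top-down layered search of the HNSW graph
        d, idx = graph_index.search(q, min(k_graph, graph_index.ntotal))
        candidates += [(dist, int(i)) for dist, i in zip(d[0], idx[0])]
    candidates.sort(key=lambda c: c[0])                      # smallest distance first
    return candidates[:k]
```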
The server stores the vectors with the temporary index and the graph structure index. The temporary index can be built quickly: when the server receives a vector to be put in storage, the vector is stored in the temporary index, which supports fast vector updating; because the amount of data in the temporary index is small, accurate retrieval results can be obtained quickly through brute-force search. When the number of vectors in the temporary index reaches a preset threshold value, the server stores all vectors in the temporary index into the graph structure index in one batch, realizing online real-time updating of the vectors. The graph structure index still offers good query performance over massive data. The index is built hierarchically from these two index libraries, and when the number of vectors in the temporary index reaches the threshold, all of them are added to the graph structure index in a batch; this improves vector storage efficiency, makes full use of the respective advantages of the temporary index and the graph structure index, and meets the business requirements of fast vector storage and query.
After a preset number of nearest neighbor feature vectors are selected in the above manner, the image number corresponding to each nearest neighbor feature vector is obtained from the preset feature index. And respectively determining the image identifier of each nearest neighbor image corresponding to the image to be retrieved according to the preset image information base and the acquired image number. Specifically, according to the image number corresponding to the nearest neighbor feature vector, the image identifier corresponding to the image number is obtained from a preset image information base, and the obtained image identifier is determined as the image identifier of the nearest neighbor image to which the nearest neighbor feature vector belongs.
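In these sketches, the preset image information base can be modeled as a plain mapping from image number to the stored record; a production system would use a database, and the field names are illustrative.

```python
# image number -> record written at warehousing time
image_info_base: dict = {}   # e.g. {7: {"image_id": "img-0007", "url": "...", "descriptor_id": 3}}

def nearest_neighbor_image_ids(candidates):
    """Maps each (distance, image number) candidate to the identifier of its nearest neighbor image."""
    return [image_info_base[number]["image_id"]
            for _, number in candidates if number in image_info_base]
```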
In the step, based on the feature vectors extracted by the deep learning feature embedding model, a preset number of nearest neighbor images are roughly screened out from the massive images, so that the retrieval range of retrieving repeated images based on local feature descriptor matching subsequently is greatly reduced, and the retrieval speed of the repeated images is improved. And the deep learning characteristic embedding model is trained by using similar images obtained by manual modification and transformation, and for any two similar images, the characteristic vectors are extracted through the deep learning characteristic embedding model, so that the distance value between the characteristic vectors of the similar images is very small, the precision of nearest neighbor image retrieval is favorably improved, and the similar images formed by manual processing can be accurately recalled in the coarse screening process.
Step 103: and respectively acquiring a second local feature descriptor of each nearest neighbor image according to the image identifier corresponding to each nearest neighbor image.
And respectively acquiring the descriptor identifier corresponding to each nearest neighbor image from a preset image information base according to the image identifier corresponding to each nearest neighbor image. The descriptor corresponding to the nearest neighbor image identifies a storage position of a second local feature descriptor representing the nearest neighbor image in a preset descriptor library. And respectively acquiring the second local feature descriptors of each nearest neighbor image from a preset descriptor library according to the descriptor identification corresponding to each nearest neighbor image.
Step 104: and determining images which are repeated with the images to be retrieved from each nearest neighbor image according to the first local feature descriptor of the images to be retrieved and the second local feature descriptor of each nearest neighbor image.
The first local feature descriptor of the image to be retrieved and the second local feature descriptor of the nearest neighbor image are descriptor sets, that is, the first local feature descriptor includes a plurality of descriptors of the image to be retrieved. The second local feature descriptor includes a plurality of descriptors of nearest neighbor images.
And respectively carrying out matching operation on the first local feature descriptor of the image to be retrieved and the second local feature descriptor of each nearest neighbor image, and determining the number of matched descriptors between the first local feature descriptor and the second local feature descriptor of each nearest neighbor image. And selecting second local feature descriptors with the number of matched descriptors larger than a preset threshold value from each second local feature descriptor. And determining the nearest neighbor image corresponding to the selected second local feature descriptor as an image which is repeated with the image to be retrieved.
In the embodiment of the present application, nearest neighbor matching is specifically performed using FLANN (Fast Library for Approximate Nearest Neighbors). For each nearest neighbor image, the first local feature descriptor of the image to be retrieved and the second local feature descriptor of the nearest neighbor image are input into FLANN for nearest neighbor matching, and the matching pairs between the image to be retrieved and the nearest neighbor image that satisfy the matching conditions are counted, where a matching pair consists of a local feature descriptor of the image to be retrieved and a local feature descriptor of the nearest neighbor image that match each other. The number of matching pairs is then compared with a set threshold value. If the number of matching pairs is less than the threshold, the nearest neighbor image is determined not to be a repeat of the image to be retrieved. If the number of matching pairs is greater than or equal to the threshold, the nearest neighbor image is determined to be a repeat of the image to be retrieved.
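An illustrative FLANN matching step using OpenCV; the KD-tree parameters and the ratio test used as the matching condition are assumptions, since the patent does not spell out the concrete condition.

```python
import cv2
import numpy as np

def matching_pairs(desc_query: np.ndarray, desc_candidate: np.ndarray, ratio: float = 0.75):
    """Nearest neighbor matching of two descriptor sets with FLANN; returns the matching pairs."""
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))  # KD-tree index
    knn = flann.knnMatch(desc_query.astype("float32"), desc_candidate.astype("float32"), k=2)
    # keep a pair only when it clearly beats the second-best match (ratio test, assumed condition)
    return [pair[0] for pair in knn if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]

MATCH_THRESHOLD = 20   # preset threshold on the number of matching pairs; value is illustrative
```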
In order to improve the accuracy of repeated image judgment and reduce misjudgment, in the embodiment of the application, homography matrix verification can be performed on the local feature descriptors in the matching pairs, the descriptors identified by mistake are removed, and the number of the remaining matching pairs is determined. And comparing the number of the remaining matching pairs with a set threshold value to judge whether the nearest neighbor image is repeated with the image to be retrieved.
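The homography verification can be sketched with OpenCV's RANSAC-based estimator; the reprojection threshold is an assumed value.

```python
import cv2
import numpy as np

def verified_match_count(kp_query, kp_candidate, pairs) -> int:
    """Removes mismatched pairs via homography (RANSAC) verification and returns the remaining count."""
    if len(pairs) < 4:                                       # findHomography needs at least 4 pairs
        return len(pairs)
    src = np.float32([kp_query[m.queryIdx].pt for m in pairs]).reshape(-1, 1, 2)
    dst = np.float32([kp_candidate[m.trainIdx].pt for m in pairs]).reshape(-1, 1, 2)
    _, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return int(inlier_mask.sum()) if inlier_mask is not None else 0

# a nearest neighbor image is judged a repeat of the image to be retrieved when
# verified_match_count(...) >= MATCH_THRESHOLD
```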
By setting different thresholds, images which are repeated with the image to be retrieved and/or images which are suspected to be repeated can be judged from the nearest neighbor images according to the number of the matching pairs. And when the image which is repeated with the image to be retrieved is determined, acquiring the repeated image from the local memory, and sending the repeated image to the calling party.
In the step, only some nearest neighbor images closest to the image to be retrieved are processed, so that the time consumption of the local feature descriptor matching algorithm can be controlled within a certain time range, and the problem of time consumption of directly and violently using the local feature descriptor matching algorithm is solved.
In the embodiment of the application, a preset number of nearest neighbor images are coarsely screened out of a massive image set according to the feature vector of the image to be retrieved and the preset feature index, and images repeated with the image to be retrieved are then determined from the nearest neighbor images by local feature descriptor matching. Coarse screening based on feature vectors greatly narrows the retrieval range, and retrieving the repeated images from the smaller set of nearest neighbor images by local feature descriptor matching shortens retrieval time and improves retrieval efficiency. Because the images are coarsely screened with the trained deep learning feature embedded model, the proportion of repeated images among the coarsely screened nearest neighbor images is high and the repeated images are ranked higher. The deep learning feature embedded model is trained with similar images obtained through manual modification and transformation; for any two similar images, the distance between the feature vectors extracted by the model is small, which improves nearest neighbor retrieval precision, allows manually processed images to be accurately recalled during coarse screening, and improves the accuracy of repeated image retrieval. The vectors are stored with the temporary index and the graph structure index: a vector to be put in storage is first stored in the temporary index, which supports fast vector updating, and because the amount of data in the temporary index is small, accurate retrieval results can be obtained quickly through brute-force search. When the number of vectors in the temporary index reaches a preset threshold value, all vectors in the temporary index are stored into the graph structure index in one batch, realizing online real-time updating of the vectors. The graph structure index still offers good query performance over massive data. Building the index hierarchically from the two index libraries improves vector storage efficiency and meets the business requirements of fast vector storage and query.
The embodiment of the application further provides a repeated image retrieval device, and the device is used for executing the repeated image retrieval method provided by any embodiment. Referring to fig. 7, the apparatus includes:
a feature obtaining module 701, configured to obtain a feature vector and a first local feature descriptor of an image to be retrieved;
an approximate nearest neighbor determining module 702, configured to obtain, according to the preset feature index and the feature vector, image identifiers corresponding to a preset number of nearest neighbor images of the image to be retrieved; respectively acquiring a second local feature descriptor of each nearest neighbor image according to each image identifier;
a repeated image determining module 703, configured to determine, according to the first local feature descriptor and each second local feature descriptor, an image that is repeated with the image to be retrieved from each nearest neighbor image.
The nearest neighbor determining module 702 is configured to retrieve, from the preset feature index, a preset number of nearest neighbor feature vectors of the feature vector and the image number corresponding to each nearest neighbor feature vector; and determine, according to a preset image information base and each acquired image number, the image identifier of each nearest neighbor image corresponding to the image to be retrieved.
The preset feature index comprises a temporary index and a graph structure index. The apparatus further includes a vector storage module configured to acquire the feature vector of an image to be put in storage and store the feature vector in the temporary index; and, when the total number of vectors currently stored in the temporary index is greater than or equal to a preset number, store all the vectors stored in the temporary index into the graph structure index in batch.
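A minimal sketch of such a two-tier index is shown below. The application does not name a specific graph-index library; faiss's HNSW index is used here purely as an example of a graph structure index, and the batch threshold value is an illustrative assumption. The small temporary tier is searched by brute force, and its contents are migrated to the graph tier in batch once the threshold is reached.

```python
import numpy as np
import faiss  # assumption: any HNSW-style graph index library could play this role


class TwoTierVectorIndex:
    """Temporary (brute-force) tier for fresh vectors plus a graph-structure tier for bulk data."""

    def __init__(self, dim: int, batch_threshold: int = 10000):
        self.batch_threshold = batch_threshold
        self.temp_vectors = []                              # temporary index: fast online updates
        self.temp_ids = []
        self.graph_index = faiss.IndexHNSWFlat(dim, 32)     # graph structure index for massive data
        self.graph_ids = []                                 # maps graph-internal ids to image numbers

    def add(self, image_number: int, vector) -> None:
        self.temp_vectors.append(np.asarray(vector, dtype=np.float32))
        self.temp_ids.append(image_number)
        if len(self.temp_vectors) >= self.batch_threshold:
            # Batch migration: store everything from the temporary tier into the graph tier.
            self.graph_index.add(np.stack(self.temp_vectors))
            self.graph_ids.extend(self.temp_ids)
            self.temp_vectors.clear()
            self.temp_ids.clear()

    def search(self, query, k: int):
        query = np.asarray(query, dtype=np.float32).reshape(1, -1)
        candidates = []
        if self.temp_vectors:
            # Brute-force search over the small temporary tier is exact and fast.
            dists = np.linalg.norm(np.stack(self.temp_vectors) - query, axis=1)
            candidates += [(float(d), i) for d, i in zip(dists, self.temp_ids)]
        if self.graph_ids:
            d, idx = self.graph_index.search(query, min(k, len(self.graph_ids)))
            # faiss returns squared L2 distances; take the square root to compare with the temp tier.
            candidates += [(float(np.sqrt(dv)), self.graph_ids[iv])
                           for dv, iv in zip(d[0], idx[0]) if iv != -1]
        candidates.sort(key=lambda t: t[0])
        return [image_number for _, image_number in candidates[:k]]
```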
The nearest neighbor determining module 702 is configured to obtain, according to each image identifier, a descriptor identifier corresponding to each nearest neighbor image from the preset image information base; and obtain, according to each descriptor identifier, a second local feature descriptor of each nearest neighbor image from a preset descriptor library.
The repeated image determining module 703 is configured to perform a matching operation between the first local feature descriptor and each second local feature descriptor, and determine the number of descriptors matched between the first local feature descriptor and each second local feature descriptor; select, from the second local feature descriptors, those whose number of matched descriptors is greater than a preset threshold; and determine the nearest neighbor images corresponding to the selected second local feature descriptors as images that are repeated with the image to be retrieved.
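The application does not name a particular local feature descriptor or matching algorithm. As one possible instantiation, the sketch below uses OpenCV's ORB descriptors with a brute-force Hamming matcher and a Lowe-style ratio test; the ratio and feature-count values are illustrative assumptions.

```python
import cv2
import numpy as np


def extract_local_descriptors(image_bgr):
    # ORB is only an example of a local feature descriptor; the application does not specify one.
    orb = cv2.ORB_create(nfeatures=500)
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, descriptors = orb.detectAndCompute(gray, None)
    return descriptors


def count_descriptor_matches(desc_query, desc_neighbor, ratio: float = 0.75) -> int:
    """Count matching descriptor pairs between the query image and one nearest neighbor image."""
    if desc_query is None or desc_neighbor is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)               # Hamming distance suits binary ORB descriptors
    pairs = matcher.knnMatch(desc_query, desc_neighbor, k=2)
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good)
```

A nearest neighbor image whose match count exceeds the preset threshold would then be reported as repeated with the image to be retrieved, as in the earlier two-threshold sketch.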
In an embodiment of the present application, the apparatus further includes a model training module configured to construct the structure of the deep learning feature embedding model; acquire a training set, wherein the training set includes a plurality of similar image groups, and each similar image group includes an original image and a plurality of similar images obtained by transforming the original image; and train the deep learning feature embedding model according to the training set.
The model training module is configured to connect the image encoder with the distance determining module, and connect the distance determining module with the loss determining module.
The model training module is configured to acquire a plurality of similar image groups from the training set; extract, through the image encoder, a feature vector of each image in each similar image group; calculate, through the distance determining module, intra-group distance values between the feature vector of the original image and the feature vector of each similar image in the same similar image group, and inter-group distance values between the feature vector of the original image and the feature vector of each image in the other similar image groups; select the maximum intra-group distance value from the intra-group distance values and the minimum inter-group distance value from the inter-group distance values; and calculate, through the loss determining module, the loss value of the current training period according to the maximum intra-group distance value and the minimum inter-group distance value.
The model training module is configured to, if the number of completed training periods is greater than or equal to a preset number, determine the model parameters corresponding to the training period with the minimum loss value, together with the structure of the deep learning feature embedding model, as the trained deep learning feature embedding model; and, if the number of completed training periods is less than the preset number, adjust the model parameters according to the loss value of the current training period and train the next period with the adjusted model parameters.
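A sketch of how the loss for one training period could be computed from the maximum intra-group distance and the minimum inter-group distance is given below in PyTorch. The application only states that the loss is calculated from these two distances; the L2 distance, the hinge form, and the margin value are assumptions of this sketch.

```python
import torch


def batch_hard_group_loss(embeddings: torch.Tensor, group_ids: torch.Tensor,
                          is_original: torch.Tensor, margin: float = 0.3) -> torch.Tensor:
    """For each group's original image, penalize when its hardest positive (maximum
    intra-group distance) is not smaller than its hardest negative (minimum inter-group
    distance) by at least `margin`."""
    dists = torch.cdist(embeddings, embeddings)              # pairwise L2 distances
    index = torch.arange(len(group_ids), device=embeddings.device)
    losses = []
    for idx in torch.nonzero(is_original, as_tuple=False).flatten():
        same_group = group_ids == group_ids[idx]
        intra = dists[idx][same_group & (index != idx)]      # distances to its own similar images
        inter = dists[idx][~same_group]                      # distances to images of other groups
        if intra.numel() == 0 or inter.numel() == 0:
            continue
        losses.append(torch.relu(intra.max() - inter.min() + margin))
    return torch.stack(losses).mean() if losses else embeddings.new_zeros(())
```

The model parameters would then be updated from this loss each period, and the parameters of the period with the minimum loss kept as the trained model, as described above.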
The feature obtaining module 701 is configured to receive a repeated image retrieval request from a user; if the repeated image retrieval request includes an image identifier of the image to be retrieved, acquire the feature vector corresponding to the image to be retrieved from the preset feature index according to the image identifier, and acquire the first local feature descriptor corresponding to the image to be retrieved from the preset descriptor library, the feature vectors in the preset feature index being extracted through the trained deep learning feature embedding model; and, if the repeated image retrieval request does not include an image identifier of the image to be retrieved, download the image to be retrieved according to the URL of the image to be retrieved included in the request, extract the feature vector of the image to be retrieved through the trained deep learning feature embedding model, and extract the first local feature descriptor of the image to be retrieved.
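The branching behaviour of the feature obtaining module could look like the sketch below. The handles feature_index, descriptor_db, and embedding_model (and their get_vector, get, and embed methods) are hypothetical interfaces standing in for the preset feature index, preset descriptor library, and trained embedding model; the request layout is likewise assumed.

```python
import cv2
import numpy as np
import requests


def get_query_features(request: dict, feature_index, descriptor_db, embedding_model):
    """Resolve the feature vector and first local feature descriptor for a retrieval request."""
    image_id = request.get("image_id")
    if image_id is not None:
        # The image is already in the library: reuse its stored feature vector and descriptor.
        return feature_index.get_vector(image_id), descriptor_db.get(image_id)
    # Otherwise download the image from the URL carried by the request and extract features.
    raw = requests.get(request["url"], timeout=10).content
    image = cv2.imdecode(np.frombuffer(raw, np.uint8), cv2.IMREAD_COLOR)
    vector = embedding_model.embed(image)                    # hypothetical model interface
    descriptors = extract_local_descriptors(image)           # e.g. the ORB sketch shown earlier
    return vector, descriptors
```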
In an embodiment of the present application, the apparatus further includes an image warehousing module configured to acquire an image to be warehoused according to the image basic information corresponding to the image; extract the feature vector of the image to be warehoused through the trained deep learning feature embedding model; extract the local feature descriptor of the image to be warehoused; store the feature vector of the image to be warehoused and the image identifier included in the image basic information in the preset feature index; store the local feature descriptor of the image to be warehoused in the preset descriptor library to obtain the descriptor identifier corresponding to the image; and store the image basic information and the descriptor identifier in the preset image information base.
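A compact sketch of this warehousing flow, writing to the three stores in turn, is shown below. The store handles and their methods (add, put) are hypothetical stand-ins for the preset feature index, descriptor library, and image information base.

```python
def warehouse_image(basic_info: dict, image_bgr, embedding_model,
                    feature_index, descriptor_db, image_info_db) -> None:
    """Store one image's features in the three preset stores."""
    image_id = basic_info["image_id"]
    vector = embedding_model.embed(image_bgr)                # global feature vector
    descriptors = extract_local_descriptors(image_bgr)       # local feature descriptor
    feature_index.add(image_id, vector)                      # feature vector + image identifier
    descriptor_id = descriptor_db.put(descriptors)           # returns the descriptor identifier
    image_info_db[image_id] = {**basic_info, "descriptor_id": descriptor_id}
```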
The repeated image retrieval apparatus provided by the above embodiment of the present application has the same beneficial effects as the repeated image retrieval method provided by the embodiments of the present application, that is, the method adopted, run, or implemented by the application program stored in the apparatus.
An embodiment of the present application further provides an electronic device for executing the above repeated image retrieval method. Referring to fig. 8, which shows a schematic diagram of an electronic device according to some embodiments of the present application, the electronic device 8 includes: a processor 800, a memory 801, a bus 802, and a communication interface 803, where the processor 800, the communication interface 803, and the memory 801 are connected by the bus 802. The memory 801 stores a computer program that can run on the processor 800, and when the processor 800 executes the computer program, the repeated image retrieval method provided by any one of the foregoing embodiments is performed.
The memory 801 may include a high-speed Random Access Memory (RAM), and may also include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of the apparatus and at least one other network element is implemented through at least one communication interface 803 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, or the like may be used.
Bus 802 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 801 is used for storing a program, and the processor 800 executes the program after receiving an execution instruction, and the repeated image retrieval method disclosed in any embodiment of the present application may be applied to the processor 800, or implemented by the processor 800.
The processor 800 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 800 or by instructions in the form of software. The processor 800 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM or an EPROM, or a register. The storage medium is located in the memory 801, and the processor 800 reads the information in the memory 801 and completes the steps of the above method in combination with its hardware.
The electronic device provided by the embodiment of the present application is based on the same inventive concept as the repeated image retrieval method provided by the embodiments of the present application, and has the same beneficial effects as the method that it adopts, runs, or implements.
An embodiment of the present application further provides a computer-readable storage medium. Referring to fig. 9, the computer-readable storage medium is shown as an optical disc 30, on which a computer program (i.e., a program product) is stored; when the computer program is run by a processor, it performs the repeated image retrieval method provided by any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above embodiment of the present application has the same beneficial effects as the repeated image retrieval method provided by the embodiments of the present application, which is adopted, run, or implemented by the application program stored thereon.
It should be noted that:
in the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Moreover, those of skill in the art will understand that although some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A method for retrieving a duplicate image, comprising:
acquiring a feature vector and a first local feature descriptor of an image to be retrieved;
acquiring image identifications corresponding to a preset number of nearest neighbor images of the image to be retrieved according to a preset feature index and the feature vector;
respectively acquiring a second local feature descriptor of each nearest neighbor image according to each image identifier;
and determining an image which is repeated with the image to be retrieved from each nearest neighbor image according to the first local feature descriptor and each second local feature descriptor.
2. The method according to claim 1, wherein the obtaining image identifiers corresponding to a preset number of nearest neighbor images of the image to be retrieved according to a preset feature index and the feature vector comprises:
searching a preset number of nearest neighbor feature vectors of the feature vectors and an image number corresponding to each nearest neighbor feature vector in a preset feature index;
and respectively determining the image identification of each nearest neighbor image corresponding to the image to be retrieved according to a preset image information base and each acquired image number.
3. The method according to claim 1 or 2, wherein the preset feature index comprises a temporary index and a graph structure index; before the obtaining of the image identifiers corresponding to the preset number of nearest neighbor images of the image to be retrieved according to the preset feature index and the feature vector, the method further includes:
acquiring a feature vector of an image to be put in storage, and storing the feature vector of the image to be put in storage in the temporary index;
and when the total number of the vectors currently stored in the temporary index is greater than or equal to a preset number, storing all the vectors stored in the temporary index into the graph structure index in batches.
4. The method according to claim 1, wherein the obtaining a second local feature descriptor of each nearest neighbor image according to each image identifier comprises:
according to each image identifier, respectively acquiring a descriptor identifier corresponding to each nearest neighbor image from a preset image information base;
and respectively acquiring a second local feature descriptor of each nearest neighbor image from a preset descriptor library according to each descriptor identifier.
5. The method of claim 1, wherein the determining, from each of the nearest neighbor images, an image that is duplicate to the image to be retrieved according to the first local feature descriptor and each of the second local feature descriptors comprises:
respectively carrying out matching operation on the first local feature descriptor and each second local feature descriptor, and determining the number of descriptors matched by the first local feature descriptor and each second local feature descriptor;
selecting second local feature descriptors with the number of matched descriptors larger than a preset threshold value from each second local feature descriptor;
and determining the nearest neighbor image corresponding to the selected second local feature descriptor as an image which is repeated with the image to be retrieved.
6. The method according to any one of claims 1, 2, 4, and 5, wherein before obtaining the feature vector and the first local feature descriptor of the image to be retrieved, the method further comprises:
constructing a structure of a deep learning feature embedding model;
acquiring a training set, wherein the training set comprises a plurality of similar image groups, and the similar image groups comprise original images and a plurality of similar images obtained by transforming the original images;
and training the deep learning feature embedding model according to the training set.
7. The method of claim 6, wherein the constructing the structure of the deep learning feature embedding model comprises:
connecting the image encoder with a distance determination module;
connecting the distance determination module with a loss determination module.
8. The method of claim 7, wherein the training the deep learning feature embedding model according to the training set comprises:
acquiring a plurality of similar image groups from the training set;
respectively extracting a feature vector of each image in each similar image group through the image encoder;
respectively calculating an intra-group distance value between the feature vector of the original image and the feature vector of each similar image in the same similar image group and respectively calculating an inter-group distance value between the feature vector of the original image and the feature vector of each image in other similar image groups by the distance determining module;
selecting a maximum intra-group distance value from each of the intra-group distance values, and selecting a minimum inter-group distance value from each of the inter-group distance values;
and calculating the loss value of the current training period through the loss determining module according to the maximum intra-group distance value and the minimum inter-group distance value.
9. The method of claim 8, further comprising:
if the number of completed training periods is greater than or equal to a preset number, determining the model parameters corresponding to the training period with the minimum loss value and the structure of the deep learning feature embedding model as a trained deep learning feature embedding model;
and if the number of completed training periods is less than the preset number, adjusting model parameters according to the loss value of the current training period, and training the next period according to the adjusted model parameters.
10. The method according to claim 6, wherein the obtaining the feature vector and the first local feature descriptor of the image to be retrieved comprises:
receiving a repeated image retrieval request of a user;
if the repeated image retrieval request comprises an image identifier of an image to be retrieved, acquiring a feature vector corresponding to the image to be retrieved from a preset feature index according to the image identifier, and acquiring a first local feature descriptor corresponding to the image to be retrieved from a preset descriptor library; the feature vectors in the preset feature index are extracted through the trained deep learning feature embedding model;
if the repeated image retrieval request does not comprise the image identifier of the image to be retrieved, downloading the image to be retrieved according to the URL of the image to be retrieved, which is included in the repeated image retrieval request; extracting a feature vector of the image to be retrieved through the trained deep learning feature embedding model; and extracting a first local feature descriptor of the image to be retrieved.
11. The method of claim 6, further comprising:
acquiring an image to be warehoused according to image basic information corresponding to the image to be warehoused;
extracting a feature vector of the image to be put in storage through the trained deep learning feature embedding model;
extracting a local feature descriptor of the image to be put in storage;
storing the feature vector of the image to be put in storage and the image identification included by the image basic information in the preset feature index;
storing the local feature descriptors of the images to be warehoused in a preset descriptor library to obtain descriptor identifications corresponding to the images to be warehoused;
and storing the image basic information and the descriptor identification in a preset image information base.
12. A duplicate image retrieval apparatus, comprising:
the characteristic acquisition module is used for acquiring a characteristic vector and a first local characteristic descriptor of the image to be retrieved;
the nearest neighbor determining module is used for acquiring image identifications corresponding to a preset number of nearest neighbor images of the image to be retrieved according to a preset feature index and the feature vector; respectively acquiring a second local feature descriptor of each nearest neighbor image according to each image identifier;
and the repeated image determining module is used for determining an image which is repeated with the image to be retrieved from each nearest neighbor image according to the first local feature descriptor and each second local feature descriptor.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the method of any one of claims 1-11.
14. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor to implement the method according to any of claims 1-11.
CN202110326462.2A 2021-03-26 2021-03-26 Repeated image retrieval method, device, equipment and storage medium Pending CN115129915A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110326462.2A CN115129915A (en) 2021-03-26 2021-03-26 Repeated image retrieval method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110326462.2A CN115129915A (en) 2021-03-26 2021-03-26 Repeated image retrieval method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115129915A true CN115129915A (en) 2022-09-30

Family

ID=83374397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110326462.2A Pending CN115129915A (en) 2021-03-26 2021-03-26 Repeated image retrieval method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115129915A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116186318A (en) * 2023-04-24 2023-05-30 国网智能电网研究院有限公司 Image retrieval method, device and storage medium based on association rule learning

Similar Documents

Publication Publication Date Title
US11288314B2 (en) Method and apparatus for multi-dimensional content search and video identification
CN110059198B (en) Discrete hash retrieval method of cross-modal data based on similarity maintenance
US9053386B2 (en) Method and apparatus of identifying similar images
TWI506459B (en) Content-based image search
CN105095435A (en) Similarity comparison method and device for high-dimensional image features
CN102254015A (en) Image retrieval method based on visual phrases
CN112417381B (en) Method and device for rapidly positioning infringement image applied to image copyright protection
CN111046042A (en) Quick retrieval method and system based on space-time collision
CN112446431A (en) Feature point extraction and matching method, network, device and computer storage medium
CN110209863B (en) Method and equipment for searching similar pictures
CN110083731B (en) Image retrieval method, device, computer equipment and storage medium
CN114972506A (en) Image positioning method based on deep learning and street view image
CN115129915A (en) Repeated image retrieval method, device, equipment and storage medium
CN112287140A (en) Image retrieval method and system based on big data
CN110083732B (en) Picture retrieval method and device and computer storage medium
JP6778625B2 (en) Image search system, image search method and image search program
CN109416689B (en) Similarity retrieval method and device for massive feature vector data and storage medium
CN111984812A (en) Feature extraction model generation method, image retrieval method, device and equipment
CN117556079B (en) Remote sensing image content retrieval method, remote sensing image content retrieval device, electronic equipment and medium
Cho et al. Rank-based voting with inclusion relationship for accurate image search
CN113761239A (en) Index database establishing and retrieving method based on massive geographic position information
Derakhshan et al. A Review of Methods of Instance-based Automatic Image Annotation
CN118262167A (en) Registration method and device for image retrieval sample, electronic equipment and medium
CN113987250A (en) Similar image retrieval method and device suitable for large-scale image library
CN115775317A (en) Certificate information identification matching method and system based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination