CN114491122B - Picture matching method for similar image retrieval - Google Patents

Picture matching method for similar image retrieval

Info

Publication number
CN114491122B
CN114491122B (application CN202111634430.5A)
Authority
CN
China
Prior art keywords
edge
graph
matrix
point
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111634430.5A
Other languages
Chinese (zh)
Other versions
CN114491122A (en)
Inventor
杨益枘
林旭滨
何力
管贻生
张宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202111634430.5A priority Critical patent/CN114491122B/en
Publication of CN114491122A publication Critical patent/CN114491122A/en
Application granted granted Critical
Publication of CN114491122B publication Critical patent/CN114491122B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a graph matching method for similar image retrieval, which mainly comprises two stages: offline data set construction and online deep learning training. In the first stage, the Pascal VOC data set is selected as the training data set, and a number of images that carry annotation points and cover all categories of the data set are selected as the training set. The second stage comprises the following steps: a pretrained VGG-16 neural network is adopted as the feature extractor; a fully connected topology with bidirectional edges is generated for each image by Delaunay triangulation; after point feature embedding of the topological geometric information is completed, the edges are described on the basis of the point-edge incidence matrix; from the edge feature description vectors of the two graphs, an edge-to-edge similarity matrix is constructed; through these steps, the final point features are obtained, and the similarity matrix of the point-to-point matching is then calculated. The scheme offers high retrieval performance and efficiency and is easy to implement.

Description

Picture matching method for similar image retrieval
Technical Field
The invention relates to the technical field of image retrieval, in particular to a graph matching method for similar image retrieval.
Background
With the development of the internet, efficiently retrieving images that meet users' needs in a network environment has become a core technical problem. Image retrieval techniques fall into two main branches: text-based retrieval and content-based retrieval. Text-based image retrieval typically queries images with keywords or browses images under a specific category in a hierarchical directory, whereas content-based image retrieval finds other images with similar characteristics in an image database according to the semantic content and features of a query image.
An existing content-based image retrieval system first extracts feature information from the image content and stores it in a feature library; it then compares and ranks the stored features against the features of the query image to obtain the retrieval result. Content-based image retrieval uses a computer to describe images in a unified, regular mathematical form, which reduces the manual effort of labeling image keywords and improves retrieval efficiency. With the improvement of computer performance and the development of deep learning, computers can extract rich features such as object color, shape, and structure from images. However, matching the similarity of such structured feature information is a problem of high computational complexity.
From the perspective of mathematical optimization, graph matching of structured information is an NP-hard second-order combinatorial problem. Graph matching aims to find the correspondence between the nodes of two objects by exploiting graph structure information. Meanwhile, the rapid progress of deep learning and graph convolutional neural networks shows great potential for graph matching. With graph embedding techniques based on graph convolutional neural networks, the second-order combinatorial problem, which is hard to solve exactly in polynomial time, is converted into a first-order problem that can be solved exactly in polynomial time. However, existing deep graph matching methods based on graph embedding do not consider second-order edge-to-edge similarity information; the present method introduces this information as cross-graph embedding information, improving both accuracy and efficiency. For this reason, the prior art needs further improvement.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a graph matching method for similar image retrieval.
The aim of the invention is achieved by the following technical scheme:
the image matching method for similar image retrieval mainly comprises two stages of offline data set construction and online deep learning training, and comprises the following specific steps:
stage one: and constructing a data set matched with the offline depth image.
Step S1: the Pascal VOC dataset was chosen as the training dataset.
Step S2: a number of images that carry annotation points and cover all categories of the data set are selected as the training set.
Stage two: the depth map matching network is trained online.
Step S3: the pretrained VGG-16 neural network is used as a feature extractor, and parameters of the neural network are trained on an ImageNet data set in advance.
Step S4: each image is passed through fully connected Delaunay triangulation to generate a topological structure with bidirectional edges.
Step S5: after the point feature embedding of the topological geometrical information is completed, the feature description of the edges is carried out on the basis of the point-edge association matrix.
Step S6: according to the edge feature description vector of each graph, an edge-to-edge similarity matrix K_e can be constructed.
Step S61: the point-edge pairing relationships of graph matching can be constructed into a correlation graph model.
Step S62: according to the topological structure of the association graph, the edge-to-edge similarity score and the point-to-point similarity can be associated to obtain a cross-graph conversion matrix.
Step S63: and taking the cross-graph distribution matrix as prior information to perform cross-graph point embedding operation.
Step S7: through the steps above, the final point features of the two graphs can be obtained, and a similarity matrix of the point-to-point matching is then calculated.
As a preferred embodiment of the present invention, the step S3 further includes the steps of: the feature extractor yields the point feature matrices F_1 and F_2 of the two images to be matched, where d is the dimension of the feature vectors and n_1 and n_2 are the numbers of feature points of the two images; F_1 and F_2 are obtained by concatenating the outputs extracted from layers relu4_2 and relu5_1 of the VGG-16 neural network.
As a preferred embodiment of the present invention, the step S4 further includes the steps of: the attribute of each edge consists of the normalized coordinates of its two endpoints, and the connection information of the edges represents the topological structure of each graph; then, the point feature information and the edge attribute information are fed as inputs into the graph neural network SplineCNN; SplineCNN serves as the geometric-topology embedding technique and adopts MAX aggregation when aggregating structure information; the point features embedded with the respective geometric topology information are finally obtained.
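The topology construction of step S4 can be sketched as follows; this is a minimal illustration using SciPy's Delaunay triangulation, where the 10 random points stand in for annotated keypoints (hypothetical values, not from the patent):

```python
import numpy as np
from scipy.spatial import Delaunay

def build_bidirectional_edges(points):
    """Triangulate 2-D keypoints and return every undirected edge in both directions."""
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:          # each simplex is a triangle (i, j, k)
        for a in range(3):
            for b in range(a + 1, 3):
                i, j = int(simplex[a]), int(simplex[b])
                edges.add((i, j))          # forward direction
                edges.add((j, i))          # backward direction -> bidirectional topology
    return sorted(edges)

rng = np.random.default_rng(0)
pts = rng.random((10, 2))                  # 10 hypothetical annotated keypoints
edges = build_bidirectional_edges(pts)
```

Each undirected Delaunay edge is emitted twice, once per direction, which matches the "bidirectional edge" topology the step describes.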
As a preferred embodiment of the present invention, the step S5 further includes the steps of: the point-edge incidence matrices of the two graphs are G_1, H_1 and G_2, H_2 respectively, where e_1 and e_2 denote the numbers of edges of the two graphs; G_{i,k} = H_{j,k} = 1 means that edge k starts at node i and ends at node j. The edge features E_1 and E_2 gather the features of the two endpoints of each edge through the incidence matrices:

E_1 = [G_1^T F_1, H_1^T F_1],  E_2 = [G_2^T F_2, H_2^T F_2]   formula (1)
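The edge description of step S5 can be sketched numerically. The patent gives formula (1) only as an image; the definition below, where each edge descriptor concatenates the features of its start and end nodes through the incidence matrices G and H, is an assumption made for illustration:

```python
import numpy as np

def edge_features(F, edge_list, n):
    """Build point-edge incidence matrices G, H and edge descriptors E = [G^T F, H^T F].

    F: (n, d) point features; edge_list: (start, end) node index pairs.
    The concatenated-endpoint definition of E is an assumption for illustration.
    """
    e = len(edge_list)
    G = np.zeros((n, e))                    # G[i, k] = 1: edge k starts at node i
    H = np.zeros((n, e))                    # H[j, k] = 1: edge k ends at node j
    for k, (i, j) in enumerate(edge_list):
        G[i, k] = 1.0
        H[j, k] = 1.0
    E = np.concatenate([G.T @ F, H.T @ F], axis=1)   # shape (e, 2d)
    return G, H, E

F = np.arange(12, dtype=float).reshape(4, 3)         # 4 nodes, d = 3 (toy values)
edges = [(0, 1), (1, 0), (1, 2), (2, 3)]             # bidirectional pair plus two edges
G, H, E = edge_features(F, edges, n=4)
```

Row k of `G.T @ F` selects the feature of edge k's start node, and row k of `H.T @ F` that of its end node, so each edge descriptor has dimension 2d.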
as a preferred embodiment of the present invention, the step S6 further includes the steps of: edge-to-edge correspondence matrix K e
Figure BDA0003441409920000038
Wherein,,
Figure BDA0003441409920000039
is a training parameter; k (K) e Each element of the matrix represents edge-to-edge matching information, and in order to expand the difference of edge-to-edge similarity values, namely, emphasize a value with high similarity and compress a value with low similarity, normalization operation is performed on the Ke matrix to obtain a normalized epsilon matrix:
ε=softmax(K e ) Formula (3)
Then, the normalized epsilon matrix is converted into a cross-edge conversion matrix through the structure of the companion graph
Figure BDA00034414099200000310
Figure BDA00034414099200000311
Based on cross-map transformation matrix
Figure BDA00034414099200000312
The cross-graph feature embedded information can be obtained; for node->
Figure BDA00034414099200000313
Figure BDA00034414099200000314
Cross-map feature information m j→i Is calculated as follows:
Figure BDA00034414099200000315
finally, vector addition operation is carried out on the cross-graph characteristic information and the point characteristic information:
Figure BDA00034414099200000316
a similar operation is also performed for the feature points of the second graph.
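The edge-affinity computation of step S6 can be sketched numerically. The patent shows the formulas only as images, so the bilinear form K_e = E_1 Λ E_2^T and the row-wise softmax below are assumptions for illustration, with random stand-ins for the learned quantities:

```python
import numpy as np

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))   # numerically stable softmax
    return z / z.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
E1 = rng.random((5, 6))            # 5 edges in graph 1, descriptor dim 6 (toy sizes)
E2 = rng.random((7, 6))            # 7 edges in graph 2
Lam = rng.random((6, 6))           # training parameter Λ (random stand-in)

K_e = E1 @ Lam @ E2.T              # edge-to-edge affinity, shape (5, 7)
eps = softmax(K_e, axis=1)         # normalization sharpens high-similarity entries
```

After the softmax, each row of `eps` is a distribution over the second graph's edges, which is what lets high-similarity pairs dominate the cross-graph embedding.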
As a preferred embodiment of the present invention, the step S7 further includes the steps of: the similarity matrix S of the point-to-point matching is computed from the final point features of the two graphs (formula (7), given as an image in the original). The linear solution of the graph matching problem is based on the Sinkhorn iterative algorithm, which normalizes the score matrix S alternately along its rows and columns to obtain the soft assignment matrix P:

P = Sinkhorn(exp(S))   formula (8)
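The Sinkhorn step of S7 can be sketched as alternating row and column normalization of exp(S); the iteration count below is an arbitrary choice:

```python
import numpy as np

def sinkhorn(S, n_iters=50):
    """Alternately normalize rows and columns of exp(S) toward a doubly-stochastic matrix."""
    P = np.exp(S)
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)   # normalize along rows
        P = P / P.sum(axis=0, keepdims=True)   # normalize along columns
    return P

rng = np.random.default_rng(2)
S = rng.random((4, 4))                         # toy point-to-point score matrix
P = sinkhorn(S)
```

On a square positive matrix the iteration converges quickly, giving a soft assignment whose rows and columns each sum to one.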
As a preferred embodiment of the present invention, the graph matching method further includes step S8: given the ground-truth assignment matrix P^gt and the soft assignment matrix P, the error can be obtained by constructing a cross-entropy loss function:

L = -Σ_{i,j} [ P^gt_{ij} log P_{ij} + (1 - P^gt_{ij}) log(1 - P_{ij}) ]   formula (9)
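The loss of step S8 can be sketched as the element-wise binary cross-entropy between the ground-truth and soft assignment matrices; since formula (9) appears only as an image in the original, treating it as this standard assignment cross-entropy is an assumption:

```python
import numpy as np

def assignment_cross_entropy(P_gt, P, eps=1e-9):
    """Binary cross-entropy summed over all point pairs of the assignment matrices."""
    P = np.clip(P, eps, 1.0 - eps)                       # guard against log(0)
    return -np.sum(P_gt * np.log(P) + (1.0 - P_gt) * np.log(1.0 - P))

P_gt = np.eye(3)                                         # toy ground truth: identity matching
P_good = np.full((3, 3), 0.05) + 0.85 * np.eye(3)        # near-correct soft assignment
P_bad = np.full((3, 3), 1.0 / 3.0)                       # uninformative soft assignment
loss_good = assignment_cross_entropy(P_gt, P_good)
loss_bad = assignment_cross_entropy(P_gt, P_bad)
```

A soft assignment close to the ground truth yields a smaller loss than an uninformative one, which is the gradient signal that trains the network end to end.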
as a preferred embodiment of the present invention, the step S1 further includes the steps of: the dataset contains several different categories of images: aircraft, bicycles, birds, boats, bottles, buses, automobiles, cats, chairs, cattle, tables, dogs, horses, motorcycles, humans, plants, sheep, sofas, trains, televisions; each image contains 6 to 23 annotated feature point image coordinates.
As a preferred embodiment of the present invention, the step S2 further includes the steps of: correspondingly, 1682 images are selected as the test set. For each image to be trained, a bounding box containing all annotated feature points is extracted, the image is resized to 256×256, and the result is fed into the training of the deep learning network.
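The preprocessing of step S2 (bounding box around the annotated points, then resizing to 256×256) can be sketched with a nearest-neighbor resize; the toy image and keypoints below are hypothetical:

```python
import numpy as np

def crop_and_resize(img, keypoints, out_size=256):
    """Crop the bounding box around all keypoints, then nearest-neighbor resize."""
    xs, ys = keypoints[:, 0], keypoints[:, 1]            # keypoints given as (x, y)
    x0, x1 = int(xs.min()), int(xs.max()) + 1
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    crop = img[y0:y1, x0:x1]                             # bounding box of all points
    h, w = crop.shape[:2]
    rows = np.arange(out_size) * h // out_size           # nearest-neighbor sampling grid
    cols = np.arange(out_size) * w // out_size
    return crop[rows][:, cols]

img = np.random.default_rng(3).random((300, 400, 3))    # toy 300x400 RGB image
kps = np.array([[50.0, 40.0], [350.0, 60.0], [200.0, 250.0]])
out = crop_and_resize(img, kps)
```

A production pipeline would typically use bilinear interpolation (e.g. via an image library) rather than this nearest-neighbor sketch.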
The working process and principle of the invention are as follows: existing deep graph matching schemes based on graph embedding ignore second-order edge-to-edge similarity information and therefore lose accuracy. The present scheme introduces this second-order information through a deep graph matching model based on cross-graph embedding and applies it to the retrieval of images of similar objects, which improves matching performance, markedly reduces memory consumption, and ultimately improves both the performance and the efficiency of image retrieval.
Drawings
Fig. 1 is a schematic flow chart of a graph matching method for similar image retrieval provided by the invention.
Fig. 2 is a schematic diagram of a graph matching method for similar image retrieval provided by the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clear and clear, the present invention will be further described below with reference to the accompanying drawings and examples.
Example 1:
as shown in fig. 1 to 2, the present embodiment discloses a graph matching method for similar image retrieval, and the graph matching method mainly includes two stages of offline data set construction and online deep learning training, and specifically includes the following steps:
stage one: and constructing a data set matched with the offline depth image.
Step S1: the Pascal VOC dataset was chosen as the training dataset.
Step S2: a number of images that carry annotation points and cover all categories of the data set are selected as the training set.
Stage two: the depth map matching network is trained online.
Step S3: the pretrained VGG-16 neural network is used as a feature extractor, and parameters of the neural network are trained on an ImageNet data set in advance.
Step S4: each image is passed through fully connected Delaunay triangulation to generate a topological structure with bidirectional edges.
Step S5: after the point feature embedding of the topological geometrical information is completed, the feature description of the edges is carried out on the basis of the point-edge association matrix.
Step S6: according to the edge feature description vector of each graph, an edge-to-edge similarity matrix K_e can be constructed.
Step S61: the point-edge pairing relationships of graph matching can be constructed into a correlation graph model.
Step S62: according to the topological structure of the association graph, the edge-to-edge similarity score and the point-to-point similarity can be associated to obtain a cross-graph conversion matrix.
Step S63: and taking the cross-graph distribution matrix as prior information to perform cross-graph point embedding operation.
Step S7: through the steps above, the final point features of the two graphs can be obtained, and a similarity matrix of the point-to-point matching is then calculated.
As a preferred embodiment of the present invention, the step S3 further includes the steps of: the feature extractor yields the point feature matrices F_1 and F_2 of the two images to be matched, where d is the dimension of the feature vectors and n_1 and n_2 are the numbers of feature points of the two images; F_1 and F_2 are obtained by concatenating the outputs extracted from layers relu4_2 and relu5_1 of the VGG-16 neural network.
As a preferred embodiment of the present invention, the step S4 further includes the steps of: the attribute of each edge consists of the normalized coordinates of its two endpoints, and the connection information of the edges represents the topological structure of each graph; then, the point feature information and the edge attribute information are fed as inputs into the graph neural network SplineCNN; SplineCNN serves as the geometric-topology embedding technique and adopts MAX aggregation when aggregating structure information; the point features embedded with the respective geometric topology information are finally obtained.
As a preferred embodiment of the present invention, the step S5 further includes the steps of: the point-edge incidence matrices of the two graphs are G_1, H_1 and G_2, H_2 respectively, where e_1 and e_2 denote the numbers of edges of the two graphs; G_{i,k} = H_{j,k} = 1 means that edge k starts at node i and ends at node j. The edge features E_1 and E_2 gather the features of the two endpoints of each edge through the incidence matrices:

E_1 = [G_1^T F_1, H_1^T F_1],  E_2 = [G_2^T F_2, H_2^T F_2]   formula (1)
as a preferred embodiment of the present invention, the step S6 further includes the steps of: edge-to-edge correspondence matrix K e
Figure BDA0003441409920000064
Wherein,,
Figure BDA0003441409920000065
is a training parameter; k (K) e Each element of the matrix represents edge-to-edge matching information, and in order to expand the difference of edge-to-edge similarity values, namely, emphasize a value with high similarity and compress a value with low similarity, normalization operation is performed on the Ke matrix to obtain a normalized epsilon matrix:
ε=softmax(K e ) Formula (3)
Then, the normalized epsilon matrix is converted into a cross-edge conversion matrix through the structure of the companion graph
Figure BDA0003441409920000066
Figure BDA0003441409920000067
Based on cross-map transformation matrix
Figure BDA0003441409920000068
The cross-graph feature embedded information can be obtained; for node->
Figure BDA0003441409920000069
Figure BDA00034414099200000610
Cross-map feature information m j→i Is calculated as follows:
Figure BDA00034414099200000611
finally, vector addition operation is carried out on the cross-graph characteristic information and the point characteristic information:
Figure BDA00034414099200000612
a similar operation is also performed for the feature points of the second graph.
As a preferred embodiment of the present invention, the step S7 further includes the steps of: the similarity matrix S of the point-to-point matching is computed from the final point features of the two graphs (formula (7), given as an image in the original). The linear solution of the graph matching problem is based on the Sinkhorn iterative algorithm, which normalizes the score matrix S alternately along its rows and columns to obtain the soft assignment matrix P:

P = Sinkhorn(exp(S))   formula (8)
As a preferred embodiment of the present invention, the graph matching method further includes step S8: given the ground-truth assignment matrix P^gt and the soft assignment matrix P, the error can be obtained by constructing a cross-entropy loss function:

L = -Σ_{i,j} [ P^gt_{ij} log P_{ij} + (1 - P^gt_{ij}) log(1 - P_{ij}) ]   formula (9)
as a preferred embodiment of the present invention, the step S1 further includes the steps of: the dataset contains several different categories of images: aircraft, bicycles, birds, boats, bottles, buses, automobiles, cats, chairs, cattle, tables, dogs, horses, motorcycles, humans, plants, sheep, sofas, trains, televisions; each image contains 6 to 23 annotated feature point image coordinates.
As a preferred embodiment of the present invention, the step S2 further includes the steps of: correspondingly, 1682 images are selected as the test set. For each image to be trained, a bounding box containing all annotated feature points is extracted, the image is resized to 256×256, and the result is fed into the training of the deep learning network.
The working process and principle of the invention are as follows: existing deep graph matching schemes based on graph embedding ignore second-order edge-to-edge similarity information and therefore lose accuracy. The present scheme introduces this second-order information through a deep graph matching model based on cross-graph embedding and applies it to the retrieval of images of similar objects, which improves matching performance, markedly reduces memory consumption, and ultimately improves both the performance and the efficiency of image retrieval.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims (6)

1. A graph matching method for similar image retrieval, characterized in that the method mainly comprises two stages, offline data set construction and online deep learning training, with the following specific steps:
stage one: constructing a data set matched with the offline depth image;
step S1: selecting a Pascal VOC data set as a training data set;
step S2: selecting a number of images that carry annotation points and cover all categories of the data set as the training set;
stage two: training a depth map matching network on line;
step S3: the pre-trained VGG-16 neural network is adopted as a feature extractor, and parameters of the neural network are trained on an ImageNet data set in advance;
step S4: generating a fully connected topological structure with bidirectional edges for each image through Delaunay triangulation;
step S5: after the point feature embedding of the topological geometrical information is completed, carrying out the feature description of the edges on the basis of the point-edge association matrix;
step S6: constructing an edge-to-edge similarity matrix K_e according to the edge feature description vectors of the respective graphs;
Step S61: the point-edge pairing relation matched with the graph is constructed into a correlation graph model;
step S62: according to the topological structure of the association graph, associating the edge-to-edge similarity score with the point similarity to obtain a cross-graph conversion matrix;
step S63: taking the cross-graph distribution matrix as prior information to perform cross-graph point embedding operation;
step S7: through the steps above, obtaining the final point features of the two graphs, and then calculating a similarity matrix of the point-to-point matching;
the step S4 further includes the steps of: the attribute of each edge consists of the normalized coordinates of its two endpoints, and the connection information of the edges represents the topological structure of each graph; then, the point feature information and the edge attribute information are fed as inputs into the graph neural network SplineCNN; SplineCNN serves as the geometric-topology embedding technique and adopts MAX aggregation when aggregating structure information; the point features embedded with the respective geometric topology information are finally obtained;
The step S5 further includes the steps of: the point-side association matrices of the two figures are respectively
Figure FDA0004191882760000015
Figure FDA0004191882760000016
And
Figure FDA0004191882760000017
wherein n is 1 And n 2 The number of the characteristic points of the two images is respectively e 1 And e 2 The number of edges respectively representing the two graphs, when G i,k =H j,k When=1, it means that the edge k starts from the node i to the node j ends; edge characteristics->
Figure FDA0004191882760000018
And
Figure FDA0004191882760000019
is defined as follows, where d is the dimension of the feature vector:
Figure FDA0004191882760000021
the step S6 further includes the steps of: the edge-to-edge correspondence matrix K_e is computed from the edge features of the two graphs:

K_e = E_1 Λ E_2^T   formula (2)

where Λ is a training parameter; each element of the K_e matrix represents edge-to-edge matching information; in order to widen the differences between the edge-to-edge similarity values, that is, to emphasize values with high similarity and compress values with low similarity, a normalization operation is performed on the K_e matrix to obtain the normalized ε matrix:

ε = softmax(K_e)   formula (3)

then the normalized ε matrix is converted into a cross-graph conversion matrix T through the structure of the association graph (formula (4), given as an image in the original); based on the cross-graph conversion matrix T, the cross-graph feature-embedding information is obtained; for each node i of the first graph, the cross-graph feature information m_{j→i} aggregates the features of the nodes j of the second graph, weighted by T (formula (5), given as an image in the original); finally, a vector addition operation combines the cross-graph feature information with the point feature information (formula (6), given as an image in the original); the same is done for the feature points of the second graph.
2. The graph matching method for similar image retrieval according to claim 1, wherein said step S3 further comprises the steps of: the feature extractor yields the point feature matrices F_1 and F_2 of the two images to be matched; F_1 and F_2 are obtained by concatenating the outputs extracted from layers relu4_2 and relu5_1 of the VGG-16 neural network.
3. The graph matching method for similar image retrieval according to claim 1, wherein said step S7 further comprises the steps of: the similarity matrix S of the point-to-point matching is computed from the final point features of the two graphs (formula (7), given as an image in the original); the linear solution of the graph matching problem is based on the Sinkhorn iterative algorithm, which normalizes the score matrix S alternately along its rows and columns to obtain the soft assignment matrix P:

P = Sinkhorn(exp(S))   formula (8).
4. The graph matching method for similar image retrieval according to claim 1, further comprising step S8: given the ground-truth assignment matrix P^gt and the soft assignment matrix P, the error is obtained by constructing a cross-entropy loss function:

L = -Σ_{i,j} [ P^gt_{ij} log P_{ij} + (1 - P^gt_{ij}) log(1 - P_{ij}) ]   formula (9).
5. The graph matching method for similar image retrieval according to claim 1, wherein said step S1 further comprises the steps of: the dataset contains images of several different categories: aircraft, bicycles, birds, boats, bottles, buses, automobiles, cats, chairs, cattle, tables, dogs, horses, motorcycles, humans, plants, sheep, sofas, trains, televisions; each image contains 6 to 23 annotated feature point image coordinates.
6. The graph matching method for similar image retrieval according to claim 1, wherein said step S2 further comprises the steps of: correspondingly, 1682 images are selected as the test set; for each image to be trained, a bounding box containing all annotated feature points is extracted, the image is resized to 256×256, and the result is fed into the training of the deep learning network.
CN202111634430.5A 2021-12-29 2021-12-29 Picture matching method for similar image retrieval Active CN114491122B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111634430.5A CN114491122B (en) 2021-12-29 2021-12-29 Picture matching method for similar image retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111634430.5A CN114491122B (en) 2021-12-29 2021-12-29 Picture matching method for similar image retrieval

Publications (2)

Publication Number Publication Date
CN114491122A CN114491122A (en) 2022-05-13
CN114491122B (en) 2023-07-14

Family

ID=81496804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111634430.5A Active CN114491122B (en) 2021-12-29 2021-12-29 Picture matching method for similar image retrieval

Country Status (1)

Country Link
CN (1) CN114491122B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063789B (en) * 2022-05-24 2023-08-04 中国科学院自动化研究所 3D target detection method and device based on key point matching

Citations (1)

Publication number Priority date Publication date Assignee Title
CN106126581A (en) * 2016-06-20 2016-11-16 复旦大学 Cartographical sketching image search method based on degree of depth study

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN108595636A (en) * 2018-04-25 2018-09-28 复旦大学 The image search method of cartographical sketching based on depth cross-module state correlation study
CN110263795B (en) * 2019-06-04 2023-02-03 华东师范大学 Target detection method based on implicit shape model and graph matching
CN111488498A (en) * 2020-04-08 2020-08-04 浙江大学 Node-graph cross-layer graph matching method and system based on graph neural network
CN112801206B (en) * 2021-02-23 2022-10-14 中国科学院自动化研究所 Image key point matching method based on depth map embedded network and structure self-learning

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN106126581A (en) * 2016-06-20 2016-11-16 复旦大学 Cartographical sketching image search method based on degree of depth study

Also Published As

Publication number Publication date
CN114491122A (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN111914558B (en) Course knowledge relation extraction method and system based on sentence bag attention remote supervision
CN106970910B (en) Keyword extraction method and device based on graph model
WO2020143184A1 (en) Knowledge fusion method and apparatus, computer device, and storage medium
CN109284406B (en) Intention identification method based on difference cyclic neural network
Shi et al. Deep adaptively-enhanced hashing with discriminative similarity guidance for unsupervised cross-modal retrieval
CN111522965A (en) Question-answering method and system for entity relationship extraction based on transfer learning
CN111985228B (en) Text keyword extraction method, text keyword extraction device, computer equipment and storage medium
CN105468596A (en) Image retrieval method and device
CN109408578A (en) One kind being directed to isomerous environment monitoring data fusion method
CN115248876B (en) Remote sensing image overall recommendation method based on content understanding
CN114491122B (en) Picture matching method for similar image retrieval
CN109472282B (en) Depth image hashing method based on few training samples
CN112749253A (en) Multi-text abstract generation method based on text relation graph
CN112541083A (en) Text classification method based on active learning hybrid neural network
CN112732932A (en) User entity group recommendation method based on knowledge graph embedding
CN114092283A (en) Knowledge graph matching-based legal case similarity calculation method and system
CN112035689A (en) Zero sample image hash retrieval method based on vision-to-semantic network
CN110110120B (en) Image retrieval method and device based on deep learning
CN113792810A (en) Multi-attention recommendation method based on collaborative filtering and deep learning
CN112084319B (en) Relational network video question-answering system and method based on actions
Prasomphan Toward Fine-grained Image Retrieval with Adaptive Deep Learning for Cultural Heritage Image.
CN116524301A (en) 3D point cloud scene instance shape searching and positioning method based on contrast learning
CN116523041A (en) Knowledge graph construction method, retrieval method and system for equipment field and electronic equipment
CN115797795A (en) Remote sensing image question-answering type retrieval system and method based on reinforcement learning
CN113254468B (en) Equipment fault query and reasoning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant