CN117874277B - Image retrieval method based on unsupervised domain self-adaptive hash - Google Patents
Abstract
An image retrieval method based on unsupervised domain-adaptive hashing relates to the technical field of image retrieval. To reduce the domain gap between the source domain and the target domain, class-level prototype contrastive learning is designed, which better guides hash learning. Domain adaptation and hash learning are integrated in a concise framework, significantly improving retrieval performance in the unsupervised domain-adaptive hashing setting. Technically, prototype contrastive learning is performed in a domain-shared feature space, which is then mapped to Hamming space under the constraints of semantic-relation preservation between the source and target domains and of hash-code quantization. The method improves image retrieval performance while reducing retrieval time and storage consumption.
Description
Technical Field
The invention relates to the technical field of image retrieval, and in particular to an image retrieval method based on unsupervised domain-adaptive hashing.
Background
With the development of Internet technology, image data has grown explosively, and text-based image search can no longer meet users' needs. Quickly retrieving images similar to a query has therefore become a basic requirement for large-scale retrieval. An image is typically composed of a large number of pixels and is commonly represented by a high-dimensional feature vector. Traditional feature-vector-based image retrieval faces the curse of dimensionality and high computational complexity when operating in such high-dimensional feature spaces. Hashing-based image retrieval aims to convert high-dimensional features into compact binary hash codes and, by comparing the similarity between hash codes, achieves efficient retrieval over large-scale image datasets.
Hashing methods can be broadly divided into two categories: data-independent methods and data-dependent methods. In data-independent hashing, the hash functions are typically randomly generated and independent of any training data, so improved retrieval performance must be traded for longer hash codes. Data-dependent hashing is also called learning-based hashing. Compared with data-independent methods, learning-based methods achieve higher accuracy with shorter hash codes, and are therefore more popular in practical applications.
Existing hashing methods can be classified into supervised and unsupervised methods according to whether label information is used. Supervised hashing is trained with label information and achieves better retrieval performance than unsupervised hashing, but it relies heavily on annotated datasets; in practice, many large-scale datasets are unlabeled, and labeling them is expensive and time-consuming, if feasible at all. Unsupervised hashing, which requires no label information, is better suited to retrieval over large-scale unlabeled datasets.
Conventional machine learning generally assumes that training data and test data follow the same distribution. In the real world, however, training and test data often differ — they belong to different "domains" — owing to factors such as the data-acquisition environment and collection time. Domain adaptation was proposed to improve model robustness and mitigate the effect of this distribution shift between training and test data. Its goal is to transfer a model between domains: data already labeled on the Source Domain are exploited to improve generalization, and hence model performance, on the Target Domain.
The core task of domain adaptation is to resolve the "domain gap" between the source and target domains, which may include shifts in the feature distribution, changes in the class distribution, and so on. An effective domain-adaptation algorithm fully exploits source-domain knowledge to reduce these differences, so that the model generalizes well on the target domain.
Analysis shows that existing unsupervised domain-adaptive hashing methods depend too heavily on the general domain-adaptation paradigm and therefore lack an effective integration of domain adaptation with hash learning. These methods either use adversarial learning to confuse a domain discriminator, or minimize domain differences through feature transformations to obtain domain-invariant representations. In the last two years, attention has turned to generating reliable pseudo-labels for the target domain and performing domain alignment in Hamming space. However, alignment in Hamming space may discard the rich semantic information embedded in the original samples, making the alignment inefficient and degrading the final retrieval performance of domain-adaptive hashing.
Disclosure of Invention
The invention provides an image retrieval method based on unsupervised domain-adaptive hashing that improves image retrieval performance while reducing retrieval time and storage consumption.
The technical solution adopted to solve the above technical problem is as follows:
An image retrieval method based on unsupervised domain-adaptive hashing comprises the following steps:
(a) acquire the source-domain dataset X^s = {x_1^s, …, x_{n_s}^s} together with its labels {y_1^s, …, y_{n_s}^s}, where x_i^s is the i-th source-domain datum, y_i^s its corresponding label, and n_s the number of source-domain data, and acquire the target-domain dataset X^t = {x_1^t, …, x_{n_t}^t}, where x_j^t is the j-th target-domain datum and n_t the number of target-domain data;
(b) divide the target-domain dataset X^t into a training set T_train and a test set T_test;
(c) construct a deep domain-adaptive hash model; input the i-th source-domain datum x_i^s into the model and output its relaxed hash code u_i^s; input the j-th target-domain datum x_j^t of the training set T_train into the model and output its relaxed hash code u_j^t;
(d) construct the binary hash code b_i^s from the relaxed hash code u_i^s, and the binary hash code b_j^t from the relaxed hash code u_j^t;
(e) construct the prototype contrastive loss L_P according to the label y_i^s of the i-th source-domain datum x_i^s;
(f) construct the relation-preserving loss L_R from the set of relaxed hash codes of the source-domain data and the set of relaxed hash codes of the target-domain training data;
(g) construct the hash-code quantization loss L_Q from the relaxed and binary hash-code sets of the source-domain data and of the target-domain training data;
(h) compute the total loss L from the prototype contrastive loss L_P, the relation-preserving loss L_R, and the quantization loss L_Q;
(i) train the deep domain-adaptive hash model on the total loss L with the Adam optimizer to obtain the optimized deep domain-adaptive hash model;
(j) input the data of the test set T_test into the optimized deep domain-adaptive hash model to obtain the image retrieval result.
Further, step (a) comprises the following steps:
(a-1) arbitrarily select two of the four domains of the domain-adaptation benchmark dataset Office-Home — Artistic images (A), Clip Art (C), Product images (P), and Real-World images (R) — as the source domain and the target domain, respectively;
(a-2) input the i-th image of the source domain into a ResNet-50 model and output the 4096-dimensional i-th source-domain datum x_i^s, whose corresponding label is y_i^s ∈ {1, …, K}, where K is the number of label categories in the source domain;
(a-3) input the j-th image of the target domain into the ResNet-50 model and output the 4096-dimensional j-th target-domain datum x_j^t.
Further, in step (b), the target-domain dataset X^t is divided into the training set T_train and the test set T_test in a 9:1 ratio.
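The 9:1 split of the target domain can be sketched as follows. This is a minimal illustration, not part of the patented method: the function name, the fixed seed, and the use of Python's standard `random` module are assumptions for the example.

```python
import random

def split_target_domain(data, train_ratio=0.9, seed=0):
    """Shuffle the target-domain samples and split them 9:1 into a
    training set and a test (query) set, as described in step (b)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for the example
    cut = int(len(data) * train_ratio)
    train = [data[i] for i in idx[:cut]]
    test = [data[i] for i in idx[cut:]]
    return train, test

# Example: 100 target-domain samples (represented here by their indices)
train, test = split_target_domain(list(range(100)))
print(len(train), len(test))  # 90 10
```

Shuffling before splitting avoids any ordering bias in the target-domain collection; the 9:1 ratio matches the split stated above.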
Further, step (c) comprises the following steps:
(c-1) the deep domain-adaptive hash model consists of a multi-layer perceptron and a hash encoder;
(c-2) the multi-layer perceptron of the deep domain-adaptive hash model consists, in order, of a linear layer, a batch-normalization layer, and a ReLU activation function; inputting the i-th source-domain datum x_i^s yields its feature representation f_i^s in the shared feature space, and inputting the j-th datum x_j^t of the training set T_train yields its feature representation f_j^t; the feature representations of the source-domain data form the feature set F^s = {f_1^s, …, f_{n_s}^s}, and those of the target-domain training set form F^t = {f_1^t, …, f_{n_tr}^t}, where n_tr is the number of data in the target-domain training set T_train;
(c-3) the hash encoder of the deep domain-adaptive hash model consists, in order, of a first linear layer, a batch-normalization layer, a ReLU activation function, a second linear layer, and a Tanh activation function; inputting f_i^s yields the relaxed hash code u_i^s of the i-th source-domain datum, and inputting f_j^t yields the relaxed hash code u_j^t of the j-th target-domain training datum; all relaxed hash codes of the source domain form the set U^s = {u_1^s, …, u_{n_s}^s}, and those of the target-domain training set form U^t = {u_1^t, …, u_{n_tr}^t}.
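A NumPy sketch of the forward pass described in (c-2)/(c-3). All layer widths (512-d shared space, 64-bit codes), the random weights, and the simplified batch normalization (no learned scale/shift) are assumptions for illustration; the actual model would be trained layers in a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(x, eps=1e-5):
    # Simplified batch normalization: zero mean, unit variance per feature.
    return (x - x.mean(0)) / np.sqrt(x.var(0) + eps)

def relu(x):
    return np.maximum(x, 0.0)

def mlp(x, W):
    # Multi-layer perceptron (c-2): linear -> batch norm -> ReLU.
    return relu(batch_norm(x @ W))

def hash_encoder(f, W1, W2):
    # Hash encoder (c-3): linear -> batch norm -> ReLU -> linear -> Tanh.
    h = relu(batch_norm(f @ W1))
    return np.tanh(h @ W2)  # relaxed hash codes, each bit in (-1, 1)

# Hypothetical sizes: 4096-d input features, 512-d shared space, 64-bit codes
x = rng.standard_normal((8, 4096))          # a batch of 8 feature vectors
W = rng.standard_normal((4096, 512)) * 0.01
W1 = rng.standard_normal((512, 512)) * 0.01
W2 = rng.standard_normal((512, 64)) * 0.01

u = hash_encoder(mlp(x, W), W1, W2)
print(u.shape)  # (8, 64)
```

The Tanh output keeps every relaxed bit strictly inside (-1, 1), which is what makes the later sign-based binarization in step (d) a small quantization step rather than an arbitrary rounding.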
Further, step (d) comprises the following steps:
(d-1) compute the binary hash code of the i-th source-domain datum by the formula b_i^s = sign(u_i^s), where sign(·) is the sign function; all binary hash codes of the source domain form the set B^s = {b_1^s, …, b_{n_s}^s};
(d-2) compute the binary hash code of the j-th target-domain training datum by the formula b_j^t = sign(u_j^t); all binary hash codes of the training set T_train form the set B^t = {b_1^t, …, b_{n_tr}^t}.
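The binarization of step (d) is a one-line operation; a small sketch follows. Mapping the (rare) exact-zero case to +1 is an assumption of this example, made so that every bit lies in {-1, +1}.

```python
import numpy as np

def binarize(u):
    """Step (d): binary hash code b = sign(u); the zero case is mapped
    to +1 so that every bit is in {-1, +1}."""
    b = np.sign(u)
    b[b == 0] = 1
    return b

u = np.array([0.7, -0.2, 0.0, 0.9, -0.8])  # one relaxed hash code
print(binarize(u))  # [ 1. -1.  1.  1. -1.]
```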
Further, step (e) comprises the following steps:
(e-1) compute the prototype code c_k^s of the k-th category of the source-domain data as the indicator-weighted mean of the source features, c_k^s = Σ_i 1[y_i^s = k] f_i^s / Σ_i 1[y_i^s = k], where 1[·] is the indicator function and k ∈ {1, …, K} ranges over the labels;
(e-2) compute the pseudo-label of each target-domain sample as ŷ_j^t = argmax_k cos(f_j^t, c_k^s), where cos(·,·) is the cosine similarity function, i.e. the category k at which the cosine similarity reaches its maximum;
(e-3) compute the prototype code c_k^t of the k-th category of the target-domain data analogously, using the pseudo-labels ŷ_j^t in place of labels;
(e-4) compute the prototype contrastive loss L_P from the L2-normalized features and prototypes, where (·)^T denotes the transpose, ‖·‖ the L2 norm, and τ is a temperature parameter that controls the degree of concentration of the prototype codes.
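Steps (e-1)–(e-2) — class prototypes as per-class feature means and pseudo-labels as the nearest source prototype under cosine similarity — can be sketched as follows. The toy 2-class, 2-D data are invented for illustration.

```python
import numpy as np

def class_prototypes(feats, labels, num_classes):
    # Steps (e-1)/(e-3): prototype of class k = mean feature of its samples.
    return np.stack([feats[labels == k].mean(0) for k in range(num_classes)])

def cosine_pseudo_labels(target_feats, source_protos):
    # Step (e-2): pseudo-label = class whose source prototype has the
    # highest cosine similarity with the target feature.
    a = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    p = source_protos / np.linalg.norm(source_protos, axis=1, keepdims=True)
    return (a @ p.T).argmax(1)

# Toy example: two classes in 2-D feature space
src_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
src_labels = np.array([0, 0, 1, 1])
protos = class_prototypes(src_feats, src_labels, 2)   # [[0.95,0.05],[0.05,0.95]]

tgt_feats = np.array([[0.8, 0.05], [0.05, 0.8]])      # unlabeled target samples
print(cosine_pseudo_labels(tgt_feats, protos))         # [0 1]
```

With pseudo-labels in hand, the target prototypes of (e-3) are computed by the same `class_prototypes` call, and the contrastive loss of (e-4) pulls each feature toward its own class prototype and away from the others.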
Further, step (f) comprises the following steps:
(f-1) from the labels y_i^s of the source-domain data, compute the label similarity matrix S by matrix multiplication, using the torch.mm() function of the PyTorch toolkit;
(f-2) compute the relation-preserving loss L_R^s of the source-domain data from S and the source relaxed hash codes, where σ is a scale factor;
(f-3) compute the loss L_R^st that constrains similar features in the domain-shared space of the two domains to generate correspondingly similar hash codes;
(f-4) compute the relation-preserving loss L_R by fusing L_R^s and L_R^st, where μ is a fusion factor.
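Step (f-1) builds the label similarity matrix as a product of one-hot label matrices; the NumPy `@` below is the analogue of the `torch.mm(Y, Y.t())` call named in the text. The one-hot encoding of the labels is an assumption of this sketch.

```python
import numpy as np

def label_similarity(one_hot_labels):
    # Step (f-1): S = Y Y^T, so S[i, j] = 1 iff samples i and j share a
    # label (for one-hot single-label rows), else 0.
    return one_hot_labels @ one_hot_labels.T

# Three source samples, three classes: samples 0 and 1 share class 0
Y = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0]])
S = label_similarity(Y)
print(S)  # [[1 1 0] [1 1 0] [0 0 1]]
```

The relation-preserving loss of (f-2) then pushes the inner products of relaxed hash codes to agree with S, so that same-class samples receive similar codes.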
Further, step (g) comprises the following steps:
(g-1) compute the quantization loss L_Q of the relaxed hash codes of the two domains from the gap between the relaxed codes and their binary counterparts, i.e. over U^s, B^s and U^t, B^t.
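A common form of the hash quantization loss — the mean squared gap between relaxed codes and their binarizations — is sketched below. The exact formula used by the invention is not reproduced here, so the mean-squared form is an assumption for illustration.

```python
import numpy as np

def quantization_loss(U, B):
    """A quantization loss in the spirit of step (g): the mean squared
    gap between relaxed codes U and binary codes B, which pushes each
    relaxed bit toward -1 or +1."""
    return np.mean((U - B) ** 2)

U = np.array([[0.9, -0.8],
              [0.2, -1.0]])  # relaxed codes (Tanh outputs)
B = np.sign(U)               # their binarizations
print(round(quantization_loss(U, B), 4))  # 0.1725
```

Bits already near ±1 (such as -1.0 above) contribute almost nothing, while uncertain bits (such as 0.2) dominate the loss, which is exactly the behavior a quantization constraint should have.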
Further, step (h) comprises the following steps:
(h-1) compute the total loss as the weighted combination L = α L_P + β L_R + γ L_Q, where α, β, and γ are all weights.
Further, step (j) comprises the following steps:
(j-1) input the j-th target-domain datum x_j^t of the test set T_test and the i-th source-domain datum x_i^s into the optimized deep domain-adaptive hash model, obtaining the relaxed hash codes u_j^t and u_i^s, respectively;
(j-2) compute the binary hash code of the j-th target-domain datum by the formula b_j^t = sign(u_j^t);
(j-3) compute the binary hash code of the i-th source-domain datum by the formula b_i^s = sign(u_i^s);
(j-4) compute the Hamming distance between the binary hash codes b_j^t and b_i^s to obtain the image retrieval result.
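Step (j-4) — ranking database codes by Hamming distance to the query code — can be sketched as follows; the toy 4-bit codes are invented for illustration.

```python
import numpy as np

def hamming_distances(query, database):
    """Step (j-4): Hamming distance between one query code and every
    database code; for {-1, +1} codes this is the count of differing
    bits, and a smaller distance means more similar images."""
    return np.sum(query[None, :] != database, axis=1)

query = np.array([1, -1, 1, 1])
db = np.array([[1, -1, 1, 1],     # identical       -> distance 0
               [1, 1, 1, -1],     # two bits differ -> distance 2
               [-1, 1, -1, -1]])  # all bits differ -> distance 4
d = hamming_distances(query, db)
print(d, d.argsort())  # [0 2 4] [0 1 2]  (ranking: most similar first)
```

Because the distance is computed bitwise, retrieval over millions of codes reduces to cheap XOR/popcount-style operations, which is the source of the time and storage savings the method claims.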
The beneficial effects of the invention are as follows: the invention provides a simple and effective image retrieval method based on unsupervised domain-adaptive hashing. It performs well on image retrieval tasks and innovates over existing unsupervised domain-adaptive hashing methods by proposing class-level prototype contrastive learning, which reduces the domain gap between the source and target domains and better guides hash learning. Domain adaptation and hash learning are integrated in a simple framework, markedly improving retrieval performance in the unsupervised domain-adaptive hashing setting. Technically, prototype contrastive learning is performed in a domain-shared feature space, which is then mapped to Hamming space under the constraints of semantic-relation preservation between the source and target domains and of hash-code quantization. The invention improves image retrieval performance and reduces the time and storage consumption of retrieval.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a diagram of the overall network architecture of the image retrieval based on unsupervised domain adaptive hashing of the present invention.
Detailed Description
The invention is further described below with reference to fig. 1 and fig. 2.
As shown in fig. 1 and fig. 2, an image retrieval method based on unsupervised domain-adaptive hashing comprises the following steps:
(a) acquire the source-domain dataset X^s = {x_1^s, …, x_{n_s}^s} together with its labels {y_1^s, …, y_{n_s}^s}, where x_i^s is the i-th source-domain datum, y_i^s its corresponding label, and n_s the number of source-domain data, and acquire the target-domain dataset X^t = {x_1^t, …, x_{n_t}^t}, where x_j^t is the j-th target-domain datum and n_t the number of target-domain data.
(b) Divide the target-domain dataset X^t into a training set T_train and a test set T_test.
(c) Construct a deep domain-adaptive hash model; input the i-th source-domain datum x_i^s into the model and output its relaxed hash code u_i^s; input the j-th target-domain datum x_j^t of the training set T_train into the model and output its relaxed hash code u_j^t.
(d) Construct the binary hash code b_i^s from the relaxed hash code u_i^s, and the binary hash code b_j^t from the relaxed hash code u_j^t.
(e) Construct the prototype contrastive loss L_P according to the label y_i^s of the i-th source-domain datum x_i^s.
(f) Construct the relation-preserving loss L_R from the set of relaxed hash codes of the source-domain data and the set of relaxed hash codes of the target-domain training data.
(g) Construct the hash-code quantization loss L_Q from the relaxed and binary hash-code sets of the source-domain data and of the target-domain training data.
(h) Compute the total loss L from the prototype contrastive loss L_P, the relation-preserving loss L_R, and the quantization loss L_Q.
(i) Train the deep domain-adaptive hash model on the total loss L with the Adam optimizer to obtain the optimized deep domain-adaptive hash model.
(j) Input the data of the test set T_test into the optimized deep domain-adaptive hash model to obtain the image retrieval result.
The method integrates unsupervised domain adaptation and hash learning into one deep model framework. Class-level prototype contrastive learning narrows the distribution gap between the source and target domains in the shared feature space, preserves the semantic similarity between the two domains, and yields a unified feature representation across them. During hash learning, source-domain knowledge is transferred to the target domain by using the source labels to directly guide the hash-learning process and by forcing similar features in the shared feature space of the two domains to generate correspondingly similar hash codes. Supervising the whole training process with information from both domains produces hash codes that preserve the similarity relations between samples of both domains. The main objective is to improve the image retrieval performance of unsupervised domain-adaptive hashing on both cross-domain and single-domain retrieval tasks.
In one embodiment of the invention, step (a) comprises the following steps:
(a-1) arbitrarily select two of the four domains of the domain-adaptation benchmark dataset Office-Home — Artistic images (A), Clip Art (C), Product images (P), and Real-World images (R) — as the source domain and the target domain, respectively.
(a-2) Input the i-th image of the source domain into a ResNet-50 model and output the 4096-dimensional i-th source-domain datum x_i^s, whose corresponding label is y_i^s ∈ {1, …, K}, where K is the number of label categories in the source domain.
(a-3) Input the j-th image of the target domain into the ResNet-50 model and output the 4096-dimensional j-th target-domain datum x_j^t.
In one embodiment of the invention, in step (b) the target-domain dataset X^t is preferably divided into the training set T_train and the test set T_test in a 9:1 ratio.
In one embodiment of the invention, step (c) comprises the following steps:
(c-1) the deep domain-adaptive hash model consists of a multi-layer perceptron and a hash encoder.
(c-2) The multi-layer perceptron of the deep domain-adaptive hash model consists, in order, of a linear layer, a batch-normalization layer, and a ReLU activation function; inputting the i-th source-domain datum x_i^s yields its feature representation f_i^s in the shared feature space, and inputting the j-th datum x_j^t of the training set T_train yields its feature representation f_j^t; the feature representations of the source-domain data form the feature set F^s = {f_1^s, …, f_{n_s}^s}, and those of the target-domain training set form F^t = {f_1^t, …, f_{n_tr}^t}, where n_tr is the number of data in the target-domain training set T_train.
(c-3) The hash encoder of the deep domain-adaptive hash model consists, in order, of a first linear layer, a batch-normalization layer, a ReLU activation function, a second linear layer, and a Tanh activation function; inputting f_i^s yields the relaxed hash code u_i^s of the i-th source-domain datum, and inputting f_j^t yields the relaxed hash code u_j^t of the j-th target-domain training datum; all relaxed hash codes of the source domain form the set U^s = {u_1^s, …, u_{n_s}^s}, and those of the target-domain training set form U^t = {u_1^t, …, u_{n_tr}^t}.
In one embodiment of the invention, step (d) comprises the following steps:
(d-1) compute the binary hash code of the i-th source-domain datum by the formula b_i^s = sign(u_i^s), where sign(·) is the sign function; all binary hash codes of the source domain form the set B^s = {b_1^s, …, b_{n_s}^s}.
(d-2) Compute the binary hash code of the j-th target-domain training datum by the formula b_j^t = sign(u_j^t); all binary hash codes of the training set T_train form the set B^t = {b_1^t, …, b_{n_tr}^t}.
In one embodiment of the invention, step (e) comprises the following steps:
(e-1) compute the prototype code c_k^s of the k-th category of the source-domain data as the indicator-weighted mean of the source features, c_k^s = Σ_i 1[y_i^s = k] f_i^s / Σ_i 1[y_i^s = k], where 1[·] is the indicator function and k ∈ {1, …, K} ranges over the labels.
(e-2) Since the target-domain data carry no labels, pseudo-labels must be derived for them before prototype contrastive learning can be performed. After the prototype codes of the source-domain data are obtained, the pseudo-label of each target-domain sample is taken as the category of the nearest source prototype in the shared feature space; specifically, ŷ_j^t = argmax_k cos(f_j^t, c_k^s), where cos(·,·) is the cosine similarity function, i.e. the category k at which the cosine similarity reaches its maximum.
(e-3) Compute the prototype code c_k^t of the k-th category of the target-domain data analogously, using the pseudo-labels ŷ_j^t in place of labels.
(e-4) Compute the prototype contrastive loss L_P from the L2-normalized features and prototypes, where (·)^T denotes the transpose, ‖·‖ the L2 norm, and τ is a temperature parameter that controls the degree of concentration of the prototype codes.
In one embodiment of the invention, step (f) comprises the following steps:
(f-1) from the labels y_i^s of the source-domain data, compute the label similarity matrix S by matrix multiplication, using the torch.mm() function of the PyTorch toolkit.
(f-2) Compute the relation-preserving loss L_R^s of the source-domain data from S and the source relaxed hash codes, where σ is a scale factor.
(f-3) Similar features in the domain-shared representation space of the two domains are forced to generate correspondingly similar hash codes: the features of the source-domain and target-domain data are normalized, and the similarity between source and target features is obtained by matrix multiplication; the relaxed hash codes of the source-domain and target-domain samples are likewise normalized, and the similarity between the source and target relaxed hash codes is obtained by matrix multiplication. The loss L_R^st constrains these two similarity structures to agree.
(f-4) Compute the relation-preserving loss L_R by fusing L_R^s and L_R^st, where μ is a fusion factor.
In one embodiment of the invention, step (g) comprises the following steps:
(g-1) compute the quantization loss L_Q of the relaxed hash codes of the two domains from the gap between the relaxed codes and their binary counterparts, i.e. over U^s, B^s and U^t, B^t.
In one embodiment of the invention, step (h) comprises the following steps:
(h-1) compute the total loss as the weighted combination L = α L_P + β L_R + γ L_Q, where α, β, and γ are all weights.
In one embodiment of the invention, step (j) comprises the following steps:
(j-1) input the j-th target-domain datum x_j^t of the test set T_test and the i-th source-domain datum x_i^s into the optimized deep domain-adaptive hash model, obtaining the relaxed hash codes u_j^t and u_i^s, respectively.
(j-2) Compute the binary hash code of the j-th target-domain datum by the formula b_j^t = sign(u_j^t).
(j-3) Compute the binary hash code of the i-th source-domain datum by the formula b_i^s = sign(u_i^s).
(j-4) Compute the Hamming distance between the binary hash codes b_j^t and b_i^s: the smaller the Hamming distance, the more similar the two images.
Finally, it should be noted that the foregoing describes only preferred embodiments of the invention, and the invention is not limited thereto. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or substitute equivalents for some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall fall within the protection scope of the invention.
Claims (10)
1. An image retrieval method based on unsupervised domain-adaptive hashing, characterized by comprising the following steps:
(a) acquire the source-domain dataset X^s = {x_1^s, …, x_{n_s}^s} together with its labels {y_1^s, …, y_{n_s}^s}, where x_i^s is the i-th source-domain datum, y_i^s its corresponding label, and n_s the number of source-domain data, and acquire the target-domain dataset X^t = {x_1^t, …, x_{n_t}^t}, where x_j^t is the j-th target-domain datum and n_t the number of target-domain data;
(b) divide the target-domain dataset X^t into a training set T_train and a test set T_test;
(c) construct a deep domain-adaptive hash model; input the i-th source-domain datum x_i^s into the model and output its relaxed hash code u_i^s; input the j-th target-domain datum x_j^t of the training set T_train into the model and output its relaxed hash code u_j^t;
(d) construct the binary hash code b_i^s from the relaxed hash code u_i^s, and the binary hash code b_j^t from the relaxed hash code u_j^t;
(e) construct the prototype contrastive loss L_P according to the label y_i^s of the i-th source-domain datum x_i^s;
(f) construct the relation-preserving loss L_R from the set of relaxed hash codes of the source-domain data and the set of relaxed hash codes of the target-domain training data;
(g) construct the hash-code quantization loss L_Q from the relaxed and binary hash-code sets of the source-domain data and of the target-domain training data;
(h) compute the total loss L from the prototype contrastive loss L_P, the relation-preserving loss L_R, and the quantization loss L_Q;
(i) train the deep domain-adaptive hash model on the total loss L with the Adam optimizer to obtain the optimized deep domain-adaptive hash model;
(j) input the data of the test set T_test into the optimized deep domain-adaptive hash model to obtain the image retrieval result.
2. The image retrieval method based on unsupervised domain-adaptive hashing according to claim 1, characterized in that step (a) comprises the following steps:
(a-1) arbitrarily select two of the four domains of the domain-adaptation benchmark dataset Office-Home — Artistic images (A), Clip Art (C), Product images (P), and Real-World images (R) — as the source domain and the target domain, respectively;
(a-2) input the i-th image of the source domain into a ResNet-50 model and output the 4096-dimensional i-th source-domain datum x_i^s, whose corresponding label is y_i^s ∈ {1, …, K}, where K is the number of label categories in the source domain;
(a-3) input the j-th image of the target domain into the ResNet-50 model and output the 4096-dimensional j-th target-domain datum x_j^t.
3. The image retrieval method based on unsupervised domain-adaptive hashing according to claim 1, characterized in that in step (b) the target-domain dataset X^t is divided into the training set T_train and the test set T_test in a 9:1 ratio.
4. The image retrieval method based on unsupervised domain adaptive hashing of claim 2, wherein the step (c) comprises the steps of:
(c-1) the deep domain adaptive hash model is composed of a multi-layer perceptron and a hash encoder;
(c-2) the multi-layer perceptron of the deep domain adaptive hash model is composed, in order, of a linear layer, a batch normalization layer, and a ReLU activation function; the i-th source domain datum x_i^s is input into the multi-layer perceptron, which outputs the feature representation f_i^s of x_i^s in the shared feature representation space; the j-th datum x_j^t of the training set is input into the multi-layer perceptron, which outputs the feature representation f_j^t of x_j^t in the shared feature representation space; the feature representations of the source domain data set form the source domain feature set F^s, and the feature representations of the training set form the target domain feature set F^t, j = 1, 2, …, n_t, where n_t is the number of data in the target domain training set;
(c-3) the hash encoder of the deep domain adaptive hash model is composed, in order, of a first linear layer, a batch normalization layer, a ReLU activation function, a second linear layer, and a Tanh activation function; the feature representation f_i^s is input into the hash encoder, which outputs the relaxed hash code h_i^s of the i-th source domain datum, and all relaxed hash codes of the source domain data set form the relaxed hash code set H^s; the feature representation f_j^t is input into the hash encoder, which outputs the relaxed hash code h_j^t of the j-th target domain training datum, and all relaxed hash codes of the target domain training set form the relaxed hash code set H^t.
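The forward pass described in steps (c-2) and (c-3) can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the layer widths, the 64-bit code length, and the inference-style batch normalization are assumptions.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # inference-style batch normalization over the batch dimension
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def relu(x):
    return np.maximum(x, 0.0)

def mlp_features(x, W1, b1):
    # step (c-2): linear -> batch norm -> ReLU, giving shared-space features
    return relu(batch_norm(x @ W1 + b1))

def hash_encoder(f, W2, b2, W3, b3):
    # step (c-3): linear -> batch norm -> ReLU -> linear -> tanh,
    # giving relaxed hash codes in (-1, 1)
    return np.tanh(relu(batch_norm(f @ W2 + b2)) @ W3 + b3)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4096))                  # eight 4096-d ResNet-50 features
W1, b1 = 0.01 * rng.normal(size=(4096, 512)), np.zeros(512)
W2, b2 = 0.1 * rng.normal(size=(512, 256)), np.zeros(256)
W3, b3 = 0.1 * rng.normal(size=(256, 64)), np.zeros(64)  # 64-bit codes (assumed)

h = hash_encoder(mlp_features(x, W1, b1), W2, b2, W3, b3)
```

The Tanh output keeps every relaxed code strictly inside (-1, 1), which is what makes the later sign-based binarization a small "quantization" step rather than a large jump.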
5. The unsupervised domain adaptive hash-based image retrieval method according to claim 4, wherein the step (d) comprises the steps of:
(d-1) the binary hash code b_i^s of the i-th source domain datum is calculated by the formula b_i^s = sign(h_i^s), where sign(·) is the sign function; all binary hash codes of the source domain data set form the binary hash code set B^s;
(d-2) the binary hash code b_j^t of the j-th target domain training datum is calculated by the formula b_j^t = sign(h_j^t); all binary hash codes of the target domain training set form the binary hash code set B^t.
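The binarization of steps (d-1) and (d-2) is a direct element-wise sign. A minimal sketch; mapping zero to +1 is an assumption, since the patent only states that sign(·) is the sign function:

```python
import numpy as np

def binarize(h):
    # b = sign(h): map relaxed codes in (-1, 1) to binary codes in {-1, +1};
    # zero is mapped to +1 (an assumption) so that every bit is defined
    return np.where(h >= 0, 1, -1)

h = np.array([[0.7, -0.2, 0.0],
              [-0.9, 0.4, -0.1]])
b = binarize(h)   # [[ 1 -1  1], [-1  1 -1]]
```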
6. The unsupervised domain adaptive hash based image retrieval method according to claim 4, wherein the step (e) comprises the steps of:
(e-1) the prototype code p_c^s of the c-th category of the source domain data is calculated from the source domain relaxed hash codes using the indicator function 𝟙(y_i^s = c), where 𝟙(·) is the indicator function, y_i^s is the i-th label, and c = 1, 2, …, C;
(e-2) the pseudo label ŷ_j^t of each target domain sample is calculated as the value of c at which the cosine similarity cos(h_j^t, p_c^s) reaches its maximum, where cos(·,·) is the cosine similarity function;
(e-3) the prototype code p_c^t of the c-th category of the target domain data is calculated analogously from the target domain relaxed hash codes and the pseudo labels;
(e-4) the prototype contrastive loss L_p is calculated from the prototype codes, where (·)^T denotes the transpose, ||·|| denotes the norm, and τ is a temperature parameter that controls the degree of prototype code concentration.
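Steps (e-1) and (e-2) can be sketched as follows, assuming the prototype code of a class is the indicator-weighted mean of the relaxed hash codes of that class (the exact formula in the patent survives only as an image and is not reproduced here):

```python
import numpy as np

def prototypes(h, labels, num_classes):
    # assumed form of step (e-1): the prototype code of class c is the
    # indicator-weighted mean of the relaxed hash codes with label c
    return np.stack([h[labels == c].mean(axis=0) for c in range(num_classes)])

def cosine(a, b):
    # pairwise cosine similarity between rows of a and rows of b
    return a @ b.T / (np.linalg.norm(a, axis=1, keepdims=True)
                      * np.linalg.norm(b, axis=1))

def pseudo_labels(h_t, protos):
    # step (e-2): each target sample takes the class whose source prototype
    # maximizes the cosine similarity with its relaxed hash code
    return cosine(h_t, protos).argmax(axis=1)

rng = np.random.default_rng(1)
h_s = np.tanh(rng.normal(size=(20, 16)))   # source relaxed hash codes
y_s = np.arange(20) % 4                    # 4 classes, 5 samples each
p_s = prototypes(h_s, y_s, 4)
h_t = np.tanh(rng.normal(size=(10, 16)))   # target relaxed hash codes
y_hat = pseudo_labels(h_t, p_s)
```

Step (e-3) reuses `prototypes` on `h_t` with `y_hat` in place of the true labels; the contrastive loss of step (e-4) then compares target codes against the prototypes under a temperature τ.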
7. The unsupervised domain adaptive hash-based image retrieval method according to claim 4, wherein the step (f) comprises the steps of:
(f-1) the label similarity matrix S is calculated from the labels y_i^s corresponding to the source domain data x_i^s by matrix multiplication, using the torch.mm() function of the PyTorch toolkit;
(f-2) the relation maintenance loss L_s of the source domain data is calculated from S and the source domain relaxed hash codes, where η is a scale factor;
(f-3) the loss L_c is calculated, which constrains similar features in the domain-shared space of the two domains to generate correspondingly similar hash codes;
(f-4) the overall relation maintenance loss L_r is calculated by fusing the losses of steps (f-2) and (f-3), where μ is a fusion factor.
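Steps (f-1) and (f-2) can be sketched as follows. The patent computes the similarity matrix with torch.mm(); this NumPy sketch builds it from one-hot labels, and the squared-error form of the relation maintenance loss is an assumption, since the exact formula survives only as an image.

```python
import numpy as np

def label_similarity(labels, num_classes):
    # step (f-1): one-hot label matrix Y, then S = Y @ Y.T
    # (the patent uses torch.mm() for this matrix multiplication);
    # S[i, j] = 1 when samples i and j share a label, else 0
    Y = np.eye(num_classes)[labels]
    return Y @ Y.T

def relation_loss(h, S, scale=1.0):
    # assumed form of step (f-2): pull the scaled inner products of the
    # relaxed hash codes toward the label similarities
    K = h.shape[1]
    return float((((scale * (h @ h.T) / K) - S) ** 2).mean())

labels = np.array([0, 0, 1, 2])
S = label_similarity(labels, 3)
rng = np.random.default_rng(2)
loss = relation_loss(np.tanh(rng.normal(size=(4, 16))), S)
```

Minimizing such a loss drives the hash codes of same-label samples toward high inner products (small Hamming distances) and those of different-label samples toward low ones.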
8. The unsupervised domain adaptive hash-based image retrieval method according to claim 5, wherein the step (g) comprises the steps of:
(g-1) the quantization loss L_q of the relaxed hash codes of the two domains is calculated.
9. The image retrieval method based on unsupervised domain adaptive hashing of claim 1, wherein the step (h) comprises the steps of:
(h-1) the total loss L is calculated as the weighted sum of the prototype contrastive loss, the relation maintenance loss, and the quantization loss, where λ_1, λ_2, and λ_3 are all weights.
10. The unsupervised domain adaptive hash based image retrieval method according to claim 5, wherein the step (j) comprises the steps of:
(j-1) the j-th target domain datum of the test set and the i-th source domain datum are input into the optimized deep domain adaptive hash model, which outputs the relaxed hash codes h_j^t and h_i^s, respectively;
(j-2) the binary hash code b_j^t of the j-th target domain datum is calculated by the formula b_j^t = sign(h_j^t);
(j-3) the binary hash code b_i^s of the i-th source domain datum is calculated by the formula b_i^s = sign(h_i^s);
(j-4) the distance between the binary hash code b_j^t and the binary hash code b_i^s is calculated to obtain the image retrieval result.
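The retrieval of step (j-4) is typically performed by ranking database codes by Hamming distance; the patent does not spell the metric out here, so Hamming distance is an assumption. For {-1, +1} codes of length K, the Hamming distance equals (K - inner product) / 2:

```python
import numpy as np

def hamming(b_query, b_db):
    # for {-1, +1} codes of length K: Hamming distance = (K - inner product) / 2
    K = b_query.shape[-1]
    return (K - b_query @ b_db.T) // 2

b_t = np.array([[1, -1, 1, 1]])        # query code (target domain test datum)
b_s = np.array([[1, -1, 1, 1],         # database codes (source domain data)
                [1, 1, -1, 1],
                [-1, -1, 1, -1]])
d = hamming(b_t, b_s)                  # [[0 2 2]]
ranking = np.argsort(d[0])             # database indices, nearest first
```

Because the codes are short bit vectors, this distance can be computed with XOR and popcount in a production system; the inner-product form above is the convenient equivalent for ±1 arrays.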
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410268927.7A CN117874277B (en) | 2024-03-11 | 2024-03-11 | Image retrieval method based on unsupervised domain self-adaptive hash |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117874277A CN117874277A (en) | 2024-04-12 |
CN117874277B true CN117874277B (en) | 2024-05-10 |
Family
ID=90581576
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109783682A (en) * | 2019-01-19 | 2019-05-21 | 北京工业大学 | A deep relaxed-hashing image retrieval method based on point-wise asymmetric similarity
CN109948029A (en) * | 2019-01-25 | 2019-06-28 | 南京邮电大学 | Adaptive deep hashing image retrieval method based on neural network
CN110674323A (en) * | 2019-09-02 | 2020-01-10 | 山东师范大学 | Unsupervised cross-modal Hash retrieval method and system based on virtual label regression |
CN110795590A (en) * | 2019-09-30 | 2020-02-14 | 武汉大学 | Multi-label image retrieval method and device based on direct-push zero-sample hash |
AU2020103322A4 (en) * | 2020-11-09 | 2021-01-14 | Southwest University | Supervised Discrete Hashing Algorithm With Relaxation Over Distributed Network |
Non-Patent Citations (1)
Title |
---|
Hash code learning based on discrete optimization; ***; Shan Shiguang; Chen Xilin; Chinese Journal of Computers; 2019-03-27 (05); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Deshmukh et al. | Towards accurate duplicate bug retrieval using deep learning techniques | |
CN111124487B (en) | Code clone detection method and device and electronic equipment | |
CN113705218B (en) | Event element gridding extraction method based on character embedding, storage medium and electronic device | |
CN116151132B (en) | Intelligent code completion method, system and storage medium for programming learning scene | |
CN116450796B (en) | Intelligent question-answering model construction method and device | |
CN111931935B (en) | Network security knowledge extraction method and device based on One-shot learning | |
CN111581368A (en) | Intelligent expert recommendation-oriented user image drawing method based on convolutional neural network | |
CN116303977B (en) | Question-answering method and system based on feature classification | |
Bonaccorso | Hands-on unsupervised learning with Python: implement machine learning and deep learning models using Scikit-Learn, TensorFlow, and more | |
CN111709225B (en) | Event causal relationship discriminating method, device and computer readable storage medium | |
CN114580424A (en) | Labeling method and device for named entity identification of legal document | |
CN111461175A (en) | Label recommendation model construction method and device of self-attention and cooperative attention mechanism | |
Walton et al. | Landscape analysis for the specimen data refinery | |
CN117874277B (en) | Image retrieval method based on unsupervised domain self-adaptive hash | |
Malik et al. | A novel hybrid clustering approach based on black hole algorithm for document clustering | |
CN116821351A (en) | Span information-based end-to-end power knowledge graph relation extraction method | |
CN116186223A (en) | Financial text processing method, device, equipment and storage medium | |
CN113259369B (en) | Data set authentication method and system based on machine learning member inference attack | |
CN114218580A (en) | Intelligent contract vulnerability detection method based on multi-task learning | |
CN113626574A (en) | Information query method, system, device and medium | |
CN111611498A (en) | Network representation learning method and system based on domain internal semantics | |
Liu et al. | Hardboost: boosting zero-shot learning with hard classes | |
CN112446206A (en) | Menu title generation method and device | |
CN117873487B (en) | GVG-based code function annotation generation method | |
CN118014446B (en) | Enterprise technology innovation comprehensive index analysis method, storage medium and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||