CN114168782B - Deep hash image retrieval method based on triplet network - Google Patents

Deep hash image retrieval method based on triplet network

Publication number: CN114168782B
Authority: CN (China)
Prior art keywords: image, images, hash, triplet, query
Legal status: Active (granted)
Application number: CN202111448102.6A
Other versions: CN114168782A
Inventors: 曹从军, 陈星, 廖开阳, 周建
Assignee (current and original): Xian University of Technology
Application filed by Xian University of Technology

Classifications

    • G06F16/583 — Retrieval of still image data using metadata automatically derived from the content
    • G06F16/51 — Indexing; data structures and storage structures for still image data
    • G06F16/53 — Querying of still image data
    • G06F16/55 — Clustering; classification of still image data
    • G06N3/045 — Neural network architectures; combinations of networks
    • G06N3/08 — Neural network learning methods
    • Y02D10/00 — Energy-efficient computing


Abstract

The invention discloses a deep hash image retrieval method based on a triplet network, which comprises the following steps: a triplet network is adopted as the CNN network to learn image features; the loss function is optimized according to the feature learning results, so that the distance between sample image pairs of the same category is smaller than that between sample image pairs of different categories, thereby optimizing the neural network model; the query image and the test set images are input into the triplet network to obtain the binary hash code and the fc7-layer depth feature vector of each image. Coarse retrieval is first performed with the binary hash codes, and a candidate set of similar images is obtained from the Hamming distance and a set threshold; fine retrieval is then performed with the fc7-layer depth features by computing the Euclidean distance between the fc7-layer feature vectors of the query image and the candidate set images, and the final n retrieval results are returned in ascending order of Euclidean distance. The invention solves the problem of low image retrieval precision in the prior art.

Description

Deep hash image retrieval method based on triplet network
Technical Field
The invention belongs to the technical field of computer image retrieval methods, and relates to a deep hash image retrieval method based on a triplet network.
Background
In recent years, hash techniques have been widely used in large-scale image retrieval: fast retrieval is performed by converting the high-dimensional features of an image into binary hash codes in a low-dimensional space and computing Hamming distances, which gives low storage cost and high computational efficiency. The traditional hash method first extracts high-dimensional features, learns a hash function from them, and then maps the high-dimensional image features to low-dimensional hash codes with that function. With the development of CNNs, deep hash algorithms have emerged that combine CNN-extracted depth features with hash learning, so that hash codes representing deeper image information can be obtained directly. However, most existing deep hash methods make insufficient use of supervision information and cannot obtain sufficiently discriminative hash codes, so the trained model does not reach the expected effect, which directly affects the accuracy of retrieval results.
Therefore, how to provide an image retrieval method to improve the retrieval accuracy of the hash technique has important value.
Disclosure of Invention
The invention aims to provide a deep hash image retrieval method based on a triplet network, which solves the problem of low image retrieval precision in the prior art.
The technical scheme adopted by the invention is that the deep hash image retrieval method based on the triplet network is implemented according to the following steps:
step 1, adopting three parallel Alexnet network models as the CNN network model, adding a hidden layer, namely a hash layer, after fc7, the last fully connected layer of each Alexnet network model, and sharing weight parameters among the three Alexnet network models;
step 2, selecting an image data set CIFAR-10, dividing the image data set CIFAR-10 into a training set and a test set, taking a training set image as an input image of the CNN network model constructed in the step 1, wherein each input image is a triplet image and comprises an anchor point image, a positive sample image and a negative sample image, wherein the positive sample image and the anchor point image are of the same category, and the negative sample image and the anchor point image are of different categories;
step 3, respectively inputting an anchor point image, a positive sample image and a negative sample image of each input image into an Alexnet network model, and respectively generating a depth feature of an output fc7 layer and a preliminary hash code representing image features at the output fc7 layer and the hash layer by the anchor point image, the positive sample image and the negative sample image;
step 4, introducing a triplet loss function into the CNN network model, and optimizing the CNN network model by minimizing the triplet loss function by using the preliminary hash code generated in the step 3, so that the distances of the anchor point image and the positive sample image in the feature space are smaller than those of the anchor point image and the negative sample image in the feature space;
and 5, inputting the test set images and the query image into the Alexnet network model of the CNN network model optimized in step 4 that corresponds to the input anchor point image, to obtain the corresponding hash codes and fc7-layer depth features, wherein the test images and the query image serve as anchor point images; calculating the Hamming distance between the binary hash codes of the query image and each test set image, regarding a test set image as similar to the query image if the Hamming distance is smaller than a set threshold value, thereby obtaining a candidate set of the m most similar images in the test set; then calculating the Euclidean distance between the depth feature vector of the query image and that of each candidate set image, and obtaining the final retrieval result in ascending order of Euclidean distance, namely finding in the candidate set the n images most similar to the query image.
The present invention is also characterized in that,
the CIFAR-10 image data set is 60000, the training set is classified into 10 types, 5000 types are classified into each type, and the test set is classified into 10 types, and 1000 types are classified into each type.
In the step 4, the distance between the anchor point image and the positive sample image in the feature space is smaller than the distance between the anchor point image and the negative sample image in the feature space, namely:
S(x_i, x_i^+) < S(x_i, x_i^-) (2)
equivalent to
S^2(x_i, x_i^+) < S^2(x_i, x_i^-) (3)
wherein x_i is the anchor image, x_i^+ is a positive sample image, i.e. a sample image of the same class as x_i, x_i^- is a negative sample image, i.e. a sample image of a different class from x_i, and S(x, x') represents the similarity of the two images x and x'; thus S(x_i, x_i^+) is the similarity of the anchor image and the positive sample image, and S(x_i, x_i^-) is the similarity of the anchor image and the negative sample image. Accordingly, the triplet loss function is defined as follows:
loss(x_i, x_i^+, x_i^-) = max{0, g + S^2(x_i, x_i^+) - S^2(x_i, x_i^-)} (4)
where loss(x_i, x_i^+, x_i^-) denotes the triplet loss and g is a threshold parameter that controls the margin between same-category and different-category image pairs.
g = -1.
The optimization of the CNN network model, namely the optimization of the triplet loss function, proceeds as follows:
Denote the training set X = {X_i}, X_i = (x_i, x_i^+, x_i^-). The optimization of the triplet loss function is then converted into the minimization of the following objective function L, namely:
L = Σ_{i=1..n} max{0, g + S^2(x_i, x_i^+) - S^2(x_i, x_i^-)} + λ||W||^2 (5)
wherein W denotes the neural network parameters, λ is a regularization coefficient introduced to improve the generalization ability of the network model, and n is the number of triplets in the training set;
introducing a function d(W, X_i) to represent the difference between the anchor-positive and anchor-negative pair distances in a triplet, i.e.
d(W, X_i) = S^2(x_i, x_i^+) - S^2(x_i, x_i^-) (6)
the objective function may be further written as:
L = Σ_{i=1..n} max{0, g + d(W, X_i)} + λ||W||^2 (7)
To minimize the objective function, its partial derivative with respect to W is:
∂L/∂W = Σ_{i=1..n} (∂d(W, X_i)/∂W) · 1[g + d(W, X_i) > 0] + 2λW (8)
and by the definition of d(W, X_i), the gradient of d(W, X_i) is:
∂d(W, X_i)/∂W = 2(F(x_i) - F(x_i^+)) · ∂(F(x_i) - F(x_i^+))/∂W - 2(F(x_i) - F(x_i^-)) · ∂(F(x_i) - F(x_i^-))/∂W (9)
It follows that the triplet images are input into the network, forward and backward computation yields the gradient of the triplet loss function, and the gradient descent algorithm is then used to minimize the objective function, thereby optimizing the triplet loss function and obtaining the optimal CNN network model.
In step 5, calculating the Hamming distance between the binary hash codes of the query image and the test set images is specifically:
First, for a given image I, input it into the CNN network model optimized in step 4 and extract the hidden layer, i.e. take the output of the hash layer as the image feature, denoted Out(H). Binarization with a set threshold yields the corresponding binary hash code: for each bit j = 1, ..., h, where h is the number of nodes of the hidden layer, the output binary code is
H_j(I) = 1 if Out_j(H) is greater than the threshold, and H_j(I) = 0 otherwise (10)
Let Γ = {I_1, I_2, ..., I_n} be the test set of images to be retrieved, where I_n denotes the nth test set image; the corresponding binary codes are denoted Γ_H = {H_1, H_2, ..., H_n}, where H_i ∈ {0,1}^h.
For the query image I_q and its corresponding binary code H_q, calculate the Hamming distance between H_q and each H_i ∈ Γ_H; if the Hamming distance is smaller than the threshold, the image is regarded as similar. According to the threshold setting, the m most similar images are finally obtained, giving the corresponding candidate image set
P = {I_1^P, I_2^P, ..., I_m^P} (11)
In step 5, calculating the Euclidean distance between the depth feature vector of the query image and those of the candidate set images is specifically:
For the query image I_q and the candidate image set P, let V_q denote the feature vector of the query image I_q, and let V_i^P denote the feature vector of the ith image in the candidate set. The Euclidean distance between the corresponding feature vectors is calculated by:
s_i = ||V_q - V_i^P|| (12)
The smaller the Euclidean distance, the higher the similarity of the two images; finally the n most similar result images are obtained from the candidate image set in ascending order of Euclidean distance, where n is smaller than m.
The beneficial effects of the invention are as follows:
the invention selects the triplet network sharing weight to learn the image characteristics, can fully utilize the supervision information of the images to obtain the depth characteristics of the representative images, and further optimizes the network model by utilizing the triplet loss function, so that the distance between images of the same category is smaller, the distance between images of different categories is larger, and the binary hash code with more differentiation and higher query precision is obtained. When searching the image, firstly, rough searching is carried out by utilizing hash coding to obtain a candidate image set, then fine searching is carried out by utilizing fc7 layer depth characteristics of the image, and finally, n most similar images are obtained in the candidate image set, thereby effectively improving the searching precision.
Drawings
FIG. 1 is a schematic diagram of image feature learning in a deep hash image retrieval method based on a triplet network;
FIG. 2 is a flow chart of a deep hash image retrieval method based on a triplet network in accordance with the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
The invention discloses a deep hash image retrieval method based on a triplet network, and the flow of the method is shown in figure 2. The method is implemented according to the following steps:
step 1, as shown in fig. 1, three parallel Alexnet network models are adopted as the CNN network model, and a hidden layer, namely a hash layer, is added after fc7, the last fully connected layer of each Alexnet network model, with weight parameters shared among the three Alexnet network models;
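The weight-sharing structure of step 1 can be sketched as follows. This is a minimal numpy illustration, not the patent's actual implementation: a single pair of weight matrices (standing in for the AlexNet branch up to fc7 and for the hash layer; the dimensions are illustrative) is reused by all three branches, which is exactly what "sharing weight parameters" means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for one AlexNet branch up to fc7, plus the added
# hash layer. One weight set is shared by all three branches (the same
# parameters are reused, not copied). Dimensions are illustrative only.
W_fc7 = rng.standard_normal((4096, 256)) * 0.01    # "fc7" features
W_hash = rng.standard_normal((48, 4096)) * 0.01    # hash layer, 48-bit code

def branch(x):
    """One branch of the triplet network: fc7 features + hash-layer output."""
    fc7 = np.maximum(0.0, W_fc7 @ x)                   # ReLU fc7 activations
    hash_out = 1.0 / (1.0 + np.exp(-(W_hash @ fc7)))   # sigmoid hash layer
    return fc7, hash_out

# The same `branch` (same W_fc7, W_hash) processes anchor, positive, negative.
anchor, positive, negative = (rng.standard_normal(256) for _ in range(3))
(fa, ha), (fp, hp), (fn, hn) = branch(anchor), branch(positive), branch(negative)
print(fa.shape, ha.shape)   # (4096,) (48,)
```

Because the three branches call the same function with the same parameters, any gradient update applied to `W_fc7` or `W_hash` affects all three branches at once.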
step 2, selecting the image dataset CIFAR-10 and dividing it into a training set and a test set, taking the training set images as input images of the CNN network model constructed in step 1, wherein CIFAR-10 contains 10 categories with 5000 training images and 1000 test images per category; each input image is a triplet comprising an anchor point image, a positive sample image and a negative sample image, wherein the positive sample image is of the same category as the anchor point image and the negative sample image is of a different category from the anchor point image;
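The triplet construction of step 2 can be illustrated with a small sampler. This sketch uses toy integer labels in place of real CIFAR-10 images; the constraint it enforces is the one stated above: the positive shares the anchor's category, the negative does not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy labeled "training set": 100 items in 10 classes (CIFAR-10-like labels;
# the items here are just indices, standing in for images).
labels = np.repeat(np.arange(10), 10)

def sample_triplet(labels, rng):
    """Return indices (anchor, positive, negative): the positive shares the
    anchor's class, the negative comes from a different class."""
    a = rng.integers(len(labels))
    same = np.flatnonzero(labels == labels[a])
    p = rng.choice(same[same != a])             # same class, not the anchor
    n = rng.choice(np.flatnonzero(labels != labels[a]))  # different class
    return a, p, n

a, p, n = sample_triplet(labels, rng)
print(labels[a] == labels[p], labels[a] != labels[n])   # True True
```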
step 3, respectively inputting an anchor point image, a positive sample image and a negative sample image of each input image into an Alexnet network model, and respectively generating a depth feature of an output fc7 layer and a preliminary hash code representing image features at the output fc7 layer and the hash layer by the anchor point image, the positive sample image and the negative sample image;
step 4, introducing a triplet loss function into the CNN network model, and using the preliminary hash codes generated in the step 3 to minimize the triplet loss function, so that the distances of the anchor point image and the positive sample image in the feature space are smaller than those of the anchor point image and the negative sample image in the feature space, namely the hash codes generated by the images of the same category are more similar, and the hash code differences of the images of different categories are larger, thereby completing the optimization of the CNN network model;
the optimization process comprises the following steps:
Assume the relative distance relationships between sample images are represented by a series of image triplets {X_i} = (x, x^+, x^-), where x is the anchor image, x^+ is a positive sample, i.e. a sample image of the same class as x, and x^- is a negative sample image, i.e. a sample image of a different class from x. Based on the distance of the images in Euclidean space, the similarity of two images x and x' is defined as:
S(x, x') = ||F(x) - F(x')||_2 (1)
where S(x, x') is the similarity of the two images and F(x) is the function that maps image data into Euclidean space; for a feature mapping function implemented by a deep neural network, F(x) is the network output for image x, i.e. the feature representation of image x.
For a triplet during training, the distance between the anchor point image and the positive sample image in the feature space should be smaller than the distance between the anchor point image and the negative sample image in the feature space; that is, the output feature representations are expected to satisfy, under the L_2 norm:
S(x_i, x_i^+) < S(x_i, x_i^-) (2)
equivalent to
S^2(x_i, x_i^+) < S^2(x_i, x_i^-) (3)
wherein x_i is the anchor image, x_i^+ is a positive sample image, i.e. a sample image of the same class as x_i, x_i^- is a negative sample image, i.e. a sample image of a different class from x_i, and S(x, x') represents the similarity of the two images x and x'; thus S(x_i, x_i^+) is the similarity of the anchor image and the positive sample image, and S(x_i, x_i^-) is the similarity of the anchor image and the negative sample image. The squared form is used here because it simplifies the computation of derivatives in the subsequent optimization. For a given triplet {X_i}, the classification objective is to maximize the margin between same-class image pairs and different-class image pairs, so the triplet loss function is defined as follows:
loss(x_i, x_i^+, x_i^-) = max{0, g + S^2(x_i, x_i^+) - S^2(x_i, x_i^-)} (4)
where loss(x_i, x_i^+, x_i^-) denotes the triplet loss and g is a threshold parameter that controls the margin between same-category and different-category image pairs, g = -1.
The optimization of the CNN network model, namely the optimization of the triplet loss function, proceeds as follows:
Denote the training set X = {X_i}, X_i = (x_i, x_i^+, x_i^-). The optimization of the triplet loss function is then converted into the minimization of the following objective function L, namely:
L = Σ_{i=1..n} max{0, g + S^2(x_i, x_i^+) - S^2(x_i, x_i^-)} + λ||W||^2 (5)
wherein W denotes the neural network parameters, λ is a regularization coefficient introduced to improve the generalization ability of the network model, and n is the number of triplets in the training set;
introducing a function d(W, X_i) to represent the difference between the anchor-positive and anchor-negative pair distances in a triplet, i.e.
d(W, X_i) = S^2(x_i, x_i^+) - S^2(x_i, x_i^-) (6)
the objective function may be further written as:
L = Σ_{i=1..n} max{0, g + d(W, X_i)} + λ||W||^2 (7)
To minimize the objective function, its partial derivative with respect to W is:
∂L/∂W = Σ_{i=1..n} (∂d(W, X_i)/∂W) · 1[g + d(W, X_i) > 0] + 2λW (8)
and by the definition of d(W, X_i), the gradient of d(W, X_i) is:
∂d(W, X_i)/∂W = 2(F(x_i) - F(x_i^+)) · ∂(F(x_i) - F(x_i^+))/∂W - 2(F(x_i) - F(x_i^-)) · ∂(F(x_i) - F(x_i^-))/∂W (9)
It follows that the triplet images are input into the network, forward and backward computation yields the gradient of the triplet loss function, and the gradient descent algorithm is then used to minimize the objective function, thereby optimizing the triplet loss function, obtaining the optimal CNN network model, extracting the most essential deep features of the images, and obtaining hash codes with higher discrimination and query precision.
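The gradient descent step can be demonstrated on a deliberately simplified problem: instead of differentiating through the network weights W (the chain rule through F is omitted), this sketch descends directly on the anchor's embedding. Since S^2(x, x') = ||F(x) - F(x')||^2, the subgradient of the active loss with respect to the anchor embedding is 2(f_a - f_p) - 2(f_a - f_n), and iterating drives the loss to zero.

```python
import numpy as np

g = -1.0
fp = np.array([4.0, 0.0])   # positive-sample embedding (held fixed here)
fn = np.array([0.5, 0.0])   # negative-sample embedding (held fixed here)
fa = np.array([0.0, 0.0])   # anchor embedding: the variable being optimized

def loss(fa):
    """Triplet loss of equation (4) with embeddings in place of images."""
    return max(0.0, g + np.sum((fa - fp) ** 2) - np.sum((fa - fn) ** 2))

# Simplified gradient descent: update the anchor embedding itself rather
# than the network weights W; a real training step would backpropagate
# this subgradient through the shared network.
for _ in range(50):
    if loss(fa) > 0.0:
        grad = 2.0 * (fa - fp) - 2.0 * (fa - fn)   # subgradient of the loss
        fa = fa - 0.1 * grad
print(loss(fa))
```

After a handful of updates the anchor has moved close enough to the positive (relative to the negative) that the hinge in equation (4) is inactive and the loss is exactly zero.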
Step 5, inputting the test set images and the query image into the Alexnet network model of the CNN network model optimized in step 4 that corresponds to the input anchor point image, to obtain the corresponding hash codes and fc7-layer depth features, wherein the test images and the query image serve as anchor point images; calculating the Hamming distance between the binary hash codes of the query image and each test set image, regarding a test set image as similar to the query image if the Hamming distance is smaller than a set threshold value, thereby obtaining a candidate set of the m most similar images in the test set; then calculating the Euclidean distance between the depth feature vector of the query image and that of each candidate set image, and obtaining the final retrieval result in ascending order of Euclidean distance, namely finding in the candidate set the n images most similar to the query image;
the Hamming distance of binary hash codes corresponding to the query image and the test set image is calculated specifically as follows:
firstly, giving an image I in a test set, inputting a query image I into a CNN network model optimized in the step 5, extracting a hidden layer, namely, taking the output of a hash layer as an image characteristic, marking as Out (H), carrying Out binarization by setting a threshold value to obtain a corresponding binary hash code, namely, for each bit j=1..h, wherein H is the node number of the hidden layer, and the output binary code is as follows
Setting Γ= { I 1 ,I 2 ,...,I n Test set of images for retrieval, I n Representing the nth test set image, each corresponding binary code being represented as Γ H ={H 1 ,H 2 ,...,H n }, wherein H i ∈{0,1} h
For query image I q And its corresponding binary code H q Calculate H q And H i If H is the Hamming distance of q And H i ∈Γ H If the Hamming distance is smaller than the threshold value, the Hamming distance is considered as a similar image, and a candidate image set with m similar images is finally obtained according to the threshold value setting, so that a corresponding candidate image set is obtained
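The binarization and coarse-retrieval stage can be sketched as follows. The 0.5 binarization threshold is an assumption (suited to sigmoid hash-layer outputs; the text only says "a set threshold"), and the database codes are synthetic, built at known Hamming distances from the query purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def binarize(out_h, threshold=0.5):
    """Turn hash-layer activations Out(H) into a binary code, equation (10).
    The 0.5 threshold is an assumption suited to sigmoid outputs."""
    return (out_h >= threshold).astype(np.uint8)

def hamming(h1, h2):
    """Hamming distance = number of differing bits."""
    return int(np.sum(h1 != h2))

h = 48                               # code length = hash-layer node count
query = binarize(rng.random(h))      # the query image's binary code

# Toy "test set" codes built at known Hamming distances from the query
# by flipping the first k bits (illustrative data, not real images).
db_codes = np.array([np.where(np.arange(h) < k, 1 - query, query)
                     for k in (3, 10, 30, 5, 40, 20)])

dists = np.array([hamming(query, c) for c in db_codes])
threshold, m = 22, 3
within = np.flatnonzero(dists < threshold)             # coarse filter
candidates = within[np.argsort(dists[within])][:m]     # keep m nearest
print(candidates.tolist(), dists[candidates].tolist())   # [0, 3, 1] [3, 5, 10]
```

Only the indices in `candidates` go on to the fine (Euclidean) retrieval stage.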
The Euclidean distance between the depth feature vector of the query image and those of the candidate set images is calculated specifically as follows:
For the query image I_q and the candidate image set P, let V_q denote the feature vector of the query image I_q, and let V_i^P denote the feature vector of the ith image in the candidate set. The Euclidean distance between the corresponding feature vectors is calculated by:
s_i = ||V_q - V_i^P|| (12)
The smaller the Euclidean distance, the higher the similarity of the two images; finally the n most similar result images are obtained from the candidate image set in ascending order of Euclidean distance, where n is smaller than m.
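The fine-retrieval stage of equation (12) reduces to a vectorized distance computation and a sort. The feature vectors below are tiny hand-picked stand-ins for the real fc7 features (which would be 4096-dimensional).

```python
import numpy as np

# fc7-layer feature vectors: one query and m = 4 candidate-set images
# (3-dimensional stand-ins for the real 4096-dimensional features).
V_q = np.array([0.0, 0.0, 0.0])
candidates = np.array([
    [3.0, 4.0, 0.0],    # distance 5
    [1.0, 0.0, 0.0],    # distance 1
    [0.0, 2.0, 0.0],    # distance 2
    [6.0, 8.0, 0.0],    # distance 10
])

# s_i = ||V_q - V_i^P||, equation (12): Euclidean distance per candidate.
s = np.linalg.norm(candidates - V_q, axis=1)

n = 2                        # final result count, with n < m
top_n = np.argsort(s)[:n]    # ascending distance = descending similarity
print(top_n.tolist(), s[top_n].tolist())   # [1, 2] [1.0, 2.0]
```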

Claims (7)

1. The deep hash image retrieval method based on the triplet network is characterized by comprising the following steps of:
step 1, adopting three parallel Alexnet network models as the CNN network model, adding a hidden layer, namely a hash layer, after fc7, the last fully connected layer of each Alexnet network model, and sharing weight parameters among the three Alexnet network models;
step 2, selecting an image data set CIFAR-10, dividing the image data set CIFAR-10 into a training set and a test set, taking a training set image as an input image of the CNN network model constructed in the step 1, wherein each input image is a triplet image and comprises an anchor point image, a positive sample image and a negative sample image, wherein the positive sample image and the anchor point image are of the same category, and the negative sample image and the anchor point image are of different categories;
step 3, respectively inputting an anchor point image, a positive sample image and a negative sample image of each input image into an Alexnet network model, and respectively generating a depth feature of an output fc7 layer and a preliminary hash code representing image features at the output fc7 layer and the hash layer by the anchor point image, the positive sample image and the negative sample image;
step 4, introducing a triplet loss function into the CNN network model, and optimizing the CNN network model by minimizing the triplet loss function by using the preliminary hash code generated in the step 3, so that the distances of the anchor point image and the positive sample image in the feature space are smaller than those of the anchor point image and the negative sample image in the feature space;
and 5, inputting the test set images and the query image into the Alexnet network model of the CNN network model optimized in step 4 that corresponds to the input anchor point image, to obtain the corresponding hash codes and fc7-layer depth features, wherein the test images and the query image serve as anchor point images; calculating the Hamming distance between the binary hash codes of the query image and each test set image, regarding a test set image as similar to the query image if the Hamming distance is smaller than a set threshold value, thereby obtaining a candidate set of the m most similar images in the test set; then calculating the Euclidean distance between the depth feature vector of the query image and that of each candidate set image, and obtaining the final retrieval result in ascending order of Euclidean distance, namely finding in the candidate set the n images most similar to the query image.
2. The deep hash image retrieval method based on a triplet network according to claim 1, wherein the image dataset CIFAR-10 contains 60000 images in 10 categories; the training set has 5000 images per category and the test set has 1000 images per category.
3. The deep hash image retrieval method based on a triplet network according to claim 1, wherein in step 4 the distance between the anchor point image and the positive sample image in the feature space is smaller than the distance between the anchor point image and the negative sample image in the feature space, namely:
S(x_i, x_i^+) < S(x_i, x_i^-) (2)
equivalent to
S^2(x_i, x_i^+) < S^2(x_i, x_i^-) (3)
wherein x_i is the anchor image, x_i^+ is a positive sample image, i.e. a sample image of the same class as x_i, x_i^- is a negative sample image, i.e. a sample image of a different class from x_i, and S(x, x') represents the similarity of the two images x and x'; thus S(x_i, x_i^+) is the similarity of the anchor image and the positive sample image, and S(x_i, x_i^-) is the similarity of the anchor image and the negative sample image. Accordingly, the triplet loss function is defined as follows:
loss(x_i, x_i^+, x_i^-) = max{0, g + S^2(x_i, x_i^+) - S^2(x_i, x_i^-)} (4)
where loss(x_i, x_i^+, x_i^-) denotes the triplet loss and g is a threshold parameter that controls the margin between same-category and different-category image pairs.
4. A deep hash image retrieval method based on a triplet network according to claim 3, characterized in that g= -1.
5. The deep hash image retrieval method based on a triplet network according to claim 3, wherein the optimization of the CNN network model is the optimization of the triplet loss function, with the specific process:
Denote the training set X = {X_i}, X_i = (x_i, x_i^+, x_i^-). The optimization of the triplet loss function is then converted into the minimization of the following objective function L, namely:
L = Σ_{i=1..n} max{0, g + S^2(x_i, x_i^+) - S^2(x_i, x_i^-)} + λ||W||^2 (5)
wherein W denotes the neural network parameters, λ is a regularization coefficient introduced to improve the generalization ability of the network model, and n is the number of triplets in the training set;
introducing a function d(W, X_i) to represent the difference between the anchor-positive and anchor-negative pair distances in a triplet, i.e.
d(W, X_i) = S^2(x_i, x_i^+) - S^2(x_i, x_i^-) (6)
the objective function is further written as:
L = Σ_{i=1..n} max{0, g + d(W, X_i)} + λ||W||^2 (7)
To minimize the objective function, its partial derivative with respect to W is:
∂L/∂W = Σ_{i=1..n} (∂d(W, X_i)/∂W) · 1[g + d(W, X_i) > 0] + 2λW (8)
and by the definition of d(W, X_i), the gradient of d(W, X_i) is:
∂d(W, X_i)/∂W = 2(F(x_i) - F(x_i^+)) · ∂(F(x_i) - F(x_i^+))/∂W - 2(F(x_i) - F(x_i^-)) · ∂(F(x_i) - F(x_i^-))/∂W (9)
It follows that the triplet images are input into the network, forward and backward computation yields the gradient of the triplet loss function, and the gradient descent algorithm is then used to minimize the objective function, thereby optimizing the triplet loss function and obtaining the optimal CNN network model.
6. The method for searching deep hash images based on triple-play network according to claim 4, wherein the calculating the hamming distance of binary hash codes corresponding to the query image and the test set image in step 5 is specifically:
First, the query image I_q is input into the CNN network model optimized in step 5, and the output of the hidden layer, i.e. the hash layer, is extracted as the image feature, denoted Out(H). Binarizing it with a set threshold t yields the corresponding binary hash code: for each bit j = 1, ..., h, where h is the number of hidden-layer nodes, the output binary code is

H^j = 1 if Out^j(H) ≥ t, and H^j = 0 otherwise   (10)
Let Γ = {I_1, I_2, ..., I_n} be the test set of images to be retrieved, where I_n denotes the n-th test set image; the corresponding binary codes are denoted Γ_H = {H_1, H_2, ..., H_n}, where H_i ∈ {0,1}^h.
For the query image I_q and its corresponding binary code H_q, the Hamming distance between H_q and each H_i ∈ Γ_H is calculated. If the Hamming distance between H_q and an H_i ∈ Γ_H is smaller than the set threshold, the corresponding image is regarded as similar; according to this threshold setting, a candidate image set containing m similar images is finally obtained:

P = {I_1^P, I_2^P, ..., I_m^P}   (11)
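The binarization and Hamming-distance filtering of this claim can be sketched as follows (the bit threshold t and the Hamming threshold max_dist are free parameters here; the claim only states that thresholds are set):

```python
import numpy as np

def binarize(out_h, t=0.5):
    # Threshold the hash-layer activations Out(H) bitwise into a binary code
    return (np.asarray(out_h, dtype=float) >= t).astype(np.uint8)

def hamming(a, b):
    # Number of differing bits between two binary codes
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

def candidate_set(h_q, gamma_h, max_dist):
    # Indices of test images whose code lies within max_dist of the query code H_q
    return [i for i, h_i in enumerate(gamma_h) if hamming(h_q, h_i) < max_dist]
```

Because the codes are short binary strings, this coarse filter scans the whole test set cheaply, leaving only m candidates for the finer Euclidean re-ranking of claim 7.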
7. The deep hash image retrieval method based on a triplet network according to claim 6, wherein calculating the Euclidean distance between the query image feature vector and the depth feature vectors of the candidate set images in step 5 is specifically:
For the query image I_q and the candidate image set P, let V_q denote the feature vector of the query image and V_i^P the feature vector of the i-th image in the candidate set; the Euclidean distance between the corresponding feature vectors is calculated as:

s_i = ||V_q - V_i^P||   (12)
The smaller the Euclidean distance, the more similar the two images. Sorting the candidate images by Euclidean distance in ascending order finally yields the n most similar result images from the candidate image set, where n is smaller than m.
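A sketch of this Euclidean re-ranking step (representing feature vectors as plain arrays and returning candidate indices is an illustrative choice, not specified by the claim):

```python
import numpy as np

def rerank(v_q, candidate_feats, n):
    # s_i = ||V_q - V_i^P||  (Eq. 12): Euclidean distance to each candidate feature
    v_q = np.asarray(v_q, dtype=float)
    s = [float(np.linalg.norm(v_q - np.asarray(v, dtype=float))) for v in candidate_feats]
    # Sort candidates by ascending distance and keep the n most similar (n < m)
    order = sorted(range(len(s)), key=s.__getitem__)
    return order[:n]
```

Since the candidate set has only m elements, this exact distance computation stays cheap even though the full deep features are much longer than the hash codes.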
CN202111448102.6A 2021-11-30 2021-11-30 Deep hash image retrieval method based on triplet network Active CN114168782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111448102.6A CN114168782B (en) 2021-11-30 2021-11-30 Deep hash image retrieval method based on triplet network


Publications (2)

Publication Number Publication Date
CN114168782A CN114168782A (en) 2022-03-11
CN114168782B true CN114168782B (en) 2024-03-15

Family

ID=80482302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111448102.6A Active CN114168782B (en) 2021-11-30 2021-11-30 Deep hash image retrieval method based on triplet network

Country Status (1)

Country Link
CN (1) CN114168782B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417046B (en) * 2022-03-31 2022-07-12 腾讯科技(深圳)有限公司 Training method of feature extraction model, image retrieval method, device and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017092183A1 (en) * 2015-12-03 2017-06-08 中山大学 Image retrieval method based on variable-length deep hash learning
CN113190699A (en) * 2021-05-14 2021-07-30 华中科技大学 Remote sensing image retrieval method and device based on category-level semantic hash


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Feng Xingjie; Cheng Yiwei. Image retrieval based on deep convolutional neural networks and hashing. Computer Engineering and Design. 2020, (03), full text. *

Also Published As

Publication number Publication date
CN114168782A (en) 2022-03-11

Similar Documents

Publication Publication Date Title
CN111428073B (en) Image retrieval method of depth supervision quantization hash
CN110941734B (en) Depth unsupervised image retrieval method based on sparse graph structure
CN113177132B (en) Image retrieval method based on depth cross-modal hash of joint semantic matrix
CN108984642B (en) Printed fabric image retrieval method based on Hash coding
CN108446334B (en) Image retrieval method based on content for unsupervised countermeasure training
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
CN111274424B (en) Semantic enhanced hash method for zero sample image retrieval
CN111104555A (en) Video hash retrieval method based on attention mechanism
CN111008224A (en) Time sequence classification and retrieval method based on deep multitask representation learning
CN112036511B (en) Image retrieval method based on attention mechanism graph convolution neural network
CN110647907A (en) Multi-label image classification algorithm using multi-layer classification and dictionary learning
CN112199532A (en) Zero sample image retrieval method and device based on Hash coding and graph attention machine mechanism
CN114168782B (en) Deep hash image retrieval method based on triplet network
CN116452862A (en) Image classification method based on domain generalization learning
CN115329120A (en) Weak label Hash image retrieval framework with knowledge graph embedded attention mechanism
CN114860973A (en) Depth image retrieval method for small sample scene
CN114612747A (en) Remote sensing image retrieval method based on unsupervised weighted hash
CN113033345B (en) V2V video face recognition method based on public feature subspace
CN114329031A (en) Fine-grained bird image retrieval method based on graph neural network and deep hash
CN110717068A (en) Video retrieval method based on deep learning
CN115100694A (en) Fingerprint quick retrieval method based on self-supervision neural network
CN116310407A (en) Heterogeneous data semantic extraction method for power distribution and utilization multidimensional service
CN110750672B (en) Image retrieval method based on deep measurement learning and structure distribution learning loss
CN113326393B (en) Image retrieval method based on deep hash feature and heterogeneous parallel processing
Zhang et al. Class-based Core Feature Extraction Network for Few-shot Classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant